Nature Identical Prosody: Data-driven Prosodic Feature Assignment for Diphone Synthesis

Peter Juel Henrichsen

    Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

    Abstract

    Today's synthetic voices are largely based on diphone synthesis (DiSyn) and unit selection synthesis (UnitSyn). In most DiSyn systems, prosodic envelopes are generated with formal models, while UnitSyn systems refer to extensive, highly indexed sound databases. Each approach has its drawbacks, such as low naturalness (DiSyn) and dependence on huge amounts of background data (UnitSyn). We present a hybrid model based on high-level speech data. As preliminary tests show, prosodic models combining DiSyn style at the phone level with UnitSyn style at the supra-segmental levels may approach UnitSyn quality on a DiSyn footprint. Our test data are Danish, but our algorithm is language neutral.

    Original language: English
    Title of host publication: SLTC 2012: Proceedings of the Conference
    Number of pages: 2
    Place of Publication: Lund
    Publisher: Lund University
    Publication date: 2012
    Pages: 37-38
    Publication status: Published - 2012
    Event: The Fourth Swedish Language Technology Conference 2012 - Lund, Sweden
    Duration: 24 Oct 2012 to 26 Oct 2012
    Conference number: 4

    Conference

    Conference: The Fourth Swedish Language Technology Conference 2012
    Number: 4
    Country/Territory: Sweden
    City: Lund
    Period: 24/10/2012 to 26/10/2012
