Analysis of Phonetic Transcriptions for Danish Automatic Speech Recognition

Andreas Søeborg Kirkedal

    Research output: Chapter in Book/Report/Conference proceeding > Article in proceedings > Research > peer-review


    Automatic speech recognition (ASR) relies on three resources: audio, orthographic transcriptions and a pronunciation dictionary. The dictionary, or lexicon, maps orthographic words to sequences of phones or phonemes that represent the pronunciation of the corresponding word. The quality of a speech recognition system depends heavily on the dictionary and the transcriptions it contains. This paper presents an analysis of the phonetic/phonemic features that are salient for current Danish ASR systems. This preliminary study consists of a series of experiments using an ASR system trained on the DK-PAROLE corpus. The analysis indicates that transcribing features such as stress or vowel duration has a negative impact on performance. The best performance is obtained with coarse phonetic annotation, which improves word error rate by 1% and sentence error rate by 3.8%.
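The two metrics reported in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions: word error rate is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and sentence error rate is the fraction of sentences containing at least one error. The function names `edit_distance` and `error_rates` and the example sentences are hypothetical, and this is not the scoring tool used in the paper.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (WER, SER) for a non-empty list of (reference, hypothesis) pairs.

    Assumes whitespace tokenisation and at least one reference word overall.
    """
    word_errors = 0
    ref_words = 0
    wrong_sentences = 0
    for ref, hyp in pairs:
        r, h = ref.split(), hyp.split()
        errors = edit_distance(r, h)
        word_errors += errors
        ref_words += len(r)
        if errors > 0:
            wrong_sentences += 1
    return word_errors / ref_words, wrong_sentences / len(pairs)


# Illustrative data only: one correct sentence, one with a deleted word.
pairs = [
    ("jeg bor i danmark", "jeg bor i danmark"),
    ("det er en god dag", "det er god dag"),
]
wer, ser = error_rates(pairs)
print(f"WER = {wer:.3f}, SER = {ser:.3f}")
```

With these two sentence pairs there is one word error over nine reference words and one incorrect sentence out of two, so the sketch reports a WER of about 0.111 and an SER of 0.5.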
    Original language: English
    Title of host publication: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)
    Editors: Stephan Oepen, Kristin Hagen, Janne Bondi Johannessen
    Place of Publication: Linköping
    Publisher: Linköping University Electronic Press
    Publication date: 2013
    ISBN (Print): 9789175195896
    Publication status: Published - 2013
    Event: NODALIDA 2013: The 19th Nordic Conference of Computational Linguistics - University of Oslo, Oslo, Norway
    Duration: 22 May 2013 to 24 May 2013
    Conference number: 19


    Conference: NODALIDA 2013
    Location: University of Oslo
    Series: NEALT (Northern European Association of Language Technology) Proceedings Series
