Abstract
Automatic speech recognition (ASR) relies on three resources: audio, orthographic transcriptions and a pronunciation dictionary. The dictionary, or lexicon, maps orthographic words to sequences of phones or phonemes that represent the pronunciation of the corresponding word. The quality of a speech recognition system depends heavily on the dictionary and the transcriptions therein. This paper presents an analysis of the phonetic/phonemic features that are salient for current Danish ASR systems. This preliminary study consists of a series of experiments using an ASR system trained on the DK-PAROLE corpus. The analysis indicates that transcribing features such as stress or vowel duration has a negative impact on performance. The best performance is obtained with coarse phonetic annotation, which improves word error rate by 1% and sentence error rate by 3.8%.
Original language | English |
---|---|
Title of host publication | Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013) |
Editors | Stephan Oepen, Kristin Hagen, Janne Bondi Johannessen |
Place of Publication | Linköping |
Publisher | Linköping University Electronic Press |
Publication date | 2013 |
Pages | 321-330 |
ISBN (Print) | 9789175195896 |
Publication status | Published - 2013 |
Event | NODALIDA 2013: The 19th Nordic Conference of Computational Linguistics - University of Oslo, Oslo, Norway. Duration: 22 May 2013 → 24 May 2013. Conference number: 19. http://www.hf.uio.no/iln/english/research/news-and-events/events/conferences/2013/nodalida/index.html |
Conference
Conference | NODALIDA 2013 |
---|---|
Number | 19 |
Location | University of Oslo |
Country/Territory | Norway |
City | Oslo |
Period | 22/05/2013 → 24/05/2013 |
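Internet address | http://www.hf.uio.no/iln/english/research/news-and-events/events/conferences/2013/nodalida/index.html |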
Series | NEALT (Northern European Association of Language Technology) Proceedings Series |
---|---|
Volume | 16 |
ISSN | 1736-6305 |