Abstract
This paper presents a method for evaluating spoken post-editing of imperfect machine translation output by a human translator. We compare two approaches to combining machine translation (MT) and automatic speech recognition (ASR): a heuristic algorithm and a machine learning method. To obtain a data set with spoken post-editing information, we use the French versions of TED talks as the source texts submitted to MT, and their spoken English counterparts as the corrections, which are submitted to an ASR system. We experiment with various levels of artificial ASR noise as well as with a state-of-the-art ASR system. The results show that the combination of MT with ASR improves over the individual outputs of both MT and ASR in terms of BLEU scores, especially when ASR performance is low.
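As an illustration of the kind of comparison described in the abstract (not code from the paper), the sketch below computes corpus-level BLEU for MT-only, ASR-only, and combined outputs against reference post-edits using the sacrebleu library; the file names and one-sentence-per-line layout are assumptions made for this example.

```python
# Minimal sketch (illustrative only): compare corpus-level BLEU of
# MT-only, ASR-only, and combined MT+ASR outputs against reference
# post-edited sentences. File names and layout are hypothetical.
import sacrebleu

def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]

references = read_lines("reference_postedits.en")   # human post-edits (hypothetical file)
systems = {
    "MT only":  read_lines("mt_output.en"),
    "ASR only": read_lines("asr_output.en"),
    "MT + ASR": read_lines("combined_output.en"),
}

for name, hypotheses in systems.items():
    # sacrebleu expects a list of reference streams, hence the extra list.
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    print(f"{name}: BLEU = {bleu.score:.2f}")
```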
Original language | English |
---|---|
Title of host publication | The LREC 2016 Proceedings: Tenth International Conference on Language Resources and Evaluation |
Editors | Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis |
Number of pages | 8 |
Place of Publication | Paris |
Publisher | European Language Resources Association |
Publication date | 2016 |
Pages | 2232-2239 |
ISBN (Electronic) | 9782951740891 |
Publication status | Published - 2016 |
Event | The 10th International Conference on Language Resources and Evaluation (LREC 2016) - Portorož, Slovenia, 23 May 2016 → 28 May 2016
Conference
Conference | The 10th International Conference on Language Resources and Evaluation. LREC 2016 |
---|---|
Number | 10 |
Country/Territory | Slovenia |
City | Portorož |
Period | 23/05/2016 → 28/05/2016 |
Internet address | http://lrec2016.lrec-conf.org/en/
Keywords
- Machine translation
- Spoken post-editing
- Evaluation