Using the TED Talks to Evaluate Spoken Post-editing of Machine Translation

Jeevanthi Liyanapathirana, Andrei Popescu-Belis

    Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review


    Abstract

    This paper presents a solution for evaluating spoken post-editing of imperfect machine translation output by a human translator. We compare two approaches to combining machine translation (MT) and automatic speech recognition (ASR): a heuristic algorithm and a machine learning method. To obtain a data set with spoken post-editing information, we use the French versions of TED talks as the source texts submitted to MT, and the original spoken English talks as their corrections, which are submitted to an ASR system. We experiment with various levels of artificial ASR noise and also with a state-of-the-art ASR system. The results show that combining MT with ASR improves BLEU scores over both the MT output and the ASR output taken individually, especially when ASR performance is low.
    Original language: English
    Title of host publication: The LREC 2016 Proceedings: Tenth International Conference on Language Resources and Evaluation
    Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
    Number of pages: 8
    Place of publication: Paris
    Publisher: European Language Resources Association
    Publication date: 2016
    Pages: 2232–2239
    ISBN (Electronic): 9782951740891
    Publication status: Published - 2016
    Event: The 10th International Conference on Language Resources and Evaluation, LREC 2016 - Portorož, Slovenia
    Duration: 23 May 2016 to 28 May 2016
    Conference number: 10
    http://lrec2016.lrec-conf.org/en/

    Keywords

    • Machine translation
    • Spoken post-editing
    • Evaluation
