English-to-Japanese Translation vs. Dictation vs. Post-editing: Comparing Translation Modes in a Multilingual Setting

Michael Carl, Akiko Aizawa, Masaru Yamada

    Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review



    Speech-enabled interfaces have the potential to become one of the most efficient and ergonomic environments for human-computer interaction and for text production. However, little research has investigated in detail the processes and strategies involved in the different modes of text production. This paper introduces and evaluates a corpus of more than 55 hours of English-to-Japanese user activity data collected within the ENJA15 project, in which translators were observed while writing translations, while speaking translations (translation dictation), and during machine translation post-editing. The transcriptions of the spoken data, the keyboard logging data and the eye-tracking data were recorded with Translog-II, post-processed and integrated into the CRITT Translation Process Research-DB (TPR-DB), which is publicly available under a Creative Commons license. The paper presents the ENJA15 data as part of a large multilingual collection of translation process data (Chinese, Danish, German, Hindi and Spanish) comprising more than 760 translation sessions. It compares the ENJA15 data with the other language pairs and reviews some of its particularities.
    Original language: English
    Title of host publication: The LREC 2016 Proceedings: Tenth International Conference on Language Resources and Evaluation
    Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
    Place of Publication: Paris
    Publisher: European Language Resources Association
    Publication date: 2016
    ISBN (Electronic): 9782951740891
    Publication status: Published - 2016
    Event: The 10th International Conference on Language Resources and Evaluation. LREC 2016 - Portorož, Slovenia
    Duration: 23 May 2016 to 28 May 2016
    Conference number: 10


    Conference: The 10th International Conference on Language Resources and Evaluation. LREC 2016

    Bibliographical note

    Paper accepted for LREC2016 under the title: "ENJA15: A Free Corpus of English → Japanese Translation Process Data".
