English-to-Japanese Translation vs. Dictation vs. Post-editing: Comparing Translation Modes in a Multilingual Setting

Michael Carl, Akiko Aizawa, Masaru Yamada

    Publication: Contribution to book/anthology/report, conference contribution in proceedings, research, peer-reviewed

    Abstract

    Speech-enabled interfaces have the potential to become one of the most efficient and ergonomic environments for human-computer interaction and for text production. However, not much research has been carried out to investigate in detail the processes and strategies involved in the different modes of text production. This paper introduces and evaluates a corpus of more than 55 hours of English-to-Japanese user activity data that were collected within the ENJA15 project, in which translators were observed while writing and speaking translations (translation dictation) and during machine translation post-editing. The transcription of the spoken data, keyboard logging and eye-tracking data were recorded with Translog-II, post-processed and integrated into the CRITT Translation Process Research-DB (TPR-DB), which is publicly available under a Creative Commons license. The paper presents the ENJA15 data as part of a large multilingual Chinese, Danish, German, Hindi and Spanish translation process data collection of more than 760 translation sessions. It compares the ENJA15 data with the other language pairs and reviews some of its particularities.
    Original language: English
    Title: The LREC 2016 Proceedings: Tenth International Conference on Language Resources and Evaluation
    Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
    Place of publication: Paris
    Publisher: European Language Resources Association
    Publication date: 2016
    Pages: 4024-4031
    ISBN (electronic): 9782951740891
    Status: Published - 2016
    Event: The 10th International Conference on Language Resources and Evaluation. LREC 2016 - Portorož, Slovenia
    Duration: 23 May 2016 to 28 May 2016
    Conference number: 10
    http://lrec2016.lrec-conf.org/en/

    Conference

    Conference: The 10th International Conference on Language Resources and Evaluation. LREC 2016
    Number: 10
    Country/Territory: Slovenia
    City: Portorož
    Period: 23/05/2016 to 28/05/2016
    Bibliographic note

    Paper accepted for LREC2016 under the title: "ENJA15: A Free Corpus of English → Japanese Translation Process Data".