ENJA15: A Free Corpus of English - Japanese Translation Process

Michael Carl, Akiko Aizawa, Masaru Yamada

    Research output: Contribution to conference, Paper, Research, peer-review


    Speech-enabled interfaces have the potential to become one of the most efficient and ergonomic environments for human-computer interfaces, text production and documentation. This paper introduces and evaluates a corpus of more than 55 hours of English-to-Japanese user activity data that were collected within the ENJA15 project, in which translators were observed while writing and speaking translations (translation dictation) and during machine translation post-editing. The transcription of the spoken data, keyboard logging and eye-tracking data were recorded with Translog-II, post-processed and integrated into the CRITT TPR-DB, which is publicly available under a Creative Commons license.
    The paper also evaluates a subset corpus, the JTD16 study, a collection of translation dictation activity data produced by seven participants who were tasked with six additional sessions of dictation assignments over six consecutive days. This longitudinal study aims to investigate translators’ learning effect in the automatic speech recognition (ASR) environment.
    A preliminary evaluation of the data was conducted with respect to the productivity of the three translation modes, where we found that translation dictation is nearly as quick as post-editing. We also investigate typing activity (i.e. the number of insertions and deletions) and properties of the text production units, where it was observed that translation dictation involves longer production units with a high number of coherent insertions. The longitudinal study on dictation also finds that the learning effect over six consecutive days is minimal.
    Original language: English
    Publication date: Jun 2016
    Number of pages: 11
    Publication status: Published - Jun 2016
    Event: The Eighth Asia-Pacific Translation and Interpreting Forum. APTIF 2016 - Xi'an, China
    Duration: 17 Jun 2016 to 18 Jun 2016
    Conference number: 8


    Conference: The Eighth Asia-Pacific Translation and Interpreting Forum. APTIF 2016


    • Translation dictation
    • Speech recognition
    • Translation process research
