Automatic Evaluation of Machine Translation: Correlating Post-editing Effort and Translation Edit Rate (TER) Scores

Mercedes Garcia Martinez, Arlene Koglin, Bartolomé Mesa-Lao, Michael Carl

    Publication: Contribution to book/anthology/report › Conference abstract in proceedings › Research › peer-reviewed

    Abstract

    The availability of systems capable of producing fairly accurate translations has increased the popularity of machine translation (MT). The translation industry is steadily incorporating MT into its workflows, engaging the human translator to post-edit the raw MT output so that it complies with a set of quality criteria in as few edits as possible. The quality of MT systems is generally measured by automatic metrics that produce scores intended to correlate with human evaluation.

    In this study, we investigate correlations between one such metric, Translation Edit Rate (TER), and actual post-editing effort as reflected in post-editing process data collected under experimental conditions. Using the CasMaCat workbench as a post-editing tool, keystroke and eye-tracking data were collected from five professional translators under two conditions: i) traditional post-editing and ii) interactive post-editing. In the second condition, as the user types, the MT system suggests alternative target translations which the post-editor can interactively accept or overwrite, whereas in the first condition no aids are provided to the user while editing the raw MT output. Each of the five participants was asked to post-edit 12 texts using the interactivity provided by the system and 12 additional texts without interactivity (i.e. traditional post-editing) over a period of six weeks.

    Process research in post-editing is often grounded in three different but related categories of post-editing effort: i) temporal (time), ii) cognitive (mental processes) and iii) technical (keyboard activity). For the purposes of this research, TER scores were correlated with two different indicators of post-editing effort as computed in the CRITT Translation Process Database (TPR-DB).
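TER itself is an edit-rate metric: the minimum number of edits needed to turn the MT output into a reference (here, the post-edited) translation, divided by the length of the reference. As a rough illustration only, a word-level edit rate can be sketched as below; note that full TER as defined by its authors also counts block shifts as single edits, which a plain edit distance does not capture.

```python
# Illustrative, simplified word-level edit rate (not the official TER metric:
# real TER additionally treats phrase shifts as single edits).

def word_edit_distance(hyp, ref):
    """Levenshtein distance over word tokens (insertions, deletions, substitutions)."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a hypothesis word
                          d[i][j - 1] + 1,         # insert a reference word
                          d[i - 1][j - 1] + cost)  # substitute (or match)
    return d[m][n]

def simplified_ter(hypothesis, reference):
    """Edits needed to turn `hypothesis` into `reference`, per reference word."""
    hyp, ref = hypothesis.split(), reference.split()
    return word_edit_distance(hyp, ref) / len(ref)

# One inserted word out of a six-word reference -> edit rate of 1/6
print(simplified_ter("the cat sat on mat", "the cat sat on the mat"))
```

A lower score means the raw MT output needed fewer edits; a score of 0 means the output already matched the reference.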
On the one hand, post-editing temporal effort was measured using FDur values (duration of segment production time, excluding keystroke pauses ≥ 200 seconds) and KDur values (duration of coherent keyboard activity, excluding keystroke pauses ≥ 5 seconds). On the other hand, post-editing technical effort was measured using Mdel values (number of manually generated deletions) and Mins values (number of manually generated insertions). Results show that TER scores correlate positively with actual post-editing effort, both in the form of manual insertions and deletions (Mins/Mdel) and in the time needed to perform the task (KDur/FDur).
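A correlation of this kind reduces to a Pearson coefficient between per-segment TER scores and an effort indicator such as the number of manual edits. The sketch below uses entirely hypothetical placeholder numbers, not data from the study, just to show the shape of the computation:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-segment values (illustrative only, NOT from the study):
ter_scores = [0.10, 0.25, 0.40, 0.55, 0.70]   # TER per segment
edits      = [3, 7, 12, 18, 22]               # Mins + Mdel per segment

# A value near +1 would indicate that higher TER goes with more manual edits.
print(f"r = {pearson(ter_scores, edits):.3f}")
```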
    Original language: English
    Title: Books of Abstracts of the 5th IATIS Conference: Innovation Paths in Translation and Intercultural Studies
    Editors: Fábio Alves, Adriana Silvina Pagano, Arthur de Melo Sá, Kícila Ferreguetti
    Number of pages: 1
    Place of publication: Belo Horizonte
    Publisher: International Association for Translation and Intercultural Studies. IATIS
    Publication date: 2015
    Pages: 150
    Status: Published - 2015
    Event: IATIS 5th International Conference: Innovation Paths in Translation and Intercultural Studies - Belo Horizonte, Brazil
    Duration: 7 Jul 2015 - 10 Jul 2015
    Conference number: 5
    http://www.iatis.org/index.php/iatis-belo-horizonte-conference/itemlist/category/195-main-programme

    Conference

    Conference: IATIS 5th International Conference
    Number: 5
    Country: Brazil
    City: Belo Horizonte
    Period: 07/07/2015 - 10/07/2015
    Internet address: http://www.iatis.org/index.php/iatis-belo-horizonte-conference/itemlist/category/195-main-programme

    Cite this

    Martinez, M. G., Koglin, A., Mesa-Lao, B., & Carl, M. (2015). Automatic Evaluation of Machine Translation: Correlating Post-editing Effort and Translation Edit Rate (TER) Scores. In F. Alves, A. S. Pagano, A. de Melo Sá, & K. Ferreguetti (Eds.), Books of Abstracts of the 5th IATIS Conference: Innovation Paths in Translation and Intercultural Studies (p. 150). Belo Horizonte: International Association for Translation and Intercultural Studies. IATIS.