Abstract
This study uses machine learning techniques to predict post-editor profiles from user behaviour and demographics, with the aim of better understanding post-editing styles. We extract process unit features from the CasMaCat LS14 database in the CRITT Translation Process Research Database (TPR-DB). The analysis has two main research goals: we create n-gram models based on user activity and part-of-speech sequences to automatically cluster post-editors, and we use discriminative classifier models to characterize post-editors based on a diverse range of translation process features. The resulting classification and clustering of participants suggest that this type of exploration could inform the development of new translation tool features or customization options.
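As a rough illustration of the clustering idea, the sketch below turns each post-editor's sequence of activity labels into a bigram frequency profile and groups editors whose profiles are similar. It is not the authors' implementation: the single-letter activity codes (R for reading, T for typing), the participant IDs, the cosine similarity measure, and the 0.8 threshold are all hypothetical choices made for illustration, and the threshold-based single-linkage grouping is a simple stand-in for whatever clustering method the study used. The same profile function could equally be applied to part-of-speech tag sequences.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Sequence, Tuple

Profile = Dict[Tuple[str, ...], float]


def ngram_profile(seq: Sequence[str], n: int = 2) -> Profile:
    """Relative frequencies of contiguous n-grams in a sequence of labels."""
    grams = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def cosine(p: Profile, q: Profile) -> float:
    """Cosine similarity between two sparse n-gram profiles (0.0 if either is empty)."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def cluster(profiles: Dict[str, Profile], threshold: float = 0.8) -> List[List[str]]:
    """Hypothetical single-linkage grouping: link any two editors whose
    similarity reaches the threshold, then return the connected groups."""
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(profiles[a], profiles[b]) >= threshold:
                parent[find(b)] = find(a)

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical activity sequences, one character per activity unit.
editors = {"P01": "RTRTRTRT", "P02": "RTRTRTRTR", "P03": "RRRRRRRT", "P04": "RRRRRRT"}
profiles = {name: ngram_profile(seq) for name, seq in editors.items()}
print(cluster(profiles))  # [['P01', 'P02'], ['P03', 'P04']]
```

Here the editors who alternate reading and typing end up in one group and the reading-dominated editors in another. With real process data, a library implementation such as k-means or hierarchical clustering, together with a principled choice of threshold or cluster count, would be preferable to this dependency-free sketch.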
Original language | English |
---|---|
Title | Proceedings of the Workshop on Interactive and Adaptive Machine Translation |
Editors | Francisco Casacuberta, Marcello Federico, Philipp Koehn |
Number of pages | 10 |
Publisher | Association for Machine Translation in the Americas (AMTA) |
Publication date | 2014 |
Pages | 51-60 |
Status | Published - 2014 |
Event | The 11th Conference of the Association for Machine Translation in the Americas 2014, Vancouver, Canada. Duration: 22 Oct 2014 → 26 Oct 2014. Conference number: 11. http://amta2014.amtaweb.org/ |
Conference
Conference | The 11th Conference of the Association for Machine Translation in the Americas 2014 |
---|---|
Number | 11 |
Country/Territory | Canada |
City | Vancouver |
Period | 22/10/2014 → 26/10/2014 |
Internet address |