Abstract
The purpose of the current investigation is to predict post-editor profiles based on user behaviour and demographics using machine learning techniques to gain a better understanding of post-editor styles. Our study extracts process unit features from the CasMaCat LS14 database from the CRITT Translation Process Research Database (TPR-DB). The analysis has two main research goals: We create n-gram models based on user activity and part-of-speech sequences to automatically cluster post-editors, and we use discriminative classifier models to characterize post-editors based on a diverse range of translation process features. The classification and clustering of participants resulting from our study suggest this type of exploration could be used as a tool to develop new translation tool features or customization possibilities.
Original language | English |
---|---|
Title of host publication | Proceedings of the Workshop on Interactive and Adaptive Machine Translation |
Editors | Francisco Casacuberta, Marcello Federico, Philipp Koehn |
Number of pages | 10 |
Publisher | Association for Machine Translation in the Americas (AMTA) |
Publication date | 2014 |
Pages | 51-60 |
Publication status | Published - 2014 |
Event | The 11th Conference of the Association for Machine Translation in the Americas 2014 - Vancouver, Canada. Duration: 22 Oct 2014 → 26 Oct 2014. Conference number: 11. http://amta2014.amtaweb.org/ |
Conference
Conference | The 11th Conference of the Association for Machine Translation in the Americas 2014 |
---|---|
Number | 11 |
Country/Territory | Canada |
City | Vancouver |
Period | 22/10/2014 → 26/10/2014 |
Internet address | http://amta2014.amtaweb.org/ |