Abstract
We describe a set of experiments exploring statistical techniques for ranking and selecting the best translations in a graph of translation hypotheses. In a previous paper (Carl, 2007) we described how the hypothesis graph is generated through shallow mapping and permutation rules, and gave examples of its nodes, which consist of vectors representing morpho-syntactic properties of words and phrases. This paper describes a number of methods for deriving statistical feature functions from some of the vector components. The feature functions are trained off-line on different types of text, and their log-linear combination is then used to retrieve the best translation paths in the graph. We compare two language modelling toolkits, the CMU and the SRI toolkit, and arrive at three results: 1) feature function models based on word lemmas produce better results than token-based models, 2) adding a PoS-tag feature function to the word-lemma model improves the output, and 3) weights for lexical translations are suitable if the training material is similar to the texts to be translated.
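The log-linear combination mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names (`lemma_lm`, `tag_lm`, `lex`), the weights, the toy graph and the function names are all hypothetical, and each edge carries fixed feature values, whereas real n-gram feature functions would depend on the preceding context. Under those assumptions, the score of a path is the weighted sum of the log feature values along it, and the best path in an acyclic hypothesis graph can be found in a single pass.

```python
import math

# Hypothetical weights for three feature functions (not taken from the paper).
WEIGHTS = {"lemma_lm": 1.0, "tag_lm": 0.5, "lex": 0.8}


def edge_score(features, weights=WEIGHTS):
    """Log-linear score of one edge: weighted sum of log feature values."""
    return sum(weights[name] * math.log(value) for name, value in features.items())


def best_path(edges, start, goal):
    """Return (score, labels) of the highest-scoring path from start to goal.

    Assumes an acyclic graph whose nodes are numbered so that every edge
    goes from a smaller to a larger node id (a topological numbering).
    edges: list of (src, dst, label, features) tuples.
    """
    best = {start: (0.0, [])}
    # Visiting edges by ascending source id means best[src] is final
    # before any edge leaving src is relaxed.
    for src, dst, label, features in sorted(edges, key=lambda e: e[0]):
        if src not in best:
            continue  # source not reachable from start
        score = best[src][0] + edge_score(features)
        if dst not in best or score > best[dst][0]:
            best[dst] = (score, best[src][1] + [label])
    return best.get(goal)


# Toy hypothesis graph with two competing translations for the second word.
EDGES = [
    (0, 1, "the", {"lemma_lm": 0.5, "tag_lm": 0.6, "lex": 0.9}),
    (1, 2, "house", {"lemma_lm": 0.4, "tag_lm": 0.5, "lex": 0.7}),
    (1, 2, "home", {"lemma_lm": 0.2, "tag_lm": 0.5, "lex": 0.3}),
]

score, labels = best_path(EDGES, 0, 2)
print(labels, round(score, 3))
```

In this toy graph the path through "house" is selected because every one of its feature values is at least as high as the corresponding value for "home". Changing the weights changes how strongly each feature function influences the ranking.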
Original language | English |
---|---|
Title | The LREC 2008 Proceedings : The Sixth International Conference on Language Resources and Evaluation (LREC'08) |
Editors | Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias |
Number of pages | 8 |
Place of publication | Paris |
Publisher | European Language Resources Association |
Publication date | 2008 |
Pages | 1140-1147 |
ISBN (Print) | 2951740840 |
Status | Published - 2008 |
Published externally | Yes |
Event | The 6th International Conference on Language Resources and Evaluation. LREC 2008 - Marrakech, Morocco. Duration: 28 May 2008 → 30 May 2008. Conference number: 6. http://www.lrec-conf.org/lrec2008/ |
Conference
Conference | The 6th International Conference on Language Resources and Evaluation. LREC 2008 |
---|---|
Number | 6 |
Country/Territory | Morocco |
City | Marrakech |
Period | 28/05/2008 → 30/05/2008 |
Internet address |
Keywords
- Machine translation
- Translation theory