Interactive machine translation
Interactive machine translation (IMT) is a sub-field of computer-aided translation. Under this paradigm, the software that assists the human translator attempts to predict the text the user is going to type, taking into account all the information available to it. Whenever a prediction is wrong and the user provides feedback, the system produces a new prediction that takes the new information into account. This process is repeated until the translation matches the user's expectations.
Interactive machine translation is especially interesting when translating texts in domains where a translation containing errors is not acceptable, so a human user must amend the translations provided by the system. In such cases, interactive machine translation has been shown to benefit potential users. Nevertheless, there is no commercial software that implements interactive machine translation yet, and work in the field is restricted to academic research.
Historically, interactive machine translation was born as an evolution of the computer-aided translation paradigm, in which the human translator and the machine translation system were intended to work in tandem. This first work was extended within the TransType research project, funded by the Canadian government. In this project, human interaction was aimed at producing the target text for the first time by embedding data-driven machine translation techniques within the interactive translation environment, with the goal of combining the best of both actors: the efficiency of the automatic system and the reliability of human translators.
Later, a larger-scale research project, TransType2, funded by the European Commission, extended this work by analyzing the incorporation of a complete machine translation system into the process, with the goal of producing a complete translation hypothesis that the human user can amend or accept. If the user decides to amend the hypothesis, the system attempts to make the best use of that feedback to produce a new translation hypothesis that takes the user's modifications into account.
Recent work involving an extensive evaluation with human users revealed that interactive machine translation may even be used by people who do not speak the source language to achieve near-professional translation quality. It also showed that an interactive scenario is more beneficial than a classic post-editing scenario.
The interactive machine translation process starts with the system suggesting a translation hypothesis to the user. The user may then accept the complete sentence as correct, or modify it if they consider that it contains an error. Typically, when a given word is modified, the prefix up to that word is assumed to be correct, which leads to a left-to-right interaction scheme. Once the user has changed the word considered incorrect, the system proposes a new suffix, i.e. the remainder of the sentence. This process continues until the translation satisfies the user.
Although explained here at the word level, the process may also be implemented at the character level, so that the system provides a new suffix whenever the human translator types a single character. In addition, there is an ongoing effort within the MIPRCV project, funded by the Spanish government, to change the typical left-to-right interaction scheme in order to make human-machine interaction easier.
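The left-to-right scheme described above can be illustrated with a minimal Python sketch. It is not the method of any particular system: the "engine" here is a hypothetical stand-in (`propose_suffix`) that picks the first entry in a fixed list of candidate translations compatible with the validated prefix, and the user is simulated by comparing each hypothesis with a known reference translation. All function names and the toy data are assumptions made for illustration.

```python
from typing import List, Sequence


def propose_suffix(prefix: Sequence[str], candidates: List[List[str]]) -> List[str]:
    """Hypothetical stand-in for an MT engine: return the continuation of the
    first candidate translation that starts with the validated prefix."""
    n = len(prefix)
    for cand in candidates:
        if cand[:n] == list(prefix):
            return cand[n:]
    return []  # no candidate is compatible with the prefix


def interactive_session(reference: List[str], candidates: List[List[str]]) -> int:
    """Simulate a user whose intended translation is `reference`.
    Returns the number of words the user had to type (word corrections)."""
    prefix: List[str] = []   # words validated so far, always a prefix of `reference`
    corrections = 0
    while True:
        hypothesis = prefix + propose_suffix(prefix, candidates)
        if hypothesis == reference:
            return corrections          # user accepts the full sentence
        # Locate the first word where the hypothesis departs from the reference.
        i = len(prefix)
        while (i < len(hypothesis) and i < len(reference)
               and hypothesis[i] == reference[i]):
            i += 1
        if i >= len(reference):
            # Hypothesis is the reference plus extra words: the user cuts it there.
            return corrections
        # The user types the correct word; everything up to it becomes the prefix.
        prefix = reference[:i + 1]
        corrections += 1


# Toy data (assumed): three candidate translations and the one the user wants.
candidates = [
    ["the", "house", "is", "green"],
    ["the", "home", "is", "small"],
    ["the", "home", "was", "small"],
]
reference = ["the", "home", "was", "small"]
print(interactive_session(reference, candidates))
```

In this toy run the user corrects "house" to "home" and then "is" to "was", after which the proposed suffix completes the sentence. A real system would generate the suffix with a translation model conditioned on the source sentence and the prefix rather than by looking it up in a list.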
A similar approach is used in the Caitra translation tool.
Evaluation is a difficult issue in interactive machine translation. Ideally, evaluation should take place in experiments involving human users. However, given the high monetary cost this would imply, this is seldom the case. Moreover, even when human translators are available to perform a true evaluation of interactive machine translation techniques, it is not clear what should be measured in such experiments, since many variables must be taken into account and cannot be controlled, for instance the time the user takes to get used to the process.
Typically, interactive machine translation is evaluated in laboratory conditions using the key stroke ratio or the word stroke ratio. These criteria attempt to measure how many keystrokes or words the user needed to enter to produce the final translated document.
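As a rough illustration of these two measures, the sketch below computes both ratios for a set of final translations. It assumes the common normalization in which the number of words (or keystrokes) the user entered is divided by the total number of words (or characters) in the final translations. The text above does not spell out the denominator, so that choice, like the function names, is an assumption.

```python
from typing import Sequence


def _ratio(strokes: int, total_units: int) -> float:
    """Shared helper: user effort divided by the size of the final text."""
    if total_units <= 0:
        raise ValueError("final translations must not be empty")
    return strokes / total_units


def word_stroke_ratio(word_strokes: int, final_translations: Sequence[str]) -> float:
    """Words typed by the user / whitespace-separated words in the final text."""
    total_words = sum(len(t.split()) for t in final_translations)
    return _ratio(word_strokes, total_words)


def key_stroke_ratio(key_strokes: int, final_translations: Sequence[str]) -> float:
    """Keystrokes made by the user / characters (spaces included) in the final text."""
    total_chars = sum(len(t) for t in final_translations)
    return _ratio(key_strokes, total_chars)


final = ["the home was small"]          # 4 words, 18 characters
print(word_stroke_ratio(2, final))      # 2 corrected words out of 4
print(key_stroke_ratio(9, final))       # 9 keystrokes out of 18 characters
```

Lower values mean that the system predicted more of the final text correctly, so the user had to type less of it.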
Differences with classical computer-aided translation
Although interactive machine translation is a sub-field of computer-aided translation, its main attraction with respect to classical computer-aided translation is interactivity. In classical computer-aided translation, the system suggests at best one translation hypothesis, which the user is then required to post-edit. In contrast, in interactive machine translation the system produces a new translation hypothesis each time the user interacts with it, i.e. after each word (or letter) has been entered.
- Machine translation
- Statistical machine translation
- Computer-aided translation
- Computational linguistics