History of machine translation
Machine translation (MT) became a reality in research in the 1950s, although references to the idea and work on it can be found as early as the 17th century. One of the earliest projects was the Georgetown experiment of 1954, which involved fully automatic translation of more than sixty Russian sentences into English. The experiment was a success and ushered in an era of significant funding for machine translation research in the United States. The researchers behind the Georgetown experiment asserted that machine translation would be a solved problem within three to five years. Similar experiments were performed in the Soviet Union shortly after.
Actual progress was much slower. In 1966, the ALPAC report found that ten years of research had not fulfilled the expectations of the Georgetown experiment. Subsequently, funding for machine translation was dramatically reduced.
Starting in the late 1980s, as computational power increased and became cheaper and more widely available, interest grew in statistical models for machine translation.
Today, there is still no system capable of "fully automatic high quality translation of unrestricted text". However, many programs are now available that can provide useful output within strict constraints. Several of these are available online, such as Google Translate and the SYSTRAN system that powers AltaVista's BabelFish (Yahoo's Babelfish as of May 9, 2008).
The first patents for "translating machines" were applied for in the mid-1930s. One proposal, by Georges Artsrouni, was simply an automatic bilingual dictionary using paper tape. Another proposal, by the Russian Peter Troyanskii, was more detailed and included both a bilingual dictionary and a method for dealing with grammatical roles between languages, based on the grammatical system of Esperanto. The system was split into three stages: in the first, a native-speaking editor in the source language organized the words into their logical forms and identified their syntactic functions; in the second, the machine "translated" these forms into the target language; and in the third, a native-speaking editor in the target language normalized this output. His scheme remained unknown until the late 1950s, by which time computers were well known and widely used.
The early years
The first set of proposals for computer-based machine translation was presented by Warren Weaver, a researcher at the Rockefeller Foundation, in his 1949 "Translation memorandum". These proposals were based on information theory, successes in code breaking during the Second World War, and theories about the universal principles underlying natural language.
A few years after Warren Weaver's proposals, research began in earnest at many universities in the United States. On 7 January 1954 the Georgetown-IBM experiment was held in New York at the head office of IBM. This was the first public demonstration of an MT system. The demonstration was widely reported in the newspapers and garnered public interest. The system itself, however, was no more than a "toy" system. It had a vocabulary of only 250 words and translated 49 carefully selected Russian sentences into English, mainly in the field of chemistry. Nevertheless, it encouraged the idea that machine translation was imminent and stimulated funding for research, not only in the US but worldwide.
Early systems used large bilingual dictionaries and hand-coded rules for fixing the word order in the final output. This was eventually found to be too restrictive, and developments in linguistics at the time, for example generative linguistics and transformational grammar, were exploited to improve the quality of translations. During this period operational systems were installed. The United States Air Force used a system produced by IBM and Washington University, while the Atomic Energy Commission and Euratom, in Italy, used a system developed at Georgetown University. While the quality of the output was poor, it met many of the customers' needs, particularly in terms of speed.
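The dictionary-plus-reordering approach described above can be illustrated with a minimal sketch. The vocabulary, the part-of-speech tags, the Spanish-English language pair and the function name `translate_direct` are all invented for this illustration; none of them is taken from a historical system.

```python
# Toy illustration only: the vocabulary, tags and rule below are invented
# for this sketch and do not reproduce any historical MT system.
TOY_DICTIONARY = {
    "la": "the",
    "casa": "house",
    "blanca": "white",
    "es": "is",
    "grande": "big",
}

# Hypothetical part-of-speech tags for the target words, used by the rule.
TOY_TAGS = {
    "the": "DET",
    "house": "NOUN",
    "white": "ADJ",
    "is": "VERB",
    "big": "ADJ",
}


def translate_direct(sentence):
    """Translate word for word, then apply one hand-coded reordering rule."""
    words = sentence.lower().split()
    # Step 1: bilingual dictionary lookup; unknown words pass through unchanged.
    glosses = [TOY_DICTIONARY.get(word, word) for word in words]
    # Step 2: hand-coded rule, "NOUN ADJ" in the source order becomes "ADJ NOUN".
    i = 0
    while i < len(glosses) - 1:
        first, second = glosses[i], glosses[i + 1]
        if TOY_TAGS.get(first) == "NOUN" and TOY_TAGS.get(second) == "ADJ":
            glosses[i], glosses[i + 1] = second, first
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return " ".join(glosses)


print(translate_direct("La casa blanca es grande"))  # the white house is big
```

Every new word and every new word-order pattern needs another dictionary entry or another rule, which suggests why such systems came to be seen as too restrictive.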
At the end of the 1950s, Yehoshua Bar-Hillel, a researcher asked by the US government to look into machine translation and assess the possibility of "Fully Automatic High Quality Translation" by machines, introduced an argument against its feasibility. The argument is one of semantic ambiguity, or double meaning. Consider the following passage:
- Little John was looking for his toy box. Finally he found it. The box was in the pen.
The word pen may have two meanings: first, something used to write with in ink; second, a container of some kind. To a human, the intended meaning is obvious, but Bar-Hillel claimed that without a "universal encyclopedia" a machine would never be able to deal with this problem. Today, this type of semantic ambiguity can be avoided by writing source texts for machine translation in a controlled language that uses a vocabulary in which each word has exactly one meaning.
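As a rough sketch of the controlled-language idea, a pre-editing check could flag any word that carries more than one meaning before the text is sent for translation. The sense inventory `SENSES` and the function `ambiguous_words` are hypothetical and made up for this example; a real controlled language would define its own approved vocabulary.

```python
# Hypothetical sense inventory, invented for illustration: each word maps
# to the list of meanings it is allowed to carry.
SENSES = {
    "pen": ["writing instrument", "enclosure"],
    "playpen": ["enclosure for children"],
    "box": ["container"],
    "toy": ["plaything"],
}


def ambiguous_words(text, senses=SENSES):
    """Return the words in text that have more than one listed meaning."""
    flagged = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?")  # drop surrounding punctuation
        if len(senses.get(word, [])) > 1 and word not in flagged:
            flagged.append(word)
    return flagged


print(ambiguous_words("The box was in the pen."))      # ['pen']
print(ambiguous_words("The box was in the playpen."))  # []
```

An author warned about "pen" could rewrite the sentence with a single-meaning term, so the translation system never has to resolve the ambiguity itself.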
The 1960s, the ALPAC report and the seventies
Research in the 1960s in both the Soviet Union and the United States concentrated mainly on the Russian-English language pair. The objects of translation were chiefly scientific and technical documents, such as articles from scientific journals. The rough translations produced were sufficient to get a basic understanding of the articles. If an article discussed a subject deemed to be confidential, it was sent to a human translator for a complete translation; if not, it was discarded.
Machine translation research suffered a great blow in 1966 with the publication of the ALPAC report. The report was commissioned by the US government and delivered by ALPAC, the Automatic Language Processing Advisory Committee, a group of seven scientists convened by the US government in 1964. The US government was concerned about the lack of progress despite significant expenditure. The report concluded that machine translation was more expensive, less accurate and slower than human translation, and that despite the expenditure, machine translation was not likely to reach the quality of a human translator in the near future.
The report recommended, however, that tools be developed to aid translators (automatic dictionaries, for example) and that some research in computational linguistics should continue to be supported.
The publication of the report had a profound impact on research into machine translation in the United States, and to a lesser extent in the Soviet Union and the United Kingdom. Research, at least in the US, was almost completely abandoned for over a decade. In Canada, France and Germany, however, research continued. In the US the main exceptions were the founders of Systran (Peter Toma) and Logos (Bernard Scott), who established their companies in 1968 and 1970 respectively and served the US Department of Defense. In 1970, the Systran system was installed for the United States Air Force, and subsequently by the Commission of the European Communities in 1976. The METEO System, developed at the Université de Montréal, was installed in Canada in 1977 to translate weather forecasts from English to French. It was translating close to 80,000 words per day, or 30 million words per year, until it was replaced by a competitor's system on 30 September 2001.
While research in the 1960s concentrated on limited language pairs and input, demand in the 1970s was for low-cost systems that could translate a range of technical and commercial documents. This demand was spurred by the increase of globalisation and the demand for translation in Canada, Europe, and Japan.
The 1980s and early 1990s
By the 1980s, both the diversity and the number of installed systems for machine translation had increased. A number of systems relying on mainframe technology were in use, such as Systran, Logos, Ariane-G5, and Metal.
As a result of the improved availability of microcomputers, there was a market for lower-end machine translation systems. Many companies took advantage of this in Europe, Japan, and the USA. Systems were also brought onto the market in China, Eastern Europe, Korea, and the Soviet Union.
MT was especially active in Japan during the 1980s. With its fifth generation computer project, Japan intended to leap over its competition in computer hardware and software, and many large Japanese electronics firms became involved in creating software for translating into and from English (Fujitsu, Toshiba, NTT, Brother, Catena, Matsushita, Mitsubishi, Sharp, Sanyo, Hitachi, NEC, Panasonic, Kodensha, Nova, Oki).
Research during the 1980s typically relied on translation through some variety of intermediary linguistic representation involving morphological, syntactic, and semantic analysis.
At the end of the 1980s, there was a surge in novel methods for machine translation. One system, developed at IBM, was based on statistical methods. Makoto Nagao and his group used methods based on large numbers of translation examples, a technique that is now termed example-based machine translation. A defining feature of both approaches was that they neglected syntactic and semantic rules and relied instead on the manipulation of large text corpora.
The use of machine translation grew significantly with the advent of low-cost and more powerful computers. It was in the early 1990s that machine translation began to make the transition away from large mainframe computers toward personal computers and workstations. Two companies that led the PC market for a time were Globalink and MicroTac, which merged in December 1994. Intergraph and Systran also began to offer PC versions around this time. Sites also became available on the internet, such as AltaVista's Babel Fish (using Systran technology) and Google Language Tools (also initially using Systran technology exclusively).
The field of machine translation has seen major changes in the last few years. A large amount of research is currently being done on statistical machine translation and example-based machine translation. In the area of speech translation, research has focused on moving from domain-limited systems to domain-unlimited translation systems. In research projects in Europe (such as TC-STAR) and in the United States (STR-DUST and US-DARPA-GALE), solutions for automatically translating parliamentary speeches and broadcast news have been developed. In these scenarios the content is no longer limited to any special area; rather, the speeches to be translated cover a variety of topics. More recently, the French-German project Quaero has been investigating the possibility of using machine translation for a multilingual internet. The project seeks to translate not only webpages, but also videos and audio files on the internet.
Today, only a few companies use statistical machine translation commercially, e.g. Asia Online, SDL / Language Weaver (sells translation products and services), Google (uses its proprietary statistical MT system for some language combinations in Google's language tools), Microsoft (uses its proprietary statistical MT system to translate knowledge base articles), and Ta with you (offers a domain-adapted machine translation solution based on statistical MT with some linguistic knowledge). There has been a renewed interest in hybridisation, with researchers combining syntactic and morphological (i.e., linguistic) knowledge into statistical systems, as well as combining statistics with existing rule-based systems.