= Apertium =

Apertium
- Programming Language: C++
- Operating System: POSIX compatible and Windows NT (limited support)
- Language: 35 languages, see below
- Genre: Rule-based machine translation
- License: GNU General Public License
- Repo: https://github.com/apertium

Apertium is a free/open-source rule-based machine translation platform, released under the terms of the GNU General Public License.

== Overview ==
Apertium is a transfer-based machine translation system. It uses finite-state transducers for all of its lexical transformations, and Constraint Grammar taggers together with hidden Markov models or perceptrons for part-of-speech tagging (word category disambiguation). A structural transfer component is responsible for word movement and agreement. Most Apertium language pairs so far use "chunking" or shallow-transfer rules, though newer pairs use (possibly recursive) rules defined in a context-free grammar.
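
To illustrate how a hidden Markov model can choose between competing part-of-speech readings, the following is a minimal sketch of a Viterbi search, not Apertium's actual tagger. It is simplified by assuming uniform emission probabilities, so only tag-to-tag transitions are scored. The word list, the tag names and all probabilities are invented for the example.

```python
import math

# Hypothetical candidate tags per surface form, as an analyser might return them.
CANDIDATES = {
    "the": ["det"],
    "saw": ["n", "vblex"],
    "works": ["n", "vblex"],
}

# Hypothetical transition probabilities P(tag | previous tag); "<s>" is sentence start.
TRANSITIONS = {
    ("<s>", "det"): 0.6, ("<s>", "n"): 0.3, ("<s>", "vblex"): 0.1,
    ("det", "n"): 0.9, ("det", "vblex"): 0.05, ("det", "det"): 0.05,
    ("n", "vblex"): 0.6, ("n", "n"): 0.3, ("n", "det"): 0.1,
    ("vblex", "det"): 0.5, ("vblex", "n"): 0.4, ("vblex", "vblex"): 0.1,
}


def disambiguate(words):
    """Return the most probable tag sequence for `words` (Viterbi search)."""
    # best[tag] = (log-probability of the best path ending in tag, that path)
    best = {"<s>": (0.0, [])}
    for word in words:
        step = {}
        for tag in CANDIDATES[word]:
            # Extend the best-scoring previous path with the current tag.
            score, path = max(
                (prev_score + math.log(TRANSITIONS[(prev_tag, tag)]), prev_path)
                for prev_tag, (prev_score, prev_path) in best.items()
            )
            step[tag] = (score, path + [tag])
        best = step
    return max(best.values())[1]


print(disambiguate(["the", "saw", "works"]))
```

With these invented numbers, "saw" is read as a noun after the determiner and "works" as a verb after the noun. A real tagger would also use emission probabilities estimated from a corpus.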

Many existing machine translation systems are commercial or use proprietary technologies, which makes them very hard to adapt to new uses. Apertium's code and data are free software and use a language-independent specification, which makes it easier to contribute to Apertium, speeds up development, and supports the project's overall growth.

As of December 2020, Apertium has released 51 stable language pairs, delivering fast translation with reasonably intelligible results (errors are easily corrected). Being an open-source project, Apertium provides tools for potential developers to build their own language pairs and contribute to the project.

== History ==
Apertium originated as one of the machine translation engines in the OpenTrad project, which was funded by the Spanish government and developed by the Transducens research group at the Universitat d'Alacant. It was originally designed to translate between closely related languages, although it has since been expanded to handle more divergent language pairs. To create a new machine translation system, one only has to develop linguistic data (dictionaries, rules) in well-specified XML formats.
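
As a rough illustration of what such XML linguistic data can look like, the sketch below reads a small bilingual dictionary fragment with Python's standard library. The fragment is a simplified example modelled on Apertium's bilingual dictionary format; real dictionaries contain further elements that are not handled here. The entries and the helper names `side` and `load_pairs` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Simplified, invented dictionary fragment: <l> is the left (source) side,
# <r> the right (target) side, and <s n="..."/> marks a grammatical tag.
BIDIX = """
<dictionary>
  <section id="main" type="standard">
    <e><p><l>house<s n="n"/></l><r>casa<s n="n"/><s n="f"/></r></p></e>
    <e><p><l>dog<s n="n"/></l><r>perro<s n="n"/><s n="m"/></r></p></e>
  </section>
</dictionary>
"""


def side(element):
    """Return (lemma, [tags]) for an <l> or <r> element."""
    lemma = (element.text or "").strip()
    tags = [s.get("n") for s in element.findall("s")]
    return lemma, tags


def load_pairs(xml_text):
    """Map each source (lemma, tags) key to its target (lemma, tags)."""
    root = ET.fromstring(xml_text)
    table = {}
    for pair in root.iter("p"):
        src_lemma, src_tags = side(pair.find("l"))
        table[(src_lemma, tuple(src_tags))] = side(pair.find("r"))
    return table


table = load_pairs(BIDIX)
print(table[("house", ("n",))])
```

Because the linguistic knowledge lives in data files like this rather than in program code, a new language pair can reuse the same engine unchanged.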

Language data developed for it (in collaboration with the Universidade de Vigo, the Universitat Politècnica de Catalunya and the Universitat Pompeu Fabra) currently support (in stable version) the Arabic, Aragonese, Asturian, Basque, Belarusian, Breton, Bulgarian, Catalan, Crimean Tatar, Danish, English, Esperanto, French, Galician, Hindi, Icelandic, Indonesian, Italian, Kazakh, Macedonian, Malaysian, Maltese, Northern Sami, Norwegian (Bokmål and Nynorsk), Occitan, Polish, Portuguese, Romanian, Russian, Sardinian, Serbo-Croatian, Silesian, Slovene, Spanish, Swedish, Tatar, Ukrainian, Urdu, and Welsh languages. A full list is available below. Several companies are also involved in the development of Apertium, including Prompsit Language Engineering, Imaxin Software and Eleka Ingeniaritza Linguistikoa.

The project has taken part in the 2009, 2010, 2011, 2012, 2013 and 2014 editions of Google Summer of Code and the 2010, 2011, 2012, 2013, 2014, 2015, 2016 and 2017 editions of Google Code-In.

== Translation methodology ==

This is an overall, step-by-step view of how Apertium works.

Apertium takes the following steps to translate a source-language text (the text we want to translate) into a target-language text (the translated text).
1. Source language text is passed into Apertium for translation.
2. The deformatter sets aside formatting markup (HTML, RTF, etc.) that should be kept in place but not translated.
3. The morphological analyser segments the text (expanding elisions, marking set phrases, etc.) and looks up segments in the language dictionaries, returning dictionary forms and tags for all matches. In pairs that involve agglutinative morphology, including a number of Turkic languages, a transducer built with Helsinki Finite-State Technology (HFST) is used. Otherwise, an Apertium-specific finite-state transducer system called lttoolbox is used.
4. The morphological disambiguator resolves ambiguous segments (i.e., those with more than one match) by choosing one match. Together, the morphological analyser and the morphological disambiguator form the part-of-speech tagger. Apertium uses Constraint Grammar rules (with the vislcg3 parser) for most of its language pairs.
5. Retokenisation uses a finite-state transducer to match sequences of lexical units and may reorder or translate tags. It is often used to bring idiomatic expressions closer to the target-language grammar.
6. Lexical transfer looks up disambiguated source-language basewords to find their target-language equivalents (i.e., mapping source language to target language). For lexical transfer, Apertium uses an XML-based dictionary format called bidix.
7. Lexical selection chooses between alternative translations when the source text word has alternative meanings. Apertium uses a specific XML-based technology, apertium-lex-tools, to perform lexical selection.
8. Structural transfer, whose rules are written in an XML format that allows complex structural transfer rules to be expressed, can consist of one-step chunking transfer, three-step chunking transfer or a CFG-based transfer module. The chunking modules flag grammatical differences between the source language and target language (e.g. gender or number agreement) by creating a sequence of chunks containing markers for this. They then reorder or modify chunks in order to produce a grammatical translation in the target language. The newer CFG-based module parses input sequences into possible parse trees, selects the best-ranking one and applies transformation rules to the tree.
9. The morphological generator uses the tags to deliver the correct target language surface form. The morphological generator is a morphological transducer, just like the morphological analyser. A morphological transducer both analyses and generates forms.
10. The post-generator makes any necessary orthographic changes due to the contact of words (e.g. elisions).
11. The reformatter restores the formatting markup (HTML, RTF, etc.) that was set aside by the deformatter in step 2.
12. Apertium delivers the target-language translation.
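
The steps above can be sketched as a chain of small functions. This is a toy illustration under stated assumptions, not Apertium's implementation: it covers only analysis, disambiguation, lexical transfer, structural transfer and generation, and it omits the deformatter, retokenisation, lexical selection, the post-generator and the reformatter. The English-to-Spanish data, the tag names and all function names are invented for the example, and plain Python dictionaries stand in for the finite-state transducers.

```python
# Toy English-to-Spanish data; every entry is invented for illustration.
ANALYSES = {
    "the": [("the", ["det"])],
    "house": [("house", ["n", "sg"])],
    "houses": [("house", ["n", "pl"])],
}
# Source lemma -> (target lemma, tags that exist only on the target side).
BILINGUAL = {"the": ("el", []), "house": ("casa", ["f"])}
SURFACE = {
    ("el", ("det", "f", "sg")): "la",
    ("el", ("det", "f", "pl")): "las",
    ("casa", ("n", "f", "sg")): "casa",
    ("casa", ("n", "f", "pl")): "casas",
}


def analyse(text):
    # Morphological analysis: each token gets all of its candidate readings.
    return [ANALYSES[token] for token in text.lower().split()]


def disambiguate(readings):
    # Stand-in for the tagger: keep the first reading of each token.
    return [candidates[0] for candidates in readings]


def lexical_transfer(units):
    # Replace each source lemma with its target lemma and add target-only tags.
    out = []
    for lemma, tags in units:
        target, extra = BILINGUAL[lemma]
        out.append((target, [tags[0]] + extra + tags[1:]))
    return out


def structural_transfer(units):
    # Agreement rule: a determiner copies gender and number from the next noun.
    out = []
    for i, (lemma, tags) in enumerate(units):
        nxt = units[i + 1] if i + 1 < len(units) else None
        if tags[0] == "det" and nxt and nxt[1][0] == "n":
            tags = ["det"] + nxt[1][1:]
        out.append((lemma, tags))
    return out


def generate(units):
    # Morphological generation: look up the surface form for lemma plus tags.
    return " ".join(SURFACE[(lemma, tuple(tags))] for lemma, tags in units)


def translate(text):
    units = disambiguate(analyse(text))
    return generate(structural_transfer(lexical_transfer(units)))


print(translate("the houses"))
```

In this sketch the English determiner carries no gender or number, so the structural transfer step has to copy them from the Spanish noun before the generator can pick "las" rather than "la".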

== Supported languages ==
The following 108 pairs, covering 51 languages and language varieties, are supported by Apertium.

1. Afrikaans to Dutch
2. Arabic to Maltese
3. Aragonese to Catalan
4. Aragonese to Spanish
5. Arpitan (Franco-Provençal) to French
6. Basque to English
7. Basque to Spanish
8. Belarusian to Russian
9. Breton to French
10. Bulgarian to Macedonian
11. Catalan to Aragonese
12. Catalan to English
13. Catalan to Esperanto
14. Catalan to French
15. Catalan to Italian
16. Catalan to Occitan
17. Catalan to Aranese
18. Catalan to Portuguese
19. Catalan to Brazilian Portuguese
20. Catalan to European Portuguese (traditional spelling)
21. Catalan to Romanian
22. Catalan to Sardinian
23. Catalan to Spanish
24. Crimean Tatar to Turkish
25. Danish to Norwegian (Bokmål)
26. Danish to Norwegian (Nynorsk)
27. Danish to Swedish
28. Dutch to Afrikaans
29. English to Catalan
30. English to Valencian
31. English to Esperanto
32. English to Galician
33. English to Serbo-Croatian
34. English to Spanish
35. Esperanto to English
36. French to Arpitan (Franco-Provençal)
37. French to Catalan
38. French to Esperanto
39. French to Occitan
40. French to Gascon
41. French to Spanish
42. Galician to English
43. Galician to Portuguese
44. Galician to Spanish
45. Hindi to Urdu
46. Icelandic to English
47. Icelandic to Swedish
48. Indonesian to Malay
49. Italian to Catalan
50. Italian to Sardinian
51. Italian to Spanish
52. Kazakh to Tatar
53. Macedonian to Bulgarian
54. Macedonian to English
55. Malay to Indonesian
56. Maltese to Arabic
57. Northern Sámi to Norwegian (Bokmål)
58. Norwegian (Bokmål) to Danish
59. Norwegian (Bokmål) to Norwegian (Nynorsk)
60. Norwegian (Bokmål) to East Norwegian, vi→vi
61. Norwegian (Bokmål) to Swedish
62. Norwegian (Nynorsk) to Danish
63. Norwegian (Nynorsk) to Norwegian (Bokmål)
64. Norwegian (Nynorsk) to East Norwegian, vi→vi
65. Norwegian (Nynorsk) to Swedish
66. East Norwegian, vi→vi to Norwegian (Nynorsk)
67. Occitan to Catalan
68. Occitan to French
69. Occitan to Spanish
70. Aranese to Catalan
71. Aranese to Spanish
72. Gascon to French
73. Polish to Silesian
74. Portuguese to Catalan
75. Portuguese to Galician
76. Portuguese to Spanish
77. Romanian to Catalan
78. Romanian to Spanish
79. Russian to Belarusian
80. Russian to Ukrainian
81. Sardinian to Italian
82. Serbo-Croatian to English
83. Serbo-Croatian to Macedonian
84. Serbo-Croatian to Slovenian
85. Silesian to Polish
86. Slovenian to Serbo-Croatian
87. Spanish to Aragonese
88. Spanish to Asturian
89. Spanish to Catalan
90. Spanish to Valencian
91. Spanish to English
92. Spanish to Esperanto
93. Spanish to French
94. Spanish to Galician
95. Spanish to Italian
96. Spanish to Occitan
97. Spanish to Aranese
98. Spanish to Portuguese
99. Spanish to Brazilian Portuguese
100. Swedish to Danish
101. Swedish to Icelandic
102. Swedish to Norwegian (Bokmål)
103. Swedish to Norwegian (Nynorsk)
104. Tatar to Kazakh
105. Turkish to Crimean Tatar
106. Ukrainian to Russian
107. Urdu to Hindi
108. Welsh to English

== See also ==

- Babel Fish (discontinued; redirects to main Yahoo! site)
- Comparison of machine translation applications
- Jollo (discontinued)
- Microsoft Translator
- Moses
- OpenLogos
- SYSTRAN
- Yandex.Translate
