
Tatoeba

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Papabearrr (talk | contribs) at 13:14, 20 March 2011 (Added secondary sources.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Tatoeba.org
Type of site: Open collaborative multilingual sentence dictionary
Available in: 15 languages; content in 87 languages
Owner: Trang Ho
Created by: Trang Ho
URL: http://tatoeba.org/
Commercial: No
Registration: Optional

Tatoeba.org is a free online database of example sentences geared towards foreign language learners. Its name comes from the Japanese term "tatoeba" (例えば tatoeba), meaning "for example". Unlike other online dictionaries, which focus on words, Tatoeba focuses on complete sentences, their grammatical properties, and their translations into other languages. Anyone can sign up and contribute, regardless of linguistic background or second-language proficiency. Tatoeba was founded by Trang Ho, who maintains it as its sole administrator, and it is hosted and supported by the Free Software Foundation France[1].

Content

Tatoeba is also the current home of the Tanaka Corpus, a once public-domain series of English-Japanese sentence pairs compiled by Hyogo University professor Yasuhito Tanaka and first released in 2001[2].

Site capabilities

Users, even non-registered ones, can search for a word in any language to retrieve a list of sentences using that word. Each sentence in the Tatoeba database is displayed next to its translations in other languages; direct and indirect translations are differentiated. Sentences are tagged for content such as subject matter, dialect, or vulgarity; each also has its own comment thread, where other users can leave feedback, corrections, and cultural notes. Almost 10,000 sentences in 8 languages currently have audio readings. Sentences can also be browsed by language, tag, or audio.

Registered users can add new sentences or translate or proofread existing ones, even if their target language is not their native tongue. Translations are linked to the original sentence automatically. Users can freely edit their own sentences, "adopt" and correct sentences without an owner, and comment on others' sentences. Trusted users, a rank above new users, can tag, untag, link, and unlink sentences.

Database structure

A simplified diagram of Tatoeba's underlying data structure.

Tatoeba's basic data structure is a series of nodes and edges. Each sentence is a node; each edge bridges two sentences with the same meaning[3].
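
The node-and-edge structure described above can be illustrated with a minimal Python sketch. This is not Tatoeba's actual implementation: the class name SentenceGraph, its methods, and the sentence IDs are hypothetical, and the sketch assumes that an "indirect" translation is a sentence exactly two edges away (a translation of a translation), which the sources cited here do not spell out.

```python
from collections import defaultdict


class SentenceGraph:
    """Hypothetical model: sentences are nodes, translation links are undirected edges."""

    def __init__(self):
        self.sentences = {}            # sentence id -> (language code, text)
        self.links = defaultdict(set)  # sentence id -> ids of linked sentences

    def add_sentence(self, sid, lang, text):
        self.sentences[sid] = (lang, text)

    def link(self, a, b):
        """Record that sentences a and b have the same meaning."""
        if a not in self.sentences or b not in self.sentences:
            raise KeyError("both sentences must exist before they are linked")
        self.links[a].add(b)
        self.links[b].add(a)

    def direct_translations(self, sid):
        """Sentences joined to sid by a single edge."""
        return set(self.links.get(sid, set()))

    def indirect_translations(self, sid):
        """Assumption: sentences exactly two edges away from sid."""
        direct = self.direct_translations(sid)
        two_steps = set()
        for neighbour in direct:
            two_steps |= self.links.get(neighbour, set())
        return two_steps - direct - {sid}


# Illustrative data only; the IDs are made up.
graph = SentenceGraph()
graph.add_sentence(1, "eng", "For example.")
graph.add_sentence(2, "jpn", "例えば。")
graph.add_sentence(3, "fra", "Par exemple.")
graph.link(1, 2)  # English <-> Japanese
graph.link(2, 3)  # Japanese <-> French

print(graph.direct_translations(1))    # {2}
print(graph.indirect_translations(1))  # {3}
```

In this sketch the French sentence is reachable from the English one only through the Japanese sentence, so it is reported as an indirect translation until someone links the two directly.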

License

The entire Tatoeba database is published under a Creative Commons Attribution 2.0 license. Under this license, English and Japanese sentence pairs originally from the Tanaka Corpus are used in Jim Breen's WWWJDIC, an English-Japanese dictionary.

Acclaim

Tatoeba received a grant from Mozilla Drumbeat in December 2010[4][5].

References

  1. ^ "Tatoeba, un dictionnaire de langues pour phrases d'exemples". fsffrance.org (in French). Paris: FSF France. February 24, 2011. Retrieved March 20, 2011.
  2. ^ "Tanaka Corpus". EDRDG Wiki. Electronic Dictionary Research and Development Group. February 3, 2011. Retrieved March 20, 2011.
  3. ^ Ho, Trang (February 23, 2010). "How to be a good contributor in Tatoeba". Tatoeba Project Blog. Retrieved March 20, 2011.
  4. ^ Ho, Trang (January 17, 2011). "Grant from Mozilla Drumbeat". Tatoeba Project Blog. Retrieved March 20, 2011.
  5. ^ Moltke, Henrik (December 30, 2010). "Best Drumbeat Projects: Tatoeba – a free and open database of sentences". Yoyodyne.cc. Retrieved March 20, 2011. ...the Mozilla Foundation wants to encourage and help the Tatoeba project by giving it a USD 2.5K Mozilla Drumbeat Grant.
