Crowdsourcing as human-machine translation

From Wikipedia, the free encyclopedia

The use of crowdsourcing and text corpora in human-machine translation (HMT) has become predominant in the field within the last few years,[1] in comparison to solely using machine translation (MT). A few recent academic journal articles have examined the benefits that crowdsourcing could bring as a translation technique, and how it could improve the tools currently available to the public and make them more efficient.

Crowdsourcing in translation


Technologies translated through crowdsourcing


The translation of this open-source system, a distribution of Linux, is often carried out by individual users who wish to use it in their mother tongue.

The service "Google in your own language" (GIYL)[2] was a project which was translated by both users and translation volunteers.[3]

In March 2008, the entire Facebook site was translated into French through crowdsourcing within 24 hours, reportedly by over 4,000 native French speakers.[4]

A language institute devoted to the fictional language heard in Star Trek provides its members with a forum for discussion and interaction with fellow enthusiasts.

As the technology partner of TED (conference), dotSUB developed a "browser based, one-stop, self contained system for creating and viewing subtitles for videos in multiple languages across all platforms, including web based, mobile devices, and transcription and video editing systems".[5]

An open-source collaborative translation platform, Worldwide Lexicon (WWL) describes itself as "a translation memory, essentially a giant database of translations, which can be embedded in almost any website or web application"[6] through the use of a browser plugin.
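The idea of a translation memory as "a giant database of translations" can be illustrated with a minimal sketch. This is not Worldwide Lexicon's implementation or API: the `TranslationMemory` class, its method names, and the 0.8 similarity cutoff are all hypothetical, chosen only to show how stored translations might be reused for identical or near-identical source segments.

```python
from difflib import get_close_matches


class TranslationMemory:
    """Toy translation memory (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps target language code -> {source segment: translation}
        self._store = {}

    def add(self, source, target_lang, translation):
        """Record a human-contributed translation of a source segment."""
        self._store.setdefault(target_lang, {})[source] = translation

    def lookup(self, source, target_lang, cutoff=0.8):
        """Return (matched_source, translation), or None if nothing is close enough."""
        segments = self._store.get(target_lang, {})
        if source in segments:
            # Exact match: reuse the stored translation directly.
            return source, segments[source]
        # Fuzzy match: fall back to the most similar stored segment, if any.
        close = get_close_matches(source, list(segments), n=1, cutoff=cutoff)
        if close:
            return close[0], segments[close[0]]
        return None


tm = TranslationMemory()
tm.add("Save your changes", "fr", "Enregistrez vos modifications")

exact = tm.lookup("Save your changes", "fr")
fuzzy = tm.lookup("Save your change", "fr")
missing = tm.lookup("Delete the account", "fr")
```

In this sketch a fuzzy hit returns the stored source segment alongside its translation, so that a human reviewer can see what was actually matched before accepting the suggestion.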

Advantages


Crowdsourced translation is considered to be highly efficient, thanks to three advantages:

  • Multilingual support

Through human or manual translation, there are no boundaries or limitations to the languages or dialects the source text can be translated into. Through crowdsourcing, the creation of a large base of translators with a large variety of native tongues provides the possibility for the original text to be translated into many different languages.

  • Quick solution

If the text is made openly available for translation, it can be translated within only a few minutes (if the text in question is relatively short). This is due to the large number of people who have access to the task. Despite varying levels of competency among the users, an accurate translation is usually reached because of the sheer number of participants who are able to correct and overrule mistakes. However, communication between large numbers of people can be difficult to co-ordinate effectively.

  • Monetary benefits

The company which implements the crowdsourcing is considered to be the main beneficiary, due to the low cost of maintaining a crowdsourcing platform once it has been set up. Translators on open-source projects are not generally considered to be freelancers or professional translators; rather, they are hobbyists who are willing to translate for free.
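The claim under "Quick solution", that a large number of participants can correct and overrule individual mistakes, can be sketched as a simple majority vote over submitted translations. This is only one possible aggregation scheme and is not taken from the cited sources: the function name `aggregate_translations`, the normalisation step, and the `min_share` threshold are assumptions made for illustration.

```python
from collections import Counter


def aggregate_translations(candidates, min_share=0.5):
    """Return the candidate backed by more than `min_share` of contributors.

    Hypothetical helper: returns None when there are no candidates or no
    candidate reaches the required share of votes.
    """
    if not candidates:
        return None
    # Normalise whitespace and case so trivially different submissions count together.
    normalised = [c.strip().lower() for c in candidates]
    winner, votes = Counter(normalised).most_common(1)[0]
    if votes / len(normalised) > min_share:
        return winner
    return None


submissions = [
    "Bonjour le monde",
    "bonjour le monde ",
    "Salut le monde",
    "Bonjour le monde",
]
consensus = aggregate_translations(submissions)
```

Here three of the four submissions agree after normalisation, so the minority variant is overruled. When no candidate wins a clear majority the sketch returns nothing, which is where the coordination problems described in this article would come into play.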

Challenges

  • Technological boundaries

Crowdsourcing tends to be fully effective only when employed on the internet. As a result, people who are not internet-savvy, or who lack free, reliable access to the internet, are under-represented in crowdsourcing. Valid and perhaps important dialects could therefore be omitted from the results. Time zone differences can also delay the delivery of the final product, and should be taken into consideration.

  • Quality

Research specifically into the quality of Wikipedia[7] "concluded that adding more editors to an article improved article quality only when they used appropriate coordination techniques and was harmful when they did not".[3] The most important issue to take into account is that the translators the text is released to are, as noted above, not professionals, which leads to somewhat variable results.

  • Motivation

Without a source of motivation, obtaining usable results from a crowdsourcing project is almost impossible. It is vital to create interest and enthusiasm within the group of people in order to maintain their commitment to the project. Therefore, rewards are often offered for the best contributions.

  • Control

As the following for a project increases, the ability to control and manage the group decreases. This can lead to a disorganised and chaotic result, and managing the project can therefore become very time-consuming and costly.

Crowdsourcing vs. machine translation


The main difference between these two techniques is that crowdsourcing is human-generated translation, whilst MT is automated by a computer, although the two share similarities. A concise comparison table is given in Crowdsourcing as Human-Machine Translation by Anastasiou and Gupta:[3]

Crowdsourcing Translation vs. Machine Translation

| | Crowdsourcing Translation | Machine Translation |
|---|---|---|
| Start | 2006 | 1955 |
| Output "engine" | Humans | Computer software |
| Human involvement | Always | At revision |
| Control | No | Yes |
| Terminological consistency | No | Yes |
| Source text | Uncontrolled | Controlled |
| Speed | Less than MT | High |
| Cost | Low implementation cost | Acquisition cost of commercial systems |
| Quality | High | Low |
| Profit | Company profits | MT user (single person or company) profits |

Future prospects


Anastasiou and Gupta believe that in the future, the advantages of both crowdsourcing and machine translation will merge to form an efficient, cost-effective and high-quality translation service.[3]

References

  1. ^ Ambati, Vamshi. Active Learning and Crowd-Sourcing for Machine Translation. Carnegie Mellon University, Pittsburgh. CiteSeerX 10.1.1.164.9485.
  2. ^ "Google In Your Own Language". 2010.
  3. ^ a b c d Anastasiou, Dimitra; Gupta, Rajat (2011). "Crowdsourcing as Human-Machine Translation (HMT)". Journal of Information Science. 37 (XX (X)): 1–25. doi:10.1177/0165551511418760. S2CID 33422441.
  4. ^ Sawers, Paul (April–May 2009). "Facebook's un-rebellion". Multilingual (62).
  5. ^ "About dotSUB". Retrieved 9 June 2012.
  6. ^ "Worldwide Lexicon". Retrieved 9 June 2012.
  7. ^ Kittur, A; Kraut, R.E. (2008). "Harnessing the wisdom of crowds in wikipedia: Quality through coordination". Proceedings of the 2008 ACM conference on Computer supported cooperative work. pp. 37–46. doi:10.1145/1460563.1460572. ISBN 9781605580074. S2CID 1184433.