Asia Online

Asia Online Pte Ltd
Industry: Translation / Portals
Founder(s): Gregory Binger, Dion Wiggins, Bob Hayward, Philipp Koehn
Headquarters: Singapore
Number of locations: Singapore, Thailand, Los Angeles, Indonesia
Key people: Gregory Binger, Dion Wiggins, Bob Hayward, Philipp Koehn, Tim Cox, Kirti Vashee
Products: Language Studio Machine Translation and Language Processing Platform
Services: Automated translation, custom machine translation engines, language processing
Website: http://www.asiaonline.net, http://www.languagestudio.com

Asia Online is a privately owned automated translation company backed by individual investors and institutional venture capital. Its corporate headquarters are in Singapore, with significant operations in Bangkok, Thailand, R&D activities throughout Asia, and sales operations in Europe and North America.[1] The firm was founded in 2007 by Philipp Koehn of the University of Edinburgh; Gregory Binger, a technologist and IT/IP lawyer; and former Gartner senior analysts Bob Hayward and Dion Wiggins.[2]

The firm is undertaking what it calls the world's largest literacy project[3] by translating vast quantities of the world's English-language knowledge into Asian languages. This is achieved using statistical machine translation (SMT) technologies developed and enhanced in Thailand with a specific focus on Asian languages. Despite its name, Asia Online is not limited to Asian languages: it also supports translation between any pair of the 23 official EU languages.[4]
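
As a quick check on what pairing 23 languages with one another implies, the short sketch below counts the resulting translation directions. It assumes that each direction (source to target) counts as a separate language pair; the source does not say how pairs are counted, and the language codes are hypothetical placeholders.

```python
from itertools import permutations

# Hypothetical placeholder codes; only the count of 23 comes from the text.
languages = [f"lang{i:02d}" for i in range(1, 24)]

# Assumption: (source, target) and (target, source) are distinct pairs.
directed_pairs = list(permutations(languages, 2))

print(len(directed_pairs))  # 23 * 22 = 506
```

If a pair were instead counted once regardless of direction, the figure would be half of that.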

The firm's statistically based translation software employs recent advances in automated translation. Until the early 1990s, almost all production-level machine translation technology relied on collections of linguistic rules to analyze the source sentence and then map its syntactic and semantic structure into the target language. The firm's current approach instead uses statistical techniques borrowed from cryptography, applying machine learning algorithms that automatically acquire statistical models from existing parallel collections of human translations, in the same way as Google Translate and systems built with Moses, Koehn's own open-source SMT toolkit.
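
To illustrate what "acquiring a statistical model from parallel translations" can mean in practice, here is a minimal sketch of IBM Model 1, a classic word-translation model trained with expectation-maximization. It is offered only as a textbook example of the general idea; it is not Asia Online's system, and it is far simpler than the phrase-based models a toolkit such as Moses builds. The three-sentence corpus and the function names are invented for illustration.

```python
def train_ibm_model1(corpus, iterations=20):
    """Estimate t[(target_word, source_word)] = P(target_word | source_word).

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    """
    target_vocab = {tw for _, tgt in corpus for tw in tgt}
    # Start uniform over every word pair that co-occurs in some sentence pair.
    t = {(tw, sw): 1.0 / len(target_vocab)
         for src, tgt in corpus for tw in tgt for sw in src}

    for _ in range(iterations):
        count = {}   # expected count of (target_word, source_word) links
        total = {}   # expected count of links per source_word
        # E-step: distribute each target word over the source words
        # of its sentence, in proportion to the current probabilities.
        for src, tgt in corpus:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    delta = t[(tw, sw)] / norm
                    count[(tw, sw)] = count.get((tw, sw), 0.0) + delta
                    total[sw] = total.get(sw, 0.0) + delta
        # M-step: renormalize the expected counts into probabilities.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


def best_translation(t, source_word):
    """Return the target word with the highest P(target | source_word)."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)


# Invented toy parallel corpus (English source, German target).
corpus = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]

model = train_ibm_model1(corpus)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_translation(model, word))
```

No dictionary or grammar rule is supplied: the word correspondences emerge purely from which words co-occur across the sentence pairs, which is the contrast with rule-based systems drawn above.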

Portal Initiatives

On January 7, 2011, Asia Online launched its Thai-language consumer portal,[5] funded in part by CAT Telecom and the Thai Ministry of ICT. All 3.6 million English-language Wikipedia articles were translated from English into Thai. Then-Prime Minister Abhisit Vejjajiva and Minister of ICT Chuti Krairiksh launched the site as part of Thailand’s Children’s Day celebrations. A crowdsourcing approach is being taken to proofread the articles after they have been machine translated.[6]

Differences from other approaches

Google, Microsoft and SDL Language Weaver have also created SMT systems, some of which are publicly accessible. The specific differences in Asia Online's approach are:

  • Clean data: The traditional approach relied on content found on the web (corporate sites, news articles and similar sources where the same content was available in multiple languages), which yields low-quality data. Asia Online has focused machine and human resources on this area to ensure that the data is as clean and accurate as possible. The company's data is sourced from high-quality translations provided by book publishers and translation companies, is aligned at the segment level (usually sentences), and is converted into a consistent format so that it can be processed by the learning software. This step includes extracting segments from files and documents that are not already in TMX format. The extracted segments are then aligned, a process carried out by machines, with humans validating the accuracy. The data is converted to a base UTF-8 encoding for training the SMT system, small subsets are extracted to guide training, and finally the data is reviewed, cleaned, and analyzed.
  • Multiple domains: the system allows for training in many domains, by extending a base set of information with multiple additional learning sources.
  • Real-time corrections
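
The data-preparation steps in the first bullet (conversion to UTF-8, cleaning, and extraction of small subsets) can be sketched roughly as follows. This is an illustrative outline only, not the company's pipeline: the function names, the length-ratio filter, its threshold, and the sample sentences are all assumptions made for the example, and segment extraction and alignment are taken as already done.

```python
import random
import unicodedata


def normalize(raw, encoding):
    """Decode bytes from a known encoding into NFC-normalized text
    with whitespace collapsed; the result can be written out as UTF-8."""
    text = unicodedata.normalize("NFC", raw.decode(encoding))
    return " ".join(text.split())


def clean_pairs(pairs, max_ratio=3.0):
    """Drop empty, duplicate, and badly length-mismatched segment pairs.

    The character-length ratio is a crude, assumed heuristic for spotting
    misaligned segments; a real pipeline would also use human review.
    """
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        if not src or not tgt:
            continue
        if max(len(src), len(tgt)) / min(len(src), len(tgt)) > max_ratio:
            continue
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned


def split_held_out(pairs, held_out_size, seed=0):
    """Set aside a small random subset, e.g. for tuning or evaluation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return shuffled[held_out_size:], shuffled[:held_out_size]


# Invented sample: aligned segments arriving as Latin-1 encoded bytes.
raw_pairs = [
    ("The coffee is  ready. ".encode("latin-1"), "Le café est prêt.".encode("latin-1")),
    ("The coffee is ready.".encode("latin-1"), "Le café est prêt.".encode("latin-1")),
    ("Yes.".encode("latin-1"), "Oui, bien sûr, je viendrai demain matin.".encode("latin-1")),
    ("Thank you very much.".encode("latin-1"), "Merci beaucoup.".encode("latin-1")),
]

pairs = [(normalize(s, "latin-1"), normalize(t, "latin-1")) for s, t in raw_pairs]
cleaned = clean_pairs(pairs)
train, held_out = split_held_out(cleaned, held_out_size=1)
print(len(pairs), len(cleaned), len(train), len(held_out))
```

Normalizing before deduplicating matters here: the first two sample pairs differ only in whitespace and collapse into one once normalized.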

The firm currently has more than 530 language pairs available in a baseline form and is progressively deploying 15 domains across each language pair. Another 200+ language pairs are under development. Currently supported languages are the Asian languages Arabic, Chinese, Hindi, Japanese, Bahasa Indonesian, Bahasa Malay, Korean, and Thai; and the European languages Bulgarian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish, Swedish, Russian, and Ukrainian. The additional Asian languages Bengali, Gujarati, Punjabi, Tagalog, Tamil, Urdu, and Vietnamese are under development.[7]

Its systems are currently used to build customized translation systems for corporate and language service provider (LSP) customers, who add their own bilingual parallel corpora to the existing data to create higher-quality translation systems.

The company characterizes its products as a "platform": a suite of tools and products that can work independently or together. Some are installed locally and some are available only through its SaaS offering. This is described in the Common Sense Advisory (CSA) blog entry.

The Language Studio product suite was reviewed by Common Sense Advisory, a translation industry market research firm, in its Global Watchtower blog, shown in the link below.

See also

References

External links