SDL Language Weaver is a Los Angeles, California–based company founded in 2002 by Kevin Knight and Daniel Marcu of the University of Southern California to commercialize a statistical approach to automatic language translation and natural language processing, now widely known as statistical machine translation (SMT).
SDL Language Weaver’s translation software is statistically based, a comparatively recent approach to automated translation. Earlier machine translation technology relied on collections of linguistic rules to analyze the source sentence and then map its syntactic and semantic structure into the target language. SDL Language Weaver instead uses statistical techniques borrowed from cryptography, applying machine learning algorithms that automatically acquire statistical models from existing parallel collections of human translations. Because these models are learned directly from real translations, they are more likely to be up to date, appropriate and idiomatic. The software can also be quickly customized to a subject area or style, and it can produce a full translation of previously unseen text.
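How a statistical model can be acquired from parallel human translations can be illustrated with IBM Model 1, a classic word-based SMT model trained by expectation-maximization. This is only a minimal sketch on a three-sentence toy corpus; it is not SDL Language Weaver's system, and the function names and data are assumptions made for the example.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate word translation probabilities t(target | source) with EM.

    `pairs` is a list of (source_sentence, target_sentence) strings.
    Illustrative toy implementation, not a production system.
    """
    corpus = [(src.split(), tgt.split()) for src, tgt in pairs]
    tgt_vocab = {w for _, tgt in corpus for w in tgt}

    # Start from a uniform distribution over target words.
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)  # keyed by (target_word, source_word)

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-translation counts
        total = defaultdict(float)  # expected counts per source word
        # E-step: spread each target word over the source words in its sentence.
        for src, tgt in corpus:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize the expected counts into probabilities.
        t = defaultdict(
            float,
            {(tw, sw): c / total[sw] for (tw, sw), c in count.items()},
        )
    return t


def best_translation(t, src_word):
    """Return the most probable target word for one source word."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == src_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy parallel corpus (German to English).
PAIRS = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein buch", "a book"),
]
model = train_ibm_model1(PAIRS)
```

No dictionary or grammar rule is supplied: the word correspondences emerge only from which words co-occur across the sentence pairs, which is the sense in which such models are learned from real translations.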
Statistical MT was once thought appropriate only for languages with very large amounts of pre-translated data. However, with newer advances in SMT, SDL Language Weaver has also been able to create translation systems for languages with smaller amounts of parallel data. With customization, SMT can also "learn" to translate highly technical material accurately.
SDL Language Weaver's primary product is its translation software. It currently offers 24 bi-directional language pairs, including English to and from French, Italian, Danish, Greek, Spanish, German, Dutch, Portuguese, Swedish, Russian, Czech, Romanian, Polish, Arabic, Persian, Simplified and Traditional Chinese, Korean, and Hindi. Several non-English language pairs are also available, such as Arabic-Spanish, Arabic-French, Spanish-French and French-German.
The current language pairs all use phrase-based statistical MT. The company is also working on syntax-based statistical MT for certain language pairs to improve overall translation quality.
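The idea behind phrase-based translation, that multi-word units are translated as a whole and not word by word, can be sketched with a tiny phrase table and a greedy longest-match lookup. The table entries, scores and function name below are hypothetical, and the decoding is deliberately simplified: real phrase-based systems typically search over many segmentations and reorderings and also score candidates with a language model.

```python
# Hypothetical phrase table: source phrase -> list of (translation, score).
PHRASE_TABLE = {
    ("guten", "morgen"): [("good morning", 0.9), ("good day", 0.1)],
    ("guten",): [("good", 0.8)],
    ("morgen",): [("tomorrow", 0.6), ("morning", 0.4)],
    ("welt",): [("world", 1.0)],
}


def translate(sentence, table, max_len=3):
    """Greedy, monotone phrase lookup (a toy stand-in for a real decoder)."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest phrase starting at position i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in table:
                best, _score = max(table[key], key=lambda option: option[1])
                output.append(best)
                i += n
                break
        else:
            # Unknown word: pass it through unchanged.
            output.append(words[i])
            i += 1
    return " ".join(output)
```

In this sketch, `translate("Guten Morgen Welt", PHRASE_TABLE)` picks the two-word entry for "guten morgen", whereas "morgen" on its own falls back to its highest-scoring single-word entry, "tomorrow".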
SDL Language Weaver can also create customized (domain-specific) language pairs for particular companies. It uses a customer's existing, pre-translated data to "train" a new translation system that statistically models how that customer's information is translated, so that new data can be translated in less time and edited as needed prior to publication.
In addition to its primary translation software, SDL Language Weaver offers several other products. Its Alignment Tool is a translation memory generator: users enter previously translated documents and align them at the segment level, producing a translation memory file. The company also offers Customizer, a customization tool that allows users to fine-tune the translation system using small amounts (up to 2 million words) of pre-translated data in a specific subject area. This tool allows for incremental improvements over time and gives users more control of the process. However, some customer feedback indicates that while vocabulary may improve, the fluency of the translation can be negatively affected.
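What aligning documents at the segment level and producing a translation memory file can look like is sketched below. This is not the Alignment Tool's actual behavior or file format: the sketch assumes a naive positional pairing of sentences and a minimal file in the style of TMX (Translation Memory eXchange), and all function names and header values are hypothetical. Practical aligners usually rely on more robust methods, such as length-based alignment.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the attribute xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def split_segments(text):
    """Naively split text into sentence-like segments."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def align_segments(source_text, target_text):
    """Pair segments by position (assumes a strict one-to-one translation)."""
    src = split_segments(source_text)
    tgt = split_segments(target_text)
    if len(src) != len(tgt):
        raise ValueError("segment counts differ; positional pairing is unsafe")
    return list(zip(src, tgt))


def build_tmx(pairs, src_lang, tgt_lang):
    """Serialize aligned pairs as a minimal TMX-style document (a sketch)."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "toy-aligner",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src_seg, tgt_seg in pairs:
        unit = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, seg_text in ((src_lang, src_seg), (tgt_lang, tgt_seg)):
            variant = ET.SubElement(unit, "tuv", {XML_LANG: lang})
            ET.SubElement(variant, "seg").text = seg_text
    return ET.tostring(tmx, encoding="unicode")
```

Calling `build_tmx(align_segments(english_text, french_text), "en", "fr")` on two documents with the same number of sentences yields one translation unit per sentence pair.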