Corpora in Translation Studies

From Wikipedia, the free encyclopedia

The translator’s workplace has changed gradually over the last ten years, and today the computer can be considered an important or even essential tool in translation. However, the computer does not replace traditional tools such as monolingual and bilingual dictionaries, terminologies and encyclopaedias, whether on paper or in digital format. Personal computers can process information more easily and quickly than ever before. The problem is that finding a piece of information is not enough: the translator needs to find information that is correct and reliable.

Here corpora and concordancing software play an important role, since they give translators access to information about language, content and translation practices that was hardly available before the present stage of ICT development.
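As a rough illustration of what concordancing software does, the sketch below prints each occurrence of a search word with a fixed amount of context on either side (a "keyword in context", or KWIC, display). The function name `kwic`, the `width` parameter and the sample text are assumptions made for this example, not features of any particular tool.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keyword lines up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Invented sample text, used only to demonstrate the display.
sample = (
    "A corpus is a collection of texts. A parallel corpus aligns texts "
    "with their translations, and a comparable corpus does not."
)

for line in kwic(sample, "corpus", width=20):
    print(line)
```

Aligning the keyword in a single column makes recurring patterns to its left and right easy to spot, which is the main reason concordances are displayed this way.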

Corpus-based machine translation

Corpus-based machine translation relies on the analysis of real text samples together with their translations. The approaches that use corpora include statistical methods and example-based methods.


The main objective of statistical machine translation is to generate translations using statistical methods based on corpora of bilingual texts. For instance, the proceedings of the European Parliament are produced in all the official languages of the European Union (EU). If more corpora of this kind were available, the translation of texts on those subjects would give excellent results. The first statistical machine translation program was CANDIDE, developed by IBM.
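To make the idea of learning translations from bilingual texts more concrete, here is a minimal sketch of IBM Model 1, a classic word-alignment model from the statistical machine translation literature. It is chosen purely for illustration: the text above does not say which model CANDIDE used, the function names `train_ibm_model1` and `best_translation` are hypothetical, the three sentence pairs are invented toy data, and the NULL source word of the full model is left out for brevity.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate P(target word | source word) from sentence pairs with EM."""
    target_vocab = {t for _, tgt in pairs for t in tgt}
    uniform = 1.0 / len(target_vocab)
    # prob[(t, s)] approximates P(t | s); every pair starts out equally likely.
    prob = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected times s is translated as t
        total = defaultdict(float)  # expected times s is translated at all
        for src, tgt in pairs:
            for t in tgt:
                # E-step: share one unit of credit for t among the source words.
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    share = prob[(t, s)] / norm
                    count[(t, s)] += share
                    total[s] += share
        # M-step: renormalize the expected counts into probabilities.
        prob = defaultdict(
            float, {(t, s): c / total[s] for (t, s), c in count.items()}
        )
    return prob

def best_translation(prob, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {t: p for (t, s), p in prob.items() if s == source_word}
    return max(candidates, key=candidates.get)

# Invented toy parallel corpus: (source sentence, target sentence) as word lists.
toy_corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]

model = train_ibm_model1(toy_corpus)
for word in ("das", "Haus", "Buch", "ein"):
    print(word, "->", best_translation(model, word))
```

No dictionary is supplied at any point: the word pairings emerge only from which words keep appearing together across aligned sentences, which is why a larger bilingual corpus tends to give better estimates.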

Example-based machine translation

Example-based machine translation is characterized by its use of a bilingual corpus as its main source of knowledge. It is essentially translation by analogy and can be seen as an application of case-based reasoning, a technique used in machine learning that consists in solving a problem on the basis of the solutions to other, similar problems.
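The retrieval step of this case-based approach can be sketched as follows: given a new sentence, find the most similar sentence already stored in the bilingual corpus and return its translation as a starting point. This is only an assumption-laden illustration. The example pairs are invented, `closest_example` is a hypothetical name, similarity is measured with the standard library's `difflib.SequenceMatcher` over word lists, and the later step of adapting the retrieved translation to the new sentence is not shown.

```python
from difflib import SequenceMatcher

# Invented bilingual examples: (source sentence, stored translation).
EXAMPLES = [
    ("the printer is out of paper", "la impresora no tiene papel"),
    ("the printer is switched off", "la impresora está apagada"),
    ("press the green button", "pulse el botón verde"),
]

def closest_example(sentence, examples):
    """Return (similarity, source, target) for the most similar stored example."""
    def similarity(a, b):
        # Compare word sequences rather than characters.
        return SequenceMatcher(None, a.split(), b.split()).ratio()
    return max((similarity(sentence, src), src, tgt) for src, tgt in examples)

score, source, target = closest_example("the printer is out of ink", EXAMPLES)
print(f"{score:.2f}  {source!r} -> {target!r}")
```

A real system would then replace the part of the retrieved translation that corresponds to the differing words; here the result is only the nearest stored case.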

Corpora and Translation

Corpus typology

According to EAGLES, a general distinction can be made between monolingual and multilingual corpora. Within multilingual corpora, we can further distinguish between:

Comparable corpora: corpora compiled using similar design criteria but which are not translations of one another.

Parallel or translation corpora: texts in one language aligned with their translation in another. Several variables have to be taken into account here, such as the directness of the translation and the number of languages involved.

There are also monolingual comparable corpora: corpora composed of two subsections, one of original texts in a language and the other of texts translated into that same language. They are useful for translation theorists and researchers, but professional technical translators use translation memories instead.

Defining translation memories

A translation memory (TM) is a very specific type of parallel corpus in that:

a) It is “proprietary”: TMs are created individually or collectively around specific translation projects.

b) It tends towards closure, standardization and a restricted range of linguistic options.

Translation workbenches and TMs can be considered the most successful translation tools; however, their usefulness is restricted to specific text types.

Corpora as aids in translation

The kinds of corpora described above can be combined with other tools, such as dictionaries. Corpora can themselves function as general or specialized dictionaries: in this sense, a comparable corpus can be seen as a monolingual dictionary and a parallel corpus can be compared to a bilingual dictionary.
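One way to picture a parallel corpus acting as a bilingual dictionary is a bilingual concordance: looking up a source-language word returns every aligned segment pair in which it occurs, so the translator sees how it was rendered in context. The sketch below assumes a corpus already aligned at sentence level and stored as a list of pairs; the sentences are invented and `bilingual_concordance` is a hypothetical name.

```python
import re

# Invented sentence-aligned parallel corpus: (English segment, Spanish segment).
PARALLEL_CORPUS = [
    ("The contract enters into force today.", "El contrato entra en vigor hoy."),
    ("The police used force.", "La policía empleó la fuerza."),
    ("Both parties signed the contract.", "Ambas partes firmaron el contrato."),
]

def bilingual_concordance(term, corpus):
    """Return the aligned pairs whose source segment contains `term`."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [(src, tgt) for src, tgt in corpus if pattern.search(src)]

for src, tgt in bilingual_concordance("force", PARALLEL_CORPUS):
    print(src, "|", tgt)
```

Unlike a dictionary entry, each hit shows the word inside a full sentence together with the equivalent a translator actually chose for that context.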

Corpus resources for Translators

Not all dictionaries are the same, and neither are all corpora. Apart from translation memories, corpus resources of potential use to professional translators can be classified along a scale from “robust” to “virtual”.

Examples of corpora include the BNC (British National Corpus), the Spanish corpus CREA and the Italian corpus CORIS.

It is important to mention the distinction that corpus linguistics draws between corpora and archives of electronic texts: the latter are merely repositories of electronic texts. Building a corpus from web pages involves an information retrieval operation in order to locate relevant and reliable documents.

In many translation classes, students have compiled their own DIY (do-it-yourself) corpora. The main benefits of DIY corpora may be summarized as follows:

• They are easy to make.

• They are a great resource for content information.

• They are a great resource for terminology and phraseology.

Their main drawbacks are:

• Not all topics, text types and languages are available.

• The relevance and reliability of documents need to be carefully assessed.

• Existing concordancing software is not well equipped to handle HTML or XML files.
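Because of the last point, web pages collected for a DIY corpus are often converted to plain text before being loaded into a concordancer. The sketch below shows one possible way to do this with Python's standard `html.parser` module. The class name `TextExtractor`, the helper `html_to_text`, the choice of block-level tags and the sample page are all assumptions made for the example, and real pages usually need more cleaning (menus, footers, encoding problems) than is shown here.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""
    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "li", "br", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            # End of a block-level element: start a new line of text.
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)

def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)

# Invented sample page.
page = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>Olive oil</h1><p>Extra virgin olive oil is <b>cold-pressed</b>.</p>"
    "</body></html>"
)
print(html_to_text(page))
```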

Finally, the advantages of “robust” corpora over “virtual” corpora are as follows:

• They are usually more reliable.

• They are usually larger.

• They may be improved with linguistic and contextual information.


