Talk:Text corpus

WikiProject Linguistics / Theoretical Linguistics / Applied Linguistics  (Rated Start-class, Mid-importance)
This article is within the scope of WikiProject Linguistics, a collaborative effort to improve the coverage of linguistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-Class on the project's quality scale.
This article has been rated as Mid-importance on the project's importance scale.
This article is supported by the Theoretical Linguistics Task Force.
This article is supported by the Applied Linguistics Task Force.
 

Structure of the article

This article is not structured properly. An article is supposed to start off with a brief introduction (which in this case would roughly summarise what a corpus is and the uses of corpora, I think). Then comes the body of the article, which is where all aspects of the subject are explained in more detail. Thus the information in the introduction can be thought of as a subset and preview of what will be contained somewhere in the rest of the article. In this article as it now stands, that is not the case: the "rest of the article" (i.e. what comes after the table of contents) does not fully cover the subject, but seems to supplement what has been said in the introduction (which is therefore more than an introduction). Basically this means that a lot of what is now in the "introduction" needs to be moved down into the body of the article, with one or more sections added to the body to contain that information. --A R King (talk) 17:51, 28 March 2008 (UTC)

What are the criteria for listing "notable corpora"? --84.20.240.176 (talk) 21:37, 3 September 2008 (UTC)Sim

I think we should turn this article into a List of text corpora. The concept of a text corpus can be explained in Corpus linguistics, and here we can just list the important corpora. I would not merge it with Corpus linguistics. As for the notable corpora, I would look at how often they are cited in corpus linguistics papers. Say, if a corpus is used or cited by 10 different authors (not just by 10 papers, which could all be from the same author or team), then we could consider it notable? I plan to enlarge this article in the near future. Vít Baisa (talk) 09:38, 25 January 2016 (UTC)

Sites on Corpus

Here are some sites on corpora, though I don't know how useful they are:
http://www.tu-chemnitz.de/phil/english/chairs/linguist/independent/kursmaterialien/language_computers/whatis.htm
http://www.cambridge.org/elt/corpus/what_is_a_corpus.htm
http://www.tlumaczenia-angielski.info/linguistics/corpus.htm
Verycuriousboy (talk) 14:44, 9 June 2009 (UTC)

Bilingual corpus

I think it might be useful to add a link to the interface for searching multiple bilingual corpora, available at http://glosbe.com/tmem — Preceding unsigned comment added by 89.76.126.251 (talk) 20:50, 31 August 2011 (UTC)

List of influential corpora for future use