British National Corpus
The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources.[1] It was compiled as a general corpus (collection of texts) in the field of corpus linguistics. The corpus covers British English of the late twentieth century from a wide variety of genres with the intention that it be a representative sample of spoken and written British English of that time.
The 10-million-word spoken component has two parts. The demographic part contains transcriptions of spontaneous natural conversations recorded by members of the public. The context-governed part contains transcriptions of recordings made at specific types of meeting and event. All the original recordings transcribed for inclusion in the BNC have been deposited at the British Library Sound Archive.
The corpus is marked up following the recommendations of the Text Encoding Initiative and includes full linguistic annotation and contextual information. The most recent edition, from March 2007, is distributed in XML format along with the Xaira software. It is freely available under a licence and is widely distributed.
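As a rough illustration of how word-level annotation in an XML-encoded corpus can be read, the sketch below parses a small, invented TEI-style fragment and counts words and part-of-speech tags. The element names (`s`, `w`, `c`), the `pos` attribute, the tag values and the sample sentences are assumptions made for this example; they are not taken from the BNC distribution, whose actual schema should be checked in its documentation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical TEI-style fragment. Element and attribute names here are
# assumptions for illustration only, not the BNC's documented markup.
SAMPLE = """
<text>
  <s n="1"><w pos="DET">The </w><w pos="NOUN">corpus </w><w pos="VERB">is </w><w pos="ADJ">large</w><c>.</c></s>
  <s n="2"><w pos="DET">The </w><w pos="NOUN">samples </w><w pos="VERB">vary</w><c>.</c></s>
</text>
"""


def count_words_and_tags(xml_text):
    """Return the number of <w> elements with text and a Counter of their pos tags."""
    root = ET.fromstring(xml_text.strip())
    # Collect the surface form of every word element, ignoring empty ones.
    words = [w.text.strip() for w in root.iter("w") if w.text]
    # Tally the part-of-speech attribute carried by each word element.
    tags = Counter(w.get("pos") for w in root.iter("w"))
    return len(words), tags


total, tags = count_words_and_tags(SAMPLE)
print(total, dict(tags))
```

The same pattern would extend to a real corpus file by replacing `ET.fromstring` with `ET.parse` on a file path and adjusting the element and attribute names to match the schema in use.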
References
1. Burnard, Lou; Aston, Guy (1998). The BNC handbook: exploring the British National Corpus. Edinburgh: Edinburgh University Press. p. xiii. ISBN 0-7486-1055-3.