Scottish Corpus of Texts and Speech

The Scottish Corpus of Texts & Speech (SCOTS) is an ongoing project to build a corpus of modern (post-1940) written and spoken texts in Scottish English and varieties of Scots. SCOTS has been available online since November 2004 and can be searched and browsed free of charge. By 2015 it contained 4.7 million words.[1]

The project is a venture of the Department of English Language and the STELLA project at the University of Glasgow. SCOTS is funded by grants from the Arts and Humanities Research Council.

Language variety

SCOTS contains texts in Scottish English and in varieties of broad Scots, including Doric, Lallans, urban varieties such as Glaswegian, and Insular Scots. The texts are spread both geographically and demographically. Each text is accompanied by extensive metadata, including information about the author (decade of birth, gender, occupation, birthplace and place of residence) and about the text itself (publication information, audience, date and genre).

Genre and mode

SCOTS is a multimedia corpus containing both written and spoken texts; the spoken texts are available as orthographic transcriptions accompanied by the source audio or video files. SCOTS covers a wide range of genres and text types, including prose fiction, poetry, business and personal correspondence, religious texts, parliamentary and administrative documents, emails, conversations and interviews.

Search and analysis

SCOTS can be investigated in various ways, depending on the user's interest. The corpus can be browsed, for example by the author's name or date of the text, and all texts can be downloaded in plain text format.

Transcriptions are synchronised with the audio and video files, which are streamed and can also be downloaded.

An Advanced Search facility allows the user to build more complex queries using any of the fields available in the metadata. Results can be plotted on an interactive map, so that regional variation can be investigated.

Advanced Search results can also be viewed as a KWIC (key word in context) concordance, which can be re-sorted to highlight collocational patterns.
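The idea behind a KWIC concordance can be illustrated with a minimal sketch in Python. It is not the SCOTS implementation, which is not described here: the function names (tokenize, kwic, sort_by_left, sort_by_right, show), the context width of two words and the sample sentence are hypothetical and chosen only for illustration. Each hit of the keyword becomes one line with its left and right context, and re-sorting those lines on the neighbouring words brings recurring collocates together.

```python
import re
from typing import List, Tuple

# A concordance line: (left context, keyword, right context)
Line = Tuple[str, str, str]


def tokenize(text: str) -> List[str]:
    # Deliberately simple tokenizer (an assumption, not SCOTS's own):
    # lowercase the text and keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens: List[str], keyword: str, width: int = 2) -> List[Line]:
    """Return one (left, keyword, right) line per occurrence of keyword."""
    lines: List[Line] = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def sort_by_right(lines: List[Line]) -> List[Line]:
    # Group lines by the words that follow the keyword.
    return sorted(lines, key=lambda line: line[2].split())


def sort_by_left(lines: List[Line]) -> List[Line]:
    # Read the left context backwards from the keyword, so lines sharing
    # the word immediately before the keyword end up adjacent.
    return sorted(lines, key=lambda line: line[0].split()[::-1])


def show(lines: List[Line], pad: int = 12) -> None:
    # Right-align the left context so the keywords line up in one column.
    for left, kw, right in lines:
        print(f"{left:>{pad}}  {kw}  {right}")


if __name__ == "__main__":
    # Invented sample text, for illustration only.
    sample = "The wee dog ran hame. A big dog barked. The wee dog slept."
    hits = kwic(tokenize(sample), "dog")
    show(sort_by_left(hits))
```

Sorting on the left context places the two lines preceded by "wee" next to each other; sorting on the right context does the same for the word that follows the keyword. A real concordancer would normally also respect sentence boundaries, which this sketch ignores.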

References

[1] Kopaczyk, Joanna (29 April 2016). "Wendy Anderson (ed.), Language in Scotland. Corpus-based studies". Northern Scotland. 7 (1): 112–117. doi:10.3366/nor.2016.0117. ISSN 0306-5278.

External links

Official website


