Language documentation

From Wikipedia, the free encyclopedia

Language documentation (also: documentary linguistics) is a subfield of linguistics which aims to describe the grammar and use of human languages. Its goal is a comprehensive record of the linguistic practices characteristic of a given speech community.[1][2][3] It seeks to make this record as thorough as possible, both for posterity and for language revitalization. The record can be public or private depending on the needs of the community and the purpose of the documentation. In practice, language documentation can range from solo linguistic anthropological fieldwork to the creation of vast online archives that contain dozens of different languages, such as FirstVoices or OLAC.[4]

Language documentation provides a firmer foundation for linguistic analysis in that it creates a corpus of materials in the language. The materials in question can range from vocabulary lists and grammar rules to children's books and translated works. These materials can then support claims about the structure of the language and its usage.[5] Documentation can thus be seen as a basic taxonomic task for linguistics: identifying the range of languages and their characteristics.


Typical steps involve recording, maintaining metadata, transcribing (often using the International Phonetic Alphabet and/or a "practical orthography" devised for that language), annotation and analysis, translation into a language of wider communication, archiving and dissemination.[6] Creating good records in the course of language description is critical. The materials can be archived, but not all archives are equally adept at handling language materials preserved in varying technological formats, and not all are equally accessible to potential users.[7]
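
The sequence of steps above can be pictured as a simple per-recording checklist. The following Python sketch is illustrative only: the `Session` class, its fields, and the stage names are hypothetical, chosen to mirror the steps listed in the paragraph, and do not correspond to any particular archive's schema or tool.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Stage names are assumptions that mirror the steps named in the text, in order.
STAGES = (
    "recording",
    "metadata",
    "transcription",
    "annotation and analysis",
    "translation",
    "archiving and dissemination",
)


@dataclass
class Session:
    """Hypothetical record of one documentation session and its progress."""

    session_id: str
    language: str
    completed: Set[str] = field(default_factory=set)

    def complete(self, stage: str) -> None:
        """Mark a stage as done; reject names that are not in STAGES."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.completed.add(stage)

    def next_stage(self) -> Optional[str]:
        """Return the first stage not yet completed, or None if all are done."""
        for stage in STAGES:
            if stage not in self.completed:
                return stage
        return None


session = Session(session_id="example-001", language="South Efate")
session.complete("recording")
session.complete("metadata")
print(session.next_stage())
```

In practice the steps often overlap or are revisited, so a strict linear order like this one is a simplification.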

Language documentation complements language description, which aims to describe a language's abstract system of structures and rules in the form of a grammar or dictionary. By practising good documentation, in the form of recordings with transcripts followed by collections of texts and a dictionary, a linguist can work more effectively and can provide materials for use by speakers of the language. New technologies permit better recordings with better descriptions, which can be housed in digital archives such as AILLA, Pangloss, or Paradisec. These resources can then be made available to the speakers. The first example of a grammar with a media corpus is Thieberger's grammar of South Efate (2006).[8]

Language documentation has also given birth to new specialized publications, such as the free online and peer-reviewed journal Language Documentation & Conservation and the SOAS working papers Language Documentation & Description.

Digital language archives

The digitization of archives is a critical component of language documentation and revitalization projects.[9] Descriptive records of local languages that could be put to use in language revitalization projects are often overlooked because of obsolete formatting, incomplete hard-copy records, or systematic inaccessibility. Local archives in particular, which may have vital records of the area's indigenous languages, are chronically underfunded and understaffed.[10] Historic records relating to language that have been collected by non-linguists such as missionaries can be overlooked if the collection is not digitized.[11] Physical archives are also more vulnerable to damage and information loss than digital ones.[9]

Teaching with documentation

Language documentation can benefit individuals who would like to teach or learn an endangered language.[12] If a language has limited documentation, this also limits how it can be used in a language revitalization context. Teaching with documentation and linguists' field notes can provide more context for those teaching the language and can add information they were not aware of.[12] Documentation can be useful for understanding culture and heritage, as well as for learning the language. Important components of language teaching include listening, reading, speaking, writing, and cultural components. Documentation provides resources for developing these skills.[12] For example, the Kaurna language was revitalized through written resources.[13] These written documents served as the only resource and were used to re-introduce the language; one way of doing so was through teaching, which included the making of a teaching guide for the Kaurna language.[13] Language documentation and teaching are thus related: if there are no fluent speakers of a language, documentation can be used as a teaching resource.


Language description, as a task within linguistics, may be divided into separate areas of specialization:

  • Phonetics, the study of the sounds of human language
  • Phonology, the study of the sound system of a language
  • Morphology, the study of the internal structure of words
  • Syntax, the study of how words combine to form grammatical sentences
  • Semantics, the study of the meaning of words (lexical semantics), and how these combine to form the meanings of sentences
  • Historical linguistics, the study of languages whose historical relations are recognizable through similarities in vocabulary, word formation, and syntax
  • Pragmatics, the study of how language is used by its speakers
  • Stylistics, the study of style in languages
  • Paremiography, the collection of proverbs and sayings

Related research areas



  1. Himmelmann, Nikolaus P. (1998). "Documentary and descriptive linguistics" (PDF). Linguistics. 36 (1): 161–195. doi:10.1515/ling.1998.36.1.161. S2CID 53134117. Retrieved 2018-01-18.
  2. Gippert, Jost; Himmelmann, Nikolaus P.; Mosel, Ulrike, eds. (2006). Essentials of language documentation. Berlin: Mouton de Gruyter. pp. x, 424. ISBN 978-3-11-018864-6.
  3. Woodbury, Anthony C. (2003). "Defining documentary linguistics". In Austin, Peter K. (ed.). Language documentation and description. Vol. 1. London: SOAS. pp. 35–51. Retrieved 2018-01-18.
  4. Bird, Steven; Simons, Gary (2003). "Seven Dimensions of Portability for Language Documentation and Description". Language. 79 (3): 557–582. arXiv:cs/0204020. doi:10.1353/lan.2003.0149. ISSN 0097-8507. JSTOR 4489465. S2CID 2046136.
  5. Cushman, Ellen (2013). "Wampum, Sequoyan, and Story: Decolonizing the Digital Archive". College English. 76 (2): 115–135. ISSN 0010-0994. JSTOR 24238145.
  6. Boerger, Brenda H.; Moeller, Sarah Ruth; Reiman, Will; Self, Stephen (2018). Language and culture documentation manual. Leanpub. Retrieved 2018-01-18.
  7. Chang, Debbie (2011). TAPS: Checklist for Responsible Archiving of Digital Language Resources (MA thesis). Graduate Institute of Applied Linguistics.
  8. Thieberger, Nick (2006). A Grammar of South Efate: An Oceanic Language of Vanuatu. Oceanic Linguistics Special Publication. Honolulu: University of Hawai'i Press. ISBN 9780824830618.
  9. Conway, Paul (2010). "Preservation in the Age of Google: Digitization, Digital Preservation, and Dilemmas". The Library Quarterly. 80 (1): 61–79. doi:10.1086/648463. hdl:2027.42/85223. JSTOR 10.1086/648463. S2CID 57213909.
  10. Miller, Larisa K. (2013). "All Text Considered: A Perspective on Mass Digitizing and Archival Processing". The American Archivist. 76 (2): 521–541. doi:10.17723/aarc.76.2.6q005254035w2076. ISSN 0360-9081. JSTOR 43490366.
  11. Bickel, Rachel; Dupont, Sarah (2018-11-29). "Indigitization". KULA: Knowledge Creation, Dissemination, and Preservation Studies. 2 (1): 11. doi:10.5334/kula.56. ISSN 2398-4112.
  12. Sapién, Racquel-María; Hirata-Edds, Tracy (2019-07-12). "Using existing documentation for teaching and learning endangered languages". Language and Education. 33 (6): 560–576. doi:10.1080/09500782.2019.1622711. ISSN 0950-0782. S2CID 199154941.
  13. Amery, Rob (2009-12-13). Phoenix or Relic? Documentation of Languages with Revitalization in Mind. University of Hawai'i Press. OCLC 651064087.

External links