
Language documentation tools and methods

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Chris the speller (talk | contribs) at 13:54, 9 October 2016 (Toolbox: per WP:HYPHEN, sub-subsection 3, points 3,4,6, replaced: widely- → widely using AWB). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The field of language documentation in the modern context involves a complex and ever-evolving set of tools and methods. The study and development of their use, and especially the identification and promotion of best practices, can be considered a sub-field of language documentation proper.[1] These tools and methods include hardware tools, software tools, workflows and methods, and ethical practices.[2]

Workflows and other methods

Researchers in language documentation often begin with linguistic fieldwork, recording audiovisual files that document language use in traditional contexts. Because the environments in which linguistic fieldwork takes place are often logistically challenging, not every type of recording tool is ideal, and compromises must often be struck between quality, cost and usability. It is also important to plan the remainder of one's workflow in advance; for example, if video files are made, the audio component may need to be extracted before it can be processed by the various software packages.
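As an illustration of the audio-processing step mentioned above, the following minimal sketch builds a command line for the ffmpeg tool that copies the audio track of a video into an uncompressed WAV file. The choice of ffmpeg, the function name, the file name and the default sample rate and channel count are assumptions made for this example; they are not prescribed by any particular documentation workflow.

```python
from pathlib import Path


def build_audio_extract_command(video_path, sample_rate=48000, channels=2):
    """Return an ffmpeg argument list that extracts audio to a WAV file.

    The output file sits next to the input, with a .wav extension.
    Sample rate and channel count are illustrative defaults only.
    """
    video = Path(video_path)
    wav = video.with_suffix(".wav")
    return [
        "ffmpeg",
        "-i", str(video),         # input video file
        "-vn",                    # drop the video stream
        "-acodec", "pcm_s16le",   # 16-bit uncompressed PCM audio
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", str(channels),     # number of audio channels
        str(wav),
    ]


# Hypothetical file name, used only to show the resulting command.
command = build_audio_extract_command("session01.mp4")
print(" ".join(command))
```

If ffmpeg is installed, the returned list can be passed to `subprocess.run(command, check=True)`. Building the command separately from running it makes it easy to inspect or log before any file is touched.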

Hardware

Video+audio recorders

Audio recorders and microphones

Audio-only recorders can be used in scenarios where video is impractical or otherwise undesirable. In most cases it is advantageous to combine an audio-only recorder with one or more external microphones; however, many modern audio recorders include built-in microphones which are usable if cost or setup speed is an important concern. Digital (solid state) recorders are preferred for most language documentation scenarios. Modern digital recorders achieve a very high level of quality at a relatively low price. Some of the most popular field recorders are found in the Zoom range, including the H1, H2, H4, H5 and H6. The H1 is particularly suitable for situations in which cost and user-friendliness are major desiderata.

Several types of microphone can be used effectively in language documentation scenarios, depending on the situation (especially the number, position and mobility of speakers) and on budget. In general, condenser microphones should be selected rather than dynamic microphones. It is an advantage in most fieldwork situations if a condenser microphone is self-powered (via a battery); however, when power is not a major factor, phantom-powered models can also be used. A stereo microphone setup is needed whenever more than one speaker is involved in a recording; this can be achieved via an array of two mono microphones, or by a dedicated stereo microphone. Directional microphones should be used in most cases, in order to isolate a speaker's voice from other potential noise sources. However, omnidirectional microphones may be preferred in situations involving larger numbers of speakers arrayed in a relatively large space. Among directional microphones, cardioid microphones are suitable for most applications; however, in some cases a hypercardioid ("shotgun") microphone may be preferred. Good-quality headset microphones are comparatively expensive, but can produce recordings of unsurpassed quality in controlled situations. Lavalier microphones may be used in some situations; however, they produce recordings which are inferior to those of a headset microphone for phonetic analysis, and, like headset microphones, they restrict a recording to a single speaker.

Other recording tools

Electrical power generation, storage and management

Computer systems

Accessories

Software

There is as yet no single software suite which is designed for, or capable of handling, all aspects of a typical language documentation workflow. Instead, there is a large and increasing number of packages designed to handle various aspects of the workflow, many of which overlap considerably. Some of these packages use standard formats and are interoperable, whereas others are much less so.

SayMore

SayMore is a language documentation package developed by SIL International in Dallas which primarily focuses on the initial stages in language documentation, and aims for a relatively uncomplicated user experience.

The primary functions of SayMore are: (a) audio recording; (b) file import from a recording device (video and/or audio); (c) file organization; (d) metadata entry at session and file levels; (e) association of AV files with evidence of informed consent and other supplementary objects (such as photographs); (f) AV file segmentation; (g) transcription and translation; and (h) BOLD-style Careful Speech annotation and Oral Translation.

SayMore files can be further exported for annotation in FLEx, and metadata can be exported in .csv and IMDI formats for archiving.
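To make the idea of session-level metadata in .csv form concrete, the sketch below writes a list of session records to CSV text using Python's standard library. It does not reproduce SayMore's actual export: the column names (`id`, `date`, `title`, `consent`) and the sample record are hypothetical and were chosen only for illustration.

```python
import csv
import io


def sessions_to_csv(sessions, fieldnames):
    """Serialize session metadata records (dicts) to CSV text.

    The field names are supplied by the caller; they are not taken
    from any real tool's export format.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for session in sessions:
        writer.writerow(session)
    return buffer.getvalue()


# Hypothetical session record, for illustration only.
sessions = [
    {"id": "S001", "date": "2016-06-02", "title": "Story", "consent": "yes"},
]
print(sessions_to_csv(sessions, ["id", "date", "title", "consent"]))
```

Keeping metadata in a flat, tabular form like this is one simple way to hand it to an archive or to another tool, whatever the exact columns turn out to be.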

ELAN

ELAN is developed by The Language Archive at the Max Planck Institute for Psycholinguistics in Nijmegen. ELAN is a full-featured transcription tool, particularly useful for researchers with complex annotation needs and goals.

FLEx

FieldWorks Language Explorer (FLEx) is developed by SIL International (formerly the Summer Institute of Linguistics, Inc.) in Dallas. FLEx allows the user to build a "lexicon" of the language, i.e. a word-list with definitions and grammatical information, and also to store texts from the language. Within the texts, each word or part of a word (i.e. a "morpheme") is linked to an entry in the lexicon.
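The relationship between a lexicon and the texts linked to it can be sketched as a small data structure. The example below is not FLEx's internal model or file format; the class name, field names and the English toy data are assumptions made for illustration. Each word in a text is stored as a list of morphemes, and each morpheme is looked up in the lexicon to produce a gloss line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexEntry:
    """One lexicon entry: a form with a gloss and grammatical category."""
    form: str
    gloss: str
    category: str


def gloss_word(morphemes, lexicon):
    """Join the glosses of a word's morphemes; unknown forms become '???'."""
    return "-".join(
        lexicon[m].gloss if m in lexicon else "???" for m in morphemes
    )


# Toy lexicon using English forms (hypothetical data, for illustration only).
lexicon = {
    "dog": LexEntry("dog", "dog", "noun"),
    "-s": LexEntry("-s", "PL", "noun suffix"),
}

# A text is a list of words; each word is a list of morpheme forms.
text = [["dog", "-s"], ["bark"]]
print([gloss_word(word, lexicon) for word in text])
```

Because the text stores references to lexicon entries rather than copies of their glosses, correcting an entry in the lexicon updates every occurrence in the texts. Forms missing from the lexicon are flagged instead of silently skipped.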

Toolbox

Field Linguist's Toolbox (usually called Toolbox) is a precursor of FLEx and has been one of the most widely used language documentation packages for some decades. Previously known as Shoebox, Toolbox has as its primary functions the construction of a lexical database and the interlinearization of texts through interaction with that database. Both the lexical database and the texts can be exported to a word processing environment; in the case of the lexical database, this is done using the Multi-Dictionary Formatter (MDF) conversion tool. It is also possible to use Toolbox as a transcription environment.[3] By comparison with ELAN and FLEx, Toolbox has relatively limited functionality, and is felt by some to have an unintuitive design and interface. However, a large number of projects have been carried out in the Shoebox/Toolbox environment over its lifespan, and its user base continues to enjoy its advantages of familiarity, speed, and community support.

Tools for automating components of the workflow

Language documentation may be partially automated thanks to a number of software tools, including:

  • MAUS
  • SoX
  • Prosodylab Aligner
  • eSpeak
  • HTK
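As one hedged illustration of this kind of automation, the sketch below prepares a batch of SoX command lines that convert recordings to mono at a fixed sample rate, a preprocessing step that alignment tools often require. The target rate of 16 kHz, the output naming scheme, the function name and the file paths are all assumptions made for this example; check the requirements of the specific aligner before relying on them.

```python
from pathlib import Path


def build_sox_commands(wav_paths, out_dir, sample_rate=16000):
    """Return one SoX argument list per input file.

    Each command resamples the input to `sample_rate` Hz and mixes it
    down to a single channel, writing the result into `out_dir`.
    """
    out_dir = Path(out_dir)
    commands = []
    for wav in map(Path, wav_paths):
        target = out_dir / f"{wav.stem}_{sample_rate}hz.wav"
        commands.append(
            ["sox", str(wav), "-r", str(sample_rate), "-c", "1", str(target)]
        )
    return commands


# Hypothetical paths, used only to show the generated commands.
for command in build_sox_commands(["raw/story01.wav", "raw/story02.wav"], "aligned"):
    print(" ".join(command))
```

Each list can be run with `subprocess.run(command, check=True)` once SoX is installed and the output directory exists. Generating the commands first allows a whole corpus to be reviewed before any conversion starts.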

Data formats

Format standards, such as those promoted by OLAC (the Open Language Archives Community), are critical for interoperability between software tools.

Ethics

Ethical practices in language documentation have been the focus of much recent discussion and debate.[4] The Linguistic Society of America has prepared an Ethics Statement, and maintains an Ethics Discussion Blog which is primarily focused on ethics in the language documentation context. The morality of ethics protocols has itself been brought into question by George van Driem.[5]

Literature

The peer-reviewed journal Language Documentation and Conservation has published a large number of articles focusing on tools and methods in language documentation.

See also

LRE Map: a language resources map, searchable by resource type, language(s), language type, modality, resource use, availability, production status, conference(s) and resource name.

Richard Littauer's GitHub catalog: a catalog of "open-source code that would be useful for documenting, conserving, developing, preserving, or working with endangered languages".

RNLD software page: the Research Network for Linguistic Diversity's page on linguistic software.

References

  1. ^ "LD Tools Summit". sites.google.com. Retrieved 2016-06-02.
  2. ^ Bowern, Claire. Linguistic Fieldwork. doi:10.1057/9780230590168.
  3. ^ Margetts, Andrew (2009). "Using Toolbox with Media Files". Language Documentation & Conservation. 3 (1): 51–86.
  4. ^ Austin, Peter K. (2010). "Communities, ethics and rights in language documentation". In Peter K. Austin (ed.), Language Documentation and Description, Vol. 7. London: SOAS. pp. 34–54.
  5. ^ van Driem, George (2016). "Endangered Language Research and the Moral Depravity of Ethics Protocols". Language Documentation & Conservation. 10: 243–252.