Forensic linguistics

From Wikipedia, the free encyclopedia

Forensic linguistics, legal linguistics, or language and the law is the application of linguistic knowledge, methods, and insights to the forensic context of law, language, crime investigation, trial, and judicial procedure. It is a branch of applied linguistics.

There are principally three areas of application for linguists working in forensic contexts:[1]

  • understanding language of the written law,
  • understanding language use in forensic and judicial processes, and
  • the provision of linguistic evidence.

The discipline of forensic linguistics is not homogeneous; it involves a range of experts and researchers in different areas of the field.


The term forensic linguistics first appeared in 1968 when Jan Svartvik, a Swedish professor of linguistics, used it in "The Evans Statements: A Case for Forensic Linguistics", an analysis of statements by Timothy John Evans.[2] Svartvik re-analyzed the statements given to police at Notting Hill police station, England, in the case of an alleged murder by Evans in 1949. Evans was suspected of murdering his wife and baby, and he was tried and hanged for the crime. Yet, when Svartvik studied the statements allegedly given by Evans, he found that there were different stylistic markers involved, indicating that Evans did not actually give the statements to the police officers as had been stated at the trial.[3] Sparked by this case, forensic linguists in the UK at the time focused on questioning the validity of police interrogations. As seen in numerous famous cases (e.g. the convictions of Derek Bentley, the Guildford Four, the Bridgewater Three), many of the major concerns were about the statements police officers used. The topic of police register came up repeatedly: the type of stylistic language and vocabulary used by members of law enforcement when transcribing witness statements.[3]

In the US, forensic linguistics can be traced back as early as 1927 to a ransom note in Corning, New York. The Associated Press reported, "Duncan McLure, of Johnson City, uncle of the [kidnapped] girl, is the only member of the family to spell his name 'McLure' instead of 'McClure'. The letter he received, supposedly from the kidnappers, was addressed to him by the proper name, indicating that the writer was familiar with the difference in spelling."[4] Other work in forensic linguistics in the United States concerned the rights of individuals with regard to understanding their Miranda rights during the interrogation process.[3] The 1963 case of Ernesto Miranda was pivotal to the beginning of the forensic linguistics field. His case led to the creation of the Miranda rights and shifted the focus of forensic linguistics toward witness questioning rather than police statements. Various cases arose that challenged whether suspects truly understood what their rights meant, leading to a distinction between coercive and voluntary interrogations.[3] An early application of forensic linguistics in the United States was related to the status of trademarks as words or phrases in the language. One of the bigger cases involved fast food giant McDonald's claiming that it had originated the process of attaching unprotected words to the "Mc" prefix (referred to as McWords) and was unhappy with Quality Inns International's intention of opening a chain of economy hotels to be called "McSleep".[5]

In the 1980s, Australian linguists discussed the application of linguistics and sociolinguistics to legal issues.[3] They discovered that a phrase such as "the same language" is open to interpretation. Aboriginal people have their own understanding and use of "English", something that is not always appreciated by speakers of the dominant version of English, i.e. "white English". The Aboriginal people also bring their own culturally-based interactional styles to the interview.

The 2000s saw a considerable shift in the field of forensic linguistics, which has been described as a coming-of-age of the discipline, as it spread to many countries around the world, from Europe to Australia and Japan.[6] Today, not only does the field have professional associations such as the International Association for Forensic and Legal Linguistics (IAFLL), founded as the International Association of Forensic Linguists (IAFL) in 1993, and the Austrian Association for Legal Linguistics (AALL), founded in 2017,[7] but it can now provide the scientific community with a range of textbooks such as Coulthard, Johnson and Wright (2017), Gibbons (2003), and Olsson (2008).[8]

Additionally, certain leading institutions have developed programs of study and institutes focused on forensic linguistics. In the United States, Hofstra University has developed a master's degree program and the Institute for Forensic Linguistics, Threat Assessment, and Strategic Analysis, which conducts special projects, research, and internships in forensic linguistics.[9] This institute contains the Forensic Linguistics Capital Case Innocence Project, formed in 2014, which involves supervised interns reanalyzing language data in capital punishment cases to locate any possibilities of appeal.[10] In the United Kingdom, Aston University has developed master's degree and Ph.D. programs as well as the Aston Institute for Forensic Linguistics (AIFL), which studies forensic texts and conducts research using a variety of methods from numerous areas of linguistics. The AIFL was founded in 2019; it was formerly known as the Aston Centre for Forensic Linguistics, which was founded in 2008.[11]

Areas of study

The range of topics within forensic linguistics is diverse, but research occurs in the following areas:

The language of legal texts

Communication problems may occur between the written law and lay persons due to the complex nature of the language and vocabulary used in these legal texts.[12] Forensic linguists will study these texts to understand how these issues arise, and if necessary, provide explanations or translations of the contents.[13]

One area of the language of legal texts encompasses the Miranda warning in the United States. These warnings let the defendant know that they have the right to be silent, since whatever they say from the moment they are in police custody can and will be used against them in a court of law. The recipients who are advised of these rights must have a certain level of competency in the English language in order to completely understand the warning.[14]

The language of legal processes

Among other things, this area examines language as it is used in cross-examination, evidence presentation, judge's direction, police cautions, police testimonies in court, summing up to a jury, interview techniques, the questioning process in court, and in other areas such as police interviews.

Police officers use specific language to elicit certain responses from civilians. Because of a police officer's social stature, and the way they often phrase "requests" as "commands", people may be confused as to what their rights are when they are being questioned by police. Officers use linguistic tactics including putting the blame onto the victim and asking questions with ambiguous phrasing to elicit specific responses from people.[15] In addition, the structure of the language used in witness or suspect interviews can affect the statements that result from them. In the case of Derek Bentley, elicitation of a narrative through questions rather than a natural account resulted in a wrongful conviction and a death sentence. Analysis of the statement revealed that it was not in Bentley's own words and had been constructed by the police officers.[16]

When a suspect invokes their right to a lawyer, there are directions stating that the request must not come across as ambiguous. In fact, if the request is not stated in a way that the officer deems to be clear, the suspect may not receive counsel at all.

During the examination process, language plays a substantial role in the presentation of a story to the courtroom. A defendant's ambiguity may be deemed unacceptable. The language used by the lawyer to construct the story to the courtroom elicits specific responses from the witness, and specific emotions from the jury. For example, in an instance where a lawyer is examining a hostile witness, they will often use language to limit the response of the witness, in order to avoid having the witness present conflicting evidence. In this instance, yes/no questions will be targeted, and questions with room for elaboration, such as wh-formation questions, will likely be avoided. In a situation where a lawyer interviews a friendly witness whose testimony could potentially strengthen the story constructed by the lawyer, the opposite may occur, where wh-questions are targeted to allow for elaboration.[17]

Lawyers employ specific tactics for both themselves and their witnesses to come off as more or less truthful to the jury and the people of the courtroom. For example, the lawyer may refer to the witness by their first name or a nickname to humanize the witness, or they may speak using slang in order to create less social distance between themselves and the courtroom. The lawyer may also avoid using slang, and instead use complicated law terminology to set themselves apart from the courtroom and define their status.[17]

The lawyer works to construct the language of the legal process in the courtroom, and specific witnesses may respond to the lawyer's questions in different ways, eliciting new language tactics from the lawyer and new opinions from the jury. For example, witnesses may use direct or indirect speech based on their previous societal experiences, gender differences, socioeconomic differences, or differences in education level. Using particular dialects, slang, or sentence formations could make the witness appear more or less truthful to the jury.[17]

Since language can be used to elicit responses in the legal process, the right to an interpreter plays a part in the fairness of the trial. "The right to an interpreter is essentially a procedural right that derives from the right to a fair trial: everyone charged with a criminal offence has the right to certain minimum procedural guarantees, and these include the right to the free assistance of an interpreter where s/he cannot understand or speak the language of the concerned court."[18]

Forensic text types

Emergency call

In an emergency call, the recipient or emergency operator's ability to extract primarily linguistic information in threatening situations and to come up with the required response in a timely manner is crucial to the successful completion of the call. Emphasis on intonation, voice pitch, and the extent to which there is cooperation between the caller and the recipient at any one time are also very important in analyzing an emergency call. Full cooperation includes frank and timely responses.

Urgency plays a role in emergency calls, so hesitations, signs of evasiveness, and incomplete or overly short answers indicate that the caller might be making a false or hoax call. A genuine call has distinctive interlocking and slight overlap of turns. The recipient trusts the caller to provide accurate information and the caller trusts the recipient to ask only pertinent questions. If the caller uses a rising pitch at the end of every turn, it might represent a lack of commitment; the recipient's use of a rising pitch indicates doubt or desire for clarification. The call ideally moves from nil knowledge on the part of the recipient to a maximum amount of knowledge in a minimum possible period of time. This makes the emergency call unlike any other kind of service encounter.[2]

Ransom demands or other threat communication

A threat is the counterpart of a promise and is an important feature of a ransom demand. Ransom demands are also examined to distinguish between genuine and false threats. An example of a ransom note analysis can be seen in the case of the Lindbergh kidnapping, where the first ransom note (sometimes referred to as the Nursery Note) stated: "We warn you for making anyding public or for notify the Polise the child is in gut care" [sic].[19] In the sentence, the kidnapper makes the claim that the child is in good hands, but to make such a claim, the note would have to have been written before the perpetrator entered the premises. Therefore, the claim is false (at the time of writing) since the kidnapper had not even encountered the child when he wrote the note.[20]

Kidnappers may write statements that later end up being true, such as "your child is being held in a private location" being written ahead of time.

Ransom demands in the form of written notes have been present in many notable cases. The style of writing used in a ransom note is examined by forensic linguists in order to determine the writing's true intent, as well as to determine who wrote the note. Forensic linguists look at factors such as syntactic structures, stylistic patterns, punctuation, and even spelling while analyzing ransom notes.[21] In the case of the Lindbergh ransom note, forensic linguists compared the writing style of the note with that of the suspect's writing, improving the chances of discovering who wrote the note.[15]

Bomb threats are another form of threat communication. It is the forensic linguist's job to determine the validity of the statement and whether the note has been tampered with. If the threat is made through text or an internet forum, linguists often work with specialists in related fields, such as cyber analysts, to test for validity or alterations.[22]

Suicide letters

A suicide note is typically brief, concise, and highly propositional with a degree of evasiveness.[2] A credible suicide letter must be making a definite unequivocal proposition in a situational context. The proposition of genuine suicide is thematic, directed to the addressee (or addressees), and relevant to the relationship between them and the writer. Suicide notes generally have sentences alluding to the act of killing oneself, or the method of suicide that was undertaken.[23] The contents of a suicide note could be intended to make the addressee suffer or feel guilt. Genuine suicide letters are short, typically less than 300 words in length.[2] Extraneous or irrelevant material is often excluded from the text.[23]

Death row statements

Death row statements either admit the crime, leaving the witness with an impression of honesty and forthrightness, or deny the crime, leaving the witness with an impression of innocence. They may also denounce witnesses as dishonest, critique law enforcement as corrupt in an attempt to portray innocence, or seek an element of revenge in their last moments.[23] Death row statements are made within the heavily institutionalized setting of death row prisons.

The Forensic Linguistics Institute holds a corpus of these documents and is conducting research on them.

Social media

Social media statements are often context-specific, and their interpretation can be highly subjective. Forensic application of a selection of stylistic and stylometric techniques has been done in a simulated authorship attribution case involving texts in relation to Facebook.[24] Analysis of social media postings can reveal whether they are illegal (e.g. sex trade) or unethical (e.g. intended to harm) or whether they are not (e.g. simply provocative or free speech).[25]

Use of linguistic evidence in legal proceedings

These areas of application have varying degrees of acceptability or reliability within the field. Linguists have provided evidence in:

  • Trademark and other intellectual property disputes
  • Disputes of meaning and use
  • Author identification (determining who wrote an anonymous text, such as a threat letter, mobile phone text, or email, by making comparisons to known writing samples of a suspect)
  • Forensic stylistics (identifying cases of plagiarism)
  • Voice identification, also known as forensic phonetics (used to determine, through acoustic qualities, if the voice on a tape recorder is that of the defendant)
  • Discourse analysis (the analysis of the structure of written or spoken utterance to determine who is introducing topics or whether a suspect is agreeing to engage in criminal conspiracy)
  • Language analysis (forensic dialectology) tracing the linguistic history of asylum seekers (Language Analysis for the Determination of Origin)[26]
  • Reconstruction of mobile phone text conversations

Specialist databases of samples of spoken and written natural language (called corpora) are now frequently used by forensic linguists. These include corpora of suicide notes, mobile phone texts, police statements, police interview records, and witness statements. They are used to analyze language, understand how it is used, and to reduce the effort needed to identify words that tend to occur near each other (collocations or collocates).
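
As a rough illustration of how a corpus can reduce the effort of finding collocates, the following minimal Python sketch counts the words that occur within a fixed window of a chosen node word. The function name `collocates`, the window size, and the sample sentence are assumptions made for this example; they are not taken from any particular forensic corpus or tool.

```python
from collections import Counter


def collocates(tokens, node, window=2):
    """Count words found within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts


# Invented sample text, used only to exercise the function.
sample = "i then went home and then i saw him and then he ran".split()
then_collocates = collocates(sample, "then", window=1)
print(then_collocates.most_common())
```

On a real corpus the raw counts would normally be compared against how often each word occurs overall, since very frequent words appear near almost everything.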

Author identification

The identification of whether a given individual said or wrote something relies on analysis of their idiolect,[27] or particular patterns of language use (vocabulary, collocations, pronunciation, spelling, grammar, etc.). The idiolect is a theoretical construct based on the idea that there is linguistic variation at the group level and hence there may also be linguistic variation at the individual level. William Labov has stated that nobody has found a "homogenous data" in idiolects,[28] and there are many reasons why it is difficult to provide such evidence.

Firstly, language is not an inherited property, but one which is socially acquired.[29] Because acquisition is continuous and life-long, an individual's use of language is always susceptible to variation from a variety of sources, including other speakers, the media, and macro-social changes. Education can have a profoundly homogenizing effect on language use.[2] Research into authorship identification is ongoing. The term authorship attribution is now felt to be too deterministic.[30]

The paucity of documents (ransom notes, threatening letters, etc.) in most criminal cases in a forensic setting means there is often too little text upon which to base a reliable identification. However, the information provided may be adequate to eliminate a suspect as an author or narrow down an author from a small group of suspects.

Authorship measures that analysts use include average word length, average number of syllables per word, article frequency, type-token ratio, punctuation (both in terms of overall density and syntactic boundaries), and the measurement of hapax legomena (words that occur only once in a text). Statistical approaches include factor analysis, Bayesian statistics, Poisson distribution, multivariate analysis, and discriminant function analysis of function words.
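
Several of the measures listed above can be computed directly from a tokenized text. The Python sketch below is illustrative only: the function names, the crude regular-expression tokenizer, and the sample sentence are assumptions for this example, and real analyses would need a more careful definition of what counts as a word.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep runs of letters and apostrophes (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def authorship_measures(text):
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no word tokens")
    freq = Counter(tokens)
    return {
        "tokens": len(tokens),
        "types": len(freq),
        # Type-token ratio: distinct words divided by total words.
        "type_token_ratio": len(freq) / len(tokens),
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
        # Share of tokens that are articles.
        "article_frequency": sum(freq[a] for a in ("a", "an", "the")) / len(tokens),
        # Hapax legomena: words that occur exactly once.
        "hapax_legomena": sorted(w for w, c in freq.items() if c == 1),
    }


# Invented sample sentence, used only to exercise the function.
measures = authorship_measures("The cat saw the dog. The dog ran.")
print(measures)
```

Measures such as the type-token ratio depend heavily on text length, so values are only comparable between texts of similar size.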

The CUSUM (cumulative sum) method for text analysis has also been developed.[31] CUSUM analysis works even on short texts and relies on the assumption that each speaker has a unique set of habits, thus rendering no significant difference between their speech and writing. Speakers tend to utilize two- to three-letter words in a sentence and their utterances tend to include vowel-initial words.

In order to carry out the CUSUM test on habits of utilizing two- to three-letter words and vowel-initial words in a sentential clause, the occurrences of each type of word in the text must be identified and the distribution plotted in each sentence. The CUSUM distribution for these two habits will be compared with the average sentence length of the text. The two sets of values should track each other. Any altered section of the text would show a distinct discrepancy between the values of the two reference points. The tampered-with section will exhibit a different pattern from the rest of the text.
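
The procedure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a validated implementation: it assumes that each series is the running sum of per-sentence deviations from that series' mean, that a word counts once toward the habit if it has two or three letters or begins with a vowel, and that sentences end at ".", "!" or "?". The function names and the sample text are invented for the example.

```python
import re
from itertools import accumulate

VOWELS = "aeiou"


def is_habit_word(word):
    """True for two- to three-letter words and for vowel-initial words."""
    return len(word) in (2, 3) or word[0] in VOWELS


def cusum(values):
    """Running sum of each value's deviation from the mean of the series."""
    mean = sum(values) / len(values)
    return list(accumulate(v - mean for v in values))


def cusum_series(text):
    """Return (sentence-length CUSUM, habit-count CUSUM) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    lengths, habits = [], []
    for sentence in sentences:
        words = re.findall(r"[a-z]+", sentence.lower())
        lengths.append(len(words))
        habits.append(sum(is_habit_word(w) for w in words))
    return cusum(lengths), cusum(habits)


# Invented sample text, used only to exercise the functions.
sample = "I am on it. The big dog ran away from an angry old man. Go."
length_cusum, habit_cusum = cusum_series(sample)
print(length_cusum)
print(habit_cusum)
```

In practice the two series are plotted on the same chart, and the analyst looks for stretches where they stop tracking each other.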

Forensic stylistics

This category focuses on written and spoken evidence: the examiner determines the meaning, content, speaker, and author in order to find the basis of the plagiarism.[32]

One of the earliest[citation needed] cases where forensic stylistics was used to detect plagiarism was the case of Helen Keller's short story "The Frost King", in which the deaf-blind American author was accused of plagiarism in 1892. An investigation revealed that "The Frost King" had been plagiarized from Margaret Canby's "Frost Fairies", which had been read to Keller some time earlier.[33] Keller was found to have made only minute changes to common words and phrases and used less common words to say the same thing, suggesting mere alterations to original ideas.[citation needed] Keller used "vast wealth" instead of "treasure" (230 times less common in the language), "bethought" instead of "concluded" (approximately 450 times less common), and "bade them" instead of "told them" (approximately 30 times less common). Keller used the phrase "ever since that time", but Canby chose "from that time" (the latter 50 times more common than the former). Keller also used "I cannot imagine", but Canby used "I do not know"; "know" is approximately ten times more common than "imagine". Overall, Keller relied on a lexis that is less common than that of Canby. The Flesch and Flesch–Kincaid readability tests showed that Canby's text had more originality than Keller's. Canby's text obtained a higher score on the reading ease scale than Keller's. The distinctions between Keller's and Canby's texts are at the lexical and phrasal level.
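
For reference, the two readability measures named above are computed from three counts: words, sentences, and syllables. The Python sketch below uses the standard published constants for the Flesch reading ease score and the Flesch–Kincaid grade level; the function names and the counts passed in are hypothetical and are not figures from Keller's or Canby's texts.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch reading ease: higher scores indicate text that is easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level: an approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a 100-word passage of 5 sentences and 150 syllables.
ease = flesch_reading_ease(words=100, sentences=5, syllables=150)
grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
print(round(ease, 3), round(grade, 2))
```

Counting syllables automatically is itself an approximation, so scores from different tools may differ slightly for the same text.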

Other examples of plagiarism include the cases between Richard Condon, author of The Manchurian Candidate, and English novelist Robert Graves and between Martin Luther King Jr. and Archibald Carey. Judging by the text in The Manchurian Candidate, Condon's work is rich in clichés such as "in his superstitious heart of hearts". While Helen Keller took pride in using rare phrases and avoided common source words, Condon was fond of expanding existing words into phrases and existing phrases into more extensive ones. Condon was also found to have borrowed from a wide range of Graves' work.[2] However, in the plagiarism case between King and Archibald Carey, almost half of King's doctoral dissertation was discovered to have been copied from another theology student. King simply changed the names of the mountains and used much more alliteration and assonance.[34] Carey's and Graves' texts (source texts) were noticeably shorter, pithier, and simpler in structure, while Condon's and King's texts relied on "purple" devices, extending the existing text and flourishing their language significantly.[citation needed]

Another famous example is the case of Ted Kaczynski, who was eventually convicted of being the "Unabomber". Ted's brother, David Kaczynski, recognized his writing style in the phrase "cool-headed logicians" in the published 35,000-word Industrial Society and Its Future (commonly called the "Unabomber Manifesto") and notified the authorities. FBI agents searching Kaczynski's hut found hundreds of documents written by Kaczynski but not published anywhere. An analysis produced by FBI Supervisory Special Agent James R. Fitzgerald identified numerous lexical items and phrases common to the two documents.[35] Some were more distinctive than others, but the prosecution argued that even the more common words and phrases used by Kaczynski became distinctive when used in combination with one another.[36]

Forensic stylistics has also been used to analyze the texts attributed to Jack the Ripper, which date back to 1888. Jack the Ripper's first letter to the police was received on September 24, 1888. It was unsigned and contained details of the second murder, that of Annie Chapman, who was killed on September 8, 1888. The second text, received on September 27, was the "Dear Boss" letter, which was signed "Jack the Ripper". The third notable piece of writing was a postcard titled "Saucy Jacky", received on October 1, 1888. The postcard addressed the double murder of Elizabeth Stride and Catherine Eddowes on September 30, 1888, and was also signed "Jack the Ripper". The final important text, titled "Moab the Midian", was received on October 5, 1888; it contained elements of a triple homicide and was justified by religious motives. These are the three most important texts regarding the Ripper case, according to forensic linguistic analysis.

Jack the Ripper became famous very quickly, and people took to writing false letters in his name. These three letters were received before the case attracted any publicity and are therefore considered the most reliable. Linguists went on to analyze the texts claiming to be from the Ripper to determine whether or not they were written by the same person.

A Jack the Ripper Corpus (JRC) was compiled from 209 texts containing 17,643 words. The average length of a text was 83 tokens, with a minimum of 7 and a maximum of 648. Of the addressees, 67% were Scotland Yard or other law enforcement units, 20% were ordinary citizens, and 13% were unknown. The majority of the postmarks were from London, but others came from all over the United Kingdom. All of the letters were handwritten, and 4% contained drawings depicting items like knives and coffins. Of the letters, 75% were signed "Jack the Ripper", "Jack the Whitechapel Ripper", "JR", or "jack ripper and son"; 14% used other pseudonyms such as "Jim the Cutter" and "Bill the Bowler"; and 11% were not signed. The dates of the 209 texts ranged from September 24, 1888 to October 14, 1896, but 62% of the texts in the JRC were received during October and November 1888.

The term "author clustering" refers to the process of determining whether a set of texts has the same author. The Ripper texts were too short for current computational methods of author clustering and authorship identification, so the Jaccard coefficient and Jaccard distance were used. The Jaccard coefficient is the number of features shared by two texts divided by the total number of features in both texts. This method is currently applied to text messages and other short texts, such as emails, newspaper articles, and personal narratives. In simpler terms, it works like a plagiarism check in which a match is the hoped-for result. The features analyzed are word n-grams, which are strings of words of a certain length (n). The more word n-grams two documents have in common, the more similar they are and the higher the likelihood that they are not independent. The method of using word n-grams is called frequency-based stylometrics. Although this method is not as reliable as looking at function words, simple word frequencies, or character n-grams, shared word strings are rarer because combinations of words are at the core of language processing. They reveal idiolectal patterns because every author subconsciously uses their own idiosyncratic set of lexical choices. For the JRC, the analysis looked at the presence or absence of word n-grams rather than their frequency because the texts were so short, which is why the Jaccard distance was used. This provides a more inclusive answer consisting of major groups of texts that are more like each other; smaller groups can then be examined afterwards.

The JRC was analyzed against a comparison corpus consisting of COHA, CLMET3, and EOBC. All of the JRC texts were compared with each other and with the comparison corpora using word n-grams of n = 2, and a radial dendrogram was produced. From this, "Dear Boss" and "Saucy Jacky" were deemed the most similar, with a Jaccard distance of 0.93. That degree of dissimilarity can be found in less than 5% of the texts in the JRC. "Moab the Midian" showed the next most similarities. The final conclusion was that "Dear Boss" and "Saucy Jacky" cannot be considered independent of one another because they share more word 2-grams than 95% of all other possible pairs of texts in the JRC. Of the pre-publication texts, "Moab the Midian" is almost as close to them as "Dear Boss" is to "Saucy Jacky" and is the only text that is like both.
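
The presence-or-absence comparison of word 2-grams described above can be sketched in Python as follows. This is a minimal illustration, assuming the coefficient is the size of the intersection of the two n-gram sets divided by the size of their union and the distance is one minus the coefficient. The function names, the tokenizer, and the two short sample strings are invented for the example and are not texts from the JRC.

```python
import re


def word_ngrams(text, n=2):
    """Return the set of word n-grams present in a text (presence only, no counts)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_coefficient(a, b):
    """Shared features divided by all features found in either set."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def jaccard_distance(a, b):
    """Dissimilarity: 0.0 for identical feature sets, 1.0 for disjoint ones."""
    return 1.0 - jaccard_coefficient(a, b)


# Invented sample strings, used only to exercise the functions.
text_a = "catch me when you can"
text_b = "catch me if you can"
coefficient = jaccard_coefficient(word_ngrams(text_a), word_ngrams(text_b))
distance = jaccard_distance(word_ngrams(text_a), word_ngrams(text_b))
print(round(coefficient, 3), round(distance, 3))
```

Here the two strings share two of six distinct 2-grams. Because short texts share few word strings, distances between them tend to be high, so what matters is how a pair ranks against all other pairs rather than the raw value.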

Discourse analysis

Discourse analysis deals with analyzing written, oral, or sign language use, or any significant semiotic event. According to the method, the close analysis of a covert recording can produce useful deductions. The use of "I" instead of "we" in a recording may indicate non-complicity in a conspiracy. Responses such as "yeah" and "uh-huh" indicate that the suspect understands a suggestion; as feedback markers, they do not denote the suspect's agreement to it. Discourse analysts are not always allowed to testify, but they are often useful to lawyers during preparation for a case.[citation needed]

Linguistic dialectology

This refers to the study of dialects in a methodological manner based on anthropological information. It is becoming more important to conduct systematic studies of dialects, especially within the English language, because they are no longer as distinct as they once were due to the onslaught of mass media and population mobility.[citation needed] Political and social issues have also caused languages to straddle geographical borders resulting in certain language varieties spoken in multiple countries, leading to complications when determining an individual's origin by means of his/her language or dialect.

Dialectology was used during the investigations into the Yorkshire Ripper tape hoax.[37]

Forensic phonetics

The forensic phonetician is concerned with the production of accurate transcriptions of what was being said. Transcriptions can reveal information about a speaker's social and regional background. Forensic phonetics can determine similarities between the speakers of two or more separate recordings. Voice recording as a supplement to the transcription can be useful as it allows victims and witnesses to indicate whether the voice of a suspect is that of the accused, i.e. alleged, criminal.

In one case, a word spoken by a man accused of manufacturing the drug ecstasy was misheard by the police transcriber as "hallucinogenic".[36] The police transcriber heard "but if it's, as you say, it's hallucinogenic, it's in the Sigma catalogue". However, the actual utterance was "but if it's, as you say, it's German, it's in the Sigma catalogue".

Another disputed utterance occurred in a conversation between a police officer and a suspect. One of the topics of conversation was a third man known as "Ernie". The poor signal of the recording made "Ernie" sound like "Ronnie". The surveillance tape presented acoustic problems: an intrusive electronic-sounding crackle, the sound of the car engine, the playing of the car radio, and the movement of the target vehicle. The intrusive noise coincided with the first syllable of the disputed name.[2]

Forensic speechreading is the complement of forensic voice identification. Transcripts of surveilled video records can sometimes allow expert speechreaders to identify speech content or style where the identity of the talker is apparent from the video record.


There are three ethical rules that linguists are to follow when taking part in court proceedings.[38]

  • The first rule is to be a qualified expert witness. Strong academic credentials make it easier for the court and the jury to allow linguists to testify. Forensic linguists are careful when taking on cases, as it may be dangerous to pursue a case in which the linguist cannot provide full expertise.[38]
  • The second rule is to testify truthfully. It is not the linguist's job to prove innocence or guilt; that lies with the judge and jury. However, it is the linguist's duty to provide linguistic truth. Their primary job is to help the judge and jury understand the evidence provided in a scientific way.[38]
  • The third rule is to maintain composure throughout the examination. The linguist should be prepared for cross-examination. It is important for them to maintain neutrality and only provide scientific facts based on their expert knowledge.[38]


Evidence from forensic linguistics has more power to eliminate someone as a suspect than to prove him or her guilty.[citation needed] Linguistic expertise has been employed in criminal cases to defend an individual suspected of a crime and during government investigations. Forensic linguists have given expert evidence in a wide variety of cases, including abuse of process, where police statements were found to be too similar to have been independently produced by police officers; the authorship of hate mail; the authorship of letters to an Internet child pornography service; the contemporaneity of an arsonist's diary; the comparison between a set of mobile phone texts and a suspect's police interview, and the reconstruction of a mobile phone text conversation. Some well-known examples include an appeal against the conviction of Derek Bentley; the identification of Subcomandante Marcos, the Zapatistas' charismatic leader, by Max Appedole; and the identification of Ted Kaczynski as the so-called "Unabomber" by James R. Fitzgerald.[39]

The criminal laboratories Bundeskriminalamt (in Germany) and the Nederlands Forensisch Instituut (in the Netherlands) both employ forensic linguists.[26]

Forensic linguistics contributed to the overturning of Derek Bentley's conviction for murder in 1998, although there were other non-linguistic issues. Nineteen-year-old Bentley, who was functionally illiterate, had been hanged in 1953 for his part in the murder of PC Sidney Miles; he had been convicted partly on the basis of his statement to police, allegedly transcribed verbatim from a spoken monologue. When the case was reopened, a forensic linguist found that the frequency and usage of the word "then" in police transcripts suggested the transcripts were not verbatim statements but had been partially authored by police interviewers; this and other evidence led to Bentley's posthumous pardon.[40]
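
The kind of frequency comparison described above can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions and is not the method used in the Bentley appeal: the function names (`word_rate`, `bigram_count`), the deliberately simple tokenizer, and both sample texts are invented for this example. It computes how often a target word occurs per 1,000 words, and how often it directly follows a given word, so that a disputed statement could be compared with a reference sample.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def word_rate(text, target, per=1000):
    """Return occurrences of `target` per `per` words of `text` (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return tokens.count(target.lower()) * per / len(tokens)

def bigram_count(text, first, second):
    """Count how often the word `first` is immediately followed by `second`."""
    tokens = tokenize(text)
    return sum(1 for a, b in zip(tokens, tokens[1:]) if a == first and b == second)

# Hypothetical sample texts, invented for illustration only.
disputed = "I then went to the door. I then saw the man. Chris then jumped over."
reference = "Then I went home and had tea. After that we walked to the shop."

print("disputed rate: ", word_rate(disputed, "then"))
print("reference rate:", word_rate(reference, "then"))
print("'I then' count:", bigram_count(disputed, "i", "then"))
```

A marked difference in rate or word order between a disputed text and comparison material is only a starting point for analysis; on its own it does not establish who authored a text.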

During the investigative stage in identifying Subcomandante Marcos, the Mexican government speculated that he was a dangerous guerrilla fighter. This theory gained much traction at the end of 1994, after the dissident Zapatista Comandante Salvador Morales Garibay gave away the identities of his former fellow Zapatistas, among them Marcos, to the Mexican government.[41] They were all indicted for terrorism, arrest warrants were issued, and arrests were made in a military action.[42] The Mexican government alleged that some Zapatistas, among them Marcos, were terrorists. There was a storm of political pressure calling for a fast military solution to the 1995 Zapatista Crisis. On 9 February 1995, in a special televised presidential broadcast, President Ernesto Zedillo announced Subcomandante Marcos to be one Rafael Sebastián Guillén Vicente, born 19 June 1957 to Spanish immigrants in Tampico, Tamaulipas, a former professor at the Universidad Autónoma Metropolitana School of Sciences and Arts for the Design.

After the government revealed Marcos' identity in February 1995, Max Appedole, an old friend of Guillén and his classmate with the Jesuits at the Instituto Cultural Tampico, intervened directly in the conflict. Appedole played a major role with the Mexican government in avoiding a military solution to the 1995 Zapatista Crisis by demonstrating that, contrary to the accusations announced by President Ernesto Zedillo,[43] Rafael Guillén was no terrorist. Everyone has an idiolect, encompassing vocabulary, grammar, and pronunciation, that differs from the way other people talk. Appedole identified Marcos' linguistic fingerprint, based on Marcos' specific, unique way of speaking, recognized his literary style in all of Marcos' manifestos that were published in the media, and linked them to the literary tournaments organized by the Jesuits in which they had competed in Mexico. He confirmed that he had no doubt that Marcos was his friend Rafael Guillén, a pacifist. Appedole closed the first successful linguistic profiling confirmation case in the history of law enforcement. Based on these achievements, a new science was developed, giving way to what is now called forensic linguistics. This motivated a new division of forensic linguistics called "criminal profiling in law enforcement".[44][45][46][47]

Forensic linguistic evidence also played a role in the investigation of the 2005 disappearance of Julie Turner, a 40-year-old woman living in Yorkshire. After she was reported missing, her partner received several text messages from Julie's mobile phone, such as "Stopping at jills, back later need to sort my head out", and "Tell kids not to worry. sorting my life out. be in touch to get some things" [sic]. Investigators found that letters written by Turner's friend Howard Simmerson shared linguistic similarities with the text messages, suggesting that Simmerson had been aware of the contents of the messages.[48] Simmerson was eventually found guilty of Turner's murder.[49]

Nineteen-year-old Jenny Nicholl disappeared on 30 June 2005. Her body was never found, giving police and forensic scientists little information about what might have happened to her. After examining her phone, forensic linguists concluded that the texts sent from it around the time she disappeared differed markedly from her usual texting style. They then turned to her ex-boyfriend, David Hodgson, examining his phone and studying his texting style, and found a number of stylistic similarities between his texts and the messages sent from Nicholl's phone around the time she went missing. On the basis of the timeframe of her disappearance, the differences in texting styles, and other forensic details, Hodgson was convicted of her murder. The analysis of the text messages and their submission in court helped to pave the way for forensic linguistics to be acknowledged as a science in UK law, rather than opinion.[50][51]

Forensic linguist John Olsson gave evidence in a murder trial on the meaning of "jooking" in connection with a stabbing.[52]

During the appeal against the conviction of the Bridgewater Four, a forensic linguist examined the written confession of Patrick Molloy, one of the defendants – a confession which he had retracted immediately – and a written record of an interview which the police claimed took place immediately before the confession was dictated. Molloy denied that the interview had ever taken place, and the analysis indicated that the answers in the interview were not consistent with the questions being asked. The linguist came to the conclusion that the interview had been fabricated by police. The convictions of the Bridgewater Four were quashed before the linguist in the case, Malcolm Coulthard, could produce his evidence.

In an Australian case reported by Eagleson, a "farewell letter" had apparently been written by a woman prior to her disappearance. The letter was compared with a sample of her previous writing and that of her husband. Eagleson came to the conclusion that the letter had been written by the husband of the missing woman, who subsequently confessed to having written it and to having killed his wife. The features analyzed included sentence breaks, marked themes, and deletion of prepositions.[53]

In 2009, a father was able to save his children from a house fire in which his wife died.[54] The police suspected that the fire was not an accident but a cover-up for the father's murder of the mother. Forensic linguists obtained the phones of both the father and the mother and found that texts had continued to be sent from the mother's phone throughout the day, long after the police thought she had died. Using information from the two phones, the linguists studied the texting styles of both parents to see if they could learn more about what had happened that day.[55] By studying the husband's texting style, spelling errors, and other features, they concluded that the texts sent from the wife's phone after she was thought to have died had actually been written by the husband, who was pretending to be his wife so that everyone would believe she had perished in the fire. Without this knowledge, it would have been much more difficult to convict the husband of murder.[56]

Additional concepts

Linguistic fingerprinting

A linguistic fingerprint is a concept put forward by some scholars that each human being uses language differently, and that this difference involves a collection of markers which stamps a speaker or writer as unique, similar to a fingerprint.[2] Under this view, the fingerprint is formed as a result of merged language style. A person's linguistic fingerprint can be reconstructed from the individual's daily interactions and relates to a variety of self-reported personality characteristics, situational variables, and physiological markers (e.g. blood pressure, cortisol, testosterone).[57] In the process of an investigation, the emphasis should be on the relative rather than absolute difference between the authors and how investigators can classify their texts. John Olsson, however, argues that although the concept of linguistic fingerprinting is attractive to law enforcement agencies, there is so far little hard evidence to support the notion.[2]


Intra-author variations are the ways in which one author's texts differ from each other. Inter-author variations are the ways in which different authors' writing varies. Two texts by one author do not necessarily vary less than texts by two different authors. Sources of variation include:

  • Genre: When texts are being measured in different genres, considerable variation is observed even though they are by the same author.
  • Text Type: Personal letters contain more inter-relationship bonding strategies than academic articles or term papers.
  • Fiction vs. Non-Fiction: Some fiction writers are also journalists. Because each medium makes different demands, their fiction and their journalism can be completely different from one another, which results in intra-author variation.
  • Private vs. Public: A political speech written by a politician, which is a public text, will differ greatly from a private text that the same politician writes to a friend or family member.
  • Time lapse: The greater the time lapse between two works, the greater the likely variation. Language changes more than we realize in a relatively short span of time, and writers are susceptible to the language changes around them.
  • Disguise: A writer can publish pseudonymously or anonymously, disguising output to prevent recognition.
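
The comparison of intra-author and inter-author variation can be made concrete with a small sketch. The Python below is a minimal illustration under stated assumptions, not an established forensic procedure: it represents each text as a profile of character trigram counts and measures the cosine distance between profiles. The function names and the three sample texts are hypothetical and were invented for this example.

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, n=3):
    """Count overlapping character n-grams in the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_distance(a, b):
    """Return 1 - cosine similarity between two n-gram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty profile as maximally distant
    return 1.0 - dot / (norm_a * norm_b)

def text_distance(t1, t2, n=3):
    """Distance between two texts based on their character n-gram profiles."""
    return cosine_distance(char_ngrams(t1, n), char_ngrams(t2, n))

# Hypothetical texts, invented for illustration only.
author_a_letter = "Dear Sam, hope you are well. I will call you on Sunday, promise."
author_a_report = "The committee reviewed the proposal and recommends further study."
author_b_letter = "Dear Alex, hope you are keeping well. I shall ring you on Sunday."

intra = text_distance(author_a_letter, author_a_report)  # same author, different genre
inter = text_distance(author_a_letter, author_b_letter)  # different authors, same genre

print(f"intra-author distance: {intra:.3f}")
print(f"inter-author distance: {inter:.3f}")
```

With texts like these, a same-author pair written in different genres may well come out further apart than a different-author pair written in the same genre, which is why the factors listed above have to be controlled for before distances of this kind are interpreted.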

Forensic transcription

The two main types of transcriptions are written documents and video and audio records. Accurate, reliable text transcription is important because the text is the data which becomes the available evidence. If a transcription is wrong, the evidence is altered. If the full text is not transcribed, the evidence is again altered, albeit unwittingly. The emphasis must be on the text being the evidence. A transcription of an audio file should never be assumed to be completely accurate. Each type of transcription has its own problems. A handwritten document might contain unusual spellings which may result in ambiguous meanings, illegible handwriting, and illustrations that are difficult to comprehend. A scanned document is also problematic, as scanning may alter the original document. Audio and video records can include repetitions, hesitation, nonsensical talk, jargon which can be hard to understand, and speakers mumbling incoherently and inaudibly. Non-linguistic sounds such as crying and laughing, which cannot be transcribed easily, may also be included in the audio and video text. Because of this, civil libertarians have argued that interrogations in major criminal cases should be recorded and the recordings kept, as well as transcribed.[58]

Digital Communication

Digital communicative texts, such as social media posts or text messages, typically display features which are not seen in more traditional forms of text. Individuals may employ a variety of methods to convey paralanguage in order to better communicate tone of voice, volume, and expression, such as using capital letters to portray shouting.

With the rise of digital communication, the world has also seen an increase in the use of emoji and emoticons, which are often used to replace non-verbal gestures or facial expressions. The use of emoji and emoticons for authorship identification is still a relatively new idea in forensic linguistics.[59]
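
Features of this kind can be counted automatically. The Python below is a rough sketch under stated assumptions, not a validated authorship tool: the function name `digital_features`, the choice of features, the small emoticon pattern, and the Unicode ranges used to approximate emoji are all assumptions made for this example, and the sample message is invented.

```python
import re

# A small, non-exhaustive set of Unicode ranges that contain many emoji.
# This is an assumption for the sketch; full detection needs the Unicode emoji data.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# A few common Western-style emoticons such as :) ;-) :( :D :P
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DPp]")

def digital_features(message):
    """Count a handful of paralinguistic features in one message."""
    words = re.findall(r"[A-Za-z]+", message)
    # Words of two or more letters written entirely in capitals ("shouting").
    shouted = [w for w in words if len(w) >= 2 and w.isupper()]
    return {
        "all_caps_words": len(shouted),
        "emoji": len(EMOJI_RE.findall(message)),
        "emoticons": len(EMOTICON_RE.findall(message)),
        "repeated_punctuation": len(re.findall(r"[!?]{2,}", message)),
    }

# Hypothetical message; "\U0001F600" is the grinning-face emoji.
sample = "I said NO!!! call me later :) \U0001F600"
print(digital_features(sample))
```

Counts like these could be compared across sets of messages in the same way as spelling or punctuation habits, although acronyms written in capitals and punctuation inside URLs would need extra handling in real data.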

See also


  1. ^ "Centre for Forensic Linguistics". Aston University. Archived from the original on 27 September 2010.
  2. ^ a b c d e f g h i Olsson, John (2008). Forensic Linguistics (Second ed.). London: Continuum. ISBN 978-0-8264-6109-4.
  3. ^ a b c d e Olsson, John. "What is Forensic Linguistics?" (PDF).
  4. ^ "Think Corning Girl Wrote Ransom Note". Plattsburgh Daily Press. Associated Press. 9 September 1927. p. 2 – via NYS Historic Newspapers.
  5. ^ Ayres, Jr, B. Drummond (22 July 1988). "McDonald's, to Court: 'Mc' Is Ours". The New York Times. New York. Retrieved 19 March 2012.
  6. ^ Okawara, Mami Hiraike (2018). "The Interdisciplinary Study of Law and Language: Forensic Linguistics in Japan". In Hebert, D. G. (ed.). International Perspectives on Translation, Education and Innovation in Japanese and Korean Societies. Cham: Springer. pp. 197–206. doi:10.1007/978-3-319-68434-5_13. ISBN 978-3-319-68432-1.
  7. ^ Leisser, Daniel (5 May 2017). "Gründung der Österreichischen Gesellschaft für Rechtslinguistik". Abendgymnasium Wien (in German).
  8. ^ Johnson, Alison; Coulthard, Malcolm. "Current debates in forensic linguistics". The Routledge Handbook of Forensic Linguistics. London: Routledge. p. 1.
  9. ^ "Linguistics Institute: Home | Hofstra University". Retrieved 22 December 2022.
  10. ^ "Linguistics: Innocence Project | Hofstra University". Retrieved 22 December 2022.
  11. ^ "Aston Institute for Forensic Linguistics | Aston University". Retrieved 22 December 2022.
  12. ^ Coulthard, Malcolm (2010). "Forensic Linguistics: the application of language description in legal contexts". Langage et société. 132 (2): 15–33. doi:10.3917/ls.132.0015. ISSN 0181-4095. S2CID 62224476.
  13. ^ Umiyati, Mirsa (2020). "A Literature Review of Forensic Linguistics". International Journal of Forensic Linguistics. 1 (1): 23–29.
  14. ^ Pavlenko, Aneta (March 2008). ""I'm Very Not About the Law Part": Nonnative Speakers of English and the Miranda Warnings". TESOL Quarterly. 42 (1): 1–30. doi:10.1002/j.1545-7249.2008.tb00205.x. ISSN 0039-8322.
  15. ^ a b Solan, Lawrence M.; Tiersma, Peter M. (2005). Speaking of Crime: The Language of Criminal Justice. University of Chicago Press.
  16. ^ Coulthard, Malcolm (2002). Language and the Legal Process. London: Palgrave Macmillan. pp. 19–34.
  17. ^ a b c Olsson, John; Luchjenbroers, June (2013). Forensic Linguistics. Bloomsbury. ISBN 9781472569578.
  18. ^ Ali, Siddig Ahmed (November 2013). "The Role of Forensic Translation in Courtroom Contexts". Arab World English Journal: 176.
  19. ^ "The Kidnapping". PBS Online.
  20. ^ Falzini, Mark W. (9 September 2008). "The Ransom Notes: An Analysis of Their Content & 'Signature'". Archival Ramblings.
  21. ^ Potts, Kimberly (17 September 2016). "'The Case Of: JonBenét Ramsey': Investigator Says He and His Colleagues Will Name a Suspect". Yahoo Entertainment.
  22. ^ Alduais, Ahmed; Al-Khulaidi, Mohammed Ali; Allegretta, Silvia; Abdulkhalek, Mona Mohammed (23 May 2023). "Forensic linguistics: A scientometric review". Cogent Arts & Humanities. 10 (1). doi:10.1080/23311983.2023.2214387. ISSN 2331-1983.
  23. ^ a b c Olsson, John (2004). An Introduction to Language Crime and the Law. London: Continuum International Publishing Group.
  24. ^ Michell, C.S. (2013). Investigating the use of forensic stylistic and stylometric techniques in the analysis of authorship on a publicly accessible social networking site (Facebook) (PDF) (MA in Linguistics thesis). University of South Africa.
  25. ^ C. Hardaker (2015). The ethics of online aggression: Where does "virtual" end, and "reality" begin? BAAL Conference on The Ethics of Online Research Methods. Cardiff.
  26. ^ a b Tiersma, Peter. "forensic linguistics".
  27. ^ Coulthard, M. (2004). Author identification, idiolect and linguistic uniqueness. Applied Linguistics, 25(4), 431-447.
  28. ^ Labov, William (1972). Sociolinguistic Patterns. Philadelphia, PA: University of Pennsylvania Press. p. 192.
  29. ^ Miller, C. (1984). "Genre as social action". Quarterly Journal of Speech. 70 (2): 151–157. doi:10.1080/00335638409383686.
  30. ^ Grant, T. D. (2008). "Approaching questions in forensic authorship analysis". In Gibbons, J.; Turell, M. T. (eds.). Dimensions of Forensic Linguistics. Amsterdam: John Benjamins.
  31. ^ Morton, A.Q., and S. Michaelson (1990) The Qsum Plot. Internal Report CSR-3-90, Department of Computer Science, University of Edinburgh.
  32. ^ Ariani, Mohsen Ghasemi; Sajedi, Fatemeh; Sajedi, Mahin (19 December 2014). "Forensic Linguistics: A Brief Overview of the Key Elements". Procedia - Social and Behavioral Sciences. 14th Language, Literature and Stylistics Symposium. 158: 222–225. doi:10.1016/j.sbspro.2014.12.078. ISSN 1877-0428.
  33. ^ "WHEN HELEN FROZE OVER". 5 June 2014. Retrieved 8 September 2018.
  34. ^ "Boston U. Panel Finds Plagiarism by Dr. King". The New York Times. 11 October 1991. Retrieved 14 June 2008.
  35. ^ Fitzgerald, J. R. (2004). "Using a forensic linguistic approach to tracking the Unabomber.". In Campbell, J.; DeNevi, D. (eds.). Profilers: Leading investigators take you inside the criminal mind. New York: Prometheus Books. pp. 193–222.
  36. ^ a b Coulthard, M.; Johnson, A. (2007). An Introduction to Forensic Linguistics: Language in Evidence. Oxford: Routledge. pp. 162–3.
  37. ^ Martin Fido (1994), The Chronicle of Crime: The infamous felons of modern history and their hideous crimes
  38. ^ a b c d Butters, Ronald R. (13 May 2011). "Forensic Linguistics". Journal of English Linguistics. 39 (2): 196–202. doi:10.1177/0022022111403849. ISSN 0075-4242.
  39. ^ Jones, Abigail (29 July 2017). "'Manhunt: Unabomber' explores how a federal investigator caught Ted Kaczynski and changed FBI profiling forever". Newsweek. Retrieved 17 August 2017.
  40. ^ Coulthard, R.M. (2000). "Whose text is it? On the linguistic investigation of authorship", in S. Sarangi and R.M. Coulthard: Discourse and Social Life. London, Longman.
  41. ^ "Entrevista con Salvador Morales Garibay" (PDF). Letras Libres (in Spanish). February 1999. Archived from the original (PDF) on 17 October 2013.
  42. ^ "PGR ordena la captura y devela la identidad del Subcomandante Marcos (9 de febrero 1995)". YouTube (in Spanish).
  44. ^ "Marcos, en la mira de Zedillo". Proceso (in Spanish). 5 August 2002. Retrieved 16 January 2022.
  45. ^ "Marcos sí es Sebastián Guillén". (in Spanish). 10 December 2006. Archived from the original on 17 October 2013.
  46. ^ Trejo, Ángel (22 May 2006). "La Otra campaña, pintada de azul" (PDF). Buzos (in Spanish): 13–15.
  47. ^ "Maestros y condiscípulos de Tampico recuerdan a Rafael Sebastián Guillén". Proceso (in Spanish). 6 August 1995. Archived from the original on 22 December 2015.
  48. ^ Svoboda, Elizabeth (11 May 2009). "Speech Patterns in Messages Betray a Killer". The New York Times. Retrieved 23 October 2020.
  49. ^ "Life term for man who shot lover". BBC News. 8 November 2005. Retrieved 16 November 2011.
  50. ^ Mitchell, Elizabeth (8 September 2008). "The case for forensic linguistics". BBC News. Retrieved 20 August 2017.
  51. ^ British Association for the Advancement of Science (8 September 2008). "Txt Crimes, Sex Crimes And Murder: The Science of Forensic Linguistics". ScienceDaily.
  52. ^ Trial of Rehan Asghar, Central Criminal Court, London, January 2008.
  53. ^ Eagleson, Robert. (1994). 'Forensic analysis of personal written texts: a case study', John Gibbons (ed.), Language and the Law, London: Longman, 362–373.
  54. ^ "Man Jailed over Wife Fire Murder". BBC News. 11 December 2009.
  55. ^ "Professor Tim Grant". Aston University. Archived from the original on 25 January 2021.
  56. ^ Oliver, Huw (9 April 2015). "Forensic Linguists Use Spelling Mistakes to Help Convict Criminals". Vice.
  57. ^ Pennebaker, J. W. (1990). 'Physiological factors influencing the reporting of physical symptoms'. The Science of Self-report: Implications for Research and Practice. Mahwah, NJ: Erlbaum Publishers, pp. 299-316
  58. ^ "Recording Police Questioning". The New York Times. 15 June 2004.
  59. ^ Marko, Karoline (3 March 2022). ""Depends on Who I'm Writing To"—The Influence of Addressees and Personality Traits on the Use of Emoji and Emoticons, and Related Implications for Forensic Authorship Analysis". Frontiers in Communication. 7: 840646. doi:10.3389/fcomm.2022.840646. ISSN 2297-900X.
  • Coulthard, M. and Johnson, A. (2007)
  • Forensic linguistics; An Introduction to Language, Crime and Law (with original cases in Bureau of Police Investigation and Courts) by Azizi, Syrous & Momeni, Negar, Tehran: JahadDaneshgahi Publication, 2012
  • Nini, Andrea. “An Authorship Analysis of the Jack the Ripper Letters.” Digital Scholarship in the Humanities, vol. 33, no. 3, 2018, pp. 621–636.

Further reading

  • Baldwin, J. R. and P. French (1990). Forensic phonetics. London: Pinter Publishers.
  • Coulthard, M. and Johnson, A (2007). An Introduction to Forensic Linguistics: Language in Evidence. London: Routledge.
  • Coulthard, M. and Johnson, A (2010). The Handbook of Forensic Linguistics. London: Routledge.
  • Coulthard, M., Johnson, A., and Wright, D. (2017) An Introduction to Forensic Linguistics: Language in Evidence. (2nd edition). London: Routledge.
  • Coulthard, M., May, A., and Sousa-Silva, R. (2021) The Handbook of Forensic Linguistics (2nd edition). London: Routledge.
  • Ellis, S. (1994). "Case report: The Yorkshire Ripper enquiry, Part 1", Forensic Linguistics 1, ii, 197–206.
  • Fairclough, N. (1989). Language and Power. London: Longman.
  • Gibbons, J. (2003). Forensic Linguistics: an introduction to language in the Justice System. Blackwell.
  • Gibbons, J., V Prakasam, K V Tirumalesh, and H Nagarajan (eds) (2004). Language in the Law. New Delhi: Orient Longman.
  • Gibbons, J. and M. Teresa Turell (eds) (2008). Dimensions of Forensic Linguistics. Amsterdam: John Benjamins.
  • Grant, T. (2008). "Quantifying evidence in forensic authorship analysis", Journal of Speech, Language and the Law 14(1).
  • Grant, T. and Baker, K. (2001). "Reliable, valid markers of authorship", Forensic Linguistics VIII(1): 66–79.
  • Heydon, G. (2014). "Forensic Linguistics: Forms and Processes", Masyarakat Linguistik Indonesia 31(1): 1–10.
  • Hollien, H. (2002). "Forensic Voice Identification". New York: Harcourt.
  • Hoover, D. L. (2001). "Statistical stylistics and authorship attribution: an empirical investigation", Literary and Linguistic Computing, XIV(4): 421–44.
  • Koenig, B.J. (1986). "Spectrographic voice identification: a forensic survey", letter to the editor of Journal of the Acoustical Society of America, 79(6): 2088–90.
  • Koenig, J. (2014). "Getting the Truth: Discover the Real Message Know Truth Know Deception", Principia Media.
  • Koenig, J. (2018). "Getting the Truth: I am D.B. Cooper", Principia Media.
  • Maley, Y. (1994). "The language of the law", in J. Gibbons (ed.), Language and the Law. London: Longman, 246–69.
  • McGehee, F. (1937). "The reliability of the identification of the human voice", Journal of General Psychology, 17: 249–71.
  • McMenamin, G. (1993). Forensic Stylistics. Amsterdam: Elsevier.
  • Nolan, F. and Grabe, E. (1996). "Preparing a voice lineup", Forensic Linguistics, 3 i, 74–94.
  • Pennycook, A. (1996). "Borrowing others words: text, ownership, memory and plagiarism", TESOL Quarterly, 30: 201–30.
  • Shuy, Roger W (2001). "Discourse Analysis in the Legal Context". In The Handbook of Discourse Analysis. Eds. Deborah Schiffrin, Deborah Tannen, and Heidi E. Hamilton. Oxford: Blackwell Publishing. pp. 437–452.

External links