Gudjonsson suggestibility scale

The Gudjonsson suggestibility scale (GSS) is a psychological test that measures suggestibility. It was created in 1983 by Icelandic psychologist Gísli Hannes Guðjónsson and involves reading the subject a short story and testing their recall. It has been used in court cases in several jurisdictions but has been the subject of various criticisms.


The Gudjonsson suggestibility scale (GSS) was created in 1983 by Icelandic psychologist Gísli Hannes Guðjónsson. Given his large number of publications on suggestibility, Gudjonsson was often called as an expert witness in court cases where the suggestibility of those involved in the case was crucial to the proceedings. To measure suggestibility, Gudjonsson created a scale that was relatively straightforward and could be administered in a wide variety of settings.[1] He noticed that while there was a significant body of research on the effects of leading questions on suggestibility, less was known about the effects of "specific instruction" and "interpersonal pressure".[1] Previous methods of measuring suggestibility were primarily aimed at "hypnotic phenomena"; however, Gudjonsson's scale was the first created to be used specifically in conjunction with interrogative events.[1] It relies on two different aspects of interrogative suggestibility: it measures how much an interrogated person yields to leading questions, as well as how much an interrogated person shifts their responses when additional interrogative pressure is applied. The test is designed specifically to measure the impact of suggestive questions and instructions.[1] Although originally in English, the scale has been translated into several different languages, including Portuguese,[2][3] Italian,[4] Dutch,[5] and Polish.[6]


The GSS involves a short story, followed by a general recall activity, a test, and a retest. It begins with a short story being read to the subject:

Anna Thomson of South Croydon was on holiday in Spain when she was held up outside her hotel and robbed of her handbag, which contained $50 worth of traveler's checks and her passport. She screamed for help and attempted to put up a fight by kicking one of the assailants in the shins. A police car shortly arrived and the woman was taken to the nearest police station, where she was interviewed by Detective Sergeant Delgado. The woman reported that she had been attacked by three men, one of whom she described as oriental looking. The men were said to be slim and in their early twenties. The police officer was touched by the woman’s story and advised her to contact the British Embassy. Six days later, the police recovered the lady’s handbag, but the contents were never found. Three men were subsequently charged, two of whom were convicted and given prison sentences. Only one had had previous convictions for similar offences. The lady returned to Britain with her husband Simon and two friends but remained frightened of being out on her own.[1]

The subject is instructed to listen carefully to the story because they will have to report what they remember afterward. After the researcher reads the story aloud, the participant engages in free recall, reporting everything they remember of what they have just heard. To make the assessment more difficult, subjects may be asked to recall the story again after 50 minutes, in addition to immediately after hearing it. This part of the assessment is scored based on how many facts the subject recalls correctly.[1]

The second part of the assessment consists of the actual scale: twenty questions about the short story, fifteen of them suggestive and five neutral.[1] The fifteen suggestive questions fall into three types: leading questions, affirmative questions, and false alternative questions. Their purpose is to measure how much a participant "yields" to suggestive questions. Leading questions contain some "salient precedence" and are worded in such a way that they seem plausible and lend themselves to an affirmative answer. A leading question on the GSS would ask, "Did the woman's glasses break in the struggle?"[1] Affirmative questions present facts that are not in the story and carry an affirmative response bias. An example of an affirmative question would be "Were the assailants convicted six weeks after their arrest?"[1] Like the affirmative questions, false alternative questions contain information not present in the story; however, these questions focus specifically on objects, people, and events not found in the story. One of these questions would be "Did the woman hit one of the assailants with her fist or handbag?"[1]

For each of the five neutral questions, the correct answer is "yes". After 1987, the GSS was altered so that these five questions were included in the shift score as well.[7] This version is referred to as the Gudjonsson suggestibility scale 2, or GSS2. The twenty questions are interspersed throughout the assessment to conceal its aim.[1]

After the initial questionnaire, the subject is told in a "forceful manner" that they made a number of errors and must answer the questions a second time, correcting any errors they detect. Any changes in the answers to the suggestive questions are recorded.


Scoring can be broken down into two main categories: memory recall and suggestibility. Memory recall refers to the number of facts the subject correctly remembered during the free recall. Each fact is worth one point, and the subject can earn a maximum of forty points for this section.[1]

The suggestibility section is broken into three subcategories: yield, shift, and total. Yield refers to the number of suggestive questions answered incorrectly. With each question being worth one point, subjects can score up to fifteen points on this section. If the subject engaged in two recall activities, the score for the second trial is not included in the scoring. Shift refers to any notable change in the participant's answers after they were told to go over their original answers and correct their mistakes. Subjects can also score up to fifteen points on this section. The total score is the sum of the Yield and Shift scores.[1]
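The scoring rules described above can be illustrated with a minimal sketch. It assumes the original version of the scale, in which only the fifteen suggestive questions count toward yield and shift. The `Response` record and the `score_gss` function are hypothetical names introduced for illustration; they are not part of any published scoring tool, and deciding whether an answer "yielded" is left to the examiner.

```python
from dataclasses import dataclass

MAX_RECALL = 40     # one point per correctly recalled fact
N_SUGGESTIVE = 15   # suggestive questions counted in yield and shift


@dataclass(frozen=True)
class Response:
    """Hypothetical record of one suggestive question across both rounds."""
    yielded: bool        # first answer accepted the suggestion
    first_answer: str    # answer given in the initial round
    second_answer: str   # answer given after the negative feedback


def score_gss(facts_recalled: int, responses: list[Response]) -> dict[str, int]:
    """Return recall, yield 1, shift, and total scores (illustrative only)."""
    if not 0 <= facts_recalled <= MAX_RECALL:
        raise ValueError("recall must be between 0 and 40")
    if len(responses) != N_SUGGESTIVE:
        raise ValueError("expected 15 suggestive questions")
    # Yield: one point per suggestive question answered incorrectly.
    yield_1 = sum(r.yielded for r in responses)
    # Shift: one point per answer changed after the negative feedback.
    shift = sum(r.first_answer != r.second_answer for r in responses)
    return {
        "recall": facts_recalled,
        "yield_1": yield_1,
        "shift": shift,
        "total": yield_1 + shift,
    }


# Made-up example: 5 yielded answers, 3 of which were changed on the retest.
example = (
    [Response(True, "yes", "no")] * 3
    + [Response(True, "yes", "yes")] * 2
    + [Response(False, "no", "no")] * 10
)
scores = score_gss(19, example)
```

For this made-up subject, `scores` holds a recall of 19, a yield 1 of 5, a shift of 3, and a total of 8.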

In a sample of 195 people, the yield 1 mean score was 4.9, with a standard deviation of 3.0. The yield 2 mean score was 6.9, with a standard deviation of 3.4. The average shift score was 3.6, with a standard deviation of 2.7. For total suggestibility (yield 1 + shift), the average score was 8.5, with a standard deviation of 4.3. The average memory recall score was 19.2, with a standard deviation of 8.0.[1]
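As a rough illustration of how an individual score relates to these sample statistics, the sketch below expresses a raw score as a number of standard deviations from the sample mean. The `NORMS` table and the `z_score` helper are hypothetical names, and standardizing against this particular sample is an assumption made for the example, not a procedure described in the source.

```python
# Means and standard deviations reported for the 195-person sample.
NORMS = {
    "yield_1": (4.9, 3.0),
    "yield_2": (6.9, 3.4),
    "shift": (3.6, 2.7),
    "total": (8.5, 4.3),
    "recall": (19.2, 8.0),
}


def z_score(measure: str, raw: float) -> float:
    """Distance of a raw score from the sample mean, in standard deviations."""
    mean, sd = NORMS[measure]
    return (raw - mean) / sd


# A total score of 12.8 lies about one standard deviation above the mean.
z_total = z_score("total", 12.8)
```

The table also shows that the reported total mean is the sum of the yield 1 and shift means (4.9 + 3.6 = 8.5).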

Measures of reliability and validity

Internal consistency scores between yield 1 and shift for the GSS range from −.23 to .28.[8][9][10][5] Internal consistency for the fifteen yield and fifteen shift questions was reportedly 0.77 and 0.67, respectively.[1] The GSS2 showed higher internal consistency than the GSS1. Test-retest reliability was reportedly 0.55.[5] Overall, shift scores showed the lowest internal consistency, at 0.11.[4][11] Other scores were significant.[11] External validity, tested with the Portuguese version of the GSS, showed no correlation between interrogative suggestibility and personality factors,[12][13] nor between interrogative suggestibility and anxiety.[14][15] Immediate recall and delayed recall correlated negatively with all suggestibility scores.[4]

Uses in the justice system

Use in criminal proceedings

The GSS is used most often in criminal justice systems. Human memory is known to be unreliable, and with Western countries relying heavily on eyewitness testimony, an increasing number of incorrect convictions have been made public.[2] The GSS allows psychologists to identify individuals who may be susceptible to giving false accounts of events when questioned.[2] The GSS could be useful in a situation where a defendant is being interrogated or cross-examined.[16] There is evidence that GSS scores vary between inmates and the general population. In the general population, high scores on the GSS are associated with an increased likelihood of false confession.[17][18] Pires (2014) studied 40 Portuguese prisoners and found that inmates had higher suggestibility scores than the general population.[2] This group had the lowest scores in the immediate recall portion of the GSS, suggesting that their higher suggestibility was due to their lower memory capacity.[2] Possible explanations for this include that the inmates participated in the study voluntarily and were told that participating would have no negative effect on them.[2] Therefore, even for inmates with antisocial personality disorder, the study took place in a "cooperative atmosphere". Inmates who had a negative attitude toward the test situation or the examiner were less vulnerable to suggestion.[2] Additionally, repeat offenders were more resistant to interrogative pressure than those without prior convictions; this may be due to their experience in interrogation settings.[2][17] Studies have found that GSS scores are higher in people who confess to crimes they did not commit than in people who are more resistant to police questioning.[17][18]

The use of the GSS in court proceedings has been met with mixed responses. In the United States, courts in many states have ruled that the GSS does not meet either the Frye standard or the Daubert standard for the admissibility of expert testimony.[16] In Soares v. Massachusetts,[19] for example, the Massachusetts Appeals Court stated that the case was "devoid of evidence demonstrating either the scientific validity or reliability of the GSS as a measure of susceptibility to suggestion or appropriate applications of the test results."[19] In the same year, the Wisconsin Supreme Court, in Summers v. Wisconsin, affirmed the trial court's decision to exclude the defense's expert testimony on the GSS because it was "vague regarding what information or insights the expert could offer that would assist the jury and the scientific bases of these insights."[20] Despite these decisions, the GSS has been admitted in several court cases. For example, in Oregon v. Romero, the Oregon Supreme Court held that the testimony of a defense expert about the results of a Gudjonsson suggestibility test (offered in support of the defendant's claim that her confession to police was involuntary) met "the threshold for admissibility" because "It would have been probative, relevant, and helpful to the trier of fact."[21]

Experts have linked GSS suggestibility to the voluntary aspect of Miranda waivers during legal proceedings.[22] Despite this, there are very few appellate cases in which the GSS has been presented to a court with any reference to the voluntariness of a Miranda waiver. Rogers (2010) specifically examined the GSS in terms of its ability to predict people's ability to understand and agree to Miranda rights. This study found that suggestibility, as assessed by the GSS, appeared to be unrelated to "Miranda comprehension, reasoning, and detainees' perceptions of police coercion".[22] Defendants with high compliance were found to have significantly lower Miranda comprehension and ability to reason about exercising Miranda rights compared to their counterparts with low compliance.[22]

Use in juvenile delinquency proceedings

Scores of adolescents in the justice system differ from those of adults. Richardson (1995) administered the GSS to 65 juvenile offenders. When matched with adult offenders on IQ and memory, juveniles were much more susceptible to giving in to interrogative pressure (Shift), specifically by changing their answers after they were given negative feedback.[23] Their answers to the leading questions, however, were no more affected by suggestibility than those of their adult counterparts.[23] These results were likely not due to memory capacity, as studies have shown that the amount of information children can retrieve during free recall increases with age and equals that of adults around age 12.[23] Singh (1992) compared non-offending adults and adolescents and showed that adolescents still had higher suggestibility scores than adults.[24] A study comparing delinquent adolescents to normal adults found the same results.[25] Researchers suggest that police interviewers not place adolescent suspects and witnesses under excessive pressure by criticizing their answers.[23]


Use with people with intellectual disabilities

Use of the GSS with people who have an intellectual disability has been met with criticism.[16] This controversy is partially due to the large memory component of the GSS. Research has shown that the high levels of suggestibility demonstrated by people with intellectual disabilities are related to poor memory for the information presented in the GSS.[16] People with intellectual disabilities have difficulty remembering aspects of the fictional story in the GSS because it is not relevant to them. When those with intellectual disabilities are tested based on events that are of personal significance to them, suggestibility decreases significantly.[16] In terms of false confession, which involves a situation in which the defendant was not present, the GSS might have more relevance to confessions than it does to witness testimony.[16] Another context in which the GSS is sometimes used is as part of the assessment of whether people accused of a crime have the capacity to plead to the charge.[16] Despite this perceived usefulness, it is advised that the GSS not be used in court with people who have intellectual disabilities, as the results may not accurately represent their ability to understand the charges against them or to stand trial.[16]

Internal consistency reliability

One issue with the GSS is internal consistency reliability, specifically with regard to the Shift portion of the measure.[8] Both Shift-positive and Shift-negative are associated with levels of internal consistency reliability of x2 < .60. Internal Shift scores have been reported as x2 = .60, which is "unacceptably low".[8] These numbers serve as a possible explanation for why studies have not found "theoretically meaningful correlations" between the Shift sub-scale and other external criteria. Researchers argue against the use of a Total suggestibility composite due to evidence that Yield 1 and Shift scores do not significantly correlate with each other.[8] This absence of a correlation is problematic because it "suggests that yielding to a leading question and yielding to negative feedback from an interviewer operate under completely different processes".[8] Other researchers have found that there are two types of suggestibility: direct and indirect. The failure to take these into account may have led to methodological problems with the GSS.[6] Researchers suggest that until these issues have been addressed, use of the GSS should be limited to the Yield sub-scale.[8]

Effects of cognitive load on suggestibility

Drake et al. (2013) aimed to discover the effects that increasing cognitive load had on suggestibility scores on the GSS, and specifically on attempts at faking interrogative suggestibility.[26] The study was conducted using 80 undergraduate students, each of whom was assigned to one of four conditions from a combination of instruction type (genuine or instructed faking) and concurrent task (yes or no).[26] Findings showed that instructed fakers not performing a concurrent task scored significantly higher on yield 1 compared with "genuine interviewees". Instructed fakers who were performing a concurrent task scored significantly lower on yield 1. Genuine interviewees (non-fakers) did not exhibit this pattern in response to cognitive load differences.[26] These results suggest that an increase in cognitive load may indicate an attempt at faking on the yield portion of the GSS. Increasing cognitive load may facilitate the detection of deception because it is more difficult to act deceptively under these conditions.[26]


One possible issue with the GSS is its validity: whether it measures genuine "internalization of the suggested materials" or simply "compliance with the interrogator".[27] To test this, Mastroberardino (2013) conducted two experiments. In the first, participants were administered the GSS2 and then immediately performed a "source identification task" for the items on the scale. In the second experiment, half of the participants were administered this identification task immediately while the other half were administered it after 24 hours.[27] Both experiments found a higher proportion of compliant responses. Participants internalized more suggested information after yield 1, and made more compliant responses during the shift portion of the assessment.[27] In the second experiment, participants in the delayed condition internalized less material than those in the immediate condition.[27] These results support the idea that different processes underlie the yield 1 and shift parts of the GSS2: yield 1 may include internalization of suggested materials and compliance, while shift may be due mostly to compliance with the interrogator. The GSS is not able to differentiate between compliance and suggestibility, as the outcome behaviors of these two cognitive processes are the same.[27]

Suggestibility and false memory

Leavitt (1997) compared suggestibility (evaluated by the GSS) in participants who recovered memories of sexual assault to that of those without a history of sexual trauma.[28] The results of this study showed that those who had recovered memories had a lower average suggestibility score than those who did not have a history of sexual abuse (6.7 versus 10.6).[28] These results suggest that suggestibility does not play as large a role in the formation of memories as previously assumed.[28]


