Copy testing

Copy testing is a specialized field of marketing research that determines an advertisement's effectiveness based on consumer responses, feedback, and behavior. Also known as pre-testing, it can cover all media channels, including television, print, radio, outdoor signage, internet, and social media.

Automated copy testing is a specialized type of digital marketing specifically related to digital advertising. It uses software to deploy copy variations of digital advertisements to a live environment and collect data from real users. These automated copy tests generally use a Z-test to determine the statistical significance of the results. If a specific ad variation outperforms the baseline in the copy test to the desired level of statistical significance, the marketer should adopt the new copy variation.
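As a rough illustration of the significance check described above, the following Python sketch applies a one-sided two-proportion Z-test to a baseline ad and one copy variation. It assumes the metric being compared is a click (or conversion) rate. The function name, the example counts, and the 0.05 threshold are illustrative assumptions, not details of any particular testing tool.

```python
from math import erfc, sqrt


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """One-sided two-proportion Z-test (hypothetical helper for illustration).

    Tests whether variation B's click rate exceeds baseline A's.
    Returns (z, p_value), where p_value = P(Z >= z) under the null
    hypothesis that both ads share the same underlying rate.
    """
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if std_err == 0:
        raise ValueError("standard error is zero; rates are degenerate")
    z = (rate_b - rate_a) / std_err
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * erfc(z / sqrt(2))
    return z, p_value


# Made-up example counts: baseline vs. one copy variation.
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
alpha = 0.05  # assumed significance threshold
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
print("adopt variation" if p < alpha else "keep baseline")
```

A one-sided test is used here because the question posed is whether the variation outperforms the baseline. A two-sided test would be the more conservative choice if a decline in performance also needs to be detected.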


In 1982, a consortium of 21 leading advertising agencies — including N. W. Ayer, D’Arcy, Grey, McCann Erickson, Needham Harper & Steers, Ogilvy & Mather, J. Walter Thompson, and Young & Rubicam — released a public document laying out the PACT (Positioning Advertising Copy Testing) Principles that constitute a good copy testing system. PACT states a good copy testing system must meet the following criteria:

  1. Provides measurements which are relevant to the objectives of the advertising.
  2. Requires agreement about how the results will be used in advance of each specific test.
  3. Provides multiple measurements, because single measurements are generally inadequate to assess the performance of an advertisement.
  4. Is based on a model of human response to communications – the reception of a stimulus, the comprehension of the stimulus, and the response to the stimulus.
  5. Allows for consideration of whether the advertising stimulus should be exposed more than once.
  6. Recognizes that the more finished a piece of copy is, the more soundly it can be evaluated and requires, as a minimum, that alternative executions be tested in the same degree of finish.
  7. Provides controls to avoid the biasing effects of the exposure context.
  8. Takes into account basic considerations of sample definition.
  9. Demonstrates reliability and validity.

Types of copy testing measurements


The predominant copy testing measure of the 1950s and 1960s, Burke's Day-After Recall (DAR) was interpreted to measure an ad's ability to “break through” into the mind of the consumer and register a message from the brand in long-term memory (Honomichl). Once this measure was adopted by Procter and Gamble, it became a research staple (Honomichl).

In the 1970s, 1980s, and 1990s, validation efforts found no link between recall scores and actual sales (Adams & Blair; Blair; Blair & Kuse; Blair & Rabuck; Jones; Jones & Blair; MASB; Mondello; Stewart). For example, Procter and Gamble reviewed 10 years' worth of split-cable tests (100 total) and found no significant relationship between recall scores and sales (Young, pp. 3–30). In addition, Leonard Lodish of the Wharton School conducted an even more extensive review of test market results and also failed to find a relationship between recall and sales (Lodish, pp. 125–139).

The 1970s also saw a re-examination of the “breakthrough” measure. As a result, an important distinction was made between the attention-getting power of the creative execution and how well “branded” the ad was. Thus, the separate measures of attention and branding were born (Young, p. 12).


In the 1970s and 1980s, after DAR was determined to be a poor predictor of sales, the research industry began to depend on a measure of persuasion as an accurate predictor of sales. This shift was led, in part, by researcher Horace Schwerin, who pointed out, “the obvious truth is that a claim can be well remembered but completely unimportant to the prospective buyer of the product – the solution the marketer offers is addressed to the wrong need” (Honomichl). As with DAR, it was Procter and Gamble's acceptance of the ARS Persuasion measure (also known as brand preference) that made it an industry standard. Recall scores were still provided in copy testing reports, with the understanding that persuasion was the measure that mattered (Honomichl).

Harold Ross of Mapes & Ross found that persuasion was a better predictor of sales than recall (Ross), and the predictive validity of ARS Persuasion to sales has been reported in several refereed publications (Adams & Blair; Jones & Blair; MASB; Mondello).


The main purpose of diagnostic measures is optimization. Understanding diagnostic measures can help advertisers identify creative opportunities to improve executions (Young, p. 7).


Non-verbal measures were developed in response to the belief that much of a commercial's effects – e.g. the emotional impact – may be difficult for respondents to put into words or scale on verbal rating statements. In fact, many believe the commercial's effects may be operating below the level of consciousness (Young, p. 7). According to researcher Chuck Young, “There is something in the lovely sounds of our favorite music that we cannot verbalize – and it moves us in ways we cannot express” (Young, p. 22).

In the 1970s, researchers sought to capture these non-verbal responses biologically by tracking brain wave activity as respondents watched commercials (Krugman). Others experimented with galvanic skin response, voice pitch analysis, and eye-tracking (Young, p. 22). These efforts were not widely adopted, in part because of the limitations of the technology and in part because of the poor cost-effectiveness of what was widely perceived as academic, not actionable, research.

In the early 1980s, a shift in analytical perspective, from thinking of a commercial as the fundamental unit of measurement to be rated in its entirety to thinking of it as a structured flow of experience, gave rise to experimentation with moment-by-moment systems. The most popular of these was the dial-a-meter response, which required respondents to turn a meter, in degrees, toward one end of a scale or the other to reflect their opinion of what was on screen at that moment.

More recently, research companies have started to use psychological tests, such as the Stroop effect, to measure the emotional impact of copy. These techniques exploit the notion that viewers do not know why they react to a product, image, or ad in a certain way (or that they reacted at all) because such reactions occur outside of awareness, through changes in networks of thoughts, ideas, and images.

Copy testing in political elections

Copy testing is used in an array of fields ranging from commercial development to presidential elections. In 2007, CNN employed this form of market testing throughout the primary and general election. Rita Kirk and Dan Schill from Southern Methodist University worked with CNN to gauge voters' reactions to debates between presidential hopefuls.

Relevant Terms


  • Adams, A. J., & M. H. Blair. “Persuasive Advertising and Sales Accountability: Past Experience and Forward Validation.” Journal of Advertising Research, March/April 1992: 20–25.
  • Blair, M. H. “An Empirical Investigation of Advertising Wearin and Wearout.” Journal of Advertising Research, 27, 6 (1987): 45–50.
  • Blair, M. H., & A. R. Kuse. "Better Practices in Advertising Can Change a Cost of Doing Business to Wise Investments in the Business." Journal of Advertising Research, March 2004: 71-89.
  • Blair, M. H., & M. J. Rabuck. “Advertising Wearin and Wearout: Ten Years Later.” Journal of Advertising Research. October 1998: 7–18.
  • Foreman, T. "Focus Group's Satisfaction Grows for GOP Field during Debate." CNN. Web. 20 Jan. 2012.
  • Honomichl, J. J. Honomichl on Marketing Research, Lincolnwood, IL: NTC Business Books, 1986.
  • Jones, J. P. "Look Before You Leap." Admap, November 1996.
  • Jones, J. P. "Quantitative Pretesting for Television Advertising." How Advertising Works: The Role of Research, Sage Publications, Inc., 1998: 160-169.
  • Jones, J. P., & M. H. Blair. "Examining 'Conventional Wisdoms' About Advertising Effects With Evidence From Independent Sources." Journal of Advertising Research, November/December 1996: 37-59.
  • Kastenholz, J., G. Kerr, & C. Young. "Focus and Fit: Advertising and Branding Join Forces to Create a Star." Marketing Research, Spring 2004: 16-21.
  • Krugman, H. "Memory Without Recall, Exposure Without Perception." Journal of Advertising Research, July/August 1977.
  • Lodish, L. M., M. Abraham, S. Kalmenson, J. Livelsberger, B. Lubetkin, B. Richardson, & M. E. Stevens. "How TV Advertising Works: A Meta-Analysis of 389 Real World Split Cable TV Advertising Experiments." Journal of Marketing Research, May 1995: 125-139.
  • MASB. Marketing Accountability Standards: Measuring and Improving the Return from TV Advertising (An Example). April 2008 & May 2012.
  • Mondello, M. "Turning Research Into Return-on-Investment." Journal of Advertising Research, July/August 1996.
  • Ross, H. "Recall vs. Persuasion: An Answer." Journal of Marketing Research, 1982, 22(1): 13-16.
  • Puckett, Jason "Ad Copy Testing - 5 Best Practices to Improve ROI", 2015. AdBasis, Inc.
  • Stewart, D. W. "Advertising Wearout: What and How you Measure Matters." Journal of Advertising Research, September/October 1999: 39-42.
  • Understanding Copy Pretesting (1994). Published by Advertising Research Foundation, NY.
  • "Choosing a Testing Company."
  • "TV Commercial Pre-testing."
  • Young, C. E. The Advertising Research Handbook, Ideas in Flight, Seattle, WA, April 2005.
  • Young, C. E. "A Short History of Television Advertising." Ameritest/CY Research, Inc., 2004.
  • Zilberstein, S. "CNN to Track Debate Viewers' Responses in Real Time." Featured articles from CNN. CNN, 13 Dec. 2007. Web. 20 Jan. 2012.