Copy testing

Summary

A specialized field of marketing research, copy testing is the study of television commercials prior to airing them. Although the practice is still widely called copy testing, pre-testing is considered the more accurate, modern name (Young, p.4) for predicting how effectively an ad will perform, based on analysis of feedback gathered from the target audience. Each test either qualifies the ad as strong enough to meet company action standards for airing or identifies opportunities to improve the ad's performance through editing. (Young, p.213)

Pre-testing is also used to identify weak spots within an ad campaign, to more effectively edit 60-second ads to 30-second ads or 30’s to 15’s, to select images from the spot to use in an integrated campaign’s print ad, to pull out the key moments for use in ad tracking, and to identify branding moments. [1]

Four Types of Copy Testing Scores

There are four general themes woven into the last century of copy testing. To understand how the different types of measures relate to one another, see the heuristic advertising model known as the Ameritest TV Ad Model (listed under References).

Report Card Measures

The first theme is the quest for a valid, single-number statistic to capture the overall performance of the advertising creative. This search has spawned the creation of various report card measures. These measures are used to filter commercial executions and help management make the go/no-go decision about which ads to air. (Young, p. 7) The predominant copy testing measure of the 1950s and 1960s, Day-After Recall (DAR), was interpreted to measure an ad’s ability to “break through” into the mind of the consumer and register a message from the brand in long-term memory. (Honomichl) Once this measure was adopted by Procter and Gamble, it became a research staple. (Honomichl)

In the 1970s and 1980s, after DAR was determined to be a poor predictor of sales, the research industry began to depend on the measure of persuasion as an accurate predictor of sales. This shift was led, in part, by researcher Horace Schwerin, who pointed out, “the obvious truth is that a claim can be well remembered but completely unimportant to the prospective buyer of the product – the solution the marketer offers is addressed to the wrong need.” (Honomichl) As with DAR, it was Procter and Gamble’s acceptance of the persuasion measure (also known as motivation) that made it an industry standard. Recall scores were still provided in copy testing reports, with the understanding that persuasion was the measure that mattered. (Honomichl)

The 1970s also saw a re-examination of the “breakthrough” measure. As a result, an important distinction was made between the attention-getting power of the creative execution and how well “branded” the ad was. Thus, the separate measures of attention and branding were born. (Young, p.12)

Obstacles

In the 1970s, 1980s, and 1990s, tests were conducted to validate a link between the recall score and actual sales. For example, Procter and Gamble reviewed 10 years’ worth of split-cable tests (100 total) and found no significant relationship between recall scores and sales. (Young, pp. 3-30) In addition, Leonard Lodish, a marketing professor at the Wharton School, conducted an even more extensive review of test market results and also failed to find a relationship between recall and sales. (Lodish pp. 125-139) Harold Ross of Mapes & Ross found that persuasion was a better predictor of sales than recall. (Ross pp.13-16)
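As a rough illustration of what "no significant relationship" means in such validation work, the sketch below computes a Pearson correlation coefficient between a set of recall scores and the corresponding sales results. It is only a minimal sketch: the function name `pearson_r` and the numbers are invented for illustration and are not taken from the Procter and Gamble or Lodish studies, whose actual methods are not described here.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Hypothetical, made-up numbers: NOT data from any cited study.
recall_scores = [20, 25, 30, 35, 40, 45]         # day-after recall, percent
sales_lift_pct = [2.0, -1.0, 3.0, 0.5, 1.0, 1.5]  # split-cable sales lift, percent

r = pearson_r(recall_scores, sales_lift_pct)
print(f"correlation between recall and sales lift: {r:.2f}")
```

A coefficient near zero across many tests would be consistent with the finding that recall does not predict sales; a real validation study would also test statistical significance, which this sketch omits.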

Diagnostic Measures

The second theme is the development of diagnostic copy testing, the main purpose of which is optimization. Understanding why diagnostic measures such as attention, brand linkage, and motivation are high or low can help advertisers identify creative opportunities to improve executions. (Young, p.7)

Obstacles

Research companies have developed different approaches to determine the report card measures of attention, brand linkage, and motivation. For example, Unilever analyzed a database of commercials “triple-tested” using the three leading approaches to the measure of branding (Ameritest, ASI, and Millward Brown) and found that each of the three measures something uncorrelated with, and therefore different from, the other two. (Kastenholz, Kerr & Young)

Non-Verbal Measures

The third theme is the development of non-verbal measures in response to the belief of many advertising professionals that much of a commercial’s effects – e.g. the emotional impact – may be difficult for respondents to put into words or scale on verbal rating statements. In fact, many believe the commercial’s effects may be operating below the level of consciousness. (Young, p.7) According to researcher Chuck Young, “There is something in the lovely sounds of our favorite music that we cannot verbalize – and it moves us in ways we cannot express.” (Young, p.22)

Obstacles

In the 1970s, researchers such as Herbert Krugman sought to capture these non-verbal responses biologically by tracking brain wave activity as respondents watched commercials. (Krugman) Others experimented with galvanic skin response, voice pitch analysis, and eye-tracking. (Young, p.22) These efforts were not widely adopted, in part because of the limitations of the technology and in part because of the poor cost-effectiveness of what was widely perceived as academic rather than actionable research.

Solutions

In the 1990s, the Picture Sorts were created as a method of deconstructing a viewer’s dynamic response to the film on multiple levels. A Flow of Attention graph, as one example of a Picture Sort, measures how the eye pre-consciously filters the visual information in an ad and serves both as a gatekeeper for human consciousness and as an interactive search engine. More mainstream than the biological measures, Picture Sorts have been used extensively for on-line ad testing and, because they are not language-dependent, have been used around the world by major advertisers as diverse as IBM and Unilever. (Young, p.24) An example of an Ameritest Flow of Attention graph is listed under References.

Moment-by-Moment Measures

The fourth theme, which is a variation on the previous two, is the development of moment-by-moment measures to describe the internal dynamic structure of the viewer’s experience of the commercial, as a diagnostic counterpoint to the various gestalt measures of commercial performance or predicted impact. (Young, p.7)

In the early 1980s, the shift in analytical perspective from thinking of a commercial as the fundamental unit of measurement to be rated in its entirety, to thinking of it as a structured flow of experience, gave rise to experimentation with moment-by-moment systems. The most popular of these was the dial-a-meter response, which required respondents to turn a meter, in degrees, toward one end of a scale or another to reflect their opinion of what was on screen at that moment.

Obstacles

First, unless the dial-a-meter is calibrated by normalizing the data to each individual’s reaction time, the aggregate sample data will be spread across many measurement intervals. Second, because respondents differ in response time, there is an uncertainty range around which moment is actually being measured. Third, relatively little has been published to validate dial-a-meter diagnostics against traditional measures of overall ad performance such as recall and persuasion.
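The calibration problem can be sketched in a few lines. The example below assumes each respondent's reaction lag is already known in whole seconds, which is a simplification: the source does not say how lags are estimated. The function name `align_and_average` and the dial readings are hypothetical and chosen only to show how uncorrected lags smear a single on-screen moment across several measurement intervals.

```python
def align_and_average(traces, lags):
    """Shift each respondent's dial trace back by that respondent's reaction
    lag, then average the readings second by second.

    traces: dict of respondent id -> list of dial readings, one per second
    lags:   dict of respondent id -> reaction lag in whole seconds
            (assumed known here; estimating it is outside this sketch)
    Returns the mean reading for each second of ad content that at least
    one respondent covers after shifting.
    """
    sums, counts = [], []
    for respondent, readings in traces.items():
        lag = lags[respondent]
        if lag < 0:
            raise ValueError("reaction lag cannot be negative")
        # A reading taken at clock second t reflects content shown at t - lag.
        for content_second, value in enumerate(readings[lag:]):
            if content_second == len(sums):
                sums.append(0.0)
                counts.append(0)
            sums[content_second] += value
            counts[content_second] += 1
    return [total / count for total, count in zip(sums, counts)]

# Hypothetical readings: both respondents react to the same on-screen moment
# (second 1 of the ad), but "a" responds 1 second late and "b" 2 seconds late.
traces = {"a": [50, 50, 80, 50, 50], "b": [50, 50, 50, 80, 50]}

uncorrected = align_and_average(traces, {"a": 0, "b": 0})
corrected = align_and_average(traces, {"a": 1, "b": 2})
print(uncorrected)  # peak is split across two seconds
print(corrected)    # peak lands on a single second
```

Without the correction the averaged peak is lower and spread over two intervals; with it, the peak is sharp and sits on the moment that caused it.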

Solutions

In the 1990s, the Ameritest Picture Sorts shifted the frame of measurement from clock time (the dial-a-meter approach) to the “subjective time” of experience which is tied to the rate of information flow in the film, or the ad’s visual complexity. Instead of providing a rating whenever the alarm rings, respondents rate a Picture Sort image only when the mood, message, or image changes significantly. The data results are clear, easy to understand, and visually appealing. (Young, p. 23) Examples of an Ameritest Flow of Emotion Graph can be seen in The Advertising Research Handbook, (Young, p. 202) and here [2] in Exhibit 2.

In addition, the dial-a-meter’s single-scale limitations are overcome with a set of moment-by-moment measures in three dimensions: Flow of Attention, which measures the memorability of each moment; Flow of Emotion, which measures the positive or negative emotional response to each moment; and Flow of Meaning, which measures how well the brand’s strategic values are being communicated in each moment.

The Future: Seven Trends

Chuck Young, author of The Advertising Research Handbook, offers his views on the trends that will shape the way we do business in the future. (Young pp.27-30)

There will be an emergence of global research standards for global brands. Increasingly, multi-nationals are focusing on the need to build global brands, and for their brands to speak with one voice around the world. This calls for global advertising campaigns that will be increasingly visual in style. Providing both a standard way to measure advertising performance from one region to another, and the tools to identify how different cultural factors affect advertising response, will become more important for managing ad spending in the global marketplace.
There will be more advertising measurement, not less. Advertising is becoming more expensive, and the range of executional options is becoming so diverse that major clients are demanding more control over the process. Procurement departments, in particular, under the banner of accountability, are challenging advertising agencies and research companies to provide more proof of value to justify ad budgets. This will drive growth in this important sector of advertising research.
Most copy testing will move to the Internet. In an age of rapid-response marketing, the emphasis is on speed of decision-making. The Internet is the obvious choice for shortening the time involved in the research step of the creative development cycle. Many suppliers have already begun migrating their advertising research to the web (for both television and print testing). Economic pressure will probably force the majority of testing online in the near future.
The new value proposition will be filtering plus optimization. For the foreseeable future, the cost of advertising executions will continue to go up. To manage that cost, managers will be increasingly interested in airing only their strongest ideas so that they don’t spend a large portion of their advertising budgets on average ideas. Ad managers will be looking for every opportunity to make executions work harder, and research systems will outperform in this growing category if they can validate the power of their diagnostics, providing proof that they actually help make ads more effective.
Ad research will move beyond semantics – putting a new emphasis on “wholistic” or 360-degree measurement of integrated advertising campaigns. Both the forces of globalization and the evolution of rich, multi-sensory media environments will continue to challenge researchers to think beyond the boundaries of language and semantics in understanding how advertising builds brand image.
New heuristic models will be built to help managers make ad decisions in a world increasingly confused by media fragmentation. As the world of media becomes increasingly fragmented and media choices proliferate, the need for research to simplify the decision-making process for advertising managers increases. This calls for new heuristics that describe how different media work, e.g., television versus print. These heuristics are necessary to provide a common measurement framework so that advertising managers trying to allocate budgets across television or print or the Internet can compare the relative strengths of the television execution to the print execution to the Internet ad.
Mathematical models of advertising ROI will begin to incorporate measures of creative quality. Currently, researchers working with marketing-mix models to determine advertising ROI do not explicitly include measures of creative quality. That is, the performance strength of advertising executions is not calculated in their models. As a result, current mix models are heavily biased toward media weight or spend. In the future, sophisticated modelers will start to include a “quality” variable in these models, particularly as new forms of tracking research begin to provide relative performance rankings of competitive ads.
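To make the idea of a “quality” variable concrete, here is a deliberately simple sketch of a sales-response function in which a creative quality index scales the contribution of media weight. The functional form (a logarithmic response to media weight), the function name `predicted_sales`, and all parameter values are assumptions made for illustration only; they are not taken from Young or from any published marketing-mix model.

```python
from math import log

def predicted_sales(base, media_weight, beta, quality=1.0):
    """Toy response model: base sales plus a diminishing-returns media
    effect scaled by a creative quality index.

    base:         sales expected with no advertising
    media_weight: media weight or spend (e.g., rating points), must be >= 0
    beta:         responsiveness of sales to media weight
    quality:      creative quality index, 1.0 = average execution
                  (hypothetical variable; conventional models effectively
                  fix it at 1.0 for every ad)
    """
    if media_weight < 0 or quality < 0:
        raise ValueError("media_weight and quality must be non-negative")
    return base + beta * quality * log(1.0 + media_weight)

# Two executions with identical media weight but different creative strength.
average_ad = predicted_sales(base=100.0, media_weight=500, beta=5.0, quality=1.0)
strong_ad = predicted_sales(base=100.0, media_weight=500, beta=5.0, quality=1.2)
print(f"average execution: {average_ad:.1f}, strong execution: {strong_ad:.1f}")
```

A model that omits the quality term would attribute the entire difference between the two executions to media weight, which is the bias toward spend described above.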

Copy Testing Companies

Ameritest http://www.ameritest.net

Anderson Analytics http://www.andersonanalytics.com

ARS http://www.ars-group.com

Ipsos-ASI http://www.ipsos.com

Millward Brown http://www.millwardbrown.com

References

http://www.ameritest.net/products/tv.php Ameritest TV Ad Model

http://www.ameritest.net/choose/

Example of Ameritest Flow of Attention Graph http://www.ameritest.net/products/tv.php

Honomichl, J. J., Honomichl on Marketing Research, Lincolnwood, IL: NTC Business Books, 1986.

Kastenholz, J., Kerr, G., & Young, C., Focus and Fit: Advertising and Branding Join Forces to Create a Star. Marketing Research, Spring 2004, 16-21.

Krugman, H., Memory Without Recall, Exposure Without Perception. Journal of Advertising Research, July/August, 1977.

Lodish, L. M., Abraham, M., Kalmenson, S., Livelsberger, J., Lubetkin, B., Richardson, B., & Stevens, M. E., How TV Advertising Works: A Meta-Analysis of 389 Real World Split Cable TV Advertising Experiments. Journal of Marketing Research, May 1995, 125-139.

Ross, H., Recall vs. Persuasion: An Answer. Journal of Advertising Research, 1982, 22(1), 13-16.

Young, Charles E., The Advertising Research Handbook, Ideas in Flight, Seattle, WA, April 2005.