Why Do Campaign Polls Zigzag So Much?




Because Invalid Sampling Leads To Zigzags In Party Affiliations.







Gerald S. Wasserman

Dept. of Psychological Sciences

Purdue University






Abstract: An analysis of telephone poll data collected by Gallup since the end of the primaries demonstrates that the size of Clinton's lead over Dole varies with the relative number of Democrats and Republicans who are contacted in any given poll. When the number of people who say they are Democrats is much larger than the number who say they are Republicans, Clinton's lead soars to as high as 23%. But when those who say they are Republicans slightly outweigh those who say they are Democrats, Clinton's lead falls to as little as 5%. Most of the fluctuation in Clinton's lead is accounted for by such poll-to-poll variations in party affiliation. The cause of these fluctuations is discussed; a failure to apply survey research technology in a sound manner is implicated. This demonstration undermines these and similar polls and indicates that, despite the extraordinary investment which has been made in polling during the current presidential campaign, the true state of current opinion is actually unknown.

Every week or so, a new poll is released which suggests that some genuine change has occurred in the presidential horse race. Examples of such supposed changes are plotted in the figure given just below. These data come from the Gallup[1] poll and show that Clinton's lead over Dole has zigzagged over an 18-point range. The biggest fluctuation occurred in August, during the conventions, in what is commonly called the "convention bounce."



These week-to-week changes are much larger than the random[2] sampling errors associated with such polls. Why then is there so much fluctuation? Conventional wisdom usually attributes such changes to genuine campaign developments, and much newspaper space has been taken up by the interpretation of poll changes in this way. As a result, readers interested in politics are regularly treated to stories explaining the advance or retreat of one or the other candidate because of one or another news event. The conventions in particular are supposed to have great power to modify opinion, albeit only temporarily.


I suggest that, over the last half year, opinion has actually not changed to any great degree. An alternative explanation is more likely, namely that the apparent poll fluctuations are mostly artifacts which have been produced by poor recent telephone polling techniques. True, survey research technology is very highly developed, but campaign polling enterprises have not recently devoted enough time to data collection. Indeed, most of the Gallup data plotted above represent polls which were conducted within a two or three day period; one poll only covered a single day. This haste has produced considerable systematic[3] sampling error.


The abbreviated sampling periods of most recent polls make them vulnerable to irrelevant variations caused by over- or under-sampling of population subgroups. For example, suppose a breakneck poll sampled too many Democrats.[4] Systematic error would then follow because opinion is currently highly stratified: Almost all of the people who tell pollsters they are Democrats also say[5] they will vote for Clinton, while almost all of the people who tell pollsters they are Republicans also say they will vote for Dole. A poll which sampled too many Democrats would therefore have to be biased in Clinton's favor. And vice versa.


Because such reasoning is analytically exact, it leads to a certain conclusion: Spurious poll-to-poll changes in the relative number of Democrats and Republicans sampled must lead to erratic effects on apparent preferences for Clinton versus Dole. On this view, polling zigzags, which the news media interpret substantively, might then simply correspond to irrelevant changes in the party affiliations of the people who happened to be contacted in successive poll samples rather than to anyone's change of opinion.
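
The arithmetic behind this reasoning can be sketched in a few lines of Python. The within-party preference rates used here are the rounded June 27-30 Gallup figures quoted in note [9]; the two alternative party mixes are hypothetical values chosen only for illustration, and the function name is likewise an assumption of this sketch rather than anything Gallup publishes.

```python
# Within-party preference rates from the June 27-30 Gallup poll (see note [9]).
# Order of each tuple: (Republicans, Democrats, Independents).
CLINTON = (0.16, 0.89, 0.50)
DOLE = (0.82, 0.08, 0.35)


def clinton_lead(mix):
    """Clinton's lead over Dole, in percentage points, for a given party mix.

    mix is (R, D, I): the proportions of Republicans, Democrats, and
    Independents in the sample; they should sum to 1.
    """
    clinton = sum(rate * share for rate, share in zip(CLINTON, mix))
    dole = sum(rate * share for rate, share in zip(DOLE, mix))
    return 100 * (clinton - dole)


# The first mix is the one computed in note [9]; the other two are
# HYPOTHETICAL samples that move 4 points between the two parties
# while opinion inside each party is held fixed.
mixes = {
    "computed mix (note 9)": (0.29, 0.35, 0.36),
    "4 points more Democrats": (0.25, 0.39, 0.36),
    "4 points more Republicans": (0.33, 0.31, 0.36),
}
for label, mix in mixes.items():
    print(f"{label}: lead = {clinton_lead(mix):.1f} points")
```

In this sketch nobody changes his or her mind, yet moving 4 points of the sample from one party to the other shifts the apparent lead by about 6 points in either direction.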


The possibility that such an artifact did magnify poll zigzags can be readily checked, provided the necessary data are in hand. The Gallup Poll[6] provides many of them and when their data are considered from this perspective, it quickly becomes apparent that zigzags in the party affiliations of the people sampled are indeed closely linked to the zigzags in Clinton's lead over Dole.

The tightness of this linkage is directly demonstrated in two figures given below. The figure given immediately below covers the entire primary to convention period and it includes all[7] of Gallup's Clinton/Dole polls run between February 23 and September 1. Thus the figure includes fairly recent[8] data which were posted on Gallup's internet home page. Clinton's lead is clearly related to the computed[9] difference between the percentages of Democrats and Republicans who were polled.


Without even considering the effect of any changes of opinion among Independent voters, this simple analysis accounts for most[10] of the fluctuations in the poll lead. Changes in the partisan composition of successive poll samples are therefore definitely linked to most of the apparent changes in Clinton's lead over Dole.


However, Gallup's website home page rounded off presidential preferences to the nearest percentage point before reporting them. The computations which produced the figure given above are sensitive to round-off errors. Therefore, each calculated data point presented in the plot given above must be treated with caution because it may be displaced from its true value. Clearly, the true relationship must be quite robust if it survives such displacements. Hence, the observed relationship would be expected to be even stronger if the actual values were available.


Indeed, the actual and completely exact party affiliation data are published in the Gallup Poll Monthly, but that journal's publication lag limits the period covered. An analysis of these more exact[11] but less complete[12] data is accordingly shown in the next figure. These actual data exhibit, as expected, a much tighter relationship than did the computed data given above.[13]



Whichever way it is viewed, whether complete and less exact data are considered or incomplete and more exact data are used, the implication of this analysis is clear: Political campaign polls would exhibit much smaller zigzags if such sampling errors were corrected. The way to correct them is to take the time to execute any given poll properly by calling back until almost all respondents have been contacted. A single proper poll would tell us more than all of the scores of invalid polls that come out one after another.


I would suggest that the fundamental cause of the deluge of invalid polling is most likely to be the appetite of the news media for news. Change is news; stasis is not. The news media, of course, pay for most publicly reported campaign polls, and the customer is always right. This is probably why we have had a surfeit of invalid polls when a single properly executed poll would have been more informative.



Version 4.1 - 9 October 1996
© 1996 Gerald S. Wasserman

A later related article, entitled: "Were The Polls Right? No. Only Once In 4,900 Elections Would Chance Alone Produce Such Failures." can be found at: http://www.psych.purdue.edu/~codelab/PollOdds.html


Notes and References.



[1] This plot comes from data collected by Gallup between a February 23-25 poll, taken when it had become clear that Dole would be the Republican nominee, and an August 30-September 1 poll, taken after both party conventions had been held.



[2] A mystique exists about random "polling errors." If 1,000 people are sampled and opinion turns out to be evenly split, then the standard error (SE) of this 50% result would only be 1.6%. However, such small values are not reported by pollsters; what they prefer to call the "polling error" is really what a statistician would call the "confidence interval," namely plus and minus a number of SEs. Nowadays, the claimed "polling error" is frequently a +/- 2 SE or even a +/- 3 SE confidence interval, and often it is rounded up instead of being rounded off. Gallup, for example, rounds up a 3.2% "polling error" to 4% instead of rounding it off to 3%. Pollsters are perfectly entitled to use such a nomenclature, but whatever their preference, random error should cause the true value to be outside the confidence interval only about 1 time out of 20 polls when a +/- 2 SE interval is used and fewer than 1 time out of 100 polls when a +/- 3 SE interval is used. Unfortunately, because failures to use survey research technology properly have become quite prevalent, such supposedly unlikely upsets have become not uncommon, as Shimon Peres recently discovered.
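
The figures quoted in this note can be reproduced with a minimal Python sketch. It assumes simple random sampling and the usual binomial formula for the standard error of a proportion; the function name is an assumption made for illustration.

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion p observed among n respondents."""
    return math.sqrt(p * (1 - p) / n)


# An evenly split result (p = 0.5) in a sample of 1,000 people.
se = standard_error(0.5, 1000)
print(f"SE      = {100 * se:.1f}%")      # about 1.6%
print(f"+/-2 SE = {100 * 2 * se:.1f}%")  # about 3.2%
print(f"+/-3 SE = {100 * 3 * se:.1f}%")  # about 4.7%
```

The +/- 2 SE value of about 3.2% matches the "polling error" that Gallup rounds up to 4%.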

[3] Serious systematic sampling errors may affect breakneck telephone polls because some, perhaps even most, calls will not be answered at first. If unanswered numbers are not redialed over and over until someone does answer, then a poll will be biased in favor of more sedentary people and against more active types.

Note that a person who is sedentary at one time can be active at another time. During the Republican convention, for example, it would not be unreasonable to expect that a relatively greater number of Republicans were at home watching television, and vice versa. This change in behavior doubtless caused most of the fabled convention bounce.

Redialing of unanswered numbers is therefore essential; standard survey research texts indicate that it must be done many different times in order to reduce this type of error to tolerable levels. (See Bernard Hennessy, Public Opinion, 1981, fourth edition, pp. 71-72, Brooks/Cole: Monterey CA.) But, as can be seen by inspection of the dates of the Gallup Poll (August 30-September 1, August 28-29, August 23-25, August 16-18, August 14-15, August 11, August 5-7, July 25-28, July 18-21, June 27-30, June 18-19, May 28-29, May 9-12, April 9-10, March 15-17, March 8-10, Feb 23-25), their data were all collected within a 1 to 4 day period (with most within 2 to 3 days), which is not enough time for an adequate number of callbacks.

[4] This idea was expressed by The John Zogby Group in June. See Associated Press Political Briefs, June 25, 1996 - 8:17 pm EDT.

[5] I distinguish between what people say to pollsters and what they actually do because Lewontin has forcefully drawn attention to the necessity of making this distinction in a thoroughgoing way. Men, for example, say they have much more sex with women than women say they have with men. Clearly, these two quantities must be identical. See Richard C. Lewontin, "Sex, Lies, and Social Science," The New York Review of Books, 20 April 1995.

[6] I examined Gallup Poll data for two reasons. First, because they are widely reported in many news media as, for example, the XYZ/Gallup Poll. Second, Gallup's underlying data are more accessible than those of most other widely reported polls.

Some of these data are eventually reported in The Gallup Poll Monthly (The Gallup Poll: Princeton NJ) in some detail. However, the most recent issue of this journal (i.e., that issue which was available in several university research libraries at the time of this writing) was dated May, 1996.

More up-to-date but less complete (as well as less exact) data were therefore retrieved from Gallup's internet website, which is located at: http://www.gallup.com/

But in early September, Gallup modified its website and no longer posts data there which are complete enough to be usable for analyses of the sort presented here. Unless Gallup responds positively to my written request for more complete data, the analysis presented here will be about as far as it is possible to go with this matter before the election.

[7] Data from the April 9-10 and the August 11 polls had to be omitted because Gallup's website reported them too incompletely to permit computations to be made.

[8] It should be noted that after Kemp was chosen by Dole for Vice President, the polls actually presented respondents with a choice between Clinton-Gore versus Dole-Kemp.

[9] Computation was necessary because the Gallup website did not report any party affiliation counts. But it did report overall presidential preferences along with the percentages of Republicans, Democrats, and Independents who favored Clinton, Dole, or who were Undecided. (These website percentages were rounded off to the nearest percent.) Party affiliations can be computed from such percentages because, for any given poll, these numbers form the coefficients of three simultaneous equations and the unknowns of these equations provide estimates of the national proportions of Republicans, Democrats, and Independents who were contacted by Gallup during that particular poll.

The algebra is as follows: The proportion of the national population who say they will vote for Clinton is determined by adding up the Republicans who say they will vote for him plus the Democrats who say they will vote for him plus the Independents who say they will vote for him. The next step in the analysis notes that the Republicans who say they will vote for Clinton are given by the proportion of Republicans in the national population multiplied by the proportion of the Republicans who say they will vote for Clinton. And the analysis continues in the same way through all of the remaining possibilities.

Applying this general logic to the particular Gallup data of June 27-30 leads to the following equations, where R stands for the proportion of Republicans in the national population, D stands for the proportion of Democrats, and I for the Independents:


For Clinton 0.54 = 0.16*R + 0.89*D + 0.50*I

For Dole 0.39 = 0.82*R + 0.08*D + 0.35*I

Undecided 0.07 = 0.02*R + 0.03*D + 0.15*I


Solving these three simultaneous equations produces:


R = 0.29, D = 0.35, and I = 0.36
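
For readers who wish to check the solution, the following is a minimal Python sketch that solves the three equations given above by Cramer's rule. The helper names are assumptions made for illustration, and any standard linear-equation solver would serve equally well; this is not necessarily the procedure I used originally.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(coeffs, rhs):
    """Solve a 3x3 system of linear equations by Cramer's rule."""
    base = det3(coeffs)
    solution = []
    for col in range(3):
        # Replace one column of the coefficient matrix with the right-hand side.
        replaced = [
            row[:col] + [rhs[k]] + row[col + 1:] for k, row in enumerate(coeffs)
        ]
        solution.append(det3(replaced) / base)
    return solution


# Coefficients of R, D, and I from the June 27-30 equations above.
coeffs = [
    [0.16, 0.89, 0.50],  # For Clinton
    [0.82, 0.08, 0.35],  # For Dole
    [0.02, 0.03, 0.15],  # Undecided
]
rhs = [0.54, 0.39, 0.07]  # overall preferences reported for that poll

R, D, I = solve3(coeffs, rhs)
print(f"R = {R:.2f}, D = {D:.2f}, I = {I:.2f}")
print(f"D - R = {100 * (D - R):.1f} points")
```

The last line prints the difference between the computed percentages of Democrats and Republicans, which is the quantity against which Clinton's lead is compared in the first of the two figures.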

[10] In statistical terms, the data in the figure have a correlation of 0.76, which means that 58% of the variance in the lead can be accounted for by the partisan difference.

[11] Party affiliations are reported in the monthly journal in the form of exact counts of persons and so percentages can be calculated to any desired degree of precision.

[12] To date, these more exact data have only been reported for polls conducted from February 23-25 to May 28-29.

[13] These actual data have a correlation of 0.93 and account for 86% of the variance in the lead, much higher than the corresponding values for the computed data. This is partly because rounding errors increased the scatter of the computed data above that which originally existed, and the increased scatter lowered the correlation of the computed data below that of the actual data. However, the smaller number of data points would also have inflated the correlation of the actual data somewhat, and so a final judgement on the magnitude of the correlation increase will have to be postponed until all of the actual data are in hand.
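
The percentages of variance quoted in notes [10] and [13] are simply the squares of the corresponding correlations, as this short Python sketch shows. The function name is an assumption made for illustration.

```python
def variance_explained(r):
    """Percentage of variance accounted for by a correlation coefficient r."""
    return 100 * r * r


for label, r in [("computed data, note [10]", 0.76), ("actual data, note [13]", 0.93)]:
    print(f"{label}: r = {r:.2f}, variance explained = {variance_explained(r):.0f}%")
```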