Recently, I read a news article from CNN (I don’t frequent CNN or any other specific news source; I browse around and cherry-pick articles) and came across a poll concerning the Obama Administration. Being that I used to manage the data collection part of the polling process, I was interested.
I looked over the article, then began to peruse the user comments. Almost immediately I came across a user who did nothing but complain about the validity of the study. Again, being that I used to manage the data collection part of the polling process, I was interested.
Here is the article, and here is the comment from it:
“I just want to say, that regardless of what this poll says, it is wildly inaccurate. They only surveyed a sample size of 1,009 people, which by NO MEANS represents the millions that live in America. Where are the people from? How old are they? Also, a sampling error of plus or minus 3 percentage points is beyond terrible. Studies aim to be within 1%, and in the scientific world, this study would be dismissed due to its large inaccuracies.”
In response to that comment, I give you the rest of this post, some of which argues with the comment, some of which agrees, and some of which I feel is just clarification. The following wraps up a conversation I had with a coworker about this and does not necessarily flow logically from this point. I apologize for that and may correct it a bit later. Until then, enjoy.
In addition to the fact that they are ranting about something they know nothing about, I am willing to bet they are the kind of person who would yell at us when we called. The reason I bring this up is this: they complain that 1,009 interviews do not constitute a proper sampling of the millions of Americans out there, and I will give my argument below as to why that assumption is invalid. They are technically right, however, but not for the reason they think they are. The total number of completed interviews (down to a certain point… 10 is not really a valid sample) is going to be representative, assuming a proper regional breakdown. However, there are still more factors that improve accuracy, such as making sure to interview 500 Republicans and 500 Democrats, which for SOME of the CNN surveys we actually did. That was a nightmare to manage, but we did it.
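For context, there is a textbook statistical reason a sample of 1,009 can stand in for millions: the sampling margin of error depends on the sample size, not the population size. The sketch below is my own illustration of the standard formula, not a description of CNN's actual methodology:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Worst-case sampling margin of error at 95% confidence
    for a simple random sample of n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# 1,009 respondents gives roughly +/- 3 percentage points,
# which matches the figure CNN reports:
print(f"{margin_of_error(1009):.1%}")   # 3.1%

# Population size barely matters; to halve the margin,
# you have to roughly quadruple the sample:
print(f"{margin_of_error(4036):.1%}")   # 1.5%
```

Note how getting from ±3% down to the commenter's demanded ±1% would require interviewing roughly nine times as many people, which is why ±3 points is a very common target for national polls.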
There are also other factors over which we have only the TINIEST amount of control; namely, getting people to do the survey. As I stated above, I’m willing to bet this person would not wait for us to say ‘hello’ before hanging up on us, which is fine; that’s their choice. However, when a multitude of respondents do that (as is always the case), you start to get a ‘hidden’ alteration in the data. In other words, the answers provided by those 1,009 respondents do NOT represent the whole population of America; rather, they represent quite well the population of Americans who are either willing or easily persuaded to complete telephone interviews. If that person wants to complain about the validity of ANY survey results (not just CNN’s or anyone else’s), they need to first take responsibility for the opinion of the American people by providing their input and adding their voice to the chorus.
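That ‘hidden’ alteration is what statisticians call nonresponse bias. A toy simulation makes the effect concrete; every number in it (the 50/50 split, the response rates) is invented purely for illustration:

```python
import random

random.seed(42)

# Hypothetical population: exactly 50% approve, 50% disapprove,
# but disapprovers hang up twice as often (rates are made up).
RESPONSE_RATE = {"approve": 0.10, "disapprove": 0.05}

completed = []
while len(completed) < 1009:
    opinion = "approve" if random.random() < 0.5 else "disapprove"
    if random.random() < RESPONSE_RATE[opinion]:  # stayed on the line?
        completed.append(opinion)

approve_share = completed.count("approve") / len(completed)
print(f"Measured approval: {approve_share:.0%}")  # roughly 67%, not the true 50%
```

Even with a perfectly drawn sample and a full 1,009 completes, the measured number lands near two-thirds approval instead of the true half, and no margin-of-error figure on the poll will warn you about it. That is the bias the hang-ups introduce.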
MARGIN OF ERROR
NOW – with all that being said, I have yet more to say on the subject. This next part involves the respondents who DO complete the survey. The person who made the comment about the CNN poll was complaining that a 3% margin of error is unacceptable, and that is somewhat true. Remember, we targeted 2% or less when I was in charge of the study. But for your information, in case you are currently unaware, that ‘error’ figure comes not from actual ‘errors’ but from respondents either not answering a question or replying with a ‘no preference’ or ‘I don’t know’ response. When I was running it (though I cannot speak for the current job, I assume the practice is similar), we trained our interviewers to terminate calls prematurely if a large portion of the answers were of these types. One or maybe two in a survey is reasonable (sometimes people genuinely don’t know), but once they hit 3 in a survey, it was to be ended. Some people like to mess with pollsters by giving bad data; even worse, some will knowingly answer the questions with deliberately false responses. To that, I go back to the end of the last paragraph, wherein I call for the people of America to take responsibility for the results of these polls.
Back to the margin of error: we terminated calls that hit 3 DK responses, but what if an interview only had 2? Well, in a survey of 50 questions, that leads to a 4% error rate! It is VERY difficult to achieve a 0–0.5% error rate. Either you have to field a survey of over 100 questions and accept only those interviews with 1 or fewer DK responses, which would be so stupid to manage that I’m not even going to go into the details, or you would have to complete many more surveys than your original sample requires in order to discard the interviews that have DK responses. In either case, doing anything beyond direct polling would risk producing invalid data. Now, would you prefer the pollster had a 3% margin of error, or would you prefer a 0% margin with potentially important opinions being censored?
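The arithmetic behind that termination rule is worth spelling out. Assuming the policy described above (calls ended at the 3rd DK, so a completed interview carries at most 2), the worst-case share of DK answers per interview falls straight out of a one-line calculation; the function here is just my illustration of it:

```python
def worst_case_dk_rate(num_questions, max_dk_allowed):
    """Highest share of 'don't know' answers that a completed
    interview can contain under a hard termination cap."""
    return max_dk_allowed / num_questions

# 50-question survey, terminated at the 3rd DK, so at most 2 DKs
# survive into a completed interview:
print(f"{worst_case_dk_rate(50, 2):.0%}")   # 4%

# Guaranteeing under 1% forces the unmanageable case described above:
# 100+ questions with at most 1 DK allowed.
print(f"{worst_case_dk_rate(100, 1):.0%}")  # 1%
```

This is why the commenter's demand for 1% is so expensive: either the survey balloons to twice the length with a stricter cap, or a large fraction of completed interviews gets thrown away.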
And LASTLY, I do have beef with the CNN survey itself. There is a certain way that CNN chose (and still chooses) to word certain questions, which I feel biases the answers given. Having taken linguistics courses, I know the name for such questions, but even without knowing the name, the error should be obvious. However, when I brought my issues up with management, project management, and other senior leadership, I was basically told ‘oh well’. Here’s the problem: do you see a difference between the following two questions? Before moving on to the next paragraph, really study these next two lines.
(1.a) Do you feel Obama is doing a good job, or don’t you think so?
(1.b) Do you feel Obama is doing a good job, or do you feel he is not doing a good job?
Before I describe which one is invalid (whether you’ve figured it out on your own or not), I will explain the linguistics of the dilemma. In some languages (we’ll focus on English, of course) there are things called ‘tag questions’. These are questions that make a statement and then append a short phrase inviting the addressee to respond a certain way. From one of the true authorities in the field of linguistics, SIL, the definition of a tag question is as follows: “A tag question is a constituent that is added after a statement in order to request confirmation or disconfirmation of the statement from the addressee. Often it expresses the bias of the speaker toward one answer.” I would like to reiterate the last sentence here: “Often it expresses the bias of the speaker toward one answer.” Now look again at the two versions of the question above and compare them to the following two, which have simply been reworded but reflect the same meaning as the originals:
(2.a) Don’t you think that Obama is doing a good job?
(2.b) Do you think Obama is doing a good, or a bad job?
Do you see the bias now? By the way, here’s a very cool website cataloging a good number of polls over more than a decade, and here is a search of that site for the horrible CNN tag-question suffix “or don’t you think so?”
SIL - Tag Questions