Friday, June 06, 2008

Not Rocket Science

Neil deGrasse Tyson has an op-ed in the NYT arguing that Hillary Clinton is clearly the stronger candidate against John McCain. His argument is based on the work of two astrophysicists:

This conclusion comes not from wishful thinking but from a new method of analysis on the statistics of polls that has been accepted for publication in the journal Mathematical and Computer Modeling. The authors, J. Richard Gott III, a professor at Princeton, and Wes Colley, a researcher at the University of Alabama in Huntsville, are not political scientists. They are astrophysicists. And one of the tasks of scientists is to clarify the apparent complexity of the universe by using the language of mathematics.

Here’s what they discovered: in swing states, the median result of all the polls conducted in the weeks prior to an election is an especially effective predictor of which candidate will win that election — even in states where the polls consistently fall within the margin of error.

This method provides a far more accurate assessment of public opinion than most people’s politically informed commentary. In the 2004 presidential election between John Kerry and George W. Bush, many political analysts said the race was too close to call. But when Professor Gott and Dr. Colley applied the median method in 2004, they correctly predicted the winner in 49 states, missing only Hawaii.

He then goes on to apply this method to general election matchups over the last month:

When you complete this exercise for each state, Mr. Obama picks up Colorado, Iowa and New Mexico, three states that went Republican in 2004, but he also loses Michigan and New Hampshire, two states that Mr. Kerry had won. Mrs. Clinton loses the previously Democratic states of New Hampshire and Wisconsin, but she would nab 57 electoral votes from the Republicans by winning Florida, New Mexico, Nevada and Ohio.

If the general election were held today, Mr. Obama would win 252 electoral votes as the Democratic nominee, while Mrs. Clinton would win 295. In other words, Barack Obama is losing to John McCain, and Hillary Clinton is beating him.
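The electoral arithmetic in the quoted passage can be checked with a short sketch. The electoral vote counts below are the 2004 and 2008 allocations as I understand them. The 252-vote baseline (the total carried by the states Kerry won) is my assumption about the op-ed's starting point, and the function name is purely illustrative.

```python
# Electoral votes per state under the 2004/2008 apportionment
# (assumed here; only the states named in the op-ed are listed).
EV = {"CO": 9, "FL": 27, "IA": 7, "MI": 17, "NV": 5,
      "NH": 4, "NM": 5, "OH": 20, "WI": 10}

# Assumed baseline: electoral votes of the states Kerry carried in 2004.
KERRY_BASELINE = 252


def projected_total(baseline, gained, lost):
    """Baseline plus states flipped toward the Democrat, minus states lost."""
    return baseline + sum(EV[s] for s in gained) - sum(EV[s] for s in lost)


obama = projected_total(KERRY_BASELINE, ["CO", "IA", "NM"], ["MI", "NH"])
clinton = projected_total(KERRY_BASELINE, ["FL", "NM", "NV", "OH"], ["NH", "WI"])
clinton_pickups = sum(EV[s] for s in ["FL", "NM", "NV", "OH"])

print(obama, clinton, clinton_pickups)
```

Under those assumptions the totals match the op-ed's figures, so the disagreement is about the state calls, not the addition.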

Count me skeptical. First, I don't think polls this far out are as meaningful as polls done shortly before the election. Campaigns generate information that voters use to decide their vote, and thus voters later in the campaign are more certain of their views about the parties and candidates than they were earlier in the campaign. For example, Gott and Colley used polls from October, after voters knew Kerry's VP choice and after both parties had held their conventions. Furthermore, many of these polls came after voters had a chance to size up the candidates in the debates. Voters in current polls have none of this information. In fact, until this week, it was not even 100 percent certain who would be the Democratic nominee. If Tyson thinks the Gott and Colley model works this far out, he should test it using the polls from May 2004. My guess is that the model would have had Kerry winning pretty handily.

Second, polling increases in frequency through the campaign, and by the final month polls in the swing states come fast and furious. For example, Gott and Colley used 68 polls to forecast Florida and 56 in Ohio. In contrast, Tyson uses 3 polls to determine the results of an Obama-McCain matchup in those states and only 2 polls to forecast a Clinton-McCain contest. Such a small sample will undoubtedly result in more error. In fact, the one state Gott and Colley missed, Hawaii, was a state that had only two polls.
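To make the sample-size point concrete, here is a minimal sketch of a median-of-polls call. It is my own illustration of the general idea, not Gott and Colley's actual procedure. The margins are made-up numbers (Democrat minus Republican, in points), not real polls, and the function names are hypothetical.

```python
from statistics import median


def median_margin(margins):
    """Median of poll margins (Democrat minus Republican, in points)."""
    if not margins:
        raise ValueError("need at least one poll")
    return median(margins)


def call_state(margins):
    """Call the state for whichever side the median margin favors."""
    m = median_margin(margins)
    if m > 0:
        return "D"
    if m < 0:
        return "R"
    return "tie"


# Made-up margins for illustration only; these are not real polls.
many_polls = [2, -1, 3, 1, -2, 4, 1, 0, 2, -1, 3]
few_polls = [-1, 3]

# One additional poll showing the Republican up 4 points.
outlier = [-4]

print(call_state(many_polls), call_state(many_polls + outlier))
print(call_state(few_polls), call_state(few_polls + outlier))
```

With eleven polls, one extra poll does not change the call in this example. With two polls, the same extra poll flips it, which is the worry when a state is forecast from only 2 or 3 polls.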

Third, Gott and Colley didn't really predict 49 of 50 states, since the outcome in most states was not in doubt. By the end of the campaign, there were perhaps half a dozen states still in play, a much smaller set to predict.

Finally, the 2004 election was remarkably stable, with little movement to either candidate in the last month. I'm not so sure how taking the median of polls would work in an environment where one candidate or another is surging in the last month or weeks. If Gott and Colley are really on to something, their model should also work in other elections, but from what I can tell, they haven't tried to test their model beyond 2004.

Update: Andrew Gelman offers more criticism here.

1 comment:

Josh Putnam said...

I still can't figure out why Tyson settled on this six-week time frame. Sure, if that conforms to the model being replicated, that's fine, but why confine it to just those six weeks? There's more data to be had that, while still limited, wouldn't push the sample size down to 2 or 3 polls.

As I said over at Andrew Gelman's site, I've been keeping tabs on the electoral college since March using a weighted average of the state polls since Super Tuesday. If the time frame is pushed back to that point, the ten states mentioned in the op-ed have a more robust sample of polls to look at.

CO 6 polls
FL 11 polls
IA 9 polls
MI 6 polls
NV 6 polls
NH 7 polls
NM 7 polls
OH 14 polls
PA 21 polls
WI 10 polls

Now, this isn't 68 or 56, but it gives us a bit more to go on than 2 or 3 polls. Is it any more accurate? Maybe slightly, but it is too early anyway.