Just within the environmental conflict realm, a recent report by the Army Environmental Policy Institute lists no fewer than twelve ongoing projects that touch on some aspect of forecasting. These include USAID's Famine Early Warning System, which tracks and predicts food insecurity around the world, as well as the Climate Change and African Political Stability project, housed at the Robert S. Strauss Center at the University of Texas at Austin. Outside of the environmental arena, there are more.
Forecasting models need reliable measures of "things that are usefully predictive," Ulfelder notes. Well, sure. Does this mean that reliability is at issue? Or that we are using data that are not "usefully" predictive? This is a curious claim, especially in light of the controversial nature of polls. Indeed, there exists five decades' worth of literature that grapples with exactly those issues in public opinion. Take the recent U.S. election as an example. In 2012 there were two types of models: one type based on fundamentals such as economic growth and unemployment and another based on public opinion surveys. Proponents of the former contend that the fundamentals present a more salient picture of the election's underlying dynamics and that polls are largely epiphenomenal. Proponents of the latter argue that public opinion polling reflects the real-time beliefs and future actions of voters.
As it turned out, in this month's election, public opinion polls were considerably more precise than the fundamentals. The fundamentals did not always produce bad predictions, but better is better. Plus there is no getting around the fact that the poll averaging models performed better. Admittedly, many of the polls were updated on the night before the election, though Drew Linzer's prescient votamatic.org posted predictions last June that held up this November. To assess the strength of poll aggregation, we might ask how the trajectory of Silver's predictions over time compares with the results, and there are other quibbles to raise for sure. But better is better.
When it comes to the world, we have a lot of data on things that are important and usefully predictive, such as event data on conflicts and collaborations among different political groups within countries. Are they as reliable as poll data? Yes, just as reliable, but not more so. Would we like to have more precise data and be able to have real-time fMRIs of all political actors? Sure, but it is increasingly difficult to convincingly argue that we don't have enough data.
Let's consider a case in which Ulfelder argues there is insufficient data to render a prediction -- North Korea. There is no official data on North Korean GDP, so what can we do? It turns out that the same data science approaches that were used to aggregate polls have other uses as well. One is the imputation of missing data. Yes, even when it is all missing. The basic idea is to use the general correlations among data that we do have to provide an aggregate way of estimating information that we don't have. We know enough about how other things are related to GDP that we can figure out reasonable estimates of what range it falls into in places where it is not observed. The CIA estimates North Korean GDP per capita at $1800 for 2011, based on extrapolations, growth rate estimates, and inflation. Our Duke imputations are based on lots of other data, but no data whatsoever for North Korean GDP, and we have it at about $1700, perhaps close enough for government work.
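For readers who want to see the mechanics, here is a minimal sketch of that imputation idea in Python. It is not the Duke model: it assumes a plain least-squares regression of log GDP per capita on three unnamed covariates, and every number in it is synthetic, generated from a fixed random seed. The function name `impute_missing` and all variable names are hypothetical, chosen only for illustration.

```python
import numpy as np


def impute_missing(X_obs, y_obs, X_new):
    """Impute an unobserved outcome from covariates via least squares.

    Returns a point estimate and a rough range (point +/- 2 residual
    standard deviations). This is a simplification: it ignores the
    uncertainty in the fitted coefficients themselves.
    """
    # Design matrix with an intercept column
    A = np.column_stack([np.ones(len(X_obs)), X_obs])
    coef, *_ = np.linalg.lstsq(A, y_obs, rcond=None)

    # Residual standard deviation from the countries we do observe
    resid = y_obs - A @ coef
    dof = len(y_obs) - A.shape[1]
    sigma = np.sqrt(resid @ resid / dof)

    A_new = np.column_stack([np.ones(len(X_new)), X_new])
    point = A_new @ coef
    return point, point - 2 * sigma, point + 2 * sigma


# Synthetic "countries": three covariates and a log GDP per capita
# that depends on them. None of these numbers are real data.
rng = np.random.default_rng(0)
n = 150
X = rng.normal(size=(n, 3))
log_gdp = 8.0 + X @ np.array([0.8, 0.5, -0.3]) + rng.normal(scale=0.3, size=n)

# Pretend the last country reports no GDP at all
X_obs, y_obs = X[:-1], log_gdp[:-1]
X_missing = X[-1:]

point, low, high = impute_missing(X_obs, y_obs, X_missing)
print(f"imputed GDP per capita: ${np.exp(point[0]):,.0f}")
print(f"rough range: ${np.exp(low[0]):,.0f} to ${np.exp(high[0]):,.0f}")
```

The model never sees the missing country's GDP. It learns how the covariates relate to GDP everywhere else and applies that relationship to the one place where the outcome is unobserved. A serious version would use many more variables and propagate uncertainty properly, for example through multiple imputation.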
Our point is that collecting data in new and exciting ways has changed the nature of political forecasting. While the Twitterverse might have been agog over the daily release of the Gallup tracking poll, the real story of the election was playing out elsewhere. The firm Latino Decisions did not start frequent polling until 2010 (it was founded in 2007) but conducted more than 60,000 questionnaires during this year's election and found that the Latino vote was going to be overwhelmingly in favor of the Democrats. One need only glance at this year's exit polling to know that the Latino vote was crucial to President Obama's reelection. So by focusing on Gallup -- whose results were wildly off the mark anyway -- many mainstream pundits ignored a wealth of other, perhaps more important information.