Ulfelder and others argue that the world needs a large amount of data from a large "swath" of history to effectively develop good models, but this kind of thinking often fails to reflect the mechanics of a statistical forecast. We've had elections in this country for more than two centuries at last count, but very few of the models that were applied to predict the 2012 election employed data on each of them. Some structural models only used data on a few years, and the polling data obviously doesn't go back very far.
There is a tradition in world politics of going back either to the Congress of Vienna (when there were fewer than two dozen independent countries) or to the early 1950s, after the end of the Second World War. But in reality, there is no need to do this for most studies. Typically, data are needed only for the current political era, which might date to 1989 or may cover just the current century. Don't be misled by claims that there "won't be enough cases to analyze" if you use these "shortened" time frames. Such claims assume that you have to analyze annual data, an assumption that is just blatantly false, even if it has been standard operating practice in quantitative world politics for decades. Not only are there techniques available for analyzing data on a much smaller time scale (days, weeks, or months), but if we use them, we are likely to get closer to those elusive variables that the policymakers lust after. And generally, the data have to be tortured before they surrender to annual formats. Consider, for instance, a coup on January 1, 2010, and another one on December 31 of the same year. These two coups occurred in the same calendar year, but did they actually occur at the same instant?
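To make the coup example concrete, here is a minimal sketch in Python. The two event dates come from the example above; the function names and the idea of storing events as plain dates are our own illustrative assumptions, not any particular dataset's format. It shows that annual aggregation folds both coups into a single country-year, while monthly aggregation keeps them apart.

```python
from collections import Counter
from datetime import date

# Hypothetical event records: the two coups from the example in the text.
coups = [date(2010, 1, 1), date(2010, 12, 31)]


def count_by_year(events):
    """Aggregate events into annual counts, keyed by year."""
    return Counter(d.year for d in events)


def count_by_month(events):
    """Aggregate events into monthly counts, keyed by (year, month)."""
    return Counter((d.year, d.month) for d in events)


annual = count_by_year(coups)    # one cell holding both coups
monthly = count_by_month(coups)  # two separate cells, eleven months apart
gap_days = (coups[1] - coups[0]).days  # time between the two events

print(annual)
print(monthly)
print(gap_days)
```

In the annual table the two events are indistinguishable from two coups on the same day; the monthly table and the gap in days preserve the fact that almost a full year separates them.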
Ulfelder tells us that "when it comes to predicting major political crises like wars, coups, and popular uprisings, there are many plausible predictors for which we don't have any data at all, and much of what we do have is too sparse or too noisy to incorporate into carefully designed forecasting models." But this is true only for the old style of models based on annual data for countries. If we are willing to face data that are collected in rhythm with the phenomena we are studying, this is not the case. For example, Thailand became considerably more democratic in July 2011 as a result of Yingluck Shinawatra winning a landslide election and successfully forming a coalition government to replace a government established by a coup d'état. There is no need to assume this change to a more democratic form of government applies to the entire year of 2011, since we know it didn't really characterize Thailand during the first half of the year. We have data and techniques that can deal with monthly or even daily data. Whether the data are too noisy to make use of is an empirical question. Our hunch is that clever data scientists will find a way to make these data useful.
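The Thailand case can be sketched the same way. The binary indicator below is a hypothetical coding invented for illustration (0 before the July 2011 change, 1 from July onward), and the "status at year end" rule is just one possible way, assumed here, of producing a single country-year value. Neither is taken from an actual regime dataset.

```python
# Hypothetical monthly indicator for Thailand in 2011:
# 0 = before the change described in the text, 1 = from July 2011 onward.
CHANGE_MONTH = 7
monthly_status = {month: int(month >= CHANGE_MONTH) for month in range(1, 13)}

# Assumed annual coding rule: the country-year takes the status at year end.
annual_status = monthly_status[12]

# Months whose actual status differs from the single annual value.
mismatched_months = [
    month for month, status in monthly_status.items() if status != annual_status
]

print(annual_status)
print(mismatched_months)
```

Under these assumptions the annual value describes the second half of 2011 correctly and the first half incorrectly, which is exactly the distortion that monthly data avoid.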
Consider thyroid cancer. According to the National Cancer Institute, thyroid cancer has an incidence of about 6 in 100,000. It is a rare event, but we know quite a lot about it, even to the level of making preventive prescriptions. The rareness does not prevent us from learning about its occurrence, how to treat it, and even how to best avoid this cancer.
Don't get us wrong: Better data is always better. But we actually have a lot of data. We don't want to argue that forecasts about world politics are as precise as those we observed for the 2012 presidential election, and we agree with Ulfelder that they are not. But we are not so pessimistic as to think that forecasting certain kinds of events is a problem that cannot, in principle, be solved by clever investigators combining statistical approaches and data. David Rothschild, David Pennock, and a team of about 30 others (then at Yahoo; now mainly at Microsoft Research) predicted 303 electoral votes for Obama on February 15, 2012, not by building a model of the election but, as they put it, by being "the mother of all prediction engines, period." There is considerable effort underway, beyond Microsoft Research, to build predictive models that relate to world politics.
Better, of course, will always be better. But there is room for seeing the water in the glass, not just how big the glass is.