Social media can tell us a lot about voting patterns -- but only up to a point.
What can social media and "big data" tell us about electoral behavior in non-Western countries? At its best, social media can act as a relatively accurate metric for public opinion and reliably forecast political behavior. At its worst, however, it can obscure those opinions and present a skewed version of public preferences.
Social and computational scientists have debated whether we can forecast United States congressional election outcomes with various metrics gained from Twitter. In a Washington Post op-ed, Indiana University professor Fabio Rojas proclaimed:
"Digital democracy will put these campaign [polling] professionals out of work. New research in computer science, sociology, and political science shows that data extracted from social media platforms yield accurate measurements of public opinion."
This claim has its critics. Other political scientists point out that candidate incumbency itself can often predict elections to a high degree of accuracy. A number of computer scientists hold that a true "forecast" should predict elections "out-of-sample" (that is, researchers build a model with a certain set of data and use it to predict another separate set of data). As a result, real forecasts must be published before the election.
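To make the "out-of-sample" idea concrete, here is a minimal Python sketch. It fits a simple one-variable line on one set of races and then scores it only on races the model never saw. Every number below is invented for illustration, and the function names are our own; this is not the method used in any of the studies mentioned above.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def predict(model, x):
    """Apply a fitted (intercept, slope) pair to a new value of x."""
    intercept, slope = model
    return intercept + slope * x

# HYPOTHETICAL data: (candidate's share of tweets, candidate's vote share).
past_races = [(0.30, 0.35), (0.45, 0.48), (0.60, 0.57), (0.75, 0.70)]
future_races = [(0.40, 0.44), (0.65, 0.61)]  # never used for fitting

# Build the model on past races only.
model = fit_line([x for x, _ in past_races], [y for _, y in past_races])

# Judge it on the held-out races: this is the out-of-sample error.
errors = [abs(predict(model, x) - y) for x, y in future_races]
out_of_sample_mae = mean(errors)
print(f"mean absolute error on held-out races: {out_of_sample_mae:.3f}")
```

The point of the split is that a model can fit the races it was built on very well and still miss new ones, which is why a forecast published only after the vote proves little.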
Yet there may be some support for the argument that social media can act as a reliable proxy for public opinion and electoral behavior. The major assumption here is that the electoral preferences of social media and Internet users look something like the electoral preferences of the electorate. In non-Western or more closed political systems, where polling may be less common or people are less likely to express their true partisan positions to pollsters, social media may be a good indicator of public opinion. Then again, it may also be incredibly misleading.
We can look into this question by using the recent presidential race in Iran as an electoral case study. This is one case in which social media and other Internet data gave some indication of electoral dynamics that ran against the conventional wisdom. But is this always accurate, or merely a function of the web presence of particular candidates? Here, we'll compare the results to the 2012 Egyptian presidential election, in which social media failed (and failed badly) to forecast the winner.
Speculation on candidates in the run-up to Iran's June 2013 presidential election ran wild. In March, the conservative website Mashregh listed no fewer than 26 possible contenders from across the political spectrum. On May 21, the Guardian Council -- the body that vets electoral contenders based on the vague notion of constitutional "suitability" -- approved eight candidates. Prominent conservatives permitted to run included Tehran's mayor Mohammad-Bagher Qalibaf, National Security Council member and nuclear dossier negotiator Said Jalili, and former Revolutionary Guards Commander Mohsen Rezai. In addition, the council approved two candidates from the center-liberal side of Iran's political establishment: former vice president Mohammad-Reza Aref and a former negotiator of the nuclear dossier, Hassan Rouhani.
Live televised debates took place on May 31, June 5, and June 7. One of us (Kevan) was in Tehran at the time. At the outset it was difficult to tell each candidate's chances. All the candidates tried to disassociate their own platform from then-President Mahmoud Ahmadinejad's lame duck record. Quite a few previous supporters of reformist politicians, citing the events of 2009, loudly announced they would not be voting at all. Western news outlets, believing all decisions great and small to securely be in the hands of Iran's Leader, Ali Khamenei, forecast a conservative victory. However, the TV debates, especially the rowdy third debate on June 7, reportedly watched by over half of the population, appeared to show that a real race was afoot. On the ground in Tehran and provincial cities, closer to the action, we could sense something new: a reawakening of electoral mobilization that had been politically dormant since 2009. Yet would it be big enough to register at the ballot box, given well-known popular misgivings? And could it be gauged at all?
Polling in Iran is notoriously spotty, but sociologist Hossein Ghazian ran a well-constructed and independently funded daily phone tracking poll in the two weeks preceding the election. With over 1,000 randomly sampled respondents each day, Ghazian's organization, Information and Public Opinion Solutions (IPOS), provided a crucial measure of changing support for various candidates, even though 40 percent of respondents were still answering as undecided up until the last day. Nevertheless, IPOS captured the most important shift in the election. When Mohammad-Reza Aref pulled out of the race on June 11 and threw his support to Hassan Rouhani, the election became a choice between a single center-liberal candidate and numerous conservative candidates. With former presidents Rafsanjani and Mohammad Khatami backing Rouhani, the mobilization surge also brought in three key voting blocs: those who had previously decided to abstain, those who were still undecided, and those who switched their vote away from conservatives.
Ghazian captured the trend, but his poll lagged slightly behind a rapidly changing reality. The final IPOS poll on June 13 put Rouhani at 38 percent of decided voters, but 42 percent of the respondents stated they were still undecided. Rouhani won the election the next day with 50.7 percent of the vote.
Could the hive mind of Twitter and Facebook have predicted the outcome better? This is what Malaysia-based blogger and "social media specialist" Pooria Asteraky claimed, suggesting that social media was a sufficient reflection of political opinion. Using aggregate counts of mentions of each candidate from Facebook, Twitter, blogs, and Google Trends, he forecast Rouhani's victory as early as the third debate on June 7, well in advance of Aref's withdrawal. In his final forecast, he predicted a 52.1 percent vote share for Rouhani, within 1.5 percentage points of the true vote share.
The above figure shows Pooria's estimation of the popularity of each candidate. Although he seems to capture Rouhani's bump from Aref's withdrawal, we had some methodological doubts over Pooria's approach and attempted to replicate his analysis. We searched for candidates' names in Google Trends and Twitter (using the Twitter search tool Topsy), since full historical data for Facebook and blogs are generally unavailable.
Following Pooria, we looked solely at the volume of candidate mentions within Iran in the Persian language. In the Google Trends data, there doesn't seem to be any clear front-runner on the day of the last debate. However, after the debate, Rouhani and Aref begin separating themselves from the rest of the pack, suggesting that the debates themselves may have swayed some voters. Or, at least, their interest was piqued enough to search for these two names.
As for Twitter, which has a much lower penetration rate in Iran than Google, the trends are similar but with minor differences. Much more Twitter traffic centered around the debate. After the debates, Rouhani maintained a narrow lead in tweet volume over Aref and Jalili but didn't substantially break away until much closer to the election date.
These are, to be sure, somewhat rough and naïve estimators of public opinion, predicated on the idea that "all publicity is good publicity." But if one buys into the "more tweets, more votes" thesis, then they are a good enough reflection of the electoral preferences that Iranians expressed in Rouhani's June 14 victory.
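The "more tweets, more votes" estimator amounts to very simple arithmetic: treat each candidate's share of all mentions as a stand-in for vote share. A minimal Python sketch follows. The candidate labels and counts are made up for illustration and are not the figures from either replication; the function name is our own.

```python
def mention_shares(counts):
    """Convert raw mention counts into each candidate's share of all mentions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mentions recorded")
    return {name: n / total for name, n in counts.items()}

# HYPOTHETICAL mention counts for four unnamed candidates.
counts = {"A": 5200, "B": 2100, "C": 1500, "D": 1200}

shares = mention_shares(counts)

# The naive forecast: whoever is mentioned most is predicted to win.
predicted_winner = max(shares, key=shares.get)

for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.1%} of mentions")
print("naive forecast:", predicted_winner)
```

Nothing in this calculation distinguishes praise from criticism or a supporter from a curious onlooker, which is exactly why the estimator is naïve.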
Does the same kind of relationship hold in other hotly contested elections in the Middle East? Egypt held its first open presidential election in 2012 after the ouster of Hosni Mubarak. With most political forces still consolidating power, the field of contestation seemed wide open between a variety of Islamists, reformists, those regarded as feloul, or remnants of the old regime, and everyone in between. The Muslim Brotherhood, considered the most organized political organization in post-revolution Egypt, broke its initial promise to refrain from fielding a presidential candidate when it ran one of its leading members, Khairat al-Shater, and again when it ran Mohamed Morsi after Shater was disqualified. Hamdeen Sabahi, a long-time reformist politician and activist associated with the Nasserist Karama party, made an impressive and unexpected showing in the first round. From the former regime, former Foreign Minister Amr Moussa and Mubarak's final Prime Minister Ahmed Shafik appealed to many Egyptians' desire for order and their fear of an Islamist president. The reform Islamist Abdel Moneim Aboul Fotouh -- a former Muslim Brother -- appealed to many, including a wide swath of youth.
After nearly seven months of uncertainty (the Supreme Council of the Armed Forces had announced that elections would take place as early as November 2011), the election campaign began to coalesce. On May 10, Moussa debated toe-to-toe with Aboul Fotouh for nearly six hours on national TV. Two weeks later, the first round of the elections was held, with Morsi and Shafik making the most impressive showings, Sabahi in third, and those who had actually debated, Aboul Fotouh and Moussa, taking fourth and fifth, respectively. In the second round, on June 16-17, Morsi was proclaimed the winner with 51.7 percent of the vote. He was not officially certified until June 24.
What could social media tell us about these election dynamics? Using the same approach as we did for Iran, we looked at Google Trends and Twitter volume for mentions of these candidates in Arabic within Egypt.
Google Trends in Egypt first show a peak after the debate between Aboul Fotouh and Moussa. Before the first round, Aboul Fotouh, Shafik, and Sabahi are nearly tied in search volume. After the first round, however, the buzz about Shafik dwarfs that of any other candidate. His volume remains high and he stays ahead of Morsi even after the second round. It isn't until the election results are finally announced that Morsi displaces Shafik.
Twitter tells a largely similar story, with two important differences. First, Aboul Fotouh is referenced more often on Twitter in the first round, possibly reflecting his popularity with youth. Second, Morsi doesn't displace Shafik until days after he is formally certified as the election victor.
In this case, relying on search volume and web mentions would have been an incredibly poor way to predict the outcome of the Egyptian presidential election.
The lesson here isn't that Twitter and other web data simply provide no important information. There is indeed something happening when we can retrieve trends from what people search and what they say. The fact that they reflect electoral events, whether mediated through broadcast media or otherwise, means we're picking up on salient political behavior. What isn't clear, however, is how these data are produced and what that means for their quality as an indicator of political preferences.
In Iran and Egypt, as elsewhere, web data tend to be produced by those who have frequent access to the Internet, usually urbanites and the middle classes. These groups tend to have particular political biases that lean reformist and anti-Islamist. In Iran, they leaned towards Rouhani, in Tehran as well as in nearly every urban area. In Egypt, those biases were against Morsi -- the most populous governorates went for Shafik (with the exception of Alexandria, a stronghold of the Muslim Brotherhood).
To put it in statistical terms, using data from social media in both cases results in a biased sample. There are ways to correct for particular known biases, but as we move from country to country and election to election, the sample changes in ways that we can't easily measure.
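One standard way to correct for a known bias is post-stratification: reweight each group's observed support by that group's share of the actual electorate instead of its share of the sample. The Python sketch below is only an illustration under strong assumptions. All numbers and group labels are hypothetical, and it assumes both that the electorate's true composition is known and that online users within a group resemble that group's offline members, which is precisely what is hard to verify from country to country.

```python
def raw_and_poststratified(support, sample_mix, population_mix):
    """Return (raw, reweighted) support for one candidate.

    support[g]        -- candidate's support observed within group g
    sample_mix[g]     -- group g's share of the (biased) online sample
    population_mix[g] -- group g's share of the real electorate
    """
    raw = sum(support[g] * sample_mix[g] for g in support)
    reweighted = sum(support[g] * population_mix[g] for g in support)
    return raw, reweighted

# HYPOTHETICAL figures: an online sample dominated by urban users.
support = {"urban": 0.60, "rural": 0.40}
sample_mix = {"urban": 0.90, "rural": 0.10}
population_mix = {"urban": 0.50, "rural": 0.50}

raw, reweighted = raw_and_poststratified(support, sample_mix, population_mix)
print(f"raw online estimate:     {raw:.0%}")
print(f"post-stratified estimate: {reweighted:.0%}")
```

In this toy case the raw figure overstates the candidate's support because the group that favors him is overrepresented online. The correction is only as good as the assumed population shares and the within-group estimates, and for social media data neither is easy to pin down.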
Given these issues, it may be somewhat premature to proclaim the "death of the pollster," especially in the Middle East, where a distant gaze on public expression of political preferences is bound to give a skewed view of reality.