Data visualization can offer some unique insights into social upheaval. But the data artists are just getting started.
At first glance you don't quite know what you're looking at. It could be strands of mitochondrial DNA, or sperm fertilizing an egg, or perhaps the product of an incongruous union between a sea creature and a hairball. As the thing grows, it passes through a series of increasingly complex mutations -- ending up as a kind of Death Star under construction.
You're looking at data -- or, to be more precise, a visual rendering of tweets and retweets from a several-hour period on February 11, 2011, starting shortly before the resignation of Egyptian President Hosni Mubarak.
More than ever before, we inhabit a cosmos of data. With smart phones and the ability to share reams of information publicly on social media, our digital footprints are everywhere -- from what we “like” on Facebook to our store loyalty cards to banking transactions on our smart phones.
With that explosion of data have come new ways to visualize it. From Sumerian cuneiform to the humble pie chart, visual symbols transform the abstract into the concrete, helping us to see patterns and relationships we might have otherwise missed. The bigger the data sets, the more useful pictures of them become -- and at no time in human history have we ever had to deal with floods of data greater than the ones that wash over us now.
Of all the social phenomena that invite analysis, few are as complex, or as volatile, as revolutions. The petabytes of social media data generated by the upheavals of the Arab Spring are fertile ground for social scientists studying those events. For years we've been snapping photos of demonstrators and protests; now the new cosmos of data potentially enables us to map the ebb and flow of the ideas that drive them, something like a magnetic-resonance imaging technique for visualizing the mechanisms of dramatic change.
Interest in the role of social media first exploded during Iran's post-election unrest in 2009. In his "Retweet Revolution," Gilad Lotan, a designer and computer scientist, tracked Twitter conversation threads around popular hashtags such as #iranelection, #ahmadinejad, and #mousavi. By visualizing flows, Lotan says, we can "put a face" to an audience -- as, for example, in a striking project about the Arab Spring authored by Lotan and several of his collaborators.
In 2011, Kovas Boguta beautifully mapped the "Egypt Influence Network," showing Arabic- and English-language tweeters -- and, perhaps most strikingly, the bridge nodes that spanned the linguistic divide.
For social scientists, big data sets and visualizations are already proving to be powerful tools. Writing about network diagrams of tweets around Iran's 2009 post-election crisis, researcher Devin Gaffney observed that by using tag clouds, "we can identify the key terms used in tweets." He added: "By looking at different tag clouds over time, we can perhaps even see terms reflect a general shift from awareness/advocacy towards organization/mobilization, and eventually action/reaction."
Data scientists can also map the growth of ideas over time. In 2008, John Kelly, of Morningside Analytics, a company that specializes in social network analysis, visualized the Iranian blogosphere. When he did the same a year later and compared the maps, he discovered not only that the number of blogs had grown, but also that new segments had sprung up. One of them, which he dubbed “CyberShia,” revealed a dramatic increase in blogging activity by religiously oriented users. While this could point to an effort by the pro-regime Basij militia to contain dissident discourse on the Web, another theory sees the data as evidence of an intensifying debate about Islamic law and its role within the country’s political system.
Iranian blogosphere 2009:
Iranian blogosphere 2008:
Similarly, by looking back at social-media data emerging from the Arab Spring, it's possible to see political ideas congeal and take shape. By evaluating data from 2010 and 2011, the Arab Media Influence Report, which captured over 10 million online conversations per day, showed how discourse became politicized by the start of 2011. In the first quarter of 2010, 57 percent of the Arabic conversations on social media included socio-economic terms (such as income, housing, and minimum wage). By 2011, that number had dropped to 37 percent. In 2010, 35 percent of the conversations on social media included political terms (such as revolution, corruption, and freedom). In 2011, the number shot up to 88 percent.
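To make the measurement concrete, here is a minimal sketch of how such a share could be computed: the percentage of conversations that mention at least one term from a category. The term lists, the function name `share_mentioning`, and the four sample conversations are illustrative assumptions, not the report's actual method or data. Because a single conversation can mention both kinds of terms, the two shares need not add up to 100.

```python
from typing import Iterable, Set

# Hypothetical term lists, loosely based on the examples named in the text.
SOCIO_ECONOMIC = {"income", "housing", "minimum wage"}
POLITICAL = {"revolution", "corruption", "freedom"}


def share_mentioning(conversations: Iterable[str], terms: Set[str]) -> float:
    """Return the percentage of conversations containing at least one term."""
    texts = [c.lower() for c in conversations]
    if not texts:
        return 0.0
    # Simple substring matching; a real pipeline would need proper
    # tokenization and Arabic-language handling.
    hits = sum(any(term in text for term in terms) for text in texts)
    return 100.0 * hits / len(texts)


# Invented sample conversations, for illustration only.
sample = [
    "Housing costs keep rising",
    "We demand freedom and an end to corruption",
    "The minimum wage is not enough; time for a revolution",
    "Great match last night",
]

socio = share_mentioning(sample, SOCIO_ECONOMIC)
political = share_mentioning(sample, POLITICAL)
print(f"socio-economic: {socio:.0f}%, political: {political:.0f}%")
```

Running the same count over successive quarters would give the kind of trend line the report describes, though simple term matching of this sort says nothing about what the speakers meant.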
Of course, we hardly need social media data to tell us that Arab society was becoming more politically aware in the period leading up to the revolts. Arguably one might have arrived at a comparable conclusion by sitting in Cairo coffee houses for a year or by tracking debates in the Arabic press.
Data visualization really comes into its own when it allows us to see patterns we might otherwise have missed, patterns that can be modeled and applied to other contexts. If, as some sociologists believe, structure is destiny, then network graphs might be able to tell us about the life spans of political movements, their likely growth and their eventual demise. "People tend to think about the qualities of the individual when they figure out whether they will be likely to succeed or fail," says Marc A. Smith, the director of the Social Media Research Foundation, a California-based nonprofit. "Network theory people, and sociologists more generally, like to think about the properties of a person's network as having equal, if not greater consequence to their likely outcomes."
As an example, Smith cites two visualizations he made of the Occupy movement and the Tea Party on Twitter. In his renderings, the Tea Party appears as a far more tight-knit group, with many of its members following one another, whereas Occupy is made up of looser clusters with a few high-profile accounts receiving plenty of retweets. In the lower right-hand corner of the visualizations there is a grid, a matrix of "isolates": people who are talking about the ideas of Occupy or the Tea Party, but who don't have connections to others on the graph. For Occupy, the number of isolates is greater, which according to Smith could indicate a larger potential for growth and stronger brand cachet.
Tea Party on Twitter:
Occupy on Twitter:
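The two properties Smith points to, how tightly knit a group is and how many "isolates" it contains, can be read off a follower graph with very little code. The sketch below is an illustration on invented toy data, not Smith's method or his data sets; the function names and the two tiny graphs are assumptions made for the example.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]  # (follower, followed)


def isolates(users: Iterable[str], follows: Iterable[Edge]) -> List[str]:
    """Users who appear in the conversation but have no connection to anyone."""
    connected = set()
    for follower, followed in follows:
        if follower != followed:  # ignore self-loops
            connected.add(follower)
            connected.add(followed)
    return sorted(u for u in users if u not in connected)


def density(users: Iterable[str], follows: Iterable[Edge]) -> float:
    """Share of all possible directed follow links that actually exist."""
    n = len(set(users))
    if n < 2:
        return 0.0
    links = {(a, b) for a, b in follows if a != b}
    return len(links) / (n * (n - 1))


# Invented toy graphs: a tight-knit group versus a loose one with a hub.
tight_users = ["a", "b", "c", "d"]
tight_follows = [("a", "b"), ("b", "a"), ("a", "c"),
                 ("c", "a"), ("b", "c"), ("c", "b")]

loose_users = ["hub", "p", "q", "r", "s"]
loose_follows = [("p", "hub"), ("q", "hub")]

print(isolates(tight_users, tight_follows), density(tight_users, tight_follows))
print(isolates(loose_users, loose_follows), density(loose_users, loose_follows))
```

In this toy example the tight group has a higher density and fewer isolates than the loose one, which is the same kind of contrast the two renderings show at a much larger scale.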
When we interrogate the data, the answers we get are only as good as the questions we ask. Looking at the Twitter data obviously doesn't tell us how the Occupy forces in Zuccotti Park behave, sound, or smell. (The Twitterverse, after all, is not the universe.) But as Smith notes, fleshing out the data shows that the group's structure strikingly reflects its self-description as a decentralized and bottom-up movement. "There's no question that big data can be very, very useful," says Zeynep Tufekci, a sociologist at the University of North Carolina. "But it's less useful and even misleading at times if it is not evaluated by people who understand the context of what they're looking for."
Without any context about Internet penetration or the demographics of Twitter users, network diagrams of Egyptian tweeters can give us the impression that Twitter played an oversized role in the unrest there. We might also get an overblown impression of the liberal character of the Arab Spring. But instead of being broadly representative of Egyptian society, these tweeters are a small subset of young, educated, often English-speaking elites, with a propensity for liberal ideas. They are perhaps as revealing about Internet connectivity as they are about Egyptian society. (Gregor Aisch, a German designer, has produced his own striking visualization of the global digital divide.)
But perhaps it helps to ask a different question. If we want to analyze what's going on in the minds of Egypt's most influential elites, then that social media data set can certainly offer useful answers. "That may only be a sampling of a small [segment] of population, but if that's how they're organizing and communicating, then that's an important population," says Noah Iliinsky, an expert in the theory and practice of information visualization. "So I wouldn't discount that population simply because it's not representative."
What has got businesses, governments, and academics excited about big data and visualization is the ability to detect patterns in real time rather than mapping perceptions post-factum -- and even to use this data to make predictions. We're already seeing cases in which visualized data enable policy makers to make quick, informed choices about public health, poverty, or energy efficiency. Google Flu Trends, which estimates current flu activity around the world in real time by monitoring search terms, has been shown to predict confirmed cases of flu with a level of accuracy comparable to that of the Centers for Disease Control and Prevention. In 2010, researchers managed to predict with an accuracy of 87.6 percent the daily changes in the closing values of the Dow Jones Industrial Average by analyzing Twitter users' moods.
Could comparable techniques work for predicting social upheaval? The UN has launched an initiative, Global Pulse, that uses new technologies to collect, analyze, and filter information to help governments and organizations better understand what is happening in certain at-risk communities. In 2010 and 2011, the Ushahidi group, famous for its pioneering crowdsourcing and real-time visualization software, created a website that tracked potential disturbances during Liberia's elections.
On a similar note, the Associated Press reported in November 2011 that the CIA was monitoring five million tweets a day to track revolutionary change. (Of course, the same data can be used by repressive governments that want to track dissent and unrest.)
One of the key problems in making assumptions about societal change based on social media data is our limited understanding of the relevant conversion rates. Online marketers, advertisers, and political campaigners are frantically trying to understand the relationship between tweets and dollars or tweets and votes; the conversion rates for social change are murkier still. What assumptions, for example, can data scientists make about the relationship between tweet volume and the number of people likely to attend a protest? Or can the sentiments expressed in Facebook status updates be mined to produce an accurate indication of support for Vladimir Putin?
Given that social media analysis is still in its infancy, the answers to those questions remain elusive. So-called sentiment analysis still struggles to distinguish the signals from the noise. Network diagrams mapping relationships between tweeters and "likers" tell us that there is a big crowd, but they are pretty unhelpful in telling us what that crowd is thinking and why. While programs will become better at parsing huge amounts of data, they are still more comfortable counting than interpreting. Computers still struggle with slang, sarcasm, and subtexts.
Take these tweets from user DanielNothing, who was tweeting about the London riots on August 6, 2011:
Heading to Tottenham to join the riot! who's with me? #ANARCHY
Clear enough, right? If thousands were retweeting DanielNothing's tweet, or tweeting similar sentiments, police officers might be well-advised to deploy resources accordingly. But then DanielNothing tweets again:
Hang on, that last tweet should've read 'Curling up on the sofa with an Avengers DVD and my missus, who's with me?' What a klutz I am!
Only friends of DanielNothing could say for sure what he meant. Is he just being sarcastic? Or could his first tweet be taken at face value, with his second being read as an attempt to mask his true intent? If a human struggles to decipher the true meaning, how would a computer fare?
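A crude keyword counter shows how badly a machine can fare. The sketch below is a deliberately naive illustration, not a description of any real monitoring system; the keyword list and the function name `flags_unrest` are assumptions made for the example.

```python
# Hypothetical alarm keywords; a real system would use far richer signals.
ALARM_TERMS = ("riot", "anarchy")


def flags_unrest(tweet: str) -> bool:
    """Flag a tweet if it contains any alarm keyword, ignoring case."""
    text = tweet.lower()
    return any(term in text for term in ALARM_TERMS)


first = "Heading to Tottenham to join the riot! who's with me? #ANARCHY"
second = (
    "Hang on, that last tweet should've read 'Curling up on the sofa "
    "with an Avengers DVD and my missus, who's with me?' What a klutz I am!"
)

# Each tweet is judged on its own, so the retraction never touches
# the verdict on the first message.
results = [flags_unrest(first), flags_unrest(second)]
print(results)
```

The counter flags the first tweet and passes the second, but because it scores each message in isolation it cannot connect the two, let alone decide whether the pair amounts to a joke or a cover story.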
Unpacking the complex performance art of social media is not easy. Our emotions on display might not always be attuned to the emotions we feel and our stated preferences might be approximations of our true desires. The cultural and societal meaning of a retweet or a Facebook "like" is more complex than we may think and varies across societies and platforms. "Not everyone is talking about it while mentioning it," says Fadi Salem, a fellow of governance and innovation at the Dubai School of Government.
People regularly retweet links to things they haven't read, not necessarily because they endorse them, but because they value the source or succumb to peer pressure. "At the moment, most [social media] analyses are not only superficial, but kind of crude," says Tufekci. "You don't even know were these tweets positive or negative... and what does this excitement correspond with."
Despite these limitations, the amount of data is unprecedented, and is correspondingly poised to transform sociological research. "We are now getting moment-by-moment statements from hundreds of millions of people in a native machine-readable format," says Smith. "This has never happened before."
Finding more innovative ways to process, parse, manage, and visualize this information will be crucial as the mountains of data grow. Piers Fawkes, who runs PSFK, a trends analysis company that has consulted for UN Global Pulse, says the data-rich future will encompass not just social-media data but also search queries, YouTube queries, and financial and retail transactions. The open data movement is pushing for more and more data sets to be made public so that others can benefit from them. Global notions of privacy are likely to loosen as we grow more comfortable (or clueless) about sharing our information.
As more people get online around the world, especially through their cheap smart phones (of which there are now some 4 billion), data streams will proliferate and most segments of society will reveal themselves through various kinds of social media data. (Last year, for example, Twitter user @Arasmus used information from Twitter accounts to map out violence against pro-democracy protestors in Libya.)
Advocates of open data are excited by the possibilities -- as no doubt are certain governments interested in more sophisticated ways of snooping on their citizens. Mobile phones allow us to become nodes in a human sensor network, feeding data sets on the weather or the state of water pumps. Big data visualized in real time can help to manage traffic congestion or to allow medical workers to better allocate supplies in hospitals.
"Data visualization will bring us deep knowledge. It will bring us awareness of things that have been too data-intensive to get good answers from in the past," says Iliinsky. "It will get us answers more quickly, if we can collect and analyze the data more quickly than we have in the past. And it's going to show us areas that we may have overlooked." Take the case of Dataminr, an analytics company that has just announced a partnership with Twitter that will presumably give the company greater access to tweets and metadata. The company claims that its system got wind of Osama bin Laden’s death minutes before news media -- minutes that could have enabled the company’s financial clients to get ahead of major moves in the markets.
No one should expect social media analysis to replace surveys, existing early-warning networks, or traditional ethnographic research any time soon. Yet if these data visualization tools can make good on their promise, we'll soon have some powerful new ways of telling stories about our social universe with a speed and clarity that would have been hard to imagine just a few years ago. If we want to take a meaningful snapshot of the next iteration of Tahrir Square, reaching for a camera will no longer be our only option.