"We are developing a supercomputer that will do more calculating in a second than a person with a hand-held calculator can do in 30,000 years." --U.S. President Bill Clinton
NASA researchers Michael Cox and David Ellsworth use the term "big data" for the first time to describe a familiar challenge in the 1990s: supercomputers generating massive amounts of information -- in Cox and Ellsworth's case, simulations of airflow around aircraft -- that cannot be processed and visualized. "[D]ata sets are generally quite large, taxing the capacities of main memory, local disk, and even remote disk," they write. "We call this the problem of big data."
After the 9/11 attacks, the U.S. government, which has already dabbled in mining large volumes of data to thwart terrorism, escalates these efforts. Former national security advisor John Poindexter leads a Defense Department effort to fuse existing government data sets into a "grand database" that sifts through communications, criminal, educational, financial, medical, and travel records to identify suspicious individuals. Congress shutters the program a year later due to civil liberties concerns, though components of the initiative are simply shifted to other agencies.
The 9/11 Commission calls for unifying counterterrorism agencies "in a network-based information sharing system" that is quickly inundated with data. By 2010, the NSA's 30,000 employees will be intercepting and storing 1.7 billion emails, phone calls, and other communications daily. Meanwhile, with retailers amassing information on customers' shopping and personal habits, Wal-Mart boasts a cache of 460 terabytes -- more than double the amount of data on the Internet at the time.
As social networks proliferate, technology bloggers and professionals breathe new life into the "big data" concept. "This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear," Wired's Chris Anderson writes in "The End of Theory." Government agencies, some of the United States' top computer scientists report, "should be deeply involved in the development and deployment of big-data computing, since it will be of direct benefit to many of their missions."
The Indian government establishes the Unique Identification Authority of India to fingerprint, photograph, and take an iris scan of all 1.2 billion people in the country and assign each person a 12-digit ID number, funneling the data into the world's largest biometric database. Officials say it will improve the delivery of government services and reduce corruption, but critics worry about the government profiling individuals and sharing intimate details about their personal lives.
U.S. President Barack Obama's administration launches data.gov as part of its Open Government Initiative. The website's more than 445,000 data sets go on to fuel websites and smartphone apps that track everything from flights to product recalls to location-specific unemployment, inspiring governments from Kenya to Britain to launch similar initiatives.
Reacting to the global financial crisis, U.N. Secretary-General Ban Ki-moon pledges to create an alert system that captures "real-time data on the impact of the economic crisis on the poorest nations." The U.N. Global Pulse program has since conducted research on how to predict everything from spiraling prices to disease outbreaks by analyzing data from sources such as mobile phones and social networks.
"There were 5 exabytes of information created by the entire world between the dawn of civilization and 2003. Now that same amount is created every two days." --Google CEO Eric Schmidt
On the quiz show Jeopardy!, IBM's Watson computer system defeats two human challengers, scanning 200 million pages of information, or 4 terabytes of disk storage, in a matter of seconds. The New York Times later dubs this moment a "triumph of Big Data computing."
The Obama administration announces a $200 million Big Data Research and Development Initiative in response to a U.S. government report calling for every federal agency to have a "'big data' strategy." The National Institutes of Health puts a data set of the Human Genome Project in Amazon's computer cloud, while the Defense Department pledges to develop "autonomous" defense systems that can "learn from experience." CIA Director David Petraeus, marveling that the "'digital dust' to which we have access is being delivered by the equivalent of dump trucks," discusses a post-Arab Spring agency effort to collect and analyze global social media feeds through cloud computing.
U.S. Secretary of State Hillary Clinton announces a public-private partnership called "Data 2X" to collect statistics on women's and girls' economic, political, and social status around the world. "Data not only measures progress -- it inspires it," she explains. "Once you start measuring problems, people are more inclined to take action to fix them because nobody wants to end up at the bottom of a list of rankings." Let the Big Data race begin.