Humans have been whining about being bombarded
with too much information since the advent of clay tablets. The complaint in
Ecclesiastes that "of making many books there is no end" resonated in the
Renaissance, when the invention of the printing press flooded Western Europe
with what an alarmed Erasmus called "swarms of new books." But the digital
revolution -- with its ever-growing horde of sensors, digital devices, corporate
databases, and social media sites -- has been a game-changer, with 90 percent of
the data in the world today created in the last two years alone. In response,
everyone from marketers to policymakers has begun embracing a loosely defined
term for today's massive data sets and the challenges they present: Big Data.
While today's information deluge has enabled governments to improve security
and public services, it has also sowed fears that Big Data is just another
euphemism for Big Brother.

Statistician Herman Hollerith invents an electric machine that reads holes
punched into paper cards to tabulate 1890 census data, revolutionizing the
concept of a national head count, which had originated with the Babylonians in
3800 B.C. The device, which enables the United States to complete its census in
one year instead of eight, spreads globally as the age of modern data processing begins.

Franklin D. Roosevelt's Social Security Act launches the U.S. government on its
most ambitious data-gathering project ever, as IBM
wins a government contract to keep employment records on 26 million working
Americans and 3 million employers. "Imagine the vast army of clerks which will
be necessary to keep these records," Republican presidential candidate Alf
Landon scoffs. "Another army of field investigators will be necessary to check
up on the people whose records are not clear."

At Bletchley Park, a
British facility dedicated to breaking Nazi codes during World War II, engineers develop a series of
groundbreaking mass data-processing machines, culminating in the first programmable electronic computer.
The device, named "Colossus," searches for patterns in intercepted messages by
reading paper tape at 5,000 characters per second -- reducing a process that had
previously taken weeks to a matter of hours. Deciphered information on German
troop formations later helps the Allies during their D-Day invasion.

The U.S. National Security
Agency (NSA), a nine-year-old
intelligence agency with more than 12,000 cryptologists, confronts information
overload during the espionage-saturated Cold War, as it begins collecting and
processing signals intelligence automatically with computers while struggling
to digitize a backlog of records stored on analog magnetic tape in warehouses.
(In July 1961 alone, the agency receives 17,000 reels of tape.)

The U.S. government
secretly studies a plan to transfer all government records -- including 742
million tax returns and 175 million sets of fingerprints -- to magnetic computer
tape at a single national data center, though the plan is later scrapped amid
public concern about bringing "Orwell's '1984' at least as close as 1970," as
one report puts it. The outcry inspires the 1974 Privacy Act, which places
limits on federal agencies' sharing of personal information.

Computer scientist Tim Berners-Lee proposes leveraging the Internet, pioneered
by the U.S. government in the 1960s, to share information globally through a "hypertext"
system called the World Wide Web. "The information contained would grow past a
critical threshold," he writes, "so that the usefulness [of] the scheme would
in turn encourage its increased use."

"We are developing a supercomputer that will do more calculating in a second than a person with a hand-held calculator can do in 30,000 years." --U.S. President Bill Clinton

NASA researchers Michael Cox
and David Ellsworth use the term "big data" for the first time to describe a
familiar challenge in the 1990s: supercomputers generating massive amounts of
information -- in Cox and Ellsworth's case, simulations of airflow around
aircraft -- that cannot be processed and visualized. "[D]ata sets are generally
quite large, taxing the capacities of main memory, local disk, and even remote
disk," they write. "We call this the problem of big data."

After the 9/11 attacks, the U.S. government, which has already dabbled in mining
large volumes of data to thwart terrorism, escalates these efforts. Former
national security advisor John Poindexter leads a Defense Department effort to
fuse existing government data sets into a "grand database" that sifts through
communications, criminal, educational, financial, medical, and travel records
to identify suspicious individuals. Congress shutters the program a year later
due to civil liberties concerns, though components of the initiative are simply
shifted to other agencies.

The 9/11 Commission calls for
unifying counterterrorism agencies "in a network-based information sharing
system" that is quickly inundated with data. By 2010, the NSA's 30,000 employees will be
intercepting and storing 1.7 billion emails, phone calls, and other
communications daily. Meanwhile, with retailers amassing information on
customers' shopping and personal habits, Wal-Mart boasts a cache of 460
terabytes -- more than double the amount of data on the Internet at the time.

As social networks proliferate, technology bloggers and professionals breathe new
life into the "big data" concept. "This is a world where massive amounts of
data and applied mathematics replace every other tool that might be brought to
bear," Wired's Chris Anderson writes in "The End of Theory." Government
agencies, some of the United States' top computer scientists report, "should be
deeply involved in the development and deployment of big-data computing, since
it will be of direct benefit to many of their missions."

The Indian government establishes the Unique Identification Authority of India to
fingerprint, photograph, and take an iris scan of all 1.2 billion people in the
country and assign each person a 12-digit ID
number, funneling the data into the world's largest biometric database.
Officials say it will improve the delivery of government services and reduce
corruption, but critics worry about the government profiling individuals and
sharing intimate details about their personal lives.

President Barack Obama's administration launches data.gov as part of its Open
Government Initiative. The website's more than 445,000 data sets go on to fuel
websites and smartphone apps that track everything from flights to product
recalls to location-specific unemployment, inspiring governments from Kenya to
Britain to launch similar initiatives.

In response to the global financial crisis, U.N. Secretary-General Ban Ki-moon pledges to
create an alert system that captures "real-time data on the impact of the
economic crisis on the poorest nations." The U.N. Global Pulse program has
conducted research on how to predict everything from spiraling prices to
disease outbreaks by analyzing data from sources such as mobile phones and

"There were 5 exabytes of information created by the entire world between the dawn of civilization and 2003. Now that same amount is created every two days." --Google CEO Eric Schmidt

Scanning 200 million pages of information, or 4 terabytes of disk storage, in a matter of seconds, IBM's Watson computer system defeats two human challengers
on the quiz show Jeopardy! The New York Times later dubs this moment a "triumph
of Big Data computing."

The Obama administration
announces a $200 million Big Data Research and Development Initiative in
response to a U.S. government report calling for every federal agency to have a
"'big data' strategy." The National Institutes of Health puts a data set of the
Human Genome Project in Amazon's computer cloud, while the Defense Department
pledges to develop "autonomous" defense systems that can "learn from
experience." CIA Director David
Petraeus, marveling that the "'digital dust' to which we have access is being
delivered by the equivalent of dump trucks," discusses a post-Arab Spring
agency effort to collect and analyze global social media feeds through cloud computing.

U.S. Secretary of State
Hillary Clinton announces a public-private partnership called "Data 2X" to
collect statistics on women's and girls' economic, political, and social status
around the world. "Data not only measures progress -- it inspires it," she explains.
"Once you start measuring problems, people are more inclined to take action to
fix them because nobody wants to end up at the bottom of a list of rankings."

Let the Big Data race begin.

Sources for charts: International Data Corp., March 2012; Facebook SEC filing, April 2012.