In 1975, the US Congress held a series of hearings to discuss a frightening new surveillance technology they feared would take away individual liberties. “Technological developments are arriving so rapidly and are changing the nature of our society so fundamentally that we are in danger of losing the capacity to shape our own destiny,” said John Tunney, a senator from California, during the opening of the first hearing. Lawmakers expressed fears that the new system would enable the federal government to amass large amounts of data on private citizens, cross-linking traffic violations with tax returns.
What was it that NBC News described as a “secret electronic intelligence network that gives the White House, the CIA, and the Defense Department Instant access to computer files on millions of Americans?” Arpanet, the then nascent computer network developed by the Pentagon’s research agency, which eventually led to the modern day Internet.
Now, almost four decades later, the US government is facing a new wave of criticism over a controversial programme that connects the data dots. Some of the issues being raised echo those heard in 1975; people are again asking whether the ability to collect massive amounts of data on human society compromises privacy.
At issue are recent revelations by the Guardian newspaper that the US government was collecting wide-ranging “telephony metadata” on its customers, including numbers called, time and duration of calls, and other identifying information. That was quickly followed by further revelations that the National Security Agency and FBI are also mining Internet traffic – everything from email to video chats.
The idea behind collecting such information is that modern data science can navigate a sea of information to alert law enforcement and intelligence agencies to any potential terrorist networks. But the debate is also about a broader issue: the emerging science of “big data,” which allows you to see connections and trends in large masses of information, and perhaps even predict types of behaviour.
“Big data is very useful, it can be incredibly revealing,” says Julian Sanchez, a fellow specialising in privacy and technology at the Washington DC-based Cato Institute. But the flip side, he says, is that big data can also be very revealing when it comes to what we think is private information.
The government’s argument is that the data it’s collecting is largely anonymised, and thus does not encroach on individual privacy. But that presumption rests on the idea that anonymised data truly is anonymous. Recent studies show that this may not be the case.
In a study published in Nature earlier this year, a group of researchers looked at the phone records of some 1.5 million mobile phone users in an undisclosed small European country, and found it took only four different data points on the time and location of a call to identify 95% of the people. “Hence, even coarse datasets provide little anonymity,” they concluded.
As soon as you have some piece of information, such as gender, language, or zip code, you can start to identify people, according Vincent Blondel, a professor at the Université catholique de Louvain, in Belgium, and one of the study’s authors. “It’s very hard to make things anonymous,” he says.
If current projections are correct, there’s soon going to be much, much more data available. Abe Usher, chief innovation officer at US-based data and cybersecurity firm HumanGeo Group, points out that there will be more mobile devices than people by 2014, which creates what he call a “human sensor network.” While such information can be used by the private sector to target consumers, such as to up-sell someone a particular computer, Usher says “the real value is in understanding the aggregates and macrotrends” – the kind of information that could tell a large retailer where to place a store to maximise profits.