His lab’s models also correctly predicted that the 2011 Mel Gibson movie The Beaver, about a man who speaks through a hand puppet, would bomb, as would Jim Carrey’s Mr. Popper’s Penguins. Neither received much pickup on social media, according to Chen.
The lesson learned from looking at social media data and movies is that it does not matter who the star is, how much you spend on the movie, or even how good the movie is: what matters is what people are saying about it online. “The movie industry is buzz,” Chen suggests. “It’s not content.”
Whilst this crowd approach works nicely for films, it starts to break down when applied to political events. Instead, researchers have to look for different types of data. This is where the “swarm” comes in, says Gloor. The swarm is a group of more neutral experts, such as those who regularly edit Wikipedia. “There are lots of Wikipedians, but just two or three thousand do most of the work,” he says. “We track those, how well respected they are, who’s editing what.”
In the case of the Republican primaries, Gloor’s analysis in December noted that even though the masses on Twitter seemed to indicate a Gingrich victory, the swarm on Wikipedia was pointing to Romney.
While those working in predictive analytics acknowledge that the wealth of information provided by social media is important, they are circumspect about its ability to be applied to all areas. Some events cannot be predicted well using social media, namely those which people simply don’t talk about online. “We probably cannot find any crimes,” says Gloor. “They will not be discussed in public.”
Leetaru is even more wary of relying too heavily on social media to make predictions, arguing that in many cases even seemingly public events, like protests, have a hidden side to them. “If you look at the UK riots [in 2011], the first thing everyone said was [look at] Facebook and Twitter. But when they checked further, they realised that actually the rioters were using encrypted peer-to-peer BlackBerry messages.”
While Leetaru is also involved in forecasting social and political events, his current work focuses more on culling information from traditional media, including a retroactive analysis of news reports, which he says located Osama bin Laden’s hideout within a radius of 125 miles (200km).
Social media, while providing a wealth of data, does not necessarily provide first-hand information, or better information. “All the work that is coming out seems to suggest that social media is more of a sounding box,” says Leetaru. “Something happens and social media reports on it.”
For example, in looking at Kenyan election violence several years ago, Leetaru suggests that most of the social media messages coming out of the country were not necessarily first-hand reports about what people personally saw, but rather people retweeting or rebroadcasting news media reports. “So it wasn’t that they were reporting that they saw a tank heading down the street, they were basically using it almost as a real-time bulletin board,” he says.
In many of these more complex cases, researchers are combining social media with news reports and other public data to help hone their forecasts. For example, Swedish-American firm Recorded Future, now based in Cambridge, Massachusetts, trawls hundreds of thousands of pieces of data, from government filings to social media, hunting for clues to the future.
According to Christopher Ahlberg, the CEO of the firm, the company’s proprietary software grew out of a decade of previous work on large datasets. He and his coworkers grew interested in looking at ways to organise data in a time-based manner, allowing people to do Google-like searches on future events.