Crowds 'could be counted' with phone and Twitter data
It may be possible to estimate the size of a large crowd based on geographical data from mobile phones and Twitter, according to a new study.
Warwick University researchers studied geo-tagged tweets and mobile phone use over a two-month period in Milan.
In two locations with known visitor numbers - a football stadium and an airport - these activities rose and fell in close step with the flow of people.
The team said the approach could make it possible to measure attendance at events such as protests.
Other researchers emphasised that there were limitations and biases within this type of data - for example, only a subset of the population uses smartphones and Twitter and not all areas are well served by mobile phone towers.
But the study's authors said the results were "a very good starting point" for making more of these estimates - with greater accuracy - in the future.
"These are the numbers - the calibration examples - that we can draw on," said co-author Dr Tobias Preis. "Obviously it would be even better if there were training examples in other countries, other environments, other time periods. Human behaviour is not constant around the globe.
"But it's a very, very good base to build on, to provide initial estimates."
The work, published in the journal Royal Society Open Science, is part of a big and growing research field exploring what online activity can reveal about human behaviour and other real-world phenomena.
Federico Botta, the PhD student who led the analysis, said the mobile phone-based approach had advantages over other methods for estimating crowd sizes.
"This is very quick," Mr Botta told BBC News. "It does not rely on human judgement, it only relies on having the data related to mobile phones, or Twitter activity."
Margin of error
Working with two months of mobile data provided by Telecom Italia as part of its Big Data Challenge, Mr Botta and his colleagues concentrated on Linate Airport and the San Siro football stadium, home to both AC Milan and Inter Milan.
They compared the number of people known to be in those two places, based on flight schedules and football ticket sales, with three measures of mobile phone activity: the number of phone calls and text messages, the amount of internet use, and the volume of tweets.
"What we saw was that indeed these activities… had a strikingly similar behaviour to the number of people," Mr Botta said.
This may not seem surprising - but particularly within the football stadium, the team saw such a reliable pattern that they could make predictions.
There were 10 football matches during the analysis period. Using data from any nine of those games, the researchers could predict how many people were at the tenth purely from mobile internet data.
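The hold-one-game-out procedure can be sketched in a few lines of Python. This is only an illustration: the article does not say which model the researchers fitted, so the straight-line fit here is an assumption, and the activity and attendance figures are invented placeholders rather than the study's data. The function names are hypothetical too.

```python
# Hypothetical (mobile_internet_activity, attendance) pairs for 10 matches.
# These values are invented for illustration; they are NOT the study's data.
matches = [
    (410.0, 31000), (520.0, 38500), (610.0, 47000), (450.0, 35000),
    (700.0, 52000), (480.0, 36500), (660.0, 49500), (390.0, 29000),
    (560.0, 43000), (740.0, 56000),
]


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_predictions(points):
    """Predict each match's attendance from a line fitted to the other matches."""
    predictions = []
    for i, (held_out_activity, _) in enumerate(points):
        training = points[:i] + points[i + 1:]  # every match except match i
        slope, intercept = fit_line(training)
        predictions.append(slope * held_out_activity + intercept)
    return predictions


predictions = leave_one_out_predictions(matches)
for (activity, actual), predicted in zip(matches, predictions):
    print(f"activity={activity:6.1f}  actual={actual:6d}  predicted={predicted:9.0f}")
```

Each prediction uses only the other nine matches, so the held-out game never influences the line used to estimate it.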
But how accurate were they?
"Our mean absolute percentage error is about 13%," Mr Botta said. "That means that our estimate and the actual number of people differ in absolute value by roughly 13%."
This is pretty good, the team said, compared with traditional techniques that rely on images, grids and human judgement. They pointed to the famous example of the 1995 "Million Man March" in Washington DC, where even the most careful analysis could only produce an estimate to within 20% - after initial statements varied wildly from 400,000 to 2 million.
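For readers unfamiliar with the 13% figure, mean absolute percentage error is the average gap between estimate and true value, expressed as a share of the true value. A minimal Python sketch follows; the function name and the two-match example are illustrative assumptions, not figures from the study.

```python
def mean_absolute_percentage_error(actual, estimated):
    """Average of |estimate - actual| / actual over all events, as a percentage."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(e - a) / a for a, e in zip(actual, estimated))
    return 100.0 * total / len(actual)


# Invented example: one estimate 10% too high, one 10% too low.
example_error = mean_absolute_percentage_error([100, 200], [110, 180])
print(f"MAPE: {example_error:.1f}%")
```

Because the errors are taken in absolute value, overestimates and underestimates do not cancel each other out.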
Co-author Dr Suzy Moat said the football stadium results were even better than the team had hoped.
"This is the kind of thing you really hope you'll find, and you're not normally lucky enough to see," she told the BBC. "It's really striking that we're seeing quite such a close correspondence between the telecommunications data and the crowd size estimates."
Dr Ed Manley is a lecturer at the Centre for Advanced Spatial Analysis at University College London. He said this technique had a lot of potential and that people should be "optimistic but cautious" about using phone data in this way.
"We've got these massive data sets and there's a lot to be done with them... But we need to remain cautious about how far we push the data," Dr Manley said.
He explained that it would be important to remember that data from smartphones would not measure the population evenly.
"There are potentially biases there. Who are we measuring in these data sets?" Twitter, he noted as an example, has a relatively young and affluent user base.
Dr Manley also said it would be important to choose the measurement carefully because people use mobiles for different things in different places - perhaps more for phonecalls at the airport and for tweets at the football, for example.
And crucially, the whole analysis hinges on mobile phone signal - which is notoriously variable from place to place, and prone to dropping out.
"If we're relying on these data sets to tell us where people are, what happens when we have a problem with the way that data is collected?" Dr Manley said.