Making sense of the web during a crisis
- 20 July 2010
- From the section Technology
In 2009, a series of deadly riots swept through the Ugandan capital of Kampala.
At least 10 people were killed during clashes between police and rioting supporters of a traditional king.
"It took a long time for the international media to pick up on the story, so I spent a long time following the Kampala hashtag on Twitter and checking my e-mail," said Jonathan Gosier, an American programmer living in the country.
As the riots intensified, information flooded the web, making it increasingly difficult to follow exactly what was going on, he said.
"I just thought there has to be a better way of doing this. I was looking at Twitter, e-mail, texts, blogs; I had 20 windows open because you don't know how big this could be.
"I was scared," the CEO of AppAfrica added.
One year on, Mr Gosier and the not-for-profit group Ushahidi are about to release SwiftRiver, a suite of tools that can intelligently crunch, process, filter, and verify the torrent of real-time news that spills on to the web every day and peaks during a crisis.
The ultimate goal is to mesh together thousands of pieces of information and spit out a single, unified and useful feed.
In essence, it attempts to separate the signal from the noise.
"We want to help people manage real-time data on the web more efficiently," he told BBC News.
Out of adversity
However, it took another crisis, more than 11,000km (7,000 miles) from Uganda, to bring the software into being.
In January 2010 a massive earthquake hit Haiti. Within an hour, volunteers at Ushahidi sprang into action.
The organisation has developed an online mapping tool that can be used to collect and plot reports and information coming in from citizens and organisations via e-mail, SMS, the web and Twitter.
These can then be used to direct and target humanitarian and rescue missions.
But the scale of the problem in Haiti began to overwhelm the group.
"They got more than 100,000 reports in four days," said Mr Gosier.
Two weeks later, teams of volunteers working around the clock had managed to process only half the messages.
"We had to do what we call report triage," Erik Hersman, one of the co-founders of Ushahidi, told BBC News.
"Figuring out which channels and sources were the most important is incredibly difficult. How do you know if you haven't missed something?"
At that moment, the idea for SwiftRiver was born.
The open source software consists of five web applications that filter a person's chosen streams of information in different ways.
Most are built using three technologies: so-called veracity algorithms that try to rate the trustworthiness of a source, machine learning, and natural language processing.
For example, one of the natural language tools processes messages and other text to extract relevant keywords with which to tag them.
Given a message such as "there has been an earthquake in Chile", the tool would extract "Chile" and "earthquake" as the key components.
These keywords can then be used to find other related content containing those words.
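To make the tagging step concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not SwiftRiver's code: a crude stopword filter stands in for real natural language processing, and the names `STOPWORDS`, `extract_keywords` and `related` are hypothetical.

```python
import re

# Hypothetical stopword list; a real tool would use proper language processing
# such as part-of-speech tagging or named-entity recognition instead.
STOPWORDS = {
    "a", "an", "the", "there", "has", "have", "been", "is", "was",
    "in", "on", "at", "of", "to", "and", "or", "for", "with",
}


def extract_keywords(message: str) -> set[str]:
    """Return the lower-cased words in a message that are not stopwords."""
    words = re.findall(r"[a-z']+", message.lower())
    return {word for word in words if word not in STOPWORDS}


def related(message: str, corpus: list[str]) -> list[str]:
    """Return the items in corpus that share at least one keyword with message."""
    tags = extract_keywords(message)
    return [text for text in corpus if tags & extract_keywords(text)]


msg = "there has been an earthquake in Chile"
print(sorted(extract_keywords(msg)))  # ['chile', 'earthquake']

feed = [
    "Aftershocks reported near Santiago, Chile",
    "Football results from the weekend",
    "Earthquake relief flights land",
]
print(related(msg, feed))  # keeps the first and third items
```

Matching on shared keywords is the simplest possible notion of "related"; it illustrates the idea but would miss synonyms and other languages.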
To refine the stream further, the same software can be used to filter out inaccuracies, falsehoods, and irrelevant information.
"That component is partially human-powered," said Mr Gosier.
"As you are looking at a stream of information, you are interacting with the system, saying that is a falsehood, that is inaccurate and so on," he added.
Over time, the system learns what information a person is looking for and begins to filter it more intelligently.
Combined with the initial tags, the software can crunch a torrent of information and spit out a more manageable flow.
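The feedback loop Mr Gosier describes resembles a simple online learning problem. The sketch below assumes, purely for illustration, that each message carries a set of keyword tags and that every human judgement nudges the weights of those tags up or down. The class `FeedbackFilter` and its parameters are hypothetical, and SwiftRiver's actual machine learning may work quite differently.

```python
class FeedbackFilter:
    """Hypothetical filter that learns keyword weights from human judgements."""

    def __init__(self, step: float = 1.0, threshold: float = 0.0) -> None:
        self.weights: dict[str, float] = {}  # keyword -> learned weight
        self.step = step                      # size of each adjustment
        self.threshold = threshold            # minimum score to stay in the feed

    def score(self, tags: set[str]) -> float:
        """Sum the learned weights of a message's tags (unknown tags count as 0)."""
        return sum(self.weights.get(tag, 0.0) for tag in tags)

    def feedback(self, tags: set[str], useful: bool) -> None:
        """Raise each tag's weight if marked useful, lower it if flagged as false or irrelevant."""
        delta = self.step if useful else -self.step
        for tag in tags:
            self.weights[tag] = self.weights.get(tag, 0.0) + delta

    def keep(self, tags: set[str]) -> bool:
        """Decide whether a message with these tags should stay in the stream."""
        return self.score(tags) >= self.threshold


f = FeedbackFilter()
f.feedback({"rumour", "celebrity"}, useful=False)
f.feedback({"earthquake", "rescue"}, useful=True)
print(f.keep({"celebrity", "gossip"}))  # False
print(f.keep({"rescue", "hospital"}))   # True
```

Because unseen tags score zero, new topics pass through until a person has judged them.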
"I think that humans are a very important part of the system, but you want to maximise their time," said Mr Gosier.
Other tools can be used to filter out duplicate messages.
This is particularly important for finding information on Twitter, where the original source of a message can become blurred as people repeat and retweet the message around the web.
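One common way to spot repeated tweets, sketched here as an assumption rather than a description of SwiftRiver, is to normalise each message by stripping "RT @user:" prefixes, links and punctuation, then keep only the earliest copy of each normalised text. The function names and the (timestamp, author, text) message format are hypothetical.

```python
import re


def normalise(text: str) -> str:
    """Reduce a message to a canonical form so retweets match their original."""
    text = text.lower()
    text = re.sub(r"^(rt\s+@\w+:?\s*)+", "", text)  # leading "RT @user:" markers
    text = re.sub(r"https?://\S+", "", text)         # links
    text = re.sub(r"[^a-z0-9\s]", "", text)          # punctuation (ASCII-only simplification)
    return " ".join(text.split())                    # collapse whitespace


def deduplicate(messages: list[tuple[int, str, str]]) -> list[tuple[int, str, str]]:
    """Keep the earliest copy of each message; items are (timestamp, author, text)."""
    first_seen: dict[str, tuple[int, str, str]] = {}
    for message in sorted(messages):  # oldest first, so the original wins
        key = normalise(message[2])
        if key not in first_seen:
            first_seen[key] = message
    return list(first_seen.values())


stream = [
    (5, "bob", "RT @alice: Bridge collapsed on Route 1 http://t.co/x"),
    (1, "alice", "Bridge collapsed on Route 1!"),
    (3, "carol", "Water needed at the stadium"),
]
print(deduplicate(stream))
```

Keeping the oldest copy also helps recover the original source of a message that has been retweeted many times.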
Other apps try to work out where the author of a message is located, so that information from an area of interest can be prioritised over other material.
For example, in the case of Haiti, messages that appear to originate from within the country could appear at the top of the list, as these may contain the most valuable or relevant information.
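A location-based ranking can be sketched with the haversine great-circle distance. This assumes, hypothetically, that some messages carry latitude and longitude; the centre point (roughly Port-au-Prince) and the 200km radius are illustrative values, not SwiftRiver settings.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def prioritise(messages: list[dict], centre: tuple[float, float], radius_km: float) -> list[dict]:
    """Move messages located within radius_km of centre to the top, keeping order otherwise."""

    def in_area(message: dict) -> bool:
        lat, lon = message.get("lat"), message.get("lon")
        if lat is None or lon is None:  # no location: cannot be prioritised
            return False
        return haversine_km(lat, lon, *centre) <= radius_km

    # False sorts before True, so in-area messages come first; sorted() is stable.
    return sorted(messages, key=lambda m: not in_area(m))


PORT_AU_PRINCE = (18.54, -72.34)  # approximate, for illustration only
inbox = [
    {"text": "Thoughts are with Haiti", "lat": 51.5, "lon": -0.13},
    {"text": "Need water in Carrefour"},
    {"text": "Trapped under rubble", "lat": 18.55, "lon": -72.30},
]
for m in prioritise(inbox, PORT_AU_PRINCE, radius_km=200):
    print(m["text"])
```

Note that the second message clearly concerns Haiti but carries no coordinates, so this naive approach does not promote it; inferring location from the text would need extra processing.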
The final two applications try to determine the trustworthiness of a piece of information.
One uses algorithms to assign a score to a source based on the quality of its information over time.
If a source - whether a blog, Twitter feed, or news organisation - is deemed consistently accurate, it will be scored more highly than more questionable sources and promoted further up the feed.
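The article does not say how these scores are calculated. As one hypothetical approach, a source's score could be its Laplace-smoothed accuracy rate, (accurate + 1) / (total + 2), so that a brand-new source starts at 0.5 and drifts up or down as its reports are checked. The `SourceScorer` class is an illustrative sketch, not SwiftRiver's veracity algorithm.

```python
from dataclasses import dataclass


@dataclass
class SourceRecord:
    accurate: int = 0  # reports later confirmed as accurate
    total: int = 0     # reports checked so far

    def score(self) -> float:
        # Laplace smoothing: an unchecked source scores 0.5 rather than 0 or 1.
        return (self.accurate + 1) / (self.total + 2)


class SourceScorer:
    """Hypothetical per-source trust scores built from checked reports."""

    def __init__(self) -> None:
        self.records: dict[str, SourceRecord] = {}

    def record(self, source: str, was_accurate: bool) -> None:
        rec = self.records.setdefault(source, SourceRecord())
        rec.total += 1
        if was_accurate:
            rec.accurate += 1

    def score(self, source: str) -> float:
        return self.records.get(source, SourceRecord()).score()

    def rank(self, items: list[tuple[str, str]]) -> list[tuple[str, str]]:
        """Sort (source, text) pairs so the most trusted sources come first."""
        return sorted(items, key=lambda item: self.score(item[0]), reverse=True)


scorer = SourceScorer()
for _ in range(9):
    scorer.record("news_agency", True)
scorer.record("news_agency", False)
for _ in range(3):
    scorer.record("rumour_blog", False)

reports = [("rumour_blog", "Dam has burst"), ("new_account", "Roads open"),
           ("news_agency", "Airport reopens")]
for source, text in scorer.rank(reports):
    print(f"{scorer.score(source):.2f}  {source}: {text}")
```

Smoothing stops a single early report from pinning a source at a perfect or zero score.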
Although SwiftRiver was originally developed to extract useful data from the web in times of crisis, it may have more general applications.
Media firms have shown interest in it for news gathering, whilst a public health organisation wants to use it to extract and share the best health information on the net.
Mr Gosier believes it could also help anyone struggling with information overload.
"You can use all of these tools and apply them just to Twitter, e-mail or SMS," said Mr Gosier.
"We are attempting to solve a problem that everyone has," he added.
In August, his team will release the first complete beta - or test - version of the software for anyone to use.
However, at launch, there will be a tension.
"Historically, every time that Ushahidi is deployed, it means that something bad has happened," said Mr Hersman.
"At least now, people will have tools to help deal with it."