Big data: Should it come with a big health warning?

Pick a number between 1 and 100.

Got one? Good. Congratulations. Chances are that by plucking that number out of the ether you have done a better job than Google of predicting the percentage increase in the number of flu-like illnesses that will strike Americans over the next few weeks.

That's right. You, armed only with your puny brain, can outdo a multi-billion dollar corporation that employs some of the smartest people in the world.

This example might seem trivial, but many think it matters because of the status of Google Flu Trends (GFT), once seen as the shining example of the power of so-called big data.

The data it uses to make predictions about how many will be sneezing and wheezing a week or so ahead is drawn from search terms, blog entries and messages shared via social media - so-called unstructured data.

This is very different to the slow, structured stream of information gathered from forms filled in at surgeries and hospitals that, before the rise of big data, was how predictions were made.

And the problem is, GFT turned out not to be terribly accurate.

In a run of 108 weeks, GFT wrongly predicted the number of flu cases 100 times, a recent study revealed.

Sometimes its estimate was double the number of actual flu cases recorded by US doctors. Hence the reason anyone can do better by plucking a number out of thin air.

Yet this unstructured data humans put online is exactly the type of stuff that companies want to analyse when they kick off their own big data projects.

Many corporations are keen to use those garbled knots of human sentiment to monitor how their brands are faring online, and to tweak their operations accordingly when they spot commercial opportunities or potential PR disasters.

Before now, those giant data sets had been hard to unpick. GFT seemed to suggest that, with the right tools, they could yield all kinds of useful predictions.

Not only that, but those predictions could be uncovered quickly and cheaply.

Dirty data

Why did GFT go so wrong and what implications does this have for other big data projects?

"There's no such thing as clean and stable data," said statistician Kaiser Fung who has written extensively about the pitfalls that can dog big data projects.

What he means by "clean and stable" is that it is a mistake to think that the data Google gathered for GFT today is the same as it gathered last week, last month or last year.

Google regularly tweaks the algorithms it uses to index online life and, as a result, may be sampling very different things month to month, adding a degree of instability - spots of dirt as it were - to that dataset.

The same is true of any big data set gathered by anyone, he said.

All will be tainted in some way as they will miss out something simply because of the quirks of the underlying code used to parse and index web pages, social media messages and blog posts.

That will be particularly true if companies buy in their data from different sources and then treat it as all one corpus.

"I have never come across a complete data set," he said. "Often times the only reason why people believe their data is clean is because they have never looked at it."

Companies in possession of a huge corpus of data can assume that all the information they need is in it. Sadly, he said, this "N=all" assumption is wrong.

"It is much better to assume that the data has holes and flaws than it is to assume it is complete."

Any company starting a big data project would do better to look at the data they have gathered and clean it up before any analysis starts.
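
What "looking at the data" might involve can be sketched in a few lines of Python. This is only an illustration and not a method Mr Fung describes: the function name audit_records, the field names and the sample customer records are all hypothetical. It simply counts how many records are missing a value in each field and how many are exact repeats of an earlier record.

```python
from collections import Counter


def audit_records(records, required_fields):
    """Count missing values per field and exact duplicate records.

    Hypothetical helper for illustration only.
    """
    missing = Counter()
    seen = set()
    duplicates = 0
    for record in records:
        # A value counts as missing if the field is absent, None or empty.
        for field in required_fields:
            value = record.get(field)
            if value is None or value == "":
                missing[field] += 1
        # Two records are duplicates if every required field matches.
        key = tuple(record.get(field) for field in required_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "total": len(records),
        "missing": dict(missing),
        "duplicates": duplicates,
    }


# Made-up sample data, not drawn from any real project.
sample = [
    {"customer_id": 1, "region": "North", "spend": 120.0},
    {"customer_id": 2, "region": "", "spend": 75.5},
    {"customer_id": 3, "region": "South", "spend": None},
    {"customer_id": 1, "region": "North", "spend": 120.0},
]

report = audit_records(sample, ["customer_id", "region", "spend"])
print(report)
```

A report like this will not find every flaw, but it makes the holes visible before anyone assumes the data is complete.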

There are other good reasons for scrutinising that mass of information about customers, says Patrick James, a partner in consultancy Ernst and Young's consumer practice.

"There's a customer backlash about to happen," he says. "It's against the big part of big data."

More and more people are getting less and less happy about simply surrendering information and getting nothing in return, he maintains.

Increasingly, consumers and customers will attempt to hold back their data, limit what they share online or simply give the wrong answers when they sign up for a service or are quizzed about their life and habits, he believes.

The tens of thousands of people who filled in a form to make Google expunge their data from its index were evidence of that growing desire to disappear, says Mr James.

If this trend grows, it could mean data sets get skewed and become less useful for those big projects.

These early days of big data might prove to be its golden age.

"Data has never been cheaper than it has been today and it's only going to get more expensive," says Mr James.

Fast response

So, if data is not the key to a good project, what is?

"Too many big data projects are started by the IT departments in companies that want to play with new technologies like Hadoop," says Dr Laurie Miles, head of analytics at big data specialist, SAS.

"That's led to scepticism, because in the history of IT projects a lot of them have been failures."

Instead of the technology coming first, anyone embarking on a big data project needs to know why they are doing it before they sign off on any expenditure by the IT folks, he argues.

"A big data project is not going to deliver any benefit unless you focus on a specific problem."

That focus can stop a project running away with itself and ensure it produces results that impinge on a real business issue, he says.

Spotting fraudulent credit card use requires a very different approach to analysing the performance of elite rowers - SAS is helping with both.

"We analyse credit card data at the point of sale, and you need that quickly," says Dr Miles. "With British Rowing we have a couple of weeks to give them answers."

Knowing how quickly a response is needed can help define the technology needed to underpin that big data project.

"Often you do not need to spin up a massive IT infrastructure to make this work," he says. "That's just as well, as real time results are really expensive."


BBC © 2014