Big data: Should it come with a big health warning?

Pick a number between 1 and 100.

Got one? Good. Congratulations. Chances are that by plucking that number out of the ether you have done a better job than Google of predicting the percentage increase in the number of flu-like illnesses that will strike Americans over the next few weeks.

That's right. You, armed only with your puny brain, can outdo a multibillion-dollar corporation that employs some of the smartest people in the world.

This example might seem trivial, but many think it matters because of the status of Google Flu Trends (GFT), once seen as the shining example of the power of so-called big data.

The data it uses to make predictions about how many will be sneezing and wheezing a week or so ahead is drawn from search terms, blog entries and messages shared via social media - so-called unstructured data.

This is very different from the slow, structured stream of information gathered from forms filled in at surgeries and hospitals, which was how predictions were made before the rise of big data.

And the problem is, GFT turned out not to be terribly accurate.

A recent study found that, over a run of 108 weeks, GFT got the number of flu cases wrong 100 times.

Sometimes its estimate was double the number of actual flu cases recorded by US doctors. That is why a number plucked out of thin air can do better.

Yet this unstructured data humans put online is exactly the type of stuff that companies want to analyse when they kick off their own big data projects.

Many corporations are keen to use those garbled knots of human sentiment to monitor how their brands are faring online, and to tweak their operations accordingly when they spot commercial opportunities or potential PR disasters.

Before now, those giant data sets had been hard to unpick. GFT seemed to suggest that, with the right tools, they could be made to yield all kinds of useful predictions.

Not only that, but those predictions could be uncovered quickly and cheaply.

Dirty data

Why did GFT go so wrong and what implications does this have for other big data projects?

"There's no such thing as clean and stable data," said statistician Kaiser Fung, who has written extensively about the pitfalls that can dog big data projects.

By "clean and stable" he means that it is a mistake to think the data Google gathered for GFT today is the same as the data it gathered last week, last month or last year.

Google regularly tweaks the algorithms it uses to index online life and, as a result, may be sampling very different things month to month, adding a degree of instability - spots of dirt as it were - to that dataset.

The same is true of any big data set gathered by anyone, he said.

All will be tainted in some way, because quirks in the underlying code used to parse and index web pages, social media messages and blog posts mean something will always be missed.

That will be particularly true if companies buy in their data from different sources and then treat it as all one corpus.
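To make the idea of unstable data concrete, here is a minimal Python sketch, not anything Google or Mr Fung has published, of one way to notice that a data feed has shifted: compare how often each term appears in two snapshots taken a month apart, using total variation distance. The snapshots, the terms and the 0.2 alert threshold are all hypothetical.

```python
from collections import Counter


def term_shares(terms):
    """Return each term's share of all observations in one snapshot."""
    counts = Counter(terms)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def total_variation(a, b):
    """Total variation distance between two share distributions (0 = identical, 1 = no overlap)."""
    keys = set(a) | set(b)
    return 0.5 * sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)


# Hypothetical snapshots of the same keyword feed, a month apart
last_month = ["flu", "fever", "cough", "flu", "cold", "fever"]
this_month = ["flu", "flu", "basketball", "flu", "cough", "flu"]

drift = total_variation(term_shares(last_month), term_shares(this_month))
print(f"drift = {drift:.2f}")
if drift > 0.2:  # hypothetical alert threshold
    print("Warning: this month's sample looks different from last month's")
```

With these made-up snapshots the script prints `drift = 0.50` followed by the warning: the feed now contains a term it never had before, so a model tuned on last month's mix would be reading a different sample.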

"I have never come across a complete data set," he said. "Often times the only reason why people believe their data is clean is because they have never looked at it."

Companies in possession of a huge corpus of data are tempted to assume that all the information they need is in it. Sadly, he said, this "N=all" assumption is wrong.

"It is much better to assume that the data has holes and flaws than it is to assume it is complete."

Any company starting a big data project would do better to look at the data it has gathered and clean it up before any analysis starts.
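Looking at the data first can be as simple as counting the gaps. Below is a minimal Python sketch, assuming records arrive as dictionaries with hypothetical field names, that reports the share of records missing each expected field before any modelling begins.

```python
def missing_report(records, fields):
    """Share of records in which each expected field is absent, None or empty."""
    report = {}
    for field in fields:
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        report[field] = missing / len(records) if records else 0.0
    return report


# Hypothetical customer records: one has a blank age, one no age key at all
customers = [
    {"id": 1, "age": 34, "postcode": "AB1 2CD"},
    {"id": 2, "age": None, "postcode": ""},
    {"id": 3, "postcode": "EF3 4GH"},
    {"id": 4, "age": 51, "postcode": "IJ5 6KL"},
]

report = missing_report(customers, ["id", "age", "postcode"])
for field, share in report.items():
    print(f"{field}: {share:.0%} missing")
```

For these made-up records it prints `id: 0% missing`, `age: 50% missing` and `postcode: 25% missing`. Treating an empty string as missing is a design choice; a real project will have its own placeholder values to look for.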

There are other good reasons for scrutinising that mass of information about customers, says Patrick James, a partner in consultancy Ernst & Young's consumer practice.

"There's a customer backlash about to happen," he says. "It's against the big part of big data."

More and more people are getting less and less happy about simply surrendering information and getting nothing in return, he maintains.

Increasingly, consumers and customers will attempt to hold back their data, limit what they share online or simply give the wrong answers when they sign up for a service or are quizzed about their life and habits, he believes.

The tens of thousands of people who filled in a form asking Google to expunge their data from its index were evidence of that growing desire to disappear, says Mr James.

If this trend grows, it could mean data sets get skewed and become less useful for those big projects.

These early days of big data might prove to be its golden age.

"Data has never been cheaper than it has been today and it's only going to get more expensive," says Mr James.

Fast response

So, if data is not the key to a good project, what is?

"Too many big data projects are started by the IT departments in companies that want to play with new technologies like Hadoop," says Dr Laurie Miles, head of analytics at big data specialist SAS.

"That's led to scepticism, because in the history of IT projects a lot of them have been failures."

Instead of the technology coming first, anyone embarking on a big data project needs to know why they are doing it before they sign off on any expenditure by the IT folks, he argues.

"A big data project is not going to deliver any benefit unless you focus on a specific problem."

That focus can stop a project running away with itself and ensure it produces results that bear on a real business issue, he says.

Spotting fraudulent credit card use requires a very different approach from analysing the performance of elite rowers, and SAS is helping with both.

"We analyse credit card data at the point of sale, and you need that quickly," says Dr Miles. "With British Rowing we have a couple of weeks to give them answers."

Knowing how quickly a response is needed can help define the technology required to underpin a big data project.

"Often you do not need to spin up a massive IT infrastructure to make this work," he says. "That's just as well, as real-time results are really expensive."

Copyright © 2015 BBC.