Libraries to store all UK web content


Richard Gibby from the British Library says there is a common belief that the average web page lasts just 75 days

Millions of tweets, Facebook status updates and even a blog about a bus shelter in Shetland are to be preserved for the nation.

The British Library and five other "legal deposit libraries" have the right to collect and store everything that is published online in the UK.

It is estimated around a billion pages a year will be available for research.

It follows 10 years of planning and will also offer visitors access to material currently behind paywalls.

The other institutions involved are the National Libraries of Scotland and Wales, the Bodleian Libraries in Oxford, the University Library, Cambridge and the Library of Trinity College, Dublin.

The archive will cover 4.8 million websites and will include magazines, books and academic journals as well as alternative sources of literature, news and comment such as Mumsnet, the Beano online, Stephen Hawking's website, and the unofficial armed forces' bulletin board, ARRSE.

Ben Sanderson from the British Library said that while people may think information on the web lasts forever, huge amounts of research material have already disappeared.

He added the public had already "lost a lot of the material that was posted by the public during the 7/7 bombings".

MPs' blog sites have also been lost following a death or an election defeat.

Top 100 websites

Mr Sanderson explained that with much of public life having migrated to the online world, material that is now published physically gives only a part of the story and debate within modern Britain.

He said: "It will be impossible to tell, for instance, the story of the 2015 general election without accessing what appears on the web."

The new databases will cover all areas of interest. For example, the website Style Scout - a fashion blog documenting London street fashion - will give historians a snapshot of what people were wearing in 2013.

As part of the launch of the process, the British Library has commissioned a survey of the top 100 websites that ought to be preserved for historians and researchers.

Among the sites recommended for preservation are eBay, Facebook, Twitter, Tripadvisor and Rightmove.

Some lesser-known ones include the Anarchist Federation, the Dracula Society and The Dreamcast Junkyard - a blog dedicated to the community of gamers who continue to play Dreamcast games online, despite the fact they were officially discontinued in 2002.

The British Library is also asking for advice from the public as to which websites should be preserved to give an accurate picture to future generations.

Jim Killock, executive director of the Open Rights Group, told the BBC News website: "The idea of the British Library preserving published content from UK websites is a great one.

"My concern is that a lot of Facebook comments are public and people don't realise they're publishing to the world. That's Facebook's fault, not the British Library's - their user settings need to be changed in line with people's expectations.

"Twitter, on the other hand, is avowedly public - it's very clear you're publishing to the world."




    Comment number 58.

    About time someone started this sort of record keeping. It's all very well thinking that everything placed on the web is there forever, but how many people today remember Fidonet, the amateur global online network that preceded the web? It had the precursors to email and newsgroups, but it must be said it was better monitored and supervised than newsgroups.
    It's mainly gone now, with all its records.

    Comment number 50.

    Surely, by nature of it being on the internet (unless the website is closed down or the hosting server fails), it will be available to the public for the rest of time anyway? Dig deep enough and the vast majority of things that are posted online are there for anyone to access, so why waste money (that's supposedly really tight for libraries these days) on storing them offline elsewhere?

    Comment number 35.

    I already use this service to archive the output on my research blog, and a number of tutorial websites that I manage. The only other way to get a library to archive this is to spend lots of public money, to get it published.

    It's not just about celebrity tweets; the use I make of the web may only be of interest to a few people, but this does not make it unimportant. Good news all told!

    Comment number 34.

    A good rule of thumb is to assume that everything you post on a public medium such as the internet is going to be stored and accessible forever. People need to learn not to post anything online that they don't want in public. As for the project, aren't there already various archives that perform the same function? This seems like a duplication of effort.

    Comment number 27.

    If every flippant comment ever made by an individual is on file for all to see, what happens if his attitudes change or he makes one ill-advised statement? Surely people have the right to 'be forgotten', as it were? I find this pretty outrageous.


Copyright © 2015 BBC. The BBC is not responsible for the content of external sites.
