Who, What, Why: How do you reassemble shredded documents?
- 6 December 2011
- From the section Magazine
Governments and businesses have long used shredders to destroy sensitive documents. How easy is it to reassemble the pieces?
Almost every office has one - a document shredder and a bin filled with strips of paper fit for the bottom of a birdcage.
But in wartime, the shredded pages found in a captured bunker or command post could contain intelligence, if the thousands of pieces could be reassembled.
After Iranian students seized the US embassy in Tehran in 1979, they spent years painstakingly reassembling the intelligence reports and operational accounts shredded by the CIA officers who were the last workers captured.
Now, a team of computer programmers from California have developed software they say shows that computers can, in theory, do most of the hard work.
It works by matching up individual shreds based on minuscule clues in each shred - the contour of the tears, a barely visible watermark, and traces of writing, for instance - and can work far faster than a human undertaking the same task.
It was the successful entry in a document shredder competition launched this autumn by the US military, in an attempt to encourage research on what is essentially a maths problem - how to assemble a puzzle efficiently.
In October, the Defense Advanced Research Projects Agency (Darpa), the Pentagon's research arm, offered $50,000 (£31,961) to the first team to reassemble five shredded hand-written documents and answer the puzzles contained in each of them.
"Any time you're in conflict or in war and you were to take over a building or a compound, it wouldn't be terribly surprising to have the enemy try to destroy or shred their documents," says Dan Kaufman, director of Darpa's information innovation office.
"How can we quickly put them together and get some value and try to save some lives?"
A decent commercial shredder can reduce a sheet of paper to more than 400 pieces. That yields a total of 1,276,800 possible two-piece combinations - for one single-sided sheet.
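The article does not say how that figure was worked out. One reading that reproduces it, offered here only as an assumption, is to count every unordered pair of 400 pieces and then every way one of four edges on the first piece could meet one of four edges on the second. The function name and the four-edge default below are illustrative, not taken from Darpa or the winning team.

```python
from math import comb


def two_piece_combinations(pieces: int, edges_per_piece: int = 4) -> int:
    """Count candidate joins between two pieces.

    Assumption: any edge of one piece may be tried against any edge
    of the other, and the order of the two pieces does not matter.
    """
    piece_pairs = comb(pieces, 2)                        # unordered pairs of pieces
    edge_pairings = edges_per_piece * edges_per_piece    # edge-to-edge choices per pair
    return piece_pairs * edge_pairings


print(two_piece_combinations(400))  # 1276800
```

Under this reading there are 79,800 pairs of pieces and 16 edge pairings for each, which multiplies out to the quoted total.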
Most office documents are a lot longer, many are printed on both sides, and the bin containing the shreddings could hold the remnants of hundreds of pages.
The last embassy workers captured by the Iranian students on 4 November 1979 included a team of CIA officers who had locked themselves in a vault in order to burn and shred sensitive embassy documents, says Malcolm Byrne, deputy director of the National Security Archive, a research organisation at George Washington University.
"When those guys gave themselves up, they left all the stuff in there thinking, 'okay, we've done our job,'" he says.
Instead, the Iranians laid the shreds out on a floor and devised a sophisticated procedure for numbering, indexing and reassembling the individual shreds, Mr Byrne says.
"Certainly it took a number of years for them to finish the process," he says.
The security forces later published the reconstructed documents in book form and sold copies all over Tehran, he says. And agents used the intelligence they gathered to identify and kill CIA collaborators.
The Darpa competition opened on 27 October, and more than 9,000 teams entered from across the US.
Each of the five shredded documents was presented online in high-quality digital format. Some documents were more than a single page and some had pieces missing.
The winning team was a group of California computer programmers led by Otavio Good, a former video game developer.
He and his partners developed software that analysed the digital images of the shredded documents, using a concept called computer vision.
"We get the computer to look at where the ink is on the page and the shape of tear on the page," says Good, 37.
To reconstruct the document, a human user clicks on an individual piece that has been ingested into the software, then selects which side of the piece to check for a match. The software then recommends possible matches from the remaining unmatched pieces.
This continues until all the pieces have been matched up.
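The recommend-then-verify loop described above can be sketched in a few lines. This is not the team's software, whose internals the article does not describe. It is a minimal illustration that assumes each edge of a shred has already been reduced to a short list of ink-darkness samples, and that a good match is one where the facing edges have similar samples. The names `Shred`, `mismatch` and `recommend` are hypothetical.

```python
from dataclasses import dataclass

# Which side of a candidate piece faces the side the user selected.
OPPOSITE = {"left": "right", "right": "left", "top": "bottom", "bottom": "top"}


@dataclass
class Shred:
    name: str
    # side -> ink-darkness samples along that edge (an assumed feature,
    # standing in for tear contour, watermark and writing traces)
    edges: dict[str, list[float]]


def mismatch(a: list[float], b: list[float]) -> float:
    """Mean absolute difference between two edge profiles; lower fits better."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def recommend(piece: Shred, side: str, unmatched: list[Shred], top_n: int = 3) -> list[str]:
    """Rank the unmatched pieces by how well they fit the selected side."""
    target = piece.edges[side]
    facing = OPPOSITE[side]
    ranked = sorted(unmatched, key=lambda s: mismatch(target, s.edges[facing]))
    return [s.name for s in ranked[:top_n]]


# Toy data: the user clicks piece "a" and asks for matches on its right side.
a = Shred("a", {"right": [0.0, 1.0, 1.0, 0.0]})
b = Shred("b", {"left": [0.0, 0.9, 1.0, 0.1]})
c = Shred("c", {"left": [1.0, 0.0, 0.0, 1.0]})
d = Shred("d", {"left": [0.5, 0.5, 0.5, 0.5]})

print(recommend(a, "right", [c, d, b]))  # ['b', 'd', 'c']
```

The sketch stops at the recommendation. In the workflow Good describes, a person then accepts or rejects the top suggestion, and the accepted piece leaves the unmatched pool before the next click.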
"The process was more about having a human verify what the computer was recommending," he says.
It took the team, called All Your Shreds Are Belong to US, about a month to develop and revise the software, and Good estimates they spent about 600 man-hours on the programming and eventual solution of the puzzles.
"We basically spent every hour outside of work, working on this for a month," Good says.
"And we did it for the competition. If we were doing it for the money we would have come out a lot lower than we could have just doing contract work."
Good says the software the team developed has little potential as an off-the-shelf product for use by the world's militaries and intelligence agencies.
"I would call what we did a proof of concept," he says. "We put this stuff together very quickly... and it's very specifically tailored to each puzzle."
The Darpa documents were far simpler and neater than in a real case. All the sheets were single-sided, and all the pieces were laid right side up in an orderly fashion to make them easier to work with.
And the shreds were unmarred by, say, smoke, mud or blood, unlike pieces captured in the field.
Good says he approached the competition as a programming challenge and was "neutral" about the fact he was using his skills potentially to aid spies and soldiers.
"What we've done here is we've set the bar for where the security's at," he says.
"A lot of these shredders are maybe not as secure as you thought, and maybe you should get a better shredder if you want these really and truly not to be assembled."