'First drafts' of human protein catalogue published
- 28 May 2014
The first two attempts at a database of every single human protein - the "proteome" - have been made public.
This builds on our knowledge of the genome by showing which genes actually produce proteins in which tissues.
Some of the 17,000 to 18,000 reported proteins arise from stretches of DNA previously thought to be "non-coding".
Along the vast length of DNA packed inside each of our cells, our genes are the sections which contain the instructions, or code, for making proteins.
"While we have a good idea of what the genome looks like, we didn't know how many of those potentially 20,000 protein-coding genes would actually make protein," said Prof Bernhard Kuester, who led the German team at the University of Technology, Munich.
To find out, the researchers extracted all of the protein from many different samples of human tissues, as well as a number of cell lines. The proteins in that purified mixture were then chopped into small pieces and a technique called mass spectrometry revealed the sequence of amino acids forming each of those pieces.
With a lot of computing power and patience, these batches of protein fragments can be compared with the human genome to make a map, showing which genes in which tissues are "expressed" and producing protein.
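As a rough illustration of that matching step, the short Python sketch below builds a toy expression map. It is a simplification under stated assumptions, not either team's actual pipeline: the gene names, protein sequences, tissue labels and function names are all invented for the example, and real analyses match mass spectra against sequence databases with dedicated search software rather than comparing plain strings.

```python
from collections import defaultdict

# Hypothetical reference: gene name -> protein sequence predicted from the genome.
# These names and sequences are made up for illustration.
REFERENCE = {
    "GENE_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "GENE_B": "MLSPADKTNVKAAWGKVGAHAG",
    "GENE_C": "MGDVEKGKKIFIMKCSQCHTVE",
}

# Hypothetical peptide fragments identified in each tissue sample.
SAMPLES = {
    "foetal_gut": ["AYIAKQR", "TNVKAAW"],
    "adult_gut": ["TNVKAAW"],
}


def build_expression_map(reference, samples):
    """Return gene -> set of tissues in which one of its peptides was seen."""
    expression = defaultdict(set)
    for tissue, peptides in samples.items():
        for peptide in peptides:
            for gene, protein in reference.items():
                # Toy matching rule: the fragment occurs in the predicted protein.
                if peptide in protein:
                    expression[gene].add(tissue)
    return dict(expression)


def unobserved_genes(reference, expression):
    """Return predicted genes for which no protein evidence was found."""
    return sorted(set(reference) - set(expression))


expression_map = build_expression_map(REFERENCE, SAMPLES)
missing = unobserved_genes(REFERENCE, expression_map)
```

In this toy data, one gene shows up only in the foetal sample, one appears in both tissues, and one predicted gene has no protein evidence at all, which are the kinds of patterns the real databases are meant to expose.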
"This is the first inventory, if you like," Prof Kuester told BBC news, "like a dozen years ago with the first draft of the human genome."
And just like the results of the Human Genome Project, these data contain some surprises.
Both groups found hundreds of unexpected proteins, produced by fragments of ancient genes (called "pseudogenes") or by lengths of DNA that were not thought to be genes at all.
As well as the newcomers, there were notable absences. "We have good reason to believe that there are hundreds of known, annotated genes that perhaps are redundant," said Prof Kuester.
The team based in the US and India, led by Prof Akhilesh Pandey of Johns Hopkins University in Baltimore, found evidence for only 84% of the proteins that might be predicted from looking at the genome.
Prof Pandey told the BBC it was important to study the proteins themselves, as well as the genes that encode them.
He offered an example of how a researcher, investigating a particular gene, might use one of the new databases: "They can look at the expression and get clues about what it could be doing. For example if a protein is expressed in the foetal gut and not the adult gut, then they might think of some sort of developmental process."
The tissue-by-tissue breakdown could also help scientists trying to figure out the actions and side effects of drugs. By comparing the proteomes of various cancer cell lines, Prof Kuester and his team have already identified certain clusters of proteins that could increase or decrease sensitivity to cancer drugs.
Dr Kevin Mills, who uses proteomics to study rare diseases at the UCL Institute of Child Health, agrees that it is crucial to look "beyond genomics" at protein levels and how they vary.
"Genetics can't tell us everything," said Dr Mills, who was not involved in either study. "This is really important. We're not static - we're fluid and dynamic and our proteome changes continually."
Although they had seen each other's work at conferences, both Prof Pandey and Prof Kuester told BBC News they had "no idea" they were headed towards publishing simultaneously. They spoke on the phone last week after discovering that both of their studies would grace the cover of Nature.
"We never saw this as a race to be first," said Prof Kuester. "My interpretation is that when the time is right, somebody's going to just do it. And perhaps two people are going to do it!"
Prof Pandey compared today's joint publication to the first draft of the human genome, which was announced by two different teams in February 2001.
"Although both groups came up with similar numbers of genes, the actual list was different," he said. "We are likely to have less of that confusion, but we are definitely going to benefit from putting the two data sets together."