Protein-coding genes make up just 1.5% of our genome; the rest includes a lot of what is thought to be useless junk with no discernible function. But it also contains regulatory sequences that control when, where and how our genes are used. We need to identify these if we’re ever to predict how a genome leads to a living, breathing organism. The technology for doing that is being developed, and the ENCODE project – the Encyclopaedia of DNA Elements – has put it to good use, compiling a catalogue of the various regulatory sequences in our own genome. But ENCODE involved 442 scientists intensely running experiments for a decade, and even its unprecedented catalogue is incomplete.
And even if we had all this information – every gene, protein structure and regulatory sequence – we’d still need to figure out how it all works together, and how it interacts with its environment. We would need patterns: when and where different genes are activated as an organism develops. We would need timings: how quickly chemical reactions take place in a cell, and how proteins speed up those reactions.
Here, our metaphors let us down. Science writers like to compare the genome to a textbook or a blueprint. That conveys the fact that it stores information, but glosses over its buzzing, dynamic nature – proteins docking on and off to control the activity of genes, huge stretches of DNA that fold and unfold to reveal or hide their sequences, parasitic jumping genes that copy themselves and hop throughout the genome... None of our information stores – not sheet music, not recipe books – are this intricate.
This hasn’t stopped some scientists from trying to simulate this intricacy. In July, Covert announced that he had created a rough simulation of an entire organism – a single-celled microbe called Mycoplasma genitalium. Covert’s model simulates how all of the bacterium’s 525 genes are used, the proteins they produce, how quickly the proteins act, how they interact, and more. It is not completely accurate, but it captures much of M. genitalium’s lifestyle. Two colleagues wrote that the project “should be commended for its audacity alone”.
Still, the simulation was hard-won. At 525 genes, M. genitalium has the smallest genome outside of viruses (humans have 20,000 to 25,000 genes, by comparison), pared down to extreme minimalism by its life as a parasite. It may be one of the simplest living things we can imagine, but modelling this microbe still took around 1,900 experiments and a lot of borrowed knowledge. “Around half of our model comes from experiments that were done in other bacteria,” says Covert. “There’s no way [the genome] would have been predictive by itself.”
Covert also needed to factor in M. genitalium’s environment. It lives only in the stable environment of our urethra, with no light and a steady temperature. “But even then, it occasionally sees the immune system coming after it and there’s no way of modelling that,” says Covert.
The influence of the environment becomes even more crucial for more complex, free-ranging organisms. Temperature and acidity affect how proteins behave. The food that an organism consumes, the infections that plague it and the competitors it interacts with all affect how it develops and how its genes are used. Many of these factors leave marks on the genome itself – “epigenetic” tags that dictate the deployment of genes, and can be passed on to the next generation. The environment clearly matters. When making predictions from a genome, the elephant in the room is the room.