Birdsong is beautiful. No other animals have inspired quite so much literature and music as songbirds.

Does our love of these sweet tweets betray a deeper link between humans and birds? We are very distant relatives indeed, but could we share some fundamental aspects of language?

Charles Darwin was so struck by this thought that in 1871 he wrote in The Descent of Man: "The sounds uttered by birds offer in several respects the nearest analogy to language, for all the members of the same species utter the same instinctive cries expressive of their emotions; and all kinds that have the power of singing exert this power instinctively."

Today, there is growing evidence that humans and birds have more in common than the simple ability to produce lots of different sounds. In fact, we share brain structures and genes that are associated with speech. Some scientists now believe that birds may hold the key to a great mystery: how human language evolved.

This idea was put forward in 2013 by Shigeru Miyagawa, a linguist at the Massachusetts Institute of Technology in Cambridge, Massachusetts. He and his colleagues suggested that human language relies on two distinct systems, both of which had previously evolved in simpler animals.

The first system generates words. This "lexical" system is used by our primate relatives, such as chimpanzees.

The second system is "expressive". It creates patterns that don't include words, such as a tune that you hum. It's this system that Miyagawa says is similar to the one underlying birdsong.

Birds like zebra finches learn their songs when they are young, usually from their fathers, and continue to sing those same songs throughout their lives. The songs don't contain words: they are just tunes with a certain pattern.

Even nightingales, which are renowned for the complexity of their music, do not express meaning by singing. "A nightingale can sing up to 200 different songs," says Miyagawa. "But the purpose of these songs is pretty limited, usually to mate and also to assert territory… So each song doesn't have a particular meaning."

In humans, says Miyagawa, these two systems work together: the lexical system holds something like 60,000 words, and the expressive system assembles them into patterns. He calls his idea the "integration hypothesis".

"As far as we know, there are no animals, other than human beings, that have integrated the two systems," says Miyagawa. He and his colleagues have suggested that humans may even have started out with the ability to sing, like birds, and later worked words into the songs.

But the integration hypothesis is controversial. Birds and humans are quite separate on the evolutionary tree: our last common ancestor with birds lived over 250 million years ago, before the dinosaurs evolved.

So when linguists try to understand the origins of human language, they have tended to focus on our closest relatives, the primates.

According to many linguists, "we basically began with a lexical system, like monkeys, which use isolated utterances like 'snake', 'leopard' and 'eagle'," says Miyagawa. The idea is that these individual utterances got joined into two-word sentences, and eventually into the long wordy sentences we use today.

The problem with this "protolanguage" idea is that the leap from single words to sentences is a big one. We don't just string words together any old way: the meaning of a sentence depends on how its words are arranged. "Dog bites man" and "man bites dog" contain the same words but mean very different things.

"That's what makes human language unique," says Miyagawa. There appears to be an enormous jump from what primates can do to what humans can do. He believes that the integration hypothesis, by adding an expressive layer similar to birdsong, may help bridge this gap.

However, the hitch with the integration hypothesis is a lack of scientific proof. This is an issue for any idea that purports to explain where language came from. Unlike fossils that can tell us how our bodies evolved, there are no physical remnants of early language.

Instead, we need to look for analogues to human language elsewhere in the animal kingdom. If we can find biological traces of them, this could provide hard evidence to support or disprove our ideas about how language evolved.

The most obvious link between humans and birds is the ability to learn new sounds from others. As Darwin noted, young birds learn their songs from adults by imitation, and develop them into a song or repertoire of their own. Human infants go through a very similar process of vocal learning, first babbling and then developing this into words and sentences.

Songbirds are remarkably good at learning new calls. According to a 2009 study, they can learn the calls of other species, becoming "bilingual" or even "trilingual".

These abilities are underpinned, at least in part, by genes.

In 2001, Simon Fisher, now a director at the Max Planck Institute for Psycholinguistics in Nijmegen, the Netherlands, helped identify the FOXP2 gene. It was later dubbed "the language gene", because mutations in it are linked to speech problems in some people.

It has since been shown that birds and humans have this gene in common. According to a study published in December 2014, birds and humans may share more than 50 genes connected to speech and vocal learning.

However, birdsong expert Johan Bolhuis of Utrecht University in the Netherlands is sceptical that the origins of human language can be explained this way. He and many linguists, including the influential American linguist Noam Chomsky, think language comprises far more than speech. They point out that, in the absence of speech, humans use language in other ways, such as sign language.

These linguists believe that an evolutionary change in the human brain between 70,000 and 100,000 years ago sparked the birth of the complex, sophisticated form of language we use today. It coincides with the emergence of abstract thought, the production of jewellery and cave art. As such, it is uniquely human.

The feature that most clearly sets human language apart from the calls or songs of other species is grammar. In his theory of "universal grammar", Chomsky proposed that the capacity for grammar is innate in humans. On this view, very young children grasp grammatical rules instinctively, even when they don't understand all the words in a sentence.

One of Chomsky's examples is that the sentence "colourless green ideas sleep furiously" is grammatical in English, even though it is meaningless. By contrast, "furiously sleep ideas green colourless" is not even grammatical, because it breaks the basic rules of English word order.

These rules vary from language to language, but they are always there.

"There are no such rules in animals," says Bolhuis. "Even in songbirds they have complicated songs and there are certain simple rules for those songs, but they are nothing like the structure of human language. There's no indication whatsoever that any kind of other species has that."

Rule learning in zebra finches can be sophisticated, but it's not the same as human language, says Carel ten Cate of Leiden University in the Netherlands. "Very basic rules can be learned by all sorts of species," he says. "But there is a gradual increase in complexity and abstraction of rules which seems to be unique in humans. It's not clear exactly what level animals might reach in that hierarchy."

Even so, ten Cate says he has been surprised at the perceptual abilities of songbirds like zebra finches.

In 2014, he and his colleague Michelle Spierings found that zebra finches can pick up on changes in intonation in human speech.

She played the birds word-like sounds, with the emphasis either at the start or at the end. "They could learn which word was in which group," says Spierings. "If we gave them new sounds or words, with the same intonation, they would group it accordingly."

"For humans, intonation is very important and can change a sentence," says Spierings. "We never expected that these birds were so sensitive to these small changes in our speech."

These birds share many aspects of human language. The strange thing is that many animals far more closely related to humans lack these abilities.

Our closest living relatives are great apes like chimpanzees, bonobos and gorillas. "They seem to be very bad at language, and they're particularly bad at vocal learning processes," says Fisher. "So it can't be that there's a kind of ancestral state which binds us to birds because we're so distant from birds."

It seems likely that birds and humans have evolved vocal learning independently. This is called "convergence", and it's quite common. For example, birds, insects and bats are only very distantly related to each other, but each group has separately evolved the ability to fly.

The convergence towards vocal learning in birds and humans may be partly explained by genetics. Fisher says the FOXP2 gene, far from being unique to humans and birds, is present in many other species.

"FOXP2 has been around for a really long time," says Fisher. "One of the things it does is something in the brain that relates to sequencing of movements. So the idea is that FOXP2 was in the right place to be able to help these kinds of vocal learning processes to emerge."

Yet vocal learning emerged in only some species. Something must have pushed those species to use FOXP2 differently.

It may be something to do with the way birds and humans live. "Usually vocal learning develops in groups where individuals are long-lived and there is transmission of knowledge from one generation to the other," says ten Cate.

Bats, whales and dolphins are also known to show aspects of vocal learning, and Fisher suggests they may also provide promising avenues to better understand human language. Dolphins are famously smart, says ten Cate, which may suggest that intelligence helps language to evolve.

By pulling together the evidence from all these different animals, each with their own limited set of linguistic skills and reasons for using them, it may be possible to piece together how human language evolved.

"Whether we will ever solve that question, I don't know," says ten Cate. "It may be an unanswerable question. But at least we can narrow down the number of hypotheses."

Angela Saini is a London-based science journalist and broadcaster. You can listen to her documentary What the Songbird Said on the Radio 4 website.