Emoji are already butting their heads with traditional words, but could they take over completely? Linguist Neil Cohn casts his expert eye over the pictures taking the world by storm.

The year 2015 could be called the year of the emoji. They have landed a teenage boy in a police cell and prompted Vladimir Putin’s wrath in Russia, and the loveable smiley faces are even set to come to life in their own Hollywood film. Emoji now appear in around half of all sentences on sites like Instagram, and Facebook looks set to introduce them alongside the famous “like” button as a way of expressing your reaction to a post.

If you were to believe the headlines, this is just the tipping point: some outlets have claimed that emoji are an emerging language that could soon compete with English in global usage. To many, this would be an exciting evolution of the way we communicate; to others, it is linguistic Armageddon.

As a linguist concerned with visual communication, I have been interested to explore exactly what lies behind these claims. Do emoji show the same characteristics as other communicative systems and actual languages? And what do they help us to express that words alone can’t say?

When emoji appear with text, they often supplement or enhance the writing. This is similar to gestures that appear along with speech. Over the past three decades, research has shown that our hands provide important information that often transcends and clarifies the message in speech. Emoji serve this function too – for instance, adding a kissy or winking face can disambiguate whether a statement is flirtatiously teasing or just plain mean.

This is a key point about language use: rarely is natural language ever limited to speech alone. When we are speaking, we constantly use gestures to illustrate what we mean. For this reason, linguists say that language is “multi-modal”. Writing takes away that extra non-verbal information, but emoji may allow us to re-incorporate it into our text.

Emoji are not always used as embellishments, however – sometimes, strings of the characters convey meaning on their own, as a longer sequence. But to constitute their own language, they would need a key component: grammar.

A grammatical system is a set of constraints that governs how the meaning of an utterance is packaged in a coherent way. Natural language grammars have certain traits that distinguish them. For one, they have individual units that play different roles in the sequence – like nouns and verbs in a sentence. Also, grammar is different from meaning, which is why an active sentence like “Hobbes tackled Calvin” conveys roughly the same meaning as the passive “Calvin was tackled by Hobbes”, though they differ in the sequencing of their grammatical structure.

In addition, grammars are made up of groupings of units. The sentence “Calvin, who is a short blonde boy, was tackled by Hobbes” has several groupings, most noticeably the clause “(Calvin) is a short blonde boy”, which is embedded inside the sentence “Calvin was tackled by Hobbes”.

When emoji are isolated, they are primarily governed by simple rules related to meaning alone, without these more complex rules. For instance, according to research by Tyler Schnoebelen, people often create strings of emoji that share a common meaning, like this texted birthday greeting:

This sequence has little internal structure; even when it is rearranged, it still conveys the same message. These images are connected solely by their broader meaning. We might consider them to be a visual list: “here are all things related to celebrations and birthdays.” Lists are certainly a conventionalised way of communicating, but they don’t have grammar the way that sentences do.

What if the order did matter though? What if they conveyed a temporal sequence of events? Consider this example, which means something like “a woman had a party where they drank, and then opened presents and then had cake”:

In this case, the units are connected only by linear order. One unit “happens” after the next. Rearranging this sequence would create a new order, or perhaps would just revert to loose meaningful connections, like the ones about birthdays above.

Another technique appears when people are talking about objects doing things. Schnoebelen gives these examples:

In all cases, the doer of the action (the agent) precedes the action. In fact, this pattern is commonly found in both full languages and simple communication systems. For example, the majority of the world’s languages place the subject before the verb of a sentence.

These rules may seem like the seeds of grammar, but psycholinguist Susan Goldin-Meadow and colleagues have found this order appears in many other systems that would not be considered a language. For example, this order appears when people arrange pictures to describe events from an animated cartoon, or when speaking adults communicate using only gestures. It also appears in the gesture systems created by deaf children who cannot hear spoken languages and are not exposed to sign languages. In her book The Resilience of Language, Goldin-Meadow describes how these children, lacking exposure to a language, invent their own manual systems to communicate, called “homesigns”. These systems are limited in the size of their vocabularies and the types of sequences they can create. For this reason, the agent-act order seems to stem not from a grammar but from basic heuristics – practical workarounds – based on meaning alone. Emoji seem to tap into this same system.

Nevertheless, some may argue that despite emoji’s current simplicity, this may be the groundwork for emerging complexity – that although emoji do not constitute a language at the present time, they could develop into one over time.

Some precedent for this exists in sign languages too. In the 1970s, deaf homesigners in Nicaragua were brought together in a school for the first time. In sharing their own individual systems with each other, a more complex system began to emerge, which grew to the richness of a full language as new cohorts entered the school. The result was a new Nicaraguan sign language, which is still developing.

The birth of a new tongue?

Could an emerging “emoji visual language” be developing in a similar way, with actual grammatical structure? To answer that question, you need to consider the intrinsic constraints on the technology itself.

Emoji are created by typing into a computer, like text. But, unlike text, most emoji are provided as whole units, except for the limited set of emoticons which convert to emoji, like :) or ;). When writing text, we create the units (words) by combining building blocks (letters), not by searching through a list of every whole word in the language. Drawings are similar, combining simple building blocks (lines and shapes) to make larger units (representational drawings).

Emoji do not allow this building of units from parts, however. For example, let’s say I want to talk about my brother surfing. I could assign a mustachioed emoji to represent my brother (itself a challenge because of the limited emoji vocabulary) and then combine it with the one for surfing to make the sequence.

This follows the agent-action pattern described above. It lacks the flexibility that I might have with drawing though – if I wanted to naturally convey this message with pen and paper, I’d just draw my brother surfing, with his head on the surfing body, with no awkward and artificial sequencing.

In this way, emoji force us to convey information in a linear unit-unit string, which limits how complex expressions can be made. These constraints may mean that they will never be able to achieve even the most basic complexity that we can create with normal and natural drawings.

What’s more, these limits also prevent users from creating novel signs – a requisite for all languages, especially emerging ones. Users have no control over the development of the vocabulary. As the “vocab list” for emoji grows, it will become increasingly unwieldy: using them will require a conscious search process through an external list, not an easy generation from our own mental vocabulary, like the way we naturally speak or draw. This is a key point – it means that emoji lack the flexibility needed to create a new language.

Do you talk comic book?

The irony is that the focus on emoji has meant that many have neglected that we already have very robust visual languages, as can be seen in comics and graphic novels. As I argue in my book, The Visual Language of Comics, the drawings found in comics use a systematic visual vocabulary (such as stink lines to represent smell, or stars to represent dizziness). Importantly, the available vocabulary is not constrained by technology and has developed naturally over time, like spoken and written languages.

What’s more, unlike emoji, the visual language used in comics creates “grammatical” sequences of images in a way that makes them much more similar to spoken or sign languages. In this case, the grammar of sequential images is more of a narrative structure – not of nouns and verbs. Yet, these sequences use principles of combination like any other grammar, including roles played by images, groupings of images, and hierarchic embedding.

Take this sequence adapted from the comic One Night by Tym Godek, where a man lying in bed considers getting up and taking a shower, only to decide against it. Not only is the ordering crucial to conveying the meaning; you can also see a smaller sequence embedded within a larger structure. That’s very similar to the clause in a sentence such as “Calvin, who is a short blonde boy, was tackled by Hobbes”.

All this requires some form of grouping and hierarchy – traits important to grammars in natural languages. Perhaps the most convincing evidence that this constitutes a “grammar” in this “visual language” comes from our studies of the brain. In an experiment published last year in the journal Neuropsychologia, we measured participants’ brainwaves while they viewed sequences one image at a time where a disruption appeared either within the groupings of panels or at the natural break between groupings. The particular brainwave responses that we observed were similar to those that experimenters find when violating the syntax of sentences. That is, the brain responds the same way to violations of “grammar”, whether in sentences or sequential narrative images.

I would hypothesise that emoji can use a basic narrative structure to organise short stories (likely made up of agent-action sequences), but I highly doubt that they would be able to create embedded clauses like these. I would also doubt that you would see the same kinds of brain responses that we saw with the comic strip sequences.

Despite this, I believe that emoji are still very useful for enhancing and enriching the text of our contemporary digital conversations and interactions, injecting a note of humour, affection or even melancholy into the most concise message. Their increasing popularity serves as a reminder that there is a lot more to our communication than words alone. However, they pale in comparison to the richness or complexity of both natural written languages and the visual languages that already exist in the drawings we have used for millennia.

Neil Cohn is a post-doctoral research fellow at the University of California, San Diego. You can find his work at the Visual Language Lab.
