English and a handful of other languages dominate the internet, but this is leaving indigenous cultures without a voice online. Now they are fighting to get their own languages on the web.
Imagine your favourite social media platform does not let you post in English. Now think of a keyboard that won’t allow you to type in your own words. You would have two options: either switch to another language or remain digitally silent.

This is the reality for most people who speak indigenous languages and dialects.

There are nearly 7,000 languages and dialects in the world, yet only 7% are reflected in published online material, according to Whose Knowledge?, a campaign that aims to make the knowledge of marginalised communities visible online.

While Facebook supports up to 111 languages, making it the most multilingual online social media platform, a survey published by Unesco in 2008 found that 98% of the internet’s web pages are published in just 12 languages, and more than half of them are in English. This reduces linguistic diversity online to a handful of tongues, making it harder for those who speak one of the excluded languages to take part.

The Kaqchikel Mayan community from Guatemala includes more than half a million speakers. Miguel Ángel Oxlaj Kumez is part of it and was one of the organisers of the first Latin American Festival of Indigenous Languages on the Internet, held in 2019.

“When I get on the internet I find more than 90% of the content in English and hence a significant percentage in Spanish and other languages,” he says. “So what I have to do is to move to another language, and that favours the displacement of my own language.”

Some branches of the indigenous Mayan language are taught in schools in Guatemala, but are not well represented on the internet (Credit: Alamy)

“It discredits my own language, because – as it is not on the internet – then it is not valid, then it does not work, therefore why am I going to continue learning it? Why am I going to teach it to my children if, when I turn on the internet or television, I cannot find it there?”

Oxlaj Kumez is working with other activists to create a version of Wikipedia in Kaqchikel Mayan, as well as a translated version of Mozilla’s Firefox web browser. His dream is to be able to have a “digital life in my own language, and when I decide to move to another language that it will be my decision”.

He is not the only one with that dream. In 2003, Unesco adopted a recommendation to promote the use of multilingualism online. Ever since, the organisation has been pushing for universality on the internet, with a special focus on indigenous languages.

The first problem, and possibly the most challenging, is access. According to Internet World Stats, only 58% of the world’s population has access to online infrastructure. And while 76% of the cyber population lives in Africa, Asia, the Middle East, Latin America and the Caribbean, most of the online content comes from elsewhere.

Take Wikipedia, for example, where more than 80% of articles come from Europe and North America. A similar thing happens with 75% of the web’s top-level domains, from .com to .org, which come from the same two regions.

“There are several difficulties, and the technical is just one of them,” explains Oxlaj Kumez. “Keyboards are designed for dominant languages. The keyboards do not come with the spellings of indigenous languages, and since the platforms are in Spanish or English or in another dominant language, when I write in my own language the autocorrect keeps changing my texts.”

There are different levels to this linguistic divide. From hardware like keyboards to programming languages, from website domains to applications and social media platforms, the lack of diverse alphabets is the first gap of many that prevents almost all indigenous languages from being part of the online conversation.

For Victoria Aguilar, the main problem is that societies are now dragging the same structural social inequalities found offline to the internet.

“We need a lot of work in localisation, in adapting the technology to our needs,” she says. “The internet is a wonderful channel of communication, but it also reflects the inequalities in real life. The way in which some forms of writing are being neglected is affecting the fact that we cannot write freely on the internet. We need technologies that allow us to accelerate this process.”

Aguilar is a native speaker of the Mixteco language and a linguistics student at Mexico’s National University. With the help of a designer, she is creating a new typeface so that she can write online in her own language with the right orthography.

Following the Latin American Festival of Indigenous Languages on the Internet, activists are fighting to make the internet open to all who choose to use it (Credit: FLLII2019)

Through her work, she has found that digital citizenship has a double-edged impact. On one hand, she says, it helps to make First Nation communities visible, but on the other, she is afraid that the speed of the internet’s spread could accelerate the disappearance of minority languages.

“If we do not hurry at this time with the technologies, it can play against us, because it can pull us towards a Spanish language homogenisation in the case of Mexico,” she says. “It is a key moment for languages, because there is an internet boom and more and more people have data.”

In some areas, things are improving. Unicode is a computing standard that assigns a unique code to every character, from letters and numbers to emojis, and groups those characters into scripts. Latin, for example, is a script that works for dozens or even hundreds of languages, but some scripts are used for only one language. As of 2020, Unicode supports 154 scripts.
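To give a rough sense of what that means in practice, the short Python sketch below looks up how one character, the Spanish "ñ", is represented. It is only an illustration using Python's standard `unicodedata` module; the helper name `describe` is made up for this example and is not part of any project mentioned in this article.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str, str]]:
    """Return (character, code point, official Unicode name) for each character.

    `describe` is a hypothetical helper written for this illustration.
    """
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"))
        for ch in text
    ]


# Every character has a single code point and an official name.
for ch, code_point, name in describe("ñ"):
    print(ch, code_point, name)

# The same character is stored as two bytes in the common UTF-8 encoding.
utf8_bytes = "ñ".encode("utf-8")

# It can also be split into a base letter plus a combining accent,
# which is how many accented or tone-marked letters can be built up.
decomposed = unicodedata.normalize("NFD", "ñ")
print(len(utf8_bytes), len(decomposed))
```

A letter that has no code point, or no way of being composed from existing ones, cannot be typed, stored or searched reliably, which is why getting a writing system into the standard matters.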

Internationalised Domain Names (IDN) tackle this challenge from the domain name perspective. Sarmad Hussain is running the implementation work within the Internet Corporation for Assigned Names and Numbers (Icann).

“The domain names system was based on the American Standard Code for Information Interchange (Ascii), so it means that domain names were limited to characters, what we call the ‘letter, digit, hyphen scheme’, so basically letters A to Z, digits 0 to 9 and hyphen, so those were the only ones you could use to develop domain names,” explains Hussain.

“The community, eventually, as the internet expanded in countries that were not using the Ascii set of characters, had a clear need to expand this domain name system to support all the other languages and scripts around the world.”
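The restriction Hussain describes, and the workaround that IDNs rely on, can be sketched in a few lines of Python. This is a simplified illustration rather than Icann's implementation: the function names are invented for this example, and Python's built-in `idna` codec follows the older 2003 version of the IDN rules, so real registries may treat some characters differently.

```python
import re

# "Letter, digit, hyphen": ASCII letters, digits and hyphens only,
# 1 to 63 characters, with no hyphen at the start or end of the label.
LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_label(label: str) -> bool:
    """Check one part of a domain name against the letter-digit-hyphen rule."""
    return LDH_LABEL.fullmatch(label) is not None


def to_ascii_label(label: str) -> str:
    """Convert a label to its ASCII-only form (assumes Python's IDNA 2003 codec)."""
    return label.encode("idna").decode("ascii")


name = "münchen"
print(is_ldh_label(name))        # False: "ü" is outside A-Z, 0-9 and hyphen
encoded = to_ascii_label(name)
print(encoded)                   # xn--mnchen-3ya
print(is_ldh_label(encoded))     # True: the encoded form fits the old rule
```

The underlying domain name system still only sees letters, digits and hyphens; the non-Ascii name is translated into an "xn--" form behind the scenes, so existing infrastructure keeps working.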

Since 2003, the IDN project has enabled 152 languages, 75 in Chinese, Japanese and Korean (CJK) scripts, and 33 in Arabic scripts. There are now more than 9 million registered IDNs, making up 2.5% of all domains.

Opening up domain names to new languages, along with expanding internet access, has already had an effect on online populations and the content they produce. Findings of a study, carried out by the Council of European National Top-Level Domain Registries and the Oxford Information Labs, “indicate that country and regional TLDs (top level domains) boost the presence of local languages online and show lower levels of English language than is found in the domain name sector worldwide”.

While some indigenous groups have fought hard to get their languages the recognition they deserve, their presence online is still marginal (Credit: Alamy)

The Wikimedia community have acknowledged the struggle to make Wikipedia more diverse and multilingual. As of December 2019, the collaborative encyclopaedia had published articles in 307 languages, making it the most diverse online platform.

“There is a responsibility of technology platforms to give access to technology in these languages and to reduce the internet access gap, and there is responsibility of the state as well,” says José Flores, Wikimedia vice-president in Mexico, where the chapter has been working on diversifying its content with community members of the more than 60 indigenous languages spoken in the country.

But the gap can’t be closed by technology companies and the state alone, says Flores. “It seems to me that academia, the community itself and even journalism and media are responsible, because there is a need of more sources to build articles on Wikipedia.”

Wikipedia articles need to cite secondary published sources, like news articles or academic publications. This often becomes an issue for communities that are not well documented. A whole Wikipedia page in one language, explains Flores, needs to reference between 800 and 1,000 published articles. These requirements keep several indigenous-language Wikipedia pages in an incubator stage.

“It's not just that we need to connect but also how we connect,” adds Flores. “It goes beyond the infrastructure, because it is also about the social uses of this infrastructure.”

Access to devices and sources is not the only issue. Estimates indicate that nearly 43% of the world’s languages and dialects are unwritten, posing another important challenge for the way they could fit in the often text-based online world.

As more people in indigenous cultures get online, there is a risk that their own languages will become increasingly irrelevant if they cannot use them (Credit: Getty Images)

That is where projects like Lingua Libre, a Wikimedia Foundation-funded platform to record oral languages, come in. The archive, run by Wikimedia France, opened in August 2018 and already contains more than 100,000 recordings in 43 languages that, otherwise, could have been lost forever.

Back in Guatemala, Miguel Ángel Oxlaj Kumez is aware that the challenges ahead remain difficult and complex, but he is not discouraged. “We see the challenges as opportunities,” he says. “In the workshops, I raised the question ‘Why is it necessary to have my language on the internet?’ And an activist turns it over and tells me ‘Why shouldn’t I have my indigenous language on the internet?’”

He is currently working with other online indigenous activists to create Kaqchikel Mayan versions of Wikipedia, WhatsApp and Duolingo. “Five years ago, I did not imagine my language on the internet, and there are people who still don’t think of that possibility”.

In the meantime, he’s glad that there’s a growing network of indigenous speakers fighting to get their languages online.

“Now it is in the hands of this network of activists,” he adds. “And we all have the dream of making it happen.”
