There’s no way round it: learning Chinese is tough. As far as reading goes, what most dismays native speakers of alphabetic languages is that Chinese characters offer so few clues. With virtually no Spanish, I can figure out in the right context that baño means bath, but that word in Chinese (洗澡) seems to offer no clues about pronunciation, let alone meaning.
There seems no alternative, then, but to slavishly learn the 3,500 or so characters that account for at least 99% of usage in written Chinese. This is hard even for native speakers, and usually demands endless rote copying at school. Even then, it is far more common than is often admitted for Chinese people to forget quite routine characters, such as the two that make up 钥匙 (yaoshi, key). As a result, dissatisfaction with language teaching methods in China has been growing in recent years.
Is there a better way? Physicist Jinshan Wu of Beijing Normal University, a specialist in the relatively new mathematical science of network theory, has worked with colleagues to map the structural relationships between Chinese characters and to devise a learning strategy that exploits these connections.
Chinese characters aren’t really as arbitrary and bewilderingly diverse as they might seem. For one thing, they are built from a fairly limited number of sub-characters, or radicals, which are themselves composed of a set of standard marks, or strokes. What’s more, the radicals often contain clues about meaning, pronunciation, or both. In 洗澡, the Chinese for bath (pronounced xizao in the pinyin romanization system), both characters start with the same radical, 氵, which denotes water, and the right-hand half of each character hints at how it is pronounced. There are also general rules for building characters from radicals: the six traditional principles known as liu shu (六书).
These connections help with learning the language. Once you know that wood is 木 (mu), it’s not so hard to remember that forest is 林 (lin), or, even more pictorially, 森林 (senlin). Guided by the liu shu rules, Wu and colleagues mapped out the structural relationships between all 3,500 common characters, producing a network with over 7,000 links. The map shows that some 224 radicals are combined into just 1,000 or so characters, which in turn form the basis of all the others.
This network is hierarchical: it resembles a tree, with a few central nodes (the trunks) that branch out into many tips. That’s very different from a web-like network such as a grid or street map, in which there are often many different routes to any particular node. The researchers reasoned that it might be most efficient to start learning characters at the lowest levels of the hierarchy, the trunks as it were, and to work gradually outwards towards the branch tips.
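To make the idea of a component network concrete, here is a minimal Python sketch on a hypothetical toy table of six components and five composite characters. The decompositions are simplified for illustration (森, for instance, is treated as 木 plus 林) and are not the researchers' data; the `DECOMPOSITION` table and the function names are invented for this sketch. Links run from each component to the characters built from it, and a character's level is assumed to be one more than that of its highest component, so a trunk-first order is just a sort by level, with the most reused components first within each level.

```python
from collections import defaultdict

# Hypothetical, heavily simplified decompositions, for illustration only.
DECOMPOSITION = {
    "日": [], "月": [], "木": [], "氵": [], "先": [], "喿": [],
    "明": ["日", "月"],  # bright = sun + moon
    "林": ["木", "木"],  # woods = tree + tree
    "森": ["木", "林"],  # simplified here as tree + woods
    "洗": ["氵", "先"],  # xi: water radical + phonetic 先 (xian)
    "澡": ["氵", "喿"],  # zao: water radical + phonetic 喿 (zao)
}


def build_network(decomposition):
    """Map each component to the set of characters built from it."""
    dependents = defaultdict(set)
    for char, parts in decomposition.items():
        for part in parts:
            dependents[part].add(char)  # 林 = 木 + 木 yields one link, not two
    return dependents


def levels(decomposition):
    """Level 0 for characters with no parts, else one above the highest part."""
    memo = {}

    def level(char):
        if char not in memo:
            parts = decomposition[char]
            memo[char] = 1 + max(level(p) for p in parts) if parts else 0
        return memo[char]

    for char in decomposition:
        level(char)
    return memo


def trunk_first_order(decomposition):
    """Lowest level first; within a level, components with more branches first."""
    dependents = build_network(decomposition)
    lv = levels(decomposition)
    return sorted(decomposition, key=lambda c: (lv[c], -len(dependents[c])))


network = build_network(DECOMPOSITION)
print(sum(len(chars) for chars in network.values()))  # number of links
print("".join(trunk_first_order(DECOMPOSITION)))
```

For this toy table the sketch counts 9 links and prints the order 木氵日月先喿林明洗澡森, in which every character comes after all of its components.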
But would that necessarily be better than a strategy that focuses on the most frequently used characters first? And how, indeed, can one assess the relative effort, or cost, that different learning strategies require? There’s no unique way to do this, but Wu and colleagues devised a logical, intuitive method of quantifying costs. They reasoned that a multi-part character is easier to learn if all of its components have already been learnt. To take a simple case, it’s easier to learn 明 (ming: bright) if you already know 日 (ri: sun, day) and 月 (yue: month, moon).
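One way to turn that intuition into numbers is sketched below in Python. The cost model is an assumption made for illustration, not the one Wu's team used: learning any character costs one unit, plus a hypothetical `PENALTY` of two units for each distinct component the learner has not met yet. The names `learning_cost` and `total_cost` are likewise invented.

```python
# Hypothetical cost model: 1 unit per character, plus PENALTY for each
# distinct component that has not been learnt yet.
PENALTY = 2.0  # assumed value, chosen only for illustration

DECOMPOSITION = {"日": [], "月": [], "明": ["日", "月"]}


def learning_cost(char, known):
    """Cost of learning `char` given the set of characters already known."""
    unknown = {p for p in DECOMPOSITION[char] if p not in known}
    return 1.0 + PENALTY * len(unknown)


def total_cost(order):
    """Sum the cost of learning characters one after another in `order`."""
    known, total = set(), 0.0
    for char in order:
        total += learning_cost(char, known)
        known.add(char)
    return total


print(total_cost(["日", "月", "明"]))  # components first: 3.0
print(total_cost(["明", "日", "月"]))  # composite first: 7.0
```

Under these assumed numbers, learning 日 and 月 before 明 costs 3 units, whereas tackling 明 first costs 7.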
The researchers assigned a cost to each new learning task and found that the “cheapest” way to learn all the characters in the network is to start with the “trunk” characters that have the most branches and work up through the layers. But that could leave you knowing a lot of characters you rarely need. If, on the other hand, you simply learn characters in order of use frequency (as some learning methods do), you fail to take advantage of the network connections that can aid recognition.
The approach Wu’s team adopts is a compromise between the two. It’s rather like planning a shopping trip by seeking the shortest path between shops while also contriving to pick up the heaviest items last. By weighting each character in the network according to its use frequency, the researchers obtain a learning path that spreads gradually through the network while picking up most of the common characters early.
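The trade-off can be illustrated with a small Python sketch. It substitutes a simple greedy heuristic for the researchers' own weighting scheme, whose details aren't given here: at each step it picks the unlearnt character with the best ratio of value to current cost, where value is the character's frequency plus a hypothetical `BONUS` share of the frequency of unlearnt characters built from it. The decompositions are simplified, the `FREQUENCY` values are invented rather than corpus counts, and `coverage_within_budget` is a hypothetical helper that follows an order until a fixed cost budget runs out.

```python
# Hypothetical toy data: simplified decompositions, invented frequencies.
DECOMPOSITION = {
    "日": [], "月": [], "木": [], "氵": [], "先": [], "喿": [],
    "明": ["日", "月"], "林": ["木", "木"], "森": ["木", "林"],
    "洗": ["氵", "先"], "澡": ["氵", "喿"],
}
FREQUENCY = {  # made-up relative weights, NOT real usage statistics
    "日": 9, "月": 7, "木": 3, "氵": 0.5, "先": 6, "喿": 0.1,
    "明": 8, "林": 2, "森": 1, "洗": 2, "澡": 1,
}
PENALTY = 2.0  # assumed extra cost per distinct component not yet learnt
BONUS = 0.5    # assumed credit for helping unlearnt characters

# Map each component to the characters built from it.
DEPENDENTS = {c: set() for c in DECOMPOSITION}
for char, parts in DECOMPOSITION.items():
    for part in parts:
        DEPENDENTS[part].add(char)


def cost(char, known):
    """One unit, plus PENALTY for each distinct component not yet known."""
    unknown = {p for p in DECOMPOSITION[char] if p not in known}
    return 1.0 + PENALTY * len(unknown)


def priority(char, known):
    """Frequency-weighted value of learning `char` now, divided by its cost."""
    helped = sum(FREQUENCY[d] for d in DEPENDENTS[char] if d not in known)
    return (FREQUENCY[char] + BONUS * helped) / cost(char, known)


def greedy_order():
    """Repeatedly pick the unlearnt character with the highest priority."""
    known, order = set(), []
    while len(order) < len(DECOMPOSITION):
        candidates = [c for c in DECOMPOSITION if c not in known]
        best = max(candidates, key=lambda c: priority(c, known))
        order.append(best)
        known.add(best)
    return order


def frequency_order():
    """Most frequent first, ignoring the network entirely."""
    return sorted(DECOMPOSITION, key=lambda c: -FREQUENCY[c])


def coverage_within_budget(order, budget):
    """Follow `order` until the next character would exceed the budget."""
    known, spent, covered = set(), 0.0, 0.0
    for char in order:
        step = cost(char, known)
        if spent + step > budget:
            break
        spent += step
        covered += FREQUENCY[char]
        known.add(char)
    return len(known), covered


for name, order in [("frequency", frequency_order()), ("greedy", greedy_order())]:
    print(name, "".join(order), coverage_within_budget(order, budget=5))
```

With these made-up numbers and a budget of 5 units, the frequency-first order learns only 3 characters (日, 明, 月), because it meets 明 before 月 and pays the penalty, while the greedy order learns 5 characters with a higher combined frequency. This only illustrates the kind of trade-off involved; it says nothing about the real network.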
The researchers compared the learning cost of their strategy with that of the most widely used textbook in Chinese primary schools (covering 2,475 characters) and of a popular textbook for learning Chinese as a second language. For a given cost, their strategy covered considerably more characters than either alternative, and those characters had a significantly greater combined use frequency.
What’s more, the researchers say that their approach would allow each student’s learning path to be tailored to the individual, for example to suit those who have already learnt some characters. This just isn’t possible with traditional approaches.
Of course, the ultimate test is whether students do actually learn faster. This remains to be seen. But with the debate continuing to rage in China over current teaching methods, this new proposal shows that there may be rational ways to pursue the question.