There’s no way round it: learning Chinese is tough. As far as reading goes, what most dismays native speakers of alphabetic languages is that Chinese characters offer so few clues. With virtually no Spanish, I can figure out in the right context that baño means bath, but that word in Chinese (洗澡) seems to offer no clues about pronunciation, let alone meaning.
There seems no alternative, then, but to slavishly learn the 3,500 or so characters that account for at least 99% of use in written Chinese. This is hard even for native Chinese speakers, and usually demands endless rote copying in school. Even then, Chinese people forget quite routine characters, such as 钥匙 (key), far more often than is usually admitted. It is perhaps no surprise that dissatisfaction with current language-teaching methods in China has been growing in recent years.
Is there a better way? Jinshan Wu, a physicist at Beijing Normal University and a specialist in the relatively young mathematical science of network theory, has worked with colleagues to investigate the structural relationships between Chinese characters and to develop a learning strategy that exploits these connections.
Chinese characters aren’t really as arbitrary and bewilderingly diverse as they might seem. For one thing, they are made up of a fairly limited number of sub-characters, or radicals, which are themselves composed of a set of standard marks, or strokes. What’s more, the radicals often contain clues about meaning, pronunciation, or both. In the Chinese for bath (pronounced xizao in the pinyin Romanization system), for example, both characters start with the same radical, 氵, which denotes water, while the right-hand half of each character hints at its pronunciation. There are also general rules, called liu shu (六书), for building characters from radicals.
These connections make the language easier to learn. Once you know that wood is 木 (mu), it’s not so hard to remember that forest is 林 (lin) – or, even more pictorially, 森林 (senlin). Guided by the liu shu rules, Wu and colleagues mapped out the structural relationships among all 3,500 common characters, producing a network with more than 7,000 links. The map shows that the roughly 224 radicals are combined into just 1,000 or so characters, which in turn serve as the building blocks for all the others.
This network is hierarchical: it is rather like a tree, with a few central nodes (trunks) that fan out into many branch tips. That is very different from a web-like network such as a grid or street map, in which there are often many different routes to any particular node. The researchers reasoned that it might be most efficient to start learning characters at the lowest levels of the hierarchy – the trunks, as it were – and to work gradually outward towards the branch tips.
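The idea of learning from the trunks outward amounts to putting every character after all of its components, which in graph terms is a topological ordering of the network. The sketch below illustrates this with Python’s standard-library `graphlib` (Python 3.9+). It is not the researchers’ procedure, and the tiny decomposition table is hypothetical, loosely based on the examples in this article (森 is simplified here to 木 + 林).

```python
from graphlib import TopologicalSorter

# Hypothetical toy decompositions (character -> direct components, duplicates
# dropped), loosely based on examples in the text; not the researchers' data.
COMPONENTS = {
    "日": [], "月": [], "木": [], "氵": [], "先": [],
    "明": ["日", "月"],
    "林": ["木"],
    "森": ["木", "林"],  # simplified decomposition, for illustration only
    "洗": ["氵", "先"],
}

def trunk_first_order(components):
    """Order characters so that each one comes after all of its components."""
    # TopologicalSorter expects a mapping of node -> its predecessors.
    return list(TopologicalSorter(components).static_order())

def level(char, components):
    """Hierarchy level: 0 for a bare radical, else 1 + the deepest component."""
    parts = components[char]
    return 0 if not parts else 1 + max(level(p, components) for p in parts)

order = trunk_first_order(COMPONENTS)
for char in order:
    print(char, level(char, COMPONENTS))
```

In this toy network 木 behaves like a trunk: it feeds both 林 and 森, so learning it early pays off twice.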
But would that necessarily be better than a strategy that focuses on the most frequently used characters first? And how can one assess the relative effort, or cost, of different learning strategies? There’s no unique way to do this, but Wu and colleagues developed a logical, intuitive method of tallying costs, based on the idea that a multi-part character is easier to learn if all of its components have already been learnt. To take a simple case, it’s easier to learn 明 (ming: bright) if you already know 日 (ri: sun, day) and 月 (yue: month, moon).
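One way to make this intuition concrete is a cost function in which each character has a base cost, plus a penalty for every component not yet learnt. This is a minimal sketch; the weights `BASE_COST` and `UNKNOWN_PENALTY` are hypothetical and are not the values Wu and colleagues used.

```python
# Hypothetical decompositions, following the text's example 明 = 日 + 月.
COMPONENTS = {"日": [], "月": [], "明": ["日", "月"]}

# Hypothetical weights, chosen for illustration only.
BASE_COST = 1.0        # effort to learn any single character
UNKNOWN_PENALTY = 1.0  # extra effort per component not yet learnt

def learning_cost(char, known, components):
    """Effort to learn `char` given the set of characters already `known`."""
    unknown = [p for p in components[char] if p not in known]
    return BASE_COST + UNKNOWN_PENALTY * len(unknown)

def total_cost(order, components):
    """Sum the cost of learning characters one at a time in the given order."""
    known, cost = set(), 0.0
    for char in order:
        cost += learning_cost(char, known, components)
        known.add(char)
    return cost

print(total_cost(["日", "月", "明"], COMPONENTS))  # components first: 3.0
print(total_cost(["明", "日", "月"], COMPONENTS))  # compound first: 5.0
```

Under these assumed weights, learning 日 and 月 before 明 is cheaper because 明 then costs only its base effort.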
The researchers assigned a cost value to each new learning task and found that the “cheapest” way to learn all the characters in the network is to start with the “trunk” characters that have the most branches and work up through the layers. But that could leave you knowing many characters you rarely need. If, on the other hand, you simply learn characters in order of how often they are used (as some teaching methods do), you fail to take advantage of the network connections that can aid recognition.
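The trade-off can be illustrated with two measures: the total learning cost of an order, and how much everyday usage the first few characters cover. The sketch below compares a frequency-first order with a structure-first (topological) order. The components, the usage frequencies, and the cost rule (1 per character plus 1 per component not yet known) are all hypothetical, chosen only to show the tension described above.

```python
from graphlib import TopologicalSorter

# Hypothetical toy data: components and relative usage frequencies are
# invented for illustration, not taken from Wu and colleagues' network.
COMPONENTS = {"日": [], "月": [], "木": [], "明": ["日", "月"], "林": ["木"]}
FREQUENCY = {"明": 40, "林": 25, "日": 20, "月": 10, "木": 5}

def total_cost(order):
    """Hypothetical cost: 1 per character plus 1 per component not yet known."""
    known, cost = set(), 0
    for char in order:
        cost += 1 + sum(1 for p in COMPONENTS[char] if p not in known)
        known.add(char)
    return cost

def coverage(order, k):
    """Fraction of total usage covered by the first k characters learnt."""
    return sum(FREQUENCY[c] for c in order[:k]) / sum(FREQUENCY.values())

frequency_first = sorted(FREQUENCY, key=FREQUENCY.get, reverse=True)
structure_first = list(TopologicalSorter(COMPONENTS).static_order())

for name, order in [("frequency", frequency_first), ("structure", structure_first)]:
    print(name, order, "cost:", total_cost(order), "coverage of first 2:", coverage(order, 2))
```

With these made-up numbers, the structure-first order is cheaper overall, while the frequency-first order covers far more usage after only two characters.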