Our sense of hearing is truly remarkable, all the more so when you understand the rich layers of information that exist within even a single spoken sentence. Of course, most of us take this for granted until we discover what life is like for people who don’t have the benefit of fully functioning hearing.
I have no direct experience with cochlear implants (CIs) – electronic devices that partly compensate for severe hearing impairment. But listening to a simulation of the sound produced is salutary. It is rather like hearing things underwater: fuzzy and with an odd timbre, yet still conveying words and some other identifiable sounds.
It is a testament to the adaptability of the human brain that this information can be recognizable even when the characteristics of the sound are so profoundly altered. Some people with CIs can appreciate and even perform music.
But there are serious limitations to what people with CIs can hear. And understanding these limitations could provide insights into how sound is processed in people with normal hearing – insights that can help us identify what can potentially go wrong and how it might be fixed. That is evident in a trio of papers buried in the little-known but infallibly fascinating Journal of the Acoustical Society of America – a publication whose scope ranges from urban noise pollution to whale song and the sonic virtues of cathedrals.
These three papers examine what gets lost in translation in CIs. Much of the emotional content in speech, as well as some semantic information, is conveyed by the rising and falling of the voice – known as prosody. In the English language, prosody can distinguish a question from a statement (at least before the rising inflection became fashionable). It can also tell us whether the speaker is happy, sad or angry.
But CIs cannot fully convey the pitch of sounds, or their “spectrum” of sound frequencies, and this means that users may find it harder to tell whether someone’s voice is happy rather than sad, for instance. So users rely more on visual than on audio information to gauge a speaker’s emotional state.
In the first study, Takayuki Nakata of Future University Hakodate in Japan and his co-workers verified that Japanese children who are born deaf but use CIs are significantly less able to identify happy, sad, and angry voices than normal hearers of the same age. They went further than previous studies, however, by showing that these difficulties prevent children from communicating emotion through prosody in their own speech. This suggests two things: first, that we learn to communicate emotion by hearing and copying; and second, that CI users face an additional burden in that other people are less likely to understand their emotions.
Difficulties in hearing pitch can create even more severe linguistic problems. In tonal languages such as Mandarin Chinese, changes in pitch may alter the semantic meaning of a word. So CI users may struggle to distinguish such tones even after years of using a device, and hearing-impaired Mandarin-speaking children who start using them before they can speak are often scarcely intelligible to adult listeners – again, they cannot learn to produce the right sounds if they cannot hear them.
To understand how CI users might perceive these language tones, Damien Smith and Denis Burnham of the University of Western Sydney in Australia tested normal hearers with audio clips of spoken Mandarin altered to simulate CIs. The results were surprising.
Both native Mandarin speakers and English-speaking subjects did better at identifying the four Mandarin tones when the CI-simulated voices were accompanied by video footage of the speakers’ faces. That is not so surprising. But all subjects did better with visuals alone – and in this case non-Mandarin speakers did better than Mandarin speakers.
What could this mean? It suggests that native speakers learn to disregard visual information in favour of audio information. It also suggests that training might help CI users recognize the visual cues of tonal languages: if you like, to lip-read the tones.
Improvements in CIs are clearly needed, and scientists are busy trying to enhance changes in tone, or pitch. Xin Luo of Purdue University in West Lafayette, Indiana, in collaboration with researchers from the House Research Institute, a hearing research centre in Los Angeles, has figured out how to make CIs create pitch changes that better reflect the smooth variations in prosody.
To understand how, we need to know a little about how the cochlea senses pitch in the ear, and how CIs try to replicate this. The cochlea contains a coiled membrane, which is stimulated in different regions by different sound frequencies – low at one end, high at the other, rather like a keyboard. A CI creates a crude approximation of this continuous pitch-sensing device using a few (typically 16-22) electrodes to excite different nerve endings, producing a small set of pitch steps instead of the normal smooth pitch slope. Luo and colleagues have figured out a way of sweeping the signal from one electrode to the next such that pitch changes seem gradual instead of jumpy.
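The difference between stepped and swept pitch can be illustrated with a minimal Python sketch. It is not Luo and colleagues' method: the electrode count, the frequency range, the logarithmic spacing and the simple linear sharing of the signal between two neighbouring electrodes are all assumptions made for illustration, and the function names are hypothetical.

```python
import math

NUM_ELECTRODES = 16             # assumed; the article gives 16-22 as typical
F_LOW, F_HIGH = 250.0, 8000.0   # assumed frequency range, illustrative only


def position(freq_hz):
    """Continuous position along the electrode array (0 .. NUM_ELECTRODES - 1),
    assuming electrodes are spaced evenly on a logarithmic frequency scale."""
    freq_hz = min(max(freq_hz, F_LOW), F_HIGH)
    frac = math.log(freq_hz / F_LOW) / math.log(F_HIGH / F_LOW)
    return frac * (NUM_ELECTRODES - 1)


def stepped(freq_hz):
    """Send the whole signal to the nearest electrode: pitch moves in steps."""
    return round(position(freq_hz))


def steered(freq_hz):
    """Share the signal between two neighbouring electrodes.

    Returns (lower_index, weight_on_lower, weight_on_upper); the two weights
    sum to 1 and shift gradually as the frequency glides upward.
    """
    pos = position(freq_hz)
    lower = min(int(pos), NUM_ELECTRODES - 2)
    weight_upper = pos - lower
    return lower, 1.0 - weight_upper, weight_upper


if __name__ == "__main__":
    # A rising glide: the stepped index jumps, the steered weights slide.
    for freq in (400, 420, 440, 460, 480, 500):
        print(freq, stepped(freq), steered(freq))
```

In the stepped version a slowly rising voice stays on one electrode and then jumps to the next; in the steered version the balance between the two neighbours changes a little with every small change in frequency, which is the intuition behind making pitch changes seem gradual.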
The cochlea can also identify pitches by, in effect, “timing” successive acoustic oscillations to figure out the frequency. CIs can simulate this method of pitch discrimination too, but only for frequencies up to about 300 Hertz, the upper limit of a bass singing voice. Luo and colleagues say that a judicious combination of these two pitch-sensing methods, enabled by signal-processing circuits in the implant, could one day improve pitch perception for users. Then we may, at least, be able to allow them to capture more of the emotion-laden prosody of speech that exists within every sentence we express.
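The idea of “timing” oscillations can be shown with a small Python sketch that estimates the frequency of a tone from the average interval between its upward zero crossings. This is only an analogy for the timing principle, not a model of the cochlea or of any implant's circuitry; the sample rate, the test tone and the function names are assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 16000  # samples per second; an assumed value


def tone(freq_hz, duration_s=0.1, phase=0.3):
    """Generate a pure sine tone as a list of samples (illustrative input)."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE + phase)
            for i in range(n)]


def estimate_frequency(samples, sample_rate):
    """Estimate frequency by timing successive upward zero crossings."""
    crossings = []
    for i in range(1, len(samples)):
        prev, curr = samples[i - 1], samples[i]
        if prev < 0.0 <= curr:
            # Interpolate linearly to find where the wave crossed zero.
            frac = -prev / (curr - prev)
            crossings.append((i - 1 + frac) / sample_rate)
    if len(crossings) < 2:
        return None  # not enough cycles to time
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return 1.0 / period


if __name__ == "__main__":
    print(estimate_frequency(tone(220.0), SAMPLE_RATE))
```

The sketch works at any frequency the sample rate can represent; the limit of about 300 Hertz described above is a property of how implants stimulate the auditory nerve and is not modelled here.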