January 16, 2011
P is happy and N is sad – a biological universal?
Twitter has been abuzz recently with news of a paper that claims to have found universal sound correlates of happiness and sadness:
Auracher, J., Albers, S., Zhai, Y., Gareeva, G., & Stavniychuk, T. (2011). P Is for Happiness, N Is for Sadness: Universals in Sound Iconicity to Detect Emotions in Poetry. Discourse Processes, 48(1), 1-25. DOI: 10.1080/01638531003674894
The central hypothesis is certainly intriguing: Speech sounds can be universally, biologically linked to certain emotions. As Auracher et al. correctly note, such a hypothesis flies in the face of a central tenet of linguistics: For almost all words, the link between sound and meaning is arbitrary. Words whose sounds are closely connected to their meaning, such as “pop” or “buzz”, are the exception, not the rule. Based on previous work by Wiseman and van Peer (2003) and Albers (2008), they hypothesise that plosives evoke happiness and nasals evoke sadness. Albers, it should be noted, worked on Ancient Egyptian, a language for which no sound recordings exist. What we know about the phonology and phonetics of this language is due to razor-sharp reasoning, long hours spent slaving over parallel texts, and some of the most exacting empirical investigations in the whole of philology. (I would dearly love some Egyptologist input on the feasibility of sound symbolism studies for this language, by the way.)
Oral plosives are a class of sounds that are made by closing off a part of the vocal tract and then opening it again. You can close off the vocal tract at lots of different places. English only uses a few of these: the lips (that would be p or b), the space just behind the teeth (t, d), and the soft palate (k, g). Try saying p, t, k, and then b, d, g, and notice what happens when the closure is released.
Nasals are made in a way that is very similar to plosives, with one important difference: while the mouth is blocked off, the nasal cavity is open. Try saying m, n, ng, and compare this to b, d, g. You will find that the place at which the mouth is closed off is similar, but now, the sound “comes out” through the nose.
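The parallel between the two classes can be laid out in a tiny table. Here is a toy Python sketch of it, using only the English examples above (the names and groupings are my own shorthand, not anything from the paper):

```python
# English oral plosives and nasals, grouped by place of articulation.
# Each nasal shares its place of closure with a plosive pair; the only
# articulatory difference is whether the nasal cavity is open.
PHONEMES_BY_PLACE = {
    "bilabial": {"plosives": ["p", "b"], "nasal": "m"},
    "alveolar": {"plosives": ["t", "d"], "nasal": "n"},
    "velar":    {"plosives": ["k", "g"], "nasal": "ng"},
}

for place, sounds in PHONEMES_BY_PLACE.items():
    print(f"{place}: plosives {sounds['plosives']}, nasal '{sounds['nasal']}'")
```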
Which means that the difference between happiness and sadness is all in the nose …
Anyway, back to the research. What we have here is clearly an extraordinary hypothesis, which requires very strong evidence. Let’s see whether Auracher et al. have been able to provide this. At first glance, their overall approach appears to be very thorough:
[…] this study (a) uses a sufficiently large text basis to test the results for their general validity, (b) includes several languages to test the results for their universality, (c) asks participants to rate the texts to avoid subjective categorization, (d) uses an established dimensional model of emotions to categorize the phonemes, and (e) predicts a distinct relation between the relative occurrence of specific phonemes and the particular meaning expressed by the overall text.
All of these are commendable, but as always, the devil is in the detail. What is a sufficiently large text basis, and what kinds of texts should be used? Auracher and colleagues choose poetry, because there, they would expect to see particularly strong links between sound symbolism (phonosemantics) and emotional content. This is a reasonable argument, because poets harness everything language offers to create complex pieces of art that can evoke strong emotions in readers and listeners.
How many poems did they choose? Why, as far as I can see, two per language, each taken from one particular large collection of poetry. In my book, that is not “large”, but then I am also a computational linguist, and within computational linguistics, my main specialty is corpus linguistics. That is, I am used to dealing with databases of language and speech that contain thousands, tens of thousands, and millions of words of text. An alternative approach, which was taken by Whissell (1999), would be to examine the vocabulary of a language and look at the frequency of different types of sounds in words with positive or negative emotional connotations. This is very difficult to do cross-linguistically, because you need a well-curated lexicon with the appropriate information, or at least a thesaurus which would allow you to extract words for positive and negative emotions automatically.
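To make the vocabulary-based alternative concrete, here is a minimal sketch of the kind of comparison Whissell-style studies run. The word lists are invented for illustration, and counting letters is only a crude stand-in for a proper phonemic transcription:

```python
# Toy sketch of the vocabulary approach: compare how often plosives and
# nasals occur in words with positive vs. negative connotations.
# Letter counting approximates phoneme counting here, which is a known
# simplification (e.g. "c" and digraphs are ignored).
PLOSIVES = set("pbtdkg")
NASALS = set("mn")

positive_words = ["happy", "peak", "bright", "party"]  # invented examples
negative_words = ["gloom", "mourn", "moan", "dim"]     # invented examples

def sound_counts(words):
    letters = "".join(words)
    return (sum(c in PLOSIVES for c in letters),
            sum(c in NASALS for c in letters))

print("positive (plosives, nasals):", sound_counts(positive_words))  # (9, 0)
print("negative (plosives, nasals):", sound_counts(negative_words))  # (2, 6)
```

On a real lexicon, you would of course use phonemic transcriptions and thousands of words, not four hand-picked ones.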
The way Auracher et al. chose their two poems is linked to (e) – they selected the poem with the highest and the poem with the lowest plosive-to-nasal ratio from four collections of poems, one per language.
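As I read it, the selection criterion boils down to something like the following sketch. The poems are invented one-liners, and letter counting again stands in for phonemic transcription; the function names are mine, not the authors':

```python
# Select the poems with the highest and lowest plosive-to-nasal ratio
# from a collection, mirroring the paper's selection criterion.
PLOSIVES = set("pbtdkg")
NASALS = set("mn")

def plosive_nasal_ratio(text):
    text = text.lower()
    plosives = sum(c in PLOSIVES for c in text)
    nasals = sum(c in NASALS for c in text)
    return plosives / nasals if nasals else float("inf")

collection = {  # invented stand-ins for a real poetry collection
    "poem A": "the park was packed with people",
    "poem B": "a moonlit morning, humming and dim",
    "poem C": "birds sang in the mist",
}

ranked = sorted(collection, key=lambda title: plosive_nasal_ratio(collection[title]))
print("lowest ratio: ", ranked[0])   # predicted 'sad' poem
print("highest ratio:", ranked[-1])  # predicted 'happy' poem
```

Note that with only the two extremes per collection entering the experiment, everything in between is thrown away, which is part of why two poems per language strikes me as such a thin basis.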
Which brings us to (b), including several languages. They selected a total of four languages, German, Russian, Ukrainian, and Chinese. German, Russian, and Ukrainian are Indo-European languages, just like Hindi and Urdu. German is a close relative of English – both are in fact West Germanic languages, while the Scandinavian languages form the North Germanic family. Russian and Ukrainian are not only both Slavic languages, they also belong to the same sub-branch of the Slavic family, East Slavic. Chinese in its manifold forms is part of the Sino-Tibetan family of languages. This is hardly a representative sample – Indo-European, and in particular East Slavic, are vastly oversampled, and whole language families have been left untouched. Typologists who study language universals would typically insist on a broader sample.
The authors recruited a substantial number of participants for each language and asked them to rate each poem on a number of dimensions that have been used to classify human emotions. (Let me repeat – each participant only saw two poems, and these were in their native language.) Asking people to rate a stimulus along several dimensions is more reliable than asking them to rate whether a stimulus is happy, angry, or sad, so this was a very good decision to make. Participants did not know the research hypothesis, but they were debriefed afterwards if they wanted to be. The dimensional model that Auracher et al. used looks sound enough to me, but having dabbled in emotion research myself, I know only too well that there is no single well-established model of human emotion, and that each of the rating scales that have been proposed so far has its problems. Inter-cultural differences are not the least of these issues.
The four groups of participants, which were opportunity samples (friends, acquaintances, people recruited from “clubs or associations”), differed significantly in their gender ratios, age range, and education levels. None of the Chinese participants gave any personal information. As with the language sample, I would expect more of a balance in the participant sample – or at least an attempt at maintaining similar gender ratios and a similar coverage of age groups.
Some of the results were statistically significant.
I would like to leave you with two quotes from the paper. First, one from the discussion:
It has to be acknowledged that other models (e.g., Tsur, 1992) and empirical research (e.g., Miall, 2001; Whissell, 1999, 2000) lead toward different—and partially opposite—conclusions about the meaning of nasal and plosive sounds. In particular, Whissell (1999) identified the nasal phoneme /m/ as an active–pleasant sound and the plosive /t/ as a passive–unpleasant sound.
Whissell (1999) was the empirical vocabulary study I discussed earlier as an example of the kind of methodology I would have preferred to subjective judgements of two poems per language and participant.
Then, there is this speculation from the introduction:
However, if these cross-connections between facial gestures and movements of other parts of the body are hardwired, it sounds plausible to us that the articulation of certain phonemes can sympathetically mimic postures and movements, which are related to emotional states. As an example, this could mean that sounds, expressed with a closed mouth and constantly constrained lips or tongue, such as nasal phonemes, rather simulate the body movements of people who are in depressed, melancholic, sad, or passive moods, whereas the opening of the mouth and the explosive release of the air stream in plosive phonemes is associated with active, happy people.
Remember – the difference between oral plosives and nasals is all in the nose. Hardwired neural links between sounds and emotions require strong evidence. Are you convinced?
Albers, S. (2008). Lautsymbolik in ägyptischen Texten [Sound symbolism in Egyptian texts]. Mainz, Germany: Zabern.
Miall, D. (2001). Sounds of contrast: An empirical approach to phonemic iconicity. Poetics, 29, 55–70.
Tsur, R. (1992). What makes sound patterns expressive? The poetic mode of speech perception. Durham, NC: Duke University Press.
Whissell, C. (1999). Phonosymbolism and the emotional nature of sounds: Evidence of the preferential use of particular phonemes in texts of differing emotional tone. Perceptual and Motor Skills, 89, 19–48.
Whissell, C. (2000). Phonoemotional profiling: A description of the emotional flavour of English texts on the basis of the phonemes employed in them. Perceptual and Motor Skills, 91, 617–648.
Wiseman, M., & van Peer, W. (2003). Roman Jakobsons Konzept der Selbstreferenz aus der Perspektive der heutigen Kognitionswissenschaft [Roman Jakobson’s concept of self-reference from the perspective of present-day cognition studies]. In H. Birus, S. Donat, & B. Meyer-Sickendiek (Eds.), Roman Jakobsons Gedichtanalysen. Eine Herausforderung an die Philologien (pp. 277–306). Göttingen, Germany: Wallstein.
EDIT: Misattribution of original Twitter conversation fixed. (January 16)