Laughter and Voice in Humor and Irony
The complexity of factors shaping irony use and understanding underscores our concern with standard linguistic and psycholinguistic methods for assessing when people interpret different aspects of ironic meaning and whether they also appreciate any humor in what was said. We argue that more elaborate behavioral studies are needed to explore the range of ironic actions people engage in, both verbally and non-verbally. One psychological perspective emphasizes that irony is based on speakers pretending to be certain people or have certain views (a kind of “staged communicative act”, Clark 1996; Gibbs 2000b). These kinds of staged communicative acts are not merely verbal, but involve a number of different bodily actions which both enact irony and humor and possibly elicit humorous responses in others. For instance, when Anne says “Isn’t it so nice to have guests here?” she is taking action at two levels. First, she is asking a rhetorical question, one that is not meant to be literally answered. Second, she is pretending to embrace the idea conveyed by her question, namely that it must be nice to have guests visiting. By highlighting the contrast between these two levels of meaning, the speaker can efficiently communicate a variety of social and affective meanings. Layering reflects metarepresentational reasoning because a speaker is, once more, alluding to, or echoing, some attributed utterance or thought of another person (i.e., people who really do like to have guests visiting), thus creating a representation of a representation (i.e., a second-order belief). Understanding ironic utterances, therefore, requires people to recognize the second-order nature of the speaker’s beliefs if they are to correctly infer that individual’s intended meaning (Bryant 2012; Colston and Gibbs 2002; Sperber and Wilson 1995).
Metarepresentational reasoning sounds very cognitive, and disembodied, but actually involves many non-linguistic actions that help create staged personas. Even if ironic speakers are pretending to act in certain ways, and indeed pretending to be other people, their actions, including non-linguistic ones such as laughing and tone of voice, are often critical to understanding the nature of ironic meaning and humor. Ironic speakers are not merely saying things to be funny; they are performing irony with humor and often expect listeners to co-participate in their “on stage” performances by adopting similar forms of pretense. Once again, speakers together adopt an “ironic mode” of not just speaking, but acting, that includes various behaviors.
Empirical studies clearly demonstrate that laughter is often associated with humor and irony in combination. For example, Bryant (2010a) found that over half of all ironic utterances in spontaneous conversations between friends were closely associated with laughing. When used with irony, laughter may help listeners recognize an ironic intention, although to our knowledge, this has never been demonstrated empirically. Conversely, it is easy to imagine how laughter could be used to create ambiguity in meaning when used with irony. For instance, if a speaker intended to humiliate a victim through sarcasm, it might not be clear if laughter produced before, during, or just after the ironic utterance was intended to amplify the humiliation, or signal innocent play. Particular acoustic features of the laughter might be critical. Noisy, atonal features often associated with aggression and strength (e.g., Morton 1977) might facilitate antagonistic sarcasm, for example. Moreover, if irony is used as a means to sort social alliances, only those who are in on the joke might engage in laughter, and this could reveal a prevailing social network.
Most research on laughter examines people’s reactions to humorous videos and comics. But people also laugh when they are not being humorous (Provine 2000). One analysis of over 1000 laughs in various natural contexts revealed that speakers were much more likely to laugh than listeners, suggesting that others’ humor production is not the predominant trigger for laughter (Provine 1993). Many of the comments immediately preceding spontaneous laughter were not obviously funny. Given this observation, laughter may best be understood as a stereotyped, social vocalization not necessarily linked to humor. But many of the seemingly mundane utterances happening just before speakers laugh are probably actually often funny to those involved. There is no way to know why people laugh when they do without asking them, and even then, many people would have difficulty articulating exactly why they laughed (Provine 2000).
A recent theory of humor describes why people laugh when they do. Flamson and Barrett (2009) proposed that humor is often a form of encryption in which people signal honestly the possession of certain knowledge that requires specific implicit knowledge to recognize (i.e., a key). When this recognition occurs, it is subjectively experienced as funny, and often results in some indicator of that (e.g., a laugh). Flamson and Barrett manipulated the level of encryption in presented jokes, tested for prior relevant knowledge, and examined the effect of these variables on judged funniness. For example, consider the following headline taken from the satirical newspaper The Onion: “Frank Gehry no longer allowed to make sandwiches for grandkids.” This line appeared with a picture of an abstract and oddly stacked assemblage of bread and lunch meat on a plate. Headlines and pictures like this were presented along with high encryption and low encryption paragraphs. In this example, the high encryption paragraph provided basic biographical information about Frank Gehry such as when he was born, where he went to college, and where he currently lives. The low encryption paragraph was matched in length but instead included information about his architectural career and fame for designing curved and uneven buildings. Participants were also tested for their background knowledge. As expected, highly encrypted jokes were generally judged as being significantly funnier, with people having prior background knowledge of the joke topics more likely to judge jokes as being funny, especially when that knowledge was needed to actually “get” the jokes.
To a large extent, people’s production and understanding of laughter follow general principles that have previously been outlined for irony (Kaufer 1977). First, some people recognize the irony and understand what the speaker/author intends to communicate, and because of their wisdom may be called “wolves.” People who fail to recognize the irony mistake what the speaker/author appears to say for what he or she intends to say. For their gullibility, these people may be called “sheep.” Audiences who agree with the speaker/author’s intended meaning are “confederates” and those that disagree are “victims.” These two groups are not the same as wolves and sheep, because understanding what the author intends to say and agreeing with it are distinct aspects of communication. Overall, irony divides the audience into four groups: (1) those who recognize the irony and agree with the author’s intended message (i.e., wolf-confederates), (2) those who recognize the irony but disagree with the author’s intended message (i.e., wolf-victims), (3) those who do not recognize the irony but would agree with the author’s message if they had correctly understood it (i.e., sheep-confederates), and (4) those who do not recognize the irony and would not accept the author’s communicative message (i.e., sheep-victims). The main job of an ironic speaker is to create as many wolf-confederates as possible while keeping to a minimum the number of sheep-confederates who wrongly believe themselves opposed to the creator’s position (Kaufer 1977).
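The four-way division can be read as the crossing of two independent yes/no questions: did the audience member recognize the irony, and does that person agree with the intended message? The following is a minimal sketch of that reading. The names `AudienceMember` and `kaufer_group` are hypothetical labels introduced here for illustration and do not come from Kaufer (1977).

```python
from typing import NamedTuple


class AudienceMember(NamedTuple):
    """Hypothetical record of one audience member's relation to an ironic remark."""
    recognizes_irony: bool               # did the person detect the irony?
    agrees_with_intended_message: bool   # would the person accept what was meant?


def kaufer_group(member: AudienceMember) -> str:
    """Return the label for one cell of the 2 x 2 division described above."""
    recognition = "wolf" if member.recognizes_irony else "sheep"
    stance = "confederate" if member.agrees_with_intended_message else "victim"
    return f"{recognition}-{stance}"


# Enumerate all four groups.
for recognizes in (True, False):
    for agrees in (True, False):
        print(recognizes, agrees, kaufer_group(AudienceMember(recognizes, agrees)))
```

Treating recognition and agreement as separate fields mirrors the point that understanding a message and agreeing with it are distinct aspects of communication.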
Our suggestion is that laughter too may be divided into the same four categories (i.e., wolf-confederates, wolf-victims, sheep-confederates, and sheep-victims), and thus serve different purposes depending on the kind of encrypted information shared between conversational participants. If laughter indicates the recognition of some encrypted information, it is important to know whether the listener actually got the joke, or merely recognized the presence of a joke and produced a laugh to pretend understanding. This effort is motivated by the desire to access the many possible benefits of sharing a joke with someone - a potentially costly kind of deception for joke producers. Scholars have distinguished between different kinds of laughs, designating some as deliberate as opposed to spontaneous (Keltner and Bonanno 1997). This distinction relates closely to “fake” versus “real” smiles, or Duchenne smiles (Ekman, Davidson and Friesen 1990). Laughter generated during the production of different kinds of smiles has been labeled similarly, and some research has explored the difference between real and fake laughs, though no acoustic data yet exist. One possibility is that the production and perception of different laughs and smiles emerge as a kind of co-evolutionary arms race. From this perspective, laughing is manipulative, and people must mind-read laughers to assess a person’s real intention when laughing. Is the laugher actually revealing the possession of a “key” and generating a spontaneous laugh motivated by the pleasure of decryption? Or is the laugher generating a deliberate laugh in an effort to align strategically with the speaker without the requisite knowledge that reveals insider status, or some other relevant dimension?
Figure 1: Spectrogram (FFT method, window length = 0.005 s, Gaussian window shape, dynamic range = 50 dB) and waveform of conversation segment containing antiphonal laughter. Lines in spectrogram indicate fundamental frequency (pitch). Straight lines underneath waveform indicate laughter calls, and curved lines indicate inspiration. 1. First antiphonal laugh sequence where each speaker produced a two-call bout with the first call unvoiced, and the second call voiced. 2. Second antiphonal laugh sequence with bottom speaker (Annie) producing initial inspiratory call, then four expiratory calls, each with two microelements. Top speaker (Jill) produced an initial long antiphonal call in sync with Annie’s second call (first expiratory call), and proceeded with four short calls. 3. Simultaneous inspiration revealing synchronized breathing and speech rates.
Recent research has examined the acoustic features of laughing in spontaneous conversation (Bryant 2010b). People often engage in shared laughter (also called antiphonal laughter), and researchers are just beginning to distinguish this behavior from other forms of laughter (Smoski and Bachorowski 2003a,b; though see Jefferson 1979 for an early exception). Consider the example illustrated in Figure 1. Jill and Annie are 18-year-old women who have been friends for about four months. As in the earlier example, these friends were discussing roommate experiences, and began laughing together immediately. In the figure, laughs are designated by straight lines with end points, inspiratory (inhaling) elements are indicated by curved lines, and the text of the speech is provided for an eight-second period. First note that there is much laughter here even though nothing funny is being said. More interesting is the way these speakers laugh together. They began by producing quite similar laughs (Annie after Jill) where the first part was unvoiced, and the second half was voiced. When Annie suggested that Jill talk about her roommate, she laughed again, triggering Jill to laugh along in close synchrony. The two ended this antiphonal laugh bout with simultaneous inhalations likely indicating their mutually entrained speech production and breathing (Wilson and Wilson 2006). This led Jill to ask why she must go first, which caused Annie to laugh again. The conversation continued on like this for ten minutes with much shared laughing. There are at least two adaptive reasons why interacting speakers might laugh simultaneously. First, laughing can mutually signal affiliative intentions. Voiced laughter is associated with greater friendliness, higher interest in meeting, and positive emotions (Bachorowski and Owren 2001) that can contribute to cooperative behavior, and can also signal sexual interest (Grammer 1990).
Laughing is associated with endogenous reward, generally inferred from research identifying brain areas linked to laughter that have known mesolimbic dopamine pathways (Dunbar 2002; Panksepp 2007), and might offer other physiological benefits as well, including helping the immune system (Hasan and Hasan 2009). By laughing together, conversationalists can jointly communicate a willingness to pursue a relationship and/or continue to cooperate. If people signal to each other the possession of mutual unspoken knowledge, they are assorting themselves socially - laughter can operate as a type of social glue in such contexts (thus opening up a niche for the social exploitation described above). The coordinated nature of shared laughing points to the second potential function. Antiphonal laughter generates a signal that is broadcast to those outside of the interaction, so listeners can infer relevant relationship information by hearing others laugh together. A group of people laughing in concert can quickly signal to others a whole range of information, including but not limited to normative expectations, current alliances, sources of potential conflict, and various aesthetic preferences. In this way, laughter can be a powerful social group communication device.
Given these two possible functions of antiphonal laughter, such laughter might be expected to have specific acoustic properties that assist others in recognizing it as such. For example, the effort to broadcast affiliation to outsiders might be enhanced if the antiphonal laughter was longer, louder, and less acoustically variable - all features that facilitate signal transmission in noisy environments. An analysis of over 2000 laughs taken from 40 natural conversations between friends and strangers, in all gender combinations, indicated that antiphonal laughs between friends possess many of the predicted differences from individual laughs (Bryant 2010b). When laughing antiphonally, a speaker generates the first laugh, an “initiation,” and their partner produces a “response” laugh. In this study, laughs were isolated, extracted from the recordings, coded for various features, and subjected to acoustic analysis. Not surprisingly, friends laughed more than strangers, and women laughed more than men. Friends and females also produced more voiced laughs (i.e., laughter with tonal properties) relative to unvoiced laughs. As mentioned earlier, voiced laughter has been judged emotionally as relatively more positive (Bachorowski and Owren 2001). Friends also produced significantly more antiphonal laughs than strangers, but did so with much more variable timing. In Bryant (2010b), a response laugh needed to occur within one second of the initiation in order for the pair to be categorized as antiphonal. The time from the onset of the initiation laugh to the onset of the response laugh was measured, and an unexpected difference was revealed between friends and strangers. The average latency for a response in all antiphonal laughs within one second was about 400 ms, with friends being slightly, although not significantly, longer. But the variation between these groups was radically different.
Using a bootstrap analysis, distributions of the standard deviations of the latencies were generated, and strangers were much less variable, with very little overlap between distributions. This suggests that strangers responded to antiphonal laughs in a much more regimented manner, possibly as a reflexive laugh reflecting an early relationship negotiation tactic (and/or manipulative social strategy). Friends, conversely, produced much more variable laugh responses, perhaps reflecting the tremendous diversity of familiar interactive behavior. If laughers are signaling something about their relationship to others outside the interaction, we might expect that the coordinated signals would be quite varied given that more variable (although still coordinated) signals honestly reflect the investment of time friends must spend to organize complex behavior (Hagen and Bryant 2003).
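The latency measure and the bootstrap comparison described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the procedure or code used in Bryant (2010b): the function names are hypothetical, the pairing rule (first partner laugh beginning within one second of an initiation onset) is one simple reading of the criterion, and the latency values are invented so that both groups average roughly 400 ms while differing in spread.

```python
import random
import statistics


def antiphonal_latencies(initiation_onsets, response_onsets, max_latency=1.0):
    """Onset-to-onset latencies (s) for responses starting within max_latency of an initiation."""
    latencies = []
    for start in initiation_onsets:
        candidates = [r - start for r in response_onsets if 0 < r - start <= max_latency]
        if candidates:
            latencies.append(min(candidates))  # keep the earliest qualifying response
    return latencies


def bootstrap_sd(latencies, n_resamples=2000, seed=0):
    """Bootstrap distribution of the standard deviation: resample with replacement, recompute SD."""
    rng = random.Random(seed)
    n = len(latencies)
    return [statistics.stdev(rng.choices(latencies, k=n)) for _ in range(n_resamples)]


def percentile_interval(values):
    """Approximate 2.5th and 97.5th percentiles of a bootstrap distribution."""
    cuts = statistics.quantiles(values, n=40)  # 39 cut points at 2.5% steps
    return cuts[0], cuts[-1]


# Invented latencies in seconds (NOT data from the study), similar means, different spread.
friend_latencies = [0.12, 0.85, 0.33, 0.61, 0.05, 0.94, 0.47, 0.22, 0.78, 0.40]
stranger_latencies = [0.38, 0.42, 0.41, 0.36, 0.44, 0.40, 0.39, 0.43, 0.37, 0.41]

print("friends SD interval:  ", percentile_interval(bootstrap_sd(friend_latencies)))
print("strangers SD interval:", percentile_interval(bootstrap_sd(stranger_latencies)))
```

Comparing the two percentile intervals is one simple way to express "very little overlap between distributions"; other overlap measures would serve equally well.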
This type of analysis speaks to the long-standing conflict between examining actual communication-in-action and experimental control. If one is interested in how laughs manifest themselves during real interaction, it is necessary to retrieve the behavior of interest from the wild, so to speak. Most researchers studying laughter use various stimuli to elicit laughter in experimental contexts, but there is good reason to suspect that the acoustic properties of this kind of laughter might be different from more natural, spontaneous laughter. Moreover, these studies do not address the crucial issue of how laughter functions in normal conversation. Studying conversational laughter, however, presents researchers with difficult methodological problems associated with the messiness of spontaneous communicative behavior. Laughs often co-occur with talk and overlap with other speakers, as well as manifest with incredible phonetic variation. One simple criterion researchers often use is to rely on subjective judgment of what “sounds like a laugh” to decide on whether a laugh is present. But there are basic acoustic features that often distinguish laughs from other vocalizations, such as short (~500 ms) and often successive expiratory elements that are similar within individual bouts, and generally either tonal (i.e., vowel-like), or if nonvoiced, have clear onsets (Bachorowski, Smoski and Owren 2001; Vettin and Todt 2004). Even with these relatively specific features, deciding on the acoustic boundaries of laughs in speech can be extremely difficult, and somewhat subjective. Speakers blend speech into laughs both at onset and offset, and many unvoiced laughs are barely more than a 100 ms puff of air. While laughter punctuates speech in what appear to be rule-governed ways (Provine 1993), it does so with extreme complexity - certainly a contributing factor in the paucity of detailed acoustic and pragmatic analysis of laughter in the wild.
Laughter can often mark humorous interaction and utterances, but how and when people signal they are being funny is clearly not so simple. One question of recent interest is whether speakers mark their humor prosodically. For example, irony is frequently assumed to have a special “tone of voice” that makes it uniquely identifiable as conveying ironic meaning and perhaps humor. For instance, a sarcastic speaker might lower his pitch, speak louder, and slow down his speech rate relative to his baseline speech. These particular adjustments are often found with actors (Rockwell 2000). But in spontaneous conversations, the acoustic patterns are not at all consistent, except that speakers quite often slow down (Bryant 2010a). Consider the following exchange between two housemates discussing past roommate experiences:
(2) Kristen: “My side of the room would always be messy.”
Shayna: “You the messy? Ha.”
Kristen: “Hah ha ha, I know, can you believe it?”
Kristen explains that in a past living situation her side of the room would be messy, and this comes as no surprise to Shayna, her current roommate. Shayna responds with an ironic rhetorical question that elicits ironic jocularity, and in it she exaggerates particular prosodic features associated with interrogatives. She also laughs immediately after, clearly marking her ironic intention. Kristen responds with exaggerated surprise signaling her participation in the irony, especially with shared laughter following Shayna’s laugh. These vocal features not only contrast with baseline speaking patterns in the current conversation, but also sound over-the-top for ordinary speech. Functionally, these vocal signals serve to mark play, and make this part of the interaction distinct from other talk in the immediate communicative context.
One study of spontaneous ironic speech isolated the specific instances of irony, and then compared those utterances acoustically to speech immediately preceding them by the same speaker (Bryant 2010a). This method quantitatively measures how particular vocal changes manifest themselves in discourse. Not only do speakers change vocal dimensions in significant and perceptible ways, they do so in a manner reflecting the particular emotional content of their speech. Thus, Shayna’s ironic rhetorical question contained exaggerated features of questions, not some stereotyped form such as low pitch and nasalization - speech characteristics commonly associated with verbal irony (Cutler 1974; Rockwell 2000). Interrogatives have greater pitch range than complementary declaratives, and have a distinctive rise-fall pattern in the pitch and amplitude as well as stress on specific parts of the sentence. Shayna’s question contained these exaggerated features relative to her speech immediately preceding it.
Of course, context drives a good deal of the comprehension process as well. Prosody is not always necessary for accurate understanding but can serve to highlight various aspects of the intention, including the humor. Kristen signals her understanding of Shayna’s intention, as well as her willingness to participate in the play, by also contrasting several acoustic dimensions in her statement of pretended surprise. But if her statement had been a different ironic utterance, her vocal features likely would have been quite different. For example, if she had instead said, “I know, I’m a bad person” with exaggerated features of shame and sadness (e.g., higher, descending pitch, and lowered, descending amplitude), then the contrast from her previous speech would still be noticeable, but the actual form would have better reflected the quite different emotional connotation.
In all kinds of speech communication, prosodic contrasts can manifest in a variety of ways depending on the affective and intentional goals of the speaker and the communicative context. Flamson, Bryant and Barrett (2011) examined the vocal features of humor in a natural context amongst rural Brazilian farmers. Five hours of speech were recorded during two community meetings where landowners discussed matters of business - not a time for joking in the traditional sense, but certainly a time when speakers use various communicative tactics to highlight alliances. If speakers are signaling encrypted knowledge as a means for highlighting the depth of similarity with those audience members who manage to detect the encryption (i.e., think an utterance is funny), then we should not expect very much explicit marking.
This study used a methodology similar to that of the research described earlier examining prosodic contrasts in verbal irony (Bryant 2010a). Humorous speech was first identified based on the presence of audience (and sometimes speaker) laughter. Following this, the speech was divided into two segments based on the sentence structure and content: set-up and punch line. The punch line speech was the utterance that concluded the immediate talk just before the laughter. This is not a punch line in the traditional sense of a joke, but rather the final statement in a series. The set-up speech was always related in content to the punch line, and immediately preceded it. Separating these segments can prove difficult, and the boundary is generally not obvious. Often the whole segment was divided based on intonational phrasing. The speech immediately preceding these pairs of segments was then extracted to get comparison baseline speech. These segments were comparable in length to the set-up segments, and occurred anywhere from immediately preceding the set-up speech to up to five seconds before. Here is an example of one trio of utterances (in Portuguese with English translation in italics):
(3) Baseline: Poluição, era poluição, né?
‘Pollution, it was pollution, right?’
Set-up: Aí se outro acende um fogo
‘Then, if someone else starts a fire’
Punch line: eu posso acusar ele, né?
‘I can accuse him, right?’
These three segments were then analyzed within speakers to check for prosodic contrasts in the acoustic dimensions of vocal pitch, loudness, and duration. If speakers were marking humorous utterances, this analysis should reveal it, especially between baseline segments and the other two. In this sample of 24 trios of segments, there were no differences in the rate of contrasts between any of the segment types, indicating that speakers were not contrasting their prosody differently when they were using humor. But only a between-speaker analysis can reveal systematic marking in any one dimension. The acoustic data were analyzed between speakers, and only a difference in overall loudness was found between baseline speech and the other two segments. Baseline speech was lower in amplitude, suggesting that when speakers began saying something intended to be funny, they increased their loudness. But this is not likely due to explicit marking of humor. The room where the meetings took place was loud, given the large number of people participating (~25), the acoustic properties of the space, and the laughter often erupting as the punch line approached. The increased loudness of the set-up and punch line segments is likely due to an effort toward being heard, and not for prosodic marking of humor. Of course, this effort could be used potentially as a cue to humor, although not a particularly reliable one given the many reasons (not related to humor) speakers raise their voices. There was also a trend of speakers reducing the range of their pitch in set-up and punch line speech, further supporting the interpretation of the speech differences as effort toward increasing the signal-to-noise ratio rather than marking humor specifically. These findings are consistent with other research examining the production of punch lines in scripted jokes. Pickering et al. (2009) found that when speakers told jokes, their punch lines were not different from the speech leading up to them.
There was a general decline in vocal pitch, but this was not considered to be beyond what one would expect from ordinary pitch declination in sentence production.
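The comparison of baseline, set-up, and punch line segments can be sketched as a paired analysis over trios. The example below is a sketch under stated assumptions and not the analysis reported by Flamson, Bryant and Barrett (2011): the dictionary layout, the function names, the sign-flip permutation test, and all numeric values are hypothetical choices made here to illustrate how a loudness difference between segment types could be checked.

```python
import random
from statistics import mean


def paired_differences(trios, dimension, a="baseline", b="set_up"):
    """Per-trio difference (segment b minus segment a) on one acoustic dimension."""
    return [trio[b][dimension] - trio[a][dimension] for trio in trios]


def sign_flip_p_value(differences, n_permutations=5000, seed=0):
    """Two-sided sign-flip permutation test on the mean paired difference."""
    rng = random.Random(seed)
    observed = abs(mean(differences))
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis, each difference is equally likely to be positive or negative.
        flipped = [d if rng.random() < 0.5 else -d for d in differences]
        if abs(mean(flipped)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


# Invented measurements (NOT study data): mean loudness in dB and mean pitch in Hz per segment.
trios = [
    {"baseline": {"loudness_db": 62, "pitch_hz": 140},
     "set_up": {"loudness_db": 66, "pitch_hz": 138},
     "punch_line": {"loudness_db": 67, "pitch_hz": 135}},
    {"baseline": {"loudness_db": 60, "pitch_hz": 155},
     "set_up": {"loudness_db": 65, "pitch_hz": 157},
     "punch_line": {"loudness_db": 64, "pitch_hz": 150}},
    {"baseline": {"loudness_db": 63, "pitch_hz": 128},
     "set_up": {"loudness_db": 66, "pitch_hz": 126},
     "punch_line": {"loudness_db": 68, "pitch_hz": 129}},
]

loudness_diffs = paired_differences(trios, "loudness_db")
print("set-up minus baseline loudness (dB):", loudness_diffs)
print("sign-flip p-value:", sign_flip_p_value(loudness_diffs))
```

With only three invented trios the p-value cannot be small; the point is the shape of the comparison, which would be run over all 24 trios and repeated for pitch and duration.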
In the example above, it is not difficult to imagine how this could be funny to the people involved, but it certainly is not obvious to outsiders. Audience members began laughing during the set-up portion of the speech, as they likely had some idea what the speaker was going to say. But the speaker was likely communicating encrypted information that solicited the laughter from certain target audience members. Flamson (2010) found that social network proximity was related to judgments of humor in jokes - people tended to find funny the same things their friends did. So why laugh? As mentioned before, laughter could be serving multiple functions simultaneously, including helping cooperators sort themselves and mutually communicate their intentions, as well as signal to others information about their bond. Studies in the field that combine acoustic analyses of real speech interactions with social network analyses of the relationships between interlocutors can help illuminate the complex and dynamic patterns in ordinary talk. This approach provides insight into these communicative processes that traditional linguistic analyses could easily miss. Using mixed methods (e.g., acoustic analysis, social network analysis, and conversational analysis) in combination allows researchers to identify interactions that one-dimensional analytic techniques would be unable to measure.