2. Methods
2.1. Participant Speakers
Twelve adults (six men and six women) who had undergone total laryngectomy and TE puncture voice restoration served as participant speakers for this investigation. The speakers ranged in age from 58 to 71 years. They had used TE speech for a minimum of one year and were judged by a highly experienced clinician to be excellent examples of this mode of alaryngeal speech. All of the speakers exhibited excellent speech intelligibility, self-reported that they were in good general health, and confirmed that they remained communicatively active in both their vocational and avocational endeavors.
All speakers who provided samples used TE speech as their primary method of communication, had received radiation treatment postoperatively, and were native English speakers. Speaker samples were excluded if the individual reported any history of perioperative complications following laryngectomy and TE puncture or identified other medical conditions that might affect speech, language, or hearing, including oral or pharyngeal resection, cancer recurrence or a second primary cancer, chronic obstructive pulmonary disease, asthma, persistent swallowing difficulties, or neurological disease.
2.2. Participant Listeners
Sixteen normal-hearing young adults (eight females and eight males), ranging in age from 22 to 27 years (mean age = 23; 9), who were enrolled as either undergraduate or graduate students at a single institution, were recruited as listeners in this study. All listeners were considered naïve to voice disorders and alaryngeal speech, as they had no formal exposure to postlaryngectomy speech options and no education in the area of voice disorders or of voice or speech disorders associated with head and neck cancer. All listeners were native English speakers, and none reported any history of speech, language, or hearing concerns. Permission to conduct this study was formally granted by the Research Ethics Board at the University of Western Ontario (#104645).
2.3. Speech Stimuli
High-quality digital audio recordings of the Rainbow Passage [28] were obtained for all 12 TE speakers from a large archival library of TE speech samples. These samples were judged by two independent professional Speech-Language Pathologists (SLPs) to not be obviously associated with either gender. Thus, these samples were selected to be representative of a larger group of TE speakers. All original speech samples were recorded using a headset microphone (Shure SM10a; Shure Incorporated, Niles, IL, USA) and either a digital minidisk (MD) research-quality recorder (Sony MZ-R55; Sony Corp., New York, NY, USA) or a digital audiotape portable recorder (Sony PCM-M1), with all recordings being obtained in a quiet experimental setting, free of ambient noise. All recordings were digital originals recorded at a sampling rate of 48 kHz.
Digital recordings were transferred to a personal computer and saved as WAV files using the acoustic software Audacity (version 2.0.6, Pittsburgh, PA, USA). Each sample was edited to extract the second sentence of the Rainbow Passage, “The rainbow is a division of white light into many beautiful colors”. Aside from the samples being edited to include this sentence exclusively, the only other editing that took place was the addition of 3 s of silence on either side of the sample sentence, to ensure that listeners could easily attend to the entire speech sample during the experimental listening tasks.
2.4. Orientation and Listening Procedure
As an a priori requirement of the experimental procedure, all listeners were required to participate in two listening sessions. Upon arrival for the first listening session, participant listeners were informed that the speech samples to which they would listen were abnormal voice samples where the quality of the voice was reduced from normal expectation. Each listener was then asked to listen to four TE speech samples (two males, two females) that were not part of the experimental stimuli. These four samples had been compiled into a single audio file so that each voice sample would play continuously, one after another. The purpose of this task was to familiarize and orient listeners to the unique types of voices on which they would soon be making judgements.
This exposure task was included to reduce any potential surprise related to the unusual and abnormal voice qualities that often characterize TE speech. By doing so, we believed that potential confounds related to less favorable ratings of early samples due to the unusual acoustic nature and listener adaptation could be reduced. While the gender of each of the four exposure samples was not specified, listeners were told that these samples would include both male and female voices. Listeners were allowed to listen to these familiarization samples as many times as they desired before formally beginning the experimental rating session.
Each listener had control over the computer mouse and could independently select and repeat the audio file as many times as desired. Listeners completed this task while seated comfortably at a personal computer (Dell, Round Rock, TX, USA), listening to the audio files via headphones (Sony MDRV-150); each listener was able to independently adjust the listening volume to a comfortable loudness level. Once the listener was ready to begin the experimental rating task, he or she was presented with a randomized playlist of the 16 TE voice samples (12 primary samples and 4 duplicates, for reliability) and a series of rating scales that were provided on paper in numerical order, 1 through 16. Two auditory-perceptual dimensions were assessed across the sessions: “speech acceptability” (SA) and a dimension termed “listener comfort” (LC).
In the first listening session, the gender of each sample was indicated in the margin of the rating form by the letter “F” for female or “M” for male. However, in this first listening session, the gender indicated was in actuality the opposite of the speaker’s true gender. For the purposes of data analysis, this first session was identified as “Gender Opposite”. Each listener was asked to read the definition of the feature on which he or she would be making judgments (either SA or LC) and was then asked to play the list of randomized voice samples in sequence and rate each sample independently of the others.
2.5. Auditory-Perceptual Rating Task
For ratings of the auditory-perceptual feature SA, listeners were asked to rate a voice based on “The pitch, rate, understandability, and voice quality. In other words, is the voice pleasing to listen to or does it cause…some discomfort as a listener?” [29]. Alternatively, ratings of the auditory-perceptual feature LC required the listener to rate samples based on “How comfortable would you feel listening to the person’s speech in a social situation?” [30]. Each audio file was individually labeled as “Sample 1”, “Sample 2” … “Sample 16”, and, accordingly, each rating scale provided was labeled with the same sample number. Upon listening to each consecutive sample, the listener bisected the line of the visual analog rating scale at the point that they believed best represented their judgment of that sample for the dimension being rated; this procedure was followed identically for each sample in the randomized set. Listeners could play each individual sample as many times as desired before making their judgment using the rating scale, but they were instructed that once a rating was made, it could not be altered later, nor could they return to past samples.
Regardless of whether SA or LC was being evaluated, each rating scale comprised a solid line measuring 100 mm, and listeners were shown examples of how the scale was to be used. The appearance of the rating scales was consistent with that of the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) that is used in clinical studies and research associated with laryngeal-based voice disorders [31]. Below each scale, descriptive indicators of “mild”, “moderate”, and “profound” were provided at approximately 25 mm, 55 mm, and 85 mm, respectively. Thus, as scores moved from left to right on the scale (increased) for both SA and LC, listener judgments became increasingly favorable for either dimension; that is, higher scores indicated a more positive judgment by a listener.
Once the first auditory-perceptual dimension was rated, listeners were provided with a short break of approximately 15–20 min and then were asked to complete a second rating session, which addressed the second dimension that they had not yet completed (either SA or LC). During the rating task for the second dimension, listeners were provided with the same samples in a new randomized order. The delegation of these scales (SA and LC) was counterbalanced so that 8 of the 16 listeners rated SA first, followed by LC; conversely, the other 8 listeners rated LC first, followed by SA. Thus, controls for potential order effects related to the auditory-perceptual dimensions were considered. Once ratings of both SA and LC were completed in this session, listeners were dismissed and scheduled to return for a follow-up listening session in 7 to 14 days.
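The counterbalancing described above can be illustrated with a minimal sketch. It assumes that listeners were split at random into the two order groups; the text does not state how listeners were allocated, so the random allocation, the seed, and the function names `assign_dimension_order` and `second_session_order` are hypothetical and serve only to show the logic.

```python
import random


def assign_dimension_order(listener_ids, seed=0):
    """Assign half of the listeners to rate SA first and half to rate LC first.

    Random allocation is an assumption made for this sketch; the study
    reports only that 8 of 16 listeners rated each dimension first.
    """
    rng = random.Random(seed)
    shuffled = list(listener_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    orders = {}
    for index, listener in enumerate(shuffled):
        orders[listener] = ("SA", "LC") if index < half else ("LC", "SA")
    return orders


def second_session_order(first_session_order):
    """Reverse the dimension order for the follow-up listening session."""
    return tuple(reversed(first_session_order))


# Hypothetical listener IDs 1 through 16.
orders = assign_dimension_order(range(1, 17))
sa_first = sum(1 for order in orders.values() if order == ("SA", "LC"))
print(sa_first, len(orders) - sa_first)
```

With 16 listeners, the split yields 8 listeners per order regardless of the seed, which matches the 8/8 allocation reported in the text.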
In the second experimental listening session, which occurred between 7 and 14 days after completion of the first, listeners were presented stimuli and asked to assess either SA or LC in the reverse order of that completed in the first listening session. In this follow-up listening session, however, the true gender of each speaker sample was now indicated in the margin of the rating form. For the purposes of data analysis, this second session was identified as “Gender Known”. However, it should be noted that, regardless of the session (i.e., Gender Opposite or Gender Known), listeners were led to believe that the gender indicated on the rating form was accurate. The purpose of using deception in the first listening condition (Gender Opposite) was to ascertain whether listeners would rate speaker samples differently based on a prescribed gender. In both conditions, only the examiners were aware of the true gender of the voice sample being presented.
2.6. Reliability
At the end of all listening sessions and to ensure measures of internal validity for listener ratings, 4 additional voice samples (25%) were duplicated from the 12 original voice samples. Thus, Samples 13, 14, 15, and 16 represented duplicate samples selected from those played in the first 12 randomized samples (two males and two females). Listeners were asked to provide these reliability ratings for both SA and LC judgments for all experimental sessions; thus, reliability was gathered in all four rating sessions. Ratings obtained from these duplicate measures were then compared to the first rating of each sample to evaluate the consistency of ratings. When raw data from judgments of reliability samples were compared to initial judgments of the same sample, analysis revealed that 75% of all listener reliability judgments fell within ±10 scaled points of the original rating, indicating judgments that were highly consistent. Fewer than 3% of all ratings differed by more than ±15 scaled points between judgments.
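The reliability comparison above amounts to counting how many repeated ratings fall within a tolerance of the first rating. The following is a minimal sketch of that calculation; the function name `agreement_within` and the four example ratings are hypothetical and are not data from this study.

```python
def agreement_within(first_ratings, repeat_ratings, tolerance):
    """Return the proportion of repeated ratings within +/- tolerance
    scale points (mm on the 100 mm line) of the first rating."""
    if not first_ratings or len(first_ratings) != len(repeat_ratings):
        raise ValueError("rating lists must be non-empty and of equal length")
    differences = [abs(a - b) for a, b in zip(first_ratings, repeat_ratings)]
    return sum(d <= tolerance for d in differences) / len(differences)


# Hypothetical first and duplicate ratings for four reliability samples.
first = [62, 40, 71, 55]
repeat = [58, 52, 70, 49]

print(agreement_within(first, repeat, 10))  # absolute differences are 4, 12, 1, 6
print(agreement_within(first, repeat, 15))
```

The same function can be applied per listener, per dimension, or pooled across all four rating sessions, depending on the level at which consistency is to be reported.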
2.7. Data Analysis
The statistical relationship between measures of SA and LC was determined using Pearson correlation coefficients. The relationships between the Gender Opposite and Gender Known conditions were also evaluated using Pearson correlation coefficients and independent t-tests. These analyses were first completed for all listener scores combined for the entire group of speakers and then separately for male and female listener groups. A predetermined level of statistical significance (p < 0.05) was used for all analyses.
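As an illustration of the two statistics named above, the sketch below computes a Pearson correlation coefficient and an independent-samples t statistic using only the Python standard library. It assumes the pooled-variance (Student) form of the t-test, which the text does not specify, and the function names and example values are hypothetical. The p-value would then be obtained from a t distribution with the returned degrees of freedom, for example with a statistics package.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def independent_t(a, b):
    """Independent-samples t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Pooled variance is an assumption
    made for this sketch.
    """
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((v - mean_a) ** 2 for v in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    df = na + nb - 2
    pooled_variance = (ss_a + ss_b) / df
    t = (mean_a - mean_b) / math.sqrt(pooled_variance * (1 / na + 1 / nb))
    return t, df


# Hypothetical mean ratings (mm), not data from this study.
sa_scores = [41.0, 55.5, 38.0, 62.0, 47.5, 58.0]
lc_scores = [44.0, 59.0, 35.5, 66.0, 50.0, 61.5]
print(round(pearson_r(sa_scores, lc_scores), 3))
print(independent_t([1, 2, 3], [2, 3, 4]))
```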
4. Discussion
The objective of this study was to address the question of how a listener’s knowledge of a TE speaker’s gender would influence their judgments of the auditory-perceptual features of SA and LC. While concerns have evolved from the past literature around the potential for differential social penalty specific to voice quality in those who are laryngectomized, the present investigation sought to assess gender differences associated with TE speech. In seeking to address this question, several variables were purposely controlled. In anticipation of the deception component of this experiment (Session 1), speaker samples were chosen based on the experimenter’s evaluation that the samples included were ambiguous and not obviously associated with either gender. Had the samples clearly sounded male or female, the deception aspect of the experiment might have become apparent to the listener, and we believe that listener ratings of female speakers would have deteriorated further. However, consideration should be given to the fact that TE samples are characterized by lower fundamental frequency and considerable noise [2], which inherently decrease the overall feminine characteristics of the voices [13]. Therefore, to control for variables that might influence perceptual judgments, several steps were taken as part of the experimental design, particularly with respect to the deception in Session 1.
First, four randomized lists of speaker samples were developed so that listeners were always presented with a uniquely ordered list of samples on which to complete their ratings. Secondly, the auditory-perceptual feature rated in each session was counterbalanced (i.e., if ratings of SA or LC were carried out first, the opposite feature would be rated during the second session). Efforts were made to evaluate listener judgements in the context of speaker gender, and, thus, listeners were directed to the gender identification of each of the speaker samples prior to each listening session; this was true whether judgments were being made for either the true gender identification or deception condition.
In evaluating the present data, it is important to consider what the SA ratings obtained in this study suggest. As described in the previous literature by Eadie and colleagues [13], judgments of SA encourage the listener to identify “acceptability” as it relates to their own personal beliefs about deviation from a normal signal and potential disability [26]. Thus, it is not unreasonable to assume that gender is a critical factor in determining a level of acceptability regardless of speaker gender. That is, listeners likely have preconceived templates of how men and women should sound, and, when intrinsic perceptual standards are violated, or thresholds challenged, the associated ratings of a sample will be altered [8,13]. However, in this regard and specific to the present data, it is interesting to note that penalizing judgments were more apparent for female TE speakers for SA as compared to LC.
By strict definition, there is some degree of “comfort” inherently considered in “acceptability” ratings; thus, the features may not be entirely mutually exclusive. Yet, there were differences in how these two auditory-perceptual constructs (i.e., SA and LC) were rated across listening sessions. It appears that when listeners are requested to make overall judgments about a speaker’s voice, LC may comprise one component of the rating, but more quantitative aspects of the voice, such as its pitch, noise level, etc., may further influence such overall judgments of SA [29]. This difference between the SA scores across sessions was pronounced for the female TE speaker population, suggesting that when female TE speakers were believed to be male, listeners rated the samples as being slightly more acceptable than when the samples were known to be female.
The same trend was seen for the construct of LC; however, this difference was not statistically significant. When male TE speakers were analyzed independently from their female counterparts, there were no significant differences in ratings for either SA or LC. Consequently, it may be concluded that female TE speakers may face greater penalty for the unusual characteristics of their alaryngeal voices than male TE speakers when it comes to listener judgments; this finding was more prominent for SA in the present study [32].
When listener data were analyzed by gender, findings revealed that neither male nor female listeners rated a particular speaker gender significantly worse than the other. This suggests that auditory-perceptual ratings of male and female TE speakers are fairly consistent across a variety of listeners and cannot necessarily be predicted based on the gender of the rater. Further, it may suggest that male and female listeners have similar ideals when it comes to the SA and LC associated with listening to TE speech [23,24]. In this context, it must again be noted that TE voices are inherently and considerably different from those of normal speakers. Thus, the abnormality of the voice/speech sample will be easily recognized, and a listener may work to adjust their perceptual template in order to provide a rating that may better represent the sample population being assessed (in the present case, TE speakers) [7,8].
One final area that deserves comment is that related to potential acoustic markers that may distinguish TE speakers. Because the pharyngoesophageal voice source in TE speakers is not capable of adductor–abductor control, the signal generated is highly aperiodic. Thus, the noise quality of TE voices, in addition to the lowered fundamental frequency [14,15], may lead to misperceptions of gender that may often default to that of a male speaker. In assessing data from the present study that are reflected in Figure 5, Figure 6, and Figure 7, it can be seen that there appear to be two “clusters” of speaker data represented in those graphics. This raised the question of whether those clusters might correspond to distinct groups of our male and female TE speakers. Consequently, we assessed those data in a post hoc manner to determine where individual speakers were represented on those figures.
In viewing Figure 5 as an example, we were able to determine that the six data points shown in the lower quadrant of that figure represent four female and two male TE speakers. In contrast, the upper quadrant of Figure 5 represents four males and the remaining two female speakers. This would suggest that listeners were able to make independent judgments of speakers based on the dimensions assessed; however, the knowledge of speaker gender that is represented in this figure may have biased their ratings to the less favorable side of the scale for at least four of our six female speakers. Therefore, based on these data, future work that seeks to comprehensively describe the acoustic characteristics of any given TE speaker’s voice may provide valuable information that potentially guides a listener’s assessment of gender.
5. Limitations of the Present Study
Although the findings of our work provide evidence that a gender bias might exist, several limitations to our work must be noted. First, both the speaker (n = 12) and listener (n = 16) groups in this study are relatively small; thus, generalizations from the present findings must be made with a degree of caution. In this regard, future attempts at replication of this study, or similar approaches to identifying possible gender bias in postlaryngectomy speakers with a larger group of participants in both groups, would be of benefit. As sample sizes of speakers increase, one would need to be mindful of the proficiency of the speaker, as other factors such as intelligibility reductions would need to be carefully considered. Recall that our speakers were judged to be highly intelligible to reduce any potential confound specific to the listener also having challenges in understanding the speaker.
Secondly, our listener group comprised young adults. It is possible that individuals who are older, perhaps a cohort that is similar in age to the speakers in this study, could provide different findings. To our knowledge, there are no published reports that address gender bias with specific respect to the age of either the speaker(s) or listener(s). However, if older listeners are used in the future, it would be important to assess and quantify each listener’s hearing status, as age-related hearing loss could impact the findings. It would, therefore, appear that future studies that consider not only increasing sample size but also the age of listeners could offer findings that further validate the present results. Lastly, depending on the age of listeners, it is possible that a speaker may be penalized more or less depending on whether listeners see themselves as a “peer” relative to the age of the speaker.
6. Clinical Implications
The present study sought to determine whether explicit knowledge of a TE speaker’s gender would influence the perceptual ratings assigned by naïve listeners. In that respect, our study was designed to address the potential for a gender bias associated with TE voice and speech. Based on the data gathered, it is apparent that listeners rate female TE speaker samples to be less acceptable and less comfortable to listen to when the samples are known (or are assumed) to be female speakers. While the underlying reasons for this finding remain incompletely understood, it would appear that expectation(s) of what might be termed “gender markers” and the general acoustic characteristics of TE speech were actively considered by the current listeners during their assessment [5]. However, it does seem likely that a multitude of factors are involved when a listener judges a TE speaker’s voice, including its collective attributes of voice “quality” and the general comfort level associated with listening to a particular voice in a social setting and the related disability it may pose [33]. It will be important for future research to evaluate acoustic information in conjunction with perceptual data for male and female TE speaker samples, to determine the specific parameters that may affect listener judgments. Additionally, the combined impact of visual and acoustic information is likely critical to understanding such judgments [17].
In terms of clinical relevance, the present results are important to consider in the context of pre- and postoperative counselling for female laryngectomees, to ensure the most informed level of education in the context of postlaryngectomy rehabilitation [34]. While there would be wide acceptance of the notion that the capacity to restore a more feminine-sounding postlaryngectomy voice would be advantageous [35], this desire remains challenging due to the unique postsurgical anatomy that forms the new voicing source. At the very least, however, providing information on the potential limitations of TE voice and speech relative to the speaker’s gender would appear to be an important area of discussion in both pre- and postoperative counselling for those who will undergo total laryngectomy for laryngeal cancer [20,21,22,25,35].