Gender Perception of Speech: Dependence on Formant Space Configuration, Fundamental Frequency, and Source Spectral Tilt
TJ Neuhaus, MS, PhD Student, Ronald C. Scherer, PhD
Abstract:
Objective: To explore how listeners use three aspects of the acoustic signal to determine speaker gender.
Methods: The software programs Madde, Praat, and Audacity were used to synthesize 210 sound files. The 210 files are all combinations of seven “formant space configurations” (FSC), 10 values of fundamental frequency, and three values of source spectral tilt. Each formant space configuration is a set of the vowels /i, æ, ɑ, u/ and is based on average formant frequency values published in the literature. The lowest formant space configuration (FSC 1 in the figure below) uses male-typical formant frequencies for the four vowels, the highest (FSC 7) uses female-typical formant frequencies, and the remaining five configurations are spaced between them in semitone steps. For fundamental frequency, the lowest value is male-typical, the highest value is female-typical, and the remaining eight values are evenly spaced between them in semitones. The three values of source spectral tilt are -18 dB/oct, -14 dB/oct, and -10 dB/oct, which approximate breathy, normal, and pressed voice qualities, respectively. Listeners are asked to rate the “speaker” of each synthesized vowel set as either male or female. The experiment has been run with two individuals to verify the methodology and provide preliminary results. Approximately 10 males and 10 females will be recruited this semester, and the project will be completed by March 2020.
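The stimulus design described above can be illustrated with a minimal sketch. It assumes hypothetical endpoint values for fundamental frequency (120 Hz and 220 Hz are placeholders, not the values used in the study) and a hypothetical helper named semitone_steps; only the counts (seven FSC levels, 10 fundamental frequency values, three tilt values) and the tilt values themselves come from the abstract.

```python
import math
from itertools import product


def semitone_steps(low_hz, high_hz, n):
    """Return n frequencies from low_hz to high_hz, equally spaced in semitones."""
    span = 12 * math.log2(high_hz / low_hz)  # total distance in semitones
    step = span / (n - 1)                    # semitones between neighbors
    return [low_hz * 2 ** (i * step / 12) for i in range(n)]


# ASSUMPTION: placeholder endpoints, not the values used in the study.
F0_LOW_HZ = 120.0
F0_HIGH_HZ = 220.0

f0_values = semitone_steps(F0_LOW_HZ, F0_HIGH_HZ, 10)
fsc_levels = range(1, 8)              # FSC 1 (male-typical) to FSC 7 (female-typical)
tilts_db_per_oct = (-18, -14, -10)    # breathy, normal, pressed (from the abstract)

# Full factorial design: every FSC x fundamental frequency x tilt combination.
stimuli = list(product(fsc_levels, f0_values, tilts_db_per_oct))
print(len(stimuli))  # 7 * 10 * 3 = 210
```

Equal spacing in semitones means neighboring values share a constant frequency ratio rather than a constant difference in Hz, which is closer to how pitch intervals are perceived.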
Results: Three main results are evident from the pilot study, which was conducted with one male and one female subject using half (105) of the sound files, covering the full ranges of formant space configuration, fundamental frequency, and source spectral tilt. First, increases in either formant space configuration or fundamental frequency (Figure 1) were associated with more “female” responses (in the figure, zero represents all male choices). Second, increasing formant space configuration and fundamental frequency together (Figure 1) was also associated with more “female” responses. Third, increases in the steepness of the source spectral tilt (Figure 2) were associated with more “female” responses only at the gender-ambiguous fundamental frequency of 166.78 Hz.
Conclusions: Listeners use both fundamental frequency and formant frequencies to infer speaker gender, and investigation of the salience of source spectral tilt as a cue to speaker gender will continue. The results of this study will increase our understanding of how listeners use aspects of the acoustic speech signal to infer speaker gender. This understanding may guide transgender clients in modifying their communication to better reflect their gender, and may also help address other perceptual concepts.
TJ Neuhaus, B.S.; Graduate Student; Department of Communication Sciences and Disorders, Bowling Green State University; 200 Health and Human Services Building, Bowling Green, OH 43403; 1-(616)-730-3280; tjneuha@bgsu.edu
Ronald C. Scherer, Ph.D.; Distinguished Research Professor; Department of Communication Sciences and Disorders, Bowling Green State University; 200 Health and Human Services Building, Bowling Green, OH 43403; 1-(419)-372-7189; ronalds@bgsu.edu