Auditory and Phonetic Processes in Speech Perception

A special issue of Brain Sciences (ISSN 2076-3425). This special issue belongs to the section "Neurolinguistics".

Deadline for manuscript submissions: closed (20 February 2022) | Viewed by 48145

Printed Edition Available!
A printed edition of this Special Issue is available here.

Special Issue Editors


Prof. Dr. Richard Wright
Guest Editor
Department of Linguistics, University of Washington, Seattle, WA 98195, USA
Interests: phonetics; speech perception; psycholinguistics; auditory processes; acoustics

Prof. Dr. Benjamin V. Tucker
Co-Guest Editor
Department of Linguistics, University of Alberta, Edmonton, AB T6G 2R3, Canada
Interests: phonetics; speech science; psycholinguistics; spontaneous speech

Special Issue Information

Dear Colleagues,

The past two decades have seen great advances in phonetic, auditory, and psycholinguistic research on speech perception. As these fields have advanced, increasing interdisciplinary collaboration across them has revealed their interdependence. The aim of this Special Issue is to bring together top scholars in these three related areas in a single volume, with papers ranging from the neurophysiology of hearing, to phonetic and linguistic factors, to hearing loss and assistive devices, second language learning and accentedness, and the processing of speech and the mental lexicon. We seek cutting-edge primary research on topics related to speech perception, including prosody, reduced linguistic variants, regional accents, language development, second language learning, neurophysiology and brain function, sociolinguistics, hearing loss, hearing aids and cochlear implants, listening context, unit predictability, lexical factors, and perceptual modeling.

Prof. Dr. Richard Wright
Guest Editor

Prof. Dr. Benjamin V. Tucker
Co-Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Brain Sciences is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2200 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • phonetics
  • speech perception
  • auditory processing
  • psycholinguistics
  • hearing

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (16 papers)


Research

20 pages, 6149 KiB  
Article
Differences between Monolinguals and Bilinguals in Phonetic and Phonological Learning and the Connection with Auditory Sensory Memory
by Laura Spinu, Jiwon Hwang and Mariana Vasilita
Brain Sci. 2023, 13(3), 488; https://doi.org/10.3390/brainsci13030488 - 14 Mar 2023
Viewed by 4744
Abstract
Bilingualism has been linked with improved function regarding certain aspects of linguistic processing, e.g., novel word acquisition and learning unfamiliar sound patterns. Two approaches, not mutually exclusive, might explain these results. One is related to executive function, speculating that more effective learning is achieved through actively choosing relevant information while inhibiting potentially interfering information. While still controversial, executive function enhancements attributed to bilingual experience have been reported for decades. The other approach, understudied to date, emphasizes the role of sensory mechanisms, specifically auditory sensory memory. Bilinguals have outperformed monolinguals in tasks involving auditory processing and episodic memory recall, but the questions of whether (1) bilinguals' auditory sensory memory skills are also enhanced and (2) phonetic skill and auditory sensory memory are correlated remain open. Our study is innovative in investigating phonetic learning skills and auditory sensory memory in the same speakers from two groups: monolinguals and early bilinguals. The participants were trained and tested on an artificial accent of English, and their auditory sensory memory was assessed with a digit span task. The results demonstrated that, compared to monolinguals, bilinguals exhibit enhanced auditory sensory memory and phonetic and phonological learning skill, and that a correlation exists between the two.
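To make the reported analysis concrete, the sketch below correlates digit-span scores with phonetic-learning gains in the way the abstract describes; all numbers are invented placeholders, not the study's data.

```python
# Hypothetical sketch: relate auditory sensory memory (digit span) to
# phonetic learning gain, as in the correlation the paper reports.
# The arrays below are invented placeholders, not the study's data.
import numpy as np
from scipy.stats import pearsonr

digit_span = np.array([5, 7, 6, 8, 9, 5, 7, 8])     # longest correct sequence
learning_gain = np.array([0.10, 0.25, 0.18, 0.30,   # post- minus pre-training
                          0.35, 0.08, 0.22, 0.28])  # accuracy on the accent

r, p = pearsonr(digit_span, learning_gain)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```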

16 pages, 1575 KiB  
Article
Increased Pre-Boundary Lengthening Does Not Enhance Implicit Intonational Phrase Perception in European Portuguese: An EEG Study
by Ana Rita Batista, Dinis Catronas, Vasiliki Folia and Susana Silva
Brain Sci. 2023, 13(3), 441; https://doi.org/10.3390/brainsci13030441 - 4 Mar 2023
Viewed by 1713
Abstract
Prosodic phrasing is the segmentation of utterances into prosodic words, phonological phrases (smaller units), and intonational phrases (larger units) based on acoustic cues: pauses, pitch changes, and pre-boundary lengthening. The perception of prosodic boundaries is characterized by a positive event-related potential (ERP) component temporally aligned with phrase boundaries, the Closure Positive Shift (CPS). The role of pre-boundary lengthening in boundary perception is still a matter of debate: while studies on phonological phrase boundaries indicate that all three cues contribute equally, approaches to intonational phrase boundaries highlight the pause as the most powerful cue. Moreover, previous studies used explicit boundary recognition tasks, and it is unknown how pre-boundary lengthening works in the implicit prosodic processing tasks characteristic of real-life contexts. In this study, we examined the effects of pre-boundary lengthening (original, short, and long) on EEG responses to intonational phrase boundaries (the CPS effect) in European Portuguese, using an implicit task. Both the original and short versions showed equivalent CPS effects, while the long set did not elicit the effect. This suggests that pre-boundary lengthening does not contribute to improved perception of boundaries in intonational phrases (longer units), possibly due to memory- and attention-related constraints.
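For readers unfamiliar with the dependent measure, here is a minimal sketch of a CPS-style analysis: trial-averaged ERPs per lengthening condition and mean amplitude in a boundary-aligned window. The sampling rate, epoch shapes, and window are illustrative assumptions, not the authors' pipeline.

```python
# Minimal sketch of a CPS-style analysis: per-condition ERP averages and
# mean amplitude in a window aligned to the phrase boundary. Shapes,
# sampling rate, and window are assumptions; the data are stand-ins.
import numpy as np

fs = 500                          # sampling rate in Hz (assumed)
n_trials, n_samples = 60, 1000    # 2 s epochs, boundary at 0.5 s (assumed)
rng = np.random.default_rng(0)

epochs = {cond: rng.normal(size=(n_trials, n_samples))  # stand-in EEG data
          for cond in ("original", "short", "long")}

win = slice(int(0.5 * fs), int(1.0 * fs))  # 0-500 ms after the boundary
for cond, data in epochs.items():
    erp = data.mean(axis=0)                # trial-averaged ERP
    print(cond, f"mean CPS-window amplitude: {erp[win].mean():+.3f}")
```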

14 pages, 1415 KiB  
Article
Explaining L2 Lexical Learning in Multiple Scenarios: Cross-Situational Word Learning in L1 Mandarin L2 English Speakers
by Paola Escudero, Eline A. Smit and Karen E. Mulak
Brain Sci. 2022, 12(12), 1618; https://doi.org/10.3390/brainsci12121618 - 25 Nov 2022
Cited by 9 | Viewed by 2171
Abstract
Adults commonly struggle with perceiving and recognizing the sounds and words of a second language (L2), especially when the L2 sounds do not have a counterpart in the learner's first language (L1). We examined how L1 Mandarin L2 English speakers learned pseudo-English words within a cross-situational word learning (CSWL) task previously presented to monolingual English and bilingual Mandarin-English speakers. CSWL is ambiguous because participants are not provided with direct mappings of words and object referents. Rather, learners discern word-object correspondences by tracking multiple co-occurrences across learning trials. The monolinguals and bilinguals tested in previous studies showed lower performance for pseudo-words that formed vowel minimal pairs (e.g., /dit/-/dɪt/) than for pseudo-words that formed consonant minimal pairs (e.g., /bɔn/-/pɔn/) or non-minimal pairs differing in all segments (e.g., /bɔn/-/dit/). In contrast, L1 Mandarin L2 English listeners struggled to learn all word pairs. We explain this seemingly contradictory finding by considering the multiplicity of acoustic cues in the stimuli presented to all participant groups. Stimuli were produced in infant-directed speech (IDS) in order to compare performance by children and adults, and because previous research had shown that IDS enhances L1 and L2 acquisition. We propose that the suprasegmental pitch variation in the vowels typical of IDS stimuli might be perceived as lexical tone distinctions by tonal-language speakers who cannot fully inhibit their L1 activation, resulting in high lexical competition and diminished learning during an ambiguous word learning task. Our results are in line with the Second Language Linguistic Perception (L2LP) model, which proposes that fine-grained acoustic information from multiple sources and the ability to switch between language modes affect non-native phonetic and lexical development.
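The co-occurrence-tracking logic of CSWL is easy to see in code. The sketch below accumulates word-object co-occurrence counts over ambiguous trials and picks each word's most frequent companion; the trials are invented, reusing the abstract's example pseudo-words.

```python
# Sketch of the statistical structure behind cross-situational word
# learning (CSWL): no single trial discloses a word-object mapping, but
# co-occurrence counts accumulated across trials do. Trials are invented.
from collections import defaultdict

trials = [  # (words heard, objects in view): ambiguous within each trial
    ({"bon", "dit"}, {"ball", "cup"}),
    ({"bon", "pon"}, {"ball", "shoe"}),
    ({"dit", "pon"}, {"cup", "shoe"}),
    ({"bon", "dit"}, {"cup", "ball"}),
]

counts = defaultdict(int)
for words, objects in trials:
    for w in words:
        for o in objects:
            counts[(w, o)] += 1

all_objects = {o for _, objs in trials for o in objs}
for w in sorted({w for ws, _ in trials for w in ws}):
    best = max(all_objects, key=lambda o: counts[(w, o)])
    print(f"{w} -> {best} ({counts[(w, best)]} co-occurrences)")
```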

26 pages, 1413 KiB  
Article
Native Listeners’ Use of Information in Parsing Ambiguous Casual Speech
by Natasha Warner, Dan Brenner, Benjamin V. Tucker and Mirjam Ernestus
Brain Sci. 2022, 12(7), 930; https://doi.org/10.3390/brainsci12070930 - 15 Jul 2022
Cited by 2 | Viewed by 2218
Abstract
In conversational speech, phones and entire syllables are often missing. This can make "he's" and "he was" homophonous, both realized, for example, as [ɨz]. Similarly, "you're" and "you were" can both be realized as [jɚ], [ɨ], etc. We investigated what types of information native listeners use to perceive such verb tenses. Possible types included acoustic cues in the phrase (e.g., in "he was"), the rate of the surrounding speech, and syntactic and semantic information in the utterance, such as the presence of time adverbs like "yesterday" or of other tensed verbs. We extracted utterances such as "So they're gonna have like a random roommate" and "And he was like, 'What's wrong?!'" from recordings of spontaneous conversations. We presented parts of these utterances to listeners, in either a written or auditory modality, to determine which types of information facilitated listeners' comprehension. Listeners relied primarily on acoustic cues in or near the target words rather than on semantic and syntactic information in the context. While contextual information also improves comprehension in some conditions, the acoustic cues in the target itself are strong enough to reverse the percept that listeners gain from all other information together. Acoustic cues thus override other information in the comprehension of reduced productions in conversational speech.

13 pages, 438 KiB  
Article
Adaptation to Social-Linguistic Associations in Audio-Visual Speech
by Molly Babel
Brain Sci. 2022, 12(7), 845; https://doi.org/10.3390/brainsci12070845 - 28 Jun 2022
Cited by 4 | Viewed by 1965
Abstract
Listeners entertain hypotheses about how social characteristics affect a speaker's pronunciation. While some of these hypotheses may be representative of a demographic, thus facilitating spoken language processing, others may be erroneous stereotypes that impede comprehension. As a case in point, listeners' stereotypes of language and ethnicity pairings in varieties of North American English can improve intelligibility and comprehension, or hinder these processes. Using audio-visual speech, this study examines how listeners adapt to speech in noise from four speakers who are representative of selected accent-ethnicity associations in the local speech community: an Asian English-L1 speaker, a white English-L1 speaker, an Asian English-L2 speaker, and a white English-L2 speaker. The results suggest that congruent accent-ethnicity associations facilitate adaptation, and that the mainstream local accent is associated with a more diverse speech community.

16 pages, 1495 KiB  
Article
The Role of the Root in Spoken Word Recognition in Hebrew: An Auditory Gating Paradigm
by Marina Oganyan and Richard A. Wright
Brain Sci. 2022, 12(6), 750; https://doi.org/10.3390/brainsci12060750 - 7 Jun 2022
Cited by 1 | Viewed by 2586
Abstract
Very few studies have investigated online spoken word recognition in templatic languages. In this study, we investigated both lexical (neighborhood density and frequency) and morphological (role of the root morpheme) aspects of spoken word recognition in Hebrew, a templatic language, using the traditional gating paradigm. Additionally, we compared the traditional gating paradigm with a novel, phoneme-based gating paradigm. The phoneme-based approach allows for better control of the information available at each gate. We found lexical effects, with high-frequency words and low-neighborhood-density words being recognized at earlier gates. We also found that earlier access to root-morpheme information enabled word recognition at earlier gates. Finally, we showed that the traditional and phoneme-based gating paradigms yielded equivalent results.
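A minimal sketch of the two stimulus-construction schemes being compared: traditional gates cut the waveform at fixed time increments, while phoneme-based gates cut at annotated phoneme boundaries. The durations and boundary times below are hypothetical.

```python
# Sketch contrasting the two gating schemes: fixed-duration gates versus
# gates aligned to phoneme boundaries. Times are invented placeholders;
# real stimuli would come from a recorded word and its annotations.
import numpy as np

fs = 22050                                  # sample rate in Hz (assumed)
word = np.zeros(int(0.6 * fs))              # stand-in for a 600 ms word

# Traditional gating: truncate every 50 ms from word onset.
duration_gates = [word[: int(t * fs)] for t in np.arange(0.05, 0.65, 0.05)]

# Phoneme-based gating: truncate at annotated phoneme boundaries (s).
phoneme_ends = [0.08, 0.17, 0.29, 0.41, 0.60]   # hypothetical boundaries
phoneme_gates = [word[: int(t * fs)] for t in phoneme_ends]

print(len(duration_gates), "fixed-duration gates;",
      len(phoneme_gates), "phoneme-aligned gates")
```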

14 pages, 436 KiB  
Article
Relating Suprathreshold Auditory Processing Abilities to Speech Understanding in Competition
by Frederick J. Gallun, Laura Coco, Tess K. Koerner, E. Sebastian Lelo de Larrea-Mancera, Michelle R. Molis, David A. Eddins and Aaron R. Seitz
Brain Sci. 2022, 12(6), 695; https://doi.org/10.3390/brainsci12060695 - 27 May 2022
Cited by 6 | Viewed by 3230
Abstract
(1) Background: Difficulty hearing in noise is exacerbated in older adults. Older adults are more likely to have audiometric hearing loss, although some individuals with normal pure-tone audiograms also have difficulty perceiving speech in noise, so additional variables likely account for speech understanding in noise. It has been suggested that one important class of variables is the ability to process auditory information once it has been detected. Here, we tested a set of these "suprathreshold" auditory processing abilities and related them to performance on a two-part test of speech understanding in competition, with and without spatial separation of the target and masking speech. Testing was administered in the Portable Automated Rapid Testing (PART) application developed by our team; PART facilitates psychoacoustic assessments of auditory processing. (2) Methods: Forty-one individuals (average age 51 years) completed assessments of sensitivity to temporal fine structure (TFS) and spectrotemporal modulation (STM) detection via an iPad running the PART application. Statistical models were used to evaluate the strength of associations between performance on the auditory processing tasks and speech understanding in competition. Age and pure-tone average (PTA) were also included as potential predictors. (3) Results: The model providing the best fit also included age and a measure of diotic frequency modulation (FM) detection, but none of the other potential predictors. However, even the best-fitting models accounted for 31% or less of the variance, supporting work suggesting that other variables (e.g., cognitive processing abilities) also contribute significantly to speech understanding in noise. (4) Conclusions: The results of the current study do not provide strong support for previous suggestions that suprathreshold processing abilities alone can explain difficulties in speech understanding in competition among older adults. This discrepancy could be due to the speech tests used, the listeners tested, or the suprathreshold tests chosen. Future work with larger numbers of participants is warranted, including a range of cognitive tests and additional assessments of suprathreshold auditory processing abilities.
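The model-comparison logic in (3) can be sketched as fitting candidate predictor sets and comparing them by an information criterion. The data below are simulated placeholders that borrow the abstract's variable names; the study's own analysis may differ in detail.

```python
# Sketch of the model-comparison logic: fit candidate predictor sets for
# speech-in-competition scores and compare them by AIC. The data are
# simulated placeholders, not measures from the PART application.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 41
df = pd.DataFrame({
    "age": rng.uniform(20, 80, n),
    "pta": rng.uniform(0, 40, n),     # pure-tone average, dB HL
    "fm":  rng.normal(size=n),        # diotic FM detection measure
    "stm": rng.normal(size=n),        # spectrotemporal modulation measure
})
df["speech"] = 0.03 * df["age"] + 0.5 * df["fm"] + rng.normal(size=n)

candidates = [["age"], ["age", "fm"], ["age", "pta"], ["age", "fm", "stm"]]
for preds in candidates:
    fit = sm.OLS(df["speech"], sm.add_constant(df[preds])).fit()
    print(preds, f"AIC={fit.aic:.1f}  R^2={fit.rsquared:.2f}")
```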

29 pages, 4450 KiB  
Article
Social Priming in Speech Perception: Revisiting Kangaroo/Kiwi Priming in New Zealand English
by Gia Hurring, Jennifer Hay, Katie Drager, Ryan Podlubny, Laura Manhire and Alix Ellis
Brain Sci. 2022, 12(6), 684; https://doi.org/10.3390/brainsci12060684 - 24 May 2022
Cited by 7 | Viewed by 3094
Abstract
We investigate whether regionally associated primes can affect speech perception in two lexical decision tasks in which New Zealand listeners were exposed to an Australian prime (a kangaroo), a New Zealand prime (a kiwi), and/or a control animal (a horse). The target stimuli involve ambiguous vowels, embedded in a frame that would result in a real word with a KIT or a DRESS vowel and a nonsense word with the alternative vowel; thus, lexical decision responses can reveal which vowel was heard. Our pre-registered design predicted that exposure to the kangaroo would elicit more KIT-consistent responses than exposure to the kiwi. Both experiments showed significant priming effects in which the kangaroo elicited more KIT-consistent responses than the kiwi. The particular locus and details of these effects differed across experiments and participants. Taken together, the experiments reinforce the finding that regionally associated primes can affect speech perception, but also suggest that the effects are sensitive to experimental design, stimulus acoustics, and individuals' production and past experience.

29 pages, 895 KiB  
Article
DIANA, a Process-Oriented Model of Human Auditory Word Recognition
by Louis ten Bosch, Lou Boves and Mirjam Ernestus
Brain Sci. 2022, 12(5), 681; https://doi.org/10.3390/brainsci12050681 - 23 May 2022
Cited by 9 | Viewed by 2358
Abstract
This article presents DIANA, a new, process-oriented model of human auditory word recognition, which takes the acoustic signal as its input and can produce word identifications, lexicality decisions, and reaction times as its output. This makes it possible to compare its output with human listeners' behavior in psycholinguistic experiments. DIANA differs from existing models in that it takes more of the available neurophysiological evidence on speech processing into account. For instance, DIANA accounts for the effect of ambiguity in the acoustic signal on reaction times following the Hick–Hyman law, and it interprets the acoustic signal in the form of spectro-temporal receptive fields, which are attested in the human superior temporal gyrus, rather than in the form of abstract phonological units. The model consists of three components: activation, decision, and execution. The activation and decision components are described in detail, both at the conceptual level (in the running text) and at the computational level (in the Appendices). While the activation component is independent of the listener's task, the functioning of the decision component depends on this task. The article also describes how DIANA could be improved in the future to resemble the behavior of human listeners even more closely.
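The Hick–Hyman law mentioned here predicts reaction times that grow linearly with the information (entropy) carried by the stimulus set. A minimal sketch, with illustrative intercept and slope, assuming activation values normalized into a probability distribution over word candidates:

```python
# Sketch of the Hick-Hyman relation DIANA builds on: predicted reaction
# time grows linearly with the Shannon entropy of the competitor set.
# The probabilities and the coefficients a, b are illustrative only.
import numpy as np

def hick_hyman_rt(probs, a=300.0, b=150.0):
    """RT (ms) = a + b * H, with H the entropy (bits) of the
    probability distribution over competing word candidates."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0] / p.sum()
    h = -(p * np.log2(p)).sum()
    return a + b * h

print(hick_hyman_rt([1.0]))        # unambiguous input: fastest
print(hick_hyman_rt([0.5, 0.5]))   # two equal competitors: +1 bit
print(hick_hyman_rt([0.25] * 4))   # four equal competitors: +2 bits
```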

16 pages, 1569 KiB  
Article
Learning to Perceive Non-Native Tones via Distributional Training: Effects of Task and Acoustic Cue Weighting
by Liquan Liu, Chi Yuan, Jia Hoong Ong, Alba Tuninetti, Mark Antoniou, Anne Cutler and Paola Escudero
Brain Sci. 2022, 12(5), 559; https://doi.org/10.3390/brainsci12050559 - 27 Apr 2022
Cited by 3 | Viewed by 2751
Abstract
As many distributional learning (DL) studies have shown, adult listeners can achieve discrimination of a difficult non-native contrast after short, repetitive exposure to tokens falling at the extremes of that contrast. Such studies have shown, using behavioural methods, that short distributional training can induce perceptual learning of vowel and consonant contrasts. However, much less is known about the neurological correlates of DL, and few studies have examined non-native lexical tone contrasts. Here, Australian-English speakers underwent DL training on a Mandarin tone contrast using behavioural (discrimination, identification) and neural (oddball-EEG) tasks, with listeners hearing either a bimodal or a unimodal distribution. Behavioural results show that listeners learned to discriminate tones after both unimodal and bimodal training, while EEG responses revealed more learning for listeners exposed to the bimodal distribution. Thus, perceptual learning through exposure to brief sound distributions (a) extends to non-native tonal contrasts, and (b) is sensitive to task, phonetic distance, and acoustic cue weighting. Our findings have implications for models of how auditory and phonetic constraints influence speech learning.
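The bimodal/unimodal manipulation at the heart of distributional training can be sketched as two frequency profiles over the same stimulus continuum. The eight-step continuum and token counts below follow the conventional DL design, not necessarily the study's exact stimuli.

```python
# Sketch of distributional-learning training sets: one tone continuum
# sampled bimodally (peaks near the category endpoints) or unimodally
# (one central peak). Counts follow the conventional DL shape, not the
# study's exact stimulus frequencies.
import numpy as np

steps = np.arange(1, 9)                        # 8-step pitch continuum
bimodal = np.array([4, 8, 4, 2, 2, 4, 8, 4])   # peaks at steps 2 and 7
unimodal = np.array([2, 4, 4, 8, 8, 4, 4, 2])  # single central peak

rng = np.random.default_rng(0)
train_bi = rng.permutation(np.repeat(steps, bimodal))   # 36-token stream
train_uni = rng.permutation(np.repeat(steps, unimodal))
print("bimodal exposure stream (first 12 tokens):", train_bi[:12])
```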

13 pages, 2104 KiB  
Article
Neural–Behavioral Relation in Phonetic Discrimination Modulated by Language Background
by Tian Christina Zhao
Brain Sci. 2022, 12(4), 461; https://doi.org/10.3390/brainsci12040461 - 29 Mar 2022
Cited by 3 | Viewed by 2179
Abstract
It is a well-demonstrated phenomenon that listeners can discriminate native phonetic contrasts better than non-native ones. Recent neuroimaging studies have started to reveal the underlying neural mechanisms. By focusing on the mismatch negativity/response (MMN/R), a widely studied index of neural sensitivity to sound change, researchers have observed larger MMNs for native contrasts than for non-native ones in EEG, as well as a more focused and efficient neural activation pattern for native contrasts in MEG. However, direct relations between behavioral discrimination and the MMN/R are rarely reported. In the current study, 15 native English speakers and 15 native Spanish speakers completed both a behavioral discrimination task and a separate MEG recording to measure the MMR to a VOT-based speech contrast (i.e., pre-voiced vs. voiced stop consonant), which is a native phonetic contrast for Spanish speakers but non-native for English speakers. At the group level, English speakers exhibited significantly lower behavioral sensitivity (d′) to the contrast but a more expansive MMR, replicating previous studies. Across individuals, a significant relation between behavioral sensitivity and the MMR was observed only in the Spanish group. Potential differences in the mechanisms underlying behavioral discrimination for the two groups are discussed.
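The behavioral sensitivity measure d′ used here is computed from hit and false-alarm rates; a standard implementation, with a common correction for rates of 0 or 1 (the counts are invented):

```python
# Standard d' computation from hit and false-alarm rates, with the
# common 1/(2N) correction for perfect rates. Counts are invented.
from scipy.stats import norm

def d_prime(hits, misses, fas, crs):
    n_signal, n_noise = hits + misses, fas + crs
    h = min(max(hits / n_signal, 0.5 / n_signal), 1 - 0.5 / n_signal)
    f = min(max(fas / n_noise, 0.5 / n_noise), 1 - 0.5 / n_noise)
    return norm.ppf(h) - norm.ppf(f)   # z(hit rate) - z(false-alarm rate)

print(f"d' = {d_prime(hits=42, misses=8, fas=12, crs=38):.2f}")
```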

12 pages, 1103 KiB  
Article
Just-Noticeable Differences of Fundamental Frequency Change in Mandarin-Speaking Children with Cochlear Implants
by Wanting Huang, Lena L. N. Wong and Fei Chen
Brain Sci. 2022, 12(4), 443; https://doi.org/10.3390/brainsci12040443 - 26 Mar 2022
Cited by 12 | Viewed by 2631
Abstract
Fundamental frequency (F0) provides the primary acoustic cue for lexical tone perception in tonal languages but remains poorly represented in cochlear implant (CI) systems. There is still a lack of understanding of sensitivity to F0 change in CI users who speak tonal languages. In the present study, just-noticeable differences (JNDs) for F0 contour and F0 level changes in Mandarin-speaking children with CIs were measured and compared with those of their age-matched normal-hearing (NH) peers. Results showed that children with CIs demonstrated significantly larger JNDs for F0 contour (JND-C) and F0 level (JND-L) changes than NH children. Further within-group comparison revealed that the JND-C change was significantly smaller than the JND-L change among children with CIs, whereas the opposite pattern was observed among NH children. No significant correlations were seen between JND-C change/JND-L change and age at implantation or duration of CI use. The contrast between children with CIs and NH children in sensitivity to F0 contour and F0 level changes suggests different mechanisms of F0 processing in these two groups as a result of their different hearing experiences.
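The abstract does not state how the JNDs were measured; one common choice is an adaptive 2-down/1-up staircase that converges on roughly 71% correct. A hypothetical sketch with a simulated listener:

```python
# Hypothetical sketch of a 2-down/1-up adaptive staircase, a standard
# way of estimating JNDs like those reported here (the paper's exact
# procedure is not specified in the abstract). Listener is simulated
# with a logistic psychometric function.
import numpy as np

rng = np.random.default_rng(2)
true_jnd = 3.0              # simulated listener threshold (semitones)
delta, step = 8.0, 1.0      # current F0 difference and step size
n_correct, direction, reversals = 0, 0, []

while len(reversals) < 8:
    p = 1.0 / (1.0 + np.exp(-(delta - true_jnd)))   # P(correct) this trial
    if rng.random() < p:
        n_correct += 1
        if n_correct == 2:                 # two correct: make it harder
            n_correct = 0
            if direction == +1:
                reversals.append(delta)    # descending turn = reversal
            direction = -1
            delta = max(delta - step, 0.1)
    else:
        n_correct = 0                      # one wrong: make it easier
        if direction == -1:
            reversals.append(delta)
        direction = +1
        delta += step

print(f"estimated JND ~ {np.mean(reversals[-6:]):.1f} semitones")
```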

15 pages, 1627 KiB  
Article
Phonetic Effects in the Perception of VOT in a Prevoicing Language
by Viktor Kharlamov
Brain Sci. 2022, 12(4), 427; https://doi.org/10.3390/brainsci12040427 - 23 Mar 2022
Cited by 1 | Viewed by 3312
Abstract
Previous production studies have reported differential amounts of closure voicing in plosives depending on the location of the oral constriction (anterior vs. posterior), vocalic context (high vs. low vowels), and speaker sex. Such differences have been attributed to aerodynamic factors related to the configuration of the cavity behind the oral constriction, with certain articulations and physiological characteristics of the speaker facilitating vocal fold vibration during closure. The current study used perceptual identification tasks to examine whether similar effects of consonantal posteriority, adjacent vowel height, and speaker sex exist in the perception of voicing. The language of investigation was Russian, a prevoicing language that uses negative VOT to signal the voicing contrast in plosives. The study used both original and resynthesized tokens for each speaker sex, which allowed it to focus specifically on the role of differences in VOT. Results indicated that listeners' judgments were significantly affected by consonantal place of articulation, with listeners accepting less voicing in velar plosives. Speaker sex showed only a marginally significant difference in the expected direction, and vowel height had no effect on perceptual responses. These findings suggest that certain phonetic factors can affect both the initial production and the subsequent perception of closure voicing.
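One standard way to analyze such identification data is a logistic fit of "voiced" responses against VOT, with place of articulation as a second predictor. The sketch below simulates responses that mirror the reported direction of the place effect; it illustrates the analysis type, not the paper's actual model or data.

```python
# Sketch of an identification analysis: "voiced" responses along a
# prevoicing (negative VOT) continuum fit with logistic regression.
# Responses are simulated to mirror the reported place effect (less
# prevoicing needed for velars to be heard as voiced), not real data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
vot = np.tile(np.linspace(-120, 0, 13), 40)      # ms; negative = prevoiced
velar = rng.integers(0, 2, vot.size)             # 0 = labial, 1 = velar
boundary = np.where(velar == 1, -40.0, -60.0)    # simulated category boundary
p_voiced = 1.0 / (1.0 + np.exp((vot - boundary) / 10.0))
voiced_resp = (rng.random(vot.size) < p_voiced).astype(int)

model = LogisticRegression().fit(np.column_stack([vot, velar]), voiced_resp)
print("coefficients (VOT, velar):", model.coef_[0].round(3))
```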

20 pages, 1477 KiB  
Article
Computational Modelling of Tone Perception Based on Direct Processing of f0 Contours
by Yue Chen, Yingming Gao and Yi Xu
Brain Sci. 2022, 12(3), 337; https://doi.org/10.3390/brainsci12030337 - 2 Mar 2022
Cited by 5 | Viewed by 3005
Abstract
It has been widely assumed that in speech perception it is imperative to first detect a set of distinctive properties or features and then use them to recognize phonetic units like consonants, vowels, and tones. Those features can be auditory cues, articulatory gestures, or a combination of both. There have been no clear demonstrations of how exactly such a two-phase process would work in the perception of continuous speech, however. Here we used computational modelling to explore whether it is possible to recognize phonetic categories from syllable-sized continuous acoustic signals of connected speech without intermediate featural representations. We used a Support Vector Machine (SVM) and a Self-Organizing Map (SOM) to simulate tone perception in Mandarin, either by directly processing f0 trajectories or by extracting various tonal features. The results show that direct tone recognition not only yields better performance than any of the feature extraction schemes but also requires less computational power. These results suggest that prior extraction of features is unlikely to be the operational mechanism of speech perception.
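A minimal sketch of the direct-processing route the paper argues for: a support vector machine trained on raw, time-normalized f0 trajectories, with no feature-extraction step. The contours are synthetic stand-ins shaped roughly like the four Mandarin tones, not the study's materials.

```python
# Sketch of direct tone recognition: an SVM classifies Mandarin tones
# straight from syllable-sized f0 trajectories, with no intermediate
# feature extraction. Trajectories are synthetic stand-ins.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(4)
t = np.linspace(0, 1, 30)                 # 30 time-normalized f0 samples
shapes = {1: 220 + 0 * t,                 # T1: high level
          2: 180 + 60 * t,                # T2: rising
          3: 200 - 240 * t * (1 - t),     # T3: dipping
          4: 250 - 80 * t}                # T4: falling

X = np.vstack([shapes[tone] + rng.normal(0, 8, t.size)
               for tone in (1, 2, 3, 4) for _ in range(50)])
y = np.repeat([1, 2, 3, 4], 50)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print(f"tone identification accuracy: {clf.score(X_te, y_te):.2f}")
```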

11 pages, 1984 KiB  
Article
Perceived Anger in Clear and Conversational Speech: Contributions of Age and Hearing Loss
by Shae D. Morgan, Sarah Hargus Ferguson, Ashton D. Crain and Skyler G. Jennings
Brain Sci. 2022, 12(2), 210; https://doi.org/10.3390/brainsci12020210 - 2 Feb 2022
Cited by 1 | Viewed by 2141
Abstract
A previous investigation demonstrated differences between younger adult normal-hearing listeners and older adult hearing-impaired listeners in the perceived emotion of clear and conversational speech. Specifically, clear speech sounded angry more often than conversational speech for both groups, but the effect was smaller for the older listeners. These listener groups differed by two confounding factors, age (younger vs. older adults) and hearing status (normal vs. impaired). The objective of the present study was to evaluate the contributions of aging and hearing loss to the reduced perception of anger in older adults with hearing loss. We investigated perceived anger in clear and conversational speech in younger adults with and without a simulated age-related hearing loss, and in older adults with normal hearing. Younger adults with simulated hearing loss performed similarly to normal-hearing peers, while normal-hearing older adults performed similarly to hearing-impaired peers, suggesting that aging was the primary contributor to the decreased anger perception seen in previous work. These findings confirm reduced anger perception for older adults compared to younger adults, though the significant speaking-style effect, regardless of age and hearing status, highlights the need to identify methods of producing clear speech that is emotionally neutral or positive.

20 pages, 1182 KiB  
Article
What Do Cognitive Networks Do? Simulations of Spoken Word Recognition Using the Cognitive Network Science Approach
by Michael S. Vitevitch and Gavin J. D. Mullin
Brain Sci. 2021, 11(12), 1628; https://doi.org/10.3390/brainsci11121628 - 10 Dec 2021
Cited by 7 | Viewed by 3659
Abstract
Cognitive network science is an emerging approach that uses the mathematical tools of network science to map the relationships among representations stored in memory and to examine how that structure might influence processing. In the present study, we used computer simulations to compare the ability of a well-known model of spoken word recognition, TRACE, with that of a cognitive network model with a spreading-activation-like process to account for the findings of several previously published behavioral studies of language processing. In all four simulations, the TRACE model failed to retrieve a sufficient number of words to assess whether it could replicate the behavioral findings. The cognitive network model successfully replicated the behavioral findings in Simulations 1 and 2. In Simulation 3a, the cognitive network did not replicate the behavioral findings, perhaps because an additional mechanism was not implemented in the model. However, in Simulation 3b, when the decay parameter in spreadr was manipulated to model this mechanism, the cognitive network model successfully replicated the behavioral findings. The results suggest that models of cognition need to take into account the multi-scale structure that exists among representations in memory, and how that structure can influence processing.
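spreadr is an R package; the sketch below is a Python analogue of the spreading-activation dynamic it implements, in which a retention parameter controls how much activation stays at a node versus spreading to its neighbours at each step. The five-word network is a toy example, not the study's lexicon.

```python
# Python analogue (not spreadr itself, which is an R package) of
# spreading activation with retention: activation injected at a probe
# word spreads to phonological neighbours each time step, while a fixed
# proportion is retained at the node. Toy network invented.
import numpy as np

nodes = ["cat", "bat", "hat", "cot", "cut"]
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]   # one-phoneme neighbours

adj = np.zeros((len(nodes), len(nodes)))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
out = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1.0)  # row-normalized

activation = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # probe "cat"
retain = 0.5                                        # retention parameter

for _ in range(3):                                  # three spreading steps
    activation = retain * activation + (1.0 - retain) * activation @ out

for name, a in zip(nodes, activation):
    print(f"{name}: {a:.3f}")
```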
