Testing in Noise Based on the First Adaptive Matrix Sentence Test in Slovak Language

Kiktová, Eva; Sock, Rudolph; Getlík, Peter

doi:10.3390/electronics13030602


by Eva Kiktová ¹,*,†, Rudolph Sock ¹,² and Peter Getlík

¹ Department of Slovak Studies, Slavonic Philologies, and Communication, Faculty of Arts, Pavol Jozef Šafárik University in Košice, 04001 Košice, Slovakia

² Linguistique, Langues et Parole (LiLPa), UR 1339, University of Strasbourg, 67084 Strasbourg, France

* Author to whom correspondence should be addressed.

† Current address: Language, Information and Communication Laboratory, Pavol Jozef Šafárik University in Košice, Moyzesova 9, 04001 Košice, Slovakia.

Electronics 2024, 13(3), 602; https://doi.org/10.3390/electronics13030602

Submission received: 4 January 2024 / Revised: 26 January 2024 / Accepted: 30 January 2024 / Published: 1 February 2024

(This article belongs to the Special Issue Modeling of Multimodal Speech Recognition and Language Processing)


Abstract

This study deals with an acoustic perceptual test performed on the basis of adaptive matrix tests, which represent a modern and reliable tool that can be used not only in perceptual phonetics but also for detecting problems related to hearing. The tests used, based on the first Slovak adaptive matrix, provided extensive test material, which was evaluated through a series of tests implemented according to ICRA (International Collegium of Rehabilitative Audiology) guidelines. Healthy listeners took part in the tests, and, during the tests, they listened to prepared sentence stimuli simultaneously with noise. Out of a total number of 30 tests, 15 tests met the demanding criteria. The tests were evaluated from the point of view of the word recognition score, the slope of the psychometric curve function, and also the threshold values corresponding to word recognition at the levels of 20%, 50%, and 80%. We also investigated and compared the impact of two different testing strategies (open and closed test format) and also the impact of experience or unfamiliarity with the test routine used. The created tests achieved SRT50 = −7.03 ± 0.79 dB and a slope of 13.13 ± 1.60%/dB.

Keywords:

Slovak; adaptive matrix; hearing test; noise

1. Introduction

Speech audiometry is one of the basic examination methods used for hearing diagnosis. From the point of view of the test routine [1], key roles are played by the sound stimuli, which can take different forms (nonsense syllables [2,3], numbers or digits [4,5], words [6,7], or sentences [8,9]), as well as by the presentation level, the type of interfering noise, the range of the signal-to-noise ratio, and other details of the test procedure [1]. Adaptive matrix tests are justified not only in audiometry, which primarily deals with measuring the hearing ability of individuals; the methodology of their development and the evaluation of their results also find application in various scientific fields (perceptual phonetics, comprehension testing, communication systems, and psychoacoustics) where accurate and repeated measurements are required. Tests created on the basis of an adaptive matrix contain a phonetically balanced, simple, and frequently used vocabulary and share the same level of difficulty, so it does not matter which test list is used. Of course, the test can be adaptively changed according to the specific needs of a particular patient. These key features also make such tests an important supportive diagnostic tool in the long-term therapy/monitoring of patients with hearing impairment. Adaptive test matrices have their origins in the work of Hagerman [10].

Matrix tests are characterized by relevant vocabulary, fast and reliable measurement, and the possibility of practically unlimited testing using a set of 50 words. The predictability of the test content is minimal compared to a test containing “everyday life” sentences. This type of test is suitable for use with any degree of hearing loss and, unlike tone audiometry, reflects the real ability of speech perception. Noise plays a significant role in testing, as normal communication rarely takes place in ideal acoustic conditions (without distracting background noises). The presence of noise considerably complicates speech understanding, especially for hearing-impaired people, so diagnosis and subsequent rehabilitation should include hearing measurement in the presence of noise. Matrix tests were first compiled by Hagerman for the Swedish language [10]. Later, they were created for other languages, e.g., German [11,12], Danish [13], Mandarin Chinese [14], English [15], Polish [16], Spanish [17], French [18], Dutch [19], Finnish [20], Italian [21,22], Russian [23], Turkish [24], etc. The international matrix tests are now available in 19 different languages (see [25] for a complete list), covering over 60% of the world’s population.

The stable syntactic structure of five-word sentences and the method of creating and evaluating tests make it possible to compare the achieved results with results in other languages. Because a uniform test type is available in multiple language versions, such tests can be beneficial in the development and testing of compensatory aids.

The process of introducing new tests into practice is usually preceded by testing on healthy listeners in order to obtain reference data (about how an average healthy person hears), and then the measurement is carried out on hearing-impaired people.

Our motivation behind this work is to evaluate the tests based on the Slovak adaptive matrix and find out if there are tests with suitable statistical properties, thanks to which they could find practical application in hearing examination for Slovak-speaking patients.

2. Slovak Adaptive Matrix Test

The main difference between audiometric testing using a large database of test sentences (e.g., in the range of 600 sentences) and using a matrix composed of 50 words is that, in the first method of testing, a high recognition accuracy is achieved and the wide-spectrum properties of the language are consistently respected. However, these positive factors can be negated by the limited vocabulary of the patient and his/her intellect. In the second method—in matrix tests—the measurement is more focused on speech recognition and the special characteristics of the test subject are suppressed [17]. Matrix tests are particularly suitable for cross-language comparisons in audiometric and clinical research [26].

The matrix presented in Table 1 contains 10 proper names, 10 verbs in the present tense, 10 numerals, 10 adjectives, and 10 objects. By choosing one word from each category, a total of 10^5 = 100,000 unique sentences can be constructed. The category of proper names consists of five female and five male names that are very frequent in Slovak according to the Slovak National Corpus [27]. In the study [28] dedicated to the proposal of an adaptive matrix in the Slovak language, the verbs included “waits”, “holds”, and “takes”; in this new version of the matrix, they have been replaced by the more suitable alternatives “buys”, “finds”, and “doesn’t have”. In the category of numerals, there was no change compared to the original version of the matrix. In the category of adjectives, the original word “bad” was replaced by the word “cheap” as we wanted to eliminate the expressive undertone of this word. In the last category of words, there were several replacements. The words originally used, “bridges”, “rooms”, “lamps”, and “buildings”, were replaced by new ones: “apartments”, “bowls”, “newspapers”, and “gifts”. The new objects correspond better with the selected verbs and thus make it possible to create more plausible sentence constructions. In the process of matrix creation, we applied the criterion of logic as much as possible so that listening and subsequent comprehension would not produce confusing or surprising sentence constructions, which the listener could evaluate as false or strange and therefore prefer not to mark. Thus, the new words suit the content of the sentences better, they are more neutral in meaning, and, at the same time, they have a satisfactory acoustic content from the point of view of phoneme distribution.

Due to the nature of the matrix test, some deviations were expected, but they are easily justified. In the Slovak language, there are five short vowel phonemes /i, e, a, o, u/, five long vowel phonemes /i:, e:, a:, o:, u:/, four diphthongs /ia, ie, iu, uo/, and 27 consonant phonemes /p, b, m, f, v, t, d, n, l, r, s, z, ts, dz, c, Ɉ, ɲ, ʎ, ʃ, ʒ, ʧ, ʤ, j, k, g, x, h/ [28]. The graphic representation of the frequency of the occurrence of the Slovak vowel and consonant phonemes is depicted in Figure 1.

Figure 1 indicates a higher occurrence of the phonemes /i:/, /v/, and /x/ in the matrix sentences. This discrepancy can be explained by the structure of the sentences: each adjective is in the plural accusative form ending with the suffix -ý(í)ch /i:x/, and all four masculine nouns are in the plural genitive form ending with /ov/ (“darov”, “bytov”, “nožov”, and “domov”). This contributed to the overuse of the /i:/, /v/, and /x/ phonemes.

Recording and Editing of the Speech Material

When constructing the basic set of sentences, we focused on the need to record all combinations of inter-word transients to ensure the fluency and naturalness of the constructed sentences. Hence, a basic set consisting of 100 sentences was recorded. Each word appears exactly 10 times in this set of sentences. The recording was carried out in the recording studio of LICOLAB (Faculty of Arts, Pavol Jozef Šafárik University in Košice, Slovakia) using professional recording equipment. The speaker was a 27-year-old woman whose native language is Slovak. Her speech was characterized by naturalness, without any abnormalities that would affect the resulting quality of the recorded signal. The recording lasted about 5 h, during which there was time for the speaker to rest and also to listen to the emerging recordings of the sentences and to re-record, if needed, in case of suspicion or the occurrence of an unwanted phenomenon (e.g., a change in speech tempo, vocal timbre, non-observance of quantity, imprecise pronunciation, etc.). The recording was performed at a sampling frequency of 48 kHz, a resolution of 16 bits per sample, and in single-channel mode.
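The two constraints stated above (100 sentences, every word used exactly 10 times, every inter-word transition between neighbouring categories recorded) can be met in more than one way. The following is a minimal sketch of one possible construction, not necessarily the one used for the Slovak recordings; the function name `build_base_set` and the index formula are illustrative assumptions. Words are represented by their row index (0 to 9) in each of the five columns of the matrix.

```python
from collections import Counter
from itertools import product

N_WORDS = 10     # words per category (column) of the matrix
N_COLUMNS = 5    # name, verb, numeral, adjective, object


def build_base_set(n=N_WORDS):
    """Return n*n sentences as tuples of word indices, one index per column.

    Sentence (i, j) uses the indices (i, j, (i + j) % n, i, j), so every
    pair of words in neighbouring columns occurs in exactly one sentence.
    """
    return [(i, j, (i + j) % n, i, j) for i, j in product(range(n), repeat=2)]


sentences = build_base_set()

# Each word of each column appears exactly 10 times.
for col in range(N_COLUMNS):
    counts = Counter(s[col] for s in sentences)
    assert all(counts[w] == N_WORDS for w in range(N_WORDS))

# Each of the 100 possible transitions between neighbouring columns
# is covered by exactly one sentence.
for col in range(N_COLUMNS - 1):
    pairs = {(s[col], s[col + 1]) for s in sentences}
    assert len(pairs) == N_WORDS * N_WORDS

print(len(sentences))  # 100
```

With such a set, any of the 100,000 possible sentences can later be assembled from word cuts whose transitions into the following word were actually spoken.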

The recorded set of one hundred sentences was subsequently cut into individual words in such a way that the inter-word transients were not violated and became part of each word cut. All the words were normalized to the same volume. Subsequently, a set of 300 new sentences was created from these words by concatenation; the intact inter-word transients preserved the fluency and naturalness of the sentences. During the construction of these sentences, the volume was also adjusted at the onset or offset of words so that the sentences sounded as natural as possible. Editing and management of audio content was carried out in Adobe Audition software (see Figure 2).

The resulting set of three hundred sentences was then evaluated by three independent quality assessors who followed the quality assessment methodology using MOS (Mean Opinion Score). The threshold for editing a sentence recording was set to 4; i.e., sentences with an MOS score less than or equal to 4 were additionally edited to remove unwanted phenomena.

Noise plays an important role in the process of testing auditory competence. Its level significantly affects the accuracy of word recognition. In our experiments, we used one of the most challenging noises for speech perception [1,21], referred to as babble noise [16]. It is created by mixing several speech signals, which yields a spectrum similar to that of the useful speech signal. In our case, a total of 90 sentences were used to create the noise. They were divided into 30 tracks with different time shifts. By mixing them into one final recording and then removing its beginning and end, we obtained a sufficiently long recording with babble noise.
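The babble-noise procedure described above can be sketched as follows. This is an illustration under several assumptions that the text does not specify: sentences are dealt out evenly (three per track), each track gets a random start offset of at most `max_shift` samples, the mix is divided by the number of tracks to avoid clipping, and "removing the beginning and end" is interpreted as keeping only the region where all tracks overlap. Random samples stand in for the recorded sentences.

```python
import random


def make_babble(sentences, n_tracks=30, max_shift=1000, seed=0):
    """Mix sentence waveforms into one babble-noise signal.

    sentences : list of waveforms (lists of float samples)
    n_tracks  : number of parallel tracks; sentences are dealt out evenly
    max_shift : largest random start offset of a track, in samples
    """
    rng = random.Random(seed)
    per_track = len(sentences) // n_tracks
    tracks = []
    for t in range(n_tracks):
        # Concatenate this track's sentences back to back.
        samples = []
        for s in sentences[t * per_track:(t + 1) * per_track]:
            samples.extend(s)
        tracks.append((rng.randint(0, max_shift), samples))

    total_len = max(shift + len(samples) for shift, samples in tracks)
    mix = [0.0] * total_len
    for shift, samples in tracks:
        for k, value in enumerate(samples):
            mix[shift + k] += value / n_tracks

    # Keep only the region in which every track is active.
    start = max(shift for shift, _ in tracks)
    end = min(shift + len(samples) for shift, samples in tracks)
    return mix[start:end]


# Synthetic stand-ins for 90 recorded sentences (2000 random samples each).
rng = random.Random(1)
fake_sentences = [[rng.uniform(-1.0, 1.0) for _ in range(2000)] for _ in range(90)]
babble = make_babble(fake_sentences)
print(len(babble))
```

Trimming to the fully overlapping region keeps the number of simultaneous talkers constant over the whole noise recording.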

The resulting stereo recordings contained a sentence recording on one channel and a noise recording on the other (see Figure 3). The noise with a gradual fade-in and fade-out exceeded the length of the useful recording at the beginning and end by 500 ms.

3. Experimental Setup

3.1. Participants

A total of 68 young people aged from 21 to 32, with a mean age of 23, took part in the test. Prior to testing, they were asked to fill out a questionnaire, in which we ascertained their general state of health, taking into account any current or previous diseases or hearing impairments. We also asked about their current state of health in order to obtain information about discomfort that could affect the results of the tests (headache, flu, etc.). Next, we focused on the environment to which the listeners had been exposed. We asked whether they had been exposed to excessive noise on the day of the test, whether they spent time in a noisy environment, for example at work or during leisure activities, and whether they played sports that could affect the test results (e.g., swimming). We also asked for their subjective assessment of their hearing, i.e., whether there are situations in which they prefer a higher volume of sounds compared to other people. Finally, we collected additional information that could help to explain possible unexpected results. These questions covered, e.g., their preferences when listening to music, whether they use headphones and what volume they prefer, whether they can play a musical instrument, and whether they think they have a musical sense. These questions allowed us to identify one participant who was expected to achieve worse results in the perception tests. She reported using headphones at increased volume and preferring louder sounds when watching TV or listening to the radio, but at the same time she denied having a diagnosed hearing disorder. This assumption was finally confirmed, and her results were excluded from the evaluation in order to avoid distorting the resulting data.

3.2. Equipment Used

The testing room was equipped with 15 identical computers (Win 10, 64-bit, Intel Core i7-7700 CPU, 16 GB RAM) with an external sound card (Creative Sound Blaster X-Fi HD) and closed headphones (AKG K77). The sound chain was calibrated by G.R.A.S. 90AB (artificial ear type IEC 60318-2, connected with 1″ microphone type 40EN and preamplifier type 26AB). G.R.A.S. Audiometer Calibration Analyzer HW1001 was connected by G.R.A.S. AA0008 cable to the artificial ear. The mentioned setup ensured that all participants were in the same conditions.

4. Optimization Phase

The set of 300 sentences was divided into 10 groups of tests, so-called triplets. Each triplet therefore contained 3 × 10 sentences. The purpose of the optimization process was to find the SNR [dB] values that correspond to the word recognition score (WRS) at the levels of 20%, 50%, and 80%. In the optimization phase, an SNR ranging from −20 dB to 4 dB (in 1 dB steps) was used, while, in each triplet, three different SNR levels were tested at a constant noise level of 70 dB SPL. A total of 30 native speakers, students (aged from 21 to 32, with a mean age of 24), took part in the test in the optimization phase. Due to laboratory capacity constraints, the test was carried out during two afternoons, with 15 students each day. The testing room is located in a quiet part of the building, with a controlled entrance.
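With the noise held at a constant level, a given SNR is reached by scaling the speech only. The sketch below shows the arithmetic under the assumption that levels are measured as RMS values over the whole signal; the helper names `rms` and `speech_gain_for_snr` are hypothetical, and two sine tones stand in for a sentence and the babble noise. In terms of calibrated levels, an SNR of −6.3 dB at 70 dB SPL noise corresponds to speech at 63.7 dB SPL.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a waveform."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def speech_gain_for_snr(speech, noise, snr_db):
    """Linear gain for `speech` so that 20*log10(rms(speech)/rms(noise)) == snr_db.

    The noise is left untouched, mirroring a fixed noise level.
    """
    return rms(noise) / rms(speech) * 10 ** (snr_db / 20)


# SNR grid of the optimization phase: -20 dB to 4 dB in 1 dB steps.
snr_grid = list(range(-20, 5))

# Synthetic signals standing in for a sentence and the babble noise.
speech = [0.3 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(4800)]
noise = [0.1 * math.sin(2 * math.pi * 97 * n / 48000) for n in range(4800)]

gain = speech_gain_for_snr(speech, noise, -6.3)
scaled = [gain * x for x in speech]
achieved = 20 * math.log10(rms(scaled) / rms(noise))
print(len(snr_grid), round(achieved, 1))
```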

Prior to testing, the participants were instructed about the method of testing and the organization of the whole process. As part of the testing, two breaks were organized, the first after Triplet 3 and the second after Triplet 7. During these breaks, the participants had the opportunity to rest and refresh themselves. The breaks lasted approximately 40 min. The participants were also told that they could take a break whenever they deemed it necessary, but they used this possibility only to a minimal extent.

The testing was carried out through a designed interface, which was used to play prepared recordings (play button), select the words heard, and record answers (confirm button). Throughout the test, the participants had the opportunity to see all the words, clearly arranged in the form of a matrix (see Figure 4). The interface (GUI) was created in MATLAB R2021b software.

Results

The results recorded by the GUI were then evaluated and processed using an intelligibility curve with a typical shape similar to the letter S (see Figure 5). This curve, marked in red, is sometimes also referred to as an S-curve.

From the results of perception tests, it is possible to identify SNR values for the Word Recognition Score (WRS) equal to 20%, 50%, and 80%. Based on the sigmoid psychometric function [20], the following threshold values of −8.3 dB, −6.3 dB, and −4.3 dB were determined, which will be used as part of testing in the evaluation phase.
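The text only states that a sigmoid psychometric function was used. A common parameterization in the matrix-test literature is the logistic function defined by the 50% threshold (SRT50) and the slope at that point; the sketch below assumes this form, which is not confirmed as the exact model used here. Under this assumption, thresholds that lie 2 dB on either side of SRT50 for WRS = 20% and 80% imply a slope of ln(4)/8 ≈ 17.3%/dB for the pooled curve. This value is a property of the assumed model and need not match slopes averaged over individual listeners.

```python
import math


def wrs(snr_db, srt50_db, slope_per_db):
    """Logistic psychometric function: proportion of words recognized (0..1).

    slope_per_db is the slope at the 50% point, as a proportion per dB
    (0.15 means 15 %/dB).
    """
    return 1.0 / (1.0 + math.exp(-4.0 * slope_per_db * (snr_db - srt50_db)))


def snr_for_wrs(target, srt50_db, slope_per_db):
    """Inverse of wrs(): the SNR at which the target proportion is reached."""
    return srt50_db + math.log(target / (1.0 - target)) / (4.0 * slope_per_db)


# Thresholds quoted in the text lie 2 dB on either side of the 50% point.
srt50 = -6.3
slope = math.log(4.0) / (4.0 * 2.0)   # slope implied by a +/-2 dB spacing
srt20 = snr_for_wrs(0.20, srt50, slope)
srt80 = snr_for_wrs(0.80, srt50, slope)
print(round(srt20, 1), round(srt50, 1), round(srt80, 1), round(100 * slope, 1))
```

The factor 4 in the exponent makes `slope_per_db` equal to the derivative of the curve at SRT50, which is the quantity usually reported in %/dB.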

As part of this optimization phase, in addition to the WRS at the 20%, 50%, and 80% levels, we also obtained an overview of the recognition of each individual word and the average recognition of all words. Based on the obtained values, we adjusted the levels of the individual words within a range of at most ±3 dB [1,20]: for easy-to-recognize words, the intensity was reduced, and for harder-to-recognize words, the intensity was increased, in both cases moving the word towards the average WRS value; see Figure 6.
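One simple way to turn a word's deviation from the average score into a level change is a linear approximation around the middle of the psychometric curve, limited to ±3 dB. The sketch below illustrates that idea only; the function `level_correction_db`, the linear conversion, and the slope value of 15%/dB are assumptions, since the text gives the ±3 dB limit but not the exact rule.

```python
def level_correction_db(word_wrs, mean_wrs, slope_pct_per_db, limit_db=3.0):
    """Level change (dB) that moves one word's score toward the mean.

    word_wrs, mean_wrs : recognition scores in percent
    slope_pct_per_db   : assumed slope of the word's psychometric function
    limit_db           : largest allowed change in either direction
    """
    raw = (mean_wrs - word_wrs) / slope_pct_per_db
    return max(-limit_db, min(limit_db, raw))


# An easy word (80% against a 58% average) is attenuated ...
print(round(level_correction_db(80.0, 58.0, 15.0), 2))
# ... while a very hard word (10%) is amplified, but only up to the limit.
print(level_correction_db(10.0, 58.0, 15.0))
```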

After adjusting the volume of individual words, new sentences were constructed from these words, which were used in the next phase, i.e., the evaluation phase, which is discussed in detail in the next section.

5. Evaluation Phase

In the optimization process, the WRS was equalized by adjusting the volume, which should result in an equalization of the perceptual difficulty. Testing was performed at three threshold levels (20%, 50%, and 80%), which correspond to SNR values equal to −8.3 dB, −6.3 dB, and −4.3 dB, respectively. In each triplet, the mentioned three SNR levels were used.

In the evaluation phase, a total of 63 individuals participated in the testing, while one person was excluded due to suspicion of hearing impairment. Her results were not included in the evaluation analyses. A total of 40 individuals participated in the GUI testing (30 new, 10 experienced). The minimum age of the participants was 20, the maximum age was 25, and the mean age was 22 years. Twenty-two individuals participated in the written test (eight new, fourteen experienced). The testing was carried out in the same way (briefing, questionnaire, and breaks), in the same premises, and on the same equipment, including calibration performed prior to each test. In the case of using the GUI, the content heard was recorded directly by selecting words from the matrix and confirming the chosen words.

Results

The test results are displayed using a boxplot, where it is possible to observe the minimum, maximum, median, first and third quartile, as well as outliers for each triplet made up of three tests (SNR = −8.3 dB, −6.3 dB, and −4.3 dB); see Figure 7.

A phenomenon called the training effect can be observed at the beginning of testing, when the performed tests achieve a relatively low score compared to other tests. The statistical values of individual test scores gradually become stable as a result of mastering of the test routine, eliminating random errors due to inattention or insufficient initial concentration. In order to eliminate this phenomenon, which is otherwise normally present in perceptual testing, the results of the first triplet were not taken into account.

The results presented in the previous figure can also be represented using the curve of the psychometric function with the estimated and actual measured values marked (see Figure 8).

As a result of optimization, the psychometric curve shifted to the left, and the word recognition threshold values changed from −8.3 dB to −9.2 dB, from −6.3 dB to −6.9 dB, and from −4.3 dB to −4.6 dB for WRS = 20%, 50%, and 80%, respectively. In this evaluation, 9 triplets were taken into account, i.e., 27 tests.

6. Comparison of Test Routine

The test scores were also compared from the point of view of the test routine, i.e., the test format (GUI or written form). The first way of testing was based on using an interface (GUI) with a depicted matrix of words. Participants played the recording with the stimulus and then selected the words heard directly through the testing interface. The second form of tests consisted of writing the answers on answer sheets without the possibility of a visual hint (a depicted word matrix). The task of the participants was to write down the content heard (the whole sentence or individual words) on the prepared answer sheets immediately after listening to each recording. Each participant controlled the pace at which the recordings were played. The other aspects of the testing conditions remained unchanged (identical tests, same equipment, calibration process, and breaks). After completing all the tests, the participants were asked to fill in a short feedback form, which ascertained their opinions on the course of the testing, the content of the sentences heard, and the frequency of occurrence of certain words. The results of 40 listeners (30 new, 10 experienced) who performed tests with the use of visual support (GUI) and 22 listeners (8 new, 14 experienced) who performed tests in the written form were included in the comparison of the test formats.

The review of the achieved results in relation to the used test routine for each triplet and the corresponding SNR value of the given test are depicted in Figure 9.

In the first triplet, the format of testing with writing the answer on the answer sheet clearly dominates. This form of testing is easy and straightforward, which is probably the reason for the more accurate results of the first triplet. In tests with significant noise (SNR = −8.3 dB) with WRS at around 20%, better results were achieved for the routine with written answers. This finding probably indicates better concentration on listening, because there are no visual cues and no effort to quickly label the word heard. On the other hand, in tests in which the noise level did not significantly affect the quality of the information heard, the test format with an illustrated matrix appears to be a more suitable method of testing. At the same time, it can be observed that growing familiarity with the tests (from the first to the last triplet) helps both forms to achieve comparable results. From Triplet 2 to Triplet 10, the mean scores were 56.66% for the GUI format and 56.43% for the written format (see Figure 9, right), i.e., slightly in favor of the GUI form of testing (with visual support).

7. Equivalence of Tests

Ensuring that the tests have the same difficulty is crucial for their use in practice. In this section, we will therefore statistically evaluate the test parameters from nine triplets, a total of twenty-seven tests (nine tests for each monitored SNR level, see Table 2), and identify unsatisfactory tests. In the remaining tests, we will evaluate their statistical independence using analysis of variance (ANOVA).

When determining unsatisfactory tests, the technique of data cleaning and identification of outliers was applied (MATLAB). The algorithm used a detection method based on the mean value with a threshold of 0.75. The following figures show the result of the procedure used, and at the same time it is possible to identify the promising test sets within the triplets tested. These are the tests whose scores fell within the area defined by the red threshold lines (see Figure 10). A total of 17 tests were identified, for which we used statistical analysis of variance to determine the independence of the tests and thus whether there were significant differences between the selected 17 tests.
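The text does not name the MATLAB routine that was used. A mean-based outlier rule with a threshold of 0.75 can be read as flagging every test whose score lies more than 0.75 sample standard deviations from the mean of all tests at that SNR level; the sketch below implements that reading. The scores and the labels A to I are invented purely for illustration and are not the measured values.

```python
from statistics import mean, stdev


def flag_outliers(scores, threshold=0.75):
    """Return a list of booleans: True where a score lies more than
    `threshold` sample standard deviations from the mean of all scores."""
    centre = mean(scores)
    spread = stdev(scores)
    return [abs(s - centre) > threshold * spread for s in scores]


# Invented mean scores (%) of nine tests at one SNR level.
test_scores = {"A": 12.0, "B": 19.5, "C": 29.0, "D": 21.0, "E": 20.0,
               "F": 22.5, "G": 21.5, "H": 20.5, "I": 30.0}

flags = flag_outliers(list(test_scores.values()))
kept = [name for name, out in zip(test_scores, flags) if not out]
print(kept)
```

A threshold below one standard deviation is strict: it keeps only tests that cluster tightly around the mean, which suits the goal of selecting lists of equal difficulty.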

ANOVA for SNR −8.3 dB

Based on the applied cleaning routine, the tests T2, T4, and T10 were omitted, and analysis of variance (ANOVA) was performed only on the selected tests T3, T5, T6, T7, T8, and T9 (see Figure 10a). The ANOVA showed that there were significant differences between the means of the individual tests [F(5, 234) = 3.605, p = 0.004]. Pairwise comparisons (t-tests with Bonferroni correction) showed that T3 differed significantly from T7, T8, and T9. For this reason, we excluded the T3 test. The other tests (T5, T6, T7, T8, and T9) demonstrated the desired statistical features.

ANOVA for SNR −6.3 dB

Similarly, ANOVA was applied in the second group of tests, i.e., T3, T4, T5, T6, T7, and T8 (see Figure 10b), with the result indicating the existence of significant differences within this group of tests [F(5, 234) = 2.691, p = 0.0219]. Pairwise comparisons (with Bonferroni correction) revealed one significant difference, between T3 and T6. We excluded the T3 test from further processing and kept the remaining tests, T4, T5, T6, T7, and T8.

ANOVA for SNR −4.3 dB

In the last group of tests, i.e., T4, T5, T6, T7, and T10 (see Figure 10c), the ANOVA did not reveal statistically significant differences between the mentioned tests [F(4, 195) = 1.863, p = 0.118].
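The reported degrees of freedom are consistent with a one-way ANOVA in which each test contributes the scores of 40 listeners treated as independent groups: 6 tests give F(5, 234) and 5 tests give F(4, 195). The sketch below computes the F statistic and the degrees of freedom from scratch; it is an illustration with randomly generated scores, not a reproduction of the published analysis, and it leaves the p-value to a statistics package.

```python
import random
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Randomly generated scores: 6 tests x 40 listeners (illustration only).
rng = random.Random(0)
six_tests = [[rng.gauss(20.0, 8.0) for _ in range(40)] for _ in range(6)]
f_stat, df_b, df_w = one_way_anova(six_tests)
print(df_b, df_w)  # 5 234

# Dropping one test leaves 5 groups of 40 scores.
_, df_b5, df_w5 = one_way_anova(six_tests[:5])
print(df_b5, df_w5)  # 4 195
```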

Results

After performing the ANOVA, 15 out of a total of 17 tests remained, which means that we have 15 test sets available (each set containing 10 sentences). These tests are equivalent and have desired statistical features. The corresponding psychometric curve is shown in Figure 11.

The blue curve shows the approximation of the testing progress using the psychometric curve in the range from −20 to 4 dB before optimization (the results of the first triplet were not included); the red curve shows the progress after optimization, which is the same for all 27 tests; and the green curve shows the final psychometric curve for the selected set of 15 tests, which show appropriate statistical features identified by ANOVA. The recognition threshold values with WRS at the level of 20% (SRT20) changed from −8.3 dB to −9.2 dB and −9.43 dB due to the adjustments made in the optimization phase and selection of appropriate tests. The value of WRS at the level of 50% or SRT50 changed from −6.3 dB to −6.9 dB and subsequently to −7.03 dB. For WRS of 80% (SRT80), it changed from −4.3 dB to −4.6 dB and −4.62 dB.

8. Comparison of the Effect of New vs. Experienced Participants

To determine the effect of knowing the test routine (GUI), we performed an ANOVA that included the results of the first triplet, which were omitted in the rest of this study. We expected the results of the first triplet to depend strongly on whether the participant had completed such a test before or was taking it for the first time. The ANOVA showed a statistically significant effect of previous knowledge of or unfamiliarity with the test routine on the test scores, not only for the first triplet [F(1, 78) = 379.6907, p = 1.06 × 10⁻³¹] but also for the other triplets.

9. Analysis of Words Recognition from Adaptive Matrix

The best score was achieved by words from the name category, which appear in the first position within the sentence. From the point of view of the prosodic composition of the sentence, its onset is realized with a stronger stress accent and often with emphasis. The most accurately recognized names included “Mária” and “Peter”, whereas the names “Jana” and “Ján” achieved the lowest scores. Their phonetic proximity probably caused the confusion regarding the correct form. The mean success rate for the name category is 72.8%, STD = 12.7%.

In the second sentence position, there were verbs with a mean success rate of 52.6%, STD = 15.1%. The most accurately recognized verbs included the words “knows (pozná)” and “has (má)”, and, on the contrary, the words “buys (kúpi)” and “finds (nájde)” achieved the lowest score.

In the third sentence position, there were numerals with a mean success rate of 58.9% and STD = 15.3%. The numerals “four (štvoro)” and “hundred (sto)” dominated, and, on the contrary, the numerals “many (mnoho)” and “three hundred (tristo)” were the least correctly recognized numerals.

In the fourth position in the sentence, there were adjectives with a mean success rate of 56.2%, STD = 14.1%. The most accurately recognized words included the adjectives “other (ďalších)” and “nice (pekných)”, and, on the contrary, the least correctly recognized adjectives were the adjectives “small (malých)” and “good (dobrých)”.

In the last position in the sentence, there were objects with a mean recognition success rate of 52.5%, STD = 18.1%. The highest scores were achieved by the words “spoons (lyžíc)” and “buckets (vedier)”, and, on the contrary, the least correct recognitions and thus the lowest scores were achieved by the words “houses (domov)” and “bowls (mís)”.

The boxplot illustrates the variability in all word categories present within the proposed adaptive matrix, which provides a good identification of the distribution of data within each category, as well as identifying outliers (a circle) and providing information on the symmetry of the data according to the given word category (see Figure 12).

From the point of view of the evaluation of word categories, the category of names dominated, followed by numerals and adjectives. On the other hand, verbs and objects appeared to be the most problematic. They occupy the second and last positions within the sentence.

10. Discussion

In this section, the achieved results of the Slovak adaptive matrix will be compared with the results of other studies dealing with the evaluation of adaptive test matrices in other languages.

The average slope of the psychometric curves of the test participants is 13.13 ± 1.60%/dB, while the corresponding SRT50 reaches a mean value of −7.03 ± 0.79 dB.

Comparable results were also achieved for tests in other languages; for example, the SRT threshold is at the level of −8.4 ± 1.1 dB and the slope of the psychometric curve is 17.2%/dB for German [12]; SRT = −8.43 ± 1.75 dB and slope of 13.2%/dB for Danish [13]; SRT = −10.1 ± 0.1 dB with a slope of the psychometric curve of 16.7 ± 1.2%/dB for the Finnish matrix test [20]; SRT = −9.3 ± 0.8 dB and slope of 11.2 ± 1.2%/dB for the Mandarin Chinese matrix test [14]; SRT = −9.6 dB with a curve slope of 17.1%/dB for the Polish matrix test [16]; and SRT = −8.3 ± 0.2 dB and slope of 14.1 ± 1.0%/dB for the Turkish matrix sentence test [24]. Similar results (like Slovak matrix) were reported for French, i.e., SRT = −6.0 ± 0.6 dB and slope of 14 ± 1.6%/dB and for the Italian matrix tests with the reference SRT = −7.3 ± 0.2 dB and a slope of the curve of 13.3 ± 1.2%/dB [21,22]. The number of tests (lists with 10 sentences) that meet the necessary criteria is, e.g., 12 for Italian [21], 16 for Russian [23], and 25 tests for Swedish [30].

11. Conclusions

Adaptive matrix tests in audiometry represent a specific method that enables accurate and efficient measurement of an individual’s hearing abilities. In this study, we present test results obtained using an adaptive test matrix. To our knowledge, this is the first work of this kind for the Slovak language. The set of 15 tests shows suitable statistical features for repeated measurements. The independence of the constructed tests was proven by ANOVA. The tests are characterized by the SRT50 recognition threshold at the level of −7.03 dB, STD = 0.79 dB and the slope of the psychometric curve approximating the test results of healthy listeners at the level of 13.13%/dB, STD = 1.60%/dB. In this study, we also focused on the impact of the testing format on the final test scores. The two testing formats yielded comparable results, slightly in favor of the visually supported form of the test (GUI). We also investigated whether initial experience or, on the contrary, inexperience with this type of test influenced the results of tests in the GUI format. The results show that experienced participants have a higher chance of achieving a better result on the test. The presented tests in the Slovak language represent the first step towards standardizing hearing assessment using adaptive test matrices in Slovakia. The data also enable a comparison of achieved results between similar tests in other languages.

Author Contributions

Conceptualization, E.K. and P.G.; methodology, E.K.; software, E.K.; validation, E.K., P.G., and R.S.; formal analysis, P.G.; investigation, R.S.; resources, R.S.; data curation, P.G.; writing—original draft preparation, E.K. and R.S.; writing—review and editing, E.K.; visualization, E.K.; supervision, R.S.; project administration, P.G.; funding acquisition, E.K. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Scientific Grant Agency of the Ministry of Education, Science, Research and Sport of the Slovak Republic and the Slovak Academy of Sciences under research project VEGA 1/0344/21 and the Slovak Research and Development Agency under research project APVV-22-0261.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets are available from the corresponding author upon reasonable request.

Acknowledgments

The work was created with the kind support of ORL specialists Silvia Krempaská and Juraj Kovaľ.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ICRA	International Collegium of Rehabilitative Audiology
SNR	Signal-to-Noise Ratio
SPL	Sound Pressure Level
SRT	Speech Reception Threshold
WRS	Word Recognition Score

References

1. Akeroyd, M.A.; Arlinger, S.; Bentler, R.A.; Boothroyd, A.; Dillier, N.; Dreschler, W.A.; Gagne, J.-P.; Lutman, M.; Wouters, J.; Wong, L.; et al. ICRA recommendations for the construction of multilingual speech tests. Int. J. Audiol. 2015, 54, 17–22.
2. Trimmis, N.; Vrettakos, G.; Gouma, P.; Papadas, T. Speech audiometry: Nonsense monosyllabic lists in modern Greek. J. Hear. Sci. 2012, 2, 41–49.
3. Kuk, F.; Lau, C.C.; Korhonen, P.; Crose, B.; Peeters, H.; Keenan, D. Development of the ORCA nonsense syllable test. Ear Hear. 2010, 31, 779–795.
4. Potgieter, J.M.; Swanepoel, W.; Smits, C. Evaluating a smartphone digits-in-noise test as part of the audiometric test battery. S. Afr. J. Commun. Disord. 2018, 65, e1–e6.
5. Kwak, C.; Seo, J.H.; Oh, Y.; Han, W. Efficacy of the Digit-in-Noise Test: A Systematic Review and Meta-Analysis. J. Audiol. Otol. 2022, 26, 10–21.
6. Ondáš, S.; Kiktová, E.; Pleva, M.; Oravcová, M.; Hudák, L.; Juhár, J.; Zimmermann, J. Pediatric Speech Audiometry Web Application for Hearing Detection in the Home Environment. Electronics 2020, 9, 994.
7. Wilson, R.H.; Burks, C.A. Use of 35 words for evaluation of hearing loss in signal-to-babble ratio: A clinic protocol. J. Rehabil. Res. Dev. 2005, 42, 839–852.
8. Coene, M.; Krijger, S.; van Knijff, E.; Meeuws, M.; De Ceulaer, G.; Govaerts, P.J. LiCoS: A New Linguistically Controlled Sentences Test to Assess Functional Hearing Performance. Folia Phoniatr. Logop. Off. Organ Int. Assoc. Logop. Phoniatr. (IALP) 2018, 70, 90–99.
9. Percy-Smith, L.; Wischmann, S.; Josvassen, J.L.; Hallstrøm, M.; Laplante-Lévesque, A.; Sorgenfrei, M.G.; Caye-Thomasen, P. Evaluation of a sentence test in noise in children with hearing impairment. Dan. Med. J. 2020, 67, A06190358.
10. Hagerman, B. Sentences for testing speech intelligibility in noise. Scand. Audiol. 1982, 11, 79–87.
11. Wagener, K.; Brand, T.; Kollmeier, B. Development and evaluation of a German sentence test I: Design of the Oldenburg sentence tests. Z. Audiol. 1999, 38, 4–15. (In German)
12. Wagener, K.; Brand, T.; Kollmeier, B. Development and evaluation of a German sentence test II: Optimization of the Oldenburg sentence tests. Z. Audiol. 1999, 38, 44–45. (In German)
13. Wagener, K.; Josvassen, J.L.; Ardenkjaer, R. Design, optimization and evaluation of a Danish sentence test in noise. Int. J. Audiol. 2003, 42, 10–17.
14. Hu, H.; Xi, X.; Wong, L.L.N.; Hochmuth, S.; Warzybok, A.; Kollmeier, B. Construction and evaluation of the Mandarin Chinese matrix (CMN matrix) sentence test for the assessment of speech recognition in noise. Int. J. Audiol. 2018, 57, 838–850.
15. Potts, L.G.; Olivo, A.M.; Reeder, R.M.; Firszt, J.B. Evaluation of the American English Matrix Test with Cochlear Implant Recipients. Int. J. Audiol. 2023, 1–7, advance online publication.
16. Ozimek, E.; Warzybok, A.; Kutzner, D. Polish sentence matrix test for speech intelligibility measurement in noise. Int. J. Audiol. 2010, 49, 444–454.
17. Hochmuth, S.; Brand, T.; Zokoll, M.A.; Castro, F.Z.; Wardenga, N.; Kollmeier, B. A Spanish matrix sentence test for assessing speech reception thresholds in noise. Int. J. Audiol. 2012, 51, 536–544.
18. Jansen, S.; Luts, H.; Wagener, K.C.; Kollmeier, B.; Del Rio, M.; Dauman, R.; James, C.; Fraysse, B.; Vormès, E.; Frachet, B.; et al. Comparison of three types of French speech-in-noise tests: A multi-center study. Int. J. Audiol. 2012, 51, 164–173.
19. Houben, R.; Koopman, J.; Luts, H.; Wagener, K.C.; van Wieringen, A.; Verschuure, H.; Dreschler, W.A. Development of a Dutch matrix sentence test to assess speech intelligibility in noise. Int. J. Audiol. 2014, 53, 760–763.
20. Dietz, A.; Buschermöhle, M.; Aarnisalo, A.A.; Vanhanen, A.; Hyyrynen, T.; Aaltonen, O.; Löppönen, H.; Zokoll, M.A.; Kollmeier, B. The development and evaluation of the Finnish Matrix Sentence Test for speech intelligibility assessment. Acta Oto-Laryngol. 2014, 134, 728–737.
21. Puglisi, G.E.; Warzybok, A.; Hochmuth, S.; Astolfi, A.; Prodi, N.; Visentin, C.; Kollmeier, B. Construction and first evaluation of the Italian Matrix Sentence Test for the assessment of speech intelligibility in noise. In Proceedings of the Forum Acusticum, Krakow, Poland, 7–12 September 2014; PAS—Polish Acoustical Society, European Acoustics Association: Madrid, Spain, 2014; pp. 1–5.
22. Puglisi, G.E.; Warzybok, A.; Hochmuth, S.; Visentin, C.; Astolfi, A.; Prodi, N.; Kollmeier, B. An Italian matrix sentence test for the evaluation of speech intelligibility in noise. Int. J. Audiol. 2015, 54, 44–50.
23. Warzybok, A.; Zokoll, M.; Wardenga, N.; Ozimek, E.; Boboshko, M.; Kollmeier, B. Development of the Russian matrix sentence test. Int. J. Audiol. 2015, 54, 35–43.
24. Zokoll, M.A.; Fidan, D.; Türkyılmaz, D.; Hochmuth, S.; Ergenç, İ.; Sennaroğlu, G.; Kollmeier, B. Development and evaluation of the Turkish matrix sentence test. Int. J. Audiol. 2015, 54, 51–61.
25. Hörzentrum Oldenburg. Available online: https://www.hz-ol.de/en/matrix.html (accessed on 20 January 2024).
26. Kollmeier, B.; Warzybok, A.; Hochmuth, S.; Zokoll, M.A.; Uslar, V.; Brand, T.; Wagener, K.C. The multilingual matrix test: Principles, applications and comparison across languages: A review. Int. J. Audiol. 2015, 54, 3–16.
27. Slovak National Corpus. Available online: https://bonito.korpus.sk/run_guest.cgi/first_form?corpname=prim-6.0-public-all;align= (accessed on 10 December 2023).
28. Panocová, R.; Gregová, R. Designing the Slovak Matrix Sentence Test. Int. J. Appl. Lang. Stud. Cult. 2019, 2, 33–38.
29. Štefánik, J.; Rusko, M.; Považanec, D. The Frequency of Words, Graphemes, Phones and Other Elements in Slovak. Jazykoved. Časopis 50 1999, 2, 81–93.
30. Hällgren, M.; Larsby, B.; Arlinger, S. A Swedish version of the Hearing in Noise Test (HINT) for measurement of speech recognition. Int. J. Audiol. 2006, 45, 227–237.

Figure 1. The reference distribution of the Slovak phonemes [29] and the distribution of the Slovak phonemes in matrix sentences.

Figure 2. Construction of one sentence consisting of five recordings in Audition.

Figure 3. Example of one recording in Audition.

Figure 4. Graphical User Interface (GUI) used during the tests.

Figure 5. The measured values (circles) and the resulting psychometric curve (red) from the optimization phase.

Figure 6. Optimization of SRT for 50 words.

Figure 7. Results of 10 triplets with depicted median, IQR (interquartile range), min, max, and outliers (small circles).

Figure 8. Curves of psychometric functions showing values before optimization (blue curve) and after optimization (red curve) with corresponding levels for 20%, 50%, and 80% WRS. Red circles refer to the measured values and red stars refer to estimated values.

Figure 9. Comparison of test routine.

Figure 10. Identification of suitable tests according to an algorithm for data cleaning and identification of outliers for SNR thresholds −8.3 dB, −6.3 dB, −4.3 dB that correspond to (a–c).

Figure 11. Curves of psychometric functions showing results before optimization (blue), after optimization (red), and final set of 15 selected tests (green).

Figure 12. Boxplots for word categories.

Table 1. Adaptive test matrix for Slovak language.

Name	Verb	Numeral	Adjective	Object
Mária	chce (wants)	štvoro (four)	iných (different)	darov (gifts)
Jana	hľadá (looks)	osem (eight)	veľkých (big)	vedier (buckets)
Martin	pozná (knows)	sto (hundred)	starých (old)	okien (windows)
Jano	kúpi (buys)	veľa (many/much)	dobrých (good)	novín (newspapers)
Viera	nájde (finds)	tristo (three hundred)	malých (small)	mís (bowls)
Pavol	nemá (doesn’t have)	sedem (seven)	lacných (cheap)	bytov (flats)
Jožo	má (has)	pár (a few)	ďalších (other)	nožov (knives)
Peter	dáva (gives)	mnoho (many/much)	celých (whole)	lyžíc (spoons)
Eva	vidí (sees)	málo (little/few)	pekných (nice)	domov (houses)
Anna	nechce (doesn’t want)	dvesto (two hundred)	nových (new)	lavíc (benches)

Table 2. Results for triplets according to the SNR threshold (−8.3 dB, −6.3 dB, and −4.3 dB).

SNR	Triplet	Mean [%]	STD [%]	SNR	Triplet	Mean [%]	STD [%]	SNR	Triplet	Mean [%]	STD [%]
−8.3	2	15.75	11.32	−6.3	2	50.00	13.82	−4.3	2	73.45	8.42
−8.3	3	25.40	11.24	−6.3	3	55.55	12.78	−4.3	3	74.65	10.20
−8.3	4	22.90	8.46	−6.3	4	56.80	12.95	−4.3	4	81.90	8.85
−8.3	5	31.55	11.76	−6.3	5	56.70	10.81	−4.3	5	84.05	7.99
−8.3	6	29.50	10.73	−6.3	6	63.55	12.49	−4.3	6	80.95	9.51
−8.3	7	33.60	13.92	−6.3	7	58.85	12.41	−4.3	7	85.55	8.42
−8.3	8	34.80	12.65	−6.3	8	62.40	14.52	−4.3	8	86.30	9.99
−8.3	9	34.40	11.44	−6.3	9	71.55	12.49	−4.3	9	88.75	6.56
−8.3	10	38.75	14.76	−6.3	10	65.45	13.04	−4.3	10	84.05	7.95
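
As an illustration of how values such as those in Table 2 relate to an SRT50 and a slope, the sketch below pools the listed triplet means at each SNR and fits a straight line to the logit of the pooled score. This is a simplified stand-in, not the fitting procedure used in the study: the unweighted pooling, the logit-linear least-squares fit, and the names `table2` and `fit_logistic` are all assumptions made for the example. Only the mean values are taken from the table.

```python
import math

# Mean word recognition scores [%] per triplet (triplets 2 to 10), copied from Table 2
table2 = {
    -8.3: [15.75, 25.40, 22.90, 31.55, 29.50, 33.60, 34.80, 34.40, 38.75],
    -6.3: [50.00, 55.55, 56.80, 56.70, 63.55, 58.85, 62.40, 71.55, 65.45],
    -4.3: [73.45, 74.65, 81.90, 84.05, 80.95, 85.55, 86.30, 88.75, 84.05],
}


def fit_logistic(points):
    """Least-squares line through logit(p) versus SNR.

    points: list of (snr_db, proportion) with 0 < proportion < 1.
    Returns (srt50_db, slope_at_50_percent_as_fraction_per_db).
    """
    xs = [x for x, _ in points]
    ys = [math.log(p / (1.0 - p)) for _, p in points]
    n = len(points)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    a = y_mean - b * x_mean
    # logit(p) = a + b * SNR, so p = 0.5 at SNR = -a / b,
    # and the slope of the logistic at that point is b / 4
    return -a / b, b / 4.0


# Unweighted pooled mean per SNR, converted from percent to a proportion
pooled = [(snr, sum(scores) / len(scores) / 100.0) for snr, scores in sorted(table2.items())]
srt50, slope50 = fit_logistic(pooled)
print(f"SRT50 = {srt50:.2f} dB, slope = {100.0 * slope50:.2f} %/dB")
```

Because the fit uses only three pooled points, its output is a rough estimate and should not be expected to reproduce the reported SRT50 and slope exactly.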

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Kiktová, E.; Sock, R.; Getlík, P. Testing in Noise Based on the First Adaptive Matrix Sentence Test in Slovak Language. Electronics 2024, 13, 602. https://doi.org/10.3390/electronics13030602