*EMA Results*

A total of 205 valid votes were cast across the study group over the two-week period. The median number of votes cast per subject was 15, ranging from a minimum of 7 to a maximum of 50. Six subjects voted at least once per day on average over the two-week period, while five subjects voted less often. Votes were spread across the different acoustic scenes; the distribution for the entire subject pool is provided in Figure 2. The scene with the fewest votes was Speech, with 19 votes, while the other three classes received approximately equal numbers of votes: 55, 54 and 45 votes in the Quiet, Speech in Noise, and Noise classes, respectively. The median number of votes cast per subject in each scene was 6, 1, 4 and 4 for the Quiet, Speech, Speech in Noise, and Noise classes, respectively.
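These summary counts can be reproduced directly from a vote log. The sketch below is a minimal illustration, assuming a hypothetical flat file with one row per valid vote and columns `subject` and `scene`; the file name and column names are illustrative, not the study's actual data format.

```python
import pandas as pd

# Hypothetical vote log: one row per valid vote, with the voting
# subject's ID and the scene class active at the time of the vote.
votes = pd.read_csv("ema_votes.csv")  # assumed columns: subject, scene

# Total number of valid votes across the study group.
print("total votes:", len(votes))

# Median, minimum and maximum number of votes per subject.
per_subject = votes.groupby("subject").size()
print(per_subject.agg(["median", "min", "max"]))

# Median number of votes per subject within each scene class.
per_scene = votes.groupby(["scene", "subject"]).size()
print(per_scene.groupby("scene").median())
```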

The number of votes cast by each individual subject is presented in Figure 3 using bubble plots. The location of a bubble on the x-axis indicates the program preference, its size indicates the number of votes contributing to that data point, and its color indicates the sound class to which the votes were allocated.

Overall preference was analyzed by aggregating data across all scenes. A generalized linear model (GLM) was fitted with program preference as the dependent variable and subject as the independent variable. A chi-squared analysis of variance on the GLM showed that the effect of subject was highly significant (*p* < 0.001). P-values indicating the significance of preference for each subject are presented in Table 2. Two subjects, labelled as category A, had a significant preference for ForwardFocus: subject #7 (*p* = 0.004) and subject #15 (*p* < 0.001). The remaining nine subjects showed no significant overall preference for either program.
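As a sketch of how such a test can be carried out (the paper does not state which software was used), a likelihood-ratio chi-squared test could compare the subject model against an intercept-only null model. This assumes the same hypothetical data layout as above, extended with a binary `preference` column (1 = ForwardFocus, 0 = BEAM):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats

votes = pd.read_csv("ema_votes.csv")  # assumed columns: subject, scene, preference

# Logistic GLM: program preference (1 = ForwardFocus, 0 = BEAM) by subject.
full = smf.glm("preference ~ C(subject)", data=votes,
               family=sm.families.Binomial()).fit()
null = smf.glm("preference ~ 1", data=votes,
               family=sm.families.Binomial()).fit()

# Likelihood-ratio (chi-squared) test for the overall effect of subject.
lr = 2 * (full.llf - null.llf)
df = full.df_model - null.df_model
print("p =", stats.chi2.sf(lr, df))
```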

**Figure 3.** Individual program preference separated by sound class. Size of data point indicates number of votes.

**Table 2.** Summary of statistical analysis, indicating those subjects with an overall preference (A), those subjects whose preference varied with SoundClass (B), and those subjects for whom no preference could be determined (C). #xx – patient ID; \* denotes a significant test result.

To analyze the effect of sound class on preference, a mixed-effects GLM was fitted. The dependent variable was program preference and the independent variable was sound class (SoundClass), while subject was included as a random effect in the model.

The fixed effect of SoundClass was tested by comparing the mixed-effects GLM to a model that excluded SoundClass and included only the random effect (subject). The resulting chi-squared analysis of variance showed that the SoundClass fixed effect was not significant (*p* = 0.191), indicating a lack of association between program preference and sound class for the group of subjects as a whole.

The random effect of subject was tested by comparing the mixed-effects GLM to a model that excluded the random effect (subject) and included only the fixed effect (SoundClass). The resulting chi-squared analysis of variance confirmed that subject (random effect) was highly significant (*p* < 0.001), indicating that voting preferences varied between individual subjects. For each individual subject, a logistic regression of preference on SoundClass was fitted, and the resulting p-values are shown in Table 2. Three subjects showed a significant logistic regression, indicating that their voting preference depended on SoundClass (#1, *p* = 0.004; #13, *p* = 0.011; #14, *p* = 0.017); these were labelled as category B.
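The mixed-effects fits themselves would typically require dedicated mixed-model software (for example, glmer from R's lme4 package; the paper does not state its tooling). The per-subject analysis, by contrast, is a plain logistic regression, sketched below under the same assumed data layout as above:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats

votes = pd.read_csv("ema_votes.csv")  # assumed columns: subject, scene, preference

# Per-subject likelihood-ratio test: does preference depend on scene class?
for subject_id, subj in votes.groupby("subject"):
    full = smf.glm("preference ~ C(scene)", data=subj,
                   family=sm.families.Binomial()).fit()
    null = smf.glm("preference ~ 1", data=subj,
                   family=sm.families.Binomial()).fit()
    lr = 2 * (full.llf - null.llf)
    p = stats.chi2.sf(lr, full.df_model - null.df_model)
    print(f"subject {subject_id}: p = {p:.3f}")
```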

Six subjects (#4, #6, #9, #10, #12, #17) were labelled as category C, for which no conclusive preference could be determined. Four of these subjects voted less than once per day on average over the two-week period.

#### **4. Discussion**

The automatic scene classifier in the cochlear implant sound processor offers the ability to characterize the surroundings with respect to their acoustic characteristics, such as speech and noise. It is known that signal pre-processing in cochlear implant systems should be chosen depending on the acoustic environment to improve speech comprehension [32]. Such conclusions were derived from in-lab investigations; so far, limited knowledge is available on patients' everyday real-world program preferences when using such algorithms.

The remote control of modern cochlear implant systems, used here as a scene-dependent voting tool, is technically suited to EMA in a cochlear implant population. The integration of the assessment tool into the patient's sound processor proved useful. Linking the patients' input to the captured acoustic scene class potentially allows the investigation of patients' individual preferences with respect to program settings and/or specific algorithms in different acoustic environments. Additionally, in cases where the data do not match clinical expectations, or where there is ongoing inactivity, this method may provide new insight into individual preferences. This pilot study showed a significant difference in voting patterns across the group of subjects: two patients (#15, #7) had an overall preference for ForwardFocus that persisted regardless of the acoustic scene, three subjects (#1, #13, #14) had a scene-specific algorithm preference, and the remaining six subjects (#4, #6, #9, #10, #12, #17) had no conclusive preference.

Our methodology complements the use of EMA in hearing science to date, where studies have prompted surveys in which participants assess their acoustic environment and rate their hearing experience [24,25,27]. These methodologies provide in-the-moment responses to complex real-world situations, a significant improvement over surveys confounded by retrospective recall. In addition, our approach enables in-the-moment rating of signal pre-processing technology in real-world environments.

A significant advantage of this EMA methodology is the objective acoustic scene classification. By using the scene classification already available in the sound processor [31], an accurate environmental measure is captured without further patient interaction. This capability is expected to be useful for sound processing algorithms designed to provide benefit in specific noise environments. Research algorithms are being developed for specific noise scenes, such as constant noise [34] or babble noise [35]. To complement in-booth speech understanding results, this method could provide real-world preference results for each of the available scene classes.

This study also aimed to investigate the feasibility of EMA for providing data on sound processing algorithms. Two sound processing algorithms were compared: the adaptive directional microphone BEAM, and ForwardFocus, known to provide a significant speech understanding benefit over BEAM in dynamic noise environments. Although these technologies were chosen because they were expected to produce differences, particularly in noisy listening environments, no clear general or scene-specific difference was determined for the group. This is not unexpected, given the number of patients and the number of votes collected. What was found was evidence of individualized general or scene-specific voting patterns. Future EMA studies should therefore consider the proportion of votes expected in each scene. For instance, in this study, patients were far less likely to vote spontaneously in Wind, Music, or Speech in Noise scenes. For a prompted methodology, the proportion of time spent, on average, in each scene would be important to consider and could be obtained from the sound processor data logs [36,37]. These insights provide at least a basis for designing studies powered to capture data for direct scene-specific algorithm comparisons.
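As an illustration of this design consideration (all numbers below are hypothetical, not taken from this study), logged scene-occupancy proportions could be used to estimate how many prompts are needed before the rarest scene reaches a target vote count:

```python
# Hypothetical scene-occupancy proportions from sound processor data logs.
scene_time = {
    "Quiet": 0.45, "Speech": 0.20, "Speech in Noise": 0.15,
    "Noise": 0.12, "Music": 0.05, "Wind": 0.03,
}

target_votes_per_scene = 30  # assumed target for a scene-specific comparison
response_rate = 0.7          # assumed fraction of prompts that yield a vote

# If prompts occur in proportion to time spent, the rarest scene
# determines the total number of prompts the study must deliver.
rarest = min(scene_time.values())
prompts_needed = target_votes_per_scene / (rarest * response_rate)
print(f"total prompts needed: {prompts_needed:.0f}")
```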

Compared to BEAM, ForwardFocus shows advantages in speech understanding, especially in fluctuating noise [21]. Several consequences follow from this. The acoustic scenes Noise and Speech in Noise are characterized not only with respect to the signal-to-noise ratio; the temporal properties of the noise (stationary or fluctuating) are additionally characterized. This could be the basis for activating ForwardFocus in specific listening scenes, as determined by the automatic classification algorithm SCAN [18,32].
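As a purely illustrative sketch of such scene-dependent activation (the mapping below is a hypothetical example, not SCAN's actual decision logic), the classifier output could drive the program choice along these lines:

```python
def select_program(scene: str, noise_is_fluctuating: bool) -> str:
    """Hypothetical scene-dependent program selection.

    The scene labels and the stationary/fluctuating distinction follow
    the text; the mapping itself is illustrative, not SCAN's rule.
    """
    if scene in ("Noise", "Speech in Noise") and noise_is_fluctuating:
        return "ForwardFocus"  # benefit shown in fluctuating noise [21]
    return "BEAM"

# Example: classifier reports Speech in Noise with fluctuating noise.
print(select_program("Speech in Noise", noise_is_fluctuating=True))
```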
