1. Introduction
Smart mobile devices (e.g., smartphones) are becoming increasingly ubiquitous. Their capabilities allow the combined use of ecological momentary assessments (EMA) and mobile crowdsensing (MCS) in the healthcare domain to not only collect qualitative longitudinal and ecologically valid data, but also to use sensors of smartphones as well as connected external sensors (e.g., wearables) to capture the context in which these data are collected [
1]. For example, environmental data (e.g., noise [
2,
3]) can be measured when a questionnaire is answered to correlate the questionnaire data with the environmental data to gain new insights about patients. However, sensor measurements must be accurate, comparable, and interpretable to provide meaningful information. Especially for non-standardized smartphone sensors like the microphone (i.e., different manufacturers, different mobile operating systems, different scales), it can be challenging to achieve these properties.
The TrackYourTinnitus (TYT) mobile platform uses EMA and MCS to track a user’s individual tinnitus. Tinnitus is the perception of an internal sound in the ears in the absence of a corresponding external sound. As symptoms are subjective and vary over time, TYT was created to monitor and evaluate the variability of these symptoms in the daily life of tinnitus-affected patients or interested users [
4]. The platform has been in operation since 2014 and is composed of a registration and information website (
https://www.trackyourtinnitus.org/, accessed on 1 October 2021), a central backend for data storage, and a mobile application available for both Android and iOS. The mobile apps assess users’ individual tinnitus perceptions (e.g., tinnitus loudness and distress) by asking them to complete tinnitus EMA questionnaires at different times of the day [
5]. In addition, the environmental sound level is captured in parallel with the completion of the daily questionnaire [
5]. The detailed process of the TYT app is described in [
1], whereas the underlying data set (i.e., structure and insights to the collected data) is described in [
6]. The overall objective of this work is to investigate the correlations between environmental sound level and reported tinnitus symptoms. More specifically, it should be examined whether the environmental sound level has an effect on tinnitus. If the sound levels can be correlated with questionnaire-collected data, new insights might be unveiled, as the sound level data can be considered more objective than data from completed questionnaires alone (e.g., to allow predictions of tinnitus loudness based on the sound data). Note further that, for tinnitus and many other diseases and disorders, longitudinal studies that are able to collect ecologically valid data over such a long period are still very rare. The collection of objective data such as the sound level is even scarcer. Since TYT has been running for more than half a decade, and not all circumstances of the collection procedure were clear to the developers beforehand, it is now of great interest to make the collected sound levels interpretable from a medical perspective. Therefore, the experiment at hand is important for TYT, but the results and lessons learned may be of much greater value for the healthcare domain in general.
However, the data available in the TYT database [
6] do not contain calibrated sound pressure level (SPL) or weighted decibel (e.g., dB(A)) values, but rather relative amplitude (Android) or uncalibrated decibel (iOS) values as retrieved from the mobile system APIs. This prevents a direct comparison of these values and therefore a meaningful interpretation of their correlation with tinnitus symptoms. Calibrating the mobile devices beforehand and storing the respective dB SPL, dB(A), or dB(C) values would have avoided this issue. That sound sensor values measured by a smartphone require further consideration in healthcare scenarios has also been recognized in works other than TYT [
7]. From a general viewpoint, sensor measurements collected by a modern smartphone for healthcare purposes require many considerations before collected sensor data can be actually evaluated. In [
8], for example, challenges are discussed in the context of fall detection. One of the challenges discussed by the authors of [
8] also has implications for the data collected by TYT, namely the usability when collecting sensor data. If a user has his or her smartphone in the pocket, collected sensor values may not be usable. Consequently, works can be found that try to mitigate such challenges on a more generic level [
9]. However, the data in the TYT database were collected for more than six years with more than 100,000 entries, and the respective mobile apps used to collect these values cannot be changed retroactively to counteract the described issues. Since no other works could be found that helped to analyze these pre-existing collected sound pressure levels, the following requirements were established for the experiment shown in the work at hand:
Identification of an experimental setting that can be used to learn more about the interpretation possibilities of the collected TYT sound level values.
In addition to the latter point, in the best case, the experiment should be appropriate to enable us to compare all sound level values across the different smartphone devices from different manufacturers and different mobile operating systems.
Conducting the experiment without the use of an expensive sound laboratory, with the goal of fostering and facilitating overall reproducibility.
Based on these requirements, different scenarios have been discussed. In the end, the following approach (i.e., list of decisions for the experiment) was conceived to make the described values usable and comparable:
The TYT database was analyzed to identify the mobile device models that contributed the most environmental sound measurement data.
The analysis of the database showed that more detailed device information is available for Android devices. For this reason, it was decided to use Android devices for the experiment.
A sample of the identified device models was selected and acquired (i.e., we purchased these devices for the experiment).
A new mobile application was developed that mimics the behavior of the TYT app with respect to the sound measurement. More specifically, the app was implemented with the specific focus on the sound measurement but using the same software functions as TYT (i.e., by copying the relevant source code fragments from the original app).
The selected device models were equipped with this mobile application.
For the evaluation of the smartphone devices equipped with the app, a sound signal was generated, for which the volume was adjusted to different sound levels using a professional calibrated sound level meter (SLM). Based on this setting, the values captured by the mobile app on the different mobile devices were recorded.
Finally, the results were used to derive equations for the different device models that, in turn, can be used to transform the measurement data in the database into (partially) comparable dB(C) values.
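The last step of this approach can be illustrated with a minimal sketch. It assumes that the relation between the raw amplitude and the reference level is logarithmic, i.e., level = a + b · log10(amplitude), and fits a and b by ordinary least squares. The function names and the calibration points are hypothetical; the amplitudes are synthetic values generated from an ideal relation, not measurements from the experiment.

```python
import math


def fit_log_model(amplitudes, levels_db):
    """Least-squares fit of level_db = a + b * log10(amplitude)."""
    xs = [math.log10(v) for v in amplitudes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(levels_db) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, levels_db))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def amplitude_to_db(amplitude, a, b):
    """Apply a fitted device equation to one stored amplitude value."""
    return a + b * math.log10(amplitude)


# Reference levels set with the SLM (50 to 80 dB(C) in 5 dB steps).
reference_levels = [50, 55, 60, 65, 70, 75, 80]

# Synthetic amplitudes (NOT measured data): ideal 20*log10 behavior with a
# made-up device offset of 10 dB, used only to exercise the fit.
synthetic_amplitudes = [10 ** ((level - 10.0) / 20.0) for level in reference_levels]

a, b = fit_log_model(synthetic_amplitudes, reference_levels)
estimated_db = amplitude_to_db(synthetic_amplitudes[3], a, b)
```

With one pair (a, b) per device model, the same `amplitude_to_db` call could be applied to every stored entry of that model; values outside the calibrated range would be extrapolations and should be treated with caution.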
How these steps were carried out in practice and what results were achieved are discussed in the following sections. Section 2 discusses related work in detail. Section 3 presents the experiment, and Section 4 presents its results. Section 5 discusses the results with respect to limitations and practical relevance. Section 6 closes our work with a summary and an outlook on future work.
2. Related Work
Measuring sound levels with smartphones has been a topic of research for some time. There are both scientific and commercial implementations of apps that perform sound measurements. In addition, studies evaluating the accuracy and precision of these apps can be found in the literature. Moreover, the ability of smartphones to perform sound level measurements in the environment as well as their calibration has been investigated and discussed in a thorough manner. Finally, there are works that deal with large data sets of sound levels measured with smartphones.
NoiseMap [10] is an Android app that performs geo-referenced sound measurements and sends these data to an open urban sensing platform following a participatory sensing approach to create real-time noise maps and data graphs. The incoming sound signal is sampled and first translated to a relative dB full scale (dBFS) value and subsequently to a dB SPL value by adding a constant calibration value. A built-in calibration tool can be used to determine this value using a constant pink noise [10]. The iOS app SoundLog [11] was developed by the Australian National Acoustic Laboratories (NAL) with the aim of providing a personal noise dosimeter. The app is capable of measuring A-weighted equivalent continuous sound levels (LAeq), C-weighted peak sound pressure levels (LCpk), as well as other values for different sampling periods [11]. Ambiciti [12] is a mobile app developed for both Android and iOS that utilizes mobile crowdsensing to enable urban noise monitoring. The app performs automatic background noise measurements in dB(A) using the microphone and the user’s location. In addition, a calibration feature is provided [12]. The accuracy of the app has been evaluated and found to be within ±1.2 dB(A) [13]. The City Soundscape [14] mobile app is used as part of a noise monitoring platform in the context of acoustic urban planning in smart cities. The app mimics the user interface of a professional SLM and is able to measure dB SPL and equivalent continuous sound level (Leq) values [14]. Furthermore, there are numerous apps implementing sound measurements available in the Google Play Store (e.g., refs. [15,16,17]) and the Apple App Store (e.g., refs. [18,19,20]). However, in the context of environmental and occupational noise monitoring, for most of these apps there is no information available on the algorithms used as well as no systematic and standardized evaluation of their quality and accuracy, which is a common issue in the field of mHealth apps [21]. There are various studies evaluating the accuracy of existing apps [22,23,24,25,26,27,28,29]. These studies were either conducted in controlled laboratory environments [22,23,24,25,27,28,29] and used pink noise [23,24,28,29], white noise [25,27,28,29], 1/3 octave band noise [22], or representative audio samples [29] to simulate sound sources with different sound levels, or were performed in real-world field environments [26,28]. Results indicate that some sound measurement smartphone apps may be considered accurate and reliable to a certain degree (±1 dB(A) or ±2 dB(A), respectively), but most of the apps cannot be used as a reliable tool to assess the environmental sound [23,25]. In general, iOS apps performed better than Android apps, which can be attributed to the fact that Android devices are built by several different manufacturers and there is a lack of conformity of microphones and other audio components [23,25]. It has been shown that accuracy can be improved if the smartphone apps are calibrated before the measurements [27]. Furthermore, it has been shown that the use of an external calibrated microphone can further increase the accuracy and precision of sound measurements compared to measurements using internal smartphone microphones [30].
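The two-stage conversion described for NoiseMap (raw sample value to dBFS, then to dB SPL by adding a constant) can be sketched as follows. This is an illustration of the general technique only, not NoiseMap's actual code: the 16-bit full-scale value, the function names, and the calibration constant used in the example are assumptions.

```python
import math

# Assumption: samples are signed 16-bit PCM, so full scale is 32767.
FULL_SCALE_16_BIT = 32767


def amplitude_to_dbfs(amplitude, full_scale=FULL_SCALE_16_BIT):
    """Convert a positive linear amplitude to dB relative to full scale."""
    if amplitude <= 0:
        raise ValueError("amplitude must be positive")
    return 20.0 * math.log10(amplitude / full_scale)


def dbfs_to_db_spl(dbfs, calibration_offset_db):
    """Shift a dBFS value by a device-specific calibration constant."""
    return dbfs + calibration_offset_db


# Hypothetical calibration constant of 90 dB, chosen only for illustration.
example_db_spl = dbfs_to_db_spl(amplitude_to_dbfs(3276.7), 90.0)
```

Because dBFS values are at most 0 for in-range samples, the calibration constant corresponds to the level that drives the input to full scale on the given device.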
Moreover, the ability of smartphones to perform environmental sound level measurements in general has been extensively discussed in the literature [
31,
32,
33,
34]. In [
32], the sound capture and processing procedure when using smartphones for environmental noise measurements is investigated by analyzing the impact and accuracy of different algorithms, time periods, and sampling strategies for noise calculation. The results indicate that, with the correct settings, it is possible to measure noise levels in the range of 35–95 dB(A), with an accuracy of ±2 dB(A). Other studies have shown that an adequate sound level meter smartphone app that is used together with an external microphone can achieve compliance with most of the requirements of Class 2 of the IEC 61672/ANSI S1.4-2014 standard for periodic testing [
33], as well as full compliance for directional response in the horizontal plane [
34]. The authors of [
31] discuss the use of smartphones in the context of urban noise pollution and present a field-study evaluating the relevancy and accuracy in this context. The results indicate that smartphones can be used as useful noise measurement devices with an accuracy of ±3 dB(A) if careful review of the collected data is undertaken.
Furthermore, the calibration of smartphones for sound measurements and different approaches in this regard have been discussed in this context [
35,
36,
37,
38,
39]. In [
35], a laboratory calibration method for noise measurement smartphone apps is presented based on frequency response linearization and an A-weighted sound level correction. The authors of [
36] introduce a calibration method that does not require user interaction and is based on a node-based calibration utilizing a linear model and a common indoor quiet noise base. Slow-start issues of this approach are mitigated with the help of a crowdsourcing-based calibration. A cross-calibration method for participatory sensor networks based on outlier detection, crowd sensors-based correction, fixed sensors-based correction, and day–evening–night noise level (Lden) estimation is proposed by [
37]. In [
38], an averaging method for the calibration of a smartphone microphone against a reference microphone in terms of sound pressure level and frequency spectrum measurements is presented. It is shown that the method can be used to calibrate a smartphone using another smartphone calibrated using the same method. Finally, the authors of [
39] propose a calibration method for smartphones that does not require specific equipment or knowledge of the user by utilizing the low variability of the average noise emission of vehicles.
Finally, works that deal with large data sets of sound levels measured with smartphones can be found in the literature. For example, interpolation [
40,
41] and simulation [
41] strategies for producing sound maps based on such smartphone measurements have been investigated and discussed in this context.
However, to the best of our knowledge, the evaluation of a pre-existing large data set of uncalibrated environmental sound level amplitude values measured with smartphone sensors has not yet been considered in the literature. In this context, the chosen approach of making the data set of sound measurements comparable and interpretable by taking a sample of devices from this data set, calibrating them, and deriving corresponding equations is a novelty. Furthermore, none of the existing related works considers the assessment of environmental sounds measured with smartphone sensors, or smartphone sensor measurements in general, in the context of tinnitus.
3. Materials and Methods
First, the materials and methods used to perform the experiments in the scope of the work at hand are described. In this context, the data set used for the initial analysis is outlined. Furthermore, the selection of hardware and software components used for the experiments is described. Finally, the experimental setup and procedure are delineated.
3.1. Data Set for the Analysis
The data set for the analysis has been extracted from the TYT database on 26 January 2020 and contains a total of 76,542 entries. The structure of the TYT data set has been described in [
6]. In this data set, 45,712 (59.72%) entries belong to an Android device, 30,607 belong to an iOS device (39.99%), and 223 of the entries contain no user agent information (0.29%), as shown in
Table 1. As described in [
6], for every answer sheet that is collected with the TYT mobile applications for Android and iOS, the user agent is extracted and stored together with the answer data. For the Android version of the app, this user agent contains, among other information, the constant
Build.MODEL from the
android.os.Build API (
https://developer.android.com/reference/android/os/Build#MODEL, accessed on 1 October 2021), which can be used to uniquely identify the respective device model (see
Table 2). Note that for the iOS version of TYT, only the device type (iPhone/iPad) and the OS version are stored in this variable. For this reason, it was decided to use Android devices for the experiments in the scope of this work.
Furthermore, a sound level measurement capturing the environmental noise level for the first 15 s of the user completing the EMA questionnaire is performed and stored together with the EMA answer data. For the Android version of the app, this value represents an amplitude value retrieved by the Android
MediaRecorder API [
42] and averaged over the measurement period. The Android source code that was used in the application to retrieve this value is later analyzed and discussed in
Section 4.2. In contrast, the iOS version stores a relative dB value, which is not further analyzed in the scope of this work.
3.2. Hardware and Software Selection
The selection of the hardware as well as software used for the experiments is described in the following. This includes the selection process used to decide on the mobile devices to be investigated. In addition, other relevant hardware and software used to perform the experiments themselves, namely the sound level meter, calibrator, speaker, tone generator, and the mobile application for the sound measurement, are described.
3.2.1. Mobile Devices
In order to perform the experiments for an optimal subset of devices that allows assumptions to be made about as many entries in the data set as possible, the data set described in the previous section was analyzed from two different perspectives.
For the first analysis, the data set was analyzed on a per-device basis. To this end, the following procedure was used:
For each entry, the device IDs of the device models (see
Section 3.1) are extracted.
For each extracted device ID, the number of unique users and entries containing a sound measurement are counted.
For each device ID, the device names are looked up and device IDs with the same device name are summarized in a new row.
The 30 most used device models resulting from this process are shown in
Table 2.
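The per-device analysis above can be sketched in a few lines. The field names (`user_id`, `device_id`, `sound_level`), the device IDs, and the name lookup table are hypothetical placeholders and do not reflect the actual TYT schema; ranking by the number of entries is likewise an assumption about how "most used" is determined.

```python
from collections import defaultdict


def summarize_by_device(entries, device_names):
    """Count unique users and sound-measurement entries per device name."""
    users = defaultdict(set)
    counts = defaultdict(int)
    for entry in entries:
        if entry["sound_level"] is None:
            continue  # only entries containing a sound measurement are counted
        # Device IDs that share a device name are merged into one row.
        name = device_names.get(entry["device_id"], entry["device_id"])
        users[name].add(entry["user_id"])
        counts[name] += 1
    rows = [(name, len(users[name]), counts[name]) for name in counts]
    rows.sort(key=lambda row: row[2], reverse=True)  # most entries first
    return rows


# Made-up sample data for illustration only.
sample_entries = [
    {"user_id": 1, "device_id": "MODEL-A1", "sound_level": 0.12},
    {"user_id": 1, "device_id": "MODEL-A1", "sound_level": 0.30},
    {"user_id": 2, "device_id": "MODEL-A2", "sound_level": 0.25},
    {"user_id": 3, "device_id": "MODEL-B1", "sound_level": 0.40},
    {"user_id": 3, "device_id": "MODEL-B1", "sound_level": None},
]
sample_names = {"MODEL-A1": "Device A", "MODEL-A2": "Device A", "MODEL-B1": "Device B"}

rows = summarize_by_device(sample_entries, sample_names)
```

Taking the first 30 rows of the result would correspond to the list of most used device models.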
For the second analysis, the data set was analyzed on a per-user basis with regard to the intended interpretation of the data. Thereby, users (and their respective device models used) were selected based on the following conditions:
There are more than 500 entries containing sound measurements for the user.
The reported tinnitus loudness (see [
6]) is fluctuating and appears plausible (e.g., not only zero values and not always the same value).
The sound measurement is fluctuating and appears plausible (e.g., not only zero values and not always the same value).
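The three conditions can be approximated programmatically as a first filter. In the sketch below, "fluctuating and plausible" is reduced to the crude proxy "more than one distinct value", which covers the two examples given (only zeros, always the same value) but cannot replace a manual plausibility check. The field names `loudness` and `sound_level` and the function names are hypothetical.

```python
def is_fluctuating(values):
    """Crude proxy: at least two distinct values (excludes all-zero and constant series)."""
    return len(set(values)) > 1


def select_users(entries_by_user, min_entries=500):
    """Return the IDs of users that satisfy all three selection conditions."""
    selected = []
    for user_id, entries in entries_by_user.items():
        measured = [e for e in entries if e["sound_level"] is not None]
        if len(measured) <= min_entries:
            continue  # condition 1: more than min_entries sound measurements
        if not is_fluctuating([e["loudness"] for e in measured]):
            continue  # condition 2: reported tinnitus loudness varies
        if not is_fluctuating([e["sound_level"] for e in measured]):
            continue  # condition 3: sound measurement varies
        selected.append(user_id)
    return selected
```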
Finally, the identified devices from both analyses were combined, resulting in eight devices, as highlighted in
Table 2. Since the selected device models had to be purchased and not all devices were available at the time of starting the experiments, only four of the eight identified devices could be used (highlighted in dark gray in
Table 2). On top of these four devices, a
Google Pixel 2 was used simply because it was available to the experimenters. This resulted in the five devices shown in
Table 3. The Android version installed on each device can be found in the table. These are the maximum versions that were officially supported by the acquired devices at the time of the experiments.
3.2.2. Reference Sound Level Meter and Calibrator
As a reference sound level meter (SLM) for the sound measurements, the testo 815 by Testo SE & Co. KGaA was used. It allows measurements in the range of 32 to 130 dB and a frequency range of 31.5 to 8000 Hz. The SLM supports frequency weightings A and C. Its accuracy is ±0.5 dB under reference conditions at 94 dB and 1000 Hz in accordance with Class 2 of IEC 60942 [43], with a resolution of 0.1 dB. In order to avoid distortions due to differences in temperature and air pressure, the sound level calibrator PeakTech 8010 by PeakTech Prüf- und Messtechnik GmbH was used to calibrate the SLM. The accuracy of the calibrator is ±0.5 dB under reference conditions at 23 °C, 1013 mbar air pressure, and 65% humidity.
3.2.3. Speaker and Tone Generator
As a sound source for the experiments, the speaker of the GigaWorks T20 Series II by Creative connected to a notebook was used. The Online Tone Generator by Tomasz P. Szynalski [44] was used on the notebook to generate a sine wave (pure tone) at different frequencies.
3.2.4. Mobile Application for Sound Measurement
In order to mimic the behavior of the TYT app for the experiments, the corresponding code for the sound measurement was extracted and integrated into a new sound measurement mobile application. In addition, this makes it possible to extract the results more conveniently and to gain more insight into various parameters of the sound measurement. Equivalent to the TYT app, the sound measurement application utilizes the previously described MediaRecorder.getMaxAmplitude() method to capture the “maximum absolute amplitude that was sampled since the last call to this method” [42] every 500 ms for a total of 30 values (15 s). These values, in turn, are then averaged into a single value. This averaging step was found to be erroneous in the original application, as will be discussed in Section 4.2, and has been corrected for the application used in the experiments. Furthermore, the first two values of the sound measurement have been shown to be erroneous for several smartphone models (see Section 4.2) and are therefore discarded for the measurements. A screenshot of the sound measurement application is shown in Figure 1. The user interface of the application allows the user to start the measurement and displays the measured single amplitude values as well as the resulting average value after the measurement is done. As shown in the screenshot, the first two values, which are discarded and excluded from the average, are highlighted by displaying them as crossed out in red. In addition to the features used for the experiments in the scope of this work, the application allows further configurations for experimental purposes (e.g., the option to change the audio encoding as well as to remove any audio compression) and offers the possibility to perform a continuous measurement of the sound level.
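The sampling scheme described in this subsection can be summarized in a language-neutral sketch. The callable `read_max_amplitude` is a hypothetical stand-in for `MediaRecorder.getMaxAmplitude()`; the plain arithmetic mean and the choice to wait before each read are assumptions, and the sketch does not reproduce the original TYT source code or its averaging error.

```python
import time


def measure_average_amplitude(read_max_amplitude, samples=30, interval_s=0.5,
                              discard=2, sleep=time.sleep):
    """Poll an amplitude source at fixed intervals and average the kept values.

    Returns the mean of all values after the first `discard` ones,
    together with the full list of raw values.
    """
    values = []
    for _ in range(samples):
        sleep(interval_s)                  # wait one sampling interval (500 ms)
        values.append(read_max_amplitude())
    kept = values[discard:]                # drop the unreliable leading values
    return sum(kept) / len(kept), values


# Simulated run without real waiting: two leading zeros, then a constant
# made-up amplitude of 1200.
fake_readings = iter([0, 0] + [1200] * 28)
average, raw_values = measure_average_amplitude(
    lambda: next(fake_readings), sleep=lambda _seconds: None
)
```

Passing the sleep function as a parameter keeps the sketch testable without waiting the full 15 s.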
3.3. Experimental Setup and Procedure
Before conducting the actual experiments, various measurements were taken with different frequencies (125–2000 Hz), frequency weightings (A & C), distances to the sound source, and different smartphones to find the optimal settings for the experiments. The measurements indicate that—using the correct settings—the smartphones measure sound frequency-independently in the study’s frequency range of 125–2000 Hz, allowing a single frequency to be used for the experiments. The final settings are shown in
Table 4. A pure tone with a frequency of 1000 Hz was chosen for the sound source to obtain an unweighted result with the given SLM, since it supports only A- and C-weightings and these frequency weightings do not apply offsets at 1000 Hz [
45]. Note that, for this reason, dB SPL, dB(A), and dB(C) at 1000 Hz are all equal and may therefore be used interchangeably for measurements at this frequency. For purposes of clarity, dB(C) is used for the remainder of this paper. To promote and facilitate overall reproducibility, it was decided against a professional sound laboratory in favor of a simpler test environment for the experiments. Thus, for the measurement range, a lower limit of 50 dB(C) was chosen because the background noise in the test environment was measured at approximately 46 dB(C). An upper limit of 80 dB(C) was chosen to avoid hearing damage for the experimenter (without additional protective measures). A distance of 30 cm between sound source and SLM/smartphone was chosen, given the spatial restrictions, to avoid reflections in the test room.
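That the A- and C-weightings are neutral at 1000 Hz can be checked numerically. The sketch below evaluates the commonly published analytic weighting curves with rounded constants (20.6, 107.7, 737.9, and 12194 Hz; normalization terms of 2.00 dB and 0.06 dB); the function names are our own, and the result is an approximation rather than a substitute for the standard.

```python
import math


def a_weighting_db(f):
    """Approximate A-weighting in dB at frequency f (Hz)."""
    f2 = f * f
    numerator = (12194.0 ** 2) * f2 * f2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(numerator / denominator) + 2.00


def c_weighting_db(f):
    """Approximate C-weighting in dB at frequency f (Hz)."""
    f2 = f * f
    numerator = (12194.0 ** 2) * f2
    denominator = (f2 + 20.6 ** 2) * (f2 + 12194.0 ** 2)
    return 20.0 * math.log10(numerator / denominator) + 0.06


# Both offsets are close to 0 dB at 1000 Hz, whereas A-weighting
# attenuates a 125 Hz tone by roughly 16 dB.
offset_a_1k = a_weighting_db(1000.0)
offset_c_1k = c_weighting_db(1000.0)
offset_a_125 = a_weighting_db(125.0)
```

At other frequencies the two weightings diverge, which is why readings in dB(A) and dB(C) are only interchangeable for the 1000 Hz test tone used here.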
The experimental setup is shown in
Figure 2. The experiment is performed in a room of 15 square meters. The speaker is positioned at the edge of a 76 cm high table to avoid reflections by the table surface. Furthermore, it is fixed in place in a way that compensates for its slightly upward-tilted design and results in a vertical positioning of the speaker cone. The SLM and each of the smartphones are screwed onto tripods and positioned as close as possible to each other and 30 cm from the speaker, with their microphones pointed at the speaker. The SLM is rotated 90 degrees so that its display can be read from a distance by the experimenter. The speaker and the smartphone are controlled remotely with a notebook that is positioned 2 m away from the table to avoid reflections by the equipment and the experimenter.
Before conducting the experiments, the SLM is calibrated with the calibrator to account for room conditions such as temperature and air pressure. To this end, the calibrator is attached to the SLM and turned on, producing a sound at 94 dB and 1000 Hz. The SLM is then configured to a measuring range of 50–100 dB, time weighting “Fast” (the measured samples are averaged every 125 ms), and frequency weighting A. If necessary, the SLM is then fine-tuned until its display also shows 94 dB.
The experimental procedure is structured as follows and was repeated for each of the five smartphones.
1. The tone generator software is used to create a 1000 Hz sine signal (pure tone) with the speaker.
2. The volume is then adjusted until the SLM shows the desired sound pressure level.
3. Subsequently, the measurement is started on the smartphone. As described in Section 3.2.4, the mobile application captures 30 measurement values over about 15 s (discarding the first two), averages them, and stores the result in a table.
4. Steps 1–3 are repeated in 5 dB increments between 50 and 80 dB(C) (the measuring range is explained in the first paragraph of this subsection), resulting in seven values per smartphone.
6. Summary and Outlook
In this work, an experiment was described with the objective to make a large data set of environmental sound measurements captured with smartphones and stored in the TrackYourTinnitus (TYT) database usable and comparable to enable meaningful interpretations in the context of tinnitus research. To this end, the existing data were analyzed to find the device models that contributed the most data entries. Four of these device models were then acquired for the experiments and equipped with a mobile app that mimics the environmental sound measurement of the TYT Android app. For the actual experiments, a sound signal was generated, the volume was adjusted to different sound levels using a professional calibrated sound level meter (SLM), and the values captured by the source code of the app on the Android devices were recorded. The results indicate that the amplitude values retrieved by the devices behave similarly except for a constant offset. Furthermore, equations derived from the results with a logarithmic regression analysis can be used to transform the values in the TYT database to (partially) comparable dB values. However, there are several limitations to the experiments due to the code of the TYT app and the experimental setup.
Since the experiments within the scope of this work were only conducted for a number of selected Android device models, more device models should be considered in future work. This includes both Android and iOS device models. For the latter, there are far fewer different models, all produced by a single manufacturer, which simplifies the process. Once the values retrieved by the system APIs of the different device models and operating system versions are known, respective equations can be derived and used for any future measurements of the same models. Alternatively, in line with the recommendations in Section 5.1, a calibration feature could be integrated in a future version of the TYT app, which could lead to even more accurate results.
In conclusion, it has been shown that measuring sound levels with mobile devices is possible and feasible for healthcare purposes. However, there are many challenges to ensuring that the measured values are accurate, comparable, and interpretable. More future work on the interpretation of mobile crowdsensing data should therefore be conducted.