Article

Automatic Classification of Adventitious Respiratory Sounds: A (Un)Solved Problem? †

1 University of Coimbra, Centre for Informatics and Systems of the University of Coimbra, Department of Informatics Engineering, 3030-290 Coimbra, Portugal
2 Lab3R—Respiratory Research and Rehabilitation Laboratory, School of Health Sciences (ESSUA), University of Aveiro, 3810-193 Aveiro, Portugal
3 Institute of Biomedicine (iBiMED), University of Aveiro, 3810-193 Aveiro, Portugal
* Author to whom correspondence should be addressed.
† This paper is an extended version of our paper published in the Proceedings of the 25th International Conference on Pattern Recognition, Milan, Italy, 10–15 January 2021.
These authors contributed equally to this work.
Sensors 2021, 21(1), 57; https://doi.org/10.3390/s21010057
Submission received: 10 November 2020 / Revised: 12 December 2020 / Accepted: 16 December 2020 / Published: 24 December 2020
(This article belongs to the Special Issue Physiological Sound Acquisition and Processing)

Abstract

(1) Background: Patients with respiratory conditions typically exhibit adventitious respiratory sounds (ARS), such as wheezes and crackles. ARS events have variable duration. In this work we studied the influence of event duration on automatic ARS classification, namely, how the creation of the Other class (negative class) affected the classifiers’ performance. (2) Methods: We conducted a set of experiments where we varied the durations of the other events on three tasks: crackle vs. wheeze vs. other (3 Class); crackle vs. other (2 Class Crackles); and wheeze vs. other (2 Class Wheezes). Four classifiers (linear discriminant analysis, support vector machines, boosted trees, and convolutional neural networks) were evaluated on those tasks using an open access respiratory sound database. (3) Results: While the best classifier achieved an accuracy of 96.9% on the 3 Class task with fixed durations, the same classifier reached only 81.8% on the more realistic 3 Class task with variable durations. (4) Conclusion: These results demonstrate the importance of experimental design in the assessment of the performance of automatic ARS classification algorithms. Furthermore, they also indicate, unlike what is stated in the literature, that the automatic classification of ARS is not a solved problem, as the algorithms’ performance decreases substantially under complex evaluation scenarios.

1. Introduction

Respiratory diseases are among the most significant causes of morbidity and mortality worldwide [1] and are responsible for a substantial strain on health systems [2]. Early diagnosis and routine monitoring of patients with respiratory conditions are crucial for timely interventions [3]. Health professionals are trained to listen to and to recognize respiratory pathological findings, such as the presence of adventitious respiratory sounds (ARS) (e.g., crackles and wheezes), commonly in the anterior and posterior chest of the patient [4].
Respiratory sounds have been validated as an objective, simple, and noninvasive marker to assess the respiratory system [5]. In clinical practice they are commonly assessed through pulmonary auscultation with a stethoscope. Despite technological advances in auscultation devices, which have enabled respiratory sounds to be stored, analyzed, and visualized on computers, digital auscultation is not yet a fully computational process. Conventional auscultation remains the usual practice, but it has several drawbacks that limit its broader use in clinical practice and its suitability for research: (i) the necessity of an expert to annotate the presence/absence and clinical meaning of normal/abnormal respiratory sounds [6]; (ii) the unfeasibility of providing continuous monitoring; (iii) its inherent inter-listener variability [7]; (iv) the limitations of human audition and memory [8]; and (v) as demonstrated during the COVID-19 crisis, it might not be viable in highly contagious situations, as stethoscopes can be a source of infection and need to be constantly sanitized [9]. These limitations could potentially be surmounted by automated respiratory sound analysis.
Respiratory sounds can be normal or abnormal. Normal respiratory sounds are nonmusical sounds produced by breathing and heard over the trachea and chest wall [10]. They show different acoustic properties, such as duration, pitch, and sound quality, depending on the characteristics and position of the subject, the respiratory flow, and the recording location [6,11]. On the other hand, ARS are abnormal sounds that are superimposed on normal respiratory sounds [10]. ARS can be categorized into two main types: continuous and discontinuous [12]. In this study we follow the nomenclature recognized by the European Respiratory Society Task Force on Respiratory Sounds [13], in which continuous ARS are called wheezes and discontinuous ARS are called crackles.
Crackles are explosive, short, discontinuous, and nonmusical ARS that are attributed to the sudden opening and closing of abnormally closed airways [14]. They usually last less than 20 ms and can be classified as fine or coarse depending on their duration and frequency. Fine crackles have short duration and high frequency, whereas coarse crackles have longer duration and lower frequency [15]. Although the frequency range of crackles is bounded by 60 Hz and 2 kHz, most of their energy is concentrated between 60 Hz and 1.2 kHz [16]. The characteristics of crackles, such as number, regional distribution, timing in the respiratory cycle, and especially the distinction between fine and coarse, can all be used in the diagnosis of various types of lung diseases, such as bronchiectasis or pneumonia [15]. In contrast, wheezes are musical respiratory sounds usually longer than 100 ms. Their typical frequency range is between 100 and 1000 Hz, with harmonics that occasionally exceed 1000 Hz [17]. Wheezes occur when there is flow limitation and can be clinically defined by their duration, intensity, position in the respiratory cycle (inspiratory or expiratory), frequency (monophonic or polyphonic), number, gravity influence, and respiratory maneuvers [14]. Health professionals have utilized wheezes for diagnosing various respiratory conditions in adults (e.g., chronic obstructive pulmonary disease) and in children (e.g., bronchiolitis) [14].
Several authors have reported excellent performance on ARS classification. However, a robust experimental design is lacking in many studies, leading to overestimated results. To determine if a system is relevant, we need to understand the extent to which the characteristics it is extracting from the signal are confounded with the ground truth [18]. In the case of ARS classification, we argue that results in the literature are overestimated because little attention has been dedicated to the design of the negative classes; i.e., the classes against which the wheeze or crackle classification algorithms learn to discriminate.
The main objective of this study was to understand, through a set of experiments with different tasks, how experimental design can impact classification performance. We used four machine learning algorithms in the experiments: linear discriminant analysis (LDA), support vector machines with radial basis function (SVMrbf), random undersampling boosted trees (RUSBoost), and convolutional neural networks (CNNs). The LDA, SVMrbf, and RUSBoost classifiers were fed features extracted from the spectrograms, including some novel acoustic features. On the other hand, the CNNs received spectrogram and mel spectrogram images as inputs.
The article is organized as follows: in Section 2, we provide a general overview of the state-of-the-art on algorithms that have been used in similar works to automatically classify wheezes and crackles; in Section 3, we provide information regarding the dataset, and all the methods used in the different stages of the classification process; in Section 4, the obtained results are presented; and lastly, in Section 5, the results are analyzed and a global conclusion is presented. This paper expands previously published work [19] that focused only on wheeze classification.

2. Related Work

Several features and machine learning approaches have been proposed to develop methods for the automatic classification of respiratory sounds [20,21,22,23,24]. In most systems, suitable features are extracted from the signal and are subsequently used to classify ARS (i.e., crackles and wheezes). The most common features and machine learning algorithms employed in the literature to detect or classify ARS have been reported [6], including spectral features [25], mel-frequency cepstral coefficients (MFCCs) [26], entropy [27], wavelet coefficients [28], rule-based models [29], logistic regression models [30], support vector machines (SVM) [31], and artificial neural networks [32]. More recently, deep learning strategies have also been introduced, where the feature extraction and classification steps are merged into the learning algorithm [33,34,35].
Over the years, several authors have reported excellent results on ARS classification (Table 1). However, one crucial problem of this field has been its reliance on small or private data collections. Moreover, public repositories that have been commonly used in the literature (e.g., R.A.L.E. [36]) were designed for teaching, typically including a small number of ARS, and usually not containing environmental noise. Therefore, we chose to perform the evaluation on the Respiratory Sound Dataset (RSD), the largest publicly available respiratory sound database, which is described in Section 3.1.

3. Materials and Methods

3.1. Database

The ICBHI 2017 Respiratory Sound Database (RSD) is a publicly available database with 920 audio files containing a total of 5.5 h of recordings acquired from 126 participants of all ages [44]. The database (Table 2) contains audio samples collected independently by two research teams in two different countries. It is a challenging database, as the recordings contain several types of noise and background sounds and were acquired at different sampling frequencies. A total of 1898 wheezes and 8877 crackles are annotated across 637 audio files. The training set contains 1173 wheezes and 5996 crackles distributed among 203 and 311 files, respectively. The test set includes 725 wheezes and 2881 crackles distributed among 138 and 190 files, respectively. Moreover, the data were split by patient, following the split suggested by the RSD authors [45].

3.2. Random Event Generation

We created a custom script to randomly generate events with fixed durations of 50 ms and 150 ms. This procedure was followed to reproduce “Experiment 2” [44], an experiment where ARS events were classified against other events. By employing this process we were able to establish a fair comparison with other methods that were tested on the same database. To simultaneously guarantee variation and reproducibility, the seed for the random number generator changed for each file but was predetermined. The number of randomly generated events (RGE) of each duration is displayed in Table 3, along with the number of annotated events.
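As an illustration, the sketch below (not the authors' actual script; the function and variable names are hypothetical) shows how "other" events with a fixed duration can be generated reproducibly by deriving a predetermined seed from the file index:

```matlab
% Sketch of fixed-duration random event generation with a predetermined per-file seed.
% fileIdx, fileDuration, nEvents, and eventDuration are illustrative inputs.
function events = generate_fixed_events(fileIdx, fileDuration, nEvents, eventDuration)
    rng(1000 + fileIdx);                                          % predetermined, file-specific seed
    onsets = rand(nEvents, 1) * (fileDuration - eventDuration);   % random start times (s)
    events = [onsets, onsets + eventDuration];                    % [start, end] in seconds
end
```

For example, `generate_fixed_events(3, 21.5, 5, 0.050)` would return five 50 ms events for the third recording, and rerunning it yields exactly the same events.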
An alternative approach to generating the random events was then employed to study the impact of event duration on the performance of the classifiers. In this approach, we started by visually inspecting the distributions of the annotated crackle and wheeze durations and found that a Burr distribution [46] provided a good fit for both. The Burr distribution used to generate the events with durations shorter than 100 ms (otherCrackle) had the probability density function
$$f(x \mid \alpha, c, k) = \frac{\frac{kc}{\alpha}\left(\frac{x}{\alpha}\right)^{c-1}}{\left(1+\left(\frac{x}{\alpha}\right)^{c}\right)^{k+1}}, \quad x>0;\ \alpha>0;\ c>0;\ k>0 \tag{1}$$
with α = 0.199, c = 7.6698, and k = 0.3146. Durations longer than 100 ms were discarded. The Burr distribution used to generate the events with durations longer than 100 ms (otherWheeze) had the probability density function:
$$f(x \mid \alpha, c, k) = \frac{\frac{kc}{\alpha}\left(\frac{x}{\alpha}\right)^{c-1}}{\left(1+\left(\frac{x}{\alpha}\right)^{c}\right)^{k+1}}, \quad x>0;\ \alpha>0;\ c>0;\ k>0 \tag{2}$$
with α = 0.2266, c = 4.1906, and k = 0.3029. Durations longer than 2 s were discarded. The number of events with durations drawn from each distribution is displayed in Table 4, along with the number of annotated events. Figure 1 displays the histograms of the annotated durations for each class and the Burr distributions used to generate the new random events.
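A minimal sketch of this generation procedure, assuming MATLAB's Statistics and Machine Learning Toolbox and the fitted parameters reported above, could look as follows; the rejection-sampling helper is hypothetical:

```matlab
% Burr distributions with the parameters reported in the text.
pdCrackle = makedist('Burr', 'alpha', 0.199,  'c', 7.6698, 'k', 0.3146);   % otherCrackle
pdWheeze  = makedist('Burr', 'alpha', 0.2266, 'c', 4.1906, 'k', 0.3029);   % otherWheeze

% Training-set counts from Table 4.
otherCrackleDur = sample_truncated(pdCrackle, 2478, 0.1);   % durations <= 100 ms
otherWheezeDur  = sample_truncated(pdWheeze,   575, 2.0);   % durations <= 2 s

function d = sample_truncated(pd, n, maxDur)
% Draw n durations from pd, discarding values above maxDur (rejection sampling).
    d = [];
    while numel(d) < n
        x = random(pd, 10*n, 1);          % draw a batch of candidates
        d = [d; x(x <= maxDur)];          %#ok<AGROW> keep only accepted values
    end
    d = d(1:n);
end
```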

3.3. Preprocessing

The audio files in the RSD were recorded at different sampling rates. Therefore, we resampled every recording to 4000 Hz, the lowest sampling rate in the database. As the signal of interest lies below 2000 Hz, this was considered sufficient resolution for Fourier analysis.
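A minimal sketch of this step, assuming the Signal Processing Toolbox and an illustrative file name, could be:

```matlab
% Resample one RSD recording to the common 4 kHz rate used in this work.
[x, fsOrig] = audioread('recording.wav');   % e.g., 44.1 kHz in the original database
x = mean(x, 2);                             % collapse to mono if necessary
fsTarget = 4000;                            % lowest sampling rate in the database
x = resample(x, fsTarget, fsOrig);          % anti-aliased rational-factor resampling
```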

3.4. Time Frequency Representations

To generate the time frequency (TF) images of the audio events, three different representations were used: spectrogram, mel spectrogram, and scalogram. All images obtained with the different methods were normalized between 0 and 1. Moreover, TF representations were computed using MATLAB 2020a. We present only the descriptions and results for the two best performing TF representations, which were the spectrogram and the mel spectrogram.
The spectrogram obtained using the short-time Fourier transform (STFT) is one of the most used tools in audio analysis and processing, since it describes the evolution of the frequency components over time. The STFT representation (F) of a given discrete signal is given by [35]:
$$F(n,\omega) = \sum_{i=-\infty}^{\infty} f(i)\,\omega(n-i)\,e^{-j\omega i} \tag{3}$$
where f(i) is the discrete signal and ω(n − i) is a window function centered at instant n.
The mel scale [47] is a perceptual scale of equally spaced pitches, aiming to match the human perception of sound. The conversion from Hz into mels is performed using Equation (4):
$$m = 2595 \cdot \log_{10}\left(1 + \frac{f}{700}\right) \tag{4}$$
The mel spectrogram displays the spectrum of a sound on the mel scale. Figure 2 presents an example of both TF representations.
Since the database events have a wide range of durations, a maximum time for each event was defined according to Equation (5):
$$\mathrm{Median}(x) + 2 \times \mathrm{Std}(x), \tag{5}$$
with x corresponding to the durations of the annotated wheeze events. Thus, the maximum length per event was established as 2 s, and shorter events were centered and zero-padded. The database also contained 87 annotated events longer than 2 s. For these cases, only the first 2 s were considered, as we observed that the annotation of these longer events was less precise.
The TF representations were obtained with three windowing methods (Hamming, Blackman–Harris, and rectangular) and three window lengths (32, 64, and 128 ms). We report only the results for the best performing combination, the Blackman–Harris window with a length of 32 ms. Moreover, 512 FFT points with 75% overlap were employed to compute the STFT and obtain both TF representations. For the mel spectrogram, 64 mel bandpass filters were employed. The resulting spectrogram and mel spectrogram images had dimensions of 1 × 247 × 257 and 1 × 247 × 64, respectively.
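The following sketch, assuming the Signal Processing and Audio Toolboxes, illustrates how the two TF representations can be obtained for a single event with the parameters reported above; the event signal is a stand-in:

```matlab
fs      = 4000;
x       = randn(6000, 1);                  % stand-in for one resampled 1.5 s event
maxLen  = 2 * fs;                          % 2 s maximum event length
winLen  = round(0.032 * fs);               % 32 ms Blackman-Harris window (128 samples)
overlap = round(0.75 * winLen);            % 75% overlap
nfft    = 512;

% Center the event in a 2 s zero-padded frame (longer events: keep only the first 2 s).
x   = x(1:min(numel(x), maxLen));
pad = maxLen - numel(x);
x   = [zeros(floor(pad/2), 1); x; zeros(ceil(pad/2), 1)];

S    = abs(spectrogram(x, blackmanharris(winLen), overlap, nfft, fs));
spec = (S - min(S(:))) / (max(S(:)) - min(S(:)));             % 257 x 247, scaled to [0, 1]

M       = melSpectrogram(x, fs, 'Window', blackmanharris(winLen), ...
                         'OverlapLength', overlap, 'FFTLength', nfft, 'NumBands', 64);
melSpec = (M - min(M(:))) / (max(M(:)) - min(M(:)));          % 64 x 247, scaled to [0, 1]
```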

3.5. Feature Extraction

To study how frame length influences the spectrogram computation, a multiscale approach was followed for feature extraction. We computed spectrograms with three windowing methods (Hamming, Blackman–Harris, and rectangular) and six window lengths (16, 32, 64, 128, 256, and 512 ms), all with 75% overlap. Then, 81 features were extracted from each frame of the spectrogram: 25 spectral features, 26 MFCC features, and 30 melodic features. A sensitivity analysis on the most realistic task, the 3 Class task with variable durations, revealed that the Hamming window produced slightly better results. Therefore, all the results obtained with the traditional approach of feature extraction, feature selection, and classification were computed using the Hamming window. Most features were extracted using the MIR Toolbox 1.7.2 [48]. Table 5 provides a brief description of all the employed features. For each event, five statistics of each feature were calculated: mean, standard deviation, median, minimum value, and maximum value. Therefore, with 81 features, six window lengths, and five statistics, the total number of features fed to the classifiers was 81 × 6 × 5 = 2430.
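A simplified sketch of this multiscale aggregation is given below. The paper relies on the MIR Toolbox for the individual descriptors; here only two basic spectral features are computed with plain MATLAB to illustrate the per-frame extraction and the five per-event statistics:

```matlab
fs = 4000;
x  = randn(2 * fs, 1);                               % stand-in for one 2 s event
winLens = [0.016 0.032 0.064 0.128 0.256 0.512];     % window lengths in seconds
featVec = [];
for w = winLens
    L = round(w * fs);
    [S, f] = spectrogram(x, hamming(L), round(0.75 * L), max(512, L), fs);
    P = abs(S);                                      % magnitude spectrogram (freq x frames)
    centroid = (f' * P) ./ sum(P, 1);                % per-frame spectral centroid
    spread   = sqrt((f.^2)' * P ./ sum(P, 1) - centroid.^2);   % per-frame spectral spread
    frameFeats = [centroid; spread];                 % (nFeatures x nFrames)
    stats = [mean(frameFeats, 2); std(frameFeats, 0, 2); median(frameFeats, 2); ...
             min(frameFeats, [], 2); max(frameFeats, [], 2)];
    featVec = [featVec; stats];                      %#ok<AGROW> concatenate across scales
end
```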

3.5.1. Spectral Features

We estimated several features from the spectrograms. To begin with, the first four standardized moments of the spectral distributions were computed: centroid, spread, skewness, and kurtosis. Then, we extracted other features that are commonly employed for characterizing the timbre of a sound, such as zero-crossing rate, entropy, flatness, roughness, and irregularity. The spectral flux (SF), which measures the Euclidean distance between the magnitude spectra of successive frames, gave rise to three other features: SF inc, where only positive differences between frames were summed; SF halfwave, a halfwave rectification of the SF; and SF median, where a median filter was used to remove spurious peaks. Finally, the amount of high-frequency energy was estimated in two ways: brightness, the high-frequency energy above a certain cut-off frequency; and rolloff, the frequency below which a defined percentage of the total spectral energy is contained [48]. Brightness was computed at four frequencies: 100, 200, 400, and 800 Hz. Furthermore, we calculated the ratios between the brightnesses at 400 and 100 Hz, and between the brightnesses at 800 and 100 Hz. Rolloff was computed for percentages of 95%, 75%, 25%, and 5%. Moreover, two novel features were computed: the outlier ratio between rolloffs at 5 and 95%, and the interquartile ratio between rolloffs at 25 and 75%.
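As an illustration of the two brightness ratios and the two rolloff ratios (the novel features), the sketch below computes them for a single spectral frame with plain MATLAB; in the paper they are obtained frame by frame with the MIR Toolbox:

```matlab
fs = 4000;
[S, f] = spectrogram(randn(2 * fs, 1), hamming(128), 96, 512, fs);   % stand-in event
P = abs(S(:, 10));                                   % magnitude spectrum of one frame

brightness = @(fc) sum(P(f >= fc)) / sum(P);         % fraction of energy above fc
rolloff    = @(pct) f(find(cumsum(P) >= pct/100 * sum(P), 1, 'first'));

bright4Ratio = brightness(400) / brightness(100);    % Brightness 400 Ratio
bright8Ratio = brightness(800) / brightness(100);    % Brightness 800 Ratio

rolloffOutRatio = rolloff(5)  / rolloff(95);         % Rolloff Outlier Ratio
rolloffIQRatio  = rolloff(25) / rolloff(75);         % Rolloff Interquartile Ratio
```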

3.5.2. MFCC Features

The most common features used to describe the spectral shape of a sound are the MFCCs [49]. The MFCCs are calculated by converting the logarithm of the magnitude spectrum to the mel scale and computing the discrete cosine transform (DCT). As most of the signal information is concentrated in the first components, it is typical to extract the first 13 [48]. A first-order temporal differentiation of the MFCCs was also computed to understand the temporal evolution of the coefficients.
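A minimal sketch of this step using the Audio Toolbox `mfcc` function (the paper uses the MIR Toolbox instead, so settings will differ) is:

```matlab
fs = 4000;
x  = randn(2 * fs, 1);            % stand-in for one event
[coeffs, delta] = mfcc(x, fs);    % per-frame MFCCs (log-energy plus cepstral coefficients)
                                  % and their 1st-order temporal differentiation
```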

3.5.3. Melodic Features

Fundamental frequency, henceforth referred to as pitch, was the basis for computing the 30 melodic features. We computed the cepstral autocorrelation of each frame to estimate each event’s pitch curve. The maximum allowed pitch frequency was 1600 Hz, the highest fundamental frequency reported in the literature for wheezes [50]. The inharmonicity and voicing curves were then computed based on the pitch curve. Next, we applied moving averages with lengths of 100, 250, 500, and 1000 ms to these time series to capture trends at different time scales and smooth the curves, giving rise to a total of 15 features. Finally, the same features were computed for a 400 Hz high-pass filtered version of the sound events. The rationale for this filter was the removal of normal respiratory sounds, whose energy typically drops at 200 Hz [17], reaching insignificant levels at 400 Hz [50].
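The sketch below illustrates the melodic pipeline; it uses the Audio Toolbox `pitch` function rather than the MIR Toolbox cepstral autocorrelation used in the paper, and the assumed 10 ms frame hop is only for illustration:

```matlab
fs = 4000;
x  = randn(2 * fs, 1);                        % stand-in for one event
f0 = pitch(x, fs, 'Range', [50 1600]);        % pitch curve, limited to 1600 Hz

% Moving averages at several time scales (assuming a ~10 ms pitch frame hop).
hop = 0.01;                                   % frame hop in seconds (assumption)
for winSec = [0.1 0.25 0.5 1.0]
    f0smooth = movmean(f0, max(1, round(winSec / hop)));
    % ...statistics of f0smooth would then be added to the feature vector
end

% Repeat on a 400 Hz high-pass filtered version to suppress normal respiratory sounds.
[b, a] = butter(4, 400 / (fs / 2), 'high');
f0HF = pitch(filtfilt(b, a, x), fs, 'Range', [50 1600]);
```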

3.6. Feature Selection

After preliminary experiments, the minimum redundancy maximum relevance (MRMR) algorithm was chosen to perform feature selection. This algorithm ranks features that are mutually and maximally dissimilar and can represent the response variable effectively [51]. The MRMR algorithm ranks features by calculating the mutual information quotient of the relevance and redundancy of each feature. For each experiment, three subsets of features were selected: the best 10 features selected by MRMR (10MRMR), the best 100 features selected by MRMR (100MRMR), and all 2430 features.
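A minimal sketch of this selection step using `fscmrmr` (Statistics and Machine Learning Toolbox, R2019b or later), with stand-in data, could be:

```matlab
X = randn(200, 2430);                                        % stand-in feature matrix
y = categorical(randi(3, 200, 1), 1:3, {'crackle', 'wheeze', 'other'});

[idx, scores] = fscmrmr(X, y);                               % MRMR ranking of all features
X10  = X(:, idx(1:10));                                      % 10MRMR subset
X100 = X(:, idx(1:100));                                     % 100MRMR subset
```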
Table 6 and Table 7 list the 10 most relevant features as ranked by the MRMR algorithm on both fixed durations (FD) and variable durations (VD) sets. The first noteworthy fact is that, while features from every frame length were selected for all the tasks in the VD set, features extracted with the longest window size (512 ms) were not selected for any task in the FD set. Comparing the feature sets selected for the 3 Class tasks, while the best 2 features on the FD set were melodic features, the best 2 features and 3 of the best 10 features for the variable durations dataset were spectral. In both cases, 7 MFCC features were present in the 10 highest-ranked features. The novel brightness ratios turned out to be important features, as they were selected for every task in both sets. In the VD set, while no melodic features were selected for the 3 Class and 2 Class Crackles tasks, two of the smoothed inharmonicities we introduced were selected for the 2 Class Wheezes task.

3.7. Classifiers

We used four machine learning algorithms to classify the events: linear discriminant analysis (LDA), SVM with radial basis function (SVMrbf), random undersampling boosted trees (RUSBoost), and convolutional neural networks (CNNs). All the classifiers were trained 10 times with different seeds, and their hyperparameters were optimized on a validation set containing 25% of the training set. The models with the best hyperparameters were then applied to the test set. Bayesian optimization [52] was used to optimize the following hyperparameters of each traditional machine learning algorithm: delta for LDA; box constraint and kernel scale for SVMrbf; learning rate, number of variables to sample, number of learning cycles, minimum leaf size, and maximum number of splits for RUSBoost.
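A condensed sketch of how the traditional classifiers and their Bayesian hyperparameter optimization could be set up with MATLAB's fitting functions is shown below; the data are stand-ins, the validation split is simplified to a holdout, and the multiclass SVM wrapper (`fitcecoc`) is an assumption rather than the authors' exact implementation:

```matlab
rng(1);                                                     % one of the 10 training seeds
X = randn(200, 100);  y = randi(3, 200, 1);                 % stand-in features and labels
opts = struct('Optimizer', 'bayesopt', 'Holdout', 0.25, 'ShowPlots', false);

ldaMdl = fitcdiscr(X, y, 'OptimizeHyperparameters', {'Delta'}, ...
    'HyperparameterOptimizationOptions', opts);

svmMdl = fitcecoc(X, y, 'Learners', templateSVM('KernelFunction', 'rbf'), ...
    'OptimizeHyperparameters', {'BoxConstraint', 'KernelScale'}, ...
    'HyperparameterOptimizationOptions', opts);

rusMdl = fitcensemble(X, y, 'Method', 'RUSBoost', ...
    'OptimizeHyperparameters', {'LearnRate', 'NumVariablesToSample', ...
    'NumLearningCycles', 'MinLeafSize', 'MaxNumSplits'}, ...
    'HyperparameterOptimizationOptions', opts);
```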
Three different CNN models were considered for the deep learning approach: a model with a dual input configuration, using the spectrogram and mel spectrogram as inputs, and two other models using each of the TF representations individually as input. The architecture of the dual input model and the parameters of each layer are represented in Figure 3. The architecture of the single-input models is the same as the one represented in Figure 3, considering the respective branch before the concatenation and the remaining layers afterwards. All the deep learning models were trained for a maximum of 30 epochs with a batch size of 16 and a learning rate of 0.001 (Adam optimization algorithm). An early stopping strategy [53] was used to avoid overfitting during the training phase, i.e., the training process was stopped after 10 consecutive epochs with an increase in the validation loss (validated on 25% of the training set).
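The sketch below shows a single-input variant trained under the same settings (Adam, 30 epochs, batch size 16, learning rate 0.001, validation patience of 10); the layer sizes and the stand-in data are illustrative and do not reproduce the exact architecture of Figure 3:

```matlab
inputSize = [64 247 1];                                   % mel spectrogram input
XTrain = rand([inputSize 100]);  YTrain = categorical(randi(3, 100, 1));   % stand-ins
XVal   = rand([inputSize 30]);   YVal   = categorical(randi(3, 30, 1));

layers = [
    imageInputLayer(inputSize)
    convolution2dLayer(3, 16, 'Padding', 'same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 32, 'Padding', 'same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    fullyConnectedLayer(3)                                % crackle / wheeze / other
    softmaxLayer
    classificationLayer];

options = trainingOptions('adam', ...
    'MaxEpochs', 30, 'MiniBatchSize', 16, 'InitialLearnRate', 1e-3, ...
    'ValidationData', {XVal, YVal}, ...                   % 25% of the training set
    'ValidationPatience', 10, ...                         % early stopping criterion
    'Shuffle', 'every-epoch', 'Verbose', false);

net = trainNetwork(XTrain, YTrain, layers, options);
```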

3.8. Evaluation Metrics

We used the following measures to evaluate the performance of the algorithms:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{7}$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{8}$$
$$F1\ \mathrm{Score}\ (F1) = \frac{2 \times \mathrm{Precision} \times \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}} \tag{9}$$
$$\mathrm{Matthews\ Correlation\ Coefficient}\ (MCC) = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{10}$$
where TP (True Positives) are events of the relevant class that are correctly classified; TN (True Negatives) are events of the other classes that are correctly classified; FP (False Positives) are events that are incorrectly classified as the relevant class; FN (False Negatives) are events of the relevant class that are incorrectly classified. The area under the ROC curve (AUC) was also computed for the binary cases. For multi-class classification, the evaluation metrics were computed in a one-vs-all fashion. Precision and sensitivity were not included in the tables of Section 4 to improve legibility.
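A small helper illustrating these metrics for the binary tasks (applied one-vs-all per class in the 3 Class task) could look as follows; the function name is illustrative:

```matlab
function m = classification_metrics(yTrue, yPred, positiveClass)
% Confusion-matrix-based metrics for one positive class (one-vs-all).
    tp = sum(yTrue == positiveClass & yPred == positiveClass);
    tn = sum(yTrue ~= positiveClass & yPred ~= positiveClass);
    fp = sum(yTrue ~= positiveClass & yPred == positiveClass);
    fn = sum(yTrue == positiveClass & yPred ~= positiveClass);

    m.accuracy    = (tp + tn) / (tp + tn + fp + fn);
    m.precision   = tp / (tp + fp);
    m.sensitivity = tp / (tp + fn);
    m.f1          = 2 * m.precision * m.sensitivity / (m.precision + m.sensitivity);
    m.mcc         = (tp * tn - fp * fn) / ...
                    sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
end
```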

4. Evaluation

In this section, we analyze the performance of the algorithms in three experiments that are detailed in the following subsections. Each experiment is composed of three tasks: one problem with three classes, i.e., crackles, wheezes, and others (3 Class); and two problems with two classes, i.e., crackles and others (2 Class Crackles), and wheezes and others (2 Class Wheezes). Each experiment is divided into three tasks in order to study how the performance of the algorithms is affected by having to classify each type of ARS against events of the same range of durations. By partitioning the RGE into two sets, we can determine whether the performance in the 3 Class problem is inflated.

4.1. Fixed Durations

Table 8 displays the results achieved by all the combinations of classifiers and feature sets on the test set of the 3 Class task with fixed durations. Results achieved by the best performing algorithm in "Experiment 2" of [44], SUK [41], are also shown as a baseline for comparison. Table 9 displays the results achieved by all the combinations of classifiers and feature sets on the test set of the 2 Class Crackles task with fixed durations. Table 10 displays the results achieved by all the combinations of classifiers and feature sets on the test set of the 2 Class Wheezes task with fixed durations.
With an accuracy of 95.8%, SVMrbf_MFCC was the best traditional classifier in the 3 Class task, surpassing the baseline accuracy of 91.2%. Nevertheless, the CNNs achieved even better results, with several reaching 96.9% accuracy. Given such great results, we decided to investigate whether the performance would be the same for two-class tasks, i.e., wheezes vs. 150 ms RGE, and crackles vs. 50 ms RGE. Surprisingly, while the traditional classifiers’ performance did not improve, the CNNs achieved better results in both tasks, with CNN_dualInput reaching 99.6% accuracy and 99.6% AUC in the 2 Class Crackles task, and 98.6% accuracy and 98.4% AUC in the 2 Class Wheezes task.

4.2. Fixed and Variable Durations

After noticing the CNNs had achieved almost perfect performance on the fixed durations experiment, we suspected the algorithms might be implicitly learning the duration of each event instead of the underlying characteristics of each type of sound. To test this, we designed a new experiment with a different approach to random event generation, detailed in Section 3.2. In this experiment, the training set was the same as before—i.e., the RGE had fixed durations—but the test set’s RGE had variable durations. Table 11 displays the results achieved by all the combinations of classifiers and feature sets on the test set of the 3 Class task with variable durations. As a baseline, we computed SUK’s results on this test set with the same training model as before. Table 12 displays the results achieved by all the combinations of classifiers and feature sets on the test set of the 2 Class Crackles task with variable durations. Table 13 displays the results achieved by all the combinations of classifiers and feature sets on the test set of the 2 Class Wheezes task with variable durations.
Looking at the results of the 3 Class task, the decline in performance is striking, with the accuracy decreasing by more than 30% for the best classifiers. The bulk of this decline was due to the other class, as can be seen in the last three columns of Table 11. This experiment made it clear that the classifiers were implicitly learning the duration of the events rather than relevant characteristics of the classes. The performance did not improve in the 2 Class tasks. In the 2 Class Crackles task, the highest AUC, reached by SVMrbf_100MRMR, was 68.4%, whereas the AUC attained by the CNNs was close to 50%, i.e., no better than chance. In the 2 Class Wheezes task, the best AUC, reached by SVMrbf_Full, was 57.2%, also close to chance.

4.3. Variable Durations

Finally, in this experiment we examined whether the algorithms’ performance improved when training with RGE with variable durations. This experiment arguably represents the more realistic setup to evaluate the performance of the classifiers, as we aimed to remove the bias introduced by the generation of random events with fixed sizes. Table 14 displays the results achieved by all the combinations of classifiers and feature sets on the test set of the 3 Class task with variable durations. Table 15 displays the results achieved by all the combinations of classifiers and feature sets on the test set of the 2 Class Crackles task with variable durations. Table 16 displays the results achieved by all the combinations of classifiers and feature sets on the test set of the 2 Class Wheezes task with variable durations.
While the accuracy reached by the best traditional classifier, RUSBoost_Full, increased by 6.2% in the 3 Class task, the improvement was especially appreciable for the CNNs, with CNN_dualInput reaching 81.8% accuracy, a 20.3% increase. Figure 4 displays the confusion matrices for the best traditional and deep learning models. In the 2 Class Crackles task, CNN_dualInput achieved the best AUC, 84.9%, not much higher than the best AUC reached by a traditional classifier, SVMrbf_100MRMR, with 80.1%. In the 2 Class Wheezes task, traditional and deep learning classifiers attained similar results, 68.5% (SVMrbf_Full) and 72.7% (CNN_dualInput), respectively.

5. Discussion

In this work, we proposed a set of experiments that can be used to evaluate ARS classification systems. We demonstrated how random event generation can have a significant impact on the automatic classification of ARS through the evaluation of several classifiers on those experiments. As the performance of the algorithms presented in Section 4 shows, methods that seem to achieve promising results can fail if we change the way the other class is designed. This can happen even if the dataset where the systems are evaluated does not change. The substantial variance in performance between experiments might indicate that the generation of the random events with fixed durations introduces a considerable bias. Classifiers might be implicitly learning to identify the durations of the events. It is important to consider how data are used to train, validate, and test a trained model. Such a model should encode some essential structure of the underlying problem [54]. When a highly specified artificial system appears to give credence to the allegation that it is addressing a complex human task, the default position should be that the system relies upon characteristics confounded with the ground truth and is not actually addressing the problem it appears to be solving [18]. Our findings corroborate the need to test models on realistic and application-specific tasks [54].
Nevertheless, it is important to reiterate that the performance of the evaluated systems may have been influenced by the limitations of this dataset. As previously pointed out [44], these include the shortage of healthy adult participants and the unavailability of gold standard annotations (i.e., annotations from multiple annotators). A future update of the database should also check for possible errors.
Automatic classification of ARS is a complex task that is not yet solved, despite the claims made in the literature. It may be particularly hard when algorithms are evaluated on challenging datasets, such as the RSD. Though significant work has been devoted to the classification of ARS, no approach has been widely adopted [55]. While CNNs have become state-of-the-art solutions in several tasks [34], they were not enough to tackle this problem. Therefore, accelerating the development of machine learning algorithms is critical to the future of respiratory sound analysis. Future work on ARS classification should focus on improving three crucial steps of the methodology: (i) TF representations; (ii) deep learning architectures; and (iii) evaluation. Other TF representations have been proposed for ARS classification, such as the wavelet transform [28], the S-transform [43], and the scalogram [56], but better denoising methods would allow more meaningful features to be extracted. Hybrid deep learning architectures that combine convolutional layers with recurrent layers that learn the temporal context have been shown to perform well in other sound event classification tasks [57] and could be successfully applied to ARS classification. Finally, ARS classification systems should be evaluated on realistic datasets containing several noise sources.

Author Contributions

Conceptualization, B.M.R., D.P., and R.P.P.; data curation, B.M.R., D.P., and A.M.; formal analysis, B.M.R. and D.P.; funding acquisition, P.C. and R.P.P.; methodology, B.M.R. and D.P.; supervision, A.M., P.C., and R.P.P.; writing—original draft, B.M.R. and D.P.; writing—review and editing, B.M.R., D.P., A.M., P.C., and R.P.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research is partially supported by Fundação para a Ciência e Tecnologia (FCT) Ph.D. scholarship SFRH/BD/135686/2018 and by the Horizon 2020 Framework Programme of the European Union under grant agreement number 825572 (project WELMO) for the authors with affiliation 1. This research is partially supported by Fundo Europeu de Desenvolvimento Regional (FEDER) through Programa Operacional Competitividade e Internacionalização (COMPETE) and FCT under the project UID/BIM/04501/2013 and POCI-01-0145-FEDER-007628—iBiMED for the author with affiliations 2 and 3.

Data Availability Statement

The data used in this study are available in a publicly accessible repository: https://bhichallenge.med.auth.gr/ICBHI_2017_Challenge.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. The Top 10 Causes of Death. Available online: https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death (accessed on 10 December 2020).
  2. Gibson, G.J.; Loddenkemper, R.; Lundbäck, B.; Sibille, Y. Respiratory health and disease in Europe: The new European Lung White Book. Eur. Respir. J. 2013, 42, 559–563. [Google Scholar] [CrossRef] [PubMed]
  3. Marques, A.; Oliveira, A.; Jácome, C. Computerized adventitious respiratory sounds as outcome measures for respiratory therapy: A systematic review. Respir. Care 2014, 59, 765–776. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Fleming, S.; Pluddemann, A.; Wolstenholme, J.; Price, C.; Heneghan, C.; Thompson, M. Diagnostic Technology: Automated lung sound analysis for asthma. Technology Report 2011.
  5. Jácome, C.; Marques, A. Computerized Respiratory Sounds in Patients with COPD: A Systematic Review. COPD J. Chronic Obstr. Pulm. Dis. 2015, 12, 104–112. [Google Scholar] [CrossRef] [PubMed]
  6. Pramono, R.X.A.; Bowyer, S.; Rodriguez-Villegas, E. Automatic adventitious respiratory sound analysis: A systematic review. PLoS ONE 2017, 12, e0177926. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  7. Gurung, A.; Scrafford, C.G.; Tielsch, J.M.; Levine, O.S.; Checkley, W. Computerized lung sound analysis as diagnostic aid for the detection of abnormal lung sounds: A systematic review and meta-analysis. Respir. Med. 2011, 23, 1396–1403. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  8. Reichert, S.; Gass, R.; Brandt, C.; Andrès, E. Analysis of Respiratory Sounds: State of the Art. Clin. Med. Circ. Respir. Pulm. Med. 2008, 2, CCRPM.S530. [Google Scholar] [CrossRef]
  9. Marinella, M.A. COVID-19 pandemic and the stethoscope: Do not forget to sanitize. Heart Lung J. Cardiopulm. Acute Care 2020, 49, 350. [Google Scholar] [CrossRef] [PubMed]
  10. Sovijärvi, A.R.; Dalmasso, F.; Vanderschoot, J.; Malmberg, L.P.; Righini, G.; Stoneman, S.A. Definition of terms for applications of respiratory sounds. Eur. Respir. Rev. 2000, 10, 597–610. [Google Scholar]
  11. Oliveira, A.; Marques, A. Respiratory sounds in healthy people: A systematic review. Respir. Med. 2014, 108, 550–570. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  12. Hadjileontiadis, L.J.; Moussavi, Z.M.K. Current Techniques for Breath Sound Analysis. In Breath Sounds; Springer International Publishing: Cham, Switzerland, 2018; Chapter 9; pp. 139–177. [Google Scholar] [CrossRef]
  13. Pasterkamp, H.; Brand, P.L.; Everard, M.; Garcia-Marcos, L.; Melbye, H.; Priftis, K.N. Towards the standardisation of lung sound nomenclature. Eur. Respir. J. 2016, 47, 724–732. Available online: https://erj.ersjournals.com/content/47/3/724.full.pdf (accessed on 10 December 2020). [CrossRef] [PubMed] [Green Version]
  14. Marques, A.; Oliveira, A. Normal Versus Adventitious Respiratory Sounds. In Breath Sounds; Springer International Publishing: Cham, Switzerland, 2018; Chapter 10; pp. 181–206. [Google Scholar] [CrossRef]
  15. Douros, K.; Grammeniatis, V.; Loukou, I. Crackles and Other Lung Sounds. In Breath Sounds; Springer International Publishing: Cham, Switzerland, 2018; Chapter 12; pp. 225–236. [Google Scholar] [CrossRef]
  16. Abbas, A.; Fahim, A. An automated computerized auscultation and diagnostic system for pulmonary diseases. J. Med. Syst. 2010, 34, 1149–1155. [Google Scholar] [CrossRef]
  17. Bohadana, A.; Izbicki, G.; Kraman, S.S. Fundamentals of Lung Auscultation. N. Engl. J. Med. 2014, 370, 744–751. [Google Scholar] [CrossRef] [Green Version]
  18. Sturm, B.L. A simple method to determine if a music information retrieval system is a ‘horse’. IEEE Trans. Multimed. 2014, 16, 1636–1644. [Google Scholar] [CrossRef]
  19. Rocha, B.M.; Pessoa, D.; Marques, A.; Carvalho, P.; Paiva, R.P. Influence of Event Duration on Automatic Wheeze Classification. arXiv 2020, arXiv:2011.02874. [Google Scholar]
  20. Urquhart, R.B.; McGhee, J.; Macleod, J.E.; Banham, S.W.; Moran, F. The diagnostic value of pulmonary sounds: A preliminary study by computer-aided analysis. Comput. Biol. Med. 1981, 11, 129–139. [Google Scholar] [CrossRef]
  21. Murphy, R.L.; Del Bono, E.A.; Davidson, F. Validation of an automatic crackle (Rale) counter. Am. Rev. Respir. Dis. 1989, 140, 1017–1020. [Google Scholar] [CrossRef] [PubMed]
  22. Sankur, B.; Kahya, Y.P.; Çaǧatay Güler, E.; Engin, T. Comparison of AR-based algorithms for respiratory sounds classification. Comput. Biol. Med. 1994, 24, 67–76. [Google Scholar] [CrossRef]
  23. Du, M.; Chan, F.H.; Lam, F.K.; Sun, J. Crackle detection and classification based on matched wavelet analysis. In Proceedings of the 19th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. ’Magnificent Milestones and Emerging Opportunities in Medical Engineering’ (Cat. No.97CH36136), Chicago, IL, USA, 30 October–2 November 1997; Volume 4, pp. 1638–1641. [Google Scholar] [CrossRef] [Green Version]
  24. Palaniappan, R.; Sundaraj, K.; Ahamed, N.U. Machine learning in lung sound analysis: A systematic review. Integr. Med. Res. 2013, 33, 129–135. [Google Scholar] [CrossRef]
  25. Bokov, P.; Mahut, B.; Flaud, P.; Delclaux, C. Wheezing recognition algorithm using recordings of respiratory sounds at the mouth in a pediatric population. Comput. Biol. Med. 2016, 70, 40–50. [Google Scholar] [CrossRef]
  26. Nakamura, N.; Yamashita, M.; Matsunaga, S. Detection of patients considering observation frequency of continuous and discontinuous adventitious sounds in lung sounds. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), Orlando, FL, USA, 16–20 August 2016; pp. 3457–3460. [Google Scholar] [CrossRef]
  27. Liu, X.; Ser, W.; Zhang, J.; Goh, D.Y.T. Detection of adventitious lung sounds using entropy features and a 2-D threshold setting. In Proceedings of the 2015 10th International Conference on Information, Communications and Signal Processing (ICICS), Singapore, 2–4 December 2015. [Google Scholar] [CrossRef]
  28. Ulukaya, S.; Serbes, G.; Kahya, Y.P. Overcomplete discrete wavelet transform based respiratory sound discrimination with feature and decision level fusion. Biomed. Signal Process. Control 2017, 38, 322–336. [Google Scholar] [CrossRef]
  29. Pinho, C.; Oliveira, A.; Jácome, C.; Rodrigues, J.; Marques, A. Automatic crackle detection algorithm based on fractal dimension and box filtering. Procedia Comput. Sci. 2015, 64, 705–712. [Google Scholar] [CrossRef] [Green Version]
  30. Mendes, L.; Vogiatzis, I.M.; Perantoni, E.; Kaimakamis, E.; Chouvarda, I.; Maglaveras, N.; Henriques, J.; Carvalho, P.; Paiva, R.P. Detection of crackle events using a multi-feature approach. In Proceedings of the 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL, USA, 16–20 August 2016; pp. 3679–3683. [Google Scholar] [CrossRef]
  31. Lozano, M.; Fiz, J.A.; Jané, R. Automatic Differentiation of Normal and Continuous Adventitious Respiratory Sounds Using Ensemble Empirical Mode Decomposition and Instantaneous Frequency. IEEE J. Biomed. Health Inform. 2016, 20, 486–497. [Google Scholar] [CrossRef] [PubMed]
  32. Chamberlain, D.; Kodgule, R.; Ganelin, D.; Miglani, V.; Fletcher, R.R. Application of semi-supervised deep learning to lung sound analysis. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), Orlando, FL, USA, 16–20 August 2016; pp. 804–807. [Google Scholar] [CrossRef]
  33. Aykanat, M.; Kılıç, Ö.; Kurt, B.; Saryal, S. Classification of lung sounds using convolutional neural networks. EURASIP J. Image Video Process. 2017, 2017, 65. [Google Scholar] [CrossRef]
  34. Bardou, D.; Zhang, K.; Ahmad, S.M. Lung sounds classification using convolutional neural networks. Artif. Intell. Med. 2018, 88, 58–69. [Google Scholar] [CrossRef] [PubMed]
  35. Demir, F.; Sengur, A.; Bajaj, V. Convolutional neural networks based efficient approach for classification of lung diseases. Health Inf. Sci. Syst. 2020, 8. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  36. Owens, D. Rale Lung Sounds 3.0. CIN Comput. Inform. Nurs. 2002, 5, 9–10. [Google Scholar]
  37. Forkheim, K.E.; Scuse, D.; Pasterkamp, H. Comparison of neural network models for wheeze detection. In Proceedings of the IEEE WESCANEX 95. Communications, Power, and Computing, Winnipeg, MB, Canada, 15–16 May 1995; Volume 1, pp. 214–219. [Google Scholar] [CrossRef]
  38. Riella, R.; Nohama, P.; Maia, J. Method for automatic detection of wheezing in lung sounds. Braz. J. Med Biol. Res. 2009, 42, 674–684. [Google Scholar] [CrossRef] [Green Version]
  39. Mendes, L.; Vogiatzis, I.M.; Perantoni, E.; Kaimakamis, E.; Chouvarda, I.; Maglaveras, N.; Tsara, V.; Teixeira, C.; Carvalho, P.; Henriques, J.; et al. Detection of wheezes using their signature in the spectrogram space and musical features. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), Milan, Italy, 25–29 August 2015; pp. 5581–5584. [Google Scholar] [CrossRef]
  40. Grønnesby, M.; Solis, J.C.A.; Holsbø, E.; Melbye, H.; Bongo, L.A. Feature extraction for machine learning based crackle detection in lung sounds from a health survey. arXiv 2017, arXiv:1706.00005. [Google Scholar]
  41. Serbes, G.; Ulukaya, S.; Kahya, Y.P. An Automated Lung Sound Preprocessing and Classification System Based On Spectral Analysis Methods. In Precision Medicine Powered by pHealth and Connected Health. ICBHI 2017. IFMBE Proceedings; Maglaveras, N., Chouvarda, I., de Carvalho, P., Eds.; Springer: Singapore, 2018; Volume 66, pp. 45–49. [Google Scholar]
  42. Jakovljević, N.; Lončar-Turukalo, T. Hidden Markov Model Based Respiratory Sound Classification. In Precision Medicine Powered by pHealth and Connected Health. ICBHI 2017. IFMBE Proceedings; Maglaveras, N., Chouvarda, I., de Carvalho, P., Eds.; Springer: Singapore, 2018; Volume 66, pp. 39–43. [Google Scholar]
  43. Chen, H.; Yuan, X.; Li, J.; Pei, Z.; Zheng, X. Automatic Multi-Level In-Exhale Segmentation and Enhanced Generalized S-Transform for wheezing detection. Comput. Methods Programs Biomed. 2019, 178, 163–173. [Google Scholar] [CrossRef]
  44. Rocha, B.M.; Filos, D.; Mendes, L.; Serbes, G.; Ulukaya, S.; Kahya, Y.P.; Jakovljevic, N.; Turukalo, T.L.; Vogiatzis, I.M.; Perantoni, E.; et al. An open access database for the evaluation of respiratory sound classification algorithms. Physiol. Meas. 2019, 40. [Google Scholar] [CrossRef] [PubMed]
  45. Rocha, B.M.; Filos, D.; Mendes, L.; Vogiatzis, I.; Perantoni, E.; Kaimakamis, E.; Natsiavas, P.; Oliveira, A.; Jácome, C.; Marques, A.; et al. A respiratory sound database for the development of automated classification. IFMBE Proc. 2018, 66, 33–37. [Google Scholar] [CrossRef]
  46. Burr, I.W. Cumulative Frequency Functions. Ann. Math. Stat. 1942, 13, 215–232. [Google Scholar] [CrossRef]
  47. Stevens, S.; Volkmann, J.; Newman, E.B. A Scale for the Measurement of the Psychological Magnitude Pitch. J. Acoust. Soc. Am. 1937, 8, 185–190. [Google Scholar] [CrossRef]
  48. Lartillot, O.; Toiviainen, P. Mir in matlab (II): A toolbox for musical feature extraction from audio. In Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR 2007), Vienna, Austria, 23–27 September 2007; pp. 127–130. [Google Scholar]
  49. Davis, S.; Mermelstein, P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Signal Process. 1980, 28, 357–366. [Google Scholar] [CrossRef] [Green Version]
  50. Charbonneau, G.; Ademovic, E.; Cheetham, B.; Malmberg, L.; Vanderschoot, J.; Sovijärvi, A. Basic techniques for respiratory sound analysis. Eur. Respir. Rev. 2000, 10, 625–635. [Google Scholar]
  51. Ding, C.; Peng, H. Minimum redundancy feature selection from microarray gene expression data. J. Bioinform. Comput. Biol. 2005, 3, 185–205. [Google Scholar] [CrossRef]
  52. Snoek, J.; Larochelle, H.; Adams, R.P. Practical Bayesian Optimization of Machine Learning Algorithms. Adv. Neural Inf. Process. Syst. 2012, 25, 2951–2959. [Google Scholar]
  53. Prechelt, L. Early Stopping-However, When? In Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 1998; pp. 55–69. [Google Scholar] [CrossRef]
  54. D’Amour, A.; Heller, K.; Moldovan, D.; Adlam, B.; Alipanahi, B.; Beutel, A.; Chen, C.; Deaton, J.; Eisenstein, J.; Hoffman, M.D.; et al. Underspecification Presents Challenges for Credibility in Modern Machine Learning. arXiv 2020, arXiv:2011.03395. [Google Scholar]
  55. Marques, A.; Jácome, C. Future Prospects for Respiratory Sound Research. In Breath Sounds; Springer International Publishing: Cham, Switzerland, 2018; pp. 291–304. [Google Scholar] [CrossRef]
  56. Jayalakshmy, S.; Sudha, G.F. Scalogram based prediction model for respiratory disorders using optimized convolutional neural networks. Artif. Intell. Med. 2020, 103, 101809. [Google Scholar] [CrossRef]
  57. Adavanne, S.; Politis, A.; Nikunen, J.; Virtanen, T. Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks. IEEE J. Sel. Top. Signal Process. 2019, 13, 34–48. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Histogram of adventitious respiratory sounds (ARS) events’ durations versus Burr distributions (red line).
Figure 2. Example of both TF representations of a wheeze event (left—spectrogram, right—mel spectrogram).
Figure 3. Dual input CNN architecture.
Figure 4. Confusion matrices for the best traditional and deep learning models on the 3 Class task (training: variable duration; testing: variable duration).
Table 1. Summary of selected works.
Reference | Data | #Classes: Classes | Best Results
Forkheim et al. [37] | Participants: NA; Recordings: NA; Source: Private | 2: Wheezes and Normal | Accuracy: 96%
Riella et al. [38] | Participants: NA; Recordings: 28; Source: R.A.L.E. | 2: Wheezes and Normal | Accuracy: 85%; Sensitivity: 86%; Specificity: 82%
Mendes et al. [39] | Participants: 12; Recordings: 24; Source: Private | 2: Wheezes and Normal | Accuracy: 98%; Sensitivity: 91%; Specificity: 99%; MCC: 93%
Pinho et al. [29] | Participants: 10; Recordings: 24; Source: Private | 1: Crackles | Precision: 95%; Sensitivity: 89%; F1: 92%
Chamberlain et al. [32] | Participants: 284; Recordings: 500; Source: Private | 3: Wheezes, Crackles, and Normal | Wheeze AUC: 86%; Crackle AUC: 73%
Lozano et al. [31] | Participants: 30; Recordings: 870; Source: Private | 2: Wheezes and Normal | Accuracy: 94%; Precision: 95%; Sensitivity: 94%; Specificity: 94%
Gronnesby et al. [40] | Participants: NA; Recordings: 383; Source: Private | 2: Crackles and Normal | Precision: 86%; Sensitivity: 84%; F1: 84%
Aykanat et al. [33] | Participants: 1630; Recordings: 17930; Source: Private | 2: Healthy and Pathological | Accuracy: 86%; Precision: 86%; Sensitivity: 86%; Specificity: 86%
Bardou et al. [34] | Participants: 15; Recordings: 15; Source: R.A.L.E. | 7, including Wheezes, Crackles, and Normal | Accuracy: 96%; Wheeze Precision: 98%; Wheeze Sensitivity: 100%
Serbes et al. [41] | Participants: 126; Recordings: 920; Source: RSD | 3: Wheezes, Crackles, and Normal | Wheeze Sensitivity: 79%; Crackle Sensitivity: 95%; Normal Sensitivity: 91%
Jakovljevic et al. [42] | Participants: 126; Recordings: 920; Source: RSD | 3: Wheezes, Crackles, and Normal | Wheeze Sensitivity: 52%; Crackle Sensitivity: 56%; Normal Sensitivity: 52%
Chen et al. [43] | Participants: NA; Recordings: 240; Source: R.A.L.E. and RSD | 2: Wheezes and Normal | Accuracy: 99%; Sensitivity: 96%; Specificity: 99%
AUC: area under the receiver operating characteristic curve; F1: F1-score; MCC: the Matthews correlation coefficient; NA: not available; RSD: Respiratory Sound Database.
Table 2. Demographic information of the database.
Number of recordings | 920
Sampling frequency (number of recordings) | 4 kHz (90); 10 kHz (6); 44.1 kHz (824)
Bits per sample | 16
Average recording duration | 21.5 s
Number of participants | 126: 77 adults, 49 children
Diagnosis | COPD (64); Healthy (26); URTI (14); Bronchiectasis (7); Bronchiolitis (6); Pneumonia (6); LRTI (2); Asthma (1)
Sex | 79 males, 46 females (NA: 1)
Age (mean ± standard deviation) | 43.0 ± 32.2 years (NA: 1)
Age of adult participants | 67.6 ± 11.6 years (NA: 1)
Age of child participants | 4.8 ± 4.6 years
BMI of adult participants | 27.2 ± 5.4 kg/m² (NA: 2)
Weight of child participants | 21.4 ± 17.2 kg (NA: 5)
Height of child participants | 104.7 ± 30.8 cm (NA: 7)
COPD: chronic obstructive pulmonary disease; LRTI: lower respiratory tract infection; NA: not available; URTI: upper respiratory tract infection.
Table 3. Number of randomly generated events (RGE) with fixed durations in the training and test sets.
 | Training Set | Test Set | Total
Number of crackles | 5996 | 2881 | 8877
Number of wheezes | 1173 | 725 | 1898
Number of 50 ms events | 1557 | 1050 | 2607
Number of 150 ms events | 1456 | 962 | 2418
Table 4. Number of RGE with variable durations in the training and test sets.
 | Training Set | Test Set | Total
Number of crackles | 5996 | 2881 | 8877
Number of wheezes | 1173 | 725 | 1898
Number of otherCrackle events | 2478 | 1680 | 4158
Number of otherWheeze events | 575 | 388 | 963
Table 5. Small description of each feature.
Type | Features | Description
Spectral | Spectral Centroid | Center of mass of the spectral distribution
 | Spectral Spread | Variance of the spectral distribution
 | Spectral Skewness | Skewness of the spectral distribution
 | Spectral Kurtosis | Excess kurtosis of the spectral distribution
 | Zero-crossing Rate | Waveform sign-change rate
 | Spectral Entropy | Estimation of the complexity of the spectrum
 | Spectral Flatness | Estimation of the noisiness of a spectrum
 | Spectral Roughness | Estimation of the sensory dissonance
 | Spectral Irregularity | Estimation of the spectral peaks’ variability
 | Spectral Flux | Euclidean distance between the spectrum of successive frames
 | Spectral Flux Inc | Spectral flux with focus on increasing energy solely
 | Spectral Flux Halfwave | Halfwave rectified spectral flux
 | Spectral Flux Median | Median filtered spectral flux
 | Spectral Brightness | Amount of energy above 100, 200, 400, and 800 Hz
 | Brightness 400 Ratio | Ratio between spectral brightness at 400 and 100 Hz
 | Brightness 800 Ratio | Ratio between spectral brightness at 800 and 100 Hz
 | Spectral Rolloff | Frequency such that 95, 75, 25, and 5% of the total energy is contained below it
 | Rolloff Outlier Ratio | Ratio between spectral rolloff at 5 and 95%
 | Rolloff Interquartile Ratio | Ratio between spectral rolloff at 25 and 75%
MFCC | MFCC | 13 Mel-frequency cepstral coefficients
 | Delta-MFCC | 1st-order temporal differentiation of the MFCCs
Melodic | Pitch | Fundamental frequency estimation
 | Pitch Smoothing | Moving average of the pitch curve with lengths of 100, 250, 500, and 1000 ms
 | Inharmonicity | Partials non-multiple of fundamental frequency
 | Inharmonicity Smoothing | Moving average of the inharmonicity curve with lengths of 100, 250, 500, and 1000 ms
 | Voicing | Presence of fundamental frequency
 | Voicing Smoothing | Moving average of the voicing curve with lengths of 100, 250, 500, and 1000 ms
Table 6. Ten highest-ranked features (fixed durations).
Rank | 3 Class | 2 Class Crackles | 2 Class Wheezes
1 | std_melinharm250ms_32 | std_melinharm250ms_32 | std_melinharm500ms_64
2 | median_melvoicing_16 | max_melinharm_64 | median_melpitchHF_16
3 | std_deltamfcc2_64 | min_specbright4ratio_32 | std_melpitchHF_128
4 | std_deltamfcc10_64 | max_speccentroid_256 | std_specrolloff05_32
5 | median_specbright4ratio_32 | min_mfcc11_16 | std_specrolloff05_128
6 | median_deltamfcc7_32 | min_deltamfcc11_32 | max_melvoicingHF_128
7 | min_deltamfcc5_128 | std_deltamfcc3_32 | std_specbright4ratio_256
8 | median_deltamfcc13_32 | median_deltamfcc13_16 | mean_melinharmHF250ms_32
9 | max_mfcc2_64 | min_deltamfcc5_32 | std_specrolloff05_16
10 | median_deltamfcc1_32 | min_deltamfcc7_16 | max_mfcc12_256
min: minimum; max: maximum; std: standard deviation; spec: spectral; mel: melodic; inharm: inharmonicity; HF: high-frequency; rolloffOutRatio: rolloff outlier ratio; rolloffIQRatio: rolloff interquartile ratio; bright8ratio: brightness 800 ratio; bright4ratio: brightness 400 ratio.
Table 7. Ten highest-ranked features (variable durations).
Rank | 3 Class | 2 Class Crackles | 2 Class Wheezes
1 | std_specentropy_128 | min_specbright4ratio_32 | mean_specbright8ratio_16
2 | std_specskewness_64 | max_speccentroid_128 | std_mfcc5_512
3 | min_deltamfcc12_64 | min_deltamfcc7_32 | std_melinharm250ms_16
4 | std_specbright8ratio_64 | min_deltamfcc3_16 | mean_mfcc11_32
5 | mean_deltamfcc13_512 | median_deltamfcc6_32 | mean_deltamfcc1_64
6 | median_deltamfcc1_32 | mean_deltamfcc13_64 | std_mfcc5_128
7 | max_mfcc11_256 | max_mfcc11_64 | std_melinharmHF1s_16
8 | min_deltamfcc10_256 | mean_specirregularity_512 | min_deltamfcc5_512
9 | median_deltamfcc10_32 | max_deltamfcc1_256 | std_deltamfcc3_32
10 | std_mfcc5_16 | max_deltamfcc8_128 | median_deltamfcc5_16
min: minimum; max: maximum; std: standard deviation; spec: spectral; mel: melodic; inharm: inharmonicity; HF: high-frequency; rolloffOutRatio: rolloff outlier ratio; rolloffIQRatio: rolloff interquartile ratio; bright8ratio: brightness 800 ratio; bright4ratio: brightness 400 ratio.
Table 8. Performance results obtained with 3 classes (crackle vs. wheeze vs. other)—training: fixed duration; testing: fixed duration.
Table 8. Performance results obtained with 3 classes (crackle vs. wheeze vs. other)—training: fixed duration; testing: fixed duration.
Classifiers | Accuracy | F1 Wheeze | MCC Wheeze | F1 Crackle | MCC Crackle | F1 Other | MCC Other
SUK (Baseline) | 91.2 | 77.8 | 74.5 | 95.1 | 90.0 | 90.5 | 85.2
LDA_10MRMR | 80.4 ± 0.0 | 41.0 ± 0.1 | 35.2 ± 0.1 | 92.4 ± 0.0 | 85.4 ± 0.0 | 76.4 ± 0.0 | 61.9 ± 0.1
LDA_100MRMR | 81.1 ± 0.7 | 63.1 ± 0.9 | 58.5 ± 0.7 | 91.8 ± 0.0 | 85.5 ± 0.0 | 75.5 ± 1.4 | 61.8 ± 1.7
LDA_Full | 84.2 ± 1.4 | 70.9 ± 1.5 | 66.6 ± 1.7 | 91.0 ± 0.6 | 81.7 ± 1.7 | 79.5 ± 2.6 | 69.0 ± 3.5
SVMrbf_10MRMR | 82.9 ± 0.3 | 61.0 ± 2.9 | 55.6 ± 3.2 | 91.3 ± 0.5 | 82.3 ± 1.1 | 78.5 ± 0.5 | 66.1 ± 0.9
SVMrbf_100MRMR | 88.3 ± 0.3 | 76.9 ± 0.5 | 73.8 ± 0.6 | 92.5 ± 0.3 | 84.7 ± 0.7 | 86.2 ± 0.3 | 78.4 ± 0.5
SVMrbf_Full | 89.7 ± 1.0 | 76.8 ± 3.0 | 74.1 ± 3.1 | 93.9 ± 0.5 | 87.8 ± 0.9 | 88.1 ± 1.2 | 81.2 ± 2.0
RUSBoost_10MRMR | 89.7 ± 0.4 | 82.4 ± 1.1 | 79.9 ± 1.5 | 92.6 ± 0.4 | 85.2 ± 0.9 | 88.4 ± 0.5 | 82.0 ± 0.7
RUSBoost_100MRMR | 91.3 ± 0.5 | 83.7 ± 1.0 | 81.3 ± 1.2 | 93.9 ± 0.4 | 87.8 ± 0.9 | 90.5 ± 0.6 | 85.1 ± 1.0
RUSBoost_Full | 92.3 ± 1.3 | 84.9 ± 1.9 | 82.7 ± 2.2 | 94.6 ± 0.8 | 89.2 ± 1.3 | 91.7 ± 1.7 | 87.0 ± 2.7
CNN_dualInput | 96.9 ± 0.3 | 89.3 ± 0.9 | 87.7 ± 1.0 | 97.7 ± 0.2 | 95.3 ± 0.4 | 98.4 ± 0.6 | 97.6 ± 0.9
CNN_Spectrogram | 96.2 ± 0.3 | 88.1 ± 0.8 | 86.4 ± 0.8 | 96.8 ± 0.3 | 93.4 ± 0.6 | 98.2 ± 0.3 | 97.3 ± 0.4
CNN_melSpectrogram | 96.7 ± 0.2 | 88.9 ± 0.9 | 87.3 ± 1.0 | 97.5 ± 0.2 | 94.8 ± 0.5 | 98.4 ± 0.3 | 97.6 ± 0.4
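The per-class F1 and MCC values reported in Tables 8 through 16 can be computed, assuming a one-vs-rest treatment of each class, roughly as in the sketch below; the label arrays are hypothetical placeholders.

```python
import numpy as np
from sklearn.metrics import f1_score, matthews_corrcoef

classes = ["crackle", "wheeze", "other"]
y_true = np.array(["crackle", "wheeze", "other", "crackle"])   # hypothetical labels
y_pred = np.array(["crackle", "other", "other", "wheeze"])     # hypothetical predictions

for c in classes:
    # Binary (one-vs-rest) F1 and MCC for class c, reported as percentages
    f1_c = f1_score(y_true == c, y_pred == c)
    mcc_c = matthews_corrcoef(y_true == c, y_pred == c)
    print(f"{c}: F1 = {100 * f1_c:.1f}, MCC = {100 * mcc_c:.1f}")
```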
Table 9. Performance results obtained with 2 classes (crackle vs. other)—training: fixed duration; testing: fixed duration.
Classifiers | Accuracy | AUC Crackle | F1 Crackle | MCC Crackle | F1 Other | MCC Other
LDA_10MRMR | 88.9 ± 0.6 | 93.9 ± 0.7 | 92.0 ± 0.3 | 85.1 ± 1.2 | 81.9 ± 1.9 | 78.3 ± 2.7
LDA_100MRMR | 88.9 ± 0.0 | 92.8 ± 0.6 | 91.8 ± 0.0 | 85.5 ± 0.0 | 82.8 ± 0.0 | 79.9 ± 0.0
LDA_Full | 88.1 ± 0.2 | 93.0 ± 0.2 | 91.5 ± 0.2 | 83.6 ± 0.3 | 79.9 ± 0.4 | 75.3 ± 0.4
SVMrbf_10MRMR | 91.2 ± 0.2 | 95.2 ± 0.9 | 94.0 ± 0.1 | 87.7 ± 0.3 | 83.5 ± 0.5 | 79.7 ± 0.6
SVMrbf_100MRMR | 93.3 ± 0.2 | 97.6 ± 0.4 | 95.4 ± 0.1 | 90.6 ± 0.2 | 87.4 ± 0.3 | 84.5 ± 0.4
SVMrbf_Full | 93.5 ± 0.6 | 97.7 ± 0.3 | 95.6 ± 0.4 | 91.0 ± 0.9 | 88.0 ± 1.2 | 85.3 ± 1.5
RUSBoost_10MRMR | 93.1 ± 0.3 | 97.5 ± 0.1 | 95.3 ± 0.2 | 90.4 ± 0.4 | 87.4 ± 0.7 | 84.5 ± 0.8
RUSBoost_100MRMR | 95.2 ± 0.7 | 98.8 ± 0.2 | 96.7 ± 0.5 | 93.2 ± 1.0 | 91.0 ± 1.2 | 88.9 ± 1.4
RUSBoost_Full | 94.7 ± 0.9 | 98.8 ± 0.4 | 96.3 ± 0.7 | 92.6 ± 1.2 | 90.4 ± 1.4 | 88.3 ± 1.6
CNN_dualInput | 99.6 ± 0.1 | 99.6 ± 0.2 | 99.8 ± 0.1 | 99.1 ± 0.3 | 99.3 ± 0.2 | 99.1 ± 0.3
CNN_Spectrogram | 98.5 ± 0.5 | 97.8 ± 1.1 | 99.0 ± 0.3 | 96.2 ± 1.2 | 97.2 ± 0.9 | 96.2 ± 1.2
CNN_melSpectrogram | 99.4 ± 0.2 | 99.2 ± 0.5 | 99.6 ± 0.2 | 98.5 ± 0.6 | 98.9 ± 0.5 | 98.5 ± 0.6
Table 10. Performance results obtained with 2 classes (wheeze vs. other)—training: fixed duration; testing: fixed duration.
Classifiers | Accuracy | AUC Wheeze | F1 Wheeze | MCC Wheeze | F1 Other | MCC Other
LDA_10MRMR | 82.5 ± 0.2 | 83.0 ± 1.1 | 75.2 ± 0.5 | 74.6 ± 0.4 | 86.5 ± 0.2 | 84.1 ± 0.2
LDA_100MRMR | 84.1 ± 0.0 | 88.6 ± 0.0 | 77.3 ± 0.1 | 77.2 ± 0.1 | 87.8 ± 0.1 | 85.8 ± 0.0
LDA_Full | 83.3 ± 0.2 | 83.6 ± 0.1 | 78.1 ± 0.2 | 76.1 ± 0.2 | 86.5 ± 0.3 | 83.9 ± 0.5
SVMrbf_10MRMR | 84.4 ± 1.3 | 87.3 ± 1.1 | 80.3 ± 2.3 | 78.1 ± 2.2 | 87.1 ± 0.8 | 84.4 ± 0.9
SVMrbf_100MRMR | 87.2 ± 0.4 | 92.8 ± 1.4 | 84.1 ± 0.9 | 82.2 ± 0.7 | 89.3 ± 0.4 | 87.1 ± 0.5
SVMrbf_Full | 88.6 ± 0.4 | 92.5 ± 1.1 | 86.1 ± 0.5 | 84.2 ± 0.6 | 90.3 ± 0.4 | 88.3 ± 0.4
RUSBoost_10MRMR | 91.6 ± 1.2 | 96.2 ± 0.7 | 89.9 ± 1.5 | 88.6 ± 1.7 | 92.7 ± 1.1 | 91.3 ± 1.3
RUSBoost_100MRMR | 91.0 ± 1.0 | 96.5 ± 0.6 | 89.7 ± 1.2 | 88.2 ± 1.4 | 91.9 ± 0.9 | 90.3 ± 1.1
RUSBoost_Full | 93.6 ± 2.0 | 97.8 ± 0.8 | 92.5 ± 2.2 | 91.4 ± 2.6 | 94.4 ± 1.7 | 93.3 ± 2.1
CNN_dualInput | 98.2 ± 0.4 | 98.1 ± 0.4 | 97.9 ± 0.5 | 96.4 ± 0.9 | 98.5 ± 0.4 | 96.4 ± 0.9
CNN_Spectrogram | 98.6 ± 0.2 | 98.4 ± 0.2 | 98.3 ± 0.2 | 97.1 ± 0.4 | 98.8 ± 0.2 | 97.1 ± 0.4
CNN_melSpectrogram | 98.3 ± 0.3 | 98.1 ± 0.2 | 97.9 ± 0.3 | 96.4 ± 0.6 | 98.5 ± 0.2 | 96.4 ± 0.6
Table 11. Performance results obtained with 3 classes (crackle vs. wheeze vs. other)—training: fixed duration; testing: variable duration.
Classifiers | Accuracy | F1 Wheeze | MCC Wheeze | F1 Crackle | MCC Crackle | F1 Other | MCC Other
SUK (Baseline) | 63.3 | 68.1 | 63.5 | 76.8 | 47.1 | 21.7 | 14.6
LDA_10MRMR | 60.3 ± 0.1 | 45.0 ± 0.1 | 42.4 ± 0.0 | 75.1 ± 0.1 | 43.2 ± 0.2 | 36.3 ± 0.0 | 9.8 ± 0.1
LDA_100MRMR | 61.1 ± 0.0 | 69.2 ± 0.5 | 65.0 ± 0.6 | 73.5 ± 0.0 | 39.7 ± 0.0 | 28.8 ± 0.4 | 9.3 ± 0.1
LDA_Full | 62.9 ± 0.3 | 66.6 ± 0.4 | 61.7 ± 0.6 | 76.1 ± 0.4 | 45.4 ± 1.0 | 28.6 ± 1.4 | 13.9 ± 0.6
SVMrbf_10MRMR | 61.9 ± 0.1 | 60.3 ± 2.5 | 54.9 ± 2.3 | 75.7 ± 0.2 | 44.2 ± 0.5 | 31.5 ± 1.2 | 12.0 ± 0.3
SVMrbf_100MRMR | 63.7 ± 0.2 | 68.1 ± 0.4 | 63.3 ± 0.5 | 76.5 ± 0.1 | 46.4 ± 0.4 | 29.1 ± 0.6 | 15.7 ± 0.5
SVMrbf_Full | 63.6 ± 0.5 | 66.9 ± 2.4 | 62.0 ± 2.8 | 77.0 ± 0.2 | 47.5 ± 0.5 | 28.6 ± 1.9 | 14.4 ± 0.8
RUSBoost_10MRMR | 62.1 ± 0.5 | 68.6 ± 1.7 | 64.8 ± 2.4 | 75.8 ± 0.1 | 44.4 ± 0.4 | 20.5 ± 1.3 | 10.5 ± 1.6
RUSBoost_100MRMR | 62.7 ± 0.2 | 69.3 ± 0.8 | 65.5 ± 1.3 | 76.2 ± 0.2 | 45.6 ± 0.5 | 20.3 ± 3.2 | 11.7 ± 1.0
RUSBoost_Full | 62.9 ± 0.4 | 70.8 ± 0.7 | 67.2 ± 0.8 | 76.4 ± 0.6 | 46.1 ± 1.4 | 19.0 ± 4.6 | 11.4 ± 1.8
CNN_dualInput | 61.5 ± 0.4 | 73.0 ± 0.9 | 69.6 ± 1.2 | 75.4 ± 0.4 | 43.6 ± 1.2 | 3.4 ± 0.5 | 3.2 ± 0.9
CNN_Spectrogram | 61.7 ± 0.4 | 71.5 ± 0.8 | 68.0 ± 1.2 | 75.6 ± 0.4 | 43.9 ± 1.1 | 7.5 ± 1.2 | 7.7 ± 0.9
CNN_melSpectrogram | 61.6 ± 0.3 | 72.0 ± 0.8 | 68.8 ± 1.0 | 75.9 ± 0.3 | 44.8 ± 0.9 | 4.3 ± 1.0 | 3.9 ± 1.3
Table 12. Performance results obtained with 2 classes (crackle vs. other)—training: fixed duration; testing: variable duration.
Classifiers | Accuracy | AUC Crackle | F1 Crackle | MCC Crackle | F1 Other | MCC Other
LDA_10MRMR | 62.6 ± 0.8 | 66.4 ± 2.6 | 74.7 ± 0.8 | 42.2 ± 1.8 | 28.7 ± 0.8 | 15.5 ± 1.2
LDA_100MRMR | 61.5 ± 0.0 | 67.6 ± 0.6 | 73.5 ± 0.0 | 39.7 ± 0.0 | 29.1 ± 0.0 | 13.8 ± 0.0
LDA_Full | 65.7 ± 0.3 | 70.5 ± 0.0 | 76.4 ± 0.1 | 46.9 ± 0.4 | 37.3 ± 0.8 | 24.7 ± 0.8
SVMrbf_10MRMR | 65.5 ± 0.1 | 66.0 ± 0.9 | 77.5 ± 0.1 | 49.1 ± 0.2 | 26.5 ± 0.6 | 20.9 ± 0.5
SVMrbf_100MRMR | 66.1 ± 0.1 | 68.4 ± 2.0 | 78.1 ± 0.1 | 50.7 ± 0.1 | 25.1 ± 0.7 | 22.3 ± 0.5
SVMrbf_Full | 65.7 ± 0.1 | 56.9 ± 2.1 | 77.8 ± 0.1 | 50.0 ± 0.3 | 24.1 ± 1.2 | 20.8 ± 0.5
RUSBoost_10MRMR | 65.3 ± 0.3 | 54.5 ± 0.8 | 77.5 ± 0.1 | 49.0 ± 0.4 | 24.4 ± 1.3 | 19.7 ± 1.0
RUSBoost_100MRMR | 64.6 ± 0.3 | 54.8 ± 0.5 | 77.6 ± 0.1 | 49.8 ± 0.3 | 15.5 ± 2.0 | 15.7 ± 1.3
RUSBoost_Full | 65.1 ± 0.3 | 55.3 ± 1.3 | 77.5 ± 0.3 | 49.1 ± 1.0 | 22.6 ± 3.8 | 18.8 ± 1.6
CNN_dualInput | 63.6 ± 0.3 | 50.7 ± 0.4 | 77.6 ± 0.1 | 7.5 ± 1.9 | 3.0 ± 1.7 | 7.5 ± 1.9
CNN_Spectrogram | 64.2 ± 0.2 | 51.6 ± 0.3 | 77.8 ± 0.1 | 11.6 ± 1.5 | 7.1 ± 1.4 | 11.6 ± 1.5
CNN_melSpectrogram | 63.6 ± 0.1 | 50.7 ± 0.2 | 77.6 ± 0.0 | 7.9 ± 1.0 | 3.4 ± 0.7 | 7.9 ± 1.0
Table 13. Performance results obtained with 2 classes (wheeze vs. other)—training: fixed duration; testing: variable duration.
Classifiers | Accuracy | AUC Wheeze | F1 Wheeze | MCC Wheeze | F1 Other | MCC Other
LDA_10MRMR | 53.3 ± 0.4 | 55.4 ± 0.0 | 63.6 ± 0.5 | 58.4 ± 0.5 | 35.2 ± 0.2 | 30.3 ± 0.2
LDA_100MRMR | 53.7 ± 0.6 | 56.2 ± 1.5 | 63.8 ± 0.7 | 58.7 ± 0.7 | 35.8 ± 0.2 | 30.9 ± 0.3
LDA_Full | 56.6 ± 0.9 | 56.8 ± 0.7 | 67.3 ± 1.0 | 62.5 ± 1.1 | 35.2 ± 0.2 | 30.7 ± 0.3
SVMrbf_10MRMR | 57.3 ± 1.4 | 49.1 ± 2.2 | 69.6 ± 2.1 | 65.1 ± 2.5 | 27.2 ± 4.2 | 23.5 ± 3.6
SVMrbf_100MRMR | 57.4 ± 1.7 | 53.5 ± 1.4 | 70.3 ± 1.6 | 65.9 ± 1.9 | 24.7 ± 2.6 | 21.2 ± 2.6
SVMrbf_Full | 61.2 ± 0.6 | 57.2 ± 1.1 | 73.4 ± 0.5 | 69.5 ± 0.6 | 28.9 ± 1.8 | 26.4 ± 1.7
RUSBoost_10MRMR | 61.2 ± 0.9 | 51.7 ± 0.6 | 74.8 ± 0.7 | 71.6 ± 0.9 | 15.4 ± 2.3 | 14.8 ± 2.3
RUSBoost_100MRMR | 62.4 ± 0.5 | 53.2 ± 0.5 | 76.0 ± 0.3 | 73.3 ± 0.5 | 12.7 ± 1.7 | 13.4 ± 1.8
RUSBoost_Full | 61.3 ± 0.8 | 52.7 ± 1.9 | 75.6 ± 0.7 | 73.0 ± 1.0 | 5.8 ± 1.3 | 5.5 ± 1.1
CNN_dualInput | 64.1 ± 0.1 | 50.2 ± 0.1 | 77.9 ± 0.1 | −1.0 ± 0.7 | 4.7 ± 0.1 | −1.0 ± 0.7
CNN_Spectrogram | 64.1 ± 0.0 | 51.2 ± 0.0 | 77.9 ± 0.0 | −1.2 ± 0.2 | 4.8 ± 0.0 | −1.2 ± 0.2
CNN_melSpectrogram | 64.0 ± 0.5 | 50.2 ± 0.2 | 77.8 ± 0.5 | −1.1 ± 1.0 | 5.1 ± 1.2 | −1.1 ± 1.0
Table 14. Performance results obtained with 3 classes (crackle vs. wheeze vs. other)—training: variable duration; testing: variable duration.
Classifiers | Accuracy | F1 Wheeze | MCC Wheeze | F1 Crackle | MCC Crackle | F1 Other | MCC Other
LDA_10MRMR | 62.3 ± 0.1 | 71.0 ± 0.0 | 67.8 ± 0.0 | 75.2 ± 0.1 | 42.5 ± 0.1 | 17.1 ± 0.2 | 14.2 ± 0.3
LDA_100MRMR | 65.5 ± 0.0 | 72.3 ± 0.1 | 69.8 ± 0.1 | 76.7 ± 0.1 | 47.8 ± 0.1 | 35.0 ± 0.4 | 22.5 ± 0.1
LDA_Full | 68.8 ± 0.1 | 72.2 ± 0.1 | 69.9 ± 0.1 | 78.2 ± 0.2 | 52.9 ± 0.2 | 48.9 ± 0.5 | 32.5 ± 0.3
SVMrbf_10MRMR | 65.6 ± 0.4 | 72.5 ± 0.5 | 69.1 ± 0.7 | 76.7 ± 0.3 | 47.2 ± 0.7 | 34.9 ± 2.7 | 23.2 ± 1.3
SVMrbf_100MRMR | 68.2 ± 0.9 | 68.8 ± 1.9 | 64.1 ± 2.2 | 77.4 ± 0.7 | 51.2 ± 1.4 | 52.1 ± 2.3 | 31.3 ± 2.2
SVMrbf_Full | 68.0 ± 1.1 | 65.2 ± 4.0 | 60.9 ± 3.9 | 75.9 ± 1.6 | 51.1 ± 1.5 | 57.7 ± 3.2 | 33.4 ± 1.7
RUSBoost_10MRMR | 65.4 ± 0.4 | 72.7 ± 0.6 | 69.9 ± 0.8 | 74.8 ± 0.9 | 45.2 ± 0.8 | 43.2 ± 3.8 | 24.1 ± 1.6
RUSBoost_100MRMR | 68.5 ± 0.5 | 73.6 ± 0.8 | 71.0 ± 1.2 | 75.4 ± 1.3 | 50.6 ± 1.0 | 55.2 ± 2.5 | 33.6 ± 1.1
RUSBoost_Full | 69.0 ± 1.1 | 73.7 ± 0.7 | 70.7 ± 0.7 | 75.4 ± 1.6 | 51.6 ± 1.7 | 57.7 ± 0.6 | 35.2 ± 1.6
CNN_dualInput | 81.8 ± 0.7 | 72.5 ± 2.3 | 69.3 ± 2.0 | 88.2 ± 0.6 | 75.2 ± 1.3 | 75.1 ± 1.3 | 62.1 ± 1.4
CNN_Spectrogram | 78.7 ± 0.9 | 70.5 ± 3.0 | 66.3 ± 3.1 | 86.2 ± 0.6 | 70.9 ± 1.7 | 69.6 ± 2.6 | 55.9 ± 1.8
CNN_melSpectrogram | 76.9 ± 1.3 | 70.3 ± 2.6 | 66.2 ± 2.4 | 84.7 ± 0.8 | 67.4 ± 2.0 | 66.3 ± 3.9 | 51.4 ± 3.0
Table 15. Performance results obtained with 2 classes (crackle vs. other)—training: variable duration; testing: variable duration.
Classifiers | Accuracy | AUC | F1 Crackle | MCC Crackle | F1 Other | MCC Other
LDA_10MRMR | 68.1 ± 0.2 | 74.7 ± 0.0 | 76.9 ± 0.1 | 49.4 ± 0.3 | 48.4 ± 0.5 | 33.3 ± 0.5
LDA_100MRMR | 70.2 ± 0.3 | 76.3 ± 0.2 | 76.4 ± 0.2 | 52.2 ± 0.5 | 59.7 ± 1.6 | 42.7 ± 1.5
LDA_Full | 68.5 ± 0.7 | 73.4 ± 1.1 | 74.9 ± 1.2 | 49.4 ± 1.1 | 57.5 ± 2.2 | 39.6 ± 2.0
SVMrbf_10MRMR | 68.7 ± 0.2 | 72.2 ± 0.5 | 78.6 ± 0.1 | 52.2 ± 0.3 | 41.4 ± 1.0 | 31.7 ± 0.8
SVMrbf_100MRMR | 72.6 ± 0.5 | 80.1 ± 0.8 | 78.6 ± 0.9 | 56.1 ± 0.9 | 61.8 ± 1.5 | 46.6 ± 0.9
SVMrbf_Full | 71.2 ± 1.3 | 78.6 ± 1.4 | 77.2 ± 1.8 | 53.7 ± 2.0 | 60.6 ± 1.4 | 44.4 ± 1.3
RUSBoost_10MRMR | 69.6 ± 0.3 | 76.0 ± 0.5 | 76.4 ± 0.6 | 51.2 ± 0.4 | 56.9 ± 2.3 | 40.1 ± 1.9
RUSBoost_100MRMR | 71.0 ± 0.7 | 79.7 ± 0.4 | 76.9 ± 0.8 | 53.4 ± 1.1 | 61.0 ± 0.7 | 44.4 ± 1.0
RUSBoost_Full | 69.9 ± 1.3 | 78.6 ± 0.9 | 75.0 ± 1.4 | 52.0 ± 2.0 | 62.4 ± 1.4 | 45.1 ± 2.1
CNN_dualInput | 87.4 ± 1.4 | 84.9 ± 2.3 | 90.5 ± 0.9 | 73.0 ± 2.7 | 81.4 ± 3.0 | 73.0 ± 2.7
CNN_Spectrogram | 86.5 ± 1.3 | 83.8 ± 2.3 | 89.8 ± 0.7 | 70.8 ± 2.5 | 79.9 ± 3.1 | 70.8 ± 2.5
CNN_melSpectrogram | 85.1 ± 1.2 | 81.8 ± 2.0 | 88.9 ± 0.7 | 67.7 ± 2.6 | 77.4 ± 2.9 | 67.7 ± 2.6
Table 16. Performance results obtained with 2 classes (wheeze vs. other)—training: variable duration; testing: variable duration.
Classifiers | Accuracy | AUC Wheeze | F1 Wheeze | MCC Wheeze | F1 Other | MCC Other
LDA_10MRMR | 62.4 ± 0.1 | 62.5 ± 0.1 | 73.9 ± 0.2 | 70.2 ± 0.1 | 32.5 ± 0.1 | 30.0 ± 0.0
LDA_100MRMR | 55.7 ± 0.9 | 60.1 ± 1.4 | 62.2 ± 1.6 | 57.8 ± 1.5 | 46.2 ± 1.6 | 42.3 ± 1.8
LDA_Full | 56.5 ± 1.9 | 59.1 ± 1.7 | 63.8 ± 2.7 | 59.4 ± 2.7 | 45.0 ± 2.7 | 40.9 ± 3.0
SVMrbf_10MRMR | 63.4 ± 0.9 | 63.8 ± 0.4 | 72.5 ± 1.1 | 68.4 ± 1.2 | 45.3 ± 1.0 | 41.6 ± 1.0
SVMrbf_100MRMR | 66.2 ± 0.9 | 68.4 ± 1.6 | 74.6 ± 0.9 | 70.8 ± 1.0 | 49.2 ± 3.1 | 45.8 ± 3.0
SVMrbf_Full | 65.4 ± 1.2 | 68.5 ± 0.7 | 72.0 ± 1.9 | 68.4 ± 1.9 | 54.2 ± 2.3 | 50.8 ± 2.4
RUSBoost_10MRMR | 64.1 ± 1.0 | 67.7 ± 0.7 | 70.6 ± 1.4 | 66.8 ± 1.4 | 53.6 ± 1.8 | 50.1 ± 2.0
RUSBoost_100MRMR | 64.3 ± 1.5 | 68.2 ± 0.5 | 71.1 ± 2.5 | 67.3 ± 2.3 | 53.2 ± 1.8 | 49.8 ± 2.0
RUSBoost_Full | 60.9 ± 2.5 | 65.8 ± 1.9 | 66.8 ± 3.5 | 63.1 ± 3.3 | 52.3 ± 1.4 | 48.9 ± 1.7
CNN_dualInput | 73.2 ± 0.7 | 72.7 ± 1.1 | 78.4 ± 1.0 | 44.0 ± 1.6 | 64.8 ± 1.6 | 44.0 ± 1.6
CNN_Spectrogram | 69.2 ± 1.8 | 66.6 ± 1.5 | 76.0 ± 2.8 | 33.3 ± 2.4 | 56.5 ± 3.1 | 33.3 ± 2.4
CNN_melSpectrogram | 69.9 ± 1.3 | 66.7 ± 1.6 | 76.9 ± 1.6 | 33.6 ± 2.6 | 56.4 ± 2.9 | 33.6 ± 2.6
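The three CNN variants in the tables above differ in their input representation (linear spectrogram, mel spectrogram, or both as a dual input). The sketch below illustrates how such time-frequency inputs could be computed with librosa; the file name and all parameter values are illustrative assumptions, not the settings used in this work.

```python
import numpy as np
import librosa

# Hypothetical pre-segmented event, resampled to 4 kHz for illustration
y, sr = librosa.load("respiratory_event.wav", sr=4000)

# Linear-frequency spectrogram in dB
stft = librosa.stft(y, n_fft=256, hop_length=64)
spec_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)

# Mel-frequency spectrogram in dB
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=256, hop_length=64, n_mels=64)
mel_db = librosa.power_to_db(mel, ref=np.max)

# A dual-input model could then consume both representations, e.g., as two
# separate input branches of the network.
```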