Article

A Personalized Respiratory Disease Exacerbation Prediction Technique Based on a Novel Spatio-Temporal Machine Learning Architecture and Local Environmental Sensor Networks

1 School of Medicine, Stanford University, Stanford, CA 94305, USA
2 The Harker School, San Jose, CA 95129, USA
* Author to whom correspondence should be addressed.
Electronics 2022, 11(16), 2562; https://doi.org/10.3390/electronics11162562
Submission received: 14 July 2022 / Revised: 6 August 2022 / Accepted: 7 August 2022 / Published: 16 August 2022

Abstract

Chronic respiratory diseases, such as Chronic Obstructive Pulmonary Disease (COPD) and asthma, pose a serious health crisis, affecting a large number of people globally and inflicting major costs on the economy. Current methods for assessing the progression of respiratory symptoms are either subjective and inaccurate, or complex and cumbersome, and do not incorporate environmental factors to track individualized risks. Lacking predictive assessments and early intervention, unexpected exacerbations often lead to hospitalizations and high medical costs. This work presents a multi-modal solution for predicting the exacerbation risks of respiratory diseases, such as COPD, based on a novel spatio-temporal machine learning architecture for real-time and accurate respiratory event detection, and tracking of local environmental and meteorological data and trends. The proposed new neural network model blends key attributes of both convolutional and recurrent neural architectures, allowing extraction of the salient spatial and temporal features encoded in respiratory sounds, thereby leading to accurate classification and tracking of symptoms. Combined with the data from environmental and meteorological sensors, and a predictive model based on retrospective medical studies, this solution can assess and provide early warnings of respiratory disease exacerbations, thereby potentially reducing hospitalization rates and medical costs.

1. Introduction

Chronic respiratory diseases affect a large fraction of the world population, with Chronic Obstructive Pulmonary Disease (COPD) affecting 235 million and asthma affecting 339 million people worldwide, according to the World Health Organization [1]. Lacking effective early intervention, COPD and asthma cost over $130 billion annually in the U.S. alone [2].
Existing methods of diagnosing and tracking these disease conditions in clinical practice, including widely used patient questionnaires, are highly variable due to the subjectivity of the definition, perception, and reporting of respiratory events. In fact, many respiratory diseases are often over- or under-diagnosed. Based on the study by Diab et al., approximately 70 percent of COPD cases worldwide may be underdiagnosed, while 30 to 60 percent of those diagnosed with COPD may not have the disease at all [3]. As the treatment of respiratory diseases often requires the prescription of steroids, misdiagnosis can cause serious problems.
Currently, no passive monitoring method exists for accurately predicting the exacerbation of respiratory conditions, which can lead to decreased quality of life and serious complications [4,5]. A number of cough detection methods have been reported, but no accurate real-time tracking technique exists for passive and continuous monitoring. Commonly used methods involve subjective reporting, often leading to frequent and dangerous misdiagnosis [6,7,8]. Besides the respiratory conditions of the patient, environmental factors such as pollen, humidity, air quality, etc., also play a significant role in the disease progression, exacerbations, and hospitalizations [9]. However, currently there is no multi-modal predictive technique that incorporates the trends of both respiratory events and local environmental factors in order to assess the progression of the patient’s conditions.
Thus, there is a strong need for an accurate, real-time, and easily accessible predictive solution for respiratory disease exacerbation, based on monitoring of the patient's respiratory events as well as local environmental and meteorological parameters. Recent advances in connected devices, sensors, data technologies, and machine learning techniques present a significant opportunity to develop respiratory telehealth capabilities, allowing for accurate remote monitoring of patient conditions as well as assessment of potential exacerbations with predictive Artificial Intelligence (AI) models.
This work presents a multi-modal solution for real-time COPD exacerbation prediction that includes a novel spatio-temporal artificial intelligence architecture for cough detection, real-time cough-count and frequency monitoring, analytics of the local environmental and meteorological factors utilizing data from sensor networks, and exacerbation prediction using both respiratory event tracking and environmental conditions based on retrospective medical studies. The goal of this research is to develop an early-warning system based on AI and multi-factor analysis to reduce hospitalizations and medical costs, and demonstrate the feasibility of deploying a passive, continuous, remote patient monitoring and telehealth solution for chronic respiratory diseases.

2. Related Work

Researchers have previously identified that monitoring a patient’s respiratory events can be utilized to assess the patient’s condition [10]. In order to automate this process, a number of cough detection solutions have been proposed [11,12,13,14,15,16,17]. A survey of previously reported techniques, their performances, and their limitations is presented in Table 1. Earlier methods used relatively simple techniques, such as probabilistic statistical models on waveform data [11], but yielded low accuracies. More recent studies have used specialized equipment and complex setups, such as wireless wearable patch sensors [15] or spirometers [17], to achieve relatively better results. However, no single technique simultaneously meets all of the following requirements: high accuracy, efficiency, passive and continuous monitoring, and no need for extra equipment.
With the recent advancements in the field of artificial intelligence, researchers have moved towards exploring solutions based on Deep Neural Networks (DNN). Several researchers have demonstrated the detection of cough with either Convolutional Neural Networks (CNN) or Recurrent Neural Networks (RNN) [13,14]. Traditional CNN models are based on learning and detecting spatial features in the data and are typically used for image-based analysis, whereas RNN models are based on extracting temporal features and are often used for time-sequenced tasks such as speech processing. Since respiratory sounds, when converted to spectrograms, encode key spatial and temporal signatures, neither of the traditional models is well suited for respiratory event classification. This spectrogram technique has been described in sound analysis and detection techniques [18].
Some researchers have recently reported combined Convolutional-Recurrent Neural Networks (CRNN) for acoustic analysis [19,20,21,22,23]. These CRNN models have been shown to work better than CNN and RNN in both image processing and sequence-related tasks [19,22], but these frameworks do not fully utilize the spatial/temporal feature extraction capabilities of CNN/RNN architectures as they are created by simply stacking RNN layers after CNN layers in a sequential manner. The development of machine learning architecture based on deeply meshed spatio-temporal feature learning for respiratory sound classification has not been previously explored.
Medical researchers have also shown that several key environmental and meteorological factors are related to the exacerbations of COPD [9]; however, this research has not been combined with real-time monitoring of respiratory events to develop predictive models for exacerbations.

3. Proposed Work

3.1. Proposed Multi-Modal System Architecture

In this project, a novel multi-modal COPD patient monitoring and exacerbation prediction system has been developed based on real-time analysis and tracking of both respiratory events and environmental factors. As shown in Figure 1, the system architecture consists of three stages: (i) a detection module; (ii) an environmental module; and (iii) a prediction module.
The detection module utilizes a new spatio-temporal machine learning algorithm for accurately detecting coughs from real-time audio and tracking the patient’s cough count and frequency. Simultaneously, the environmental module acquires local environmental and meteorological data from nearby weather stations and sensor networks to calculate the percentage increase of exacerbation risks in any location around the world based on the results of retrospective medical studies. Finally, the prediction module combines the historical cough count data and trends from the detection module and the calculated exacerbation risk increase from the environmental module in order to forecast the progression of the patient’s conditions, and alert the patients and caregivers for early interventions. Such a tool will help those with respiratory diseases better assess their current conditions and environmental outlook to make more informed decisions for their health and safety [24,25].

3.2. Detection Module

The detection module, as shown on the left-hand side of the system architecture diagram in Figure 1, consists of a new AI model for real-time detection and tracking of cough. As described earlier, previously reported models for respiratory sound analysis are based on the traditional convolutional, recurrent, or the more recent convolutional-recurrent structures. In this project, a new machine learning algorithm has been developed that incorporates a novel hybrid framework by deeply meshing convolutional and recurrent architectures, enabling more efficient extraction and analysis of spatio-temporal features, leading to better accuracies for classifying and tracking respiratory events.
The following subsections describe the new spatio-temporal machine learning framework for classifying and tracking respiratory events, creation of the dataset to train and test the model, the results of benchmarking the proposed model with traditional neural network architectures, and a live demonstration application showcasing the capability of real-time classification of respiratory sounds.

3.2.1. A New Machine Learning Architecture for Respiratory Sound Analysis

The new AI model, henceforth referred to as the Spatio-Temporal Artificial Intelligence Network (STAIN), interweaves convolutional neural network models within a recurrent neural network architecture, allowing for sequential image analysis over the time domain. The architecture of the STAIN framework is shown in Figure 2. First, the respiratory sound files are converted to corresponding spectrogram images by performing Fast Fourier Transforms. The resulting spectrogram is split into 200 millisecond slices, which are used as inputs for the machine learning model.
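This preprocessing step can be sketched as follows. The FFT window parameters and the 16 kHz sample rate below are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np
from scipy.signal import spectrogram

def audio_to_slices(audio, sr=16000, slice_ms=200):
    """Convert 1-D audio into a log spectrogram and split it into
    fixed-duration slices along the time axis."""
    freqs, times, spec = spectrogram(audio, fs=sr, nperseg=256, noverlap=128)
    spec = np.log1p(spec)  # log scale, as is common for audio spectrograms
    frames_per_slice = max(1, round((slice_ms / 1000) / (times[1] - times[0])))
    n_slices = spec.shape[1] // frames_per_slice
    return [spec[:, i * frames_per_slice:(i + 1) * frames_per_slice]
            for i in range(n_slices)]

# 3 s of synthetic audio at 16 kHz -> roughly fifteen 200 ms slices
slices = audio_to_slices(np.random.randn(3 * 16000))
```

Each element of `slices` is one spectrogram image that becomes the input to one recurrent step of the model.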
As illustrated in Figure 2, the machine learning model architecture incorporates a hybrid network based on a deep mesh integration of convolutional and recurrent architectures to learn spatio-temporal features. The STAIN framework includes a CNN model that evaluates each audio slice and outputs its predicted confidence. The CNN architecture is a variation of Yann Lecun’s seminal LeNet model [26], which can flexibly adapt to any image dataset. Specifically, it consists of two groupings of Convolutional Layers with 2 × 2 kernels and 2 × 2 Maximum Pooling Layers, each followed by a Rectified Linear Unit (ReLU) activation function. Then, the resulting data are flattened into a one-dimensional array before being fed into two Fully Connected (Dense) Layers that reduce the number of neurons down to just one. The final output is then passed through a Sigmoid Layer to obtain a value between (0, 1).
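A minimal PyTorch sketch of such a LeNet-style slice classifier is shown below; the channel counts, hidden width, and 56 × 56 input size are assumptions for illustration, not the authors' exact configuration:

```python
import torch
import torch.nn as nn

class SliceCNN(nn.Module):
    """LeNet-style slice classifier as described in the text: two groups of
    2x2 convolution, 2x2 max pooling, and ReLU, then two fully connected
    layers down to one neuron and a sigmoid. Channel counts and the
    56 x 56 input size are illustrative assumptions."""
    def __init__(self, in_hw=56):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=2), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=2), nn.MaxPool2d(2), nn.ReLU())
        with torch.no_grad():  # infer the flattened size for this resolution
            n = self.features(torch.zeros(1, 1, in_hw, in_hw)).numel()
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(n, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, x):
        return self.classifier(self.features(x))

p = SliceCNN()(torch.randn(4, 1, 56, 56))  # confidences for 4 slices
```

The dummy forward pass in `__init__` lets the same class adapt to any input resolution, mirroring the "flexibly adapt to any image dataset" property noted above.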
The CNNs analyzing separate parts of the input image enable spatial feature extraction, while the Encoders passing down compressed inputs as the RNN’s hidden variables enable temporal feature extraction. Various designs for the Encoder have been explored; the selected one has a simple architecture consisting of a single Maximum Pooling layer, shrinking the input into a hidden variable. The compressed output of the Encoder is extended along its longer dimension (from 56 to 256 through resizing) so that it can be concatenated to the next spectrogram slice, and the result is sent through both the CNN and the Encoder again.
Effectively, each slice of the spectrogram image is assigned to an RNN unit, wherein a CNN generates an output and the Encoder generates the hidden data. Each output represents the probability of a cough during that slice. The hidden outputs carry on information from previous slices and are concatenated to the next slice. The final output is the maximum of all the outputs from all slices. That is, if even one slice’s output is high, indicating a cough, while all the others are low, indicating no cough, the final overall classification of the audio file is positive for coughing.
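The recurrence described above can be sketched in PyTorch as follows. This is a simplified illustration, not the authors' exact implementation: the layer widths, the 56-pixel slice height, and the stand-in CNN are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STAINSketch(nn.Module):
    """Simplified sketch of the STAIN recurrence: each spectrogram slice
    is concatenated with the previous hidden state, scored by a small CNN,
    and a max-pooling Encoder produces the next hidden state."""
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(              # stand-in per-slice classifier
            nn.Conv2d(1, 4, 2), nn.MaxPool2d(2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(1), nn.Sigmoid())
        self.encode = nn.MaxPool2d(2)          # Encoder: one max-pooling layer

    def forward(self, slices):                 # slices: list of (1, 1, 56, W)
        hidden = torch.zeros_like(slices[0])
        outs = []
        for s in slices:
            x = torch.cat([s, hidden], dim=3)  # append hidden along time axis
            outs.append(self.cnn(x))           # per-slice cough confidence
            h = self.encode(x)                 # compress into hidden state
            hidden = F.interpolate(h, size=s.shape[2:])  # resize to slice size
        return torch.max(torch.stack(outs))    # positive iff any slice fires

out = STAINSketch()([torch.randn(1, 1, 56, 25) for _ in range(3)])
```

The final `torch.max` implements the decision rule above: the file is classified as a cough if any single slice fires.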
The model is trained by minimizing the Binary-Cross Entropy loss, defined in Equation (1), where L ( y ) is the final loss function, y is the set of labels (1 for cough, 0 for no-cough), p ( y ) is the probability of the given label, and N is the total number of entries.
L(y) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p(y_i)) + (1 - y_i) \log(1 - p(y_i)) \right].
All the code in this project was written in Python, and the machine learning models were implemented using the PyTorch library.
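Equation (1) can be checked numerically on a toy batch; the probabilities and labels below are illustrative, and in this project the equivalent quantity is computed by PyTorch's built-in binary cross-entropy loss:

```python
import numpy as np

# Equation (1) evaluated directly on a toy batch of three entries.
p = np.array([0.9, 0.2, 0.7])   # model outputs p(y_i)
y = np.array([1.0, 0.0, 1.0])   # labels (1 = cough, 0 = no-cough)
loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```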

3.2.2. Creation of the Dataset

In order to train and evaluate the proposed STAIN machine learning model, as well as benchmark it against other state-of-the-art models including CNN, RNN, and CRNN, an augmented dataset of audio segments was created and partitioned into 10,000 training files with coughs, 10,000 training files without coughs, 1000 testing files with coughs, and 1000 testing files without coughs. The models were trained only on the 20,000 training files and tested only on the 2000 testing files in order to objectively evaluate and compare the performance of the various models.
First, roughly 500 cough sound files were downloaded from the Free Sound Database (FSD) from Kaggle’s audio tagging competition [27] and every file was adjusted to only contain either a cough burst or coughing fit. The cough files were sufficiently diverse, containing many variations of coughs from individuals of both genders and from a wide range of ages (from babies to elderly). Each file also has its unique recording quality, mimicking the varying degrees of audio quality from different devices.
In order to augment the data, the rest of the audio files from Kaggle’s FSD were utilized. To create an augmented audio file, an empty audio file is created with a duration randomly chosen between 2 s and 5 s. Then, using the PyDub Library, a randomly chosen number of non-cough files from the FSD are superimposed on the targeted augmented file. Each of the added audio files are placed at a randomly chosen timestamp, with audio exceeding the augmented files trimmed off. The result of this process creates an augmented audio file categorized as “No Cough”. To turn it into a “Cough” file, one of the cough files from the FSD is added in a similar fashion. Additionally, each added file’s decibel gain is randomized to simulate sounds from varying distances.
For example, the audio augmentation program could superimpose files of bell chimes, people talking, rain ambience, and whistle blowing from the FSD at different loudness levels and time locations in the augmented audio (sounds could start before the file begins or end after it ends). By itself, this audio would be placed in the “No Cough” dataset. If the program were to add an uninterrupted coughing noise on top of that augmented audio at the beginning or middle of the file, it would be placed in the “Cough” dataset. The coughing noise is kept reasonably louder than any individual background sound to ensure that the model will not pick up softer, distant coughs or cough-like noises.
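The augmentation recipe above can be sketched as follows. The project uses PyDub; for a self-contained illustration, plain NumPy mixing is shown instead, and all durations, gains, and the 6x cough boost are assumptions:

```python
import random
import numpy as np

def make_augmented(background_clips, cough=None, sr=16000):
    """Sketch of the augmentation recipe: start from a silent file 2-5 s
    long, superimpose background clips at random gains and positions, and
    optionally add a louder cough clip to produce a "Cough" example."""
    out = np.zeros(random.randint(2 * sr, 5 * sr))
    clips = list(background_clips)
    if cough is not None:
        clips.append(6.0 * cough)           # cough kept louder than background
    for clip in clips:
        gain = random.uniform(0.2, 1.0)     # simulate varying distances
        start = random.randrange(len(out))  # random placement in time
        seg = clip[:len(out) - start]       # trim audio exceeding the file
        out[start:start + len(seg)] += gain * seg
    return out

bg = [np.random.randn(8000) for _ in range(3)]
no_cough = make_augmented(bg)                            # "No Cough" example
cough = make_augmented(bg, cough=np.random.randn(4000))  # "Cough" example
```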

3.3. Environmental Module

While the detection module presented in the previous section tracks real-time cough frequency for patient-specific analysis, the environmental module offers local area-wide environmental and meteorological factor analysis. By examining certain environmental indicators, a patient’s increase of COPD exacerbation likelihood can be determined.
Breathing air quality is one of the most crucial factors in human health; poor air quality can cause any person’s health to significantly deteriorate and is an increasingly important issue following the advent of rapid industrialization. Because their lungs are compromised due to inflammation, COPD patients are especially susceptible to exacerbations caused by bad air quality. A seminal retrospective study analyzed hospitalization and exacerbation rates for COPD patients as functions of local environmental and meteorological factors, including the concentration of fine particulate matter (where PM x refers to particles or droplets present in the air that are x micrometers or less in width), NO 2 , and temperature variations [9,28]. These medical studies established that the percentage exacerbation risk increases are directly proportional to PM 2.5 and PM 10 levels, NO 2 concentrations, and temperature variations. The details of the findings are outlined in Table 2, with each increase/decrease of the “Rate” from the “Safety Standards” constituting an additional “Risk Increase Coefficient” for exacerbations.
Based on the results of these retrospective medical studies, an equation has been formulated in this project to estimate the percentage exacerbation risk increase using the four environmental and meteorological parameters in the patient’s location. If a factor falls below the threshold standard, its contribution to the final risk percentage is zero; otherwise, it follows the formula as shown in Equation (2).
R = 1\% \cdot \frac{[\mathrm{PM}_{2.5}] - 12\ \mu\mathrm{g/m^3}}{10\ \mu\mathrm{g/m^3}} + 0.8\% \cdot \frac{[\mathrm{PM}_{10}] - 7\ \mu\mathrm{g/m^3}}{10\ \mu\mathrm{g/m^3}} + 2\% \cdot \frac{[\mathrm{NO}_2] - 101.23\ \mu\mathrm{g/m^3}}{10\ \mu\mathrm{g/m^3}} + 4.7\% \cdot \frac{68.0\,^\circ\mathrm{F} - T_F}{1.8\,^\circ\mathrm{F}}.
Equation (2) estimates the increase in COPD exacerbation risks as a function of environmental and meteorological factors (PM 2.5 , PM 10 , NO 2 , and T F for Temperature), derived based on the retrospective medical studies [9,28].
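Equation (2) translates directly into a small helper function; this is a sketch that applies the zero-below-threshold rule described above:

```python
def exacerbation_risk_increase(pm25, pm10, no2, temp_f):
    """Equation (2) as a helper function: percentage increase in COPD
    exacerbation risk. A factor below its threshold (or, for temperature,
    above it) contributes zero, as described in the text."""
    risk = 0.0
    risk += 1.0 * max(0.0, (pm25 - 12.0) / 10.0)    # PM2.5 in ug/m^3
    risk += 0.8 * max(0.0, (pm10 - 7.0) / 10.0)     # PM10 in ug/m^3
    risk += 2.0 * max(0.0, (no2 - 101.23) / 10.0)   # NO2 in ug/m^3
    risk += 4.7 * max(0.0, (68.0 - temp_f) / 1.8)   # temperature in deg F
    return risk                                     # percent risk increase

baseline = exacerbation_risk_increase(12.0, 7.0, 101.23, 68.0)  # thresholds
```

At the exact threshold values every term vanishes, so `baseline` is zero; each 10 μg/m³ of PM 2.5 above 12 μg/m³ then adds one percentage point, and so on.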
In order to generate a real-time risk map representing the exacerbation risk increase for an individual given the environmental factors in the patient’s location, the environmental and climatological data measured by sensors deployed by PurpleAir, accessible via an open-source database [31], and NO 2 readings from the World Air Quality Index (WAQI) data platform [32], have been incorporated into the above equation and overlaid on the geographical map of the region. Moreover, a linear extrapolation method has been implemented using the SciPy library to estimate the data at a specific location from the sensors deployed in adjacent areas; it generates a 2D heatmap from a set of sensor locations and readings by adjusting each heatmap location such that values between sensors form a linear gradient. As an example, Figure 3 shows the data map for PM 2.5 , PM 10 , Temperature, and NO 2 from over 6000 sensors in the San Francisco Bay Area. As a spot check of the data, Figure 4 shows the PM 2.5 concentrations recorded by the PurpleAir sensors in the Irvine and San Jose areas during the first half of September 2020. The onsets of spikes on 6 September and 10 September correspond to the El Dorado Fire and the SCU Lightning Fire events, respectively.
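The sensor-to-heatmap interpolation step can be sketched with SciPy's `griddata`; the four sensor positions and PM 2.5 readings below are made up for illustration:

```python
import numpy as np
from scipy.interpolate import griddata

# Scattered sensor readings are interpolated onto a regular grid so that
# values between sensors form a linear gradient, as described in the text.
sensors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
pm25 = np.array([10.0, 20.0, 30.0, 40.0])   # hypothetical readings, ug/m^3
gx, gy = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
heatmap = griddata(sensors, pm25, (gx, gy), method="linear")
```

Each grid cell of `heatmap` can then be fed through the risk equation and colored on the geographical map.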

3.4. Prediction Module

Finally, the prediction module combines the results of the respiratory sound analysis from the detection module and the environmental and meteorological factors analysis from the environmental module to forecast a patient’s expected conditions.
Previously reported medical research studies have determined average cough frequencies for COPD-affected smokers, affected ex-smokers, healthy smokers, and healthy non-smokers [8,28]. Thus, by extrapolating the progression in cough frequency as determined by the STAIN machine learning model and exacerbation risk increase from environmental factors from the data trends, a patient’s expected condition is determined.
This method is illustrated in Figure 5. First, based on the continuous respiratory event classifications performed by the STAIN machine learning model within the detection module, a best-fit curve derived using the SciPy library is created to determine the patient’s cough frequency trend. A patient’s cough frequency, measured in coughs per hour, is derived by monitoring the patient throughout the day and dividing the total number of coughs (one for each audio file with a positive slice output from STAIN) detected over 24 h. Here, a cough event is represented by a period of continuous coughing (positive cough classification) surrounded by periods without coughing. Next, the future exacerbation risks are derived from the extrapolated cough frequency data and the increased risks due to environmental and meteorological factors, using the correlations established by the retrospective medical studies explained in the previous section. If the prediction module forecasts that acceptable threshold levels will be exceeded, the patient and caregivers are alerted of the imminent exacerbation for necessary early medical intervention, thereby improving the patient’s quality of life and potentially saving hospitalization costs.
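The trend-extrapolation step can be sketched with SciPy curve fitting, as used in this work; the linear model and the week of sample cough-frequency data below are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

# Fit a best-fit curve to recent daily cough-frequency readings and
# extrapolate it forward to forecast the patient's trend.
days = np.arange(7.0)
coughs_per_hour = np.array([8.0, 8.5, 9.1, 9.8, 10.2, 11.0, 11.6])

def linear(t, a, b):
    return a * t + b

(a, b), _ = curve_fit(linear, days, coughs_per_hour)
forecast_day_10 = linear(10.0, a, b)  # extrapolated cough frequency
```

A forecast crossing a clinically chosen threshold would trigger the alert described above.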

4. Results

4.1. Benchmarking

Using the dataset described in the previous section, rigorous evaluations of the four different AI models were performed. The results of these analyses are shown in Table 3 and Figure 6, which present the following performance metrics: sensitivity (SE), specificity (SP), accuracy (ACC), Matthews Correlation Coefficient (MCC), and the confusion matrices. The first three metrics are expressed as percentages from 0% to 100%, while the Matthews Correlation Coefficient is a number between −1 and 1, with higher values representing greater model robustness. Equation (3) details the formulas used to derive these numerical metrics, where TP is the number of true-positive classifications, TN is the number of true-negatives, FP is the number of false-positives, and FN is the number of false-negatives.
The networks evaluated in Table 3 include a standard LeNet-5 CNN model [26]; an RNN with a similar method of processing slices, but using a simple hidden layer instead of the CNN-Encoder combination; a CRNN containing three iterations of convolutional and max-pooling layers feeding into a recurrent Long Short-Term Memory (LSTM) layer and then fully connected layers that provide the output; and the spatio-temporal STAIN network. All models were trained with the same hyperparameters: a learning rate of 0.002 and a momentum of 0.9.
\mathrm{SE} = \frac{TP}{TP + FN} \qquad \mathrm{SP} = \frac{TN}{TN + FP} \qquad \mathrm{ACC} = \frac{TP + TN}{TP + FP + TN + FN} \qquad \mathrm{MCC} = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
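These metrics can be computed from confusion-matrix counts as follows, using the standard definitions of sensitivity, specificity, accuracy, and MCC; the counts in the usage line are illustrative, not values from Table 3:

```python
import math

def metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, and Matthews Correlation
    Coefficient from confusion-matrix counts (standard definitions)."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + fp + tn + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return se, sp, acc, mcc

se, sp, acc, mcc = metrics(tp=950, fp=40, tn=960, fn=50)
```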
As these results illustrate, compared to RNN’s temporal feature analysis, CNN’s spatial analysis was better suited for classifying spectrograms. CRNN, created by simply stacking the CNN and RNN components, could not bring out the best of both architectures, performing worse than CNN. In contrast, the proposed new machine learning model, STAIN, performed better than all other models using its architecture for deeply meshed spatio-temporal feature analysis.

4.2. Demonstration of the Detection Module

A live demo application for the real-time cough detection module has been developed. This application, running on a laptop computer, captures user-generated sounds using the built-in microphones of the computer, converts the sound files into spectrogram images, processes the data through the STAIN machine learning model, and classifies and tracks the cough count and cough frequency over time. The results are presented on the computer screen with a live display of the spectrogram images corresponding to the sound, superimposed with the classification results of the cough events. Figure 7 shows representative screenshots of the application running in real time, correctly classifying talking, clapping, page flipping, music, burps, and sneezes as “No Cough” (left-hand side of Figure 7), while successfully detecting cough events superimposed on the same background sound environments (right-hand side of Figure 7).

5. Conclusions

In summary, a multi-modal technique has been developed for predicting the exacerbation risks for respiratory diseases such as COPD, based on a new artificial intelligence model for respiratory sound analysis and retrospective medical studies correlating key environmental parameters to exacerbations. The proposed solution includes a novel spatio-temporal machine learning model for accurate real-time classification and monitoring of respiratory conditions, tracking of local environmental and meteorological factors with commercially deployed sensors, and forecasting the patient’s progression of conditions by combining the trends derived from these two modules.
The proposed new spatio-temporal artificial intelligence network architecture deeply meshes the salient structures of both convolutional and recurrent neural networks, and as a result outperforms both traditional CNN and RNN models, as well as the more recent CRNN models, in extracting the spatial and temporal features that are inherent in spectrograms of respiratory sounds. Extensive comparative tests have been performed to demonstrate that the new model achieves better sensitivity, specificity, accuracy, and Matthews Correlation Coefficient metrics than the traditional machine learning models.
A telehealth solution based on this work can assess the exacerbation risks and alert patients and doctors of early medical intervention, medication, and impending hospitalization. Thus, this technique can conveniently and cost-effectively help minimize and mitigate the impact of respiratory exacerbations, therefore improving patients’ quality of life and potentially reducing hospitalization costs [33,34].

Author Contributions

R.T.B. is responsible for the conception, methodology and software development, data collection and processing, results analysis, and manuscript preparation. S.P.M. is responsible for the research design, results interpretation, medical analysis, and supervision of the project. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Datasets used include: Kaggle Free Sound Dataset (FSD) 2018.

Acknowledgments

The authors would like to express sincere gratitude to Chris Spenner, Kailas Vodrahalli, Archelle Georgiou, Sridhar Nemala, and Krishna Vastare for their inputs.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
COPD  Chronic Obstructive Pulmonary Disease
AI    Artificial Intelligence
TP    True Positives
FP    False Positives
TN    True Negatives
FN    False Negatives
SE    Sensitivity
SP    Specificity
ACC   Accuracy
MCC   Matthews Correlation Coefficient
PNN   Probabilistic Neural Network
CNN   Convolutional Neural Network
RNN   Recurrent Neural Network
CRNN  Convolutional-Recurrent Neural Network
STAIN Spatio-Temporal Artificial Intelligence Network

References

  1. Forum of International Respiratory Societies. The Global Impact of Respiratory Disease, 2nd ed.; European Respiratory Society: Sheffield, UK, 2017. [Google Scholar]
  2. Syamlal, G.; Bhattacharya, A.; Dodd, K.E. Medical Expenditures Attributed to Asthma and Chronic Obstructive Pulmonary Disease Among Workers—United States, 2011–2015. Morb. Mortal. Wkly. Rep. 2020, 69, 809–814. [Google Scholar] [CrossRef] [PubMed]
  3. Diab, N.; Gershon, A.S.; Sin, D.D.; Tan, W.C.; Bourbeau, J.; Boulet, L.P.; Aaron, S.D. Underdiagnosis and Overdiagnosis of Chronic Obstructive Pulmonary Disease. Am. J. Respir. Crit. Care Med. 2018, 198, 1130–1139. [Google Scholar] [CrossRef] [PubMed]
  4. Christenson, S.A.; Smith, B.M.; Bafadhel, M.; Putcha, N. Chronic obstructive pulmonary disease. Lancet 2022, 399, 2227–2242. [Google Scholar] [CrossRef]
  5. Camac, E.R.; Stumpf, N.A.; Voelker, H.K.; Criner, G.J. Short-Term Impact of the Frequency of COPD Exacerbations on Quality of Life. Chronic Obstr. Pulm. Dis. 2022, 9, 298–308. [Google Scholar] [CrossRef]
  6. Tomasic, I.; Tomasic, N.; Trobec, R.; Krpan, M.; Kelava, T. Continuous remote monitoring of COPD patients—Justification and explanation of the requirements and a survey of the available technologies. Med. Biol. Eng. Comput. 2018, 56, 547–569. [Google Scholar] [CrossRef]
  7. Bentsen, S.B.; Rustøen, T.; Miaskowski, C. Differences in subjective and objective respiratory parameters in patients with chronic obstructive pulmonary disease with and without pain. Int. J. Chronic Obstr. Pulm. Dis. 2012, 7, 137–143. [Google Scholar] [CrossRef]
  8. Ho, T.; Cusack, R.P.; Chaudhary, N.; Satia, I.; Kurmi, O.P. Under- and over-diagnosis of COPD: A global perspective. Breathe 2019, 15, 24–35. [Google Scholar] [CrossRef]
  9. De Miguel-Díez, J.; Hernández-Vázquez, J.; López-de-Andrés, A.; Álvaro-Meca, A.; Hernández-Barrera, V.; Jiménez-García, R. Analysis of environmental risk factors for chronic obstructive pulmonary disease exacerbation: A case-crossover study (2004–2013). PLoS ONE 2019, 14, e0217143. [Google Scholar] [CrossRef]
  10. Smith, J.; Woodcock, A. Cough and its importance in COPD. Int. J. Chronic Obstr. Pulm. Dis. 2006, 1, 305–314. [Google Scholar] [CrossRef]
  11. Barry, S.J.; Dane, A.D.; Morice, A.H.; Walmsley, A.D. The automatic recognition and counting of cough. Cough 2006, 2, 8. [Google Scholar] [CrossRef]
  12. Liu, J.M.; You, M.; Wang, Z.; Li, G.Z.; Xu, X.; Qiu, Z. Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2014): Medical Informatics and Decision Making. BMC Med. Inform. Decis. Mak. 2015, 15 (Suppl. 4), S2. [Google Scholar] [CrossRef]
  13. Wang, H.H.; Liu, J.M.; You, M.Y.; Li, G.Z. Audio signals encoding for cough classification using convolutional neural networks: A comparative study. In Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, Washington, DC, USA, 9–12 November 2015; pp. 442–445. [Google Scholar] [CrossRef]
  14. Amoh, J.; Odame, K. Deep Neural Networks for Identifying Cough Sounds. IEEE Trans. Biomed. Circuits Syst. 2016, 10, 1003–1011. [Google Scholar] [CrossRef] [PubMed]
  15. Elfaramawy, T.; Fall, C.L.; Morissette, M.; Lellouche, F.; Gosselin, B. Wireless respiratory monitoring and coughing detection using a wearable patch sensor network. In Proceedings of the 15th IEEE International New Circuits and Systems Conference, Strasbourg, France, 25–28 June 2017; pp. 197–200. [Google Scholar] [CrossRef]
  16. Drugman, T.; Urbain, J.; Dutoit, T. Objective study of sensor relevance for automatic cough detection. In Proceedings of the 19th European Signal Processing Conference, Barcelona, Spain, 29 August–2 September 2011; pp. 1289–1293. [Google Scholar]
  17. Soliński, M.; Lepek, M.; Koltowski, L. Automatic cough detection based on airflow signals for portable spirometry system. arXiv 2019, arXiv:1903.03588. [Google Scholar] [CrossRef]
  18. Mesaros, A.; Heittola, T.; Virtanen, T.; Plumbley, M.D. Sound Event Detection: A tutorial. IEEE Signal Process. Mag. 2021, 38, 67–83. [Google Scholar] [CrossRef]
  19. Çakır, E.; Parascandolo, G.; Heittola, T.; Huttunen, H.; Virtanen, T. Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection. IEEE/ACM Trans. Audio Speech Lang. Process. 2017, 25, 1291–1303. [Google Scholar] [CrossRef]
  20. Sang, J.; Park, S.; Lee, J. Convolutional Recurrent Neural Networks for Urban Sound Classification Using Raw Waveforms. In Proceedings of the 26th European Signal Processing Conference, Rome, Italy, 3–7 September 2018; pp. 2444–2448. [Google Scholar] [CrossRef]
  21. Deshmukh, S.; Raj, B.; Singh, R. Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection. arXiv 2020, arXiv:2008.07085. [Google Scholar]
22. Sorana. A Short Intuitive Explanation of Convolutional Recurrent Neural Networks. Analytics Vidhya, 2020. Available online: https://www.analyticsvidhya.com/blog/2020/11/a-short-intuitive-explanation-of-convolutional-recurrent-neural-networks/ (accessed on 1 February 2021).
  23. Sanjeevan, K.; Hung, T. UrbanSound Classification Using Convolutional Recurrent Networks in PyTorch. 2020. Available online: https://github.com/ksanjeevan/crnn-audio-classification (accessed on 1 February 2021).
  24. Parikh, S.; Henderson, K.; Gondalia, R.; Kaye, L.; Remmelink, E.; Thompson, A.; Barrett, M. Perceptions of Environmental Influence and Environmental Information-Seeking Behavior among People with Asthma and COPD. Front. Digit. Health 2022, 4, 748400. [Google Scholar] [CrossRef]
  25. Patel, N.; Kinmond, K.; Jones, P.; Birks, P.; Spiteri, M.A. Validation of COPDPredict™: Unique Combination of Remote Monitoring and Exacerbation Prediction to Support Preventative Management of COPD Exacerbations. Int. J. Chron. Obstruct. Pulmon. Dis. 2021, 16, 1887–1899. [Google Scholar] [CrossRef]
  26. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
27. Kaggle Freesound Audio Tagging Challenge (FSD). 2018. Available online: https://www.kaggle.com/c/freesound-audio-tagging (accessed on 1 February 2021).
  28. Jo, E.J.; Song, W.J. Environmental triggers for chronic cough. Asia Pac. Allergy 2019, 9, e16. [Google Scholar] [CrossRef]
  29. The United States Environmental Protection Agency. National Ambient Air Quality Standards for Particle Pollution. Available online: https://www.epa.gov/sites/production/files/2016-04/documents/2012_aqi_factsheet.pdf (accessed on 1 February 2021).
  30. The United States Environmental Protection Agency. NAAQS Table. Available online: https://www.epa.gov/criteria-air-pollutants/naaqs-table (accessed on 1 February 2021).
  31. PurpleAir. Real-Time Air Quality Monitoring. Available online: https://www2.purpleair.com/ (accessed on 1 February 2021).
  32. World Air Quality Index: Real-Time Air Pollution. Available online: https://waqi.info/ (accessed on 1 February 2021).
33. Khoshrounejad, F.; Hamednia, M.; Mehrjerd, A.; Pichaghsaz, S.; Jamalirad, H.; Sargolzaei, M.; Hoseini, B.; Aalaei, S. Telehealth-Based Services During the COVID-19 Pandemic: A Systematic Review of Features and Challenges. Front. Public Health 2021, 9, 711762. [Google Scholar] [CrossRef] [PubMed]
  34. Gajarawala, S.N.; Pelkowski, J.N. Telehealth Benefits and Barriers. J. Nurse Pract. 2021, 17, 218–221. [Google Scholar] [CrossRef] [PubMed]
Figure 1. This flowchart represents the proposed system architecture for real-time multi-modal exacerbation prediction. The detection module depicts the respiratory event analysis system using a novel spatio-temporal artificial intelligence neural network. The environmental module depicts the disease exacerbation risk analysis system using local environmental factors. The prediction module takes the respiratory event data and trends from the detection module and exacerbation risk increase from the environmental module to predict future exacerbations and provide necessary alerts for early intervention.
Figure 2. Architecture for the new machine learning model, which is referred to as the spatio-temporal artificial intelligence network (STAIN). This proposed AI model deeply blends the elements of both convolutional and recurrent neural networks, and effectively learns both spatial and temporal features encoded within the respiratory sound spectrograms for accurate classifications.
Figure 3. Data maps for the relevant environmental and meteorological factors (PM2.5, PM10, NO2, and temperature), obtained from the sensors deployed by PurpleAir and the WAQI data platform. An extrapolation method was used to estimate the data in areas with sparse sensor coverage.
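The extrapolation used to fill areas with sparse sensor coverage can be sketched with inverse-distance weighting; the paper does not reproduce its exact method here, so the `idw_estimate` helper, the sensor coordinates, and the readings below are illustrative assumptions:

```python
import numpy as np

def idw_estimate(query, sensor_coords, sensor_values, power=2.0, eps=1e-12):
    """Estimate a pollutant level at `query` (lat, lon) by inverse-distance
    weighting of nearby sensor readings: nearer sensors weigh more."""
    coords = np.asarray(sensor_coords, dtype=float)
    values = np.asarray(sensor_values, dtype=float)
    d = np.linalg.norm(coords - np.asarray(query, dtype=float), axis=1)
    if np.any(d < eps):                 # query coincides with a sensor
        return float(values[np.argmin(d)])
    w = 1.0 / d**power
    return float(np.sum(w * values) / np.sum(w))

# Three hypothetical PM2.5 sensors (ug/m3) around a query point
sensors = [(37.33, -121.89), (37.35, -121.95), (37.30, -121.90)]
readings = [12.0, 35.0, 20.0]
print(idw_estimate((37.32, -121.91), sensors, readings))
```

The estimate is always bounded by the minimum and maximum sensor readings, which keeps the interpolated maps free of spurious extremes.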
Figure 4. Sensors deployed by PurpleAir in Irvine and San Jose showed that the PM2.5 concentration spiked to dangerous levels during the 2–13 September 2020 fire season. The onsets of the spikes on 6 September and 10 September correspond to the El Dorado Fire and the SCU Lightning Fire events.
Figure 5. Illustration of the procedures implemented within the prediction module, which forecasts the expected progression of the patient's condition in the days ahead. This final step in the multi-modal architecture combines the respiratory sound analysis performed by the machine learning model of the detection module with the analysis of environmental and meteorological factors and trends conducted by the environmental module. By projecting future cough frequencies from a best-fit curve, and adjusting them for the exacerbation risks predicted from the environmental and meteorological data, the system can alert the patient and caregivers of imminent risks and prompt early medical intervention to potentially reduce hospitalization costs. A green circle represents a day where the adjusted cough frequency is below the threshold, orange a day when only the adjusted frequency exceeds the threshold, and red a day when the original cough frequency itself exceeds the threshold. Orange and red are “danger” zones.
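The procedure in Figure 5 can be sketched in a few lines; the linear best fit, the `forecast_zones` helper, and the example numbers below are illustrative assumptions rather than the published implementation:

```python
import numpy as np

def forecast_zones(daily_coughs, risk_increase, threshold, horizon=3):
    """Project cough frequency with a linear best fit, inflate it by the
    environmental risk increase, and classify each future day.
    Zones: 'green' (adjusted below threshold), 'orange' (only the adjusted
    frequency exceeds it), 'red' (raw projection already exceeds it)."""
    days = np.arange(len(daily_coughs))
    slope, intercept = np.polyfit(days, daily_coughs, 1)  # best-fit line
    zones = []
    for d in range(len(daily_coughs), len(daily_coughs) + horizon):
        raw = slope * d + intercept
        adjusted = raw * (1.0 + risk_increase)            # environmental uplift
        if raw > threshold:
            zones.append('red')
        elif adjusted > threshold:
            zones.append('orange')
        else:
            zones.append('green')
    return zones

# Rising cough counts over a week, 10% environmental risk increase
print(forecast_zones([8, 9, 9, 10, 11, 12, 13], risk_increase=0.10, threshold=15))
# → ['green', 'orange', 'red']
```

The first day an orange or red zone appears is the natural point at which to raise an early-intervention alert.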
Figure 6. Confusion matrices, created using Matplotlib, for the (A) CNN, (B) RNN, (C) CRNN, and (D) proposed STAIN machine learning models. The new STAIN architecture outperforms the traditional neural network architectures for more accurate cough detection.
Figure 7. Screenshots of the live demonstration of the cough detection module based on the new spatio-temporal machine learning model. The real-time application, implemented on a laptop computer, captures user-generated sounds using its integrated microphones, converts the sound into spectrogram images, processes them through the STAIN model to detect the presence of cough, and displays the results on the screen. The x-axis of the spectrogram represents time and the y-axis represents frequency, such that each pixel represents the intensity of sound of a certain frequency at a specific time; lower intensities are rendered as black to purple pixels, and higher intensities as red to yellow. The text in each subfigure’s top-left corner describes the sounds played within the spectrogram, and the top-right text details the network prediction and confidence.
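The sound-to-spectrogram conversion described above can be sketched with a plain short-time Fourier transform; the frame size, hop length, and `log_spectrogram` helper are assumptions for illustration, not the application's exact parameters:

```python
import numpy as np

def log_spectrogram(audio, n_fft=256, hop=128):
    """Short-time Fourier transform magnitude on a dB-like log scale,
    shaped (freq_bins, time_frames): rows are frequency, columns time,
    matching the axes described for Figure 7."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(audio) - n_fft + 1, hop):
        seg = audio[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(seg)))
    spec = np.array(frames).T
    return 20 * np.log10(spec + 1e-10)   # log scale: pixel intensity in dB

# One second of a synthetic 440 Hz tone at a 16 kHz sample rate
sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)
spec = log_spectrogram(audio)
print(spec.shape)   # (129, 124): 129 frequency bins, 124 time frames
```

For the pure tone, the energy concentrates in the frequency bin nearest 440 Hz (bin ≈ 440 / (16000/256) ≈ 7), which is exactly the kind of spatial structure the detection network reads off the spectrogram image.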
Table 1. A survey of previously reported techniques for automatic cough detection (P/C/R/DNN = Probabilistic/Convolutional/Recurrent/Deep Neural Network; SP = Specificity; SE = Sensitivity; ACC = Accuracy). The surveyed techniques support these observations: (i) higher accuracies were generally achieved by more complex models utilizing spectrograms; (ii) techniques aided with extra equipment produced better results; (iii) no single technique simultaneously meets all of the following requirements: highly accurate, efficient, passive and continuous monitoring, and requiring no extra equipment.
| References | Detection Technique | Technical Merit | Equipment Used | Technique Achievements | Technique Downsides |
|---|---|---|---|---|---|
| Barry et al., 2006 [11] | PNN on raw waveform | 96% SP, 80% SE | Audio microphone | Earliest method for cough detection; used a PNN on the raw waveform | High specificity but low sensitivity (misses many actual coughs) |
| Liu et al., 2015 [12] | Cough event classification by pre-trained DNN | 90% SP, 85% SE, ∼89% ACC | Audio microphone | First method utilizing spectrograms for cough detection | Mediocre accuracy, unfavorable for precise monitoring |
| Wang et al., 2015 [13] | CNN (AlexNet) on spectrogram features | 98.6% ACC, 0.977 F1 | Audio microphone | First method using transfer learning on audio; high accuracy | Large network creates high latency, unfavorable for real-time monitoring |
| Amoh et al., 2016 [14] | CNN and RNN on spectrograms | 92.7% SP for CNN, 87.7% SE for RNN | Audio microphone | Comparative analysis of CNN vs. RNN for cough detection on audio; found the CNN gives higher specificity and the RNN higher sensitivity | Both models have mediocre accuracy, unfavorable for precise monitoring |
| Elfaramawy et al., 2018 [15] | Wearable patch sensors | N/A | Proprietary patch sensors | Uses gyro sensors on the abdomen and chest to monitor the breathing pattern without AI; an abnormal disruption of the pattern is a cough | Requires patch sensors and supporting equipment to be worn at all times, inconvenient for everyday use |
| Drugman et al., 2019 [16] | Evaluating the efficacy of six sensors for detecting coughs | 94.5% SP, 94.4% SE | Audio and contact microphones | Comparative analysis of several sensors for cough detection; found that combining sound and contact microphones produced the best results | Requires contact microphones and supporting equipment to be worn at all times, inconvenient for everyday use |
| Soliński et al., 2020 [17] | DNN on spirometry curves | 91% SP, 86% SE, 91% ACC | Spirometer | Analyzed airflow through a spirometer to differentiate normal breaths from coughs | Can only detect coughs through the spirometer, inconvenient for everyday use |
Table 2. Correlations between the degradation of the environmental and meteorological factors and the increase in COPD exacerbation risks, derived from retrospective medical studies [9,28]. As an example, these studies demonstrated that an increase in NO2 concentration by 10 μg/m³ resulted in about a 2% increase in risk. These correlations were used to estimate the overall risk trends based on the real-time data from local sensors.
| Parameters | Risk Increase Coefficient [9,28] | Safety Standards | Rate [9,28] |
|---|---|---|---|
| PM2.5 | 1% | 12 μg/m³ [29] | 10 μg/m³ |
| PM10 | 0.8% | 7 μg/m³ [29] | 10 μg/m³ |
| NO2 | 2% | 101.23 μg/m³ [30] | 10 μg/m³ |
| Temperature | 4.7% | 68.0 °F | −1.8 °F |
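The coefficients in Table 2 can be turned into an overall risk estimate; the sketch below assumes a linear, additive combination (the paper's exact formula is not reproduced here), and the `exacerbation_risk_increase` helper and example deltas are illustrative:

```python
# Risk-increase coefficients from Table 2 [9,28].
# Each entry: (percent risk increase, change in the parameter causing it).
RISK_TABLE = {
    "PM2.5": (1.0, 10.0),    # +1%   per +10 ug/m3
    "PM10":  (0.8, 10.0),    # +0.8% per +10 ug/m3
    "NO2":   (2.0, 10.0),    # +2%   per +10 ug/m3
    "TempF": (4.7, -1.8),    # +4.7% per 1.8 F drop
}

def exacerbation_risk_increase(deltas):
    """Total percent increase in exacerbation risk for the given parameter
    changes (delta = current reading minus baseline). Changes in the
    'safe' direction are clipped to zero rather than credited."""
    total = 0.0
    for name, delta in deltas.items():
        percent, rate = RISK_TABLE[name]
        total += max(0.0, percent * (delta / rate))
    return total

# PM2.5 up 20 ug/m3, NO2 up 5 ug/m3, temperature down 3.6 F
print(exacerbation_risk_increase({"PM2.5": 20.0, "NO2": 5.0, "TempF": -3.6}))
# → about 12.4 (percent)
```

The negative rate for temperature encodes that risk rises as the temperature drops, while the clipping keeps improvements in one factor from masking deterioration in another.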
Table 3. Summary of results of the comparative study of CNN, RNN, CRNN, and the proposed STAIN machine learning models for cough detection. The sensitivity, specificity, accuracy, and Matthews Correlation Coefficient metrics were obtained for all four models using the same datasets. As can be seen, the STAIN model outperforms all the other traditional AI models with its deeply meshed spatio-temporal feature extraction architecture, which is more advantageous for effectively classifying respiratory events.
| Performance Metrics | Convolutional Neural Network (CNN) | Recurrent Neural Network (RNN) | Convolutional Recurrent Neural Network (CRNN) | Spatio-Temporal AI Network (STAIN) |
|---|---|---|---|---|
| Sensitivity | 91.6% | 83.4% | 89.8% | 92.7% |
| Specificity | 93.8% | 89.2% | 92.1% | 94.2% |
| Accuracy | 92.7% | 86.3% | 91.0% | 93.4% |
| MCC | 0.8542 | 0.7272 | 0.8192 | 0.8691 |
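The four metrics in Table 3 are standard functions of the binary confusion-matrix counts; the sketch below uses hypothetical counts chosen to land near the STAIN column, since the underlying counts are shown only in Figure 6:

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and the Matthews Correlation
    Coefficient from confusion-matrix counts (positive class = cough)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sensitivity, specificity, accuracy, mcc

# Hypothetical counts roughly reproducing the STAIN column of Table 3
se, sp, acc, mcc = binary_metrics(tp=927, fn=73, tn=942, fp=58)
print(f"SE={se:.1%}  SP={sp:.1%}  ACC={acc:.1%}  MCC={mcc:.4f}")
```

Unlike accuracy, the MCC stays informative on imbalanced data (many more non-cough than cough segments), which is why it is a useful fourth metric here.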
Share and Cite

Bhowmik, R.T.; Most, S.P. A Personalized Respiratory Disease Exacerbation Prediction Technique Based on a Novel Spatio-Temporal Machine Learning Architecture and Local Environmental Sensor Networks. Electronics 2022, 11, 2562. https://doi.org/10.3390/electronics11162562