1. Introduction
Chronic obstructive pulmonary disease (COPD) is a life-threatening, progressive pulmonary syndrome that causes breathlessness and may lead to death if not diagnosed and treated at an early stage. COPD reduces the lungs’ breathing capacity, creating respiration problems in patients. The primary components of COPD are emphysema and chronic bronchitis [1]. Emphysema is a pulmonary condition in which the lung tissues of patients are damaged. Chronic bronchitis is a bronchial condition characterized by excessive coughing and mucus production in the respiratory tract. The most common symptoms of COPD are wheezing, shortness of breath, and coughing. Coronary heart disease, weight loss, obesity, cognitive dysfunction, anorexia, and lung cancer may also indicate COPD [2]. According to a recent report by the World Health Organization (WHO), COPD is the third leading cause of death worldwide and caused around 3.23 million deaths in 2019 alone [3]. These figures show that COPD has an extremely high mortality rate.
COPD is diagnosed by analyzing the patient’s family history and history of exposure to pulmonary irritants such as tobacco smoke. Currently, the doctor performs a physical examination to diagnose COPD in the hospital, using classical tools such as a stethoscope to listen to the patient’s lung and heart sounds. In addition, several tests can be performed for COPD detection, including spirometry, chest radiography (X-ray), a blood gas test, a computed tomography (CT) scan, and genetic tests. Spirometry is among the best methods for diagnosing COPD [4]. During the spirometry examination, the patient has to inhale to full lung capacity and then push the air out of the lungs as quickly as possible. The patient must continue exhaling until the lung volume is close to the residual volume. However, the efficiency of the spirometry examination has been questioned [5,6]. Improperly trained staff and inadequate testing are the reasons for low-quality spirometry results. Given these limitations of the classical methods, advanced systems must be built to detect COPD in patients.
Ultra-wideband (UWB) radar is attracting wide attention and is used extensively by the companies Time Domain and XtremeSpectrum [7,8]. UWB radar offers high data rates at low levels of transmission power. It is non-intrusive and contactless and can penetrate various obstacles or materials, which is an advantage over the classical methods [9,10]. Many healthcare experiments now use UWB-based wireless sensing systems to identify vital signs related to different diseases [10,11]. UWB radar also radiates and consumes little power compared to other instruments used in biomedical applications.
Machine learning and deep learning are subfields of artificial intelligence. Artificial intelligence-based techniques give systems the ability to learn and improve automatically from experience without being explicitly programmed to perform a task. In recent years, machine learning and deep learning have been applied to various bioinformatics problems such as drug discovery, protein structure prediction, and disease detection. These techniques can help analyze large biomedical datasets to find new insights and develop new applications for bioinformatics research. Several COPD detection algorithms that utilize machine and deep learning methods have been presented. However, such methods are predominantly intrusive and lack the desired prediction accuracy. Situations such as pandemics require non-invasive methods that can detect COPD patients from a distance with high accuracy. This study uses temporal and spectral features extracted from UWB radar-based biological signals to detect COPD patients. Machine learning and deep learning-based methods are applied to these features for detection. The major contributions of this study are as follows:
A new dataset is created in this study to detect COPD patients. The dataset is based on the UWB radar, which is used to collect data from confirmed patients and healthy people from a hospital.
This study utilizes the temporal and spectral features from UWB radar biological signal data. Exploratory data analysis is applied to discover dataset patterns and correlations. Correlation analysis is conducted to select the dataset features with high correlation values that result in high performance.
For the experiments, four machine learning models and two deep learning models are employed for performance comparison. Decision tree (DT), logistic regression (LR), Gaussian Naive Bayes (GNB), and support vector machine (SVM) are the applied machine learning methods, while long short-term memory (LSTM) and gated recurrent unit (GRU) are applied as deep learning models.
Performance is validated using k-fold cross-validation and compared with existing state-of-the-art studies.
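The correlation-based feature selection and k-fold cross-validation named in the contributions above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names, the toy data, and the 0.5 threshold are assumptions made for illustration, and the sketch only ranks features by their Pearson correlation with the label and produces train/test index splits.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has no defined correlation; treat it as 0.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(rows, labels, threshold):
    """Indices of feature columns whose |correlation| with the label
    is at least `threshold` (a hypothetical cut-off)."""
    n_features = len(rows[0])
    return [j for j in range(n_features)
            if abs(pearson([r[j] for r in rows], labels)) >= threshold]


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


# Toy data (made up): column 0 tracks the label, column 1 does not.
rows = [[0.1, 5.0], [0.2, 3.0], [0.9, 5.0], [0.8, 3.0]]
labels = [0, 0, 1, 1]
selected = select_features(rows, labels, threshold=0.5)  # -> [0]
splits = list(k_fold_indices(n_samples=10, k=5))
```

Each sample appears in exactly one test fold, so averaging a model's score over the folds uses every sample for testing once.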
The remainder of this study is organized as follows. Section 2 reviews the literature on COPD detection. The materials and methods of the proposed framework are described in Section 3. The experimental results and discussions are presented in Section 4. Finally, Section 5 concludes the study findings for COPD detection.
2. Related Work
In view of the increasing number of deaths from COPD, several research works have been presented during the past years. For example, the goal of [12] is to examine whether UWB radar can be used as a non-invasive method for distinguishing between COPD patients and healthy individuals. Raw data are obtained from a distance of 1.5 m in a real-world setting (a hospital). The obtained raw data are then processed using signal extraction methods to obtain respiration data. Detecting COPD patients based on the respiratory rate alone is insufficient; however, the performance improves significantly when other factors such as age, gender, and smoking history are included. Several machine learning classifiers are used to identify COPD cases, including Naive Bayes (NB), SVM, random forest (RF), K-nearest neighbor (KNN), and AdaBoost, as well as deep learning models such as a convolutional neural network (CNN) and LSTM. The experimental findings indicate that LSTM has the highest accuracy of 93%.
The study [13] examines the role of therapy in slowing or stopping disease development, especially in severe COPD. Twelve-channel recordings of lung function from the RespiratoryDatabase@TR are examined for variable levels of COPD. Lung sounds are captured at the right and left posterior (back) and anterior (chest) clinical auscultation sites of forty-one patients. To isolate distinctive anomalies in lung sounds, a 3D second-order difference plot is used. Quantization based on the cuboid and octant is used to isolate signature anomalies on the chaos plot. In the classification phase, the deep extreme learning machine (deep ELM) classifier, one of the most reliable and fastest deep learning algorithms, is used. The novel HessELM and LuELM autoencoder kernels are applied to deep ELM, resulting in better generalization and a shorter training time than the standard ELM autoencoder. The overall accuracy, weighted sensitivity, weighted specificity, and area under the curve (AUC) value of the proposed deep ELM model with the LuELM autoencoder for classifying COPD severity are 94.31%, 94.28%, 98.76%, and 0.9659, respectively.
Electrocardiogram (ECG)-derived respiration (EDR) is used to differentiate between COPD patients and healthy individuals in [14]. The MP45 Biopac is used to record the heart rates and breath rates of 30 people during experiments. After examining the morphological pattern shifts in the respiration and EDR signals, three statistical characteristics are generated for each subject: area, time, and skewness ratio. Error computation and statistical analysis are used to establish how closely the EDR signal matches the original respiration signal. DT, linear discriminant analysis (LDA), SVM, and KNN classifiers are utilized for classification. An accuracy of 98.33% is obtained for respiration and EDR-derived features with both DT and KNN.
The researchers in [15] employed a risk prediction strategy based on deep learning to detect COPD automatically by monitoring respiration rates. To distinguish between COPD and non-COPD, several feature combinations were evaluated using LR, DT, LDA, KNN, SVM, and quadratic discriminant classifiers. Spirometry readings and other parameters of respiration were used to identify the category. When using the two most important characteristics for interpreting lung sounds (median frequency and linear predictive parameters), the SVM classifier attains a maximum classification accuracy of 83.6%. Using median frequency, linear predictive coefficient, and spirometry data, SVM and LR both obtained 99% accuracy.
An ECG- and EDR-based technique is proposed in [16] to detect subtle and obstructive respiratory disorders. The heartbeat data are collected using a Biopac MP45 system. Each patient has an electrocardiogram recorded for 300 s at a sampling rate of 1000 Hz. The morphological variations of both the ECG and the EDR signals are used to obtain temporal information, which is then used to identify distinctive features. The subjects are then categorized into normal, obstructive, and restrictive clusters using several supervised classifiers. Evaluation of the classifiers’ performance on 90 participants (both healthy and unwell) reveals that the SVM has a classification accuracy of more than 98%.
The study [17] identifies and compares the informative features of lung sounds using various signal processing techniques and selects the classification approach that gives the most accurate detection of bronchopulmonary system conditions. Power spectral density (PSD) is estimated for respiratory signals using the fast Fourier transform (FFT) technique. The spectrograms of the obtained signals are examined to derive the spectral features of the lung sounds. The average temporal dependences of the PSD at various frequencies are calculated. As spectrogram features, the sum of the magnitude values of the power spectrum curve for each frequency band is used. The ratios of the energies at the detail levels of the wavelet decomposition to the overall energy of the decomposed signal are employed as the parameters for wavelet analysis-based signal identification. As characteristics produced from mel-cepstral analysis, it is recommended to employ the logarithmic (mel) filterbank energies, averaged across time frames, depending on the channel index and time, as well as the mel frequency cepstrum based on cepstrum index. The best classification models for computerized illness screening are determined using supervised machine learning based on decision trees, discriminant analysis, SVM, LR, KNN, and ensemble learning. Using these feature sets, the accuracy of the various classifiers is calculated and compared. Based on the results, a combination of characteristics and classifiers with an identification accuracy of 93% for lung conditions is presented.
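The band-energy spectral feature described for [17] can be illustrated with a short sketch. It is not the implementation from that study: the function names, the test signal, and the band edges are assumptions, and a naive discrete Fourier transform stands in for the FFT so that the example needs only the standard library (both yield the same spectrum values).

```python
import cmath
import math


def power_spectrum(signal):
    """One-sided periodogram |X[k]|^2 / N for k = 0..N//2, via a naive DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                  for t, s in enumerate(signal))
        spectrum.append(abs(x_k) ** 2 / n)
    return spectrum


def band_energy(spectrum, sample_rate, n_samples, f_low, f_high):
    """Sum of power-spectrum values whose bin frequency lies in [f_low, f_high)."""
    bin_hz = sample_rate / n_samples  # frequency resolution of one bin
    return sum(p for k, p in enumerate(spectrum)
               if f_low <= k * bin_hz < f_high)


# Hypothetical test signal: a 5 Hz sine sampled at 64 Hz for one second.
fs, n = 64, 64
signal = [math.sin(2 * math.pi * 5 * t / fs) for t in range(n)]
spectrum = power_spectrum(signal)
low_band = band_energy(spectrum, fs, n, 4, 6)     # contains the 5 Hz peak
high_band = band_energy(spectrum, fs, n, 10, 32)  # essentially empty
```

Computing `band_energy` over a fixed set of bands gives one feature per band, which is the form of input a classifier such as SVM or KNN expects.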
Similarly, [18] employs a CNN to aid medical professionals by offering a comprehensive and rigorous analysis of medical respiratory audio data for COPD identification. Features from the Librosa audio analysis library, such as MFCC, Mel-spectrogram, chroma, chroma (constant-Q), and chroma CENS, are utilized for this purpose. Additionally, the proposed system can interpret the degree of the detected ailment as mild, moderate, or severe. The findings verify the effectiveness of the suggested deep learning method, which achieved an accuracy score of 93%.
A CNN-based model is developed by [19] to diagnose COPD using the 3D lung airway tree. After extracting airway trees from CT scans, ventral, dorsal, and isometric snapshots of their 3D representations are generated. Using snapshots of each image, a deep CNN model is developed and then tuned using a Bayesian optimization approach in order to identify COPD. The final prediction is determined by the majority vote of the three views. Class-discriminative localization maps are created to graphically illustrate the CNNs’ decisions. The accuracies of the models trained with a single view (ventral, dorsal, or isometric) of color images are comparable (86.8%, 87.5%, and 86.7%), while the model after voting reaches an accuracy of 88.2%. Using gray and binary snapshots, the final voting model obtains accuracies of 88.6% and 86.6%, respectively. Similarly, the study [20] proposes an integrated model for diagnosing COPD patients based on a knowledge graph. First, a knowledge graph of COPD is developed in order to assess the link between feature subsets and identify knowledge about illnesses revealed by the data. Second, an algorithm for sorting features and an adaptive feature subset selection method, CMFS-, are proposed. CMFS- picks an ideal subset of features from the original high-dimensional collection. Finally, the DSA-SVM integrated model is used as a classifier for the diagnosis and prediction of COPD and achieves an accuracy of 95.1%.
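The majority-vote step described for the three-view CNN model of [19] can be sketched as follows. The function name and the label strings are hypothetical, and the per-view predictions are hard-coded rather than produced by trained networks.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted most often.

    With an odd number of binary predictions (e.g. three views) there is
    always a strict majority; on a tie, the label seen first is returned.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical per-view predictions for one subject.
views = {"ventral": "COPD", "dorsal": "healthy", "isometric": "COPD"}
final_label = majority_vote(list(views.values()))
```

Voting can only correct a single-view error when the other two views agree on the right label, which is consistent with the modest gain over the single-view models reported in [19].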
The accurate analysis of respiratory tract fluids, such as saliva, can be a promising approach for identifying the severity of the disease and predicting its future exacerbations in a Point-of-Care (PoC) environment. However, it is important to take the demographic and medical parameters of patients into account to obtain accurate results. The study [21] applied machine learning techniques to saliva samples from COPD patients and healthy people, along with demographic information, for PoC recognition of the disease. As part of the Exasens joint research project, two sets of saliva samples were gathered from healthy controls (HC) and COPD patients. The samples consist of 160 HC and 79 COPD patient specimens and were collected at the BioMaterialBank Nord in Borstel, Germany between November 2016 and February 2018. A permittivity biosensor was used to analyze the dielectric properties of the saliva samples. The XGBoost gradient boosting algorithm achieved a high classification accuracy and sensitivity of 91.25% and 100%, respectively, indicating its potential for COPD detection.
The study [22] compared several machine learning algorithms to identify early-stage COPD using multichannel lung sounds. The lung sounds were analyzed using statistical features of frequency modulations extracted with the Hilbert–Huang transform. The proposed deep learning model with Hilbert–Huang transform-based statistical features achieves high classification rates of 93.67%, 91%, and 96.33% for accuracy, sensitivity, and specificity, respectively. The analysis of multichannel lung sounds provides a standardized evaluation with high classification performance, and the 12-channel lung sound analysis provides the advantage of assessing entire lung obstructions. This study is the first to directly focus on lung sounds to differentiate between COPD and non-COPD patients, and its significance lies in its ability to provide a standardized assessment using advanced machine learning algorithms.
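The accuracy, sensitivity, and specificity figures quoted throughout this section all derive from the confusion matrix of a binary classifier. The sketch below shows the standard definitions; the function name and the toy labels are assumptions for illustration and do not reproduce any result from the cited studies.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Report 0.0 when a class is absent instead of dividing by zero.
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, len(pairs))
    sensitivity = ratio(tp, tp + fn)  # share of COPD cases detected
    specificity = ratio(tn, tn + fp)  # share of healthy subjects cleared
    return accuracy, sensitivity, specificity


# Toy labels (made up): 1 = COPD, 0 = healthy.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
acc, sens, spec = binary_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity alongside accuracy matters for imbalanced datasets such as the 160 healthy and 79 COPD specimens in [21], where accuracy alone can hide poor detection of the minority class.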
The authors analyzed the impact of different features for COPD detection in [23], with a focus on differentiating between early and advanced stages of the disease. The recursive feature elimination cross-validated (RFECV) method was utilized for feature selection, and expert doctors were consulted to recommend features among those selected using the RFECV method. Two sets of features were selected, and different machine learning algorithms were employed to compare their performance and feature importance. The RFECV method produced an accuracy of 96%, while feature reduction with doctor recommendation (FRDR) achieved an accuracy of 90%. Despite the slight difference in results, both sets of features exhibited promising outcomes.
The studies discussed above report good results for COPD detection using various technologies and approaches. However, they have several limitations regarding the technology or approach used. Traditional approaches predominantly rely on physical examinations, in addition to tests such as spirometry, chest radiography, blood gas tests, and genetic tests. Such tests are invasive, requiring close contact with the device and other people. In addition, the diagnosis is based on a doctor’s subjective evaluation and may be prone to error or misjudgment. Pandemic situations such as COVID-19, where physical contact is restricted, demand non-invasive technologies, and in such scenarios the proposed UWB-based approach is potentially important. Moreover, the reported accuracy of the discussed research works requires further improvement. The timely detection of COPD patients is another important aspect where UWB can be very effective. UWB radar can penetrate various obstacles or materials, which means that it can detect COPD patients through clothing and other obstructions. It also operates at low levels of transmission power and radiates and consumes little power compared to other instruments in biomedical applications, which makes it safer for patients.