MOS Sensors Array for the Discrimination of Lung Cancer and At-Risk Subjects with Exhaled Breath Analysis

Marzorati, Davide; Mainardi, Luca; Sedda, Giulia; Gasparri, Roberto; Spaggiari, Lorenzo; Cerveri, Pietro

doi:10.3390/chemosensors9080209


by Davide Marzorati ¹,*, Luca Mainardi ¹, Giulia Sedda ², Roberto Gasparri ², Lorenzo Spaggiari ²,³ and Pietro Cerveri ¹,*

¹ Department of Electronics, Information, and Bioengineering, Politecnico di Milano, 20133 Milan, Italy

² Department of Thoracic Surgery—IEO, European Institute of Oncology IRCCS, 20132 Milan, Italy

³ Department of Oncology and Hemato-Oncology, University of Milan, 20122 Milan, Italy

* Authors to whom correspondence should be addressed.

Chemosensors 2021, 9(8), 209; https://doi.org/10.3390/chemosensors9080209

Submission received: 13 May 2021 / Revised: 30 July 2021 / Accepted: 30 July 2021 / Published: 5 August 2021

(This article belongs to the Special Issue State-of-the-Art in Electronic Nose Based on Optoelectronic/Electrochemical Sensors)


Abstract:

Lung cancer is characterized by a very high mortality rate and a low 5-year survival rate when diagnosed at a late stage. Early diagnosis of lung cancer drastically reduces its mortality rate and improves survival. Exhaled breath analysis could offer clinicians a tool to improve the detection of lung cancer at an early stage, thus leading to a reduction in the associated mortality rate. In this paper, we present an electronic nose for the automatic analysis of exhaled breath. A total of five non-specific gas sensors were embedded in the electronic nose, making it sensitive to different volatile organic compounds (VOCs) contained in exhaled breath. Nine features were extracted from each gas sensor response to exhaled breath, identifying the subject breathprint. We tested the electronic nose on a cohort of 80 subjects, equally split between lung cancer and at-risk control subjects. Including gas sensor features and clinical features in a classification model, recall, precision, and accuracy of 78%, 80%, and 77% were reached using a fourfold cross-validation approach. The addition of other non-specific gas sensors, or of sensors specific to certain compounds, could improve the classification accuracy, therefore allowing for the development of a clinical tool to be integrated in the clinical pipeline for exhaled breath analysis and lung cancer early diagnosis.

Keywords: exhaled breath; MOS sensors; VOCs

1. Introduction

Lung cancer survival rates are greatly dependent on the stage of the disease at the time of diagnosis. If diagnosed at an early stage, lung cancer has a 70–90% five-year survival rate, which drastically drops to 12% if diagnosed at a late stage [1]. Every year, a total of 1.6 million people die because of lung cancer, and 1.8 million people are diagnosed with this disease [2]. Therefore, lung cancer poses worldwide issues of high mortality and economic burden. The need for lung cancer early diagnosis becomes more important every year and still has to be properly addressed. Imaging techniques can be useful for screening purposes. Low dose computed tomography (LDCT) represents the imaging technique of choice for screening, as demonstrated by the National Lung Screening Trial [3,4]. Nonetheless, for large scale lung cancer clinical screenings, the use of LDCT is not considered appropriate, due to the increasing risk of radiation exposure, the associated economic burden, and the high number of detected false positives [5,6]. Therefore, researchers are taking into consideration alternative techniques which may help in improving lung cancer early diagnosis. Exhaled breath, blood serum, and urine are all examples of media from which lung cancer specific biomarkers could be identified and used for early diagnosis [7]. Starting from the work in [8], exhaled breath has been extensively studied and analyzed with the aim of diagnosing several diseases [9,10]. It is now known that exhaled breath could contain more than 3000 different VOCs [11]. When cancer is present in the human body, the affected cells are characterized by a different metabolism compared to a healthy condition, which could cause an altered production of VOCs by cancerous cells and their subsequent release in exhaled breath through the blood–air barrier. Therefore, exhaled breath VOCs could be used as lung cancer biomarkers, useful for early diagnosis.
Even though many researchers have focused their efforts on the identification of compounds in healthy and diseased conditions, there is still no consensus on which VOCs are specific to lung cancer, and no compound has been found only in lung cancer subjects [12]. In fact, evidence exists suggesting that a combination of compounds, rather than single compounds, could be specific to lung cancer [13,14]. In order for a compound to be considered as a valid clinical biomarker for disease diagnosis, a proper pipeline must be followed. Pepe et al. suggest a five-step flow for biomarker identification: pre-clinical exploratory phase, clinical validation, retrospective analysis, prospective screening, and cancer control [15]. To date, no compound found in exhaled breath has reached such a level of validation for lung cancer diagnosis. As a consequence of this uncertainty, no clinical tool is currently available to clinicians for lung cancer early diagnosis based on exhaled breath. There are several techniques that could potentially be used for VOCs analysis, but to date, none of them can be considered suitable for clinical practice [9]. Electronic noses show great promise. These devices are based on gas sensor arrays, in which each sensor may rely on a different sensing principle. Examples are colorimetric gas sensors [16,17], conductive polymer gas sensors [18], metal oxide gas sensors [19,20], and also type-different sensor arrays [21]. Even though studies involving the use of electronic noses for exhaled breath analysis have been extensively published in the last few years, the application of this technique in clinical practice as a screening tool still remains a hypothesis [22]. As a matter of fact, some issues with this technique can be pinpointed and need to be properly addressed by researchers before proceeding with its application in standard clinical pipelines.
First, Tedlar bags, the most common medium used for the temporary storage of breath samples, do not allow for long-term storage of the sample under analysis without causing VOCs leakage [23]. To date, no consensus has been reached on the best strategy and gases to be used to obtain a proper cleaning of Tedlar bags before breath sampling [23,24,25]. Furthermore, sensor washout and cleaning should be carried out in order to maintain a stable baseline when analyzing multiple samples, without being affected by environmental changes and sensor drift. Even though extensive studies have been published in recent years, no off-the-shelf device is available to clinicians to replace standard screening methods for disease early diagnosis with exhaled breath analysis. Little progress has been made towards its clinical application, as we are aware of only one registered clinical trial which is currently underway. This clinical trial (NCT02612532) is sponsored by Owlstone Ltd. (Cambridge, UK) and has the main aim of defining the diagnostic accuracy of the ReCIVa breath sampler coupled to a gas analyzer based on spectrometer techniques. This clinical trial started in 2015 and involved more than 26 clinical sites for exhaled breath collection. The estimated enrollment was 520 subjects, but no results have been published so far.

In the present paper, we propose an electronic nose, embedding an array of five commercial MOS gas sensors, for the analysis of exhaled breath coupled to machine learning techniques for the analysis of the collected gas sensor responses. The proposed electronic nose was designed with the aim of being composed by commercial components, embedding few gas sensors, and being portable. Two main motivations were behind these design choices. The first one was the development of a device to be used by general practitioners as a large population screening tool, and as such requiring a portable instrument and less expensive equipment compared to standard gas chromatography-mass spectrometry (GC-MS) laboratory tools. The second one was the easier interpretation and readability of the achieved results, as few sensors were embedded in the electronic nose, thus reducing the related number of features extracted from the gas sensor responses. A feasibility study of the proposed device is described in this manuscript. The device was evaluated in a prospective study on a cohort of 80 subjects, equally split between lung cancer and at-risk subjects. For each subject involved in the experiment, a breathprint was computed through the extraction of features from the gas sensors’ responses to exhaled breath. The subject-specific breathprints, together with clinical features, were used as inputs for machine learning algorithms, with the aim of discriminating lung cancer from at-risk subjects. Furthermore, we performed an analysis on the achieved results based on the time difference between breath collection and the subsequent analysis with the electronic nose. This was done with the aim of assessing whether classification accuracy was dependent on this time difference when using Tedlar bags as temporary storage for exhaled breath analysis. The manuscript is organized as follows. 
Section 2 describes the developed metal oxide electronic nose used in this study, the features extracted from the gas sensors’ responses, and the machine learning techniques. Section 3 presents the results obtained from the analysis of the exhaled breath of a cohort of 80 subjects, equally split between lung cancer and control subjects. Section 4 analyzes the obtained results, and Section 5 draws the appropriate conclusions.

2. Materials and Methods

2.1. Gas Sensors

The gas sensors used in the proposed electronic nose for exhaled breath analysis are commercial metal oxide (MOS) sensors. The basic structure of an MOS sensor is a ceramic support tube which is then coated with $\mathrm{SnO_2}$ [26]. The sensitivity of the sensor can be controlled by acting on the surface material property [27]. For instance, the level of porosity of the sensitive material and its working temperature affect the sensitivity of metal oxide gas sensors [28]. A conductivity change is observed when specific gas molecules, present in the air and to which the sensor is sensitive, interact with its surface [28]. A measuring circuit, which can be as simple as a voltage divider able to detect resistance changes in the sensor, can be used for measuring conductivity changes. Some MOS sensors offer the possibility to precisely control their working temperature, thus allowing for additional tailoring to specific substance detection [28].
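
As an illustration of the voltage-divider readout mentioned above, the sensor resistance can be recovered from the voltage measured across a series load resistor. The following is a minimal sketch only: the supply voltage `v_supply`, the load resistance `r_load`, the divider topology, and the function name are illustrative assumptions, not values taken from the device described in this paper.

```python
def sensor_resistance(v_out: float, v_supply: float = 5.0,
                      r_load: float = 10_000.0) -> float:
    """Return the sensor resistance (ohm) in a simple voltage divider.

    Assumed topology: the sensor is in series with a load resistor and
    v_out is measured across the load. v_supply and r_load are
    hypothetical example values.
    """
    if not 0.0 < v_out < v_supply:
        raise ValueError("v_out must lie strictly between 0 and v_supply")
    # Divider equation: v_out = v_supply * r_load / (r_sensor + r_load)
    return r_load * (v_supply - v_out) / v_out
```

With this topology, a drop in sensor resistance (higher conductivity) shows up as a rise in the measured voltage.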

Table 1 describes the commercial gas sensors embedded in the electronic nose described in this study. A total of five gas sensors (Figaro USA Inc., Arlington Heights, IL, USA) sensitive to different compounds were chosen. Previous studies showed the capability of such sensors in discriminating healthy controls from subjects affected by lung cancer, diabetes, or other diseases [19,21,29]. All the sensors contain inside them a heater that can be configured to set the working temperature and to maximize sensor compound detection. In the present work, gas sensors were operated at a constant power, as the total current through the heater resistance was maintained constant. Figure 1 shows the typical gas sensor response curve when exposed to a sample of exhaled breath. In the first phase, namely the cleaning phase, the sensor has a constant value which is considered as its baseline value. When the exposure to the sample of interest starts in the measuring phase, the conductivity of the sensor increases, and as a consequence, resistance decreases upon reaching a plateau. In the last phase, named recovery, the sensor is exposed to environmental air and recovers its initial baseline value.

2.2. Electronic Nose

Gas sensors alone are unable to perform exhaled breath analysis, as they need to be integrated into a more complex device. Such a device has two main purposes. The first one is controlling gas flow into and out of an analysis chamber where the gas sensors are located, and the second one is measuring the gas sensor resistance changes. These kinds of devices are commonly referred to as electronic noses [26]. A schematic diagram showing the typical components of an electronic nose is shown in Figure 2. The analysis chamber employed in the proposed electronic nose was entirely made of die-cast aluminum with a volume of approximately 500 mL. Two airtight holes, positioned on two opposite sides of the chamber, allowed for gas inflow and outflow. The five gas sensors were all embedded in the analysis chamber. In addition to the gas sensors, a temperature and relative humidity sensor (model SHT75, Sensirion AG, Switzerland) was used to monitor temperature and relative humidity inside the chamber. A 12 V air pump was used to pump air into and out of the analysis chamber (model H085-11, Parker Hannifin, Cleveland, OH, USA). The air flow was controlled by means of three solenoid valves (model L172, Sirai, Italy): two valves controlled air flow from the Tedlar bag and the environment, while an additional valve controlled air flow out of the analysis chamber. The core controller of the electronic nose was an ARM Cortex-M3 microcontroller (Infineon Technologies, Neubiberg, Germany). The controller took care of all the tasks required for exhaled breath analysis, among them analog-to-digital conversion of the gas sensor resistance, air pump and solenoid valve control, and data streaming towards a host device. The sampling frequency of the gas sensor resistance values was maintained constant at 10 Hz, and the sampling resolution was set to 16-bit. The schematic of the circuit handling gas sensor control and resistance measurement is shown in Figure 3.
Software running on a host computer and communicating with the electronic nose over Bluetooth was developed using Python and the Kivy framework. This software allowed for the control of the device, visualization of the collected measurements, and storage of the gas sensor response data.

2.3. Data Pre-Processing and Feature Extraction

The aim of the feature extraction process was the computation of relevant characteristics of the time series curves, which could be later used as input to a classification algorithm. Prior to performing feature extraction, the raw gas sensors’ response curves were filtered with a 4th order low-pass Butterworth filter with a cut-off frequency of 1 Hz to remove noise and high-frequency oscillations affecting the sensors’ data. Furthermore, the sampled voltage values were converted to resistance for subsequent analysis. An example of filtered and converted data is shown in Figure 4.

Feature extraction from raw gas sensor data was carried out in order to extract subject-specific breathprints and later use them for classification purposes. Starting from features described in preceding studies and from our own experience, we extracted a total number of 9 features from each sensor curve [30]. Given that 5 gas sensors were embedded in the proposed electronic nose, the total number of features for each subject was equal to 45. The extracted features were divided into static and dynamic features. Referring to Figure 1, the static features described in Table 2 were extracted.

Dynamic features, instead, were extracted in the phase-space of the gas sensor data [31,32]. Firstly, starting from the gas sensor response, named $S(t)$, we computed its derivative $dS(t)/dt$. Secondly, the derivative was plotted with respect to the raw gas sensors’ response. A visual explanation of this procedure is reported in Figure 1. The extracted dynamic features are depicted in Figure 1, with only one feature, AreaPS, not reported. This feature is computed as the area enclosed by the curve in the phase-space plot during the measurement phase.
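
The phase-space construction can be sketched in a few lines. The example below is a hedged illustration, not the authors' implementation: it assumes a central finite-difference derivative and computes the enclosed area with the shoelace formula, closing the trajectory by joining its last point to its first. The function names are hypothetical; `dt` would be 0.1 s at the 10 Hz sampling rate reported above.

```python
def derivative(signal, dt):
    """Approximate dS/dt with central differences (one-sided at the ends)."""
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two samples")
    out = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        out.append((signal[hi] - signal[lo]) / ((hi - lo) * dt))
    return out


def phase_space_area(signal, dt):
    """Area enclosed by the (S, dS/dt) trajectory.

    Assumption: the curve is closed by connecting the last point back
    to the first one before applying the shoelace formula.
    """
    ds = derivative(signal, dt)
    n = len(signal)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to close the curve
        twice_area += signal[i] * ds[j] - signal[j] * ds[i]
    return abs(twice_area) / 2.0
```

A call such as `phase_space_area(resistance_samples, dt=0.1)` on the measurement-phase samples of one sensor would give one candidate value for an area feature of this kind.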

The presence of correlated and redundant features could influence the subsequent step of data classification, making it difficult to complete a proper training of the algorithm, potentially leading to data overfitting. Therefore, principal component analysis (PCA) and sequential forward feature selection (SFFS) were used as feature reduction and selection methods with the aim of diminishing the number of features while still keeping important information embedded in the data. This step allowed reducing the dimensions of the input data used for the classification algorithm aimed at discriminating between lung cancer and control subjects. PCA aims at reducing the total number of features by projecting the features into a lower dimensional space, and it is typically used in gas sensor applications [21,33]. SFFS sequentially adds features from the feature subset in a greedy way, and it was applied with gas sensor data in other literature studies [29,34,35].
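
To make the greedy selection idea concrete, here is a minimal sketch of sequential forward selection. It is an assumption-laden illustration: the `score` callable stands in for whatever criterion is used to rank feature subsets (in practice this would typically be a cross-validated classification metric), and the stopping rule and function name are hypothetical.

```python
def forward_select(n_features, score, max_features):
    """Greedy sequential forward selection.

    score(subset) must return a float (higher is better) for a tuple of
    feature indices. Selection stops when max_features is reached or
    when no remaining feature improves the best score.
    """
    selected = []
    best_score = float("-inf")
    remaining = set(range(n_features))
    while remaining and len(selected) < max_features:
        # Evaluate every candidate added to the current subset
        candidates = {f: score(tuple(selected + [f])) for f in sorted(remaining)}
        best_f = max(candidates, key=candidates.get)
        if candidates[best_f] <= best_score:
            break  # no improvement: stop early
        selected.append(best_f)
        best_score = candidates[best_f]
        remaining.remove(best_f)
    return selected, best_score
```

Because each step only looks one feature ahead, the result is not guaranteed to be the globally best subset; that is the usual trade-off of a greedy search against an exhaustive one.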

In addition to sensor features, clinical features were also considered in the classification process. For each subject participating in the experiments, a set of clinical features was provided by clinicians after analyzing the medical history of each subject. Table 3 summarizes the included clinical features. Smoking level was assessed in terms of both smoking status and pack years. Three main different groups of clinical features were identified: numerical (age, pack years, and BMI); binary (gender and all comorbidities); and categorical (smoking). As PCA requires numerical features to be properly applied, it was used as a pre-processing step for feature reduction only when sensor features were employed in the classification model. When clinical features were embedded in the classification model, only SFFS was used as a feature selection technique, without any feature reduction procedure, as clinical data contained binary and categorical variables.

2.4. Classification Algorithms and Metrics

After feature extraction and feature reduction, the next step was the classification of the collected breathprints for the discrimination between lung cancer and control subjects. In this manuscript, two classification methods have been tested. The following algorithms were applied to the collected gas sensor data: support vector machine (SVM) and AdaBoost. In the literature, such algorithms have already been tested for the classification of gas sensor data [18,21,36,37]. A 4-fold cross-validation (CV) scheme was used to assess the algorithm performance, and a grid search allowed for the optimization of the hyperparameters of the models.
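
The fold construction behind a 4-fold CV can be sketched as follows. This is an illustrative example only: the paper does not state whether the folds were stratified by class, so the class balancing, the random seed, and the function names below are assumptions.

```python
import random


def stratified_folds(labels, k=4, seed=0):
    """Assign sample indices to k folds, balancing classes across folds."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin to the folds
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), using each fold once for testing."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)
```

For a cohort of 80 subjects with 40 per class, this yields four test folds of 20 subjects each, with 10 subjects from each class. A grid search would then repeat the train/test loop once per hyperparameter combination and keep the combination with the best mean score.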

Given the confusion matrix reported in Table 4, accuracy, recall, and precision metrics can be computed [38]. These metrics were used to assess the ability of the classification algorithms in discriminating subjects in lung cancer and control groups:

Accuracy:

$Acc = \frac{TP + TN}{TP + FP + TN + FN}$

(1)

Recall:

$Re = \frac{TP}{TP + FN}$

(2)

Precision:

$Pr = \frac{TP}{TP + FP}$

(3)

As the algorithms were tested in a CV scheme, the results will be reported as mean ± standard deviation ($\mu \pm \sigma$) on the 4 folds used for testing purposes.
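
The metric computation and the per-fold summary can be sketched as below, using the standard confusion-matrix definitions (accuracy counts both true positives and true negatives as correct). The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say whether the sample or population form was used.

```python
from statistics import mean, stdev


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, recall, and precision from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0 or tp + fn == 0 or tp + fp == 0:
        raise ValueError("metrics undefined for these counts")
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


def summarize(fold_values):
    """Mean and sample standard deviation across CV test folds."""
    return mean(fold_values), stdev(fold_values)
```

Calling `classification_metrics` once per test fold and passing the four resulting accuracies to `summarize` gives a value in the mean ± standard deviation form described above.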

2.5. Exhaled Breath Collection

The assessment of the performance of the developed electronic nose in discriminating lung cancer subjects from at-risk subjects required the collection of exhaled breath from human subjects. This study was approved by the European Institute of Oncology (IEO) ethical committee, with approval number R1004/19-IEO 1056. The study started in July 2019 and lasted until December 2020. All the individuals who participated in the experiments were provided with all the required information and instructions regarding the exhaled breath collection procedure. Prior to exhaled breath collection, informed consent was signed by the participants. Each study participant was requested to abide by simple guidelines before the breath sampling procedure. These guidelines were previously described in the literature by our group [39]. Lung cancer subjects were recruited among subjects waiting to undergo lung cancer removal surgery at IEO, while control subjects were recruited among subjects undergoing standard screening procedures as they were considered at-risk of developing lung cancer. At the moment of breath collection, which was carried out at IEO, the participants were asked to perform an inspiration followed by a full expiration. While a subject was performing the expiration, the operator in charge of breath collection controlled a three way valve in order to consecutively fill two Tedlar bags. The first bag, characterized by a maximum volume of 0.5 L, was devoted to dead space collection. The second bag, instead, with a volume of 3 L, allowed for collection of alveolar space breath. Exhaled breath samples were stored in the Tedlar bags and analyzed within the same day of collection with the electronic nose described in this study. For the purpose of this study, only Tedlar bags with alveolar breath were analyzed.

2.6. Exhaled Breath Analysis

After the breath collection procedure was completed, Tedlar bags were transferred from the site of collection (IEO) to the analysis laboratory and analyzed within the same day. Before performing any exhaled breath analysis with the electronic nose, the device was powered on so that the gas sensors were kept heated. No measurement was started until the gas sensors showed a stable response, which was checked by computing the standard deviation of the gas sensor responses. Referring to Figure 2, the procedure for the breath bags’ analysis was carried out as follows:

The Tedlar breath bag was connected to the electronic nose, and an analysis session was started from the host PC;
Environmental air (see Figure 2) was brought into the analysis chamber by opening the appropriate valve, while keeping the breath bag valve closed and the valve positioned at the outlet of the analysis chamber open (cleaning phase in Figure 1);
During the cleaning phase, the gas sensors stabilize on a baseline value. A check on the standard deviation of the gas sensors response was performed, allowing for a maximum duration of this phase of 120 s;
Upon completion of the cleaning phase, the environmental air valve was closed, and the Tedlar breath bag valve was opened, simultaneously with the closing of the valve positioned at the outlet of the analysis chamber. Breath started to flow into the chamber at a rate of 1 L/min. Once the breath bag was completely empty, all valves were closed;
The gas sensors were exposed to the breath sample for a total of 180 s (measuring phase);
After the measuring phase concluded, gas sensors were again exposed to environmental air until they approached the baseline value (recovery phase). The duration of this phase was set to a maximum of 10 min.

Considering all the described phases, the analysis of a breath bag lasted for a maximum of 14 min.
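
The baseline stability check used before and during the cleaning phase can be illustrated with a small sketch. The window length and standard-deviation threshold are not given in the text, so `window` and `max_std` are hypothetical parameters, as are the function names; `max_samples` plays the role of the phase time limit (for example, 1200 samples for 120 s at 10 Hz).

```python
from statistics import pstdev


def is_stable(samples, window, max_std):
    """True when the std of the last `window` samples is within max_std."""
    if len(samples) < window:
        return False
    return pstdev(samples[-window:]) <= max_std


def cleaning_phase_length(samples, window, max_std, max_samples):
    """Number of samples consumed before the baseline is stable.

    Returns the timeout limit if stability is never reached.
    """
    limit = min(len(samples), max_samples)
    for n in range(window, limit + 1):
        if is_stable(samples[:n], window, max_std):
            return n
    return limit
```

A real device would run this check incrementally on streaming data for each sensor rather than on a stored list, but the decision rule would be the same.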

3. Results

3.1. Study Participants

A total of 80 subjects participated in this study. Table 5 summarizes the main demographic and clinical characteristics of the subjects involved in the experimental study. The control group was characterized by a greater percentage of male subjects with respect to the lung cancer group (62.5% and 52.5%, respectively). A Mann–Whitney test showed a statistically significant difference when comparing age ($p = 0.01$) across the two groups, while no difference was found when comparing the other clinical features ($p > 0.06$). The number of non-smoking subjects was equal for both groups (n = 8). The current smokers were present in a higher number than the ex-smokers in both groups (21 and 18 vs. 11 and 14, respectively). Statistical analysis on pack-years showed no significant difference between the two groups ($p = 0.11$). Hypertension was the most common comorbidity, with 10 control and 23 lung cancer subjects reporting it, followed by chronic obstructive pulmonary disease (COPD) and hypercholesterolemia (HCL). As the number of subjects reporting comorbidities other than hypertension was low compared to the dataset dimensions, only hypertension was included in the dataset for classification purposes. Regarding staging of the disease, half of the lung cancer subjects had stage I cancer, followed by stage II (n = 11), and stage III (n = 4). Five lung cancer subjects had benign cancer; thus, no pTNM staging was reported for them.

3.2. Feature Distribution

Figure 5 shows boxplots of all the sensor features for both lung cancer and control subjects, considering all the five sensors embedded in the proposed electronic nose. In order to assess if any significant difference was found for certain features when comparing lung cancer and control groups, statistical analysis was carried out with a Mann–Whitney test. Significant differences were found for the following features with a 1% significance level: $\Delta R$ for sensor TGS2620; ratio for sensor TGS822; $d'''$ for sensors TGS2600, TGS2602, and TGS2620; d for sensor TGS2602; and $d''$ for sensor TGS2620.
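
For readers unfamiliar with the Mann–Whitney test, the sketch below shows how the U statistic and an approximate two-sided p-value can be obtained for one feature. It is a simplified illustration under stated assumptions: it uses the normal approximation without tie or continuity correction, which may differ from the exact procedure used for the results reported here, and the function name is hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no corrections).

    Returns (U, p_value), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Count pairs where x exceeds y; ties contribute one half
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_value
```

A feature would be flagged at the 1% level when the returned p-value falls below 0.01. With 45 sensor features tested, some form of multiple-comparison awareness is advisable when interpreting individual p-values.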

3.3. Lung Cancer Classification

In this section, we report the results obtained with classification algorithms in discriminating lung cancer from control subjects. As described earlier in Section 2.4, algorithm hyperparameters were optimized with a grid-search approach on a fourfold CV. As the total number of subjects did not allow for a proper training-test set split, the CV folds were also used to assess the generalization capabilities of the algorithms. Results are reported in terms of mean and standard deviation of recall, precision, and accuracy metrics on the four CV test folds. Figure 6 reports the PCA plots for the first three principal components extracted from static and dynamic sensor features. Table 6 reports the results obtained when training the classification algorithms only on sensor data. Instead, Table 7 reports the results when using both sensors and clinical features when training the algorithms. The best classification result using sensor data was found when using AdaBoost as a classification algorithm, both static and dynamic sensor features and PCA with five components selected as the feature reduction method. A mean recall, precision, and accuracy of 67%, 64%, and 66% were reported. SVM was found to be the optimal classification algorithm when integrating clinical features in the model, with dynamic and clinical features as inputs to the model without any feature reduction method, as Table 7 reports. The clinical features that were considered in the classification model are the ones reported in Table 3, with only hypertension considered as a comorbidity due to the fact that diabetes, HCL, COPD, and obesity were reported in few subjects of the overall cohort, as highlighted in Table 5. Such a combination resulted in a mean recall, precision, and accuracy of 78%, 80%, and 77%, respectively. 
The ensemble technique used in this manuscript for classification achieved similar performance when considering the mean precision metric (0.79 ± 0.19 compared to 0.80 ± 0.12) but resulted in worse results for recall and accuracy. The AdaBoost ensemble algorithm employed static and clinical features as inputs for the model, with no feature reduction applied.

3.4. Lung Cancer Staging Recall Analysis

An additional analysis was carried out on the classification results on lung cancer and control subjects. Classification recall by lung cancer stage was analyzed with the classifier which achieved the best results on the CV folds used for testing purposes. As reported in Table 5, 20 lung cancer subjects were diagnosed with stage I lung cancer, 11 with stage II, and 4 with stage III. As the total number of subjects diagnosed with stage III was lower in comparison to stage I and stage II, two groups of subjects were considered for the lung cancer recall analysis: stage I and stage II–III. Table 8 reports the results for the best classification algorithms on the four CV folds used in the algorithm performance assessment. The best SVM classifier achieved recall of 78% on lung cancer stage I subjects, with a reduction to 71% for lung cancer stage II and stage III subjects.

3.5. Time Dependency Analysis

As a final step in our analysis, we wanted to determine if the elapsed time between breath collection and analysis had an effect on the classification results. Figure 7 shows the elapsed time (in hours) between breath collection at IEO and breath analysis at the laboratory where the electronic nose was located. All breath bags were analyzed within the same day, with a minimum time difference of 2 h and a maximum time difference of 9 h and 20 min. For subjects belonging to the lung cancer group, time difference mean and standard deviation were 4 h, 41 min ± 1 h, 8 min, while for the control group, they were 4 h, 33 min ± 1 h, 41 min. A Mann–Whitney U-test showed no significant difference across the two groups at the 5% significance level ($p = 0.1$). A value of 4.5 h was chosen as a threshold to split the dataset into two different groups. One group consisted of subjects that had their breath analyzed within a time-frame of 4.5 h, and the other group consisted of subjects analyzed after the 4.5 h threshold. This resulted in equally split datasets, each one composed of 40 subjects, with different proportions of lung cancer and control subjects, as reported in Table 9. An analysis of the cross-validation predictions was carried out to determine if significant differences were found when comparing the classification results for subjects who were analyzed before the 4.5 h threshold and for those who were analyzed after the identified time threshold. The results are reported in Table 9. Classification results for the two different models that were tested did not show significant differences in terms of mean recall and accuracy for both lung cancer and control subjects across the four folds used for cross validation purposes when comparing the results across the two datasets.
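
The threshold-based comparison can be sketched as follows. This is an illustration only: the function name is hypothetical, accuracy is used as the example metric, and treating a delay of exactly 4.5 h as "within" the threshold is an assumption.

```python
def accuracy_by_delay(delays_h, y_true, y_pred, threshold_h=4.5):
    """Accuracy of predictions for samples analyzed within vs. after a delay.

    delays_h: hours between breath collection and analysis, per subject.
    Returns a dict with keys "within" and "after"; a group with no
    samples maps to None.
    """
    groups = {"within": [], "after": []}
    for delay, truth, pred in zip(delays_h, y_true, y_pred):
        key = "within" if delay <= threshold_h else "after"
        groups[key].append(truth == pred)
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}
```

Applying the same grouping to the predictions collected from the CV test folds gives one value per group, which can then be compared across the two time windows.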

4. Discussion

In this manuscript, two main topics related to exhaled breath analysis for disease diagnosis were addressed. The first one was the development of an electronic nose with five commercial gas sensors embedded for exhaled breath analysis, and the second one was the implementation of machine learning techniques for the analysis of gas sensor responses. This setup, composed of a custom device for breath analysis and of data analysis techniques, was tested in a prospective study with a cohort of 80 subjects, equally split between lung cancer and control subjects, with the aim of discriminating the two groups of subjects based on the exhaled breath analysis. Exhaled breath, together with blood and urine, is one of the three main biological fluids in which researchers are looking for lung cancer biomarkers. The identification of biomarkers able to determine the presence of lung cancer could help in reducing lung cancer’s high mortality rate, allowing for an early diagnosis and an improvement in lung cancer survival rates. Exhaled breath has been extensively studied since its content in terms of chemical compounds was first reported [8]. Several researchers have later analyzed the chemical content of exhaled breath in terms of VOCs concentration [24,40]. Still, to date there is no consensus on the VOCs which are significant for lung cancer, i.e., those compounds that are present only in exhaled breath of subjects affected by lung cancer [12]. Therefore, rather than analyzing the concentration of specific compounds in exhaled breath, an analysis on the exhaled breath mixture is typically carried out [21,37]. Such analysis can be performed with so-called electronic noses, which are devices with embedded sensors sensitive to compounds in the air. In this study, a custom electronic nose with five commercial gas sensors embedded was described.
The electronic nose was used to analyze the exhaled breath of 80 subjects, equally split between lung cancer and control groups. The best performance in terms of discrimination between the two groups was achieved when integrating both dynamic sensor features and clinical features into an SVM model. Such a model resulted in a mean recall, precision, and accuracy of 78%, 80%, and 77% across the four CV folds, respectively.
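
As an illustration of how 80 subjects can be divided into four cross-validation folds, the sketch below assigns indices to folds while keeping the two classes balanced. Whether the folds in this study were stratified is not stated in this section, so stratification is an assumption here, as are the function name `stratified_folds`, the seed, and the label encoding (1 for lung cancer, 0 for control).

```python
import random


def stratified_folds(labels, k=4, seed=0):
    """Assign each sample index to one of k folds, class by class,
    so that every fold keeps roughly the overall class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin into the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# 40 lung cancer (1) and 40 control (0) subjects, as in the cohort.
labels = [1] * 40 + [0] * 40
folds = stratified_folds(labels)
```

With this scheme each fold holds 20 subjects, 10 per class. Each fold serves once as the test set, and the per-fold metrics are then averaged.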

The results obtained in the study described in this manuscript are lower than those reported in some of the listed studies. Table 10 reports a comparison between the results achieved with the proposed device and several research studies. From the reported data, we can highlight that the sample size differs across the studies, making a proper comparison difficult. Furthermore, several approaches regarding sensor technologies were used in the last few years, ranging from custom electronic noses based on quartz microbalance sensor arrays, MOS sensors, or type-different sensor arrays to the use of commercial devices such as the Cyranose 320. Our recall and accuracy are comparable with those of studies that reported values in the range of 70–80%, but they are still substantially lower than those of studies that reported very high recall and accuracy values in lung cancer classification, even greater than 95%. A proper comparison across studies, characterized by different cohorts and sensing technologies, is a challenging task. The same cohort of subjects should be recruited, and the collected exhaled breath should be analyzed with the different proposed methods to assess whether the differences in classification performance are due to the different cohorts and sampling methods or to the sensing technologies. Liu et al. achieved an accuracy, recall, and specificity of 95.75%, 94.78%, and 96.96% on a cohort of 214 subjects with a sensor system composed of 11 sensors and an ensemble learning framework for classification [37]. Tirzīte et al. reached between 95.8% and 96.2% recall when discriminating 252 lung cancer patients from 223 healthy volunteers, with logistic regression analysis as the classification algorithm and the Cyranose 320 electronic nose as the breath analysis method [18]. Chen et al. designed an electronic nose with custom gas sensors, based on a metal ion induced assembly of graphene oxide, and reached a recall of 95.8% and a specificity of 96.0% when testing it on a cohort of 106 subjects (with 48 affected by lung cancer) [41]. We identified several issues that could cause lower performance with respect to the results reported in the literature, summarized in Table 11 and described hereafter.

Three main issues related to the usage of Tedlar bags as temporary storage of exhaled breath were determined. First, the use of Tedlar bags could lead to the leakage of VOCs if the analysis is not carried out immediately, with VOC concentration decreasing with bag storage time [23]. Our time analysis, with results reported in Table 9, did not show significant differences in terms of classification performance when splitting the subjects based on the time difference between breath collection and analysis. Nonetheless, the time difference between breath sample collection and breath sample analysis, as depicted in Figure 7, can be considered a potential limitation of the presented study, and further analysis should be carried out minimizing the time difference between sampling and analysis. Second, Tedlar bags require the subjects to fill the bag with a single total lung capacity expiration maneuver. Not only could this procedure cause distress to the subject, but it could also result in an incompletely filled bag. Third, in the presented study, no cleaning procedure was carried out on the Tedlar bags prior to breath sampling, whereas other authors reported one [23,24,25,40]. As far as the cleaning procedure of Tedlar bags is concerned, no consensus has been reached among researchers on the best strategy for properly cleaning the bags. In fact, some studies report the use of nitrogen as cleaning gas, while others report the usage of argon [24,25,40]. In addition to the uncertainty on the cleaning gas, there is still no common technique for the heating process to follow during the cleaning procedure [9]. Taking into consideration the issues related to Tedlar bags, the use of other techniques for temporary breath storage could help in concentrating the VOCs found in exhaled breath, whose concentrations can be as low as the parts per billion range.
The great advantage offered by a process of gas pre-concentration, such as gas desorption tubes, is the potential increase in the detection ability of the device used for the subsequent breath analysis (e.g., an electronic nose) [44,45]. In addition to the issues described above, a proper comparison across studies must consider the sensors embedded in the proposed electronic nose, for which three potential issues can be pinpointed. As the first issue, we identified the a-specificity of the employed sensors. As described in Table 1, the embedded sensors were not sensitive to specific substances, but rather to mixtures of compounds in the air. As a consequence, the sensor responses may not be sufficiently compound-specific to achieve proper discrimination between lung cancer and control subjects. Second, the studies proposed in the literature often involve a larger number of sensors in the electronic nose array, ranging up to a maximum of 32 sensors in the Cyranose 320 [18,36]. In our electronic nose, a smaller array composed of only five sensors was designed with the aim of developing a smaller and portable electronic nose. As the last issue, the sensor washout and cleaning process has to be seriously taken into consideration. With MOS sensors, baseline changes over time can be noticed due to either environmental changes or sensor drift, making it difficult to interpret the results properly because the sensor baseline value differs across samples. All these factors together could have caused the low performance of the sensor array in the discrimination between the two groups of subjects and must be properly addressed, should a clinical application of exhaled breath analysis be carried out in the future. In the present work we carried out an analysis on the ability of our device to discriminate lung cancer subjects with different lung cancer stages according to the pTNM staging method.
We found a recall value of 78% for stage I lung cancer subjects and of 71% for stage II–III subjects. A similar trend was reported in a study in which the authors found a recall value of 92% for stage I subjects and of 58% for stage II/III/IV subjects [39]. Other authors have reported results on the discrimination of subjects with different lung cancer stages or subtypes. Mazzone et al. reported the ability to discriminate lung cancer stages [46]. Kort et al. were able to diagnose subtypes of lung cancer in a multi-center prospective study [47]. As we previously described, the identification of lung cancer-specific VOCs is already a challenging task, and attempting to determine which VOCs are specific to each lung cancer stage can be even more challenging. Linking the progression of lung cancer or different disease sub-types to the compounds detected by an electronic nose is not straightforward, and future studies are required to assess whether such a diagnosis methodology is feasible [39].

As far as the application of exhaled breath analysis in clinical practice is concerned, we are aware of only one clinical trial currently underway on exhaled breath analysis for lung cancer early diagnosis. This clinical trial (NCT02612532), sponsored by Owlstone Ltd (UK), has the aim of defining the lung cancer diagnostic accuracy of the ReCIVA breath sampler coupled to a gas analyzer based on spectrometry techniques. This clinical trial started in 2015 and had an estimated enrollment of 520 subjects. No results have been published so far. The publication of results on exhaled breath analysis on such a large population could help researchers in moving in the right direction for lung cancer early diagnosis based on exhaled breath analysis. In fact, even if studies in the literature focusing on exhaled breath analysis for disease diagnosis had large cohorts of subjects, the enrolled subjects were affected by several different diseases, with subjects diagnosed only with lung cancer composing a small percentage of the overall cohort [29,48]. Therefore, clinical trials involving a large number of subjects are mandatory to understand whether exhaled breath analysis has the potential to be used as a valid clinical tool for lung cancer early diagnosis. This study is not a clinical trial either, as it was designed only as a prospective study aiming at assessing whether the proposed electronic nose provided satisfactory results to be used in large scale clinical trials. Furthermore, a database with exhaled breath data collected in several studies and clinical trials should be made public and released to researchers, allowing them to carry out data analysis and validation of the collected data, potentially integrating data from several sources in a single database.

We believe that the integration of exhaled breath with other biological fluids, such as blood serum, urine, and saliva, could potentially improve the ability for early disease diagnosis, in particular for lung cancer [22,49]. In fact, in the last few years, researchers have focused on the analysis of such biological fluids, looking for biomarkers in addition to those offered by the VOC mixture in exhaled breath [50,51,52]. Such multi-fluid analysis, designed as a simple test to be administered by general practitioners, could have the potential to improve the early diagnosis of lung cancer, thus decreasing its extremely high mortality rate.

5. Conclusions

The electronic nose described in this manuscript, composed of an array of five gas sensors targeted to VOC detection, coupled with feature extraction and classification strategies, achieved satisfactory results on a cohort of 80 subjects, equally split into two groups of control and lung cancer. Mean recall, precision, and accuracy values of 78%, 80%, and 77% were reached using a fourfold cross-validation approach, embedding both features extracted from the gas sensor response curves and clinical features in the classification model. A recall of 78% was found for stage I lung cancer subjects, with a decrease to 71% for stage II–III subjects. An analysis on the effects of the time distance between breath collection in Tedlar bags and its subsequent analysis showed no significant difference in the classification results. Two main future activities are planned with the aim of improving the classification capabilities of the device: (1) expansion of the gas sensor array with the addition of commercial gas sensors, thus increasing the number of extracted features and potentially improving classification performance, and (2) integration of custom developed gas sensors, based on molecularly imprinted polymers, in the electronic nose array. Once these improvements are completed, the proposed electronic nose could be considered as a candidate prototype tool for exhaled breath analysis on larger cohorts of subjects.

Author Contributions

Conceptualization, R.G., L.S., and P.C.; methodology, D.M.; software, D.M.; validation, R.G. and P.C.; formal analysis, D.M.; investigation, D.M.; resources, G.S., R.G., and L.S.; data curation, G.S. and D.M.; writing—original draft preparation, D.M., L.M., and P.C.; writing—review and editing, D.M., L.M., and P.C.; visualization, D.M. and L.M.; supervision, P.C., R.G., and L.S.; project administration, R.G., L.S., and P.C.; funding acquisition, R.G. and L.S. All authors have read and agreed to the published version of the manuscript.

Funding

The work of DM is supported by a fellowship funded by the European Institute of Oncology (Milan, Italy) with funds of the 2015 5 × 1000 Campaign, Italian Ministry of Health.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Ethics Committee of the European Institute of Oncology (protocol code R1004/19-IEO 1056, 19/06/2019).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Goldstraw, P.; Chansky, K.; Crowley, J.; Rami-Porta, R.; Asamura, H.; Eberhardt, W.E.E.; Nicholson, A.G.; Groome, P.; Mitchell, A.; Bolejack, V.; et al. The IASLC Lung Cancer Staging Project: Proposals for Revision of the TNM Stage Groupings in the Forthcoming (Eighth) Edition of the TNM Classification for Lung Cancer. J. Thorac. Oncol. Off. Publ. Int. Assoc. Study Lung Cancer 2016, 11, 39–51.
2. Ferlay, J.; Soerjomataram, I.; Dikshit, R.; Eser, S.; Mathers, C.; Rebelo, M.; Parkin, D.M.; Forman, D.; Bray, F. Cancer incidence and mortality worldwide: Sources, methods and major patterns in GLOBOCAN 2012. Int. J. Cancer 2015, 136, E359–E386.
3. The National Lung Screening Trial Research Team; Aberle, D.R.; Berg, C.D.; Black, W.C.; Church, T.R.; Fagerstrom, R.M.; Galen, B.; Gareen, I.F.; Gatsonis, C.; Goldin, J.; et al. The National Lung Screening Trial: Overview and study design. Radiology 2011, 258, 243–253.
4. The National Lung Screening Trial Research Team; Church, T.R.; Black, W.C.; Aberle, D.R.; Berg, C.D.; Clingan, K.L.; Duan, F.; Fagerstrom, R.M.; Gareen, I.F.; Gierada, D.S.; et al. Results of initial low-dose computed tomographic screening for lung cancer. N. Engl. J. Med. 2013, 368, 1980–1991.
5. Christensen, J.D.; Tong, B.C. Computed tomography screening for lung cancer: Where are we now? North Carol. Med J. 2013, 74, 406–410.
6. Black, W.C.; Gareen, I.F.; Soneji, S.S.; Sicks, J.D.; Keeler, E.B.; Aberle, D.R.; Naeim, A.; Church, T.R.; Silvestri, G.A.; Gorelick, J.; et al. Cost-effectiveness of CT screening in the National Lung Screening Trial. N. Engl. J. Med. 2014, 371, 1793–1802.
7. Hasan, N.; Kumar, R.; Kavuru, M.S. Lung cancer screening beyond low-dose computed tomography: The role of novel biomarkers. Lung 2014, 192, 639–648.
8. Pauling, L.; Robinson, A.B.; Teranishi, R.; Cary, P. Quantitative analysis of urine vapor and breath by gas-liquid partition chromatography. Proc. Natl. Acad. Sci. USA 1971, 68, 2374–2376.
9. Marzorati, D.; Mainardi, L.; Sedda, G.; Gasparri, R.; Spaggiari, L.; Cerveri, P. A Review of Exhaled Breath: A Key Role in Lung Cancer Diagnosis. J. Breath Res. 2019, 13, 034001.
10. Zhou, J.; Huang, Z.A.; Kumar, U.; Chen, D.D. Review of recent developments in determining volatile organic compounds in exhaled breath as biomarkers for lung cancer diagnosis. Anal. Chim. Acta 2017, 996, 1–9.
11. Phillips, M.; Herrera, J.; Krishnan, S.; Zain, M.; Greenberg, J.; Cataneo, R.N. Variation in volatile organic compounds in the breath of normal humans. J. Chromatogr. B Biomed. Sci. Appl. 1999, 729, 75–88.
12. Schallschmidt, K.; Becker, R.; Jung, C.; Bremser, W.; Walles, T.; Neudecker, J.; Leschber, G.; Frese, S.; Nehls, I. Comparison of volatile organic compounds from lung cancer patients and healthy controls-challenges and limitations of an observational study. J. Breath Res. 2016, 10, 046007.
13. Ligor, T.; Pater, L.; Buszewski, B. Application of an artificial neural network model for selection of potential lung cancer biomarkers. J. Breath Res. 2015, 9, 027106.
14. Phillips, M.; Altorki, N.; Austin, J.H.M.; Cameron, R.B.; Cataneo, R.N.; Greenberg, J.; Kloss, R.; Maxfield, R.A.; Munawar, M.I.; Pass, H.I.; et al. Prediction of lung cancer using volatile biomarkers in breath. Cancer Biomark. Sect. A Dis. Markers 2007, 3, 95–109.
15. Pepe, M.S.; Etzioni, R.; Feng, Z.; Potter, J.D.; Thompson, M.L.; Thornquist, M.; Winget, M.; Yasui, Y. Phases of biomarker development for early detection of cancer. J. Natl. Cancer Inst. 2001, 93, 1054–1061.
16. Zhong, X.; Li, D.; Du, W.; Yan, M.; Wang, Y.; Huo, D.; Hou, C. Rapid recognition of volatile organic compounds with colorimetric sensor arrays for lung cancer screening. Anal. Bioanal. Chem. 2018, 410, 3671–3681.
17. Mazzone, P.J.; Hammel, J.; Dweik, R.; Na, J.; Czich, C.; Laskowski, D.; Mekhail, T. Diagnosis of lung cancer by the analysis of exhaled breath with a colorimetric sensor array. Thorax 2007, 62, 565–568.
18. Tirzīte, M.; Bukovskis, M.; Strazda, G.; Jurka, N.; Taivans, I. Detection of lung cancer with electronic nose and logistic regression analysis. J. Breath Res. 2018, 13, 016006.
19. Chang, J.E.; Lee, D.S.; Ban, S.W.; Oh, J.; Jung, M.Y.; Kim, S.H.; Park, S.; Persaud, K.; Jheon, S. Analysis of volatile organic compounds in exhaled breath for lung cancer diagnosis using a sensor system. Sens. Actuators B Chem. 2018, 255, 800–807.
20. Gregis, G.; Sanchez, J.B.; Bezverkhyy, I.; Guy, W.; Berger, F.; Fierro, V.; Bellat, J.P.; Celzard, A. Detection and quantification of lung cancer biomarkers by a micro-analytical device using a single metal oxide-based gas sensor. Sens. Actuators B Chem. 2018, 255, 391–400.
21. Li, W.; Liu, H.; Xie, D.; He, Z.; Pi, X. Lung Cancer Screening Based on Type-different Sensor Arrays. Sci. Rep. 2017, 7.
22. Becker, R. Non-invasive cancer detection using volatile biomarkers: Is urine superior to breath? Med. Hypotheses 2020, 143, 110060.
23. Beauchamp, J.; Herbig, J.; Gutmann, R.; Hansel, A. On the use of Tedlar® bags for breath-gas sampling and analysis. J. Breath Res. 2008, 2, 046001.
24. Buszewski, B.; Ulanowska, A.; Ligor, T.; Denderz, N.; Amann, A. Analysis of exhaled breath from smokers, passive smokers and non-smokers by solid-phase microextraction gas chromatography/mass spectrometry. Biomed. Chromatogr. BMC 2009, 23, 551–556.
25. Filipiak, W.; Ruzsanyi, V.; Mochalski, P.; Filipiak, A.; Bajtarevic, A.; Ager, C.; Denz, H.; Hilbe, W.; Jamnig, H.; Hackl, M.; et al. Dependence of exhaled breath composition on exogenous factors, smoking habits and exposure to air pollutants. J. Breath Res. 2012, 6, 036008.
26. Wilson, A.D.; Baietto, M. Applications and advances in electronic-nose technologies. Sensors 2009, 9, 5099–5148.
27. Saruhan, B.; Fomekong, R.L.; Nahirniak, S. Review: Influences of Semiconductor Metal Oxide Properties on Gas Sensing Characteristics. Front. Sens. 2021, 2.
28. Wang, C.; Yin, L.; Zhang, L.; Xiang, D.; Gao, R. Metal oxide gas sensors: Sensitivity and influencing factors. Sensors 2010, 10, 2088–2106.
29. Kou, L.; Zhang, D.; Liu, D. A Novel Medical E-Nose Signal Analysis System. Sensors 2017, 17, 402.
30. Blatt, R.; Bonarini, A.; Calabro, E.; Torre, M.D.; Matteucci, M.; Pastorino, U. Lung Cancer Identification by an Electronic Nose based on an Array of MOS Sensors. In Proceedings of the 2007 International Joint Conference on Neural Networks, Orlando, FL, USA, 12–17 August 2007.
31. Vergara, A.; Llobet, E.; Martinelli, E.; Di Natale, C.; D’Amico, A.; Correig, X. Feature extraction of metal oxide gas sensors using dynamic moments. Sens. Actuators B Chem. 2007, 122, 219–226.
32. Zhang, S.; Xie, C.; Hu, M.; Li, H.; Bai, Z.; Zeng, D. An entire feature extraction method of metal oxide gas sensors. Sens. Actuators B Chem. 2008, 132, 81–89.
33. Cavallari, M.R.; Braga, G.S.; da Silva, M.F.P.; Izquierdo, J.E.E.; Paterno, L.G.; Dirani, E.A.T.; Kymissis, I.; Fonseca, F.J. A Hybrid Electronic Nose and Tongue for the Detection of Ketones: Improved Sensor Orthogonality Using Graphene Oxide-Based Detectors. IEEE Sens. J. 2017, 17, 1971–1980.
34. Paulsson, N.; Larsson, E.; Winquist, F. Extraction and selection of parameters for evaluation of breath alcohol measurement with an electronic nose. Sens. Actuators A Phys. 2000, 84, 187–197.
35. Yan, K.; Zhang, D. Blood glucose prediction by breath analysis system with feature selection and model fusion. In Proceedings of the 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Chicago, IL, USA, 26–30 August 2014; Volume 2014, pp. 6406–6409.
36. Tirzīte, M.; Bukovskis, M.; Strazda, G.; Jurka, N.; Taivans, I. Detection of lung cancer in exhaled breath with an electronic nose using support vector machine analysis. J. Breath Res. 2017, 11, 036009.
37. Liu, L.; Li, W.; He, Z.; Chen, W.; Liu, H.; Chen, K.; Pi, X. Detection of lung cancer with electronic nose using a novel ensemble learning framework. J. Breath Res. 2021, 15, 026014.
38. Metz, C.E. Basic principles of ROC analysis. In Seminars in Nuclear Medicine; Elsevier: Amsterdam, The Netherlands, 1978; Volume 8, pp. 283–298.
39. Gasparri, R.; Santonico, M.; Valentini, C.; Sedda, G.; Borri, A.; Petrella, F.; Maisonneuve, P.; Pennazza, G.; D’Amico, A.; Natale, C.D.; et al. Volatile Signature for the Early Diagnosis of Lung Cancer. J. Breath Res. 2016, 10, 016007.
40. Ulanowska, A.; Kowalkowski, T.; Trawińska, E.; Buszewski, B. The application of statistical methods using VOCs to identify patients with lung cancer. J. Breath Res. 2011, 5, 046008.
41. Chen, Q.; Chen, Z.; Liu, D.; He, Z.; Wu, J. Constructing an E-Nose Using Metal-Ion-Induced Assembly of Graphene Oxide for Diagnosis of Lung Cancer via Exhaled Breath. ACS Appl. Mater. Interfaces 2020, 12, 17713–17724.
42. Kononov, A.; Korotetsky, B.; Jahatspanian, I.; Gubal, A.; Vasiliev, A.; Arsenjev, A.; Nefedov, A.; Barchuk, A.; Gorbunov, I.; Kozyrev, K.; et al. Online breath analysis using metal oxide semiconductor sensors (electronic nose) for diagnosis of lung cancer. J. Breath Res. 2019, 14, 016004.
43. Rao, V.K.; Teradal, N.L.; Jelinek, R. Polydiacetylene Capacitive Artificial Nose. ACS Appl. Mater. Interfaces 2019, 11, 4470–4479.
44. Filipiak, W.; Filipiak, A.; Sponring, A.; Schmid, T.; Zelger, B.; Ager, C.; Klodzinska, E.; Denz, H.; Pizzini, A.; Lucciarini, P.; et al. Comparative analyses of volatile organic compounds (VOCs) from patients, tumors and transformed cell lines for the validation of lung cancer-derived breath markers. J. Breath Res. 2014, 8, 027111.
45. Horváth, I.; Lázár, Z.; Gyulai, N.; Kollai, M.; Losonczy, G. Exhaled biomarkers in lung cancer. Eur. Respir. J. 2009, 34, 261–275.
46. Mazzone, P.J.; Wang, X.F.; Xu, Y.; Mekhail, T.; Beukemann, M.C.; Na, J.; Kemling, J.W.; Suslick, K.S.; Sasidhar, M. Exhaled Breath Analysis with a Colorimetric Sensor Array for the Identification and Characterization of Lung Cancer. J. Thorac. Oncol. 2012, 7, 137–142.
47. Kort, S.; Brusse-Keizer, M.; Schouwink, H.; Gerritsen, J.W.; de Jongh, F.; van der Palen, J. Detection of non-small cell lung cancer by an electronic nose. In Lung Cancer; European Respiratory Society: Lausanne, Switzerland, 2017.
48. Nakhleh, M.K.; Amal, H.; Jeries, R.; Broza, Y.Y.; Aboud, M.; Gharra, A.; Ivgi, H.; Khatib, S.; Badarneh, S.; Har-Shai, L.; et al. Diagnosis and Classification of 17 Diseases from 1404 Subjects via Pattern Analysis of Exhaled Molecules. ACS Nano 2017, 11, 112–125.
49. Gasparri, R.; Romano, R.; Sedda, G.; Borri, A.; Petrella, F.; Galetta, D.; Casiraghi, M.; Spaggiari, L. Diagnostic biomarkers for lung cancer prevention. J. Breath Res. 2018, 12, 027111.
50. Montani, F.; Marzi, M.J.; Dezi, F.; Dama, E.; Carletti, R.M.; Bonizzi, G.; Bertolotti, R.; Bellomi, M.; Rampinelli, C.; Maisonneuve, P.; et al. miR-Test: A Blood Test for Lung Cancer Early Detection. JNCI J. Natl. Cancer Inst. 2015, 107.
51. Sun, Y.; Liu, S.; Qiao, Z.; Shang, Z.; Xia, Z.; Niu, X.; Qian, L.; Zhang, Y.; Fan, L.; Cao, C.X.; et al. Systematic comparison of exosomal proteomes from human saliva and serum for the detection of lung cancer. Anal. Chim. Acta 2017, 982, 84–95.
52. Zhang, C.; Leng, W.; Sun, C.; Lu, T.; Chen, Z.; Men, X.; Wang, Y.; Wang, G.; Zhen, B.; Qin, J. Urine Proteome Profiling Predicts Lung Cancer from Control Cases and Other Tumors. EBioMedicine 2018, 30, 120–128.

Figure 1. Left: Raw curve collected from an MOS gas sensor when exposed to exhaled breath. Three main phases can be identified, namely cleaning, measuring, and recovery. In the cleaning phase, the sensor maintains a constant value at its baseline. In the measuring phase, the sensor shows a change in conductivity based on the presence of compounds in the sample of air. A decrease in resistance is noticed if the analyzed sample contains a compound the sensor is sensitive to. $R_{0}$, $R_{min}$, and $\Delta T$ are values computed from the gas sensor response to be used for static feature extraction. Right: Gas sensor response in the phase-space. A total of five features were extracted from the phase-space: four are reported in the plot, with the last one being the relative integral in the measuring (adsorption) phase. Data shown in the figure were collected from the electronic nose presented in this study.

Figure 2. (a) Schematic diagram of the electronic nose for exhaled breath analysis. Two solenoid valves allowed the flow from the Tedlar bag and the environmental air to be controlled. An air pump pumped air inside the analysis chamber, where gas sensors were located. An additional solenoid valve, placed downstream, allowed air flow to be blocked from outside the analysis chamber. (b) A prototypical version of the developed electronic nose.

Figure 3. Schematic of gas sensor resistance measurement and heater control. For the sake of clarity, only the schematic of a single gas sensor is shown. The red rectangle shows the components contained inside a gas sensor: $R_{S}$ is the variable sensor resistance, while $R_{H}$ is the heater resistance.

Figure 4. Example of filtered resistance values collected from all gas sensors when exposed to a sample of breath from a healthy subject.

Figure 5. Boxplots of normalized feature distributions for lung cancer and control groups. Boxplots in blue refer to control subjects, while boxplots in red refer to lung cancer subjects. Data have been normalized with min-max scaling before computing the boxplots.

Figure 6. PCA plots for the first three principal components (PC) based on static and dynamic sensor features. Data in blue refer to control subjects, while data in red refer to lung cancer subjects.

Figure 7. Left: Histogram showing the time difference (hours) between breath collection and breath analysis for all the subjects involved in the experiments. Right: Histogram showing the time difference (hours) between breath collection and breath analysis for lung cancer and control groups.

Table 1. Commercial gas sensors embedded in the electronic nose for exhaled breath analysis and the associated sensitive organic compounds. In addition to the reported organic compounds, inorganic compounds also have an effect on the gas sensors’ response.

Sensor	Sensitive Organic Compounds
TGS822	Organic Solvent Vapors (Ethanol, Acetone, Benzene, ...)
TGS2602	Ethanol, Toluene
TGS2620	Methane, Isobutene, Ethanol
TGS2600	Alcohol, Benzene, Hexane
TGS2603	Air contaminants (Trimethylamine, Ethanol, ...)

Table 2. Static sensor features extracted from the gas sensor response curves. Refer to Figure 1 for the values used in the feature extraction equations.

Feature	Value
$\Delta R$	$R_{0} - R_{min}$
Slope	$(R_{0} - R_{min}) / \Delta T$
Ratio	$R_{0} / R_{min}$
Area	$\int R(t) / R_{0} \, dt$
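
The four static features in Table 2 can be computed from a sampled response curve roughly as sketched below. Several details are assumptions made for illustration and are not taken from the manuscript: $R_{0}$ is estimated as the mean of the first `baseline_samples` points, $\Delta T$ is taken as the time from the last baseline sample to the minimum, the integral runs over the whole supplied window, and the trapezoidal rule is used. The function name and the toy curve are hypothetical.

```python
def static_features(times, resistance, baseline_samples=5):
    """Compute delta R, slope, ratio, and area (Table 2) from one curve.

    times and resistance are equally long sequences; the first
    baseline_samples points are assumed to belong to the cleaning phase.
    """
    r0 = sum(resistance[:baseline_samples]) / baseline_samples
    i_min = min(range(len(resistance)), key=resistance.__getitem__)
    r_min = resistance[i_min]
    delta_t = times[i_min] - times[baseline_samples - 1]
    if delta_t <= 0:
        raise ValueError("minimum resistance falls inside the baseline window")

    # Trapezoidal approximation of the integral of R(t) / R0 over time.
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        area += 0.5 * (resistance[i] + resistance[i - 1]) / r0 * dt

    return {
        "delta_r": r0 - r_min,
        "slope": (r0 - r_min) / delta_t,
        "ratio": r0 / r_min,
        "area": area,
    }


# Toy curve (made-up values): two baseline samples, then a drop and recovery.
toy_times = [0, 1, 2, 3, 4, 5]
toy_resistance = [100, 100, 80, 60, 50, 70]
toy_features = static_features(toy_times, toy_resistance, baseline_samples=2)
```

Because the ratio and area are normalized by $R_{0}$, they are less affected by differences in the sensor baseline across samples than the raw $\Delta R$.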

Table 3. Clinical features included in the subject database. HCL: Hypercholesterolemia; COPD: Chronic Obstructive Pulmonary Disease.

Feature	Value
Age	Age of the subject (in years)
Gender	M/F
Smoking	Yes/No/Ex
Pack Years	Packs of cigarettes per day × Smoking Years
BMI	Weight/(Height × Height)
Hypertension	Yes/No
Diabetes	Yes/No
HCL	Yes/No
COPD	Yes/No
Obesity	Yes/No
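
The two derived clinical features in Table 3 are simple formulas. The sketch below assumes weight in kilograms and height in centimetres (the units used in Table 5), converted to metres for the conventional kg/m² definition of BMI. Table 3 itself does not state units, and the function names are hypothetical.

```python
def pack_years(packs_per_day, smoking_years):
    """Pack years = packs of cigarettes per day x years of smoking."""
    return packs_per_day * smoking_years


def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2, with height supplied in centimetres."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m * height_m)
```

For example, a subject smoking one and a half packs per day for 20 years has 30 pack years, and a subject weighing 81 kg with a height of 180 cm has a BMI of about 25 kg/m².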

Table 4. Confusion matrix for binary classification. TP: True Positives; FP: False Positives; FN: False Negatives; TN: True Negatives [38].

		Predicted Results
		Positive	Negative
Real Results	Positive	TP	FN
Real Results	Negative	FP	TN
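
Recall, precision, and accuracy follow directly from the four cells of the confusion matrix in Table 4. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return recall, precision, and accuracy from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Avoid a ZeroDivisionError when a class is absent.
        return numerator / denominator if denominator else float("nan")

    return {
        "recall": ratio(tp, tp + fn),          # share of real positives found
        "precision": ratio(tp, tp + fp),       # share of predicted positives that are real
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


# Made-up counts for a single test fold of 20 subjects.
example_metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```

In a cross-validation setting, these values would be computed once per test fold and then averaged across folds.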

Table 5. Demographic and clinical characteristics of control and lung cancer subjects involved in the exhaled breath experiments.

		All (n = 80)	Control (n = 40)	LC (n = 40)
Gender	Male	46	25	21
Gender	Female	34	15	19
Age	(Years)	64 ± 8	62 ± 7	66 ± 8
Height	(cm)	169 ± 9	171 ± 10	167 ± 8
Weight	(kg)	74 ± 15	76 ± 16	73 ± 14
Smoking	Yes	35	21	14
	Ex	29	11	18
	No	16	8	8
	Pack Years	6–90	17.5–90	6–75
Comorbidities	Hypertension	33	10	23
	Diabetes	2	-	2
	HCL	8	2	6
	COPD	8	6	2
	Obesity	3	-	3
pTNM Staging	I			20
	II			11
	III			4

Table 6. Lung cancer classification results in terms of recall, precision, and accuracy for different algorithms and feature reduction methods when using only sensor features. Re: Recall; Pr: Precision; Acc: Accuracy.

Algorithm	Feature Reduction	Re	Pr	Acc
SVM	Static + Dynamic PCA (n = 15)	0.64 ± 0.31	0.54 ± 0.08	0.59 ± 0.10
AdaBoost	Static + Dynamic PCA (n = 5)	0.67 ± 0.28	0.64 ± 0.11	0.66 ± 0.14

Table 7. Lung cancer classification results in terms of recall, precision, and accuracy for different algorithms and feature reduction methods when using both sensor and clinical features. Re: Recall; Pr: Precision; Acc: Accuracy.

Algorithm	Features / Feature Reduction	Re	Pr	Acc
SVM	Dynamic + Clinical / −	0.78 ± 0.21	0.80 ± 0.12	0.77 ± 0.04
AdaBoost	Static + Clinical / −	0.69 ± 0.31	0.79 ± 0.19	0.72 ± 0.14
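
Entries of the form "0.78 ± 0.21" in Tables 6 and 7 summarize a metric over the cross-validation folds. The sketch below shows one way such a summary can be produced. It assumes the value after "±" is the sample standard deviation across folds, which the captions do not state; the helper name `summarize_folds` and the per-fold values are hypothetical.

```python
from statistics import mean, stdev


def summarize_folds(fold_values):
    """Format a per-fold metric as 'mean ± std'.

    ASSUMPTION: the spread is the sample standard deviation (n - 1).
    """
    return f"{mean(fold_values):.2f} ± {stdev(fold_values):.2f}"


# Made-up recall values from four test folds, for illustration only
summary = summarize_folds([0.9, 0.7, 0.8, 0.6])
```

With only four folds the standard deviation is estimated from very few values, which is one reason the reported spreads can be wide.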

Table 8. Lung cancer sensitivity (recall) by stage achieved with the best classifiers and feature reduction methods.

Algorithm	Feature Reduction	Stage	Recall
SVM	Dynamic + Clinical	Stage I	0.78 (0.60–0.93)
SVM	Dynamic + Clinical	Stage II–III	0.71 (0.53–0.94)
AdaBoost	Static + Clinical	Stage I	0.72 (0.50–0.86)
AdaBoost	Static + Clinical	Stage II–III	0.57 (0.33–0.79)

Table 9. Dataset split based on the time difference between breath collection and breath analysis. The threshold for splitting the dataset was set to 4.5 h. Classification results are reported as mean recall (Re) and accuracy (Acc) over the four test folds used in the cross-validation (CV) scheme. The 95% confidence interval is also reported.

Model		Threshold (hours)
		≤4.5		>4.5
		LC (24)	Control (16)	LC (16)	Control (24)
SVM	Re	0.78 (0.58–0.93)	0.75 (0.48–0.93)	0.77 (0.48–0.93)	0.78 (0.58–0.93)
SVM	Acc	0.77 (0.62–0.89)		0.78 (0.62–0.89)
AdaBoost	Re	0.65 (0.45–0.84)	0.69 (0.41–0.89)	0.69 (0.41–0.89)	0.65 (0.48–0.93)
AdaBoost	Acc	0.67 (0.51–0.81)		0.67 (0.51–0.81)
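
The caption of Table 9 does not say how the 95% confidence intervals were computed. As a hedged illustration, the sketch below computes an exact binomial (Clopper-Pearson) interval for a proportion; the choice of this method is an assumption, and the function names are hypothetical. The bounds are found by bisection on the binomial distribution, using only the standard library.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f, lo=0.0, hi=1.0, iterations=60):
    """Root of an increasing function f on [lo, hi] with f(lo) < 0 < f(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes out of n."""
    half = alpha / 2
    # Lower bound: the p at which P(X >= k) equals alpha / 2
    lower = 0.0 if k == 0 else bisect_root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - half
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2
    upper = 1.0 if k == n else bisect_root(
        lambda p: half - binom_cdf(k, n, p)
    )
    return lower, upper


# Example: 12 of 16 subjects classified correctly (proportion 0.75)
low, high = clopper_pearson(12, 16)
```

For 12 correct out of 16, this gives an interval of about 0.48 to 0.93, which illustrates how wide the intervals become for subgroups of this size.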

Table 10. Comparison between the device proposed in this manuscript and other electronic noses proposed in studies for lung cancer diagnosis. The size of the cohort is reported as total number of included subjects (control subjects/lung cancer subjects). Recall and accuracy values for each study are reported. GNP: Gold Nano-Particles; QMB: Quartz Micro-Balance.

Study	Sensors	Sample Size	Recall	Accuracy
[30]	MOS	81 (58/23)	0.97	0.93
[39]	QMB	146 (76/70)	0.81	–
[21]	Type-Different	52 (28/24)	0.92	0.92
[29]	MOS	1667 (1291/376)	0.71	0.71
[19]	MOS	85 (48/37)	0.79	0.75
[18]	Cyranose 320	475 (252/223)	0.96	0.91
[42]	MOS	118 (53/65)	0.95	0.97
[41]	Custom	106 (58/48)	0.96	–
[37]	Type-Different	214 (116/98)	0.95	0.96
This study	MOS	80 (40/40)	0.78	0.77

Table 11. Confounding factors and issues in exhaled breath analysis for disease diagnosis, together with an assessment of whether each can be interpreted and solved for possible clinical use.

Issue	Interpretable	Solvable
Tedlar Bags Cleaning	Bag cleaning should be carried out before breath sampling to remove unwanted VOCs [23].	Yes, but researchers must reach a consensus on a common strategy for bag cleaning.
Sensor Washout	MOS sensors should always have the same baseline value when analyzing different samples.	Not easy. Environmental conditions and sensor drift could cause sensor baseline changes over time.
Sensor A-Specificity	Commercial MOS sensors are a-specific and targeted to generic VOC mixture detection. It is difficult to determine the presence of a specific substance in the sample under analysis.	Not easy. One possibility is to integrate commercial MOS sensors with custom sensors targeted to specific substances in the same electronic nose [43].
VOCs Concentration	VOCs in exhaled breath can have concentrations as low as the parts-per-billion range.	Yes. Pre-concentration techniques with sample absorption/desorption could help increase detection capabilities [19].

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
