Article

Machine Learning in Recognition of Basic Pulmonary Pathologies

Artificial Intelligence Division, Institute of Computer Science, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland
*
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(16), 8086; https://doi.org/10.3390/app12168086
Submission received: 8 July 2022 / Revised: 3 August 2022 / Accepted: 9 August 2022 / Published: 12 August 2022
(This article belongs to the Special Issue Deep Learning in Health and Medicine)

Abstract

Nowadays, during the diagnostic process, doctors can access a large amount of information describing the patient’s condition. However, a doctor can process only a limited amount of data at once. Information technology can help here: computers can quickly and effectively separate important information from redundant information and support the doctor in making a diagnosis. In this work, a decision-support system was created to diagnose common lung pathologies in digital radiography images. We consider four basic pulmonary pathologies: pneumothorax, pneumonia, pulmonary consolidation, and lung lesions. Our objective is to develop a new method for the automatic detection of lung pathologies on chest X-ray radiographs using the Python programming language and its libraries. The approach uses artificial intelligence techniques, such as deep learning, convolutional neural networks, and segmentation, to produce a diagnosis that aims to help the radiologist at work. The first sections describe the fundamentals of the present form of diagnosis, a proposal to improve this process, the operation of the algorithms used, and the data acquisition, segmentation, and processing methods. Then, the results of four different models and their implementation in a practical window application are presented. The best model, which detects pulmonary consolidation, achieves an accuracy higher than 91%, which is a satisfactory result because the models are not intended to replace radiologists but to improve their work. In the future, this type of program can be further developed by adding models that recognize other conditions.

1. Introduction

In recent decades, applications of artificial intelligence (AI) across many fields of human activity have grown rapidly [1,2,3,4,5]. These advances have also facilitated the development of processes for high-throughput extraction of features from medical X-ray images.
Compared to other medical imaging equipment, X-ray generators are relatively easy to use, cheap to manufacture, and relatively safe for the patient, making digital radiography one of the fastest and most popular forms of medical imaging. Well-described and anonymized medical images of diagnosed pathologies can be collected into large databases for training automatic diagnosis models. A computer can analyze all images of a patient faster than a doctor and can “see” features that are not visible to the human eye. Although the program will not make a diagnosis with 100% certainty, it can speed up the process and draw the doctor’s attention to essential aspects of the diagnosis that he/she did not previously consider [6,7,8,9,10,11].
This work aims to create a novel approach and a program for the automatic diagnosis of four pathologies (pneumothorax, pneumonia, pulmonary consolidation, and lung lesions) in digital chest X-ray images. The program uses a previously trained deep neural network model and a lung segmentation algorithm. For an uploaded image, the algorithm reports a predicted probability of the presence of each of the four pathologies. The application has a graphical user interface that allows radiologists to operate it intuitively [12,13,14].
Below, we briefly describe the four basic pulmonary pathologies considered in our approach. Pneumothorax is one of the most critical pathologies that a radiologist must correctly and effectively recognize on chest X-ray images. The classic feature of pneumothorax on X-ray is a visible parietal pleura (noticeable as a white line in the pulmonary field), which is not visible in standard images because it is covered by the ribs at the chest wall. Lateral to this line, there is no trace of lung tissue, just clear, black space. Additionally, as part of the differential diagnosis, the radiologist must assess whether the entire mediastinum has shifted. In the case of large pneumothoraxes, the picture is quite characteristic. Small pneumothoraxes, however, can pose a clinical problem: they cause symptoms in patients but may be barely noticeable on chest X-rays, and the pleural line may be imitated by other pathologies or, the most common error, by the medial border of the scapula. This is why the correct diagnosis is so important.
The second disease is pneumonia, whose radiological picture is highly variable and depends on the etiology of the infection. The diagnosis can be made when the radiologist recognizes homogeneous densities in typical locations, e.g., the entire lobe or a segment of one of the lungs, sharply ending at the interlobar fissures. In other cases, densities may have a so-called patchy pattern, which also suggests inflammation. In atypical infections, the densities are located peripherally.
Next, we consider pulmonary consolidation. The diagnosis of consolidation differs from the diagnosis of pneumonia. In inflammation, consolidations appear with characteristic features and in characteristic locations, but they may also indicate other pathologies, e.g., airflow disorders or the presence of effusion in the lungs. Consolidation is not a disease in itself, but it can be a symptom of many diseases of the lung tissue. It is diagnosed by recognizing opacification, which appears as a brighter area in the image, i.e., one with higher pixel values.
In the fourth case, we consider lung lesions. They are pathological areas that must be noticed and diagnosed by a radiologist because they may indicate potentially fatal diseases, such as lung cancer, tuberculosis, or abscesses. They can be seen as limited, planar, or coin-shaped areas and can be located anywhere in the lungs. Because blood vessels in the lungs can have the same brightness as lesions, lesions are easy to overlook, and it is essential that they do not escape the attention of the radiologist; hence, there is a great need for artificial intelligence to assist in the detection of such pathologies.
Those four pathologies were chosen because they are usually diagnosed by radiologists through the analysis of X-ray images. This means that they are well represented in image databases, which can provide data for machine learning. Additionally, their prevalence makes the program useful in practice because it can speed up many diagnoses.
The diagnosis of each of the above-mentioned pathologies is time-consuming, and the verdict is often ambiguous. Many factors are not visible to the human eye. Some pathology features are subtle and can only be seen through deep image analysis that takes into account differences between grayscale pixel values. In the case of computed tomography, the diagnosis is even more difficult because the radiologist has to analyze several dozen or sometimes several hundred sections for one patient. The shortage of radiologists further limits the number of diagnoses a hospital can make per day. Artificial intelligence technologies can help address all these problems.
Modern computers can process and analyze images much faster than humans. Innovative algorithms, including neural networks, are able to analyze several hundred images and make a diagnosis in several seconds. Such programs can often extract more information than a human. They are able to quickly calculate the histogram of the image, the mean gray level, and other advanced statistical parameters. They can instantly apply a filter to an image, transforming it into another form that seemingly has little to do with the original but carries essential information for a computer.
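As a small illustration of the statistics mentioned above, the following sketch computes a gray-level histogram and the mean gray level of an 8-bit image with NumPy. The function name `gray_statistics` and the toy 2 × 2 image are assumptions made for this example; they are not taken from the program described in this work.

```python
import numpy as np

def gray_statistics(image):
    """Return the 256-bin histogram and mean gray level of an 8-bit grayscale image."""
    pixels = np.asarray(image, dtype=np.uint8)
    # One bin per gray level 0..255.
    histogram, _ = np.histogram(pixels, bins=256, range=(0, 256))
    return histogram, float(pixels.mean())

# Toy image (hypothetical data): two black pixels, one mid-gray, one white.
toy = np.array([[0, 0], [128, 255]], dtype=np.uint8)
hist, mean_level = gray_statistics(toy)
print(hist[0], hist[128], hist[255], mean_level)
```

For a real radiograph, the same call would be applied to the full pixel array after loading it as a NumPy array.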
This approach will not replace the radiologist, who always makes the final diagnostic decision. However, it can improve their work, allow faster data flow, and draw attention to pathologies that they may not have suspected before. In this paper, the program analyzes images only for the four lung pathologies, but such programs can be extended in the future to diagnose more pathologies [15,16,17,18,19,20,21].

2. Methodology

The algorithms used in this study were trained on data obtained from the CheXpert database, which contains 224,316 accurately described chest radiographs from 65,240 patients. This dataset is remarkably large, and it provides labels made by experienced doctors. All images have good resolution and satisfactory quality, and each one has 14 labels that describe common pulmonary pathologies. A label can contain ‘1’ for presence, ‘0’ for absence, and ‘-’ for no information provided [22].
An important part of the whole algorithm is lung segmentation. Because X-ray imaging is planar, segmentation is much harder on X-ray images than on computed tomography images. Attempts to perform it with classical image processing failed. This is why the primary tool that performs lung segmentation here is a previously trained neural network, downloaded from a GitHub repository [23]. It consists of 32 layers divided into convolutional blocks and has 7,759,521 trained weights. It was trained on a set of lung images and corresponding masks created by radiologists. It accepts images of 512 × 512 px as input, so each image was first scaled and represented as a NumPy array. The network returns a binary matrix of ones and zeros that forms a mask; when placed over the original image, the mask lets through only the pixels that contain the lungs. The masks were improved algorithmically and by applying morphological operations, such as binary opening, closing, and erosion. A sample improved segmentation is presented in Figure 1. The number of images after segmentation is presented in Table 1.
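The mask clean-up step can be sketched with SciPy's binary morphology functions. This is a minimal illustration, not the authors' code: the 3 × 3 structuring element, the order of the operations, and the helper names `clean_mask` and `apply_mask` are assumptions, and the 64 × 64 mask is synthetic.

```python
import numpy as np
from scipy import ndimage

def clean_mask(mask, structure=None):
    """Smooth a binary lung mask with opening, closing, and erosion (order assumed)."""
    if structure is None:
        structure = np.ones((3, 3), dtype=bool)  # assumed 3 x 3 structuring element
    mask = np.asarray(mask).astype(bool)
    mask = ndimage.binary_opening(mask, structure=structure)  # drop isolated specks
    mask = ndimage.binary_closing(mask, structure=structure)  # fill small holes
    mask = ndimage.binary_erosion(mask, structure=structure)  # pull the border in by one pixel
    return mask

def apply_mask(image, mask):
    """Keep the pixels under the mask and set everything else to zero."""
    return np.where(mask, image, 0)

# Synthetic mask: a 20 x 20 "lung" with a one-pixel hole and one stray pixel.
raw = np.zeros((64, 64), dtype=np.uint8)
raw[20:40, 20:40] = 1
raw[30, 30] = 0
raw[5, 5] = 1

cleaned = clean_mask(raw)
image = np.full((64, 64), 200, dtype=np.uint8)  # stand-in for a scaled radiograph
segmented = apply_mask(image, cleaned)
```

In this toy case the stray pixel is removed by the opening, the hole is filled by the closing, and the final erosion shrinks the region slightly so that the mask stays inside the lung border.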
The neural network was built using transfer learning, i.e., by adding a few more layers to the ResNet-50 architecture, as explained in Figure 2. In neural networks, the gradient used for weight optimization is propagated back from the last layer to the first. Unfortunately, the more layers there are, the more this gradient shrinks, so it affects the weights in the first layers less and less. This phenomenon is known as the vanishing gradient effect, and it is handled well by ResNet networks, which use shortcuts to propagate the gradient. As a result, this family of architectures can use hundreds of layers while achieving outstanding performance. The network starts with a first block, which has a convolutional, batch normalization, activation (ReLU), and MaxPooling layer; the later stages consist of convolutional blocks and identity blocks, which contain the shortcuts mentioned earlier. Altogether, the network has 178 layers. The output indicates the probability of the presence of pathology in the image. The network has 31,956,481 weights, which process the input image after training [24].
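The effect of shortcuts on the gradient can be illustrated with a scalar toy model. In a plain stack of layers, the gradient reaching the first layer is the product of the local derivatives; with an identity shortcut y = x + f(x), each block contributes 1 + f'(x) instead. The depth of 50 and the local derivative of 0.01 are arbitrary values chosen for this sketch, and the function names are hypothetical.

```python
import numpy as np

def plain_gradient(local_derivatives):
    """Gradient reaching the input of a plain stack: the product of local derivatives."""
    return float(np.prod(np.asarray(local_derivatives, dtype=float)))

def residual_gradient(local_derivatives):
    """With identity shortcuts y = x + f(x), each block contributes 1 + f'(x)."""
    return float(np.prod(1.0 + np.asarray(local_derivatives, dtype=float)))

# Hypothetical setting: 50 layers, each with a small local derivative of 0.01.
local = np.full(50, 0.01)
print(plain_gradient(local))     # vanishingly small
print(residual_gradient(local))  # stays on the order of 1
```

Real networks involve Jacobian matrices rather than scalars, so this is only a simplified picture of why the shortcut keeps the gradient from vanishing.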
The data were divided into training, validation, and test sets in a ratio of 8:1:1. Since it is impossible to load all the images into the computer’s RAM simultaneously, special data generators were used, which delivered the images to the network and formatted them appropriately. The network was built to accept images with dimensions of 224 × 224 px as input, so each image was scaled to this format. Data augmentation was used to prevent the network from overfitting: each image from the training set was rotated by a random, slight angle before entering the network. Thus, the network never received exactly the same data twice. After building the network, the model was compiled with a Stochastic Gradient Descent optimizer with a momentum of 0.9 and a learning rate of 0.0001, and with a binary cross-entropy loss function. Early stopping ended the training process when the validation loss did not improve for five consecutive epochs, and the Model Checkpoint function restored the weights that gave the highest validation accuracy.
During training, the network was validated after every epoch by testing the model on the validation set to monitor how much the model was overfitting to the training data. Figure 3 shows the graphs of a sample training process. The validation accuracy is increasing and the validation loss is still decreasing, which means that the model has not been overfitted.
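The 8:1:1 split and the stopping rule described above can be sketched in a framework-independent way. This is a minimal sketch under stated assumptions: the helper names, the fixed random seed, and the example loss and accuracy curves are hypothetical, and the actual training used library callbacks rather than this code.

```python
import numpy as np

def split_indices(n_samples, ratios=(8, 1, 1), seed=0):
    """Shuffle sample indices and split them into train/validation/test parts."""
    order = np.random.default_rng(seed).permutation(n_samples)
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

def early_stopping_epoch(val_losses, patience=5):
    """Return the 0-based epoch at which training stops.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs; otherwise it runs to the last epoch.
    """
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1

def best_checkpoint(val_accuracies, last_epoch):
    """Pick the saved epoch with the highest validation accuracy up to `last_epoch`."""
    return int(np.argmax(val_accuracies[:last_epoch + 1]))

# Hypothetical curves, for illustration only.
train_idx, val_idx, test_idx = split_indices(1000)
losses = [1.0, 0.9, 0.8, 0.85, 0.86, 0.87, 0.88, 0.9, 0.7]
accuracies = [0.5, 0.6, 0.7, 0.72, 0.71, 0.7, 0.69, 0.68, 0.9]
stop = early_stopping_epoch(losses, patience=5)
best = best_checkpoint(accuracies, stop)
```

Here the stopping rule watches the validation loss, while the checkpoint choice uses validation accuracy, mirroring the two criteria named in the text.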

3. Considerations on Quality of the Models

The previous section described the process of training the models that support the diagnosis. The models were first trained on original images and then on segmented images. The purpose was to check whether model accuracy would improve after the segmentation process. The results that the networks achieved are briefly presented in Table 2; this section describes them more precisely and shows how the quality of the models was determined.
All models were tested with test sets that accounted for 10% of the total data. For each tested case, the models returned a number ranging from 0 to 1, which can be interpreted as the probability of the presence of a given disease. To determine the quality of the prediction, a threshold was set that separates the predictions into positive and negative ones, creating a binary prediction vector. By comparing this vector with the label vector (ground truth), the numbers of true positives, true negatives, false positives, and false negatives were calculated. These numbers also allowed the sensitivity and specificity of each model to be calculated and the ROC curve to be drawn. The area under this curve (ROC-AUC) defines the quality of the model.
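The evaluation steps above can be sketched with NumPy alone. The labels and scores below are made-up values, and the function names are hypothetical. The AUC is computed here through its rank interpretation (the fraction of positive-negative pairs in which the positive case receives the higher score, with ties counted as one half), which equals the area under the ROC curve. scikit-learn offers equivalent functions such as `confusion_matrix` and `roc_auc_score`.

```python
import numpy as np

def binarize(scores, threshold):
    """Turn model outputs into a binary prediction vector."""
    return (np.asarray(scores, dtype=float) >= threshold).astype(int)

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels and predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn

def sensitivity_specificity(tp, tn, fp, fn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size))

# Made-up ground truth and model outputs, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.45, 0.32, 0.12, 0.05, 0.25, 0.02, 0.15, 0.60]

counts = confusion_counts(y_true, binarize(scores, threshold=0.2))
sens, spec = sensitivity_specificity(*counts)
auc = roc_auc(y_true, scores)
```

The pairwise AUC computation needs memory proportional to the number of positive-negative pairs, so it suits small examples; for full test sets a library routine is the more practical choice.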
To counteract overfitting during training, a checkpoint function saved the model after every epoch. Then, using the training and validation graphs, the model quality was analyzed as a function of training duration (number of epochs). The analysis was performed for different threshold values separating positive and negative cases. The results showed that the model was not overfitted and that it achieved the best result with training lasting 40 epochs. This analysis was performed for all models, and the best one was selected.
Figure 4, Figure 5, Figure 6 and Figure 7 show the confusion matrices and the corresponding ROC curves. All those plots describe the quality of the models. During testing, each model tended to return small values, which means that the output could not be interpreted directly as the probability of the presence of pathology. It was therefore necessary to use a threshold that divides predictions into positive and negative ones. The models were tested with different thresholds, and the best threshold was chosen for each. The results are presented in Table 3.
Additionally, one more measure was computed. For each model, three independent testing sets were created, and the model quality was compared between them. This makes it possible to determine the level of precision of a given model, which defines a margin that helps when comparing models with each other. The results are shown in Table 4.
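One simple way to choose such a threshold is to sweep a grid of candidates and keep the one with the best score. The text does not state which selection criterion was used, so maximizing accuracy over the grid 0.1, 0.2, ..., 0.9 is an assumption of this sketch, as are the function names and the made-up data.

```python
import numpy as np

def accuracy_at(y_true, scores, threshold):
    """Accuracy of the binary predictions obtained with the given threshold."""
    y_pred = np.asarray(scores, dtype=float) >= threshold
    return float(np.mean(y_pred == (np.asarray(y_true) == 1)))

def best_threshold(y_true, scores, candidates=None):
    """Return (threshold, accuracy) for the candidate with the highest accuracy."""
    if candidates is None:
        candidates = np.round(np.linspace(0.1, 0.9, 9), 1)  # assumed grid
    accuracies = [accuracy_at(y_true, scores, t) for t in candidates]
    best = int(np.argmax(accuracies))  # the first maximum wins on ties
    return float(candidates[best]), accuracies[best]

# Made-up labels and low-valued model outputs, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.45, 0.32, 0.12, 0.05, 0.25, 0.02, 0.15, 0.60]
threshold, accuracy = best_threshold(y_true, scores)
```

Other criteria, for example one that weights sensitivity more heavily than specificity, would fit the same loop by swapping `accuracy_at` for a different scoring function.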
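The spread statistics in Table 4 can be reproduced from the three per-set accuracies. The text does not define the formulas, so the following is an inference from the numbers: the reported variance matches the sample variance (divisor n − 1), and the reported precision level matches the difference between the highest and lowest accuracy. The dictionary layout and the function name `spread` are choices made for this sketch.

```python
import statistics

# Accuracies on the three independent test sets, copied from Table 4.
accuracies = {
    "Lung Lesions": [0.69079, 0.68957, 0.72392],
    "Pneumothorax": [0.71566, 0.69825, 0.72343],
    "Pneumonia": [0.78811, 0.74806, 0.77390],
    "Pulmonary Consolidation": [0.86474, 0.83414, 0.87637],
}

def spread(values):
    """Return (sample variance, max - min) of a list of accuracies."""
    return statistics.variance(values), max(values) - min(values)

for model, values in accuracies.items():
    variance, value_range = spread(values)
    print(f"{model}: variance={variance:.5f}, range={value_range:.5f}")
```

With only three test sets per model, both statistics are rough estimates, which is consistent with the table describing them as partial considerations.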

4. Application with Graphical User Interface

To demonstrate the models, a window application with basic functionality was created in accordance with the Model-View-Controller (MVC) architectural pattern. It allows the user to load an image and analyze it with all four algorithms. It also performs image segmentation and presents the result on screen. All four diagnoses are performed separately, one after another, and are independent, which means that the poor performance of one model does not affect the performance of the others.
After every model was tested separately, all models were combined into one program to check how it would diagnose all four pathologies together. The graphical user interface offers two possible analyses. As mentioned before, analysis with segmentation gives considerably better results, so analysis without segmentation should be used only when the radiologist judges that there is significant information outside the segmented area. There are also buttons that allow the user to switch between diagnoses presented as probabilities and as Boolean values. The application also provides buttons that switch the view between the segmented lungs and the original image.
The program works smoothly and diagnoses pathologies correctly on the testing set. Sample diagnoses are presented in Figure 8, Figure 9 and Figure 10. The radiologist can use this application to analyze one or many images very quickly and obtain a specific diagnosis for each. If they accept this diagnosis, they can move on to the next patient; if they have doubts, they should check their own diagnosis again and possibly change their verdict. The segmentation process also works without problems and shows an image that can help the radiologist focus on what is essential and not become distracted by irrelevant information.

5. Conclusions

The experimental results of the models described in the previous sections are satisfactory, and the models can provide help to radiologists. The algorithm that uses a deep neural network trained on segmented images and detects pulmonary consolidation turned out to be the best model. It achieved an accuracy of over 91%. The worst model is the algorithm that detects lung lesions, with an accuracy of 77%, which can be explained by the presence of medical devices in the images, which could have confused the algorithm. The model that detects pneumothorax, with an accuracy of 78%, can be improved by a better segmentation process applied to the training data because the signs of pneumothorax can be very subtle, and even a slight loss of information can change the result.
In conclusion, this paper shows that algorithms of this type can substantially help radiologists with their work. Such a program will not replace the human who is responsible for the diagnosis, but it can speed up the process and make it more accurate. This type of application can be expanded with more models for diagnosing other diseases or pathologies. Its capabilities are limited only by the number of available models, which in turn is limited by the available databases. This is why it is so important to gather data and create large databases, especially in medical imaging. Those databases should contain as much anonymized data as possible, and they should grow with every new record. Many innovative approaches, such as radiomics, can combine features calculated from images with patient data, such as sex or age, to gain information. With a continuous flow of data, engineers would be able to create models that learn online, adapt to new information, and predict even better.
In our future work, we can develop the issue of the precision level mentioned in Section 3, which was partially considered in Table 4. The approaches reported in [25,26] may be applied in our research. It is also possible to create algorithms responsible for the detection of pathology in images obtained from computed tomography, magnetic resonance imaging, or other forms of medical imaging. As follows from the above considerations, machine learning and neural networks may be very promising in research for health and medicine.

Author Contributions

Conceptualization, J.M.; Software, J.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Karimi, M.; Vaferi, B.; Hosseini, S.H.; Rasteh, M. Designing an Efficient Artificial Intelligent Approach for Estimation of Hydrodynamic Characteristics of Tapered Fluidized Bed from Its Design and Operating Parameters. Ind. Eng. Chem. Res. 2018, 57, 259–267.
  2. Roshani, G.H.; Muhammad Ali, P.J.; Mohammed, S.; Hanus, R.; Abdulkareem, L.; Alanezi, A.A. Feasibility Study of Using X-ray Tube and GMDH for Measuring Volume Fractions of Annular and Stratified Regimes in Three-Phase Flows. Symmetry 2021, 13, 613.
  3. Hosseini, S.; Vaferi, B. Determination of methanol loss due to vaporization in gas hydrate inhibition process using intelligent connectionist paradigms. Arab. J. Sci. Eng. 2021, 47, 5811–5819.
  4. Chang, S.-H.; Abdul, A.; Chen, J.; Liao, H.-Y. A Personalized Music Recommendation System Using Convolutional Neural Networks Approach. In Proceedings of the IEEE International Conference on Applied System Innovation, Chiba, Japan, 13–17 April 2018.
  5. Briot, J.P.; Hadjeres, G.; Pachet, F.D. Deep Learning Techniques for Music Generation—A Survey. August 2019. Available online: https://hal.sorbonne-universite.fr/hal-01660772 (accessed on 11 August 2022).
  6. Rockall, A.; Hatrick, A.; Armstrong, P.; Wastie, M. Diagnostic Imaging, Includes Wiley E-Text; Wiley-Blackwell: Chichester, UK, 2013; ISBN 9780470658901.
  7. Zhang, X.; Smith, N.; Webb, A. 1—Medical imaging. In Biomedical Information Technology; Feng, D.D., Ed.; Academic Press: Burlington, MA, USA, 2008; pp. 3–27. ISBN 978-0-12-373583-6.
  8. George, R.B. Chest Medicine: Essentials of Pulmonary and Critical Care Medicine; Lippincott Williams & Wilkins: Philadelphia, PA, USA, 2005.
  9. Tadeusiewicz, R. Neural Networks in Biomedical Engineering. In Polish, Acta Bio-Optica et Informatica Medica. Inzynieria Biomedyczna; Jacek Doskocz: Wrocław, Poland, 2007; Volume 13, pp. 184–189. ISSN 1234–5563.
  10. Pezzotti, N.; Hollt, T.; Van Gemert, J.; Lelieveldt, B.P.; Eisemann, E.; Vilanova, A. Deepeyes: Progressive visual analytics for designing deep neural networks. IEEE Trans. Vis. Comput. Graph. 2018, 24, 98–108.
  11. Lambin, P.; Leijenaar, R.T.H.; Deist, T.M.; Peerlings, J.; de Jong, E.E.C.; van Timmeren, J.; Sanduleanu, S.; Larue, R.T.H.M.; Even, A.J.G.; Jochems, A. Radiomics: The bridge between medical imaging and personalized medicine. Nat. Reviews. Clin. Oncol. 2017, 14, 749–762.
  12. Avanzo, M.; Stancanello, J.; El Naqa, I. Beyond imaging: The promise of radiomics. Phys. Med. Int. J. Devoted Appl. Phys. Med. Biol. Off. J. Ital. Assoc. Biomed. Phys. AIFB 2017, 38, 122–139.
  13. Hosny, A.; Parmar, C.; Coroller, T.P.; Grossmann, P.; Zeleznik, R.; Kumar, A.; Bussink, J.; Gillies, R.J.; Mak, R.H.; Aerts, H.J.W.L. Deep learning for lung cancer prognostication: A retrospective multi-cohort radiomics study. PLoS Med. 2018, 15, e1002711.
  14. Fournier, L.; Costaridou, L.; Bidaut, L.; Michoux, N.; Lecouvet, F.E.; de Geus-Oei, L.-F.; Boellaard, R.; Oprea-Lager, D.E.; Obuchowski, A.N.; Caroli, A.; et al. Incorporating radiomics into clinical trials: Expert consensus endorsed by the european society of radiology on considerations for data-driven compared to biologically driven quantitative biomarkers. Eur. Radiol. 2021, 48, 6001–6012.
  15. Parekh, V.; Jacobs, M.A. Radiomics: A new application from established techniques. Expert Rev. Precis. Med. Drug Dev. 2016, 1, 207–226.
  16. Thibault, G.; Fertil, B.; Navarro, C.; Pereira, S.; Cau, P.; Lévy, N.; Sequeira, J.; Mari, J. Texture indexes and gray level size zone matrix application to cell nuclei classification. In 10th International Conf. on Pattern Recognition and Information Processing; Aix Marseille Université: Marseille, France, 2009.
  17. Masokano, B.; Liu, W.; Xie, S.; Henrio Marcellin, D.F.; Pei, Y.; Li, W. The application of texture quantification in hepatocellular carcinoma using ct and mri: A review of perspectives and challenges. Cancer Imaging Off. Publ. Int. Cancer Imaging Soc. 2020, 20, 67.
  18. Lambin, P.; Rios-Velazquez, E.; Leijenaar, R.; Carvalho, S.; van Stiphout, R.G.P.M.; Granton, P.; Zegers, C.M.L.; Gillies, R.; Boellard, R.; Dekker, A.; et al. Radiomics: Extracting more information from medical images using advanced feature analysis. Eur. J. Cancer 2012, 48, 441–446.
  19. Gillies, R.J.; Kinahan, P.E.; Hricak, H. Radiomics: Images are more than pictures, they are data. Radiology 2016, 278, 563–577.
  20. Kumar, V.; Gu, Y.; Basu, S.; Berglund, A.; Eschrich, S.A.; Schabath, M.B.; Forster, K.; Aerts, H.J.W.L.; Dekker, A.; Fenstermacher, D. Radiomics: The process and the challenges. Magn. Reson. Imaging 2012, 30, 1234–1248.
  21. van Griethuysen, J.J.M.; Fedorov, A.; Parmar, C.; Hosny, A.; Aucoin, N.; Narayan, V.; Beets-Tan, R.G.H.; Fillion-Robin, J.-C.; Pieper, S.; Aerts, H.J.W.L. Computational radiomics system to decode the radiographic phenotype. Cancer Res. 2017, 77, e104–e107.
  22. Irvin, J.; Rajpurkar, P.; Ko, M.; Yu, Y.; Ciurea-Ilcus, S.; Chute, C.; Marklund, H.; Haghgoo, B.; Ball, R.; Shpanskaya, K.; et al. Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison. arXiv 2019, arXiv:1901.07031.
  23. rani700, Lung Fields Segmentation on Chest X-ray Images. 2020. Available online: https://github.com/rani700/xray (accessed on 11 August 2022).
  24. Chollet, F. Deep Learning with Python. Manning Publications, Co.: Shelter Island, NY, USA, 2018.
  25. Du, X.; Wang, T.; Wang, L.; Pan, W.; Chai, C.; Xu, X.; Jiang, B.; Wang, J. CoreBug: Improving Effort-Aware Bug Prediction in Software Systems Using Generalized k-Core Decomposition in Class Dependency Networks. Axioms 2022, 11, 205.
  26. Pan, W.; Ming, H.; Yang, Z.; Wang, T. Comments on “Using k-core Decomposition on Class Dependency Networks to Improve Bug Prediction Model's Practical Performance”. IEEE Trans. Softw. Eng. 2022.
Figure 1. Sample lung segmentation on X-ray image.
Figure 2. An explanation of how the new network was built using transfer learning.
Figure 3. (a) Model Loss function values during training of model which detects pulmonary consolidation. (b) Accuracy plots during training of model which detects pulmonary consolidation.
Figure 4. Model quality—Pneumothorax. (a) Confusion matrix. (b) ROC curve.
Figure 5. Model quality—Pneumonia. (a) Confusion matrix. (b) ROC curve.
Figure 6. Model quality—Pulmonary Consolidation. (a) Confusion matrix. (b) ROC curve.
Figure 7. Model quality—Lung Lesions. (a) Confusion matrix. (b) ROC curve.
Figure 8. Diagnosis of patient with pneumonia.
Figure 9. Diagnosis of a healthy patient.
Figure 10. Diagnosis of patient with pneumothorax and lung lesion.
Table 1. Number of patients after segmentation.
Pathology          | Positive cases | Negative cases
-------------------|----------------|---------------
Pneumothorax       | 17,322         | 63,776
Pneumonia          | 4586           | 18,704
Lung Lesion        | 6904           | 17,618
Lung Consolidation | 12,796         | 36,345
Table 2. Difference in accuracies depending on training model on segmented or original images.
Model (accuracy)            | Pneumothorax | Pneumonia | Lung Consolidation | Lung Lesions
----------------------------|--------------|-----------|--------------------|-------------
Trained on original images  | 57%          | 69%       | 80%                | 67%
Trained on segmented images | 78%          | 78%       | 91%                | 77%
Improvement                 | +21%         | +9%       | +11%               | +10%
Table 3. Best thresholds for each model.
Model             | Pneumothorax | Pneumonia | Lung Consolidation | Lung Lesions
------------------|--------------|-----------|--------------------|-------------
Optimal threshold | 0.3          | 0.2       | 0.2                | 0.3
Table 4. Partial considerations for precision levels for each model.
Model                    | Lung Lesions | Pneumothorax | Pneumonia | Pulmonary Consolidation
-------------------------|--------------|--------------|-----------|------------------------
Volume of each image set | 815          | 2701         | 774       | 1634
Accuracy on 1st set      | 0.69079      | 0.71566      | 0.78811   | 0.86474
Accuracy on 2nd set      | 0.68957      | 0.69825      | 0.74806   | 0.83414
Accuracy on 3rd set      | 0.72392      | 0.72343      | 0.77390   | 0.87637
Variance                 | 0.00038      | 0.00017      | 0.00041   | 0.00048
Precision level          | 0.03435      | 0.02518      | 0.04005   | 0.04223
All charts and matrices were generated with Python libraries—matplotlib and scikit-learn.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Płudowski, J.; Mulawka, J. Machine Learning in Recognition of Basic Pulmonary Pathologies. Appl. Sci. 2022, 12, 8086. https://doi.org/10.3390/app12168086
