An Improved Weighted Cross-Entropy-Based Convolutional Neural Network for Auxiliary Diagnosis of Pneumonia
Abstract
1. Introduction
2. Related Work
3. Methodology
3.1. Convolutional Neural Network
3.1.1. Input Layer
3.1.2. Convolution Layer
3.1.3. Activation Layer
- The sigmoid activation function is a commonly used function in neural networks. The mathematical expression of the sigmoid function is shown in Equation (1). The most notable characteristic of the sigmoid function is that its output is bounded and lies between 0 and 1. This makes it particularly important in the output layer when dealing with binary classification problems. Additionally, the sigmoid function is continuously differentiable, which is a crucial property for optimization algorithms such as gradient descent.
- The ReLU activation function is a simple yet effective nonlinear function; its mathematical expression is shown in Equation (2). The ReLU function remains linear for positive values and outputs zero for negative values. This nonlinear transformation helps introduce nonlinear characteristics into the network, enabling it to learn more complex functional relationships.
- The mathematical expression of the tanh activation function is shown in Equation (3). It maps any real number to the range (−1, 1). The advantage of the tanh function is that its output is centered at approximately 0, which often results in a faster learning process. However, similar to the sigmoid function, the gradient of the tanh function approaches 0 when the input values are large or small. This can lead to the vanishing gradient problem, making it difficult for the neural network to learn and update its weights.
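The three activation functions above, referenced as Equations (1)–(3) but not reproduced in this extract, take their standard forms. A minimal sketch in plain Python (standard definitions, not the paper's exact notation):

```python
import math

def sigmoid(x: float) -> float:
    # Equation (1): sigmoid(x) = 1 / (1 + e^(-x)); output bounded in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def relu(x: float) -> float:
    # Equation (2): ReLU(x) = max(0, x); linear for positive inputs, zero otherwise
    return max(0.0, x)

def tanh(x: float) -> float:
    # Equation (3): tanh(x) = (e^x - e^-x) / (e^x + e^-x); output in (-1, 1), zero-centered
    return math.tanh(x)

print(sigmoid(0.0))  # 0.5
print(relu(-2.0))    # 0.0
print(tanh(0.0))     # 0.0
```

Note how the zero-centered output of tanh contrasts with sigmoid, which is the property the text credits for faster learning.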
3.1.4. Pooling Layer
3.1.5. Fully Connected Layer
3.2. Loss Function
Improved Weighted Cross-Entropy
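As a baseline for this section, the standard class-weighted cross-entropy can be sketched as follows. This shows only the conventional weighted form; the paper's improvement presumably modifies the per-class weights, and the exact scheme is not reproduced in this extract:

```python
import math

def weighted_cross_entropy(probs, target, weights):
    """Standard class-weighted cross-entropy for one sample:
    L = -w[t] * log(p[t]), where t is the true class index and
    probs are softmax outputs. Up-weighting a class increases the
    penalty for misclassifying it."""
    return -weights[target] * math.log(probs[target])

probs = [0.7, 0.1, 0.1, 0.1]
# Uniform weights: the loss reduces to ordinary cross-entropy.
print(weighted_cross_entropy(probs, 0, [1.0, 1.0, 1.0, 1.0]))
# Up-weighting a minority class (e.g., the small viral pneumonia class)
# scales its loss contribution by the weight.
print(weighted_cross_entropy(probs, 3, [1.0, 1.0, 1.0, 4.0]))
```

With uniform weights the function matches plain cross-entropy, which makes the effect of any weighting scheme easy to isolate.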
3.3. Transfer Learning
3.4. Grad-CAM
4. Experimental Studies
4.1. Dataset Description and Processing
- Resizing: Random size cropping is performed on the images, followed by resizing them to 224 × 224 pixels. This provides a consistent data foundation and enhances the robustness of the model to different perspectives and scales through the randomness of cropping (see Figure 5a). Random cropping is a basic data augmentation method that has been widely used [45].
- Rotation and translation: Random rotation and translation are applied to simulate changes in shooting angles in real-world scenarios, improving the applicability and accuracy of the model (see Figure 5b,c). Studies have shown that random rotation and translation significantly enhance the performance of the model in handling data with different shooting angles [46].
- CLAHE image enhancement: Contrast-limited adaptive histogram equalization (CLAHE) is used to improve image contrast, which is particularly suitable for CXR images, significantly enhancing image quality and better supporting model training (see Figure 5d). This method has been widely validated as effective in medical imaging [47].
- Data normalization: The three channels of the RGB images were normalized to mean values of 0.485, 0.456, and 0.406 and standard deviations of 0.229, 0.224, and 0.225, parameters widely regarded as effective defaults in deep learning practice. Each channel is normalized with these parameters, as expressed in Equation (6). This normalization gives the data a consistent distribution across channels, promoting the stability of neural network training [48].
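The normalization step is the standard per-channel operation x' = (x − mean) / std with the ImageNet statistics quoted above. A minimal NumPy sketch (illustrative; in practice this is typically done with a framework transform such as torchvision's Normalize):

```python
import numpy as np

# ImageNet per-channel statistics quoted in the text (RGB order)
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def normalize(image: np.ndarray) -> np.ndarray:
    """Normalize an H x W x 3 image whose values are already scaled to [0, 1]."""
    return (image - MEAN) / STD

img = np.full((224, 224, 3), 0.5)  # dummy gray image after resizing to 224 x 224
out = normalize(img)
# The red channel maps to (0.5 - 0.485) / 0.229
print(round(out[0, 0, 0], 4))  # 0.0655
```

Broadcasting applies the three statistics across all pixels, so the same two arrays handle any image size.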
4.2. Experimental Setup and Parameter Settings
- AlexNet is a milestone in deep learning, and its major contribution lies in achieving exceptional classification performance on the ImageNet dataset through a deep convolutional neural network. The core structure of AlexNet includes five convolutional layers and three fully connected layers. It introduces the ReLU activation function to accelerate training and uses dropout techniques to prevent overfitting. Additionally, AlexNet was the first model to use GPUs for large-scale parallel computing, significantly increasing the training speed.
- VGG16 increases network depth by using multiple stacked 3 × 3 small convolutional kernels to extract high-level feature representations. VGG16 consists of 13 convolutional layers and three fully connected layers. Although the deeper structure increased the computational load, it demonstrated excellent performance on the ImageNet dataset. Its simple and uniform design makes it easy to transfer to other visual tasks.
- GoogLeNet (Inception V1) maintains relatively low computational complexity while capturing multiscale information through the inception module. The inception module fuses features of different scales through parallel convolution and pooling operations, better representing both local and global information in images. GoogLeNet has shown high efficiency and superior image classification performance across various computational platforms.
- ResNet18 is a member of the residual network family with 18 layers. Its core idea is to address the vanishing gradient and degradation problems in deep networks by introducing residual blocks. Residual blocks use skip connections to pass information directly between layers, ensuring effective gradient propagation. This makes it possible to train very deep networks.
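The skip connection described for ResNet18 can be sketched as a minimal residual block: the block's transformation F(x) is added to the input before the final activation, so the identity path carries gradients directly between layers. A NumPy illustration with hypothetical dense weights (ResNet18 proper uses two 3 × 3 convolutions per block; dense layers keep the sketch short):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """y = ReLU(F(x) + x), where F is two linear maps with a ReLU between them."""
    f = relu(x @ w1) @ w2
    return relu(f + x)  # skip connection: the input bypasses F directly

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
# Zero weights make F(x) = 0, so the block reduces to ReLU(x): even when the
# learned branch contributes nothing, the identity path still passes
# information forward, which is what eases gradient propagation in deep stacks.
y = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
print(np.allclose(y, relu(x)))  # True
```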
4.3. Evaluation Metrics
4.4. Performance Comparison
4.4.1. Performance Comparison on Improved Cross-Entropy Loss Function
4.4.2. Performance Comparison on TL
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Ciotti, M.; Ciccozzi, M.; Terrinoni, A.; Jiang, W.C.; Wang, C.B.; Bernardini, S. The COVID-19 pandemic. Crit. Rev. Clin. Lab. Sci. 2020, 57, 365–388. [Google Scholar] [CrossRef]
- Miettinen, O.S.; Flegel, K.M.; Steurer, J. Clinical diagnosis of pneumonia, typical of experts. J. Eval. Clin. Pract. 2008, 14, 343–350. [Google Scholar] [CrossRef]
- Portugal, I.; Alencar, P.; Cowan, D. The use of machine learning algorithms in recommender systems: A systematic review. Expert Syst. Appl. 2018, 97, 205–227. [Google Scholar] [CrossRef]
- Khanal, S.S.; Prasad, P.; Alsadoon, A.; Maag, A. A systematic review: Machine learning based recommendation systems for e-learning. Educ. Inf. Technol. 2020, 25, 2635–2664. [Google Scholar] [CrossRef]
- Liu, L. e-Commerce Personalized Recommendation Based on Machine Learning Technology. Mob. Inf. Syst. 2022, 2022, 1761579. [Google Scholar] [CrossRef]
- Han, T.; Liu, C.; Yang, W.; Jiang, D. Deep transfer network with joint distribution adaptation: A new intelligent fault diagnosis framework for industry application. ISA Trans. 2020, 97, 269–281. [Google Scholar] [CrossRef] [PubMed]
- Han, T.; Liu, C.; Yang, W.; Jiang, D. A novel adversarial learning framework in deep convolutional neural network for intelligent diagnosis of mechanical faults. Knowl.-Based Syst. 2019, 165, 474–487. [Google Scholar] [CrossRef]
- Chang, Z.; Zhang, A.J.; Wang, H.; Xu, J.; Han, T. Photovoltaic Cell Anomaly Detection Enabled by Scale Distribution Alignment Learning and Multi-Scale Linear Attention Framework. IEEE Internet Things J. 2024; early access. [Google Scholar]
- Melati, D.; Grinberg, Y.; Kamandar Dezfouli, M.; Janz, S.; Cheben, P.; Schmid, J.H.; Sánchez-Postigo, A.; Xu, D.X. Mapping the global design space of nanophotonic components using machine learning pattern recognition. Nat. Commun. 2019, 10, 4775. [Google Scholar] [CrossRef]
- Wang, P.; Fan, E.; Wang, P. Comparative analysis of image classification algorithms based on traditional machine learning and deep learning. Pattern Recognit. Lett. 2021, 141, 61–67. [Google Scholar] [CrossRef]
- Chen, Y.; Wang, S.; Lin, L.; Cui, Z.; Zong, Y. Computer Vision and Deep Learning Transforming Image Recognition and Beyond. Int. J. Comput. Sci. Inf. Technol. 2024, 2, 45–51. [Google Scholar] [CrossRef]
- Anwar, S.M.; Majid, M.; Qayyum, A.; Awais, M.; Alnowami, M.; Khan, M.K. Medical image analysis using convolutional neural networks: A review. J. Med. Syst. 2018, 42, 1–13. [Google Scholar] [CrossRef] [PubMed]
- Szepesi, P.; Szilágyi, L. Detection of pneumonia using convolutional neural networks and deep learning. Biocybern. Biomed. Eng. 2022, 42, 1012–1022. [Google Scholar] [CrossRef]
- Rahman, T.; Chowdhury, M.E.; Khandakar, A.; Islam, K.R.; Islam, K.F.; Mahbub, Z.B.; Kadir, M.A.; Kashem, S. Transfer learning with deep convolutional neural network (CNN) for pneumonia detection using chest X-ray. Appl. Sci. 2020, 10, 3233. [Google Scholar] [CrossRef]
- Jha, D.; Riegler, M.A.; Johansen, D.; Halvorsen, P.; Johansen, H.D. Doubleu-net: A deep convolutional neural network for medical image segmentation. In Proceedings of the 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), Rochester, MN, USA, 28–30 July 2020; pp. 558–564. [Google Scholar]
- Dhillon, A.; Verma, G.K. Convolutional neural network: A review of models, methodologies and applications to object detection. Prog. Artif. Intell. 2020, 9, 85–112. [Google Scholar] [CrossRef]
- Vinogradova, K.; Dibrov, A.; Myers, G. Towards interpretable semantic segmentation via gradient-weighted class activation mapping (student abstract). In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 13943–13944. [Google Scholar]
- Goel, N.; Yadav, A.; Singh, B.M. Medical image processing: A review. In Proceedings of the 2016 Second International Innovative Applications of Computational Intelligence on Power, Energy and Controls with their Impact on Humanity (CIPECH), Ghaziabad, India, 18–19 November 2016; pp. 57–62. [Google Scholar]
- Tang, J.; Deng, C.; Huang, G.B. Extreme learning machine for multilayer perceptron. IEEE Trans. Neural Netw. Learn. Syst. 2015, 27, 809–821. [Google Scholar] [CrossRef]
- Song, Y.Y.; Ying, L. Decision tree methods: Applications for classification and prediction. Shanghai Arch. Psychiatry 2015, 27, 130. [Google Scholar] [PubMed]
- Rish, I. An empirical study of the naive Bayes classifier. In Proceedings of the IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, Washington, DC, USA, 4–10 August 2001; Volume 3, pp. 41–46. [Google Scholar]
- Ji, J.; Tang, C.; Zhao, J.; Tang, Z.; Todo, Y. A survey on dendritic neuron model: Mechanisms, algorithms and practical applications. Neurocomputing 2022, 489, 390–406. [Google Scholar] [CrossRef]
- Song, Z.; Tang, Y.; Ji, J.; Todo, Y. Evaluating a dendritic neuron model for wind speed forecasting. Knowl.-Based Syst. 2020, 201, 106052. [Google Scholar] [CrossRef]
- Song, Z.; Tang, C.; Song, S.; Tang, Y.; Li, J.; Ji, J. A complex network-based firefly algorithm for numerical optimization and time series forecasting. Appl. Soft Comput. 2023, 137, 110158. [Google Scholar] [CrossRef]
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
- Osareh, A.; Shadgar, B. Classification and diagnostic prediction of cancers using gene microarray data analysis. J. Appl. Sci. 2009, 9, 459–468. [Google Scholar] [CrossRef]
- Yahyaoui, A.; Yumuşak, N. Decision support system based on the support vector machines and the adaptive support vector machines algorithm for solving chest disease diagnosis problems. Biomed. Res. 2018. [Google Scholar] [CrossRef]
- Nalepa, J.; Kawulok, M. Selecting training sets for support vector machines: A review. Artif. Intell. Rev. 2019, 52, 857–900. [Google Scholar] [CrossRef]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Anthimopoulos, M.; Christodoulidis, S.; Christe, A.; Mougiakakou, S. Classification of interstitial lung disease patterns using local DCT features and random forest. In Proceedings of the 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Chicago, IL, USA, 26–30 August 2014; pp. 6040–6043. [Google Scholar]
- Bhattacharjee, A.; Murugan, R.; Goel, T. A hybrid approach for lung cancer diagnosis using optimized random forest classification and K-means visualization algorithm. Health Technol. 2022, 12, 787–800. [Google Scholar] [CrossRef]
- Esteva, A.; Kuprel, B.; Novoa, R.A.; Ko, J.; Swetter, S.M.; Blau, H.M.; Thrun, S. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017, 542, 115–118. [Google Scholar] [CrossRef] [PubMed]
- Rajpurkar, P.; Irvin, J.; Zhu, K.; Yang, B.; Mehta, H.; Duan, T.; Ding, D.; Bagul, A.; Langlotz, C.; Shpanskaya, K.; et al. Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv 2017, arXiv:1711.05225. [Google Scholar]
- Gaba, S.; Budhiraja, I.; Kumar, V.; Garg, S.; Kaddoum, G.; Hassan, M.M. A federated calibration scheme for convolutional neural networks: Models, applications and challenges. Comput. Commun. 2022, 192, 144–162. [Google Scholar] [CrossRef]
- Xie, Y.; Zaccagna, F.; Rundo, L.; Testa, C.; Agati, R.; Lodi, R.; Manners, D.N.; Tonon, C. Convolutional neural network techniques for brain tumor classification (from 2015 to 2022): Review, challenges, and future perspectives. Diagnostics 2022, 12, 1850. [Google Scholar] [CrossRef]
- Falco, P.; Lu, S.; Natale, C.; Pirozzi, S.; Lee, D. A transfer learning approach to cross-modal object recognition: From visual observation to robotic haptic exploration. IEEE Trans. Robot. 2019, 35, 987–998. [Google Scholar] [CrossRef]
- Do, C.B.; Ng, A.Y. Transfer learning for text classification. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 5–8 December 2005; Volume 18. [Google Scholar]
- Shivakumar, P.G.; Georgiou, P. Transfer learning from adult to children for speech recognition: Evaluation, analysis and recommendations. Comput. Speech Lang. 2020, 63, 101077. [Google Scholar] [CrossRef]
- Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Volume 27. [Google Scholar]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359. [Google Scholar] [CrossRef]
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
- Chowdhury, M.E.; Rahman, T.; Khandakar, A.; Mazhar, R.; Kadir, M.A.; Mahbub, Z.B.; Islam, K.R.; Khan, M.S.; Iqbal, A.; Al Emadi, N.; et al. Can AI help in screening viral and COVID-19 pneumonia? IEEE Access 2020, 8, 132665–132676. [Google Scholar] [CrossRef]
- Rahman, T.; Khandakar, A.; Qiblawey, Y.; Tahir, A.; Kiranyaz, S.; Kashem, S.B.A.; Islam, M.T.; Al Maadeed, S.; Zughaier, S.M.; Khan, M.S.; et al. Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. Comput. Biol. Med. 2021, 132, 104319. [Google Scholar] [CrossRef]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 1–48. [Google Scholar] [CrossRef]
- Pizer, S.M.; Amburn, E.P.; Austin, J.D.; Cromartie, R.; Geselowitz, A.; Greer, T.; ter Haar Romeny, B.; Zimmerman, J.B.; Zuiderveld, K. Adaptive histogram equalization and its variations. Comput. Vis. Graph. Image Process. 1987, 39, 355–368. [Google Scholar] [CrossRef]
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456. [Google Scholar]
- Alom, M.Z.; Taha, T.M.; Yakopcic, C.; Westberg, S.; Sidike, P.; Nasrin, M.S.; Van Esesn, B.C.; Awwal, A.A.S.; Asari, V.K. The history began from alexnet: A comprehensive survey on deep learning approaches. arXiv 2018, arXiv:1803.01164. [Google Scholar]
- Qassim, H.; Verma, A.; Feinzimer, D. Compressed residual-VGG16 CNN model for big data places image recognition. In Proceedings of the 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 8–10 January 2018; pp. 169–175. [Google Scholar]
- Yoo, H.J. Deep convolution neural networks in computer vision: A review. IEIE Trans. Smart Process. Comput. 2015, 4, 35–43. [Google Scholar] [CrossRef]
- Ullah, A.; Elahi, H.; Sun, Z.; Khatoon, A.; Ahmad, I. Comparative analysis of AlexNet, ResNet18 and SqueezeNet with diverse modification and arduous implementation. Arab. J. Sci. Eng. 2022, 47, 2397–2417. [Google Scholar] [CrossRef]
| Class | Training Set | Validation Set | Test Set | Total |
| --- | --- | --- | --- | --- |
| COVID-19 | 2351 | 555 | 710 | 3616 |
| Lung opacity | 3915 | 937 | 1160 | 6012 |
| Normal | 6478 | 1160 | 2071 | 10,192 |
| Pneumonia | 801 | 252 | 292 | 1345 |
| | AlexNet | AlexNet (IPEWF) | VGG16 | VGG16 (IPEWF) |
| --- | --- | --- | --- | --- |
| Parameters | 5.70 × 10^7 | 5.70 × 10^7 | 1.34 × 10^8 | 1.34 × 10^8 |
| Accuracy | 76.28% | 77.69% | 82.16% | 84.38% |

| | ResNet18 | ResNet18 (IPEWF) | GoogLeNet | GoogLeNet (IPEWF) |
| --- | --- | --- | --- | --- |
| Parameters | 1.12 × 10^7 | 1.12 × 10^7 | 9.94 × 10^6 | 9.94 × 10^6 |
| Accuracy | 83.88% | 85.40% | 88.40% | 89.60% |
| | AlexNet (IPEWF) | AlexNet (IPEWF + Transfer) | VGG16 (IPEWF) | VGG16 (IPEWF + Transfer) |
| --- | --- | --- | --- | --- |
| Parameters | 5.70 × 10^7 | 5.70 × 10^7 | 1.34 × 10^8 | 1.34 × 10^8 |
| Accuracy | 77.69% | 90.36% | 84.38% | 93.97% |

| | ResNet18 (IPEWF) | ResNet18 (IPEWF + Transfer) | GoogLeNet (IPEWF) | GoogLeNet (IPEWF + Transfer) |
| --- | --- | --- | --- | --- |
| Parameters | 1.12 × 10^7 | 1.12 × 10^7 | 9.94 × 10^6 | 9.94 × 10^6 |
| Accuracy | 85.40% | 94.14% | 89.60% | 92.58% |
| Model | COVID-19 | Lung Opacity | Normal | Viral Pneumonia |
| --- | --- | --- | --- | --- |
| VGG16 (IPEWF) | 0.966 | 0.945 | 0.950 | 0.996 |
| VGG16 (IPEWF + Transfer) | 0.998 | 0.989 | 0.989 | 0.999 |
| AlexNet (IPEWF) | 0.921 | 0.915 | 0.898 | 0.994 |
| AlexNet (IPEWF + Transfer) | 0.991 | 0.976 | 0.976 | 0.998 |
| ResNet18 (IPEWF) | 0.972 | 0.950 | 0.953 | 0.996 |
| ResNet18 (IPEWF + Transfer) | 0.998 | 0.986 | 0.985 | 0.999 |
| GoogLeNet (IPEWF) | 0.989 | 0.972 | 0.974 | 0.998 |
| GoogLeNet (IPEWF + Transfer) | 0.997 | 0.984 | 0.984 | 0.999 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Song, Z.; Shi, Z.; Yan, X.; Zhang, B.; Song, S.; Tang, C. An Improved Weighted Cross-Entropy-Based Convolutional Neural Network for Auxiliary Diagnosis of Pneumonia. Electronics 2024, 13, 2929. https://doi.org/10.3390/electronics13152929