1. Introduction
Cancer has long been a leading cause of death worldwide, and its prevalence is expected to rise steadily. Among the common cancer types, lung cancer is the leading cause of death, followed by colorectal, liver, stomach, and female breast cancers. According to the International Agency for Research on Cancer, 2.2 million new cases of lung cancer were diagnosed and 1.8 million deaths occurred globally in 2020 [1]. The majority of lung cancer patients do not show symptoms until the disease has advanced, although some early-stage patients do; early diagnosis can therefore lower the mortality rate significantly [2]. Furthermore, lung cancer is more treatable, and can even be cured, if it is detected early [3,4]. In general, many techniques exist for the diagnosis and staging of lung cancer, such as computed tomography (CT), positron emission tomography–computed tomography (PET–CT), magnetic resonance imaging (MRI), and endobronchial ultrasound (EBUS) [5,6,7]. EBUS has become popular in recent years because it uses no radiation and scans in real time, and it is the most recent screening technology for examining small lesions with minimal pain [8]. Although EBUS is a good way to detect lung cancer early, its performance is limited by tissue superposition, which can result in false-negative diagnoses [9].
In clinical research, many researchers have attempted to establish criteria for distinguishing pulmonary lesions in EBUS images using both retrospective and prospective methods [10,11,12,13]. According to previous research [14], malignant lesions in EBUS images are characterized by a heterogeneous pattern, a short axis, the presence of a coagulation necrosis sign, a round shape, a distinct margin, and the absence of a central hilar structure, while benign lesions are characterized by the presence of calcification, nodal conglomeration, and echo intensity. As a result, precise and reliable EBUS interpretation and lung cancer diagnosis are extremely challenging visual tasks that depend heavily on the skills and experience of radiologists. Therefore, several computer-aided diagnosis (CAD) methods have been proposed to address this problem.
Morikawa et al. [15] studied 30 malignant and 22 benign EBUS images from 60 patients who underwent bronchoscopy, using histogram-based quantitative evaluation of the images. The regions of interest (ROIs) inside the EBUS images were selected by experimenting with a phantom model submerged in water, and six histogram features were extracted from each ROI. The extracted features were compared between groups using Mann–Whitney U tests.
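Histogram-based evaluation of this kind reduces each ROI to a few first-order statistics. The sketch below is illustrative only: the exact six features used in [15] are not listed here, so this particular set (mean, standard deviation, skewness, kurtosis, entropy, energy) is an assumption.

```python
import numpy as np

def histogram_features(roi):
    """Six illustrative first-order histogram features from a grayscale ROI.

    `roi` is a 2-D array of pixel values in [0, 255]; the feature set is
    an assumption, not the exact set of [15]."""
    x = roi.astype(float).ravel()
    mean, std = x.mean(), x.std()
    # Central-moment skewness and excess kurtosis (0 for a constant ROI).
    skewness = ((x - mean) ** 3).mean() / std ** 3 if std else 0.0
    kurtosis = ((x - mean) ** 4).mean() / std ** 4 - 3.0 if std else 0.0
    # Entropy and energy from a 32-bin normalized histogram.
    hist, _ = np.histogram(x, bins=32, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return {"mean": float(mean), "std": float(std),
            "skewness": float(skewness), "kurtosis": float(kurtosis),
            "entropy": float(-(p * np.log2(p)).sum()),
            "energy": float((p ** 2).sum())}
```

Each ROI then yields one feature vector, and a rank-based test such as Mann–Whitney U can compare each feature between the benign and malignant groups.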
Alici et al. [16] analyzed 1051 lymph nodes from 532 patients using sonographic features from EBUS images such as grayscale, echogenicity, shape, size, margin, presence of necrosis, presence of calcification, and absence of a central hilar structure. Decision tree analysis was applied to discriminate between benign and malignant lymph nodes.
Khomkham and Lipikorn [17] proposed two robust features extended from the gray-level co-occurrence matrix (GLCM), together with a lung cancer classification technique based on a genetic algorithm and support vector machines (SVM). The classifier achieved accuracy, sensitivity, specificity, and precision of 86.52%, 87.27%, 85.29%, and 90.57%, respectively.
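A GLCM tabulates how often pairs of quantized gray levels co-occur at a fixed pixel offset; classical texture descriptors such as contrast, homogeneity, and energy are then read off the normalized matrix. The following is a minimal single-offset sketch of the standard construction, not the extended features of [17]:

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Normalized gray-level co-occurrence matrix for one pixel offset."""
    q = img.astype(int) * levels // 256   # quantize to `levels` gray levels
    m = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            m[q[y, x], q[y + dy, x + dx]] += 1
    return m / m.sum()

def glcm_features(p):
    """Classical Haralick-style statistics from a normalized GLCM `p`."""
    i, j = np.indices(p.shape)
    return {"contrast": float((p * (i - j) ** 2).sum()),
            "homogeneity": float((p / (1.0 + (i - j) ** 2)).sum()),
            "energy": float((p ** 2).sum())}
```

In practice several offsets and angles are averaged, and a feature selector (a genetic algorithm in [17]) picks the subset fed to the SVM.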
Gómez et al. [18] studied the performance of 22 co-occurrence statistics in conjunction with six gray-scale quantization levels for identifying breast lesions in ultrasound (BUS) images. A total of 436 BUS images (217 carcinoma and 219 benign lesions) were used. The best area under the curve, obtained using 32 gray levels and 109 features, was 0.81.
Radiomics analysis is also widely used in cancer diagnosis [15,16,17,18]. The idea of radiomics is to extract a massive number of quantitative features from medical images, including shape features, first-order features, second-order features, and higher-order features. In recent years, deep learning (DL) methods have been used extensively in computer vision, aided by advances in computation and the availability of very large datasets. In comparison to traditional machine learning, deep learning can automatically learn features appropriate for a particular classification task, potentially eliminating feature selection problems without the need for complicated image processing pipelines and pattern recognition procedures. As a leading DL technique, the convolutional neural network (CNN) has driven major improvements in image classification and object detection, including in medical imaging, and is now one of the dominant methods. CNNs have been applied to medical images to solve many different problems.
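As an example of the radiomics feature families mentioned above, shape features can be computed directly from a binary lesion mask. This is a rough sketch: the perimeter is approximated by counting boundary pixels, whereas production radiomics toolkits use more careful estimators.

```python
import numpy as np

def shape_features(mask):
    """Crude radiomics-style shape features from a 2-D binary lesion mask.

    Perimeter is approximated as the count of foreground pixels having at
    least one background 4-neighbor, so roundness is only indicative."""
    mask = mask.astype(bool)
    area = int(mask.sum())
    padded = np.pad(mask, 1)
    # A pixel is interior when all four 4-neighbors are foreground.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior).sum())
    roundness = 4 * np.pi * area / perimeter ** 2 if perimeter else 0.0
    return {"area": area, "perimeter": perimeter, "roundness": roundness}
```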
Jia et al. [19] presented a novel framework for classifying cervical cells based on a strong-feature CNN–support vector machine (SVM) model. The technique merges strong features extracted by GLCM and Gabor filters with abstract features acquired from the CNN's hidden layers. Their method outperformed state-of-the-art models with 99.3 percent accuracy.
Tan et al. [20] proposed a modified CNN based on 3D GLCMs to classify polyps in CT colonography. The model can handle small datasets by exploiting GLCM features. The experimental results show that a CNN learning from GLCMs outperforms a CNN on raw CT images in terms of classification performance, achieving up to 91 percent accuracy with two-fold cross-validation.
Islam et al. [21] created a deep learning approach combining a CNN and long short-term memory (LSTM) to autonomously diagnose COVID-19 from X-ray images. The CNN was used for deep feature extraction, while the LSTM performed the COVID-19 diagnosis from the extracted features. The experimental results reveal that the proposed method obtained an accuracy of 99.4 percent.
Li et al. [22] used chest X-ray (CXR) images to assess the predictive performance of DL models in the recognition and classification of pneumonia. In the pooling step, they used bivariate linear mixed models. The results demonstrate that DL performed well both in differentiating bacterial from viral pneumonia and in distinguishing pneumonia from normal CXR radiographs.
Zhang et al. [23] developed a ResNet model for medical image classification in smart medicine by replacing global average pooling with adaptive dropout. Experiments on a GPU cluster indicate that the model delivered excellent recognition performance without a significant loss in efficiency.
Cai et al. [24] developed detection and segmentation techniques for 3D visual diagnosis of lung nodules based on a mask region-based convolutional neural network (Mask R–CNN) and a ray-casting volume rendering algorithm. Mask R–CNN with weighted loss achieved sensitivities of 88.1 percent and 88.7 percent, respectively.
Wang et al. [25] presented a new multiscale rotation-invariant convolutional neural network (MRCNN) model for identifying different kinds of lung tissue in high-resolution computed tomography. The technique outperformed the most recent results on a public interstitial lung disease database.
Anthimopoulos et al. [26] used a deep CNN to categorize CT image patches into seven classes: six distinct interstitial lung disease patterns and healthy tissue. A new network architecture was designed to capture the low-level textural characteristics of lung tissue. In the experiments, classification accuracy was around 85.5 percent.
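Patch-based classification of this kind begins by tiling each scan into fixed-size patches that are then classified independently. A minimal sketch of the tiling step (the patch size and stride here are illustrative choices, not those of [26]):

```python
import numpy as np

def extract_patches(img, size=32, stride=32):
    """Tile a 2-D image into fixed-size patches for patch-wise classification.

    With stride == size the patches are non-overlapping; a smaller stride
    yields overlapping patches, a common variant."""
    h, w = img.shape
    return [img[y:y + size, x:x + size]
            for y in range(0, h - size + 1, stride)
            for x in range(0, w - size + 1, stride)]
```

Each patch receives its own label, and per-scan results can be obtained by aggregating patch-level predictions.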
In 2019, Chen et al. [27] proposed a CAD system for differentiating lung lesions in EBUS images using a CNN. Because the dataset was small, data augmentation was performed by flipping and rotating the images. A fine-tuned CaffeNet–SVM was then used to differentiate the lesions. The experimental results revealed that the proposed system achieved up to 85.4 percent accuracy.
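Flip-and-rotate augmentation can be expressed with plain array operations. The sketch below generates six variants per image; the exact set of transforms used in [27] is not detailed here, so this particular selection is an assumption.

```python
import numpy as np

def augment(img):
    """Return flipped and rotated copies of one image (simple augmentation
    for small medical-imaging datasets; the transform set is illustrative)."""
    variants = [img,                 # original
                np.fliplr(img),      # horizontal flip
                np.flipud(img)]      # vertical flip
    variants += [np.rot90(img, k) for k in (1, 2, 3)]  # 90/180/270 degrees
    return variants
```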
In 2021, Lei et al. [28] proposed a low-dose CT image denoising method for improving lung nodule classification. Low-dose scans contain substantial noise, which strongly influences classification performance. The proposed method enables cooperative training of image denoising and lung nodule classification by combining a self-supervised loss with a cross-entropy loss. The experiments show that training denoising and classification simultaneously increases performance significantly.
Lei et al. [29] also proposed a novel method for analyzing nodule shape with a CNN using soft activation mapping, which captures fine-grained, discrete attention regions to locate low-grade malignant nodules. Experiments on the LIDC–IDRI dataset revealed that the proposed method outperformed state-of-the-art models in terms of false positive rate.
Ensemble methods build multiple models and then combine them to produce better results; compared with a single model, they often yield more accurate predictions. Ensemble methods have recently been reported in a variety of fields and have been applied to medical images to solve many different problems. Guo et al. [30] proposed an ensemble learning method for COVID-19 diagnosis from CT using ordinal regression. The model enhances classification accuracy by learning both intraclass and interclass links between phases. The experimental results revealed that, with a modified ResNet-18 as the backbone, accuracy rose by 22% compared with standard approaches.
However, most existing techniques need large datasets to yield satisfactory results. This paper therefore proposes a novel pulmonary lesion classification framework that does not require a large training dataset: radiomics features and patient data are combined with standard features extracted from EBUS images as input, and a random forest, a CNN, and a weighted ensemble are used to classify pulmonary lesions.
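A weighted ensemble of this kind can be sketched as a weighted average of each model's class-probability outputs, with the fused class taken as the argmax. The weights below are placeholders; the actual weighting scheme belongs to the proposed framework.

```python
import numpy as np

def weighted_ensemble(prob_list, weights):
    """Fuse per-model class probabilities with a weighted average.

    `prob_list`: list of arrays shaped (n_samples, n_classes), one per model.
    `weights`:   one nonnegative weight per model (normalized internally)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    stacked = np.stack(prob_list)              # (n_models, n_samples, n_classes)
    fused = np.tensordot(w, stacked, axes=1)   # weighted average over models
    return fused.argmax(axis=1), fused
```

For example, fusing a random forest and a CNN whose probability outputs disagree lets the higher-weighted model dominate the final decision.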
The structure of this paper is as follows: Section 2 describes the materials; Section 3 explains the proposed framework; Section 4 summarizes the results and discussion; and Section 5 provides the conclusion.