Article

Adaptive Dynamic Learning Rate Optimization Technique for Colorectal Cancer Diagnosis Based on Histopathological Image Using EfficientNet-B0 Deep Learning Model

by
Sameh Abd El-Ghany
*,
Mahmood A. Mahmood
and
A. A. Abd El-Aziz
Department of Information Systems, College of Computer and Information Sciences, Jouf University, Sakakah 72388, Saudi Arabia
*
Author to whom correspondence should be addressed.
Electronics 2024, 13(16), 3126; https://doi.org/10.3390/electronics13163126
Submission received: 6 July 2024 / Revised: 1 August 2024 / Accepted: 5 August 2024 / Published: 7 August 2024
(This article belongs to the Special Issue Revolutionizing Medical Image Analysis with Deep Learning)

Abstract:
The elevated death rate associated with colorectal cancer (CRC) continues to impact human life worldwide. Early detection helps prevent the disease and extend human life. CRC is frequently diagnosed and detected through histopathological examination, where the decision rests on clinicians’ subjective perceptions and daily image analyses. Histological image (HI) classification is difficult because HIs contain multiple tissue types and characteristics. Therefore, deep learning (DL) models are employed to classify different kinds of CRC HIs. To increase the efficiency of the CRC diagnostic procedure from HIs, we propose a fine-tuned model for the CRC diagnosis process based on the EfficientNet-B0 DL model. The proposed model performs multi-classification of HIs. It uses an adaptive learning rate (ALR) to overcome the overfitting problem caused by a static learning rate (SLR) and to enhance CRC detection performance. The ALR compares the training loss value at the beginning of each epoch with that of the preceding epoch: if it is smaller, we increase the learning rate; if it is larger, we decrease it. Our proposed model speeds diagnosis, reduces diagnostic costs, and reduces medical errors; hence, it enhances the diagnostic procedure from the patient’s perspective. We trained and evaluated the proposed model on two datasets (NCT-CRC-HE-100K and CRC-VAL-HE-7K). Normalization and scaling methods were used to pre-process the NCT-CRC-HE-100K dataset. The EfficientNet-B0 model attained an accuracy, sensitivity, specificity, precision, and F1-score of 99.87%, 99.64%, 99.95%, 99.62%, and 99.63%, respectively, when applied to the NCT-CRC-HE-100K dataset. On the CRC-VAL-HE-7K dataset, it achieved 99%, 94.52%, 99.45%, 94.41%, and 94.36% for accuracy, sensitivity, specificity, precision, and F1-score, respectively. As a result, the EfficientNet-B0 model outperforms the state of the art in this field.

1. Introduction

According to global cancer statistics, CRC ranks third in incidence among men and second among women worldwide (6.1%). In 2018, there were over 1.85 million new cases and about 0.88 million deaths worldwide, accounting for 9.2% of cancer-related deaths [1]. In 2022, the American Cancer Society estimated approximately 151,000 new cases of colorectal cancer (CRC) in the United States, resulting in about 53,000 deaths. By the year 2030, new cases of and deaths from CRC are anticipated to increase by 60% globally [2]. CRC occurs in the colon or rectum and arises from uncontrolled cell division caused by genetic mutations.
Age, polyposis, eating a lot of processed food, being overweight, smoking, drinking too much alcohol, and having a family history of colon cancer are all significant factors that increase the risk of developing CRC. Clinicians and researchers face difficulties in detecting CRC from HIs. HIs derived from tissue slides contain a wealth of information about cell morphologies and tissue structures. For instance, pathologists use hematoxylin and eosin (H&E) stain slides, which are the primary tissue stains utilized in histology, to identify various cell types, such as tumor cells, infiltrating lymphocytes, and stroma, to perform a diagnosis. However, manual inspection takes more time and effort, is less quantitative, and may not detect intricate image patterns. Prognosticators can be directly extracted from the H&E. The typical tissue categories consist of adipose tissue, stroma associated with cancer, normal colon mucosa, lymphocytes, and abnormal growths [3].
Polyps are chiefly responsible for CRC. A polyp is a noncancerous growth that develops gradually on the inner lining of the colon or rectum. Adenomatous and hyperplastic polyps are the most widely recognized kinds. Adenomatous polyps have a higher chance of becoming cancerous. Polyps that harbor carcinogenic cells are known as malignant polyps [4].
Scientists and researchers have conducted many studies to help diagnose CRC early. Tests like fecal occult blood, fecal immuno-chemistry, and colonoscopy are utilized to identify CRC by inspecting various abnormalities on the colon’s surface, such as their location, shape, and pathological changes. These tests enhance both the accuracy of diagnosis and the ability to predict the disease’s severity for better treatment. However, they can sometimes yield incorrect results, either showing a problem where none exists (false positive) or missing a problem that is present (false negative), causing unnecessary worry, procedures, or overlooked diagnoses. The high costs and limited availability of screening facilities can impede broad adoption, especially in areas with fewer resources. Colonoscopy, considered the most reliable CRC screening method, is intrusive and may discourage people from being screened regularly. Symptoms of CRC can be similar to those of other digestive disorders, complicating early detection. Additionally, current treatments, like surgery, chemotherapy, and radiation, can have significant side effects that affect patients’ quality of life. Tumors can become resistant to chemotherapy and targeted therapies, limiting their long-term effectiveness. Even though precision medicine is progressing, customizing treatments for individual patients based on genetic and molecular profiles remains intricate and not yet fully realized [5].
Pathologists manually review histological slide images of CRC tissue for subjective evaluation, which is still the standard for cancer diagnosis and staging. However, each pathologist’s training, experience, evaluation conditions, and time constraints may result in a different diagnosis. Consequently, there is notable clinical importance in the overall automated categorization of pathological tissue slide images of CRC for equitable assessment [6].
To increase treatment efficacy and survival rates, accurate and prompt detection of CRC is essential. To diagnose CRC, highly trained pathologists must conduct a thorough visual examination. A crucial concept in the early diagnosis of CRC is medical imaging [7]. It is challenging and time-consuming to analyze the data concerning the disease’s progression rate despite the increase in medical imaging data. In addition, if data misinterpretation is considered when diagnosing diseases, the accuracy performance decreases, and the period of early detection lengthens [8]. As a result, real-time, accurate, objective inspection results may benefit patients.
The initial technique used in medical imaging is the capturing of digital whole-slide images (WSIs) of tissue samples stained with H&E. These samples can be either formalin-fixed, paraffin-embedded (FFPE) or frozen tissue specimens. WSI analysis is difficult and time-consuming due to histological variations in size, shape, texture, and staining of nuclei, as well as the extremely large image size (>10,000 pixels) [9].
Another method for diagnosing cancer is histopathological inspection. Histopathology examines the polyp under a microscope to ascertain its precise location and extent. The histopathological diagnosis requires a clinical specialist’s assistance, takes more inspection time, and is based on clinicians’ subjective perceptions [10].
Artificial intelligence (AI) is a promising and rapidly advancing field within medical imaging and healthcare. AI, particularly DL and computer vision techniques, is being used to assist in the early and accurate detection of CRC, which can significantly improve patient outcomes. AI models can be applied to the analysis of medical images, such as colonoscopy and computed tomography (CT) scans, to detect abnormalities or suspicious lesions. AI can help radiologists and gastroenterologists identify potential areas of concern more accurately [5].
DL is a subset of AI that utilizes multiple layers of neural networks. These numerous layers enable DL to gradually extract higher-level features from data, reducing the need for human intervention in recognizing different classes within images. Considered a technological advancement in the field of machine learning (ML), DL methods require less human intervention and can autonomously generate results that rival or surpass those produced by humans.
Utilizing DL in clinical application tools allows a wider variety of input data to be used to optimize patient care decisions that would otherwise demand considerable clinician time. Using DL will increase the capability of detecting CRC over traditional methods.
All classification models must be trained with enough images to acquire distinctive spatial-domain features. DL methods work well when there are many images to learn from [11]. The availability of training images is considered when choosing the hyperparameters and model parameters. After each training step, the model updates its weights for a predetermined number of classes, learning from the utilized dataset to classify images. Hyperparameters like momentum, batch size, and learning rate (LR) need to be fine-tuned to improve the performance of an artificial neural network (ANN) model in a particular application. The LR is the most important hyperparameter for increasing model accuracy during convolutional neural network (CNN) training. The application-appropriate LR is determined through trial and error by the conventional LR strategies of exponential decay, step decay, and constant LR. The most common approach, however, is simply to train the model with a fixed LR.
When the LR is set too low, the model takes longer to reach convergence. However, if the LR is too high, the training process diverges, leading to unsatisfactory outcomes. With an optimal LR, the network can converge in fewer iterations. The LR determines how much of the loss gradient is backpropagated, ultimately affecting the model’s ability to reach the global minimum; once the gradient settles into a local minimum, further computation yields little progress. ALR training adjusts the LR by a pre-determined value if there is no improvement in accuracy after a few epochs or the optimization remains stuck at a local minimum. In contrast, in a nonadaptive schedule, the LR either remains constant throughout training or decreases gradually at each epoch.
There are three common ALR methods: the cyclical learning rate (CLR) [12], stochastic gradient descent (SGD) with warm restarts (SGDWRs) [13], and cosine annealing [14]. The CLR varies between pre-established minimum and maximum values during training: the LR starts very low, gradually increases until it reaches its highest point, and then returns to the base value, completing one cycle. A cycle thus consists of two steps with a fixed step size, i.e., the number of iterations over which the LR climbs to its maximum value. With the triangular CLR policy, this pattern repeats after each training cycle until the final epoch. Although raising the LR temporarily hurts accuracy, it eventually decreases the training loss [12].
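The triangular CLR policy described above can be sketched as a small function. This is an illustrative implementation of Smith’s schedule [12], not code from this paper; the base/max values and step size in the usage example are arbitrary:

```python
import math

def triangular_clr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclical learning rate.

    The LR climbs linearly from base_lr to max_lr over step_size
    iterations, then descends back to base_lr, so one full cycle
    spans 2 * step_size iterations.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

# Usage: with base 0.001, max 0.006, step size 2000, the LR peaks
# at iteration 2000 and returns to the base at iteration 4000.
lr_peak = triangular_clr(2000, 0.001, 0.006, 2000)
```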
SGD is a standard algorithm for training deep neural networks (DNNs). The optimizer updates the parameters after each epoch. Even though optimization progresses through incremental steps, Figure 1 shows that convergence takes longer when a lower LR encounters saddle-point plateaus. Increasing the LR can help escape saddle points in nonconvex optimization problems.
Cosine annealing provides an alternative way to modify the LR schedule within the ALR framework. The cosine function determines the annealing schedule: the LR begins high, decreases following a cosine curve, and then rapidly increases again at each warm restart. Equation (1) represents the cosine annealing schedule, under which the learning rate decreases after each batch within a run.
η_t = η_min^i + 0.5 (η_max^i − η_min^i)(1 + cos(T_cur π / T_i))    (1)
η_min^i and η_max^i are the bounds of the LR range for the i-th run, and T_cur is the number of epochs that have passed since the most recent restart [15].
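Equation (1) translates directly into code. The following is a minimal sketch of the cosine annealing schedule; the function name and the example values are ours, not from the paper:

```python
import math

def cosine_annealing_lr(t_cur, t_i, eta_min, eta_max):
    """Cosine annealing schedule of Equation (1).

    t_cur: epochs since the most recent warm restart.
    t_i:   total epochs in the current run.
    The LR starts at eta_max (t_cur = 0) and anneals down to
    eta_min (t_cur = t_i) along a cosine curve.
    """
    return eta_min + 0.5 * (eta_max - eta_min) * (
        1 + math.cos(math.pi * t_cur / t_i))

# Usage: annealing from 1e-2 down to 1e-5 over a 10-epoch run.
lr_mid = cosine_annealing_lr(5, 10, 1e-5, 1e-2)
```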
This research uses the EfficientNet-B0 algorithm with an ALR to propose a fine-tuned model. The ALR compares the training loss value at the beginning of each epoch with that of the preceding epoch: if it is smaller, we increase the learning rate; if it is larger, we decrease it. The proposed model with the ALR overcomes the overfitting problem caused by using an SLR, enhancing CRC detection performance. Clinicians will have an easier time diagnosing CRC with the proposed model. It reduces the workload of pathologists, speeds diagnosis, reduces diagnostic costs, reduces medical diagnostic errors, and enhances early diagnosis; hence, patients will receive the necessary treatment rapidly. The proposed model performs multi-classification of HIs. We performed training and assessment on two different datasets, namely, NCT-CRC-HE-100K and CRC-VAL-HE-7K. The NCT-CRC-HE-100K dataset was pre-processed using scaling and normalization methods. On the NCT-CRC-HE-100K dataset, the EfficientNet-B0 model achieved 99.87%, 99.64%, 99.95%, 99.62%, and 99.63% for accuracy, sensitivity, specificity, precision, and F1-score, respectively. On the CRC-VAL-HE-7K dataset, it achieved 99%, 94.52%, 99.45%, 94.41%, and 94.36% for the same metrics, respectively. Therefore, the proposed model using EfficientNet-B0 outperforms the state of the art in this field. A summary of the research’s contributions is provided below:
  • We proposed a highly accurate model that automatically detects CRC from HIs using EfficientNet-B0 as a full CNN.
  • The proposed model with the ALR improved CRC detection performance and solved the overfitting issue brought on by the SLR.
  • We used EfficientNet-B0 for its highly accurate multi-classification and compared its results to previous methods for detecting CRC.
  • The proposed model can quickly diagnose CRC with high accuracy, reducing the time and money needed to diagnose the disease and assisting clinicians in providing CRC patients with appropriate treatment.
  • We used the NCT-CRC-HE-100K and CRC-VAL-HE-7K datasets to train and evaluate the proposed model, which achieved 99% accuracy on the CRC-VAL-HE-7K dataset.
The remaining part of this paper is structured as follows: Section 2 provides an overview of the existing research on the diagnosis system for CRC. Section 3 offers a comprehensive description of the materials and design of the model. Section 4 encompasses the implementation and assessment of the system. Finally, Section 5 provides a summary of the research.

2. Literature Review

Lin Qi et al. [3] proposed a DL framework, CRC-SPA, to identify prognostic spatial organization features (SOFs) from HIs. The whole-slide SOFs measured by CRC-SPA summarized the representative characteristics of consensus molecular subtypes. Using two independent CRC cohorts, they demonstrated the prognostic value of infiltrating-lymphocyte and stroma proportions, which warrants further optimization and validation in later studies.
Min-Jen Tsai et al. [5] proposed an improved set of DL parameters to increase classification accuracy. To verify the proposed optimized parameters, they compared the ability of the five most frequently utilized DL network models (AlexNet, SqueezeNet, VGGNet, GoogLeNet, and ResNet) to accurately distinguish between CRC tissues, employing CRC HIs as the experimental dataset. The classification accuracy over the 100,000 HIs in the NCT-HE-100K dataset was approximately 99% on an internal test dataset; on an external test dataset from the Kather-texture-2016-image dataset, the accuracy was 94.3%. ResNet50 reached 99.69% accuracy on the same internal testing dataset and 99.32% on the same external testing dataset. Moreover, an independent dataset with eight classes was utilized for comparison, on which ResNet50 achieved an accuracy of 94.86%, whereas the previous best was 87.4%. ResNet50, with the settings recommended by the authors, may be the most effective and accurate DL technique for classifying CRC tissue.
Anurodh Kumar et al. [7] proposed the CRCCN-Net model for automatically classifying HIs in colorectal tissue. The researchers utilized colorectal histology and the NCT-CRC-HE-100K datasets. They compared four pre-trained models, namely, VGG16, InceptionResNetV2, Xception, and DenseNet121, and their own CRCCN-Net, in terms of different performance measures. CRCCN-Net obtained an accuracy rate of 93.50% and 96.26% on the colorectal histology dataset and NCT-CRC-HE-100K dataset, respectively. On the consolidated dataset, the developed CRCCN-Net received an F1-score of 98.70%, an accuracy of 99.21%, a sensitivity of 98.23%, a precision of 99.18%, and a specificity of 99.80%. The lung cancer dataset has also been used to test the proposed CRCCN-Net’s ability to classify lung tissue into multiple categories.
K. S. Wang et al. [16] proposed a novel patch aggregation method for clinical CRC diagnosis using weakly labeled pathological WSI patches. An unprecedentedly large number of 170,099 patches, with more than 14,680 WSIs and more than 9631 subjects, were used to train and validate the proposed method. The proposed model was evaluated on four independent datasets and achieved an accuracy of 98.11% and an area under the curve (AUC) of 98.8%.
Ke Zhao et al. [17] used transfer learning and trained a CNN model to determine the tumor–stroma ratio (TSR) using WSIs. The proposed model was trained on the NCT-HE-100K dataset and evaluated on two independent datasets, and it achieved an accuracy of 75.9%.
Xingyu Li et al. [18] hypothesized that more accurate predictions of CRC survival can be performed by incorporating holistic patch data. To examine this theory, they created a survival learning algorithm called DeepDis-MISL, which is based on distribution and multiple instances. They applied it to the MCO CRC and TCGA COAD-READ international CRC WSIs datasets. The proposed model was compared with six existing models: DeepAttnMISL, Meanpooling, MesoNet, Maxpooling (top 10 instance), MeanFeaturePool, and Maxpooling (top 1 instance). The proposed model had a significant mean C-index (0.647) for the MCO CRC dataset. In both the five-fold cross-validation and the external validation, DeepDisMISL outperformed MesoNet by 6.3% and 2.8% of the mean C-index, respectively.
Aurelia Bustos et al. [19] proposed a DL framework for predicting microsatellite instability (MSI) in CRC from the NCT-CRC-HE-100K and CRC-VAL-HE-7K datasets. To identify the areas of interest, the proposed system integrated a module that classifies tissue types and a strategy that rejects multiple biases using adversarial training. Regarding the MSI prediction task, the learned features from the bias-ablated model had the highest discriminative power and the lowest statistical mean dependence with the biases. With four magnifications and all three selected tissues (lymphocytic regions, tumor epithelium, and mucin), the AUC was 0.87 at the tile level. It increased to 0.9 at the patient level, and the accuracy was 88%.
Jakob Nikolas Kather et al. [20] trained the VGG19 CNN model on the NCT-HE-100K dataset of HIs. The proposed CNN was evaluated on the CRC-VAL-HE-7K dataset. The overall nine-class accuracy was close to 99% on an internal testing dataset and 94.3% on an external testing dataset.
Yiqing Shen et al. [21] trained a stain-agnostic DL model by combining stain normalization (SN) and stain augmentation (SA) using a novel RandStainNA scheme that limits the range of possible variations in stain styles. RandStainNA can accomplish SN in a collection of color spaces, and the authors proposed a random color space selection scheme to enhance performance. They assessed RandStainNA with two types of image analysis techniques, classification and segmentation, employing the NCT-CRC-HE-100K-NONORM dataset for patch-level classification during training and validation and the CRC-VAL-HE-7K dataset for external testing. Under the proposed scheme, ResNet18, ResNet50, MobileNet, EfficientNet, ViT, and SwinTransformer achieved accuracies of 94.66%, 94.45%, 94.53%, 94.62%, 93.34%, and 92.39%, respectively.
Alexander Khvostikov et al. [22] proposed a novel CNN-based automatic tissue-type recognition strategy. Additionally, they described an efficient training pipeline that used two distinct training datasets. The automatic identification of tissue types using the proposed technique achieved an accuracy of 92.9% and a balanced accuracy of 90.3% when applied to the CRC-VAL-HE-7K dataset. In total, nine different types of tissues were successfully classified. The test subset of slide images from the PATH-DTMSU dataset could classify five types with an accuracy of 98% and a balanced accuracy of 92.6%.
In our proposed model, we have incorporated ALR instead of a fixed LR to address its drawbacks. The ALR technique flexibly modifies the LR during training based on the data and model characteristics, unlike the fixed LR approach. The ALR strategy tends to converge more rapidly compared to the fixed LR method. For datasets with sparse features, the ALR method can assign varying LRs to different parameters, optimizing the learning process for sparse data more effectively. By adjusting the LR according to gradients, these algorithms can navigate the loss surface more efficiently, leading to faster convergence towards a local or global minimum. Unlike selecting an appropriate fixed LR, which can be a complex task, the ALR method automates LR adjustments during training, reducing the need for manual tuning and making the process more user-friendly. Furthermore, the dynamic adjustment of LR by the ALR method enhances robustness to the initial LR choice, minimizing convergence issues caused by inappropriate learning rates and ensuring more stable and reliable training. Lastly, the ALR method offers a solution to the vanishing gradient problem in deep networks by boosting the LR for parameters with small gradients, thereby sustaining an adequate gradient magnitude for effective learning and preventing training stagnation.
Moreover, the EfficientNet-B0 DL model stands out for its impressive accuracy across various benchmarks, including ImageNet. In comparison to earlier architectures like ResNet, Inception, and MobileNet, it achieves this high accuracy using fewer parameters and floating-point operations. This versatility makes it a great choice for both high-performance tasks and resource-limited settings. One key innovation of EfficientNet-B0 is its use of compound scaling, a technique that ensures a well-balanced adjustment of network dimensions (such as depth, width, and resolution). By adopting this approach, the model can make optimal use of its capacity, enhancing performance without significantly increasing computational demands.

3. Materials and Methods

3.1. Materials

In this research, we trained and evaluated the proposed model on two freely accessible datasets, NCT-CRC-HE-100K and CRC-VAL-HE-7K. The NCT-CRC-HE-100K dataset is a detailed collection of 100,000 unique image patches taken from 86 human CRC and normal tissue slides stained with H&E. These images were sourced from the National Center for Tumor Diseases (NCT) biobank in Heidelberg, Germany, and the University Medical Center Mannheim (UMM) pathology archive in Germany. All images were normalized using the Macenko method, with each image having dimensions of 224 × 224 pixels (px) and a resolution of 0.5 µm/px [23]. The NCT-CRC-HE-100K dataset was categorized according to tissue type into the nine tissue classes shown in Table 1. Table 2 shows the distribution of HIs among the nine classes of the test set of NCT-CRC-HE-100K. Figure 2 displays HIs of the nine categories found in the NCT-CRC-HE-100K dataset. The tissue samples included slides of primary CRC tumors and tumor tissue from CRC liver metastases. To enhance diversity, non-cancerous areas from gastrectomy samples were incorporated into the normal tissue categories, which consist of smooth muscle and fatty tissue [22].
The CRC-VAL-HE-7K dataset comprises 7180 HIs collected from 50 patients with CRC adenocarcinoma. These HIs were manually extracted from 25 human cancer tissue slides stained with H&E, obtained from formalin-fixed, paraffin-embedded (FFPE) samples from the NCT biobank. The images serve as a validation set for models trained on the NCT-CRC-HE-100K dataset, with no overlap between the two datasets. Each image has dimensions of 224 × 224 px at a resolution of 0.5 µm/px, and the Macenko method was applied to standardize the images [23]. Table 3 shows the distribution of HIs among the nine classes of the test set of CRC-VAL-HE-7K.
In our research, the dataset was split into three parts (training, test, and validation) using a patch-wise technique. The training and validation sets were utilized to assess how well the model performs when trained on data, while the test set was used for making predictions. As shown in Figure 3, the data were randomly divided, with 70% assigned to the training set, 15% to the validation set, and 15% to the testing set. The validation set aids in choosing hyperparameters like regularization and learning rate. Effective hyperparameter selection can reduce overfitting and improve accuracy. After the model performs well with the validation set, training is halted after a set time to avoid unnecessary experiments. Following the learning phase, the model was tested using a separate test dataset. This unique test set was kept separate during training to ensure no overlap between training and testing data. It was exclusively used to evaluate the model’s performance by calculating metrics such as accuracy, precision, recall, and other measures assessing the model’s ability to generalize to new data.
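The random 70/15/15 patch-wise split described above can be sketched in a few lines. This is an illustrative pure-Python version, not the paper’s code; the file names and seed are hypothetical, and a stratified split per tissue class could be substituted:

```python
import random

def split_patches(patch_paths, train_frac=0.70, val_frac=0.15, seed=42):
    """Randomly split patch file names into train/val/test subsets.

    With the default fractions this yields the 70%/15%/15% split used
    for NCT-CRC-HE-100K; the remainder after train and validation
    becomes the test set.
    """
    paths = list(patch_paths)
    random.Random(seed).shuffle(paths)          # reproducible shuffle
    n_train = int(len(paths) * train_frac)
    n_val = int(len(paths) * val_frac)
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]
    return train, val, test

# Usage with hypothetical patch file names:
paths = [f"patch_{i}.tif" for i in range(100)]
train, val, test = split_patches(paths)
```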

3.2. Model Architecture and Training

EfficientNet-B0 currently shows satisfying multi-classification performance. The main objective of the proposed model is to develop an automated system that can process HIs to detect and identify CRC. Algorithm 1 and Figure 3 detail the proposed model’s steps:
Phase 1—Pre-processing for HIs of NCT-CRC-HE-100K: To begin the process of preparing the HIs of the NCT-CRC-HE-100K dataset for processing, we initiated the downloading of the NCT-CRC-HE-100K dataset. The HIs of this dataset were then subjected to pre-processing, which is a crucial step in obtaining accurate results. To achieve this, scaling and normalization techniques were applied to pre-process the HIs of the NCT-CRC-HE-100K dataset.
Phase 2—Splitting the NCT-CRC-HE-100K dataset: We split the NCT-CRC-HE-100K dataset into a training set of 70%, with 69,999 HIs; a test set of 15%, with 15,001 HIs; and a validation set of 15%, with 15,000 HIs.
Phase 3—Pre-training the EfficientNet-B0 model: In this phase, we applied the supervised pre-training phase, which is the first phase of the transfer learning process. In the supervised pre-training phase, we trained the EfficientNet-B0 on a large dataset such as the ImageNet dataset.
Phase 4—Fine-tuning the EfficientNet-B0 model. In this phase, we applied the fine-tuning phase, which is the second phase of the transfer learning process. In the fine-tuning phase, we tuned the parameters for the EfficientNet-B0 by training it on the training set of the NCT-CRC-HE-100K dataset.
Phase 5—Applying the ALR: We determined the error of the training set. If the training error was too high, we retrained the model. If the training error was low, we calculated the error of the test set. If the test error was too high, we retrained the model.
Phase 6—Evaluating the EfficientNet-B0 model: In the final step, we used the test set of the CRC-VAL-HE-7K dataset to evaluate the performance of the EfficientNet-B0 model by calculating the measured metrics: sensitivity, specificity, precision, F1-score, and accuracy.
Algorithm 1: The algorithm of the proposed fine-tuned EfficientNet-B0
1   Input: NCT-CRC-HE-100K.
2   Output: Fine-tuned algorithm for CRC detection.
3   BEGIN
4     STEP 1: Pre-processing of NCT-CRC-HE-100K images
5       FOR EACH HI IN NCT-CRC-HE-100K DO
6         Apply the Macenko method for normalization.
7         Resize HI to 224 × 224 px at 0.5 µm/px.
8       END FOR
9     STEP 2: NCT-CRC-HE-100K splitting
10      SPLIT NCT-CRC-HE-100K INTO
11        Training set → 70%.
12        Testing set → 15%.
13        Validation set → 15%.
14    STEP 3: EfficientNet-B0 pre-training
15      Load and pre-train EfficientNet-B0 on the ImageNet dataset.
16    STEP 4: Fine-tuning EfficientNet-B0
17      Fine-tune EfficientNet-B0 on the training set of NCT-CRC-HE-100K.
18    STEP 5: Applying the ALR
19      Apply the ALR during training by determining the error of the training set.
20      IF the error of the training set is too high
21        Re-train the model.
22      ELSE
23        Calculate the error of the test set.
24      END IF
25      IF the error of the test set is too high
26        Re-train the model.
27      ELSE
28        GOTO line 30.
29      END IF
30      Utilize the validation set to implement early stopping based on the optimal performance of the model.
31      Evaluate the results achieved on the test set of NCT-CRC-HE-100K.
32    STEP 6: EfficientNet-B0 evaluation
33      Assess the performance of EfficientNet-B0 by measuring the accuracy on the test set of CRC-VAL-HE-7K.
34  END
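The re-training loop of STEP 5 (train, then re-train while the training or test error stays too high) can be sketched as a generic control-flow wrapper. The function names, error tolerance, and round limit below are our own illustrative choices, not values from the paper; in practice, train_fn would fit the network and the error functions would evaluate it:

```python
def run_training(train_fn, train_error_fn, test_error_fn,
                 tol=0.05, max_rounds=3):
    """Re-training control flow sketched after Algorithm 1, STEP 5.

    Trains, then re-trains while the training error (and then the
    test error) exceeds a tolerance; returns the round at which both
    errors became acceptable, or max_rounds if they never did.
    """
    for round_no in range(1, max_rounds + 1):
        train_fn()
        if train_error_fn() > tol:
            continue                 # training error too high: re-train
        if test_error_fn() <= tol:
            return round_no          # both errors acceptable: stop
    return max_rounds

# Usage with stub functions standing in for a real model:
state = {"round": 0}
def fake_train(): state["round"] += 1
def fake_train_err(): return 0.2 if state["round"] < 2 else 0.01
def fake_test_err(): return 0.03
finished_at = run_training(fake_train, fake_train_err, fake_test_err)
```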

3.2.1. ALR

An ALR is a technique that adjusts the learning rate of a neural network during training based on certain criteria, such as the current performance, the gradient information, or the training dynamics. The ALR can help overcome some of the challenges of choosing a fixed learning rate, such as finding a good balance between speed and stability, avoiding local optima, and escaping plateaus. There are different ways to implement ALR techniques; Algorithm 2 shows our algorithm for implementing the ALR technique.
Algorithm 2: The ALR algorithm
1  Input: epoch_no, fr: 0.0 < fr < 1.0.
2  Output: ALR
3  BEGIN
4    DO
5      Input X
6      IF (X = 0)
7        GOTO line 11.
8      ELSE
9        epoch_no = X.
10     END IF
11   WHILE (X = 0)
12   X = 1
13   WHILE (X ≤ epoch_no)
14     Input dwell_Value
15     IF (dwell_Value = True)
16       Compute Cu_valid_loss.
17       Compute Cu_train_accu.
18       IF (Cu_valid_loss > Pr_valid_loss)
19         Cu_W = Pr_W.
20         Cu_B = Pr_B.
21         LR = Cu_LR ∗ fr.
22       END IF
23       IF (Cu_train_accu < Pr_train_accu)
24         Cu_W = Pr_W.
25         Cu_B = Pr_B.
26         LR = Cu_LR ∗ fr.
27       END IF
28     END IF
29     X++
30   END WHILE
31 END
In Algorithm 2, we first read the number of epochs. If it is zero, training does not start; otherwise, the training process proceeds. In each epoch, we check the value of the "dwell" flag. If it is true, we compute the current training accuracy and validation loss. At the end of the epoch, we compare the current validation loss with that of the previous epoch: if the current validation loss is higher, we restore the weights and biases from the previous epoch. Likewise, if the current training accuracy is lower than the previous epoch's, we restore the previous weights. Whenever such a rollback occurs, we adjust the learning rate by multiplying the current learning rate by a factor fr between 0.0 and 1.0, which shrinks the learning rate and stabilizes training.
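The dwell logic of Algorithm 2 can be sketched in a few lines of Python. The class below is an illustrative stand-in (the names `DwellController` and `step` are ours, not the authors' code): it mirrors the comparisons of Cu_valid_loss/Pr_valid_loss and Cu_train_accu/Pr_train_accu and shrinks the learning rate by the factor fr on a regression. In a real training loop, the caller would also restore the previous epoch's weights whenever `step` returns True.

```python
class DwellController:
    """Dwell-style ALR (Algorithm 2, illustrative names).

    After each epoch, compare the current validation loss and training
    accuracy with the previous epoch's. If either regressed, shrink the
    learning rate by `factor` (0 < factor < 1) and signal a weight
    rollback; otherwise accept the epoch as the new reference point.
    """
    def __init__(self, lr, factor=0.5):
        assert 0.0 < factor < 1.0          # fr must lie in (0, 1)
        self.lr = lr
        self.factor = factor
        self.prev_val_loss = float("inf")  # Pr_valid_loss
        self.prev_train_acc = -1.0         # Pr_train_accu

    def step(self, val_loss, train_acc):
        """Return True if the caller should restore the previous weights."""
        regressed = (val_loss > self.prev_val_loss
                     or train_acc < self.prev_train_acc)
        if regressed:
            self.lr *= self.factor         # LR = Cu_LR * fr
        else:
            self.prev_val_loss = val_loss
            self.prev_train_acc = train_acc
        return regressed

ctrl = DwellController(lr=1e-4, factor=0.5)
assert ctrl.step(val_loss=0.80, train_acc=0.70) is False  # first epoch accepted
assert ctrl.step(val_loss=0.90, train_acc=0.72) is True   # val loss regressed
assert abs(ctrl.lr - 5e-5) < 1e-12                        # LR was halved
```

In a Keras setting this logic would live in a custom callback's `on_epoch_end`, where the framework exposes the epoch's logs and the model's weights.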

3.2.2. EfficientNet-B0

EfficientNet-B0 is a CNN architecture that uses a novel technique called compound scaling to balance model size, accuracy, and computational efficiency. It is the base model of the EfficientNet family, which consists of eight variants (B0 to B7) that have different dimensions and performance. EfficientNet-B0 has 5.3 million parameters and achieves 77.3% top-one accuracy on ImageNet [24].
Compound scaling is a method that uniformly scales the width, depth, and resolution of a neural network using a compound coefficient. This coefficient is determined by a small grid search on the original small model (B0). The other variants (B1 to B7) are obtained by increasing the compound coefficient, resulting in larger and more accurate models [25].
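As a concrete illustration, the compound-scaling rule can be sketched as follows. The coefficient values α = 1.2, β = 1.1, γ = 1.15 are those reported for the EfficientNet family in [24]; the helper function itself is an illustrative sketch, not the authors' code.

```python
# EfficientNet compound scaling: depth d = alpha**phi, width w = beta**phi,
# resolution r = gamma**phi, with alpha * beta**2 * gamma**2 ≈ 2 so that
# FLOPs roughly double per unit increase of the compound coefficient phi.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # grid-searched on the B0 base model

def scale(phi, base_res=224):
    """Return (depth multiplier, width multiplier, input resolution) for phi."""
    depth_mult = ALPHA ** phi
    width_mult = BETA ** phi
    res = round(base_res * GAMMA ** phi)
    return depth_mult, width_mult, res

# phi = 0 recovers B0: no depth/width scaling, 224 x 224 input.
assert scale(0) == (1.0, 1.0, 224)
# The FLOPs constraint alpha * beta^2 * gamma^2 ≈ 2 holds approximately:
assert abs(ALPHA * BETA**2 * GAMMA**2 - 2.0) < 0.1
```

Increasing `phi` yields the larger variants; for example, `scale(1)` raises the input resolution to 258 px along with deeper and wider stages.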
EfficientNet-B0 is based on the inverted bottleneck residual blocks of MobileNetV2, with the addition of squeeze-and-excitation blocks. These blocks help the model to learn more expressive features by adaptively recalibrating the channel-wise feature responses [26]. The particulars of the scale for the EfficientNet-B0 model are shown in Table 4, and Figure 4 depicts the EfficientNet-B0 architecture [27]. The modified training algorithm of the EfficientNet-B0 model is shown in Algorithm 3.
Algorithm 3: The modified training algorithm of the EfficientNet-B0 model
1  Input: NCT-CRC-HE-100K dataset NCT
2  Output: Trained EfficientNet-B0 model
3  BEGIN
4    Input_Shape ← 224 × 224 × 3
5    Batch_Size ← 25
6    LR ← 0.0001
7    Dropout ← 0.4
8    Batch_Norm ← {momentum = 0.99, epsilon = 0.001}
9    L ← {0, 1, 2, 3, 4}
10   G_AP ← Global average pooling
11   Dense ← 4
12   FOR EACH HI IN NCT-CRC-HE-100K
13     Data_images.append(HI)
14   Split NCT
15   B0_Model.fc ← {G_AP, Dropout, Dense}
16   B0_Model ← Model(inputs = X.inputs, outputs = B0_Model.fc)
17   OPT ← ALR(0.0001)
18   FOR EACH layer IN Model.layers[-20:]
19     IF layer is not an instance of Batch_Norm
20       layer.trainable = True
21     END IF
22   END FOR
23   B0_Model.compile(OPT, loss = "sparse_categorical_crossentropy")
24 END
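The layer-freezing policy in Algorithm 3 (unfreeze the last 20 layers, except batch normalization) can be sketched in a framework-agnostic way. The function name and the toy layer list below are illustrative assumptions, not the authors' code.

```python
# Sketch of the fine-tuning policy in Algorithm 3: mark the last `tail`
# layers trainable, except BatchNormalization layers, whose running
# statistics (momentum = 0.99) should stay frozen during fine-tuning.
def trainable_mask(layer_types, tail=20):
    mask = [False] * len(layer_types)
    for i in range(max(0, len(layer_types) - tail), len(layer_types)):
        if layer_types[i] != "BatchNormalization":
            mask[i] = True
    return mask

# Toy backbone: 10 repetitions of Conv -> BN -> Activation (30 layers).
layers = ["Conv2D", "BatchNormalization", "Activation"] * 10
mask = trainable_mask(layers)
assert len(mask) == 30
assert not any(mask[:10])   # everything before the tail stays frozen
assert sum(mask) == 13      # 20 tail layers minus 7 BN layers
```

Keeping batch-normalization layers frozen is a common fine-tuning practice: updating their statistics on a small batch size can destabilize the pre-trained feature extractor.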
The EfficientNet-B0 architecture involves several components, including depth, width, and resolution scaling. The mathematical model can be described by the following equations:
  • Depth scaling: the number of layers in the network (d) is scaled with a depth coefficient (α):
    d = α^φ
    where φ is the compound scaling coefficient.
  • Width scaling: the width of each layer (w) is scaled with a width coefficient (β):
    w = β^φ
  • Resolution scaling: the input image resolution (r) is scaled with a resolution coefficient (γ):
    r = γ^φ
  • Number of channels in each layer: the number of channels (c) in each layer is determined by a compound of the width and resolution coefficients:
    c = round(γ · β)
  • EfficientNet-B0 Block:
    • Each block in the network consists of a series of convolutional layers. The number of layers in a block is determined by the depth scaling.
    • The block includes a skip connection with a convolutional layer if the input and output dimensions do not match.
    • The output of the block is obtained by adding the input and the result of the convolutional layers.
The model architecture is composed of inverted bottleneck residual blocks with squeeze-and-excitation blocks. The number of filters, the number of layers, and the resolution of each stage are scaled by w, d, and r, respectively.
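The channel-wise recalibration performed by a squeeze-and-excitation block can be illustrated with NumPy. The weights below are random stand-ins for the learned excitation parameters; the function is an illustrative sketch, not EfficientNet's implementation.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """x: (H, W, C) feature map; w1: (C, C//r); w2: (C//r, C)."""
    s = x.mean(axis=(0, 1))                 # squeeze: per-channel global pooling
    z = np.maximum(s @ w1, 0.0)             # excitation FC1 + ReLU (reduce)
    gate = 1.0 / (1.0 + np.exp(-(z @ w2)))  # excitation FC2 + sigmoid (expand)
    return x * gate                         # channel-wise recalibration

rng = np.random.default_rng(0)
x = rng.standard_normal((7, 7, 8))          # toy feature map, 8 channels
w1 = rng.standard_normal((8, 2))            # reduction ratio r = 4
w2 = rng.standard_normal((2, 8))
y = squeeze_excite(x, w1, w2)
assert y.shape == x.shape                   # recalibration preserves shape
```

Because the gate lies in (0, 1) for every channel, the block can only attenuate channel responses, letting the network emphasize informative channels relative to the rest.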

4. Implementation and Evaluation of the Proposed Model

4.1. Metrics for Model Evaluation

The proposed model was evaluated on the CRC-VAL-HE-7K dataset by measuring accuracy, precision, sensitivity, specificity, and F1-score, defined in Equations (6)–(10):
Accuracy = (TP + TN)/(TP + TN + FP + FN)
Precision = TP/(TP + FP)
Sensitivity = TP/(TP + FN)
Specificity = TN/(TN + FP)
F1-score= 2 × (Precision × Sensitivity)/(Precision + Sensitivity)
TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. Accuracy is the proportion of correct predictions out of all predictions. Precision is the ratio of true positives to all positive predictions. Sensitivity is the proportion of actual positives that are correctly predicted as positive. Specificity is the proportion of actual negatives that are correctly predicted as negative. The F1-score is the harmonic mean of precision and sensitivity.
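Equations (6)–(10) translate directly into code. The following minimal helper (the function and key names are ours) computes all five metrics from per-class confusion counts:

```python
def classification_metrics(tp, tn, fp, fn):
    """Equations (6)-(10): per-class metrics from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)            # a.k.a. recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "specificity": specificity, "f1": f1}

# Toy counts: 90 true positives, 95 true negatives, 5 FP, 10 FN.
m = classification_metrics(tp=90, tn=95, fp=5, fn=10)
assert abs(m["accuracy"] - 0.925) < 1e-9
assert abs(m["sensitivity"] - 0.90) < 1e-9
```

For the nine-class setting of this paper, these formulas are applied one-versus-rest per tissue class and then averaged, which is how the reported macro-averaged scores are obtained.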

4.2. Model Implementation

In this research, we implemented four experiments. In the first experiment, the EfficientNet-B0 model was trained using the fixed LR on the NCT-CRC-HE-100K dataset. In the second experiment, the trained EfficientNet-B0 model using the fixed LR was tested on the CRC-VAL-HE-7K dataset. In the third experiment, the EfficientNet-B0 model was trained using the ALR on the NCT-CRC-HE-100K dataset. In the fourth experiment, the trained EfficientNet-B0 model was cross-evaluated on a different dataset to check the model generalization using the CRC-VAL-HE-7K dataset.
During our experiments, we employed the NCT-CRC-HE-100K and CRC-VAL-HE-7K datasets to both train and assess the EfficientNet-B0 model. We split the NCT-CRC-HE-100K dataset into a training set of 70%, with 69,999 HIs; a test set of 15%, with 15,001 HIs; and a validation set of 15%, with 15,000 HIs. The implementation process was carried out in the Kaggle environment. Table 5 lists the device specifications used in the implementation procedure.
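The 70/15/15 partition can be reproduced with a simple shuffled index split. This sketch is illustrative (the authors' exact split procedure and random seed are not given), and its sizes match the reported counts up to rounding:

```python
import random

# Illustrative 70/15/15 split of an index list, mirroring the paper's
# partition of NCT-CRC-HE-100K (reported sizes: 69,999 / 15,001 / 15,000).
def split_indices(n, train=0.70, test=0.15, seed=42):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)     # deterministic shuffle for the sketch
    n_train = round(n * train)
    n_test = round(n * test)
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

tr, te, va = split_indices(100_000)
assert (len(tr), len(te), len(va)) == (70_000, 15_000, 15_000)
```

Shuffling before splitting matters here because the dataset is stored class by class; an unshuffled split would leave some tissue classes out of the training set entirely.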
For the first experiment, the EfficientNet-B0 model was trained using the NCT-CRC-HE-100K dataset. To perform multi-classification on the nine tissue categories of NCT-CRC-HE-100K, we employed an SLR. The first experiment’s results of the multi-classification on the test set using the EfficientNet-B0 model are shown in Table 6. In the first experiment, we classified nine tissue classes: ADI, BACK, DEB, LYM, MUC, MUS, NORM, STR, and TUM. The average of the sensitivity, specificity, precision, F1-score, and accuracy were measured following the evaluation of the trained EfficientNet-B0 model with the SLR on the test set of the NCT-CRC-HE-100K dataset. The EfficientNet-B0 model achieved an average of 67.97%, 96.10%, 80.41%, 65.25%, and 93.12% for sensitivity, specificity, precision, F1-score, and accuracy, respectively.
The EfficientNet-B0 model achieved the highest sensitivity, at 97.39%, for the DEB class. It achieved the highest specificity and precision, at 100%, for the classes LYM and STR. It achieved the highest F1-score and accuracy, at 94.60% and 98.93%, respectively, for the ADI class.
The EfficientNet-B0 model achieved the lowest sensitivity, at 20.37%, for the LYM class. The DEB class had the lowest specificity, precision, and accuracy, at 83.28%, 43.11%, and 84.90%, respectively. It had the lowest F1-score, at 16.94%, for the STR class.
Figure 5 shows the confusion matrix for the EfficientNet-B0 model on the test set of NCT-CRC-HE-100K with the SLR. The test set contained nine tissue classes; the distribution of the HIs among them is presented in Table 2. The per-class accuracy of EfficientNet-B0 was 89.8% for the ADI class (1402 of 1561 HIs predicted correctly), 75.2% for the BACK class, 97.3% for the DEB class, 20% for the LYM class, 49% for the MUC class, 90.4% for the MUS class, 92.3% for the NORM class, 9.2% for the STR class, and 86.6% for the TUM class.
Figure 6 shows the training and validation loss of EfficientNet-B0 on NCT-CRC-HE-100K with the SLR. The training loss decreased as the number of epochs increased, while the validation loss fluctuated. Since the training loss was low and the validation loss was high and variable, the model suffered from an overfitting problem.
Figure 7 depicts the training and validation accuracy of EfficientNet-B0 on NCT-CRC-HE-100K with the SLR. After the fifth epoch, the training accuracy was close to 100%, but there was a variance in the validation accuracy, which indicates that the EfficientNet-B0 model was overfitted on the training set.
We evaluated the trained EfficientNet-B0 model with the SLR in the second experiment using the CRC-VAL-HE-7K dataset. The second experiment’s results of the multi-classification using the trained EfficientNet-B0 model are shown in Table 7. The average of the sensitivity, specificity, precision, F1-score, and accuracy were measured following the evaluation of the trained EfficientNet-B0 model with the SLR on the test set of the CRC-VAL-HE-7K dataset. The EfficientNet-B0 model achieved an average of 62.64%, 95.82%, 71.10%, 55.78%, and 92.30% for sensitivity, specificity, precision, F1-score, and accuracy, respectively.
The EfficientNet-B0 model achieved the highest sensitivity, at 99.71%, for the DEB class. It achieved the highest specificity, at 99.98%, for the LYM class. It achieved the highest precision, F1-score, and accuracy, at 98.88%, 96.05%, and 99.09%, respectively, for the BACK class.
The EfficientNet-B0 model achieved the lowest sensitivity and F1-score, at 1.26%, and 2.49%, respectively, for the LYM class. It had the lowest specificity and accuracy, at 83.49% and 82.40%, respectively, for the MUS class. It had the lowest precision, at 25.72%, for the DEB class.
Figure 8 shows the confusion matrix for the EfficientNet-B0 model on the test set of CRC-VAL-HE-7K with the SLR. The test set of CRC-VAL-HE-7K contained nine different tissue categories; the distribution of the HIs among them is presented in Table 3. The per-class accuracy of EfficientNet-B0 was 52.7% for the ADI class (706 of 1338 HIs predicted correctly), 93.3% for the BACK class, 99.7% for the DEB class, 1.26% for the LYM class, 64% for the MUC class, 70.2% for the MUS class, 92.7% for the NORM class, 2.6% for the STR class, and 86.5% for the TUM class.
In the third experiment, we trained the EfficientNet-B0 model on NCT-CRC-HE-100K, and we employed an ALR to perform the multi-classification of the nine tissue categories within the NCT-CRC-HE-100K dataset.
The third experiment’s results of the multi-classification on the testing set using the EfficientNet-B0 model are shown in Table 8.
The average of the sensitivity, specificity, precision, F1-score, and accuracy were measured following the evaluation of the trained EfficientNet-B0 model with the ALR on the test set of the NCT-CRC-HE-100K dataset. The EfficientNet-B0 model achieved an average of 99.64%, 99.95%, 99.62%, 99.63%, and 99.87% for sensitivity, specificity, precision, F1-score, and accuracy, respectively.
The EfficientNet-B0 model achieved the highest sensitivity, at 100%, for the DEB and BACK classes. It achieved the highest specificity, precision, F1-score, and accuracy, all at 100%, for the BACK class.
The EfficientNet-B0 model achieved the lowest sensitivity, at 99.39%, for the NORM class. It had the lowest specificity, at 99.84%, for the STR class. It achieved the lowest precision, at 99.33%, for the MUC class. The STR class achieved the lowest F1-score and accuracy, at 99.01% and 99.79%, respectively.
Figure 9 depicts the confusion matrix for the EfficientNet-B0 model on the NCT-CRC-HE-100K test set with the ALR. The EfficientNet-B0 classes ADI and BACK had 100% accuracy; the classes DEB and MUS had 99.5%, the LYM class had 99.8%, the classes MUC and STR had 99.6%, and the classes NORM and TUM had 99.3%.
Figure 10 depicts the training and validation loss of EfficientNet-B0 on NCT-CRC-HE-100K using the ALR. The model performed well on the training set because the training loss was fairly low, and after the 10th epoch, the validation loss was nearly identical to the training loss. Hence, the EfficientNet-B0 model did not suffer from high bias or high variance.
Figure 11 depicts the training and validation accuracy of EfficientNet-B0 on NCT-CRC-HE-100K with the ALR. After the fifth epoch, the training accuracy was 100% and there was no variance in the validation accuracy, and after the 10th epoch, the validation accuracy was identical to the training accuracy. Hence, EfficientNet-B0 was balanced on the training set. Therefore, the proposed model resolved the overfitting problem and improved the performance of CRC disease detection.
In the fourth experiment, we evaluated the trained EfficientNet-B0 model with the ALR using the CRC-VAL-HE-7K dataset. The fourth experiment’s results of the multi-classification using the trained EfficientNet-B0 model are shown in Table 9.
The average of the sensitivity, specificity, precision, F1-score, and accuracy were measured following the evaluation of the trained EfficientNet-B0 model with ALR on the test set of the CRC-VAL-HE-7K dataset. The EfficientNet-B0 model achieved an average of 94.52%, 99.45%, 94.41%, 94.36%, and 99% for sensitivity, specificity, precision, F1-score, and accuracy, respectively.
The EfficientNet-B0 model had the highest sensitivity, F1-score, and accuracy, at 100%, 98.26%, and 99.58%, respectively, for the BACK class. It had the highest specificity and precision, at 99.91% and 99.61%, respectively, for the ADI class.
The EfficientNet-B0 model had the lowest sensitivity and F1-score, at 77.91% and 84.32%, respectively, for the STR class. The MUS class achieved the lowest specificity, precision, and accuracy, at 98.27%, 82.49%, and 97.65%, respectively.
Figure 12 depicts the confusion matrix for the EfficientNet-B0 model on the test set of CRC-VAL-HE-7K with the ALR. The per-class accuracy of EfficientNet-B0 was 95.5% for the ADI class (1278 of 1338 HIs predicted correctly), 100% for the BACK class, 97% for the DEB class, 97.9% for the LYM class, 97.5% for the MUC class, 90.8% for the MUS class, 98.7% for the NORM class, 76.4% for the STR class, and 95.2% for the TUM class.

4.3. Model Results and Discussion

We trained and evaluated the proposed model over the two datasets (NCT-CRC-HE-100K and CRC-VAL-HE-7K). The NCT-CRC-HE-100K dataset was pre-processed by applying normalization and scaling. On the NCT-CRC-HE-100K dataset, the EfficientNet-B0 model achieved 99.87%, 99.64%, 99.95%, 99.62%, and 99.63% for accuracy, sensitivity, specificity, precision, and F1-score, respectively. On the CRC-VAL-HE-7K dataset, the EfficientNet-B0 model achieved 99%, 94.52%, 99.45%, 94.41%, and 94.36% for accuracy, sensitivity, specificity, precision, and F1-score, respectively. As a result, the EfficientNet-B0 model outperforms the state of the art in this field.
After presenting our proposed technique, it is important to compare its performance with the state-of-the-art methods listed in Table 10. Min-Jen Tsai et al. [6] and K. S. Wang et al. [16] achieved the highest results in Table 10. In Min-Jen Tsai et al. [6], ResNet50 reached 99.69% accuracy on the internal testing dataset and 99.32% on the external testing dataset. K. S. Wang et al. [16] evaluated their proposed model on four independent datasets and achieved an accuracy of 98.11% and an area under the curve (AUC) of 98.8%.
Looking from the patient’s perspective, our proposed model promises to enhance diagnostic speed, delay disease progression, and lower diagnostic costs. Evaluation of the EfficientNet-B0 model with ALR has demonstrated unparalleled performance compared to other existing frameworks, especially with tuning.
There are several future directions for this research. We will utilize the reinforced transformer network (RTN) [28], a type of neural network that merges components of transformer models with reinforcement learning strategies. This approach employs a transformer model to establish a policy that determines actions within a reinforcement learning (RL) environment. The self-attention mechanism enables the agent to evaluate multiple aspects of the state simultaneously. Additionally, it integrates rewards from the RL framework into the transformer's training regime, steering the model to produce outputs that are not only linguistically coherent but also optimize a specific reward. Once a transformer model is pre-trained on a substantial dataset, it can be fine-tuned with reinforcement learning to tailor it to tasks that require sequential decision-making.

4.4. Comparison of the Proposed Model with the State-of-the-Art Methods

Our proposed EfficientNet-B0 model with the ALR has shown superior performance in terms of multi-classification accuracy compared to the latest methods outlined in Table 10. This improvement stems from our integration of ALR in place of the SLR used by all the techniques in Table 10, addressing their limitations. The ALR technique dynamically adjusts LR during training based on data and model characteristics, in contrast to the static LR approach. ALR facilitates faster convergence, particularly with datasets containing sparse features. It allocates different LRs to different parameters, enhancing learning optimization for sparse data sets. By adapting the LR based on gradients, these algorithms navigate the loss surface more effectively, leading to quicker convergence towards a local or global minimum. In contrast to manually selecting a fixed LR, which can be challenging, the ALR method automates LR adjustments during training, streamlining the process and reducing the need for manual tuning. Moreover, the dynamic LR adjustment by ALR enhances the robustness of the initial LR choice, mitigating convergence issues arising from inappropriate learning rates and ensuring stable and reliable training. The ALR method also addresses the vanishing gradient problem in deep networks by boosting the LR for parameters with small gradients, maintaining an optimal gradient magnitude for effective learning and preventing training stagnation.

5. Conclusions

In this research, we proposed a fine-tuned CRC diagnosis model using EfficientNet-B0 and the ALR. The ALR was set up to compare the training loss value at the beginning of each epoch to those of the previous ones. We increased the ALR for smaller training loss values and decreased it for larger ones. The proposed model with the ALR overcame the SLR’s overfitting problem, and the CRC detection performance was improved. The proposed model helps clinicians to diagnose CRC. Pathologists will be able to speed up diagnosis using the proposed model, which will also reduce diagnostic costs, reduce medical diagnostic errors, and improve early diagnosis.
The proposed model determined CRC disease by multi-classifying the nine tissue classes. It underwent training and assessment using the NCT-CRC-HE-100K and CRC-VAL-HE-7K datasets, both pre-processed with normalization and scaling techniques. In the multi-classification on the NCT-CRC-HE-100K dataset, the EfficientNet-B0 model achieved average accuracy, sensitivity, specificity, precision, and F1-score of 99.87%, 99.64%, 99.95%, 99.62%, and 99.63%, respectively. On the CRC-VAL-HE-7K dataset, it achieved 99%, 94.52%, 99.45%, 94.41%, and 94.36% for accuracy, sensitivity, specificity, precision, and F1-score, respectively. The fine-tuned EfficientNet-B0 model outperformed other existing frameworks in accuracy. However, processing speed poses a challenge for our developing system. Future research will explore the proposed model on various types of human cancers and employ hyper-optimization algorithms to automatically enhance hyper-parameter tuning. Additionally, we will use confidence levels as a measurement metric. Furthermore, we plan to implement the RTN, which combines the strengths of reinforcement learning and transformer models to assess image quality. This method can be particularly effective for identifying and improving the diagnostic quality of HIs, which are crucial for detecting CRC.

Author Contributions

Conceptualization, S.A.E.-G. and A.A.A.E.-A.; methodology, S.A.E.-G.; software, A.A.A.E.-A. and M.A.M.; validation, S.A.E.-G., A.A.A.E.-A. and M.A.M.; formal analysis, S.A.E.-G.; investigation, A.A.A.E.-A.; resources, M.A.M.; data curation M.A.M.; writing—original draft preparation, A.A.A.E.-A.; writing—review and editing, S.A.E.-G.; visualization M.A.M.; supervision, S.A.E.-G.; project administration, S.A.E.-G.; funding acquisition, S.A.E.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No. DGSSR-2023-02-02415.

Data Availability Statement

The datasets used in this article are NCT-CRC-HE-100K and CRC-VAL-HE-7K, benchmark datasets sourced from the Zenodo web page: https://zenodo.org/records/1214456 (Last accessed on 2 August 2024).

Acknowledgments

The authors extend their appreciation to the Deanship of Graduate Studies and Scientific Research at Jouf University for funding this work through research grant No. (DGSSR-2023-02-02415).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sung, H.; Ferlay, J.; Siegel, R.L.; Laversanne, M.; Soerjomataram, I.; Jemal, A.; Bray, F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2021, 71, 209–249. [Google Scholar] [CrossRef] [PubMed]
  2. Siegel, R.L.; Miller, K.D.; Fuchs, H.E.; Jemal, A. Cancer statistics. CA Cancer J. Clin. 2022, 72, 7–33. [Google Scholar] [CrossRef]
  3. Qi, L.; Ke, J.; Yu, Z.; Cao, Y.; Lai, Y.; Chen, Y.; Gao, F.; Wang, X. Identification of prognostic spatial organization features in colorectal cancer microenvironment using deep learning on histopathology images. Med. Omics 2021, 2, 100008. [Google Scholar] [CrossRef]
  4. Kather, J.N.; Weis, C.-A.; Bianconi, F.; Melchers, S.M.; Schad, L.R.; Gaiser, T.; Marx, A.; Zöllner, F.G. Multi-class texture analysis in colorectal cancer histology. Sci. Rep. 2016, 6, 27988. [Google Scholar] [CrossRef]
  5. Elmore, J.G.; Longton, G.M.; Carney, P.A.; Geller, B.M.; Onega, T.; Tosteson, A.N.; Nelson, H.D.; Pepe, M.S.; Allison, K.H.; Schnitt, S.J.; et al. Diagnostic concordance among pathologists interpreting breast biopsy specimen. JAMA 2015, 313, 1122–1132. [Google Scholar] [CrossRef] [PubMed]
  6. Tsai, M.-J.; Tao, Y.-H. Deep learning techniques for the classification of colorectal cancer tissue. Electronics 2021, 10, 1662. [Google Scholar] [CrossRef]
  7. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setiol, A.A.A.; Ciompi, F.; Ghafoorian, M.; Van Der Laak, J.A.; Van Ginneken, B.; Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88. [Google Scholar] [CrossRef]
  8. Waite, S.; Scott, J.M.; Legasto, A.; Kolla, S.; Gale, B. Systemic error in radiology. Am. J. Roentgenol. 2017, 209, 629–639. [Google Scholar] [CrossRef]
  9. Komura, D.; Ishikawa, S. Machine learning methods for histopathological image analysis. Comput. Struct. Biotechnol. J. 2018, 16, 34–42. [Google Scholar] [CrossRef]
  10. Kumar, A.; Vishwakarma, A.; Bajaj, V. CRCCN-Net: Automated framework for classification of colorectal tissue using histopathological images. Biomed. Signal Process. Control. 2023, 79, 104172. [Google Scholar] [CrossRef]
  11. Shen, D.; Wu, G.; Suk, H.I. Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 2017, 19, 221–248. [Google Scholar] [CrossRef] [PubMed]
  12. Smith, L.N. Cyclical learning rates for training neural networks. In Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA, 24–31 March 2017. [Google Scholar]
  13. Loshchilov, I.; Hutter, F. SGDR: Stochastic gradient descent with warm restarts. arXiv 2017, arXiv:1608.03983. [Google Scholar]
  14. Izmailov, P.; Podoprikhin, D.; Garipov, T.; Vetrov, D.; Wilson, A.G. Averaging weights leads to wider optima and better generalization. arXiv 2019, arXiv:1803.05407. [Google Scholar]
  15. Johny, A.; Madhusoodanan, K.N. Dynamic learning rate in deep cnn model for metastasis detection and classification of histopathology images. Comput. Math. Methods Med. 2021, 2021, 5557168. [Google Scholar] [CrossRef] [PubMed]
  16. Wang, K.S.; Yu, G.; Xu, C.; Meng, X.H.; Zhou, J.; Zheng, C.; Deng, Z.; Shang, L.; Liu, R.; Su, S.; et al. Accurate diagnosis of colorectal cancer based on histopathology images using artificial intelligence. BMC Med. 2021, 19, 76. [Google Scholar] [CrossRef]
  17. Zhao, K.; Li, Z.; Yao, S.; Wang, Y.; Wu, X.; Xu, Z.; Wu, L.; Huang, Y.; Liang, C.; Liu, Z. Artificial intelligence quantified tumour-stroma ratio is an independent predictor for overall survival in resectable colorectal cancer. EBioMedicine 2020, 61, 103054. [Google Scholar] [CrossRef] [PubMed]
  18. Li, X.; Jonnagaddala, J.; Cen, M.; Zhang, H.; Xu, S. Colorectal cancer survival prediction using deep distribution based multiple-instance learning. Entropy 2022, 24, 1669. [Google Scholar] [CrossRef]
  19. Bustos, A.; Payá, A.; Torrubia, A.; Jover, R.; Llor, X.; Bessa, X.; Castells, A.; Carracedo, Á.; Alenda, C. xDEEP-MSI: Explainable bias-rejecting microsatellite instability deep learning system in colorectal cancer. Biomolecules 2021, 11, 1786. [Google Scholar] [CrossRef]
  20. Kather, J.N.; Krisam, J.; Charoentong, P.; Luedde, T.; Herpel, E.; Weis, C.A.; Gaiser, T.; Marx, A.; Valous, N.A.; Ferber, D.; et al. Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study. PLoS Med. 2019, 16, e1002730. [Google Scholar] [CrossRef]
  21. Shen, Y.; Luo, Y.; Shen, D.; Ke, J. RandStainNA: Learning stain-agnostic features from histology slides by bridging stain augmentation and normalization. arXiv 2022, arXiv:2206.12694. [Google Scholar]
  22. Khvostikov, A.; Krylov, A.; Mikhailov, I.; Malkov, P.; Danilova, N. Tissue type recognition in whole slide histological images. In Proceedings of the 31st International Conference on Computer Graphics and Vision, Nizhny Novgorod, Russia, 27–30 September 2021; pp. 496–507. [Google Scholar]
  23. Kather, J.N.; Halama, N.; Marx, A. 100,000 Histological Images of Human Colorectal Cancer and Healthy Tissue, version 0.1; Zenodo: Geneva, Switzerland, 2018. [Google Scholar]
  24. Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
  25. Alhichri, H.; Alswayed, A.S.; Bazi, Y.; Ammour, N.; Alajlan, N.A. Classification of remote sensing images using efficientnet-b3 cnn model with attention. IEEE Access 2021, 9, 14078–14094. [Google Scholar] [CrossRef]
  26. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  27. Putra, T.A.; Rufaida, S.I.; Leu, J. Enhanced skin condition prediction through machine learning using dynamic training and testing augmentation. IEEE Access 2020, 4, 40536–40546. [Google Scholar] [CrossRef]
  28. Lu, Y.; Fu, J.; Li, X.; Zhou, W.; Liu, S.; Zhang, X.; Wu, W.; Jia, C.; Liu, Y.; Chen, Z. RTN: Reinforced Transformer Network for Coronary CT Angiography Vessel-level Image Quality Assessment. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2022, Singapore, 18–22 September 2022; Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S., Eds.; Lecture Notes in Computer Science; Springer: New York, NY, USA, 2022; Volume 13431. [Google Scholar]
Figure 1. The saddle point of the loss landscape.
Figure 2. Images displaying the nine classifications present in the NCT-CRC-HE-100K dataset.
Figure 3. The overall model architecture.
Figure 4. The modified EfficientNet-B0 architecture.
Figure 5. Confusion matrix for the first experiment using the SLR on the test set of NCT-CRC-HE-100K.
Figure 6. Training and validation loss of EfficientNet-B0 on NCT-CRC-HE-100K using the SLR.
Figure 7. Training and validation accuracy of EfficientNet-B0 on NCT-CRC-HE-100K using the SLR.
Figure 8. Confusion matrix for the second experiment using the SLR on the test set of CRC-VAL-HE-7K.
Figure 9. Confusion matrix for the third experiment using the ALR on the test set of NCT-CRC-HE-100K.
Figure 10. Training and validation loss of EfficientNet-B0 on NCT-CRC-HE-100K using the ALR.
Figure 11. Training and validation accuracy of EfficientNet-B0 on NCT-CRC-HE-100K using the ALR.
Figure 12. Confusion matrix for the fourth experiment using the ALR on the test set of CRC-VAL-HE-7K.
Table 1. The nine classes of the NCT-CRC-HE-100K dataset in multi-classification.
Class | Image Count
Adipose (ADI) | 10,407
Background (BACK) | 10,566
Debris (DEB) | 11,512
Lymphocytes (LYM) | 11,557
Mucus (MUC) | 8896
Smooth muscle (MUS) | 13,536
Normal colon mucosa (NORM) | 8763
Cancer-associated stroma (STR) | 10,446
Colorectal adenocarcinoma epithelium (TUM) | 14,317
Table 2. The distribution of HIs among the nine classes of the test set of NCT-CRC-HE-100K.
Class | Image Count
ADI | 1561
BACK | 1585
DEB | 1727
LYM | 1733
MUC | 1334
MUS | 2031
NORM | 1315
STR | 1567
TUM | 2148
Table 3. The distribution of HIs among the nine classes of the test set of CRC-VAL-HE-7K.
Class | Image Count
ADI | 1338
BACK | 847
DEB | 339
LYM | 634
MUC | 1035
MUS | 592
NORM | 741
STR | 421
TUM | 1233
Table 4. The scale for EfficientNet-B0.
Model Name | Width Coefficient | Depth Coefficient | Resolution | Dropout Rate
EfficientNet-B0 | 1.0 | 1.0 | 224 | 0.2
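For context, the coefficients in Table 4 mark EfficientNet-B0 as the baseline (φ = 0) of EfficientNet's compound scaling rule. The sketch below is illustrative rather than the authors' code: the base values α = 1.2, β = 1.1, γ = 1.15 come from the original EfficientNet formulation, and the released larger variants round their input resolutions rather than following the γ rule exactly.

```python
# Illustrative sketch of EfficientNet compound scaling (Tan & Le, 2019).
# B0 is the baseline with scaling exponent phi = 0, which reproduces the
# width/depth coefficients of 1.0 and the 224-pixel resolution in Table 4.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases

def efficientnet_scale(phi: int, base_resolution: int = 224):
    """Return (width_coeff, depth_coeff, resolution) for scaling exponent phi."""
    width = BETA ** phi
    depth = ALPHA ** phi
    resolution = round(base_resolution * GAMMA ** phi)
    return width, depth, resolution

# B0: phi = 0 -> coefficients (1.0, 1.0) and resolution 224, matching Table 4.
print(efficientnet_scale(0))
```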
Table 5. The device specifications.
Item | Specification
CPU | Intel(R) Core i7-10510U, 1.80 GHz (up to 2.30 GHz)
Installed RAM | 16 GB
OS | Windows 10 Home, 64-bit
Table 6. The results of the first experiment using the SLR on the test set of NCT-CRC-HE-100K.
Class | Sensitivity (%) | Specificity (%) | Precision (%) | F1-Score (%) | Accuracy (%)
ADI | 89.81 | 99.99 | 99.93 | 94.60 | 98.93
BACK | 75.27 | 99.55 | 95.14 | 84.04 | 96.98
DEB | 97.39 | 83.28 | 43.11 | 59.76 | 84.90
LYM | 20.37 | 100 | 100 | 33.84 | 90.80
MUC | 49.85 | 96.98 | 61.69 | 55.14 | 92.79
MUS | 90.50 | 89.79 | 58.13 | 70.79 | 89.89
NORM | 92.40 | 97.94 | 81.16 | 86.42 | 97.45
STR | 9.25 | 100 | 100 | 16.94 | 90.52
TUM | 86.87 | 97.35 | 84.55 | 85.69 | 95.85
Average | 67.97 | 96.10 | 80.41 | 65.25 | 93.12
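The per-class figures in Tables 6-9 follow from the corresponding confusion matrices (Figures 5, 8, 9, and 12) via a one-vs-rest reduction. The sketch below is illustrative, not the authors' code; the `per_class_metrics` helper and the toy confusion matrix are assumptions introduced for demonstration.

```python
import numpy as np

def per_class_metrics(cm: np.ndarray):
    """cm[i, j] = count of samples with true class i predicted as class j.
    Returns (sensitivity, specificity, precision, f1, accuracy) per class."""
    total = cm.sum()
    metrics = []
    for k in range(cm.shape[0]):
        tp = cm[k, k]
        fn = cm[k].sum() - tp      # true class k, predicted as another class
        fp = cm[:, k].sum() - tp   # predicted class k, true class is another
        tn = total - tp - fn - fp
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        prec = tp / (tp + fp)
        f1 = 2 * prec * sens / (prec + sens)
        acc = (tp + tn) / total
        metrics.append((sens, spec, prec, f1, acc))
    return metrics

# Toy 2-class example: 8 of 10 class-0 samples correct, 9 of 10 class-1.
cm = np.array([[8, 2],
               [1, 9]])
print(per_class_metrics(cm)[0])  # class-0 sensitivity 0.8, specificity 0.9, ...
```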
Table 7. The results of the second experiment using the SLR on the test set of CRC-VAL-HE-7K.
Class | Sensitivity (%) | Specificity (%) | Precision (%) | F1-Score (%) | Accuracy (%)
ADI | 52.77 | 99.85 | 98.74 | 68.78 | 91.07
BACK | 93.39 | 99.86 | 98.88 | 96.05 | 99.09
DEB | 99.71 | 85.73 | 25.72 | 40.90 | 86.39
LYM | 1.26 | 99.98 | 88.89 | 2.49 | 91.27
MUC | 64.54 | 98.08 | 84.99 | 73.37 | 93.25
MUS | 70.27 | 83.49 | 27.66 | 39.69 | 82.40
NORM | 92.71 | 98.15 | 85.24 | 88.82 | 97.59
STR | 2.61 | 99.78 | 42.31 | 4.92 | 94.08
TUM | 86.54 | 97.43 | 87.46 | 87.00 | 95.56
Average | 62.64 | 95.82 | 71.10 | 55.78 | 92.30
Table 8. The results of the third experiment using the ALR on the test set of NCT-CRC-HE-100K.
Class | Sensitivity (%) | Specificity (%) | Precision (%) | F1-Score (%) | Accuracy (%)
ADI | 100 | 99.99 | 99.94 | 99.97 | 99.99
BACK | 100 | 100 | 100 | 100 | 100
DEB | 99.59 | 99.95 | 99.59 | 99.59 | 99.91
LYM | 99.88 | 99.98 | 99.88 | 99.88 | 99.97
MUC | 99.55 | 99.93 | 99.33 | 99.44 | 99.90
MUS | 99.46 | 99.98 | 99.85 | 99.65 | 99.91
NORM | 99.39 | 99.96 | 99.62 | 99.51 | 99.91
STR | 99.43 | 99.84 | 98.61 | 99.01 | 99.79
TUM | 99.44 | 99.95 | 99.72 | 99.58 | 99.88
Average | 99.64 | 99.95 | 99.62 | 99.63 | 99.87
Table 9. The results of the fourth experiment using the ALR on the test set of CRC-VAL-HE-7K.
Class | Sensitivity (%) | Specificity (%) | Precision (%) | F1-Score (%) | Accuracy (%)
ADI | 100 | 99.99 | 99.94 | 99.97 | 99.99
BACK | 100 | 100 | 100 | 100 | 100
DEB | 99.59 | 99.95 | 99.59 | 99.59 | 99.91
LYM | 99.88 | 99.98 | 99.88 | 99.88 | 99.97
MUC | 99.55 | 99.93 | 99.33 | 99.44 | 99.90
MUS | 99.46 | 99.98 | 99.85 | 99.65 | 99.91
NORM | 99.39 | 99.96 | 99.62 | 99.51 | 99.91
STR | 99.43 | 99.84 | 98.61 | 99.01 | 99.79
TUM | 99.44 | 99.95 | 99.72 | 99.58 | 99.88
Average | 99.64 | 99.95 | 99.62 | 99.63 | 99.87
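The ALR experiments behind Tables 8 and 9 use the update rule described in the paper: at the start of each epoch, the latest training loss is compared with the previous one, and the learning rate is increased if the loss fell and decreased if it rose. A minimal sketch of that rule, assuming illustrative up/down factors of 1.05 and 0.5 that are not taken from the paper:

```python
def update_lr(lr: float, prev_loss: float, curr_loss: float,
              up: float = 1.05, down: float = 0.5) -> float:
    """Adaptive learning rate step: speed up when the training loss
    improved over the last epoch, back off when it worsened."""
    if curr_loss < prev_loss:
        return lr * up    # loss decreased -> increase the learning rate
    return lr * down      # loss increased -> decrease the learning rate

# Epoch-by-epoch usage with a hypothetical loss trace.
lr = 1e-3
losses = [0.90, 0.70, 0.75, 0.60]
for prev, curr in zip(losses, losses[1:]):
    lr = update_lr(lr, prev, curr)
print(lr)
```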
Table 10. Comparison of the proposed model with the state-of-the-art methods.
Reference | Methodology | Performance | Dataset
Lin Qi et al. [3] | CNN | 99% | NCT-CRC-HE-100K
 | | 95% | CRC-VAL-HE-7K
Min-Jen Tsai et al. [5] | CNN | 99% (internal) and 94.3% (external testing set) | NCT-CRC-HE-100K
 | | 99.69% (internal) and 99.32% (external testing set) | Kather-texture-2016-image
Anurodh Kumar et al. [7] | CNN for multi-class colorectal tissue classification | 93.50% | Colorectal histology
 | | 96.26% | NCT-CRC-HE-100K
 | | 99.21% | Merged dataset
K. S. Wang et al. [16] | CNN | 98.11% | 14,234 CRC WSIs from fourteen independent sources
Ke Zhao et al. [17] | CNN-based TSR | 75.9% | NCT-CRC-HE-100K and CRC-VAL-HE-7K
Xingyu Li et al. [18] | CNN | 60.6% | MCO CRC
 | | 64.7% | TCGA COAD-READ
Aurelia Bustos et al. [19] | DL systems | 88% | NCT-CRC-HE-100K and CRC-VAL-HE-7K
Jakob Nikolas Kather et al. [20] | VGG19 CNN | 99% (internal) and 94.3% (external testing set) | NCT-CRC-HE-100K and CRC-VAL-HE-7K
Yiqing Shen et al. [21] | RandStainNA, composed of stain normalization (SN) and stain augmentation (SA) | 94.66% | NCT-CRC-HE-100K
Alexander Khvostikov et al. [22] | CNN | 92.9% | CRC-VAL-HE-7K
 | | 98% | PATH-DTMSU
Proposed Model | Modified EfficientNet-B0 with ALR | 99.87% | NCT-CRC-HE-100K
 | | 99% | CRC-VAL-HE-7K
Share and Cite

MDPI and ACS Style

Abd El-Ghany, S.; Mahmood, M.A.; Abd El-Aziz, A.A. Adaptive Dynamic Learning Rate Optimization Technique for Colorectal Cancer Diagnosis Based on Histopathological Image Using EfficientNet-B0 Deep Learning Model. Electronics 2024, 13, 3126. https://doi.org/10.3390/electronics13163126
