*Article* **Customized Deep Learning Classifier for Detection of Acute Lymphoblastic Leukemia Using Blood Smear Images**

**Niranjana Sampathila <sup>1</sup>, Krishnaraj Chadaga <sup>2</sup>, Neelankit Goswami <sup>1</sup>, Rajagopala P. Chadaga <sup>3</sup>, Mayur Pandya <sup>2</sup>, Srikanth Prabhu <sup>2,</sup>\*, Muralidhar G. Bairy <sup>1</sup>, Swathi S. Katta <sup>4,</sup>\*, Devadas Bhat <sup>1</sup> and Sudhakara P. Upadya <sup>5</sup>**


**Abstract:** Acute lymphoblastic leukemia (ALL) is a rare type of blood cancer caused by the overproduction of lymphocytes by the bone marrow in the human body. It is one of the common types of cancer in children, which has a fair chance of being cured. However, it may also occur in adults, and the chances of a cure are slim if it is diagnosed at a later stage. To aid in the early detection of this deadly disease, an intelligent method to screen the white blood cells is proposed in this study. The proposed intelligent deep learning algorithm uses microscopic images of blood smears as the input data. The algorithm is implemented with a convolutional neural network (CNN) to distinguish leukemic cells from healthy blood cells. The custom ALLNET model was trained and tested using microscopic images available as open-source data. Model training was carried out on Google Colaboratory using an Nvidia Tesla P-100 GPU. A maximum accuracy of 95.54%, specificity of 95.81%, sensitivity of 95.91%, F1-score of 95.43%, and precision of 96% were obtained by this classifier. The proposed technique may be used during pre-screening to detect leukemia cells during complete blood count (CBC) and peripheral blood tests.

**Keywords:** acute lymphoblastic leukemia (ALL); blood smear; convolutional neural networks; deep learning; white blood cells

## **1. Introduction**

ALL (acute lymphoblastic leukemia) is a lymphoid blood cell malignancy characterized by the development of immature lymphocytes [1]. These impaired white blood cells harm the entire body and bone marrow, putting the immune system as a whole at risk. ALL also inhibits the bone marrow's capacity to generate red blood cells and platelets. Moreover, these cancerous cells can enter the bloodstream and cause serious harm to other regions of the human body, including the kidneys, liver, brain, heart, and other organs, leading to the development of other deadly cancers. According to worldwide statistics by the World Health Organization (WHO)'s International Agency for Research on Cancer, 437,033 cases of leukemia and 303,006 deaths had been reported as of 2022 [2]. The blood, bone marrow, and extramedullary sites all show signs of ALL. This deadly disease is categorized by the WHO into T-lymphoblastic leukemia (Pre-T), B-lymphoblastic leukemia (Pre-B), and mature B-lymphoblastic leukemia [3]. Mature B-lymphoblastic leukemia starts in the bone marrow and releases an abnormal quantity of white blood cells into the body. These dangerously formed cells are known as "leukemia cells" or "blasts" because they are severely

**Citation:** Sampathila, N.; Chadaga, K.; Goswami, N.; Chadaga, R.P.; Pandya, M.; Prabhu, S.; Bairy, M.G.; Katta, S.S.; Bhat, D.; Upadya, S.P. Customized Deep Learning Classifier for Detection of Acute Lymphoblastic Leukemia Using Blood Smear Images. *Healthcare* **2022**, *10*, 1812. https://doi.org/10.3390/ healthcare10101812

Academic Editor: Mahmudur Rahman

Received: 18 August 2022 Accepted: 9 September 2022 Published: 20 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

undeveloped. Leukemia is defined by Khullar et al. [4] as an abnormal hyper-proliferation of polymorphonuclear leukocytes which do not produce solid tumor aggregates, i.e., liquid cancer. Acute leukemia cells mature fast and can be lethal if not detected early.

Machine learning plays a crucial part in the battle against various life-threatening diseases [5–7]. It also contributes significantly to educational and clinical studies [8–11]. Machine learning has shown significant potential in medicine, engineering, psychology, multi-disciplinary science, earth sciences, analytical practices, healthcare, and other domains. Using blood smear images and a CNN, this article provides an early-screening deep learning strategy for correctly identifying ALL. This algorithm can assist clinicians and medical personnel to a great extent. Acute lymphoblastic leukemia is a type of malignant blood cell cancer that affects mostly children and adults above age 65 [12]. Leukocytes, or white blood cells as they are commonly known, make up around one percent of all blood cells. ALL can be observed in the bone marrow, blood, and extramedullary sites. In this type of leukemia, immature leukocytes proliferate rapidly in the body, bringing about the need for early detection. Usually, the diagnosis is based on a bone marrow examination. This method is labor-intensive, time-consuming, and might generate inaccurate results; therefore, there is a need for automation. The peripheral blood smear (PBS) test is one of the methods used for screening for leukemia [13]. A blood sample smear is analyzed under the microscope. To automate the process of recognizing ALL, deep learning is already playing a significant role and will help in reducing manual error as well. In the past, several strategies have been used to expedite the process of detecting leukemia using artificial intelligence. Automated analysis of PBS introduces a smart healthcare facility by screening a sample for diagnostic purposes [14,15].

The robustness and performance of CNNs inspired our research [16]. Therefore, we developed a customized CNN architecture to classify ALL. According to the study, the accuracy and reliability of prediction have improved. In this research, we put forward a method to automate the process of lymphoblast (blast cell) detection in single-cell images to help in the detection of leukemia. The C\_NMC\_2019 dataset was used to train the model, and evaluation was done on the same [17]. Deep learning eliminates the need for hand-crafted features for classification. Instead, the classification is done using a custom CNN architecture. The average accuracy after 6-fold cross-validation on the reliable C\_NMC\_2019 dataset was 94.95%. A computer-aided diagnosis system can be built on this result, potentially reducing the time and error required for the same.

Over the years, several approaches have been utilized to automate the process of detecting ALL. Many of these approaches revolve around various machine and deep learning algorithms. Jiang et al. [18] applied ViT-CNN ensemble models to diagnose ALL; the study reports an accuracy of 89%. Differentiate enhancement-random sampling (DEES) was used to prevent data imbalance. Leukemia subtype identification using CNNs and microscopic images was researched in [19]. Two image datasets, ASH-Image-Bank and ALL-IDB, were used, with accuracies of 88.25% and 81.74%, respectively. This algorithm was compared with traditional ML algorithms such as support vector machine, decision tree, and KNN. Ghaderzadeh et al. [20] used CNNs to diagnose B-ALL and its subtypes from peripheral smear images. Ten well-known CNN architectures were used. These models obtained good results, with accuracy, specificity, and sensitivity of 99.85%, 99.89%, and 99.82%, respectively. The dataset was obtained from a Kaggle competition. Qiao et al. [21] used a compact CNN model for the preliminary screening of ALL. Two datasets, APL-Cytomorphology-JHH and APL-Cytomorphology-LMU, were used, yielding precisions of 96.53% and 99.20%, respectively. Promyelocytes were distinguished from normal leukocytes in this study. A hybrid model that used mutual information was used to diagnose ALL [22]; the deep CNN classifier achieved an accuracy of 98%. The ALL-IDB2 database was considered for this study. The rest of the research is described in Table 1.

**Table 1.** Existing research that diagnoses ALL using deep learning.

| Research | Dataset | Algorithm Used | Accuracy | Precision | Recall |
|---|---|---|---|---|---|
| [23] | Three hybrid image databases | CNN and other ML models | 75% | - | - |
| [24] | ALL-IDB2 / IDB dataset | AlexNet | 96% | - | 96.74% |
| [25] | DNA sequence images, BCCD | CNN and SVM | 99% | 99% | 99% |

To further support our claim, the remainder of the article is organized in the following manner. The following section describes the workflow and image processing techniques used, and the network architecture is explained. Section 3 presents the results produced by our classifier along with a comparison of the outcomes from other established studies. The conclusion of the article is described in Section 4.

## **2. Materials and Methods**


This section explains the workflow, image processing techniques, and network architecture developed for leukemia classification. Figure 1 describes the workflow followed for this work. After acquiring the dataset, augmentation was carried out to increase the size as well as the robustness. Afterward, the image data was fed to the CNN for automated feature extraction. The network then managed to classify the images as either ALL (blast cell) or HEM (healthy cell).

**Figure 1.** Leukemia screening system.

#### *2.1. Input Data*

The dataset used in this research belongs to the "ALL Challenge dataset of ISBI, 2019" [30–33]. The dataset contained cell images of both normal individuals and patients diagnosed with ALL. As shown in Figure 2, the original images acquired from a digital microscope, containing various components of the blood smear, are pictorially depicted. The CNN classification for leukemia is done using segmented white blood cell (WBC) regions, which requires preprocessing and color segmentation. The subsequent step involves finding a better segmentation of the region of interest. The HSI color space-based images of blood smears are shown in Figure 3. The white blood cells are seen to have better contrast than the other components of the image. To further localize the white blood cells, we selected the saturation component, as it describes the intensity of the color, which is shown in Figure 4. This was then converted to a binary image by thresholding, as shown in Figure 5. The thresholding was performed on the grayscale image in such a way that all pixels in the range 180–255 were converted to white, while all pixels below this threshold were converted to black. The segmented image was obtained by taking the product of the original image and the binary mask, which was then used for further processing, as shown in Figure 6.
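The saturation-based segmentation described above can be sketched as follows. This is a minimal numpy illustration of the idea (HSI saturation, thresholding at 180, and masking); the paper's exact implementation is not given, and the function names are ours:

```python
import numpy as np

def hsi_saturation(rgb):
    """Saturation channel of the HSI color model, scaled to 0-255.

    S = 1 - 3*min(R, G, B)/(R + G + B); gray pixels give 0, pure colors give 1.
    """
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=-1) + 1e-9              # avoid division by zero
    s = 1.0 - 3.0 * rgb.min(axis=-1) / total
    return (255.0 * s).astype(np.uint8)

def segment_wbc(rgb, threshold=180):
    """Binary mask from thresholding the saturation channel, then the
    segmented image as the product of the original image and the mask."""
    sat = hsi_saturation(rgb)
    mask = (sat >= threshold).astype(np.uint8)   # white = 1, black = 0
    segmented = rgb * mask[..., None]            # zero out the background
    return mask, segmented
```

A strongly stained (saturated) WBC nucleus survives the mask, while near-gray background pixels are zeroed out.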


**Figure 2.** Original images.

**Figure 3.** HSI color space.

**Figure 4.** Saturation component.

**Figure 5.** Images after thresholding.

**Figure 6.** Image after segmentation.

A total of 10,661 images were collected from 73 participants in the C\_NMC\_2019 dataset: 7272 images of blast cells and 3389 images of healthy cells. The images in this dataset were uniform, with a size of 450 × 450 × 3, and had been pre-processed such that only the object of interest (WBC) was included, with everything else padded with black. Figure 2 gives a glimpse of the kind of image available in the dataset. Figure 7a represents the deadly blast cell, and Figure 7b represents a normal white blood cell. This dataset is reliable since expert oncologists have performed the blast/healthy cell classification.


**Figure 7.** Images in the dataset: (**a**) blast cells; (**b**) healthy cells.

#### *2.2. Data Augmentation*

The number of images provided to the neural network plays a pivotal role in the feature extraction procedure. The dataset used had an imbalance of images between the two classes, which would make the classification process biased. So, to remove the bias, the images were subjected to auto-orientation and resizing. The augmentation steps involved were: (1) vertical and horizontal flipping, (2) clockwise and anti-clockwise rotation, (3) random brightness adjustments, and (4) random Gaussian blur with the addition of salt-and-pepper noise. The final dataset consisted of 12,000 images, with 6000 images in each class. Figure 8 depicts the images obtained after augmentation.
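Most of the augmentation steps above can be sketched with plain numpy operations (Gaussian blur is omitted here for brevity). This is an illustrative sketch with our own helper names, not the authors' pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducible noise

def flip_h(img):   return np.flip(img, axis=1)   # horizontal flip
def flip_v(img):   return np.flip(img, axis=0)   # vertical flip
def rotate_cw(img):  return np.rot90(img, k=-1)  # 90° clockwise rotation
def rotate_ccw(img): return np.rot90(img, k=1)   # 90° anti-clockwise rotation

def adjust_brightness(img, factor):
    """Scale pixel intensities, clipping to the valid 0-255 range."""
    return np.clip(img.astype(np.float64) * factor, 0, 255).astype(np.uint8)

def salt_and_pepper(img, amount=0.02):
    """Set a random fraction of pixels to white (salt) or black (pepper)."""
    out = img.copy()
    h, w = img.shape[:2]
    n = int(amount * h * w)
    ys, xs = rng.integers(0, h, 2 * n), rng.integers(0, w, 2 * n)
    out[ys[:n], xs[:n]] = 255    # salt
    out[ys[n:], xs[n:]] = 0      # pepper
    return out
```

Applying random combinations of such transforms to each original image is a standard way to grow an imbalanced set of cell images into the balanced 12,000-image set described above.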

**Figure 8.** Original image of a blast cell and its augmented versions for the C\_NMC\_2019 dataset.

#### *2.3. CNN*

A CNN has a sequence of layers that transforms an image volume into an output volume through a differentiable function. The architecture of the CNN was inspired by the visual cortex of the brain. The architecture fits the data better because of the reduction in the number of parameters involved and the reusability of weights. There are different types of layers in a convolutional network, including: convolution (CONV), pooling (Pool), and fully connected (FC).

• **Convolution (CONV) Layers** The convolution layers are the main building blocks of a CNN. They comprise a set of independent filters, which are convolved with the input volume to compute an activation map made of neurons. The useful features from the input images are extracted by the multilayered architecture. Each filter can be of a different type, extracting different features such as vertical lines, horizontal lines, and edges. The CNN layers help in extracting features through convolution. The extracted deep features play a major role in the decision support system.

The 2D convolution is given by Equations (1) and (2):

$$y[m,\ n] = x[m,\ n] * h[m,\ n] \tag{1}$$

$$y[m, n] = \sum\_{i = -\infty}^{\infty} \sum\_{j = -\infty}^{\infty} x[i, j] \cdot h[m - i, n - j] \tag{2}$$

where *x*[*m*, *n*] = input image, *h*[*m*, *n*] = filter kernel,

*m*, *n* = number of rows and columns, respectively,

*i*, *j* = row index and column index.

Similarly, the size of the image after convolution is given by Equation (3):

$$Size = \left( \left\lfloor \frac{m + 2p - n}{s} \right\rfloor + 1,\ \left\lfloor \frac{m + 2p - n}{s} \right\rfloor + 1 \right) \tag{3}$$

where *m* = number of input features, *n* = filter size, *p* = padding, and *s* = stride.
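Equation (3) can be checked numerically. The following sketch uses the standard conventions (filter size *n*, padding *p*, stride *s*), which we assume here since the text does not define every symbol; for example, a 450 × 450 input with a 3 × 3 filter, no padding, and unit stride yields a 448 × 448 output:

```python
import math

def conv_output_size(m, n, p=0, s=1):
    """Output spatial size for an m×m input convolved with an n×n filter,
    padding p and stride s: floor((m + 2p - n) / s) + 1 (Equation (3))."""
    return math.floor((m + 2 * p - n) / s) + 1
```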


• **Batch Normalization** Batch normalization enables the learning of each layer in an independent manner. An advantage of using batch normalization is that learning rates can be set higher, as it ensures that no activation goes too high or too low. Batch normalization also reduces overfitting, as it has a regularization effect. To improve the stability of the neural network, batch normalization normalizes the outputs of the previous activation layer. This process adds two learnable parameters to each layer: the normalized output is scaled by gamma and shifted by beta. Mathematically, the mini-batch mean is given by Equation (4):

$$
\mu\_B = \frac{1}{m} \sum\_{i=1}^m x\_i \tag{4}
$$

Mini-batch variance is shown in Equation (5):

$$
\sigma\_{B}^2 = \frac{1}{m} \sum\_{i=1}^m (x\_i - \mu\_{B})^2 \tag{5}
$$

Normalization is given by Equation (6):

$$
\hat{x}\_{i} = \frac{x\_{i} - \mu\_{B}}{\sqrt{\sigma\_{B}^{2} + \epsilon}} \tag{6}
$$
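Equations (4)–(6), followed by the gamma/beta scale-and-shift, can be sketched directly in numpy (a minimal per-feature illustration, not the framework implementation used for training):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a mini-batch per Equations (4)-(6), then scale and shift.

    x has shape (batch, features); statistics are taken over the batch axis.
    """
    mu = x.mean(axis=0)                      # Equation (4): mini-batch mean
    var = ((x - mu) ** 2).mean(axis=0)       # Equation (5): mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # Equation (6): normalization
    return gamma * x_hat + beta              # learnable scale (gamma) and shift (beta)
```

After normalization, each feature has approximately zero mean and unit variance over the batch, which is what allows the higher learning rates mentioned above.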


• **Loss Function** The categorical cross-entropy loss is used, as given by Equation (7):

$$L(y, \hat{y}) = -\sum\_{j=0}^{M} \sum\_{i=0}^{N} \left[ y\_{ij} \times \log\left( \hat{y}\_{ij} \right) \right] \tag{7}$$

where *y*-hat is the predicted value and *y* is the observed value.
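Equation (7) reduces to a simple sum over one-hot targets and predicted probabilities; a minimal numpy sketch (the small epsilon guards against log(0) and is our own numerical-safety addition):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Equation (7): cross-entropy summed over classes and samples.

    y_true holds one-hot labels, y_pred holds predicted class probabilities.
    """
    return -np.sum(y_true * np.log(y_pred + eps))
```

For a single sample with true class probability predicted at 0.9, the loss is −log(0.9) ≈ 0.105; confident wrong predictions are penalized much more heavily.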

• **Optimizer** Optimizers are algorithms used to change the attributes of the neural network, such as learning rates and weights, to reduce the loss. In this model, adaptive moment estimation (Adam) has been incorporated. Adam combines root mean square propagation (RMSProp) and the adaptive gradient algorithm (AdaGrad). The proposed convolutional neural network architecture makes use of pooling layers, fully connected layers, convolutional layers, dropout, and batch normalization. Features were automatically extracted from the input images by the CNN: feature extraction is performed by the convolutional layers and the pooling layers. Four convolutional layers, four max-pooling layers, and three fully connected layers were utilized. Batch normalization and dropout were applied to counter overfitting and vanishing and exploding gradients. This model consisted of 95,099,266 parameters in total. The architecture of the designed model can be seen in Figure 9. A more detailed description of the model is given in Table 2.

**Figure 9.** The architecture of CNN (ALLNet).

**Table 2.** ALLNet Architecture.

| Layer | Output Shape | Parameters |
|---|---|---|
| Conv2D | (17, 17, 512) | 1,769,984 |
| Batch Normalization | (17, 17, 512) | 2048 |
| Max\_Pooling | (6, 6, 512) | - |
| Dropout | (6, 6, 512) | - |
| Dropout | 4096 | - |
| Output | 2 | 8194 |

Total Parameters: 95,099,266; Trainable Parameters: 95,097,474; Non-Trainable Parameters: 1792.
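The listed layer parameter counts can be reproduced with the standard counting rules. A Conv2D layer contributes `out_channels × (k × k × in_channels + 1)` weights, and a batch-normalization layer contributes 4 values per channel (gamma and beta are trainable; the moving mean and variance are not). The 3 × 3 kernel and 384 input channels below are inferred from the listed count of 1,769,984, not stated explicitly in the text:

```python
def conv2d_params(kernel, in_ch, out_ch):
    """Trainable parameters of a Conv2D layer: out_ch filters of size
    kernel x kernel x in_ch, plus one bias per filter."""
    return out_ch * (kernel * kernel * in_ch + 1)

def batch_norm_params(channels):
    """Gamma and beta (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * channels
```

With a 3 × 3 kernel and 384 input channels, `conv2d_params(3, 384, 512)` gives exactly 1,769,984, and `batch_norm_params(512)` gives 2048, matching the table.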

## **3. Results and Discussion**

#### *3.1. Performance Metrics*


The performance metrics estimated include *accuracy*, *precision*, *recall*, and *F*1 *score*. *Accuracy*: the ratio of correct predictions (true positives and true negatives) to the total number of predictions, given in Equation (8).

$$Accuracy = \frac{\text{True positive} + \text{True Negative}}{\text{Total samples}} \times 100\tag{8}$$

*Precision*: the ability of the model to return only relevant instances, given by Equation (9).

$$Precision = \frac{\text{True Positive (TP)}}{\text{True Positive} (TP) + \text{False Positive (FP)}} \times 100\tag{9}$$

*Recall*: the ability of the model to identify all relevant instances, as shown in Equation (10). It emphasizes false negative results. This is also called the *true positive rate* or sensitivity.

$$Recall = \frac{\text{True Positive (TP)}}{\text{True Positive (TP)} + \text{False Negative (FN)}} \times 100\tag{10}$$

*Specificity***:** It is an important metric to identify false-positive results. It is described in Equation (11)

$$Specificity = \frac{\text{True Negative (TN)}}{\text{True Negative (TN)} + \text{False Positive (FP)}} \times 100\tag{11}$$

*F***1** *Score*: This is the harmonic mean of precision and recall and is used to indicate a balance between *Precision* and *Recall* given in Equation (12).

$$F1\text{ Score} = 2 \times \frac{Precision \times Recall}{Precision + Recall} \tag{12}$$
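Equations (8)–(12) all follow from the four counts of a binary confusion matrix. A compact sketch (illustrative counts only, not the paper's results):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Equations (8)-(12) from binary confusion-matrix counts.

    Returns accuracy, precision, recall (sensitivity), specificity,
    and F1 score, all as percentages.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100       # Equation (8)
    precision = tp / (tp + fp) * 100                        # Equation (9)
    recall = tp / (tp + fn) * 100                           # Equation (10)
    specificity = tn / (tn + fp) * 100                      # Equation (11)
    f1 = 2 * precision * recall / (precision + recall)      # Equation (12)
    return accuracy, precision, recall, specificity, f1
```

For example, with 90 true positives, 90 true negatives, and 10 of each error type, every metric evaluates to 90%.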

#### *3.2. Model Evaluation*

The ALL-NET architecture consists of a total of four convolution layers alternated with max pooling layers, followed by three fully connected layers. The reason for keeping a max pooling layer after every convolution is to keep the size of the processed image instance to a minimum. This can create a potential problem: if excessive max pooling is done, it can lead to the loss of information or of patterns which do not span wide enough. Further, as observed previously, the data augmentation step has already added noise to the images; such noise can also affect the max pooling operation. Batch normalization is carried out after every alternate max pooling to make sure that the flowing data is normalized and every neuron has some input to give. Finally, to avoid overfitting, we use dropout. These layers ensure that multiple neurons with similar weight vectors are not unintentionally learning the same pattern in a given image instance. The learning rate was initially set at 1 × 10<sup>−3</sup>, but it was observed that 1 × 10<sup>−5</sup> gave marginally better improvement during the learning phase. The batch size for the training images was set at 16, while the epochs were initially selected to be in the range of 50 to 100. As pointed out in the later sections, it was found that the model was prone to overfitting if the epochs exceeded 70. Further fine-tuning was done, and 65 epochs were finally selected as the final parameter value.

After image augmentation, the number of test instances available as blast cell images and healthy cell images was nearly the same as shown in Figure 10, so there was no imbalance present.

**Figure 10.** Prediction of classes of the test set images.

The dataset was initially given an 80–20 train–test split. The training split was further divided for 5-fold cross-validation. The best-performing fold was then utilized for testing on the held-out 20% test split. The cross-validation was carried out to overcome any potential overfitting which might occur due to model exposure to one class of images. The proposed model was trained on 12,000 images and tested on 2132 images. Each fold of the 5-fold cross-validation was learned by the model for 65 epochs. We performed validation of this model by training and conducting simultaneous test evaluations for five runs. The augmented data was mixed with the original data to provide a wide range of training samples, which in turn helped the model to generalize better.

The accuracy and loss curves are described in Figures 11 and 12, respectively. From Figure 12, it can be noticed that the categorical cross-entropy loss function starts to decrease heavily in the span of 10–30 epochs (referring also to Figure 11). The model performs adequately at this point, but stopping the training here would have led to potential underfitting. Epochs after 35 do not provide much variation in the loss, and a steady decrease in the loss function can be observed. This behavior of ALL-NET was observed in all five separate training processes. While increasing the epoch number beyond 65 would have decreased the loss further and simultaneously increased the accuracy, it would have led to overfitting, as we see when the model is evaluated against the holdout test set.
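The split logic described above can be sketched as plain index bookkeeping; this is a minimal, unshuffled illustration of 5-fold cross-validation (real pipelines typically shuffle and stratify, e.g., via scikit-learn's `KFold`):

```python
def kfold_indices(n_samples, k=5):
    """Split sample indices into k contiguous folds; each fold serves once
    as the validation set while the remaining folds form the training set."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    splits = []
    for i in range(k):
        val = folds[i]                                     # held-out fold
        train = [idx for j, fold in enumerate(folds)
                 if j != i for idx in fold]                # all other folds
        splits.append((train, val))
    return splits
```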

**Figure 11.** Accuracy for the five instances.

**Figure 12.** Training loss for the five instances.

A confusion matrix was used to evaluate the number of correct predictions. Prediction was performed over two possible classes: "ALL", i.e., blast cells, and "HEM", i.e., healthy cells. From a total of 2132 images, 1454 were classified as blast cells and 667 as healthy cells. During prediction, the model classified 1996 images (94.2%) correctly and 5.8% incorrectly. The per-class predictions are shown in the confusion matrix of Figure 13, and the results are summarized in Table 3.
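The headline metrics reported here follow directly from the confusion-matrix counts. The sketch below shows the standard definitions; the per-cell counts are illustrative placeholders consistent with the stated totals (1454 ALL, 667 HEM, 1996 correct), not the paper's actual confusion matrix.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from a 2x2 confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # a.k.a. sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, specificity, f1

# Illustrative counts only: the split of the 125 errors between false
# positives and false negatives is assumed, not reported.
acc, prec, rec, spec, f1 = classification_metrics(tp=1380, tn=616, fp=51, fn=74)
```

Treating ALL (blast cells) as the positive class, recall measures how many leukemic images are caught, while specificity measures how many healthy images avoid a false alarm, which is why both are reported alongside accuracy.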

*Healthcare* **2022**, *10*, x FOR PEER REVIEW 12 of 16



**Figure 13.** Confusion matrix of 5 separate training instances.

**Table 3.** Model evaluation results.

| Instance | Accuracy (%) | Specificity (%) | Recall (%) | F1-Score (%) | Precision (%) | MCC (%) |
|---|---|---|---|---|---|---|
| Instance 1 | 94.94 | 94.87 | 94 | 94.96 | 95.95 | 89.42 |
| Instance 2 | 94.5 | 95.8 | 93.2 | 93.2 | 96 | 89.53 |
| Instance 3 | 94.72 | 95.8 | 93.2 | 93.2 | 96 | 88.76 |
| Instance 4 | 95 | 95.91 | 94 | 94.96 | 95.95 | 88 |
| Instance 5 | 95.45 | 95 | 95.91 | 95.43 | 94.94 | 89.5 |
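Table 3 also reports the Matthews correlation coefficient (MCC), which condenses all four confusion-matrix cells into a single score in [-1, 1]. A minimal implementation (an illustrative sketch, not taken from the paper) is:

```python
import math

def mcc(tp, tn, fp, fn):
    # Matthews correlation coefficient: +1 is perfect prediction,
    # 0 is chance-level, -1 is total disagreement. Returns 0 for the
    # degenerate case where any marginal sum is zero.
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0
```

Unlike accuracy, MCC stays informative under the 2:1 class imbalance of this dataset, since a classifier that always predicts the majority class scores 0 rather than a misleadingly high percentage.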

Referring to Table 3, the F1-score obtained by the model in each of the five runs is on par with, if not better than, existing model performances, which are discussed later. A consistent F1-score above 90% across all five runs indicates that the balance between precision and recall is maintained. This balance is particularly pertinent for a model such as ALL-NET, which could serve as a preliminary screening tool. Our approach of using a simple CNN for this dataset yielded good results across multiple performance metrics.






Many approaches have been used for classifying leukemia; a few with good performance are discussed in Table 4. Abunadi et al. [34] diagnosed ALL using an ensemble deep learning approach; the combined model achieved an accuracy of 100%. For the C\_NMC\_2019 dataset, Yongsheng Pan et al. [35] used a neighborhood correction technique to diagnose this fatal condition at an early stage, obtaining an accuracy of 92%. Khandekar et al. [36] used the YOLOv4 algorithm to detect this disease; their deep learning model obtained a maximum recall of 96%. Christian et al. [37] utilized an attention-based neural network to detect ALL, obtaining a maximum F1-score of 82%. Table 4 compares prior research with the suggested technique on a similar dataset. The need for preprocessing may be a slight drawback of this research, and the model could be made more robust if it were trained with more data. Intelligent algorithms with optimized features are promising for screening various problems in digital pathology [38–40], and an improved telehealth framework with such algorithms would enable remote diagnosis [41–43].


**Table 4.** Comparison of results on the C\_NMC\_2019 dataset.

## **4. Conclusions**

A method for early diagnosis of cancer from microscopic images of white blood cells using a CNN has been proposed in this research. Since deep learning approaches do not require manual feature engineering, the model performs exceptionally well compared to traditional image processing techniques. The good performance of blast cell detection is supported by the accurate classification results: a maximum accuracy of 95% was obtained by the custom deep learning ALL-NET classifier. A further benefit is that it operates on all the available data rather than a portion specified by a feature vector. This work can help during screening by reducing the rate of error as well as the computational time. As a result, this research can serve as a theoretical framework for a diagnosis support tool for the detection of ALL. Future work includes expanding the dataset with noisy images and very little pre-processing, to address the problem of using actual medical images for prediction. Combining these models with explainability methods would provide useful inferences to practitioners. Architectures such as YOLOv4, ResNet, and AlexNet can also be explored, since they may perform better on these tasks.

**Author Contributions:** Conceptualization, N.S. and S.S.K.; formal analysis, K.C. and M.P.; investigation, N.S., K.C. and N.G.; methodology, K.C. and M.P.; project administration, S.S.K.; resources, M.G.B., D.B. and S.P.U.; software, N.G.; supervision, N.S.; visualization, N.S.; writing—original draft, N.S.; writing—review and editing, R.P.C., D.B., S.P.U. and S.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**

