Article

Induction of Convolutional Decision Trees with Success-History-Based Adaptive Differential Evolution for Semantic Segmentation †

by
Adriana-Laura López-Lobato
,
Héctor-Gabriel Acosta-Mesa
* and
Efrén Mezura-Montes
Artificial Intelligence Research Institute, Universidad Veracruzana, Campus Sur, Calle Paseo Lote II, Sección Segunda No. 112, Nuevo Xalapa, Veracruz 91097, Mexico
*
Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in New Trends in Computational Intelligence and Applications 2023.
Math. Comput. Appl. 2024, 29(4), 48; https://doi.org/10.3390/mca29040048
Submission received: 28 May 2024 / Revised: 22 June 2024 / Accepted: 24 June 2024 / Published: 27 June 2024
(This article belongs to the Special Issue New Trends in Computational Intelligence and Applications 2023)

Abstract:
Semantic segmentation is an essential process in computer vision that allows users to differentiate objects of interest from the background of an image by assigning labels to the image pixels. While Convolutional Neural Networks have been widely used to solve the image segmentation problem, simpler approaches have recently been explored, especially in fields where explainability is essential, such as medicine. A Convolutional Decision Tree (CDT) is a machine learning model for image segmentation. Its graphical structure and simplicity make it easy to interpret, as it clearly shows how pixels in an image are classified in an image segmentation task. This paper proposes new approaches for inducing a CDT to solve the image segmentation problem using Success-History-based Adaptive Differential Evolution (SHADE). This adaptive differential evolution algorithm uses a historical memory of successful parameters to guide the optimization process. Experiments were performed using the Weizmann Horse dataset and the Blood detection in dark-field microscopy images dataset to compare the proposals in this article with previous results obtained through the traditional differential evolution process.

1. Introduction

Semantic segmentation is a relevant process in numerous scientific fields for image analysis. It involves assigning labels to the pixels of an image to distinguish objects of interest from the image background; see Figure 1.
Several methods have been used to solve the image segmentation problem [1,2], but Convolutional Neural Networks (CNNs) are currently the most popular [3]. CNNs produce powerful results; however, they are often called “black boxes” because the process they follow to produce results can be difficult to understand and explain. Therefore, their use in critical contexts, such as the medical field, needs to be thoroughly studied [4,5].
Convolutional Decision Trees (CDTs) are an alternative to CNNs because they have the graphical structure of a decision tree model that is easy to interpret [6]. Several methods have been proposed to solve optimization problems in the CDT induction process. The original one, proposed in [6], maximizes the information gain function in each tree node through an analytical optimization process. This process is a local search that partitions the data (pixels) to obtain a single CDT.
Since the classical method for CDT induction is a greedy search, another method uses the differential evolution (DE) algorithm to induce a CDT with a global search to improve performance [7]. DE is one of the most popular metaheuristic search strategies for solving optimization problems; it incorporates stochastic elements and parameters that enhance its ability to explore the problem domain, although the parameters must be adapted to the specific problem [8]. The method in [7] performs a global search to induce a CDT using the DE algorithm to maximize the F1-score. In this proposal, each individual in the DE algorithm represents all the kernels of the CDT, so a set of CDTs is obtained, from which the one with the best F1-score is selected.
The latest method, called DE-CDT-BKS [9], performs a local search using the DE algorithm to identify the optimal convolution kernel size and the convolution kernel of each node of the CDT to split the data (pixels) into two sets, using the F1-score as the fitness function. In this method, the user provides a list of kernel sizes. After applying the learning process with these kernel sizes, the kernel with the best F1-score at each partition node is selected. This results in a single CDT with convolutional kernels of different sizes.
The methods proposed in [6,7,9] produce CDTs whose structure is more interpretable than a CNN's and whose image segmentation process is transparent. However, the first method requires a post-training graph-cut step to improve its results [6]. It produces better results than the other two but does not allow the objective function to be modified. In contrast, the other two methods allow the objective function to be changed, owing to the nature of the DE algorithm, although they yield lower scores [7,9]. Furthermore, the first two methods only consider kernels of a single size in the CDT structure [6,7].
To overcome this, the present work proposes the use of SHADE [10] instead of the traditional DE algorithm in the second and third methods to achieve superior segmentation results. As an adaptive DE algorithm, SHADE uses a historical memory of successful parameters to guide the optimization process. This article compares these techniques with previous results obtained using the traditional differential evolution process [7,9]. The comparison is based on the segmentation of two sets of images: the “Weizmann Horse dataset” and the “Blood detection in dark-field microscopy images”. The selection of these databases makes it possible to compare the results with those described in [6,7,9].
The remainder of this paper is structured into three sections. Section 2 describes the DE algorithm and highlights the most relevant characteristics of SHADE. This section also presents the details of the two CDT induction procedures with SHADE. Section 3 is dedicated to the experiments and the results obtained. Finally, Section 4 offers detailed conclusions and future work.

2. Materials and Methods

This section covers the two main subjects of the project: the differential evolution algorithm and SHADE. A description of Convolutional Decision Trees (CDTs) and the methodologies proposed for CDT induction using SHADE are also presented.

2.1. Differential Evolution Algorithm and SHADE

2.1.1. Differential Evolution Algorithm

The Differential Evolution (DE) algorithm is a metaheuristic search strategy for solving optimization problems [11,12,13]. It works with a population that represents potential solutions to the problem and evolves it by recombining solutions within it to generate a new population (offspring). The DE algorithm involves three operators: mutation, crossover, and selection. It also involves user-defined parameters that guide the search: the scaling factor F, the crossover rate CR, the population size NP, and the number of generations NG.
The standard DE process, known as DE/rand/1/bin, uses the following operators to generate a trial vector  u i for each target vector  x i in the population:
  • Mutation operator: To generate the trial vector u_i, the mutation operator computes a noise vector ν_i using Equation (1). The vectors x_{r0}, x_{r1}, and x_{r2} are randomly selected from the population and are different from each other and from x_i. F is a factor that scales the difference between x_{r1} and x_{r2}. This process is equivalent to stepping towards a new point in the search space.
    ν_i = x_{r0} + F (x_{r1} − x_{r2})  (1)
  • Crossover operator: The crossover operator merges the information of the noise vector ν_i with that of the target vector x_i to generate the trial vector u_i, component by component, using the function in Equation (2). This equation involves a randomly generated number rand_j in the range [0, 1] for each vector component. To ensure that the trial vector takes at least one component from the noise vector ν_i, a randomly selected position j, denoted J_rand, is used. The process is controlled by the crossover rate CR.
    u_{i,j} = ν_{i,j} if (rand_j ≤ CR) or (j = J_rand), for j = 1, …, |x_i|; u_{i,j} = x_{i,j} otherwise.  (2)
  • Selection operator: The selection operator adds the vector with the highest fitness value between x i and u i to the population for the next generation. This operator guarantees that the best solution is preserved throughout the iterations, thanks to the property of elitism.
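The three operators above can be condensed into a short loop. The following is a minimal, illustrative Python sketch of DE/rand/1/bin for a maximization problem; the function and parameter names are ours, not the authors' Matlab implementation:

```python
import random

def de_rand_1_bin(pop, fitness, F=0.5, CR=0.9, generations=100):
    """Minimal DE/rand/1/bin sketch for maximization (illustrative names)."""
    NP, dim = len(pop), len(pop[0])
    fit = [fitness(x) for x in pop]
    for _ in range(generations):
        for i in range(NP):
            # Mutation: three mutually distinct vectors, all different from x_i
            r0, r1, r2 = random.sample([j for j in range(NP) if j != i], 3)
            noise = [pop[r0][k] + F * (pop[r1][k] - pop[r2][k]) for k in range(dim)]
            # Binomial crossover: J_rand guarantees at least one noise component
            j_rand = random.randrange(dim)
            trial = [noise[k] if (random.random() <= CR or k == j_rand) else pop[i][k]
                     for k in range(dim)]
            # Selection: elitism keeps the fitter of target and trial
            f_trial = fitness(trial)
            if f_trial >= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = max(range(NP), key=lambda i: fit[i])
    return pop[best], fit[best]
```

For example, maximizing f(x) = −Σx² with a small population quickly drives the best individual toward the origin.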
As mentioned before, this process is the classic version of differential evolution, called DE/rand/1/bin [14], where “rand” means that the vectors in the mutation operator are chosen randomly, “1” means that only one difference vector is used to form the noise vector, and “bin” (binomial distribution) means that a uniform crossover is used when creating the trial vector. However, several variants of the DE algorithm consider different ways of generating the noise vector ν_i in the mutation operator [15]. Some of these variants use the individual with the best fitness, x_best, or use more than three individuals from the population by including more than one scaled difference. Some examples of these variants are:
  • DE/rand/2
    ν_i = x_{r0} + F (x_{r1} − x_{r2}) + F (x_{r3} − x_{r4}),  (3)
  • DE/best/1
    ν_i = x_{best} + F (x_{r0} − x_{r1}),  (4)
  • DE/best/2
    ν_i = x_{best} + F (x_{r0} − x_{r1}) + F (x_{r2} − x_{r3}).  (5)
It is known that the values of the parameters CR, F, NP, and NG are problem-dependent and impact the performance of the DE algorithm [15], so it is necessary to tune them to obtain good results in real-world problems. Several self-adaptive mechanisms to adjust these parameters have been studied, such as JADE [16], SHADE [10], and L-SHADE [17].
This research considers the Success-History-based Adaptive Differential Evolution (SHADE) algorithm, which uses a historical memory of successful parameters CR and F to readjust their values during the search process. The most relevant characteristics of SHADE are described below.

2.1.2. SHADE

SHADE is a technique that regulates the DE parameter values CR and F through an adaptive parameter control mechanism. It is well established that the values of these parameters depend on the specific real-world problem considered and directly affect the model’s performance [15]. Therefore, the SHADE technique allows the initial parameter values to change and adapt during the search process to achieve the desired results [10].
This approach uses the mutation strategy called DE/current-to-pbest/1, shown in Equation (6), where x_{pbest}^{(G)} is an individual randomly selected from the top NP · p individuals in the population of generation G, with p ∈ (0, 1]; x_{r0}^{(G)} and x_{r1}^{(G)} are vectors randomly selected from the population of generation G; and F_i^{(G)} is the F parameter used by individual x_i^{(G)}.
ν_i^{(G)} = x_i^{(G)} + F_i^{(G)} (x_{pbest}^{(G)} − x_i^{(G)}) + F_i^{(G)} (x_{r0}^{(G)} − x_{r1}^{(G)}).  (6)
The SHADE algorithm employs an archive A that is initially empty. Target vectors x_i^{(G)} that lose against their trial vectors in the selection operator are added to the archive during the evolution process. The vector x_{r1}^{(G)} is then randomly selected from the union of the individuals in the population and the vectors in A. When the size of A exceeds the maximum size, which is typically equal to NP, randomly selected elements are eliminated from A to maintain the size.
To calculate the values of the parameters CR_i^{(G)} and F_i^{(G)}, two memories of size H are used, denoted M_CR and M_F. They store the mean values of the successful parameters of previous generations, where successful parameters are those CR_i^{(G)} and F_i^{(G)} values that generate a trial vector that defeats its associated target vector. Initially, all values in the M_CR and M_F memories are set to 0.5. In each generation G, the parameter values CR_i^{(G)} and F_i^{(G)} are calculated for each individual x_i^{(G)} with Equations (7) and (8), where randn_i(μ, σ²) and randc_i(μ, σ²) are values randomly drawn from a normal distribution and a Cauchy distribution, respectively, both with mean μ and variance σ², and r_i is a randomly selected position in the memories M_CR and M_F.
CR_i^{(G)} = randn_i(M_{CR,r_i}, 0.1)  (7)
F_i^{(G)} = randc_i(M_{F,r_i}, 0.1)  (8)
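Equations (7) and (8) can be read as the following sampling routine. In the original SHADE formulation, CR values are truncated to [0, 1] and F values are regenerated when non-positive and capped at 1; the helper below is a sketch under those rules, with names of our own choosing:

```python
import math
import random

def sample_parameters(M_CR, M_F):
    """Draw (CR_i, F_i) from the historical memories, as in Eqs. (7)-(8)."""
    r = random.randrange(len(M_CR))  # random memory index r_i
    # CR_i ~ Normal(M_CR[r], 0.1), truncated to [0, 1]
    CR = min(1.0, max(0.0, random.gauss(M_CR[r], 0.1)))
    # F_i ~ Cauchy(M_F[r], scale 0.1), resampled while non-positive, capped at 1
    F = 0.0
    while F <= 0.0:
        F = M_F[r] + 0.1 * math.tan(math.pi * (random.random() - 0.5))
    return CR, min(1.0, F)
```

The Cauchy draw uses the inverse-CDF trick (location + scale · tan(π(U − 0.5))); its heavy tail occasionally produces large F values, which helps the search escape premature convergence.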
The successful parameters CR_i^{(G)} and F_i^{(G)} are stored in S_CR and S_F. The values at position k in the memories are updated with Equations (9) and (10).
M_{CR,k}^{(G+1)} = mean_WA(S_CR) if S_CR ≠ ∅; M_{CR,k}^{(G)} otherwise  (9)
M_{F,k}^{(G+1)} = mean_WL(S_F) if S_F ≠ ∅; M_{F,k}^{(G)} otherwise  (10)
G is the current generation of the process. If no successful parameters are identified during a generation, the values stored in the memories remain unchanged. k starts at 1 and increases by one each time a new element is updated in the memories; when k > H, k returns to 1. In these equations, mean_WA(·) is the weighted arithmetic mean, calculated with Equations (11) and (13), and mean_WL(·) is the weighted Lehmer mean, calculated with Equations (12) and (13). The fitness improvement achieved by a trial vector that defeats its target vector is denoted by Δf_j = f(u_j^{(G)}) − f(x_j^{(G)}).
mean_WA(S_CR) = Σ_{j=1}^{|S_CR|} w_j · S_{CR,j}  (11)
mean_WL(S_F) = ( Σ_{j=1}^{|S_F|} w_j · S_{F,j}² ) / ( Σ_{j=1}^{|S_F|} w_j · S_{F,j} )  (12)
w_j = Δf_j / ( Σ_{i=1}^{|S_CR|} Δf_i )  (13)
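A direct transcription of Equations (11)–(13) in illustrative Python (our names; the fitness improvements Δf are passed in explicitly):

```python
def weights(deltas):
    """w_j = Δf_j / Σ Δf_i, Eq. (13)."""
    total = sum(deltas)
    return [d / total for d in deltas]

def mean_WA(S_CR, deltas):
    """Weighted arithmetic mean of successful CR values, Eq. (11)."""
    return sum(w * s for w, s in zip(weights(deltas), S_CR))

def mean_WL(S_F, deltas):
    """Weighted Lehmer mean of successful F values, Eq. (12)."""
    w = weights(deltas)
    return (sum(wj * s * s for wj, s in zip(w, S_F))
            / sum(wj * s for wj, s in zip(w, S_F)))
```

The Lehmer mean is always at least as large as the arithmetic mean of the same values, so the F memory is biased toward larger, more exploratory step sizes.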
The value p_i used to adjust the mutation strategy (current-to-pbest/1) is randomly selected for each individual x_i^{(G)} in the population; see Equation (14), where p_min = 2/NP. Thus, x_{pbest}^{(G)} is selected from between at least 2 individuals and up to 20% of the population.
p_i = rand[p_min, 0.2].  (14)
The pseudocode of the SHADE procedure is shown in Algorithm A1 in Appendix A.

2.2. Convolutional Decision Trees (CDTs)

An image is an array of discrete pixels that use different levels of red, green, and blue (RGB) to create the colors and shapes in the image. This work deals with images in grayscale, which are 2D arrays in which each pixel has an integer value between 0 and 255 (the range of values that an 8-bit number can represent).
In computer vision and image processing, convolution kernels are essential for extracting specific features from the image [18]. A convolution kernel is a squared array of discrete numbers, called weights, used to calculate the dot product at each position when a convolution operation is performed. The result of these products, along with the bias associated with the kernel, is the new value for each pixel in the output feature map. Figure 2 illustrates an example of the convolution operation on a 6 × 6 image with a 3 × 3 kernel. Full convolution is achieved by repeating the process until the convolution kernel has passed through all pixels of the input image. Therefore, the result of the convolution is a filtered version of the image.
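The operation in Figure 2 amounts to sliding a dot product over the image. A small “valid”-mode sketch follows (our names; strictly speaking this computes cross-correlation, which is what most vision libraries call convolution):

```python
def convolve(image, kernel, bias=0.0):
    """'Valid' convolution of a 2D grayscale image with an s×s kernel.

    Each output pixel is the dot product of the kernel with the s×s
    neighborhood, plus the kernel's bias; the output shrinks by s − 1."""
    s = len(kernel)
    h = s // 2
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(h, rows - h):
        row = []
        for c in range(h, cols - h):
            acc = bias
            for i in range(s):
                for j in range(s):
                    acc += image[r - h + i][c - h + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out
```

With the identity kernel (all zeros except a central 1), the output simply reproduces the interior of the input image.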
Since convolution kernels extract specific image features, Convolutional Decision Trees (CDTs) are defined in [6] as algorithms for adaptive feature learning and segmentation, developing the idea of oblique (multivariate) trees [19,20]. Oblique trees successively partition the data into subsets according to a predefined criterion called a predicate ϕ, defined in Equation (15), where x^T·β represents a linear combination of the attributes, with x ∈ R^d, and the parameter β ∈ R^d determines the partition. Thus, ϕ(x) = 1 for points x in one half-space, while ϕ(x) = 0 for x in the other half-space. The predicate is obtained by maximizing a measure of informativeness, such as the information gain or Gini’s diversity index.
ϕ(x) = 1 if x^T·β > 0; ϕ(x) = 0 if x^T·β ≤ 0.  (15)
The CDT is distinguished by its suitability for structured data, such as the spatial structure of image patches in image segmentation. In this context, β represents a convolution kernel and x denotes the information of a pixel and its neighboring pixels. Consequently, each split of the CDT is represented by a convolutional kernel, which is learned in a supervised manner by maximizing the information gain of the split; see Figure 3.
The classical method for the CDT induction is a greedy search [6]. To improve performance, the methods proposed in [7,9] use the differential evolution (DE) algorithm to induce a CDT. However, these methods have parameters that must be defined by the user and directly affect the model’s performance. Therefore, instead of the traditional DE algorithm, the use of SHADE is proposed in this work.
The following section describes the methodologies proposed for the CDT induction using SHADE.

2.2.1. CDT Induction with SHADE

This paper proposes two search strategies: a global strategy and a local strategy. In the global strategy, SHADE-CDT, the DE algorithm uses the SHADE mechanism to find a complete CDT of given depth d with kernels of fixed size s. In the local strategy, SHADE-CDT-BKS (BKS = Best Kernel Size), the SHADE mechanism is used to find the size of each kernel and the kernels of an optimized CDT of a given depth d.
In both strategies, the images used for the CDT induction are processed, and each pixel (instance) is encoded as the vector with the values of the pixels in the neighborhood of size s × s surrounding it, adding the value 1 for the bias. Figure 4 shows an example of an instance encoding associated with a pixel, considering a neighborhood of size 3 × 3 .
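Under this encoding, each pixel becomes a flat vector of its s × s neighborhood plus a trailing 1 paired with the kernel's bias weight. A sketch (our names; border handling is omitted for brevity):

```python
def encode_pixel(image, r, c, s):
    """Instance for pixel (r, c): the s×s neighborhood flattened, then a 1."""
    h = s // 2
    vec = [image[r + i][c + j] for i in range(-h, h + 1) for j in range(-h, h + 1)]
    vec.append(1)  # constant input multiplied by the kernel's bias weight
    return vec
```

For a 3 × 3 neighborhood this yields a vector of 9 pixel values followed by the constant 1, matching the s² + 1 weights of each kernel.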
Algorithm 1 presents the pseudocode of the buildMatrix function implemented to obtain a matrix M with the vector representation associated with the pixels of the images and the vector RLV with the real labels of these pixels.
Algorithm 1: Function buildMatrix
To manipulate the pixel information in the CDT, a real-valued vector of size s² + 1 is used to represent a convolutional kernel associated with an internal CDT node. These values represent the weights of the convolutional kernel and a value for the bias. Figure 5 shows an example of the encoding for a convolutional kernel of size 3.
A perceptron-like structure is used to determine which branch of the tree to take when classifying an instance, where the dot product of the instance and the weights of the corresponding kernel pass through an activation function that returns a label 0 or 1; see Figure 6. This label indicates the node where the instance goes next, the kernel on the left branch for label 0, or the kernel on the right branch for label 1.
The strategies implemented in this paper use this process to assign a label to each instance with a CDT, passing it through the corresponding kernels until it reaches a leaf node, where the label assigned to the instance is obtained with the corresponding kernel. The pseudocode for this process is presented in Algorithm 2 with the TreePerform function, where the aptitude of a CDT is calculated. Moreover, in both strategies, the F1-score metric is used to determine the fitness value of each individual in the process. The F1-score compares the labels assigned by a model with the actual labels of the instances, yielding a fitness value between 0 and 1. This project aims to maximize this fitness value, either globally or locally, using the DE algorithm with SHADE.
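The routing just described can be sketched as follows, with the kernels heap-indexed from 1 (left child 2n, right child 2n + 1) and the F1-score computed from predicted and real labels. Both functions are illustrative, not the authors' TreePerform implementation:

```python
def classify_pixel(x, kernels, depth):
    """Pass instance x through the CDT; the last node's output is the label."""
    node, label = 1, 0
    for _ in range(depth):
        z = sum(a * w for a, w in zip(x, kernels[node]))  # dot product (bias included)
        label = 1 if z > 0 else 0                         # step activation
        node = 2 * node + label                           # 0 -> left, 1 -> right
    return label

def f1_score(pred, real):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for p, t in zip(pred, real) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, real) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, real) if p == 0 and t == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

Because x ends with the constant 1, the bias weight is folded into the dot product, exactly as in the perceptron-like test of Figure 6.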

Global Strategy (SHADE-CDT)

The SHADE-CDT method is a global search strategy based on the methodology proposed in [7]. In this method, the SHADE algorithm is used to induce a CDT of a given depth d with kernels of size s. The population size (NP), the number of generations (NG), and the memory size (H) for the SHADE algorithm are user-defined parameters. The individuals in the population are vectors representing all the convolutional kernels in the CDT, so the encoding of the potential solutions depends on two factors: the depth d of the tree and the size s of the convolutional kernels. Since each kernel needs s² + 1 weights and the CDT contains 2^d − 1 kernels, the individuals of the population are vectors of size (s² + 1)(2^d − 1) with random values chosen from −255 to 255. This range was chosen because each pixel of a grayscale image has an integer value between 0 and 255, and the same range was allowed on the negative side so that the convolution kernels can take negative values if needed. Figure 7 shows an example of the encoding for a CDT of depth 3 and kernel size 3, and Algorithm 3 shows the pseudocode of this process with the GlobalCDT function.
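The flat encoding and its interpretation as per-node kernels might look like this (our sketch; the heap indexing of nodes is an assumption made for illustration):

```python
import random

def random_individual(s, d, low=-255.0, high=255.0):
    """Candidate solution: (s² + 1) weights for each of the 2^d − 1 kernels."""
    n = (s * s + 1) * (2 ** d - 1)
    return [random.uniform(low, high) for _ in range(n)]

def decode(individual, s, d):
    """Slice the flat vector into per-node kernels, heap-indexed from 1."""
    k = s * s + 1
    return {node + 1: individual[node * k:(node + 1) * k]
            for node in range(2 ** d - 1)}
```

For s = 3 and d = 3, each individual has (9 + 1) × 7 = 70 components, decoded into 7 kernels of 10 weights each.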
Algorithm 2: Function TreePerform
Algorithm 3: Function GlobalCDT
  Input: Set of grayscale images (I) and their corresponding ground-truth images (G), kernel size (s), depth of the CDT (d), population size (NP), number of generations (NG), and memory size (H)
  Output: Kernels of a CDT induced with the SHADE-CDT strategy and its aptitude
  1  [M, PRL] ← buildMatrix(s, I, G)
  2  Initialize population P = {x_1, x_2, …, x_NP} of vectors of size (s² + 1)(2^d − 1) with random values chosen from −255 to 255
  3  x_best ← SHADE(P_0 = P, N = NP, H, f = TreePerform(M, PRL, x_i, s, d))
  4  aptitude(x_best) ← TreePerform(M, PRL, x_best, s, d)
  5  Return: x_best, aptitude(x_best)

Local Strategy (SHADE-CDT-BKS)

The SHADE-CDT-BKS method is a local search strategy based on the proposal in [9]. The main difference from the SHADE-CDT approach is that the SHADE algorithm is used to find both the kernel sizes and the kernels of a CDT of a given depth d. This is achieved with a systematic approach that ensures that each kernel partitions the instances in the most effective way. The population size (NP), the number of generations (NG), and the memory size (H) for the SHADE algorithm are user-defined parameters.
In this proposal, the user provides a list S of possible kernel sizes, with odd values s, and the depth d of the tree. The SHADE algorithm is applied for each value in S, with the individuals of the population being real-valued vectors of size s² + 1 with random values chosen from −255 to 255; see Figure 5. For each value s, the individual with the best fitness is found, and among these, the one with the best F1-score is selected as the best solution. In this way, the CDT is induced by finding each kernel until the depth specified by the user is reached.
All instances are considered in the SHADE process for the root-node kernel; for the other kernels, only the instances tagged with the class corresponding to the branch in which the kernel is located are considered. With the SHADE-CDT-BKS method, a new kernel is found only if more than 20 instances are available for the SHADE process of that kernel and the instances belong to different classes, with the minority class accounting for at least 5% of them; otherwise, the method does not find a new kernel. The pseudocode for this process is described in the buildLocalCDT function of Algorithm 4. In this way, the SHADE-CDT-BKS method produces an optimized CDT with different kernel sizes without using a pruning process; see Figure 8.
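The stopping rule for growing a node can be written as a small test. The following is an illustrative sketch of the conditions stated above (function and parameter names are ours):

```python
def can_split(labels, min_instances=20, min_class_fraction=0.05):
    """Grow a new kernel only if more than `min_instances` instances reach
    the node and the minority class holds at least 5% of them."""
    n = len(labels)
    if n <= min_instances:
        return False
    minority = min(sum(labels), n - sum(labels))
    return minority / n >= min_class_fraction
```

Nodes failing either condition become leaves, which is why the resulting CDT needs no separate pruning step.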
Algorithm 4: Function b u i l d L o c a l C D T

3. Experiments and Results

This section presents the results obtained by inducing a CDT using the SHADE-CDT and SHADE-CDT-BKS methods on images from the Weizmann Horse Dataset [21] and Blood detection in dark-field microscopy images obtained from Kaggle (https://github.com/PerceptiLabs/bacteria/tree/main?tab=readme-ov-file accessed on 28 May 2024). The selection of these databases made it possible to compare the results with those described in [6,7,9]. Three subsections are presented for each dataset: SHADE-CDT experiments, SHADE-CDT-BKS experiments, and a comparison between methods. The first two subsections show the experiments and describe the results. The last subsection compares the two approaches based on the differences in the segmentation task per image. A review of the explainability of a CDT is also included at the end of this section.
The proposed strategies were implemented in Matlab R2023b software. Table 1 provides the specifications of the computer that was used to perform the experiments.

3.1. Weizmann Horse Dataset

The Weizmann Horse Dataset [21] consists of 327 manually segmented images of horses of different colors in a broad spectrum of landscapes. To perform experiments with the methods proposed in this work, the images were resized to 40% of their original size to reduce the number of pixel-associated instances in the CDT induction process. Controlled experiments were performed to calibrate the parameter values of the SHADE algorithm (population size NP, number of generations NG, and memory size H), using fixed training and test sets of 33 and 295 images, respectively. These proportions were chosen based on the best result of the study in [7]; see Table 2 for details. For NP, NG, and H, combinations of the following values were considered: 50, 100, and 200 for NP and NG, and 15, 30, 50, and 100 for H. After performing several experiments with these settings, the highest F1-scores were obtained with NP = 100, NG = 200, and H = 100 for the SHADE-CDT method, and with NP = 50, NG = 200, and H = 100 for the SHADE-CDT-BKS method, so these values were maintained in the following experiments.

3.1.1. SHADE-CDT Experiments

To induce a CDT with the SHADE-CDT method, the following parameter values for the structure of the CDT were considered: depths from one to five and kernel sizes three, five, seven, and nine. Table 3 shows the results of 20 experiments performed with this method under the aforementioned conditions. The best results for kernel sizes three, five, and seven were obtained with a CDT of depth four in experiments 4, 9, and 14, respectively. For kernel size nine, the best result was obtained in experiment 17 with a CDT of depth two.
Based on the results of experiment 17, where the best overall result was obtained with a kernel of the maximum size considered, nine, and the best results by depth were obtained with kernels of size nine, it is clear that the larger kernel size leads to a higher F1-score for the Weizmann Horse Dataset. Figure 9 shows the CDT induced in this experiment, along with the best and worst individual results obtained for the test dataset, the original images, the ground truth, and the predictions generated using the SHADE-CDT method. The F1-scores and accuracies obtained for these images are also displayed.
The analysis of individual images in the test set underscored the significant influence of color and background structure on the model’s performance. The wide range of colors of the horses strongly influenced the negative results of the model. In addition, they were set against a variety of background structures—fences, trees, mountains, and plants. When the images were converted to grayscale, the tones of these structures interfered with the tones of the pixels corresponding to the horses, resulting in classification errors.
When applied to images of white horses, the proposed method produced a reverse labeling result, highlighting the pixels corresponding to the background structures and leaving out the horse’s shape, as illustrated in Figure 9.
Table 3 shows that the SHADE-CDT method gives more varied and better results than those in [7]. None of the experiments matched the F1-score of 80.4% obtained in [6], but that method for CDT induction required 12 h of training, whereas the maximum time in the experiments presented in Table 3 was 2.05 h. These results show that explainable trees of shallow depth were effective, since inducing deeper trees did not necessarily result in a better F1-score. As shown in Table 3, the best results were obtained with a depth not greater than four.

3.1.2. SHADE-CDT-BKS Experiments

The following parameter values were considered to induce a CDT with the SHADE-CDT-BKS method: depths from one to five, and two sets of kernel sizes, S = { 3 , 5 , 7 } and S = { 3 , 5 , 7 , 9 } .
Table 4 shows the results of 10 experiments performed with this method under the conditions described above. The best result for the set of kernel sizes { 3 , 5 , 7 } was obtained in experiment 4 with a CDT of depth four. For the other set of kernel sizes, the best result was obtained in experiment 9 with a CDT of depth four.
The best overall result was obtained using the CDT of depth four induced in experiment 9, shown in Figure A1 of Appendix B. Figure 10 shows the best and worst individual results obtained for the test dataset, the original images, the ground truth, the generated predictions, and the F1-score and accuracy obtained for each image. Again, the negative results of the model were influenced by the color of the horses, as the white horses had lower F1-scores, and the dark horses had higher scores.
Table 3 and Table 4 show that the SHADE-CDT-BKS method produced better results than the SHADE-CDT method and consequently, better results than those in [7].

3.1.3. Comparison between Methods

In Figure 11, images 125, 131, 41, and 233 are shown to compare the change in the segmentation task between experiment 1 with the SHADE-CDT method and the results with the CDT induced in experiments 4 and 9 by the SHADE-CDT-BKS method. Experiment 4 with the SHADE-CDT-BKS method was chosen because it gave the best F1-score for the kernel sizes set S = { 3 , 5 , 7 } .
There are some important notes to make:
  • The SHADE-CDT-BKS method outperformed the SHADE-CDT method, achieving the highest F1-score of 0.53651, although it required more computational time because the DE algorithm is run |S| = 4 times at each node to induce the CDT; see the corresponding Table 3 and Table 4, and the (F1-score, accuracy) pairs shown in blue in Figure 11.
  • Analyzing images 41 and 233, it is clear that more pixels associated with the horse were classified as class 1, as they should be; see the rows corresponding to these images in Figure 11. The F1-scores of these images also supported this. For example, the F1-score of image 41 increased significantly from 0.13621 in experiment 17 using the SHADE-CDT method to 0.48607 using the SHADE-CDT-BKS method. It is important to note that this increase indicates a shift in the distribution of the pixels in the CDT, resulting in a more significant number of pixels with clear tones being assigned to class 1.
  • On the other hand, in the predictions for images 125 and 131, the profile of the horses was extended by adding to class 1 the surrounding pixels that were previously classified as pixels of label 0. This confirms a shift in the distribution of the pixels. See the corresponding rows for these images in Figure 11.
  • Figure 11 shows that the increase in the overall F1-score between these experiments was because the images with white horses received higher F1-scores; however, the images with dark horses decreased their F1-score by adding more class 0 pixels to the pixels corresponding to a horse (class 1).
  • Thus, even though the F1-score increased between these experiments, Figure 11 suggests that, on images as diverse as those in this dataset, the model remains prone to classification errors, since it is difficult for the learning process to find divisions that suit the whole dataset.

3.2. Blood Detection in Dark-Field Microscopy Images

The Blood detection in dark-field microscopy images dataset contains 366 dark-field microscopy images for observing and segmenting erythrocytes in blood tissue. Each image was resized to a uniform size of 200 × 200 pixels to streamline the CDT induction process by reducing the number of instances associated with the pixels. It should be noted that certain images were excluded from the training and test sets because their ground truth contained only labels for class 0, indicating the absence of erythrocytes to identify. See Appendix C for more details.
The experiments used predetermined training and test sets with a split of 70% and 30%, respectively. After removing the 21 images, the training set consisted of 241 images and the test set of 104 images. The experiments with this dataset were intentionally limited to CDTs of depth one and two, a decision based on the amount of information in the training set, the computational time required, and the results obtained.
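This preprocessing (resize to 200 × 200, drop images whose ground truth contains only class 0, then split 70/30) can be sketched as follows. This is a minimal illustration assuming in-memory arrays and a dependency-free nearest-neighbour resize; the authors' exact resizing method and split procedure are not specified, and the function name is hypothetical.

```python
import numpy as np

def prepare_dataset(images, masks, size=(200, 200), train_frac=0.7, seed=0):
    """Resize image/mask pairs, drop all-background samples, and split.

    `images`/`masks` are lists of 2-D arrays. Resizing uses simple
    nearest-neighbour index sampling to avoid external dependencies.
    """
    def resize(a, size):
        rows = np.arange(size[0]) * a.shape[0] // size[0]
        cols = np.arange(size[1]) * a.shape[1] // size[1]
        return a[np.ix_(rows, cols)]

    # Exclude images whose ground truth labels every pixel as class 0.
    kept = [(resize(im, size), resize(m, size))
            for im, m in zip(images, masks) if m.any()]

    rng = np.random.default_rng(seed)
    order = rng.permutation(len(kept))
    n_train = int(round(train_frac * len(kept)))
    train = [kept[i] for i in order[:n_train]]
    test = [kept[i] for i in order[n_train:]]
    return train, test
```

With 366 images and 21 exclusions, a 70/30 split of the remaining 345 images yields the 241/104 partition reported above.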

3.2.1. SHADE-CDT Experiments

For the SHADE-CDT method, the values N P = 100 and N G = 200 were considered based on the results in [9]. Also, H = 100 was considered for the size of the memories in the SHADE algorithm since this value seemed to be the best option in previous experiments. Table 5 shows the results obtained with these parameter values, considering kernel sizes of three, five, and seven, and CDTs of depth one and two.
The experiments showed that the induction of a CDT with the SHADE-CDT method gave better results with shallow convolutional trees. The results of experiments 1, 3, and 5 in Table 5 with trees of depth one outperformed those with depth two at each kernel size.
Experiment 1 achieved the best result for this dataset by inducing a CDT of depth one with a kernel of size three. Figure 12 shows the CDT induced in this experiment, along with the best and worst individual results obtained for the test dataset, the original images, the ground truth, and the predictions generated using the SHADE-CDT method. The F1-scores and accuracies obtained for these images are also presented. This experiment yielded F1-scores higher than 0.9 for some images. However, because several low-intensity structures were labeled as class 1 in the ground truth, image 331 received an F1-score of less than 0.03, significantly reducing the average F1-score. The image with the second-lowest score, image 85, had an F1-score of 0.35623. While this score may not be exceptionally high, the corresponding ground-truth image and the predicted labels demonstrated the precision of the CDT in identifying the brighter parts of the structure in the original image.
With the results for experiments 1, 3, and 5 shown in Table 5 and those shown in [9], we can conclude that this dataset is better segmented by shallow CDTs of depth one, since the best result for each kernel size was obtained with CDTs of this depth.

3.2.2. SHADE-CDT-BKS Experiments

After analyzing the results and the computational time used for the experiments with the SHADE-CDT method, it was decided to reduce the population size N P to 50 for the SHADE-CDT-BKS method, keeping the number of generations N G = 200 and the size of the memories H = 100 . Table 6 shows the results obtained under these conditions. As expected, the computational time increased with this method since the DE algorithm was applied | S | = 3 times to find each CDT node.
As with the SHADE-CDT method, the best result was obtained with a CDT of depth one and kernel size three, i.e., in experiment 1; see Figure 13. In that experiment, as with the SHADE-CDT method, image 331 had the worst F1-score. Image 303 was the second-worst segmented image, with an F1-score just above 0.3. In this case, the original image had several structures with brighter parts that were not labeled as class 1 in the ground truth, so the CDT made classification errors.

3.2.3. Comparison between Methods

In Figure 14, images 127, 74, 85, and 331 are shown to compare the segmentation obtained in experiment 1 with the SHADE-CDT method against the CDTs induced in experiments 1 and 2 by the SHADE-CDT-BKS method. Experiment 2 of the SHADE-CDT-BKS method was chosen because it gave the third-best F1-score for the dataset.
The following observations can be made about the results of this dataset:
  • The SHADE-CDT method outperformed the SHADE-CDT-BKS method, achieving the highest F1-score of 0.74946 with a CDT of depth one and a kernel size of three. Thus, a standard convolution process with a single kernel size was the best way to segment the images in this dataset. It is also important to note that in the SHADE-CDT-BKS experiments, the population size was reduced to limit the computation time, which may partly explain the slightly worse result.
  • The predictions for images 85 and 331 showed that the CDTs induced by either of the methods proposed in this work failed to obtain a high F1-score on images where low-toned structures were labeled as class 1 in the corresponding ground truth. For example, image 331 consistently exhibited an F1-score below 0.1, because certain structures that are difficult to see in the original image, with tones similar to the background, were labeled as class 1. Something similar happened in image 85, where the brighter pixels were labeled as class 1, but some pixels with background-like tones were also labeled as class 1.
  • The predictions for images 127 and 74 showed that the CDT induced with these methods gave good results when the original images contained bright structures labeled as class 1 in their corresponding ground truth.
  • Figure 14 shows that the decrease in the overall F1-score between these experiments was because the images with brighter structures received higher F1-scores when these structures were labeled as class 1 in the ground truth, but the images with structures labeled as class 1 with tones similar to the tones of the background pixels had lower F1-scores.
  • For these experiments, it is clear that the depth of the induced CDT affected the F1-score, probably leading to overfitting on the training dataset. However, it is important to note a difference between experiment 2 with the SHADE-CDT method and experiment 2 with the SHADE-CDT-BKS method; see Table 5 and Table 6. In the former, a CDT of depth two was induced with kernels of size three, yielding an F1-score of 0.60856, whereas the CDT induced by the SHADE-CDT-BKS method, also of depth two but with two kernels of size three and one kernel of size seven, reached an F1-score of 0.66765; see Figure 15. The same depth was used in both experiments, but the SHADE-CDT-BKS method obtained better results because kernels of different sizes were used. This observation raises the question of whether deeper trees induced with this method would allow a better partitioning of the dataset. Such experiments were not initially considered due to their high computational cost, but they could be made feasible by reducing the set of kernel sizes, i.e., by using S = { 3 , 7 } .
  • Thus, the comparison in Figure 14 suggests that the model's behavior on this dataset was driven by images in which structures with background-like tones were labeled as class 1 or, conversely, lighter-toned pixels were labeled as class 0.

3.3. Review of Explainability in a CDT

The structure of a CDT allows the user to analyze how each internal kernel classifies the image pixels: the CDT structure can be followed through the convolutional operations applied to an image, inspecting the results at each branch and node. The kernels obtained by the proposed methods are expected to classify image pixels by patterns and shapes, as shown in Figure 16a.
For example, in the Weizmann Horse dataset, the segmentation performance of each kernel can be analyzed on a CDT of depth two in image 125; see Figure 16b. The first kernel classified the pixels well, while the lower kernels focused exclusively on the horse’s profile. Similarly, in the case of Blood detection in dark-field microscopy images, a CDT of depth one, i.e., one kernel of size three, performed an exemplary classification of the pixels in image 74; see Figure 16c. This analysis can be used to identify the reasons why the model made mistakes in images with low F1-scores.
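The traversal just described can be sketched as follows. The dict-based tree layout and the sign-based routing rule are illustrative assumptions, not the authors' exact data structure; each internal node carries a kernel and two children, each leaf a class label, so the user can inspect the convolution response at every node along a pixel's path.

```python
import numpy as np

def classify_pixel(patch, node):
    """Route one pixel's neighbourhood down a CDT and return its class.

    `node` is a hypothetical dict-based tree: internal nodes hold a
    'kernel' (same shape as `patch`) and 'left'/'right' children;
    leaves hold only a 'label'. Routing on the sign of the convolution
    response is an assumption for illustration.
    """
    while 'label' not in node:
        response = float(np.sum(patch * node['kernel']))
        # Each response can be logged here to explain the decision path.
        node = node['left'] if response <= 0 else node['right']
    return node['label']
```

Collecting the responses along the path gives exactly the per-node analysis shown in Figure 16, which is what makes the model explainable.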

4. Conclusions and Future Work

This paper presented various experiments analyzing the performance of the globally induced CDT and locally induced CDT using the SHADE-CDT and SHADE-CDT-BKS methods, respectively. These methods are new variants of the proposals made in [7,9], but they use the SHADE algorithm to guide the differential evolution process.
It is crucial to consider the time required to compute the fitness function during the DE process when analyzing the performance of the methods proposed in this paper. In the global approach, the SHADE-CDT method, each pixel in the training set must pass through the corresponding kernels of a CDT. Since each individual in the population represents a CDT during the DE process, the computation time for each CDT's F1-score grows with the number of training instances. In the local approach, the SHADE-CDT-BKS method, the training set is partitioned kernel by kernel, so the F1-score of each individual is computed with a classical convolution process. However, this approach is costlier because, for each kernel, every size in the set S is analyzed to select the optimal one, i.e., the size with the best F1-score. Thus, the DE process must be performed | S | times for each kernel, which increases the computation time.
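The per-node size selection just described can be sketched as follows, where `run_de` stands in for one full SHADE run returning a kernel; all names are hypothetical and the linear thresholding used for prediction is an illustrative assumption.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """Pixel-wise F1 for binary labels in {0, 1}."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_kernel_for_node(X_by_size, y, sizes=(3, 5, 7), run_de=None):
    """Run one DE search per candidate kernel size and keep the best.

    `X_by_size[s]` holds the flattened s*s neighbourhoods of the node's
    training pixels; `run_de` is a placeholder for a SHADE run.
    """
    best = (None, None, -1.0)          # (size, kernel, F1)
    for s in sizes:                    # the DE process runs |S| times
        kernel = run_de(X_by_size[s], y, s)
        pred = (X_by_size[s] @ kernel > 0).astype(int)
        f1 = f1_score(y, pred)
        if f1 > best[2]:
            best = (s, kernel, f1)
    return best
```

The |S| inner DE runs are the source of the extra computation time reported for SHADE-CDT-BKS in Tables 4 and 6.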
The best results in this paper were obtained with the local search strategy, although its computational time was high compared with the global one. The SHADE algorithm outperformed the results obtained in [7] with the Weizmann Horse dataset. However, the F1-score of 0.804 reported in [6] was better than the results in this work. It is important to mention, though, that in [6] the training and test proportions were two-thirds and one-third, respectively, and the reported CDT had a depth of 18 with kernels of size 31, so the models presented here are far more explainable.
Regarding the Blood detection in dark-field microscopy images, F1-scores greater than 0.92 were reported in [9]. However, in those experiments, the training and test sets, with 8 and 4 images, respectively, were generated from a pre-selection of 12 images with similar characteristics. This pre-selection allowed the authors to analyze the behavior of the model, whereas in the experiments performed here, only the images whose ground truths labeled every pixel as class 0 were excluded. Thus, the F1-scores obtained were lower than those in [9], but with more interesting features to analyze.
Thanks to the experiments carried out in this work, the following conclusions can be highlighted:
  • In conjunction with the SHADE algorithm, the DE process can induce explainable CDTs, and both methods, SHADE-CDT and SHADE-CDT-BKS, can be trained with small datasets.
  • It is important to note that the model has certain limitations. The two datasets analyzed revealed that the model struggled with textures and specific background components. This is particularly relevant since the project was limited to grayscale images. The use of such images can potentially lead to confusion regarding the tones of the different structures within them, thereby increasing the risk of classification errors.
  • After the multiple experiments shown in this paper, both methods, SHADE-CDT and SHADE-CDT-BKS, appear to scale approximately linearly in computation time with the tree depth.
  • Explainable CDTs with shallow depth are effective since inducing deeper trees does not necessarily result in a better F1-score.
  • This work highlights the importance of generating explainable models where the segmentation process is clear to the user.
For future work, techniques to reduce the evaluation time of individuals will be implemented. Furthermore, the performance of the proposed models will be compared with other image segmentation approaches, including Convolutional Neural Networks. Finally, the possibility of using alternative fitness functions will be explored.

Author Contributions

Conceptualization, A.-L.L.-L., H.-G.A.-M., and E.M.-M.; methodology, A.-L.L.-L., H.-G.A.-M., and E.M.-M.; software, A.-L.L.-L.; validation, H.-G.A.-M. and E.M.-M.; formal analysis, A.-L.L.-L.; investigation, A.-L.L.-L.; resources, A.-L.L.-L., H.-G.A.-M., and E.M.-M.; data curation, A.-L.L.-L.; writing—original draft preparation, A.-L.L.-L.; writing—review and editing, A.-L.L.-L., H.-G.A.-M., and E.M.-M.; visualization, A.-L.L.-L.; supervision, H.-G.A.-M. and E.M.-M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original data presented in the study are openly available. The Weizmann Horse dataset on Kaggle at https://www.kaggle.com/datasets/ztaihong/weizmann-horse-database/data and the Blood detection in dark-field microscopy images on GitHub at https://github.com/PerceptiLabs/bacteria/tree/main?tab=readme-ov-file (accessed on 28 May 2024).

Acknowledgments

The first author would like to thank the Consejo Nacional de Humanidades, Ciencia y Tecnología (CONAHCyT), an institution of the Government of Mexico, for the financial support provided through the “beca de estancia posdoctoral” with CVU 712182 as part of the program Estancias Posdoctorales por México.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

    The following abbreviations are used in this manuscript:
CDT: Convolutional Decision Tree
DE: Differential Evolution
BKS: Best Kernel Size
SHADE: Success-History-based Adaptive Differential Evolution
CNN: Convolutional Neural Network

Appendix A. SHADE Algorithm

The pseudocode shown in Algorithm A1 corresponds to the SHADE algorithm proposed in [10], where the differential evolution process is performed using a historical memory of successful parameters for the crossing rate ( C R ) and scale factor (F) in the mutation and crossover operators, respectively.
Algorithm A1: SHADE algorithm
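Since Algorithm A1 is reproduced as an image in the published version, the following Python sketch outlines the core SHADE loop of [10]. It is a simplification, not the authors' implementation: it maximizes the fitness, omits the external archive, and handles bounds by clipping; all names are illustrative.

```python
import numpy as np

def shade(fitness, dim, bounds, NP=50, NG=200, H=100, seed=0):
    """Minimal SHADE sketch after Tanabe and Fukunaga (2013)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, (NP, dim))
    fit = np.array([fitness(x) for x in pop])
    M_CR, M_F = np.full(H, 0.5), np.full(H, 0.5)   # historical memories
    k = 0                                           # memory update index
    for _ in range(NG):
        S_CR, S_F, S_w = [], [], []                 # successful parameters
        for i in range(NP):
            r = rng.integers(H)
            CR = float(np.clip(rng.normal(M_CR[r], 0.1), 0.0, 1.0))
            F = 0.0
            while F <= 0.0:                         # Cauchy sample, redrawn if <= 0
                F = M_F[r] + 0.1 * np.tan(np.pi * (rng.random() - 0.5))
            F = min(F, 1.0)
            # current-to-pbest/1 mutation
            p = max(2, int(rng.uniform(2.0 / NP, 0.2) * NP))
            pbest = pop[rng.choice(np.argsort(-fit)[:p])]
            a, b = rng.choice(np.delete(np.arange(NP), i), 2, replace=False)
            v = np.clip(pop[i] + F * (pbest - pop[i]) + F * (pop[a] - pop[b]), lo, hi)
            # binomial crossover
            mask = rng.random(dim) < CR
            mask[rng.integers(dim)] = True
            u = np.where(mask, v, pop[i])
            fu = fitness(u)
            if fu >= fit[i]:                        # greedy selection (maximization)
                if fu > fit[i]:
                    S_CR.append(CR); S_F.append(F); S_w.append(fu - fit[i])
                pop[i], fit[i] = u, fu
        if S_CR:                                    # weighted memory update
            w = np.asarray(S_w) / np.sum(S_w)
            M_CR[k] = np.sum(w * np.asarray(S_CR))  # weighted arithmetic mean
            sf = np.asarray(S_F)
            M_F[k] = np.sum(w * sf ** 2) / np.sum(w * sf)  # weighted Lehmer mean
            k = (k + 1) % H
    best = int(np.argmax(fit))
    return pop[best], fit[best]
```

In the proposed methods, `fitness` would decode an individual into CDT kernels and return the training F1-score; here the sketch works with any real-valued objective.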

Appendix B. CDT for the Weizmann Horse Dataset

In Figure A1, the CDT induced for the Weizmann Horse dataset with the SHADE-CDT-BKS method in experiment 9 is shown. This CDT consists of 15 convolution kernels with the following sizes: kernel 1: 7, kernel 2: 3, kernel 3: 5, kernel 4: 9, kernel 5: 5, kernel 6: 9, kernel 7: 3, kernel 8: 9, kernel 9: 7, kernel 10: 9, kernel 11: 3, kernel 12: 7, kernel 13: 7, kernel 14: 7, and kernel 15: 3.
Figure A1. CDT induced with the SHADE-CDT-BKS method in experiment 9 for the Weizmann Horse dataset.

Appendix C. Analysis of Blood Detection in Dark-Field Microscopy Images

The images displayed in this section were not used for training or testing during the CDT induction with the Blood Detection in Dark-Field Microscopy images dataset.
The analysis excluded the following 21 images: 13, 45, 61, 62, 63, 77, 86, 87, 88, 156, 161, 162, 163, 269, 288, 305, 308, 320, 324, 328, and 337. All their ground-truth labels were of class 0, representing the background of the images. However, it is important to note that some structures were visible in the original images, as shown in Figure A2.
Figure A2. Examples of images excluded in the training and testing process for the CDT induction with the Blood detection in dark-field microscopy images.
When all the images in the dataset were used, this characteristic consistently caused errors in the segmentation process of the induced CDTs, whichever of the methods proposed in this work was applied. The structures observed in these 21 images significantly shifted the threshold for the segmentation task, resulting in a notable decrease in F1-score values.

References

  1. Lateef, F.; Ruichek, Y. Survey on semantic segmentation using deep learning techniques. Neurocomputing 2019, 338, 321–348. [Google Scholar] [CrossRef]
  2. Patil, D.D.; Deore, S.G. Medical image segmentation: A review. Int. J. Comput. Sci. Mob. Comput. 2013, 2, 22–27. [Google Scholar]
  3. Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image segmentation using deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3523–3542. [Google Scholar] [CrossRef] [PubMed]
  4. Diakogiannis, F.I.; Waldner, F.; Caccetta, P.; Wu, C. ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data. ISPRS J. Photogramm. Remote Sens. 2020, 162, 94–114. [Google Scholar] [CrossRef]
  5. Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, 2nd ed. 2024. Available online: https://christophm.github.io/interpretable-ml-book (accessed on 28 May 2024).
  6. Laptev, D.; Buhmann, J.M. Convolutional decision trees for feature learning and segmentation. In Proceedings of the German Conference on Pattern Recognition, Münster, Germany, 2–5 September 2014; Springer: Cham, Switzerland, 2014; pp. 95–106. [Google Scholar]
  7. Barradas Palmeros, J.A.; Mezura Montes, E.; Acosta Mesa, H.G.; Márquez Grajales, A.; Rivera López, R. Induction of Convolutional Decision Trees with Differential Evolution for Image Segmentation. In Proceedings of the Congreso Mexicano de Inteligencia Artificial, Guadalajara, Mexico, 30 May–3 June 2023. [Google Scholar]
  8. Eiben, A.E.; Smith, J.E. Introduction to Evolutionary Computing; Springer: Berlin/Heidelberg, Germany, 2015. [Google Scholar]
  9. López-Lobato, A.L.; Acosta-Mesa, H.G.; Mezura-Montes, E. Blood Cell Image Segmentation Using Convolutional Decision Trees and Differential Evolution. In Proceedings of the Advances in Computational Intelligence, MICAI 2023 International Workshops, Yucatán, Mexico, 13–18 November 2023; Springer Nature: Cham, Switzerland, 2024; pp. 315–325. [Google Scholar]
  10. Tanabe, R.; Fukunaga, A. Success-history based parameter adaptation for differential evolution. In Proceedings of the 2013 IEEE Congress on Evolutionary Computation, Cancun, Mexico, 20–23 June 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 71–78. [Google Scholar]
  11. Storn, R.; Price, K. Differential evolution: A simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 1997, 11, 341–359. [Google Scholar] [CrossRef]
  12. Rivera-Lopez, R.; Canul-Reich, J. Construction of near-optimal axis-parallel decision trees using a differential-evolution-based approach. IEEE Access 2018, 6, 5548–5563. [Google Scholar] [CrossRef]
  13. Rivera-Lopez, R.; Canul-Reich, J.; Mezura-Montes, E.; Cruz-Chávez, M.A. Induction of decision trees as classification models through metaheuristics. Swarm Evol. Comput. 2022, 69, 101006. [Google Scholar] [CrossRef]
  14. Price, K.; Storn, R.M.; Lampinen, J.A. Differential Evolution: A Practical Approach to Global Optimization; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
  15. Ahmad, M.F.; Isa, N.A.M.; Lim, W.H.; Ang, K.M. Differential evolution: A recent review based on state-of-the-art works. Alex. Eng. J. 2022, 61, 3831–3872. [Google Scholar] [CrossRef]
  16. Zhang, J.; Sanderson, A.C. JADE: Adaptive differential evolution with optional external archive. IEEE Trans. Evol. Comput. 2009, 13, 945–958. [Google Scholar] [CrossRef]
  17. Tanabe, R.; Fukunaga, A.S. Improving the search performance of SHADE using linear population size reduction. In Proceedings of the 2014 IEEE Congress on Evolutionary Computation (CEC), Beijing, China, 6–11 July 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1658–1665. [Google Scholar]
  18. Kim, S.; Casper, R. Applications of Convolution in Image Processing with MATLAB; University of Washington: Seattle, WA, USA, 2013; pp. 1–20. [Google Scholar]
  19. Dolotov, E.; Zolotykh, N. Evolutionary algorithms for constructing an ensemble of decision trees. In Proceedings of the Analysis of Images, Social Networks and Texts: 8th International Conference, AIST 2019, Kazan, Russia, 17–19 July 2019; Revised Selected Papers 8. Springer: Cham, Switzerland, 2020; pp. 9–15. [Google Scholar]
  20. Rivera-Lopez, R.; Canul-Reich, J. Differential evolution algorithm in the construction of interpretable classification models. In Artificial Intelligence-Emerging Trends and Applications; IntechOpen: London, UK, 2018; pp. 49–73. [Google Scholar]
  21. Borenstein, E.; Sharon, E.; Ullman, S. Combining top-down and bottom-up segmentation. In Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop, Washington, DC, USA, 27 June–2 July 2004; IEEE: Piscataway, NJ, USA, 2004; p. 46. [Google Scholar]
Figure 1. Semantic segmentation technique applied to horse detection.
Figure 2. Example of the convolution operation on a 6 × 6 image with a 3 × 3 kernel.
Figure 3. Example of a Convolutional Decision Tree with kernels of size 3 × 3 .
Figure 4. Example of an instance codification associated with a pixel (size s = 3 ).
Figure 5. Example of a convolutional kernel codification with size s = 3 .
Figure 6. Processing an instance associated with a pixel.
Figure 7. Codification of the convolutional kernels in a CDT for the SHADE-CDT method.
Figure 8. Kernel selection for each internal node of the CDT with the SHADE-CDT-BKS method.
Figure 9. CDT induced by the SHADE-CDT method in experiment 17. The two best and two worst segmentation results obtained, with the original images, the ground truth (real segmented masks), and the predictions are shown along with the corresponding image number, F1-score, and accuracy.
Figure 10. The two best and two worst segmentation results from experiment 9 using the SHADE-CDT-BKS method, with its corresponding image number, F1-score, and accuracy.
Figure 11. Comparison of the segmentation results for images 125, 131, 41, and 233 of the Weizmann Horse Dataset for experiment 1 with the SHADE-CDT method and for experiments 4 and 9 with the SHADE-CDT-BKS method. The original images, the ground truth, and the predictions for these experiments are shown, along with their corresponding image number (under the original image) and the array (F1-score, accuracy) under the corresponding image result. The overall F1-Score and accuracy of each experiment are shown in blue under the corresponding column.
Figure 12. CDT induced by the SHADE-CDT method in experiment 1. The two best and two worst segmentation results obtained with the original images, the ground truth, and the predictions are shown, along with the corresponding image number, F1-score, and accuracy.
Figure 13. CDT induced by the SHADE-CDT-BKS method in experiment 1. The two best and two worst segmentation results obtained with the original images, the ground truth, and the predictions are shown on the right, along with the corresponding image number, F1-score, and accuracy.
Figure 14. Comparison of the segmentation results for images 127, 74, 85, and 331 of the Blood detection in dark-field microscopy images for experiment 1 with the SHADE-CDT method and for experiments 1 and 2 with the SHADE-CDT-BKS method. The original images, the ground truth, and the predictions for these experiments are shown, along with their corresponding image number (under the original image) and the array (F1-score, accuracy) under the corresponding image result. The F1-score and accuracy of each experiment are presented in blue under their respective column.
Figure 15. CDT induced by the SHADE-CDT-BKS method in experiment 2. The two best and two worst segmentation results obtained with the original images, the ground truth, and the predictions are shown on the right, along with the corresponding image number, F1-score, and accuracy.
Figure 16. (a) Example of the expected explainability in a CDT. Here, the kernels performed the classification of image pixels by patterns. (b) Segmentation of image 125 of the Weizmann Horse dataset by a CDT of depth 2. (c) Segmentation of image 74 of the Blood detection in dark field microscopy images by a CDT of depth 1.
Table 1. Specifications of the computer that was used to perform the experiments.

Operating system: Windows 11 Pro 23H2
RAM: 64 GB
Processor: AMD Ryzen 5 5600G with Radeon Graphics
Processor speed: 3.90 GHz
Table 2. Best result obtained with the CDT induction method proposed in [7] with the variant DE/best/1/bin and the parameters CR and F set to 0.9. A training dataset with a proportion of 1/10 (33 images) was used in the learning process.

Exp. | Popsize | Generations | Depth | F1-Score | Accuracy | Time
8 | 80 | 200 | 3 | 0.4882 | 0.6798 | 23.17 h
Table 3. Experiments with the Weizmann Horse Dataset for CDT induction using the SHADE-CDT method with N P = 100, N G = 200, and H = 100. The best result by kernel size is marked with an asterisk (*).

Experiment | Kernel Size | Depth | Time | F1-Score | Accuracy
1 | 3 | 1 | 28.41 min | 0.45442 | 0.76264
2 | 3 | 2 | 35.86 min | 0.47463 | 0.75427
3 | 3 | 3 | 42.62 min | 0.45648 | 0.7625
4 | 3 | 4 | 48.87 min | 0.47894 * | 0.72317
5 | 3 | 5 | 56.43 min | 0.47538 | 0.70138
6 | 5 | 1 | 34.94 min | 0.45667 | 0.75605
7 | 5 | 2 | 43.31 min | 0.49249 | 0.72448
8 | 5 | 3 | 49.02 min | 0.49795 | 0.72471
9 | 5 | 4 | 56.95 min | 0.49805 * | 0.73129
10 | 5 | 5 | 1.09 h | 0.49601 | 0.71724
11 | 7 | 1 | 41.36 min | 0.45423 | 0.74869
12 | 7 | 2 | 51.27 min | 0.50945 | 0.72868
13 | 7 | 3 | 58.68 min | 0.50874 | 0.72947
14 | 7 | 4 | 1.11 h | 0.5151 * | 0.73056
15 | 7 | 5 | 1.27 h | 0.50853 | 0.72809
16 | 9 | 1 | 51.3 min | 0.47535 | 0.74975
17 | 9 | 2 | 1.44 h | 0.52204 * | 0.74759
18 | 9 | 3 | 1.71 h | 0.51292 | 0.74353
19 | 9 | 4 | 2.05 h | 0.51652 | 0.73279
20 | 9 | 5 | 1.57 h | 0.51517 | 0.72943
Table 4. Experiments with the Weizmann Horse Dataset for CDT induction using the SHADE-CDT-BKS method with N P = 50, N G = 200, and H = 100. The best result by set of kernel sizes is marked with an asterisk (*).

Experiment | Kernel Sizes | Depth | Time | F1-Score | Accuracy
1 | {3, 5, 7} | 1 | 1.4 h | 0.4755 | 0.75228
2 | {3, 5, 7} | 2 | 2.63 h | 0.51567 | 0.70908
3 | {3, 5, 7} | 3 | 4.09 h | 0.51902 | 0.67215
4 | {3, 5, 7} | 4 | 5.34 h | 0.52558 * | 0.66385
5 | {3, 5, 7} | 5 | 5.50 h | 0.50859 | 0.59686
6 | {3, 5, 7, 9} | 1 | 1.67 h | 0.48134 | 0.74936
7 | {3, 5, 7, 9} | 2 | 3.31 h | 0.51361 | 0.67674
8 | {3, 5, 7, 9} | 3 | 4.88 h | 0.50145 | 0.60426
9 | {3, 5, 7, 9} | 4 | 6.48 h | 0.53651 * | 0.67394
10 | {3, 5, 7, 9} | 5 | 7.44 h | 0.53062 | 0.62843
Table 5. Experiments with the Blood Detection in Dark-Field Microscopy images for CDT induction using the SHADE-CDT method with N P = 100, N G = 200, and H = 100. The best result by kernel size is marked with an asterisk (*).

Experiment | Kernel Size | Depth | Time | F1-Score | Accuracy
1 | 3 | 1 | 11.36 h | 0.74946 * | 0.89667
2 | 3 | 2 | 14.68 h | 0.60856 | 0.79846
3 | 5 | 1 | 12.64 h | 0.62068 * | 0.85601
4 | 5 | 2 | 16.45 h | 0.47262 | 0.82742
5 | 7 | 1 | 15.68 h | 0.52155 * | 0.83746
6 | 7 | 2 | 20.33 h | 0.49722 | 0.82554
Table 6. Experiments with the Blood Detection in Dark-Field Microscopy Images for CDT induction using the SHADE-CDT-BKS method with N P = 50, N G = 200, and H = 100. The best result is marked with an asterisk (*).

Experiment | Kernel Sizes | Depth | Time | F1-Score | Accuracy
1 | {3, 5, 7} | 1 | 1.61 days | 0.72588 * | 0.88132
2 | {3, 5, 7} | 2 | 3.15 days | 0.66765 | 0.84057
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
