1. Introduction
Deep learning is a subset of machine learning that involves training artificial neural networks with multiple layers to perform complex tasks such as image recognition [1], advisory frameworks [2], image classification and sequencing, medical image processing, natural language processing [3], brain–computer interfaces [4], and economic time series analysis [5].
Generally, CNNs consist of several layers, such as convolutional layers, pooling layers, and fully connected layers, where the pooling layers down-sample the feature maps to reduce their size and improve efficiency. Pooling is a critical component of CNNs: it reduces the spatial dimensions of the input data while retaining the important information. Max pooling and average pooling are the two most popular pooling operations because they are simple and fast.
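To make these two baseline operations concrete, the following minimal NumPy sketch (the function name `pool2d` and the non-overlapping 2 × 2 window are our own illustrative choices, not part of the original text) down-samples a single feature map by taking either the maximum or the mean of each pooling region:

```python
import numpy as np

def pool2d(feature_map: np.ndarray, size: int = 2, mode: str = "max") -> np.ndarray:
    """Non-overlapping 2D pooling over an (H, W) feature map (stride = size)."""
    h, w = feature_map.shape
    fm = feature_map[: h - h % size, : w - w % size]   # trim to a multiple of size
    # Split each axis into (blocks, size) and reduce over the within-block axes.
    blocks = fm.reshape(fm.shape[0] // size, size, fm.shape[1] // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))    # strongest activation per region
    return blocks.mean(axis=(1, 3))       # average of the region

fm = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(fm, 2, "max"))   # [[ 5.  7.] [13. 15.]]
print(pool2d(fm, 2, "avg"))   # [[ 2.5  4.5] [10.5 12.5]]
```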
In many different fields, such as engineering, finance, medicine, and natural language processing, the available information may be imprecise, which makes it difficult to assign a precise value to every element; fuzzy sets thus appear as a viable solution for handling these types of problems by assigning each element a degree of membership. However, the information corresponding to a fuzzy concept may be incomplete, in the sense that the sum of the membership and non-membership degrees may be less than one.
As a solution, Atanassov [6,7] introduced a flexible extension of traditional fuzzy sets that generalizes them into intuitionistic fuzzy sets (IFS) by adding a “hesitation degree” as a new function, which quantifies the lack of knowledge and thereby provides a tool to deal with the hesitancy of the decision-maker in assigning an element to a set or its complement. Thus, IFS has turned out to be an important tool for modeling real situations [8,9].
The goal of this article is to explore and apply the concept of intuitionistic fuzzy sets (IFS) to the pooling operation, in order to obtain more accurate and robust feature representations and thereby demonstrate the advantages of IFS-based pooling over more traditional pooling methods.
The rest of the paper is structured as follows: Section 2 provides an overview of various pooling operations through a discussion of related work. Section 3 introduces the concept of intuitionistic fuzzy logic and the proposed intuitionistic pooling model. In Section 4, experimental results are presented, including performance measures and CNN classification tests conducted with different pooling methods. Finally, Section 5 concludes the paper and offers perspectives for future research.
2. Related Work
Convolutional Neural Networks (CNNs) are a deep learning architecture, widely used in computer vision, based on two operations: convolution (which extracts features through filtering) and pooling (which reduces dimensionality). Multiple types of pooling operators are used for different purposes, as discussed below.
The two most common types of pooling are max and average pooling, owing to their simplicity and the absence of parameters to tune. Average pooling summarizes all the features in the pooling region, which attenuates noisy features but lets the background region dominate. In contrast, max pooling selects the strongest activation in the pooling region, thus avoiding the effect of unwanted background but capturing noisy features [10]. In this direction, new pooling operators have emerged that use both max and average pooling in order to take greater advantage of them, such as mixed max-average pooling [11], which combines them linearly with a weight that determines the proportion of each type of pooling, or gated mix-average pooling, which adapts to the characteristics of each image instead of those of the whole dataset [10,11]. Because both mixed max-average pooling and gated mix-average pooling consider each pooling region independently, ignoring the high correlation between adjacent pixels of the image, Dynamic Correlation pooling [10,12] was introduced to exploit the correlation information between adjacent pixels. In medical imaging, soft pooling approaches are widely used instead of linear combinations; they use a smooth differentiable function to approximate max and average pooling under different parameter settings [10]. Log-Sum-Exp pooling (LSE), Polynomial pooling [13], Learned-Norm pooling [14], $L_p$ pooling [15], $\alpha$-Integration ($\alpha$I) pooling [16], rank-based pooling [17], Dynamic pooling [18], Smooth-Maximum pooling [19], soft pooling [20], Maxfun pooling [21], and Ordinal pooling [22] are all types of soft pooling approaches.
Moreover, other soft pooling approaches are based on the characteristics of the pooling region, such as Polynomial pooling, which enhances the detail sensitivity of a segmentation network and is compatible with any pre-trained classification network [13], and $L_p$ pooling, which provides a flexible way to transition smoothly from max to average pooling [17] and is characterized by the order of the unit being learned from a geometrical perspective rather than pre-defined [14]. In the case of rank-based pooling, the top $k$ elements in each pooling region are averaged together as the pooled representation. Ordinal pooling and Multiactivation pooling [23] are similar to rank-based pooling, since they also use the rank of the elements when applying pooling.
Some variants of pooling aim to handle overfitting, such as Mixed pooling [24] and Hybrid pooling [25], which randomly select either the max or the average operation during training, the mode used in training also being used in testing [10], in addition to stochastic pooling [26], which addresses the down-weighting caused by average pooling as well as the overfitting caused by max pooling. When the training data are limited, overfitting occurs because strong activations dominate the updating process; in that case, rank-based stochastic pooling [17] can be used. In contrast to stochastic pooling, which uses only one value from each pooling region, Max-pooling dropout [27] randomly samples a set of values on which pooling is then applied [10]. Like stochastic pooling, it randomly samples activations from multinomial distributions at the pooling stage, but with better performance. Whereas these approaches introduce randomness at the pooling stage, S3 pooling [28] and fractional max pooling [29] introduce randomness at the spatial sampling stage.
There are other pooling approaches used for specific purposes, such as encoding spatial structure information. Spatial Pyramid pooling (SPP) [30] is a popular one and is useful for rigid structures. Cell Pyramid Matching (CPM) [31] was proposed for cell image classification through the incorporation of two spatial structure descriptors: Dual Region (DR) and Spatial Pyramid pooling (also known as Spatial Pyramid Matching (SPM) [30]).
For images that include objects with various poses [10], part-based pooling [10,32] is a useful solution, owing to its ability to detect diverse parts of each image, pool their features, and finally concatenate them as the final image representation. In the case of rotated objects, Concentric Circle pooling (CCP) [33] and Polycentric Circle pooling [34] deal efficiently with the rotation variance problem in CNNs.
Unlike the previous methods, which aim to capture large-scale spatial structure information, Geometric $L_p$-Norm pooling [35] aims to capture local structure information [10].
Pooling can also be used to capture the interaction between different feature maps, and between different regions of feature maps, as in Improved Bilinear pooling [36] and Second-Order pooling [37], which preserve information about pairwise correlations.
Grouping Bilinear Pooling (GBP) [38] is an improvement of Bilinear pooling aimed at fine-grained image classification that achieves good accuracy with the fewest parameters, in addition to Context-aware Attentional Pooling (CAP) for fine-grained visual classification, which takes into account the correlation among regions and their spatial layouts to encode complementary partial information [39]. Moreover, self-attentive pooling extracts more complex relationships between features from non-local features of the activation maps, in comparison with existing local pooling layers [40].
Certain types of structured data call for special types of pooling better suited to their structure. For instance, graph-structured data use graph pooling such as Self-Attention Graph pooling (SAGPool) [41], a method based on self-attention that considers both node features and graph topology, or Adaptive Structure Aware Pooling (ASAP) [42], a sparse pooling operator able to capture local sub-graph information using a new self-attention mechanism, Master2Token (M2T), and a modified GNN formulation to capture the importance of each node within a given graph. In addition, Graph Multihead Attention Pooling with Self-supervised learning (GMAPS) [43] constructs graph pooling through a differentiable node assignment based on a multihead attention mechanism and a hierarchical objective that maximizes mutual information, along with Coarsened Graph Infomax Pooling (CGIPool), which maximizes the mutual information between the input and the coarsened graph of each pooling layer. Also, dual-sampling attention pooling was proposed [44] for 3D meshes, and the Tripool pooling method [45] was proposed for 3D action recognition from skeleton data.
When pooling is applied, some discriminative details can be lost; as a solution, Detail-Preserving pooling (DPP) [46] and Local Importance-based pooling (LIP) [47] were proposed to preserve important features. Regarding the common problem of computational complexity, RNNPool [48] is an efficient pooling operator that reduces computational complexity and peak memory usage during inference without a substantial loss in accuracy. Also, fuzzy pooling, based on the fuzzification, aggregation, and defuzzification of feature map neighborhoods, copes with the uncertainty of feature values and can preserve the important features of the pooling areas by transforming the crisp input volume space into a fuzzy feature space [49].
In this paper, we propose an intuitionistic fuzzy pooling methodology that can be integrated into any existing Convolutional Neural Network (CNN) architecture as a replacement for traditional pooling layers.
Our proposed approach extends the concept of fuzzy pooling by addressing an important limitation. In fuzzy pooling, the membership of an element to a fuzzy set is represented by a single value between zero and one. However, this value may not accurately capture the underlying uncertainty. To overcome this limitation, we introduce a new function, called the hesitation function, which handles cases where the membership value does not precisely characterize the element.
By incorporating the hesitation function, our method ensures that the neglected value in fuzzy pooling is appropriately considered. This enhancement leads to a more comprehensive and accurate representation of uncertainty during the pooling process.
Overall, our intuitionistic fuzzy pooling methodology presents a valuable extension to the existing fuzzy pooling technique, enabling a more refined treatment of uncertainty in CNN architectures.
3. Intuitionistic Fuzzy Pooling
3.1. Intuitionistic Fuzzy Sets
In this section, we outline some fundamental definitions that are necessary to understand the research context and the technical terms used throughout the paper, as well as the diagram in Figure 1, which further explains our method. In what follows, the set $U$ represents a universe of discourse (the set of values that a fuzzy variable can take).
Definition 1 ([8]). An intuitionistic fuzzy set (IFS) $I$ is obtained by associating two non-negative values with each element $x$ of $U$. In other words, to construct an intuitionistic set, we need two functions $\mu_I : U \to [0,1]$ and $\nu_I : U \to [0,1]$, which represent the degrees of membership and non-membership of each element to $I$, respectively. In this sense, $I$ is explicitly given by $I = \{(x, \mu_I(x), \nu_I(x)) : x \in U\}$. Moreover, $\forall x \in U$, Equation (1) is valid:
$$\mu_I(x) + \nu_I(x) \le 1. \quad (1)$$
Note that $\mu_I$ and $\nu_I$ model the experts’ knowledge of our agent’s environment.
The fact that $\mu_I(x) + \nu_I(x)$ may not reach 1 means that there is a lack of information or certainty about whether $x$ belongs to $I$. The following definition quantifies this degree of hesitation.
Definition 2 ([8]). The intuitionistic fuzzy indicator, or hesitation indicator, of $x$ in $I$ is given by the formula $\pi_I(x) = 1 - \mu_I(x) - \nu_I(x)$. $\pi_I(x)$ is the degree of indeterminacy of $x$ in the IFS $I$; it reflects the lack of knowledge of whether or not each $x \in U$ belongs to the IFS. Evidently, for every $x \in U$, we have $0 \le \pi_I(x) \le 1$.
Definition 3 ([50]). The intuitionistic triangular fuzzy distribution of $I$ is expressed by a triangular membership function together with its associated non-membership function, where $\varepsilon$ is an arbitrary non-negative number chosen so that the membership, non-membership, and hesitation degrees all remain in $[0,1]$.
3.2. Defuzzification of IFSs
In this section, we take up the challenge of defuzzifying an IFS
I [
51]. A typical way of associating a real number with an IFS
I can be illustrated by the steps below:
- (i) Convert the IFS $I$ into a standard (normal) fuzzy set;
- (ii) Evaluate the resulting fuzzy set by means of a defuzzification strategy.
Regarding step (i), in [52], the contributors gave the name “de-i-fuzzification” to the scheme for generating a convenient fuzzy set out of an IFS. Moreover, they suggested utilizing the operator presented in [53]:
$$D_\alpha(I) = \{(x,\; \mu_I(x) + \alpha\,\pi_I(x),\; \nu_I(x) + (1-\alpha)\,\pi_I(x)) : x \in U\},$$
with $\alpha \in [0,1]$. Note that $D_\alpha(I)$ is a standard fuzzy subset with membership function $\mu_{D_\alpha(I)}(x) = \mu_I(x) + \alpha\,\pi_I(x)$. In particular, they proposed $\alpha = 0.5$ as a solution for the minimum problem:
$$\min_{\alpha \in [0,1]} d(I, D_\alpha(I)),$$
where $d$ is the Euclidean distance. In this case, the fuzzy set $D_{0.5}(I)$ is characterized by the following membership function:
$$\mu_{D_{0.5}(I)}(x) = \mu_I(x) + \frac{\pi_I(x)}{2}.$$
For step (ii), in agreement with the approach suggested in ([54], Section 10), we may evaluate the IFS $I$ by computing the center of gravity (COG) of the obtained fuzzy set, that is,
$$\mathrm{COG}(I) = \frac{\sum_{x \in U} x\,\tilde{\mu}(x)}{\sum_{x \in U} \tilde{\mu}(x)},$$
with $\tilde{\mu} = \mu_{D_{0.5}(I)}$.
3.3. Intuitionistic Pooling Model
To introduce the intuitionistic fuzzy pooling operator, we consider triangular intuitionistic membership/non-membership functions [50] whose parameters are chosen as in the paper that introduced the fuzzy pooling operator [49]. The non-negative real number $\varepsilon$ expresses the amount of missing information, and $p$ is a non-negative real number. It is worth noting that fuzzification and defuzzification operations are easier to perform for a system using intuitionistic triangular fuzzy numbers than for one using Gaussian numbers [51].
Let $\alpha \in [0,1]$. For each element $x$ of a patch, the fuzzy membership function $\tilde{\mu}$ is obtained by aggregating $\mu$ and $\nu$ using the weight $\alpha$, following the de-i-fuzzification operator of Section 3.2:
$$\tilde{\mu}(x) = \mu(x) + \alpha\,(1 - \mu(x) - \nu(x)).$$
For each patch $p_n$, $n = 1, \ldots, N$, and based on $\tilde{\mu}$, we define the summary patch. Pooling starts with the aggregation of the intuitionistic fuzzy patch, producing a score for each patch. Based on these scores, another patch is built by selecting the spatial intuitionistic fuzzy patches that have the largest scores. Finally, for each selected patch $n$, the intuitionistic fuzzy crisp value associated with the patch is obtained by defuzzification, as given by Equation (10).
In Appendix A, Figure A1 illustrates an example of the different steps of the INT-FUPooling operation, starting from a patch extracted from a set of maps with a shape of 3 × 3 and a number of filters equal to 3.
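Since Equations (7)–(10) are given above only in summarized form, the following Python sketch should be read as a schematic illustration of the INT-FUP pipeline rather than the authors' exact operator: it fuzzifies each patch with a triangular membership, derives the non-membership from an illustrative $\varepsilon$, aggregates via the de-i-fuzzification weight $\alpha$, selects high-score elements, and defuzzifies by center of gravity. All parameter choices (EPS, ALPHA, the triangular feet, and the median-based selection) are placeholder assumptions:

```python
import numpy as np

EPS = 0.1     # epsilon, the "amount of missing information" (illustrative value)
ALPHA = 0.5   # de-i-fuzzification weight (Section 3.2)

def tri(x, a, m, b):
    """Triangular membership with feet a, b and peak m."""
    return np.clip(np.minimum((x - a) / (m - a + 1e-12),
                              (b - x) / (b - m + 1e-12)), 0.0, 1.0)

def int_fup(fmap: np.ndarray, k: int = 2) -> np.ndarray:
    """Schematic INT-FUP over an (H, W) map with k x k patches:
    fuzzify -> derive non-membership -> aggregate -> select -> defuzzify."""
    h, w = fmap.shape
    out = np.zeros((h // k, w // k))
    for i in range(0, h - h % k, k):
        for j in range(0, w - w % k, k):
            patch = fmap[i:i + k, j:j + k].ravel()
            mu = tri(patch, patch.min(), patch.mean(), patch.max())  # membership
            nu = np.clip(1.0 - EPS - mu, 0.0, 1.0)                   # non-membership
            pi = 1.0 - mu - nu                                       # hesitation
            mu_f = mu + ALPHA * pi                # aggregated fuzzy membership
            sel = mu_f >= np.median(mu_f)         # keep the high-score elements
            out[i // k, j // k] = (np.sum(patch[sel] * mu_f[sel])
                                   / (np.sum(mu_f[sel]) + 1e-12))    # COG
    return out

print(int_fup(np.random.rand(4, 4)))   # a 2 x 2 pooled map
```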
3.4. Fuzzy Average to Fuzzy System
The premature aggregation of the blocks (summation), performed immediately after applying the intuitionistic functions, makes it almost impossible to state the fuzzy process explicitly. This does not, however, prevent the mathematical rules governing the fuzzy averaging of the various tasks from being spelled out in equation form. Indeed, the transformation of each block into a single crisp value can be seen as the calculation of a fuzzy average of the set of its elements, using the membership determined by Equation (9). In this sense, each patch $n$ is governed by a mathematical rule in which ∗ is the convolution operator and ∨ is the max logical operator. As the proposed method performs several aggregations (fuzzy summations) before reaching the conclusion part, it is difficult to extract the fuzzy rules governing the proposed intuitionistic pooling operation.
In order to transform the fuzzy averaging procedure into an explicit fuzzy system (with inputs, outputs, rules, and fuzzification and defuzzification operators), we consider a sample of $N$ images, which we break down into blocks. Then, we perform intuitionistic pooling for each block, forming an input–output dataset (vectors formed by the block components together with their intuitionistic pooling values). Next, we use enhanced dynamic self-generated fuzzy Q-learning (EDSGFQL) to systematically construct fuzzy inference systems (FISs) [55]. In the EDSGFQL process, the structure identification and parameter estimation of the FIS are carried out using fuzzy c-means [56] to cluster the input data space when generating the FIS. Meanwhile, the structure and precondition components of the FIS are created by reinforcement learning, i.e., the fuzzy rules are tuned and deleted based on reinforcement signals. In this sense, we obtain the sub-fuzzy-system presented in Figure 2. In this figure, the four inputs represent the components of the patch $n$.
In the following, we give the different components of the intuitionistic fuzzy system of Figure 2:
[System]: FisType = ‘mamdani’; NumInputs = 4; NumOutputs = 1; NumRules = 3; AndMethod = ‘min’; OrMethod = ‘max’; ImpMethod = ‘min’; AggMethod = ‘max’; DefuzzMethod = ‘centroid’.
[Input1]: Range = [0.057 0.897]; MF1 = [‘Low’, ‘gaussmf’, sd = 0.0375, mean = 0.447]; MF2 = [‘Medium’, ‘gaussmf’, sd = 0.034, mean = 0.484]; MF3 = [‘High’, ‘gaussmf’, sd = 0.042, mean = 0.531].
[Input2]: Range = [0.034 0.992]; MF1 = [‘Low’, ‘gaussmf’, sd = 0.037, mean = 0.448]; MF2 = [‘Medium’, ‘gaussmf’, sd = 0.034, mean = 0.482]; MF3 = [‘High’, ‘gaussmf’, sd = 0.042, mean = 0.530].
[Input3]: Range = [0.050 1]; MF1 = [‘Low’, ‘gaussmf’, sd = 0.037, mean = 0.448]; MF2 = [‘Medium’, ‘gaussmf’, sd = 0.034, mean = 0.482]; MF3 = [‘High’, ‘gaussmf’, sd = 0.042, mean = 0.530].
[Input4]: Range = [0 0.93]; MF1 = [‘Low’, ‘gaussmf’, sd = 0.037, mean = 0.447]; MF2 = [‘Medium’, ‘gaussmf’, sd = 0.034, mean = 0.482]; MF3 = [‘High’, ‘gaussmf’, sd = 0.043, mean = 0.532].
[Output intuitAverage]: Range = [0.227 1.559]; MF1 = [‘Low’, ‘gaussmf’, sd = 0.025, mean = 0.447]; MF2 = [‘Medium’, ‘gaussmf’, sd = 0.023, mean = 0.482]; MF3 = [‘High’, ‘gaussmf’, sd = 0.032, mean = 0.530].
Here, sd represents the standard deviation of the different Gaussian membership functions, and MF is the abbreviation of membership function.
[Rules]:
- 1. If (Input1 is Low) and (Input2 is Low) and (Input3 is Low) and (Input4 is Low), then (intuitAverage is Low) (1).
- 2. If (Input1 is Medium) and (Input2 is Medium) and (Input3 is Medium) and (Input4 is Medium), then (intuitAverage is Medium) (1).
- 3. If (Input1 is High) and (Input2 is High) and (Input3 is High) and (Input4 is High), then (intuitAverage is High) (1).
It should be noted that all the rules have the same weight, equal to 1. In addition, the components of this system will be modified if we enrich the learning dataset with other images.
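The listed components are sufficient to reproduce the inference numerically. The sketch below is a plain-NumPy rendition of this three-rule Mamdani system (min for AND and implication, max for aggregation, centroid defuzzification); the discretization step of the output range is our own choice:

```python
import numpy as np

def gaussmf(x, mean, sd):
    """Gaussian membership function."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2)

# (mean, sd) of the Low/Medium/High Gaussian MFs, per input, from the listing above.
IN_MF = [
    {"Low": (0.447, 0.0375), "Medium": (0.484, 0.034), "High": (0.531, 0.042)},  # Input1
    {"Low": (0.448, 0.037),  "Medium": (0.482, 0.034), "High": (0.530, 0.042)},  # Input2
    {"Low": (0.448, 0.037),  "Medium": (0.482, 0.034), "High": (0.530, 0.042)},  # Input3
    {"Low": (0.447, 0.037),  "Medium": (0.482, 0.034), "High": (0.532, 0.043)},  # Input4
]
OUT_MF = {"Low": (0.447, 0.025), "Medium": (0.482, 0.023), "High": (0.530, 0.032)}
LABELS = ["Low", "Medium", "High"]      # rule i: all inputs LABELS[i] -> output LABELS[i]
Y = np.linspace(0.227, 1.559, 1001)     # discretized output range

def intuit_average(inputs):
    agg = np.zeros_like(Y)
    for label in LABELS:
        # AndMethod = 'min': firing strength of the rule.
        w = min(gaussmf(x, *IN_MF[k][label]) for k, x in enumerate(inputs))
        # ImpMethod = 'min' clips the output MF; AggMethod = 'max' combines rules.
        agg = np.maximum(agg, np.minimum(w, gaussmf(Y, *OUT_MF[label])))
    # DefuzzMethod = 'centroid'.
    return float(np.sum(Y * agg) / (np.sum(agg) + 1e-12))

print(intuit_average([0.45, 0.48, 0.50, 0.47]))
```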
3.5. Optimal Control Problem to Train Deep Neural Network
Given a set of labeled images $S$, in order to automate the prediction of the label of each image, we build a CNN with $P$ convolution layers $C_1, \ldots, C_P$, whose last layer is connected to a multilayer perceptron (MLP) with connection matrix $W$. The primary objective of learning is to shorten the distance between the predicted label of a given input image and the target label. Considering the collection of labeled training images from $S$, the global loss function $L(W)$, e.g., the root mean square error (RMSE), is given in [57], where $W$ gathers the connection weights between the neurons of the MLP component of the CNN. To minimize the loss $L$, given by Equation (12), the backpropagation (BP) algorithm based on stochastic gradient descent is implemented to update the weights at the $k$-th step as follows:
$$W_{k+1} = W_k - u\,\nabla_W L(W_k; x_k),$$
where $x_k$ is a random image from $S$, and $u$ is the time step. If $u$ is sufficiently small, one can perform the following approximation:
$$\frac{W_{k+1} - W_k}{u} \approx \frac{dW}{dt} = -\nabla_W L(W(t); x_t).$$
Let $(x_t)_t$ be a time series of images uniformly generated from $S$, and $(W_t)_t$ a time series of the MLP weights associated with these images. The aim of training the CNN via BP is to tune all the parameters and thereby optimize the loss function. In this sense, the problem of training the CNN can be reformulated as an optimal control problem [58,59], in which the control $u$ is nothing but the time step of the BP algorithm. We can use Pontryagin's Minimum Principle [59] or a local search method [60] to solve this problem.
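As a toy illustration of treating the BP step size as a control solved by local search (assuming, for simplicity, a quadratic stand-in for the RMSE loss rather than an actual CNN), one could write:

```python
import numpy as np

def L(w):
    """Toy quadratic stand-in for the loss of Equation (12)."""
    return 0.5 * float(w @ w)

def grad_L(w):
    return w  # gradient of 0.5 * ||w||^2

w = np.array([2.0, -1.5])                  # stand-in for the MLP weights
controls = np.linspace(0.01, 1.0, 50)      # admissible controls (BP step sizes)
for k in range(20):
    g = grad_L(w)
    # Local search over the control u: pick the step size that most reduces L.
    u = min(controls, key=lambda s: L(w - s * g))
    w = w - u * g
print(L(w))   # ~0: the chosen controls drive the loss to its minimum
```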
4. Experimental Setup and Results
4.1. Metrics
The metrics used in this paper quantify the quality of the compressed images and how similar they are to the original images. In general, the results on a dataset are obtained by averaging each measure over all its images.
Mean Squared Error (MSE) measures the average squared difference between the pixels of two images; it strongly depends on the image intensity scaling:
$$\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \big(I_1(i,j) - I_2(i,j)\big)^2,$$
where $M \times N$ is the size of both input images $I_1$ and $I_2$. A lower MSE value indicates high similarity.
Peak Signal-to-Noise Ratio (PSNR) is the ratio between the maximum possible power of an image and the power of the corrupting noise that affects the quality of its representation [61]:
$$\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{R^2}{\mathrm{MSE}}\right), \qquad R = 2^n - 1 \ (n = 8 \text{ bits}),$$
where $R$ denotes the maximum pixel value of the original image, here set at $R = 255$. A higher PSNR value indicates higher image quality, while a lower value suggests a significant difference between the images. This metric is typically used to evaluate the quality of reconstructed or compressed images.
Structural Similarity Index Measure (SSIM) [61] is a well-known metric for measuring similarity based on three factors: loss of correlation, luminance distortion, and contrast distortion. SSIM varies between $-1$ and 1, where 1 indicates perfect similarity.
Root Mean Squared Error (RMSE) is the square root of the Mean Squared Error (MSE) and measures the differences between predicted values and true values. Like MSE, RMSE is frequently used as a metric for evaluating the quality of a model's predictions, a smaller value indicating better model performance or, for images, higher similarity.
Signal-to-Reconstruction Error (SRE) is a metric widely used to measure the quality of speech signals; it can also be used to measure the quality of images. SRE measures the error relative to the mean image intensity and is relevant for making errors comparable between images with different brightness levels [62]. Higher SRE values indicate better quality.
Feature-Based Similarity Index (FSIM) [63] measures the similarity between two images based on two features: the primary feature, Phase Congruency (PC), and the secondary feature, Gradient Magnitude (GM) [64]. The value of FSIM ranges between 0 and 1, a high value indicating similar images.
Universal Image Quality Index (UIQ) is designed by modeling any image distortion as a combination of three factors: loss of correlation, luminance distortion, and contrast distortion [65]. Its values range from $-1$ to 1, with values approaching 1 for more similar images.
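For reference, the pixel-based metrics above (MSE, RMSE, PSNR, and SRE) can be sketched in a few lines of NumPy; the SRE form below follows the mean-intensity-relative definition of [62], and the synthetic test images are purely illustrative:

```python
import numpy as np

def mse(a, b):
    return float(np.mean((a.astype(float) - b.astype(float)) ** 2))

def rmse(a, b):
    return mse(a, b) ** 0.5

def psnr(a, b, peak=255.0):
    """PSNR in dB, with R = 2^8 - 1 = 255 for 8-bit images."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10.0 * np.log10(peak ** 2 / m)

def sre(a, b):
    """SRE in dB: error measured relative to the mean intensity of the reference."""
    return 10.0 * np.log10(np.mean(a.astype(float)) ** 2 / mse(a, b))

rng = np.random.default_rng(0)
orig = rng.integers(0, 256, (28, 28))                          # reference image
noisy = np.clip(orig + rng.normal(0, 5, orig.shape), 0, 255)   # degraded copy
print(mse(orig, noisy), rmse(orig, noisy), psnr(orig, noisy), sre(orig, noisy))
```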
4.2. Data Sets
4.2.1. MNIST Dataset
MNIST is a collection of 70,000 images divided into two parts: the first contains 60,000 images for training, while the second contains 10,000 images for testing. The collection consists of handwritten digits from 0 to 9 with a size of 28 × 28, as shown in Figure 3.
4.2.2. Fashion MNIST Dataset
The Fashion-MNIST dataset contains 70,000 images of clothes, each image being a 28 × 28 grayscale image from one of 10 categories of fashion products (“T-shirt”, “Trouser”, “Pullover”, “Dress”, “Coat”, “Sandal”, “Shirt”, “Sneaker”, “Bag”, “Ankle boot”). The dataset (Figure 4) is divided into 60,000 training images and 10,000 testing images, and every class contains 7000 images.
4.3. Image Reconstruction: INT-FUP vs. Classical Pooling Models
The pooling operation reduces the dimensionality of images by summarizing a block of information into a single value. The reverse operation consists of reconstructing the images by using only the values obtained through aggregation. In this section, the INT-FUP model is tested against average pooling, max pooling, and fuzzy pooling using some of the image similarity measures mentioned earlier.
Table 1 shows the superior performance of our proposed method over all metrics; the evaluation was performed on 1500 images of handwritten digits. The measures in Table 1 report the similarities and differences between the real images and the images after pooling, while the second table reports the measures between the pooled and depooled images. We will also present pooling experiments using various types of noise.
In order to evaluate the efficacy of our model, we conducted experiments on two widely accessible Modified National Institute of Standards and Technology (MNIST) datasets: the handwritten digits MNIST [66] and Fashion-MNIST [67] datasets.
The experiments presented in Table 1 were performed using the same set of parameters as mentioned in Section 3, and involve a comparison of four pooling approaches. The results provide clear evidence that intuitionistic fuzzy pooling outperforms the other pooling methods. This superiority is evident across multiple measurement scales, such as PSNR, FSIM, UIQ, SSIM, and SRE, where intuitionistic pooling consistently achieves higher values, indicating better similarity compared with the other pooling methods. Although there is a slight difference favoring fuzzy pooling in the case of the FSIM measure, intuitionistic pooling outperforms the other methods on all other measures. Furthermore, intuitionistic pooling demonstrates a lower RMSE value, indicating a smaller difference compared with the alternative pooling methods.
Note: Compared with fuzzy pooling, Int-FUP slightly improves PSNR by 0.022, UIQ by 0.001, MSE by 0.05, and SRE by 0.01, along with a slight improvement in SSIM. To highlight the ability of the hesitation membership function to process the main information in a stochastic environment, the reduction and reconstruction operators are applied to noisy images in the next section.
4.4. Noisy Image Reconstruction: Int-FUP vs. Classical Pooling Models
To demonstrate the ability of the hesitation membership function to retain key information in blurred environments, the reduction and reconstruction operations are applied in this section to noisy images using max, min, average, random, fuzzy, and intuitionistic pooling. The noise types considered here are Gaussian, Poisson, Salt–Pepper, and Speckle. In addition to the MNIST and Fashion datasets, we use another public dataset available at UCI [68], and we compare the pooling operators on the RMSE and PSNR performance measures. To challenge the pooling approaches, we add different degrees of noise to the images, resulting in four noisy image sets: Images + Gauss, Images + Poisson, Images + Salt_Pepper, and Images + Speckle.
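A sketch of how such noisy datasets can be generated, assuming scikit-image's `random_noise` utility (which expects images scaled to [0, 1]) and illustrative noise amplitudes, is given below:

```python
import numpy as np
from skimage.util import random_noise   # expects images scaled to [0, 1]

rng = np.random.default_rng(0)
image = rng.random((28, 28))             # stand-in for one MNIST digit

noisy_sets = {
    "Images + Gauss":       random_noise(image, mode="gaussian", var=0.01),
    "Images + Poisson":     random_noise(image, mode="poisson"),
    "Images + Salt_Pepper": random_noise(image, mode="s&p", amount=0.05),
    "Images + Speckle":     random_noise(image, mode="speckle", var=0.01),
}
for name, img in noisy_sets.items():
    print(name, float(np.sqrt(np.mean((img - image) ** 2))))  # per-noise RMSE
```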
Table 2 gives the average RMSE for the different pooling methods used to compress the images, to which we have added the four types of noise. We note that intuitionistic pooling has the lowest RMSE, far ahead of all the traditional methods (max, min, average, and rand). Furthermore, Int-FUP significantly outperforms fuzzy pooling, achieving a significant improvement in terms of RMSE: 1.519 for Gaussian noise, 1.573 for Poisson noise, 0.999 for Salt–Pepper, and 1.072 for Speckle noise. In this sense, the proposed method improves on fuzzy pooling by around 20.20%.
Table 3 gives the average PSNR for the different pooling methods used to compress the various images considered, to which we add the four types of noise. We can see that intuitionistic pooling has by far the best PSNR among all the traditional methods (max, min, average, and rand). Moreover, Int-FUP far outperforms fuzzy pooling and achieves a significant improvement in terms of PSNR: 0.123 for Gaussian noise, 0.083 for Poisson noise, 0.088 for Salt–Pepper, and 0.093 for Speckle noise. In this sense, in terms of PSNR, the proposed method improves on fuzzy pooling by almost 37%.
The cause of this success is the ability of intuitionistic logic to quantify the degree of hesitation thanks to the non-membership function, which implements the epsilon parameter. We notice that, in order to deal with uncertainty, this parameter must be increased considerably in comparison with the noise-free case.
To study the sensitivity of the fuzzy and intuitionistic pooling methods under the different noises, we represent, as box-and-whisker plots, the RMSE and PSNR series associated with the compressions of all the images and all the noises; see Figure 5, Figure 6 and Figure 7.
Figure 8 shows the box-and-whisker plots of the RMSE series of fuzzy and intuitionistic pooling for the four noises. We notice that the boxes associated with intuitionistic pooling lie below those associated with fuzzy pooling. Moreover, the boxes associated with Int-FUP are small compared with those of fuzzy pooling, which means that the interquartile and interdecile ranges of the Int-FUP RMSE series are very small; intuitionistic pooling therefore has a low sensitivity compared with fuzzy pooling (in the RMSE sense).
Figure 9 shows the box-and-whisker plots of the PSNR series of fuzzy and intuitionistic pooling for the four noises. We notice that the boxes associated with intuitionistic pooling lie above those associated with fuzzy pooling. Moreover, the boxes associated with intuitionistic pooling are better placed than those of fuzzy pooling, which means that the interquartile and interdecile ranges of the intuitionistic pooling PSNR series are better; intuitionistic pooling therefore has a low sensitivity compared with fuzzy pooling (in the PSNR sense).
Ultimately, the inclusion of non-membership functions endows intuitionistic logic with a higher capacity than fuzzy logic to reason correctly in stochastic environments.
An experiment on Int-FUP was performed to show the impact of the noises used in the experiments above. As shown in Figure 10, Poisson noise yields high RMSE values; as a result, Int-FUP performs poorly in a noisy Poisson environment, while the results in the other noisy environments remain acceptable.
Note: It should be noted that the inclusion of the non-membership function leads to a slight increase in processor time. Indeed, to perform the pooling operation, fuzzy pooling requires 0.2381375 s, max pooling requires 0.002265 s, average pooling requires 0.02757 s, intuitionistic pooling requires 0.2529685 s, and random pooling requires 0.0089365 s. We can reduce the CPU time of the intuitionistic pooling operation by performing the computation in parallel, as the membership and non-membership functions work on the same input data.
4.5. Deep CNN Classification: INT-FUP-CNN vs. Classical Pooling-CNN
To perform the classification task via a CNN with intuitionistic pooling, several layers [convolution operator + ReLU + intuitionistic pooling] are introduced: the convolution layers extract relevant information from the images using an appropriate number of masks; the Rectified Linear Unit (ReLU) function corrects the values of each patch (transforming negative grey levels into 0); and intuitionistic fuzzy pooling reduces the image size to optimize the CNN architecture. Next, the final maps are flattened to transform the matrices into vectors. The final convolution layers are connected to a fully connected artificial neural network [69,70,71,72], which consists of three types of layers: the input layer, the hidden layers, and the decision layer (or classification layer). The input layer performs no processing but presents the resulting vectors to the neural network. The hidden layers perform almost all the processing that allows the neural network to learn the images in the dataset. The decision layer, equipped with an appropriate activation function, makes the classification decision in terms of probability.
The pooling methods are mainly used in CNN to reduce computational complexity and memory requirements, and this section aims to show the performance of our proposed method compared to other common pooling approaches.
Table 4 shows the results obtained for different evaluation metrics for one epoch on the MNIST dataset, where the results of our model clearly exceed those of the other pooling methods, along with Figure 12, which compares the AUC-ROC (Area Under the Curve–Receiver Operating Characteristics) performance with the other approaches.
In this experiment, various performance measures are employed to compare the effectiveness of our model against commonly used pooling methods, such as max and average pooling, as well as the fuzzy pooling method that our model extends. The outcomes are presented in Table 4 and Table 5, demonstrating the superior performance of our proposed model.
In Figure 11, we can observe the confusion matrices representing the performance of the compared models on the MNIST handwritten digits dataset. The top row shows the matrices of the “Max” and “Average” approaches, with the “Average” matrix on the right. The bottom row shows the matrices of the “Fuzzy” and “Intuitionistic” approaches, with the “Intuitionistic” matrix on the right. Comparing these matrices, it is evident that the proposed model classifies the images with a higher level of confidence. This suggests improved accuracy and effectiveness in identifying and categorizing the handwritten digits, as clearly demonstrated in both Table 4 and Table 5.
Moreover, Figure 12 shows the performance of our proposed method through the AUC-ROC curve, pointing out the effectiveness of the fuzzy and intuitionistic pooling approaches. In order to further highlight the capabilities of our model, in the next section we introduce additional noise to the images, demonstrating the robustness and enhanced performance of intuitionistic fuzzy pooling.