Fractional Calculus Meets Neural Networks for Computer Vision: A Survey

Coelho, Cecília; Costa, M. Fernanda P.; Ferrás, Luís L.

doi:10.3390/ai5030067

Open AccessReview

Fractional Calculus Meets Neural Networks for Computer Vision: A Survey

by

Cecília Coelho

^1,*

,

M. Fernanda P. Costa

¹

and

Luís L. Ferrás

^1,2

¹

Centre of Mathematics (CMAT), University of Minho, 4710-057 Braga, Portugal

²

Department of Mechanical Engineering (Section of Mathematics) and CEFT—Centro de Estudos de Fenómenos de Transporte—FEUP, University of Porto, 4200-465 Porto, Portugal

^*

Author to whom correspondence should be addressed.

AI 2024, 5(3), 1391-1426; https://doi.org/10.3390/ai5030067 (registering DOI)

Submission received: 3 July 2024 / Revised: 18 July 2024 / Accepted: 25 July 2024 / Published: 7 August 2024

(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

Download

Browse Figures

Versions Notes

Abstract

:

Traditional computer vision techniques aim to extract meaningful information from images but often depend on manual feature engineering, making it difficult to handle complex real-world scenarios. Fractional calculus (FC), which extends derivatives to non-integer orders, provides a flexible way to model systems with memory effects and long-term dependencies, making it a powerful tool for capturing fractional rates of variation. Recently, neural networks (NNs) have demonstrated remarkable capabilities in learning complex patterns directly from raw data, automating computer vision tasks and enhancing performance. Therefore, the use of fractional calculus in neural network-based computer vision is a powerful method to address existing challenges by effectively capturing complex spatial and temporal relationships in images and videos. This paper presents a survey of fractional calculus neural network-based (FC NN-based) computer vision techniques for denoising, enhancement, object detection, segmentation, restoration, and NN compression. This survey compiles existing FFC NN-based approaches, elucidates underlying concepts, and identifies open questions and research directions. By leveraging FC’s properties, FC NN-based approaches offer a novel way to improve the robustness and efficiency of computer vision systems.

Keywords:

computer vision; neural networks; fractional calculus; object detection; denoising; segmentation; image generation

1. Introduction

Computer vision has become a transformative field with significant impact across various industries. Using computers to interpret visual information has been crucial for enhancing surveillance systems and aiding medical imaging for disease diagnosis.

Traditional computer vision involves many algorithms and methods that are used to extract meaningful information from images. These techniques typically include steps like image preprocessing, feature extraction, and classification or inference [1].

One of the fundamental tasks in computer vision is image segmentation, where traditional techniques such as thresholding, edge detection, and region-based methods are commonly employed to partition images into semantically meaningful regions or objects. Object detection, another important task, was traditionally addressed using techniques such as Haar cascades, Histogram of Oriented Gradients (HOG), and feature-based classifiers. These methods rely on manually designed features and classifiers to detect objects within images, often requiring the careful tuning of parameters and heuristics to achieve optimal performance. Similarly, image denoising, restoration, and enhancement have been tackled using conventional filtering techniques, such as median filtering, Gaussian blurring, and wavelet transforms, aimed at removing noise, restoring lost details, and improving overall image quality [2,3].

Fractional calculus (FC) is a powerful mathematical framework that extends integer-order derivatives and integrals to non-integer orders. This generalisation allows for more degrees of freedom and provides a more nuanced and flexible way to model systems with memory effects, long-range dependencies, and anomalous behaviours, which are prevalent in real-world phenomena [4]. In recent years, FC has gained attention across various fields, including physics [5], engineering [6], biology [7], finance [8], and medicine [9], due to its ability to capture complex dynamics that cannot be adequately described by classical integer-order derivatives.

Due to proven advantages, several FC-based approaches have been introduced in the literature to improve performance in computer vision tasks [10,11,12,13]. Although the traditional FC-based approaches have been instrumental in enabling various applications, they often struggle with complex, real-world scenarios and lack the ability to generalise well across different domains. These approaches rely heavily on manual feature engineering and human knowledge or assumptions on the ground-truth image (prior). Furthermore, FC-based approaches often face challenges in handling variability in lighting conditions, viewpoint changes, and occlusions, which are prevalent in real-world images [2].

FC has also been attracting increasing attention from researchers in the field of deep learning algorithms. The inherent ability to capture complex dynamics and temporal relationships is closely aligned with the objectives of ML, particularly in tasks that involve sequential data analysis, such as time-series forecasting, natural language processing, and sequential decision making [14,15,16].

Recently, neural networks (NNs) have demonstrated remarkable capabilities in learning complex patterns and features directly from raw data. In the context of computer vision, NNs have enabled breakthroughs in image denoising, enhancement, segmentation, restoration, and object detection. Moreover, constant efforts by the research community have also lead to reducing the computational cost of these tasks. By leveraging large datasets and powerful computational resources, NNs have surpassed traditional computer vision approaches, achieving state-of-the-art performances across a wide range of tasks. Furthermore, the ability of NNs to automatically learn and extract features from data has eliminated the need for manual feature engineering and enabled systems to adapt and generalise to diverse and complex visual environments [2].

More recently, FC has emerged in NN-based computer vision, offering a compelling approach to enhance the performance of NN architectures. Through incorporating non-integer orders derivatives, FC makes it possible to account for the intricate spatial and temporal relationships, combining local and global features inherent in images and videos [17]. For instance, NNs have been used to leverage the power of fractional-order differential mask operators, enabling the discovery of optimal mask orders to improve their performance in achieving specific image goals, such as denoising and enhancement [18]. Additionally, the introduction of fractional-order convolutional kernels has facilitated the compression of NN architectures, resulting in significant reductions in the number of trainable parameters [19]. These advancements underscore the versatility and efficacy of fractional derivatives in not only enhancing the performance of computer vision systems but also streamlining computational processes, thereby propelling the field toward more efficient and scalable solutions.

Based on the above, this work aimed to conduct a survey of the FC NN-based computer vision techniques presented in the literature, focusing on the tasks of denoising, enhancement, object detection, segmentation, restoration, and NN compression. The aim was to compile various approaches and provide a concise yet intuitive explanation of the underlying concepts. Additionally, we aimed to identify open questions and research directions that stem from each paper covered in this survey.

This paper is organised as follows. In Section 2, we provide a brief background to understand the basics of FC. Section 3 starts with a concise overview of computer vision and the various tasks it encompasses, namely denoising, enhancement, segmentation, object detection, restoration, and the compression of NN architectures. Subsequently, for each task, we present the methods in the literature that combine FC and NN-based computer vision, concluding with a brief discussion of open questions and potential research directions arising from each method. Finally, the paper ends in Section 4 with a summary of the findings and conclusions drawn. Additionally, Appendix A provides a summary table of all the methods discussed in this work, briefly detailing their advantages, experimental setups, and results as reported in the original papers.

2. Fractional Calculus

Contrary to popular belief, fractional differential calculus is not a recent subject. For example, the symbol

d^{n} y / d x^{n}

was first proposed by Leibniz, and, in 1695, L’Hôpital asked Leibniz about the meaning of

d^{1 / 2} y / d x^{1 / 2}

, effectively asking, “What if n is fractional?”. Leibniz responded,

“Although infinite series and geometry are distant relations, infinite series admits only the use of exponents that are positive and negative integers and does not, as yet, know the use of fractional exponents”.

He continued, “This is an apparent paradox from which, one day, useful consequences will be drawn”.

This correspondence can be seen as the beginning of fractional differential calculus. The term “fractional” comes from L’Hôpital’s question about the fraction

1 / 2

, although the order of differentiation can be any real or complex number. From this letter, we learn that classical and fractional differential calculus were conceived almost simultaneously [4,20,21,22]. In 1716, Leibniz died, but the interest in understanding derivatives of fractional (non-integer) order increased. Several other authors devoted their time to this subject, and the well-known Leonard Euler also contributed to the understanding and generalisation of fractional differential calculus. He extended the notion of the factorial,

n!

, to non-integer values. This extension was later named the Gamma function,

Γ (.)

, by Adrien-Marie Legendre around 1811:

Γ (z) = \int_{0}^{\infty} t^{z - 1} e^{- t} d t

(1)

where

ℜ (z) > 0

. The Gamma function plays a crucial role in defining generalised derivatives, as shown below.

The idea of generalising derivative and integral operators to an order n seems simple because one only needs to obtain operators that can be defined for non-integer n values and that match the classical operators when n is an integer. Consequently, several definitions have been proposed in the literature by different authors. Since this work is not a survey on fractional calculus, only three definitions of fractional derivatives will be considered here: the Riemann–Liouville and Caputo definitions (often used in applications in physics and engineering), and the Grünwald-Letnikov definition, which seems to be preferred in NN-based computer vision.

The Riemann–Liouville and Caputo fractional derivatives will now be introduced. However, before defining these derivatives, we will first provide the definition of a fractional integral. For that, recall the Fundamental Theorem of Calculus.

Theorem 1.

Fundamental Theorem of Calculus: Let

f : [a, b] \to R

be a continuous function, and let

F : [a, b] \to R

be defined by

F (x) = \int_{a}^{x} f (t) d t .

(2)

Then, F is differentiable, and

F^{'} = \frac{d F}{d t} = f .

(3)

In order to write in a more compact way some of the results to come, we now define the following differential and integral operators.

Definition 1.

Derivative and Integral Operators: We denote the differential operator that maps a function f into its derivative

D f (x) = f^{'} (x)

by D and the integral operator that maps a function

f (x)

into its primitive (whenever the integration can be performed on the compact interval

[a, b]

) by

J_{a}

:

J_{a} f (x) = \int_{a}^{x} f (t) d t, x \in [a, b]

(4)

These operators can be generalised to perform n-fold iterates:

D^{n} f (x) = \frac{d}{d t} \dots \frac{d}{d t} \frac{d f}{d t} = D^{1} D^{n - 1} f (x),

(5)

J_{a}^{n} f (x) = \int_{a}^{x} \dots \int_{a}^{x} \int_{a}^{x} f (t) d t = J_{a}^{1} J_{a}^{n - 1} f (x) .

(6)

The following Lemma introduces a way to write the n-fold integral using only one integral symbol.

Lemma 1.

n-fold Integration: Let f be Riemann integrable on

[a, b] .

Then, for

a \leq x \leq b

and

n \in N

, we have

J_{a}^{n} f (x) = \frac{1}{(n - 1)!} \int_{a}^{x} {(x - t)}^{n - 1} f (t) d t .

(7)

To generalise the previous integral to non-integer orders, one simply needs to replace

(n - 1)!

with the Gamma function defined earlier,

Γ (n)

, taking into account that

Γ (n) = (n - 1)!, n \in N

.

Definition 2.

Riemann–Liouville Fractional Integral: Let

n \in R_{+}

and

J_{a}^{n}

be the operator defined on

L_{1} [a, b]

by

J_{a}^{n} f (x) = \frac{1}{Γ (n)} \int_{a}^{x} {(x - t)}^{n - 1} f (t) d t, x \in [a, b] .

(8)

Then,

J_{a}^{n}

is called the Riemann–Liouville fractional integral operator of order n.

The fractional derivative is obtained by taking a derivative (of a certain integer order) of the fractional integral just defined. This implies that fractional derivatives may depend on integral operators.

Recall that in the classical case (integer orders), we have the following lemma.

Lemma 2.

n-fold Integration: Let

m, n \in N

with

m > n

, and let f be a function with a continuous

n^{th}

derivative on the interval

[a, b]

. Then,

D^{n} f = D^{m} J_{a}^{n - n} f .

(9)

Therefore, a generalisation of this lemma leads to the definition of a Riemann–Liouville fractional derivative [21].

Definition 3.

Riemann-Liouville Fractional Derivative: Let

α \in R_{+}

and

m = ⌈ α ⌉

. The Riemann–Liouville fractional derivative of order α (

{}_{a}^{R}D_{t}^{α} f

) is given by

{}_{a}^{R}D_{x}^{α} f (x) = D^{m} J_{a}^{m - α} f (x) = \frac{D^{m}}{Γ (m - α)} \int_{a}^{x} {(x - t)}^{m - α - 1} f (t) d t .

(10)

For

n = 0

, we have

{}_{a}^{R}D_{t}^{0} : = I

.

This definition of the fractional derivative generalises the classical case of integer-order derivatives. However, it may lead to properties that could be seen as less appealing. For instance, the Riemann–Liouville derivative of a constant is not zero. Although, if we exchange the order of integration and differentiation, this less appealing characteristic can be easily supressed, and we obtain a new definition of a fractional derivative, proposed by M. Caputo [23].

Definition 4.

Caputo Fractional Derivative: Let

α \in R_{+}

,

m = ⌈ α ⌉

and

D^{m} f (t) \in L_{1} ([a, b])

. The Caputo fractional derivative of order α

(_{a}^{C} D_{t}^{α} f)

is given by

{}_{a}^{C}D_{x}^{α} f (x) = J_{a}^{m - α} D^{m} f (x) = \frac{1}{Γ (m - α)} \int_{a}^{x} {(x - t)}^{m - α - 1} D^{m} f (t) d t .

(11)

Note the resemblance with the Riemann–Liouville fractional derivative.

These two definitions of fractional derivatives have expressions that depend on integrals. Therefore, besides the property of order generalisation, they are often used in modelling physical problems where memory is an important factor. At each instant, the integral computes the past history. The range of applications for generalised derivatives is immense. Consequently, many recent works have used these general operators.

There is another definition of a fractional derivative that has captured the attention of many researchers. This definition is based on classical differentiation and seems to be more intuitive. Grünwald (1867 [24]), Post (1930 [25]), and Letnikov (1872 [26]) presented the idea of the fractional derivative as the limit of a sum.

Itde variáveis is well known that a classical derivative can be approximated as a limit of difference quotients. For example,

f^{'} (x) = D^{1} f (x) = lim_{h \to 0} \frac{f (x) - f (x - h)}{h}

(12)

We know that

\begin{matrix} \nabla_{h}^{1} f (x) = & f (x) - f (x - h) \\ \nabla_{h}^{2} f (x) = & \nabla_{h}^{1} f (x) - \nabla_{h}^{1} f (x - h) = f (x) - 2 f (x - h) + f (x - 2 h) \\ ⋮ & ⋮ \\ \nabla_{h}^{n} f (t) = & \sum_{k = 0}^{n} {(- 1)}^{k} (\begin{matrix} n \\ k \end{matrix}) f (x - k h) . \end{matrix}

(13)

where

(\begin{matrix} n \\ k \end{matrix}) = \frac{n (n - 1) \dots (n - k + 1)}{k!}

is the binomial coefficient. Therefore, we can state the following.

Theorem 2.

Let

n \in N

,

f \in C^{n} ([a, b])

and

a < t \leq b

. Then,

D^{n} f (t) = lim_{h \to 0} \frac{\nabla_{h}^{n} f (t)}{h^{n}}

(14)

For example, a second-order derivative can be written as

D^{2} f (x) = lim_{h \to 0} \frac{f (x) - 2 f (x - h) + f (x - 2 h)}{h^{2}} .

(15)

Grünwald and Letnikov performed a generalisation of this result to non-integer n values, leading to the following definition of a fractional derivative.

Definition 5.

Let

α \in R_{+}

,

f (t) \in C^{⌈ α ⌉} ([a, b])

and

h_{N} = (t - a) / N

. The Grünwald–Letnikov fractional derivative of order α (

{}_{a}^{G L}D_{x}^{α} f

) is given by

{}_{a}^{G L}D_{x}^{α} f (x) = lim_{N \to \infty} \frac{\nabla_{h_{N}}^{α} f (t)}{h_{N}^{α}} .

(16)

with

\begin{matrix} \nabla_{h}^{α} f (x) = & \sum_{k = 0}^{\infty} {(- 1)}^{k} (\begin{matrix} α \\ k \end{matrix}) f (x - k h) . \end{matrix}

(17)

Note that

(\begin{matrix} α \\ k \end{matrix}) = \frac{Γ (α + 1)}{Γ (k + 1) Γ (α - k + 1)}

(18)

is the fractional binomial coefficient, a generalisation of the classical binomial coefficient to non-integer values [20].

To simplify the notation, we will drop the GL, and the Grünwald–Letnikov fractional derivative will be simply represented by

D^{α} f (x)

. It is also common to represent the derivative by making

h_{N} \to 0

, that is,

D^{α} f (x) = lim_{h_{N} \to 0} \frac{1}{h_{N}^{α}} \sum_{k = 0}^{\infty} {(- 1)}^{k} \frac{Γ (α + 1) f (x - k h)}{Γ (k + 1) Γ (α - k + 1)} .

(19)

This definition allows for the discretisation of fractional derivatives, enabling their computation using simple finite differences. Note that there are various definitions of fractional derivatives, each with its own advantages and disadvantages (the interested reader should consult [22]).

In computer vision, integer-order calculus plays a pivotal role. Techniques such as gradient-based edge detection use derivatives to identify abrupt changes in intensity, forming the foundation of edge detection algorithms. Furthermore, integral calculus finds applications in image processing tasks such as convolution and filtering, where convolution operations are analogous to computing the integral of a function over a given region. The mathematical principles of differential and integral calculus also form the foundation for numerous algorithms in feature extraction, object recognition, and image segmentation, allowing computers to effectively interpret and analyse visual information [27]. Therefore, in the following chapters, we present various works that use fractional calculus to enhance neural network-based computer vision. By fractional calculus, we mean works involving fractional derivatives or generalising integer-order operations (e.g., using the Gamma function instead of the factorial, or employing non-polynomials or polynomials of non-integer order instead of classical polynomials, etc). While some studies use fractional optimisation algorithms to optimise neural network parameters for computer vision tasks, this survey focuses exclusively on methods that leverage fractional calculus to modify or contribute to the architecture, such as in feature extraction or image enhancement.

3. Computer Vision

Computer vision encompasses a diverse range of tasks, ranging from denoising and enhancement to object detection, segmentation, and restoration [28], as shown in Figure 1.

Recent advancements in the literature have shown significant performance improvements in computer vision, and therefore, in this section, we introduce and elaborate on neural network architectures and techniques that use fractional calculus to enhance computer vision tasks. These innovative approaches not only improve the tasks’ performance but also contribute to reducing the computational cost associated with NN-based computer vision systems, making them more practical and scalable for real-world applications.

3.1. Denoising

Image denoising is the process of removing unwanted noise from digital images to enhance their visual quality and improve the accuracy of subsequent analysis or processing tasks. Noise in images can arise from various sources, including sensor limitations, transmission errors, or environmental factors during image capture. The goal of denoising algorithms is to distinguish between the true signal representing the underlying scene and the undesirable noise components, and then attenuate or eliminate the noise while preserving important image features. This typically involves applying filters or statistical techniques tailored to suppress noise, thus resulting in cleaner and more visually appealing images [29].

Image denoising using FC has emerged as a promising approach to address the challenges posed by noise in digital images, offering advantages over traditional methods. By leveraging the memory effects and long-range interactions inherent in fractional calculus, researchers have developed novel denoising algorithms capable of preserving image details while effectively suppressing noise. Several works in the literature have explored this approach, with more contributions continuously being made [30,31,32,33,34,35,36].

Due to their approximation capabilities, NNs have emerged as powerful tools in image denoising by learning the underlying structure of clean images and effectively differentiating between noise and true image features. These NNs are normally trained on pairs of noisy and clean images, where they learn to map noisy inputs to their corresponding clean versions. Convolutional neural networks (CNNs) are particularly well suited for image denoising tasks due to their ability to automatically extract hierarchical features from images. Through iterative training processes, NNs find the optimal parameters that minimise the difference between the denoised output and the clean ground-truth image [37]. This approach has shown remarkable success in various applications, such as medical imaging [38], surveillance systems [39], ocean biodiversity monitorisation [40], and others.

In the literature, only two works proposing the combination of fractional calculus (FC) and NN-based techniques for image denoising were found [41,42]. In [41], the authors propose formulating denoising as a variational problem, aiming to minimise a functional that incorporates fidelity to the observed noisy image and smoothness of the denoised image, by integrating non-integer order derivatives. These are integrated into the regularisation term of the variational model, facilitating the preservation of edges and textures while reducing noise, denoted as Fractional-order Total Variation. In [42], the authors use weights given by FDEs to propagate the feature maps from one layer to the next, giving rise to the Fractional Optimal Control Network.

3.1.1. Fractional-Order Total Variation

Total Variation (TV) regularisation is a core technique for image denoising that preserves important features such as edges and textures, with the ability to effectively reduce noise while maintaining sharp transitions between regions of an image. Unlike simpler techniques, such as linear smoothing or median filtering, which can blur edges and details, TV regularisation exploits the inherent sparsity in the gradient of the image, penalising rapid changes in pixel intensity.

TV regularisation can be formulated using the

L_{2}

norm as [41]

L_{TV} = \sum_{i = 1}^{I} \sum_{j = 1}^{J} \sqrt{{(X^{'} (i + 1, j) - X^{'} (i, j))}^{2} + {(X^{'} (i, j + 1) - X^{'} (i, j))}^{2}}

(20)

or using the

L_{1}

norm as

L_{TV} = \sum_{i = 1}^{I} \sum_{j = 1}^{J} | X^{'} (i + 1, j) - X^{'} (i, j) | + | X^{'} (i, j + 1) - X^{'} (i, j) |

(21)

where I and J are the width and length (number of pixels in the horizontal and vertical directions) of the image, respectively, i and j are the coordinates of a pixel in the image, and

X^{'}

is the denoised image [41]. The difference

X^{'} (i + 1, j) - X^{'} (i, j)

calculates the finite horizontal gradient, corresponding to the change in pixel value along the horizontal direction. Similarly,

X^{'} (i, j + 1) - X^{'} (i, j)

is the finite vertical gradient, representing the difference between adjacent pixels along the vertical direction.

The idea is straightforward. In addition to the classical loss function, which measures the difference between the ground truth and the result obtained from the neural network—

L_{error}

—a new loss function is introduced to aid in image denoising and artefact reduction. This new loss function is defined as

L_{total} = L_{error} + λ L_{TV},

(22)

where

λ

is the regularisation hyperparameter. This parameter balances the fit of the model to the observed data (data fidelity term) with the smoothness or sparsity enforced by the regularisation term. A small regularisation parameter might lead to overfitting, where the model closely matches the training data but does not generalise well to new data. Conversely, a large regularisation parameter could overly smooth or simplify the solution, potentially causing the loss of important features or details. In essence,

λ

controls the significance of the TV loss in the optimisation process. This exploits the inherent sparsity in the gradient of the image, penalising rapid changes in pixel intensity [43].

TV fails to fully use the information from neighbouring pixels, which can lead to artefacts in images. To address this limitation, fractional-order differences have been proposed for processing this gradients. The fractional variations inherently incorporate information from neighbouring pixels, allowing them to draw data not only from adjacent pixels but also from more distant ones. Consequently, fractional-order differences can theoretically capture richer pixel information and reduce artefacts.

Taking this into account, in [41], the authors proposed a new NN-based denoising model that incorporates Fractional-order TV (FTV) regularisation into the loss function. To the best of our knowledge, this was the first time FTV was used in conjunction with a NN; however, multiple studies have compared NN-based methods to FTV filters [44,45,46].

FTV is therefore an extension of TV regularisation proposed in [47], by introducing a fractional exponent

α

to the TV term.

In [41], the authors proposed a modified FTV regularisation that computes the fractional gradient in eight directions around each pixel, namely

x^{-}, y^{-}, x^{+}, y^{+}

, Left-Down Direction (LDD), Right-Up Direction (RUD), Left-Up Direction (LUD), and Right-Down Direction (RDD):

\begin{matrix} L_{FTV} = & \sum_{i = 1}^{I} \sum_{j = 1}^{J} | D_{x^{+}}^{α} X^{'} | + | D_{y^{+}}^{α} X^{'} | + | D_{x^{-}}^{α} X^{'} | + | D_{y^{-}}^{α} X^{'} | + | D_{L D D}^{α} X^{'} | + \\ | D_{R U D}^{α} X^{'} | + | D_{L U D}^{α} X^{'} | + | D_{R D D}^{α} X^{'} |, \end{matrix}

(23)

where each term

D^{α}

corresponds to applying a fractional differential mask (using an approximation to the Grünwald–Letnikov derivative) in the corresponding direction,

x^{-}, y^{-}, x^{+}, y^{+}

, LDD, RUD, LUD, or RDD. For example, in the horizontal direction x, we have that

D_{x^{+}}^{α} X^{'} = [\begin{matrix} \dots & \dots & \dots & \dots \\ 0 & 0 & 0 \\ C_{S_{- 1}} & C_{S_{0}} & \dots & C_{S_{n}} \\ 0 & 0 & \dots & 0 \\ \dots & \dots & \dots & \dots \end{matrix}], D_{x^{-}}^{α} X^{'} = [\begin{matrix} \dots & \dots & \dots & \dots \\ 0 & 0 & 0 \\ C_{S_{n}} & C_{S_{n - 1}} & \dots & C_{S_{- 1}} \\ 0 & 0 & \dots & 0 \\ \dots & \dots & \dots & \dots \end{matrix}] .

(24)

Here, n is the last entry, with entries derived from the Grünwald–Letnikov derivative given by

C_{S_{k}} = \frac{1}{Γ (- α)} [\frac{Γ (k - α + 1)}{(k + 1)!} \cdot (\frac{α}{4} + \frac{α^{2}}{8}) + \frac{Γ (k - α)}{k!} \cdot (1 - \frac{α^{2}}{4}) + \frac{Γ (k - α - 1)}{(k - 1)!} \cdot (- \frac{α}{4} + \frac{α^{2}}{8})],

(25)

with k as the entry of the fractional differential mask being computed [41].

FTV regularisation is then added to the loss function of deep learning methods with the aim of preserving texture and enhancing details [41].

L_{total} = L_{error} + λ L_{FTV},

(26)

The method of adding FTV to the loss function of NN-based models for image denoising is still very new and there has not been much research in this area yet. One significant study that used this method is found in [48], which focused on classifying environmental sounds. This shows that the field has many unexplored areas, such as understanding how the choice of the

α

value affects the results and whether it is possible to include more than one FTV regularisation term with different

α

values in the loss function.

3.1.2. Fractional Optimal Control Network

The Fractional Optimal Control Network (FOCNet) is an NN architecture designed for denoising tasks, leveraging the principles of Fractional Ordinary Differential Equations (F-ODEs) to propagate features depth-wise within the network. This architecture exploits the memory persistence inherent in F-ODEs, enabling enhanced denoising performance compared to traditional NN approaches. The underlying methodology of FOCNet involves solving a Fractional Optimal Control problem [42]:

\begin{array}{l} \min_{θ (t)} \frac{1}{2} \int_{ω} (Φ (u (T, s)) - x (s))^{2} d s \\ s . t . D_{t}^{α} u (t, s) = f (u (t, s), θ (t)), \\ u (0, s) = Ψ (y (s)), t \in [0, T] \end{array}

(27)

where

y (s)

represents the input image to be denoised, s is a pixel,

x (s)

denotes the corresponding ground-truth image,

Φ (\cdot)

and

Ψ (\cdot)

denote linear transformations (such as convolutions) that are predefined, and

u (t, s)

represents the control input. The function

f (u (t, s), θ (t))

describes the dynamics of the system, parameterised by an NN [42].

FOCNet conceptualises the NN as an infinite-depth architecture, where to each layer’s output, the previous layers’ outputs are added, multiplied by a weight

w_{k}

given by discretising an F-ODE using Grünwald–Letnikov [42], as shown in Figure 2:

u_{t + 1} = \sum_{k = 0}^{t} w_{k} u_{k} + σ (θ_{t} u_{t}), with w_{k} = {(- 1)}^{t - k + 2} (\binom{α}{t - k + 1}) .

(28)

where

u_{t + 1}

is the output of layer

t + 1

, k denotes all previous layers,

0 < k < t

, and

σ

is a nonlinear operation given by convolution followed by batch normalisation and a Rectified Linear Unit [42].

This approach enables the propagation of features from one layer to another throughout the network, enabling effective feature extraction and denoising. Thus, the usage of F-ODE allows us assign weights to each layers’ contribution to the end result [42].

The goal of FOCNet is to optimise the denoising process by minimising the difference between the denoised image and the ground-truth image. This is achieved through the iterative adjustment of the parameters

θ (t)

of the NN, guided by the solutions to the Fractional Optimal Control problem, (27). To discretise the fractional-order dynamic system inherent in FOCNet, the Grünwald–Letnikov fractional derivative definition is employed [42].

In comparison to traditional denoising NNs (not FC-based), the FDE enables the NN to assign weights to each layer’s contribution to the end result. This is because F-ODEs provide a mathematical framework for describing the dynamics of the system (denoising process), including the propagation of features and the influence of different network layers on the final output [42].

The literature has extensively demonstrated the benefits of using multiple scales of the same image to extract diverse features, thus enhancing the NN’s capacity for feature extraction. Building upon this, in [42], the authors extended FOCNet to incorporate multi-scale representations, giving rise to a multi-level architecture.

The multi-scale FOCNet architecture comprises multiple hierarchical levels, with each level representing a distinct scale and containing a dedicated FOCNet, (27). This modification enables the network to capture and retain both previous features and features across different scales, facilitating the long-term memory mechanisms inherent in FDEs. Unlike the standard FOCNet (27), the denoising process in the multi-scale version incorporates an additional step. This step involves the application of a function

g (x) = w T (x)

, where

w \in 0, 1

and

T (\cdot)

represents a pooling or unpooling operation, enabling the propagation of contributions from lower-level layers to higher-level layers. The denoising process within the multi-scale FOCNet can be formulated as [42]

\{\begin{matrix} D_{t}^{α} u (t, s, l_{1}) = f (u (t, s, l_{1}), g (u (t, s, l_{1 + 1}), θ_{1} (t))) \\ D_{t}^{α} u (t, s, l_{2}) = f (u (t, s, l_{2}), g (u (t, s, l_{2 \pm 1}), θ_{2} (t))) \\ \dots \\ D_{t}^{α} u (t, s, l_{i}) = f (u (t, s, l_{i}), g (u (t, s, l_{i + 1}), θ_{i} (t))) \\ u (0, s, l_{1}) = Ψ y (s) \\ u (0, s, l_{i}) = T_{↓} u (1, s, l_{i - 1}) \\ 1 \leq l_{i} \leq k \\ 0 \leq t \leq T, \end{matrix}

(29)

where

l_{i}

represents the FOCNet level i (with

l_{1}

denoting the original level),

θ_{i}

denotes the parameters of the corresponding level,

u_{t}^{l_{i} \pm 1}

denotes either the upper-level feature

u_{t}^{l_{i} + 1}

or the lower-level features

u_{t}^{l_{i} - 1}

, and

T ↓

is a pooling operation [42].

The computation to obtain the result of each layer is now more complex:

u_{t + 1}^{l_{i}} = \sum_{k = 0}^{t} w_{k} u_{k}^{l_{i}} + σ (θ_{i} (u_{t}^{l_{i}} + g (u_{t}^{l_{i} \pm 1}))),

(30)

aiming to strengthen the NN by promoting cross-level feature interactions [42], as shown in Figure 3.

The work presented in [42] prompts several research questions and potential directions for further investigation. Firstly, the choice of fractional derivative definition remains largely unexplored and warrants discussion or justification to provide insight into why a particular definition was selected over others. Moreover, considering that FOCNet extracts scaling features at various levels and combines them, there is potential for enhancing its performance by employing different strategies for combining these features. Currently, in FOCNet, each level’s features contribute equally, but introducing weighting factors could be an intriguing avenue to explore. Adjusting the contribution of features at different levels could potentially improve the network’s ability to capture and leverage hierarchical information effectively.

3.2. Enhancement

Image enhancement is the process of improving the visual quality or perception of digital images by manipulating their attributes such as brightness, contrast, sharpness, and colour balance. Unlike denoising, which specifically targets noise reduction, image enhancement aims to enhance the overall appearance of images to make them more visually appealing or suitable for specific applications. Enhancement techniques can range from simple adjustments like histogram equalisation or contrast stretching to more advanced algorithms such as image fusion or super-resolution. These techniques can be used to highlight important features, improve visibility in low-light conditions, or adapt images for specific display or analysis requirements [49].

Image enhancement using FC is a well-explored field with several works showing the advantages over traditional methods that rely on integer calculus. The construction of masks and filters with fractional orders has opened the possibility of attaining in-between behaviours of traditional masks, offering enhanced flexibility and performance in image enhancement tasks [10,50,51,52,53,54,55,56].

In image enhancement, NNs learn from image datasets to understand what constitutes an enhanced or improved version of an image. CNNs are a very popular architecture that, by analysing these image pairs, learns to identify patterns and relationships between low-quality features and their desired improvements. During image enhancement, the CNN takes a low-quality image as input and processes it through its layers, making adjustments to brightness, contrast, noise levels, and other visual aspects. Furthermore, NNs have also shown high efficacy in learning masks and filters to perform tasks such as edge enhancement, texture synthesis, and artefact removal [49]. The use of NNs for image enhancement has emerged as a powerful tool in several applications, such as the detection of diseases in plants [57], downsampling of temperature climatic maps [58], and enhancement of medical imaging [59].

The authors could only find two works leveraging NNs for improving image enhancement in the literature, which will be herein discussed [18,60].

In [18], the authors propose using NNs to learn the best fractional order for the masks, demonstrating promising results in image enhancement tasks.

In [60], the authors propose using fractional Rényi entropy to enhance images before feeding them into an NN for image segmentation purposes. The results show that a higher performance in segmentation is achieved when compared with other methods.

3.2.1. Neural Fractional-Order Adaptive Masks

Masks (or kernels) and filters are mathematical operators applied to images that enhance the quality of images, easing the extraction of relevant features such as patterns and structures. Masks are matrices that are applied to the pixels in an image to perform a specific operation, such as edge detection and convolution; two very popular masks are Sobel and Lapalace [27].

The Sobel mask consists of two

3 \times 3

convolution masks, one to detect horizontal changes and the other to detect vertical changes in intensity within an image [27]:

S_{x} = [\begin{matrix} + 1 & 0 & - 1 \\ + 2 & 0 & - 2 \\ + 1 & 0 & - 1 \end{matrix}] * y (s)

(31)

S_{y} = [\begin{matrix} + 1 & + 2 & - 1 \\ 0 & 0 & 0 \\ + 1 & - 2 & - 1 \end{matrix}] * y (s)

(32)

where

y

is the input image,

S_{x}

and

S_{y}

are the image with the horizontal and vertical gradients, respectively, and ∗ is a convolution operation. The resulting gradient magnitude image is obtained by combining the horizontal and vertical gradient images using the Euclidean norm [27]:

S = \sqrt{S_{x}^{2} + S_{y}^{2}}

(33)

This operation highlights regions of significant intensity variation, effectively detecting edges in the image. The Sobel mask is commonly used in edge detection and various other image processing tasks due to its simplicity and effectiveness. Sobel masks belong to the category of first-order masks since they compute the first-order derivative of image intensity with respect to spatial coordinates [27].

In contrast, the Laplace mask computes the second-order spacial derivative of image intensity, having the same mask for vertical and horizontal convolution [27]:

L_{x} = [\begin{matrix} 0 & + 1 & 0 \\ + 1 & - 4 & + 1 \\ 0 & 1 & 0 \end{matrix}] * y (s)

(34)

The Sobel and Laplace masks are both widely used for edge detection; however, their approach and characteristics differ, each offering unique advantages and disadvantages. The Sobel mask, which consists of a separate masks for horizontal

S_{x}

and vertical gradients

S_{y}

, excels at detecting edges with a clear orientation, providing detailed information about the direction of intensity changes within an image. Its structured design makes it robust to noise and suitable for detecting edges in noisy environments. However, the Sobel mask struggles to detect edges at corners or junctions accurately, as it only emphasises the dominant direction of change at each pixel. In contrast, the Laplace mask detects edges regardless of their orientation and is sensitive to abrupt intensity changes, making it effective in this case. Yet, it is more sensitive to noise compared to the Sobel mask, leading to potential false edges detections in noisy images [10,27].

Due to the advantages and disadvantages inherent in the first- and second-order masks, coupled with the established theoretical foundations of fractional calculus, the exploration and refinement of fractional-order masks has been an active and growing field of research [10,61]. These masks offer a balance, leveraging the precision of higher-order derivatives while retaining the adaptability and noise resilience typically associated with lower-order operators.

A problem that persists in fractional-order masks lies in their uniform treatment of the entire image with the same fractional order. This approach can lead to the excessive enhancement of low-spatial frequency content, potentially overshadowing the subtler details within the image, while simultaneously failing to adequately boost high-frequency components [18].

Keeping this in mind, in [18], the authors propose the Adaptive Fractional-order Differential (AFD) mask. This approach uses an NN to dynamically determine the optimal order of differentiation

α

. The goal is to train an NN to optimise the orders of the mask for any given image.

The AFD mask comprises two

3 \times 3

convolutional matrices, horizontal

A_{x}

and vertical

A_{y}

, derived from the Grünwald–Letnikov derivative definition (see [18] for details on the derivation process):

A_{x} (s) = [\begin{matrix} 0 & \frac{α_{s}^{2} - α_{s}}{2} & 0 \\ 0 & - α_{s} & 0 \\ 0 & 0 & 0 \end{matrix}] * y (s)

(35)

A_{y} (s) = [\begin{matrix} 0 & 0 & 0 \\ \frac{α_{s}^{2} - α_{s}}{2} & - α_{s} & 1 \\ 0 & 0 & 0 \end{matrix}] * y (s) .

(36)

The

α_{s}

values for each pixel of the image are determined by an NN with learnable parameters

θ

. These values are computed from the average gradient of the pixel in eight directions

M (i, j)

. To train the NN, the authors propose using a training dataset generated by employing a piece-wise function that dictates the

α_{s}

order of each pixel. This function is specified by the AFD Algorithm (AFDA) [18]:

α_{s} = \{\begin{matrix} \frac{M (i, j) - t_{g}}{M (i, j)} & if M (i, j) \geq t_{g} and \frac{M (i, j) - t_{g}}{M (i, j)} \geq \frac{A G_{e d} - Q}{A G_{e d}} \\ \frac{A G_{e d} - Q}{A G_{e d}} & if M (i, j) \geq t_{g} and \frac{M (i, j) - t_{g}}{M (i, j)} < \frac{A G_{e d} - Q}{A G_{e d}} \\ \frac{Q - A G_{t e x}}{Q} & if 2 < M (i, j) < t_{g} and \frac{M (i, j)}{t_{g}} \geq \frac{Q - M_{t e x}}{Q} \\ \frac{M (i, j)}{t_{g}} & if 2 < M (i, j) < t_{g} and \frac{M (i, j)}{t_{g}} < \frac{Q - A G_{t e x}}{Q} \\ 0 & if 0 \leq M (i, j) \leq 2 \end{matrix}

(37)

where

t_{g}

is a hyperparameter for the gradient threshold for edges, Q is the mean gradient of image

y

, and

A G_{e d}

and

A G_{t e x}

are the average gradients of edges and textures, respectively. The computation of

M (i, j)

is as follows [18]:

M (i, j) = \frac{| 8 y (i, j) - y (i - 1, j - 1) - y (i - 1, j) - y (i - 1, j + 1) - y (i, j - 1) - y (i, j + 1) - y (i + 1, j - 1) - y (i + 1, j) - y (i + 1, j + 1) |}{8} .

(38)

So, to generate the NN training dataset, the AFDA is used to compute the NN’s input values,

M (i, j)

, and their corresponding

α_{s}

orders, which serve as the ground-truth output. Subsequently, the NN is trained to minimise the error between these ground-truth

α_{s}

values and the predicted

{\hat{α}}_{s}

values, for example using the Mean Squared Error [18], as shown in Figure 4.

Upon the completion of training, the NN is capable of outputting the

α_{s}

orders of the masks to treat each corresponding pixel in the target image, providing the average gradient of the pixel as input [18].

The experimental results presented in [18] demonstrate that the proposed masks yield higher contrast, clearer edges, and enhance smooth areas and texture within the images. Additionally, it is observed that after training the NN to compute the mask orders, the computational requirements for determining these orders are reduced, leading to improved performance compared to using the AFDA by itself.

Although the approach presented in [18] demonstrates improvements over the manual computation of derivative orders, there remains considerable scope for enhancement and exploration. Firstly, the authors employ a relatively simple NN architecture, leaving room for experimentation with more complex architectures that could potentially yield better results. Additionally, the rationale behind the choice of the Grünwald–Letnikov definition is not explicitly addressed by the authors, which poses an open question for further investigation. Furthermore, the learned derivative orders may be constrained due to the training dataset being generated by the AFDA, which could limit the NN’s ability to learn optimal orders without extensive prior knowledge input. This suggests a need to explore alternative approaches to dataset generation and NN training to broaden the capabilities and flexibility of the model.

3.2.2. Fractional Rényi Entropy

Rényi entropy

R_{α} (y)

[62] is a measure of uncertainty or randomness within a probability distribution, commonly employed in information theory and computer vision to quantify diversity and uncertainty in pixel intensities or image features. For a greyscale image

y

, the Rényi entropy is defined as [63]

R_{α} (y) = \frac{1}{1 - α} log (\sum_{s = 0}^{255} p_{s}^{α}), α \in [0, \infty]

(39)

where

p_{s}

is a normalised histogram of pixel intensities, and

α \in N

is a parameter governing the focus on different parts of the probability distribution. When

α \to 1

, Rényi entropy reduces to Shannon entropy [64], indicating overall uncertainty in the image. For

α \neq 1

, Rényi entropy highlights various aspects of the distribution’s structure. As

α \to 0

, it emphasises the most frequent pixel intensities akin to min-entropy [65], suitable for capturing dominant image features. Conversely, as

α \to \infty

, it emphasises rare pixel intensities akin to max-entropy [65], which is valuable for detecting subtle textures or anomalies in the image [63].

Fractional Rényi entropy extends the concept of Rényi entropy by extending

α

to non-integer values, allowing for a more refined characterisation of the uncertainty and complexity within a probability distribution. This generalisation enables a spectrum of entropy measures between the limit cases of

α = 1

and

α = 0

[60].

In [60], the authors point out that image contrast and quality are major factors in the quality of image segmentation techniques. To address this, the authors propose using fractional Rényi entropy for image enhancement before employing a CNN for segmentation, as shown in Figure 5.

The enhanced image

\tilde{y}

is obtained from the input image

y

through pixel-wise multiplication formulated as follows (for details, see [60]):

\tilde{y} = y * \frac{R_{α} (p)}{α} (\sum_{i = 1}^{m} p_{i}^{α}) .

(40)

The fractional-order

α

is determined experimentally on the training dataset [60].

The findings in [60] suggest that employing fractional Rényi entropy enhances the robustness of the model against inhomogeneous intensity values and preserves spatial relationships between image pixels. Although promising, several questions remain unanswered, such as whether there are less time-consuming strategies for selecting the

α

value and whether the

α

value could be adaptively chosen for each image region. Furthermore, fractional Rényi entropy can provide advantages to other well-established networks, such as Transformers. Addressing these questions could further improve the effectiveness and efficiency of the proposed approach.

3.3. Object Detection

Object detection is the process of locating and classifying objects within digital images or video frames. The primary objective of object detection is to accurately identify and locate instances of predefined object classes within the image, using bounding boxes or outlines delineating their positions [28]. Then, a label or category is assigned to each detected object [66].

FC for object detection remains a relatively unexplored frontier, with limited research available in this area. Existing work focuses on exploiting fractional-order moments for feature extraction [67] and employing fractional-order populational and evolutionary optimisation strategies to refine the localisation of objects within images [68].

The field of object detection has undergone a revolutionary transformation with the widespread adoption of NNs, enabling the autonomous identification and location of objects. Numerous research efforts have propelled the development of several NN architectures with enhanced accuracy, faster inference speeds, or reduced computational costs. Among the most widely used architectures is the region-based convolutional neural network (RCNN) family [69], which includes variants such as Fast RCNN [70] and Faster RCNN [71]. These leverage a blend of convolutional layers for feature extraction and region proposal algorithms to pinpoint potential object locations. By scrutinising these regions individually, they are able to classify objects and predict bounding boxes. Moreover, the emergence of one-stage detectors, such as YOLO (You Only Look Once) [72] and SSD (Single-Shot Detector) [73] has enabled real-time object detection by directly predicting object classes and bounding boxes in a single pass. Through the usage of extensive datasets annotated with object labels, NNs acquire the ability to generalise across diverse object categories and accommodate variations in scale, orientation, and occlusion, thereby solidifying their indispensable role across applications spanning satellite surveillance [74,75], public parking management [76], robotics [77], and pest management [78].

The application of FC in NN-based techniques for object detection is still relatively new but shows promising results [67,79,80,81].

In [79], inspired by previous work [67], the authors propose leveraging fractional-order Legendre moments for feature extraction. These features are then used by an NN for object detection based on the extracted feature maps.

Additionally, in [80,81], the authors propose fractional-order population-based and evolution-based optimisation algorithms to improve the optimisation process of NN parameters. However, since this survey paper only focuses on methodologies that directly modify the NN architecture or preprocess input images before feeding them into the NNs, these works are not be covered.

Fractional-Order Legendre Moment Invariants

Image moments are mathematical descriptors that are used to characterise the spatial distribution and properties of intensity values within an image. They are computed by integrating the intensities of the pixels in an image

y (i, j)

, where i and j are the spatial coordinates. The image moment

M_{p, q}

is defined as [82]

M_{p, q} = \sum_{i = 1}^{I} \sum_{j = 1}^{J} i^{p} j^{q} y (i, j),

(41)

where p and q are non-negative integers representing the order of the moment. These moments provide insights into various image attributes, such as the centroid, area, orientation, and higher-order shape characteristics. Since these are translation dependent, central moments

μ_{p, q}

are often preferred, as they are invariant to translation [82]:

μ_{p, q} = \sum_{i = 1}^{I} \sum_{j = 1}^{J} {(i - \bar{i})}^{p} {(j - \bar{j})}^{q} y (i, j),

(42)

where

\bar{i}

and

\bar{j}

are the centroid of an image computed as

\bar{i} = \frac{M_{10}}{M_{00}}

and

\bar{j} = \frac{M_{01}}{M_{00}} .

Image moments are widely used for shape analysis and object recognition, as they provide valuable information about the location, size, and centres of objects within an image.

There are several moments commonly used in image analysis, each with unique properties and applications. Integer-order moments, such as Zernike, Legendre, and Chebyshev moments, are among the most widely employed due to their effectiveness in capturing different aspects of image content [82].

Legendre moments are a class of orthogonal moments used in image analysis to capture shape information and structural features within an image. These moments are derived from Legendre polynomials and are computed by integrating the pixel intensities of the image weighted by Legendre polynomials of integer degree, mathematically formulated as [82]

L_{p q} = \sum_{i = 1}^{I} \sum_{j = 1}^{J} P_{p} (i) P_{q} (j) y (i, j),

(43)

where

P_{p} (i)

and

P_{q} (j)

are Legendre polynomials of degree p and q, respectively. Legendre moments offer advantages such as orthogonality, compactness, and rotational invariance, making them well suited for tasks such as pattern recognition, shape analysis, and image retrieval. Additionally, their robustness to noise and illumination variations enhances their utility in real-world applications. Legendre moments also provide a concise representation of image content while preserving important geometric and structural information, contributing to the development of efficient and effective image-processing techniques [83]. The Legendre polynomials

P_{p} (i)

can be computed using a recurrence formula given by

P_{p + 1} (i) = \frac{(2 p + 1) (2 i - 1)}{p + 1} L_{p} (i) - \frac{p}{p + 1} L_{p - 1} (i), p \geq 1,

(44)

with

P_{0} (i) = 1

and

P_{1} (i) = 2 i - 1

.

P_{q} (j)

can be computed reciprocally.

Although integer-order moments offer valuable insights into the overall shape and spatial distribution, they may lack the sensitivity required to accurately represent intricate features [67,83].

Fractional-order moments extend the concept of integer-order moments to non-integer values of

α

and

β

, offering a more refined characterisation of image properties. These moments

M_{α, β}

are formulated as

M_{α, β} = \sum_{i = 1}^{I} \sum_{j = 1}^{J} i^{α} j^{β} y (i, j),

(45)

where

α

and

β

are non-integer values. Fractional-order moments offer enhanced sensitivity to subtle variations in image structure and texture, enabling more precise analysis and interpretation. This heightened sensitivity allows for improved accuracy in locating Regions of Interest within the image [67].

Fractional-order Legendre moments are an extension of integer-order Legendre moments (43). Through allowing non-integer values for the degree of Legendre polynomials, these moments provide a more flexible and adaptive framework for image analysis. The fractional Legendre moment

L_{α, β}

is formulated as [67]

L_{α, β} = \sum_{i = 1}^{I} \sum_{j = 1}^{J} P_{α} (i) P_{β} (j) y (i, j),

(46)

where the Legendre polynomials

P_{α} (i)

can be computed using a generalisation of recurrence Formula (47) and introducing the change in variable

i = 2 t^{a} - 1

(for details, see [79]):

P_{α + 1}^{a} (t) = \frac{(2 α + 1) (2 t^{a} - 1)}{α + 1} L_{α}^{a} (t) - \frac{α}{α + 1} P_{α - 1}^{a} (t), α \geq 1,

(47)

with

P_{0}^{α} (t) = 1

and

P_{1}^{α} (t) = 2 t^{α} - 1 .

P_{q + 1}^{β} (t)

can be computed reciprocally.

The fractional-order Legendre moments can be extended to be used in three dimensions, enabling the representation of the features of three-dimensional (3D) images much used in medical imaging. Thus, the 3D fractional Legendre moment

L_{α, β, γ}

can be formulated as

L_{α, β, γ} = \sum_{i = 1}^{I} \sum_{j = 1}^{J} \sum_{k = 1}^{K} P_{α} (i) P_{β} (j) P_{γ} (k) y (i, j, k),

(48)

where

P_{α} (i), P_{β} (j),

and

P_{γ} (k)

are the fractional-order Legendre polynomials of degrees

α, β,

and

γ

along the

i, j,

and k axes, respectively, and

y (i, j, k)

is a 3D image.

Motivated by the application of fractional-order moments for the classification of 2D objects, in [79] the authors propose using 3D fractional-order moments (for formulation details, see [79]) as an input descriptor of an NN, thus giving rise to a new NN architecture, the Fractional-Order Lagrange Moments Deep NN (FrOLM-DNN).

The 3D fractional-order Legendre moments serve as an input descriptor for an NN. For each image, the moments are computed, forming a descriptor vector containing moments up to order r, where r is a user-selected hyperparameter. Subsequently, this descriptor vector is fed into the input layer of an NN, enabling the network to learn and classify the object within the image accurately. In this way, the 3D fractional-order Legendre moments function as feature extractors, facilitating effective object classification [79], as shown in Figure 6.

The main motivation for employing fractional-order moments lies in their additional parameters, which offer the potential for improved results tailored to specific use cases. The experimental results described in [79] demonstrate that the integration of 3D fractional-order moments with an NN leads to improved classification accuracy compared to using them in isolation.

However, an open question raised in [79] is the impact of selecting the moments order r on the results and how to effectively determine the value of this hyperparameter. One potential avenue for future research involves devising a strategy to automate and optimise the selection of this parameter. By developing such a strategy, researchers can streamline the process of hyperparameter tuning and potentially improve the overall performance of the classification system.

3.4. Segmentation

Image segmentation is the process of partitioning a digital image into multiple segments or regions based on certain criteria such as colour, intensity, texture, or spatial proximity. The goal is to break an image into smaller parts; these segments often correspond to objects or regions of interest within the image, allowing for further analysis or manipulation at a more granular level [28].

The development of FC methods for image segmentation is a well-explored and rising field, with various approaches explored in the recent literature. Some methods employ fractional-order optimisation algorithms to refine segmentation accuracy, while others introduce novel loss functions incorporating fractional orders. Additionally, established segmentation algorithms have been extended to accommodate fractional-order operations, enhancing performance on images with intricate textures [12,84,85,86,87,88].

CNNs are particularly well suited for segmentation tasks due to their ability to effectively capture spatial dependencies and hierarchies of features within images. One of the most well-known architectures is U-Net, which employs an encoder–decoder structure with skip connections to preserve spatial information and enhance segmentation accuracy [89]. Through training on annotated datasets, NNs learn to delineate boundaries and assign pixel-level labels to different regions within an image, effectively segmenting objects from the background or distinguishing between different object classes. Several successful applications can be found in the literature, including the medical analysis of chest scans [90], fingerprint security devices [91], and the detection of road cracks [92].

While some research endeavours have incorporated FC into NN-based methods for image segmentation, the majority have primarily relied on fractional-order population-based optimisation algorithms to enhance the optimisation process of NN parameters [93,94]. However, this survey paper focuses on methodologies that directly modify the NN architecture or preprocess input images before feeding them into the NNs. In light of this, we highlight two notable works [17,95].

In [17], the authors use FOCNet [42] for image segmentation. In contrast, [95] proposes the use of fractional-order differentiation active contour models for segmentation, employing an NN to solve fractional-order PDEs and thereby reduce the computational complexity associated with fractional-order active contour models.

3.4.1. Active Contour Detection with Fractional-Order Regularisation Term

Active contour models are tools particularly useful for segmenting objects with complex or ambiguous boundaries. These models are represented as a parametric curve or contour that evolves over time to minimise an energy functional. The contour is attracted to features of interest in the image while being constrained by factors such as smoothness and shape.

Level-set functions (LSF) are improved active contours that implicitly represent the contour as the zero level set of a higher-dimensional function defined over a larger domain that includes the entire image. The evolution of the contour is described by the evolution of the level-set function, governed by a fractional-order partial differential equation (FPDE). The level-set methods evolve smoothly over the entire domain, and the contour is extracted as the zero level set at each time step. Level-set methods offer advantages in handling topological changes and complex contour deformations, making them suitable for tasks where the object boundaries are ill defined or undergo significant changes over time [95].

Despite their differences, level-set methods and active contour models share the goal of accurately segmenting images by evolving contours to capture object boundaries. Level-set methods can be seen as a generalisation of active contour models, where the contour evolution is described implicitly through the evolution of a level-set function. This connection allows for the incorporation of active contour energy terms into level-set formulations, enhancing their versatility, robustness, and adaptability [95].

Variational level-set methods extend basic level-set techniques by incorporating variational principles into the formulation, enabling the optimisation of a variational energy functional to evolve contours over time. Through formulating the segmentation problem as an optimisation task, variational level-set methods provide a systematic framework for integrating various constraints, prior knowledge, and image features into the segmentation process. This approach offers improved convergence, stability, and flexibility, and has been successfully applied to challenging segmentation problems involving complex object shapes, noisy images, and topological changes [96].

The first instance of combining FDEs with active contour methods was introduced in [97]. In this work, a fractional-order differentiation active contour model was proposed, employing variational level-set methods. The energy function proposed in this model comprises three terms: a fractional-order fitting term, a regularisation term, and a penalty term. The fractional-order term enables a more precise representation of the image and improves the robustness to noise, while the penalty term ensures stable evolution [95,97].

Despite these advantages, this method incurs a significant computational burden [95]. To address computational cost, [95] proposes using cellular neural networks (CeNNs) [98] to solve FPDEs instead of finite difference numerical schemes (see [95] for the mathematical formulation). CeNNs, which are composed of locally connected neurons arranged in a grid-like structure, offer stability, noise robustness, and time efficiency in computing solutions to FPDEs for active contour methods [95].

Due to the usage of FPDEs, which describe the temporal evolution of a system, an open area of research is the application of this approach for object tracking in temporal image sequences. Additionally, considering the popularity of physics-informed neural networks [99], it would be interesting to explore replacing CeNNs with this architecture in future studies.

3.4.2. FOCNet for Segmentation

In [17], the authors recognise the potential of FOCNet beyond denoising tasks. They propose to use the weighted skip connections of FOCNet to improve the image segmentation performance, as shown in Figure 2. To facilitate segmentation using FOCNet, the authors suggest employing the Dice Coefficient as the loss function, which quantifies the similarity between the ground truth and predicted segmentation masks. Hence, the distinction between FOCNet as proposed in [42] and in [17] lies in the choice of loss function and the nature of the training data. For denoising tasks, the input comprises a noisy image, with the corresponding ground truth being the denoised version of the image [42]. Conversely, in segmentation tasks, the input consists of the original image, while the ground truth represents the segmented image [17].

The usage of FOCNet for segmentation has demonstrated computational efficiency and outperformed other segmentation methods. The presence of fractional derivatives that determine the weights of the connections enables the propagation of information from shallower layers [17].

3.5. Restoration

Image restoration is the process of recovering an undistorted version of a digital image from a damaged or corrupted input. Degradation in images can occur due to various factors such as sensor limitations, time degradation, or environmental factors. The goal is to complete the degraded images, making them suitable for analysis, interpretation, or presentation purposes. Restoration/inpainting techniques aim to reverse or mitigate the effects of degradation by applying mathematical models, filters, or learning-based algorithms to estimate and recover the original image underlying.

The usage of FC for enhancing image restoration and inpainting has received considerable attention in the literature, with numerous studies showcasing its performance advantages. These works demonstrate the efficacy of FC in modelling the intricate spatial and temporal variations present in image structures, leveraging information from surrounding areas to achieve more accurate restoration and inpainting results. By seamlessly integrating fractional-order techniques, these approaches effectively capture the nuanced in-between behaviours often overlooked by traditional integer-order methods [100,101,102,103,104].

The field of restoration has risen in popularity mostly due to architectures based on Variational Autoencoders (VAEs) [105] and Generative Adversarial Networks (GANs) [106] that offer novel approaches to image reconstruction. VAEs are renowned for their ability to learn rich probabilistic models of data, enabling them to effectively capture complex distributions in image space. By encoding input images into a latent space and then decoding them back to their original form, VAEs can learn to inpaint images while also providing uncertainty estimates [107]. On the other hand, GANs introduce a competitive training scheme between a generator network and a discriminator network, resulting in the generation of highly realistic images. In the context of restoration, GANs excel in producing visually convincing results by learning intricate details and textures from training data [108]. The success of NN architectures in image restoration has sparked various applications, including restoring damaged historical texts [109], removing unwanted objects [110], and the completion of medical scans [111].

The combination of FC and NN-based methods for image restoration/inpainting is almost unexplored, with only one work available [112]. Given the potential already demonstrated in the literature with the several works that are not using NNs, it comes easily that this is a promising field.

In their work [112], the authors introduce an innovative approach that leverages fractional-order wavelet transforms to enhance feature extraction within an NN encoder. Their methodology extends beyond traditional techniques by integrating fractional-order operations into the encoding process, enabling more nuanced and comprehensive feature representation. Moreover, the authors propose a novel strategy for image generation, where multiple fractional-order encoders are employed to produce different representations of the same image. These representations are subsequently merged to create a single composite image, with enhanced detail and richness.

Fractional Wavelet Scattering Networks

GANs [106] and VAEs [105] represent two influential paradigms in the realm of generative modelling, a field aiming to understand and replicate the underlying structure of data. GANs operate on a game-theoretic framework where a generator network competes against a discriminator network, iteratively improving the generation of data until it becomes indistinguishable from real data. On the other hand, VAEs adopt a probabilistic approach, aiming to encode and decode data by modelling the underlying probability distribution. While GANs excel in generating high-quality, realistic samples, VAEs offer a principled way to learn latent representations of data and perform tasks such as data compression and synthesis. Both methods have witnessed tremendous advancements and applications across various domains, revolutionising tasks like image generation, data augmentation, and anomaly detection.

GANs and VAEs are notorious for their challenging training dynamics, including unstable training and issues like blurred images or model collapse. To address these challenges, Generative Scattering Networks (GSNs) were introduced, leveraging wavelet scattering networks (ScatNets) as encoders and CNNs as decoders [113].

ScatNets are mathematical models designed to extract meaningful features from signals, particularly in the domain of generative modelling. They operate by performing a cascade of operations on the input signal

y (t)

in each layer of the ScatNet

S_{n}

: wavelet transform, in which

y (t)

is convolved with wavelet function

ψ_{λ} (t)

at different scales

λ

,

S_{1} y (t) = | y (t) ψ_{λ} (t) |

; modulus nonlinearity, in which after each wavelet transform, the modulus operation is applied element-wise

S_{n} y (t) = | S_{n - 1} y (t) |

; and pooling, in which pooling operations are performed to aggregate information across scales

S_{p o o l} y (t) = T_{↑} | S_{n} y (t) |

. Through this hierarchical process, ScatNets create a series of increasingly invariant and abstract representations of the input signal, capturing both local and global structures.

After extracting the features with a ScatNet, Principal Component Analysis (PCA) is applied to reduce the dimensional resulting in the latent space vector

z

. Then, the decoder uses a CNN for deconvolution and to output the predicted image

\tilde{y}

, as shown in Figure 7.

GSNs simplify training by avoiding the need to learn ScatNet parameters, yet they may suffer from reduced image quality due to limitations in ScatNets’ expressiveness and overfitting induced by PCA in dimensionality reduction [112].

In response to these limitations, Generative Fractional Scattering Networks (GFRSNs) were proposed as an extension of GSNs in [112]. GFRSNs aim to address the overfitting issue by introducing a more suitable dimensional reduction method, thus enhancing GSN performance.

GFRSN includes an encoder composed of novel components, a Fractional Wavelet Scattering Network (FrScatNet) and a Feature Map Fusion (FMF) dimensional reduction method [112].

FrScatNet extends the concepts of ScatNets to non-integer order wavelet transforms by introducing a fractional convolution operator

Θ_{α}

to the wavelet transform operation formulated as [112]

S_{1} y (t) = | x (t) Θ_{α} ψ_{λ} | = e^{- \frac{j}{2} t^{2} cot (θ) [y (t) e^{\frac{j}{2} t^{2} cot (θ)} ψ_{λ} (t)]},

(49)

where

α

is the fractional order, and

θ = \frac{α π}{2}

is the rotation angle. Note that FrScatNet can be reduced to a ScatNet when

α = 1 .

In [112], after employing FrScatNet for feature extraction, the authors propose using an FMF method instead of PCA, as used in ScatNets. The rationale behind this choice is that PCA fails to consider the semantic differences present in the features extracted by FrScatNets across different layers, thus overlooking the hierarchical information contained within the features. The reduced dimensional feature map after applying FMF to the features extracted by FrScatNet is the latent space

z

(for more details on FMF, see [112]).

FrScatNets are equipped with a hyperparameter

α

that determines the fractional order of the convolution. Since

α

can be chosen arbitrarily, different

α

values lead to the extraction of distinct features. Consequently, using FrScatNets with varying

α

values results in the generation of multiple feature vectors. To fully exploit this diversity in feature extraction, the authors propose merging the image predictions obtained from different

α

values using an image fusion technique. This approach allows for embedding the input in different fractional-order domains to enhance the quality of the generated images. The image fusion method proposed can be formulated as [112]

{\tilde{y}}_{α_{1}, α_{2}} = ω {\tilde{y}}_{α_{1}} + (1 - ω) {\tilde{y}}_{α_{2}},

(50)

where

{\tilde{y}}_{α_{1}}

and

{\tilde{y}}_{α_{2}}

are image predictions generated by feature extraction using an FrScatNet with fractional-orders

α_{1}

and

α_{2}

, respectively. The hyperparameter

ω

acts as a weighting factor that determines the contribution of each predicted image [112], Figure 8.

The introduction of GFRSN in [112] opens up several research gaps that warrant further investigation. First, there is a need to explore the methodologies to select the appropriate

α

value for FrScatNets. Identifying which

α

values yield the most distinct features would be beneficial, as combining such features could potentially enhance the completion of the predicted output. Furthermore, while the proposed FMF method takes the average of the third-layer’s feature maps, there is room for exploring more sophisticated weighting strategies. Using alternative weight strategies could potentially improve the efficacy of the dimensionality reduction technique used in GFRSNs.

3.6. Compression

Contemporary advancements in deep learning have propelled the field to achieve remarkable performances across a spectrum of tasks, encompassing image classification, semantic segmentation, object detection, pose detection, and beyond. This progress is underpinned by the evolution of increasingly complex architectures, housing millions, and even billions of trainable parameters, thereby augmenting model efficacy. However, the proliferation of such massive architectures poses challenges concerning memory and computational resources, hindering seamless deployment in edge computing devices and other resource-constrained environments.

Reducing the computational cost of NN architectures for denoising, image restoration, segmentation, object detection, and enhancement is a critical endeavour in computer vision research. Given the increasing demand for real-time and resource-efficient solutions, optimising these architectures for computational efficiency is essential for practical deployment in various applications. Techniques for reducing computational cost typically focus on minimising the number of operations, parameters, or memory footprint required by the models while maintaining satisfactory performance.

In our survey of the literature, we uncovered two works that harness FC to alleviate the computational burden associated with NN-based computer vision tasks [19,114].

In [114], the authors introduce fractional max-pooling (FMP), a novel technique designed to address the limitations of traditional max-pooling methods. Through incorporating fractional-order principles, fractional max-pooling allows for overlapping windows and mitigates information loss during the pooling process, thereby enhancing feature preservation while reducing the reduction size. Conversely, in [19], the authors present a groundbreaking approach by deriving fractional convolutional filters tailored for popular integer-order filters such as Gaussian, Sobel, and Laplacian. This innovative formulation not only improves filter performance but also offers a compelling advantage: a reduction in the number of parameters is required, which remains constant regardless of filter size. Moreover, these fractional filters exhibit intermediate behaviours, providing a versatile tool for capturing nuanced features within the image data.

3.6.1. Fractional Max-Pooling

Max-pooling is a fundamental operation in CNNs used for downsampling feature maps, reducing computational complexity, and extracting dominant features. Given an input feature map with dimensions

I_{i n} \times J_{i n} \times D_{i n}

, where

I_{i n}

,

J_{i n}

, and

D_{i n}

are the width, height, and number of channels, respectively, max-pooling partitions the input into non-overlapping regions and computes the maximum value within each region to produce the output [114].

Let

l \times l

denote the size of the pooling window, where l is typically a small integer; usually,

l = 2

is used as default. The output dimensions of the feature map after max-pooling are given by

I_{o u t} = ⌊\frac{I_{i n}}{l}⌋

and

J_{o u t} = ⌊\frac{J_{i n}}{l}⌋ .

The pooling regions

P_{i, j}

are computed by dividing the input feature map

I_{i n} \times I_{i n}

by the size of the output feature map

I_{o u t} \times I_{o u t}

(considering

J_{i n} = I_{i n}

and

J_{o u t} = I_{o u t}

) [114]:

P_{i, j} \subset {1, 2, \dots, I_{i n}}^{2} with (i, j) \in {1, \dots, I_{o u t}}^{2},

(51)

where the size of the pooling windows can be computed as

P_{i, j} = [2 i - 1, 2 i] \times [2 j - 1, 2 j] .

(52)

Then, for each pooling region, the maximum pixel value of the input feature map

I n_{P_{i, j}}

will be kept and used to compose the output feature map

O u t_{i, j}

:

O u t_{i, j} = max (I n_{P_{i, j}}) .

(53)

Max-pooling introduces translation invariance and reduces the spatial dimensions of the feature maps, which helps in controlling overfitting and improving computational efficiency. Through retaining only the maximum activations within each pooling region, max-pooling focuses on preserving the most salient features while discarding less relevant information, facilitating hierarchical feature learning in CNNs [114].

The limitations of max-pooling in CNNs are indeed well recognised in the literature [115,116,117]. One major drawback is its fixed pooling window size, which may not adequately capture diverse spatial patterns within feature maps, especially in scenarios where objects vary significantly in scale or orientation. Additionally, the non-overlapping nature of max-pooling leads to a loss of spatial information between adjacent pooling regions, potentially discarding valuable details crucial for tasks like object localisation. Moreover, default max-pooling results in a rapid reduction in the size of hidden layers

\frac{I_{i n}}{I_{o u t}} \approx 2

, necessitating the use of multiple stacked convolutional layers to achieve significant depth. While some methods have been proposed to address this issue [115,116], they still result in a halving of the size of hidden layers, highlighting the need for a gentler approach to spatial pooling [114].

A potential solution lies in adopting a more flexible approach to pooling that reduces the size of hidden layers by a smaller factor. By incorporating additional layers of pooling, each with a smaller reduction factor, we can observe the input image at different scales, potentially leading to an easier recognition of distinctive features indicative of specific object classes. This approach could lead to more effective feature extraction and improve the performances of CNNs in various computer vision tasks [114].

In [114], the authors introduce FMP to address the issue of controlling the reduction in the spatial size of images by a fractional-order

1 < α < 2

. Additionally, FMP introduces flexibility by allowing overlapping pooling regions, thus preserving spatial information more effectively. The pooling regions

P_{i, j}

in FMP can either be overlapping squares or disjoint collections of rectangles. To generate

P_{i, j}

, two hyperparameters are needed:

a_{i}

and

b_{j}

, with

i, j \in 0, \dots, I_{o u t}

. In considering

a_{i}

and

b_{j}

as two increasing sequences with a step of 1 ending at

I_{i n}

, then the overlapping pooling regions can be computed as [114]

P_{i, j} = [a_{i - 1}, a_{i} - 1] \times [b_{j - 1}, b_{j} - 1],

(54)

and the disjoint pooling regions as

P_{i, j} = [a_{i - 1}, a_{i}] \times [b_{j - 1}, b_{j}] .

(55)

The output dimensions of the feature map after FMF are given by

I_{o u t} = ⌊\frac{I_{i n}}{α}⌋

and

J_{o u t} = ⌊\frac{J_{i n}}{α}⌋

, where

α

is the fractional-order of reduction

\frac{I_{i n}}{I_{o u t}} \in (1, 2)

[114].

FMP holds promise in enhancing the performance of CNNs as well as in reducing their computational cost without losing information. Several research directions stemming from this work [114] could be pursued. For instance, investigating optimal combinations of reduction ratios and overlap factors in FMF could lead to improved feature representation while minimising information loss. Additionally, exploring the integration of FMF with other techniques, such as attention mechanisms or adaptive pooling strategies, may result in synergistic improvements in both model performance and efficiency.

3.6.2. Fractional Convolutional Filters

In [19], authors propose learning reduced representations of convolutional filters through combining fractional calculus and NNs, originating fractional convolutional filters.

As seen previously, convolutional filters are used in NNs to extract features from images, with the most popular filters being Gaussian, Sobel and Laplacian. These are intimately connected through the derivatives of a Gaussian filter. The first derivative of the Gaussian corresponds to the Sobel operator, emphasising edges in a particular direction, while the second derivative of the Gaussian, known as the Laplacian of the Gaussian, enhances areas of rapid intensity change regardless of direction. Mathematically, the relationship can be expressed as follows [19]:

G (i, j) = \frac{1}{2 π s t d^{2}} e^{- \frac{i^{2} + j^{2}}{2 s t d^{2}}}, \frac{\partial G (i, j)}{\partial i} \propto j e^{- \frac{i^{2} + j^{2}}{2 s t d^{2}}}, \frac{\partial^{2} G (i)}{\partial i^{2}} = 4 i^{2} e^{- i^{2}} - 2 e^{- i^{2}},

(56)

where

s t d

is the standard deviation.

Mathematically, using the Grünwald–Letnikov definition and considering the first 15 terms of the Taylor series, the authors extend this to the fractional-order derivative of order

α

of the Gaussian

D^{α} G (i, j)

[19]:

D^{α} G (i) = \frac{A}{h^{α}} \sum_{m = 0}^{15} \frac{Γ (α + 1) G (i)}{{(- 1)}^{m} Γ (m + 1) Γ (1 - m + α)},

(57)

where

Γ

is the Gamma function.

The fractional derivative of a Gaussian provides a versatile framework for deriving a range of filters, including Gaussian, Sobel, Laplacian, and more. By selecting specific values for the fractional order

α

, we can directly obtain the corresponding traditional filters:

α = 0

yields the Gaussian filter;

α = 1

, the Sobel filter; and

α = 2

, the Laplacian filter. Remarkably, by varying

α

, we can interpolate between these filters, creating a continuum of filter behaviours that smoothly transition between them. This flexibility enables the creation of general customised filters whose behaviour ranges between the previously referred filters [19]. A 2D fractional convolutional filter is thus defined as

F (i, j) = A D^{α} D^{β} e^{- \frac{{(i - i_{o})}^{2} + {(j - j_{o})}^{2}}{s t d^{2}}},

(58)

where

D^{α} D^{β} G (i, j) = D^{α} G (i) \times D^{β} G (j)

. The 3D fractional convolutional filter is given by

F (i, j, k) = A D^{α} D^{β} D^{γ} e^{- \frac{{(i - i_{o})}^{2} + {(j - j_{o})}^{2} + {(k - k_{o})}^{2}}{s t d^{2}}},

(59)

where

α, β,

and

γ

are the orders of the fractional derivatives, and

s t d, A, i_{o}, j_{o},

and

k_{o}

are the parameters that define the filter.

One significant computational benefit of fractional convolutional filters is that the number of parameters required to describe them remains constant, regardless of the filter size, offering a substantial reduction in the number of parameters compared to integer-order filters. For instance, instead of needing

l \times l

parameters for an

l \times l

pixel-wise dimension filter, fractional convolutional filters require only a maximum of 6 (2d) or 9 (3d) parameters. This reduction in parameter count simplifies the model and enhances computational efficiency, making fractional convolutional filters an attractive option for various applications in image processing and computer vision [19].

An important result from [19] is that the introduction of fractional convolutional filters in CNNs allowed us to achieve a record for the smallest model that could achieve an accuracy greater than

99 %

on the MNIST dataset [19]. In building upon this milestone, several promising research directions emerge. Firstly, exploring the applicability of fractional convolutional filters across diverse datasets and domains can unveil their generalisability and robustness. Furthermore, delving into the theoretical foundations and mathematical properties of fractional convolutional filters may lead to deeper insights into their underlying mechanisms and ease the development of more sophisticated variants.

4. Conclusions

In this survey paper, we conducted an extensive exploration of the integration of fractional calculus with neural network-based computer vision methodologies, focusing on denoising, enhancement, object detection, segmentation, restoration, and neural network compression tasks; see a summary in Appendix A Table A1. While fractional calculus’s application in computer vision is well established, its incorporation into neural network-based approaches is still in its early stages, with limited works in the literature. Nonetheless, the results from existing studies demonstrate notable performance improvements, indicating the potential for further advancements in this area. Through our investigation, we identified several research gaps and outlined potential directions for future exploration in each of the studied works.

In real-world conditions, image capture often occurs under less-than-ideal circumstances, resulting in complex artefacts and challenges such as corruption and varying light conditions. Addressing these challenges has spurred the development of numerous methods aimed at enhancing neural network performances in computer vision tasks. The versatility of fractional calculus methods, with their increased degrees of freedom capable of modelling nuanced behaviours, holds promise in overcoming the hardships encountered in computer vision tasks.

The goal of this survey is to provide an overview of the current research and to offer intuitive explanations of the proposed methods. We aim to demystify fractional calculus and motivate its usage over integer-order methods by providing accessible explanations. We understand that fractional calculus can appear daunting, but we hope that this survey serves to dispel misconceptions and inspire researchers to contribute to the incorporation of fractional calculus into neural network-based computer vision.

In addition to the identified research gaps, we suggest investigating the influence of non-integer order derivatives on additional aspects of convolutional neural networks architectures, such as regularisation techniques, attention mechanisms, and transfer learning. Exploring these avenues could yield novel strategies for model optimisation and performance enhancement. Additionally, given the time dependence inherent in fractional calculus, we envisage that leveraging the memory capabilities could advance video object detection and generation tasks.

Furthermore, it is imperative to explore the interpretability and explainability of fractional calculus in neural network-based computer vision models. Understanding how fractional-order operations affect feature representations and decision-making processes within the network can provide valuable insights into model behaviour and foster trust among researchers and field experts. Furthermore, investigating the generalisability and robustness of fractional calculus-enhanced neural networks across diverse datasets and real-world scenarios is crucial to ensure their practical applicability. Finally, the scalability and computational efficiency of fractional calculus-based methods merit further investigation, particularly in resource-constrained environments and large-scale applications.

Author Contributions

Conceptualization, C.C.; methodology, C.C.; validation C.C., M.F.P.C. and L.L.F.; formal analysis, C.C. and L.L.F.; investigation, C.C.; writing—original draft preparation, C.C.; writing—review and editing, C.C., M.F.P.C. and L.L.F.; supervision, M.F.P.C. and L.L.F. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge the funding by Fundação para a Ciência e Tecnologia (Portuguese Foundation for Science and Technology) through CMAT projects UIDB/00013/2020 and UIDP/00013/2020 and the funding by the FCT and Google Cloud partnership through projects CPCA-IAC/AV/589164/2023 and CPCA-IAC/AF/589140/2023. C. Coelho would like to thank FCT for the funding through the scholarship with reference 2021.05201.BD. This work was also financially supported by national funds through the FCT/MCTES (PIDDAC), under project 2022.06672.PTDC—iMAD—Improving the Modelling of Anomalous Diffusion and Viscoelasticity: solutions to industrial problems.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

List of symbols.
Symbol	Meaning
$α, β, γ$	Fractional-orders of the the derivatives
$D^{α} f (x)$	Fractional derivative of order $α$ of function $f (x)$
$D^{2} f (x)$	Second-order derivative of function $f (x)$
h	Step size
$t_{0}$	Arbitrary initial time
$t_{f}$	Arbitrary final time
$y (s)$	Input image
s	Pixel
$Φ (\cdot), Ψ (\cdot)$	Linear transformations
$u (t, s)$	Control input
$θ$	NN learnable parameters
$w_{k}$	Weight of layer k
$T_{↑}, T_{↓}$	Pooling and unpooling operations
$σ$	Nonlinear operation
$l_{i}$	FOCNet of level i
$θ_{i}$	Learnable parameters of level i
$u_{t}^{l_{i} \pm 1}$	Upper-level and lower-level features
$I, J, K$	Pixel-wise width, length, and depth of an image
$i, j, k$	Coordinates of a pixel
$X^{'}$	Denoised image
$L (θ)$	Loss function
$λ$	Hyperparameter
$S$	Sobel filter
$L$	Laplace filter
$A$	AFD mask
$M (i, j)$	Average gradient of a pixel
$t_{g}$	Gradient threshold
Q	Mean gradient of an image
$A G_{e d}, A G_{t e x}$	Average gradients of edges and textures
$R_{α}$	Rényi entropy
$p_{s}$	Normalised histogram of pixel intensities
M	Image moment
$μ$	Central moment
$\bar{x}, \bar{y}$	Image centroid
P	Legendre polynomial
L	Legendre moment
$\tilde{Y}$	Generated image
$S_{n}$	ScatNet layer
$S_{p o o l}$	ScatNet pooling
$Θ_{α}$	Fractional convolution operator
$ψ_{λ} (t)$	Wavelet function at scale $λ$
$ω$	Weighting hyperparameter
$G (i, j)$	Gaussian operator
$s t d$	Standard deviation
$F (i, j)$	Fractional convolutional filter
$i_{o}, j_{o}, k_{o}$	Parameters of a filter
l	Pixel-wise dimension of a filter
$I_{i n}, J_{i n}, D_{i n}$	Width, height, and channels of the input feature map
$I_{o u t}, J_{o u t}, D_{o u t}$	Width, height, and channels of the output feature map
$P_{i, j}$	Pooling region
$I n$	Input feature map
$O u t$	Output feature map
$a_{i}, b_{j}$	Hyperparameters for computing pooling windows
List of abbreviations.
Abbreviation	Meaning
HOG	Histogram of Oriented Gradients
FC	Fractional Calculus
NN	Neural Network
ML	Machine Learning
CNN	Convolutional Neural Network
FDE	Fractional Differential Equation
FPDE	Fractional Partial Differential Equation
FOCNet	Fractional Optimal Control Network
F-ODE	Fractional Ordinary Differential Equations
TV	Total Variation
FTV	Fractional-order Total Variation
LDD	Left-Down Direction
RUD	Right-Up Direction
LUD	Left-Up Direction
RDD	Right-Down Direction
AFD	Adaptive Fractional-order Differential
AFDA	Adaptive Fractional-order Differential Algorithm
RCNN	Region-based Convolutional Neural Network
YOLO	You Only Look Once
SSD	Single-Shot Detector
FrOLM-DNN	Fractional-Order Lagrange Moments Deep Neural Network
LSF	Level-Set Function
PDE	Partial Differential Equation
CeNN	Cellular Neural Network
VAE	Variational Autoencoder
GAN	Generative Adversarial Network
GSN	Generative Scattering Network
ScatNet	Wavelet Scattering Network
PCA	Principal Component Analysis
GFRSN	Generative Fractional Scattering Networks
FrScatNet	Fractional Wavelet Scattering Network
FMF	Feature Map Fusion
FMP	Fractional Max-Pooling

Appendix A. Summary Table

Table A1. Summary table of the methods discussed in this survey.

Task	Proposal	Advantages	Experimental Setup	Results
Denoising	Fractional-Order Total Variation [41]	Incorporates information from neighbouring pixels Reduces artefacts	Dataset: Google Maps images Comparison: same network with total and fractional-order total variation loss Metrics: peak signal noise ration, structural similarity index, and universal quality	Best performance metric Better performance in preserving texture details
	Fractional Optimal Control Network [42]	Propagates features depth-wise within an NN	Dataset: Set12 [118], BSD68 [119], and Urban100 [120], with three induced noise levels Comparison: BM3D [121], WNNM [122], TNRD [123], DnCNN [118], FFDNet [124], RED [125], MemNet [126], and $N^{3} N e t$ [127] Metric: average peak signal-to-noise ratio	Leading performance metric results Similar computational cost
Enhancement	Neural Fractional-order Adaptive Masks [18]	Captures the advantages and reduce the disadvantages of first- and second-order masks	Dataset: chest X-ray images, ultrasonic images, and pelvis radiography images, Comparison: same network with and without neural fractional-order adaptive masks Metrics: information entropy, mean absolute difference coefficient, and absolute mean brightness error	Better performance metrics Offers higher contrast and clearer edges Enhances smooth areas and texture
	Fractional Rényi Entropy [60]	Refined characterisation of the uncertainty and complexity within a probability distribution Enhances robustness and preserves spatial relationships between image pixels	Dataset: kidney magnetic resonance imaging scans Comparison: Hasan et al. [128], Li et al. [129], Ibrahim et al. [130], Alaa et al. [131], and DLSS [132] Metric: accuracy	Achieves best performance metrics Improves fine details with low contrast
Object Detection	Fractional-Order Legendre Moment Invariants [67]	Fractional-order moments have additional parameters, offering improved results	Dataset: AR database of faces Comparison: LMs [133], DTMs [134], DKMs [135], ZMs [133], OFMs [136], and BMs [137] Metrics: statistical normalisation image reconstruction error and correct classification percentages	Better performance metrics Higher noise robustness
Segmentation	Active Contour Detection With Fractional-Order Regularisation Term [95]	More precise representation of the image Improves the robustness to noise	Dataset: synthetic and medical images Comparison: GAC [138], Chan-Vese [139], Chunming Li [140], Lankton [141], Shi [142], RSFLG [143], and LPF [144] Metrics: Dice Similarity Coefficient, peak signal-to-noise ratio, Hausdorff distance, structural similarity index measure, and mean sum of squares distance	Higher robustness and effectiveness Time-efficient Easy implementation
	FOCNet For Segmentation [17]	Propagation of information from shallower layers	Dataset: Massachusetts Road Dataset [145] and Ottawa Road Dataset [146] Comparison: U-Net [89], Dlinknet [147], HsgNet [148], Dense-UNet [149], SUNet [149], and SDUNet [149] Metrics: recall, precision, Dice Coefficient, accuracy, and mean intersection over union	Better performance metrics Lesser loss of information and computationally efficient
Restoration	Fractional Wavelet Scattering Networks [112]	More suitable dimensional reduction method that considers features extracted across different layers Improves reduced image quality and overfitting	Dataset: CIFAR-10 [150] and CelebA [151] Comparison: Principal Component Analysis [152] Metrics: peak signal-to-noise ratio and structural similarity	Better performance metrics Generates better images
Compression	Fractional Max-Pooling [114]	Preserves spatial information More effective feature extraction	Dataset: MNIST [153], CIFAR-100 [150], The Online Handwritten Assamese Characters Dataset [154], CASIA-OLHWDB1.1 [155], and CIFAR-10 [150] Comparison: max-pooling Metric: accuracy	Better performance metrics Better way of encoding information
	Fractional Convolutional Filters [19]	Reduced number of parameters to describe filters	Dataset: MNIST [153], CIFAR-10 [150], ImageNet [156], and UCF101 [157] Comparison: LeNet [153], LeNet5 [153], 50-50-200-10NN [158], Best Practices [159], CNN for MNIST and All-CNN [160], MobileNetV1 [161], MobileNetV2 [162], ShuffleNet 8G [163], ShuffleNet 1G [163], HENet [164], ResNet18 [165] for CIFAR-10, and ResNet18 [165] for UCF101 Metric: accuracy	Achieved new record for the smallest model with >99% accuracy on MNIST Effective for filter compression Reduces computational cost

References

Szeliski, R. Computer Vision: Algorithms and Applications; Springer Nature: Cham, Switzerland, 2022. [Google Scholar]
Dey, S. Hands-On Image Processing with Python: Expert Techniques for Advanced Image Analysis and Effective Interpretation of Image Data; Packt Publishing Ltd.: Birmingham, UK, 2018. [Google Scholar]
Prince, S.J. Computer Vision: Models, Learning, and Inference; Cambridge University Press: Cambridge, UK, 2012. [Google Scholar]
Herrmann, R. Fractional Calculus: An Introduction for Physicists; World Scientific: Singapore, 2011. [Google Scholar]
Abdou, M.A. An analytical method for space–time fractional nonlinear differential equations arising in plasma physics. J. Ocean Eng. Sci. 2017, 2, 288–292. [Google Scholar] [CrossRef]
Alquran, M. The amazing fractional Maclaurin series for solving different types of fractional mathematical problems that arise in physics and engineering. Partial Differ. Equ. Appl. Math. 2023, 7, 100506. [Google Scholar] [CrossRef]
Ionescu, C.; Lopes, A.; Copot, D.; Machado, J.T.; Bates, J.H. The role of fractional calculus in modeling biological phenomena: A review. Commun. Nonlinear Sci. Numer. Simul. 2017, 51, 141–159. [Google Scholar] [CrossRef]
Ma, Y.; Li, W. Application and research of fractional differential equations in dynamic analysis of supply chain financial chaotic system. Chaos Solitons Fractals 2020, 130, 109417. [Google Scholar] [CrossRef]
Jan, A.; Srivastava, H.M.; Khan, A.; Mohammed, P.O.; Jan, R.; Hamed, Y. In vivo HIV dynamics, modeling the interaction of HIV and immune system via non-integer derivatives. Fractal Fract. 2023, 7, 361. [Google Scholar] [CrossRef]
Zhang, Y.; Pu, Y.; Zhou, J. Construction of fractional differential masks based on Riemann-Liouville definition. J. Comput. Inf. Syst. 2010, 6, 3191–3199. [Google Scholar]
Yang, Q.; Chen, D.; Zhao, T.; Chen, Y. Fractional calculus in image processing: A review. Fract. Calc. Appl. Anal. 2016, 19, 1222–1249. [Google Scholar] [CrossRef]
Ghamisi, P.; Couceiro, M.S.; Benediktsson, J.A.; Ferreira, N.M. An efficient method for segmentation of images based on fractional calculus and natural selection. Expert Syst. Appl. 2012, 39, 12407–12417. [Google Scholar] [CrossRef]
Tian, D.; Xue, D.; Wang, D. A fractional-order adaptive regularization primal–dual algorithm for image denoising. Inf. Sci. 2015, 296, 147–159. [Google Scholar] [CrossRef]
Coelho, C.; Costa, M.F.P.; Ferrás, L. Neural Fractional Differential Equations. arXiv 2024, arXiv:2403.02737. [Google Scholar]
Alsaade, F.W.; Al-zahrani, M.S.; Yao, Q.; Jahanshahi, H. A Model-Free Finite-Time Control Technique for Synchronization of Variable-Order Fractional Hopfield-like Neural Network. Fractal Fract. 2023, 7, 349. [Google Scholar] [CrossRef]
Boroomand, A.; Menhaj, M.B. Fractional-order Hopfield neural networks. In Proceedings of the Advances in Neuro-Information Processing: 15th International Conference, ICONIP 2008, Auckland, New Zealand, 25–28 November 2008; Revised Selected Papers, Part I 15. Springer: Berlin/Heidelberg, Germany, 2009; pp. 883–890. [Google Scholar]
Arora, S.; Suman, H.K.; Mathur, T.; Pandey, H.M.; Tiwari, K. Fractional derivative based weighted skip connections for satellite image road segmentation. Neural Netw. 2023, 161, 142–153. [Google Scholar] [CrossRef]
Krouma, H.; Ferdi, Y.; Taleb-Ahmedx, A. Neural adaptive fractional order differential based algorithm for medical image enhancement. In Proceedings of the 2018 International Conference on Signal, Image, Vision and their Applications (SIVA), Guelma, Algeria, 26–27 November 2018; pp. 1–6. [Google Scholar]
Zamora, J.; Vargas, J.A.C.; Rhodes, A.; Nachman, L.; Sundararajan, N. Convolutional filter approximation using fractional calculus. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 383–392. [Google Scholar]
Diethelm, K.; Ford, N.J. Analysis of fractional differential equations. J. Math. Anal. Appl. 2002, 265, 229–248. [Google Scholar] [CrossRef]
Ross, B. A brief history and exposition of the fundamental theory of fractional calculus. In Fractional Calculus and Its Applications; Springer: Berlin/Heidelberg, Germany, 1975; pp. 1–36. [Google Scholar]
Machado, J.T.; Kiryakova, V.; Mainardi, F. Recent history of fractional calculus. Commun. Nonlinear Sci. Numer. Simul. 2011, 16, 1140–1153. [Google Scholar] [CrossRef]
Caputo, M. Linear Models of Dissipation whose Q is almost Frequency Independent—II. Geophys. J. Int. 1967, 13, 529–539. [Google Scholar] [CrossRef]
Grünwald, A. Ueber, Begrenzte, Derivationen und Deren Anwendung. Z. Math. Phys 1867, 12, 441–480. [Google Scholar]
Post, E.L. Generalized differentiation. Trans. Am. Math. Soc. 1930, 32, 723–781. [Google Scholar] [CrossRef]
Letnikov, A. An explanation of the concepts of the theory of differentiation of arbitrary index. Mosc. Matem. Sb. 1872, 6, 413–445. [Google Scholar]
Jain, R.; Kasturi, R.; Schunck, B.G. Machine Vision; McGraw-Hill: New York, NY, USA, 1995; Volume 5. [Google Scholar]
Bishop, C.M.; Bishop, H. Deep Learning: Foundations and Concepts; Springer Nature: Cham, Switzerland, 2023. [Google Scholar]
Komatsu, R.; Gonsalves, T. Comparing U-Net based models for denoising color images. AI 2020, 1, 465–486. [Google Scholar] [CrossRef]
Liu, Y.; Pu, Y.; Zhou, J. Design of image denoising filter based on fractional integral. J. Comput. Inf. Syst. 2010, 6, 2839–2847. [Google Scholar]
Liu, Y. A digital image denoising method based on fractional calculus. J. Sichuan Univ. Eng. Sci. Ed. 2011, 43, 90–95+144. [Google Scholar]
Azerad, P.; Bouharguane, A.; Crouzet, J.F. Simultaneous denoising and enhancement of signals by a fractal conservation law. Commun. Nonlinear Sci. Numer. Simul. 2012, 17, 867–881. [Google Scholar] [CrossRef]
Zheng, W.; Xianmin, M. Fractional-Order Differentiate Adaptive Algorithm for Identifying Coal Dust Image Denoising. In Proceedings of the 2014 International Symposium on Computer, Consumer and Control, Taichung, Taiwan, 10–12 June 2014; pp. 638–641. [Google Scholar]
Pan, X.; Liu, S.; Jiang, T.; Liu, H.; Wang, X.; Li, L. Non-causal fractional low-pass filter based medical image denoising. J. Med. Imaging Health Inform. 2016, 6, 1799–1806. [Google Scholar] [CrossRef]
Li, D.; Jiang, T.; Jin, Q.; Zhang, B. Adaptive fractional order total variation image denoising via the alternating direction method of multipliers. In Proceedings of the 2020 Chinese Control And Decision Conference (CCDC), Hefei, China, 22–24 August 2020; pp. 3876–3881. [Google Scholar]
Al-Shamasneh, A.R.; Ibrahim, R.W. Image denoising based on quantum calculus of local fractional entropy. Symmetry 2023, 15, 396. [Google Scholar] [CrossRef]
Ilesanmi, A.E.; Ilesanmi, T.O. Methods for image denoising using convolutional neural network: A review. Complex Intell. Syst. 2021, 7, 2179–2198. [Google Scholar] [CrossRef]
Jifara, W.; Jiang, F.; Rho, S.; Cheng, M.; Liu, S. Medical image denoising using convolutional neural network: A residual learning approach. J. Supercomput. 2019, 75, 704–718. [Google Scholar] [CrossRef]
Singh, P.; Shankar, A. A novel optical image denoising technique using convolutional neural network and anisotropic diffusion for real-time surveillance applications. J. Real-Time Image Process. 2021, 18, 1711–1728. [Google Scholar] [CrossRef]
Chandra, I.S.; Shastri, R.K.; Kavitha, D.; Kumar, K.R.; Manochitra, S.; Babu, P.B. CNN based color balancing and denoising technique for underwater images: CNN-CBDT. Meas. Sens. 2023, 28, 100835. [Google Scholar] [CrossRef]
Bai, Y.C.; Zhang, S.; Chen, M.; Pu, Y.F.; Zhou, J.L. A fractional total variational CNN approach for SAR image despeckling. In Proceedings of the Intelligent Computing Methodologies: 14th International Conference, ICIC 2018, Wuhan, China, 15–18 August 2018; Proceedings, Part III 14. Springer: Cham, Switzerland, 2018; pp. 431–442. [Google Scholar]
Jia, X.; Liu, S.; Feng, X.; Zhang, L. Focnet: A fractional optimal control network for image denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 6054–6063. [Google Scholar]
Rudin, L.I.; Osher, S.; Fatemi, E. Nonlinear total variation based noise removal algorithms. Phys. D Nonlinear Phenom. 1992, 60, 259–268. [Google Scholar] [CrossRef]
Yang, X.; Zhang, J.; Liu, Y.; Zheng, X.; Liu, K. Super-resolution image reconstruction using fractional-order total variation and adaptive regularization parameters. Vis. Comput. 2019, 35, 1755–1768. [Google Scholar] [CrossRef]
Wang, Q.; Gao, Z.; Xie, C.; Chen, G.; Luo, Q. Fractional-order total variation for improving image fusion based on saliency map. Signal Image Video Process. 2020, 14, 991–999. [Google Scholar] [CrossRef]
Zhang, X.; Yan, H. Medical image fusion and noise suppression with fractional-order total variation and multi-scale decomposition. IET Image Process. 2021, 15, 1688–1701. [Google Scholar] [CrossRef]
Jun, Z.; Zhihui, W. A class of fractional-order multi-scale variational models and alternating projection algorithm for image denoising. Appl. Math. Model. 2011, 35, 2516–2528. [Google Scholar] [CrossRef]
Yazgaç, B.G.; Kırcı, M. Fractional-order calculus-based data augmentation methods for environmental sound classification with deep learning. Fractal Fract. 2022, 6, 555. [Google Scholar] [CrossRef]
Lepcha, D.C.; Goyal, B.; Dogra, A.; Sharma, K.P.; Gupta, D.N. A deep journey into image enhancement: A survey of current and emerging trends. Inf. Fusion 2023, 93, 36–76. [Google Scholar] [CrossRef]
Pu, Y. Fractional calculus approach to texture of digital image. In Proceedings of the 2006 8th International Conference on Signal Processing, Guilin, China, 16–20 November 2006; Volume 2. [Google Scholar]
Huang, G.; Xu, L.; Chen, Q.; Men, T. Image Enhancement Using a Fractional-Order Differential. In Proceedings of the 4th International Conference on Computer Engineering and Networks: CENet2014, Shanghai, China, 19–20 July 2014; Springer: Cham, Switzerland, 2015; pp. 555–563. [Google Scholar]
Saadia, A.; Rashdi, A. Fractional order integration and fuzzy logic based filter for denoising of echocardiographic image. Comput. Methods Programs Biomed. 2016, 137, 65–75. [Google Scholar] [CrossRef] [PubMed]
Lei, J.; Zhang, S.; Luo, L.; Xiao, J.; Wang, H. Super-resolution enhancement of UAV images based on fractional calculus and POCS. Geo-Spat. Inf. Sci. 2018, 21, 56–66. [Google Scholar] [CrossRef]
AbdAlRahman, A.; Ismail, S.M.; Said, L.A.; Radwan, A.G. Double fractional-order masks image enhancement. In Proceedings of the 2021 3rd Novel Intelligent and Leading Emerging Sciences Conference (NILES), Giza, Egypt, 23–25 October 2021; pp. 261–264. [Google Scholar]
Aldawish, I.; Jalab, H.A. A Mathematical Model for COVID-19 Image Enhancement based on Mittag-Leffler-Chebyshev Shift. Comput. Mater. Contin. 2022, 73, 1307–1316. [Google Scholar]
Miah, B.A.; Sen, M.; Murugan, R.; Gupta, D. Developing Riemann–Liouville-Fractional Masks for Image Enhancement. Circuits Syst. Signal Process. 2024, 43, 3802–3831. [Google Scholar] [CrossRef]
Yogeshwari, M.; Thailambal, G. Automatic feature extraction and detection of plant leaf disease using GLCM features and convolutional neural networks. Mater. Today Proc. 2023, 81, 530–536. [Google Scholar] [CrossRef]
Accarino, G.; Chiarelli, M.; Immorlano, F.; Aloisi, V.; Gatto, A.; Aloisio, G. Msg-gan-sd: A multi-scale gradients gan for statistical downscaling of 2-meter temperature over the euro-cordex domain. AI 2021, 2, 600–620. [Google Scholar] [CrossRef]
He, C.; Li, K.; Xu, G.; Yan, J.; Tang, L.; Zhang, Y.; Wang, Y.; Li, X. Hqg-net: Unpaired medical image enhancement with high-quality guidance. IEEE Trans. Neural Netw. Learn. Syst. 2023, 1–15. [Google Scholar] [CrossRef]
Jalab, H.A.; Al-Shamasneh, A.R.; Shaiba, H.; Ibrahim, R.W.; Baleanu, D. Fractional Renyi entropy image enhancement for deep segmentation of kidney MRI. Comput. Mater. Contin. 2021, 67, 2061–2075. [Google Scholar]
Ferdi, Y. Some applications of fractional order calculus to design digital filters for biomedical signal processing. J. Mech. Med. Biol. 2012, 12, 1240008. [Google Scholar] [CrossRef]
Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics; University of California Press: Berkeley, CA, USA, 1961; Volume 4, pp. 547–562. [Google Scholar]
Karmeshu. Entropy Measures, Maximum Entropy Principle and Emerging Applications; Springer: Berlin/Heidelberg, Germany, 2003. [Google Scholar]
Bromiley, P.; Thacker, N.; Bouhova-Thacker, E. Shannon entropy, Renyi entropy, and information. Stat. Inf. Ser. (2004-004) 2004, 9, 2–8. [Google Scholar]
Zhu, S.C.; Wu, Y.N.; Mumford, D. Minimax entropy principle and its application to texture modeling. Neural Comput. 1997, 9, 1627–1660. [Google Scholar] [CrossRef]
Ibraheam, M.; Li, K.F.; Gebali, F.; Sielecki, L.E. A performance comparison and enhancement of animal species detection in images with various r-cnn models. AI 2021, 2, 552–577. [Google Scholar] [CrossRef]
Xiao, B.; Li, L.; Li, Y.; Li, W.; Wang, G. Image analysis by fractional-order orthogonal moments. Inf. Sci. 2017, 382, 135–149. [Google Scholar] [CrossRef]
Kumar, S.P.; Latte, M.V. Modified and optimized method for segmenting pulmonary parenchyma in CT lung images, based on fractional calculus and natural selection. J. Intell. Syst. 2019, 28, 721–732. [Google Scholar] [CrossRef]
Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99. [Google Scholar] [CrossRef]
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar]
Coelho, C.; Costa, M.F.P.; Ferrás, L.L.; Soares, A.J. Object detection with retinanet on aerial imagery: The algarve landscape. In Proceedings of the International Conference on Computational Science and Its Applications, Cagliari, Italy, 13–16 September 2021; Springer: Cham, Switzerland, 2021; pp. 501–516. [Google Scholar]
Tang, Z.; Liu, X.; Chen, H.; Hupy, J.; Yang, B. Deep learning based wildfire event object detection from 4K aerial images acquired by UAS. AI 2020, 1, 166–179. [Google Scholar] [CrossRef]
Albuquerque, L.; Coelho, C.; Costa, M.F.P.; Ferrás, L.; Soares, A. Improving public parking by using artificial intelligence. AIP Conf. Proc. 2023, 2849, 220003. [Google Scholar]
Gunturu, S.; Munir, A.; Ullah, H.; Welch, S.; Flippo, D. A spatial AI-based agricultural robotic platform for wheat detection and collision avoidance. AI 2022, 3, 719–738. [Google Scholar] [CrossRef]
Barbedo, J.G.A. Detecting and classifying pests in crops using proximal images and machine learning: A review. AI 2020, 1, 312–328. [Google Scholar] [CrossRef]
El Ogri, O.; Karmouni, H.; Sayyouri, M.; Qjidaa, H. 3D image recognition using new set of fractional-order Legendre moments and deep neural networks. Signal Process. Image Commun. 2021, 98, 116410. [Google Scholar] [CrossRef]
Zhou, M.; Li, B.; Wang, J. Optimization of Hyperparameters in Object Detection Models Based on Fractal Loss Function. Fractal Fract. 2022, 6, 706. [Google Scholar] [CrossRef]
Mahaveerakannan, R.; Anitha, C.; Thomas, A.K.; Rajan, S.; Muthukumar, T.; Rajulu, G.G. An IoT based forest fire detection system using integration of cat swarm with LSTM model. Comput. Commun. 2023, 211, 37–45. [Google Scholar] [CrossRef]
Castleman, K.R. Digital Image Processing; Prentice Hall Press: Saddle River, NJ, USA, 1996. [Google Scholar]
Deepika, C.L.; Kandaswamy, A.; Vimal, C.; Satish, B. Palmprint authentication using modified legendre moments. Procedia Comput. Sci. 2010, 2, 164–172. [Google Scholar] [CrossRef]
Kamaruddin, N.; Abdullah, N.A.; Ibrahim, R.W. Image segmentation based on fractional non-markov poisson stochastic process. Pak. J. Stat. 2015, 31, 557–574. [Google Scholar]
Tang, Q.; Gao, S.; Liu, Y.; Yu, F. Infrared image segmentation algorithm for defect detection based on FODPSO. Infrared Phys. Technol. 2019, 102, 103051. [Google Scholar] [CrossRef]
Kamaruddin, N.; Maarop, N.; Narayana, G. Fractional Active Contour Model for Edge Detector on Medical Image Segmentation. In Proceedings of the 2020 2nd International Conference on Image, Video and Signal Processing, Marrakesh, Morocco, 4–6 June 2020; pp. 39–44. [Google Scholar]
Vivekraj, A.; Sumathi, S. Resnet-Unet-FSOA based cranial nerve segmentation and medial axis extraction using MRI images. Imaging Sci. J. 2023, 71, 750–766. [Google Scholar] [CrossRef]
Geng, N.; Sheng, H.; Sun, W.; Wang, Y.; Yu, T.; Liu, Z. Image segmentation of rail surface defects based on fractional order particle swarm optimization 2D-Otsu algorithm. In Proceedings of the International Conference on Algorithm, Imaging Processing, and Machine Vision (AIPMV 2023), Qingdao, China, 15–17 September 2023; Volume 12969, pp. 56–59. [Google Scholar]
Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
De Silva, M.S.; Narayanan, B.N.; Hardie, R.C. A patient-specific algorithm for lung segmentation in chest radiographs. AI 2022, 3, 931–947. [Google Scholar] [CrossRef]
Chhabra, M.; Ravulakollu, K.K.; Kumar, M.; Sharma, A.; Nayyar, A. Improving automated latent fingerprint detection and segmentation using deep convolutional neural network. Neural Comput. Appl. 2023, 35, 6471–6497. [Google Scholar] [CrossRef]
Zhang, T.; Wang, D.; Lu, Y. ECSNet: An accelerated real-time image segmentation CNN architecture for pavement crack detection. IEEE Trans. Intell. Transp. Syst. 2023, 24, 15105–15112. [Google Scholar] [CrossRef]
Marques, F.; De Araujo, T.P.; Nator, C.; Saraiva, A.; Sousa, J.; Pinto, A.M.; Melo, R. Recognition of simple handwritten polynomials using segmentation with fractional calculus and convolutional neural networks. In Proceedings of the 2019 8th Brazilian Conference on Intelligent Systems (BRACIS), Salvador, Brazil, 15–18 October 2019; pp. 245–250. [Google Scholar]
Nirmalapriya, G.; Agalya, V.; Regunathan, R.; Ananth, M.B.J. Fractional Aquila spider monkey optimization based deep learning network for classification of brain tumor. Biomed. Signal Process. Control 2023, 79, 104017. [Google Scholar] [CrossRef]
Lakra, M.; Kumar, S. A fractional-order PDE-based contour detection model with CeNN scheme for medical images. J. Real-Time Image Process. 2022, 19, 147–160. [Google Scholar] [CrossRef]
Li, C.; Xu, C.; Gui, C.; Fox, M.D. Level set evolution without re-initialization: A new variational formulation. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 430–436. [Google Scholar]
Ren, Z. Adaptive active contour model driven by fractional order fitting energy. Signal Process. 2015, 117, 138–150. [Google Scholar] [CrossRef]
Chua, L.O.; Yang, L. Cellular neural networks: Theory. IEEE Trans. Circuits Syst. 1988, 35, 1257–1272. [Google Scholar] [CrossRef]
Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707. [Google Scholar] [CrossRef]
Zhang, Y.; Pu, Y.; Hu, J.; Zhou, J. A class of fractional-order variational image inpainting models. Appl. Math. Inf. Sci 2012, 6, 299–306. [Google Scholar]
Bosch, J.; Stoll, M. A fractional inpainting model based on the vector-valued Cahn–Hilliard equation. SIAM J. Imaging Sci. 2015, 8, 2352–2382. [Google Scholar] [CrossRef]
Li, D.; Tian, X.; Jin, Q.; Hirasawa, K. Adaptive fractional-order total variation image restoration with split Bregman iteration. ISA Trans. 2018, 82, 210–222. [Google Scholar] [CrossRef] [PubMed]
Ammi, M.R.S.; Jamiai, I. Finite difference and legendre spectral method for a time-fractional diffusion-convection equation for image restoration. Discret. Contin. Dyn. Syst.-Ser. S 2018, 11, 103–117. [Google Scholar]
Gouasnouane, O.; Moussaid, N.; Boujena, S.; Kabli, K. A nonlinear fractional partial differential equation for image inpainting. Math. Model. Comput. 2022, 9, 536–546. [Google Scholar] [CrossRef]
Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
Peng, J.; Liu, D.; Xu, S.; Li, H. Generating diverse structure for image inpainting with hierarchical VQ-VAE. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 10775–10784. [Google Scholar]
Chen, Y.; Zhang, H.; Liu, L.; Chen, X.; Zhang, Q.; Yang, K.; Xia, R.; Xie, J. Research on image inpainting algorithm of improved GAN based on two-discriminations networks. Appl. Intell. 2021, 51, 3460–3474. [Google Scholar] [CrossRef]
Farajzadeh, N.; Hashemzadeh, M. A deep neural network based framework for restoring the damaged persian pottery via digital inpainting. J. Comput. Sci. 2021, 56, 101486. [Google Scholar] [CrossRef]
Cai, X.; Song, B. Semantic object removal with convolutional neural network feature-based inpainting approach. Multimed. Syst. 2018, 24, 597–609. [Google Scholar] [CrossRef]
Wang, Q.; Chen, Y.; Zhang, N.; Gu, Y. Medical image inpainting with edge and structure priors. Measurement 2021, 185, 110027. [Google Scholar] [CrossRef]
Wu, J.; Zhang, J.; Wu, F.; Kong, Y.; Yang, G.; Senhadji, L.; Shu, H. Generative networks as inverse problems with fractional wavelet scattering networks. arXiv 2020, arXiv:2007.14177. [Google Scholar]
Angles, T.; Mallat, S. Generative networks as inverse problems with scattering transforms. arXiv 2018, arXiv:1805.06621. [Google Scholar]
Graham, B. Fractional max-pooling. arXiv 2014, arXiv:1412.6071. [Google Scholar]
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
Zeiler, M.D.; Fergus, R. Stochastic pooling for regularization of deep convolutional neural networks. arXiv 2013, arXiv:1301.3557. [Google Scholar]
Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155. [Google Scholar] [CrossRef] [PubMed]
Ghahremani, M.; Khateri, M.; Sierra, A.; Tohka, J. Adversarial distortion learning for medical image denoising. arXiv 2022, arXiv:2204.14100. [Google Scholar]
Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5197–5206. [Google Scholar]
Dabov, K.; Foi, A.; Katkovnik, V.; Egiazarian, K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. 2007, 16, 2080–2095. [Google Scholar] [CrossRef] [PubMed]
Gu, S.; Zhang, L.; Zuo, W.; Feng, X. Weighted nuclear norm minimization with application to image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 2862–2869. [Google Scholar]
Chen, Y.; Pock, T. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1256–1272. [Google Scholar] [CrossRef] [PubMed]
Zhang, K.; Zuo, W.; Zhang, L. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE Trans. Image Process. 2018, 27, 4608–4622. [Google Scholar] [CrossRef] [PubMed]
Romano, Y.; Elad, M.; Milanfar, P. The little engine that could: Regularization by denoising (RED). SIAM J. Imaging Sci. 2017, 10, 1804–1844. [Google Scholar] [CrossRef]
Tai, Y.; Yang, J.; Liu, X.; Xu, C. Memnet: A persistent memory network for image restoration. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4539–4547. [Google Scholar]
Plötz, T.; Roth, S. Neural nearest neighbors networks. Adv. Neural Inf. Process. Syst. 2018, 31, 1087–1098. [Google Scholar]
Hasan, A.M.; Meziane, F.; Aspin, R.; Jalab, H.A. Segmentation of brain tumors in MRI images using three-dimensional active contour without edge. Symmetry 2016, 8, 132. [Google Scholar] [CrossRef]
Li, C.; Xu, C.; Gui, C.; Fox, M.D. Distance regularized level set evolution and its application to image segmentation. IEEE Trans. Image Process. 2010, 19, 3243–3254. [Google Scholar] [CrossRef] [PubMed]
Ibrahim, R.W.; Hasan, A.M.; Jalab, H.A. A new deformable model based on fractional Wright energy function for tumor segmentation of volumetric brain MRI scans. Comput. Methods Programs Biomed. 2018, 163, 21–28. [Google Scholar] [CrossRef] [PubMed]
Al-Shamasneh, A.R.; Jalab, H.A.; Shivakumara, P.; Ibrahim, R.W.; Obaidellah, U.H. Kidney segmentation in MR images using active contour model driven by fractional-based energy minimization. Signal Image Video Process. 2020, 14, 1361–1368. [Google Scholar] [CrossRef]
Deep Learning Super Sampling (DLSS). Available online: https://developer.nvidia.com/rtx/dlss (accessed on 17 July 2024).
Teague, M.R. Image analysis via the general theory of moments. Josa 1980, 70, 920–930. [Google Scholar] [CrossRef]
Mukundan, R.; Ong, S.; Lee, P.A. Image analysis by Tchebichef moments. IEEE Trans. Image Process. 2001, 10, 1357–1364. [Google Scholar] [CrossRef] [PubMed]
Asli, B.H.S.; Flusser, J. Fast computation of Krawtchouk moments. Inf. Sci. 2014, 288, 73–86. [Google Scholar] [CrossRef]
Sheng, Y.; Shen, L. Orthogonal Fourier–Mellin moments for invariant pattern recognition. JOSA A 1994, 11, 1748–1757. [Google Scholar] [CrossRef]
Xiao, B.; Ma, J.F.; Wang, X. Image analysis by Bessel–Fourier moments. Pattern Recognit. 2010, 43, 2620–2629. [Google Scholar] [CrossRef]
Caselles, V.; Kimmel, R.; Sapiro, G. Geodesic active contours. Int. J. Comput. Vis. 1997, 22, 61–79. [Google Scholar] [CrossRef]
Chan, T.F.; Vese, L.A. Active contours without edges. IEEE Trans. Image Process. 2001, 10, 266–277. [Google Scholar] [CrossRef] [PubMed]
Li, C.; Kao, C.Y.; Gore, J.C.; Ding, Z. Minimization of region-scalable fitting energy for image segmentation. IEEE Trans. Image Process. 2008, 17, 1940–1949. [Google Scholar] [PubMed]
Lankton, S.; Tannenbaum, A. Localizing region-based active contours. IEEE Trans. Image Process. 2008, 17, 2029–2039. [Google Scholar] [CrossRef] [PubMed]
Shi, Y.; Karl, W.C. A real-time algorithm for the approximation of level-set-based curve evolution. IEEE Trans. Image Process. 2008, 17, 645–656. [Google Scholar] [PubMed]
Ding, K.; Xiao, L.; Weng, G. Active contours driven by region-scalable fitting and optimized Laplacian of Gaussian energy for image segmentation. Signal Process. 2017, 134, 224–233. [Google Scholar] [CrossRef]
Ding, K.; Xiao, L.; Weng, G. Active contours driven by local pre-fitting energy for fast image segmentation. Pattern Recognit. Lett. 2018, 104, 29–36. [Google Scholar] [CrossRef]
Mnih, V. Machine Learning for Aerial Image Labeling. Ph.D. Thesis, University of Toronto, Toronto, ON, Canada, 2013. [Google Scholar]
Liu, Y.; Yao, J.; Lu, X.; Xia, M.; Wang, X.; Liu, Y. RoadNet: Learning to comprehensively analyze road networks in complex urban scenes from high-resolution remotely sensed images. IEEE Trans. Geosci. Remote Sens. 2018, 57, 2043–2056. [Google Scholar] [CrossRef]
Zhou, L.; Zhang, C.; Wu, M. D-LinkNet: LinkNet with pretrained encoder and dilated convolution for high resolution satellite imagery road extraction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 182–186. [Google Scholar]
Xie, Y.; Miao, F.; Zhou, K.; Peng, J. HsgNet: A road extraction network based on global perception of high-order spatial information. ISPRS Int. J. Geo-Inf. 2019, 8, 571. [Google Scholar] [CrossRef]
Yang, M.; Yuan, Y.; Liu, G. SDUNet: Road extraction via spatial enhanced and densely connected UNet. Pattern Recognit. 2022, 126, 108549. [Google Scholar] [CrossRef]
Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
Liu, Z.; Luo, P.; Wang, X.; Tang, X. Deep Learning Face Attributes in the Wild. In Proceedings of the International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015. [Google Scholar]
Shlens, J. A tutorial on principal component analysis. arXiv 2014, arXiv:1404.1100. [Google Scholar]
Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
Baruah, U.; Hazarika, S. Online Handwritten Assamese Characters Dataset; UCI Machine Learning Repository; UC Irvine: Irvine, CA, USA, 2011. [Google Scholar]
Wang, D.H.; Liu, C.L.; Yu, J.L.; Zhou, X.D. CASIA-OLHWDB1: A Database of Online Handwritten Chinese Characters. In Proceedings of the 2009 10th International Conference on Document Analysis and Recognition, Barcelona, Spain, 26–29 July 2009; pp. 1206–1210. [Google Scholar]
Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
Soomro, K.; Zamir, A.R.; Shah, M. UCF101: A Dataset of 101 Human Actions Classes from Videos in the Wild. arXiv 2012, arXiv:1212.0402. [Google Scholar]
Schölkopf, B.; Platt, J.; Hofmann, T. Efficient Learning of Sparse Representations with an Energy-Based Model. In Proceedings of the 20th Annual Conference on Neural Information Processing Systems, NIPS 2006, Vancouver, BC, Canada, 4–7 December 2007; pp. 1137–1144. [Google Scholar]
Simard, P.; Steinkraus, D.; Platt, J. Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of the Seventh International Conference on Document Analysis and Recognition, Edinburgh, UK, 3–6 August 2003; pp. 958–963. [Google Scholar]
Springenberg, J.T.; Dosovitskiy, A.; Brox, T.; Riedmiller, M. Striving for Simplicity: The All Convolutional Net. arXiv 2015, arXiv:1412.6806. [Google Scholar]
Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar]
Zhu, Q.; Zhang, R. HENet:A Highly Efficient Convolutional Neural Networks Optimized for Accuracy, Speed and Storage. arXiv 2018, arXiv:1803.02742. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]

Figure 1. Example use cases of different tasks in computer vision: denoising for removing unwanted noise, enhancement for ground-truth image’s quality improvement, object detection for identification and labelling, segmentation for image partitioning for further analysis, and restoration for missing parts inpainting (ground-truth image generated by DALL-E 3).

Figure 2. Architecture of FOCNet.

Figure 3. Schematic representation of a multi-scale FOCNet with two levels.

Figure 4. Training process of Neural Fractional-Order Adaptive Masks.

Figure 5. Image enhancement with fractional Rényi entropy before using a CNN for image segmentation.

Figure 6. Architecture of FrOLM-DNN for object detection and classification of 3D image (input image generated by DALL-E 3).

Figure 7. Architecture of a GSN with a ScatNet and PCA encoder, and a CNN decoder.

Figure 8. Architecture of a GFRSN with an FrScatNet and FM encoder, and a CNN decoder. Using two GFRSNs with different fractional-orders, one can enhance the predicted image

\tilde{y}

by merging the outputs from both orders,

{\tilde{y}}_{α_{1}}

and

{\tilde{y}}_{α_{2}}

.

Figure 8. Architecture of a GFRSN with an FrScatNet and FM encoder, and a CNN decoder. Using two GFRSNs with different fractional-orders, one can enhance the predicted image

\tilde{y}

by merging the outputs from both orders,

{\tilde{y}}_{α_{1}}

and

{\tilde{y}}_{α_{2}}

.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Coelho, C.; Costa, M.F.P.; Ferrás, L.L. Fractional Calculus Meets Neural Networks for Computer Vision: A Survey. AI 2024, 5, 1391-1426. https://doi.org/10.3390/ai5030067

AMA Style

Coelho C, Costa MFP, Ferrás LL. Fractional Calculus Meets Neural Networks for Computer Vision: A Survey. AI. 2024; 5(3):1391-1426. https://doi.org/10.3390/ai5030067

Chicago/Turabian Style

Coelho, Cecília, M. Fernanda P. Costa, and Luís L. Ferrás. 2024. "Fractional Calculus Meets Neural Networks for Computer Vision: A Survey" AI 5, no. 3: 1391-1426. https://doi.org/10.3390/ai5030067

Article Menu

Fractional Calculus Meets Neural Networks for Computer Vision: A Survey

Abstract

1. Introduction

2. Fractional Calculus

3. Computer Vision

3.1. Denoising

3.1.1. Fractional-Order Total Variation

3.1.2. Fractional Optimal Control Network

3.2. Enhancement

3.2.1. Neural Fractional-Order Adaptive Masks

3.2.2. Fractional Rényi Entropy

3.3. Object Detection

Fractional-Order Legendre Moment Invariants

3.4. Segmentation

3.4.1. Active Contour Detection with Fractional-Order Regularisation Term

3.4.2. FOCNet for Segmentation

3.5. Restoration

Fractional Wavelet Scattering Networks

3.6. Compression

3.6.1. Fractional Max-Pooling

3.6.2. Fractional Convolutional Filters

4. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Abbreviations

Appendix A. Summary Table

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI