1. Introduction
The study of the physical processes of interconnected heat-and-mass transfer in fractal anisotropic media is a relevant and interdisciplinary topic that spans physics, mathematics, materials science, geophysics, biomedicine, and other fields. Fractal media have complex geometry, which affects the behavior of physical processes, especially when these processes are in a non-equilibrium state. The construction of effective numerical algorithms for implementing mathematical models of heat-and-moisture transfer remains an important problem in modern research. Since the system in question has a complex fractal structure and is characterized by such essential properties as memory, self-organization, and spatial non-locality, its mathematical description requires non-traditional methods based on the differential apparatus of fractional calculus.
The mathematical apparatus of fractional integro-differentiation has been known for a long time and is well developed. However, its use in modeling the physical systems of fractal media has begun only recently. Its application makes it possible to comprehend existing research results more deeply and to obtain new solutions that take into account the properties of temporal non-locality and spatial self-similarity. Traditional models based on integer-order differentiation and Euclidean geometric assumptions are often unable to adequately describe such complex structures with fractal properties. In addition, the difficulties of using fractional differential and integral operators for modeling various processes are associated with the variety of their definitions and the lack of a substantiated physical interpretation.
Various fractional-order differential operators, such as Riemann–Liouville, Caputo, and Atangana–Baleanu [1,2,3,4,5,6,7,8,9,10], as well as other fractional calculus concepts, such as Caputo–Fabrizio, Hilfer, Prabhakar, and Riesz [10,11,12,13,14,15,16], have wide applications in many fields of science and engineering where it is necessary to model processes with fractal properties, heredity, or anomalous behavior. Fractal mathematical models are used to study physical systems, in particular viscoelasticity and anomalous diffusion, as well as in the engineering, financial, chemical, biological, medical, and pharmaceutical fields. Fractional models can describe the complex mechanical properties of materials, particularly the behavior of composite materials, polymers, or shape-memory materials. The Caputo fractional derivative [1,4,17,18] has specific advantages for taking into account the history of a process, hereditary effects, or complex dynamics, especially in cases where the initial conditions are given in the form of classical derivatives.
The Caputo–Fabrizio fractional derivative [9,10,19,20] is a modification of the classical Caputo fractional derivative that is distinguished by an exponentially decaying kernel, which allows the modeling of systems with limited memory. This derivative has important advantages for modeling systems with memory effects but without strong singularities at the initial time. It has found applications in many fields, particularly those where the time dynamics must be considered together with the gradual fading of hereditary effects. This model is especially suitable for describing heat conduction processes in materials with limited memory [21], such as nanomaterials or composite materials.
The Atangana–Baleanu fractional derivative [4,5,6,7,8] is a modification of fractional derivatives that features a convolution kernel based on the Mittag–Leffler function. It is used to model complex physical, engineering, biological, and economic systems with the properties of heredity and nonlinear dynamics [5,6,22,23]. The advantage of this derivative is that it combines the features of fractional calculus and ensures the smoothness of the models, making them more stable in many applications. The Prabhakar fractional derivative [13,14,15] is one of the newer modifications of fractional derivatives, based on the generalized Prabhakar function, which allows the description of processes with non-linear inheritance. This derivative is used to model systems with complex time delays and hereditary effects that not only fade out over time but also exhibit complex forms of extinction or resonance. The Prabhakar derivative is applied in anomalous heat transfer models in cases where materials have complex thermal conductivity properties [11,12,24,25]. This is important for describing processes in composite materials, nanomaterials, or liquids with additional energy states.
The implementation of various mathematical models of heat-and-moisture transfer by analytical methods, such as the Laplace and Fourier transform methods [26,27], is limited in application, while numerical methods, such as the spectral method based on Laguerre polynomials, the finite element method, and the finite difference method [28,29,30,31,32], are characterized by high computational complexity and require significant amounts of memory and time. Therefore, there is a need to develop alternative methods. Today, an approach using artificial neural networks (ANNs) is relevant for finding approximate solutions to fractional-order differential equations.
Studies that used ANNs to solve classes of problems described by differential equations began as early as the last century. At the initial stages, researchers used a multilayer perceptron and a cost function for the numerical calculation of ordinary differential equations [33,34] and partial differential equations with regular boundaries [34]. Later, the method was extended to irregular boundaries [35]. During the next decade, more and more research works appeared with an increased number of parameters and layers of ANNs and with the use of radial-basis neural networks. The next stage in the development of network methods is associated with progress in software, open access to libraries such as TensorFlow and PyTorch, and the implementation of the automatic differentiation operation [36] in such libraries.
An important and successful approach in the numerical treatment of differential equations is the use of physics-informed neural networks (PINNs), which integrate physical laws directly into the architecture of neural networks. The idea of including previously known information about the physical model in the neural network algorithm belongs to Dissanayake and Phan-Thien [37]. The first publications about networks that include information about the physical process as a regularization parameter of the cost function appeared in 2017. Two years later, a combined version of these articles was published, in which the authors [38], using the examples of the Burgers, Schrödinger, and Allen–Cahn equations, demonstrated the use of neural networks based on the implementation of physical laws (PINNs) and obtained improved interpolation properties of such networks. However, using PINNs in real time is associated with certain difficulties, since network learning is slow and expensive due to the use of optimization algorithms, in particular gradient descent [39]. There are also challenges with network scaling [40], vanishing gradients [41], solutions stuck at local minima, and the fine-tuning of the PINN learning process, which remains manual [38,42]. The authors of [43] tried to improve the effectiveness of the PINN learning process by taking into account the influence of initialization, and [44] noted the ability of this network structure to learn effectively on a limited data set. This is particularly important for fractal anisotropic media, where data may be limited or difficult to access.
The authors of [45] demonstrated that transfer learning can improve the reliability and convergence of PINNs for high-frequency problems, reducing the need for data and training time without increasing the number of network parameters. Owing to the advantages of PINNs, such as consistency with physical constraints, approximation accuracy, generalization, and the ability to learn on sparse data, PINNs have become widely used to solve a diverse class of problems, including partial differential equations and integro-differential fractional equations. This applies, in particular, to the study of signal and image processing, associative memory, control and optimization systems, the theory of viscoelasticity, heat-and-moisture transfer, solute transport [46], and even the evaluation of wind loads on bridges [47]. Additionally, the PINN model effectively solves direct and inverse problems, as shown by studies [48,49] on the prediction of frictional contact temperature and the estimation of the input data, states, and parameters of dynamic systems based on limited output data.
In further studies, PINNs were adapted to solve stochastic differential equations, and the stability of non-linear systems was analyzed [50]. At the same time, other machine learning methods have also demonstrated success in solving computational mechanics problems. For example, a study using an energy-based approach to partial differential equations combined with a collocation strategy showed that the system’s natural energy can be used as a loss function for solving engineering problems [51]. Additionally, genetic algorithms have been successfully applied to optimize architectures such as deep neural networks and adaptive neuro-fuzzy inference systems. This has improved the accuracy of material property predictions, specifically the fracture energy of polymer/nanoparticle composites [52].
A further development of these studies is the implementation of fractal physics-informed neural networks (fPINNs). fPINNs can be adapted for a wide range of tasks, including modeling heat transfer, fluid flow, wave propagation, and other processes in fractal anisotropic media. Although fPINN is not the only approach for finding solutions to fractional partial differential equations, its relevance in modeling fractal anisotropic media is due to its ability to model complex systems efficiently and accurately, for example, systems with complex geometry and/or high dimensionality, together with the integration of physical laws into the learning process, lower computational costs, and universal implementation. Their application to fractal anisotropic media is the newest direction in the field of machine learning and computational physics. Another variation of a physics-informed neural network for finding solutions to fractional differential equations is Laplace-fPINN [53]. This neural network method allows a simplification of the loss function and avoids the need to introduce many auxiliary points. In [54], the ANN is improved by using the Hausdorff fractional derivative in the activation function of hidden layers to find a numerical approximation of diffusion equations with the Hausdorff derivative in the one-dimensional case. In [55], researchers used a neural network combined with the Broyden–Fletcher–Goldfarb–Shanno (BFGS) optimization technique to solve both linear and non-linear fractal differential equations. The authors of [56] combined the ideas of the Monte Carlo method for calculating integrals with the concept of physics-informed neural networks. This makes it possible to efficiently calculate unbiased estimates for the constraints of fractional integro-differential equations in the loss functional when optimizing neural network parameters.
This study examines the use of a fractal physics-informed neural network (fPINN) to approximate the numerical solution of a system of interconnected fractional-order partial differential equations that describe heat-and-mass transfer processes in anisotropic media. This is the first study of its kind to provide a comprehensive description of the occurrence of two related phenomena, thermal and physical, in a material with an anisotropic fractal structure.
3. Results and Discussion
3.1. Algorithmic Aspects of fPINN Implementation
This section demonstrates how the proposed fractal physics-informed neural network can be used for the numerical implementation of the mathematical model in (1)–(5). For the software implementation of this approach, the Python programming language and the machine learning tools available in the open-source libraries TensorFlow and Keras [59] were used. A description of the typical steps to follow is given to provide a visual representation of the general procedure.
Due to the ability of physics-informed neural networks to use physical knowledge in learning and to operate with general non-linear equations with partial and fractional derivatives, they can effectively solve problems with a limited amount of data (or a set of noisy data) or recover solutions without a learning data set. Taking this property into account, when conducting the studies, small amounts of experimental data from [60,61] were used, or the constructed models were used as surrogates, i.e., the unsupervised learning method was applied solely on the basis of the physical equations and the initial and boundary conditions.
To compare the results obtained by the neural network method and the finite difference method, learning points were generated corresponding to the grid points of the finite difference method. In other cases, a uniformly distributed arrangement of points in the inner region and on the boundary of the region was chosen. The learning points for the test set were chosen according to the experimental data described in [60,61]. Before starting network learning, the network weights were initialized using Xavier’s pseudo-random initialization method [62].
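The uniformly distributed arrangement of interior and boundary points described above can be sketched as follows for a one-dimensional space-time domain; the domain bounds, point counts, and function name are illustrative placeholders rather than values from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_training_points(n_interior, n_boundary, n_initial, Lx=1.0, T_end=1.0):
    """Uniformly sample interior, boundary, and initial-condition points
    for a space-time domain (0, Lx) x (0, T_end)."""
    # Interior collocation points, at which the PDE residual is evaluated.
    interior = np.column_stack([
        rng.uniform(0.0, Lx, n_interior),     # spatial coordinate x
        rng.uniform(0.0, T_end, n_interior),  # temporal coordinate t
    ])
    # Boundary points: x = 0 or x = Lx at uniformly sampled times.
    boundary = np.column_stack([
        rng.choice([0.0, Lx], n_boundary),
        rng.uniform(0.0, T_end, n_boundary),
    ])
    # Initial-condition points: t = 0 at uniformly sampled positions.
    initial = np.column_stack([
        rng.uniform(0.0, Lx, n_initial),
        np.zeros(n_initial),
    ])
    return interior, boundary, initial

interior, boundary, initial = sample_training_points(1000, 100, 100)
```

For the comparison with the finite difference method, the random sampling above would simply be replaced by the regular grid nodes of the difference scheme.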
A model consisting of two separate fully connected neural networks united by a common input has been created. Both networks have the same configuration. Each network includes 8 hidden layers, where each layer contains 40 nodes. As a result, the total number of parameters in each network is 11,641. The input layer receives the input parameters (the spatial and temporal coordinates) that provide the input data for processing. The output layer of each independent network contains a single output, which is used to approximate and predict the values of moisture or temperature changes.
A hyperbolic tangent was selected as the activation function applied to all hidden layers. This function is smooth and continuous, has a non-zero derivative, and is centered with respect to zero. These characteristics of the activation function help avoid the “vanishing gradients” problem during neural network learning.
To optimize the loss function, the following algorithms were used: Adam, Adam with Nesterov momentum, root mean square propagation, and stochastic gradient descent. The studies were carried out on a computer workstation.
The training time for this network configuration can vary significantly depending on a range of factors, such as the volume of training data and training parameters, including the number of epochs and learning rate. Fractal physics-informed neural networks also require additional computational resources for calculating derivatives with respect to time and spatial coordinates, as well as for the numerical implementation of fractional differential equations during training. To measure the training time, we used the time library in Python and built-in TensorFlow tools to monitor the training process.
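The use of Python’s time library to monitor training, as mentioned above, can be sketched as follows; the training step here is a stand-in for one optimization step of the network, not the paper’s code.

```python
import time

def timed_training(train_step, n_epochs):
    """Run a training loop and record the wall-clock time of each epoch,
    mirroring the use of Python's time library to monitor training."""
    epoch_times = []
    for epoch in range(n_epochs):
        t0 = time.perf_counter()
        train_step(epoch)  # one optimization step (stand-in)
        epoch_times.append(time.perf_counter() - t0)
    return sum(epoch_times), epoch_times

# Dummy workload standing in for a real training step.
total, per_epoch = timed_training(lambda e: sum(i * i for i in range(1000)), 5)
```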
Let us present the algorithmic aspects of the implementation of the fractal neural model with sequential learning in the context of the heat-and-mass transfer problem in (1)–(5). The following paragraphs present code snippets from our work that illustrate key stages such as model initialization, the calculation of residual errors with consideration of fractal properties, the definition of the loss function, and the determination of the number of training iterations.
Step 1. We determine the points of the learning data sets for Equations (1) and (2), the boundary conditions (4) and (5), the initial conditions (3), and the learning data;
Step 2. According to Figure 1, we describe the function for creating and initializing the feed-forward network model (Listing 1) by specifying the number of hidden layers, the number of neurons in each layer, the activation function for each layer, and the method for initializing the weights for the initial determination of the network parameters;
Listing 1. Initialization of the feed-forward model.
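Since the code of Listing 1 is not reproduced in this excerpt, the following is a minimal NumPy sketch of such an initialization: a fully connected tanh network with 8 hidden layers of 40 nodes and Xavier (Glorot) normal weight initialization. Assuming two input coordinates (one spatial and one temporal, as in the one-dimensional test problem), this architecture reproduces the 11,641 parameters per network stated above; the function names are illustrative, and the actual implementation uses TensorFlow/Keras.

```python
import numpy as np

rng = np.random.default_rng(42)

def init_feedforward(n_inputs=2, n_hidden_layers=8, n_nodes=40, n_outputs=1):
    """Create weight/bias arrays for a fully connected tanh network
    using Xavier (Glorot) normal initialization."""
    sizes = [n_inputs] + [n_nodes] * n_hidden_layers + [n_outputs]
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        std = np.sqrt(2.0 / (fan_in + fan_out))  # Xavier/Glorot scaling
        W = rng.normal(0.0, std, (fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((W, b))
    return params

def forward(params, x):
    """Forward pass: tanh on hidden layers, linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

params = init_feedforward()
n_params = sum(W.size + b.size for W, b in params)  # 11,641 for 2 inputs
```

Two independent instances of such a model, sharing the same input points, give the temperature and moisture networks used in the following steps.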
Step 3. Create two separate instances of the neural network models using the initialization function from the previous step;
Step 4. Create a function that describes the residuals of the fractional differential equations in (25) and (26), presented as difference schemes, using the capabilities of TensorFlow (Listing 2);
Listing 2. Computation of residuals.
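The difference schemes (25) and (26) are not reproduced in this excerpt. As an indication of the kind of approximation such a residual function builds on, here is a NumPy sketch of the classical L1 difference scheme for a Caputo time derivative of order 0 < α < 1; the choice of this particular scheme is an assumption for illustration, and the paper’s own schemes may differ.

```python
import numpy as np
from math import gamma

def caputo_l1(u, dt, alpha):
    """L1 finite-difference approximation of the Caputo derivative of
    order 0 < alpha < 1 for the sequence u sampled at time step dt.
    Returns the derivative at the last grid point t_n = n * dt."""
    n = len(u) - 1
    k = np.arange(n)
    # L1 scheme weights b_k = (k+1)^(1-alpha) - k^(1-alpha)
    b = (k + 1.0) ** (1.0 - alpha) - k ** (1.0 - alpha)
    # Differences u_{n-k} - u_{n-k-1}, ordered by k = 0, ..., n-1.
    increments = u[1:][::-1] - u[:-1][::-1]
    return dt ** (-alpha) / gamma(2.0 - alpha) * np.dot(b, increments)

# Sanity check on u(t) = t: the Caputo derivative is t^(1-alpha)/Gamma(2-alpha),
# and the L1 scheme is exact for linear functions.
alpha = 0.5
t = np.linspace(0.0, 1.0, 101)
approx = caputo_l1(t, t[1] - t[0], alpha)
exact = 1.0 / gamma(2.0 - alpha)  # value at t = 1
```

In the actual implementation, an expression of this kind, combined with automatic differentiation for the integer-order spatial derivatives, forms the equation residual that enters the loss function.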
Step 5. Similarly to the previous step, describe the residuals of the boundary conditions (28) and the initial conditions (27) using TensorFlow functionalities;
Step 6. We formulate the loss functions in (40) and (41), which are defined as the weighted root mean square error of the residuals of the fractional differential equations in (25) and (26), the boundary (28) and initial (27) conditions, and the difference between the network output and the test data (15) (Listing 3);
Listing 3. Loss function determination.
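A loss of the kind described in Step 6 can be sketched as a weighted combination of mean-square residual terms; the weights, argument names, and the mean-square form shown here are illustrative, and the exact expressions (40) and (41) are not reproduced in this excerpt.

```python
import numpy as np

def total_loss(r_pde, r_bc, r_ic, u_pred, u_data,
               w_pde=1.0, w_bc=1.0, w_ic=1.0, w_data=1.0):
    """Weighted mean-square loss combining the PDE residuals, the boundary
    and initial condition residuals, and the mismatch with measured data."""
    mse = lambda v: float(np.mean(np.square(v)))
    return (w_pde * mse(r_pde) + w_bc * mse(r_bc)
            + w_ic * mse(r_ic) + w_data * mse(u_pred - u_data))
```

One such loss is assembled per network (temperature and moisture), and the weighting coefficients are the ones tuned during training, as described in Step 8.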
Step 7. For the models determined in Step 3, taking into account Step 6, we determine the learning procedure using (41) and (42);
Step 8. We set the optimization hyper-parameters, such as the optimization function and the learning rate, and determine the weighting coefficients from Formulas (40) and (41);
Step 9. Determine the number of learning iterations and/or set the error value to stop the iterative process (Listing 4);
Listing 4. Finding the number of learning iterations.
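The stopping logic of Step 9, a fixed iteration budget combined with an error threshold, can be sketched as follows; the quadratic stand-in problem and function names are purely illustrative.

```python
def train_until(loss_and_step, max_iters=10000, tol=1e-6):
    """Iterate a training step until the loss drops below tol or the
    iteration budget is exhausted; returns iterations used and final loss."""
    loss = float("inf")
    for it in range(1, max_iters + 1):
        loss = loss_and_step()
        if loss < tol:
            break
    return it, loss

# Stand-in problem: gradient descent on f(x) = x^2.
state = {"x": 1.0}
def step(lr=0.1):
    state["x"] -= lr * 2.0 * state["x"]
    return state["x"] ** 2

iters, final_loss = train_until(step, max_iters=1000, tol=1e-12)
```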
Step 10. Conduct the step-by-step training procedure in (41) and (42) for the networks with random initialization. During the training process, the values of the loss functions for each model and the time required for training are calculated and recorded. Use callbacks to monitor and adjust the training process;
Step 11. Use the trained models from the previous step to obtain model predictions and evaluate the solutions of the fractional differential equations on the test data;
Step 12. Visualize the obtained results.
3.2. Research on Temperature and Moisture in Anisotropic Materials
Using the developed neural network model and the given algorithm, a set of experiments was carried out to study the dynamics of spatial heat-and-mass transfer in capillary-porous materials during hydrothermal treatment. The material used was wood with different values of density and various thicknesses (5 mm, 15 mm, 20 mm, 25 mm), depending on the particular test and the availability of experimental data. The initial parameters of the model were the initial sample temperature, the ambient temperature, the initial moisture content, the relative air humidity, the drying agent speed, and the material density. The rest of the parameters and the required empirical relationships were obtained on the basis of experimental data [30,32,60,61,63,64]. Unless otherwise specified, fixed values of the fractional-order parameters of the model are used.
To evaluate the accuracy of the created model, a test experiment was carried out for a one-dimensional fractal problem of heat-and-mass transfer, which was derived from the considered two-dimensional mathematical model. A comparison of the output results of the network for moisture and temperature with the known data of other researchers [30,32,60,61,64] was conducted.
The dependencies shown in Figure 2 and Figure 3 describe the process of thermodiffusion (non-isothermal moisture transfer), considering the fractal characteristics of the material. Considering fractal properties is necessary, as they reflect the complex structure of the material, which directly affects heat transfer and mass transfer processes. For example, fractal structures can either slow down or accelerate diffusion processes, depending on the degree of porosity or the inhomogeneity of the material.
Figure 3b and Figure 4b show the absolute error between the values simulated using the finite difference method (FDM) (Figure 3a and Figure 4a) and the results obtained using the fractal network model (Figure 2a,b), respectively. According to the data obtained, the relative error in predicting moisture content and temperature using the fractal network fPINN and the FDM remains within the limits of satisfactory accuracy for the case under consideration.
When analyzing the graphs in Figure 5, which reflect the changes in moisture content during 60 min of the hydrothermal treatment of a material sample at an ambient temperature of 120 °C, considering the fractal structure of the material, we can conclude that taking into account the heredity and internal self-organization of the material by using fractional indicators in the mathematical model, together with the use of a neural network approach, allows for fairly accurate modeling of the heat-and-mass transfer process. This testifies to the adequacy of the model and the method of numerical implementation.
The temperature dynamics and moisture content of capillary-porous materials during hydrothermal treatment depend on the density of the material, which indicates the degree of porosity. To analyze this dependence for materials with a fractal structure that were 20 mm thick and had different densities (namely, 500, 520, 650, 670, and 700 kg/m³), a numerical study was conducted, the results of which are shown in Figure 6 for the moisture content indicator. The result obtained shows, in particular, that moisture is removed faster from material with lower density. This is due to the heterogeneous structure of the material and its ability to self-organize, as materials with lower density restore their state more slowly.
We investigate the influence of environmental parameters during hydrothermal treatment on the changes in temperature and moisture content in a sample of capillary-porous material with a fractal structure. Figure 7 presents a graph showing how the temperature of a sample with a base density of 460 kg/m³ changes, taking into account the fractal structure of the material and different temperatures of the drying agent: 70 °C, 80 °C, and 90 °C. Analyzing this graph, it can be noted that, as the temperature of the agent increases, the intensity of the sample heating increases. It is important to note that the influence of the fractal structure of the material, namely the fractional-differential parameters of the model, on the temperature dynamics becomes more significant with increasing temperature conditions of the process.
The temperature in the center of the sample increased faster when using the fractional parameter in the numerical experiment compared to using the integer parameter. The temperature indicator reached about 90 °C much faster when the fractional parameter was taken into account, which contributes to the faster removal of moisture from the porous material. The analysis of the results of the numerical experiment, taking into account the fractal and integer parameters, showed that, with a decrease in the value of the parameter, the temperature of the capillary-porous material rose more quickly to the temperature of the drying agent. However, consideration of the fractional exponent demonstrates a slight slowdown in the heating of the sample, which, in turn, slows down the process of removing moisture from the material.
Figure 8 presents the dependence showing the changes in the moisture content of a sample with a density of 500 kg/m³, taking into account the fractal structure of the material and different values of the drying agent temperature: 60 °C, 80 °C, and 100 °C. It should be noted that moisture is released from the material faster with a decrease in the fractional-differential indicators of the model in the process of heat-and-mass transfer in anisotropic media. In particular, when the fractal structure of the capillary-porous material is taken into account, the process of moisture release is accelerated, that is, the material dries faster, and the moisture content asymptotically approaches the equilibrium value.
A study was also conducted on the influence of the relative humidity of the drying agent on the change in the moisture content of the material. The hydrothermal process was simulated for a material with a density of 500 kg/m³ at an ambient temperature of 70 °C and a fixed drying agent speed. During the study, the relative humidity was 50%, 60%, and 70%. The obtained results are presented in Figure 9.
The results of the study took into account the fractal structure of the capillary-porous material. It is obvious that, with a decrease in the relative humidity of the environment, the process of moisture release accelerates, and the influence of the fractal structure of the material on the change in moisture content becomes less significant. This numerical experiment confirms the significant influence of the fractal structure of the material on the dynamics of the moisture content.
Let us investigate how the change in moisture content depends on the anisotropy of the thermophysical properties of the capillary-porous material. In this case, we consider the coefficients that are used in the mathematical model in (1)–(5). These coefficients are anisotropic, that is, their values change depending on moisture content and temperature. Among them, the following coefficients can be distinguished: thermal conductivity and moisture conductivity, heat exchange and moisture exchange, equilibrium moisture content and density, and the thermogradient coefficient. The relations for their calculation are given in [60,61,63]. Analyzing the graphs presented in Figure 10, the following conclusions can be drawn regarding the change in moisture content. With an increase in the values of the anisotropic coefficients of moisture conductivity and thermal conductivity, a decrease in the moisture content in the material is observed. In addition, it should be noted that the fractal structure of the material influences the dynamics of the moisture content. With an increase in the values of the studied coefficients, the influence of the fractal structure of the material on the moisture content tends to decrease.
Comparing Figure 8, Figure 9 and Figure 10, one can see a significant influence of the sample thickness on the moisture transfer phenomenon, which is also related to heat convection.
3.3. Evaluation of fPINN Architecture Efficiency and the Use of Optimizers
Experimental results evaluating the effectiveness of the selected network architecture are shown in Figure 11. The left part of the figure shows the evolution of a single loss function for two architecturally separated networks. Figure 11b shows the results of the step-by-step learning of two separated networks with disconnected loss functions.
In the context of multi-objective optimization, where the gradients are unbalanced, weighting factors are used to improve the optimization algorithm. These weighting factors are determined during network learning by analyzing the convergence of the optimization process in order to balance the dynamics of the gradient magnitudes.
During the course of the studies, an analysis of the influence of several optimization methods on the output task was carried out (Figure 12), including adaptive moment estimation (Adam), the Nesterov-accelerated adaptive moment (NAdam), stochastic gradient descent (SGD), and root mean square propagation (RMSProp).
It was found that the use of the Adam optimizer ensures both fast convergence and the highest accuracy in all test tasks. The results shown by the stochastic gradient descent (SGD) and RMSProp algorithms are approximately identical. Since the initial sample is not balanced, the performance of these methods is negatively affected. The constant learning rate of the SGD method and the regulation of the learning step by the accumulated rate of gradient change at previous stages in the RMSProp method resulted in convergence to a local optimum. The adaptive moment method uses not only the mean value (first moment) of the gradients, but also the mean squared value (second moment) to adapt the learning rate of the parameters. In addition, this method updates the learning rate for each network weight individually and applies a bias correction to the initial moment estimates. The adaptive moment method with Nesterov momentum unexpectedly shows lower accuracy with a longer computation time compared to the traditional Adam method, which means that the momentum value must be tuned. From this, it can be concluded that the Adam method can find an efficient solution with fewer iterations than the other considered optimization methods. The obtained results partially support the assumption that more universal optimizers should not perform worse than those they can approximate. However, since the tuning effort can be disproportionately large, the use of the above algorithms is not always practically expedient, especially in the context of comparing optimizers such as Adam and NAdam.
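The first- and second-moment mechanics of Adam described above can be made concrete with a minimal NumPy sketch, including the standard bias correction of the zero-initialized moments; this illustrates the update rule itself and is not the code used in the study.

```python
import numpy as np

def adam_step(grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential averages of the gradient (first moment m)
    and squared gradient (second moment v), with bias correction that
    compensates their zero initialization."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)  # bias-corrected second moment
    # Per-parameter step: the division by sqrt(v_hat) adapts the
    # effective learning rate of each weight individually.
    return -lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize f(x) = sum(x^2) from a fixed starting point.
x = np.array([1.0, -2.0])
m = v = np.zeros_like(x)
for t in range(1, 2001):
    update, m, v = adam_step(2.0 * x, m, v, t)
    x = x + update
```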
To effectively evaluate the model, the data were divided into three main sets: the training set, the validation set, and the test set. The model was trained on the training set, and the loss function was used to optimize its parameters. The validation set was used for hyperparameter tuning and monitoring for overfitting. The loss function on this set helped adjust the model and prevent overfitting. After the training and validation phases, the model was assessed on the test set. The loss function at this stage provided a final evaluation of the model’s generalization ability. Analyzing the loss functions, particularly those presented in Figure 13 and related to the stages of training, validation, and testing, helped in selecting the optimal number of epochs (≈4000) to avoid overfitting.
3.4. Evaluation of the Impact of Data Noise
Let us investigate the impact of noisy data on the system. We add Gaussian white noise with zero mean and a given root-mean-square deviation to the learning data set of the one-dimensional problem. Accordingly, Formula (15) changes its form, while the other parameters of the model remain unchanged.
We used L2-regularization [65] to reduce the impact of noise and to smooth the data. For this, we added a penalty term to the loss functions in (40) and (41) in the form of the sum of the squared values of all model parameters (weights and biases) multiplied by a regularization constant. Adding this penalty term forces the model to seek a trade-off between prediction accuracy and parameter simplicity, which can improve its ability to generalize to new data. To determine the percentage of noise in the data, we related the noise deviation to the mean square deviation of the noise-free data, which determines the amount of value spread in the “clean” data. To evaluate the obtained results, we used the relative error.
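The noise model, the L2 penalty, and the noise-percentage estimate described above can be sketched as follows; the relative L2 error used here, and all function names, are illustrative assumptions rather than the paper’s exact formulas.

```python
import numpy as np

rng = np.random.default_rng(1)

def add_gaussian_noise(u_clean, noise_percent):
    """Add zero-mean Gaussian white noise whose standard deviation is the
    given percentage of the spread (std) of the clean data."""
    sigma = noise_percent / 100.0 * np.std(u_clean)
    return u_clean + rng.normal(0.0, sigma, u_clean.shape), sigma

def l2_penalty(weights, lam):
    """L2-regularization term: lam times the sum of squared parameters."""
    return lam * sum(float(np.sum(w ** 2)) for w in weights)

def relative_l2_error(u_pred, u_ref):
    """Relative L2 error between the prediction and the reference field."""
    return float(np.linalg.norm(u_pred - u_ref) / np.linalg.norm(u_ref))

u = np.sin(np.linspace(0.0, np.pi, 200))      # stand-in "clean" field
u_noisy, sigma = add_gaussian_noise(u, 5.0)   # 5% noise level
noise_percent = sigma / np.std(u) * 100.0     # recovers the noise percentage
err = relative_l2_error(u_noisy, u)
```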
Figure 14 shows how noisy data affect the accuracy of the solution of the system in (1)–(5). A noise level of 5% keeps the relative error within a few percent. In addition, to recover the concentration fields with an accuracy of about 1% at a noise level of 5%, it is necessary to have at least 40 nodes in each hidden layer. The model exhibits the best recovery qualities when there are 60 nodes in each hidden layer. A further increase in the number of nodes leads to a deterioration of the model’s ability to generalize, which is associated with the limited amount of learning data.