1. Introduction
Predicting the output of a system is one of the most frequent tasks in engineering, economics, medicine, and related fields. The task may be a performance estimation of a device, a cost analysis, or predictive medicine. Whenever possible, classical mathematical models are developed based on the laws of physics, observations, and, unavoidably, assumptions. In this paper, the phrase “the classical mathematical model” is used as an equivalent of a white-box model. Mathematical models can often be inaccurate, incomplete, or very hard to formulate due to gaps in the existing knowledge. In such cases, approximation methods are used to predict a system’s output. Computational models known as Artificial Neural Networks (ANNs) have brought a vast improvement to predictions in many fields. An ANN can achieve excellent performance in function approximation, comparable to accurate mathematical models [1]. For such high accuracy, however, an ANN needs a lot of data. Obtaining enough samples may be expensive and time-consuming, and, in effect, unprofitable. The amount of data required depends mostly on the complexity of the process one wants to predict. As a vivid example of such a problem, consider fuel cell modeling, a multiscale and interdisciplinary problem. Fuel cell models are hard to generalize over different cell types and are very complicated because of various species transport phenomena and electrochemical reaction kinetics [2,3]. Obtaining the tomographic data necessary for the electrodes’ microstructure characterization costs a few months of operator work [4,5]. For such a problem, collecting more than twenty data points per dimension would take years and is therefore infeasible in many cases [6,7,8].
In this paper, we address the problem of data shortage by integrating the classical mathematical model with an artificial neural network, as presented in Figure 1. For clarity, we will refer to our implementation of a grey-box model as the Interactive Mathematical modeling-Artificial Neural Network (IMANN). The approach presented in this study stems from the fact that every mathematical model’s solution, including the example given above, is represented as a function or a set of functions. The method consists of determining the parts of a mathematical model that are the most uncertain or that lack a theoretical description, and substituting them with an ANN’s prediction, instead of using the mathematical model or the ANN alone. Consequently, working with the IMANN differs from working with a regular ANN. In a conventional procedure, a dataset is divided into training and test data. In addition to these steps, the IMANN requires dividing the problem into two parts: the mathematical model describes one of them, and the artificial neural network approximates the other. The decision regarding this division is a crucial part of working with the IMANN, as it affects both the architecture of the network and the data. The underlying research goal of this paper is to improve the predictive accuracy under a limited dataset, understood here as fewer than twenty points per input dimension. This research problem is addressed by incorporating an artificial neural network to predict different parts of benchmark functions, which are regarded as a general representation of mathematical models. The ANN is trained based on the mathematical model’s output. A detailed description is presented in Section 3. In practice, the ANN’s prediction replaces only some equations in a model (or even just a part of an equation) and adapts to the system’s behavior. As an example, let us consider a system of equations in which one of the equations is replaced by the prediction of an artificial neural network. The obtained approximation values of that function would be forced by the numerical model to fulfill all equations in the system. If the prediction is incorrect, there would be a discrepancy between the measured and the expected output during training. As a consequence, the ANN would be forced to improve its weights and biases until the model is in accordance with the experimental data. As we note later in this work, such a replacement can improve the accuracy and reduce the minimal dataset needed for an artificial neural network prediction.
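As a minimal illustration of this coupling, consider a hypothetical toy example (not the paper’s actual model): the measured output is assumed to be y = f1(x) + f2(x), where f1 is the trusted part of the model and the unknown part f2 is delegated to a small ANN. All names below (`f1`, `ann_predict`, `model_output`, `training_error`) are illustrative:

```python
import numpy as np

def f1(x):
    # Known, trusted part of the mathematical model (hypothetical example).
    return x**2

def ann_predict(x, w):
    # Stand-in for the ANN's estimate of the unknown part f2:
    # a single tanh neuron with w = [weight, bias, output scale].
    return w[2] * np.tanh(w[0] * x + w[1])

def model_output(x, w):
    # Grey-box model: the physics-based term plus the ANN-predicted term.
    return f1(x) + ann_predict(x, w)

def training_error(xs, ys, w):
    # Discrepancy between measured outputs ys and the model's predictions;
    # this is what forces the ANN to improve its weights and biases.
    preds = np.array([model_output(x, w) for x in xs])
    return float(np.mean((preds - np.asarray(ys)) ** 2))
```

With zero weights the ANN term vanishes, so the grey-box model reduces to the known part f1 alone; any discrepancy with the measurements is then attributed to the missing subfunction.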
2. Literature Review
The problem of limited datasets is frequently addressed in the literature with many different approaches [9,10,11,12]. One class of methods is data augmentation—the generation of slightly different samples by modifying existing ones [13]. Baird et al. used a combination of many graphical modifications to improve text recognition [14]. Simard et al. proposed an improvement: the Tangent Prop method [15]. In Tangent Prop, modified images were used to define the tangent vector—an imitation of the function derivative—which was included in the error estimation [15]. Methods using such vectors are still being improved in recent works [16,17]. In the literature, one can find a variety of methods that modify datasets to improve ANN learning. A remarkably interesting approach for two- and three-dimensional images is based on the persistence diagram (PD) technique. The PD changes the representation of the data to extract crucial characteristics [18] and uses as little information as possible to store them [19]. Adcock et al. in [20] presented how persistent homology can improve machine learning. All the mentioned techniques are based on the idea of manipulating the dataset on which the ANN is trained.
Another method is to alter the ANN by incorporating knowledge into its structure. Typical models are of the black-box or white-box type. White-box models are robust and trustworthy due to the incorporation of physical laws. On the other hand, white-box models need empirical parameters and usually work only for specific conditions. Black-box models need big datasets and can return unrealistic results, but they are great for generalization and, after training, their response time is negligible. A natural idea is to combine these two approaches into a grey-box model, which can result in great accuracy and generality. A successful attempt to add knowledge was made with Knowledge-Based Artificial Neural Networks (KBANN) [21]. The KBANN starts with some initial logic, which is transformed into the ANN [21]. This ANN is then refined using the standard backpropagation method [21]. The KBANN utilizes knowledge given by a symbolic representation in the form of logical formulas [21]. In situations where knowledge is given by a functional representation, i.e., one containing variables, an interesting approach was presented by Su et al. in [22]. The authors proposed a new type of neural network—an Integrated Neural Network (INN) [22]. In the INN, an ANN is coupled with a mathematical model in such a way that it learns how to bias the model’s output to improve agreement with the modeled system [22]. The INN output consists of the sum of the model and ANN outputs [22]. A similar approach to improving the mathematical model was presented by Wang and Zhang in [23]. They proposed an ANN in which some of the neurons had their activation functions changed to empirical functions [23]. The ANN was used to alter the empirical model of an existing device in such a way that it could be used for a different device. This idea led to Neuro-Space Mapping (Neuro-SM), in which the functional model is a part of the ANN [24,25]. Neuro-SM can be viewed as a model augmented by the ANN. In the Neuro-SM, the ANN maps the input and output of the model, but it does not interfere with the functional representation itself.
In more recent works, the most successful attempts at interaction between physics and ANNs couple ANNs with differential equations. The very first attempt to connect an ANN with differential equations was made by Psichogios and Ungar in [26]. The authors presented a way to improve a white-box model by estimating the empirical part of a differential equation describing a fed-batch bioreactor with an ANN. The work of Psichogios and Ungar was continued by Hagge et al. [27], who applied modern machine learning techniques to the same physical problem. The fed-batch bioreactor problem was further addressed in [28,29,30]. Many more researchers attempted to solve problems in a similar manner—identify the hard-to-model parts of the first principles and substitute them with machine learning techniques: Cubillos et al. [31] in a solid drying process, Piron et al. in a crossflow microfiltration process [32], Oliveira in a fed-batch bioreactor and bakers’ yeast production [30], Vieira and Mota in a water gas heater system [33], Romijn et al. in the energy balance of a glass melting process [34], and Chaffart and Ricardez-Sandoval in a thin film growth process [35]. Cen et al. [36] presented a method for incorporating several neural networks into nonlinear dynamic systems for fault estimation problems. A different approach, which used differential equations as a basis for the ANN, was presented by Lagaris et al. [37]. The mathematical model’s solution was expressed as the sum of a predefined function fulfilling the boundary conditions and a neural network prediction. The error was estimated by substituting the ANN solution into the differential equation. An interesting approach called the Physics-Informed Neural Network (PINN) was presented by Raissi et al. [38]. The authors presented a framework in which an ANN is used to solve a PDE system [38]. In PINNs, the ANN learns to solve a given PDE system using the sum of two errors: the first is the difference from the imposed boundary conditions, and the second is the residual on an arbitrary set of collocation points. A general framework was presented by Parish et al. in [39]. The idea is currently being extensively studied, as can be seen in [40,41,42], and used in complex applications [43,44]. An advanced approach in the field of Computational Fluid Dynamics (CFD) was proposed by Ling et al. [45] and later by Wu et al. in [46]. Chan and Elsheikh [47] proposed an ANN architecture for application in multiscale methods. An interesting utilization of ANNs was presented by Tripathy and Bilionis in [48]. The authors used neural networks for high-dimensional uncertainty quantification, which can be applied to computation-heavy numerical simulations.
As the literature survey shows, grey-box models attract continuously increasing attention. Among artificial intelligence methods, artificial neural networks that incorporate knowledge in their structure benefit particularly from the continuous growth of the technology. This increasing focus is due to the high utilitarian value of the approach: employing an artificial neural network to make a data-driven prediction for the most nonlinear part of a model significantly reduces the required dataset and improves accuracy. The need to handle ever more computationally complex problems creates a need for new methods of integrating models and networks.
In this paper, we present a new approach in which almost any part of a model, from a constant to an entire equation, can be replaced by an ANN’s estimation. The interaction is implemented by shifting a part of the mathematical model to be predicted by the ANN and training the ANN with an evolutionary algorithm (EA) using the numerical model’s output errors. Training with evolutionary algorithms allows flexibility and broad applicability: for example, the standard backpropagation algorithm needs an error for every output, which is not available when the ANN has to approximate a function at every domain point required by the model. The novelty of our approach lies in the interaction between the numerical model and the ANN, in which any intermediate variable or parameter from the numerical model can be an ANN input, and the ANN can be called multiple times during one model prediction without affecting the learning algorithm. We have tested our approach statistically with simple benchmarking functions as well as with a real-life application in Solid Oxide Fuel Cell modeling.
4. Benchmarking Functions
Every system’s output, physical or theoretical, depends on its boundaries (inputs) and characteristics (adaptable parameters). Mathematical models are functions that try to reflect real system behavior. They are often based on assumptions and empirical parameters due to gaps in the existing knowledge regarding the phenomena. The simplifications in the problem formulation result in a discrepancy between the model and the real system outputs. The difference can vanish only in hypothetical cases where the system output is well-defined and fully described by mathematical equations.
Every function can be treated as a system, and any arbitrary function can be treated as its mathematical model. The concept of the IMANN strives to be reliable and applicable to any system—physical, economic, biological, or social—provided that a part of the system can be represented in a mathematical form. To represent a wide range of possible applications, benchmark functions were employed as a representation of the system. The system’s inputs and measurable outputs correspond to the inputs and outputs of the benchmarking functions. The ANN predicts a part of the system, and the numerical model calculates the rest of it. From the model’s perspective, the ANN’s prediction is a parameter or one of the model’s equations. Values calculated by the benchmarking functions represent the measurable system outputs. If the subfunction values predicted by the ANN had the same values as those calculated from the extracted part of the benchmark function, this would represent a perfect match between the IMANN and the system output. If the real system output differs from the predicted one, the IMANN is forced to improve the weights and biases.
Functions
To test the IMANN, two typical test functions from optimization problems are used: one polynomial function—the Styblinski-Tang function—for a one-dimensional input, and the Rosenbrock function for two dimensions. The chosen polynomial function is given by the formula:
the two-dimensional Rosenbrock function is given by:
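Assuming the standard textbook definitions of these two benchmarks (the paper’s typeset formulas are not reproduced in this text), they can be sketched in code as follows:

```python
def styblinski_tang_1d(x):
    # One-dimensional Styblinski-Tang function (standard form):
    # f(x) = (x^4 - 16 x^2 + 5 x) / 2, usually studied on [-5, 5].
    return (x**4 - 16.0 * x**2 + 5.0 * x) / 2.0

def rosenbrock(x, y, a=1.0, b=100.0):
    # Two-dimensional Rosenbrock function (standard form):
    # f(x, y) = (a - x)^2 + b (y - x^2)^2, with the usual a = 1, b = 100.
    return (a - x)**2 + b * (y - x**2)**2
```

The Rosenbrock function attains its global minimum of 0 at (a, a²), i.e., at (1, 1) for the default parameters, while the 1-D Styblinski-Tang function has its minimum near x ≈ −2.9035.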
In the case of the polynomial function, eight formulations of mathematical models are used: four with one subfunction and four with two subfunctions. The difference between the formulations lies only in the form and number of the subfunctions. Model formulations with one subfunction are given by the following equations:
where
and
are subfunctions, and
is the
i-th model function. We have chosen the formulations to distribute the complexity of prediction between the ANN and the numerical model in different scenarios. Equation (3a) is a case where the ANN is responsible for a very simple task, while Equation (3d) results in a complexity similar to predicting the value of Equation (1). For a perfect match with the modeled benchmarking function, the subfunctions should be functions of
x in the form:
The polynomial function’s model formulations with two subfunctions are defined as:
where
,
and
are subfunctions. The idea behind choosing the different formulations is the same as for Equation (3a–d). All model functions are defined in the domain
. Ideally, the subfunctions should be functions of
x in the form:
In the case of the Rosenbrock function, the model formulation is defined in
and is given by:
where
and
ideally are subfunctions of
x and
y in the form of:
The ANN in the grey-box model is responsible for predicting the values of all the subfunctions mentioned above and providing them to the model. The ANN is trained using the difference between the model output
and the data generated from a benchmarking function
in
n sample points
, from which
are training data points, and
are test data points.
is the vector of weights and biases of the neural network. The learning process is performed by an evolutionary algorithm, here with the use of the CMA-ES library for Python, as explained in
Section 3.2. The vector
, which fully describes one network, is optimized based on the fitness value:
and the best network is chosen based on the whole dataset according to:
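A minimal sketch of this training setup, assuming a mean-squared-error fitness over the training points and best-network selection over the whole dataset; the toy `model_output` below is a hypothetical placeholder for one full numerical-model run in which the ANN, parameterized by the weight vector `w`, supplies the subfunction values:

```python
import numpy as np

def model_output(x, w):
    # Hypothetical placeholder for the numerical model: a known x^2 term
    # plus a one-neuron ANN term parameterized by w = [weight, bias, scale].
    return x**2 + w[2] * np.tanh(w[0] * x + w[1])

def fitness(w, xs, ys):
    # Mean squared error between the model outputs and the sampled data;
    # an evolutionary algorithm such as CMA-ES minimizes this value over
    # candidate weight vectors w.
    preds = np.array([model_output(x, w) for x in xs])
    return float(np.mean((preds - np.asarray(ys)) ** 2))

def best_network(candidates, xs_all, ys_all):
    # Per the text, the best network is chosen on the whole dataset
    # (training and test points together).
    return min(candidates, key=lambda w: fitness(w, xs_all, ys_all))
```

In an actual run, `candidates` would be the populations produced by the evolutionary algorithm across generations rather than a fixed list.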
The dimensionality of the optimization of the fully connected network for the considered problem can be expressed with the following formula:
where
D is the optimization dimensionality,
k is the number of hidden layers,
is the number of neurons in the
i-th hidden layer and
and
are the input and output dimensionality, respectively. For instance, for the IMANN architecture used in the polynomial prediction with one subfunction, the dimensionality of the optimization problem is equal to 47. The computational complexity of the CMA-ES algorithm is at least
. Every candidate solution evaluation consists of calling the ANN for its output, which is negligible, and of a model function call. Assuming that the numerical model is a unit operation and runs at
, the overall computational complexity would be
If the computational complexity of the numerical model depends on some parameter
ℓ and is equal to
with some unit operation, the overall computational complexity would be
.
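The weight-and-bias count of a fully connected network, as described above, can be sketched with a generic helper (the exact hidden-layer sizes yielding the 47 parameters quoted in the text are not restated here, so no specific architecture is assumed):

```python
def optimization_dimensionality(n_in, hidden, n_out):
    # Total number of trainable parameters in a fully connected network:
    # a layer of h neurons fed by p inputs contributes h*p weights + h biases.
    sizes = [n_in] + list(hidden) + [n_out]
    return sum(sizes[i] * sizes[i + 1] + sizes[i + 1]
               for i in range(len(sizes) - 1))
```

For example, a network with two inputs, one hidden layer of three neurons, and two outputs (the shape later used for the SOFC application) has 2·3 + 3 + 3·2 + 2 = 17 parameters.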
6. Discussion
From the idea of the IMANN, one can see that decreasing the load on the ANN part to zero results in a prediction performance equal to that of the model alone (see
Figure 2). In the proposed benchmarking case, the numerical model’s performance is perfect because we already know the form of the subfunctions. In real applications, finding even something as simple as constant fitting parameters might be a problem.
Generalized computations from Section 5 show that the IMANN can achieve higher performance than the DNN, as presented in Table 3. The grey-box approach is better than the black-box one when the complexity of the subfunctions is lower than the complexity of the predicted function (see the upper part of Table 3). There was no significant difference between modeling one and two subfunctions with the IMANN when the subfunctions were of similar complexity (see the lower part of Table 3). Therefore, the IMANN can improve the mathematical model’s performance by modeling its oversimplified, uncertain, or missing parts that are responsible for the discrepancy between the numerical model’s prediction and the modeled system. The power of the artificial neural network is utilized only for the part of the problem that cannot be solved accurately in any other way. Such an approach reduces the complexity of the problem solved by the ANN, resulting in better prediction, especially with a small number of learning data points. The obtained results indicate great potential in the integration of mathematical models and artificial neural networks.
There are two major limitations of the IMANN. Since learning requires running the numerical model for each training data point every time a set of weights is tested, a reasonable learning time may not be attainable with computationally heavy numerical models. A second limitation arises from the fact that the problem of providing a part of the model by the ANN can become ambiguous, which, without additional control, can result in an unphysical or irrational form of the subfunctions. A proposed way to keep the ANN outputs physical is to impose additional constraints by adding penalty terms to the fitness value. As for the results provided in this paper, this topic is not studied.
7. Practical Application in Solid Oxide Fuel Cells Modeling
Microscale modeling of a Solid Oxide Fuel Cell’s (SOFC) anode involves the highly difficult problem of modeling a multistep electrochemical reaction between oxygen ions and hydrogen. The reaction occurs at the so-called triple phase boundary (TPB) and is the basis of the SOFC’s operation. The equation describing the reaction has empirical parameters that have to be estimated or fitted to the solution. The simplified equation (without elaborating the preexponential term
) takes the form described by the Butler-Volmer formula:
where
is the volumetric exchange current density (
),
is the exchange current density (
),
F = 96,485.3415 s A/mol is the Faraday constant,
T is the temperature (
),
and
are the charge transfer coefficients (
1) that are obtained by fitting to the experimental data [
53],
is activation overpotential (
). A detailed mathematical model of the SOFC anode phenomena can be found elsewhere (e.g., [
54]).
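For illustration, the standard Butler-Volmer form with the symbols defined above can be sketched as follows. This is a hedged sketch of the textbook equation; the paper’s simplified variant with the volumetric preexponential term may differ in detail:

```python
import math

R = 8.314462618   # universal gas constant, J/(mol K)
F = 96485.3415    # Faraday constant, s A/mol

def butler_volmer(i0, alpha_a, alpha_c, T, eta_act):
    # Standard Butler-Volmer expression: current density as a function of
    # the activation overpotential eta_act (V) at temperature T (K), with
    # exchange current density i0 and charge transfer coefficients
    # alpha_a (anodic) and alpha_c (cathodic).
    return i0 * (math.exp(alpha_a * F * eta_act / (R * T))
                 - math.exp(-alpha_c * F * eta_act / (R * T)))
```

At zero overpotential the anodic and cathodic branches cancel, so the net current density is zero, and with alpha_a = alpha_c the curve is antisymmetric in the overpotential.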
The Butler-Volmer equation comes with three empirical parameters—
,
and
, for which the values and the functional dependence on microstructural and system parameters are unknown [
55]. Here, an ANN is employed to provide
and
values as functions of temperature
T (K) and current density
I (
) for the numerical model of the anode. The architecture is schematically presented in
Figure 14.
Data available in the literature [
56] were used as a source of the microstructural parameters, the typically used charge transfer coefficient values, and both the training and test datasets. The training set consisted of twelve data points, six for
and six for
. The test set was made of six data points for temperature
.
The ANN’s architecture contains one hidden layer consisting of three neurons with Bent Identity activation functions. The output layer contains two neurons—one for each of the two charge transfer coefficients—with SoftPlus activation functions, as the values of the charge transfer coefficients should be positive. The outputs of the activation functions were provided directly to the numerical model. The inputs to the ANN were the temperature and the current density.
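A minimal sketch of this 2-3-2 network’s forward pass with the stated activations; the parameter container names (`W1`, `b1`, `W2`, `b2`) are hypothetical, and the weights would in practice come from the CMA-ES optimization:

```python
import math

def bent_identity(x):
    # Bent Identity activation: (sqrt(x^2 + 1) - 1) / 2 + x
    return (math.sqrt(x * x + 1.0) - 1.0) / 2.0 + x

def softplus(x):
    # SoftPlus activation: ln(1 + e^x); its output is strictly positive,
    # matching the requirement that the charge transfer coefficients be positive.
    return math.log1p(math.exp(x))

def predict_coefficients(T, I, params):
    # Forward pass of the 2-3-2 network: inputs are temperature T and
    # current density I; outputs are the two charge transfer coefficients.
    W1, b1 = params["W1"], params["b1"]   # hidden layer: 3 neurons, 2 inputs each
    W2, b2 = params["W2"], params["b2"]   # output layer: 2 neurons, 3 inputs each
    h = [bent_identity(W1[j][0] * T + W1[j][1] * I + b1[j]) for j in range(3)]
    return [softplus(sum(W2[k][j] * h[j] for j in range(3)) + b2[k])
            for k in range(2)]
```

With all parameters at zero, each output neuron returns softplus(0) = ln 2 ≈ 0.693, which already lies in the physically plausible range for charge transfer coefficients.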
To check whether the IMANN architecture provides physically feasible values, the proposed functional form of the charge transfer coefficients is shown in dependence on the current density,
Figure 15, and temperature,
Figure 16. In
Figure 15 values of both
and
charge transfer coefficients are presented as functions of current density for three different operating temperatures. It can be seen that the
value proposed by the ANN changes nonlinearly with the current density and increases monotonically. In
Figure 16 charge transfer coefficients are presented as functions of temperature for three different current densities. The
value changes nonlinearly with the temperature and decreases monotonically. The
value is practically independent of both temperature and current density. The values of the charge transfer coefficients proposed by the ANN are close to those frequently used in the literature, although researchers propose different values for the charge transfer coefficients. For
and
, one can find values in the ranges
and
, respectively [
55,
57,
58].
To see the improvement of the overall model, the IMANN architecture is compared to the model with the typically used charge transfer coefficient values. In Figure 17, the overpotential of the anode versus the current density is presented for three different temperatures, with data taken from the literature [56]. Experimental data points are marked as symbols, the standard model is represented by a dashed line, and the proposed method by a solid line. It can be seen that for all data points—both training and test—the hybrid model outperforms the classical model in terms of agreement with the experimental measurements.