2.1. Experiments
The experimental plant built for performing the study of ultrasound cavitation and the materials’ mass loss is presented in
Figure 1.
Its main parts are as follows [3,39]:
- The tank (1) containing the liquid subject to the ultrasound cavitation;
- The high-frequency ultrasound generator (8), designed to work at 220 V, 18 kHz, and three power levels;
- The piezoceramic transducer (7) that produces cavitation by entering into oscillation as a response to the high-frequency signal received from the generator;
- The control panel (command block) (12), from which the ultrasound generator's working power is selected;
- The cooler (11), used to maintain a constant temperature of the liquid;
- The measurement electrodes (13), utilized only in the experiments related to capturing the signal induced in the cavitation field;
- The data acquisition unit (14), used only in the experiments on electrical signals induced by ultrasound cavitation, to collect the signals.
For experiments in the circulating liquid medium—not discussed in this article—the pump (3) is switched on. The following conditions have been met in the experiments whose results are presented here.
The studied samples had the following compositions:
- Cu containing small percentages of Fe, Sn, and Zn (0.0395%, 0.0446%, and 0.0747%, respectively);
- A brass with 2.75% Pb and 38.45% Zn besides Cu (57.95%);
- A bronze containing Zn, Pb, and Sn (4.07%, 4.40%, and 6.4%, respectively) besides Cu.
The samples had a hexagonal shape, with a side of about 1.5 cm. They were suspended by rigid plastic wires inside the tank at a distance of about 20 cm from the transducer.
The ultrasound generator worked at 180 W, and the water temperature was maintained at 20 °C. The samples were kept in seawater under cavitation produced by ultrasound for 1320 min and were cleaned and weighed every 20 min. The composition of the seawater used for all the experiments was the following: pH = 7, 22.17 g/L NaCl, 0.051 mg/L Fe, 0.0033 mg/L Ni, 0.31 g/L , total water hardness—6.27 meq/L.
The experiments were performed in triplicate.
To quantitatively estimate the ultrasound effect on the samples, the mass variation (computed as the difference between the sample's mass at the experiment's beginning and the mass at the moment t) over the surface (S) was recorded for the modeling stage. The data series are represented in Figure 2.
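As a minimal illustration of this quantity, the sketch below computes the mass variation per unit surface from a series of weighings; all numbers (masses, surface area) are hypothetical stand-ins, not the measured data:

```python
# Minimal sketch (hypothetical data): mass variation per unit surface,
# computed as the difference between the initial mass and the mass at
# moment t, divided by the exposed surface area S.
masses_g = [12.400, 12.402, 12.399, 12.395, 12.396]  # hypothetical weighings, every 20 min
S_cm2 = 5.8                                          # hypothetical sample surface area

m0 = masses_g[0]
delta_m_per_S = [(m0 - m) / S_cm2 for m in masses_g]
print(delta_m_per_S[0])  # 0.0 at t = 0
```

Negative values of the series correspond to the mass-gain (passivation) phases discussed above.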
The mass loss variation can be explained by a complex mechanism (electrochemical and erosion–corrosion). Since the material is introduced in seawater with a high concentration of NaCl, two opposite processes appear: (a) passivation leading to the formation of Cu oxides deposited on the sample surface; (b) removal of these oxides due to the ultrasound action. In the first phase, the sample mass increases, while in the second one, it decreases, explaining the oscillations observed in the chart.
Moreover, as a cavitation effect, the material's microcrystalline structure is weakened by the erosion–corrosion that breaks the atomic bonds. Consequently, the material's resistance decreases, resulting in microfissures and cracks. These fissures become new sites where the material breaks and detaches from the sample. Therefore, the material mass loss remains accelerated even when the sample is no longer maintained in the cavitation field, increasing the mass loss rate. The material loss is not uniform because the samples are composed of different elements with various structures and resistances to cavitation.
2.2. Modeling Methodology for the Weight Loss
The AI methods utilized in this work for modeling the mass loss of the materials in the described conditions are SVR, GRNN, GEP, and RBF.
SVR [40,41,42] belongs to the class of supervised learning algorithms characterized by high performance correlated with a low computational cost in solving regression and classification problems. It uses training sets composed of n pairs (x_k, y_k), k = 1, …, n, where the x_k are the vectors of known features from a domain D in the m-dimensional space and the y_k are the target values; the algorithm's output is a function f based on which unknown targets are provided when new feature vectors are given. The model function f, in the linear case, is

f(x) = ⟨w, x⟩ + b,

where ⟨w, x⟩ is the inner product in D, and b is a real constant.
SVR produces prediction functions mapped on a set of support vectors [27], and f should minimize the objective function

(1/2)‖w‖² + C · Σ_{k=1}^{n} L(y_k − f(x_k)),

where ‖w‖ is the norm in D, C > 0, and L is an ε-insensitive loss function [40,41].
The maximum deviation between the target values and the f function's values must be less than ε on the training set. The number of support vectors is controlled by ε, whereas C ensures a balance between the model function's flatness and the deviations from the targets [27,41].
Nonlinear problems are transformed into linear ones through mappings φ defined using different kernels K, where K(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩ is the inner product of φ(x_i) and φ(x_j) in the feature space.
One of the most important aspects of SVR modeling is choosing the kernel type. The most used kernels are RBF, linear, sigmoid, and polynomial. Their choice significantly influences the model quality. Therefore, for our study, we performed experiments with all these types of kernels. Since the best results were obtained utilizing a linear kernel, we report them in the article.
Another aspect is related to the time necessary to perform the experiments. SVR is the slowest compared with GEP, GRNN, and RBF: it takes a few minutes to run the algorithm, compared with a few seconds for the other three methods. The time consumed increases when the number of the kernel's parameters to be estimated increases and when the step size in the grid search for each parameter decreases.
After choosing the linear kernel, the step size in the grid search was selected to be 0.1 (to maintain a balance between the time and parameters’ quality) and the parameter C was searched in the interval [0, 50,000].
The number of predictors should also be selected before running the algorithm. Theoretically, the user should choose the number of predictors based on their experience or the series' characteristics. After performing experiments with different predictors, and given that the mass loss depends on the mass value before running the experiment, the number of predictors was chosen to be one: the lag-1 variable (the sample's absolute mass loss at the previous moment).
For modeling, the data series was divided into two parts—the first for training and the second for testing. Different ratios were tested. Here, we report the best results obtained for the ratio training:test = 70:30 (70% of the data series for training the model and the rest for the testing). A 4-fold cross-validation was used in the training–prediction process. The optimization criterion was to minimize the total error.
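The setup described above (a lag-1 regressor, a linear kernel, a 70:30 training:test ratio, and a cross-validated grid search over C) could be sketched as follows; the series, the C grid, and the seed are hypothetical stand-ins, and scikit-learn is assumed here rather than the DTREG implementation actually used in the study:

```python
# A minimal sketch of the SVR setup (hypothetical data): lag-1 predictor,
# linear kernel, 70:30 split, 4-fold grid search over C.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
mass_loss = np.cumsum(rng.normal(0.01, 0.005, 67))  # hypothetical mass-loss series

X = mass_loss[:-1].reshape(-1, 1)  # lag-1 regressor (value at the previous moment)
y = mass_loss[1:]                  # value at the current moment

split = int(0.7 * len(X))          # 70:30 training:test ratio
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Grid search over C with 4-fold cross-validation (a much coarser grid than
# the 0.1-step search over [0, 50,000] reported in the text, for brevity).
grid = GridSearchCV(SVR(kernel="linear"), {"C": [1, 10, 100, 1000]}, cv=4)
grid.fit(X_train, y_train)
y_pred = grid.predict(X_test)
```

The optimization criterion here is scikit-learn's default cross-validation score, standing in for the total-error minimization used in the study.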
For details of SVR, the reader may see the articles of Vapnik [40] and Smola [41].
GRNNs (Figure 3) are ANNs with four layers—input, hidden, summation, and output. These feedforward networks process the information in successive order from one layer to another, without feedback [43].
The neurons in the hidden layer represent different patterns. They calculate the distance (usually Euclidean) between each input vector and a center point and apply an activation function to these distances.
Each training sample, X_i, is utilized as the mean of a Gaussian distribution

exp(−D_i²/(2σ²)),

where D_i² = (X − X_i)ᵀ(X − X_i) is the squared distance between the training sample X_i and the prediction point X. The spread parameter σ is a measure of how well each training sample can represent the position of the prediction point X. When D_i is small, exp(−D_i²/(2σ²)) is big; when D_i = 0, it equals 1 and the evaluation point is best represented by the training sample. A high D_i produces a small exp(−D_i²/(2σ²)); as a consequence, the contribution of that training sample to the prediction is relatively small. If σ is big, the training sample can represent the evaluation point for a wider range of X, whereas when σ is small, the representation is limited to a narrow range of X [44]. Since the influence radius of each neuron is controlled by σ, this parameter must be found in the training process to optimize the network's performance.
The third layer is formed by the S- and D-summation neurons, which sum up the information received from the previous layer (the S neurons weighting it by the target values).
The GRNN is described by the equation [45]

Y = W₂ · Φ(W₁ · X + e₁),

where the symbols have the following meanings: X—the vector containing the network input; Y—the output vector; Φ—the activation function; W₁—the matrix of the weights of the input in the hidden layer; W₂—the matrix containing the weights of the results of the hidden layer; e₁—the error vector corresponding to the first (hidden) layer.
In this study, to select the optimum σ, the employed algorithm was the conjugate gradient, and the search was performed in the interval [0.0001; 10]. The convergence tolerances were 10⁻⁸ (absolute) and 10⁻⁴ (relative), and the upper limit of the number of iterations (of iterations without improvement, respectively) was 5000 (1000, respectively). As for SVR, the best results are reported here; they were obtained when running the algorithm with a ratio of 70:30 between the training and test sets.
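The Gaussian weighting scheme described above can be sketched as a direct NumPy implementation of a GRNN prediction: the output is the average of the training targets, weighted by the Gaussians centered on the training samples. The data and σ value are hypothetical:

```python
# Minimal NumPy sketch of a GRNN prediction: each training sample is the
# mean of a Gaussian, and the prediction is the Gaussian-weighted average
# of the target values. Data and sigma are hypothetical.
import numpy as np

def grnn_predict(X_train, y_train, x, sigma):
    d2 = np.sum((X_train - x) ** 2, axis=1)   # squared distances D_i^2
    w = np.exp(-d2 / (2.0 * sigma ** 2))      # Gaussian weights
    return np.sum(w * y_train) / np.sum(w)    # weighted average of targets

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0])      # hypothetical targets

# With a small sigma, the prediction follows the nearest training sample.
print(grnn_predict(X_train, y_train, np.array([1.0]), sigma=0.1))  # ≈ 1.0
```

Increasing σ flattens the weights, so more distant samples contribute to the prediction, as discussed above.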
RBF networks belong to the feedforward ANNs and are built of three layers—input, hidden, and output. The first layer's output is obtained by computing the distance between its input and the second layer's centers. The hidden layer outputs the weighted values of the first layer's output. Each hidden-layer neuron has an associated center vector. An activation function (usually Gaussian) is associated with the neurons of the second layer; this function has a spread parameter that controls its behavior.
The output of an RBF network can be described by [46]

y_k(x) = Σ_{j=1}^{J} w_{jk} · exp(−‖x − c_j‖²/(2σ_j²)) + b_k,  k = 1, …, K,

with y_k—the kth neuron's output; x—the input data vector; c_j—the jth neuron's center vector; J (K)—the number of neurons belonging to the second (third) layer; w_{jk}—the weight connecting the jth hidden neuron to the kth output; ‖·‖—the Euclidean norm; b_k—the bias corresponding to the kth neuron's output; σ_j—the spread parameter (radius) that controls the jth neuron's spread.
For the network’s performance evaluation, MSE or RMSE are usually utilized.
The RBF training is performed to minimize the objective function, which is the sum of squared errors, defined by

E = Σ_{i=1}^{n} (y_i − ŷ_i)²,

where y_i is the recorded value and ŷ_i is the result after running RBF [47].
In RBF networks, choosing the neurons' number in the hidden layer affects the network's complexity and its generalizing capability. If this number is insufficient, the network cannot learn the data adequately. If this number is very high, it may result in overfitting or limited generalization capacity [48,49].
It was also shown that the centers' detection (in the second layer) significantly influences the performance of the RBF network [49]. Therefore, one of the main tasks when training the network is determining the best center positions.
The RBF network training must also include the optimization of the spread parameters of each neuron and the selection of the weights between the second and third layers. Therefore, in the training process, the number of neurons, the centers in the second layer, the spread parameters, and the weights must be appropriately selected. To achieve this goal, the network's training was performed by an orthogonal forward algorithm proposed by Chen et al. [50] that employs tunable center vector nodes for building the RBF function and minimizes the leave-one-out error. The algorithm does not need an imposed stop criterion. Ridge regression was utilized for the weights' computation.
For tuning the neurons' parameters, the size of the population was fixed to 200, the maximum number of generations to 20, the maximum number of flat generations to 5, and the maximum boosting tolerance to 10⁻⁴. The network parameters used were: maximum number of neurons—100, minimum (maximum) radius—0.01 (400), and absolute tolerance—10⁻⁶.
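The output equation above can be sketched as a NumPy forward pass; the centers, spreads, weights, and bias below are hypothetical values, and no training (center placement, spread tuning, ridge regression) is performed:

```python
# Minimal NumPy sketch of the RBF network output: Gaussian hidden units
# with centers c_j and spreads sigma_j, linear output weights w_jk and
# biases b_k. All numerical values are hypothetical.
import numpy as np

def rbf_forward(x, centers, sigmas, W, b):
    d2 = np.sum((centers - x) ** 2, axis=1)   # ||x - c_j||^2
    phi = np.exp(-d2 / (2.0 * sigmas ** 2))   # hidden-layer outputs
    return phi @ W + b                        # y_k = sum_j w_jk * phi_j + b_k

centers = np.array([[0.0], [1.0]])   # J = 2 hidden neurons
sigmas = np.array([0.5, 0.5])
W = np.array([[1.0], [2.0]])         # J x K weight matrix (K = 1 output)
b = np.array([0.1])

y = rbf_forward(np.array([0.0]), centers, sigmas, W, b)
# phi = [exp(0), exp(-2)]; y = 1*1 + 2*exp(-2) + 0.1
print(round(float(y[0]), 4))  # 1.3707
```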
The reader may refer to [47,51,52,53] for more details on RBF networks.
Genetic Algorithms (GAs) are evolutionary techniques based on the principle of Darwinian selection, operating on populations of individuals successively created by choosing the best individuals based on their fitness. The selected individuals are combined using specific operations to give birth to new generations. The steps of this procedure are (a) random initialization of the population, (b) fitness assignment, (c) selection of individuals, and (d) crossover and mutation. The algorithm is run until the stop criterion is met [54]. Random mutations prevent GAs from remaining trapped in a locally optimal region.
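Steps (a)–(d) can be sketched as a minimal GA on a toy fitness function (maximizing the number of 1-bits in a binary string); the population size, rates, and stop criterion below are illustrative and unrelated to the settings used in this study:

```python
# Minimal GA sketch: (a) random initialization, (b) fitness assignment,
# (c) selection of the best individuals, (d) crossover and mutation,
# repeated for a fixed number of generations. Toy problem and parameters.
import random

random.seed(0)
L, POP, GENS = 12, 20, 40

def fitness(ind):                                # (b) fitness assignment
    return sum(ind)

pop = [[random.randint(0, 1) for _ in range(L)]  # (a) random initialization
       for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]                     # (c) keep the fittest half
    children = []
    while len(children) < POP:
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, L)             # (d) one-point crossover
        child = a[:cut] + b[cut:]
        i = random.randrange(L)                  # (d) random mutation
        child[i] ^= random.random() < 0.1
        children.append(child)
    pop = children

best = max(pop, key=fitness)
print(fitness(best))
```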
Introduced in 1988 by Koza [55], Genetic Programming (GP) belongs to the same class of evolutionary techniques [56] and differs from the others by representing the information as programs inspired by nature's mechanisms [57]. GP provides the solution to the problem at hand, enabling the computer to search for it and deliver it in the form of parse trees whose nodes, except the terminal ones, contain operators. The terminal nodes contain constants and variables, so the expressions can be quickly evaluated and evolved. For example, Figure 4 presents the parse tree that encodes the expression 2 + (7 * X) − (11/cos(Y)).
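Such a parse tree can be evaluated recursively, descending from the operator nodes to the terminals. The sketch below encodes the tree of this expression as nested tuples (an encoding of ours, for illustration only, not the paper's representation):

```python
# Minimal sketch of parse-tree evaluation: interior nodes hold operators,
# terminal nodes hold constants and variables. Tuple encoding is illustrative.
import math

def evaluate(node, env):
    if isinstance(node, str):                 # terminal: variable
        return env[node]
    if isinstance(node, (int, float)):        # terminal: constant
        return node
    if node[0] == "cos":                      # unary operator node
        return math.cos(evaluate(node[1], env))
    op, left, right = node                    # binary operator node
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    return ops[op](evaluate(left, env), evaluate(right, env))

# 2 + (7 * X) - (11 / cos(Y))
tree = ("-", ("+", 2, ("*", 7, "X")), ("/", 11, ("cos", "Y")))
print(evaluate(tree, {"X": 1, "Y": 0.0}))  # 2 + 7 - 11/1 = -2.0
```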
Gene Expression Programming (GEP), proposed by Ferreira in 2001 [58], incorporates features from GAs (such as linear chromosomes of fixed length) and GP (the structure of a tree with different shapes and sizes).
A chromosome is a string consisting of elements of the function and terminal sets that is mapped into a tree having, as a correspondent, a unique mathematical formula [59]. It contains at least one gene, which is in turn formed of a fixed number of symbols and has a head (that contains constants, variables, and functions) and a tail (containing only terminal symbols—constants and variables). When a minimum of two genes are present, they are connected by a linking function to generate the solution to the problem at hand.
For solving a problem using GEP, one must specify (a) the set of functions, (b) the terminal set (that contains constants and variables), (c) the fitness function, (d) the control parameters, and (e) the stop condition.
To start, GEP randomly generates a population of chromosomes evaluated with respect to a fitness function defined by the user. Then, the best individuals are chosen, using the roulette-wheel selection criterion, to produce the next generation through genetic operations (mutation, transposition, crossover). Elitism is also applied: the individual with the highest fitness in each generation goes without modifications to the next generation. The number of genes and the linking function must be specified before running the algorithm. In all cases, the main point is that the solution to a problem is represented by individuals evolving and improving from one generation to the next. The process continues until the stop criterion is met.
It should be mentioned that, among expressions with the same performance, the one with the simplest form is preferred.
For details on GEP, the reader may see [58,60].
In the experiments, we performed 50 independent runs for each setup.
In this article, the following settings have been used to run the algorithm:
- Number of genes per chromosome—4.
- Length of the head of a gene—8.
- Number of constants per gene—10.
- The size of the population—50 individuals.
- The maximum number of generations (and without improvement)—2000 (1000).
- The fitness function—MSE, with a hit tolerance of 0.01.
- The functions used to build the final expression—{+, −, *, /, sqrt}.
- The linking function—addition.
- The algorithm was allowed to perform algebraic simplification.
- The mutation (and inversion) rate—0.44 (0.1).
- The transposition rate—0.1.
- The one-point (two-point and gene) recombination rate—0.3 (0.3 and 0.1).
- For the experimental reasons explained above, the regressor in the model was the lag-1 variable, used to predict the value at the current moment.
- The obtained models were compared with respect to the MSE values obtained on the original data set; in the following, we report only the best model (i.e., the model with the smallest MSE) found in all 50 runs of the algorithm.
- Experiments were performed using different ratios between the training and test sets, such as 80:20 or 90:10; the best results, presented in this article, were obtained for the training set formed by 70% of the data series and the test set formed by the rest of the series' values.
To compare the algorithms' performances, the following indicators were used: the proportion of variance explained by the model (R²), the coefficient of variation (CV), the correlation between actual and predicted values (r_ap), the mean absolute error and mean absolute percentage error (MAE and MAPE), and the root mean squared error (RMSE).
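These indicators can be sketched on hypothetical actual/predicted series as follows; for CV, one common definition (RMSE divided by the mean of the actual values) is assumed here:

```python
# Minimal sketch of the comparison indicators on hypothetical data:
# R^2, CV, r_ap, MAE, MAPE, and RMSE.
import numpy as np

actual = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical recorded values
pred = np.array([1.1, 1.9, 3.2, 3.8, 5.1])     # hypothetical model outputs

resid = actual - pred
rmse = np.sqrt(np.mean(resid ** 2))
mae = np.mean(np.abs(resid))
mape = 100.0 * np.mean(np.abs(resid / actual))
r2 = 1.0 - np.sum(resid ** 2) / np.sum((actual - actual.mean()) ** 2)
cv = rmse / actual.mean()                      # one common definition of CV
r_ap = np.corrcoef(actual, pred)[0, 1]         # actual-predicted correlation

print(round(mae, 3))  # 0.14
```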
The DTREG software [61] was employed for running the algorithms.
The flowchart of the study is presented in Figure 5.