1. Introduction
Modern neural networks used to diagnose complex dynamic objects, such as aircraft engines [1,2] and industrial medical systems [3,4,5,6], are becoming increasingly complex and adaptive. However, despite high accuracy and efficiency, many algorithms [2,5,6,7] face stability and reliability problems when environmental parameters and the system’s internal structure change. Research into the symmetry of neural networks’ morphology makes it possible to increase their ability to adapt to changes in an object’s characteristics and, as a result, to optimise the diagnostic process. A symmetrical neural network morphology can reduce the need for frequent reconfiguration, minimising the risks of incorrect recognition and anomalies in the system’s operation, thereby increasing the accuracy and reliability of diagnostics.
Due to increasing requirements for the reliability and safety of technical systems [1,3,4,8], research on the influence of neural networks’ morphology symmetry on the diagnostics of dynamic objects is becoming relevant. Incorporating symmetry principles into the neural network architecture [9] can increase its resistance to changes in system parameters and reduce the computational load required to adapt to new conditions. This is essential in actual operating conditions, where the object is exposed to various external factors that affect its behaviour and characteristics.
Research into the application of neural network technologies in the diagnostics of complex dynamic systems is actively developing, especially in areas requiring high accuracy, such as aerospace engineering [10,11], energy [12,13], and industrial automation [14,15,16]. Neural network technologies are widely used for time series analysis [17], predicting anomalies [18] and malfunctions [19], and optimising the control of dynamic objects in operational conditions [20,21]. At the same time, their application emphasises recognising deviations from the norm that can develop [22,23] and predicting potential failures in the system. In particular, recurrent neural networks, such as LSTM (long short-term memory) networks [24,25], have shown the required results in time sequence diagnostics problems due to their ability to store and analyse long-term dependencies in data.
Symmetric neural architectures [26,27], in which the network structure maintains mirror or other symmetry in the distribution of connections and weights, have been shown to increase robustness to noise and input data distortion, improving the network’s ability to extract critical patterns. The use of symmetry also reduces the number of neural network parameters [26,28], which reduces the computational load and makes models less susceptible to overfitting. However, much of the research, including [26,27,28], has focused on the use of symmetry in static problems such as image recognition, and only a limited number of studies (e.g., [29]) have aimed to adapt these methods to dynamic systems.
One of the most promising areas is the use of symmetry to improve the stability and adaptability of neural networks under changing data structures and system parameters. Research [30,31,32] shows that using symmetric neural networks is appropriate when the object is exposed to multicomponent external factors. Traditional diagnostic methods often lose accuracy in such conditions and require regular calibration. Symmetry in the neural networks’ morphology can potentially provide better stability and reduce the networks’ dependence on individual variables. However, the practical implementation of this approach for dynamic systems is still an open task.
However, issues related to determining the optimal symmetry level that achieves a balance between accuracy and stability remain understudied. Most existing approaches rely on empirical data, but no formalised methods exist for determining and quantifying optimal symmetry parameters for dynamic objects. In addition, the symmetry effect on the neural network’s ability to adapt to changing system operating conditions, such as load, temperature, or other external factors, has not yet received sufficient theoretical justification and experimental verification.
An equally important task is to develop methods for integrating symmetric structures into more complex neural network architectures capable of operating in real-time. An essential requirement for most dynamic objects, especially those operating in critical conditions, is the neural networks’ ability to process data and adapt to its changes quickly. However, most research focuses on static symmetry analysis, while dynamic symmetry and its impact on network performance remain poorly understood.
The research aims to develop a model of neural network morphology symmetry and to study its influence on the accuracy, stability, and adaptability of diagnostic systems for complex dynamic objects. The research object is neural networks used for diagnostics and monitoring of the operating state of complex dynamic objects under changing external factors. The research subject is the neural networks’ symmetrical morphology and its influence on their ability to adapt to changing operating conditions, improving the stability and accuracy of diagnostics of complex dynamic objects.
The article consists of an introduction, main sections (“Materials and Methods”, “Case Study”, “Discussion”), conclusions, references, and Appendix A. The introduction substantiates the relevance of research on neural networks’ morphology symmetry to improve the accuracy, stability, and adaptability of diagnostics of complex dynamic objects under changing external factors, which optimises the diagnostic process and reduces the computational load. The “Materials and Methods” section proposes a mathematical model that takes into account the neural networks’ dynamic morphology symmetry for diagnosing complex dynamic objects. The symmetric architecture and adaptation parameters concepts are introduced, the conditions for the weights’ symmetry and their dynamic adaptation are formulated, and optimisation methods taking into account symmetry regularisation are proposed. A theorem on symmetric neural network optimisation is also proved, which ensures the solution’s stability and the minimisation of the loss function with a unique global minimum. The “Case Study” section includes mathematical modelling of the scale’s behaviour under symmetry, a convergence analysis of gradient descent with symmetry, mathematical modelling of the weights’ behaviour under symmetry, an analysis of the regularisation parameter’s influence on symmetry and overall error, and an analysis of the symmetry influence on the loss function and error dynamics. The “Discussion” section presents a generalisation of the research substantiating the advantages of symmetry in neural network architecture. This includes the influence of symmetric regularisation on optimisation stability, the convergence of training algorithms, and the stability of the weight matrix, as well as an analysis of limitations and prospects for further development in solving applied problems. The “Conclusions” present the research results.
Appendix A presents an example of a neural network diagnostic model of a helicopter turboshaft engine based on a five-layer perceptron (3-6-12-6-3 structure) that analyses key engine performance parameters (rotor speeds and gas temperature in front of the compressor turbine) to detect defects and assess the engine condition based on data collected in real flight conditions.
2. Materials and Methods
To research the influence of neural networks’ morphology symmetry, a mathematical model is proposed that considers the neural networks’ dynamic symmetry for the diagnosis of complex dynamic objects, and the concepts of symmetric architecture and adaptation parameters are introduced. Let us consider the neural network as a function f: ℝn → ℝm [33], which maps the input data to the diagnostic outputs. It is assumed that the neural network weights are determined by the matrices W and the biases by the vectors b. For the l-th layer with n neurons, the weights and biases are defined as follows:
The outputs of the l-th layer are defined as a(l) = σ(W(l)·a(l−1) + b(l)), where σ is an activation function, such as ReLU (and its modifications, such as SmoothReLU [34]) or sigmoid, and a(l−1) is the previous layer’s output.
The weights must satisfy a certain symmetric condition for a symmetric neural network. It is assumed that W(l) is a symmetric matrix, that is, W(l) = (W(l))T. Then, each element satisfies Wij(l) = Wji(l), which significantly reduces the number of unique parameters in the weight matrix. To take into account dynamic changes, a symmetry function S is introduced, which changes the weights depending on the system’s state, W(l)(t) = S(W(l), t), where t is time, and the function S dynamically adjusts the weights depending on current conditions. For example, the function S can be defined as S(W(l), t) = a(l)·W(l) + (1 − a(l))·W0(l), where W0(l) is the weights’ initial symmetric state, and a(l) is a function that regulates the contribution of the initial state and the system’s current state. It is assumed that θ(t) = {W(l)(t), b(l)(t)} is the adaptation parameters vector, including the weights and biases that depend on time.
Then, the training problem with dynamic symmetry is formulated as the optimisation of the parameters θ(t) taking into account the minimisation of the loss function L(t) = (1/N)·Σ∥f(xi, θ(t)) − yi∥2, where xi is the input data, yi is the expected output, and N is the number of training examples.
A condition on the weights’ gradients is introduced to optimise the parameters considering symmetry. It is assumed that ∇W(l) is the gradient of the loss function over the weights. To ensure the weights’ symmetry, a symmetry constraint is imposed on this gradient, and the weights’ update is carried out taking this limitation into account, where η is the training rate.
To take into account the dynamic symmetry influence, a regularising term R(W) is added, which minimises the deviation from the symmetric state, giving the resulting loss function Ltotal(t) = L(t) + λ·R(W), where R(W) = Σl∥W(l) − (W(l))T∥2 measures the deviation of the weights from symmetry, λ is the regularisation coefficient that controls symmetry, and Ltotal(t) is the resulting loss function with dynamic symmetry.
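To make this regularisation concrete, the following minimal NumPy sketch computes the symmetry penalty R(W) and the resulting total loss for a single linear layer; the helper names, the squared Frobenius norm, and the stand-in linear model are illustrative assumptions rather than the paper’s exact implementation.

```python
import numpy as np

def symmetry_penalty(W: np.ndarray) -> float:
    # R(W): squared Frobenius norm of the deviation from symmetry (assumed form).
    return float(np.linalg.norm(W - W.T, ord="fro") ** 2)

def total_loss(W: np.ndarray, X: np.ndarray, y: np.ndarray, lam: float) -> float:
    # Main term: mean squared error of a single linear layer f(x) = W x, used here as a
    # stand-in for the full network f(x_i, theta(t)), plus lambda * R(W).
    preds = X @ W.T
    mse = float(np.mean(np.sum((preds - y) ** 2, axis=1)))
    return mse + lam * symmetry_penalty(W)
```

In this sketch, lam plays the role of the regularisation coefficient λ; the penalty is zero exactly when W equals its transpose.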
A function γ(t) is introduced for dynamic symmetry, which controls the symmetry degree as a function of time. The symmetry condition is then modified so that the weights are expressed through their symmetric part weighted by γ(t), where Ws(l) is the weights’ symmetric part, and γ(t) ∈ [0, 1] determines the symmetry level.
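One plausible reading of this modified condition is a convex blend between the symmetric part of the weights and the raw weights, controlled by γ(t). The sketch below follows that reading; the function name and the definition Ws = (W + WT)/2 are assumptions made for illustration.

```python
import numpy as np

def blend_symmetry(W: np.ndarray, gamma: float) -> np.ndarray:
    # gamma in [0, 1]: 1 -> fully symmetric weights, 0 -> weights left unchanged.
    W_s = 0.5 * (W + W.T)          # symmetric part of the weight matrix
    return gamma * W_s + (1.0 - gamma) * W
```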
The final expression for determining the loss function taking into account regularisation follows from combining L(t) with the symmetry regularisation term (Equation (13)).
To solve the optimisation problem of minimising the loss function with regularisation given in Equation (13), it is necessary to minimise it with respect to the parameters θ(t), including the weights W(l) and the biases b(l). To optimise the function Ltotal(t) by W(l), based on [32,35,36], it is advisable to use the gradient descent method. According to this method, the gradients of each term are calculated (the primary term is the neural network error, and the regularisation term is the weights’ symmetry). The central part of the loss function (13) is the mean square error between the neural network prediction f(xi, θ(t)) and the expected result yi.
For simplicity, the error is considered for one training example. Then, this error’s partial derivative with respect to the weights W(l) is obtained by the chain rule; its calculation depends on the activation functions and the neural network architecture. For example, for a simple linear layer with activation a(l) = σ(W(l)·a(l−1) + b(l)), the partial derivative is expressed through the activation derivative σ′ and the previous layer’s output a(l−1).
The regularisation term responsible for the weights’ symmetry has the form R(W) = Σl∥W(l) − (W(l))T∥2, or, in expanded form, the sum of the squared differences between the symmetric elements, Σi,j(Wij(l) − Wji(l))2. For the weights Wij(l), the partial derivative of the regularising term with respect to Wij(l) is equal to 2·(Wij(l) − Wji(l)). Then, the expression for the gradient of the total loss function Ltotal(t) with respect to the weights W(l) takes the form ∇W(l)Ltotal(t) = ∇W(l)L(t) + 2·λ·(W(l) − (W(l))T), obtained after combining the expressions for the partial derivatives.
Using the gradient descent method, the weights W(l) are updated at the t-th step as W(l)(t + 1) = W(l)(t) − η·∇W(l)Ltotal(t). After substituting the expression for the gradient (22), we obtain W(l)(t + 1) = W(l)(t) − η·(∇W(l)L(t) + 2·λ·(W(l)(t) − (W(l)(t))T)). Since the weights’ symmetry is required, the weights are adjusted after each update step by averaging their values with the transposed matrix, W(l) ← (W(l) + (W(l))T)/2.
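A minimal sketch of one such update step, assuming the gradient of the primary loss is supplied by the surrounding training code: a gradient step on the regularised loss followed by symmetrisation through averaging with the transpose.

```python
import numpy as np

def symmetric_update(W, grad_main, lam=1.0, eta=0.01):
    # Total gradient: primary loss gradient plus 2*lambda*(W - W^T) (Eq. (22) form).
    grad_total = grad_main + 2.0 * lam * (W - W.T)
    W_new = W - eta * grad_total            # gradient descent step
    return 0.5 * (W_new + W_new.T)          # enforce symmetry by averaging with the transpose
```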
Considering the neural network weights’ symmetry property, which influences the stability of solutions to the loss function optimisation problem, Theorem 1, “On the symmetric neural network optimisation stability”, is formulated.
Theorem 1. If the weight matrix W is symmetric and positive definite, then minimising the loss function L(W), which has a smooth, convex shape, leads to a unique global minimum.
Proof of Theorem 1. Let W ∈ ℝn×n be a symmetric and positive definite matrix, that is, xT·W·x > 0 for all x ≠ 0. To prove the formulated Theorem 1, we consider the loss function that must be minimised over W, L(W) = f(W) + λ·∥W − WT∥2, where f(W) is a convex function depending on the weights, and the second term is a regularisation that ensures the weights’ symmetry. For a symmetric matrix W = WT, the second term vanishes, since λ·∥W − WT∥2 = 0. Thus, L(W) = f(W) for symmetric W. Since f(W) is convex, it has a unique global minimum; that is, there is a unique point W* such that f(W*) ≤ f(W) for all W.
Next, the loss function L(W) is minimised, including the symmetry regularisation. For this, the derivative of L(W) with respect to W is considered, ∇WL(W) = ∇Wf(W) + 2·λ·(W − WT). Since W is symmetric, W = WT, and the regularisation term 2·λ·(W − WT) vanishes. Therefore, ∇WL(W) = ∇Wf(W). Since W is positive definite, this property holds for f(W), ensuring the stability of the solution. The convexity of f(W) ensures that W* is the only minimum, and the positive definiteness of W confirms that W* is stable and minimal. Since there is a unique minimum of L(W) for a symmetric, positive definite matrix W, minimising this function leads to a stable and unique solution. Thus, it is proven that the symmetry and positive definiteness of the weight matrix in a neural network ensure the stable optimisation of the loss function with a unique global minimum. □
The proof of Theorem 1 relies on the symmetry and positive definiteness of the weight matrix W and the loss function L(W) convexity. The symmetry and positive definiteness of W guarantee the uniqueness and stability of a solution that minimizes the loss function L(W). The positive definiteness of W ensures that the quadratic form xT· W·x > 0 for all x ≠ 0, which confirms the solutions’ stability and minimality. The convexity of f(W) ensures the unique global minimum existence of W* such that f(W*) ≤ f(W) for all W ∈ ℝn×n. The symmetry regularization of L(W), including the term λ·∥W − WT∥2, forces W to be symmetric; for W = WT, this term is zero, and L(W) = f(W). The derivative ∇WL(W) = ∇Wf(W) + 2·λ·(W − WT) also simplifies to ∇WL(W) = ∇Wf(W) for symmetric W, and the convexity of f(W) ensures that ∇Wf(W) = 0 has a unique solution W*, confirming the optimisation’s uniqueness and stability.
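As a purely numerical illustration of Theorem 1 (not part of the proof), the sketch below minimises a convex quadratic f(W) = ∥W − A∥2 with a symmetric target A plus the symmetry regulariser, starting from several random initialisations; all runs converge to the same symmetric point. The quadratic f, the matrix A, and all settings are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)); A = 0.5 * (A + A.T)      # symmetric target, so the minimiser is A

def grad_L(W, lam=1.0):
    # Gradient of f plus the paper's symmetry term 2*lambda*(W - W^T).
    return 2.0 * (W - A) + 2.0 * lam * (W - W.T)

minima = []
for _ in range(3):                                     # several random initialisations
    W = rng.normal(size=(4, 4))
    for _ in range(2000):
        W -= 0.05 * grad_L(W)
    minima.append(W)

# All runs end at (numerically) the same symmetric point, illustrating the unique global minimum.
print(max(np.linalg.norm(m - minima[0]) for m in minima))   # ~0
print(np.linalg.norm(minima[0] - minima[0].T))               # ~0 (symmetric)
```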
Thus, a final optimisation procedure for symmetry-based weights is proposed, consisting of the following steps: calculating the gradient of the primary loss L(t); adding the symmetric regularisation gradient 2·λ·(W(l) − (W(l))T); updating the weights by gradient descent; symmetrising the weights by averaging them with their transpose; and repeating until convergence. The proposed optimisation procedure allows for considering the weights’ dynamic symmetry, minimising the overall loss function, and ensuring the neural network’s stable operation when diagnosing complex dynamic objects. For this aim, several studies were conducted in the research, described in Table 1.
The proposed mathematical model demonstrates an innovative approach to accounting for symmetry in neural networks, which is emphasised by the analysis of the weights’ behaviour when introducing symmetry, the convergence of gradient descent, and the influence of the regularisation parameter. Modelling the weights’ behaviour under symmetry shows that regularisation improves the network’s stability by minimising the discrepancies between the elements of the weight matrix and its transposed version. The gradient descent convergence analysis, taking into account symmetry, reveals that the weights’ symmetric structure contributes to more stable and predictable dynamics of parameter updates, which is confirmed by the proof of the global minimum’s uniqueness. The regularisation parameter λ plays a key role in the balance between prediction accuracy and symmetry. Increasing λ emphasises symmetry preservation, which can reduce the error during generalisation, but excessive values of the parameter can limit the model’s flexibility. The effect of symmetry on the loss function is expressed in a decrease in dynamic errors due to reduced parameter redundancy and a simplified optimisation landscape.
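Putting the above steps together, the following sketch runs the full procedure (regularised loss, gradient step, symmetrisation after each step) on synthetic data; the data, layer shape, and hyperparameters are placeholders and not the paper’s diagnostic model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
W_true = rng.normal(size=(n, n)); W_true = 0.5 * (W_true + W_true.T)   # symmetric "ground truth"
X = rng.normal(size=(200, n))
Y = X @ W_true.T + 0.01 * rng.normal(size=(200, n))                     # noisy targets

W = rng.normal(size=(n, n))                      # asymmetric initialisation
lam, eta = 1.0, 0.01
for step in range(500):
    E = X @ W.T - Y                              # prediction error
    grad_main = 2.0 * E.T @ X / len(X)           # gradient of the mean squared error
    grad = grad_main + 2.0 * lam * (W - W.T)     # add the symmetry regularisation gradient
    W = W - eta * grad                           # gradient descent step
    W = 0.5 * (W + W.T)                          # symmetrise by averaging with the transpose

print(np.linalg.norm(W - W.T))                   # ~0: weights end up symmetric
print(np.mean((X @ W.T - Y) ** 2))               # small residual error
```

Note that after the explicit symmetrisation step the regularisation gradient vanishes at the start of the next iteration; the penalty still matters whenever the gradient step reintroduces asymmetry.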
3. Case Study
3.1. Mathematical Modelling of the Scale’s Behaviour Under Symmetry
A mathematical model has been developed to prove the stability of symmetric regularisation in a neural network; it analyses the behaviour of the weights W under symmetric regularisation and estimates their stability over time. Consider the loss function Ltotal(W), presented in a generalised form in (27), which includes the main error component and symmetric regularisation, in which f(W) is a convex function depending on the weights (the primary loss function), λ > 0 is the regularisation parameter, and the regularising term ∥W − WT∥2 is minimised when W is symmetric, that is, W = WT.
This study shows that symmetric regularisation, with an appropriate choice of the parameter λ, promotes the weights’ robust behaviour, in which small perturbations of W do not lead to significant deviations in the loss function Ltotal value. Using gradient descent to update the weights W, we obtain the update rule (32), where η is the training step, and the gradient is given by ∇WLtotal(W) = ∇Wf(W) + 2·λ·(W − WT).
Stability requires that the weights W(t) converge to the equilibrium value W* while minimising Ltotal, and that small changes in the initial conditions W(0) do not lead to significant deviations of W(t) from W*. This is achieved if the Hessian matrix H of the loss function Ltotal is positive definite. The Hessian of the loss function can be written as the sum H = Hf + 2·λ·I, where Hf is the Hessian of the main loss function f(W), and 2·λ·I is the symmetric regularisation contribution.
For sufficiently large λ, H becomes a positive definite matrix, since 2·λ·I adds positive eigenvalues, which stabilises the weights’ behaviour. Stability requires that all eigenvalues of H be positive. This is ensured by choosing λ such that the smallest eigenvalue of Hf shifted by 2·λ is positive. Thus, if Hf has negative or small positive eigenvalues, adding 2·λ·I with sufficient λ shifts all eigenvalues to the positive region, ensuring stability. We define the energy function for the weights W as E(W) = Ltotal(W). Stability implies that the change in E(W) over time tends to zero as the equilibrium state W* is approached, with dE(W(t))/dt ≤ 0 and dE(W(t))/dt → 0 as W(t) → W*.
Thus, symmetric regularisation causes E(W) to decrease, and the weight system stabilises at W = WT, where the loss function is minimal. Since symmetric regularisation adds positive definiteness to the loss function Ltotal Hessian, it leads to stability in weight training since small perturbations do not cause significant deviations from the minimum point.
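The eigenvalue-shift argument can be checked numerically: the sketch below takes a random symmetric (possibly indefinite) Hf, picks λ just large enough, and verifies that Hf + 2·λ·I becomes positive definite. The random Hf is an illustrative assumption, not a Hessian taken from the diagnostic model.

```python
import numpy as np

rng = np.random.default_rng(2)
H_f = rng.normal(size=(8, 8)); H_f = 0.5 * (H_f + H_f.T)    # symmetric but possibly indefinite

min_eig = np.linalg.eigvalsh(H_f).min()
lam = max(0.0, -min_eig / 2.0) + 1e-3                       # smallest lambda shifting all eigenvalues above 0

H_total = H_f + 2.0 * lam * np.eye(8)                       # Hessian with the regularisation contribution
print(min_eig, np.linalg.eigvalsh(H_total).min())           # negative -> positive after the shift
```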
3.2. Convergence Analysis of Gradient Descent with Symmetry
To analyse the convergence of gradient descent with symmetric regularisation, we consider the full loss function Ltotal(W), which includes the main error component and symmetric regularisation and is presented in Equation (27). The analysis studies the gradient norm ∥∇Ltotal∥ and the weights’ norm ∥W(l)∥ at each step. This allows us to determine how symmetric regularisation affects the convergence speed and stability. Using gradient descent, the weight update at the t-th step is carried out according to (32), while the full gradient of the loss function Ltotal, considering symmetric regularisation, is determined according to (33). Thus, the iterative weight update rule takes the form W(t + 1) = W(t) − η·(∇Wf(W(t)) + 2·λ·(W(t) − (W(t))T)).
For convergence, it is required that Ltotal(W(t)) decreases as the number of iterations t increases. For this, it is assumed that W* is the optimal value of the weights that minimises Ltotal(W), and the change in the loss function at each step is considered. To prove the convergence, we assume that Ltotal(W) is convex and that ∇WLtotal(W) is Lipschitz continuous with constant L, that is, ∥∇WLtotal(W1) − ∇WLtotal(W2)∥ ≤ L·∥W1 − W2∥. Then, for a convex function with symmetric regularisation, the gradient descent convergence will be ensured if the training step η is chosen sufficiently small relative to L + 4·λ, where the term 4·λ is related to the symmetric regularisation. This condition allows the control of the step size and thus promotes stable convergence.
To estimate the change in the weights’ norm ∥W(t)∥, the change in the norm is considered taking the symmetric regularisation into account. At each step, the gradient value is substituted into the expression for the weights’ norm, which gives Equation (42).
Equation (42) shows that symmetric regularisation adds a term 2·λ·(W(t) − (W(t))T)2, which minimises the weights’ asymmetry, gradually bringing W closer to the symmetric state. This regularisation smoothes out the changes in the weights’ norm, which prevents sharp fluctuations and promotes stable convergence.
To prove stability, the Lyapunov method is used. Let V(W) = ∥W − W*∥ be the Lyapunov function, where W* is the minimum point. The change in V at each step, V(W(t + 1)) − V(W(t)), is then considered. Using the gradient descent weight update formula and substituting (32), then expanding the norm square, expanding the brackets and reducing ∥W − W*∥2, and taking into account that for a small step η the regularisation term 2·λ·(W(t) − (W(t))T) smooths out the asymmetry, W(t) is brought closer to symmetry (Equation (45)).
For V(W(t + 1)) − V(W(t)) ≤ 0, it is required that the second term does not exceed the first. This change will be negative if the training step η satisfies the abovementioned conditions and the symmetry regularisation λ stabilises the trajectory W, minimising V(W). Thus, V(W) decreases at each step, proving the algorithms’ convergence.
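A small numerical check of the Lyapunov argument, under the assumption of a convex quadratic main loss f(W) = ∥W − A∥2 with symmetric A (so that W* = A): V(W) = ∥W − W*∥ is tracked across gradient steps and stays non-increasing for a small step η. All constants below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 5)); A = 0.5 * (A + A.T)            # W* = A for f(W) = ||W - A||_F^2
lam, eta = 0.5, 0.05

W = rng.normal(size=(5, 5))
V_prev = np.linalg.norm(W - A)
monotone = True
for _ in range(1000):
    grad = 2.0 * (W - A) + 2.0 * lam * (W - W.T)            # total gradient with the symmetry term
    W = W - eta * grad
    V = np.linalg.norm(W - A)                               # Lyapunov function V(W) = ||W - W*||
    monotone &= V <= V_prev + 1e-12
    V_prev = V

print(monotone, V_prev)                                     # True, and V is close to 0
```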
3.3. Mathematical Modelling of the Weights’ Behaviour Under Symmetry
To construct a mathematical model of the weights’ evolution W under symmetric regularisation, the weight change dynamics are described as a system of differential equations that considers the primary gradient of the loss function and the regularising symmetric term. To do this, we consider the full loss function, which includes the main error component and symmetric regularisation, presented in Equation (27), where ∥W − WT∥2 is the symmetric regularisation minimised at W = WT. This study aims to construct the evolution model of the weights W(t) taking regularisation into account, to understand how they change over time depending on the initial conditions and the training step η.
To derive the differential equation for the weights, it is assumed that the weights’ evolution is described by continuous dynamics, where a first-order differential equation determines the changes in the weights W(t) over time t, dW(t)/dt = −η·∇WLtotal(W). Substituting the total gradient obtained from Equation (27) gives the right-hand side of (46) as dW(t)/dt = −η·(∇Wf(W) + 2·λ·(W − WT)). Equation (47) describes the change in the weights W(t) under the action of the main gradient of the loss function f(W) and symmetric regularisation. Equation (47) is split into two components to analyse the weights’ behaviour under symmetry: the primary gradient and regularisation contributions. The result is the determination of the dynamics for the weights’ symmetric and antisymmetric parts, representing W as the sum of the symmetric Ws and antisymmetric Wa parts, that is, W = Ws + Wa, where Ws = (W + WT)/2 is the symmetric part, and Wa = (W − WT)/2 is the antisymmetric part.
For the symmetric part, regularisation has no effect, since Ws = WsT. The dynamics for Ws are then described only by the main gradient of the loss function (Equation (49)). The regularisation tends to reduce the antisymmetric part to zero, i.e., Wa → 0, and the dynamics for Wa include an additional damping term proportional to λ (Equation (50)). Equation (50) shows that the antisymmetric part Wa will exponentially decrease at a rate dependent on the regularisation parameter λ: the larger the value of λ, the faster Wa tends to zero, which leads to the symmetrisation of the matrix W over time. If the main loss function f(W) does not have a significant effect on Wa, then the equation approximately reduces to a linear decay equation for Wa (Equation (51)), whose solution is an exponential decay, Wa(t) = Wa(0)·e−4·η·λ·t (Equation (52)), where Wa(0) is the initial value of the antisymmetric part.
Solution (52) shows that the antisymmetric part Wa(t) exponentially tends to zero, which confirms the matrix W symmetrisation in the regularisation presence.
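To visualise this decomposition, the sketch below integrates the gradient-flow equation with a simple Euler scheme and tracks the norm of the antisymmetric part, which decays roughly exponentially; the quadratic main loss, step size, and decay-rate comment reflect this toy setup rather than the paper’s model.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 4)); A = 0.5 * (A + A.T)      # symmetric target of the toy main loss
W = rng.normal(size=(4, 4))
lam, dt = 2.0, 0.01

for step in range(300):
    grad = 2.0 * (W - A) + 2.0 * lam * (W - W.T)      # right-hand side of the gradient flow
    W = W - dt * grad                                  # explicit Euler step
    if step % 100 == 0:
        W_a = 0.5 * (W - W.T)                          # antisymmetric part
        print(step, np.linalg.norm(W_a))               # decays exponentially (rate ~ 2 + 4*lam here)
```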
3.4. Analysis of the Regularisation Parameter λ Influence on Symmetry and Overall Error
To assess the influence of the regularisation coefficient λ on the weights’ W symmetry and the final error Ltotal, the analysed parameter is the loss function, presented in the form of (27), where f(W) is the primary loss function (e.g., the mean square error (14)), and λ·∥W – WT∥2 is the regularisation term that controls the weight matrix W symmetry degree. The regularisation coefficient λ determines the regularisation weight: small values of λ have a minimal effect on symmetry, while large values can exaggerate symmetrisation, potentially worsening the models’ accuracy. To analyse the weights’ behaviour with a change in λ, the loss function gradient for the weights W is determined according to (33). The calculated gradient ∇WLtotal(W) is interpreted as follows:
The first component ∇Wf(W) is aimed at minimising the primary loss function, affecting the models’ accuracy.
The second component 2·λ·(W − WT) is the regularisation gradient, proportional to the difference between W and its transpose. The regularisation gradient tends to make W symmetric.
Symmetry regularisation affects the final error and the models’ accuracy as follows:
For small values of λ, the regularisation gradient 2·λ·(W − WT) has a small weight, and the weights’ symmetry has a minimal effect on the loss function. Only the weights that minimise f(W) have a major influence on training.
For large values of λ, the regularisation gradient is amplified, forcing the weights to be symmetric, which can lead to the models’ degradation due to a narrowing of the possible values of W.
To quantify symmetry, a deviation-from-symmetry metric is introduced as the norm ∥W − WT∥. The total loss function, taking into account the model error and symmetry, then takes the form of Equation (54).
Thus, to analyse the influence of λ, changes in Ltotal(W, λ) and the norm ∥W − WT∥ are investigated for different values of λ. With increasing λ, the following is observed:
If λ is too large, W will be “driven” towards symmetric values, which can reduce accuracy because the weights will be less flexible to optimise the underlying loss function f(W).
If λ is too small, symmetry will not emerge, and the weight matrix will be dominated by model error, resulting in a suboptimal weight structure.
In this case, the change in the gradient norm as a function of λ follows from the gradient expression (33).
To experimentally confirm the obtained theoretical results using the helicopter turboshaft engines’ (TE) neural network diagnostic model [38,39] presented in Appendix A as an example, the following were obtained: a diagram of the symmetry measure ∥W − WT∥ depending on λ (Figure 1) to show how an increase in λ leads to an increase in symmetry; a diagram of the final error Ltotal depending on λ (Figure 2) to assess the dependence of the model accuracy on the symmetrisation strength; and training curves for different λ (Figure 3) to observe the convergence rate and the difference in the final error for various parameter values.
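The λ sweep behind Figures 1–3 can be mimicked on synthetic data with the following sketch, which trains with the symmetric-regularised update for several λ values and records the symmetry measure ∥W − WT∥ and the final error; it is a stand-in for, not a reproduction of, the helicopter TE model from Appendix A.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
W_true = rng.normal(size=(n, n))                       # deliberately not symmetric
X = rng.normal(size=(300, n))
Y = X @ W_true.T + 0.05 * rng.normal(size=(300, n))

for lam in (0.1, 0.5, 1.0, 1.5):
    W = np.zeros((n, n))
    for _ in range(400):
        E = X @ W.T - Y
        grad = 2.0 * E.T @ X / len(X) + 2.0 * lam * (W - W.T)
        W -= 0.02 * grad
    mse = float(np.mean((X @ W.T - Y) ** 2))
    asym = float(np.linalg.norm(W - W.T))
    print(f"lambda={lam:.1f}  error={mse:.4f}  ||W-W^T||={asym:.4f}")
```

No explicit averaging step is applied here, so the effect of λ alone on the residual asymmetry and on the error is visible across the sweep.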
According to Figure 1, as λ increases, the symmetry measure decreases, indicating that the weights tend towards a more symmetrical configuration of the neural network. This behaviour of the symmetry measure highlights the possibility of ensuring symmetry with strong regularisation (λ ≥ 1), while smaller values of λ allow for more significant deviation from symmetry. Small fluctuations may represent small changes in weight adjustment due to other factors in the training process.
According to Figure 2, as λ increases, Ltotal decreases, indicating that the performance improves due to the regularisation effect. However, after the point λ ≈ 1, further increasing λ causes Ltotal to increase, indicating that over-regularisation may lead to underfitting and reduced model accuracy.
According to Figure 3, at λ = 0.1, the training curve shows a relatively high initial loss (≈0.5) and slow convergence, indicating insufficient regularisation. At λ = 0.1, the neural network diagnostic model takes longer to reach a stable minimum, which may reflect slight overfitting. At λ = 0.5, with a moderate value, the model achieves better convergence, reaching a lower overall loss (the maximum loss does not reach 0.4%) more quickly. It suggests a better balance where regularisation helps the model generalise without significantly limiting the training flexibility. The value of λ = 1.0 is optimal. At λ = 1.0, the training curve shows the most desirable training behaviour with fast convergence to a low final loss (the loss is almost eliminated). At λ = 1.0, the balance between regularisation and flexibility gives the best results. At a high regularisation value (λ = 1.5), the training curve converges to a higher final loss (the loss increases by 2.0 times compared to the results obtained with λ = 1.0), indicating underfitting. At λ = 1.5, the neural network diagnostic model is over-constrained, limiting its ability to reduce the error further.
3.5. The Symmetry Influence on the Loss Function and Error Dynamics
To analyse the influence of weight symmetry on the landscape of the loss function Ltotal(W), a symmetric regularisation is introduced according to (27), where f(W) is the primary loss function, and λ·∥W − WT∥2 is the symmetric regularisation. The analysis of the loss function as a function of the weights consists of researching the influence of symmetry on Ltotal(W) and its change along different directions in the weights’ space W:
Symmetric direction, in which the weight matrix W changes in the symmetric matrices space (where W = WT);
Asymmetric direction, in which the weight matrix W has a component different from WT.
Similar to previous studies, the weights W are divided into a symmetric part Ws and an antisymmetric part Wa according to (48). Then, the loss function is expressed as a function Ltotal(Ws, Wa) of both parts.
The analysis of the loss function over the weights consists of studying its behaviour along symmetric and asymmetric directions:
1. In the symmetric direction, since W = Ws and Wa = 0, the loss function takes the form Ltotal(Ws, 0) = f(Ws), in which the regularisation term disappears, and the loss function behaviour is determined only by the underlying function f(Ws). If f(W) is convex in Ws, then symmetrisation allows one to avoid local minima and focus on the global minimum.
2. For an asymmetric direction W = Wa, the loss function contains a regularisation term, in which λ·∥Wa∥2 creates an additional contribution that prevents Wa from deviating too much from zero. The regularisation contribution tends to minimise the antisymmetric part, facilitating stable optimisation.
To analyse the symmetry influence on the optimisation dynamics and the loss function landscape, the gradients and the curvature of Ltotal are analysed through the Hessian. The gradient of the loss function is decomposed into gradients over the symmetric and antisymmetric parts according to Equation (33), which shows that symmetric regularisation adds a gradient aimed at reducing the antisymmetric components, which avoids “drift” in antisymmetric directions and thus promotes smooth optimisation. To study the curvature of Ltotal(W), the Hessian is calculated as the sum of the Hessian of the main loss and the regularisation contribution 2·λ·IWa, where IWa is the indicator matrix for the antisymmetric component. The Hessian’s second part, 2·λ·IWa, is positive definite along the antisymmetric directions, which enhances the convexity of the loss function along these directions, making unwanted extremes less likely and reducing the probability of becoming stuck in local minima.
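The curvature claim can be checked on a small vectorised toy problem: for f(W) = ∥W − A∥2, the Hessian of Ltotal is 2·I plus the regularisation contribution, and its eigenvalues are larger exactly along antisymmetric directions. The commutation-matrix construction below is an illustrative assumption, and its constants follow the Frobenius-norm regulariser, differing from the 2·λ·IWa shorthand only by a fixed factor.

```python
import numpy as np

n, lam = 3, 1.0
N = n * n

# Commutation (transpose-permutation) matrix T: T @ vec(W) = vec(W^T).
T = np.zeros((N, N))
for i in range(n):
    for j in range(n):
        T[i * n + j, j * n + i] = 1.0

H_f = 2.0 * np.eye(N)                      # Hessian of the toy main loss f(W) = ||W - A||_F^2
H_reg = 4.0 * lam * (np.eye(N) - T)        # Hessian of lam * ||W - W^T||_F^2 in vectorised form
H_total = H_f + H_reg

eigvals, eigvecs = np.linalg.eigh(H_total)
for val, vec in zip(eigvals, eigvecs.T):
    direction = "symmetric" if np.allclose(T @ vec, vec) else "antisymmetric"
    print(f"{val:6.2f}  {direction}")      # curvature is larger along antisymmetric directions
```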
To experimentally confirm the obtained theoretical results using the helicopter TE neural network diagnostic model (Appendix A) example, the following were obtained: a diagram of the loss function Ltotal along symmetric and asymmetric directions (Figure 4), which allows one to see the regularisation influence on the loss function’s stability; a loss function gradients map (Figure 5), which shows how the gradients direct the weights to the global minimum, avoiding unwanted antisymmetric components; and a Hessian eigenvalues spectrum (Figure 6), which allows one to analyse the loss function’s curvature and the regularisation influence.
For the symmetric direction (the “blue curve” in Figure 4), the loss function diagram is displayed as a smoothed curve that reaches a minimum near W = 0 with a minimum loss function value of approximately Ltotal ≈ 0.9. It is noted that along the symmetric direction there is a relatively small oscillation of the Ltotal value, which indicates a more stable and predictable evolution of the loss function in the weights’ symmetric directions. The loss function along the asymmetric direction (the “red curve” in Figure 4) is characterised by significant oscillations reflected in the form of additional local extrema. The minimum value of this curve is also near W = 0, but the overall profile is wavier, and the function reaches values up to Ltotal ≈ 1.3 and higher. The obtained results indicate a tendency of asymmetric directions to form additional local minima and saddle points, which can complicate optimisation and lead to model instability.
According to Figure 5, near the coordinates x = 0 and y = 0, the gradients take minimal values. Their length is noticeably reduced, indicating the possible global minimum zone of the loss function Ltotal. In this region, the gradient values along both axes are approximately ∇xLtotal ≈ 0.1 and ∇yLtotal ≈ 0.1, which indicates proximity to the state with minimal error, where training slows down. In the zones with high gradients, for values x ≈ ±2 and y ≈ ±2, the gradients increase to values ∇xLtotal ≈ 4 and ∇yLtotal ≈ 4. These vectors represent key directions for updating the weights, in which the loss function increases steeply. Such a sharp increase in the gradient indicates a “steep descent” of the loss function, accelerating the training process while the weights are significantly far from the minimum. The map is symmetrical with respect to the axes x = 0 and y = 0. It indicates that the loss function is symmetrical with respect to the weight parameters, and the presence of symmetric regularisation makes it easier for the neural network to find the optimal direction. This symmetry suggests that the weights will tend to a symmetrical minimum at sufficiently large gradient values, minimising Ltotal faster. When the neural network’s weights are far from the minimum in the early training stages, the presence of significant gradients (with magnitudes up to 4) accelerates training, making the network capable of finding the optimum faster. When approaching the minimum point (0, 0), small gradient values help to avoid overtraining and oscillations, maintaining a stable, smooth approach to the loss function’s minimum.
According to Figure 6, the eigenvalues are distributed around the mean μ ≈ 0 with a normal distribution and standard deviation σ = 1. Most eigenvalues are concentrated between −2 and 2, indicating weak curvature dominance in the corresponding directions. Values outside this range indicate possible directions with high or low curvature, affecting the local properties of the loss function Ltotal landscape.
4. Discussion
To study the influence of neural networks’ morphology symmetry, a mathematical model was developed that considers dynamic symmetry for diagnosing complex dynamic objects. The symmetric architecture concept and adaptive parameters were proposed. The neural network is represented by a function f: ℝn → ℝm connecting the input data with the diagnostic outputs, where the weights W(l) and biases b(l) of the l-th layer are specified by the matrix (1). For symmetric networks, the weights satisfy condition (3), which reduces the number of unique parameters. A dynamic symmetry function S is introduced, which changes the weights depending on the system’s state (4). Symmetric training is achieved by minimising the loss function with regularisation that takes into account the deviation from the symmetric state (11), where R(W) is represented by (18). The weights are updated considering the gradient (22), which includes symmetry, and the resulting weights are averaged with the transposed matrix. Based on the obtained results, Theorem 1, “On the symmetric neural network optimisation stability”, is formulated and proven, stating that the symmetry and positive definiteness of the weight matrix in a neural network ensure the stable optimisation of the loss function with a single global minimum.
Symmetry regularisation in neural networks adds computational cost to the training process due to the need to control the symmetry of the weight matrices. The main cost is associated with calculating the regularisation term R(W), which includes the norm of the difference between the weight matrix and its transpose, and with calculating the corresponding gradients. These operations require additional matrix operations at each optimisation step, including transposition, addition, and subtraction, which increase the complexity proportionally to the weight matrix size. In addition, the symmetrisation step, where the weights are adjusted by averaging with their transpose, requires further matrix operations. Thus, the regularisation cost increases linearly with the number of layers and quadratically with the number of neurons in a layer. However, these additional costs can be justified by the regularisation benefits, such as reducing the number of parameters, improving the optimisation convergence, ensuring the solution’s stability, and preventing overfitting by introducing structural constraints on the network parameters.
The block diagram (Figure 7) shows the steps of the mathematical model for optimisation and weight symmetrisation.
A mathematical model analyses the behaviour of the weights W in a neural network with symmetric regularisation to prove the stability of its operation. According to (27), the loss Ltotal(W) includes the main error and the regularising term ∥W − WT∥2, which is minimised when W = WT. It is shown that an appropriate choice of the parameter λ > 0 contributes to the weights’ stability, as it is a value at which small perturbations of W do not lead to significant changes in the loss function. The use of gradient descent allows the weights to be updated according to rule (32), where the gradient contains the symmetric regularisation contribution 2·λ·(W − WT). Stability is achieved in the case of a positive definite Hessian matrix (34), obtained by choosing a λ that satisfies condition (35). This shifts all eigenvalues to the positive region, ensuring the system’s stability. The introduced energy function E(W) (36) decreases with time, tending to zero when the weights reach the equilibrium state W*, where the losses are minimal. Thus, symmetric regularisation is proven to provide stability, reducing sensitivity to minor disturbances, and allows the weights to be stabilised during the neural network’s training.
The total loss function Ltotal(W) (27), including the main error and the regularising term, is investigated to analyse the convergence of gradient descent with symmetric regularisation. The weights are updated at each step according to rule (37), in which regularisation adds a stabilising effect. The convergence of the neural network’s training is ensured by the decrease in the gradient norm ∥∇WLtotal(W)∥ with an increasing number of iterations. In this case, the training step η is chosen to satisfy condition (40), in which L is the Lipschitz constant. Symmetric regularisation minimises the asymmetry of the weights W by adding the term 2·λ·(W − WT), which smooths out changes in the weight norm, prevents sharp fluctuations, and promotes stable convergence. Stability is proven by the Lyapunov method with the function V(W) = ∥W − W*∥, where W* is the optimal weight. The change in V(W), according to (45), is negative at each step if the conditions on the training step and the regularisation parameter are met, which guarantees a decrease in V(W) and proves the algorithm’s convergence.
To construct a mathematical model of the evolution of the weights W under the action of symmetric regularisation, the weight change dynamics are described by a system of differential equations that considers the primary gradient of the loss function and the regularising symmetric term. The loss function Ltotal(W) is introduced and presented in Equation (27), including the main error and the regularisation ∥W − WT∥2, minimised at W = WT. The evolution of the weights W(t) is described by Equation (47). The matrix W (48) is decomposed into symmetric Ws and antisymmetric Wa parts to analyse the behaviour of the weights W. It is determined that the dynamics of the symmetric part are determined only by the primary gradient of the loss function (49), while the antisymmetric part tends to zero under the regularisation action according to (51), whose solution is determined by Equation (52). The obtained solution shows that the antisymmetric part Wa(t) exponentially tends to zero at a rate that increases with λ, which ensures the symmetrisation of the matrix W in time.
The influence of the regularisation coefficient λ on the symmetry of the weights W and the final error Ltotal (27) is analysed. The loss function includes the main error f(W) and the regularising term λ·∥W − WT∥2, which controls the degree of symmetry of the weight matrix. For small values of λ, symmetry has a minimal effect on the loss function, and the weights are adjusted to minimise f(W). For large values of λ, regularisation dominates, forcing the symmetry of W, which reduces the accuracy of the diagnostic model. Symmetry is estimated by the deviation metric ∥W − WT∥, and the loss function takes the form of Equation (54). It has been experimentally proven (see Figure 1) that with an increase in λ, the metric ∥W − WT∥ decreases, which increases symmetry, but excessive regularisation (λ > 1) leads to underfitting and a decrease in the diagnostic model’s accuracy. Experimentally, for the helicopter TE diagnostic model, it was found (see Figure 3) that λ = 1.0 provides the optimal balance between symmetry and flexibility, minimising the error. At λ = 0.1, slow convergence is observed. At λ = 0.5, a rapid decrease in error is achieved. At λ = 1.5, the model is over-constrained, which increases the error by a factor of two.
The influence of symmetry on the loss function and error dynamics is analysed by introducing symmetric regularisation, which adds a regularisation term λ·∥W − WT∥2 to the loss function Ltotal(W) (27), minimising the antisymmetric components of the weight matrix W. The loss function was studied in the symmetric direction (where W = WT) and the antisymmetric direction (W ≠ WT). It is shown that symmetric regularisation contributes to the stabilisation of optimisation by adding a gradient that reduces the antisymmetric components and improves the convexity of the loss function through the second derivative (Hessian). The analysis showed that along the symmetric direction, Ltotal(Ws, 0) has a more stable behaviour with fewer local minima (Figure 4, blue curve), while along the antisymmetric direction, Ltotal(0, Wa), additional extrema arise, complicating the optimisation (Figure 4, red curve). It is experimentally confirmed that symmetric regularisation reduces the oscillations of the loss function and directs the weights to the global minimum, thereby ensuring the diagnostic model’s stability. The gradients along the symmetric direction show a uniform approach to the minimum, while sharp changes are observed in the antisymmetric regions, as shown in the gradient map (Figure 5). The Hessian eigenvalues spectrum (Figure 6) indicates a weak curvature along most directions, confirming the advantage of symmetric regularisation in reducing the probability of becoming stuck in local minima.
The limitations of the research are related to the assumptions used in developing the mathematical models. The developed model examines the neural networks’ symmetric architecture as a critical factor for optimisation stability; however, in real systems, additional parameters affecting stability are possible, such as input data noise, nonlinear dependencies, and external disturbances, which were not considered in the current analysis. In this case, the loss function with symmetric regularisation is minimised using gradient descent, which involves selecting the optimal training step and the regularisation parameter λ and does not consider the dynamic changes that influence the neural network training process. The theoretical analysis results, such as Theorem 1, “On the symmetric neural network optimisation stability”, and the behaviour of the Hessians, are limited to the cases where the weight matrix is positive definite; they may not apply to neural networks with arbitrary parameters.
The limitations of the conducted research are related to the simplified assumptions used, such as the ideal positive definiteness of the weights and the absence of a significant influence of input data noise. To overcome these limitations, future research is planned to develop adaptive training methods that take into account dynamic changes in network parameters and to simulate the influence of real operating conditions, including noise and nonlinearities.
Prospects for further research are related to eliminating the identified limitations and expanding the scope of application of the developed mathematical model. Additional studies will analyse the influence of input data noise and nonlinear dependencies on optimisation stability, including modelling the actual operating conditions of neural networks. Another promising direction for further research is the development of adaptive training methods that consider dynamic changes in the neural networks’ parameters during the training process (using methods with a variable training step or introducing self-regulation mechanisms for the regularisation parameter λ). Further studies will expand the theoretical foundations of neural network models with arbitrary parameters, including the case where the weight matrix is not positive definite. This will include the development of new types of regularisation that ensure optimisation stability even when positive definiteness conditions are violated. To confirm the effectiveness of the proposed approach, experimental studies will be conducted on real data, including complex dynamic objects with asynchronous processes, which will allow for assessing the models’ applicability in applied diagnostics and prediction problems.
Future research should also consider using different benchmark datasets to test the proposed model, which will allow us to evaluate its generalizability and applicability in different areas. In addition, it is important to investigate the influence of different types of noise and dynamic factors to determine the robustness of the model under real operating conditions, as well as to validate its effectiveness in more diverse examples.
5. Conclusions
A mathematical model of a neural network with dynamic symmetry has been developed, ensuring stable optimisation and reducing sensitivity to minor disturbances. The concept of a symmetric architecture and regularisation with the introduction of a dynamic symmetry function S helps to reduce the number of unique parameters and simplifies the training process. Symmetric regularisation minimises the deviation from the symmetric state, which is theoretically proven within the framework of Theorem 1, “On the symmetric neural network optimisation stability”, guaranteeing a single global minimum of the loss function under the conditions of the weight matrix’s positive definiteness.
The analysis of weight W dynamics under symmetric regularisation confirmed that the proposed model ensures an exponential tendency of the antisymmetric components to zero, which stabilises the training process. The Lyapunov method used to prove convergence demonstrated a decrease in the loss function Ltotal(W) and the weights deviation from the equilibrium state at each gradient descent step. The results confirm that symmetric regularisation smooths out weight changes and prevents sharp fluctuations, thereby ensuring the stable convergence of neural network training under conditions with minor disturbances.
It is established that the dynamic symmetry function S(W(l), t) is a mechanism that allows for taking into account changes in the system state and adapting the neural network weights in real time while maintaining their symmetric properties. This function regulates the balance between the initial symmetric state of the weights W0(l) and their current value W(l), and also takes into account external conditions changing over time t. Formally, the function is defined as S(W(l), t) = a(l)·W(l) + (1 − a(l))·W0(l), where a(l) is a parameter that depends on the system’s current state and regulates the contribution of the initial symmetric state W0(l). Such a model allows for dynamic control of the weights’ symmetry degree, ensuring a balance between stability and adaptability. The implementation of this approach requires updating the weights W(l) taking into account the change in the symmetry function S(W(l), t) at each training step, as well as imposing additional constraints on the symmetry during the update. This is achieved by introducing a regularising term into the loss function that minimises the deviation from symmetry and by applying a correction to the weights through their symmetrisation after each update step.
The experiments using the helicopter turboshaft engines’ neural network diagnostic model showed that the regularisation coefficient λ = 1.0 provides an optimal balance between the weights’ symmetry and the model’s accuracy. At small values of λ (for example, λ = 0.1), slow convergence is observed, and with excessive regularisation (λ > 1.0), the diagnostic model’s error increases due to underfitting. The decrease in the antisymmetric components with increasing λ was monitored by introducing the symmetry metric ∥W − WT∥, which ensured the stability of the neural network weight optimisation process.
It has been experimentally proven that symmetric regularisation reduces loss function oscillations and improves convexity along symmetric directions, reducing the probability of becoming stuck in local minima and accelerating the global optimum achievement. Experimental data and the Hessian eigenvalue spectrum analysis confirmed that the symmetric neural network architecture increases the training algorithm’s stability and efficiency when working with dynamic objects.