1. Introduction and Existing Research Review
In modern networks, traffic volumes and transmission speeds are growing rapidly, creating a fertile environment for attackers. Traditional intrusion detection systems (IDSs) [1,2,3] often fail to cope with high-speed channels and new types of attacks, which necessitates a transition to more intelligent and adaptive architectures.
Early IDSs, such as SNORT [4], rely on known attack signatures. These systems are effective against well-documented threats but are largely powerless against zero-day attacks and modified exploits [4]. Their huge rule bases also become difficult to manage and slow to update.
Statistical traffic modelling methods [5] build a profile of “normal” behaviour and flag deviations from it. Their key limitation is high sensitivity to legitimate traffic variability: they generate a large number of false positives during peak loads or when new services appear [6].
With the transition to machine learning, SVM algorithms [7,8] and decision trees [9,10] became widespread. These approaches improved accuracy, but they require careful manual feature engineering and do not scale well as the number of parameters grows [10].
Chen et al. [11] proposed using a variational autoencoder to detect anomalies in network traffic. Despite its high accuracy, the method is limited by the need for pre-training on a large set of “clean” traffic, which is difficult to obtain in real-world conditions [11].
Tian et al. [12] demonstrated the effectiveness of LSTM in modelling temporal traffic dependencies. However, practice has shown that under rapid load changes or new protocols, such models quickly degrade without constant retraining on fresh data [12].
Kayacik et al. [13] combine signature-based and anomaly-based methods into a hybrid IDS. The approach improves accuracy but leaves unsolved the problems of automating component updates and synchronising signals between them [13].
Apache Flink [14] and Spark Streaming [15] integrate machine learning modules for stream processing. Still, Xu J. and Palanisamy B. [16] note latency during peak loads and problems with guaranteed delivery of traffic units when processing nodes fail.
Across all of these approaches, simple models suffer from low accuracy, while complex models suffer from high latency and computational cost. Many methods, such as [17,18], are unsuitable for real time or require expensive infrastructure. False positives remain a serious problem as well, because they distract security professionals and create so-called “alert fatigue” [19]. Algorithms are therefore needed that automatically adapt to new conditions and minimise the false positive rate (FPR). With growing traffic, horizontal scaling without loss of accuracy remains a challenge: network-based IDSs must process tens of gigabits per second, which requires lightweight and optimised architectures. Complex systems also demand constant analyst involvement for configuration and verification, so automating alert classification and prioritisation would help reduce staff workload.
There is emerging research on autonomous platforms with self-learning neural networks that adapt on the fly to changing traffic and new types of attacks [20,21]. However, there is as yet no single comprehensive solution that can operate in high-speed networks without significant delays.
Thus, based on the above (Table 1), we can conclude that none of the existing approaches combines high accuracy, low latency, scalability, and autonomy at the same time. This justifies the need for a unified neural network system that, based on real-time traffic analysis, can detect both known and zero-day attacks without constant human intervention.
Given the existing methods’ shortcomings and limitations (Table 1), from low adaptability and scalability to high latency and maintenance complexity, a single solution is necessary. An adaptive neural network intrusion detection system based on real-time traffic analysis should combine deep learning to identify unknown attacks, a self-regulating architecture to minimise false positives, and easy scalability for networks with throughput of tens of gigabits per second. Such an adaptive neural network system will reduce human involvement in routine tasks, ensure rapid response, and increase the IT infrastructure’s overall resilience to modern threats.
The research aim is to develop and justify an adaptive neural network system architecture for intrusion detection based on real-time network traffic analysis, ensuring high detection accuracy for both known and zero-day attacks with minimal delays and false positives. The research object is the information security system and the network traffic monitoring processes in computer networks, including procedures for data collection, packet processing, and classification aimed at identifying attack signs. The research subjects are deep learning methods and algorithms (autoencoders, recurrent and convolutional neural networks), as well as the software components and hardware architecture of the adaptive system responsible for embedding models in streaming traffic processing and for adaptive updating without operator intervention.
2. Materials and Methods
2.1. Development of an Intrusion Detection Method Based on Real-Time Traffic Analysis
2.1.1. Development of the Generalised Traffic Mathematical Model
The proposed method is based on a generalised traffic mathematical model, which represents the network flow as a multidimensional stochastic process [15,20,22] with the traffic feature vector:
x(t) = (x1(t), x2(t), …, xn(t))ᵀ ∈ ℝ^n, (1)
where each component xi(t) represents a scalar characteristic (byte rate, number of packets, load entropy, delay estimates, etc.).
The vector x(t) is modelled as a solution of the Itô stochastic differential equation [23]:
dx(t) = f(x(t), t)·dt + G(x(t), t)·dW(t), (2)
where f: ℝ^n × ℝ_+ → ℝ^n describes the deterministic dynamics (i.e., the drift function describing the deterministic part of the evolution), and W(t) is a k-dimensional Wiener process [24] modelling the random fluctuations (diffusion). The matrix G: ℝ^n × ℝ_+ → ℝ^(n×k) specifies the intensity and correlation of the random traffic fluctuations, i.e., it forms the diffusion term of (2), where the Wiener process increments dW(t) pass through G, forming the covariance:
Cov[Δx] = G·Gᵀ·Δt, (3)
that is, the matrix G determines how the “noise” is distributed between the feature vector components and with what weight it influences their evolution.
To approximate the random component of traffic evolution by Gaussian noise with clear statistical properties at small sampling intervals, a normal approximation is used [22,25]. It means that at a sufficiently small sampling step Δt, the increments of the traffic feature vector Δx = x(t + Δt) − x(t) obey the multivariate normal law:
Δx ~ N(f(x, t)·Δt, G·Gᵀ·Δt), (4)
that is, the random noise defined by the diffusion matrix G and the Wiener process generates Gaussian deviations of traffic changes.
From (2), with a small step Δt we obtain:
x(t + Δt) ≈ x(t) + f(x(t), t)·Δt + G(x(t), t)·ΔW(t), ΔW(t) ~ N(0, Δt·I). (5)
For a multivariate normal distribution, the density function [21,26] is used:
p(y; μ, Σ) = (2π)^(−n/2)·|Σ|^(−1/2)·exp(−(1/2)·(y − μ)ᵀ·Σ⁻¹·(y − μ)), (6)
which for the increments (4) takes the form:
p(Δx) = N(Δx; f(x, t)·Δt, G·Gᵀ·Δt). (7)
Then the log-likelihood is defined as:
ln p(Δx) = −(1/2)·(Δx − f·Δt)ᵀ·(G·Gᵀ·Δt)⁻¹·(Δx − f·Δt) − (1/2)·ln|G·Gᵀ·Δt| − (n/2)·ln 2π, (8)
which allows us to move from the stochastic model (2) to an explicit form of the Gaussian distribution of increments [27] and to define the energy function [28,29] for detecting anomalies as:
E(x, t) = (x(t) − m(t))ᵀ·Σ⁻¹·(x(t) − m(t)), (9)
where m(t) is the moving average and Σ⁻¹ is the inverse of the distribution covariance matrix.
To estimate the flow parameters, the moving average and covariance over a sliding window of length T are calculated as:
m(t) = (1/T)·∫[t−T, t] x(s)·ds, (10)
Σ(t) = (1/T)·∫[t−T, t] (x(s) − m(s))·(x(s) − m(s))ᵀ·ds. (11)
Thus, based on the above, a generalised traffic mathematical model block diagram is proposed, presented in Figure 1. The developed block diagram shows the conceptual path of network traffic processing (a minimal numerical sketch of this path is given after the list):
The input multivariate feature vector x(t) is fed to the Itô stochastic Equation (2), which models its evolution with drift f(x, t) and diffusion G(x, t);
Based on the solutions of this equation, the moving average m(t) and covariance Σ(t) estimates of the expected behaviour are calculated according to (10) and (11);
The increments Δx are approximated by the multivariate normal distribution N(f(x, t)Δt, G·GᵀΔt), which provides an analytical description of the fluctuation statistics for subsequent anomaly detection.
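The following minimal Python sketch illustrates this path under simplifying assumptions (it is not the system’s implementation: the drift is taken linear, f(x, t) = A_drift·x, the diffusion matrix G is constant, and all dimensions and matrices are illustrative). It integrates the SDE (2) with the Euler step (5), maintains the sliding estimates (10)–(11), and evaluates the energy function (9) for the current sample:
```python
# A minimal sketch of the generalised traffic model (1)-(11); all matrices
# below are illustrative placeholders, not fitted traffic parameters.
import numpy as np

rng = np.random.default_rng(0)
n, k, dt, steps = 4, 3, 0.01, 2000

A_drift = -0.5 * np.eye(n)              # assumed linear drift: f(x, t) = A_drift @ x
G = 0.1 * rng.standard_normal((n, k))   # constant diffusion matrix G in R^(n x k)

x = np.zeros(n)
window = []                             # sliding window for m(t), Sigma(t), eqs. (10)-(11)
for _ in range(steps):
    dW = rng.standard_normal(k) * np.sqrt(dt)     # Wiener increments, dW ~ N(0, dt I)
    x = x + (A_drift @ x) * dt + G @ dW           # Euler step of (2), cf. (5)
    window.append(x.copy())
    window = window[-200:]                        # keep the last T/dt samples

W = np.array(window)
m = W.mean(axis=0)                                # moving average m(t), eq. (10)
Sigma = np.cov(W.T) + 1e-6 * np.eye(n)            # moving covariance, eq. (11), regularised
energy = (x - m) @ np.linalg.solve(Sigma, x - m)  # energy function (9)
print(f"E(x, t) = {energy:.3f}")
```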
In this research, the developed block diagram is implemented in an extended autoencoder with latent dynamics: the hidden state h(t) evolves according to a stochastic equation and encapsulates information about the current traffic, while the variational objective ensures optimal reconstruction and real-time adaptation of the model.
2.1.2. Development of an Extended Autoencoder with Latent Dynamics
The proposed extended autoencoder with latent dynamics (Figure 2) combines a classical autoencoder [11,30,31,32] with a continuous (or stochastic) evolution of the latent representation. A feature vector x(t) is encoded into a latent state z(t), which then evolves in time according to a deterministic or stochastic equation, and a decoder reconstructs from this state an estimate x̂(t) of the input, ensuring the model continuously adapts to changing traffic without being divided into batches.
For each moment t, the input vector x(t) ∈ ℝ^n is transformed into the parameters of the Gaussian posterior latent distribution [33]:
μ(t) = Wμ·x(t) + bμ, ln σ(t) = Wσ·x(t) + bσ, q(z(t)|x(t)) = N(μ(t), diag σ²(t)). (12)
A typical case is to take the drift and diffusion affine [34]:
dz(t) = (A·z(t) + B·x(t))·dt + (C·z(t) + D·x(t))·dW(t), (13)
where A, B, C, and D are matrix parameters, and W(t) is a vector of Wiener processes.
Based on the hidden state, a reconstruction of the following type is constructed:
x̂(t) = Wdec·z(t) + bdec. (14)
When the model is fixed with a prior p(z(t)) (usually N(0, I)), at each moment t the following is optimised:
L(t) = E_q[ln p(x(t)|z(t))] − KL(q(z(t)|x(t)) ‖ p(z(t))), (15)
where [31,35,36]
ln p(x(t)|z(t)) = −(1/(2σx²))·‖x(t) − x̂(t)‖² + const, (16)
KL(N(μ, diag σ²) ‖ N(0, I)) = (1/2)·Σi (μi² + σi² − ln σi² − 1). (17)
By accumulating over the interval [0, T], a complete integral functional of the form is obtained:
J[Θ] = ∫[0, T] L(t)·dt. (18)
The parameters Θ = {Wμ, bμ, Wσ, bσ, A, B, C, D, Wdec, bdec} are trained by a gradient method [37] towards the maximum of (18):
Θ* = arg maxΘ J[Θ]. (19)
For practical implementation, we discretise by step Δt:
J(Θ) ≈ Σ[k=0…K−1] Lk·Δt, (20)
where
Lk = ln p(xk|zk) − KL(q(zk|xk) ‖ p(zk)). (21)
Using the explicit Euler–Maruyama scheme [38] on the grid tk = k·Δt, we obtain:
zk+1 = zk + flat(zk, xk)·Δt + Glat(zk, xk)·ΔWk, ΔWk ~ N(0, Δt·I), (22)
where flat(zk, xk) = A·zk + B·xk and Glat(zk, xk) = C·zk + D·xk.
The mathematical expectation and covariance of the increment are defined as:
E[Δzk] = flat(zk, xk)·Δt, (23)
Cov[Δzk] = Glat·Glatᵀ·Δt, (24)
which guarantees the scheme’s convergence (first order in the weak sense; strong order 1/2 for general diffusion).
To discretise the gradient ascent for the parameters Θ, the continuous parameter “dynamics” are represented in the form:
dΘ/dt = η·∇Θ L(t). (25)
According to the Euler scheme with step Δt, we obtain:
gk = ∇Θ[ln p(xk|zk) − KL(q(zk|xk) ‖ p(zk))], (26)
Θk+1 = Θk + η·gk·Δt, (27)
where the first term of gk increases the likelihood of recovering xk from zk, and the KL term penalises the deviation of the posterior q from the prior p, preventing latent “runaway”.
The final algorithm for the numerical approximation of (20)–(27) is presented in Algorithm 1.
Algorithm 1: Numerical approximation algorithm for (20)–(27).
Given: step Δt, learning rate η, initialisations z0, Θ0.
For k = 0 … K − 1:
1. Read the current feature vector xk.
2. Generate ΔWk ~ N(0, Δt·I).
3. Update the latent state: zk+1 = zk + flat(zk, xk)·Δt + Glat(zk, xk)·ΔWk.
4. Calculate the gradient: gk = ∇Θ[ln p(xk|zk) − KL(q(zk|xk) ‖ p(zk))].
5. Update the parameters: Θk+1 = Θk + η·gk·Δt.
The developed algorithm’s output is the trajectories {zk} and {Θk}, which provide online adaptation of the latent representation and the model to streaming data.
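A compact Python sketch of Algorithm 1 is given below. It is a toy illustration rather than the paper’s implementation: the latent dynamics use the affine drift of (13) with a constant diffusion matrix, the data stream is stubbed with random vectors, and the trainable set Θ is restricted to the decoder parameters {Wdec, bdec} of (14), for which the gradient of ln p(xk|zk) is available in closed form.
```python
# Toy sketch of Algorithm 1: Euler-Maruyama latent update (22) plus the
# Euler gradient step (27), restricted to the decoder parameters only.
import numpy as np

rng = np.random.default_rng(1)
n, m, dt, eta, sigma2 = 6, 3, 0.01, 0.05, 1.0

A = -0.3 * np.eye(m)                          # latent drift: f_lat(z, x) = A z + B x
B = 0.1 * rng.standard_normal((m, n))
G = 0.05 * np.eye(m)                          # latent diffusion (held constant for brevity)
W_dec = 0.1 * rng.standard_normal((n, m))     # decoder of eq. (14)
b_dec = np.zeros(n)

z = np.zeros(m)
for k in range(5000):
    x = rng.standard_normal(n)                     # 1. read the feature vector x_k (stub)
    dW = rng.standard_normal(m) * np.sqrt(dt)      # 2. Delta W_k ~ N(0, dt I)
    z = z + (A @ z + B @ x) * dt + G @ dW          # 3. Euler-Maruyama update (22)

    x_hat = W_dec @ z + b_dec                      #    reconstruction via (14)
    resid = (x - x_hat) / sigma2                   # 4. closed-form grad of ln p(x_k | z_k)
    W_dec += eta * np.outer(resid, z) * dt         # 5. Euler gradient step (27)
    b_dec += eta * resid * dt
```
In a full implementation, the gradient gk in step 4 would cover all of Θ (encoder, decoder, and the matrices A, B, C, D) and would normally be obtained by automatic differentiation.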
Appendix A provides a mathematical justification of what the proposed “Euler gradient step” actually is and how it compares to standard SGD or Adam updates.
2.1.3. Proof of the Optimality Condition
The Euler–Lagrange system for the optimal Θ = {Wμ, bμ, Wσ, bσ, A, B, C, D, Wdec, bdec} has the form:
∂L/∂θi − (d/dt)(∂L/∂θ̇i) = 0, i = 1, …, M, (28)
where L is the integrand.
To prove the optimality condition, let us consider a functional of the form:
J[Θ] = ∫[t0, t1] L(Θ(t), Θ̇(t), t)·dt, (29)
where Θ(t) = (θ1(t), …, θM(t)) is the vector of all parameters {Wμ, bμ, Wσ, bσ, A, B, C, D, Wdec, bdec}, Θ̇(t) = dΘ/dt, and L is the integrand defining the ELBO increment and the regularisers.
The aim is to find the necessary condition for the stationarity (extremum) of J[Θ] with respect to small variations δΘ(t), which zeroes out the first variation δJ. For this aim, a parameter variation of the form is introduced:
Θε(t) = Θ(t) + ε·δΘ(t), (30)
where the components of δΘ(t) are arbitrary but satisfy δΘ(t0) = δΘ(t1) = 0 (fixed boundary values). Then:
δJ = (d/dε)J[Θε]|ε=0 = ∫[t0, t1] (⟨∂L/∂Θ, δΘ⟩ + ⟨∂L/∂Θ̇, δΘ̇⟩)·dt, (31)
where the scalar product means the sum over all components θi:
⟨∂L/∂Θ, δΘ⟩ = Σ[i=1…M] (∂L/∂θi)·δθi. (32)
In the second sum, we perform integration by parts for each i:
∫[t0, t1] (∂L/∂θ̇i)·δθ̇i·dt = [(∂L/∂θ̇i)·δθi]|[t0, t1] − ∫[t0, t1] (d/dt)(∂L/∂θ̇i)·δθi·dt. (33)
Since δθi(t0) = δθi(t1) = 0, the boundary terms are zeroed out, and summing over i, we obtain:
Σi ∫[t0, t1] (∂L/∂θ̇i)·δθ̇i·dt = −∫[t0, t1] Σi (d/dt)(∂L/∂θ̇i)·δθi·dt. (34)
Let us substitute this into the expression for δJ (31):
δJ = ∫[t0, t1] Σi (∂L/∂θi − (d/dt)(∂L/∂θ̇i))·δθi·dt. (35)
Since δΘ(t) is an arbitrary function (except for zero boundary conditions), the only way to make δJ vanish for all variations is (28), which gives the Euler–Lagrange system.
For each component θi ∈ Θ, condition (28) takes the form:
∂L/∂θi − (d/dt)(∂L/∂θ̇i) = 0. (36)
If we write θi according to the list {Wμ, bμ, Wσ, bσ, A, B, C, D, Wdec, bdec}, we obtain exactly the system of Equations (36), i.e., (28), which is the optimality condition for the parameters of the autoencoder with latent dynamics.
2.1.4. Development of a Multivariate Kalman Filtering Model in Latent Space
In the developed method, the autoencoder latent state h(t) evolves according to the stochastic model (2), and the observations are either the original features x(t) or their reconstruction x̂(t). The developed method implements a continuous–discrete Kalman filter, reducing the system of differential equations to an equivalent discrete linear model at step Δt.
The continuous linear latent model is based on a linear approximation in the latent space:
dh(t) = A·h(t)·dt + B·dWh(t), (37)
where h(t) ∈ ℝ^m is the hidden state, A ∈ ℝ^(m×m) is the drift matrix (the linearisation of flat), B ∈ ℝ^(m×r) is the noise matrix, and Wh(t) is an r-dimensional Wiener process.
Observations yk at discrete time stamps tk = k·Δt are either the full input vector or its reconstruction:
yk = C·h(tk) + D·xk + νk, νk ~ N(0, R), (38)
where C ∈ ℝ^(p×m) and D ∈ ℝ^(p×n) are observation matrices, νk is the measurement white noise, and R ∈ ℝ^(p×p) is its covariance.
To transform (37) and (38) into a discrete model, using the solution of Equation (2) over the interval Δt, the equivalent is obtained:
hk+1 = Φ·hk + wk, wk ~ N(0, Q), (39)
where Φ is the transition matrix and Q is the discrete process noise covariance:
Φ = exp(A·Δt), Q = ∫[0, Δt] exp(A·s)·B·Bᵀ·exp(Aᵀ·s)·ds. (40)
The discrete Kalman filter algorithm consists of two phases performed at each step k → k + 1: prediction and correction (Table 2).
2.1.5. The Anomaly Criterion and Statistical Test Justification
To detect deviations in reconstructed traffic, Hotelling’s T² statistic is used, since it generalises the usual z-test to the multivariate case, taking into account pairwise correlations between the residual components. Its application allows us to record with high accuracy (more than 90% [39,40,41,42]) not only large deviations in one coordinate but also “complex” anomalies, when each of many small shifts does not in itself go beyond the norm but in total they indicate an attack. To apply Hotelling’s T² statistic, it is assumed that at the k-th step an “innovation” (residual) of the following type is obtained:
rk = yk − C·hk|k−1, (41)
with covariance
Sk = C·Pk|k−1·Cᵀ + R, (42)
where Pk|k−1 and R are the prior and measurement covariances.
Hotelling’s T² statistic is calculated as:
T²k = rkᵀ·Sk⁻¹·rk. (43)
According to the properties of the multivariate normal distribution of the errors, in the absence of an attack rk ~ N(0, Sk), and the value T²k is distributed according to the χ² law with p degrees of freedom; that is,
T²k ~ χ²p. (44)
To guarantee a given level of false positives α, a threshold is set:
hα = χ²p,1−α, (45)
where χ²p,1−α is the (1 − α)-level quantile of the χ²p distribution. When T²k > hα, the hypothesis of anomaly absence is rejected at the significance level α.
Formally, Hotelling’s T² statistic is essentially the squared Mahalanobis distance, but our method has three important differences that make it significantly more “model-meaningful” than the traditional anomaly estimate with covariance weighting:
The distance is calculated not from the raw observation to the global centre but on the innovation (residual), that is, the difference between the observation and the model (Kalman) prediction according to (41)–(43);
The covariance in the normalisation is not a static sample covariance but the innovation covariance matrix Sk, obtained and dynamically updated by the Kalman filter steps (Sk takes into account both the a priori state covariance Pk|k−1 and the measurement noise covariance R), so the criterion adapts to the changing traffic structure and time dynamics;
Hotelling’s T² statistic provides a rigorous statistical justification for the threshold via the χ² distribution (false positive rate control), while the traditional Mahalanobis distance is used as a heuristic value without an explicit hypothesis test and without taking the model uncertainty into account.
Since the residuals are obtained via the variational autoencoder with latent dynamics (reconstruction from the latent state and updating by the SDE or the Kalman filter), the errors themselves reflect deviations from the trained multivariate “manifold” of normal traffic and not just from the feature space centre, which increases sensitivity to “complex” anomalies whose shifts are small per coordinate but jointly significant. Thus, despite the similarity with the Mahalanobis distance, the applied approach with Hotelling’s T² statistic is a model-adaptive, time-dependent, and statistically rigorous criterion, not a simple covariance-weighted distance.
Thus, the choice of Hotelling’s T² statistical test is justified by its ability to take into account the complete covariance structure of the multivariate residual, which allows the detection of complex anomaly patterns inaccessible to coordinate-based approaches [43], as well as by the strict theoretical properties of Gaussian distributions, ensuring an accurate setting of the false-positive rate through the χ²p,1−α quantile. At the same time, the innovation covariance Sk, dynamically updated by the Kalman algorithm, automatically adapts to changing traffic characteristics, maintaining the test’s sensitivity to new types of attacks while minimising false alarms.
2.1.6. The Developed Method Synthesis
Based on the developed model, a method for detecting intrusions based on real-time traffic analysis is proposed (Figure 3). The developed method is based on constructing a stochastic dynamic model of network traffic in the latent space of the variational autoencoder, with the hidden state evolving according to a system of differential equations and subsequently estimated by a continuous–discrete Kalman filter. Anomalies are detected by calculating Hotelling’s T² statistic for the filter innovation and comparing it with a threshold level.
At each time step, statistical and information metrics are calculated from the network traffic, forming the input feature vector according to (1). Raw packet data and metadata are pre-processed and aggregated into the numerical characteristics that define the vector xk. The neural network encoder transforms the input features into the parameters of the latent space posterior distribution according to (12). The latent representation evolves according to the stochastic equation discretised by the Euler–Maruyama method according to (22). The decoder reproduces the original feature estimate from the updated latent state via the linear projection (14). A continuous model and discrete observations are specified for the latent space, after which the prediction and correction phases are performed according to (37)–(40). Based on the calculated innovation and its covariance, Hotelling’s T² criterion is formed to estimate deviations according to (41)–(43). If the T² statistic exceeds the critical value of the χ² distribution, the anomaly detector (45) is triggered. The autoencoder and Kalman filter parameters are adjusted in real time using the Euler gradient rule (27) to adapt to changing traffic.
The final algorithm of the developed method is presented in Algorithm 2.
Algorithm 2: The developed method’s algorithm.
Initialise z0, Θ0, P0.
For k = 0 … K − 1:
1. Read the current feature vector xk.
2. Encode xk → (μk, Σk), sample zk.
3. Update zk+1 according to the Euler–Maruyama method (22).
4. Decode the reconstruction x̂k from zk.
5. Perform Kalman prediction and correction → hk+1|k+1, Pk+1|k+1.
6. Calculate T²k and compare with the threshold.
7. Mark an anomaly if the threshold is exceeded and update Θ.
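The end-to-end chaining of Algorithm 2 can be sketched in Python as follows. This is a strongly simplified illustration, not the deployed service: the encoder is a fixed linear-Gaussian stub rather than a trained network, the transition matrix uses a first-order approximation of (40), the observation is the reconstruction itself, and all matrices are assumed values.
```python
# One step of Algorithm 2: encode (12) -> Euler-Maruyama (22) -> decode (14)
# -> Kalman predict/correct (37)-(40) -> Hotelling T^2 test (43), (45).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
n, m, dt, alpha = 6, 3, 0.01, 0.01

W_mu = 0.1 * rng.standard_normal((m, n)); b_mu = np.zeros(m)    # encoder stub, eq. (12)
log_sig = -2.0 * np.ones(m)
A = -0.3 * np.eye(m); G = 0.05 * np.eye(m)                      # latent dynamics
W_dec = 0.1 * rng.standard_normal((n, m)); b_dec = np.zeros(n)  # decoder, eq. (14)
Phi = np.eye(m) + A * dt; Q = G @ G.T * dt                      # 1st-order discretisation
C = W_dec; R = 0.1 * np.eye(n)                                  # observe the reconstruction
h = np.zeros(m); P = np.eye(m); z = np.zeros(m)
thr = chi2.ppf(1 - alpha, df=n)                                 # threshold, eq. (45)

def step(x, z, h, P):
    mu = W_mu @ x + b_mu                                  # 2. encode x_k -> (mu_k, sigma_k)
    z = mu + np.exp(log_sig) * rng.standard_normal(m)     #    sample z_k
    z = z + (A @ z) * dt + G @ (np.sqrt(dt) * rng.standard_normal(m))  # 3. EM update (22)
    x_hat = W_dec @ z + b_dec                             # 4. decode x_hat_k
    h_pred, P_pred = Phi @ h, Phi @ P @ Phi.T + Q         # 5. Kalman prediction
    r = x_hat - C @ h_pred                                #    innovation on reconstruction
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    h, P = h_pred + K @ r, (np.eye(m) - K @ C) @ P_pred   #    correction
    t2 = float(r @ np.linalg.solve(S, r))                 # 6. Hotelling T^2, eq. (43)
    return z, h, P, t2 > thr                              # 7. anomaly flag

z, h, P, is_anomaly = step(rng.standard_normal(n), z, h, P)
```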
2.2. Development of a Neural Network Intrusion Detection System Based on Real-Time Traffic Analysis
The proposed neural network system (Figure 4) implements an end-to-end pipeline for collecting, processing, and analysing network traffic in real time, in which a variational autoencoder with latent dynamics and a continuous–discrete Kalman filter detect anomalies based on Hotelling’s T² statistic with low latency. Thanks to built-in online retraining, the system automatically adapts to changing traffic conditions with a minimal level of false positives. The modular architecture based on Kubernetes and Kafka ensures integration with corporate SIEM/SOAR systems.
The traffic collection and buffering module receives raw network data from observation points (SPAN port, TAP, and NetFlow agent) and ensures its reliable delivery to the system via a distributed Apache Kafka queue, which provides high throughput and horizontal scaling. At the pre- and post-processing stage, the data is aggregated into micro-sessions and normalised using Flink (or Spark) Streaming into the feature set of (1), including transmission speed, entropy, and delays, after which a single vector xk is formed for transmission to the variational autoencoder.
The model service is deployed in Kubernetes and includes a TensorFlow implementation of the variational autoencoder and a continuous–discrete Kalman filter component: upon receipt of each xk, the platform computes the reconstruction and Hotelling’s T² statistic according to (41)–(43) via a high-performance gRPC/REST API. The model parameters are adapted “on the fly” using the Euler gradient step (27): the accumulated batches of input vectors and their latent representations are used to update the autoencoder weights and filter matrices without stopping the service. The anomaly manager compares the obtained Hotelling’s T² values with the critical threshold (45) and, upon detection of deviations, generates events in the Kafka “alerts” channel and stores them in Elasticsearch.
To monitor stability and operation quality, the system collects latency, throughput, and detection accuracy (precision/recall) metrics in Prometheus/Grafana, which allows for automatic scaling (HPA) and model drift tracking. The visualisation dashboard on React (using shadcn/ui) displays real-time anomaly-level diagrams, Hotelling’s T² trends, and incident reports, and integration via standardised connectors (Syslog, REST, Kafka) ensures event transfer to corporate SIEM/SOAR systems (Splunk, QRadar, Demisto) for further response automation.
The developed neural network system experimental sample is implemented in the MATLAB R2014b software environment (Figure 5).
The raw network data stream is read from observation points (SPAN port, TAP, NetFlow agent) using the MATLAB Support Package for Kafka or the Java client. Incoming packets are buffered in a ring buffer (matlab.concurrent.Queue), which ensures durability and reliable delivery even during load surges. The PreprocessStream function sequentially extracts packets from the buffer, combines them into micro-sessions, and aggregates key features (speed, entropy, delays) according to the specification in (1), normalising the results for feeding into the model.
The platform core is the VAEKalmanModel class, implemented on the basis of the Deep Learning Toolbox and Control System Toolbox. The step(xk) method encapsulates the whole cycle: encoding and sampling of the latent vector, its evolution according to the Euler–Maruyama scheme, decoding, execution of the Kalman filter prediction–correction phases according to (37)–(40), and calculation of Hotelling’s T² statistic according to (41)–(43). The autoencoder parameters and the filter matrices are dynamically updated in the updateParameters method, which, at a given batch interval, performs a gradient step according to the Euler scheme (27), maintaining the service without interruption.
The AlertManager block compares the calculated values with the critical threshold according to (45) and, if necessary, generates JSON events for the Kafka topic “alerts” or sends them directly to Elasticsearch via an HTTP request. In parallel, the Logger component collects latency, throughput, and detection quality (precision/recall) metrics. It uploads them to Prometheus/Grafana, which allows the monitoring of current performance and automatically scales the platform using HPA.
Using MATLAB App Designer, an interactive dashboard is created that displays Hotelling’s T2 dynamics diagrams, error distribution histograms, and a real-time anomaly event feed. For integration with external SIEM/SOAR systems (Splunk, QRadar, Demisto), a SIEMConnector script has been developed that reads accumulated events from Kafka or Elasticsearch and transmits them via standardised REST API and Syslog connectors, ensuring the detection and response pipeline’s complete closure.
Thus, the developed neural network system demonstrates the efficiency of the end-to-end pipeline from traffic collection to anomaly detection with low latency (<100 ms) and a controlled false positive rate. The integration of the variational autoencoder with latent dynamics and the continuous–discrete Kalman filter [44] ensures the model’s adaptability to changing traffic conditions. Modular implementation in the MATLAB R2014b software environment with online training and built-in monitoring simplifies deployment and operation in corporate SIEM/SOAR environments.
3. Case Study
3.1. Analysis and Pre-Processing of Input Data
The research uses network traffic data obtained over the time interval from 10:00 AM to 1:00 PM. To form the input dataset, network traffic was continuously and passively recorded on the key router of the studied subnet using a packet capture tool (e.g., Tshark) from 10:00 AM to 1:00 PM on 2 July 2025, after which the data was aggregated into 1 s time windows (tk = 1 s). The result was a set of 10,800 samples, each representing a feature vector xk according to Equation (1). For each window the following were calculated: total number of packets (“Packet count”), total byte volume (“Byte count”), average packet size (“avg_pkt_size” = Byte count/Packet count), sender and receiver port numbers (“src_port”, “dst_port”), network protocol (“Protocol”), entropy of the packet size or interval distribution within a micro-session (“Entropy”), and average inter-packet interval in milliseconds (“Inter arrival”). The start time of each window was recorded in the “Timestamp” field in the YYYY-MM-DD HH:MM:SS format for subsequent synchronisation with events on network equipment and external monitoring systems (Table 3).
The “Timestamp” field specifies the start time of the micro-session in the YYYY-MM-DD HH:MM:SS format, which allows data to be synchronised with real events in the network. “Packet count” reflects the total number of packets in the session, serving as a basic load metric. “Byte count” shows the total number of transmitted bytes, giving the traffic intensity. “Avg_pkt_size” is calculated as the ratio of the total number of bytes to the number of packets and characterises the average packet size, which is vital for detecting fragmentation anomalies. “Src_port” and “dst_port” are the sender and recipient port numbers, respectively, allowing the services and protocols involved in the exchange to be determined. “Protocol” specifies the network protocol (TCP, UDP, etc.), which helps to separate traffic by connection type. “Entropy” measures the entropy of the packet size or interval distribution within a session, which allows the “chaotic” character of traffic to be assessed and deviations from normal behaviour to be detected. “Inter arrival” shows the average time (in milliseconds) between successive packets, serving as an indicator of delays and bursty data transfer.
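A hedged sketch of how such per-window features could be aggregated from raw packet records is given below; the input format (lists of packet timestamps and sizes per 1 s window) is an assumption, not the paper’s capture pipeline.
```python
# Aggregating the Table 3 features for one 1 s window of packets.
import numpy as np

def window_features(pkt_times, pkt_sizes):
    """Compute the per-window features of eq. (1) / Table 3."""
    packet_count = len(pkt_sizes)
    byte_count = int(np.sum(pkt_sizes))
    avg_pkt_size = byte_count / packet_count if packet_count else 0.0
    # Shannon entropy of the packet-size distribution within the window.
    _, counts = np.unique(pkt_sizes, return_counts=True)
    probs = counts / counts.sum()
    entropy = float(-np.sum(probs * np.log(probs)))
    # Mean inter-arrival time in milliseconds.
    inter = float(np.mean(np.diff(sorted(pkt_times))) * 1e3) if packet_count > 1 else 0.0
    return packet_count, byte_count, avg_pkt_size, entropy, inter

# Example: five packets inside one window.
print(window_features([0.01, 0.15, 0.42, 0.60, 0.95], [60, 1500, 60, 1500, 576]))
```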
At the first stage of input dataset pre-processing, the data completeness and quality are analysed by checking for gaps in the “Timestamp” column and step discrepancies (duplicates or missing seconds), as well as outliers and noise values in the numerical features (columns “Packet count”, “Byte count”, “avg_pkt_size”, “Entropy”, and “Inter arrival”). For this aim, moving statistics (moving average and standard deviation) are built, and points that go beyond the μ ± 3σ limits are identified. When gaps are detected, either the intervals are normalised (interpolation) or anomalous sessions are deleted according to pre-set rules.
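The μ ± 3σ screening can be sketched with pandas rolling statistics; the 60 s window length here is an illustrative choice, not a value specified in the study.
```python
# Rolling mu +/- 3 sigma outlier screening for one numeric feature column.
import pandas as pd

def flag_outliers(series: pd.Series, window: int = 60) -> pd.Series:
    mu = series.rolling(window, min_periods=window).mean()
    sigma = series.rolling(window, min_periods=window).std()
    return (series - mu).abs() > 3 * sigma    # True where x_k leaves mu +/- 3 sigma

# Usage: mask = flag_outliers(df["Packet count"]); df = df[~mask.fillna(False)]
```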
It is noted that the high entropy value in Table 3 is explained primarily by the high variability and mixed nature of the network traffic, the contribution of continuous (heavy-tailed) feature distributions, and the entropy estimation method. For discrete estimation, the classical Shannon definition H = −Σi pi·ln pi is used, and for continuous features the corresponding differential entropy is associated with the variance (for a Gaussian feature, h = (1/2)·ln(2πeσ²)), so a wide spread and “long tails” of individual features directly increase the entropy estimate. The shift of protocols (applications) and the periodicity (bursts) within a 1 s window lead to multimodality of the distribution and, as a consequence, to an increase in the joint entropy. In practice, high entropy means high uncertainty and difficulty in separating outliers by simple marginal rules, but it also indicates that useful information is distributed over complex joint dependencies.
Figure 6 shows that each feature has about 108 missing values (approximately 1% of the total amount), which is within the acceptable threshold for subsequent interpolation or removal without significant data distortion.
Figure 7 shows the time “spread” of the “Packet count” values, which shows a uniform distribution of requests and random gaps, indicating the absence of strong outliers or systematic failures in data collection. These results confirm that the dataset is sufficiently complete and homogeneous for training the developed autoencoder with latent dynamics.
The next stage of pre-processing the input data involves checking for temporal homogeneity. For this purpose, the entire period (3 h) was divided into several equal 30 min windows, after which the distributions of key features in each window were compared. Using the Kolmogorov–Smirnov criterion (for “Packet count” and “Byte count”) and the χ² test (for the categorical field “Protocol”), the extent to which observations in different segments are statistically similar is assessed. If there are significant differences, either additional filtering (removal of uncharacteristic traffic peaks) or the introduction of regression models to compensate for temporal trends may be required. The results of the temporal homogeneity test are presented in Table 4, where for each window (except the first one) the “Packet count” and “Byte count” distributions were compared with the first window using the Kolmogorov–Smirnov criterion [45,46] (α = 0.05, critical value Dcrit = 0.054), and the “Protocol” field distribution was compared using the χ² test [47,48] (α = 0.05, df = 1, χ²crit = 3.84). In Table 4, “*” means that the statistic exceeds the critical value, p < 0.05.
According to Table 4, most of the 30 min segments (windows 2, 3, 5, and 6) do not show statistically significant differences in either the number of packets or the traffic volume (KS test), or in the protocol distribution (χ² test), indicating that the traffic is homogeneous over the 3 h. The exception is window 4 (from 11:30 to 12:00), marked with “*”, which shows significant deviations in the “Packet count” and “Byte count” distributions (p < 0.05, KS test), possibly due to a short-term activity peak or a traffic anomaly; however, the “Protocol” distribution remains homogeneous. This general confirmation of homogeneity allows the entire interval to be used for training the model, subject to a small amount of data filtering in window 4.
To assess the training dataset representativeness (Table 3), the k-means clustering method [49,50,51] was used. The training dataset of 10,800 elements (the “Packet count” parameter) was randomly divided into training and validation samples in a 2:1 ratio (67%, i.e., 7236 values, and 33%, i.e., 3564 values). When clustering the training part, nine clusters were found (classes I–IX), the metric distance between which does not exceed 0.1, which indicates the similarity of the internal structure of both subdatasets (Figure 8). Based on the obtained results, the optimal sample sizes were determined: out of the total 10,800 values, 7236 (67%) constitute the training dataset and 3564 (33%) constitute the test dataset.
Thus, the training dataset’s preliminary processing made it possible to obtain a statistically homogeneous and representative training dataset, which is paramount in the developed variational autoencoder with latent dynamics and the Kalman filter’s stable operation.
Thus, the dataset contains 10,800 samples, 1 s windows collected between 10:00 and 13:00 (3 h), and the entire set was used for training and validation, randomly split in a 2:1 ratio (7236 training, 3564 validation).
3.2. The Developed Neural Network Platform Testing Results
Before the neural network model training stage, histograms of the distribution of the key features “Packet count”, “Byte count”, “avg_pkt_size”, “Entropy”, and “Inter arrival” were obtained (Figure 9), which allow us to visually assess the distribution shapes and the presence of outliers in the entire dataset.
From the feature distribution histograms (Figure 9), it is evident that the packet count is distributed approximately according to the Poisson law with a mean of about 20 and rare outliers in both directions. The “Byte count” demonstrates a pronounced log-normal distribution with a “long tail”, where a small proportion of sessions have abnormally large volumes. The average packet size (“avg_pkt_size”) also obeys the log-normal law, but with a more moderate asymmetry and single outliers towards larger values. The entropy distribution approximately corresponds to a beta distribution concentrated closer to small values of 0.1–0.4, with a gradual decline towards one. It is also evident from the histograms that the intervals between packets (“Inter arrival”) follow an exponential law, where most delays are close to zero and the long tail indicates rare but significant delays of up to 5 s.
At the initial stage of neural network model training, time series diagrams of the “Packet count” and “Byte count” values were obtained for a three-hour interval (Figure 10). They display characteristic periodic fluctuations in network traffic and local abnormal surges or dips, allowing the dynamic variability of the load to be analysed.
According to Figure 10, for the interval from 12:00 to 15:00, the time series demonstrate clearly expressed periodic fluctuations and isolated anomalies: “Packet count” is characterised by regular bursts with a period of about 60 min, during which the counter reaches approximately 25–27 packets/min and dips to 12–15 packets/min, and “Byte count” is approximated by a sinusoid with a period of ≈45 min, with peaks at the 160,000–170,000 byte level and dips to the 50,000–60,000 byte level. Local outliers were recorded at the same time: a sharp increase in the “Packet count” to ~32 packets/min at 12:30 and its drop to ~11 packets/min at 14:00, as well as a substantial jump in the “Byte count” to ~200,000 bytes at 13:30 and a sharp drop to ~40,000 bytes at 14:30. These require additional analysis, since they may indicate flood attacks or short-term packet losses and network failures.
Next, a histogram of missing values by feature is obtained (Figure 11), which clearly shows the percentage or number of gaps in each feature. It allows us to assess the data collection quality (the input dataset quality) and justify the need for interpolation or filtering.
The histogram presented in Figure 11 shows that the proportion of missing data across the 10,800 observations ranges from ≈1.6% (“Inter arrival”) to ≈9.5% (“Byte count”), indicating high reliability of traffic collection (less than 10% dropouts in the worst case). At a missing data rate of up to 5% (“Packet count”, “avg_pkt_size”, “Entropy”), simple linear or polynomial interpolation is sufficient and introduces no significant statistical distortion. In contrast, at higher losses of up to 10% (“Byte count”), it is advisable to apply spline smoothing, taking into account trend and seasonality, or pre-filtering of extreme outliers before imputation.
The variational autoencoder training curves (Figure 12) were also obtained, reflecting the dependence of the loss function (ELBO) on the epoch number for the training and validation datasets. The training curves make it possible to visually assess the model’s convergence and identify signs of overfitting.
According to the autoencoder training curves (Figure 12), two phases are clearly visible: at the beginning (up to the 50th epoch), the ELBO loss on the training set monotonically decreases to ≈ −192, which reflects a steady improvement in reconstruction and latent distribution approximation quality; the validation ELBO similarly decreases up to the ≈30th epoch, and the gap between the training and validation curves is minimal (~1–2 units), which indicates correct model generalisation. However, after the ≈30th epoch, the validation ELBO starts to grow while the training curve continues to decrease, indicating the onset of overfitting, in which the autoencoder captures the training data details at the expense of its ability to identify general patterns; therefore, early stopping based on the validation ELBO is used to mitigate it (Figure 13).
In Figure 13, the early stopping epoch occurs at the first iteration at which the validation ELBO reaches its minimum value, after which further training does not lead to an overall improvement on either the training or validation datasets. Deciding to stop at this “best” validation epoch minimises overfitting, preventing the gap between the training and validation ELBO from widening and preserving the model’s ability to generalise without “memorising” noise artefacts. The results obtained significantly reduce the computational cost, since subsequent epochs do not improve quality, and ensure the stability of the autoencoder’s generalisation ability, providing an optimal balance between the reconstructive loss and generalisation to unseen data.
Next, the linear diagram of the time evolution of Hotelling’s T² statistic is obtained with the critical threshold (the χ²-quantile for level α) superimposed (Figure 14), which shows the moments and frequency of anomaly detector triggering when the network traffic deviates from normal behaviour.
The three-hour diagram of Hotelling’s T² statistic (Figure 14) shows the critical threshold (the χ² quantile at df = 5 and α = 0.01) at ≈15.09, with anomalies recorded at the moments when the statistic exceeds this limit (at 12:30, 13:30, and 14:30), that is, three triggers out of 180 measurements (≈1.7%), close to the declared false alarm rate of 1%. The obtained result confirms the detector’s high sensitivity based on Hotelling’s T² statistic: it identifies multivariate deviations from normal behaviour while remaining within the controlled false alarm rate determined by the χ²-quantile confidence ellipsoid.
The ROC curve (Figure 15a) and the precision–recall curve (Figure 15b), constructed using a test set of attacked and normal sessions, allow us to evaluate the balance between sensitivity and the false positive level, as well as the neural network model’s effectiveness when working with unbalanced data.
The ROC curve (Figure 15a) demonstrates the dependence of the TPR sensitivity on the false positive rate FPR as the classification threshold varies: the closer the curve is to the upper left corner, the better the model separates attacked and regular sessions while maintaining a low level of false alarms. In our example, the ROC curve rises steadily and reaches TPR ≈ 1 at FPR ≈ 0.5, which reflects an acceptable compromise between the completeness of attack detection and the control of false signals. The precision–recall curve (Figure 15b) shows the precision (TP/(TP + FP)) as a function of the recall (TP/(TP + FN)), which is informative for unbalanced classes (30% of attacked sessions): at high recall values (>0.8), the precision drops to ≈0.45 due to an increase in the proportion of false positives, while the optimal balance of detection and accuracy (precision > 0.6) is achieved in the recall range of ≈0.6–0.7.
It is also noted that a decrease in precision to ≈0.45 at high recall does not automatically mean that the method is unsuitable: this is a classic reflection of the trade-off between sensitivity (TPR = TP/(TP + FN)) and the error rate, and with a rare class (low prevalence) a low base precision in the PR curve is typical. Formally, precision is related to sensitivity and specificity through precision = π·TPR/(π·TPR + (1 − π)·(1 − TNR)), where π is the proportion of anomalies, so with small π even good sensitivity (or specificity) gives low precision. At the same time, the developed method is positioned as an “early warning” stage followed by cheap verification (feature enrichment, a rule engine, a lightweight signature check, or a human in the loop), which reduces the final cost of false positives. In this case, it is possible to select the operating point (threshold) by a predetermined level of false positives or to optimise it by Fβ, and also to increase precision by aggregating signals over time and sessions (temporal smoothing, majority voting). Additional measures to increase precision include threshold calibration, post-processing (including the use of the Kalman innovation as a filter), model ensembling, and cost-sensitive training.
The corresponding error matrix obtained for the selected χ²-quantile threshold (α = 0.01) is presented in Table 5.
In the error matrix:
TP (True Positive) is the truly anomalous sessions number correctly labelled as anomalies by the detector (in this example, the value obtained is 25);
FP (False Positive) is the number of regular sessions incorrectly classified as anomalies (in this example, the value obtained is 5);
TN (True Negative) is the number of regular sessions correctly labelled as “norm” (in this example, the value obtained is 95);
FN (False Negative) is the number of anomalous sessions missed by the detector (labelled as “norm”) (in this example, the value obtained is 5).
Using the error matrix allows us to quantitatively evaluate the classifier’s characteristics: recall = TP/(TP + FN) shows the proportion of correctly detected attacks out of all real attacks, precision = TP/(TP + FP) reflects the proportion of correct alarms among all triggers, and the F1-score = 2·precision·recall/(precision + recall), which combines both metrics, serves as an integral indicator of the detector’s balanced quality.
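These metrics can be verified directly from the Table 5 values; with TP = 25, FP = 5, TN = 95, FN = 5 the recall, precision, and F1-score all come out at ≈0.83, matching the figures quoted in Section 3.4.
```python
# Recomputing the Table 5 metrics.
TP, FP, TN, FN = 25, 5, 95, 5
recall = TP / (TP + FN)                      # 25 / 30 ~= 0.833
precision = TP / (TP + FP)                   # 25 / 30 ~= 0.833
f1 = 2 * precision * recall / (precision + recall)
print(f"recall={recall:.3f} precision={precision:.3f} F1={f1:.3f}")
```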
The study also obtained a diagram of the end-to-end detection time of one session versus the throughput (number of packets/sessions per second) (Figure 16), which demonstrates the system’s scalability and ability to maintain real-time operation with increasing network load.
The diagram of one session’s end-to-end processing latency versus throughput (100–1000 sessions/s) (Figure 16) shows a nearly linear trend with a slight sublinear (logarithmic) component. As the load increases from 100 to 1000 sessions/s, the latency increases from approximately ≈1.5 to ≈6.5 ms. The predictable increase in latency without sharp jumps allows the performance limits to be set, and maintaining a low latency level (<10 ms) ensures timely detection of anomalies without noticeable delays in the data flow. In addition, the scalability, expressed as less than a fivefold increase in latency for a tenfold increase in load, indicates efficient resource allocation and leaves room for further optimisation of the detection pipeline.
Latent state trajectory diagrams obtained by projecting the z(t) vector onto 2–3 dimensions (PCA) (Figure 17) allow us to visualise how the model smoothly moves in the latent space between zones of normal and abnormal traffic behaviour.
In the PCA projection diagram of the latent vectors z(t) (Figure 17), normal behaviour forms a coherent “path” in the PC1–PC2 space that smoothly shifts in time (from dark to light shades), reflecting the evolution of the network traffic latent representations, while anomalies (highlighted by red dots) sharply leave the central cluster, moving to distant areas. The smoothness of movement across the cluster indicates that the variational autoencoder has learned the low-dimensional “manifold” of normal behaviour and encodes states with similar features into neighbouring points of the latent space, while the outliers during anomalies demonstrate the model’s ability to clearly separate chronological sections of traffic with attacks or bursts. The resulting visualisation confirms that, when the PCA projection is integrated into the monitoring pipeline, the detector can additionally rely on the distance from the cluster centre or a local density estimate of points in the latent space for anomaly detection.
At the final testing stage, a diagram of the evolution of the model’s key parameters (the elements of the drift matrix A and diffusion matrix G) was obtained (Figure 18), illustrating the system’s dynamic adaptation during online learning to changing traffic characteristics in accordance with Equation (27).
The diagrams of the drift matrix A and diffusion matrix G parameter adaptation during online training (Figure 18) show that the diagonal elements A[0, 0] and A[1, 1] change smoothly, following seasonal trends and minor fluctuations, which reflects the model’s gradual adjustment to the changing structural characteristics of traffic (e.g., changes in the average load and correlations between features), while the elements G[0, 0] and G[1, 1] demonstrate a noisier evolution due to their role in describing the variability of the latent variables (variances and covariances) and the response to short-term anomalous bursts. Adaptation within the framework of the stochastic differential equation and update rule (27) ensures that the drift matrix A specifies the direction and speed of the state’s “return” to equilibrium, accounting for long-term trends (changes in daily activity patterns), while the diffusion matrix G quickly regulates the degree of random fluctuations around the drift trajectory during sudden changes (DDoS attacks, network failures). As a result, the system remains resistant to slow shifts and, at the same time, highly adaptive during turbulent traffic periods.
3.3. The Developed Neural Network Platform Computational Complexity Evaluation Results
The developed platform’s computational complexity calculation can be divided into five main stages of processing one feature vector xk (of dimension n) and the model’s subsequent adaptation: encoding (the variational autoencoder encoder), updating the latent state (Euler–Maruyama method), decoding (the variational autoencoder decoder), the continuous–discrete Kalman filter, and updating the parameters (gradient step according to the Euler scheme).
The variational autoencoder’s encoder is a series of fully connected layers with a total number of parameters of order O(n·h), where h is the average hidden layer width. The computational complexity of the forward pass through the encoder is:
C_enc = O(n·h).
The latent state zk has dimension m. The Euler–Maruyama step involves multiplications by the m × m matrices A and G and the generation of a noise vector, which gives complexity:
C_lat = O(m²).
The decoder reconstructs the original vector of dimension p (usually p = n) from the latent state, through layers of order O(m·p):
C_dec = O(m·p).
The basic operations of the continuous–discrete Kalman filter are the multiplication and inversion of covariance matrices of dimension m, which gives a cubic complexity in the latent dimension:
C_KF = O(m³).
In online training, the encoder (decoder) parameters and the matrices A and G are updated by gradient: one gradient descent step on a batch of size B has complexity:
C_train = O(B·(n·h + m² + m·p)).
Thus, the asymptotic computational complexity of processing one time step (excluding Kafka or Kubernetes communication delays) is
C_step = O(n·h + m² + m·p + m³),
and taking into account online training on batches of B elements:
C_total = C_step + C_train = O(n·h + m² + m·p + m³ + B·(n·h + m² + m·p)).
Substituting the values used in the study (n = 10, h = 64, m = 16, p = 10, B = 32), the following was obtained (verified numerically in the sketch below):
1. Basic processing of one time step: n·h + m² + m·p + m³ = 640 + 256 + 160 + 4096 = 5152 basic operations.
2. Online training on a batch of B = 32 vectors: 32·(640 + 256 + 160) = 33,792 additional operations per gradient descent step.
3. Total for one time step, taking training into account: 5152 + 33,792 = 38,944 basic operations, which, with modern CPU/GPU (<100 ms calculations), fit into the platform’s real operating time.
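The arithmetic can be reproduced directly (unit cost per multiply–accumulate, constants omitted):
```python
# Operation counts for n = 10, h = 64, m = 16, p = 10, B = 32.
n, h, m, p, B = 10, 64, 16, 10, 32
encode, latent, decode, kalman = n * h, m * m, m * p, m ** 3
step = encode + latent + decode + kalman          # 640 + 256 + 160 + 4096 = 5152
train = B * (encode + latent + decode)            # 32 * 1056 = 33792
print(step, train, step + train)                  # 5152 33792 38944
```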
The results show that processing a one-time step without taking into account training requires approximately 5152 basic operations, and taking into account gradient descent on a batch of 32 elements is about 38,944 operations. With modern CPU/GPU architectures, the resulting computational complexity fits into the target time budget (<100 ms), ensuring the system’s timely response in real conditions. Additional optimisation at this stage is not critical. Still, weight quantisation and parallel computation can be used to transfer the platform to resource-limited embedded systems or with a significant increase in the feature dimension.
Figure 19 shows the comparative change in computational costs (in arbitrary units) with increasing latent space dimension m for the exact Kalman filter, the low-rank approximation (Woodbury), the diagonal approximation, and the Ensemble Kalman filter.
Figure 19 shows the typical increase in computational costs with increasing latent dimension m. According to Figure 19, the exact Kalman filter implementation exhibits rapid cubic growth in costs and becomes impractical already for medium and large m (starting from about hundreds of dimensions in the illustrative scheme), while the low-rank approximation (Woodbury) practically repeats the exact method’s behaviour for small m and significantly reduces the overhead for m ≫ r, moving the “bottleneck” to the inversion of an r-sized matrix. The diagonal approximation remains cheap and almost independent of m but loses the inter-component covariance, which usually affects the quality of the estimates, and the Ensemble Kalman filter occupies an intermediate position: it eliminates the O(m³) term but entails an ensemble factor e, a quadratic dependence O(e·m²), and stochasticity of the estimates.
3.4. The Neural Network Performance Evaluation
The developed neural network’s efficiency was assessed by the following key metrics: recall, precision, F1-measure, ROC curve, precision–recall curve, and end-to-end detection latency. The assessment showed that the developed neural network demonstrates high accuracy and completeness of recognition, with Recall ≈ 0.83 and Precision ≈ 0.83, which yields a balanced F1-measure ≈ 0.83 when detecting anomalies.
ROC curve analysis confirmed the balance between sensitivity (TPR → 1) and the false positive rate (FPR ≈ 0.5), which indicates the detector reliability at different decision thresholds.
The precision–recall curve analysis revealed that at high recall values (recall > 0.8), the accuracy decreases to ≈ 0.45, and the optimal compromise is achieved at recall ≈ 0.6–0.7 and precision > 0.6.
End-to-end latency for processing one session is 1.5–6.5 ms under loads of 100–1000 sessions/s, which confirms the platform’s ability to operate in real time with low latency and high scalability.
Table 6 provides a comparative analysis of the developed neural network with other neural network architectures that are widely used for detecting data anomalies.
Comparing the results (Table 6), the developed neural network platform demonstrates balanced detection precision and recall (precision = recall = 0.83) with F1 = 0.83 and ultra-low latency of 1.5–6.5 ms, which meets strict real-time requirements. The LSTM autoencoder and Deep SVDD with CAE show slightly higher precision and recall values (~0.887–0.888) with the corresponding F1 ≈ 0.887 and F1 ≈ 0.8825, but their delay is measured in tens to hundreds of milliseconds, which weakens their real-time guarantees. The TimeGPT model is inferior in detection quality (F1 ≈ 0.55) with similar delays. Thus, the developed variational autoencoder with Kalman filter platform provides an optimal compromise between anomaly detection quality and the extremely low latency required for online monitoring, and can be practically implemented in cyber police units.
3.5. The Practical Implementation of Obtained Results
The practical implementation of the developed adaptive neural network intrusion detection system in cyber police activities includes the integration of its modules into the existing infrastructure and law enforcement response processes: streaming analysis of network traffic using Apache Kafka and Spark (or Flink) allows the cyber police to receive aggregated features in real time and instantly detect anomalies according to Hotelling’s T² statistic, after which events about potential attacks are automatically transmitted to SIEM/SOAR systems (Splunk, QRadar, Demisto) through standardised connectors for subsequent investigation and notification of cyber police officers.
Due to the minimal level of false positives (<1%) and latency of 1.5–6.5 ms, the cyber police are able to continuously monitor critical information flows without significant delays and personnel overloading, and the online training module ensures the model’s independent adaptation to new types of threats without the need for intervention by engineers.
Figure 20 shows the diagram of the developed neural network platform’s implementation in cyber police units. In the developed diagram (Figure 20), network traffic data from mirror ports enters a fault-tolerant Kafka cluster, where scalable buffering is provided, after which Spark (or Flink) is used to aggregate sessions and extract statistical features (Hotelling’s T², mean, variance) with subsequent normalisation and outlier filtering. At the next stage, a variational autoencoder together with a Kalman filter detects anomalies in real time, makes a binary “norm/threat” decision based on adaptive thresholds, and sends generated alerts to SIEM/SOAR systems (Splunk, QRadar, etc.) for prompt response by the cyber police. The online learning module, based on feedback from analysts, automatically adjusts the model weights and filter parameters to optimise accuracy and reduce false positives [58,59,60,61].
The key element in the platform’s integration into the corporate cyber police environment is the dashboard screenshots (Figure 21), illustrating the system’s operation. The interactive dashboard combines key performance indicators of the adaptive neural network intrusion detection system in real time:
Hotelling’s T2 statistics time trend with trigger level (Threshold) allows the tracking of the anomaly dynamics and instantly identifies spikes in deviations from normal network behaviour;
The precision–recall curve demonstrates the balance between the detection completeness and accuracy with the model’s current parameters. It is vital to assess its quality and adjust for false positives.
The traffic load histogram shows the change in load in sessions per second and serves as a basis for understanding peak loads and the ratio of business traffic to noise.
Latency versus Sessions per Second illustrates the system’s scalability and ensures that latency SLAs (<100 ms) are met as throughput increases.
4. Discussion
The study developed a neural network system for detecting unauthorised intrusions based on real-time traffic analysis. It is based on a stochastic model of network traffic, in which a multidimensional feature vector x(t) is considered as a solution of the stochastic Itô Equations (1) and (2), and the increments Δx are approximated by a multivariate normal law according to (4)–(6). On this basis, an explicit expression for the log-likelihood of traffic fluctuations (8) is obtained, and an energy function for detecting anomalies is introduced in the form of (9). A block diagram of the data flow, with the calculation of moving estimates of the mean m(t) and covariance Σ(t), is developed and shown in Figure 1.
An extended variational autoencoder with latent dynamics is developed (Figure 2), in which the input vector x(t) is encoded into the parameters of the Gaussian posterior latent distribution according to (12), after which the hidden state z(t) evolves according to the stochastic equation discretised by the Euler–Maruyama scheme according to (20)–(22). The ELBO functional for each time point is specified through the reconstructive likelihood and the KL penalty according to (15)–(18), and the numerical approximation of the parameters Θ is carried out by the Euler gradient rule (27), implemented in Algorithm 1.
To filter latent states and detect deviations, a continuous–discrete Kalman filter is implemented, where either x(t) or its reconstruction x̂(t) acts as the observation according to (37)–(40) and Table 2. Anomalies are estimated using Hotelling's T² statistic according to (41)–(44) and compared with the χ² threshold according to (45). The general block diagram of the developed method, combining the VAE with latent dynamics and the Kalman filter, is shown in Figure 3.
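A minimal sketch of the discrete correction step and the T² test is shown below; the linear observation model H, the noise covariance R, and the significance level alpha are illustrative assumptions, while the manuscript's exact equations are (37)–(45):

```python
import numpy as np
from scipy.stats import chi2

def kalman_update_and_t2(z_pred, P_pred, y, H, R, alpha=0.01):
    """Discrete Kalman correction with a Hotelling-style T^2 anomaly test.

    z_pred, P_pred: predicted latent mean and covariance;
    y: observation (x(t) or its reconstruction); H, R: observation model and noise.
    """
    r = y - H @ z_pred                     # innovation r_k
    S = H @ P_pred @ H.T + R               # innovation covariance S_k
    t2 = float(r @ np.linalg.solve(S, r))  # T^2 = r_k^T S_k^{-1} r_k
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    z_upd = z_pred + K @ r
    P_upd = (np.eye(len(z_pred)) - K @ H) @ P_pred
    is_anomaly = t2 > chi2.ppf(1.0 - alpha, df=len(y))  # chi-squared threshold
    return z_upd, P_upd, t2, is_anomaly
```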
Affine drift and diffusion (matrices A, B, C, and D) were used as a typical and computationally economical special case of the latent dynamics. Data reconstruction is performed via a linear projection for compatibility with the continuous–discrete Kalman filter and to confine the cubic complexity to the latent dimension only. At the same time, the encoder and decoder are implemented as sequences of fully connected layers (the VAE encoder is a series of fully connected layers), and, in general, the latent updates are specified through the generalised functions Flat(zk, xk) and Glat(zk, xk). That is, the developed neural network architecture allows nonlinear mappings parameterised by neural layers; its implementation chooses a balance between expressiveness and computational delay (complexity). Nonlinear transformations are therefore formally allowed (both the encoder and decoder are neural networks), but in this study, affine (linear) latent dynamics and a linear decoder parameterisation are used for the sake of Kalman tracking, computational efficiency, and threshold predictability (Hotelling's T²). An extension to fully nonlinear drift and diffusion (for example, deep layers in Flat or Glat) can be implemented in the developed neural network for further improvement, taking into account the increased computational costs and the need to adapt or replace the Kalman filter (which would also neutralise the O(m³) computational bottleneck).
Thus, the developed method is based on a small-step Δt approximation: for the Itô model with drift f and diffusion G, the increments Δx are approximated as N(f(x, t)·Δt, G·G⊤·Δt) according to (2)–(6) and the normality-approximation discussion. On this basis, the filter (prediction and correction) and the anomaly criterion are formed from the innovation rk and its covariance Sk, using the classical Hotelling statistic and a threshold according to (41)–(45). For real non-Gaussian or mixed flows, it is important to note the following:
The Kalman step itself with dynamic Sk remains useful, since it takes the current second moments of the errors into account, but it is optimal by design only for normally distributed noise. In the presence of heavy tails or a mixture of components, the quadratic form is no longer χ²-distributed, so the declared threshold will give a distorted false positive rate (usually too high for heavy-tailed flows);
To account for heavy tails mathematically, one can replace the observation model with a multivariate Student-t distribution with ν degrees of freedom, whose density is p(x) = Γ((ν + d)/2) / [Γ(ν/2)·(νπ)^(d/2)·|Σ|^(1/2)] · (1 + (x − μ)⊤Σ⁻¹(x − μ)/ν)^(−(ν + d)/2). In this case, the log-likelihood and the tests deviate from χ² and require appropriate threshold adjustments or F-parameterisations and Bayesian or variational estimators. Similarly, filtering requires either a t-Kalman or EM approach, or particle filter methods, for correct posterior estimation under strong non-Gaussianity;
Simple means of increasing robustness can also be introduced: estimating the threshold empirically (as a quantile of the nominal T² distribution on a "clean" training sample), using robust covariance estimates (MCD, Huber M-estimators, or shrinkage), or using rank-based or nonparametric tests (bootstrap or permutation) instead of the χ² threshold.
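For example, the empirical threshold can be taken as a high quantile of T² scores computed on a presumably clean window; the sketch below assumes such a window is available, and the quantile level is illustrative:

```python
import numpy as np

def empirical_t2_threshold(t2_clean, level=0.99):
    """Quantile-based threshold from T^2 scores on 'clean' training traffic.

    Unlike the chi-squared threshold, this makes no normality assumption and
    keeps the nominal false positive rate near 1 - level under heavy tails.
    """
    return float(np.quantile(np.asarray(t2_clean), level))

# Usage: threshold = empirical_t2_threshold(t2_scores_on_clean_window, 0.99)
```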
The study proposes an end-to-end modular architecture for the adaptive neural network IDS, in which traffic from SPAN ports, TAP devices, and NetFlow agents is aggregated via Kafka and Flink (or Spark) Streaming into microsessions and normalised into a feature vector xk (Figure 4). The vector is then encoded into the parameters of the variational autoencoder's Gaussian posterior latent distribution and evolves according to the discretised stochastic Euler–Maruyama equation. At each step, Hotelling's T² statistic is calculated from the reconstructed vector and compared with the χ² threshold for instant anomaly detection and online retraining of both the autoencoder parameters and the continuous–discrete Kalman filter, which together ensure low latency (1.5–6.5 ms), high throughput, and the platform's self-adjustment under dynamic operating conditions.
The study used a three-hour set of network traffic (10:00–13:00), divided into microsessions with the features "Packet count", "Byte count", "avg_pkt_size", "Protocol", "Entropy", and "Inter arrival" (a fragment is given in Table 3, summary statistics in Table 4). At the preliminary processing stage of the input dataset, analysis of gaps, duplicates, and outliers was carried out: the gap histogram (Figure 6) and the "Packet count" time series (Figure 7) showed less than 1% dropouts and the absence of systematic failures. The traffic's homogeneity was checked by dividing it into six 30 min windows and comparing the "Packet count" and "Byte count" distributions with the first window using the KS test, and the categorical feature "Protocol" using the χ² test, revealing significant deviations only in the 11:30–12:00 window (Table 4). Cluster analysis performed with the k-means method (Figure 8) identified nine balanced clusters, which justified dividing the dataset into training (67%) and validation (33%) subsets.
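These checks map directly onto standard statistical routines; the following sketch reproduces the procedure on a pandas frame, with the column names, window slicing, and k-means feature subset being illustrative assumptions rather than the study's exact code:

```python
import pandas as pd
from scipy.stats import ks_2samp, chi2_contingency
from sklearn.cluster import KMeans

def homogeneity_checks(df):
    """Compare each 30 min window with the first one: KS test for the numeric
    features and a chi-squared test for 'Protocol' (column names illustrative;
    df must carry a DatetimeIndex covering the 10:00-13:00 capture)."""
    windows = [w for _, w in df.resample("30min") if len(w)]
    ref, results = windows[0], []
    for i, w in enumerate(windows[1:], start=1):
        ks_pkt = ks_2samp(ref["Packet count"], w["Packet count"]).pvalue
        ks_byte = ks_2samp(ref["Byte count"], w["Byte count"]).pvalue
        table = pd.DataFrame({"ref": ref["Protocol"].value_counts(),
                              "win": w["Protocol"].value_counts()}).fillna(0)
        chi_p = chi2_contingency(table)[1]
        results.append({"window": i, "ks_pkt": ks_pkt,
                        "ks_byte": ks_byte, "chi2_protocol": chi_p})
    return results

# Cluster structure behind the 67%/33% split (feature subset is illustrative):
# labels = KMeans(n_clusters=9, n_init=10).fit_predict(df[["Packet count", "Byte count"]])
```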
A computational experiment was conducted to evaluate the developed platform. The input feature distribution analysis shows the expected statistical laws (packet count is approximately Poisson; byte volume and average packet size are log-normal; entropy and inter-arrival intervals are beta/exponential), as illustrated by the histograms in Figure 9. In contrast, the "Packet count" and "Byte count" time series reveal periodicity and local traffic spikes that coincide with the recorded anomalies (Figure 10). The obtained share of feature gaps and the imputation strategies applied are justified by the gap diagram (Figure 11). The variational autoencoder training demonstrates stable convergence and an overfitting risk after the 30th epoch, which is successfully mitigated by early stopping (Figure 12 and Figure 13). Hotelling's T² statistic over time with its critical threshold shows the moments of deviation (Figure 14), and the ROC and precision–recall curves quantify the trade-off between recall and precision (Figure 15), while the obtained confusion matrix values and F1-score confirm the detector's balance. Performance testing shows that the processing latency remains within real-time limits as load increases (≈1.5–6.5 ms for 100–1000 sessions/s), as reflected in Figure 16, and the latent trajectory projection (PCA) demonstrates a clear separation of normal and abnormal behaviour in the latent space (Figure 17). The adaptation dynamics of the model parameters (elements of the drift matrix A and diffusion matrix G) confirm the platform's ability to adapt smoothly to traffic changes while responding quickly to bursts (Figure 18). Thus, the obtained results demonstrate the proposed approach's practical feasibility, since the developed neural network model combines anomaly detection with a controlled false positive level and low latency. The remaining limitations are the observed tendency to local overfitting during long-term training and a high FPR at some thresholds, which requires additional fine-tuning of the threshold and validation on more diverse load scenarios.
The study estimates the computational costs at each platform implementation stage: encoding (VAE encoder), the latent state evolution step (Euler–Maruyama scheme), decoding, the continuous–discrete Kalman filter, and the gradient-step parameter update. It derives asymptotic complexity estimates of O(n·h + m² + m·p + m³) for one time step, plus an additional term B·(n·h + m² + m·p) for online training with a batch of size B (Equations (46)–(52)). Substituting the experimental values n = 10, h = 64, m = 16, p = 10, and B = 32 yields orders of magnitude of ≈5152 basic operations per pass and ≈38,944 operations including the gradient step, which formally fits into the target time budget (<100 ms) on modern CPUs/GPUs. The experimental latency–throughput relations confirm the estimate's practical feasibility: as the load increases from 100 to 1000 sessions/s, the delay grows predictably to ≈1.5–6.5 ms, which indicates the service's high scalability in real monitoring conditions. The stage-by-stage complexity analysis reveals a bottleneck in the O(m³) term, caused by operations with the covariance matrix and its inversion in the Kalman step, as well as a significant contribution from online learning, scaled by B. Therefore, when porting to embedded or highly loaded systems, it is advisable to reduce the latent space size and the covariance rank (or apply Woodbury-type formulas to speed up inversions), use low-rank (or diagonal) approximations, weight quantisation, pruning, and computational acceleration on GPU/TPU, or replace the exact Kalman filter with approximate filters with an upper error bound to reduce the cubic component. The architectural solution with distributed queues and containerisation (see the platform block diagram, Figure 20) provides horizontal scaling and flexible distribution of the computational load between the pre-processing service, the model, and the training subsystem, which facilitates the practical implementation of the proposed optimisations in production.
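The quoted operation counts can be reproduced directly from the asymptotic terms; the following back-of-the-envelope check simply counts each term literally, as stated in the text:

```python
# Back-of-the-envelope check of the per-step operation counts quoted above.
n, h, m, p, B = 10, 64, 16, 10, 32

per_pass = n * h + m**2 + m * p + m**3          # 640 + 256 + 160 + 4096 = 5152
online = per_pass + B * (n * h + m**2 + m * p)  # 5152 + 32 * 1056 = 38944

print(per_pass, online)  # 5152 38944
```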
The practical implementation of the obtained results is aimed at the adaptive neural network platform's direct integration into the cyber police infrastructure. Streaming collection and aggregation of traffic via Apache Kafka and Spark (or Flink), real-time calculation of statistics and features with subsequent anomaly detection by Hotelling's T² criterion, and automatic sending of events to SIEM (or SOAR: Splunk, QRadar, Demisto) via standardised connectors are described in the text and illustrated by the deployment block diagram (Figure 19). Due to the measured characteristics (false positive rate < 1% and end-to-end latency of 1.5–6.5 ms), the platform ensures continuous monitoring of critical channels without significant workload on personnel, and the online training module allows the system to adapt independently to new types of threats, reducing the need for engineer intervention. The interactive monitoring panel (see Figure 21 for dashboard screenshots) with Hotelling's T² trends, precision and recall metrics, a load histogram, and latency diagrams simplifies operational control and real-time fine-tuning of detection thresholds. The implementation's variability is confirmed both by the experimental implementation in MATLAB (Figure 5) and by the industrial architecture on Kubernetes (or TensorFlow) with metrics export to Prometheus (or Grafana) and autoscaling (HPA), which ensures scalability and fault tolerance in production use. On this basis, the limitations of the obtained results and the prospects for further research are presented in Table 7 and Table 8.
It is noted that the study locates the bottleneck in the cubic term O(m³), i.e., in operations with the covariance matrix and its inversion in the Kalman step according to (46)–(55). The manuscript does not provide practical experiments with low-rank or strictly diagonal approximations of the covariance, but it recommends approaches such as Woodbury or low-rank-plus-diagonal factorisations to reduce complexity. Technically, this works as follows: the approximation P ≈ D + U·S·U⊤, where D is the diagonal part and U ∈ ℝ^(m×r) with r ≪ m, allows the Woodbury identity (D + U·S·U⊤)⁻¹ = D⁻¹ − D⁻¹·U·(S⁻¹ + U⊤·D⁻¹·U)⁻¹·U⊤·D⁻¹ (56) to be applied, which reduces the inversion cost to order O(m·r² + r³) instead of O(m³), while a purely diagonal approximation yields a trivial O(m) inversion.
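A minimal NumPy sanity check of this identity is given below; the dimensions mirror the study's m = 16, while the test matrices themselves are random and purely illustrative:

```python
import numpy as np

def woodbury_inverse(D_diag, U, S):
    """(D + U S U^T)^{-1} via the Woodbury identity (56):
    D^{-1} - D^{-1} U (S^{-1} + U^T D^{-1} U)^{-1} U^T D^{-1}.
    Costs O(m r^2 + r^3) instead of O(m^3) for r << m."""
    Dinv = 1.0 / D_diag                       # diagonal inverse, O(m)
    DinvU = Dinv[:, None] * U                 # D^{-1} U, O(m r)
    core = np.linalg.inv(np.linalg.inv(S) + U.T @ DinvU)  # r x r inverse, O(r^3)
    return np.diag(Dinv) - DinvU @ core @ DinvU.T

# Sanity check against the dense inverse (m = 16 as in the study, r = 4).
rng = np.random.default_rng(1)
m, r = 16, 4
D_diag = rng.uniform(0.5, 2.0, size=m)
U = rng.normal(size=(m, r))
S = np.eye(r)
P = np.diag(D_diag) + U @ S @ U.T
assert np.allclose(woodbury_inverse(D_diag, U, S), np.linalg.inv(P), atol=1e-8)
```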
Thus, the following can be proposed:
To empirically compare the three rank modes r ∈ {2, 4, 8} with the m = 16 used and measure the impact on the latent estimate and detection metrics (a sketch of such a rank sweep is given after this list);
To estimate the approximation's filter error through the Eckart–Young norm ‖P − Pr‖₂ and relate it to the deviations in the Kalman gain matrix K;
To consider ensemble (EnKF) or partial (sparse or structured) filters as an alternative for very large m.
This would give a quantitative "accuracy versus latency" curve and objectively show at which r the time gains do not lead to a significant loss in detection quality, making the proposed optimisations practically justified.
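A sketch of such a rank sweep could look as follows; the covariance P here is a synthetic SPD matrix, whereas in the actual experiment it would be the filter covariance at m = 16:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 16
M = rng.normal(size=(m, m))
P = M @ M.T + np.eye(m)           # synthetic SPD stand-in for the filter covariance

w, V = np.linalg.eigh(P)          # eigenvalues returned in ascending order
for r in (2, 4, 8):
    Vr, wr = V[:, -r:], w[-r:]                    # r leading eigenpairs
    P_r = (Vr * wr) @ Vr.T                        # best rank-r approximation
    err = np.linalg.norm(P - P_r, 2)              # Eckart-Young spectral error
    print(r, err, w[-(r + 1)])                    # err equals the (r+1)-th eigenvalue
```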
It is also noted that the latency estimates in the study were obtained for the 100–1000 sessions/s range and do not guarantee behaviour under extreme loads (>10k sessions/s). From the single-step asymptotic O(n·h + m² + m·p + m³), it is clear that as the load increases, the bottleneck is determined by the cubic term O(m³), that is, the covariance matrix inversion in the Kalman step, and simple extrapolation can lead to a significant growth of queues and delays. For the main scenarios, it is therefore advisable to include in the manuscript a programme of further experiments with complexity-reduction methods (reducing the latent dimensionality m; the approximation P ≈ D + U·S·U⊤ with the Woodbury identity (56), giving complexity of order O(m·r² + r³) for r ≪ m; diagonal approximations), as well as approximate filters (EnKF, sparse or structured Kalman filters, particle filters).
5. Conclusions
An adaptive neural network system architecture for detecting unauthorised intrusions based on real-time traffic analysis has been developed, combining a variational autoencoder with continuous stochastic latent-space dynamics (updated according to the Euler–Maruyama scheme), a continuous–discrete Kalman filter for filtering the latent state, and Hotelling's T² statistical criterion for detecting deviations. The introduction of an "on-the-fly" mechanism for updating the model parameters (a Euclidean Euler gradient step) ensures a low end-to-end delay of 1.5–6.5 ms under a load of 100–1000 sessions/s, as well as an explicit separation of normal and abnormal trajectories in the latent space and the ability to adapt to traffic drift.
It has been experimentally shown that the proposed platform achieves balanced quality metrics: precision is 0.83, recall is 0.83, and the F1-score is ≈0.83, while the end-to-end delay in processing one microsession is 1.5–6.5 ms in the 100–1000 sessions/s load range. The false positive proportion was estimated at below 1% at the selected χ² threshold. The ROC/PR curves and confusion matrix analysis give specific working regions of the threshold (optimum recall ≈0.6–0.7 for precision > 0.6), which allows the trade-off between recall and precision to be adjusted quickly.
The computational complexity and feasibility evaluation shows that for typical model sizes (n = 10, h = 64, m = 16, p = 10, B = 32), one pass requires ≈5152 basic operations, and ≈38,944 operations with the batch gradient step included. The obtained costs fit within the target time budget on modern CPUs/GPUs (<100 ms). At the same time, a bottleneck was identified in the cubic term O(m³) associated with the covariance matrix inversion in the Kalman step, which motivates specific optimisation directions (reducing the latent size m, low-rank or diagonal approximations, Woodbury techniques, quantisation or pruning, and approximate filters).
The practical implementation is demonstrated on a prototype with integrated streaming collection (Kafka with Spark or Flink), incident export to SIEM (or SOAR), and monitoring in Prometheus (or Grafana), including a dashboard with Hotelling's T² trends and quality metrics, which confirms the developed adaptive intrusion detection architecture's readiness for implementation in cyber police units.