1. Introduction and Existing Research Review
In modern networks, traffic volumes and transmission speeds are growing rapidly, creating a fertile environment for attackers. Traditional intrusion detection systems (IDSs) [1,2,3] often fail to cope with high-speed channels and new types of attacks, which necessitates a transition to more intelligent and adaptive architectures.
Early IDSs, such as SNORT [4], rely on known attack signatures. These systems are effective against well-documented threats but are largely powerless against zero-day attacks and modified exploits [4]. Their huge rule bases also become difficult to manage and slow to update.
Statistical traffic modelling methods [5] build a profile of “normal” behaviour and flag deviations from it. Their key limitation is high sensitivity to legitimate traffic variability: they generate a large number of false positives during peak loads or when new services appear [6].
With the transition to machine learning, SVM algorithms [7,8] and decision trees [9,10] became widespread. These approaches improved accuracy, but they require careful manual feature engineering and do not scale well as the number of parameters grows [10].
Chen et al. [11] proposed using a variational autoencoder to detect anomalies in network traffic. Despite its high accuracy, the method is limited by the need for pre-training on a large set of “clean” traffic, which is difficult to obtain in real-world conditions [11].
Tian et al. [12] demonstrated the effectiveness of LSTM in modelling temporal traffic dependencies. However, practice has shown that under rapid load changes or new protocols, such models quickly degrade without constant retraining on fresh data [12].
Kayacik et al. [13] combine signature-based and anomaly-based methods into a hybrid IDS. The approach improves accuracy but leaves unsolved the problems of automating component updates and synchronising signals between them [13].
Apache Flink [14] and Spark Streaming [15] integrate machine learning modules for stream processing. Still, Xu J. and Palanisamy B. [16] note latency during peak loads and problems with guaranteed delivery of traffic units when processing nodes fail.
Across all of these approaches, simple models suffer from low accuracy, while complex models suffer from high latency and computational cost. Many methods, such as [17,18], are unsuitable for real time or require expensive infrastructure. False positives remain a serious problem as well, because they distract security professionals and create so-called “alert fatigue” [19]. Algorithms are therefore needed that automatically adapt to new conditions and minimise the false positive rate (FPR). With growing traffic, horizontal scaling without loss of accuracy remains a challenge: network-based IDSs must process tens of gigabits per second, which requires lightweight and optimised architectures. Complex systems also demand constant analyst involvement for configuration and verification, so automating alert classification and prioritisation would help reduce staff workload.
There is emerging research on autonomous platforms with self-learning neural networks that adapt on the fly to changing traffic and new types of attacks [20,21]. However, there is as yet no single comprehensive solution that can operate in high-speed networks without significant delays.
Thus, based on the above (Table 1), we can conclude that none of the existing approaches combines high accuracy, low latency, scalability, and autonomy at the same time. This justifies the need for a unified neural network system that, based on real-time traffic analysis, can detect both known and zero-day attacks without constant human intervention.
Given the existing methods’ shortcomings and limitations (Table 1), from low adaptability and scalability to high latency and maintenance complexity, a single solution is necessary. An adaptive neural network intrusion detection system based on real-time traffic analysis should combine deep learning to identify unknown attacks, a self-regulating architecture to minimise false positives, and easy scalability for networks with throughput of tens of gigabits per second. Such an adaptive neural network system will reduce human involvement in routine tasks, ensure rapid response, and increase the IT infrastructure’s overall resilience to modern threats.
The research aim is to develop and justify an adaptive neural network system architecture for intrusion detection based on real-time network traffic analysis, ensuring high detection accuracy for both known and zero-day attacks with minimal delays and false positives. The research object is the information security system and the network traffic monitoring processes in computer networks, including procedures for data collection, packet processing, and classification aimed at identifying attack signs. The research subjects are deep learning methods and algorithms (autoencoders, recurrent and convolutional neural networks), as well as the software components and hardware architecture of the adaptive system responsible for embedding models in streaming traffic processing and for adaptive updating without operator intervention.
2. Materials and Methods
2.1. Development of an Intrusion Detection Method Based on Real-Time Traffic Analysis
2.1.1. Development of the Generalised Traffic Mathematical Model
The proposed method is based on a generalised traffic mathematical model, which represents the network flow as a multidimensional stochastic process [15,20,22] with the traffic feature vector:
x(t) = (x1(t), x2(t), …, xn(t))ᵀ ∈ ℝ^n, (1)
where each component xi(t) represents a scalar characteristic (byte rate, number of packets, load entropy, delay estimates, etc.).
The vector x(t) is modelled as a solution of the Itô stochastic differential equation [23]:
dx(t) = f(x(t), t)·dt + G(x(t), t)·dW(t), (2)
where f: ℝ^n × ℝ_+ → ℝ^n describes the deterministic dynamics (i.e., the drift function describing the deterministic part of the evolution), and W(t) is a k-dimensional Wiener process [24] modelling the random fluctuations (diffusion). The matrix G: ℝ^n × ℝ_+ → ℝ^(n×k) specifies the intensity and correlation of the random traffic fluctuations, i.e., it forms the diffusion term of (2), where the Wiener process increments dW(t) pass through G, forming the covariance:
Cov[Δx] = G·Gᵀ·Δt, (3)
that is, the matrix G determines how the “noise” is distributed between the feature vector components and with what weight it influences their evolution.
To approximate the random component of traffic evolution by Gaussian noise with clear statistical properties at small sampling intervals, a normal approximation is used [22,25]. It means that at a sufficiently small sampling step Δt, the increments of the traffic feature vector Δx = x(t + Δt) − x(t) obey the multivariate normal law:
Δx ~ N(f(x, t)·Δt, G·Gᵀ·Δt), (4)
that is, the random noise defined by the diffusion matrix G and the Wiener process generates Gaussian deviations of traffic changes.
From (2), with a small step Δt we obtain:
x(t + Δt) ≈ x(t) + f(x(t), t)·Δt + G(x(t), t)·ΔW(t), ΔW(t) ~ N(0, Δt·I). (5)
For a multivariate normal distribution, the density function [21,26] is used:
p(y; μ, Σ) = (2π)^(−n/2)·|Σ|^(−1/2)·exp(−(1/2)·(y − μ)ᵀ·Σ⁻¹·(y − μ)), (6)
which for the increments (4) takes the form:
p(Δx) = N(Δx; f(x, t)·Δt, G·Gᵀ·Δt). (7)
Then the log-likelihood is defined as:
ln p(Δx) = −(1/2)·(Δx − f·Δt)ᵀ·(G·Gᵀ·Δt)⁻¹·(Δx − f·Δt) − (1/2)·ln|G·Gᵀ·Δt| − (n/2)·ln 2π, (8)
which allows us to move from the stochastic model (2) to an explicit form of the Gaussian distribution of increments [27] and to define the energy function [28,29] for detecting anomalies as:
E(x, t) = (x(t) − m(t))ᵀ·Σ⁻¹·(x(t) − m(t)), (9)
where m(t) is the moving average and Σ⁻¹ is the inverse of the distribution covariance matrix.
To estimate the flow parameters, the moving average and covariance over a sliding window of length T are calculated as:
m(t) = (1/T)·∫[t−T, t] x(s)·ds, (10)
Σ(t) = (1/T)·∫[t−T, t] (x(s) − m(s))·(x(s) − m(s))ᵀ·ds. (11)
Thus, based on the above, a generalised traffic mathematical model block diagram is proposed, presented in Figure 1. The developed block diagram shows the conceptual path of network traffic processing (a minimal numerical sketch of this path is given after the list):
The input multivariate feature vector x(t) is fed to the Itô stochastic Equation (2), which models its evolution with drift f(x, t) and diffusion G(x, t);
Based on the solutions of this equation, the moving average m(t) and covariance Σ(t) estimates of the expected behaviour are calculated according to (10) and (11);
The increments Δx are approximated by the multivariate normal distribution N(f(x, t)Δt, G·GᵀΔt), which provides an analytical description of the fluctuation statistics for subsequent anomaly detection.
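The following minimal Python sketch illustrates this path under simplifying assumptions (it is not the system’s implementation: the drift is taken linear, f(x, t) = A_drift·x, the diffusion matrix G is constant, and all dimensions and matrices are illustrative). It integrates the SDE (2) with the Euler step (5), maintains the sliding estimates (10)–(11), and evaluates the energy function (9) for the current sample:
```python
# A minimal sketch of the generalised traffic model (1)-(11); all matrices
# below are illustrative placeholders, not fitted traffic parameters.
import numpy as np

rng = np.random.default_rng(0)
n, k, dt, steps = 4, 3, 0.01, 2000

A_drift = -0.5 * np.eye(n)              # assumed linear drift: f(x, t) = A_drift @ x
G = 0.1 * rng.standard_normal((n, k))   # constant diffusion matrix G in R^(n x k)

x = np.zeros(n)
window = []                             # sliding window for m(t), Sigma(t), eqs. (10)-(11)
for _ in range(steps):
    dW = rng.standard_normal(k) * np.sqrt(dt)     # Wiener increments, dW ~ N(0, dt I)
    x = x + (A_drift @ x) * dt + G @ dW           # Euler step of (2), cf. (5)
    window.append(x.copy())
    window = window[-200:]                        # keep the last T/dt samples

W = np.array(window)
m = W.mean(axis=0)                                # moving average m(t), eq. (10)
Sigma = np.cov(W.T) + 1e-6 * np.eye(n)            # moving covariance, eq. (11), regularised
energy = (x - m) @ np.linalg.solve(Sigma, x - m)  # energy function (9)
print(f"E(x, t) = {energy:.3f}")
```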
In this research, the developed block diagram is implemented in an extended autoencoder with latent dynamics: the hidden state h(t) evolves according to a stochastic equation and encapsulates information about the current traffic, while the variational objective ensures optimal reconstruction and real-time adaptation of the model.
2.1.2. Development of an Extended Autoencoder with Latent Dynamics
The proposed extended autoencoder with latent dynamics (Figure 2) combines a classical autoencoder [11,30,31,32] with a continuous (or stochastic) evolution of the latent representation. A feature vector x(t) is encoded into a latent state z(t), which then evolves in time according to a deterministic or stochastic equation, and a decoder reconstructs from this state an estimate x̂(t) of the input, ensuring the model continuously adapts to changing traffic without being divided into batches.
For each moment t, the input vector x(t) ∈ ℝ^n is transformed into the parameters of the Gaussian posterior latent distribution [33]:
μ(t) = Wμ·x(t) + bμ, ln σ(t) = Wσ·x(t) + bσ, q(z(t)|x(t)) = N(μ(t), diag σ²(t)). (12)
A typical case is to take the drift and diffusion affine [34]:
dz(t) = (A·z(t) + B·x(t))·dt + (C·z(t) + D·x(t))·dW(t), (13)
where A, B, C, and D are matrix parameters, and W(t) is a vector of Wiener processes.
Based on the hidden state, a reconstruction of the following type is constructed:
x̂(t) = Wdec·z(t) + bdec. (14)
When the model is fixed with a prior p(z(t)) (usually N(0, I)), at each moment t the following is optimised:
L(t) = E_q[ln p(x(t)|z(t))] − KL(q(z(t)|x(t)) ‖ p(z(t))), (15)
where [31,35,36]
ln p(x(t)|z(t)) = −(1/(2σx²))·‖x(t) − x̂(t)‖² + const, (16)
KL(N(μ, diag σ²) ‖ N(0, I)) = (1/2)·Σi (μi² + σi² − ln σi² − 1). (17)
By accumulating over the interval [0, T], a complete integral functional of the form is obtained:
J[Θ] = ∫[0, T] L(t)·dt. (18)
The parameters Θ = {Wμ, bμ, Wσ, bσ, A, B, C, D, Wdec, bdec} are trained by a gradient method [37] towards the maximum of (18):
Θ* = arg maxΘ J[Θ]. (19)
For practical implementation, we discretise by step Δt:
J(Θ) ≈ Σ[k=0…K−1] Lk·Δt, (20)
where
Lk = ln p(xk|zk) − KL(q(zk|xk) ‖ p(zk)). (21)
Using the explicit Euler–Maruyama scheme [38] on the grid tk = k·Δt, we obtain:
zk+1 = zk + flat(zk, xk)·Δt + Glat(zk, xk)·ΔWk, ΔWk ~ N(0, Δt·I), (22)
where flat(zk, xk) = A·zk + B·xk and Glat(zk, xk) = C·zk + D·xk.
The mathematical expectation and covariance of the increment are defined as:
E[Δzk] = flat(zk, xk)·Δt, (23)
Cov[Δzk] = Glat·Glatᵀ·Δt, (24)
which guarantees the scheme’s convergence (first order in the weak sense; strong order 1/2 for general diffusion).
To discretise the gradient ascent for the parameters Θ, the continuous parameter “dynamics” are represented in the form:
dΘ/dt = η·∇Θ L(t). (25)
According to the Euler scheme with step Δt, we obtain:
gk = ∇Θ[ln p(xk|zk) − KL(q(zk|xk) ‖ p(zk))], (26)
Θk+1 = Θk + η·gk·Δt, (27)
where the first term of gk increases the likelihood of recovering xk from zk, and the KL term penalises the deviation of the posterior q from the prior p, preventing latent “runaway”.
The final algorithm for the numerical approximation of (20)–(27) is presented in Algorithm 1.
Algorithm 1: Numerical approximation algorithm for (20)–(27).
Given: step Δt, learning rate η, initialisations z0, Θ0.
For k = 0 … K − 1:
1. Read the current feature vector xk.
2. Generate ΔWk ~ N(0, Δt·I).
3. Update the latent state: zk+1 = zk + flat(zk, xk)·Δt + Glat(zk, xk)·ΔWk.
4. Calculate the gradient: gk = ∇Θ[ln p(xk|zk) − KL(q(zk|xk) ‖ p(zk))].
5. Update the parameters: Θk+1 = Θk + η·gk·Δt.
The developed algorithm’s output is the trajectories {zk} and {Θk}, which provide online adaptation of the latent representation and the model to streaming data.
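A compact Python sketch of Algorithm 1 is given below. It is a toy illustration rather than the paper’s implementation: the latent dynamics use the affine drift of (13) with a constant diffusion matrix, the data stream is stubbed with random vectors, and the trainable set Θ is restricted to the decoder parameters {Wdec, bdec} of (14), for which the gradient of ln p(xk|zk) is available in closed form.
```python
# Toy sketch of Algorithm 1: Euler-Maruyama latent update (22) plus the
# Euler gradient step (27), restricted to the decoder parameters only.
import numpy as np

rng = np.random.default_rng(1)
n, m, dt, eta, sigma2 = 6, 3, 0.01, 0.05, 1.0

A = -0.3 * np.eye(m)                          # latent drift: f_lat(z, x) = A z + B x
B = 0.1 * rng.standard_normal((m, n))
G = 0.05 * np.eye(m)                          # latent diffusion (held constant for brevity)
W_dec = 0.1 * rng.standard_normal((n, m))     # decoder of eq. (14)
b_dec = np.zeros(n)

z = np.zeros(m)
for k in range(5000):
    x = rng.standard_normal(n)                     # 1. read the feature vector x_k (stub)
    dW = rng.standard_normal(m) * np.sqrt(dt)      # 2. Delta W_k ~ N(0, dt I)
    z = z + (A @ z + B @ x) * dt + G @ dW          # 3. Euler-Maruyama update (22)

    x_hat = W_dec @ z + b_dec                      #    reconstruction via (14)
    resid = (x - x_hat) / sigma2                   # 4. closed-form grad of ln p(x_k | z_k)
    W_dec += eta * np.outer(resid, z) * dt         # 5. Euler gradient step (27)
    b_dec += eta * resid * dt
```
In a full implementation, the gradient gk in step 4 would cover all of Θ (encoder, decoder, and the matrices A, B, C, D) and would normally be obtained by automatic differentiation.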
Appendix A provides a mathematical justification of what the proposed “Euler gradient step” actually is and how it compares to standard SGD or Adam updates.
2.1.3. Proof of the Optimality Condition
The Euler–Lagrange system for the optimal Θ = {Wμ, bμ, Wσ, bσ, A, B, C, D, Wdec, bdec} has the form:
∂L/∂θi − (d/dt)(∂L/∂θ̇i) = 0, i = 1, …, M, (28)
where L is the integrand.
To prove the optimality condition, let us consider a functional of the form:
J[Θ] = ∫[t0, t1] L(Θ(t), Θ̇(t), t)·dt, (29)
where Θ(t) = (θ1(t), …, θM(t)) is the vector of all parameters {Wμ, bμ, Wσ, bσ, A, B, C, D, Wdec, bdec}, Θ̇(t) = dΘ/dt, and L is the integrand defining the ELBO increment and the regularisers.
The aim is to find the necessary condition for the stationarity (extremum) of J[Θ] with respect to small variations δΘ(t), which zeroes out the first variation δJ. For this aim, a parameter variation of the form is introduced:
Θε(t) = Θ(t) + ε·δΘ(t), (30)
where the components of δΘ(t) are arbitrary but satisfy δΘ(t0) = δΘ(t1) = 0 (fixed boundary values). Then:
δJ = (d/dε)J[Θε]|ε=0 = ∫[t0, t1] (⟨∂L/∂Θ, δΘ⟩ + ⟨∂L/∂Θ̇, δΘ̇⟩)·dt, (31)
where the scalar product means the sum over all components θi:
⟨∂L/∂Θ, δΘ⟩ = Σ[i=1…M] (∂L/∂θi)·δθi. (32)
In the second sum, we perform integration by parts for each i:
∫[t0, t1] (∂L/∂θ̇i)·δθ̇i·dt = [(∂L/∂θ̇i)·δθi]|[t0, t1] − ∫[t0, t1] (d/dt)(∂L/∂θ̇i)·δθi·dt. (33)
Since δθi(t0) = δθi(t1) = 0, the boundary terms are zeroed out, and summing over i, we obtain:
Σi ∫[t0, t1] (∂L/∂θ̇i)·δθ̇i·dt = −∫[t0, t1] Σi (d/dt)(∂L/∂θ̇i)·δθi·dt. (34)
Let us substitute this into the expression for δJ (31):
δJ = ∫[t0, t1] Σi (∂L/∂θi − (d/dt)(∂L/∂θ̇i))·δθi·dt. (35)
Since δΘ(t) is an arbitrary function (except for zero boundary conditions), the only way to make δJ vanish for all variations is (28), which gives the Euler–Lagrange system.
For each component θi ∈ Θ, condition (28) takes the form:
∂L/∂θi − (d/dt)(∂L/∂θ̇i) = 0. (36)
If we write θi according to the list {Wμ, bμ, Wσ, bσ, A, B, C, D, Wdec, bdec}, we obtain exactly the system of Equations (36), i.e., (28), which is the optimality condition for the parameters of the autoencoder with latent dynamics.
2.1.4. Development of a Multivariate Kalman Filtering Model in Latent Space
In the developed method, the autoencoder latent state h(t) evolves according to the stochastic model (2), and the observations are either the original features x(t) or their reconstruction x̂(t). The developed method implements a continuous–discrete Kalman filter, reducing the system of differential equations to an equivalent discrete linear model at step Δt.
The continuous linear latent model is based on a linear approximation in the latent space:
dh(t) = A·h(t)·dt + B·dWh(t), (37)
where h(t) ∈ ℝ^m is the hidden state, A ∈ ℝ^(m×m) is the drift matrix (the linearisation of flat), B ∈ ℝ^(m×r) is the noise matrix, and Wh(t) is an r-dimensional Wiener process.
Observations yk at discrete time stamps tk = k·Δt are either the full input vector or its reconstruction:
yk = C·h(tk) + D·xk + νk, νk ~ N(0, R), (38)
where C ∈ ℝ^(p×m) and D ∈ ℝ^(p×n) are observation matrices, νk is the measurement white noise, and R ∈ ℝ^(p×p) is its covariance.
To transform (37) and (38) into a discrete model, using the solution of Equation (2) over the interval Δt, the equivalent is obtained:
hk+1 = Φ·hk + wk, wk ~ N(0, Q), (39)
where Φ is the transition matrix and Q is the discrete process noise covariance:
Φ = exp(A·Δt), Q = ∫[0, Δt] exp(A·s)·B·Bᵀ·exp(Aᵀ·s)·ds. (40)
The discrete Kalman filter algorithm consists of two phases performed at each step k → k + 1: prediction and correction (Table 2).
2.1.5. The Anomaly Criterion and Statistical Test Justification
To detect deviations in reconstructed traffic, Hotelling’s T² statistic is used, since it generalises the usual z-test to the multivariate case, taking into account pairwise correlations between the residual components. Its application allows us to record with high accuracy (more than 90% [39,40,41,42]) not only large deviations in one coordinate but also “complex” anomalies, when each of many small shifts does not in itself go beyond the norm but in total they indicate an attack. To apply Hotelling’s T² statistic, it is assumed that at the k-th step an “innovation” (residual) of the following type is obtained:
rk = yk − C·hk|k−1, (41)
with covariance
Sk = C·Pk|k−1·Cᵀ + R, (42)
where Pk|k−1 and R are the prior and measurement covariances.
Hotelling’s T² statistic is calculated as:
T²k = rkᵀ·Sk⁻¹·rk. (43)
According to the properties of the multivariate normal distribution of the errors, in the absence of an attack rk ~ N(0, Sk), and the value T²k is distributed according to the χ² law with p degrees of freedom; that is,
T²k ~ χ²p. (44)
To guarantee a given level of false positives α, a threshold is set:
hα = χ²p,1−α, (45)
where χ²p,1−α is the (1 − α)-level quantile of the χ²p distribution. When T²k > hα, the hypothesis of anomaly absence is rejected at the significance level α.
Formally, Hotelling’s T² statistic is essentially the squared Mahalanobis distance, but our method has three important differences that make it significantly more “model-meaningful” than the traditional anomaly estimate with covariance weighting:
The distance is calculated not from the raw observation to the global centre but on the innovation (residual), that is, the difference between the observation and the model (Kalman) prediction according to (41)–(43);
The covariance in the normalisation is not a static sample covariance but the innovation covariance matrix Sk, obtained and dynamically updated by the Kalman filter steps (Sk takes into account both the a priori state covariance Pk|k−1 and the measurement noise covariance R), so the criterion adapts to the changing traffic structure and time dynamics;
Hotelling’s T² statistic provides a rigorous statistical justification for the threshold via the χ² distribution (false positive rate control), while the traditional Mahalanobis distance is used as a heuristic value without an explicit hypothesis test and without taking the model uncertainty into account.
Since the residuals are obtained via the variational autoencoder with latent dynamics (reconstruction from the latent state and updating by the SDE or the Kalman filter), the errors themselves reflect deviations from the trained multivariate “manifold” of normal traffic and not just from the feature space centre, which increases sensitivity to “complex” anomalies whose shifts are small per coordinate but jointly significant. Thus, despite the similarity with the Mahalanobis distance, the applied approach with Hotelling’s T² statistic is a model-adaptive, time-dependent, and statistically rigorous criterion, not a simple covariance-weighted distance.
Thus, the choice of Hotelling’s T² statistical test is justified by its ability to take into account the complete covariance structure of the multivariate residual, which allows the detection of complex anomaly patterns inaccessible to coordinate-based approaches [43], as well as by the strict theoretical properties of Gaussian distributions, ensuring an accurate setting of the false-positive rate through the χ²p,1−α quantile. At the same time, the innovation covariance Sk, dynamically updated by the Kalman algorithm, automatically adapts to changing traffic characteristics, maintaining the test’s sensitivity to new types of attacks while minimising false alarms.
2.1.6. The Developed Method Synthesis
Based on the developed model, a method for detecting intrusions based on real-time traffic analysis is proposed (Figure 3). The developed method is based on constructing a stochastic dynamic model of network traffic in the latent space of the variational autoencoder, with the hidden state evolving according to a system of differential equations and subsequently estimated by a continuous–discrete Kalman filter. Anomalies are detected by calculating Hotelling’s T² statistic for the filter innovation and comparing it with a threshold level.
At each time step, statistical and information metrics are calculated from the network traffic, forming the input feature vector according to (1). Raw packet data and metadata are pre-processed and aggregated into the numerical characteristics that define the vector xk. The neural network encoder transforms the input features into the parameters of the latent space posterior distribution according to (12). The latent representation evolves according to the stochastic equation discretised by the Euler–Maruyama method according to (22). The decoder reproduces the original feature estimate from the updated latent state via the linear projection (14). A continuous model and discrete observations are specified for the latent space, after which the prediction and correction phases are performed according to (37)–(40). Based on the calculated innovation and its covariance, Hotelling’s T² criterion is formed to estimate deviations according to (41)–(43). If the T² statistic exceeds the critical value of the χ² distribution, the anomaly detector (45) is triggered. The autoencoder and Kalman filter parameters are adjusted in real time using the Euler gradient rule (27) to adapt to changing traffic.
The final algorithm of the developed method is presented in Algorithm 2.
Algorithm 2: The developed method’s algorithm.
Initialise z0, Θ0, P0.
For k = 0 … K − 1:
1. Read the current feature vector xk.
2. Encode xk → (μk, Σk), sample zk.
3. Update zk+1 according to the Euler–Maruyama method (22).
4. Decode the reconstruction x̂k from zk.
5. Perform Kalman prediction and correction → hk+1|k+1, Pk+1|k+1.
6. Calculate T²k and compare with the threshold.
7. Mark an anomaly if the threshold is exceeded and update Θ.
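The end-to-end chaining of Algorithm 2 can be sketched in Python as follows. This is a strongly simplified illustration, not the deployed service: the encoder is a fixed linear-Gaussian stub rather than a trained network, the transition matrix uses a first-order approximation of (40), the observation is the reconstruction itself, and all matrices are assumed values.
```python
# One step of Algorithm 2: encode (12) -> Euler-Maruyama (22) -> decode (14)
# -> Kalman predict/correct (37)-(40) -> Hotelling T^2 test (43), (45).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
n, m, dt, alpha = 6, 3, 0.01, 0.01

W_mu = 0.1 * rng.standard_normal((m, n)); b_mu = np.zeros(m)    # encoder stub, eq. (12)
log_sig = -2.0 * np.ones(m)
A = -0.3 * np.eye(m); G = 0.05 * np.eye(m)                      # latent dynamics
W_dec = 0.1 * rng.standard_normal((n, m)); b_dec = np.zeros(n)  # decoder, eq. (14)
Phi = np.eye(m) + A * dt; Q = G @ G.T * dt                      # 1st-order discretisation
C = W_dec; R = 0.1 * np.eye(n)                                  # observe the reconstruction
h = np.zeros(m); P = np.eye(m); z = np.zeros(m)
thr = chi2.ppf(1 - alpha, df=n)                                 # threshold, eq. (45)

def step(x, z, h, P):
    mu = W_mu @ x + b_mu                                  # 2. encode x_k -> (mu_k, sigma_k)
    z = mu + np.exp(log_sig) * rng.standard_normal(m)     #    sample z_k
    z = z + (A @ z) * dt + G @ (np.sqrt(dt) * rng.standard_normal(m))  # 3. EM update (22)
    x_hat = W_dec @ z + b_dec                             # 4. decode x_hat_k
    h_pred, P_pred = Phi @ h, Phi @ P @ Phi.T + Q         # 5. Kalman prediction
    r = x_hat - C @ h_pred                                #    innovation on reconstruction
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    h, P = h_pred + K @ r, (np.eye(m) - K @ C) @ P_pred   #    correction
    t2 = float(r @ np.linalg.solve(S, r))                 # 6. Hotelling T^2, eq. (43)
    return z, h, P, t2 > thr                              # 7. anomaly flag

z, h, P, is_anomaly = step(rng.standard_normal(n), z, h, P)
```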
2.2. Development of a Neural Network Intrusion Detection System Based on Real-Time Traffic Analysis
The proposed neural network system (Figure 4) implements an end-to-end pipeline for collecting, processing, and analysing network traffic in real time, in which a variational autoencoder with latent dynamics and a continuous–discrete Kalman filter detect anomalies based on Hotelling’s T² statistic with low latency. Thanks to built-in online retraining, the system automatically adapts to changing traffic conditions with a minimal level of false positives. The modular architecture based on Kubernetes and Kafka ensures integration with corporate SIEM/SOAR systems.
The traffic collection and buffering module receives raw network data from observation points (SPAN port, TAP, and NetFlow agent) and ensures its reliable delivery to the system via a distributed Apache Kafka queue, which provides high throughput and horizontal scaling. At the pre- and post-processing stage, the data is aggregated into micro-sessions and normalised using Flink (or Spark) Streaming into the feature set of (1), including transmission speed, entropy, and delays, after which a single vector xk is formed for transmission to the variational autoencoder.
The model service is deployed in Kubernetes and includes a TensorFlow implementation of the variational autoencoder and a continuous–discrete Kalman filter component: upon receipt of each xk, the platform computes the reconstruction and Hotelling’s T² statistic according to (41)–(43) via a high-performance gRPC/REST API. The model parameters are adapted “on the fly” using the Euler gradient step (27): the accumulated batches of input vectors and their latent representations are used to update the autoencoder weights and filter matrices without stopping the service. The anomaly manager compares the obtained Hotelling’s T² values with the critical threshold (45) and, upon detection of deviations, generates events in the Kafka “alerts” channel and stores them in Elasticsearch.
To monitor stability and operation quality, the system collects latency, throughput, and detection accuracy (precision/recall) metrics in Prometheus/Grafana, which allows for automatic scaling (HPA) and model drift tracking. The visualisation dashboard on React (using shadcn/ui) displays real-time anomaly-level diagrams, Hotelling’s T² trends, and incident reports, and integration via standardised connectors (Syslog, REST, Kafka) ensures event transfer to corporate SIEM/SOAR systems (Splunk, QRadar, Demisto) for further response automation.
The developed neural network system experimental sample is implemented in the MATLAB R2014b software environment (Figure 5).
The raw network data stream is read from observation points (SPAN port, TAP, NetFlow agent) using the MATLAB Support Package for Kafka or the Java client. Incoming packets are buffered in a ring buffer (matlab.concurrent.Queue), which ensures durability and reliable delivery even during load surges. The PreprocessStream function sequentially extracts packets from the buffer, combines them into micro-sessions, and aggregates key features (speed, entropy, delays) according to the specification in (1), normalising the results for feeding into the model.
The platform core is the VAEKalmanModel class, implemented on the basis of the Deep Learning Toolbox and Control System Toolbox. The step(xk) method encapsulates the whole cycle: encoding and sampling of the latent vector, its evolution according to the Euler–Maruyama scheme, decoding, execution of the Kalman filter prediction–correction phases according to (37)–(40), and calculation of Hotelling’s T² statistic according to (41)–(43). The autoencoder parameters and the filter matrices are dynamically updated in the updateParameters method, which, at a given batch interval, performs a gradient step according to the Euler scheme (27), maintaining the service without interruption.
The AlertManager block compares the calculated values with the critical threshold according to (45) and, if necessary, generates JSON events for the Kafka topic “alerts” or sends them directly to Elasticsearch via an HTTP request. In parallel, the Logger component collects latency, throughput, and detection quality (precision/recall) metrics. It uploads them to Prometheus/Grafana, which allows the monitoring of current performance and automatically scales the platform using HPA.
Using MATLAB App Designer, an interactive dashboard is created that displays Hotelling’s T2 dynamics diagrams, error distribution histograms, and a real-time anomaly event feed. For integration with external SIEM/SOAR systems (Splunk, QRadar, Demisto), a SIEMConnector script has been developed that reads accumulated events from Kafka or Elasticsearch and transmits them via standardised REST API and Syslog connectors, ensuring the detection and response pipeline’s complete closure.
Thus, the developed neural network system demonstrates the efficiency of the end-to-end pipeline from traffic collection to anomaly detection with low latency (<100 ms) and a controlled false positive rate. The integration of the variational autoencoder with latent dynamics and the continuous–discrete Kalman filter [44] ensures the model’s adaptability to changing traffic conditions. Modular implementation in the MATLAB R2014b software environment with online training and built-in monitoring simplifies deployment and operation in corporate SIEM/SOAR environments.
3. Case Study
3.1. Analysis and Pre-Processing of Input Data
The research uses network traffic data obtained over the time interval from 10:00 AM to 1:00 PM. To form the input dataset, network traffic was continuously and passively recorded on the key router of the studied subnet using a packet capture tool (e.g., Tshark) from 10:00 AM to 1:00 PM on 2 July 2025, after which the data was aggregated into 1 s time windows (tk = 1 s). The result was a set of 10,800 samples, each representing a feature vector xk according to Equation (1). For each window the following were calculated: total number of packets (“Packet count”), total byte volume (“Byte count”), average packet size (“avg_pkt_size” = Byte count/Packet count), sender and receiver port numbers (“src_port”, “dst_port”), network protocol (“Protocol”), entropy of the packet size or interval distribution within a micro-session (“Entropy”), and average inter-packet interval in milliseconds (“Inter arrival”). The start time of each window was recorded in the “Timestamp” field in the YYYY-MM-DD HH:MM:SS format for subsequent synchronisation with events on network equipment and external monitoring systems (Table 3).
The “Timestamp” field specifies the start time of the micro-session in the YYYY-MM-DD HH:MM:SS format, which allows data to be synchronised with real events in the network. “Packet count” reflects the total number of packets in the session, serving as a basic load metric. “Byte count” shows the total number of transmitted bytes, giving the traffic intensity. “Avg_pkt_size” is calculated as the ratio of the total number of bytes to the number of packets and characterises the average packet size, which is vital for detecting fragmentation anomalies. “Src_port” and “dst_port” are the sender and recipient port numbers, respectively, allowing the services and protocols involved in the exchange to be determined. “Protocol” specifies the network protocol (TCP, UDP, etc.), which helps to separate traffic by connection type. “Entropy” measures the entropy of the packet size or interval distribution within a session, which allows the “chaotic” character of traffic to be assessed and deviations from normal behaviour to be detected. “Inter arrival” shows the average time (in milliseconds) between successive packets, serving as an indicator of delays and bursty data transfer.
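A hedged sketch of how such per-window features could be aggregated from raw packet records is given below; the input format (lists of packet timestamps and sizes per 1 s window) is an assumption, not the paper’s capture pipeline.
```python
# Aggregating the Table 3 features for one 1 s window of packets.
import numpy as np

def window_features(pkt_times, pkt_sizes):
    """Compute the per-window features of eq. (1) / Table 3."""
    packet_count = len(pkt_sizes)
    byte_count = int(np.sum(pkt_sizes))
    avg_pkt_size = byte_count / packet_count if packet_count else 0.0
    # Shannon entropy of the packet-size distribution within the window.
    _, counts = np.unique(pkt_sizes, return_counts=True)
    probs = counts / counts.sum()
    entropy = float(-np.sum(probs * np.log(probs)))
    # Mean inter-arrival time in milliseconds.
    inter = float(np.mean(np.diff(sorted(pkt_times))) * 1e3) if packet_count > 1 else 0.0
    return packet_count, byte_count, avg_pkt_size, entropy, inter

# Example: five packets inside one window.
print(window_features([0.01, 0.15, 0.42, 0.60, 0.95], [60, 1500, 60, 1500, 576]))
```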
At the first stage of input dataset pre-processing, the data completeness and quality are analysed by checking for gaps in the “Timestamp” column and step discrepancies (duplicates or missing seconds), as well as outliers and noise values in the numerical features (columns “Packet count”, “Byte count”, “avg_pkt_size”, “Entropy”, and “Inter arrival”). For this aim, moving statistics (moving average and standard deviation) are built, and points that go beyond the μ ± 3σ limits are identified. When gaps are detected, either the intervals are normalised (interpolation) or anomalous sessions are deleted according to pre-set rules.
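The μ ± 3σ screening can be sketched with pandas rolling statistics; the 60 s window length here is an illustrative choice, not a value specified in the study.
```python
# Rolling mu +/- 3 sigma outlier screening for one numeric feature column.
import pandas as pd

def flag_outliers(series: pd.Series, window: int = 60) -> pd.Series:
    mu = series.rolling(window, min_periods=window).mean()
    sigma = series.rolling(window, min_periods=window).std()
    return (series - mu).abs() > 3 * sigma    # True where x_k leaves mu +/- 3 sigma

# Usage: mask = flag_outliers(df["Packet count"]); df = df[~mask.fillna(False)]
```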
It is noted that the high entropy value in Table 3 is explained primarily by the high variability and mixed nature of the network traffic, the contribution of continuous (heavy-tailed) feature distributions, and the entropy estimation method. For discrete estimation, the classical Shannon definition H = −Σi pi·ln pi is used, and for continuous features the corresponding differential entropy is associated with the variance (for a Gaussian feature, h = (1/2)·ln(2πeσ²)), so a wide spread and “long tails” of individual features directly increase the entropy estimate. The shift of protocols (applications) and the periodicity (bursts) within a 1 s window lead to multimodality of the distribution and, as a consequence, to an increase in the joint entropy. In practice, high entropy means high uncertainty and difficulty in separating outliers by simple marginal rules, but it also indicates that useful information is distributed over complex joint dependencies.
Figure 6 shows that each feature has about 108 missing values (approximately 1% of the total amount), which is within the acceptable threshold for subsequent interpolation or removal without significant data distortion.
Figure 7 shows the time “spread” of the “Packet count” values, which shows a uniform distribution of requests and random gaps, indicating the absence of strong outliers or systematic failures in data collection. These results confirm that the dataset is sufficiently complete and homogeneous for training the developed autoencoder with latent dynamics.
The next stage of pre-processing the input data involves checking for temporal homogeneity. For this purpose, the entire period (3 h) was divided into several equal 30 min windows, after which the distributions of key features in each window were compared. Using the Kolmogorov–Smirnov criterion (for “Packet count” and “Byte count”) and the χ² test (for the categorical field “Protocol”), the extent to which observations in different segments are statistically similar is assessed. If there are significant differences, either additional filtering (removal of uncharacteristic traffic peaks) or the introduction of regression models to compensate for temporal trends may be required. The results of the temporal homogeneity test are presented in Table 4, where for each window (except the first one) the “Packet count” and “Byte count” distributions were compared with the first window using the Kolmogorov–Smirnov criterion [45,46] (α = 0.05, critical value Dcrit = 0.054), and the “Protocol” field distribution was compared using the χ² test [47,48] (α = 0.05, df = 1, χ²crit = 3.84). In Table 4, “*” means that the statistic exceeds the critical value, p < 0.05.
According to Table 4, most of the 30 min segments (windows 2, 3, 5, and 6) do not show statistically significant differences in either the number of packets or the traffic volume (KS test), or in the protocol distribution (χ² test), indicating that the traffic is homogeneous over the 3 h. The exception is window 4 (from 11:30 to 12:00), marked with “*”, which shows significant deviations in the “Packet count” and “Byte count” distributions (p < 0.05, KS test), possibly due to a short-term activity peak or a traffic anomaly; however, the “Protocol” distribution remains homogeneous. This general confirmation of homogeneity allows the entire interval to be used for training the model, subject to a small amount of data filtering in window 4.
To assess the training dataset representativeness (Table 3), the k-means clustering method [49,50,51] was used. The training dataset of 10,800 elements (the “Packet count” parameter) was randomly divided into training and validation samples in a 2:1 ratio (67%, i.e., 7236 values, and 33%, i.e., 3564 values). When clustering the training part, nine clusters were found (classes I–IX), the metric distance between which does not exceed 0.1, which indicates the similarity of the internal structure of both subdatasets (Figure 8). Based on the obtained results, the optimal sample sizes were determined: out of the total 10,800 values, 7236 (67%) constitute the training dataset and 3564 (33%) constitute the test dataset.
Thus, the training dataset’s preliminary processing made it possible to obtain a statistically homogeneous and representative training dataset, which is paramount in the developed variational autoencoder with latent dynamics and the Kalman filter’s stable operation.
Thus, the dataset contains 10,800 samples, 1 s windows collected between 10:00 and 13:00 (3 h), and the entire set was used for training and validation, randomly split in a 2:1 ratio (7236 training, 3564 validation).
3.2. The Developed Neural Network Platform Testing Results
Before the neural network model training stage, histograms of the distribution of the key features “Packet count”, “Byte count”, “avg_pkt_size”, “Entropy”, and “Inter arrival” were obtained (Figure 9), which allow us to visually assess the distribution shapes and the presence of outliers in the entire dataset.
From the feature distribution histograms (Figure 9), it is evident that the packet count is distributed approximately according to the Poisson law with a mean of about 20 and rare outliers in both directions. The “Byte count” demonstrates a pronounced log-normal distribution with a “long tail”, where a small proportion of sessions have abnormally large volumes. The average packet size (“avg_pkt_size”) also obeys the log-normal law, but with a more moderate asymmetry and single outliers towards larger values. The entropy distribution approximately corresponds to a beta distribution concentrated closer to small values of 0.1–0.4, with a gradual decline towards one. It is also evident from the histograms that the intervals between packets (“Inter arrival”) follow an exponential law, where most delays are close to zero and the long tail indicates rare but significant delays of up to 5 s.
At the initial stage of neural network model training, time series diagrams of the “Packet count” and “Byte count” values were obtained for a three-hour interval (Figure 10). They display characteristic periodic fluctuations in network traffic and local abnormal surges or dips, allowing the dynamic variability of the load to be analysed.
According to Figure 10, for the interval from 12:00 to 15:00, the time series demonstrate clearly expressed periodic fluctuations and isolated anomalies: “Packet count” is characterised by regular bursts with a period of about 60 min, during which the counter reaches approximately 25–27 packets/min and dips to 12–15 packets/min, and “Byte count” is approximated by a sinusoid with a period of ≈45 min, with peaks at the 160,000–170,000 byte level and dips to the 50,000–60,000 byte level. Local outliers were recorded at the same time: a sharp increase in the “Packet count” to ~32 packets/min at 12:30 and its drop to ~11 packets/min at 14:00, as well as a substantial jump in the “Byte count” to ~200,000 bytes at 13:30 and a sharp drop to ~40,000 bytes at 14:30. These require additional analysis, since they may indicate flood attacks or short-term packet losses and network failures.
Next, a histogram of missing values by feature is obtained (Figure 11), which clearly shows the percentage or number of gaps in each feature. It allows us to assess the data collection quality (the input dataset quality) and justify the need for interpolation or filtering.
The histogram presented in Figure 11 shows that the proportion of missing data across the 10,800 observations ranges from ≈1.6% (“Inter arrival”) to ≈9.5% (“Byte count”), indicating high reliability of traffic collection (less than 10% dropouts in the worst case). At a missing data rate of up to 5% (“Packet count”, “avg_pkt_size”, “Entropy”), simple linear or polynomial interpolation is sufficient and introduces no significant statistical distortion. In contrast, at higher losses of up to 10% (“Byte count”), it is advisable to apply spline smoothing, taking into account trend and seasonality, or pre-filtering of extreme outliers before imputation.
The variational autoencoder training curves (Figure 12) were also obtained, reflecting the dependence of the loss function (ELBO) on the epoch number for the training and validation datasets. The training curves make it possible to visually assess the model’s convergence and identify signs of overfitting.
According to the autoencoder training curves (Figure 12), two phases are clearly visible: at the beginning (up to the 50th epoch), the ELBO loss on the training set monotonically decreases to ≈ −192, which reflects a steady improvement in reconstruction and latent distribution approximation quality; the validation ELBO similarly decreases up to the ≈30th epoch, and the gap between the training and validation curves is minimal (~1–2 units), which indicates correct model generalisation. However, after the ≈30th epoch, the validation ELBO starts to grow while the training curve continues to decrease, indicating the onset of overfitting, in which the autoencoder captures the training data details at the expense of its ability to identify general patterns; therefore, early stopping based on the validation ELBO is used to mitigate it (Figure 13).
In Figure 13, the early stopping epoch occurs at the first iteration at which the validation ELBO reaches its minimum value, after which further training does not lead to an overall improvement on either the training or validation datasets. Deciding to stop at this “best” validation epoch minimises overfitting, preventing the gap between the training and validation ELBO from widening and preserving the model’s ability to generalise without “memorising” noise artefacts. The results obtained significantly reduce the computational cost, since subsequent epochs do not improve quality, and ensure the stability of the autoencoder’s generalisation ability, providing an optimal balance between the reconstructive loss and generalisation to unseen data.
Next, the linear diagram of the time evolution of Hotelling’s T² statistic is obtained with the critical threshold (the χ²-quantile for level α) superimposed (Figure 14), which shows the moments and frequency of anomaly detector triggering when the network traffic deviates from normal behaviour.
The three-hour diagram of Hotelling’s T² statistic (Figure 14) shows the critical threshold (the χ² quantile at df = 5 and α = 0.01) at ≈15.09, with anomalies recorded at the moments when the statistic exceeds this limit (at 12:30, 13:30, and 14:30), that is, three triggers out of 180 measurements (≈1.7%), close to the declared false alarm rate of 1%. The obtained result confirms the detector’s high sensitivity based on Hotelling’s T² statistic: it identifies multivariate deviations from normal behaviour while remaining within the controlled false alarm rate determined by the χ²-quantile confidence ellipsoid.
The ROC curve (Figure 15a) and the precision–recall curve (Figure 15b), constructed using a test set of attacked and normal sessions, allow us to evaluate the balance between sensitivity and the false positive level, as well as the neural network model’s effectiveness when working with unbalanced data.
The ROC curve (Figure 15a) demonstrates the dependence of the TPR sensitivity on the false positive rate FPR as the classification threshold varies: the closer the curve is to the upper left corner, the better the model separates attacked and regular sessions while maintaining a low level of false alarms. In our example, the ROC curve rises steadily and reaches TPR ≈ 1 at FPR ≈ 0.5, which reflects an acceptable compromise between the completeness of attack detection and the control of false signals. The precision–recall curve (Figure 15b) shows the precision (TP/(TP + FP)) as a function of the recall (TP/(TP + FN)), which is informative for unbalanced classes (30% of attacked sessions): at high recall values (>0.8), the precision drops to ≈0.45 due to an increase in the proportion of false positives, while the optimal balance of detection and accuracy (precision > 0.6) is achieved in the recall range of ≈0.6–0.7.
It is also noted that a decrease in precision to ≈0.45 at high recall does not automatically mean that the method is unsuitable: this is a classic reflection of the trade-off between sensitivity (TPR = TP/(TP + FN)) and the error rate, and with a rare class (low prevalence) a low base precision in the PR curve is typical. Formally, precision is related to sensitivity and specificity through precision = π·TPR/(π·TPR + (1 − π)·(1 − TNR)), where π is the proportion of anomalies, so with small π even good sensitivity (or specificity) gives low precision. At the same time, the developed method is positioned as an “early warning” stage followed by cheap verification (feature enrichment, a rule engine, a lightweight signature check, or a human in the loop), which reduces the final cost of false positives. In this case, it is possible to select the operating point (threshold) by a predetermined level of false positives or to optimise it by Fβ, and also to increase precision by aggregating signals over time and sessions (temporal smoothing, majority voting). Additional measures to increase precision include threshold calibration, post-processing (including the use of the Kalman innovation as a filter), model ensembling, and cost-sensitive training.
The corresponding error matrix obtained for the selected χ²-quantile threshold (α = 0.01) is presented in Table 5.
In the error matrix:
TP (True Positive) is the truly anomalous sessions number correctly labelled as anomalies by the detector (in this example, the value obtained is 25);
FP (False Positive) is the number of regular sessions incorrectly classified as anomalies (in this example, the value obtained is 5);
TN (True Negative) is the number of regular sessions correctly labelled as “norm” (in this example, the value obtained is 95);
FN (False Negative) is the number of anomalous sessions missed by the detector (labelled as “norm”) (in this example, the value obtained is 5).
Using the error matrix allows us to quantitatively evaluate the classifier’s characteristics: recall = TP/(TP + FN) shows the proportion of correctly detected attacks out of all real attacks, precision = TP/(TP + FP) reflects the proportion of correct alarms among all triggers, and the F1-score = 2·precision·recall/(precision + recall), which combines both metrics, serves as an integral indicator of the detector’s balanced quality.
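These metrics can be verified directly from the Table 5 values; with TP = 25, FP = 5, TN = 95, FN = 5 the recall, precision, and F1-score all come out at ≈0.83, matching the figures quoted in Section 3.4.
```python
# Recomputing the Table 5 metrics.
TP, FP, TN, FN = 25, 5, 95, 5
recall = TP / (TP + FN)                      # 25 / 30 ~= 0.833
precision = TP / (TP + FP)                   # 25 / 30 ~= 0.833
f1 = 2 * precision * recall / (precision + recall)
print(f"recall={recall:.3f} precision={precision:.3f} F1={f1:.3f}")
```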
The study also obtained a diagram of the end-to-end detection time of one session versus the throughput (number of packets/sessions per second) (Figure 16), which demonstrates the system’s scalability and ability to maintain real-time operation with increasing network load.
The diagram of one session’s end-to-end processing latency versus throughput (100–1000 sessions/s) (Figure 16) shows a nearly linear trend with a slight sublinear (logarithmic) component. As the load increases from 100 to 1000 sessions/s, the latency increases from approximately ≈1.5 to ≈6.5 ms. The predictable increase in latency without sharp jumps allows the performance limits to be set, and maintaining a low latency level (<10 ms) ensures timely detection of anomalies without noticeable delays in the data flow. In addition, the scalability, expressed as less than a fivefold increase in latency for a tenfold increase in load, indicates efficient resource allocation and leaves room for further optimisation of the detection pipeline.
Latent state trajectory diagrams obtained by projecting the z(t) vector onto 2–3 dimensions (PCA) (Figure 17) allow us to visualise how the model smoothly moves in the latent space between zones of normal and abnormal traffic behaviour.
In the PCA projection diagram of the latent vectors z(t) (Figure 17), normal behaviour forms a coherent “path” in the PC1–PC2 space that smoothly shifts in time (from dark to light shades), reflecting the evolution of the network traffic latent representations, while anomalies (highlighted by red dots) sharply leave the central cluster, moving to distant areas. The smoothness of movement across the cluster indicates that the variational autoencoder has learned the low-dimensional “manifold” of normal behaviour and encodes states with similar features into neighbouring points of the latent space, while the outliers during anomalies demonstrate the model’s ability to clearly separate chronological sections of traffic with attacks or bursts. The resulting visualisation confirms that, when the PCA projection is integrated into the monitoring pipeline, the detector can additionally rely on the distance from the cluster centre or a local density estimate of points in the latent space for anomaly detection.
At the final testing stage, a diagram of the evolution of the model’s key parameters (the elements of the drift matrix A and diffusion matrix G) was obtained (Figure 18), illustrating the system’s dynamic adaptation during online learning to changing traffic characteristics in accordance with Equation (27).
The diagrams of the drift matrix A and diffusion matrix G parameter adaptation during online training (Figure 18) show that the diagonal elements A[0, 0] and A[1, 1] change smoothly, following seasonal trends and minor fluctuations, which reflects the model’s gradual adjustment to the changing structural characteristics of traffic (e.g., changes in the average load and correlations between features), while the elements G[0, 0] and G[1, 1] demonstrate a noisier evolution due to their role in describing the variability of the latent variables (variances and covariances) and the response to short-term anomalous bursts. Adaptation within the framework of the stochastic differential equation and update rule (27) ensures that the drift matrix A specifies the direction and speed of the state’s “return” to equilibrium, accounting for long-term trends (changes in daily activity patterns), while the diffusion matrix G quickly regulates the degree of random fluctuations around the drift trajectory during sudden changes (DDoS attacks, network failures). As a result, the system remains resistant to slow shifts and, at the same time, highly adaptive during turbulent traffic periods.
3.3. The Developed Neural Network Platform Computational Complexity Evaluation Results
The developed platform’s computational complexity calculation can be divided into five main stages of processing one feature vector xk (of dimension n) and the model’s subsequent adaptation: encoding (the variational autoencoder encoder), updating the latent state (Euler–Maruyama method), decoding (the variational autoencoder decoder), the continuous–discrete Kalman filter, and updating the parameters (gradient step according to the Euler scheme).
The variational autoencoder’s encoder is a series of fully connected layers with a total number of parameters of order O(n·h), where h is the average hidden layer width. The computational complexity of the forward pass through the encoder is:
C_enc = O(n·h).
The latent state zk has dimension m. The Euler–Maruyama step involves multiplications by the m × m matrices A and G and the generation of a noise vector, which gives complexity:
C_lat = O(m²).
The decoder reconstructs the original vector of dimension p (usually p = n) from the latent state, through layers of order O(m·p):
C_dec = O(m·p).
The basic operations of the continuous–discrete Kalman filter are the multiplication and inversion of covariance matrices of dimension m, which gives a cubic complexity in the latent dimension:
C_KF = O(m³).
In online training, the encoder (decoder) parameters and the matrices A and G are updated by gradient: one gradient descent step on a batch of size B has complexity:
C_train = O(B·(n·h + m² + m·p)).
Thus, the asymptotic computational complexity of processing one time step (excluding Kafka or Kubernetes communication delays) is
C_step = O(n·h + m² + m·p + m³),
and taking into account online training on batches of B elements:
C_total = C_step + C_train = O(n·h + m² + m·p + m³ + B·(n·h + m² + m·p)).
Substituting the values used in the study (n = 10, h = 64, m = 16, p = 10, B = 32), the following was obtained (verified numerically in the sketch below):
1. Basic processing of one time step: n·h + m² + m·p + m³ = 640 + 256 + 160 + 4096 = 5152 basic operations.
2. Online training on a batch of B = 32 vectors: 32·(640 + 256 + 160) = 33,792 additional operations per gradient descent step.
3. Total for one time step, taking training into account: 5152 + 33,792 = 38,944 basic operations, which, with modern CPU/GPU (<100 ms calculations), fit into the platform’s real operating time.
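The arithmetic can be reproduced directly (unit cost per multiply–accumulate, constants omitted):
```python
# Operation counts for n = 10, h = 64, m = 16, p = 10, B = 32.
n, h, m, p, B = 10, 64, 16, 10, 32
encode, latent, decode, kalman = n * h, m * m, m * p, m ** 3
step = encode + latent + decode + kalman          # 640 + 256 + 160 + 4096 = 5152
train = B * (encode + latent + decode)            # 32 * 1056 = 33792
print(step, train, step + train)                  # 5152 33792 38944
```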
The results show that processing a one-time step without taking into account training requires approximately 5152 basic operations, and taking into account gradient descent on a batch of 32 elements is about 38,944 operations. With modern CPU/GPU architectures, the resulting computational complexity fits into the target time budget (<100 ms), ensuring the system’s timely response in real conditions. Additional optimisation at this stage is not critical. Still, weight quantisation and parallel computation can be used to transfer the platform to resource-limited embedded systems or with a significant increase in the feature dimension.
Figure 19 shows the comparative change in computational costs (in arbitrary units) with increasing latent space dimension m for the exact Kalman filter, the low-rank approximation (Woodbury), the diagonal approximation, and the Ensemble Kalman filter.
Figure 19 shows the typical increase in computational costs with increasing latent dimension m. According to Figure 19, the exact Kalman filter implementation exhibits rapid cubic growth in costs and becomes impractical already for medium and large m (starting from about hundreds of dimensions in the illustrative scheme), while the low-rank approximation (Woodbury) practically repeats the exact method’s behaviour for small m and significantly reduces the overhead for m ≫ r, moving the “bottleneck” to the inversion of an r-sized matrix. The diagonal approximation remains cheap and almost independent of m but loses the inter-component covariance, which usually affects the quality of the estimates, and the Ensemble Kalman filter occupies an intermediate position: it eliminates the O(m³) term but entails an ensemble factor e, a quadratic dependence O(e·m²), and stochasticity of the estimates.
3.4. The Neural Network Performance Evaluation
The developed neural network’s efficiency was assessed by the following key metrics: recall, precision, F1-measure, ROC curve, precision–recall curve, and end-to-end detection latency. The assessment showed that the developed neural network demonstrates high accuracy and completeness of recognition, with Recall ≈ 0.83 and Precision ≈ 0.83, which yields a balanced F1-measure ≈ 0.83 when detecting anomalies.
ROC curve analysis confirmed the balance between sensitivity (TPR → 1) and the false positive rate (FPR ≈ 0.5), which indicates the detector reliability at different decision thresholds.
The precision–recall curve analysis revealed that at high recall values (recall > 0.8), the accuracy decreases to ≈ 0.45, and the optimal compromise is achieved at recall ≈ 0.6–0.7 and precision > 0.6.
End-to-end latency for processing one session is 1.5–6.5 ms under loads of 100–1000 sessions/s, which confirms the platform’s ability to operate in real time with low latency and high scalability.
Table 6 provides a comparative analysis of the developed neural network with other neural network architectures that are widely used for detecting data anomalies.
Comparing the results (Table 6), the developed neural network platform demonstrates balanced detection precision and recall (precision = recall = 0.83) with F1 = 0.83 and ultra-low latency of 1.5–6.5 ms, which meets strict real-time requirements. The LSTM autoencoder and Deep SVDD with CAE show slightly higher precision and recall values (~0.887–0.888) with the corresponding F1 ≈ 0.887 and F1 ≈ 0.8825, but their delay is measured in tens to hundreds of milliseconds, which weakens their real-time guarantees. The TimeGPT model is inferior in detection quality (F1 ≈ 0.55) with similar delays. Thus, the developed variational autoencoder with Kalman filter platform provides an optimal compromise between anomaly detection quality and the extremely low latency required for online monitoring, and can be practically implemented in cyber police units.
3.5. The Practical Implementation of Obtained Results
The practical implementation of the developed adaptive neural network intrusion detection system in cyber police activities includes the integration of its modules into the existing infrastructure and law enforcement response processes: streaming analysis of network traffic using Apache Kafka and Spark (or Flink) allows the cyber police to receive aggregated features in real time and instantly detect anomalies according to Hotelling’s T² statistic, after which events about potential attacks are automatically transmitted to SIEM/SOAR systems (Splunk, QRadar, Demisto) through standardised connectors for subsequent investigation and notification of cyber police officers.
Due to the minimal level of false positives (<1%) and latency of 1.5–6.5 ms, the cyber police are able to continuously monitor critical information flows without significant delays and personnel overloading, and the online training module ensures the model’s independent adaptation to new types of threats without the need for intervention by engineers.
Figure 20 shows the diagram of the developed neural network platform’s implementation in cyber police units. In the developed diagram (Figure 20), network traffic data from mirror ports enters a fault-tolerant Kafka cluster, where scalable buffering is provided, after which Spark (or Flink) is used to aggregate sessions and extract statistical features (Hotelling’s T², mean, variance) with subsequent normalisation and outlier filtering. At the next stage, a variational autoencoder together with a Kalman filter detects anomalies in real time, makes a binary “norm/threat” decision based on adaptive thresholds, and sends generated alerts to SIEM/SOAR systems (Splunk, QRadar, etc.) for prompt response by the cyber police. The online learning module, based on feedback from analysts, automatically adjusts the model weights and filter parameters to optimise accuracy and reduce false positives [58,59,60,61].
The key element in the platform’s integration into the corporate cyber police environment is the dashboard screenshots (Figure 21), illustrating the system’s operation. The interactive dashboard combines key performance indicators of the adaptive neural network intrusion detection system in real time:
Hotelling’s T2 statistics time trend with trigger level (Threshold) allows the tracking of the anomaly dynamics and instantly identifies spikes in deviations from normal network behaviour;
The precision–recall curve demonstrates the balance between the detection completeness and accuracy with the model’s current parameters. It is vital to assess its quality and adjust for false positives.
The traffic load histogram shows the change in load in sessions per second and serves as a basis for understanding peak loads and the ratio of business traffic to noise.
Latency versus Sessions per Second illustrates the system’s scalability and ensures that latency SLAs (<100 ms) are met as throughput increases.
4. Discussion
The study developed a neural network system for detecting unauthorised intrusions based on real-time traffic analysis. It is based on a stochastic model of network traffic, in which a multidimensional feature vector x(t) is considered as a solution of the stochastic Itô Equations (1) and (2), and the increments Δx are approximated by a multivariate normal law according to (4)–(6). On this basis, an explicit expression for the log-likelihood of traffic fluctuations (8) is obtained, and an energy function for detecting anomalies is introduced in the form of (9). A block diagram of the data flow, with the calculation of moving estimates of the mean m(t) and covariance Σ(t), is developed and shown in Figure 1.
An extended variational autoencoder with latent dynamics is developed (Figure 2), in which the input vector x(t) is encoded into the parameters of the Gaussian posterior latent distribution according to (12), after which the hidden state z(t) evolves according to the stochastic equation discretised by the Euler–Maruyama scheme according to (20)–(22). The ELBO functional for each time point is specified through the reconstructive likelihood and the KL penalty according to (15)–(18), and the numerical approximation of the parameters Θ is carried out by the Euler gradient rule (27), implemented in Algorithm 1.
To filter latent states and detect deviations, a continuous–discrete Kalman filter is implemented, where either x(t) or its reconstruction x̂(t) acts as the observation according to (37)–(40) and Table 2. Anomalies are estimated using Hotelling's T² statistic according to (41)–(44) and compared with the χ² threshold according to (45). The general block diagram of the developed method, combining the VAE with latent dynamics and the Kalman filter, is shown in Figure 3.
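A minimal sketch of the discrete correction step and the T² test is shown below; the linear observation model H, the noise covariance R, and the significance level alpha are illustrative assumptions, while the manuscript's exact equations are (37)–(45):

```python
import numpy as np
from scipy.stats import chi2

def kalman_update_and_t2(z_pred, P_pred, y, H, R, alpha=0.01):
    """Discrete Kalman correction with a Hotelling-style T^2 anomaly test.

    z_pred, P_pred: predicted latent mean and covariance;
    y: observation (x(t) or its reconstruction); H, R: observation model and noise.
    """
    r = y - H @ z_pred                     # innovation r_k
    S = H @ P_pred @ H.T + R               # innovation covariance S_k
    t2 = float(r @ np.linalg.solve(S, r))  # T^2 = r_k^T S_k^{-1} r_k
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    z_upd = z_pred + K @ r
    P_upd = (np.eye(len(z_pred)) - K @ H) @ P_pred
    is_anomaly = t2 > chi2.ppf(1.0 - alpha, df=len(y))  # chi-squared threshold
    return z_upd, P_upd, t2, is_anomaly
```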
Affine drift and diffusion (matrices A, B, C, and D) were used as a typical and computationally economical special case of the latent dynamics. Data reconstruction is performed via a linear projection for compatibility with the continuous–discrete Kalman filter and to confine the cubic complexity to the latent dimension only. At the same time, the encoder and decoder are implemented as sequences of fully connected layers (the VAE encoder is a series of fully connected layers), and, in general, the latent updates are specified through the generalised functions Flat(zk, xk) and Glat(zk, xk). That is, the developed neural network architecture allows nonlinear mappings parameterised by neural layers; its implementation chooses a balance between expressiveness and computational delay (complexity). Nonlinear transformations are therefore formally allowed (both the encoder and decoder are neural networks), but in this study, affine (linear) latent dynamics and a linear decoder parameterisation are used for the sake of Kalman tracking, computational efficiency, and threshold predictability (Hotelling's T²). An extension to fully nonlinear drift and diffusion (for example, deep layers in Flat or Glat) can be implemented in the developed neural network for further improvement, taking into account the increased computational costs and the need to adapt or replace the Kalman filter (which would also neutralise the O(m³) computational bottleneck).
Thus, the developed method is based on a small-step Δt approximation: for the Itô model with drift f and diffusion G, the increments Δx are approximated as N(f(x, t)·Δt, G·G⊤·Δt) according to (2)–(6) and the normality-approximation discussion. On this basis, the filter (prediction and correction) and the anomaly criterion are formed from the innovation rk and its covariance Sk, using the classical Hotelling statistic and a threshold according to (41)–(45). For real non-Gaussian or mixed flows, it is important to note the following:
The Kalman step itself with dynamic Sk remains useful, since it takes the current second moments of the errors into account, but it is optimal by design only for normally distributed noise. In the presence of heavy tails or a mixture of components, the quadratic form is no longer χ²-distributed, so the declared threshold will give a distorted false positive rate (usually too high for heavy-tailed flows);
To account for heavy tails mathematically, one can replace the observation model with a multivariate Student-t distribution with ν degrees of freedom, whose density is p(x) = Γ((ν + d)/2) / [Γ(ν/2)·(νπ)^(d/2)·|Σ|^(1/2)] · (1 + (x − μ)⊤Σ⁻¹(x − μ)/ν)^(−(ν + d)/2). In this case, the log-likelihood and the tests deviate from χ² and require appropriate threshold adjustments or F-parameterisations and Bayesian or variational estimators. Similarly, filtering requires either a t-Kalman or EM approach, or particle filter methods, for correct posterior estimation under strong non-Gaussianity;
Simple means of increasing robustness can also be introduced: estimating the threshold empirically (as a quantile of the nominal T² distribution on a "clean" training sample), using robust covariance estimates (MCD, Huber M-estimators, or shrinkage), or using rank-based or nonparametric tests (bootstrap or permutation) instead of the χ² threshold.
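For example, the empirical threshold can be taken as a high quantile of T² scores computed on a presumably clean window; the sketch below assumes such a window is available, and the quantile level is illustrative:

```python
import numpy as np

def empirical_t2_threshold(t2_clean, level=0.99):
    """Quantile-based threshold from T^2 scores on 'clean' training traffic.

    Unlike the chi-squared threshold, this makes no normality assumption and
    keeps the nominal false positive rate near 1 - level under heavy tails.
    """
    return float(np.quantile(np.asarray(t2_clean), level))

# Usage: threshold = empirical_t2_threshold(t2_scores_on_clean_window, 0.99)
```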
The study proposes an end-to-end modular architecture for the adaptive neural network IDS, in which traffic from SPAN ports, TAP devices, and NetFlow agents is aggregated via Kafka and Flink (or Spark) Streaming into microsessions and normalised into a feature vector xk (Figure 4). The vector is then encoded into the parameters of the variational autoencoder's Gaussian posterior latent distribution and evolves according to the discretised stochastic Euler–Maruyama equation. At each step, Hotelling's T² statistic is calculated from the reconstructed vector and compared with the χ² threshold for instant anomaly detection and online retraining of both the autoencoder parameters and the continuous–discrete Kalman filter, which together ensure low latency (1.5–6.5 ms), high throughput, and the platform's self-adjustment under dynamic operating conditions.
The study used a three-hour set of network traffic (10:00–13:00), divided into microsessions with the features "Packet count", "Byte count", "avg_pkt_size", "Protocol", "Entropy", and "Inter arrival" (a fragment is given in Table 3, summary statistics in Table 4). At the preliminary processing stage of the input dataset, analysis of gaps, duplicates, and outliers was carried out: the gap histogram (Figure 6) and the "Packet count" time series (Figure 7) showed less than 1% dropouts and the absence of systematic failures. The traffic's homogeneity was checked by dividing it into six 30 min windows and comparing the "Packet count" and "Byte count" distributions with the first window using the KS test, and the categorical feature "Protocol" using the χ² test, revealing significant deviations only in the 11:30–12:00 window (Table 4). Cluster analysis performed with the k-means method (Figure 8) identified nine balanced clusters, which justified dividing the dataset into training (67%) and validation (33%) subsets.
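These checks map directly onto standard statistical routines; the following sketch reproduces the procedure on a pandas frame, with the column names, window slicing, and k-means feature subset being illustrative assumptions rather than the study's exact code:

```python
import pandas as pd
from scipy.stats import ks_2samp, chi2_contingency
from sklearn.cluster import KMeans

def homogeneity_checks(df):
    """Compare each 30 min window with the first one: KS test for the numeric
    features and a chi-squared test for 'Protocol' (column names illustrative;
    df must carry a DatetimeIndex covering the 10:00-13:00 capture)."""
    windows = [w for _, w in df.resample("30min") if len(w)]
    ref, results = windows[0], []
    for i, w in enumerate(windows[1:], start=1):
        ks_pkt = ks_2samp(ref["Packet count"], w["Packet count"]).pvalue
        ks_byte = ks_2samp(ref["Byte count"], w["Byte count"]).pvalue
        table = pd.DataFrame({"ref": ref["Protocol"].value_counts(),
                              "win": w["Protocol"].value_counts()}).fillna(0)
        chi_p = chi2_contingency(table)[1]
        results.append({"window": i, "ks_pkt": ks_pkt,
                        "ks_byte": ks_byte, "chi2_protocol": chi_p})
    return results

# Cluster structure behind the 67%/33% split (feature subset is illustrative):
# labels = KMeans(n_clusters=9, n_init=10).fit_predict(df[["Packet count", "Byte count"]])
```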
A computational experiment was conducted to evaluate the developed platform. The input feature distribution analysis shows the expected statistical laws (packet count is approximately Poisson; byte volume and average packet size are log-normal; entropy and inter-arrival intervals are beta/exponential), as illustrated by the histograms in Figure 9. In contrast, the "Packet count" and "Byte count" time series reveal periodicity and local traffic spikes that coincide with the recorded anomalies (Figure 10). The obtained share of feature gaps and the imputation strategies applied are justified by the gap diagram (Figure 11). The variational autoencoder training demonstrates stable convergence and an overfitting risk after the 30th epoch, which is successfully mitigated by early stopping (Figure 12 and Figure 13). Hotelling's T² statistic over time with its critical threshold shows the moments of deviation (Figure 14), and the ROC and precision–recall curves quantify the trade-off between recall and precision (Figure 15), while the obtained confusion matrix values and F1-score confirm the detector's balance. Performance testing shows that the processing latency remains within real-time limits as load increases (≈1.5–6.5 ms for 100–1000 sessions/s), as reflected in Figure 16, and the latent trajectory projection (PCA) demonstrates a clear separation of normal and abnormal behaviour in the latent space (Figure 17). The adaptation dynamics of the model parameters (elements of the drift matrix A and diffusion matrix G) confirm the platform's ability to adapt smoothly to traffic changes while responding quickly to bursts (Figure 18). Thus, the obtained results demonstrate the proposed approach's practical feasibility, since the developed neural network model combines anomaly detection with a controlled false positive level and low latency. The remaining limitations are the observed tendency to local overfitting during long-term training and a high FPR at some thresholds, which requires additional fine-tuning of the threshold and validation on more diverse load scenarios.
The study estimates the computational costs at each platform implementation stage: encoding (VAE encoder), the latent state evolution step (Euler–Maruyama scheme), decoding, the continuous–discrete Kalman filter, and the gradient-step parameter update. It derives asymptotic complexity estimates of O(n·h + m² + m·p + m³) for one time step, plus an additional term B·(n·h + m² + m·p) for online training with a batch of size B (Equations (46)–(52)). Substituting the experimental values n = 10, h = 64, m = 16, p = 10, and B = 32 yields orders of magnitude of ≈5152 basic operations per pass and ≈38,944 operations including the gradient step, which formally fits into the target time budget (<100 ms) on modern CPUs/GPUs. The experimental latency–throughput relations confirm the estimate's practical feasibility: as the load increases from 100 to 1000 sessions/s, the delay grows predictably to ≈1.5–6.5 ms, which indicates the service's high scalability in real monitoring conditions. The stage-by-stage complexity analysis reveals a bottleneck in the O(m³) term, caused by operations with the covariance matrix and its inversion in the Kalman step, as well as a significant contribution from online learning, scaled by B. Therefore, when porting to embedded or highly loaded systems, it is advisable to reduce the latent space size and the covariance rank (or apply Woodbury-type formulas to speed up inversions), use low-rank (or diagonal) approximations, weight quantisation, pruning, and computational acceleration on GPU/TPU, or replace the exact Kalman filter with approximate filters with an upper error bound to reduce the cubic component. The architectural solution with distributed queues and containerisation (see the platform block diagram, Figure 20) provides horizontal scaling and flexible distribution of the computational load between the pre-processing service, the model, and the training subsystem, which facilitates the practical implementation of the proposed optimisations in production.
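The quoted operation counts can be reproduced directly from the asymptotic terms; the following back-of-the-envelope check simply counts each term literally, as stated in the text:

```python
# Back-of-the-envelope check of the per-step operation counts quoted above.
n, h, m, p, B = 10, 64, 16, 10, 32

per_pass = n * h + m**2 + m * p + m**3          # 640 + 256 + 160 + 4096 = 5152
online = per_pass + B * (n * h + m**2 + m * p)  # 5152 + 32 * 1056 = 38944

print(per_pass, online)  # 5152 38944
```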
The practical implementation of the obtained results is aimed at the adaptive neural network platform's direct integration into the cyber police infrastructure. Streaming collection and aggregation of traffic via Apache Kafka and Spark (or Flink), real-time calculation of statistics and features with subsequent anomaly detection by Hotelling's T² criterion, and automatic sending of events to SIEM (or SOAR: Splunk, QRadar, Demisto) via standardised connectors are described in the text and illustrated by the deployment block diagram (Figure 19). Due to the measured characteristics (false positive rate < 1% and end-to-end latency of 1.5–6.5 ms), the platform ensures continuous monitoring of critical channels without significant workload on personnel, and the online training module allows the system to adapt independently to new types of threats, reducing the need for engineer intervention. The interactive monitoring panel (see Figure 21 for dashboard screenshots) with Hotelling's T² trends, precision and recall metrics, a load histogram, and latency diagrams simplifies operational control and real-time fine-tuning of detection thresholds. The implementation's variability is confirmed both by the experimental implementation in MATLAB (Figure 5) and by the industrial architecture on Kubernetes (or TensorFlow) with metrics export to Prometheus (or Grafana) and autoscaling (HPA), which ensures scalability and fault tolerance in production use. On this basis, the limitations of the obtained results and the prospects for further research are presented in Table 7 and Table 8.
It is noted that the study locates the bottleneck in the cubic term O(m³), i.e., in operations with the covariance matrix and its inversion in the Kalman step according to (46)–(55). The manuscript does not provide practical experiments with low-rank or strictly diagonal approximations of the covariance, but it recommends approaches such as Woodbury or low-rank-plus-diagonal factorisations to reduce complexity. Technically, this works as follows: the approximation P ≈ D + U·S·U⊤, where D is the diagonal part and U ∈ ℝ^(m×r) with r ≪ m, allows the Woodbury identity (D + U·S·U⊤)⁻¹ = D⁻¹ − D⁻¹·U·(S⁻¹ + U⊤·D⁻¹·U)⁻¹·U⊤·D⁻¹ (56) to be applied, which reduces the inversion cost to order O(m·r² + r³) instead of O(m³), while a purely diagonal approximation yields a trivial O(m) inversion.
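A minimal NumPy sanity check of this identity is given below; the dimensions mirror the study's m = 16, while the test matrices themselves are random and purely illustrative:

```python
import numpy as np

def woodbury_inverse(D_diag, U, S):
    """(D + U S U^T)^{-1} via the Woodbury identity (56):
    D^{-1} - D^{-1} U (S^{-1} + U^T D^{-1} U)^{-1} U^T D^{-1}.
    Costs O(m r^2 + r^3) instead of O(m^3) for r << m."""
    Dinv = 1.0 / D_diag                       # diagonal inverse, O(m)
    DinvU = Dinv[:, None] * U                 # D^{-1} U, O(m r)
    core = np.linalg.inv(np.linalg.inv(S) + U.T @ DinvU)  # r x r inverse, O(r^3)
    return np.diag(Dinv) - DinvU @ core @ DinvU.T

# Sanity check against the dense inverse (m = 16 as in the study, r = 4).
rng = np.random.default_rng(1)
m, r = 16, 4
D_diag = rng.uniform(0.5, 2.0, size=m)
U = rng.normal(size=(m, r))
S = np.eye(r)
P = np.diag(D_diag) + U @ S @ U.T
assert np.allclose(woodbury_inverse(D_diag, U, S), np.linalg.inv(P), atol=1e-8)
```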
Thus, the following can be proposed:
To empirically compare the three rank modes r ∈ {2, 4, 8} with the m = 16 used and measure the impact on the latent estimate and detection metrics (a sketch of such a rank sweep is given after this list);
To estimate the approximation's filter error through the Eckart–Young norm ‖P − Pr‖₂ and relate it to the deviations in the Kalman gain matrix K;
To consider ensemble (EnKF) or partial (sparse or structured) filters as an alternative for very large m.
This would give a quantitative "accuracy versus latency" curve and objectively show at which r the time gains do not lead to a significant loss in detection quality, making the proposed optimisations practically justified.
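A sketch of such a rank sweep could look as follows; the covariance P here is a synthetic SPD matrix, whereas in the actual experiment it would be the filter covariance at m = 16:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 16
M = rng.normal(size=(m, m))
P = M @ M.T + np.eye(m)           # synthetic SPD stand-in for the filter covariance

w, V = np.linalg.eigh(P)          # eigenvalues returned in ascending order
for r in (2, 4, 8):
    Vr, wr = V[:, -r:], w[-r:]                    # r leading eigenpairs
    P_r = (Vr * wr) @ Vr.T                        # best rank-r approximation
    err = np.linalg.norm(P - P_r, 2)              # Eckart-Young spectral error
    print(r, err, w[-(r + 1)])                    # err equals the (r+1)-th eigenvalue
```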
It is also noted that the latency estimates in the study were obtained for the 100–1000 sessions/s range and do not guarantee behaviour under extreme loads (>10k sessions/s). From the single-step asymptotic O(n·h + m² + m·p + m³), it is clear that as the load increases, the bottleneck is determined by the cubic term O(m³), that is, the covariance matrix inversion in the Kalman step, and simple extrapolation can lead to a significant growth of queues and delays. For the main scenarios, it is therefore advisable to include in the manuscript a programme of further experiments with complexity-reduction methods (reducing the latent dimensionality m; the approximation P ≈ D + U·S·U⊤ with the Woodbury identity (56), giving complexity of order O(m·r² + r³) for r ≪ m; diagonal approximations), as well as approximate filters (EnKF, sparse or structured Kalman filters, particle filters).
5. Conclusions
An adaptive neural network system architecture for detecting unauthorised intrusions based on real-time traffic analysis has been developed, combining a variational autoencoder with continuous stochastic latent-space dynamics (updated according to the Euler–Maruyama scheme), a continuous–discrete Kalman filter for filtering the latent state, and Hotelling's T² statistical criterion for detecting deviations. The introduction of an "on-the-fly" mechanism for updating the model parameters (a Euclidean Euler gradient step) ensures a low end-to-end delay of 1.5–6.5 ms under a load of 100–1000 sessions/s, as well as an explicit separation of normal and abnormal trajectories in the latent space and the ability to adapt to traffic drift.
It has been experimentally shown that the proposed platform achieves balanced quality metrics: precision is 0.83, recall is 0.83, and the F1-score is ≈0.83, while the end-to-end delay in processing one microsession is 1.5–6.5 ms in the 100–1000 sessions/s load range. The false positive proportion was estimated at below 1% at the selected χ² threshold. The ROC/PR curves and confusion matrix analysis give specific working regions of the threshold (optimum recall ≈0.6–0.7 for precision > 0.6), which allows the trade-off between recall and precision to be adjusted quickly.
The computational complexity and feasibility evaluation shows that for typical model sizes (n = 10, h = 64, m = 16, p = 10, B = 32), one pass requires ≈5152 basic operations, and ≈38,944 operations with the batch gradient step included. The obtained costs fit within the target time budget on modern CPUs/GPUs (<100 ms). At the same time, a bottleneck was identified in the cubic term O(m³) associated with the covariance matrix inversion in the Kalman step, which motivates specific optimisation directions (reducing the latent size m, low-rank or diagonal approximations, Woodbury techniques, quantisation or pruning, and approximate filters).
The practical implementation is demonstrated on a prototype with integrated streaming collection (Kafka with Spark or Flink), incident export to SIEM (or SOAR), and monitoring in Prometheus (or Grafana), including a dashboard with Hotelling's T² trends and quality metrics, which confirms the developed adaptive intrusion detection architecture's readiness for implementation in cyber police units.