This chapter addresses two shortcomings of existing anomaly detection methods for industrial control devices: the limited generality of ICS device fingerprints and the limited accuracy of anomaly detection models. We propose an anomaly detection method for industrial control devices based on a fine-tuned Llama3 model. As shown in Figure 4, the method consists of three main components: industrial control network traffic preprocessing, ICS device fingerprint extraction, and the anomaly detection model. Its main innovation is an ICS device fingerprint that jointly reflects the software and hardware characteristics of a device and can be obtained in both active and passive network communication environments. In addition, the anomaly detection model is built on the fine-tuned Llama3 model. This approach removes the dependence on specific industrial control scenarios and protocols and also improves the accuracy of detecting anomalous industrial control devices.
3.1. Fingerprint Extraction Method for Industrial Control Devices Based on the Industrial Control Protocol Communication Mode
As shown in Figure 5, industrial control devices typically communicate with each other using an industrial control protocol to upload process data and issue control commands. This chapter focuses on industrial control protocols that run over TCP/IP and use a Client/Server communication model. Such protocols, including Modbus/TCP, EtherNet/IP, and S7comm, are widely used in industries such as oil, chemical, and water resources, and are therefore broadly representative. The aim is a fingerprint extraction method that comprehensively reflects the software and hardware characteristics of industrial control devices and can be obtained in both active and passive network communication environments, thereby improving the distinguishability among different industrial control devices.
Based on the summary of the industrial control device communication process in Figure 5, this chapter proposes a corresponding industrial control protocol communication model, ICS_CM, to formally describe this communication process. Specifically, it is represented as follows:
In this model,
represents the message sequence of the connection establishment phase, expressed as
In this context, represents the SYN packet in the connection establishment phase, represents the SYN_ACK packet in the connection establishment phase, and represents the ACK packet in the connection establishment phase. The superscripts src and dst, respectively, indicate that the sender of the packet is the host requesting to establish the TCP connection, i.e., the Client, and the host agreeing to establish the TCP connection, i.e., the Server. The subscripts indicate the packet type.
represents the message sequence of the data transmission phase, which consists of one or more data transmissions, expressed as
where
represents the
i-th data transmission, with
and
n denotes the total number of data transmissions.
is expressed as
where
represents the request packet of the
i-th data transmission,
represents the TCP protocol ACK response packet of the
i-th data transmission, and
represents the industrial control protocol data response packet of the
i-th data transmission.
represents the message sequence of the connection termination phase, which consists of one or more message subsequences, expressed as
where
represents the sequence of packets initiated by the Client to terminate the TCP connection, and
represents the corresponding sequence of packets returned by the Server.
and
are expressed as follows:
The types and quantities of packets contained in , , and are determined by the specific implementation of the industrial control device protocol stack.
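The three-phase structure of ICS_CM described above can be sketched as a small data model. The following Python sketch is illustrative only: the class names, field names, packet kind labels, and the simplified `Packet` record are assumptions introduced here, not the chapter's notation. It also assumes that the packets of one TCP connection are already time-ordered and labeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Packet:
    """Simplified packet record (hypothetical fields)."""
    sender: str       # "src" = Client, "dst" = Server
    kind: str         # e.g. "SYN", "SYN_ACK", "ACK", "REQ", "RSP", "FIN", "RST"
    timestamp: float  # capture time in seconds


@dataclass
class DataTransmission:
    """One data transmission: request, optional TCP ACK, protocol response."""
    request: Packet
    tcp_ack: Optional[Packet] = None
    response: Optional[Packet] = None


@dataclass
class IcsCm:
    """Connection establishment, data transmission, and termination phases."""
    establishment: List[Packet] = field(default_factory=list)
    transmissions: List[DataTransmission] = field(default_factory=list)
    termination_client: List[Packet] = field(default_factory=list)
    termination_server: List[Packet] = field(default_factory=list)


def build_model(packets: List[Packet]) -> IcsCm:
    """Group the time-ordered packets of one TCP connection into phases."""
    model = IcsCm()
    handshake = ["SYN", "SYN_ACK", "ACK"]
    i = 0
    # Connection establishment: accept the leading three-way handshake.
    while i < len(packets) and i < 3 and packets[i].kind == handshake[i]:
        model.establishment.append(packets[i])
        i += 1

    terminating = False
    for pkt in packets[i:]:
        # The first FIN or RST switches to the termination phase.
        if pkt.kind in ("FIN", "RST"):
            terminating = True
        if terminating:
            if pkt.sender == "src":
                model.termination_client.append(pkt)
            else:
                model.termination_server.append(pkt)
        elif pkt.sender == "src" and pkt.kind == "REQ":
            model.transmissions.append(DataTransmission(request=pkt))
        elif pkt.sender == "dst" and model.transmissions:
            current = model.transmissions[-1]
            if pkt.kind == "ACK" and current.tcp_ack is None and current.response is None:
                current.tcp_ack = pkt   # transport-layer acknowledgement
            elif pkt.kind == "RSP":
                current.response = pkt  # application-layer response
    return model
```

Keeping the TCP ACK optional mirrors the fact that the packets present in each phase depend on the protocol stack implementation of the device.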
Based on the aforementioned industrial control protocol communication model (ICS_CM), this chapter designs the ICS device fingerprint vector (DF) from the perspective of the differences in the hardware and software implementations of industrial control devices, specifically expressed as
The features of the DF are explained in Table 1, and examples of industrial control device fingerprints are shown in Table 2.
The aforementioned features of the DF can be broadly classified into two categories based on the characteristics they reflect: hardware features and software features. The software features mainly refer to the differential characteristics caused by the implementation of the industrial control device’s operating system or protocol stack, including
,
,
,
,
,
,
,
,
, and
. Hardware features are composed of
and
. As shown in
Figure 6,
is the time interval represented by
. Since the
packet is replied by the transport layer of the industrial control device’s protocol stack, and the
packet is replied by the application layer of the protocol stack, the time interval between the two reflects the hardware processing performance of the industrial control device. However, in some cases, the accessed industrial control device may not reply with a
packet, but instead directly reply with a
packet. In response to the aforementioned situation, this chapter proposes the
feature, which has a similar principle to
. Although
is less accurate than
, it is easier to obtain. Specifically,
is the difference between
and
. When
does not exist, the same function can be substituted with
, meaning that
is the difference between
and
. Therefore,
and
together characterize the hardware features of industrial control devices.
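The two timing features can be computed from packet timestamps roughly as follows. This is a minimal sketch: the function and parameter names are hypothetical, and the fallback rule encodes one reading of the description above (the interval from the request to the TCP ACK, or to the protocol response when the device sends no separate ACK), which is an assumption.

```python
from typing import Optional, Tuple


def timing_features(t_request: float,
                    t_tcp_ack: Optional[float],
                    t_response: float) -> Tuple[Optional[float], float]:
    """Return (ack_to_response_gap, request_to_reply_gap) in seconds."""
    # Gap between the transport-layer ACK and the application-layer response.
    # It is undefined when the device does not send a separate ACK.
    ack_to_response = None if t_tcp_ack is None else t_response - t_tcp_ack

    # Gap between the request and the first reply from the device:
    # the ACK if present, otherwise the protocol response (assumed fallback).
    first_reply = t_response if t_tcp_ack is None else t_tcp_ack
    request_to_reply = first_reply - t_request
    return ack_to_response, request_to_reply
```

For example, `timing_features(1.0, 1.25, 1.75)` yields both gaps, while `timing_features(1.0, None, 1.5)` yields only the second one.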
It is worth noting that although the DFs proposed in this chapter utilize the communication patterns of industrial control protocols based on the TCP/IP protocol and the Client/Server communication model, they do not rely on any specific industrial control protocol specification, nor do they require a specific method for acquiring industrial control network traffic. In other words, as long as the industrial control network protocol is of the aforementioned type, the proposed DFs can be effectively obtained in both active and passive network communication environments. This greatly enhances the applicability of the proposed industrial control device anomaly detection method and facilitates the construction of training data for subsequent industrial control device anomaly detection models, thereby improving the detection accuracy of the models.
3.2. Industrial Control Device Anomaly Detection Model Based on Fine-Tuned Llama3 Model
The specific process of fine-tuning large models is shown in Figure 7. Foundational large language models differ in performance, and their results often vary greatly even after fine-tuning with exactly the same data. Different models also require different amounts of computational power for fine-tuning and suit different applications. Therefore, selecting an appropriate foundational large language model is the first step in fine-tuning a model for tasks in a specific domain.
First, the selected foundational large language model must be secure and compliant. Because the anomaly detection task must meet data security and privacy compliance requirements, it is advisable to choose an open-source foundational large language model that supports local deployment for fine-tuning. Second, while foundational large language models typically excel at natural language tasks, they generally have limited ability to learn the discrete, purely numerical sample features that arise in industrial control device anomaly detection. Preprocessing the training samples to convert these numerical features into question-and-answer pairs, a format common in natural language processing tasks, allows the fine-tuned model to better accomplish the anomaly detection task. Finally, training large language models often requires substantial computational resources and expensive hardware. Even during fine-tuning, the hardware resources required and the training speed can vary significantly depending on the base model and the fine-tuning method. Therefore, it is also essential to consider how to reduce the demand for hardware resources and improve training speed while maintaining the accuracy of anomaly detection.
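The conversion of numerical features into question-and-answer pairs might look like the sketch below. The prompt wording, the feature names in the usage example, and the `fingerprint_to_qa` helper are assumptions for illustration; the study's actual template is not specified here.

```python
from typing import Dict


def fingerprint_to_qa(features: Dict[str, float], is_anomalous: bool) -> Dict[str, str]:
    """Turn a numeric fingerprint into a natural-language question/answer pair."""
    # Spell out every feature as "name = value" so the model sees plain text.
    described = ", ".join(f"{name} = {value}" for name, value in features.items())
    question = (
        "The following fingerprint was extracted from an industrial control device: "
        f"{described}. Is this device normal or anomalous?"
    )
    answer = "anomalous" if is_anomalous else "normal"
    return {"question": question, "answer": answer}
```

A call such as `fingerprint_to_qa({"ttl": 64, "window_size": 8192}, False)` produces one training sample; the keys `ttl` and `window_size` are placeholder feature names.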
Based on publicly available information and evaluation data from authoritative institutions, this study selected gemma-7b-bnb-4bit, llama-3-8b-bnb-4bit, and Phi-3-mini-4k-instruct for fine-tuning and testing. The reasons for this selection are as follows: (1) All three models are open-source foundational large language models that support local fine-tuning and deployment without an internet connection, meeting security and compliance requirements. (2) They can all be fine-tuned using question-and-answer pairs to accomplish the industrial control device anomaly detection task. (3) They can all be fine-tuned using Unsloth, with GPU memory requirements not exceeding 8 GB. Moreover, fine-tuning with Unsloth is approximately twice as fast as fine-tuning without it.
Due to the specificity of the industrial control device anomaly detection task, the detection accuracy of anomalous devices is virtually zero for various foundational large language models before fine-tuning. After fine-tuning, different models exhibit varying abilities to detect anomalous devices. Experimental results identified llama-3-8b-bnb-4bit as the most suitable foundational large language model for the industrial control device anomaly detection task after fine-tuning, with the highest detection accuracy for anomalous devices when fine-tuned using the same method.
The llama-3-8b-bnb-4bit model adopts the latest optimization algorithms, significantly enhancing training efficiency. It can rapidly converge to high-quality model parameters while reducing computational resources and time, markedly lowering training costs. The model exhibits substantial improvements in inference tasks (such as complex logical analysis and question-answering systems) and natural language generation tasks (such as text continuation and summarization). The generated text is more coherent and logically consistent, demonstrating stability and consistency across multiple tasks. The model architecture has been deeply optimized to improve computational efficiency. Through more rational hierarchical design and parameter adjustments, under identical conditions, this model’s performance significantly surpasses previous versions, especially excelling in processing large-scale data.
During experiments, it was found that the number of training epochs significantly affects the detection accuracy of anomalous devices. If the number of training epochs is too small, the model cannot accurately grasp the characteristics of anomalous devices, leading to misidentification. Therefore, a sufficiently large number of training epochs is needed to ensure that the model learns effective features. However, increasing the number of training epochs can easily lead to overfitting due to the unique characteristics of industrial control device anomaly detection samples, still resulting in reduced detection accuracy. To address this issue, this study proposes using an annealing algorithm instead of the traditional linear algorithm. With a linear algorithm, if the model is trained for too many epochs, it has enough time to learn all details in the training data, including noise and outliers, leading to overfitting. In contrast, the annealing algorithm, with its probabilistic acceptance mechanism and temperature control strategy, helps the model to avoid local optima and reduces the risk of overfitting. This approach allows for selecting a larger number of training epochs for the model to fully learn the data features while avoiding overfitting, thus improving the accuracy of anomaly detection.
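Assuming that the "linear" and "annealing" algorithms refer to learning-rate schedules (linear decay versus cosine annealing, both common options in fine-tuning frameworks), the difference can be illustrated as follows. This interpretation, the function names, and the parameter values are assumptions; if the study instead uses a simulated-annealing search, this sketch does not apply.

```python
import math


def linear_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Linear decay from base_lr at step 0 to 0 at total_steps."""
    return base_lr * (1.0 - step / total_steps)


def cosine_annealing_lr(step: int, total_steps: int, base_lr: float,
                        min_lr: float = 0.0) -> float:
    """Cosine annealing from base_lr at step 0 to min_lr at total_steps."""
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cosine
```

Under this reading, the cosine schedule keeps the learning rate higher in the early steps and lower in the late steps than linear decay, so the final epochs make only small parameter updates.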
The selection of training epochs in the model fine-tuning process depends on multiple factors, such as dataset size and model complexity. In practice, a small number of epochs is usually set initially and then gradually increased based on performance on the validation set until the requirements are met. Finding the optimal number of training epochs in this way requires numerous experiments, which is time-consuming and labor-intensive. Through experiments, this study observed that the relationship between the sample size and the optimal number of training epochs resembles a Sigmoid function trend. Making this relationship explicit provides a rough estimate of the number of fine-tuning epochs for different sample sizes, offering a basis for setting training epochs and saving experimental time. Based on this observation, this study designed the following formula for predicting the number of training epochs for model fine-tuning:
where
represents the adjustment constant of the fine-tuning epoch function,
represents the linear coefficient of the fine-tuning epoch function, and
x represents the number of training set samples.
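A sigmoid-shaped epoch estimate of this kind can be sketched as below. The functional form (an increasing logistic curve scaled by `max_epochs`), the parameter names `k`, `c`, and `max_epochs`, and all numeric values in the usage example are assumptions for illustration; they are not the study's fitted formula.

```python
import math


def estimate_epochs(x: int, k: float, c: float, max_epochs: float) -> int:
    """Estimate fine-tuning epochs from the number of training samples x.

    k plays the role of a linear coefficient on x and c of an adjustment
    constant (both hypothetical here).
    """
    z = k * x + c
    # Clamp z so that math.exp cannot overflow for extreme inputs.
    z = max(-60.0, min(60.0, z))
    sigmoid = 1.0 / (1.0 + math.exp(-z))
    # Always train for at least one epoch.
    return max(1, round(max_epochs * sigmoid))
```

For instance, with `k = 0.001`, `c = 0.0`, and `max_epochs = 20`, the estimate rises from 10 epochs at `x = 0` toward 20 epochs for very large training sets. In practice, the parameters would have to be fitted to observed optimal epoch counts.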