Prediction of Mechanical Properties of Cold-Rolled Steel Based on Improved Graph Attention Network

Luo, Xiaoyang; Guo, Rongping; Zhang, Qiwen; Tang, Xingchang

doi:10.3390/sym16020188


¹

Gansu JISCO Group, Hongxing Iron & Steel Co., Ltd., Carbon Steel & Thin Slab Rolling Plant, Jiayuguan 735100, China

²

School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China

³

State Key Laboratory of Advanced Processing and Recycling of Nonferrous Metals, Lanzhou 730050, China

⁴

College of Materials Science and Engineering, Lanzhou University of Technology, Lanzhou 730050, China

^*

Author to whom correspondence should be addressed.

Symmetry 2024, 16(2), 188; https://doi.org/10.3390/sym16020188

Submission received: 12 January 2024 / Revised: 31 January 2024 / Accepted: 1 February 2024 / Published: 5 February 2024


Abstract:

Predicting the mechanical properties of cold-rolled steel is important for quality control, process optimization, and cost control, but accurate prediction remains challenging. With the graph structures used by existing graph attention networks, it is difficult to establish the complex coupling relationships and nonlinear causal relationships between variables. In addition, cold-rolled steel production has typical full-flow process characteristics, and a graph attention network has difficulty extracting the path information between a central node and its higher-order neighborhood. To address these problems, the neural Granger causality algorithm is first used to extract the latent relationships between variables and to construct the basic graph structure of the mechanical property prediction data. Secondly, a node embedding layer is added before the graph attention network; it leverages the symmetric nature of the Node2vec method by incorporating both breadth-first and depth-first exploration strategies. This ensures a balanced exploration of diverse paths in the graph, capturing not only local structures but also higher-order relationships. The combined graph attention network is then able to effectively capture the symmetric path information between nodes and the dependencies between variables. The accuracy and superiority of this method are verified by experiments on real cold-rolled steel production cases.

Keywords:

mechanical properties; graph attention networks; nonlinear causal relationship; neural Granger causality; Node2vec

1. Introduction

Cold-rolled steel, known for its smooth surface and exceptional mechanical properties, finds extensive applications in various manufacturing sectors, including automobile, shipbuilding, and electrical appliances [1,2]. Mechanical properties serve as pivotal indicators reflecting the quality of cold-rolled steel during the rolling production process and act as crucial control parameters. The assessment of these properties typically involves destructive experiments, incurring significant time and resource costs [3]. Moreover, the intricate relationships between mechanical properties, process parameters, and chemical compositions make achieving accurate predictions a challenging endeavor. Hence, the development of a mechanical property prediction model based on chemical composition and process parameters holds paramount importance for reducing production costs, enhancing efficiency, and elevating product quality.

At present, model-based methods for predicting the mechanical properties of rolled steel mainly comprise methods based on metallurgical mechanism models and data-driven methods. However, in the cold-rolling process, both the austenitic phase and the ferritic phase undergo complex microstructure evolution, so it is difficult to establish an accurate mechanism model that explains the relationship between rolling process, chemical composition, and mechanical properties [4,5]. In contrast, the data-driven approach is built on historical data from actual production and does not rely on a physical model or theoretical assumptions. It can therefore be applied to different types of steel alloys and smelting processes, handles the diversity and complexity of the problem more flexibly, and has become an effective way to predict the mechanical properties of rolled steel. For example, a method based on the multi-grain cascade forest architecture [6] integrates the basic information of steel rolling with the local information collected by multi-grain to predict the performance of rolled steel. Yan et al. [7] applied MSVR and particle swarm optimization to establish a multi-output prediction model for the mechanical properties of cold-rolled steel and a multi-objective quality control method for cold-rolled products. In addition, with its rapid development in extracting nonlinear representations, deep learning has also been applied to the prediction of mechanical properties of rolled steel. Considering the time series correlation in the steel production process, the two-stage time series model [8], based on a two-stage attention mechanism, adaptively extracts input features and temporal features by introducing input attention and temporal attention, so as to accurately predict the mechanical properties of rolled steel. The CNN-based method [9] predicts steel rolling properties by converting the actual production data into two-dimensional image data.

Although the above methods have been widely used in the prediction of steel rolling properties, they cannot accurately model the whole industrial process of cold rolling, which limits their practical application in this field. Graph neural networks (GNNs) provide an excellent solution for modeling industrial data dependencies due to their strong feature representation and permutation invariance. Chen et al. [10] achieved long-term prediction of sintering temperature by introducing an adaptive adjacency matrix algorithm and accurately modeling the time-varying spatio-temporal correlation of process data with a spatio-temporal graph attention module. Sun et al. [11] achieved accurate online prediction of key indicators of complex industrial processes through a multi-modal clustering method based on the Gaussian mixture model and a dynamic attribution graph attention network.

Nevertheless, it is difficult for existing graph attention networks to model complex nonlinear causality among variables. We introduce the neural Granger causality (NGC) algorithm to capture complex nonlinear causality among variables, establishing a graph structure for predicting the mechanical properties of cold-rolled steel. In order to further improve the prediction accuracy of mechanical properties of cold rolling, a node embedding layer is added in front of the graph attention network (GAT), which uses node embedding method to integrate node path information into GAT. Based on the above methods, a neural Granger causality and embedded graph attention network (NGC–EGAT) for predicting mechanical properties of cold-rolled steel is proposed.

2. Theoretical Foundation

2.1. Granger Causality

Given an N-dimensional time series

X = (X_{1}, X_{2}, \dots, X_{T}) = (x_{1}, x_{2}, \dots, x_{N}), X \in ℝ^{N \times T}

, Granger causality (GC) is defined as follows: for

i \neq j

, with all other sequences given, if including the historical values of

x_{j}

improves the accuracy of predicting the current value of

x_{i}

, then it can be concluded that there is GC from

x_{j}

to

x_{i}

. Model-based GC analysis usually uses a vector autoregressive (VAR) model in which the observation

X_{t}

at time

t

is a linear combination of its K most recent historical values:

X_{t} = \sum_{k = 1}^{K} A^{(k)} X_{t - k} + ε_{t}

(1)

where

A^{(k)}

is the

N \times N

coefficient matrix at lag

k = 1, 2, \dots, K

, and

ε_{t}

is zero-mean Gaussian noise.

In the VAR model, a sufficient condition for the series

x_{j}

of the multivariate time series

X

to have no GC on

x_{i}

is

A_{i j}^{(k)} = 0 \text{ for all } k = 1, \dots, K

; that is, GC can be determined in the VAR model by examining which entries of the lag coefficient matrices

A^{(k)}

are 0.
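The zero-pattern test described above can be illustrated with a minimal NumPy sketch. It assumes an ordinary least-squares fit of Equation (1) and replaces the exact-zero test with a hypothetical magnitude threshold `tol`; the function names, the synthetic data, and the threshold are illustrative assumptions and are not taken from this paper.

```python
import numpy as np


def fit_var(X, K):
    """Least-squares estimate of the lag matrices A^(1..K) in Equation (1).

    X has shape (N, T); the result has shape (K, N, N).
    """
    N, T = X.shape
    # Each regression row stacks the K previous observations [X_{t-1}, ..., X_{t-K}].
    Z = np.stack([
        np.concatenate([X[:, t - k] for k in range(1, K + 1)])
        for t in range(K, T)
    ])                                           # shape (T-K, N*K)
    Y = X[:, K:].T                               # shape (T-K, N)
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)    # shape (N*K, N)
    # B[k*N + j, i] holds A^(k+1)[i, j]; reorder to (K, N, N).
    return B.T.reshape(N, K, N).transpose(1, 0, 2)


def granger_matrix(A, tol=0.1):
    """G[i, j] is True when some lag coefficient A^(k)[i, j] exceeds tol in magnitude.

    tol is a hypothetical cut-off standing in for the exact-zero test in the text.
    """
    return np.abs(A).max(axis=0) > tol


# Synthetic example: x_0 drives x_1 with one step of delay, but not the reverse.
rng = np.random.default_rng(0)
T = 2000
X = np.zeros((2, T))
for t in range(1, T):
    X[0, t] = 0.5 * X[0, t - 1] + rng.normal()
    X[1, t] = 0.8 * X[0, t - 1] + rng.normal()

A_hat = fit_var(X, K=2)
G = granger_matrix(A_hat)   # G[1, 0] flags x_0 as a Granger cause of x_1
```

With estimated rather than exact coefficients, entries are never exactly zero, which is one reason a sparsity-inducing penalty is attractive for this kind of analysis.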

2.2. Graph Attention Networks

GAT [12] is a network architecture that addresses the shortcomings of graph convolutional networks (GCN) by using masked self-attention layers to assign different weights to different nodes in a neighborhood. Given a set of input node features,

h = \{\vec{h}_{1}, \vec{h}_{2}, \dots, \vec{h}_{N}\}, \vec{h}_{i} \in ℝ^{F}

, where

N

is the number of nodes and

F

is the number of features of nodes. The input features are passed through a self-attention layer with masking, employing the LeakyReLU activation function, and subsequently undergoing softmax normalization, resulting in attention coefficients:

α_{i j} = \frac{\exp (\mathrm{LeakyReLU} (\vec{a}^{T} [W \vec{h}_{i} ‖ W \vec{h}_{j}]))}{\sum_{k \in N_{i}} \exp (\mathrm{LeakyReLU} (\vec{a}^{T} [W \vec{h}_{i} ‖ W \vec{h}_{k}]))}

(2)

where

W

is the learnable weight matrix of the linear transformation applied to every node, \vec{a} is the learnable attention weight vector, N_{i} is the neighborhood of node i, and ‖ denotes vector concatenation. After the attention coefficient

α_{i j}

is obtained, the weighted summation of the feature vectors is performed as the final output feature of each node:

\vec{h}_{i}^{\prime} = σ \left( \sum_{j \in N_{i}} α_{i j} W \vec{h}_{j} \right)

(3)

In order to improve the stability of attention mechanism learning, multi-head attention is used to extend GAT, allowing the model to learn multiple different weight distributions, which helps to better capture multiple dependencies in complex graph data. Figure 1 shows the structure diagram of GAT with three attention heads.
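As an illustration of Equations (2) and (3), the following NumPy sketch computes the masked attention coefficients and the aggregated output for a single head. It is a sketch under stated assumptions rather than the authors' implementation: the LeakyReLU slope of 0.2, the use of ELU for σ, the row-vector convention (`h @ W`), and the presence of self-loops in `adj` are all choices made here for the example.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_layer(h, adj, W, a, slope=0.2):
    """Single-head GAT layer following Equations (2) and (3).

    h: (N, F) node features; adj: (N, N), adj[i, j] = 1 if j is in N_i
    (self-loops assumed so every row has a neighbour);
    W: (F, F2) so that row i of h @ W is W h_i; a: (2 * F2,).
    """
    z = h @ W
    F2 = z.shape[1]
    # a^T [z_i || z_j] splits into a[:F2] . z_i + a[F2:] . z_j.
    scores = leaky_relu((z @ a[:F2])[:, None] + (z @ a[F2:])[None, :], slope)
    scores = np.where(adj > 0, scores, -np.inf)       # mask non-neighbours
    scores = scores - scores.max(axis=1, keepdims=True)
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)  # softmax over N_i
    out = alpha @ z
    # ELU is assumed for sigma in this sketch.
    return alpha, np.where(out > 0, out, np.expm1(np.minimum(out, 0.0)))
```

Each row of `alpha` sums to 1 over the neighbors of that node, and entries for non-neighbors are exactly 0, which is the effect of the masking.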

3. NGC–EGAT Method

3.1. NGC–EGAT Overall Framework

As illustrated in Figure 2, NGC–EGAT primarily comprises two modules: the neural Granger causal graph construction module and the graph attention network integrated with a node embedding layer. Firstly, recognizing the limitations of existing methods in accurately establishing complex relationships between variables, a novel graph construction method based on the NGC algorithm is introduced. Secondly, a new embedding graph attention network (EGAT) is proposed. EGAT captures node path information by incorporating a node embedding layer before the GAT network. The extracted path information and node features are then fed into the GAT network to update node representations. Finally, a two-layer fully connected network is used to obtain the final prediction result.

3.2. Neural Granger Causal Graph Structure

The mechanical properties of cold-rolled steel are usually affected by a variety of process parameters and chemical compositions, and there are intricate relationships among them. Therefore, inspired by [13], this paper proposes an NGC learning algorithm that combines long short-term memory (LSTM) with GC analysis and uses LSTM to extract the complex nonlinear causality between variables. Group lasso is used to reduce the introduction of redundant edges, so that a graph structure with efficient information transfer and appropriate sparsity is constructed.

Specifically, each input component is modeled using a separate LSTM. For the variable

x_{t i}

at time t, it has the following functional relationship:

x_{t i} = g_{i} (x_{< t, 1}, \dots, x_{< t, N}) + ε_{t i}

(4)

where

g_{i}

is a nonlinear function that maps the past

t - 1

historical values to

x_{i}

,

x_{< t, i}

denotes the past

t - 1

historical values of

x_{i}

, and

ε_{t i}

is zero-mean Gaussian noise.

LSTM networks are often employed to model intricate temporal dependencies. They incorporate gate mechanisms to regulate information flow, with the cell state

c_{t}

facilitating the transmission of long-term dependencies and the hidden state

h_{t}

aiding in the transmission of short-term dependencies. The computational formula is expressed as follows:

f_{t} = σ (W_{f} x_{t} + U_{f} h_{t - 1} + b_{f})

(5)

i_{t} = σ (W_{i} x_{t} + U_{i} h_{t - 1} + b_{i})

(6)

o_{t} = σ (W_{o} x_{t} + U_{o} h_{t - 1} + b_{o})

(7)

c_{t} = f_{t} ⊙ c_{t - 1} + i_{t} ⊙ \tanh (W_{c} x_{t} + U_{c} h_{t - 1})

(8)

h_{t} = o_{t} ⊙ \tanh (c_{t})

(9)

f_{t}

,

i_{t}

,

o_{t}

denote the forget gate, input gate, and output gate, respectively, which control the information update of

c_{t}

and

h_{t}

. The forget gate

f_{t}

determines what information should be discarded from the previous cell state

c_{t - 1}

, the input gate

i_{t}

determines what new information should be added to the cell state

c_{t}

, and the output gate

o_{t}

determines what information from the current input and cell state should be passed on through the hidden state to the next time step. In this way, LSTM can effectively capture the nonlinear dependence relationships in the data. Finally, the sequence

i

at time

t

goes through the nonlinear evolution of LSTM, and its output

x_{t i}

can be updated from Equation (4) as follows:

x_{t i} = g_{i} (x_{< t}) + ε_{t i} = W_{i} h_{t} + ε_{t i}

(10)

where

W_{i}

is the output weight matrix. For

W_{i}

, if the entries for all lag values k in column

j

are equal to 0 (i.e.,

W_{i}^{j} = 0

), then sequence

j

is not a Granger cause of sequence

i

, indicating that

x_{(t - k) j}

does not affect

h_{t}

, and, thus, does not affect the output

x_{t i}

. As cold-rolled steel data are high-dimensional, the most relevant causal relationships in

W_{i}

must be selected to keep the graph sparse. Group lasso is therefore applied to each column of

W_{i}

to enforce sparsity at the level of whole groups and to select the sequences with the strongest GC on sequence

i

. Group lasso is a method for selecting among multiple groups of correlated variables. The variables are divided into several groups, and the sparsity constraint is imposed on each group as a whole, so that an entire feature group is selected or excluded at once rather than a single feature. This makes variable selection more accurate when variables within a group are correlated, and gives a better balance between variable selection and model interpretability for high-dimensional data. Applying group lasso to Equation (10) converts it into the following optimization problem:

L = \min_{W_{i}} \sum_{t = 1}^{T} (x_{t i} - W_{i} h_{t})^{2} + λ \sum_{j = 1}^{N} ‖ W_{i}^{j} ‖_{2}

(11)

where

W_{i}^{j}

is the j-th column of

W_{i}

,

λ

is the penalty coefficient that controls the sparsity between groups, and the variables

x_{t j}

with GC for

x_{t i}

are selected through the inter-group constraint term. GC analysis in the LSTM model is thus transformed into a variable selection problem, namely, whether the elements of the column

W_{i}^{j}

are all equal to 0. At the same time,

W_{i}

is penalized through the group lasso regularization term, which ensures sparsity while preserving the correct correlations.
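The column-wise penalty in Equation (11) can be made concrete with a short NumPy sketch. The paper does not state which optimizer is used; the proximal (group soft-thresholding) step below is one common way to handle a group lasso term and is shown here purely as an assumption, with hypothetical function names.

```python
import numpy as np


def group_lasso_penalty(W, lam):
    """lam * sum_j ||W[:, j]||_2, i.e. the regularization term of Equation (11)."""
    return lam * np.linalg.norm(W, axis=0).sum()


def prox_group_lasso(W, thresh):
    """Group soft-thresholding: shrink every column norm by thresh.

    Columns whose norm is below thresh become exactly zero, which is what
    removes a candidate cause j as a whole rather than entry by entry.
    """
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return W * scale


W = np.array([[3.0, 0.1],
              [4.0, 0.1]])
W_shrunk = prox_group_lasso(W, thresh=1.0)   # second column is zeroed as a group
```

In a proximal gradient scheme, `thresh` would typically be the product of the step size and λ; a larger λ zeroes more columns and therefore yields a sparser causal graph.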

Given N variables, the GC between variables is calculated by the above LSTM model to determine the connectivity between variables and thereby generate the causal graph structure. The unweighted directed graph

G = (V, E, A)

is used to describe the causal relationships between the variables of the cold-rolled steel data, where

V

is the set of N nodes (one per variable), E is the set of edges representing the causal relationships between variables, and the adjacency matrix

A \in ℝ^{N \times N}

describes the relationships between nodes; the elements of

A

are 0 or 1, indicating whether there is a causal relationship between two nodes. Firstly, given sequence

(x_{1}, x_{2}, \dots, x_{N}) \in ℝ^{N \times T}

, for sequence

x_{i}, i = 1, \dots, N

, the group lasso causal matrix with other associated sequences

x_{j}, j = 1, 2, \dots, N

can be obtained according to Equation (11). If the sub-vector

W_{i}^{\prime j} \neq 0

, then GC exists between

x_{i}

and

x_{j}

, and the corresponding element in the adjacency matrix

A

is set to 1, indicating that the two nodes are connected; otherwise it is set to 0. Finally, the above steps are repeated for each sequence to obtain the adjacency matrix

A

, and the final graph structure of the cold-rolled steel performance prediction data is obtained.
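The construction of the adjacency matrix from the learned weights can be sketched as follows. The sketch assumes that each target variable i has its own weight matrix with one column per candidate cause j, and it uses a small numerical tolerance `tol` in place of an exact zero test; both the data layout and the tolerance are assumptions made for illustration.

```python
import numpy as np


def causal_adjacency(W_list, tol=1e-8):
    """Build the 0/1 adjacency matrix A from per-target weight matrices.

    W_list[i] has one column per candidate cause j of variable i.
    A[i, j] = 1 when column j of W_list[i] is non-zero (GC between x_j and x_i).
    """
    N = len(W_list)
    A = np.zeros((N, N), dtype=int)
    for i, W_i in enumerate(W_list):
        A[i] = (np.linalg.norm(W_i, axis=0) > tol).astype(int)
    return A


# Toy weights for three variables (two rows each, chosen only for illustration).
W_list = [
    np.array([[0.0, 1.0, 0.0], [0.0, 2.0, 0.0]]),
    np.zeros((2, 3)),
    np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 3.0]]),
]
A = causal_adjacency(W_list)
```

Because row i is filled from the weights of target i, the resulting matrix is in general not symmetric, which matches the directed nature of the causal graph.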

3.3. Embedding Graph Attention Network

The attention mechanism of GAT allows the model to dynamically adjust its attention to node relationships under different scenarios, which helps the model adapt to relationship changes in the data and improves its robustness under different input conditions. However, cold rolling is a typical whole-process operation, so the directed path information from the central node to other nodes of the graph should reflect the causal relationships among the influencing factors across the whole production process. GAT, by contrast, tends to focus only on first-order adjacent nodes within the same layer when aggregating the topological information of the graph and updating the features of the central node [14]. Therefore, in order to account for the information of nodes along the whole path and learn a continuous feature representation of nodes in the network, a node embedding layer is added before GAT. The node embedding layer uses the Node2vec [15] algorithm, which explores various neighborhood paths symmetrically through a random walk strategy and learns a mapping of nodes into a low-dimensional feature space, so that the high-order neighborhood information of nodes is preserved as far as possible. The path information learned by Node2vec is then used in the calculation of GAT’s attention scores, so that GAT considers both the path information and the data features of graph nodes during attention score calculation and information aggregation, and thus completes the task of cold-rolling performance prediction more effectively.

Node embedding layer: For a given graph structure

G = (V, E, A)

, Node2vec is used for node embedding to learn the continuous feature representation of nodes and map nodes to the low-dimensional feature space, so as to learn the path information of nodes in the network topology. First, given the current node

v \in V

, the probability of visiting the next node

x

is:

P (c_{i} = x \mid c_{i - 1} = v) = \begin{cases} \frac{π_{v x}}{Z}, & \text{if } (v, x) \in E \\ 0, & \text{otherwise} \end{cases}

(12)

where

π_{v x}

is the unnormalized transition probability from node

v

to node

x

in the causal graph of the mechanical properties of cold-rolled steel, and

Z

is the normalizing constant. The unnormalized transition probability is set to

π_{v x} = α_{p q} (t, x) \cdot w_{v x}

, where

w_{v x} = 1

is the weight of the edges of the unweighted directed graph, and the search bias

α_{p q} (t, x)

is:

α_{p q} (t, x) = \begin{cases} \frac{1}{p}, & \text{if } d_{t x} = 0 \\ 1, & \text{if } d_{t x} = 1 \\ \frac{1}{q}, & \text{if } d_{t x} = 2 \end{cases}

(13)

where

d_{t x}

is the shortest path distance between the previously visited node

t

and the candidate next node

x

. Node2vec guides the random walk strategy that generates the node sequences by introducing two parameters,

p

and

q

, which balance the symmetry between depth-first search and breadth-first search. This balances the capture of the local and global structure of the graph during learning, so that node embeddings are learned more comprehensively.
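The biased walk defined by Equations (12) and (13) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function name, the uniform first step, and the handling of dead ends in a directed graph (the walk simply stops) are assumptions introduced here.

```python
import numpy as np


def node2vec_walk(adj, start, walk_len, p, q, rng):
    """One biased random walk on a directed 0/1 adjacency matrix.

    adj[v, x] = 1 encodes the edge v -> x; t denotes the previously visited node.
    """
    walk = [start]
    while len(walk) < walk_len:
        v = walk[-1]
        nbrs = np.flatnonzero(adj[v])
        if nbrs.size == 0:               # dead end: no outgoing edge (assumed behaviour)
            break
        if len(walk) == 1:
            probs = np.ones(nbrs.size)   # no previous node yet, so step uniformly
        else:
            t = walk[-2]
            probs = np.empty(nbrs.size)
            for n, x in enumerate(nbrs):
                if x == t:               # d_tx = 0: return to the previous node
                    probs[n] = 1.0 / p
                elif adj[t, x]:          # d_tx = 1: x is also adjacent to t
                    probs[n] = 1.0
                else:                    # d_tx = 2: move further away from t
                    probs[n] = 1.0 / q
        walk.append(int(rng.choice(nbrs, p=probs / probs.sum())))
    return walk
```

A small p makes the walk more likely to return to the previous node and stay local, while a small q pushes the walk outward, which is how the two parameters trade off breadth-first and depth-first exploration.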

Finally, the generated node sequences are used to train a skip-gram model, which aims to maximize the conditional probability of a context node given a central node. By adjusting the node embedding vectors, nodes that appear in the same context are brought closer together in the embedding space. After training, each node has a corresponding embedding vector that represents its position in the learned embedding space. These embedding vectors capture the structural relationships and similarities between nodes. For node

v_{i}

, there exists the following embedding vector:

e_{i} = (e_{v_{i}}^{1}, e_{v_{i}}^{2}, \dots, e_{v_{i}}^{d})

(14)

where

e

represents the embedding vector of nodes, which is used to map nodes into the low-dimensional feature space, and

d

is the embedding dimension.

GAT layer with path information: Because GAT only considers the first-order neighborhood, it has difficulty characterizing the whole-process characteristics of cold-rolled steel. Therefore, a GAT model with path information is proposed. Firstly, the path information features learned by the Node2vec embedding model are used to calculate the attention coefficients. In the attention mechanism, the model can thus consider both the original features of nodes and their path information in the network topology, and assign different weights to different nodes, so as to realize the whole-process representation of cold-rolled steel. Equation (2) is updated as follows:

α_{i j} = \frac{\exp (\mathrm{LeakyReLU} (\vec{a}^{T} [W \vec{h}_{i} + W_{e} \vec{e}_{i} ‖ W \vec{h}_{j} + W_{e} \vec{e}_{j}]))}{\sum_{k \in N_{i}} \exp (\mathrm{LeakyReLU} (\vec{a}^{T} [W \vec{h}_{i} + W_{e} \vec{e}_{i} ‖ W \vec{h}_{k} + W_{e} \vec{e}_{k}]))}

(15)

Then, the calculated attention coefficient is used to update node features. In this step, in order to make full use of node features and path information, multi-head attention is used to update node features. Multi-head attention uses multiple attention calculations to mine node information in different subspaces, thus, improving the model’s ability to perceive features of different scales and levels. The calculation process is as follows:

\vec{h}_{i}^{\prime} = σ \left( \frac{1}{H} \sum_{h = 1}^{H} \sum_{j \in N_{i}} α_{i j}^{h} W^{h} \vec{h}_{j} \right)

(16)

where

H

is the number of attention heads, and

W^{h}

is the weight matrix of the h-th head.

Finally, the updated node feature vector

h^{\prime}

is fed into a two-layer feedforward network to obtain the final predicted value:

h^{\prime} = (\vec{h}_{1}^{\prime}, \vec{h}_{2}^{\prime}, \dots, \vec{h}_{N}^{\prime})

(17)

\hat{y} = \mathrm{ReLU} (W h^{\prime} + b)

(18)

where

W

and

b

are the weight matrix and bias.
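To make the role of the embedding term in Equations (15) and (16) concrete, the sketch below adds a projected Node2vec embedding to each projected node feature before the attention score is computed, and then averages the heads. It is a sketch under assumptions: the LeakyReLU slope of 0.2, ReLU as σ, the row-vector convention, self-loops in `adj`, and the `(W, W_e, a)` tuple layout for each head are all illustrative choices not specified in the paper.

```python
import numpy as np


def egat_attention(h, e, adj, W, W_e, a, slope=0.2):
    """Attention coefficients of Equation (15) for one head.

    h: (N, F) node features; e: (N, d) node embeddings;
    W: (F, F2); W_e: (d, F2); a: (2 * F2,); adj includes self-loops.
    """
    z = h @ W + e @ W_e                      # row i is W h_i + W_e e_i
    F2 = z.shape[1]
    s = (z @ a[:F2])[:, None] + (z @ a[F2:])[None, :]
    s = np.where(s > 0, s, slope * s)        # LeakyReLU
    s = np.where(adj > 0, s, -np.inf)        # keep only neighbours j in N_i
    s = s - s.max(axis=1, keepdims=True)
    alpha = np.exp(s)
    return alpha / alpha.sum(axis=1, keepdims=True)


def egat_layer(h, e, adj, heads):
    """Equation (16): average the H heads, then apply sigma (ReLU assumed here).

    heads is a list of (W, W_e, a) tuples, one per attention head.
    """
    per_head = [egat_attention(h, e, adj, W, W_e, a) @ (h @ W)
                for W, W_e, a in heads]
    return np.maximum(np.mean(per_head, axis=0), 0.0)
```

Note that, as in Equation (16), the embeddings influence only the attention weights, while the aggregated messages are the projected original features.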

4. Experiment and Analysis

4.1. Description of Experimental Data

In order to verify the validity of the NGC–EGAT model in predicting the properties of cold-rolled steel, data collected from the SPCC steel production line of the JISCO Carbon Steel Sheet Factory were used for experimental verification. According to the smelting mechanism of cold-rolled steel and the suggestions of field experts, 14 process variables and chemical components were collected: coiling temperature (FT1), final rolling temperature (FT2), cold-rolled steel thickness (FT3), hot-rolled steel thickness (FT4), flat elongation (FT5), ALS, C, Si, Mn, P, S, Cu, Ni, and AL. After missing and abnormal data were processed, 13,485 samples were obtained, of which the first 80% were used for training and the remaining 20% for testing. The objective of this paper is to predict three mechanical properties of cold-rolled steel: yield strength (YS), tensile strength (TS), and elongation (EL).
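The ordered 80/20 split described above can be sketched as follows. The sketch assumes that the samples are stored in their original order and that the split point is rounded down; the placeholder array and the function name are hypothetical.

```python
import numpy as np


def ordered_split(data, train_num=4, train_den=5):
    """Use the first 4/5 of the rows for training and the rest for testing."""
    n_train = len(data) * train_num // train_den   # integer arithmetic avoids float rounding
    return data[:n_train], data[n_train:]


samples = np.arange(13485)          # placeholder for the 13,485 production records
train, test = ordered_split(samples)
```

Keeping the original order, rather than shuffling, means that the test samples all come after the training samples in the collected data.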

4.2. Model Performance Evaluation Index and Parameter Setting

In order to measure the effectiveness of the model, three different performance evaluation indexes were selected in this paper, including root mean square error (RMSE), mean absolute error (MAE), and R-squared (

R^{2}

). The smaller the value of RMSE and MAE, the better, and the closer the value of

R^{2}

to 1, the better, defined as follows:

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})^{2}}

(19)

\mathrm{MAE} = \frac{1}{N} \sum_{i = 1}^{N} | y_{i} - \hat{y}_{i} |

(20)

R^{2} = 1 - \frac{\sum_{i} {({\hat{y}}_{i} - y_{i})}^{2}}{\sum_{i} {(y_{i} - \bar{y})}^{2}}

(21)

where

y_{i}

is the target value,

{\hat{y}}_{i}

is the predicted value, and

\bar{y}

is the average of the target value.
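The three evaluation indexes can be computed directly with NumPy, taking RMSE as the square root of the mean squared error. The function names and the toy values below are illustrative only and are not results from this paper.

```python
import numpy as np


def rmse(y, y_hat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))


def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_hat - y) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 6.0])
scores = (rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

For this toy example a single error of 2 on four samples gives an RMSE of 1.0, an MAE of 0.5, and an R² of 0.2, which shows how RMSE weights a large individual error more heavily than MAE does.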

The hyperparameters of NGC–EGAT mainly include the hidden size (hid_dim) and the number of heads (num_heads) of GAT and, for the Node2vec part, the node embedding dimension (emb_dim), random walk length (walk_len), context size (cont_size), and number of random walks per node (walks_pn). Table 1 lists these hyperparameters.

4.3. Structural Analysis of Neural Causality Graph

In order to verify the feasibility of the proposed NGC graph structure, this paper compares it with a graph structure for cold-rolled steel performance prediction constructed by Pearson correlation analysis. Figure 3 shows the graph structures based on NGC and on Pearson correlation analysis. In the graph structure built by Pearson correlation analysis, the connections between nodes are determined solely by the strength of the linear relationship between two variables. As a result, the connections present in the graph structure are limited, covering only a few nodes. In contrast, the graph structure based on NGC analysis not only contains all the node connections in the Pearson-based graph structure, but also contains many potential node relationships. Analysis shows that these potential node relationships are more consistent with the whole process of cold-rolled steel production and with the conclusions of previous studies. For example, ALS is an important parameter in the pickling stage, FT5 is an important parameter in the finishing machine stage, and the pickling stage and the finishing machine stage are usually adjacent processes. In addition, the literature [16] pointed out that the contents of P, S, C, and Mn show a positive correlation with mechanical properties. The literature [17] proposes that simultaneous adjustment of composition and heat treatment conditions is an effective way to optimize mechanical properties, so there is a potential relationship between FT1, FT2, and most chemical components. Furthermore, in order to quantitatively analyze the validity of the NGC graph structure, this paper conducted experiments on the NGC–EGAT model using the two graph structures. Table 2 shows the experimental results of the two graph structures on the NGC–EGAT model. It can be seen from Table 2 that the graph structure based on neural Granger causality achieves the best prediction accuracy across the different evaluation indicators.
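For reference, a Pearson-based baseline graph of the kind discussed above could be built as in the following sketch. The paper does not report how correlations were turned into edges, so the absolute-value threshold of 0.5, the removal of self-loops, and the function name are assumptions made only for illustration.

```python
import numpy as np


def pearson_adjacency(X, threshold=0.5):
    """Connect two variables when |Pearson r| reaches a hypothetical threshold.

    X has one row per variable and one column per sample.
    """
    C = np.corrcoef(X)
    A = (np.abs(C) >= threshold).astype(int)
    np.fill_diagonal(A, 0)               # drop self-correlations
    return A


X = np.array([
    [1.0, 2.0, 3.0, 4.0],     # x_0
    [2.0, 4.0, 6.0, 8.0],     # x_1 = 2 * x_0, perfectly correlated with x_0
    [1.0, -1.0, -1.0, 1.0],   # x_2, uncorrelated with x_0 and x_1
])
A_pearson = pearson_adjacency(X)
```

Because the correlation matrix is symmetric, a graph built this way is undirected and captures only linear association, whereas the NGC graph is directed and can encode nonlinear dependencies.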

4.4. Model Performance Comparison

In order to compare the performance of the NGC–EGAT model, this paper selects five advanced methods for comparison. The detailed information of the comparison model is as follows:

GATv2 [18] is an improvement of the GAT model. Its key hyperparameters are a hidden size of 128, 7 attention heads, and a learning rate of 0.01;
GCN [19] is a semi-supervised deep learning model designed for graph-structured data. Its key hyperparameters are a hidden size of 256 and a learning rate of 0.01;
DeepGCNs [20], a variant of the traditional GCN, defines a differentiable generalized aggregation function to unify different message aggregation operations, adopts a deeper structure, and solves the problem of information disappearance in graph-structured data. Its key hyperparameters are a hidden size of 128, 4 model layers, and a learning rate of 0.01;
GraphUNet [21] is a U-Net model based on graph-structured data. It realizes the feature learning of graph nodes and the hierarchical representation of graph data through hierarchical graph convolution operations. Its key hyperparameters are a hidden size of 256, a U-Net depth of 4, and a learning rate of 0.01;
CNN–LSTM [22] is a deep neural network that integrates CNN and LSTM models. Its hidden layer size is 64, the time step is 2, the model has 2 layers, and the learning rate is 0.01.

The graph structure constructed by the NGC method proposed in this paper is used as the input graph for all GNN-based models. Table 3 shows the experimental results of NGC–EGAT and the five comparison models on the test set. As can be seen in Table 3, the proposed NGC–EGAT method is largely superior to the existing advanced methods. In addition, the GNN models showed better performance than the CNN–LSTM model, which also verifies that the graph structure is better able to express the complex dependencies between variables in the cold-rolled steel data set. Compared with the other advanced GNN models, NGC–EGAT achieved the best results under the different evaluation indexes. This is mainly because NGC–EGAT not only uses GAT to adaptively weight the neighbors of nodes, but also integrates the structural information of the network topology through Node2vec. The model fully considers the path information from the central node to other nodes, so that it learns a continuous feature representation of nodes.

In order to visually demonstrate the effectiveness of the NGC–EGAT model, Figure 4a–c shows the fit between the values predicted by NGC–EGAT and the real values for the three mechanical property indexes. It can be seen from the figure that the predicted values of NGC–EGAT closely fit most of the true values, which also shows the effectiveness of NGC–EGAT.

In addition, Figure 5 presents the error rates of the different models within the range of actual industrial production error requirements. The actual industrial production error ranges of YS, TS, and EL are ±10 MPa, ±10 MPa, and ±3, respectively. Figure 5 shows that the NGC–EGAT model has the lowest error rate within these ranges, which further confirms that NGC–EGAT is more suitable for actual industrial production needs.

4.5. Ablation Experiment

To verify the contribution of each module of the proposed method to model performance, ablation studies were performed on the test set. In the NGC–GCN method, GCN replaces GAT in order to verify the effectiveness of GAT. In the NGC–GAT method, the Node2vec node embedding is removed and GAT is trained and tested directly on the NGC graph structure, in order to verify the effectiveness of using Node2vec node embedding to integrate structural information. The experimental results are shown in Table 4. It can be clearly observed from Table 4 that, compared with the GCN method, the GAT method shows significant improvement on the different mechanical properties. After the node embeddings are removed, the performance of the NGC–GAT model decreases significantly compared with the NGC–EGAT model, which verifies the importance of integrating structural information and also shows that integrating structural information through Node2vec node embeddings better matches the requirements of the whole-process production of cold-rolled steel.

5. Conclusions

In this paper, an NGC–EGAT model for predicting mechanical properties of cold-rolled steel is presented. Firstly, the model employs the NGC algorithm to systematically construct the graph structure for predicting mechanical properties based on data related to cold-rolled steel. Subsequently, the Node2vec node embedding method utilizes a random walk strategy to explore diverse neighborhood paths of nodes, extract the path information features of graph nodes, and fuse the path information features with the original data features. After that, the GAT is used to learn the dependencies between multiple variables and the path information of the graph nodes simultaneously. Finally, the validity and robustness of the NGC–EGAT method are verified by experiments on real data sets, and the feasibility and superiority of NGC–EGAT in predicting the properties of cold-rolled steel are also proved.

Author Contributions

Conceptualization, X.L., R.G. and Q.Z.; methodology, X.L., R.G., Q.Z. and X.T.; software, R.G.; validation, R.G.; formal analysis, X.L., R.G., Q.Z. and X.T.; investigation, X.L., R.G. and Q.Z.; resources, R.G. and X.T.; data curation, X.L., Q.Z. and X.T.; writing—original draft preparation, R.G.; writing—review and editing, X.L., R.G. and Q.Z.; visualization, R.G; supervision, X.L., Q.Z. and X.T.; project administration, X.L. and Q.Z.; funding acquisition, X.L., Q.Z. and X.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under grant 62063021 (Research on HMS Scheduling Optimization and Control and Intelligence System in Manufacturing IoT Environment) and grant 62162040 (Research on Terrain Representation and Dissemination Adaptive Scheduling Strategy for Large-scale Social Network Influence Adaptability).

Data Availability Statement

Data available on request from the authors.

Conflicts of Interest

Author Xiaoyang Luo was employed by the company Gansu JISCO Group, Hongxing Iron & Steel Co., Ltd., Carbon Steel & Thin Slab Rolling Plant. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Diao, Y.; Yan, L.; Gao, K. A strategy assisted machine learning to process multi-objective optimization for improving mechanical properties of carbon steels. J. Mater. Sci. Technol. 2022, 109, 86–93.
Jacobs, L.J.M.; Atzema, E.H.; Moerman, J.; de Rooij, M.B. Quantification of the Influence of Anisotropic Plastic Yielding on Cold Rolling Force. J. Mater. Process. Technol. 2023, 319, 118055.
Sheng, H.; Wang, P.; Tang, C. Predicting mechanical properties of cold-rolled steel strips using micro-magnetic ndt technologies. Materials 2022, 15, 2151.
Kano, M.; Nakagawa, Y. Data-based process monitoring, process control, and quality improvement: Recent developments and applications in steel industry. Comput. Chem. Eng. 2008, 32, 12–24.
Li, W.; Gu, J.; Deng, Y.; Mu, W.; Li, J. New comprehension on the microstructure, texture and deformation behaviors of UNS S32101 duplex stainless steel fabricated by direct cold rolling process. Mater. Sci. Eng. A 2022, 845, 143150.
Li, F.; He, A.; Song, Y.; Wang, Z.; Xu, X.; Zhang, S. Deep learning for predictive mechanical properties of hot-rolled strip in complex manufacturing systems. Int. J. Miner. Metall. Mater. 2022, 30, 1093–1103.
Yan, Y.F.; Lü, Z.M. Multi-objective quality control method for cold-rolled products oriented to customized requirements. Int. J. Miner. Metall. Mater. 2021, 28, 1332–1342.
Liu, X.; Cong, Z.; Peng, K.; Dong, J.; Li, L. DA-CBGRU-Seq2Seq based soft sensor for mechanical properties of hot rolling process. IEEE Sens. J. 2023, 23, 14234–14244.
Xu, Z.W.; Liu, X.M.; Zhang, K. Mechanical properties prediction for hot rolled alloy steel using convolutional neural network. IEEE Access 2019, 7, 47068–47078.
Chen, H.; Jiang, Y.; Zhang, X.; Zhou, Y.; Wang, L.; Wei, J. Spatio-Temporal Graph Attention Network for Sintering Temperature Long-Range Forecasting in Rotary Kilns. IEEE Trans. Ind. Inform. 2022, 19, 1923–1932.
Sun, B.; Lv, M.; Zhou, C.; Li, Y. A multimode structured prediction model based on dynamic attribution graph attention network for complex industrial processes. Inf. Sci. 2023, 640, 119001.
Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2021, arXiv:1710.10903.
Tank, A.; Covert, I.; Foti, N.; Shojaie, A.; Fox, E.B. Neural granger causality. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 4267–4279.
Yang, Y.; Wang, X.; Song, M.; Yuan, J.; Tao, D. Spagan: Shortest path graph attention network. arXiv 2021, arXiv:2101.03464.
Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864.
Choudhury, A. Prediction and analysis of mechanical properties of low carbon steels using machine learning. J. Inst. Eng. (India) Ser. D 2022, 103, 303–310.
Li, X.; Zheng, M.; Yang, X.; Chen, P.; Ding, W. A property-oriented design strategy of high-strength ductile RAFM steels based on machine learning. Mater. Sci. Eng. A 2022, 840, 142891.
Brody, S.; Alon, U.; Yahav, E. How attentive are graph attention networks? arXiv 2021, arXiv:2105.14491.
Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
Li, G.; Xiong, C.; Thabet, A.; Ghanem, B. Deepergcn: All you need to train deeper gcns. arXiv 2020, arXiv:2006.07739.
Gao, H.; Ji, S. Graph u-nets. In International Conference on Machine Learning; PMLR: Long Beach, CA, USA, 9–15 June 2019; pp. 2083–2092.
Kim, T.Y.; Cho, S.B. Predicting residential energy consumption using CNN-LSTM neural networks. Energy 2019, 182, 72–81.

Figure 1. Schematic diagram of GAT structure.

Figure 2. Structure diagram of NGC–EGAT.

Figure 3. (a) Causal graph structure; (b) Pearson correlation graph structure.

Figure 4. Prediction results of NGC–EGAT: (a) YS, (b) TS, (c) EL.

Figure 5. Prediction error rates of different models.

Table 1. NGC–EGAT hyperparameters.

Parameters	Values	Parameters	Values
hid_dim	512	num_heads	14
emb_dim	128	walk_len	10
cont_size	10	walks_pn	10
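
To make the walk-related hyperparameters in Table 1 concrete, the sketch below generates Node2vec-style biased random walks on a small graph. It is an illustration under several assumptions rather than the authors' code: `walk_len` is read as the number of nodes per walk and `walks_pn` as the number of walks started from each node, the return and in-out parameters `p` and `q` are set to 1.0 because they are not listed in Table 1, and the four-node undirected graph is hypothetical.

```python
import random


def node2vec_walk(adj, start, walk_len, p, q, rng):
    """One second-order biased random walk starting at `start`."""
    walk = [start]
    while len(walk) < walk_len:
        cur = walk[-1]
        nbrs = adj[cur]
        if not nbrs:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay close to prev (breadth-first-like)
            else:
                weights.append(1.0 / q)  # move away from prev (depth-first-like)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, walk_len=10, walks_per_node=10, p=1.0, q=1.0, seed=0):
    """Start `walks_per_node` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(adj, node, walk_len, p, q, rng))
    return walks


# Hypothetical undirected toy graph; the NGC graph in the paper may differ.
toy_adj = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
walks = generate_walks(toy_adj, walk_len=10, walks_per_node=10)
print(len(walks), "walks; first walk:", walks[0])
```

In Node2vec the resulting walks are then treated as sentences for a skip-gram model, which is where an embedding size such as `emb_dim` and a context window such as `cont_size` (assuming that is what the abbreviation denotes) would come into play.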

Table 2. Comparative experimental results of different graph structures.

Methods	Target Variables	RMSE	$R^{2}$	MAE
NGC	YS	2.926	0.930	1.859
	TS	2.501	0.947	1.518
	EL	0.596	0.970	0.356
Pearson	YS	2.947	0.926	1.842
	TS	2.538	0.945	1.573
	EL	0.692	0.960	0.481
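
For readers who want to reproduce the kind of figures reported in Table 2, the sketch below computes RMSE, R², and MAE using their standard definitions. The sample values are hypothetical and are not taken from the paper's data set; the function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted values for one target variable.
y_true = [300.0, 310.0, 320.0, 330.0]
y_pred = [302.0, 309.0, 318.0, 331.0]

print("RMSE:", rmse(y_true, y_pred))
print("MAE: ", mae(y_true, y_pred))
print("R2:  ", r2(y_true, y_pred))
```

Lower RMSE and MAE and higher R² indicate a better fit, which is the sense in which the NGC rows of Table 2 compare favorably with the Pearson rows on most entries.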

Table 3. Comparative experimental results of different models.

Target Variables	Metrics	GATv2	GCN	DeepGCNs	GraphUNet	CNN–LSTM	NGC–EGAT
YS	RMSE	3.803	3.903	3.035	3.486	4.764	2.926
	$R^{2}$	0.878	0.871	0.922	0.897	0.831	0.930
	MAE	2.704	2.674	1.921	2.407	2.920	1.859
TS	RMSE	3.135	3.086	2.658	3.122	4.673	2.501
	$R^{2}$	0.916	0.919	0.940	0.917	0.839	0.947
	MAE	1.969	2.066	1.637	2.117	3.554	1.518
EL	RMSE	0.712	0.699	0.642	0.712	1.221	0.596
	$R^{2}$	0.957	0.959	0.965	0.957	0.884	0.970
	MAE	0.482	0.480	0.432	0.497	0.930	0.356

Table 4. Ablation experiment results.

Methods	Target Variables	RMSE	$R^{2}$	MAE
NGC–GCN	YS	3.903	0.871	2.674
	TS	3.086	0.919	2.066
	EL	0.699	0.959	0.480
NGC–GAT	YS	3.827	0.876	2.466
	TS	3.029	0.922	1.911
	EL	0.685	0.960	0.430
NGC–EGAT	YS	2.926	0.930	1.859
	TS	2.501	0.947	1.518
	EL	0.596	0.970	0.356
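
The size of each ablation effect is easier to judge as a relative change. The sketch below computes the percentage reduction in RMSE between variants using only the RMSE values listed in Table 4; the helper name `relative_reduction` is illustrative.

```python
# RMSE values copied from Table 4.
rmse_table4 = {
    "NGC-GCN": {"YS": 3.903, "TS": 3.086, "EL": 0.699},
    "NGC-GAT": {"YS": 3.827, "TS": 3.029, "EL": 0.685},
    "NGC-EGAT": {"YS": 2.926, "TS": 2.501, "EL": 0.596},
}


def relative_reduction(baseline, improved):
    """Percentage reduction of each RMSE relative to the baseline."""
    return {
        key: 100.0 * (baseline[key] - improved[key]) / baseline[key]
        for key in baseline
    }


# Effect of replacing GCN with GAT, then of adding Node2vec embeddings.
gat_vs_gcn = relative_reduction(rmse_table4["NGC-GCN"], rmse_table4["NGC-GAT"])
egat_vs_gat = relative_reduction(rmse_table4["NGC-GAT"], rmse_table4["NGC-EGAT"])

for target in ("YS", "TS", "EL"):
    print(
        f"{target}: GAT vs GCN {gat_vs_gcn[target]:.1f}%, "
        f"EGAT vs GAT {egat_vs_gat[target]:.1f}%"
    )
```

On these figures, swapping GCN for GAT lowers RMSE by roughly 2% for each target, while adding the node embeddings lowers it by roughly 13% to 24%, so the embedding step accounts for most of the gain in this ablation.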

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

