*3.1. Data-Driven Model*

The Scikit-learn package is used to build five data-driven models: LR, RF, ridge regression (RR), DT, and a BP neural network. Cross-validation is used to verify model accuracy and obtain the best output results. Random functions were used to divide the data samples into a training set (60%, 154 samples) and a testing set (40%, 102 samples). The procedure is as follows. First, the data are grouped and numbered using a random function. The training and testing sets are then selected at random from the numbered samples. Finally, the computational accuracy of each model is evaluated with three statistical performance indicators: the coefficient of determination (R<sup>2</sup>), the root mean square error (RMSE), and the mean absolute error (MAE).

R<sup>2</sup> measures how well the proposed model fits the experimental data. It typically lies between 0 and 1. The closer it is to 1, the better the model fits the experimental data, and the further below 1, the poorer the fit. RMSE is also commonly used as the cost function that ML algorithms minimize during training. Larger RMSE and MAE values indicate a poorer fit to the experimental data and lower model accuracy. Table 1 lists the training and testing results of each model.
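The workflow described above (five models, a random 60/40 split, cross-validation, and R<sup>2</sup>/RMSE/MAE evaluation) can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the authors' code: the feature matrix and target are synthetic placeholders because the paper's inputs are not given here, `MLPRegressor` (a multilayer perceptron trained by backpropagation) stands in for the BP neural network, and the hyperparameters, 5-fold cross-validation, and random seeds are illustrative choices. Passing `test_size=102` to `train_test_split` reproduces the 154/102 split reported in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# ASSUMPTION: synthetic placeholder data (256 = 154 + 102 samples, 4 features).
# Replace X and y with the real corrosion-factor inputs and targets.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(256, 4))
y = 2.0 * X[:, 0] - X[:, 1] + np.sin(3.0 * X[:, 2]) + 0.1 * rng.normal(size=256)

# Random split: an integer test_size gives exactly 102 test and 154 training samples.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=102, random_state=42)

# The five model types named in the text; hyperparameters are illustrative assumptions.
models = {
    "LR": LinearRegression(),
    "RR": Ridge(alpha=1.0),
    "DT": DecisionTreeRegressor(random_state=0),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    # ASSUMPTION: MLPRegressor with standardized inputs as a stand-in for the BP network.
    "BP": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
    ),
}


def evaluate(model, X_part, y_part):
    """Return R2, RMSE and MAE of a fitted model on one data subset."""
    pred = model.predict(X_part)
    return {
        "R2": r2_score(y_part, pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_part, pred))),
        "MAE": float(mean_absolute_error(y_part, pred)),
    }


results = {}
for name, model in models.items():
    # 5-fold cross-validated R2 on the training set only (test set kept unseen).
    cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2").mean()
    model.fit(X_train, y_train)
    results[name] = {
        "cv_R2": float(cv_r2),
        "train": evaluate(model, X_train, y_train),
        "test": evaluate(model, X_test, y_test),
    }
    print(name, results[name])
```

Because RMSE squares each residual before averaging, it is always at least as large as MAE on the same data. A large gap between the two points to a few large errors rather than uniformly moderate ones.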

**Table 1.** Calculation results of each data-driven model.


As shown in Table 1, on the training set the BP neural network model has the highest R<sup>2</sup> (0.868) and the lowest RMSE (0.845) and MAE (0.327), indicating that it is the most suitable model for simulating the diffusion coefficient of corrosion factors and the concentration of surface corrosion factors. The RF model also performs well, with R<sup>2</sup>, RMSE, and MAE values of 0.834, 0.924, and 0.375, respectively, on the testing set. These results indicate a strong correlation between the measured and predicted values of the diffusion coefficient of corrosion factors and the concentration of surface corrosion factors. The LR model performs worst, with the lowest R<sup>2</sup> (0.638) and the highest MAE (0.813). Taken together, this analysis shows that the BP neural network gives the best simulation of the diffusion coefficient of corrosion factors and the concentration of surface corrosion factors, with the highest R<sup>2</sup> and the lowest MAE and RMSE. However, the BP neural network model is prone to becoming trapped in local minima and has poor stability, which slows convergence and ultimately reduces model accuracy. The BP neural network model therefore needs to be improved to increase its accuracy.
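The local-minimum problem mentioned above can be illustrated with a toy example. The sketch below is not the paper's BP network. It runs plain gradient descent, the update rule on which backpropagation training is built, on a hypothetical one-dimensional non-convex loss f(w) = w^4 - 3w^2 + w. This loss has a global minimum near w = -1.30 and a shallower local minimum near w = 1.13. The function names, learning rate, and number of steps are illustrative assumptions.

```python
def loss(w: float) -> float:
    """Hypothetical non-convex loss: global minimum near w = -1.30, local minimum near w = 1.13."""
    return w**4 - 3 * w**2 + w


def grad(w: float) -> float:
    """Analytical derivative of loss(w)."""
    return 4 * w**3 - 6 * w + 1


def gradient_descent(w0: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Plain gradient descent from the initial weight w0 (illustrative settings)."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w


# Same loss and update rule, two different starting points.
for w0 in (-2.0, 2.0):
    w_final = gradient_descent(w0)
    print(f"start {w0:+.1f} -> w = {w_final:+.4f}, loss = {loss(w_final):+.4f}")
```

Starting from w = -2, the iteration reaches the global minimum. Starting from w = 2, it settles in the shallower local minimum, where the gradient is zero and plain gradient descent cannot escape. This sensitivity to initialization is one reason BP training can give different results from run to run.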
