1. Introduction
Silicon content in hot metal is deeply intertwined with the operational efficiency and quality assurance of blast furnaces (BFs). The BF process involves injecting hot gas that reduces the iron ore in the burden descending from the top of the furnace. The molten products (generally referred to as hot metal) are collected at the furnace hearth along with molten oxides, which form a second liquid phase of slag. Hot metal and slag are tapped from the furnace hearth and diverted into separate streams for steelmaking and other processing. This complex system necessitates a sensitive balance for optimal product yield and quality [1], and it demands careful control over internal conditions, further compounded by the need for improved energy efficiency and emissions control [2].
The level of silicon dissolved in the hot metal, introduced as part of the charged raw materials or fuels, is an important quality metric and often correlates strongly and positively with the energy present in the high-temperature regions of the furnace. Low or falling Si levels may warn operators of approaching stability concerns such as abnormal nodulation, excessive skull formation, and potentially even industrial accidents. Rising or high Si levels may indicate unnecessary heat generation (through wasted coke combustion) and poor energy efficiency, in addition to adverse downstream impacts [3]. Ascertaining Si content, therefore, has significant implications for energy efficiency, environmental impact, and process economics. Furthermore, through these correlations, Si content also has implicit links with other blast furnace parameters and conditions such as fuel ratios, coal injection amount, air volume, air temperature, coke load, and smelting slag. Since the hostile environment within the furnace precludes direct temperature measurements, silicon content in the hot metal in the hearth emerges as a key proxy measure, offering a tangible link to furnace conditions and health.
However, determining silicon content directly involves significant time delays: the blast furnace responds slowly to operator inputs when judged through hot metal composition and temperature measurements, and the laboratory analysis of those measurements adds further lag, which makes direct measurements challenging to use for real-time control. Consequently, first-principles models, coupled with domain knowledge of heat conduction, energy balances, and chemical equations, have been deployed to infer silicon content. However, this approach can require significant computation time and may not be accurate enough to deterministically model and account for the complex range and magnitude of processes taking place. Additionally, several drivers are inherently stochastic or poorly characterized, including, but not limited to, unknowns and uncertainties in charged material composition, transient conditions in the high-temperature zones of the furnace, and other influencing factors that are not easily captured in static models.
While the fundamental mechanisms behind the process have been established, the limiting factors outlined above encourage BF operators to explore methods that can predict silicon content in advance. To address this challenge, various data-driven models have been developed, and these share a rich history of implementation alongside physics-based models. Such approaches involve sophisticated data pre-processing, feature selection, and optimization techniques [3,4,5], as demonstrated by numerous researchers [6,7,8,9]. They range from traditional statistical methods to contemporary machine learning approaches, which seek to improve accuracy and speed as their prime objectives.
Among earlier works, Liu employed Bayesian networks, while Gao explored chaos theory to develop a chaotic prediction method around 2005 [10,11]. Jian utilized radial basis functions and support vector machines (SVMs), and Gao applied fuzzy SVMs for control limit determination around 2008. Bhattacharya employed partial least squares, while Liu and Zeng used dimensionality reduction techniques such as principal component analysis around 2009. In 2011, Jian adopted the smooth support vector regression (SVR) method for predicting silicon content in molten iron. Time series analysis methods, proposed by Waller and Saxén [12], were also explored. Although these traditional methods showed promise and decent accuracy under specific conditions, they struggled to incorporate large datasets or capture the complexity of reactions influencing silicon yield.
As machine learning became more feasible, more complex models gained traction. Wang improved predictions in 2014 using random forests with numerous features. In 2015, Particle Swarm Optimization combined with SVRs enhanced convergence speed and parameter optimization [13]. By 2017, shallow neural networks such as Extreme Learning Machines (ELMs), combined with outlier detection and feature selection, achieved notable accuracy [14,15]. Later, XGBoost and LightGBM outperformed traditional models, while simple Multi-Layer Perceptrons (MLPs) proved effective, lightweight, and fast. Techniques such as Gray Relational Analysis and fuzzy clustering, integrated with neural models, enabled effective feature selection and identification of time lags for dynamic predictions under fluctuating furnace conditions [16,17]. Overall, MLPs and deep neural networks further improved accuracy with significant speed gains.
However, most of these works still do not incorporate an expansive set of furnace parameters and state variables, relying instead on only a select few. One important reason is the overhead involved in establishing automated data processing pipelines. While domain knowledge necessitates manual intervention and judgment, many features could be automatically processed and filtered, potentially leading to better accuracy. As a result, there exists a rich history of predictive modeling work that spans traditional statistical models and contemporary machine learning models. These models have achieved good accuracy and speed but often focus on a specific subset of features, typically requiring significant manual deliberation, which may limit further improvement in accuracy. A similar practice is observed in predicting other industrial process variables, such as carbon brick temperature [18], in addition to silicon content.
The limitations of previous works are addressed through the objectives and contributions of this current effort, which are twofold: (1) a generalized yet adaptable automated data processing pipeline and (2) an effective ML modeling, tuning, and selection framework. Existing modeling approaches in the literature often lack a comprehensive and generalized data processing pipeline. Such a pipeline is essential to lower data requirements and costs, reduce corrupted inputs, accelerate convergence, and improve model accuracy. Effective data processing, particularly in identifying anomalies and removing collinearities, is arguably more critical than modeling for physical processes, because real-time process data often lack the accuracy and standardization typically present in the preprocessed datasets used in ML-driven research. Consequently, the greatest performance gains may stem from robust data processing. Furthermore, the pipeline needs to be as model-agnostic as possible to support diverse model training and selection, enabling the identification and fine-tuning of the best predictive models for various purposes, such as silicon content prediction [17]. Currently, the absence of a comprehensive yet versatile data processing pipeline, combined with the lack of a broad modeling framework, limits the accuracy and broader adoption of data-driven modeling for silicon content prediction. Our pipeline addresses these gaps by integrating both data processing and modeling modules to streamline the predictive process, making it applicable across different sites and process variables and improving predictive capabilities. Improved accuracy in predicting silicon content can help furnace operators adjust parameters effectively, thereby maintaining production yield while reducing operating time, cost, and energy consumption.
3. Results
3.1. Processed Dataset for Evaluation
The data processing pipeline formats the data, filters it appropriately, performs relevant feature selection, and standardizes it, yielding a dataset of about 1800 samples. Each sample contains the compressed information on the furnace condition and state pertaining to the respective cast. Each casting period takes approximately 2.5 h, and thus the dataset spans roughly six months of standard furnace operations. The comprehensiveness in both the quality of furnace information contained in the dataset and the time span of the operations makes it suitable for performing tests and evaluations. For this purpose, a train–test split of 85–15%, or 1560 vs. 240 samples, via random selection is performed to obtain our corresponding training and test sets.
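As a minimal illustration of this evaluation setup (the file name and column names below are hypothetical placeholders for the pipeline's output, not the actual schema), the split can be reproduced along the following lines:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical output of the data processing pipeline: one row per cast,
# with processed/standardized furnace features plus the target silicon content.
df = pd.read_csv("processed_casts.csv")     # ~1800 samples; file name is illustrative
X = df.drop(columns=["si_content"])         # furnace parameters and state variables
y = df["si_content"]                        # cast-average silicon content (target)

# 85-15 random train-test split, mirroring the evaluation setup described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42
)
```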
3.2. Accuracy
On the test set, using prior shift-based data, this approach achieves 91% accuracy, or a 0.065% expected error with respect to the actual silicon content (%). Concomitantly, a five-fold cross-validation is performed, owing to the relatively low number of samples, which yields accuracies in a tight range of 87% to 92% across the five folds. This validates the findings by reducing the likelihood of an overfitted model or of a ‘lucky’ test split that fortuitously yielded good results.
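The five-fold cross-validation can be sketched as follows. The accuracy metric shown (one minus the mean absolute error relative to the mean silicon content) is an assumption made for illustration rather than the exact definition used in this study, and X and y refer to the feature matrix and target from the previous sketch.

```python
import numpy as np
from sklearn.model_selection import KFold
from xgboost import XGBRegressor

def relative_accuracy(y_true, y_pred):
    # Assumed metric: 1 - MAE / mean(actual silicon content); illustrative only.
    mae = np.mean(np.abs(y_true - y_pred))
    return 1.0 - mae / np.mean(y_true)

scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    preds = model.predict(X.iloc[val_idx])
    scores.append(relative_accuracy(y.iloc[val_idx].to_numpy(), preds))

print([round(s, 3) for s in scores])   # expected to lie in a narrow band (~0.87-0.92)
```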
Moreover, this accuracy is also a lower bound on the model’s accuracy, as only the past shift’s aggregated data (which have a guarantee of availability) are utilized to predict the next shift’s silicon content output. Further improvements in accuracy, requiring minimal model changes, might be achieved by incorporating real-time features. While other features, such as hot metal chemistry, have a lag, real-time features from sensor data such as blast moisture, blast temperature, coke rate, and natural gas injection rates can readily improve performance, as reflected in the feature importance scores for such features in Table 2. At the same time, some of these features exhibit high variability in real time, information that may not be known to the model at prediction time and can thus impact prediction accuracy. Hence, these same features hold the key to improving performance, but exploiting them may be rendered difficult by furnace data processing and provisioning logistics in real time. Apart from its accuracy, which still has potential for improvement, the model is also computationally feasible on low-spec machines: it occupies minimal memory space, 1.1 MB, and can provide a prediction in less than 1 ms.
The accuracy of 91%, or an expected error bound of 0.065% in silicon content against a 0.911% mean, is highly promising. It is also reasonable to ascertain that a good portion of these gains can be attributed to the data processing pipeline. This assertion is corroborated in Figure 11, where the accuracy of this approach is compared with those obtained from other models, such as sequential neural network models or SVM models. Cases are also studied without applying the data processing pipeline, that is, with only basic formatting and data clean-up. It can be observed that even the more complex neural network model’s accuracy is lower than that of our Bayesian-optimized XGBoost model, whose hyperparameter tuning depends on the quality of the processed data. Additionally, the pipeline results are compared with a commercial neural network modeling software, Neuroshell (v 3.0.0.1), which has inbuilt data processing and hyperparameter optimization. In this case as well, the proposed model’s accuracy remains superior. This suggests that in an industrial setting with hundreds of parameters, where manual selection becomes infeasible or limiting, the selection pipeline can help identify the best reduced feature set and yield a better predictive model.
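For reference, Bayesian-style hyperparameter tuning of an XGBoost regressor could look like the sketch below. Optuna is chosen here purely as an illustrative optimizer, and the search space, trial budget, and objective are assumptions rather than the settings used in this study; X_train and y_train are as defined in the earlier split sketch.

```python
import optuna
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

def objective(trial):
    # Hypothetical search space; the study's actual ranges are not reproduced here.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "max_depth": trial.suggest_int("max_depth", 3, 8),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "subsample": trial.suggest_float("subsample", 0.6, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.6, 1.0),
    }
    model = XGBRegressor(**params)
    # Minimize cross-validated mean absolute error on the training set.
    neg_mae = cross_val_score(
        model, X_train, y_train, scoring="neg_mean_absolute_error", cv=5
    ).mean()
    return -neg_mae

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```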
Alongside internal comparisons, this model’s results and accuracy are in line with the prediction accuracies and error bounds reported in the literature. For instance, Song’s [22], Cui’s [4], and Diniz’s [5] works exhibit error bounds of 0.13% (mean silicon content 0.5%), 0.1252% (mean silicon content 0.55%), and 0.05% (mean silicon content 0.45%), respectively, against our 0.065% (mean silicon content 0.911%). Direct comparisons are not possible because the individual datasets in those studies are not publicly available, in part due to the possibility of disclosure of sensitive industrial data. Nevertheless, the approach detailed herein holds promise. The generality and adaptability of the pipeline make it suitable for diverse industrial process data, for data refinement, and for attaining accurate models for various operational states, outputs, or variables of interest besides hot metal silicon [18,23].
3.3. Robustness and Reliability
Furthermore, additional analysis of the results was conducted to determine robustness and interpretability. The histogram plot in Figure 12 shows the error deviation and that the residuals are approximately centered. This indicates a lack of bias in the predictions, which is typically a good sign with respect to overfitting. However, the lack of a large test set means that outliers are highlighted slightly more. The cross-validation accuracy, which is consistently around the 90–91% mark, serves to reduce this effect and provides further evidence of robustness.
Complementing this, the time series plot in Figure 13 provides another perspective on the model’s performance. Specifically, predictions are performed on approximately 220 samples from the test set. The inputs to the model for each sample are the processed furnace parameters and state variables available from the previous cast; the model in turn predicts the silicon content of the current cast. The model’s predictions align closely with the actual reported silicon content (%), including the peaks and troughs of silicon content production. The values reported are standardized and can hence be negative. The standardized silicon content has a linear relation with the actual silicon content and thus does not affect evaluation metric scores such as the accuracy percentage.
The reliance on historical training data means the model may require periodic retraining to address data staleness. This need can also arise due to changes in furnace operation parameters or structural modifications. To maintain accuracy and robustness, continuous evaluation of the model is essential, focusing on amortized accuracy over time rather than isolated periods of poor performance. When a consistent decline in accuracy or input data drift is observed, definitive action can be taken. Operators can provide real-time feedback on poorly performing cases, enabling the model to adapt by adding subtrees to handle these examples. However, given the low resource intensity of the current model, retraining on the entire dataset in a batch setting is generally more advisable. Techniques such as sample weighting can also be employed to prioritize critical cases during retraining.
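As a minimal sketch of the batch retraining with sample weighting mentioned above (the weighting scheme, the flagged indices, and the recency window below are purely illustrative assumptions, not settings prescribed by this study):

```python
import numpy as np
from xgboost import XGBRegressor

flagged_idx = np.array([12, 57, 103])   # hypothetical indices of operator-flagged casts
recent_n = 200                          # hypothetical window of most recent casts

# Up-weight operator-flagged cases and mildly up-weight recent casts to
# counteract data staleness before batch retraining on the full dataset.
weights = np.ones(len(X_train))
weights[flagged_idx] *= 3.0
weights[-recent_n:] *= 1.5

model = XGBRegressor(n_estimators=400, max_depth=4, learning_rate=0.05)
model.fit(X_train, y_train, sample_weight=weights)
```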
3.4. Interpretability
Interpretability, unlike in typical deep learning models, is one of XGBoost’s desirable qualities, especially in industrial settings where there is a crucial need to understand the actual factors impacting production quantity rather than simply predicting the quantity itself. To this end, XGBoost provides an internal ranking of feature importance, or “gain scores”, for each input parameter. Gain in XGBoost measures the improvement in loss reduction (how far off predictions are from the ground truth silicon content during training) due to a feature split, as illustrated in Figure 10, without directly accounting for interactions with other features. It is also pertinent to note that these scores are not linear in nature. Mathematically, these are the “gains” accumulated during XGBoost training, where each feature-based split reduces the training loss and thereby indicates whether that feature was useful for fitting the model to the output silicon values in the training set. Moreover, the model does not have a concept of domain knowledge, so these features should be seen as complex correlation factors rather than actual causation factors.
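For illustration, the gain-based ranking can be read directly from a fitted model; in the sketch below, `model` is the trained XGBoost regressor from the earlier sketches, and the code is a sketch rather than the exact analysis script.

```python
import pandas as pd

# Per-feature total "gain": the summed loss reduction from all splits on that feature.
gain_scores = model.get_booster().get_score(importance_type="gain")

ranking = pd.Series(gain_scores, name="gain").sort_values(ascending=False)
print(ranking.head(10))   # top-ranked features, analogous to Table 3
```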
In Table 3, a mix of such variables can be observed. For instance, cast average titanium is not expected to have any causal relation with silicon content, but it may be correlated with it in some way; the XGBoost model identifies this potential correlation. However, there are other features indicated by the model that align well with those established by theory and empirical observation as causal variables, such as hot blast temperature, moisture, and cast temperature. These are expected variables from the physical principles that establish the link between the predicted Si content in the hot metal and the available energy in the high-energy zone of the furnace, as noted in Section 1.
Figure 14 shows SHapley Additive exPlanations (SHAP) values, a widely utilized method for ascertaining the impact of each feature and aiding interpretability. SHAP measures the marginal contribution of a feature by considering all possible subsets of features. It thus provides another view of the “impact” of each feature, mainly the global standalone contribution of each feature. For instance, it can be observed that cast average titanium has a positive correlation with silicon content, whereas coke rate only appears to have an impactful positive correlation when its value is high; a low coke rate does not seem to lead to low silicon content, at least not with a similar impact. Natural gas injection shows the opposite effect compared to coke rate. Finally, the “mca__0” feature, a compressed sum of categorical variables such as SiO2 content, does not have a high SHAP value despite its high gain value (which rewards critical splits) in Table 3. This indicates that mca__0 might have critical importance in a small subset of splits, specifically the outliers with higher individual losses, and that it is highly correlated with other features, hence diluting its standalone SHAP importance value.
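A sketch of how SHAP values like those in Figure 14 can be computed with the shap library is shown below; TreeExplainer supports XGBoost models, and `model` and `X_test` refer to the earlier sketches rather than the study’s actual code.

```python
import shap

# TreeExplainer computes SHAP values efficiently for tree ensembles such as XGBoost.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Summary (beeswarm) plot: per-feature SHAP value distributions coloured by
# feature value, analogous to the view in Figure 14.
shap.summary_plot(shap_values, X_test)
```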
3.5. Error Analysis
It is worthwhile to conduct an error analysis of the high-error cases observed in this study, where a large difference is observed between the model’s predictions and the actual silicon content, in order to assess potential avenues for further improvement. Such an analysis can also reveal properties regarding the nature of the operations, systematic biases, and the stochasticity of the processes. Thus, the current test set is augmented with new samples and segregated into high-error cases and lower-error ones. The high-error cases are defined as those that exhibit a difference of twice the mean expected error, i.e., 0.13%. These come out to approximately 9.1% of the cases, or 107 cases from the augmented dataset of 1180 samples.
Afterwards, the characteristics of the top five most important features and of the silicon content are visualized. Figure 15 presents these characteristics in the form of box plots and the corresponding kernel density estimates (KDEs). The box plots show data statistics such as the median, interquartile range, and outliers, while the KDEs show a non-parametric probability density function (pdf), essentially the estimated distribution, of the relevant features. It is observed that, for the higher-error cases, the natural gas injection rate has a higher mean value with a lower standard deviation, while cast average temperature has a higher standard deviation with a lower mean. However, the overall shapes of their pdfs are consistent, and no definitive pattern emerges to conclude otherwise. The lack of difference in the distributions of the features and the silicon output is somewhat unexpected but still plausible given the lack of systematic bias seen before.
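The segregation and visualization described here could be reproduced along the following lines; the 0.13% threshold follows the text, while the dataset variables (X_aug, y_aug) and the feature column name are hypothetical placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# X_aug, y_aug: augmented evaluation set (hypothetical names); model as in earlier sketches.
errors = np.abs(y_aug - model.predict(X_aug))
high_error = errors > 0.13                     # twice the mean expected error (per the text)

feature = "natural_gas_injection_rate"         # placeholder column name
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.boxplot(x=high_error, y=X_aug[feature], ax=axes[0])                    # box plot by error group
sns.kdeplot(data=X_aug, x=feature, hue=high_error, common_norm=False, ax=axes[1])  # KDEs by group
axes[0].set_xlabel("high-error case")
plt.tight_layout()
plt.show()
```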
Therefore, a time series-based analysis was conducted to observe whether higher errors occur in local clusters. There were indeed definitive patterns in this case, as illustrated in Figure 16, where sample actual silicon values and model predictions for higher-error and lower-error cases in successive casts are juxtaposed. Higher-error predictions generally arise when there is a lot of local variability, noise, and oscillation in silicon content production across successive casts. This could indicate high unexplained stochasticity and noise in the operation of the furnace during the relevant period, not captured by the furnace parameters; thus, the model struggles to provide a good prediction for those samples. Stochasticity in processes is inherent; however, its explainability can be improved by potentially including other information impacting the furnace state not yet covered by the current parameters, such as inter-shaft pressure differentials and estimates of the liquid level and fraction in the hearth from other analysis approaches. Overall, the findings seem to indicate a lack of systematic biases in the model and that high errors are likely caused by stochasticity in the furnace operations during localized periods.
4. Conclusions
In this paper, a comprehensive, generalized data preprocessing pipeline integrated with a robust machine learning (ML) modeling framework was developed and demonstrated for predicting blast furnace (BF) hot metal silicon content. Using the XGBoost model, 91% accuracy and a 0.065% expected error in silicon content were achieved for next-cast average silicon content predictions, with prediction times of under 1 ms, even with relatively stale, historical input data. This ensures data availability for potential deployments without reliance on real-time inputs. Furthermore, robustness analysis (including guidance on when and how to retrain the model) and error analysis confirmed the reliability of the model. These aspects enable operators to accurately gauge silicon content and make informed adjustments to manage energy consumption and mitigate operational risks. Additionally, by controlling silicon effectively, operators can also reduce energy consumption and potentially carbon emissions in downstream steelmaking processes, including the Basic Oxygen Furnace (BOF).
The pipeline’s generality across industrial facilities is intrinsic to the relative consistency of the raw data schema provided by industrial collaborators for integrated steelmaking operations. Typical furnace data processing systems, such as Pi or similar platforms, usually associate furnace parameters or state variables either with a timestamp (e.g., gas rate at a specific time for a particular furnace) or, in more processed cases, with operational events (e.g., cast average zinc content during a casting or ladle process). By specifying linking keys such as “timestamp” or “cast” in the pipeline’s user configuration, the full state of the furnace, including input variables and the output variable to model, can be effectively determined. Our approach is agnostic to the specific data source if the required linking criteria are met. The pipeline can then handle diverse data sources while enabling user-configurable settings for anomaly thresholds (e.g., thermocouple temperature), linking variables (e.g., casts or timestamps), and output variables (e.g., silicon content). Additionally, it dynamically performs adaptive correlations and feature selection based on the modeled output variables [18,23]. Moreover, the pipeline can be easily extended to predict other industrial process variables, such as slag chemistry, with minimal modifications. This adaptability enables rapid development of bespoke models tailored to different sites and operational requirements [18,23], making our approach both efficient and versatile.
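To make the kind of user configuration described above concrete, a hypothetical example is sketched below; the key names, variable names, and threshold values are illustrative assumptions and do not reflect the pipeline’s actual schema.

```python
# Hypothetical pipeline configuration; keys and thresholds are illustrative only.
pipeline_config = {
    "linking_key": "cast",                      # or "timestamp" for time-indexed sources
    "output_variable": "cast_avg_si_content",   # variable to model, e.g., silicon content
    "anomaly_thresholds": {
        "thermocouple_temp_C": (800, 1600),     # readings outside this range are flagged/dropped
    },
    "feature_selection": {
        "collinearity_threshold": 0.95,         # drop one of each highly correlated feature pair
        "min_target_correlation": 0.05,         # drop features weakly related to the output
    },
}
```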
Despite the lack of standardized data for benchmarking, the pipeline developed in this study represents a significant advancement. The ability to seamlessly integrate features on an industrial scale while remaining flexible to changing process variables is as yet undocumented in the current literature for ironmaking applications. This novelty not only enhances prediction accuracy but also offers adaptability in creating and selecting bespoke models, providing a pathway for future advancements in predictive modeling for industrial processes.
An evident area for enhancing the predictions generated by this pipeline is the integration of a powerful deep learning sequential model to incorporate time-based information. By extending the approach to yield extended forecasts and not merely short-term predictions, blast furnace operators can be offered valuable insights for longer-term planning and stability. Additionally, incorporating real-time process data into the current model—which currently relies on historical data—could further elevate the model’s capability. This would enable more accurate real-time silicon content predictions, empowering operators to make swift adjustments for immediate operational changes. This enhanced schema, considering the inherent complexity and stochasticity of blast furnace operations, likely holds the potential to deliver the most accurate and actionable predictions.