Article

Solid Oxide Fuel Cell Voltage Prediction by a Data-Driven Approach

by Hristo Ivanov Beloev ¹, Stanislav Radikovich Saitov ², Antonina Andreevna Filimonova ², Natalia Dmitrievna Chichirova ², Egor Sergeevich Mayorov ², Oleg Evgenievich Babikov ² and Iliya Krastev Iliev ³,*
¹ Department Agricultural Machinery, “Angel Kanchev” University of Ruse, 7017 Ruse, Bulgaria
² Department Nuclear and Thermal Power Plants, Kazan State Power Engineering University, 420066 Kazan, Russia
³ Department of Heat, Hydraulics and Environmental Engineering, “Angel Kanchev” University of Ruse, 7017 Ruse, Bulgaria
* Author to whom correspondence should be addressed.
Energies 2025, 18(9), 2174; https://doi.org/10.3390/en18092174
Submission received: 28 March 2025 / Revised: 16 April 2025 / Accepted: 22 April 2025 / Published: 24 April 2025
(This article belongs to the Section F5: Artificial Intelligence and Smart Energy)

Abstract

A solid oxide fuel cell (SOFC) is an electrochemical energy conversion device that provides higher thermoelectric efficiency than traditional cogeneration systems. Current research in this field highlights a variety of mathematical models. These models are based on complex physicochemical and electrochemical reactions, enabling accurate simulation and optimal control of fuel cells. However, they require substantial computational resources, leading to long processing times, so white-box and gray-box models are unable to achieve real-time optimization of control parameters. A potential solution involves using data-driven machine learning (ML) black-box models. This study examines three ML models: artificial neural network (ANN), random forest (RF), and extreme gradient boosting (XGB). The training dataset consisted of experimental results from SOFC laboratory experiments, comprising 32,843 records with 47 control parameters. The study evaluated the effectiveness of input matrix dimensionality reduction using the following feature importance evaluation methods: mean decrease in impurity (MDI), permutation importance (PI), principal component analysis (PCA), and Shapley additive explanations (SHAP). The application of ML models revealed a complex nonlinear relationship between the SOFC output voltage and the control parameters of the system. The default XGB model achieved the optimal balance between accuracy (MSE = 0.9940) and training speed (τ = 0.173 s/it), with performance that enables real-time enhancement of SOFC thermoelectric characteristics during system operation.

1. Introduction

Solid oxide fuel cells (SOFCs) have attracted significant research attention over the past decades due to their capability to produce electricity and high-grade heat from hydrogen or natural gas [1,2]. This cogeneration system has a higher thermoelectric efficiency than traditional and alternative generation systems [3]. This makes SOFCs a sustainable, green technology aimed at reducing carbon emissions. They also have great potential for use in distributed generation systems [4,5] and new vehicles [5,6].
SOFC performance depends on numerous operational parameters, so long-term testing and substantial financial resources for experimental research are required to ensure stable, safe, and cost-effective operation of these systems [7,8]. The application of mathematical modeling significantly reduces both the time and financial resources required for determining optimal operating parameters [4].
Currently, there are numerous mathematical modeling methods with varying levels of complexity, ranging from zero-dimensional models to three-dimensional computational fluid dynamics (CFD) models. A brief overview of these models can be found in the works of Mütter F. et al. [4] and Huo W. et al. [8]. Multidimensional models (2D and 3D) have higher accuracy but require significantly greater computational resources [4,5]. Fuel cells rely on complex multiphysical processes involving heat and mass transfer, electrochemical reactions, and electron conductivity. Therefore, researchers have to introduce simplifications into mathematical models [5,9,10], which reduces their predictive accuracy [7].
Machine learning (ML) methods can be used to overcome these limitations. Working directly with experimental data, they simplify and accelerate computations while maintaining accuracy in predicting SOFC performance and lifespan [11]. These methods are easy to implement and often have higher accuracy and wider applicability than mathematical models [5]. Surrogate models trained on SOFC data can perform most calculations within one second, whereas physical CFD models can take hundreds of hours for the same calculations [11]. Replacing large numbers of repeated experiments with ML model runs also substantially reduces costs [12].
The aim of this study was to identify the most significant features affecting SOFC performance and to quantify their relative contributions to the SOFC output voltage, which would aid in optimizing the system’s thermoelectric efficiency. The work is structured as follows: Section 2 analyzes existing applications of ML methods to SOFCs, evaluates the current state of research in this field, and briefly summarizes the most common ML models and evaluation metrics. Section 3 describes the SOFC laboratory setup, the experimental conditions, and the structure of the obtained monitoring data; it also presents a multistage ML model investigation scheme with specified hyperparameter optimization ranges. Section 4 presents the feature importance analysis, highlights the most significant factors for predicting SOFC output voltage, and compares the accuracy of the ML models. Concluding remarks are given in Section 5.

2. Current State of the Research Field

Table 1 provides a review of recent ML-based approaches for SOFC modeling. These models can generally be divided into two categories: classification and regression models.
Classification models evaluate SOFC states using operational parameters, detecting conditions such as normal state, air leaks in the air supply manifold, flooding failure in the stack, etc. [33]. Furthermore, these models can be subdivided into binary [9] and multiclass [27,32,33,35,36] classification models.
The performance and accuracy of classification models are most commonly evaluated using the following metrics:
  • precision (p), as follows:
$$p = \frac{TP}{TP + FP} \tag{1}$$
  • recall (r), as follows:
$$r = \frac{TP}{TP + FN} \tag{2}$$
  • accuracy (A), as follows:
$$A = \frac{TP + TN}{TP + FN + FP + TN} \tag{3}$$
  • F1-score, as follows:
$$F_1 = \frac{2pr}{p + r} \tag{4}$$
where TP and FP are the numbers of true and false positives, respectively, and TN and FN are the numbers of true and false negatives, respectively.
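The four classification metrics defined above reduce to a few lines of arithmetic on confusion-matrix counts. The following is a minimal Python sketch; the counts in the usage line are hypothetical and only illustrate a fault-detection setting.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Precision, recall, accuracy and F1-score from confusion-matrix counts."""
    p = tp / (tp + fp)                    # precision: share of flagged cases that are real
    r = tp / (tp + fn)                    # recall: share of real cases that were flagged
    a = (tp + tn) / (tp + fn + fp + tn)   # accuracy: share of all correct decisions
    f1 = 2 * p * r / (p + r)              # harmonic mean of precision and recall
    return {"precision": p, "recall": r, "accuracy": a, "f1": f1}


# Hypothetical counts: 40 faults detected, 10 false alarms,
# 45 normal states recognized, 5 faults missed
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(m)
```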
In regression models, input parameters are used to predict continuous output features such as power density, current density, voltage, and others. These models are also subdivided into statistical (LR, LogR, PR, ARIMA, VARMA, etc.), traditional ML (RG, GB, SVM, etc.) and deep learning (DL) (ANN: MLP, LSTM, RNN, etc.) [5]. The performance of DL methods continues to improve as the amount of training data increases [40].
The performance and accuracy of regression models are most commonly evaluated using the following metrics:
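  • mean absolute error (MAE), as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|R_i - P_i\right| \tag{5}$$
  • mean absolute percentage error (MAPE), as follows:
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{R_i - P_i}{R_i}\right| \tag{6}$$
  • symmetric mean absolute percentage error (SMAPE), as follows:
$$\mathrm{SMAPE} = \frac{100}{n}\sum_{i=1}^{n}\frac{\left|R_i - P_i\right|}{\left(\left|R_i\right| + \left|P_i\right|\right)/2} \tag{7}$$
  • mean squared error (MSE), as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(R_i - P_i\right)^2 \tag{8}$$
  • root mean square error (RMSE), as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(R_i - P_i\right)^2} \tag{9}$$
  • normalized root mean square error (NRMSE), as follows:
$$\mathrm{NRMSE} = \frac{1}{\bar{R}}\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(R_i - P_i\right)^2} \tag{10}$$
  • normalized mean error (NME), as follows:
$$\mathrm{NME} = \frac{1}{n\bar{R}}\sum_{i=1}^{n}\left|R_i - P_i\right| \tag{11}$$
  • coefficient of determination (R2), as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(R_i - P_i\right)^2}{\sum_{i=1}^{n}\left(R_i - \bar{R}\right)^2} \tag{12}$$
where $R_i$ and $P_i$ are the measured and predicted values, respectively; $\bar{R}$ is the sample average of the measured values; n is the validation sample size.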
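The regression metrics defined above can be computed together with NumPy, as in the sketch below. MAPE is returned as a fraction (multiply by 100 for percent), and the R² denominator is the standard total sum of squares, the same definition used by Scikit-learn's r2_score. The three-point sample in the usage lines is made up for illustration.

```python
import numpy as np
import numpy.typing as npt


def regression_metrics(measured: npt.ArrayLike, predicted: npt.ArrayLike) -> dict[str, float]:
    """MAE, MAPE, SMAPE, MSE, RMSE, NRMSE, NME and R2 for measured R and predicted P."""
    r = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = r - p
    r_bar = r.mean()                      # sample average of the measured values
    mse = np.mean(err ** 2)
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(err / r))),          # fraction, not percent
        "SMAPE": float(100 * np.mean(np.abs(err) / ((np.abs(r) + np.abs(p)) / 2))),
        "MSE": float(mse),
        "RMSE": float(np.sqrt(mse)),
        "NRMSE": float(np.sqrt(mse) / r_bar),
        "NME": float(np.sum(np.abs(err)) / (len(r) * r_bar)),
        "R2": float(1 - np.sum(err ** 2) / np.sum((r - r_bar) ** 2)),
    }


# Made-up voltages (V): measured vs. predicted
m = regression_metrics([20.0, 22.0, 24.0], [21.0, 22.0, 23.0])
print(m)
```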
Comparative analysis of regression model efficiency in Table 1 is challenging because authors do not follow a unified approach to choosing metrics. However, in most cases the MAPE did not exceed 5%, while the R2 score was most often close to 1. Thus, black-box models (ML and DL) are often more accurate and efficient than physical and statistical models (white-box and gray-box models). Other authors came to similar conclusions [5,10,14].
Table 1 indicates that DL models (MLP, LSTM, BP, etc.) generally perform better according to metrics (5)–(12). Traditional ML models (RF, GB, SVM, etc.) also show robust and stable results, even on large data samples. For example, in [8,32,36], when analyzing samples consisting of ~10 × 10⁶ records, ML models were found to be more efficient than DL models. This finding partially contradicts the conclusion of Ming and Sun [40] that DL performance keeps improving as training data grows, so that traditional ML models outperform DL models only when training data is limited.
At the same time, statistical models (ARIMA, LR, MLR, etc.) showed worse performance than ML and DL models. This occurs because strongly related features (for example, power W and electrical efficiency ηE) are not always linearly correlated with each other, so their relationships cannot always be reliably described by linear approximations [3]. As exceptions, we would like to highlight Golbabaei M.H. et al. [7] (GP); Hou D., Wu X. et al. [18] (NLARX); and Huo H., Li X. et al. [5] (N-BEATS), where traditional statistical models outperformed ML and DL approaches in predictive accuracy or computational efficiency.
The accuracy and performance of classification and regression models depend directly on the size of the training matrix, which is determined by the number of records (rows) and the number of input parameters (columns). A large number of input parameters significantly reduces computational speed, and such models often do not achieve the best metrics (for example, in [8,12,13,25]). Conversely, excessive reduction of input features may lose critical relationships, and the resulting model will poorly match the actual operating characteristics of the SOFC. According to Table 1, the median number of input parameters used in the reviewed models was 5, while the average was 11.
Optimization of the model input features is most often performed using the following tools:
  • Correlation matrix (Pearson, Kendall tau, Spearman) [3,7,16];
  • Covariance ranks matrix [7];
  • Shapley additive explanations (SHAP) [13];
  • Sequential forward selection (SFS) [6,14] or sequential backward selection (SBS);
  • Principal component analysis (PCA) [32,36];
  • Mean decrease in impurity (MDI) [7,8,12];
  • Permutation importance (PI).
In addition, some researchers also reduce the volume of input data. For example, Rao M., Wang L. et al. [6] and Li X., Wu J. et al. [4] reduced the experimental dataset from 629,873 to 10,323 records, and Sheng C., Fu J. et al. [25] reduced the sample size from 240,000 to 800 records. Thus, based on Table 1, we can conclude that the optimal input matrix size for data-driven models is approximately (10 × 10⁵) × (5–11).
The most frequently predicted continuous target feature is the output voltage (V) generated by the SOFC (Table 1). This is a universal parameter that allows one to evaluate both SOFC performance and operational stability [7]. According to Chen H., Shan W. et al. [30], accurate stack voltage prediction enables optimization of SOFC control and design parameters, which in turn extends the fuel cell service life. As highlighted by Rao M., Wang L. et al. [6], the SOFC output voltage is the most important and valuable state characteristic, outperforming other operational parameters for validating predictive model performance. Jouin M., Bressel M. et al. [41] noted that a drop in cell voltage to a minimum threshold can trigger an automatic shutdown of the entire stack by the safety system; voltage prediction helps prevent such events. Li M. and Wu J. et al. [14] confirmed these findings, noting that a voltage drop to 70% of the nominal value indicates stack failure. Sheng C., Zheng Y. et al. [19] and Wu X.-L., Li Y. et al. [28] also noted that voltage is an indirect indicator of the SOFC state of health.
Following the example of other authors, this study focuses on stack voltage prediction as the most valuable feature for SOFC performance and reliability assessment. Therefore, only ML regression models have been considered in this work.
The scientific novelty of this work lies in the implementation of extreme gradient boosted random forest (XGBRF) modeling—an approach previously unreported in SOFC-related studies.

3. Materials and Methods

3.1. Laboratory Setup and Data Collection

The data for the study were obtained from a laboratory setup (Figure 1), which consisted of natural gas and nitrogen supply manifolds, a flow distribution system, an SOFC, a battery storage unit, and a steam reforming and preheating system. Gas flow rate, temperature, pressure, and electrical load were controlled by specialist software. The fuel cell unit operated in three power output modes (500 W, 1000 W, and 1500 W), with operational ranges of 600–1000 °C for stack temperature and 4.5–15.5 L/min for gas flow rate.
The stack itself contained 27 cells located in the SOFC hot zone (Figure 2).
A schematic diagram of the laboratory setup is shown in Figure 3.
The initial natural gas flow (A1) was fed from the main line to a pressure reduction unit (A2) and then directed to a desulfurizer (A3). Before entering the burner (E2) and heat exchanger (E3), the main flow passed through coarse (A4) and fine (A5) filters and was then split according to the operating mode. Nitrogen (B1) was fed to the heat exchanger in parallel with natural gas. The natural gas and nitrogen mixture (B2) was combined with steam (C1) superheated by exhaust gases from the burner (E2). The resulting gas mixture was fed to the heat exchanger and then to the reformer (E4), where it reached the operating temperature of the steam reforming reaction. The synthesis gas (C2) produced by reforming was fed to the anode chamber of the fuel cell (E5). The purified air (D1), heated in the heat exchanger (E3), was supplied to the cathode chamber of the fuel cell (E5). The reaction products from the cathode and anode chambers (F1), including oxygen and residual methane, were fed to the burner (E2), where the methane was completely oxidized. The resulting exhaust gases (F2) sequentially transferred heat to the incoming flows in the heat exchanger (E3) and in the carburetor (E1).
To obtain recent on-site monitoring data, 10 experiments with a total duration of 80 h were conducted, covering the power generation modes (500, 1000, and 1500 W) as well as the steam reformer commissioning and hot standby modes. During the experiments, the nitrogen supply pressure was varied from 1 to 5 bar in steps of 1 bar to determine the relationship between gas flow rate and SOFC stack heating dynamics. When generating electricity in the various modes, the SOFC was disconnected from the battery storage unit, and an external load with variable resistance was connected instead (Figure 4).
As a result of the experiments, 32,844 data records were obtained in CSV format, captured via SOFC control software onto a FAT32-formatted external storage device. The complete dataset is publicly available at: https://github.com/caapel/SOFC/tree/master (accessed on 27 March 2025).

3.2. Data Preprocessing

Reading and working with the resulting CSV file were performed using the Pandas and NumPy libraries in the Jupyter Notebook development environment in Python. The monitoring data, converted to a DataFrame object, included 32,844 records with 47 parameters. Subsequent analysis required comprehensive data preprocessing.
At the first preprocessing stage, the DataFrame was indexed by the “MCGS_TIME” column (date and time of the log entry), and the column itself was dropped.
At the second stage, noninformative features that did not affect the target feature (for example, the millisecond counter column “MCGS_TIMEMS”) were eliminated.
At the third stage, similar features with identical and/or close values were combined. For example, T3 and T4—representing burner temperatures in the upper and lower sections, respectively—were merged into a common feature T3: “burner temperature”. Features were considered similar when their percentage difference was below 1%. Keeping these features separate could lead to multicollinearity issues. This dimensionality reduction approach eliminated four additional parameters (T4, T24, T26, T28).
At the fourth stage, faulty “temperature” features (from temperature sensors) whose average value exceeded 2999.9 °C were removed. This filter eliminated another 12 data columns.
At the fifth stage, the target feature—output voltage (V)—was isolated. As a result, the input feature vector was reduced to 25 elements (Table 2).
During the sixth processing stage, 92 records in which a “temperature” feature retained after the fourth stage had a value of 3000 °C were eliminated.
During the final, seventh stage, records in which the target feature was equal to zero (representing system startup/shutdown, warm-up, and cooldown periods) were removed. As a result, the DataFrame was reduced to 14,427 records.
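The seven preprocessing stages can be sketched in pandas as follows. This is a minimal illustration on a tiny synthetic frame, not the authors' notebook: the `^T\d+$` naming rule for temperature columns, the column T9, the way the 1% similarity criterion is computed (mean relative difference to an already kept column), and keeping the first column of a similar pair are all assumptions.

```python
import numpy as np
import pandas as pd

TEMP_REGEX = r"^T\d+$"  # assumption: temperature sensors are named T1, T2, ...


def preprocess(raw: pd.DataFrame, tol: float = 0.01) -> tuple[pd.DataFrame, pd.Series]:
    """Sketch of the seven preprocessing stages; returns features X and target V."""
    df = raw.copy()
    # Stage 1: index by the log timestamp and drop the column itself
    df = df.set_index(pd.to_datetime(df.pop("MCGS_TIME")))
    # Stage 2: drop non-informative columns (here only the millisecond counter)
    df = df.drop(columns=["MCGS_TIMEMS"], errors="ignore")
    # Stage 3: drop a temperature column whose mean relative difference from an
    # already kept temperature column is below tol (assumed merge rule)
    kept: list[str] = []
    for col in df.filter(regex=TEMP_REGEX).columns:
        rel_diffs = [np.mean(np.abs(df[col] - df[k]) / np.abs(df[k])) for k in kept]
        if any(d < tol for d in rel_diffs):
            df = df.drop(columns=col)
        else:
            kept.append(col)
    # Stage 4: drop faulty sensor columns whose average exceeds 2999.9 °C
    temps = df.filter(regex=TEMP_REGEX)
    df = df.drop(columns=temps.columns[(temps.mean() > 2999.9).to_numpy()])
    # Stage 5: isolate the target feature (output voltage V)
    y = df.pop("V")
    # Stage 6: drop records in which any remaining temperature reads 3000 °C
    bad = (df.filter(regex=TEMP_REGEX) >= 3000.0).any(axis=1)
    df, y = df[~bad], y[~bad]
    # Stage 7: drop records with zero voltage (startup, shutdown, warm-up, cooldown)
    on = y != 0
    return df[on], y[on]


n = 6
raw = pd.DataFrame({
    "MCGS_TIME": pd.date_range("2025-03-01 10:00", periods=n, freq="min"),
    "MCGS_TIMEMS": np.arange(n) * 1000,
    "T3": [700.0, 710, 720, 730, 740, 750],
    "T4": [701.0, 711, 721, 731, 741, 751],     # within 1 % of T3 -> merged away
    "T9": [3000.0] * n,                         # dead sensor, mean > 2999.9 -> dropped
    "T20": [650.0, 655, 3000, 665, 670, 675],   # one faulty reading -> record dropped
    "I": [10.0, 12, 14, 16, 0, 20],
    "V": [24.0, 23.5, 23.0, 22.5, 0.0, 21.5],   # zero voltage -> record dropped
})
X, y = preprocess(raw)
print(X, y, sep="\n")
```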
The subsequent preprocessing involved input feature normalization to scale the data and prevent features with large values from dominating. This standardization was necessary for proper neural network operation. Some activation functions, such as sigmoid and tanh, are most sensitive to input values near zero. If the input values are excessively large or small, these functions saturate (their outputs approach the asymptotes: 0 or 1 for the sigmoid and −1 or 1 for tanh), resulting in vanishing gradients and slower training. Normalization brings input values into an optimal range for these functions [7,34]. Data normalization was performed using the Z-score method (StandardScaler), described by the following equation:
$$z = \frac{x - \bar{x}}{\sigma}$$
where x is the initial value; z is the normalized value; $\bar{x}$ is the sample average; σ is the standard deviation of the training samples.
The dataset was randomly split into training (for training models) and test (for tuning regressor hyperparameters) samples in a ratio of 80:20 using the train_test_split tool of the Scikit-learn library.
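A minimal sketch of the 80:20 split and the Z-score step with Scikit-learn, on synthetic data. Unlike the order described above, this sketch splits first and fits StandardScaler on the training part only. That matches the definition of σ as the standard deviation of the training samples and keeps test-set statistics out of the scaling. The random_state values are arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the 25 input features and the voltage target
rng = np.random.default_rng(42)
X = rng.normal(loc=700.0, scale=50.0, size=(1000, 25))
y = rng.normal(loc=22.0, scale=1.0, size=1000)

# 80:20 random split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Mean and sigma come from the training part only, then z = (x - x_bar) / sigma
scaler = StandardScaler().fit(X_train)
Z_train = scaler.transform(X_train)
Z_test = scaler.transform(X_test)

# The same transform by hand (StandardScaler uses the population std, ddof=0)
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
assert np.allclose(manual, Z_train)
```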
Principal component analysis (PCA) was implemented for some models to further reduce the feature vector and eliminate linear correlations. For implementation details of PCA-based hybrid models, refer to Section 3.3.

3.3. Model Selection and Hyperparameter Tuning

To develop predictive models, ensemble methods of traditional machine learning (extreme gradient boosting and random forest) and a deep learning method (multilayer perceptron) were used.
Model accuracy and computational performance were evaluated on the test set using the following metrics: R2, MSE, MAE, MAPE (%) and τ (s/it).

3.3.1. Extreme Gradient Boosting

Extreme gradient boosting (XGB) is a popular and powerful machine learning algorithm. The XGB architecture is a sequential ensemble of decision trees, where each subsequent tree corrects the errors of the previous, weak models [3]. Figure 5 demonstrates an example of such a tree structure.
The XGB model was developed using the XGBRegressor tool of the XGBoost library. The model was optimized by tuning the XGB regressor hyperparameters in the specified scanned ranges:
  • max_depth: 2–5;
  • learning_rate: (0.01, 0.1, 0.2);
  • gamma: (0, 0.1, 1.0, 10.0).
Hyperparameter tuning was performed by a cross-validated grid search over the hyperparameter grid (GridSearchCV tool, Scikit-learn library) with five data splits (n_splits = 5). The MAPE metric was selected for evaluating the model’s efficiency.
All other hyperparameters retained their default settings.
Five versions were developed for this model:
  • XGB default—default model with the full set of 25 features;
  • XGB + MDI—a model where the control parameter vector is obtained using XGBoost’s built-in feature importance (MDI). The number of the strongest initial features varies in the range from 5 to 11;
  • XGB + PI—a model where the control parameter vector is obtained using permutation feature importance (PI). The number of the strongest initial features varies in the range from 5 to 11;
  • XGB + SHAP—a model where the control parameter vector is obtained using Shapley additive explanation (SHAP) feature importance. The number of the strongest initial features varies in the range from 5 to 11;
  • XGB + PCA—a hybrid model with preliminary standardization and PCA data decomposition, where the number of components varies in the range from 5 to 23 with a step of 2.

3.3.2. Random Forest

Random forest (RF) is a parallel ensemble method where multiple weak decision trees of the same type are trained independently and in parallel, and their average output becomes the prediction result. This architecture provides inherent resistance to input noise and mitigates overfitting risks [3].
The RF model was developed using the XGBRFRegressor tool of the XGBoost library. Model optimization involved tuning the max_depth hyperparameter within a scanned range of 5–15. The remaining parameters were left at their default settings. The GridSearchCV configuration for the RF model is the same as for the XGB model described in Section 3.3.1.
Two versions of this model were developed: the default (XGBRF default), with the full set of 25 features, and the hybrid (XGBRF + PCA), using PCA data decomposition with the number of components varying from 5 to 23 in steps of 2.

3.3.3. Multilayer Perceptron

Artificial neural networks (ANNs) are ML algorithms related to deep learning. The ANN is a mathematical abstraction that models the structure and functioning mechanism of a biological neural network [7].
Figure 6 shows the architecture of the multilayer perceptron (MLP) used for data processing (Section 3.1). This was a feed-forward neural network containing two hidden layers.
The MLP model was developed using the MLPRegressor tool of the Scikit-learn library. The model was optimized by tuning the hyperparameters of the MLP regressor within the following scanned ranges:
  • Feature vectors size: 5–25;
  • Hidden layer sizes: (5–45; 5–45);
  • Activation functions for hidden layer neurons: logistic, ReLU, tanh.
The hyperparameter grid tuning for the MLP model followed the same approach as for the XGB model in Section 3.3.1.
The static hyperparameters of the MLP model were configured as follows:
  • Learning rate: invscaling;
  • Learning rate init: 0.055;
  • Solver: Adam;
  • Maximum iterations: 2000.
Default values were used for all other hyperparameters.
Two model variants were developed: a default version (MLP default) with the full set of 25 features, preprocessed only through standardization, and a hybrid version (MLP + PCA), where the input layer nodes varied from 5 to 23 in increments of 2. Data preprocessing, in addition to standardization, included PCA decomposition.
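The MLP grid search can be sketched with Scikit-learn's MLPRegressor using the static settings listed above. To keep the example fast, the hidden-layer grid is cut down to 5 or 45 neurons per layer (the study scanned 5–45), PCA is omitted, and the data are synthetic. In Scikit-learn the learning_rate schedule ("invscaling") is only used when solver="sgd", so with the Adam solver it has no effect and only learning_rate_init applies.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neural_network import MLPRegressor

# Synthetic, already standardized inputs (25 features) and a voltage-like target
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 25))
y = 22.0 + np.tanh(X[:, 0]) - 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

# Static hyperparameters from the text
mlp = MLPRegressor(
    solver="adam",
    learning_rate="invscaling",   # ignored by scikit-learn unless solver="sgd"
    learning_rate_init=0.055,
    max_iter=2000,
    random_state=0,
)
# Reduced grid: two hidden layers with 5 or 45 neurons each, three activations
param_grid = {
    "hidden_layer_sizes": [(h1, h2) for h1 in (5, 45) for h2 in (5, 45)],
    "activation": ["logistic", "relu", "tanh"],
}
search = GridSearchCV(mlp, param_grid,
                      scoring="neg_mean_absolute_percentage_error",
                      cv=KFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X, y)
print(search.best_params_, "CV MAPE:", -search.best_score_)
```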

3.4. Hardware and Software

The following software and libraries were used in this work:
  • Windows 10, v. 22H2, build 19045.5608;
  • Python, v. 3.10.7;
  • Jupyter Notebook, v. 6.4.12;
  • SHAP, v. 0.47.0;
  • NumPy, v. 1.25.2;
  • Pandas, v. 2.2.3;
  • Seaborn, v. 0.13.2;
  • Graphviz, v. 0.20.1;
  • Matplotlib, v. 3.6.2;
  • YellowBrick, v. 1.5;
  • SKLearn, v. 1.4.2;
  • XGBoost, v. 2.1.2.
The calculations were carried out on the following hardware:
  • CPU: Intel(R) Core(TM) i5-8300H;
  • GPU: NVIDIA GeForce GTX 1050 Ti;
  • RAM: DDR4, 16 GB.

4. Results and Discussion

4.1. Feature Selection

Exploratory data analysis focused on identifying and selecting significant input features in order to reduce training set dimensionality, thereby improving model accuracy and computational efficiency.
At the first stage, a correlation analysis of the data was performed (Figure 7). The absolute values of the correlation coefficients were sorted by column V in descending order for visual interpretation.
The obtained Pearson correlation matrix (Figure 7) revealed strong dependence between the “temperature” features (T19, T20, T21, T22, T23, etc.). According to Lin R.H., Pei Z.H. et al. [36], such multicollinearity should be eliminated by reducing the dimensionality of the input feature vector.
“Temperature” features also had a strong correlation with the target feature (voltage, V). Such dominance of similar features may cause the model to overlook more important but less prominent features [7]. The weakest correlations with the target feature were observed for CH4 flow rate (Q_CH4), stack current (I), and power (W). However, weak correlations do not imply that these features are insignificant, since the relationships between parameters cannot always be reliably described by linear approximations [3].
The second stage involved feature importance evaluation using the MDI (Figure 8a), PI (Figure 8b), and SHAP (Figure 8c) methods following dataset standardization. The final feature vector was reduced to 11 elements.
The strongest feature sets obtained with the PI and SHAP evaluation methods contained the same 11 features, albeit with different weights. The overlap with MDI was partial, with only 8 of 11 features matching. Despite preliminary data standardization, the T20 feature (hydrogen temperature at the SOFC inlet) remained dominant across all evaluation methods.
To mitigate feature dominance and address multicollinearity, PCA data decomposition was applied at the third stage. Feature importance after PCA decomposition (PCA-XGBoost) is shown in Figure 9.

4.2. Diagnostics of Models

The results of ML model diagnostics performed on the full set of standardized data using the LearningCurve tool of the Yellowbrick library are presented in Figure 10. The negative mean absolute percentage error was used as the evaluation metric. The lines in the figure show the mean score, and the shaded area around each line indicates the variance of the model [7].
All models had comparable score gaps between training and validation data. The MLP model demonstrated the highest variance. The XGB model showed overfitting after about 9000 training examples. The XGBRF model achieved the lowest variance. Thus, the XGBRF model showed the greatest stability and predictive accuracy during the training process.
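Yellowbrick's LearningCurve wraps Scikit-learn's learning_curve function, so the diagnostic can be sketched directly with the latter. RandomForestRegressor is only a placeholder estimator here (any of the study's models could be passed in), the data are synthetic, and the score is negative MAPE as in Figure 10. The spread of the fold scores at each training size corresponds to the shaded bands in such plots.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, learning_curve

rng = np.random.default_rng(5)
X = rng.normal(size=(600, 25))
y = 22.0 + X[:, 0] + 0.1 * rng.normal(size=600)

sizes, train_scores, val_scores = learning_curve(
    RandomForestRegressor(n_estimators=50, random_state=0),  # placeholder estimator
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_absolute_percentage_error",
)
for n, tr, va, sd in zip(sizes, train_scores.mean(axis=1),
                         val_scores.mean(axis=1), val_scores.std(axis=1)):
    # A large train/validation gap suggests overfitting; sd is the band half-width
    print(f"{int(n):4d}  train {tr:.4f}  val {va:.4f}  gap {tr - va:.4f}  sd {sd:.4f}")
```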

4.3. Results of Model Fitting

Four of the five XGB models used dimensionality reduction of the control parameter vector. The optimal number of features was searched in the following ranges, written as (start, stop, step): (5, 11, 1) for the MDI (Figure 11a), PI (Figure 11b), and SHAP (Figure 11c) feature importance evaluation methods and (5, 23, 2) for PCA (Figure 11d).
The models’ MAPE values decreased monotonically as the number of features increased, as shown in Figure 11a–c. Consequently, reducing the dimensionality of the control parameter vector did not improve model accuracy, so selecting the strongest features with the MDI, PI, and SHAP evaluation methods to reduce sample dimensionality was ineffective in this case. For this reason, these three methods were not used for the remaining ML models (RF, MLP).
The hyperparameters for the best XGB models are presented in Table 3.
The results of finding the optimal number of components for PCA decomposition of the RF and MLP models are presented in Figure 12 and Figure 13. The hyperparameters for the best RF and MLP models are given in Table 4 and Table 5.

4.4. Discussion

A comparison of metrics of the best ML models is given in Table 6.
The analysis of the metrics in Table 6 revealed that every attempt to reduce the control parameter vector improved computational performance but led to a noticeable decrease in accuracy. This confirmed our conclusions from Section 4.3.
Multicollinearity mitigation using PCA decomposition improved both performance and accuracy only for the MLP model. In this case, the best result was achieved at the upper end of the scanned range (n_components = 23, Figure 13). This also confirmed our conclusion that feature dimensionality reduction is inadvisable in this context.
However, PCA decomposition did not speed up the models as much as feature reduction with the MDI, PI, and SHAP importance evaluation methods did, so PCA proved of little use in this case. These findings contradict the results of studies by Golbabaei M.H. et al. [7] and Zheng Y., Li X. et al. [32].
The best results were obtained using the default RF and XGB models, which was consistent with the results of the studies by Testasecca T. et al. [3] and Keyhanpour M. and Ghassemi M. [22] but contradicted the results reported by Natali A. [10], Ding R. et al. [12], and Kim J., Choi M. et al. [13].
The default XGB model showed the best performance and predictive accuracy (by the R2 and MSE metrics). The default XGBRF model demonstrated greater stability, accuracy (by the MAE and MAPE metrics), and generalization capability.
The higher performance of the XGB model (0.17 s/it) is explained by the fact that XGB builds an ensemble of trees in which each tree corrects the errors of the previous ones and is individually built relatively quickly. However, this speed is offset by greater variance and a tendency to overfit (Figure 10a).
XGBRF is a modification of XGB where each tree’s training uses random data subsampling and random feature subsets. This approach helped reduce variance and improve the model’s generalization capability (Figure 10b), creating a balance between accuracy and overfitting resistance (Table 6). However, it led to significantly reduced performance (3.22 s/it) due to additional computations. Since in our case we had high feature dimensionality (25 features), and each tree was trained on a random feature subset, XGBRF required a larger number of trees to achieve good accuracy, and consequently more time.
The MLP model demonstrated relatively low performance (2.44 s/it), similar to that of the XGBRF model. This can be attributed to several factors. First, the performance of MLP models may be constrained by the backpropagation algorithm, which requires sequential computation of gradients and weight updates across all network layers. While these computations are more complex than building a single decision tree, they remain less intensive than constructing multiple parallel trees in ensemble methods such as XGBRF. Second, the MLP’s computational inefficiency most likely resulted from imperfect hyperparameter tuning (batch size, learning rate, epochs), which may also have caused the model to get stuck in local minima during training.
The learning curve (Figure 10c) revealed that the MLP model exhibited overfitting and likely required stronger regularization. Unlike MLP, the default XGB and XGBRF models incorporate built-in L1 and L2 regularization, whereas MLP necessitates manual regularization configuration. Furthermore, the default XGB and XGBRF models possess intrinsic feature importance evaluation mechanisms, enabling automatic selection of relevant features while disregarding noninformative ones. Consequently, feature dimensionality reduction and PCA decomposition failed to enhance accuracy for XGB and XGBRF models, whereas these techniques simultaneously improved both accuracy and performance for the MLP model.
Overall, selecting the most suitable approach in our study requires balancing prediction accuracy against computational training time.
Thus, for SOFC output voltage prediction under strict time constraints (including real-time operation scenarios), the authors recommend using the default XGB model with the full feature set as the optimal solution for SOFC performance and reliability evaluation.
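The accuracy versus training-time trade-off can be made explicit with a simple selection rule applied to the Table 6 results: among the models whose training time τ fits within a given budget, pick the one with the lowest MSE. The rule and the budget values below are illustrative assumptions, not the procedure the authors used.

```python
# (model, MSE, tau in s/it) as reported in Table 6
RESULTS = [
    ("XGB default (25)", 0.9940, 0.172760),
    ("XGB + PCA (19)", 1.5670, 0.142445),
    ("XGB + SHAP (11)", 1.1180, 0.095001),
    ("XGB + PI (11)", 1.1234, 0.093776),
    ("XGB + MDI (10)", 1.5604, 0.091928),
    ("XGBRF default (25)", 1.0546, 3.215760),
    ("XGBRF + PCA (13)", 1.5818, 2.389213),
    ("MLP default (25)", 1.7546, 2.444528),
    ("MLP + PCA (23)", 1.5553, 2.148090),
]


def best_within_budget(results, tau_budget):
    """Return the lowest-MSE model whose training time per iteration fits the budget, or None."""
    feasible = [r for r in results if r[2] <= tau_budget]
    if not feasible:
        return None
    return min(feasible, key=lambda r: r[1])[0]


for budget in (0.09, 0.1, 0.2, 5.0):
    print(budget, best_within_budget(RESULTS, budget))
# 0.09 None
# 0.1 XGB + SHAP (11)
# 0.2 XGB default (25)
# 5.0 XGB default (25)
```

Under this rule the default XGB model is selected for any budget of at least about 0.173 s/it, in line with the recommendation above; only a tighter budget shifts the choice to a feature-reduced XGB model, at the cost of a higher MSE.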

5. Conclusions

Based on the results of the study, the following conclusions were drawn:
  • SOFC output voltage is the most frequently predicted continuous target feature for assessing fuel cell reliability and performance characteristics;
  • Reducing the input dimensionality with the MDI, PI, and SHAP feature importance methods shortened XGB training time (τ = 0.092–0.095 s/it versus 0.173 s/it with the full feature set) but reduced accuracy (MSE = 1.118–1.560 versus 0.9940);
  • PCA decomposition improved both accuracy and computational performance only for the MLP model (MSE decreased from 1.7546 to 1.5553 and τ from 2.445 to 2.148 s/it), while it reduced the accuracy of the XGB and XGBRF models. Therefore, PCA is not recommended for this task;
  • The default XGB model with the full feature set achieved the highest accuracy (R² = 0.99698 and MSE = 0.9940) with a training time of 0.17276 s/it, providing the best balance between accuracy and computational performance;
  • The default XGBRF model with the full feature set showed the lowest absolute and percentage errors (MAE = 0.266 and MAPE = 1.12%), the lowest variance, and the best generalization capability.
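Of the feature importance methods listed above, permutation importance is the simplest to state: shuffle one input column, re-score the already fitted model, and record how much the error grows. The NumPy sketch below applies this idea with MSE as the score; the synthetic data and the least-squares linear model standing in for the XGB model are assumptions for illustration. For production use, scikit-learn provides sklearn.inspection.permutation_importance.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


def permutation_importance_mse(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of X is shuffled, one column at a time."""
    rng = np.random.default_rng(seed)
    baseline = mse(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link to y while keeping its distribution.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(mse(y, predict(X_perm)) - baseline)
        importances[j] = np.mean(increases)
    return importances


# Synthetic data: y depends strongly on x0, weakly on x1 and not at all on x2.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=1000)

# Hypothetical stand-in model: least-squares linear fit with an intercept.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)


def predict(Z):
    return np.column_stack([Z, np.ones(len(Z))]) @ coef


print(np.round(permutation_importance_mse(predict, X, y), 3))
```

Features whose shuffling barely changes the MSE (here x2) are the candidates that a PI-based reduction would drop.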
In conclusion, this study achieved its objective: the computational speed of the developed XGB model enables real-time optimization of the most critical SOFC system control parameters (including inlet air oxygen concentration, peristaltic pump rotation speed, and stack current). This capability directly facilitates the improvement of SOFC thermoelectric characteristics during operation.

Author Contributions

Conceptualization, N.D.C., H.I.B. and I.K.I.; methodology, E.S.M. and S.R.S.; software, S.R.S.; validation, A.A.F. and N.D.C.; formal analysis, I.K.I. and H.I.B.; resources, A.A.F.; writing—original draft preparation, S.R.S.; writing—review and editing, O.E.B. and I.K.I.; visualization, S.R.S.; supervision, I.K.I. and H.I.B.; project administration, I.K.I. and A.A.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the European Union’s NextGenerationEU through the National Recovery and Resilience Plan of the Republic of Bulgaria, project № BG-RRP-2.013-0001-C01. This research was also cofunded by the Ministry of Science and Higher Education of the Russian Federation “Study of Processes in a Fuel Cell–Gas Turbine Hybrid Power Plant” (project code: FZSW-2022-0001).

Data Availability Statement

The original data presented in the study are openly available in the public repository “SOFC” at https://github.com/caapel/SOFC (accessed on 30 March 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
C: Content (%)
E: Polarization curve
F: Frequency (Hz)
gradTcc: Maximum combustion chamber temperature gradient
I: Current (A)
J: Current density (A/m²)
m: Mass (kg)
M: Molar fraction (mol-%)
p: Precision
P: Pressure (MPa)
q: Volume flow density (m³/(h·cm²))
Q: Volumetric flow rate (m³/h)
Qm: Biomass flow rate (kg/h)
r: Recall
R²: Coefficient of determination
τ: Algorithm performance (s/it)
t: Running time (h)
T: Temperature (°C)
v: Velocity (m/s)
V: Voltage (V)
W: Power (W)
w: Power density (W/m²)
Z: Electrical impedance (Ohm·cm²)
η: Efficiency
ηCHP: Combined heat and power efficiency
ηE: Electrical efficiency
ηH: Heat efficiency
λ: Stoichiometric gas ratio
A: Accuracy
ANFIS: Adaptive network-based fuzzy inference system
ANN: Artificial neural network
ARIMA: Autoregressive integrated moving average
ARX: Autoregressive–exogenous
ASL: Anode support layer
AUC: Area under the ROC (receiver operating characteristic) curve
BLSTM: Bidirectional long short-term memory
BP: Backpropagation
BRR: Bayesian ridge regression
CDR: Correct diagnosis rate
CFL: Cathode functional layer
CFD: Computational fluid dynamics
CNN: Convolutional neural network
DAG: Directed acyclic graph
DN: Dendritic network
DNN: Deep neural network
DT: Decision tree
ed: Encoder–decoder
EL: Electrolyte layer
ELM: Extreme learning machine
ES-R-GM: Grey model prediction method based on residual exponential smoothing optimization
FCM: Fuzzy C-means clustering
FN: False negatives
FP: False positives
FU: Fuel utilization factor
GA: Genetic algorithm
GB: Gradient boosting
GP: Gaussian process
gp: Grid partition
GRU: Gated recurrent unit
Hb: Histogram-based
HPSO: Hybrid particle swarm optimization
KF: Kalman filter
KNN: K-nearest neighbors
LASSO: Least absolute shrinkage and selection operator
LogR: Logistic regression
LR: Linear regression
LS: Least squares
LSTM: Long short-term memory
MAE: Mean absolute error
MAPE: Mean absolute percentage error
MDI: Mean decrease in impurity
ML: Machine learning
MLP: Multilayer perceptron
MLR: Multiple linear regression
mRMR: Minimum redundancy maximum relevance
MSE: Mean squared error
N-BEATS: Neural basis expansion analysis for time series
NLARX: Nonlinear autoregressive–exogenous
NME: Normalized mean error
NRMSE: Normalized root mean square error
PCA: Principal component analysis
PI: Permutation importance
PR: Polynomial regression
PSO: Particle swarm optimization
RBF: Radial basis function
RE: Relative error
RF: Random forest
RH: Relative humidity
RMSE: Root mean square error
RNN: Recurrent neural network
rt: Real time
RUL: Remaining useful life
S/B: Steam-to-biomass ratio
SBS: Sequential backward selection
SC: Subtractive clustering
SFS: Sequential forward selection
SMAPE: Symmetric mean absolute percentage error
SOFC: Solid oxide fuel cell
SVM: Support vector machine
TN: True negatives
TP: True positives
VARMA: Vector autoregressive moving average
WS: Weighted score
XGB: XGBoost, extreme gradient boosting
XGBRF: XGBoost random forest, extreme gradient boosting random forest

References

  1. Subramanian, Y.; Veena, R.; Muhammed Ali, S.A.; Kumar, A.; Gubediran, R.K.; Dhanasekaran, A.; Gurusamy, D.; Muniandi, K. Artificial Intelligence Technique Based Performance Estimation of Solid Oxide Fuel Cells. Mater. Today Proc. 2023, 80, 2573–2576.
  2. Hai, T.; Alizadeh, A.A.; Ali, M.A.; Dhahad, H.A.; Goyal, V.; Mohammed Metwally, A.S.; Ullah, M. Machine learning-assisted tri-objective optimization inspired by grey wolf behavior of an enhanced SOFC-based system for power and freshwater production. Int. J. Hydrogen Energy 2023, 48, 25869–25883.
  3. Testasecca, T.; Maniscalco, M.P.; Brunaccini, G.; Airò Farulla, G.; Ciulla, G.; Beccali, M.; Ferraro, M. Toward a Digital Twin of a Solid Oxide Fuel Cell Microcogenerator: Data-Driven Modelling. Energies 2024, 17, 4140.
  4. Mütter, F.; Berger, C.; Königshofer, B.; Höber, M.; Hochenauer, C.; Subotić, V. Artificial intelligence for solid oxide fuel cells: Combining automated high accuracy artificial neural network model generation and genetic algorithm for time-efficient performance prediction and optimization. Energy Convers. Manag. 2023, 291, 117263.
  5. Huo, H.; Chen, Y.; Afun, G.P.; Kuang, X.; Xu, J.; Li, X. Prediction Study of Solid Oxide Fuel Cell Performance Degradation Using Data-Driven Approaches. Energy Technol. 2025, 13, 2400990.
  6. Rao, M.; Wang, L.; Chen, C.; Xiong, K.; Li, M.; Chen, Z.; Dong, J.; Xu, J.; Li, X. Data-Driven State Prediction and Analysis of SOFC System Based on Deep Learning Method. Energies 2022, 15, 3099.
  7. Golbabaei, M.H.; Saeidi Varnoosfaderani, M.; Zare, A.; Salari, H.; Hemmati, F.; Abdoli, H.; Hamawandi, B. Performance Analysis of Anode-Supported Solid Oxide Fuel Cells: A Machine Learning Approach. Materials 2022, 15, 7760.
  8. Huo, W.; Li, W.; Zhang, Z.; Sun, C.; Zhou, F.; Gong, G. Performance prediction of proton-exchange membrane fuel cell based on convolutional neural network and random forest feature selection. Energy Convers. Manag. 2021, 243, 114367.
  9. Vairo, T.; Cademartori, D.; Clematis, D.; Carpanese, M.P.; Fabiano, B. Solid oxide fuel cells for shipping: A machine learning model for early detection of hazardous system deviations. Process Saf. Environ. Prot. 2023, 172, 184–194.
  10. Natali, A. Development of a Simplified SOFC Model Using Machine Learning. Ph.D. Thesis, Polytechnic University of Turin, Torino, Italy, 2024.
  11. Su, D.; Zheng, J.; Ma, J.; Dong, Z.; Chen, Z.; Qin, Y. Application of Machine Learning in Fuel Cell Research. Energies 2023, 16, 4390.
  12. Ding, R.; Wang, R.; Ding, Y.; Yin, W.; Liu, Y.; Li, J.; Liu, J. Designing AI-Aided Analysis and Prediction Models for Nonprecious Metal Electrocatalyst-Based Proton-Exchange Membrane Fuel Cells. Angew. Chem. Int. Ed. 2020, 59, 19175–19183.
  13. Kim, J.; Baek, J.; Choi, M. Machine-Learning-Driven Feature-Importance Analysis for Protonic Ceramic Fuel Cells. SSRN 2024.
  14. Li, M.; Wu, J.; Chen, Z.; Dong, J.; Peng, Z.; Xiong, K.; Rao, M.; Chen, C.; Li, X. Data-Driven Voltage Prognostic for Solid Oxide Fuel Cell System Based on Deep Learning. Energies 2022, 15, 6294.
  15. Chen, K.; Li, Y.; Chen, J.; Li, M.; Song, Q.; Huang, Y.; Wu, X.; Xu, Y.; Li, X. Prediction of Hydrogen Production from Solid Oxide Electrolytic Cells Based on ANN and SVM Machine Learning Methods. Atmosphere 2024, 15, 1344.
  16. Lai, M.; Zhang, D.; Li, Y.; Wu, X.; Li, X. Application of Multiple Linear Regression and Artificial Neural Networks in Analyses and Predictions of the Thermoelectric Performance of Solid Oxide Fuel Cell Systems. Energies 2024, 17, 4084.
  17. Wu, Y.; Wu, X.; Xu, Y.; Cheng, Y.; Li, X. A Novel Adaptive Neural Network-Based Thermoelectric Parameter Prediction Method for Enhancing Solid Oxide Fuel Cell System Efficiency. Sustainability 2023, 15, 14402.
  18. Hou, D.; Ma, W.; Hu, L.; Huang, Y.; Yu, Y.; Wan, X.; Wu, X.; Li, X. Modeling of Nonlinear SOEC Parameter System Based on Data-Driven Method. Atmosphere 2023, 14, 1432.
  19. Sheng, C.; Zheng, Y.; Tian, R.; Xiang, Q.; Deng, Z.; Fu, X.; Li, X. A Comparative Study of the Kalman Filter and the LSTM Network for the Remaining Useful Life Prediction of SOFC. Energies 2023, 16, 3628.
  20. Song, S.; Xiong, X.; Wu, X.; Xue, Z. Modeling the SOFC by BP Neural Network Algorithm. Int. J. Hydrogen Energy 2021, 46, 20065–20077.
  21. İskenderoğlu, F.C.; Baltacioğlu, M.K.; Demir, M.H.; Baldinelli, A.; Barelli, L.; Bidini, G. Comparison of support vector regression and random forest algorithms for estimating the SOFC output voltage by considering hydrogen flow rates. Int. J. Hydrogen Energy 2020, 45, 35023–35038.
  22. Keyhanpour, M.; Ghassemi, M. Investigating the performance of tubular direct ammonia IT-SOFC with temkin-pyzhev kinetic model using machine learning and CFD. JCARME 2025, in press. Available online: https://jcarme.sru.ac.ir/article_2290.html (accessed on 8 March 2025).
  23. Subotić, V.; Eibl, M.; Hochenauer, C. Artificial intelligence for time-efficient prediction and optimization of solid oxide fuel cell performances. Energy Convers. Manag. 2021, 230, 113764.
  24. Milewski, J.; Świrski, K. Modelling the SOFC behaviours by artificial neural network. Int. J. Hydrogen Energy 2009, 34, 5546–5553.
  25. Sheng, C.; Fu, J.; Qin, H.C.; Zu, Y.M.; Liang, Y.Z.; Deng, Z.H.; Wang, Z.; Li, X. Short-term hybrid prognostics of fuel cells: A comparative and improvement study. Renew. Energy 2024, 237, 121742.
  26. Wu, X.; Yang, Y.; Li, K.; Xu, Y.; Peng, J.; Chi, B.; Wang, Z.; Li, X. Performance prediction of gasification-integrated solid oxide fuel cell and gas turbine cogeneration system based on PSO-BP neural network. Renew. Energy 2024, 237, 121711.
  27. Wu, X.; Mei, J.; Xu, Y.; Cheng, Y.; Peng, J.; Chi, B.; Wang, Z.; Li, X. Stack performance classification and fault diagnosis optimization of solid oxide fuel cell system based on bayesian artificial neural network and feature selection. J. Power Sources 2024, 620, 235198.
  28. Wu, X.; Li, Y.; Cai, S.; Xu, Y.; Hu, L.; Chi, B.; Peng, J.; Li, X. Data-driven approaches for predicting performance degradation of solid oxide fuel cells system considering prolonged operation and shutdown accumulation effect. J. Power Sources 2024, 598, 234186.
  29. Kheirandish, A.; Shafiabady, N.; Dahari, M.; Kazemi, M.S.; Isa, D. Modeling of commercial proton exchange membrane fuel cell using support vector machine. Int. J. Hydrogen Energy 2016, 41, 11351–11358.
  30. Chen, H.; Shan, W.; Liao, H.; He, Y.; Zhang, T.; Pei, P.; Deng, C.; Chen, J. Online voltage consistency prediction of proton exchange membrane fuel cells using a machine learning method. Int. J. Hydrogen Energy 2021, 46, 34399–34412.
  31. Raeesi, M.; Changizian, S.; Ahmadi, P.; Khoshnevisan, A. Performance analysis of a degraded PEM fuel cell stack for hydrogen passenger vehicles based on machine learning algorithms in real driving conditions. Energy Convers. Manag. 2021, 248, 114793.
  32. Zheng, Y.; Wu, X.L.; Zhao, D.; Xu, Y.W.; Wang, B.; Zu, Y.; Li, D.; Jiang, J.; Jiang, C.; Fu, X.; et al. Data-Driven Fault Diagnosis Method for the Safe and Stable Operation of Solid Oxide Fuel Cells System. J. Power Sources 2021, 490, 229561.
  33. Huo, W.; Li, W.; Sun, C.; Ren, Q.; Gong, G. Research on Fuel Cell Fault Diagnosis Based on Genetic Algorithm Optimization of Support Vector Machine. Energies 2022, 15, 2294.
  34. Legala, A.; Zhao, J.; Li, X. Machine Learning Modeling for Proton Exchange Membrane Fuel Cell Performance. Energy AI 2022, 10, 100183.
  35. Chauhan, V.; Mortazavi, M.; Benner, J.Z.; Santamaria, A.D. Two-phase flow characterization in PEM fuel cells using machine learning. Energy Rep. 2020, 6, 2713–2719.
  36. Lin, R.-H.; Pei, Z.-X.; Ye, Z.-Z.; Guo, C.-C.; Wu, B.-D. Hydrogen fuel cell diagnostics using random forest and enhanced feature selection. Int. J. Hydrogen Energy 2020, 45, 10523–10535.
  37. Lü, X.; Deng, R.; Chen, C.; Wu, Y.; Meng, R.; Long, L. Performance optimization of fuel cell hybrid power robot based on power demand prediction and model evaluation. Appl. Energy 2022, 316, 119087.
  38. Han, I.S.; Chung, C.B. Performance prediction and analysis of a PEM fuel cell operating on pure oxygen using data-driven models: A comparison of artificial neural network and support vector machine. Int. J. Hydrogen Energy 2016, 41, 10202–10211.
  39. Zhong, Z.-D.; Zhu, X.-J.; Cao, G.-Y. Modeling a PEMFC by a support vector machine. J. Power Sources 2006, 160, 293–298.
  40. Ming, W.; Sun, P.; Zhang, Z.; Qiu, W.; Du, J.; Li, X.; Zhang, Y.; Zhang, G.; Liu, K.; Wang, Y.; et al. A systematic review of machine learning methods applied to fuel cells in performance evaluation, durability prediction, and application monitoring. Int. J. Hydrogen Energy 2023, 48, 5197–5228.
  41. Jouin, M.; Bressel, M.; Morando, S.; Gouriveau, R.; Hissel, D.; Péra, M.C.; Zerhouni, N.; Jemei, S.; Hilairet, M.; Ould Bouamama, B. Estimating the end-of-life of PEM fuel cells: Guidelines and metrics. Appl. Energy 2016, 177, 87–97.
Figure 1. Physical (a) and 3D (b) diagrams of the 1.5 kW SOFC power generation system.
Figure 2. Three-dimensional structural diagram of the hot zone.
Figure 3. Schematic diagram of the laboratory setup.
Figure 4. SOFC system output electrical characteristics.
Figure 5. Example of the XGB model decision tree.
Figure 6. Schematic of artificial neural network architecture for fuel cell voltage prediction.
Figure 7. Correlation of input features with the output value.
Figure 8. Feature importance for: (a) MDI; (b) PI; (c) SHAP.
Figure 9. Feature importance after PCA decomposition.
Figure 10. Learning curves for XGB (a); XGBRF (b); MLP (c).
Figure 11. Finding the optimal number of features for the XGB models: MDI (a); PI (b); SHAP (c); PCI (d).
Figure 12. Finding the optimal number of components for the XGBRF + PCA model.
Figure 13. Finding the optimal number of components for the MLP + PCA model.
Table 1. Review of recent ML-based approaches for SOFC modeling.
Reference | Year | Model(s) | Best Model | Data Size | Proportion (Train:Test:Valid) | Input Variables (Count) | Output | Best Error Metrics
Subramanian Y. et al. [1] | 2023 | SVM | SVM | 16 | 85:15:0 | T, V (2) | J, w | MAPE = 0.0098
Testasecca T. et al. [3] | 2024 | XGB, RF, LSTM, GB, MLP, PR | RF | >2500 | 90:10:0 | W, Qgas, Wgas, mCO2,emiss, mCO2,save, ΣmCO2,save, ΔW (7) | ηE | MAE = 0.24; MAPE = 0.04; MSE = 0.14; RMSE = 0.38; R² = 0.98
Mütter F. et al. [4] | 2023 | GA-MLP | GA-MLP | 534,976 | 4:1:0 | MH2, MH2O, MCO, MCO2, MCO4, MN2, T, J (8) | V | MSE = 6.384 × 10⁻⁷ ± 7.159 × 10⁻⁸; RMSE = 0.799 ± 0.268
Huo H. et al. [5] | 2025 | VARMA, RBF, GRU, N-BEATS, LSTM | N-BEATS | 3743 | 90:10:0 | | V | MAE = 0.0225; RMSE = 0.0237; R² = 0.9889
Rao M. et al. [6] | 2022 | ARIMA, multi-step LSTM, recursive LSTM | multi-step LSTM | 10,323 | 7000:2323:1000 | Vrt, I, PCH4,in, Pcathode,air, Panode,in, Panode,out, Pcathode,out (7) | V | RMSE = 0.3444; MAE = 0.1691
Golbabaei M.H. et al. [7] | 2022 | LR, SVM, GP, DT, RF, GB, KNN, MLP | MLP | 403 | 8:2:0 | ASL-, EL-, CFL-thickness, ASL porosity, T, J (6) | V | R² = 0.998; MSE = 9.6 × 10⁻⁵; MAE = 6 × 10⁻³
Huo W. et al. [8] | 2021 | DNN, RF-CNN | RF-CNN | >10,000 | | (26) | I-V curve | RMSE = 0.0396; MAE = 0.0355; R² = 0.9119
Vairo T. et al. [9] | 2023 | GB | GB | 4392 | 80:20:0 | V, I, F, Zr, Zim (5) | 2 states | p = 0.99; r = 0.99; F1-score = 0.99
Natali A. [10] | 2023 | LR, BRR, PR, DT, GB, RF, Hb-GB, MLP | Hb-GB | 10,000 | 80:20:0 | MH2, T, J (3) | V | R² = 0.972; RMSE = 1.6 × 10⁻⁴
Ding R. et al. [12] | 2020 | DT, XGB, BP-ANN | BP-ANN | >10,000 | 85:15:0 | (26) | w | R² = 0.9621; RMSE = 58.5
Kim J. et al. [13] | 2024 | LR, DT, RF, MLP, SVM, XGB | MLP | 591 | 8:2:0 | (57) | w | R² = 0.9251
Li M. et al. [14] | 2022 | LSTM, ed-LSTM, GRU, ed-GRU | ed-LSTM | 10,323 | 4129:5162:1032 | Vrt, I, PCH4,in, Pcathode,air (4) | V | MSE = 0.014966; MAE = 0.084220; R² = 0.964618
Chen K. et al. [15] | 2024 | SVM, BP | BP | 2000 | 8:2:0 | t, V, I, QH2,rt (4) | QH2 | RMSE = 0.259; MAPE = 0.003334; MAE = 0.017; R² = 0.9976
Lai M. et al. [16] | 2024 | MLR, BP | BP | 662,327 | | (12) | T, V, W | NRMSE = 0.0066; NME = 0.428; MAE = 1.367
Wu Y. et al. [17] | 2023 | DAG, DN, BP, SVM, RF, GA-RBF, RBF, GA-BP, LS-SVM | DAG | 1099 | 1000:99:0 | Qfuel, Qair, Qsteam, W (4) | ηH, ηE | MAE = 0.0109; RMSE2 = 0.0135
Hou D. et al. [18] | 2023 | ARX, NLARX | NLARX | 3600 | | W, T, Qwater (3) | QH2 | MSE = 0.05266
Sheng C. et al. [19] | 2023 | KF, LSTM | LSTM | 3750 | 2500:1250:0 | J, Vrt, w, I (4) | V, RUL | RMSE = 1.4373; MAE = 0.0015
Song S. et al. [20] | 2021 | BP, SVM, RF | BP | 858 | 650:208:0 | T, J, Qair, QH2 (4) | V | R² = 0.999; RMSE = 0.0032; MAE = 0.0769
İskenderoğlu F.C. et al. [21] | 2020 | SVM, RF | SVM | 1272 | 1122:150:0 | T, J, syngas types, etc. (10) | V | MAPE = 0.0092
Keyhanpour M., Ghassemi M. [22] | 2025 | DNN, RF, LASSO | RF | 601601:30:0 | | vfuel, vair, T, ASL and CFL porosity (5) | w, T | RMSE = 0.1213; MAE = 0.08853; R² = 0.9999
Subotić V. et al. [23] | 2021 | MLP | MLP | 2271 | 80:15:5 | T, J, fuel type, etc. (9) | E, Zim, Zr | E/Zr/Zim: MSE = 2.93 × 10⁻⁵/7.12 × 10⁻⁷/3.68 × 10⁻⁷; MAPE = 0.0034/0.00204/0.369; SMAPE = 0.0034/0.02/0.237
Milewski J., Świrski K. [24] | 2009 | MLP | MLP | 583 | 1:0:0 | J, T, qfuel, qoxidant (4) | V | RE = 1.0%
Sheng C. et al. [25] | 2023 | ES (ES2/ES3)-R-GM, ANFIS-SC (gp/FCM) | ES3-R-GM + ANFIS-SC | 800 | 500:300:0 | Tfuel, Tair, Tstack, Tburn, Pgas, V, I, W, etc. (82) | V | RMSE = 0.1345; R² = 0.9450
Wu X. et al. [26] | 2024 | PSO-BP | PSO-BP | 7290 | 70:30:0 | CC, CH2, CO2, Tstack, Tanode, Qm, S/B (7) | V, J, ηE, ηCHP | MAPE < 0.06; RMSE < 0.33; R² > 0.98
Wu X. et al. [27] | 2024 | ReliefF-mRMR | ReliefF-mRMR | 2206 | 70:30:0 | W, gradTcc, TH2 (3) | 3 states | CDR1 = 0.98; CDR2 = 0.978; CDR3 = 0.981
Wu X. et al. [28] | 2024 | MLR, RBF, BP, LSTM, PSO-BP, GA-BP | GA-BP | 9104 | 6300:2804:0 | t, Tafterburn, Tstack, I, W (5) | V | MAE = 0.182; MSE = 0.081; R² = 0.949; RMSE = 0.285
Kheirandish A. et al. [29] | 2016 | SVM, MLP | SVM | 9725 | | J, V, W, ηE (4) | V–I, P–I, ηE–P curves | P–I: MSE = 0.0009; R² = 0.9952
Chen H. et al. [30] | 2021 | LS-MLR, KNN, SVM, AdaBoost, RF, Bagging DT, GB | GB | 500 | 80:20:0 | λ, T, RH, Panode, J (5) | V | R² = 0.89609
Raeesi M. et al. [31] | 2021 | RNN, DNN, LSTM, BLSTM | DNN | 6000 | 5250:750 | | V | MSE = 0.14; R² = 0.9982
Zheng Y. et al. [32] | 2021 | PCA-MLP, RF, PCA-SVM | SVM | 71,064 | 70:30:0 | Qair,re-burner, Qair,bypass, Qwater, Qfuel,react, Tair,exch, Tafter-burner, Treformer, I, V (9) | 3 states | AUC = 0.997; A = 0.9304; F1-score = 0.929
Huo W. et al. [33] | 2022 | ELM, SVM, GA-SVM | GA-SVM | 400 | 350:50:0 | (12) | 9 states | A = 0.98
Legala A. et al. [34] | 2022 | SVM, BP-ANN | BP-ANN | 1100 | 70:30:0 | I, T, Pcathode, PO2, PH2, membrane hydration (6) | V | MAE = 0.011; R² = 0.995; RMSE = 0.015
Chauhan V. et al. [35] | 2020 | LogR, SVM, MLP | MLP | 4734 | 3750:984:0 | Extracted channel photo | 3 states | A = 0.95
Lin R.H. et al. [36] | 2020 | DT, RF, KNN, SVM, AdaBoost, ANN | RF-PCA | 206,360 | 75:25:0 | (8) | 3 states | AUC = 0.99975; F1-score = 0.9989
Lü X. et al. [37] | 2022 | RF-HPSO | RF-HPSO | 200,000 | 70:30:0 | (4) | W | MSE = 47.6444
Han I.S. et al. [38] | 2016 | SVM, ANN | ANN | 1468 | 923:454:0 | PH2, PO2, T, RHc, I (5) | V | R² = 0.9994; RMSE = 2.4; MAPE = 0.0022
Zhong Z.-D. et al. [39] | 2006 | SVM | SVM | | | T, J, V, etc. | I-V curve | MSE = 0.0002; R² = 0.997
Table 2. Feature notation and interpretations.
Feature | Interpretation | Feature | Interpretation
T3 | Burner temperature, °C | T27 | Temperature at the right point of the reformer, °C
T5 | Temperature at the inlet of the reformer, °C | T30 | Cooling water temperature, °C
T7 | SOFC exhaust gas temperature, °C | T31 | Water tank temperature, °C
T9 | Heat exchanger temperature, °C | pump_spd | Peristaltic pump rotation speed, rps
T12 | Water temperature for steam reforming, °C | impl_spd1 | Cooling fan speed, rps
T16 | Temperature at the SOFC left front point, °C | impl_spd2 | Main fan speed, rps
T17 | Temperature at the SOFC right rear point, °C | Q_CH4 | CH4 flow rate, m³/h
T19 | Air temperature at the SOFC inlet, °C | Q_CH4_N2 | CH4/N2 flow rate, m³/h
T20 | Hydrogen temperature at the SOFC inlet, °C | P_NG | Differential natural gas pressure, bar
T21 | Air temperature at the SOFC outlet, °C | O2 | Oxygen concentration at the burner inlet, %
T22 | Hydrogen temperature at the SOFC outlet, °C | W | Stack power, W
T23 | Temperature at the rear point of the reformer, °C | I | Stack current, A
T25 | Temperature at the left point of the reformer, °C | V | Stack voltage, V
Table 3. Hyperparameters of optimal XGB models.
Model | Score | Number of Features/Components | Gamma | Learning Rate | Max Depth
XGB default | MSE | 25 | 0 | 0.2 | 5
XGB + MDI | MAPE | 10 | 0 | 0.2 | 5
 | MSE | 11 | 0.1 | 0.2 | 5
XGB + PI | MAPE | 11 | 0.1 | 0.2 | 5
 | MSE | 10 | 0 | 0.2 | 5
XGB + SHAP | MAPE, MSE | 11 | 0.1 | 0.2 | 5
XGB + PCI | MAPE | 19 | 0.1 | 0.2 | 5
 | MSE | 17 | 0.1 | 0.2 | 5
Table 4. Hyperparameters of optimal RF models.
Model | Score | Number of Features/Components | Max Depth
XGBRF default | MSE | 25 | 15
XGBRF + PCA | MAPE | 13 | 15
 | MSE | 17 | 15
Table 5. Hyperparameters of optimal MLP models.
Model | Score | Number of Features/Components | Hidden Layer Size | Activation
MLP default | MAPE | 25 | (40, 40) | logistic
MLP + PCA | MAPE, MSE | 23 | (15, 15) | logistic
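For the PCA variants in Tables 3 to 5, the "Number of Features/Components" column gives how many principal components were retained. The NumPy sketch below shows the underlying projection (centering followed by a singular value decomposition) and one common way of choosing the component count from the cumulative explained variance. The 95% threshold and the synthetic 25-feature data are assumptions for illustration; the component counts reported in the tables appear to have been selected by the MAPE or MSE score listed in the Score column instead.

```python
import numpy as np


def pca_fit_transform(X, n_components):
    """Project X onto its first n_components principal components.

    Returns the component scores and the explained-variance ratio of every component.
    """
    Xc = X - X.mean(axis=0)                          # PCA operates on centred data
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)                  # share of total variance per component
    scores = Xc @ Vt[:n_components].T                # coordinates in the reduced space
    return scores, explained


def components_for_variance(explained, threshold=0.95):
    """Smallest number of components whose cumulative explained variance reaches threshold."""
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(explained))


# Hypothetical data: 25 correlated features driven by 5 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 5))
mixing = rng.normal(size=(5, 25))
X = latent @ mixing + 0.05 * rng.normal(size=(2000, 25))

_, ratio = pca_fit_transform(X, n_components=25)
k = components_for_variance(ratio, threshold=0.95)
scores, _ = pca_fit_transform(X, n_components=k)
print(k, scores.shape)
```

Because the components are ordered by the variance they explain, keeping the first k discards the directions with the least spread, which need not be the least predictive ones; this is one reason why PCA can lower the accuracy of tree models that already ignore uninformative inputs.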
Table 6. Comparison of accuracy and performance of optimized ML models.
Model (Components) | R² | MSE | MAE | MAPE | τ, s/it
Extreme gradient boosting
XGB default (25) | 0.99698 | 0.9940 | 0.309 | 2.63% | 0.172760
XGB + PCA (19) | 0.99524 | 1.5670 | 0.392 | 3.52% | 0.142445
XGB + SHAP (11) | 0.99660 | 1.1180 | 0.383 | 3.40% | 0.095001
XGB + PI (11) | 0.99658 | 1.1234 | 0.384 | 3.40% | 0.093776
XGB + MDI (10) | 0.99525 | 1.5604 | 0.427 | 3.52% | 0.091928
Random forest
XGBRF default (25) | 0.99680 | 1.0546 | 0.266 | 1.12% | 3.215760
XGBRF + PCA (13) | 0.99518 | 1.5818 | 0.336 | 1.55% | 2.389213
Multilayer perceptron
MLP default (25) | 0.99468 | 1.7546 | 0.554 | 5.89% | 2.444528
MLP + PCA (23) | 0.99527 | 1.5553 | 0.490 | 4.82% | 2.148090
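The accuracy metrics in Table 6 follow their standard definitions. The plain-Python sketch below computes them from paired measured and predicted voltages; the four sample values are hypothetical and serve only to check the arithmetic.

```python
def regression_metrics(y_true, y_pred):
    """R2, MSE, MAE and MAPE (in %) for paired measured/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the measured value, so it is undefined if any value is zero.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all measured values are equal")
    r2 = 1.0 - (mse * n) / ss_tot
    return {"R2": r2, "MSE": mse, "MAE": mae, "MAPE": mape}


m = regression_metrics([40.0, 42.0, 44.0, 46.0], [40.5, 41.5, 44.0, 46.0])
print({k: round(v, 4) for k, v in m.items()})
# {'R2': 0.975, 'MSE': 0.125, 'MAE': 0.25, 'MAPE': 0.6101}
```

MSE penalizes large deviations quadratically, whereas MAE and MAPE weight all errors linearly, which is why a model can rank first on MSE (default XGB) but not on MAE and MAPE (default XGBRF).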
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Beloev, H.I.; Saitov, S.R.; Filimonova, A.A.; Chichirova, N.D.; Mayorov, E.S.; Babikov, O.E.; Iliev, I.K. Solid Oxide Fuel Cell Voltage Prediction by a Data-Driven Approach. Energies 2025, 18, 2174. https://doi.org/10.3390/en18092174

AMA Style

Beloev HI, Saitov SR, Filimonova AA, Chichirova ND, Mayorov ES, Babikov OE, Iliev IK. Solid Oxide Fuel Cell Voltage Prediction by a Data-Driven Approach. Energies. 2025; 18(9):2174. https://doi.org/10.3390/en18092174

Chicago/Turabian Style

Beloev, Hristo Ivanov, Stanislav Radikovich Saitov, Antonina Andreevna Filimonova, Natalia Dmitrievna Chichirova, Egor Sergeevich Mayorov, Oleg Evgenievich Babikov, and Iliya Krastev Iliev. 2025. "Solid Oxide Fuel Cell Voltage Prediction by a Data-Driven Approach" Energies 18, no. 9: 2174. https://doi.org/10.3390/en18092174

APA Style

Beloev, H. I., Saitov, S. R., Filimonova, A. A., Chichirova, N. D., Mayorov, E. S., Babikov, O. E., & Iliev, I. K. (2025). Solid Oxide Fuel Cell Voltage Prediction by a Data-Driven Approach. Energies, 18(9), 2174. https://doi.org/10.3390/en18092174

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
