Data-Driven Insights into Concrete Flow and Strength: Advancing Smart Material Design Using Machine Learning Strategies

Alqurashi, Muwaffaq

doi:10.3390/buildings15132244


Department of Civil Engineering, College of Engineering, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia

Buildings 2025, 15(13), 2244; https://doi.org/10.3390/buildings15132244

Submission received: 16 May 2025 / Revised: 20 June 2025 / Accepted: 25 June 2025 / Published: 26 June 2025

(This article belongs to the Special Issue Advanced Materials for Modern Methods of Construction: Innovations, Challenges, and Sustainable Building Applications)


Abstract

Concrete plays a pivotal role in modern methods of construction due to its strength, durability, and adaptability to advanced building technologies. Compressive strength (CS) and workability (flow) are two key performance measures of concrete, and this paper investigates how two evolutionary machine learning methods, gene expression programming (GEP) and multi-expression programming (MEP), can be used to predict them. An experimental dataset with ten input parameters was employed to develop and assess the models. While the GEP model demonstrated strong predictive capability (R² = 0.910 for CS and 0.882 for flow), the MEP model was more accurate, attaining R² values of 0.951 for CS and 0.923 for flow. Statistical indices and correlation metrics further supported the robustness of the MEP approach. To enhance interpretability and material design insight, Shapley additive explanation (SHAP) analysis was conducted. It identified the water-to-binder ratio and slag content as critical predictors of CS, and water and slag as the dominant factors for flow. These results underscore the potential of MEP as a reliable decision-support tool for the sustainable design and optimization of concrete in advanced construction applications.

Keywords:

compressive strength; flow; predictive models; machine learning; SHAP analysis

1. Introduction

Concrete’s long history as a crucial building material has also brought its environmental impact to light [1]. Demand for cement and concrete is projected to triple by 2050, leading to faster-than-expected increases in carbon emissions and losses in biodiversity [2]. Because cement production is highly energy- and carbon-intensive, it has been the focus of efforts to develop new binding agents [2,3]. Producing each metric ton of cement, the essential binder in concrete, requires around 1.80 metric tons of raw materials and releases about 0.8 metric tons of carbon dioxide [4]. Reducing cement production is therefore an urgent priority for lessening the environmental impact of construction [2]. Reusing agricultural and industrial waste to produce new building materials offers a scientific and systematic route to long-term material sustainability [5,6]. Producing supplementary cementitious materials (SCMs) from recycled agricultural and industrial waste has far-reaching benefits for ecosystems, economies, and societies [7,8]. Replacing part of the cement with such recycled materials is an effective, long-term, and cost-effective way to reduce environmental impact [9,10,11].

According to Vakharia and Gujar [12], conventional methods of predicting the workability and compressive strength of concrete often fail to meet expectations, because they rely on empirical formulas and experimental testing. These traditional approaches are time-consuming and expensive, and they may not capture the full complexity of concrete behavior. Concrete’s distinctive properties and improved performance call for more precise and reliable prediction methods. Machine learning (ML) models offer a promising way to improve the accuracy of such predictions, which have traditionally relied on empirical calculations and laboratory tests.

ML algorithms have been used extensively in recent years to predict the properties of cementitious materials [13,14]. Shafighfard et al. used a stacking ensemble model to estimate the CS of steel-fiber-reinforced concrete exposed to high temperatures, and the stacked model predicted CS accurately (R = 0.96) [15]. Nguyen et al. [16] applied several neural network methods to estimate the CS of fiber-reinforced concrete, and the neural network models were highly accurate, with a correlation coefficient of 0.974. In a separate study, Kulasooriya et al. [17] predicted the properties of basalt fiber-reinforced concrete using several models, including a decision tree (DT). Among the three models compared, random forest (RF) achieved the highest prediction accuracy. Amin et al. [18] predicted the CS of concrete containing rice husk ash using DT, bagging regressor, and boosting regressor models, and the bagging regressor was considerably more accurate than DT and boosting. Zheng et al. [19] found that gradient boosting (GB) outperformed RF in predicting the flexural strength and workability of steel fiber concrete. Nafees et al. [20] evaluated silica fume-based concrete using individual DT and support vector regression (SVR) models and their ensembles. Compared with the individual models, the ensembles improved accuracy by 11% for DT and by 1.5% for SVR. Cakiroglu et al. [21] estimated the tensile strength of basalt fiber-reinforced concrete using RF and three boosting techniques, and extreme gradient boosting (XGBoost) performed best, with a coefficient of determination above 0.9. Earlier research by Cakiroglu et al. [22,23] likewise showed that boosting methods can successfully forecast the strength properties of concrete.
Chen et al. [24] used gene expression programming (GEP) to predict the properties of concrete made with recycled foundry sand, and the GEP model delivered the highest accuracy in predicting the output values. In a related study, Sultan et al. [25] evaluated the CS of concrete incorporating sugarcane bagasse ash using RF, GEP, and support vector machine (SVM) models. Yang et al. [26] used GEP and multi-expression programming (MEP) to predict the CS of low-carbon alkali-activated materials, and MEP outperformed GEP with R² = 0.970. Khan et al. [27] applied GEP, MEP, and ensemble ML models such as XGBoost and AdaBoost to predict the CS of metakaolin-blended mortar. In that study, Shapley additive explanations (SHAP) analysis identified the water-to-binder ratio and curing age as the most influential features, highlighting the importance of explainable AI in materials prediction. Many studies have applied ML to predict the properties of cementitious materials. However, the combined use of advanced models to estimate both the workability and the strength of concrete, together with SHAP analysis, has been largely overlooked in the current literature. GEP and MEP differ from conventional ML models in that they produce not only accurate predictions but also explicit mathematical expressions. This makes them valuable for practical engineering applications and for deeper model interpretation.

Well-trained ML algorithms can form a reliable computational framework for predicting concrete compressive strength (CS) and flow. In this work, regression models for CS and flow were developed with GEP and MEP using publicly available research data. The dataset comprises 515 CS and flow data points compiled from the existing literature, expanded from 103 original records as described in Section 2.1. The models were validated with statistical checks and Taylor diagrams. SHAP analysis was then used to determine how strongly each input parameter influences the predictions. Methods and tools that evaluate material properties in a controlled way, with minimal human intervention, could have substantial implications for the building industry as a whole.

2. Research Framework

2.1. Data Curation and Preprocessing

This study employed GEP and MEP to forecast the CS and flow of concrete. The dataset of 103 data points was collected through a detailed experimental investigation. The CS of concrete was measured according to ASTM C39, and the flow was evaluated with the flow table test according to ASTM C1437 [28,29]. Ten input parameters were used to predict CS and flow:

- cement (CM)
- slag (Sl)
- fly ash (FA)
- water (Wa)
- water-to-cement ratio (WCR)
- water-to-binder ratio (WBR)
- superplasticizer (SP)
- superplasticizer-to-cement ratio (SPCR)
- coarse aggregate (CA)
- sand (Sa)

A Python script following a predetermined procedure expanded the dataset from 103 to 515 points. The script first opens a Tkinter file dialog in which the user selects the database file. It then imports the file into a Pandas DataFrame and counts the existing data points. The original DataFrame is merged with the synthetic data, and the result is stored in a newly generated file. While augmenting the data, the script prints messages reporting the location of the saved file, the number of data points loaded, and the number of data points synthesized [30]. It also handles cases in which resampling is required or no file is selected. This preparation made the data easier to organize and use. Data preparation removes extraneous or unnecessary information from the dataset, and it is a common safeguard against the pitfalls of extracting new insights from historical data. Model performance was then studied using regression and error-distribution analyses.
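The augmentation workflow is described only at the level of its steps: choose a file, load it, synthesize points, merge, and save. The paper does not state how the synthetic points were generated. The following is therefore a hypothetical sketch, not the author’s script. It uses only the Python standard library (`csv` in place of Pandas) so that it is self-contained. The synthesis rule (resampling existing records with small Gaussian jitter), the `noise_frac` level, the output file name, and the function names `augment_rows` and `main` are all assumptions.

```python
import csv
import random
import statistics


def augment_rows(rows, target_size=515, noise_frac=0.02, seed=0):
    """Return the original rows plus jittered resamples until target_size rows exist.

    rows: list of dicts mapping column name -> numeric value.
    noise_frac: hypothetical noise level, as a fraction of each column's standard deviation.
    """
    n_new = target_size - len(rows)
    if n_new <= 0:
        print(f"No augmentation needed: {len(rows)} points already present.")
        return list(rows)
    rng = random.Random(seed)
    columns = list(rows[0])
    spread = {c: statistics.pstdev([float(r[c]) for r in rows]) for c in columns}
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rows)  # resample an existing record (with replacement)
        synthetic.append({
            c: max(0.0, float(base[c]) + rng.gauss(0.0, noise_frac * spread[c]))
            for c in columns  # mix quantities are clipped at zero
        })
    print(f"Original points: {len(rows)}; synthesized points: {n_new}; "
          f"total: {len(rows) + n_new}")
    return list(rows) + synthetic


def main():
    """Interactive workflow: pick a CSV file, augment it, save a new file."""
    from tkinter import Tk, filedialog  # imported lazily so augment_rows() works without a display

    root = Tk()
    root.withdraw()  # hide the empty main window; only the dialog is needed
    path = filedialog.askopenfilename(title="Select database file",
                                      filetypes=[("CSV files", "*.csv")])
    root.destroy()
    if not path:
        print("No file selected; nothing to do.")
        return
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        print("The selected file contains no data points.")
        return
    augmented = augment_rows(rows)
    out_path = path.rsplit(".", 1)[0] + "_augmented.csv"  # hypothetical naming scheme
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(augmented)
    print(f"Saved augmented dataset to {out_path}")
```

Calling `main()` runs the dialog-driven workflow, and `augment_rows()` can be applied directly to in-memory records. With Pandas, the same steps map onto `pd.read_csv`, `pd.concat`, and `DataFrame.to_csv`. Jittering each column independently breaks the link between derived ratios (WCR, WBR, SPCR) and the constituents they come from. A fuller version would probably recompute those ratios after perturbing CM, Wa, SP, Sl, and FA.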

Table 1 summarizes the data using descriptive statistics computed with the descriptive analysis function of Microsoft Excel: mean, median, standard deviation, skewness, and kurtosis. These measures describe the dataset’s central tendency, variability, and distribution. In machine learning, descriptive statistics help reveal the data distribution, detect outliers, and identify skewed or imbalanced features. This in turn guides data preprocessing, feature selection, and the choice of appropriate models or transformations. The mean gives the average value of each variable, while the standard deviation shows the spread of the data around the mean. For example, the cement (CM) content has a mean of 223.07 kg/m³ and a high standard deviation of 77.22 kg/m³, indicating notable variability in mix designs. Similarly, the compressive strength (CS) shows moderate variability, with a mean of 35.72 MPa and a standard deviation of 8.08 MPa. High standard deviations (e.g., CM = 77.2 kg/m³, FA = 86.5 kg/m³) and wide ranges (e.g., flow = 200–780 mm) indicate substantial variability and heterogeneity in the dataset. Negative kurtosis values (e.g., CM = −1.6) and slight skewness in several variables suggest non-normal, relatively flat distributions. These characteristics call for robust, nonlinear models such as GEP and MEP, which can capture complex relationships in dispersed data. The model selection and validation protocols were designed accordingly, to support generalization and avoid bias.

Figure 1a,b show violin plots of the variable distributions across groups. Each violin represents one group, and its width reflects the probability density of the data, so wider sections indicate higher concentrations of values. The green vertical line represents the interquartile range, and the central white dot marks the median. Groups toward the right have broader and taller violins, suggesting greater variability and higher central values, and some violins show bimodal distributions with two peaks. In contrast, groups near the center have tighter distributions, reflecting less variability. Overall, the plots highlight differences in spread and central tendency among the groups.
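The statistics in Table 1 can be reproduced outside Excel. The sketch below uses the standard library. It assumes the sample-based skewness and excess-kurtosis formulas used by Excel’s `SKEW` and `KURT` functions, which is our understanding of what the descriptive analysis tool reports. The function name `describe` is hypothetical. A negative kurtosis, as reported for CM, marks a distribution flatter than the normal distribution.

```python
import statistics


def describe(values):
    """Mean, median, sample SD, skewness and excess kurtosis (Excel SKEW/KURT-style)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least four values for kurtosis")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    z = [(x - mean) / sd for x in values]
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {"mean": mean, "median": statistics.median(values), "std": sd,
            "skewness": skew, "kurtosis": kurt,
            "min": min(values), "max": max(values)}
```

For a uniformly spaced sample such as `[1, 2, 3, 4, 5]`, the skewness is zero and the excess kurtosis is negative (−1.2), the "flat" shape discussed above.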

Before the models are constructed, it is useful to check the interdependence of the selected variables, because highly correlated input variables can cause multicollinearity during algorithm development [31]. Statistical tools such as correlation matrices can confirm this interdependence. The correlation coefficient (R) between two variables measures how strongly they are related. A positive R indicates that the two variables increase together, whereas a negative R indicates an inverse relationship. Absolute R values greater than 0.8 often indicate a strong association between two variables [32]. The correlation matrices for the data used in this study (Figure 2a,b) show that the correlation values between variables are typically below 0.8, so multicollinearity is unlikely to be a concern in model development.
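The multicollinearity screen can be expressed as a pairwise scan over the input columns. It computes Pearson’s R for each pair and flags pairs whose |R| exceeds 0.8. This is a minimal standard-library sketch. The function names and the toy column values are illustrative assumptions, not values from the study.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


def flag_collinear(columns, threshold=0.8):
    """Return (name1, name2, R) for every input pair with |R| above the threshold."""
    flagged = []
    for (name1, x), (name2, y) in combinations(columns.items(), 2):
        r = pearson(x, y)
        if abs(r) > threshold:
            flagged.append((name1, name2, round(r, 3)))
    return flagged
```

Derived inputs such as WBR are computed from other inputs (here, water). They are the most likely candidates to exceed the threshold, so they are worth checking first.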

2.2. Machine Learning Modeling

The CS and flow of concrete were modeled in a controlled setup, in which ten input parameters were used to predict each output. GEP and MEP are two evolutionary algorithms that simulate natural selection to evolve predictive equations. They were used here to forecast the compressive strength and flow behavior of concrete. In machine learning evaluations, model performance is typically assessed by comparing the predicted outputs with the corresponding measured values. In this study, 70% of the data were used to train the models, and the remaining 30% were reserved for testing. Model effectiveness is reflected in the R² value. Higher values indicate close agreement between predictions and actual results, whereas a lower R² indicates larger deviations between predicted and observed outcomes [33]. Extensive validation procedures, including statistical analyses and error evaluations, were used to confirm the models’ reliability. Table 2 lists the key hyperparameters controlling the performance of the GEP and MEP models, and Figure 3 shows a scenario-based model representation.
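The 70/30 split can be sketched as a seeded shuffle of record indices. The paper does not say whether the split was random, stratified, or applied before or after augmentation, so the random shuffle, the seed, and the name `split_indices` are assumptions.

```python
import random


def split_indices(n, train_frac=0.7, seed=42):
    """Shuffle record indices and split them into training and testing subsets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(train_frac * n)  # floor: 360 of 515 records go to training
    return indices[:n_train], indices[n_train:]
```

One design point matters when the dataset has been augmented. If synthetic points derived from the same original mix land in both subsets, test scores can be optimistic. Splitting the original records first and augmenting only the training portion is one way to avoid this.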

2.2.1. GEP Framework for Predictive Modeling

The genetic algorithm (GA), introduced by J. H. Holland, is inspired by Darwin’s theory of evolution. It solves optimization problems by simulating natural selection and gradually evolving candidate solutions over successive generations, thereby imitating the survival of the fittest [34]. In a GA, each candidate solution is encoded as a fixed-length chromosome. Koza extended the GA into genetic programming (GP) [35], which uses genetic algorithms to build evolutionary models that solve problems in a general way [36]. GP gains its versatility by replacing fixed-length binary strings with nonlinear structures such as parse trees. Following Darwin’s principle, GP applies nature-inspired genetic operators, namely reproduction, crossover, and mutation [37]. Trees that perform poorly are discarded, and the surviving trees are used to repopulate the search space. The evolutionary process also helps guard against premature convergence [37,38]. Five aspects must be specified before GP is applied:

1. the terminal set
2. the function set
3. the fitness measure
4. the control parameters (such as crossover rate and population size)
5. the termination criterion

These are discussed in [37]. Crossover is the main operator that creates new parse trees during GP’s iterative model building. Because the nonlinear GP parse trees serve as both genotype and phenotype, the resulting formulations for the target properties tend to become complex [38].
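The GA cycle described above can be sketched in a few lines. It consists of fixed-length chromosomes, fitness-based selection, crossover, and mutation, repeated over generations. This is a minimal illustration, not the GEP or MEP software used in this study. It evolves bit strings for a toy objective (counting ones), and binary tournament selection, one-point crossover, elitism, and all parameter values are assumptions chosen for clarity. GP would replace the bit strings with parse trees.

```python
import random


def run_ga(fitness, n_bits=20, pop_size=30, generations=60, p_mut=0.02, seed=1):
    """Minimal GA on fixed-length bit-string chromosomes; returns (best, history)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def select():
        # Binary tournament: the fitter of two random individuals becomes a parent
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: the best chromosome survives unchanged
        while len(children) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per gene
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history


best, history = run_ga(sum)  # toy objective: maximize the number of ones
```

Because of elitism, the best fitness in `history` never decreases from one generation to the next.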

Candida Ferreira proposed gene expression programming (GEP) as an extension of GP. GEP encodes programs as fixed-length linear chromosomes and then expresses them as tree structures, which makes evolution more efficient and capable [38]. GEP models therefore combine fixed-length linear chromosomes with parse trees, so that programs of moderate size can be encoded in simple chromosomes. GEP can construct equations for predicting complex, nonlinear problems [39,40]. As in GP, the terminal set, function set, fitness function, and termination conditions must be provided. GEP first generates chromosomes at random and then expresses them using the “Karva” language. Whereas GP manipulates parse trees of varying lengths, GEP operates on fixed-length linear strings. These strings are expressed as nonlinear parse trees of different shapes and sizes [37]. As a result, the genotype (the linear chromosome) and the phenotype (the expression tree) are separate entities [35]. Only the genome is passed on to the next generation, so GEP avoids costly replication and modification of entire tree structures. A GEP chromosome consists of one or more genes, each with a distinctive “head” and “tail” configuration. This allows a single chromosome to encode complex multigene expressions and enhances the algorithm’s capacity to develop complex solutions [37]. The genes can encode mathematical, arithmetic, logical, and Boolean functions, and linking functions combine the sub-expression trees encoded by the individual genes. The Karva language interprets these chromosomes, which makes it possible to formulate equations directly from empirical data.
In Karva notation, the K-expression of a gene is read starting from the root of its expression tree (ET), and nodes are assigned to successive levels of the tree using Equation (1) [39]. The number of ETs may affect the extent and length of the K-expressions of GEP genes.

ET_{\mathrm{GEP}} = \log\left(i - \frac{3}{j}\right)

(1)
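The way Karva notation turns a linear gene into an expression tree can be illustrated with a small decoder. Symbols are read left to right and attached to the tree level by level, and each function symbol consumes as many following symbols as its arity. This is a sketch under stated assumptions. The function set {+, −, ×, ÷, √} (with `Q` for square root), the illustrative K-expression `Q*-+abcd`, and all function names are our choices, not the configuration used in this study. The `tail_length` rule t = h(n − 1) + 1, where n is the maximum arity, is the head/tail constraint that guarantees every randomly generated gene decodes to a valid tree.

```python
import math
import operator
import random

# Function set: symbol -> (arity, implementation); Q denotes square root
FUNCS = {"+": (2, operator.add), "-": (2, operator.sub), "*": (2, operator.mul),
         "/": (2, operator.truediv), "Q": (1, math.sqrt)}


def arity(sym):
    return FUNCS[sym][0] if sym in FUNCS else 0  # terminals have arity 0


def decode_karva(kexpr):
    """Translate a K-expression into an expression tree, filling it level by level."""
    nodes = [(sym, []) for sym in kexpr]
    next_child, i = 1, 0
    while i < next_child:  # expand only nodes already attached to the tree
        sym, children = nodes[i]
        for _ in range(arity(sym)):
            if next_child >= len(nodes):
                raise ValueError("K-expression ends before the tree is complete")
            children.append(nodes[next_child])
            next_child += 1
        i += 1
    return nodes[0], next_child  # root node and length of the coding region


def evaluate(node, env):
    """Recursively evaluate an expression tree for a dict of terminal values."""
    sym, children = node
    if sym not in FUNCS:
        return env[sym]
    return FUNCS[sym][1](*(evaluate(c, env) for c in children))


def tail_length(head_len, max_arity):
    # t = h(n - 1) + 1 guarantees enough terminals to close every branch
    return head_len * (max_arity - 1) + 1


def random_gene(head_len, terminals, rng):
    """Head may hold functions or terminals; tail holds terminals only."""
    head = [rng.choice(list(FUNCS) + terminals) for _ in range(head_len)]
    tail = [rng.choice(terminals) for _ in range(tail_length(head_len, 2))]
    return head + tail
```

For example, `decode_karva("Q*-+abcd")` builds the tree for √((a − b)(c + d)). Any symbols after the coding region are ignored, which is how GEP genes carry non-coding tails.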

GEP is an advanced machine learning algorithm that can model relationships without any prior assumption about their form. Figure 4 depicts the steps involved in developing GEP mathematical equations. Each individual in the initial population is created with the same number of chromosomes. These chromosomes are expressed as ETs, and their fitness is evaluated to determine the overall quality of the population. Only the fittest individuals are selected to reproduce, so repeating this process over many generations yields progressively better solutions. The final model emerges from successive rounds of replication, mutation, and crossover.

2.2.2. MEP Framework for Predictive Modeling

MEP is regarded as an innovative linear genetic programming method because it uses linear chromosomes. It differs from earlier GP variants in its ability to encode several programs (expressions) within a single chromosome. Fitness analysis is used to select the best chromosome for the intended outcome [42,43]. As explained by Oltean and Grosan, crossover between two parents produces two offspring, and a parent can also be passed on unchanged from one generation to the next [44]. As shown in Figure 5, the process continues until the best program is found or the termination condition is reached. Fitness analysis plays a crucial role in MEP by evaluating how well the evolving mathematical expressions fit the dataset. The fitness function compares the program’s actual and expected outputs, which identifies the best chromosomes for reproduction. MEP improves promising programs through mutation, selection, and crossover. The iterative procedure can be set to terminate when a criterion is met, such as a predetermined fitness level, number of generations, or improvement threshold. In MEP, mutation alters components of the linear chromosome, and these small changes increase the genetic variation in the population. Mutations introduced early affect the chromosomes of subsequent generations, allowing new solutions to emerge. This improves the algorithm’s ability to search the solution space and adapt to the fitness landscape. Like other ML paradigms, MEP allows components to be combined. Critical parameters in MEP include the code length, the number of functions, the crossover probability, and the number of subpopulations [45]. Evaluating the population becomes more laborious and time-consuming as the numbers of individuals and subpopulations grow. The code length strongly influences the size of the generated arithmetic expressions. Table 2 lists the MEP parameters required to develop reliable mechanical property models.
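The defining feature of MEP is that one linear chromosome encodes several expressions at once. Each gene is either a terminal or a function whose arguments point to earlier genes, and the chromosome is scored by its best-performing expression. The sketch below illustrates this idea with a hypothetical seven-gene chromosome, using 0-based gene indices. The gene layout, the function names, and the choice of mean absolute error as the fitness measure are assumptions for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# Hypothetical chromosome: every function gene references earlier genes only
CHROMOSOME = [
    ("a",),         # 0: a
    ("b",),         # 1: b
    ("+", 0, 1),    # 2: a + b
    ("c",),         # 3: c
    ("d",),         # 4: d
    ("+", 3, 4),    # 5: c + d
    ("*", 2, 5),    # 6: (a + b) * (c + d)
]


def eval_genes(chrom, env):
    """Return the value of every gene, i.e. of every expression in the chromosome."""
    values = []
    for k, gene in enumerate(chrom):
        if len(gene) == 1:
            values.append(env[gene[0]])
        else:
            op, i, j = gene
            if i >= k or j >= k:
                raise ValueError("MEP genes may only reference earlier genes")
            values.append(OPS[op](values[i], values[j]))  # reuse of earlier results
    return values


def mep_fitness(chrom, samples, targets):
    """Fitness = MAE of the chromosome's best expression; also returns its gene index."""
    totals = [0.0] * len(chrom)
    for env, y in zip(samples, targets):
        for k, v in enumerate(eval_genes(chrom, env)):
            totals[k] += abs(v - y)
    best = min(range(len(chrom)), key=totals.__getitem__)
    return totals[best] / len(samples), best
```

Because every gene’s value is computed once and reused by later genes, scoring all seven expressions costs about as much as evaluating the longest one. This is the code reuse that distinguishes MEP from GEP.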

MEP models are frequently developed and evaluated using datasets compiled from the published literature [46,47]. Some researchers have suggested that linear genetic programming approaches such as MEP are preferable for forecasting the properties of concrete used in real-world applications. Grosan and Abraham investigated a number of methods, including several neural network approaches. They concluded that the best results were obtained by combining linear genetic programming (LGP) with a multilayer perceptron (MLP) [48]. GEP’s representation is somewhat more complicated than that of MEP [45]. Apart from MEP being simpler than GEP, the two differ in several key respects:

(i) MEP allows code reuse.
(ii) In MEP, non-coding components need not be placed at a specific position in the gene.
(iii) MEP encodes references to function arguments explicitly [49].

GEP is often perceived as more powerful, largely because the “head” and “tail” regions of a standard GEP gene make it straightforward to construct syntactically valid programs [44]. These differences warrant a more comprehensive comparison of the two genetic approaches on engineering problems.

2.3. Model Accuracy Assessment Approach

To assess the predictive performance of the GEP and MEP models, a detailed statistical evaluation was carried out using the test dataset. The analysis incorporated several key metrics, including Pearson’s correlation coefficient (R), Nash–Sutcliffe efficiency (NSE), root mean square error (RMSE), relative root mean square error (RRMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and relative squared error (RSE). This combination of correlation- and error-based indicators offered a comprehensive evaluation of model accuracy, stability, and generalization capability [47,50,51,52,53]. Equations (2)–(8) present the formulations for various statistical performance metrics.

R = \frac{\sum_{i=1}^{n} (a_{i} - \bar{a})(p_{i} - \bar{p})}{\sqrt{\sum_{i=1}^{n} (a_{i} - \bar{a})^{2} \sum_{i=1}^{n} (p_{i} - \bar{p})^{2}}}

(2)

MAE = \frac{1}{n} \sum_{i=1}^{n} \left| a_{i} - p_{i} \right|

(3)

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (a_{i} - p_{i})^{2}}

(4)

MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \frac{\left| a_{i} - p_{i} \right|}{a_{i}}

(5)

RSE = \frac{\sum_{i=1}^{n} (a_{i} - p_{i})^{2}}{\sum_{i=1}^{n} (a_{i} - \bar{a})^{2}}

(6)

NSE = 1 - \frac{\sum_{i=1}^{n} (a_{i} - p_{i})^{2}}{\sum_{i=1}^{n} (a_{i} - \bar{a})^{2}}

(7)

RRMSE = \frac{1}{|\bar{a}|} \sqrt{\frac{1}{n} \sum_{i=1}^{n} (a_{i} - p_{i})^{2}}

(8)

In these equations, n is the total number of data points, a_i and p_i are the actual and predicted values of the i-th data point, and ā and p̄ are the mean actual and predicted values.

The correlation coefficient R measures the strength of the linear relationship between the actual and predicted values (a_i and p_i). A high R indicates good agreement between the predicted and measured outputs [54]. R is unchanged when the predictions are multiplied or divided by a constant, so it cannot detect systematic scaling errors on its own. The coefficient of determination (R²) is therefore used as a complementary measure of accuracy, and values approaching 1 indicate a more reliable and well-constructed predictive model [55,56]. MAE and RMSE quantify the magnitude of the prediction errors, and both approach zero as the errors diminish [57,58]. Because RMSE squares each error, it penalizes large errors more heavily than MAE, whereas MAE is particularly effective for continuous and smooth datasets [59]. For all error-based metrics, lower values indicate better model performance.
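Equations (2)–(8) translate directly into code. The standard-library sketch below follows the notation above, with `actual` for a_i and `predicted` for p_i. The paper does not specify whether its reported R² is the square of Pearson’s R or the 1 − SSE/SST form. Here `R2` is assumed to be R squared; the 1 − SSE/SST form is returned separately as `NSE`. The function name is hypothetical.

```python
import math


def evaluate_metrics(actual, predicted):
    """Compute Equations (2)-(8) for paired actual (a_i) and predicted (p_i) values."""
    n = len(actual)
    a_bar = sum(actual) / n
    p_bar = sum(predicted) / n
    pairs = list(zip(actual, predicted))
    sse = sum((a - p) ** 2 for a, p in pairs)       # sum of squared errors
    ss_tot = sum((a - a_bar) ** 2 for a in actual)  # spread of the actual values
    ss_pred = sum((p - p_bar) ** 2 for p in predicted)
    r = sum((a - a_bar) * (p - p_bar) for a, p in pairs) / math.sqrt(ss_tot * ss_pred)
    rmse = math.sqrt(sse / n)
    return {
        "R": r,
        "R2": r ** 2,  # assumption: R² taken as the square of Pearson's R
        "MAE": sum(abs(a - p) for a, p in pairs) / n,
        "RMSE": rmse,
        "MAPE": 100.0 / n * sum(abs(a - p) / a for a, p in pairs),
        "RSE": sse / ss_tot,
        "NSE": 1.0 - sse / ss_tot,
        "RRMSE": rmse / abs(a_bar),
    }
```

As a quick check, actual values `[1, 2, 3, 4]` and predictions `[1, 2, 3, 5]` give MAE = 0.25, RMSE = 0.5, RSE = 0.2, and NSE = 0.8. This illustrates that NSE = 1 − RSE under these definitions.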

The Taylor diagram is an additional tool for assessing a model’s predictive power, complementing statistical validation. It compares each model’s deviations from the observed data (the reference point), which makes it useful for judging whether data-driven models are accurate and trustworthy [60,61]. In the diagram, the standard deviation is the radial distance from the origin, read on the x- and y-axes, and the correlation coefficient is given by the azimuthal angle. The centered RMSE is the distance from the reference point and is shown as arcs centered on it. Together, these three statistics locate each model on the diagram, and the model closest to the reference point is the most accurate [60].
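The three Taylor-diagram statistics are linked by the law of cosines. With θ = arccos(R), the centered RMSE E′ satisfies E′² = σ_a² + σ_p² − 2σ_aσ_pR, which is why a single point can encode all three quantities. The sketch below computes the statistics and the model’s plotting position. It assumes population (1/n) standard deviations, under which the identity holds exactly. The function name is hypothetical.

```python
import math


def taylor_stats(actual, predicted):
    """Statistics and polar plotting position of one model on a Taylor diagram."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual) / n)     # reference std
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted) / n)  # model std
    r = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted)) / (n * sa * sp)
    # Centered RMSE: RMSE after removing each series' mean
    crmse = math.sqrt(sum(((p - mp) - (a - ma)) ** 2
                          for a, p in zip(actual, predicted)) / n)
    theta = math.acos(max(-1.0, min(1.0, r)))  # angle encodes the correlation
    point = (sp * math.cos(theta), sp * math.sin(theta))
    return {"std_actual": sa, "std_pred": sp, "R": r,
            "centered_rmse": crmse, "point": point}
```

The reference (observed) point sits at (σ_a, 0). The distance from it to `point` equals the centered RMSE, so "closest to the reference point" and "smallest centered RMSE" mean the same thing.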

3. Computational Outcomes and Interpretation

3.1. CS Models

3.1.1. GEP Simulation for CS

Figure 6 shows the results of the models obtained with the GEP technique. The models calculate compressive strength (CS) using expression trees (ETs), whose mathematical relationships are governed by the number of genes and the head size. The sub-ETs in the concrete CS models were constructed mainly from standard mathematical operations: addition, subtraction, multiplication, division, and square roots. Decoding these sub-ETs in GEP yields an algebraic formula, Equation (9), into which input values can be substituted to predict the CS of concrete. With sufficient data, the resulting model has the potential to approach the performance of an ideal model. Specific input values may make the equation invalid, for example through a negative square-root argument or division by zero. In that case, the affected terms may be omitted so that the formula remains computationally valid and yields interpretable outputs. For an accurate model, the points in Figure 7a lie close to the solid red line of perfect agreement, and the graph therefore shows how closely the predicted and tested CS values agree. The R² value of 0.910 indicates strong agreement between the measured CS values and the predictions of the GEP model. Figure 7b plots the test data together with the absolute errors, visualizing the discrepancies between the experimental values and the GEP model results. The experimental data and the GEP predictions are close, with absolute errors ranging from 0.001 MPa to 8.142 MPa. The error distribution in Figure 8 shows that 86 errors are below 1.0 MPa, 42 fall between 1.0 and 3.0 MPa, and 43 exceed 3.0 MPa. Very large errors occur only rarely.

\begin{aligned} CS\,(\mathrm{MPa}) = {} & \sqrt{WBR + \frac{FA + CM + CA}{SP + WBR - FA}} \\ & + \frac{\sqrt{\sqrt{-353.054 + Sa - Sl + Wa^{2}} + FA}}{WCR} \\ & + \frac{\sqrt{FA}}{\frac{WBR}{SP} \cdot (Wa \cdot 2.908 \cdot WBR)} \\ & + \frac{Sl}{SPCR - (\sqrt{Wa} - 7.145^{1.629}) \cdot (3.429 - (-0.754))} \end{aligned}

(9)

where CS: compressive strength, Wa: water, WCR: water-to-cement ratio, CM: cement, CA: coarse aggregate, SPCR: superplasticizer-to-cement ratio, Sa: sand, SP: superplasticizer, Sl: slag, FA: fly ash, WBR: water-to-binder ratio.
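The term-omission rule described for Equation (9) can be made concrete by evaluating the equation term by term. Any term that raises a domain error (a negative square-root argument) or divides by zero is skipped. This is a sketch, not the author’s implementation. The function name `gep_cs` is hypothetical, the argument names mirror the paper’s symbols, and the inputs are assumed to be in the units of the original dataset (e.g., kg/m³ for the mix constituents).

```python
import math


def gep_cs(CM, Sl, FA, Wa, WCR, WBR, SP, SPCR, CA, Sa):
    """Evaluate Eq. (9); returns (CS in MPa, list of omitted term numbers)."""
    terms = [
        lambda: math.sqrt(WBR + (FA + CM + CA) / (SP + WBR - FA)),
        lambda: math.sqrt(math.sqrt(-353.054 + Sa - Sl + Wa ** 2) + FA) / WCR,
        lambda: math.sqrt(FA) / ((WBR / SP) * (Wa * 2.908 * WBR)),
        lambda: Sl / (SPCR - (math.sqrt(Wa) - 7.145 ** 1.629) * (3.429 - (-0.754))),
    ]
    total, omitted = 0.0, []
    for k, term in enumerate(terms, start=1):
        try:
            total += term()
        except (ValueError, ZeroDivisionError):
            # Negative square-root argument or zero denominator: drop this term
            omitted.append(k)
    return total, omitted
```

Returning the omitted term numbers alongside the prediction lets a user see when a mix lies outside the region where the full expression is valid. For example, the first term fails whenever FA exceeds SP + WBR enough to make its square-root argument negative.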

3.1.2. MEP Simulation for CS

An empirical equation was established to predict the CS of concrete, derived from the optimized outcomes of the MEP model. This equation accounts for the combined influence of ten key input variables. The resulting mathematical expression is detailed in Equation (10), representing the final outcome of the modeling process.

\begin{aligned} CS\,(\mathrm{MPa}) = {} & (SP \times SPCR) \\ & + \frac{\left(WBR + \left(\frac{(CM + FA) + \frac{CM + FA}{WBR}}{WCR} + WBR\right)\right) - CA}{Sa + WCR \times (Sa - CA)} \\ & + \sqrt{(CM + FA) + \frac{CM + FA}{WBR} + Wa + Sl - WBR} \end{aligned}

(10)

where CS: compressive strength, Wa: water, WCR: water-to-cement ratio, CM: cement, CA: coarse aggregate, SPCR: superplasticizer-to-cement ratio, Sa: sand, SP: superplasticizer, Sl: slag, FA: fly ash, WBR: water-to-binder ratio.
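Equation (10) uses the subexpression (CM + FA) + (CM + FA)/WBR twice, which mirrors MEP’s ability to reuse earlier results. The sketch below computes that subexpression once and then assembles the three terms. The function name `mep_cs` and the example mix values used for checking are hypothetical. The inputs are assumed to be in the dataset’s units, with WBR, WCR, and Sa + WCR(Sa − CA) nonzero.

```python
import math


def mep_cs(CM, Sl, FA, Wa, WCR, WBR, SP, SPCR, CA, Sa):
    """Evaluate Eq. (10), the MEP expression for compressive strength (MPa)."""
    binder_term = (CM + FA) + (CM + FA) / WBR  # subexpression shared by two terms
    t1 = SP * SPCR
    t2 = ((WBR + (binder_term / WCR + WBR)) - CA) / (Sa + WCR * (Sa - CA))
    t3 = math.sqrt(binder_term + Wa + Sl - WBR)
    return t1 + t2 + t3
```

For the hypothetical mix CM = 250, Sl = 100, FA = 0, Wa = 180 (kg/m³), WCR = 0.72, WBR = 0.51, SP = 5, SPCR = 0.02, CA = 950, and Sa = 800, the square-root term dominates the result.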

Figure 9a demonstrates the robustness and efficiency of the simplified MEP model, which achieves a coefficient of determination (R²) of 0.951. This higher R² shows that the MEP model predicts CS better than the GEP model. In Figure 9a, the predicted and experimental CS values closely follow the reference line, reflecting a strong fit and high prediction accuracy. Figure 9b shows the absolute differences between the target values and the MEP predictions. The MEP model yields a mean absolute error of 1.308 MPa, with individual errors ranging from 0.003 MPa to 3.910 MPa. The error distribution reflects the model’s accuracy and dependability: 75 predictions have errors below 1.0 MPa, 83 fall between 1.0 and 3.0 MPa, and only 13 exceed 3.0 MPa. The distribution plot in Figure 10 shows that, even when outliers are considered, the MEP model’s outputs are less volatile than those of the GEP model. Both models have high predictive power. However, the MEP equation achieves a higher correlation and a smaller spread of errors, indicating better performance. Its simplicity and adaptability further favor the broader application of the MEP equation.
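The error summaries reported for the CS models consist of the minimum, maximum, and mean absolute error, plus counts below 1.0 MPa, between 1.0 and 3.0 MPa, and above 3.0 MPa. They can be produced with a small helper. This is a sketch. The function name and the bin-edge convention (both edges counted in the middle bin) are assumptions. The same function applies to the flow models by passing `low=50.0, high=100.0` (mm).

```python
def error_summary(actual, predicted, low=1.0, high=3.0):
    """Absolute-error range, mean, and counts below/between/above two thresholds."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    counts = {
        f"< {low}": sum(e < low for e in errors),
        f"{low}-{high}": sum(low <= e <= high for e in errors),
        f"> {high}": sum(e > high for e in errors),
    }
    return {"min": min(errors), "max": max(errors),
            "mean": sum(errors) / len(errors), "counts": counts}
```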

3.2. Flow Models

3.2.1. GEP Simulation for Flow

Figure 11 shows the expression tree (ET) of the GEP model developed for workability in terms of flow. The ETs were built from mathematical relationships governed by the number of genes and the head size, and decoding these sub-ETs yields the algebraic formula in Equation (11), which predicts concrete flow from the input data. Given sufficient data, the final model can approach an ideal fit. Figure 12a compares the predicted and experimental flow values, with the solid red line denoting a perfect match. The GEP predictions agree closely with the experimental values, with an R² of 0.882, indicating that the GEP approach captures the flow behavior of concrete effectively. Figure 12b plots the absolute errors against the experimental values, illustrating the deviation between the GEP predictions and the actual results. The mean absolute error is 52.183 mm, with individual errors ranging from 0.588 mm to 159.758 mm, indicating generally good agreement between the GEP outputs and the experimental data. As shown in Figure 13, the errors follow a dome-shaped distribution: 93 errors are below 50.0 mm, 58 fall between 50.0 and 100.0 mm, and 20 exceed 100.0 mm, so large errors are uncommon.
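The R² values quoted for these models coincide with the squares of the correlation coefficients R listed in Table 3 (for example, 0.939² ≈ 0.882 for the GEP flow model), so the sketch below assumes R² is the squared Pearson correlation between experimental and predicted values. It also computes 1 − SSE/SST, the Nash–Sutcliffe form, to show that the two can diverge, which is consistent with the MEP CS model having an NSE of 0.947 but an R² of 0.951. Function names are illustrative, and the example numbers are toy values, not study data.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    s_tp = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    s_tt = sum((t - mean_t) ** 2 for t in y_true)
    s_pp = sum((p - mean_p) ** 2 for p in y_pred)
    return s_tp / math.sqrt(s_tt * s_pp)


def r_squared(y_true, y_pred):
    """R^2 taken as the squared Pearson correlation (assumption, see text)."""
    return pearson_r(y_true, y_pred) ** 2


def nash_sutcliffe(y_true, y_pred):
    """1 - SSE/SST; penalizes systematic bias as well as scatter."""
    mean_t = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - sse / sst


# Perfectly correlated but biased predictions (toy numbers).
measured = [1.0, 2.0, 3.0]
doubled = [2.0, 4.0, 6.0]
print(r_squared(measured, doubled))       # 1.0
print(nash_sutcliffe(measured, doubled))  # -6.0
```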

\begin{aligned} \mathrm{Flow}\,(\mathrm{mm}) ={}& \left(Wa + \left(Wa - \frac{WCR^{-9.462} + CA}{(Sl - WCR) + SP}\right)\right) + \left(\frac{FA - SP}{SP \cdot 1.554 + Sa} + \left(\frac{WCR}{SPCR} - WBR\right)\right) \\ &+ \left(Wa - (WBR + WBR)\right) + \left(SP - \frac{10.640}{WCR} \div \left((Sl - WCR) \cdot WCR\right)\right) \\ &+ \left(WCR^{8.899} + \left((SP + Sl)^{WBR^{SPCR}} - \left(\frac{Sl}{WBR} - SP\right)\right)\right) + \frac{CM}{WBR + 0.001} \end{aligned}

(11)

where CM: cement, Sl: slag, FA: fly ash, SP: superplasticizer, Wa: water, WCR: water-to-cement ratio, WBR: water-to-binder ratio, SPCR: superplasticizer-to-cement ratio, CA: coarse aggregate, and Sa: sand.
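Equation (11) is long enough that transcription slips are easy, so a direct Python transcription can serve as a cross-check. The sketch keeps the six-term grouping of Equation (11), reads the ÷ in the fourth term with ordinary operator precedence (SP minus the quotient), and rejects inputs that make a denominator zero or a power non-real. The function name `flow_gep` is hypothetical, and inputs are assumed to be in the training units.

```python
def flow_gep(CM, Sl, FA, Wa, WCR, WBR, SP, SPCR, CA, Sa):
    """Evaluate Equation (11) term by term and return flow in mm.

    Hypothetical helper; raises ValueError for inputs outside the domain
    of the printed expression.
    """
    if WCR <= 0 or WBR <= 0:
        raise ValueError("WCR and WBR must be positive")
    if SPCR == 0:
        raise ValueError("SPCR must be non-zero")
    if Sl == WCR or (Sl - WCR) + SP == 0:
        raise ValueError("a denominator built from Sl - WCR vanishes")
    if SP * 1.554 + Sa == 0:
        raise ValueError("SP * 1.554 + Sa is zero")
    if SP + Sl < 0:
        raise ValueError("SP + Sl must be non-negative for a real-valued power")
    t1 = Wa + (Wa - (WCR ** -9.462 + CA) / ((Sl - WCR) + SP))
    t2 = (FA - SP) / (SP * 1.554 + Sa) + (WCR / SPCR - WBR)
    t3 = Wa - (WBR + WBR)
    t4 = SP - (10.640 / WCR) / ((Sl - WCR) * WCR)
    t5 = WCR ** 8.899 + ((SP + Sl) ** (WBR ** SPCR) - (Sl / WBR - SP))
    t6 = CM / (WBR + 0.001)
    return t1 + t2 + t3 + t4 + t5 + t6
```

Because WCR and WBR are small ratios, the large exponents (−9.462 and 8.899) make some terms very sensitive to these inputs, which is one reason to evaluate the equation only within the range of the training data.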

3.2.2. MEP Simulation for Flow

An empirical equation for the flow of concrete was likewise derived from the MEP model, incorporating the effects of the ten independent input variables. The final formulation is given in Equation (12).

\begin{aligned} \mathrm{Flow}\,(\mathrm{mm}) ={}& \frac{FA}{WBR \times (WBR \times 2 \times Wa - Sl - \sqrt{Wa}) + \frac{1}{WCR}} \\ &+ \left(Wa + 2 \times Wa - Sl + \frac{SP}{SPCR} - \frac{CA}{\sqrt{Wa}} + (WBR \times 2 \times Wa - Sl) \times \frac{Sl}{CM \times WBR \times Sa}\right) \end{aligned}

(12)

where CM: cement, Sl: slag, FA: fly ash, Wa: water, WCR: water-to-cement ratio, WBR: water-to-binder ratio, SP: superplasticizer, SPCR: superplasticizer-to-cement ratio, CA: coarse aggregate, and Sa: sand.
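Equation (12) can be transcribed in the same hedged way. In this sketch, the recurring sub-expression WBR × 2 × Wa − Sl is computed once; the name `flow_mep` is hypothetical, and Wa is required to be positive because it appears both under a square root and in a denominator.

```python
import math


def flow_mep(CM, Sl, FA, Wa, WCR, WBR, SP, SPCR, CA, Sa):
    """Evaluate Equation (12) and return flow in mm (hypothetical helper)."""
    if Wa <= 0:
        raise ValueError("Wa must be positive")
    if WCR == 0 or SPCR == 0:
        raise ValueError("WCR and SPCR must be non-zero")
    if CM * WBR * Sa == 0:
        raise ValueError("CM * WBR * Sa must be non-zero")
    k = WBR * 2 * Wa - Sl  # recurring sub-expression in Equation (12)
    denominator = WBR * (k - math.sqrt(Wa)) + 1 / WCR
    if denominator == 0:
        raise ValueError("denominator of the FA term is zero")
    term1 = FA / denominator
    term2 = (Wa + 2 * Wa - Sl + SP / SPCR - CA / math.sqrt(Wa)
             + k * Sl / (CM * WBR * Sa))
    return term1 + term2
```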

As shown in Figure 14a, the simplified MEP model remains accurate, with an R² value of 0.923. As with the CS models, the MEP flow model outperforms its GEP counterpart, as reflected in its higher R² value. In Figure 14a, the solid red line represents a perfect fit, and the measured flow values agree well with the MEP predictions, demonstrating the accuracy of the MEP technique in estimating the flow of concrete. Figure 14b shows the absolute differences between the experimental and predicted values. The mean absolute error of the MEP estimates is 30.503 mm, with individual errors ranging from 0.061 mm to 117.122 mm; 136 errors are below 50.0 mm, 27 fall between 50.0 and 100.0 mm, and only 8 exceed 100.0 mm. When outliers are considered, the MEP model exhibits less variability than the GEP model. Although both MEP and GEP yield promising predictive models, the MEP equation provides a higher correlation and a smaller standard deviation of the errors, and its brevity and flexibility make it convenient to apply. Figure 15 confirms that the MEP model outperforms the GEP model, with a lower error rate and a higher correlation coefficient.

3.3. Model Accuracy Assessment

Table 3 summarizes the performance and error assessments, including MAE, RMSE, RSE, RRMSE, NSE, R, and additional metrics derived from Equations (2)–(8). Lower error values and higher correlation and efficiency values indicate better model accuracy. The comparison between the GEP and MEP models in Table 3 highlights MEP's clear superiority and shows how the evaluation metrics collectively support its enhanced predictive capability. For compressive strength (CS), MEP reduced the MAE from 1.725 MPa to 1.308 MPa and the RMSE from 2.405 MPa to 1.656 MPa, indicating improved accuracy with fewer extreme errors. The MAPE decreased from 5.0% to 3.7%, a 26 percent improvement in relative prediction accuracy, while the correlation coefficient R increased from 0.954 to 0.975, signifying a stronger relationship between predicted and actual values. Similarly, RSE dropped from 0.342 to 0.262 and NSE increased from 0.910 to 0.947, suggesting that MEP better accounts for the variability within the dataset. For flow, MEP achieved further improvements, lowering the MAE from 52.183 mm to 30.503 mm and the RMSE from 63.746 mm to 41.518 mm, while the MAPE decreased markedly from 13.0% to 6.4%, a relative improvement of nearly 51 percent. The R value rose from 0.939 to 0.961, and the reductions in RSE from 0.286 to 0.242 and in RRMSE from 0.632 to 0.512 further demonstrate more stable and consistent model behavior. Taken together, the reductions in MAE, RMSE, and MAPE alongside the increases in R and NSE suggest that MEP not only lowers the overall error but also yields a more uniform distribution of residuals, so the improvements reflect greater model reliability rather than superficial numerical gains.
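The relative improvements quoted above follow from simple arithmetic on Table 3, for example (5.0 − 3.7)/5.0 = 26% for the CS MAPE. The sketch below shows common textbook forms of MAE, RMSE, and MAPE together with that percentage-reduction calculation. These definitions are assumed for illustration, since Equations (2)–(8) are not reproduced in this section; RSE and RRMSE are omitted because their normalization is not restated here.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error; weights large errors more heavily than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (targets must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(before, after):
    """Percentage by which an error metric falls from `before` to `after`."""
    return 100.0 * (before - after) / before
```

Applied to the Table 3 values, `relative_reduction(1.725, 1.308)` gives about 24.2% for the CS MAE and `relative_reduction(52.183, 30.503)` about 41.5% for the flow MAE. Because RMSE squares the residuals, a larger relative drop in RMSE than in MAE indicates that the largest errors shrank the most.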
The Taylor diagrams in Figure 16a,b for CS and flow also confirm the higher accuracy of the MEP models compared with the GEP models.
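A Taylor diagram places each model using three statistics computed against the observations: the standard deviation of the predictions, their correlation R with the measurements, and the centred root-mean-square difference E′, which satisfy E′² = σ_p² + σ_o² − 2σ_pσ_oR. The sketch below computes these quantities from paired values. It uses population (divide-by-n) statistics throughout, which is an assumption; any consistent normalization preserves the identity.

```python
import math


def taylor_statistics(observed, predicted):
    """Return (std_observed, std_predicted, correlation, centred_rms_diff)."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    dev_o = [o - mean_o for o in observed]
    dev_p = [p - mean_p for p in predicted]
    std_o = math.sqrt(sum(d * d for d in dev_o) / n)
    std_p = math.sqrt(sum(d * d for d in dev_p) / n)
    corr = sum(a * b for a, b in zip(dev_o, dev_p)) / (n * std_o * std_p)
    # Centred RMS difference: RMS of the difference after removing each mean.
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_p)) / n)
    return std_o, std_p, corr, crmsd
```

On the diagram, a model closer to the observation point (matching standard deviation, R near 1, E′ near 0) is more accurate, which is how Figure 16 separates the MEP and GEP models.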

3.4. SHAP Analysis

3.4.1. Features Influence on Compressive Strength

The Shapley additive explanation (SHAP) summary plot in Figure 17 provides an interpretable view of how each feature contributes to the model's CS predictions. The most influential features are WBR, Sl, and WCR. Low values of WBR and Sl substantially raise the predicted CS, marking them as critical predictors. Low WCR values are likewise associated with higher predicted CS, although this effect is more dispersed. Wa, FA, and CA have moderate impacts, with low values mainly raising the predicted CS. In contrast, SPCR, CM, Sa, and SP contribute little to the model output, with SHAP values clustered around zero. Overall, the SHAP analysis not only ranks feature importance but also shows how specific value ranges of each feature shift the predictions, offering insight into model transparency and potential mix-design decisions.
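The SHAP values plotted in Figure 17 are Shapley values: each feature's contribution is its weighted average marginal effect on the prediction over all coalitions of the other features, and the contributions sum to the difference between the prediction and a baseline prediction. The section does not state which SHAP explainer or background data was used, so the sketch below is a from-scratch illustration under one simple assumption: a feature absent from a coalition is set to a single baseline value. Exact enumeration visits 2ⁿ⁻¹ coalitions per feature, which remains manageable for the ten inputs used here (512 per feature). The toy model is invented and is not a concrete model.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at point `x`.

    Features outside a coalition take their `baseline` value (a single
    reference point, which is an assumption of this sketch).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                present = set(coalition)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_model(v):
    """Invented model with an interaction between features 0 and 2."""
    return 2 * v[0] - 3 * v[1] + v[0] * v[2]


phi = shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
# Efficiency property: contributions add up to f(x) - f(baseline).
print(round(sum(phi), 10), toy_model([1.0, 2.0, 3.0]) - toy_model([0.0, 0.0, 0.0]))
```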

The SHAP dependence plots in Figure 18a–j for compressive strength reveal complex, nonlinear interactions among the features. In Figure 18a, the SHAP values of cement are modestly influenced by both its own magnitude and its interaction with slag, especially around 150–200 units, indicating limited sensitivity outside this range. Figure 18b shows that increasing slag content strongly lowers its SHAP values, particularly when the water–binder ratio is high, suggesting that excessive slag in high water–binder systems reduces the predicted strength. In Figure 18c, fly ash contributes positively when its content exceeds 100 units, an effect amplified by its interplay with cement, indicating a synergistic effect. Figure 18d shows that water's SHAP values follow a U-shaped pattern: both low and high water contents raise the predictions, whereas mid-range values lower them, with this behavior influenced by cement dosage. In Figure 18e, increasing the water–cement ratio clearly lowers the SHAP values, indicating a detrimental effect on predicted strength, especially when the water–binder ratio is high. Figure 18f shows a strong inverse relationship between the water–binder ratio and its SHAP values, with lower ratios markedly raising the predictions; this pattern is also moderated by coarse aggregate content. In Figure 18g, superplasticizer has a negligible influence regardless of dosage, with nearly flat SHAP values that are only slightly affected by the water–binder ratio. Figure 18h shows that SHAP values tend to decline as the superplasticizer–cement ratio increases, suggesting that higher ratios lower the predicted strength, particularly when the water–binder ratio is high.
In Figure 18i, the SHAP values of coarse aggregate decrease as its content increases, particularly when the slag content is high, suggesting that excessive aggregate may offset strength gains. Finally, Figure 18j shows that sand has limited and mostly negative SHAP values across its range, implying a weak or neutral contribution with minimal interaction effects from cement. Overall, these plots highlight the importance of a balanced mix design and of nonlinear feature dependencies in predicting compressive strength.
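The vertical spread and colour gradients in dependence plots such as Figure 18 arise because, when features interact, a feature's SHAP value depends on the values of the other features as well as its own. A minimal two-feature sketch makes this concrete, assuming a hypothetical model with one linear term and one interaction term and a zero baseline. For two features, the exact Shapley value averages a feature's marginal effect over the two possible orderings in which the features can be added.

```python
def shapley_two_features(f, x1, x2, b1=0.0, b2=0.0):
    """Exact Shapley values (phi1, phi2) of a two-input model f at (x1, x2)."""
    phi1 = 0.5 * ((f(x1, b2) - f(b1, b2)) + (f(x1, x2) - f(b1, x2)))
    phi2 = 0.5 * ((f(b1, x2) - f(b1, b2)) + (f(x1, x2) - f(x1, b2)))
    return phi1, phi2


def toy_model(a, b):
    """Hypothetical model: linear effect of feature 1 plus a 1x2 interaction."""
    return -40.0 * a + 0.5 * a * b


# Same value of feature 1, different values of feature 2:
print(shapley_two_features(toy_model, 1.0, 0.0))   # (-40.0, 0.0)
print(shapley_two_features(toy_model, 1.0, 20.0))  # (-35.0, 5.0)
```

In a dependence plot, these two points would share the same x-position for feature 1 but sit at different heights, with the colour scale for feature 2 revealing why.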

3.4.2. Features Influence on Workability (Flow)

SHAP analysis is a robust interpretability technique that quantifies the influence of each input feature on a model's prediction at the instance level. Figure 19 shows the SHAP summary plot for the flow (workability) predictions, indicating how strongly and in which direction each feature influences the model output. Water (Wa) has the highest positive SHAP values at high feature values (pink), indicating that greater water content strongly enhances flow, as expected for workability. Slag (Sl), by contrast, reduces flow at higher contents, suggesting that excessive slag may lower mix fluidity, possibly because of its fine particles and high surface area. The water–binder ratio (WBR) has a mostly positive influence, especially at high values, reinforcing the role of this ratio in improving workability. Coarse aggregate (CA) has a complex effect, with higher contents tending to reduce flow, likely because of increased internal friction. Like WBR, the water–cement ratio (WCR) contributes positively when elevated, indicating improved lubrication of the mix. The superplasticizer–cement ratio (SPCR) is moderately influential, with higher ratios yielding positive SHAP values, demonstrating its effectiveness in boosting flow. Fly ash (FA) generally contributes positively, especially at higher contents, possibly because its spherical particles improve packing. Cement (CM) and sand (Sa) have mixed but mostly neutral or slightly negative effects, indicating that increasing their content does not necessarily enhance flow and may even stiffen the mix. Finally, superplasticizer (SP) has a strong positive impact at high dosages, consistent with its known role in improving workability.
Overall, the SHAP analysis shows that water, superplasticizer, and the related ratios are the most influential in promoting flow, whereas high contents of slag, coarse aggregate, and sometimes cement may hinder it.
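The vertical order of features in a SHAP summary plot such as Figure 19 is normally set by global importance, i.e., the mean absolute SHAP value of each feature across all samples. The sketch below computes that ranking from a matrix of SHAP values (one row per mix, one column per feature). The function name is hypothetical, and the numbers in the example are invented purely to show the mechanics; they are not values from this study.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP| across samples, most important first."""
    n = len(shap_rows)
    if n == 0:
        raise ValueError("need at least one row of SHAP values")
    importance = {
        name: sum(abs(row[k]) for row in shap_rows) / n
        for k, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Invented SHAP values for two mixes and three features.
rows = [[30.0, -10.0, 0.5], [-50.0, 20.0, 0.5]]
print(rank_by_mean_abs_shap(rows, ["Wa", "Sl", "SP"]))
# [('Wa', 40.0), ('Sl', 15.0), ('SP', 0.5)]
```

Taking absolute values before averaging means a feature with large effects in both directions still ranks highly, which is why a summary plot pairs this ordering with the signed, colour-coded points.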

The SHAP dependence plots in Figure 20a–j illustrate feature effects and interactions for the flow model. In Figure 20a, CM has a dispersed impact on the predictions, with Wa moderately influencing the direction of its SHAP values. Figure 20b shows a strong negative relationship between Sl and its SHAP values, indicating that high slag contents reduce the predicted flow, while the coloring by WBR reveals interaction effects. Figure 20c shows that FA has a largely positive SHAP influence at moderate to high contents, particularly when Wa is high. Figure 20d shows a strong positive correlation between Wa and its SHAP values, modulated by Sl, with higher Sl values increasing the SHAP contribution of Wa. In Figure 20e, WCR has a nonlinear SHAP impact, with higher Wa values contributing more positively. Figure 20f indicates that WBR has its peak SHAP impact around mid-range values, with Sl modifying the magnitude of the effect. Figure 20g shows that SP has minimal effect across its range, and Wa does not substantially alter its SHAP values. Conversely, Figure 20h shows that SPCR has a stronger SHAP influence at lower values, with its interaction with SP evident from the color gradients. Figure 20i reveals a clear negative relationship between CA and its SHAP values, with high Wa raising the model output. Finally, Figure 20j suggests a mild decreasing SHAP effect of Sa, with limited modulation by Sl. Collectively, these plots show that Wa, Sl, and CA play dominant roles in shaping the predictions, with interactions among features altering their influence.
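Descriptions such as a peak SHAP impact around mid-range WBR values (Figure 20f) can be made more quantitative by averaging SHAP values within equal-width bins of the feature. The sketch below does this; the helper name, the bin count, and the toy values are illustrative assumptions, and in practice the feature values and SHAP values would come from the fitted model's explanation.

```python
def binned_mean_shap(feature_values, shap_values, n_bins=3):
    """Average SHAP value within equal-width bins of one feature.

    Returns one entry per bin (None for an empty bin), ordered from the
    lowest to the highest feature values.
    """
    if len(feature_values) != len(shap_values) or n_bins < 1:
        raise ValueError("need equal-length inputs and at least one bin")
    low, high = min(feature_values), max(feature_values)
    if high == low:
        raise ValueError("feature is constant; nothing to bin")
    width = (high - low) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for value, shap in zip(feature_values, shap_values):
        k = min(int((value - low) / width), n_bins - 1)  # maximum goes to last bin
        sums[k] += shap
        counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Toy values showing a mid-range peak.
print(binned_mean_shap([0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
                       [-1.0, -1.0, 2.0, 4.0, -3.0, -3.0]))  # [-1.0, 3.0, -3.0]
```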

4. Discussions

One major benefit of the GEP and MEP models is that they rely on a compact set of ten input parameters while still providing accurate predictions. The models assume consistent units and testing methods, which allows reliable CS and flow predictions. The explicit model equations also make it possible to examine how each input parameter affects the mix design. Applying the models with inputs beyond these ten may compromise their reliability. The models may also not perform as expected if the mixes to which they are applied differ significantly from those represented in the training data. Input units must be consistent with those used for training; otherwise, the models will not produce valid predictions. If an equation yields an invalid result, such as a negative value under a square root or a division by zero, this may stem from parameter-specific issues or from typographical errors in the equation itself. In such cases, the affected terms may reasonably be omitted to maintain computational validity and keep the output interpretable. ML models are widely applicable in the construction sector, supporting tasks such as forecasting material performance, ensuring quality control, evaluating risks, enabling proactive maintenance, and optimizing energy usage. Their limitations must, however, be acknowledged. A major limitation is their reliance on human-entered data, which increases the likelihood of inaccurate inputs and results. There are many possible directions for future research to enhance machine learning solutions and overcome these restrictions.
These include improving the integration of IoT devices, developing hybrid modeling approaches, incorporating explainable AI (XAI) techniques with a focus on sustainability, and generating and sharing industry-specific datasets to support more informed and responsible decision-making across sectors. Such advances would allow the construction sector to make better-informed decisions and increase productivity, potentially reducing project delays, improving safety, and encouraging environmentally friendly practices. The findings of this study may also promote wider use of concrete containing supplementary cementitious materials, supporting more environmentally friendly construction.
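The Discussion stresses that the equations are reliable only for inputs resembling the training data and expressed in consistent units. A simple safeguard is to flag any input that falls outside the range observed in training before evaluating an equation. In the sketch below, the function name is hypothetical and the ranges are placeholders, not the actual bounds of the 515-record dataset; a min–max box is also only a coarse applicability-domain check, since it ignores correlations between inputs such as Wa, WBR, and WCR.

```python
def out_of_range_inputs(mix, training_ranges):
    """Return the names of inputs lying outside their training [min, max]."""
    flagged = []
    for name, value in mix.items():
        if name not in training_ranges:
            raise KeyError(f"no training range recorded for input {name!r}")
        low, high = training_ranges[name]
        if not low <= value <= high:
            flagged.append(name)
    return flagged


# Placeholder ranges for illustration only.
ranges = {"WBR": (0.25, 0.60), "Wa": (120.0, 240.0)}
print(out_of_range_inputs({"WBR": 0.70, "Wa": 180.0}, ranges))  # ['WBR']
```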

5. Conclusions

This study investigates concrete with a focus on its compressive strength (CS) and workability (flow) using two machine learning techniques: gene expression programming (GEP) and multi-expression programming (MEP). A comprehensive dataset comprising 515 experimental records of CS and flow was utilized for model training, testing, and validation. The key findings of the study are summarized as follows:

The GEP model proved highly predictive, with R² values of 0.910 for CS and 0.882 for flow. The MEP model was more accurate still, with R² values of 0.951 for CS and 0.923 for flow, showing improved accuracy and dependability in estimating both properties.
The comparative error analysis shows that the MEP model is more accurate than the GEP model, reducing the mean absolute error for CS from 1.725 MPa to 1.308 MPa and for flow from 52.183 mm to 30.503 mm. This substantial reduction highlights the robustness and reliability of the MEP approach, making it the more precise choice for predicting these properties in concrete research.
The statistical evaluation confirms that MEP outperforms GEP in predicting both CS and flow, with lower error values, higher correlation coefficients, and better efficiency scores. With reduced MAE, RMSE, RSE, and RRMSE and improved NSE and R values, MEP demonstrates greater accuracy and reliability and is therefore the preferred of the two approaches for predicting these properties.
SHAP analysis reveals different feature-importance patterns for the CS and flow models. For CS, WBR and Sl have the strongest impact, indicating their dominant role in shaping the predictions, whereas flow is predominantly influenced by Wa and Sl, suggesting different underlying dynamics. These insights underscore the need for property-specific feature prioritization in decision-making frameworks.

The explicit equations produced by GEP and MEP also make it possible to predict these properties for other databases. They provide a quick way to assess, improve, and rationalize the proportioning of concrete mixes, allowing engineers and scientists to evaluate and refine mix designs more efficiently.

Funding

This work was supported by Taif University, Saudi Arabia, through project number (TU-DSPP-2024-248).

Data Availability Statement

The data presented in this study are available from the corresponding author upon reasonable request.

Acknowledgments

The author extends his appreciation to Taif University, Saudi Arabia, for supporting this work through project number (TU-DSPP-2024-248).

Conflicts of Interest

The author declares no conflict of interest.

References

Ghosh, A.; Ransinchung, G.D. Application of machine learning algorithm to assess the efficacy of varying industrial wastes and curing methods on strength development of geopolymer concrete. Constr. Build. Mater. 2022, 341, 127828.
Belaïd, F. How does concrete and cement industry transformation contribute to mitigating climate change challenges? Resour. Conserv. Recycl. Adv. 2022, 15, 200084.
Zeng, J.-J.; Zeng, W.-B.; Ye, Y.-Y.; Liao, J.; Zhuge, Y.; Fan, T.-H. Flexural behavior of FRP grid reinforced ultra-high-performance concrete composite plates with different types of fibers. Eng. Struct. 2022, 272, 115020.
Andrew, R.M. Global CO₂ emissions from cement production, 1928–2018. Earth Syst. Sci. Data 2019, 11, 1675–1710.
Puertas, F.; Suárez-Navarro, J.A.; Alonso, M.M.; Gascó, C. NORM waste, cements, and concretes. A review. Mater. De Construcción 2021, 71, 344.
Ahmad, W.; McCormack, S.J.; Byrne, A. Biocomposites for sustainable construction: A review of material properties, applications, research gaps, and contribution to circular economy. J. Build. Eng. 2025, 105, 112525.
Elmagarhe, A.; Lu, Q.; Alharthai, M.; Alamri, M.; Elnihum, A. Performance of Porous Asphalt Mixtures Containing Recycled Concrete Aggregate and Fly Ash. Materials 2022, 15, 6363.
Schaubroeck, T.; Gibon, T.; Igos, E.; Benetto, E. Sustainability assessment of circular economy over time: Modelling of finite and variable loops & impact distribution among related products. Resour. Conserv. Recycl. 2021, 168, 105319.
Shaaban, I.G.; Rizzuto, J.P.; El-Nemr, A.; Bohan, L.; Ahmed, H.; Tindyebwa, H. Mechanical properties and air permeability of concrete containing waste tires extracts. J. Mater. Civ. Eng. 2021, 33, 04020472.
Nurruddin, M.F.; Sani, H.; Mohammed, B.S.; Shaaban, I. Methods of curing geopolymer concrete: A review. Int. J. Adv. Appl. Sci. 2018, 5, 31–36.
Saif, M.S.; Shanour, A.S.; Abdelaziz, G.E.; Elsayad, H.I.; Shaaban, I.G.; Tayeh, B.A.; Hammad, M.S. Influence of blended powders on properties of ultra-high strength fibre reinforced self compacting concrete subjected to elevated temperatures. Case Stud. Constr. Mater. 2023, 18, e01793.
Vakharia, V.; Gujar, R. Prediction of compressive strength and portland cement composition using cross-validation and feature ranking techniques. Constr. Build. Mater. 2019, 225, 292–301.
Ahmad, W.; Veeraghantla, V.S.S.C.S.; Byrne, A. Advancing Sustainable Concrete Using Biochar: Experimental and Modelling Study for Mechanical Strength Evaluation. Sustainability 2025, 17, 2516.
Huang, H.; Li, M.; Yuan, Y.; Bai, H. Theoretical analysis on the lateral drift of precast concrete frame with replaceable artificial controllable plastic hinges. J. Build. Eng. 2022, 62, 105386.
Shafighfard, T.; Bagherzadeh, F.; Rizi, R.A.; Yoo, D.-Y. Data-driven compressive strength prediction of steel fiber reinforced concrete (SFRC) subjected to elevated temperatures using stacked machine learning algorithms. J. Mater. Res. Technol. 2022, 21, 3777–3794.
Nguyen, T.T.; Pham Duy, H.; Pham Thanh, T.; Vu, H.H. Compressive Strength Evaluation of Fiber-Reinforced High-Strength Self-Compacting Concrete with Artificial Intelligence. Adv. Civ. Eng. 2020, 2020, 3012139.
Kulasooriya, W.; Ranasinghe, R.S.S.; Perera, U.S.; Thisovithan, P.; Ekanayake, I.U.; Meddage, D.P.P. Modeling strength characteristics of basalt fiber reinforced concrete using multiple explainable machine learning with a graphical user interface. Sci. Rep. 2023, 13, 13138.
Amin, M.N.; Iftikhar, B.; Khan, K.; Javed, M.F.; AbuArab, A.M.; Rehman, M.F. Prediction model for rice husk ash concrete using AI approach: Boosting and bagging algorithms. Structures 2023, 50, 745–757.
Zheng, D.; Wu, R.; Sufian, M.; Kahla, N.B.; Atig, M.; Deifalla, A.F.; Accouche, O.; Azab, M. Flexural Strength Prediction of Steel Fiber-Reinforced Concrete Using Artificial Intelligence. Materials 2022, 15, 5194.
Nafees, A.; Amin, M.N.; Khan, K.; Nazir, K.; Ali, M.; Javed, M.F.; Aslam, F.; Musarat, M.A.; Vatin, N.I. Modeling of Mechanical Properties of Silica Fume-Based Green Concrete Using Machine Learning Techniques. Polymers 2021, 14, 30.
Cakiroglu, C.; Aydın, Y.; Bekdaş, G.; Geem, Z.W. Interpretable predictive modelling of basalt fiber reinforced concrete splitting tensile strength using ensemble machine learning methods and SHAP approach. Materials 2023, 16, 4578.
Cakiroglu, C.; Shahjalal, M.; Islam, K.; Mahmood, S.M.F.; Billah, A.H.M.M.; Nehdi, M.L. Explainable ensemble learning data-driven modeling of mechanical properties of fiber-reinforced rubberized recycled aggregate concrete. J. Build. Eng. 2023, 76, 107279.
Cakiroglu, C.; Islam, K.; Bekdaş, G.; Isikdag, U.; Mangalathu, S. Explainable machine learning models for predicting the axial compression capacity of concrete filled steel tubular columns. Constr. Build. Mater. 2022, 356, 129227.
Chen, L.; Wang, Z.; Khan, A.A.; Khan, M.; Javed, M.F.; Alaskar, A.; Eldin, S.M. Development of predictive models for sustainable concrete via genetic programming-based algorithms. J. Mater. Res. Technol. 2023, 24, 6391–6410.
Shah, M.I.; Javed, M.F.; Aslam, F.; Alabduljabbar, H. Machine learning modeling integrating experimental analysis for predicting the properties of sugarcane bagasse ash concrete. Constr. Build. Mater. 2022, 314, 125634.
Yang, X.; Wu, T.; Amin, M.N.; AlAteah, A.H.; Qadir, M.T.; Khan, S.A.; Javed, M.F. Experimenting the compressive performance of low-carbon alkali-activated materials using advanced modeling techniques. Rev. Adv. Mater. Sci. 2024, 63, 20240068.
Khan, N.M.; Ma, L.; Inqiad, W.B.; Khan, M.S.; Iqbal, I.; Emad, M.Z.; Alarifi, S.S. Interpretable machine learning approaches to assess the compressive strength of metakaolin blended sustainable cement mortar. Sci. Rep. 2025, 15, 19414.
Tipu, R.K.; Suman; Batra, V. Enhancing prediction accuracy of workability and compressive strength of high-performance concrete through extended dataset and improved machine learning models. Asian J. Civ. Eng. 2024, 25, 197–218.
Yeh, I.C. Exploring concrete slump model using artificial neural networks. J. Comput. Civ. Eng. 2006, 20, 217–221.
Tian, Q.; Lu, Y.; Zhou, J.; Song, S.; Yang, L.; Cheng, T.; Huang, J. Compressive strength of waste-derived cementitious composites using machine learning. Rev. Adv. Mater. Sci. 2024, 63, 20240008.
Iftikhar, B.; Alih, S.C.; Vafaei, M.; Elkotb, M.A.; Shutaywi, M.; Javed, M.F.; Deebani, W.; Khan, M.I.; Aslam, F. Predictive modeling of compressive strength of sustainable rice husk ash concrete: Ensemble learner optimization and comparison. J. Clean. Prod. 2022, 348, 131285.
Sarveghadi, M.; Gandomi, A.H.; Bolandi, H.; Alavi, A.H. Development of prediction models for shear strength of SFRCB using a machine learning approach. Neural Comput. Appl. 2019, 31, 2085–2094.
Lee, B.C.; Brooks, D.M. Accurate and efficient regression modeling for microarchitectural performance and power prediction. ACM SIGOPS Oper. Syst. Rev. 2006, 40, 185–194.
Holland, J.H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence; MIT Press: Cambridge, MA, USA, 1992.
Koza, J. On the programming of computers by means of natural selection. In Genetic Programming; MIT Press: Cambridge, MA, USA, 1992.
Gholampour, A.; Ozbakkaloglu, T.; Hassanli, R. Behavior of rubberized concrete under active confinement. Constr. Build. Mater. 2017, 138, 372–382.
Topcu, I.B.; Sarıdemir, M. Prediction of compressive strength of concrete containing fly ash using artificial neural networks and fuzzy logic. Comput. Mater. Sci. 2008, 41, 305–311.
Ferreira, C. Gene expression programming: Mathematical modeling by an artificial intelligence; Springer: Berlin/Heidelberg, Germany, 2006; Volume 21.
Gandomi, A.H.; Yun, G.J.; Alavi, A.H. An evolutionary approach for modeling of shear strength of RC deep beams. Mater. Struct. 2013, 46, 2109–2119.
Gandomi, A.H.; Babanajad, S.K.; Alavi, A.H.; Farnam, Y. Novel approach to strength modeling of concrete under triaxial compression. J. Mater. Civ. Eng. 2012, 24, 1132–1143.
Amin, M.N.; Ahmad, W.; Khan, K.; Deifalla, A.F. Optimizing compressive strength prediction models for rice husk ash concrete with evolutionary machine intelligence techniques. Case Stud. Constr. Mater. 2023, 18, e02102.
Wang, H.-L.; Yin, Z.-Y. High performance prediction of soil compaction parameters using multi expression programming. Eng. Geol. 2020, 276, 105758.
Iqbal, M.F.; Javed, M.F.; Rauf, M.; Azim, I.; Ashraf, M.; Yang, J.; Liu, Q.-f. Sustainable utilization of foundry waste: Forecasting mechanical properties of foundry sand based concrete using multi-expression programming. Sci. Total Environ. 2021, 780, 146524.
Oltean, M.; Grosan, C. A comparison of several linear genetic programming techniques. Complex Syst. 2003, 14, 285–314.
Fallahpour, A.; Olugu, E.U.; Musa, S.N. A hybrid model for supplier selection: Integration of AHP and multi expression programming (MEP). Neural Comput. Appl. 2017, 28, 499–504.
Alavi, A.H.; Gandomi, A.H.; Sahab, M.G.; Gandomi, M. Multi expression programming: A new approach to formulation of soil classification. Eng. Comput. 2010, 26, 111–118.
Mohammadzadeh, S.D.; Kazemi, S.-F.; Mosavi, A.; Nasseralshariati, E.; Tah, J.H.M. Prediction of compression index of fine-grained soils using a gene expression programming model. Infrastructures 2019, 4, 26.
Grosan, C.; Abraham, A. Stock market modeling using genetic programming ensembles. In Genetic Systems Programming: Theory and Experiences; Springer: Berlin/Heidelberg, Germany, 2006; pp. 131–146.
Oltean, M.; Dumitrescu, D. Multi expression programming. J. Genet. Program. Evolvable Mach. 2002.
Iqbal, M.F.; Liu, Q.-f.; Azim, I.; Zhu, X.; Yang, J.; Javed, M.F.; Rauf, M. Prediction of mechanical properties of green concrete incorporating waste foundry sand based on gene expression programming. J. Hazard. Mater. 2020, 384, 121322.
Shahin, M.A. Genetic Programming for Modelling of Geotechnical Engineering Systems; Springer: Berlin/Heidelberg, Germany, 2015.
Çanakcı, H.; Baykasoğlu, A.; Güllü, H. Prediction of compressive and tensile strength of Gaziantep basalts via neural networks and gene expression programming. Neural Comput. Appl. 2009, 18, 1031–1041.
Alade, I.O.; Abd Rahman, M.A.; Saleh, T.A. Predicting the specific heat capacity of alumina/ethylene glycol nanofluids using support vector regression model optimized with Bayesian algorithm. Sol. Energy 2019, 183, 74–82.
Alade, I.O.; Bagudu, A.; Oyehan, T.A.; Abd Rahman, M.A.; Saleh, T.A.; Olatunji, S.O. Estimating the refractive index of oxygenated and deoxygenated hemoglobin using genetic algorithm–support vector regression model. Comput. Methods Programs Biomed. 2018, 163, 135–142.
Zhang, W.; Zhang, R.; Wu, C.; Goh, A.T.C.; Lacasse, S.; Liu, Z.; Liu, H. State-of-the-art review of soft computing applications in underground excavations. Geosci. Front. 2020, 11, 1095–1106.
Alavi, A.H.; Gandomi, A.H.; Nejad, H.C.; Mollahasani, A.; Rashed, A. Design equations for prediction of pressuremeter soil deformation moduli utilizing expression programming systems. Neural Comput. Appl. 2013, 23, 1771–1786.
Kisi, O.; Shiri, J.; Tombul, M. Modeling rainfall-runoff process using soft computing techniques. Comput. Geosci. 2013, 51, 108–117.
Alade, I.O.; Abd Rahman, M.A.; Saleh, T.A. Modeling and prediction of the specific heat capacity of Al₂O₃/water nanofluids using hybrid genetic algorithm/support vector regression model. Nano-Struct. Nano-Objects 2019, 17, 103–111.
Shahin, M.A. Use of evolutionary computing for modelling some complex problems in geotechnical engineering. Geomech. Geoengin. 2015, 10, 109–125.
Band, S.S.; Heggy, E.; Bateni, S.M.; Karami, H.; Rabiee, M.; Samadianfard, S.; Chau, K.-W.; Mosavi, A. Groundwater level prediction in arid areas using wavelet analysis and Gaussian process regression. Eng. Appl. Comput. Fluid Mech. 2021, 15, 1147–1158.
Taylor, K.E. Summarizing multiple aspects of model performance in a single diagram. J. Geophys. Res. Atmos. 2001, 106, 7183–7192.

Figure 1. Violin plots showing the frequency distributions: (a) CS dataset; (b) flow dataset.

Figure 2. Pearson correlation plots: (a) CS dataset; (b) flow dataset.

Figure 3. Overview of the study approach.

Figure 4. Workflow illustration of the GEP technique [41].
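At the core of the GEP workflow, each gene is a fixed-length string with a head (functions or terminals) and a tail (terminals only), read breadth-first in Karva notation to form the expression trees shown schematically in Figures 6 and 11. The standard tail-length rule t = h(n − 1) + 1, where n is the largest function arity, gives t = 8 × (2 − 1) + 1 = 9 and a gene length of 17 for the head size of 8 in Table 2, since every function in the Table 2 set takes at most two arguments. The minimal Python sketch below decodes and evaluates such genes and links their sub-trees by addition, as Table 2 specifies. It assumes unprotected arithmetic, uses a head size of 3 in the toy example for readability, and treats numeric terminals as a simplified stand-in for GEP's per-gene constants; the genes and the variables x1 and x2 are hypothetical, and this is not the software used in the study.

```python
import math
from typing import Dict, List

# Arity of each function symbol; the set mirrors the GEP function set in Table 2.
ARITY: Dict[str, int] = {"+": 2, "-": 2, "*": 2, "/": 2, "pow": 2, "sqrt": 1}


def tail_length(head: int, max_arity: int = 2) -> int:
    """Tail-length rule t = h*(n - 1) + 1, which guarantees every gene decodes to a valid tree."""
    return head * (max_arity - 1) + 1


def build_tree(kexpr: List[str]) -> dict:
    """Decode a K-expression breadth-first; symbols beyond the open reading frame are ignored."""
    nodes = [{"sym": s, "kids": []} for s in kexpr]
    next_child = 1  # index of the next unread symbol (the root, nodes[0], is already placed)
    i = 0
    while i < next_child:  # expand only nodes already attached to the tree
        for _ in range(ARITY.get(nodes[i]["sym"], 0)):
            nodes[i]["kids"].append(nodes[next_child])
            next_child += 1
        i += 1
    return nodes[0]


def evaluate(node: dict, env: Dict[str, float]) -> float:
    """Evaluate a decoded tree; terminals are variable names in env or numeric constants."""
    sym = node["sym"]
    if sym not in ARITY:
        return env[sym] if sym in env else float(sym)
    a = [evaluate(k, env) for k in node["kids"]]
    if sym == "+":
        return a[0] + a[1]
    if sym == "-":
        return a[0] - a[1]
    if sym == "*":
        return a[0] * a[1]
    if sym == "/":
        return a[0] / a[1]  # unprotected: raises ZeroDivisionError on a zero divisor
    if sym == "pow":
        return math.pow(a[0], a[1])
    return math.sqrt(a[0])  # "sqrt" is the only remaining function


def chromosome_value(genes: List[List[str]], env: Dict[str, float], head: int) -> float:
    """Sum the sub-expression trees, i.e., use addition as the linking function."""
    for g in genes:
        assert len(g) == head + tail_length(head), "gene length must equal h + t"
    return sum(evaluate(build_tree(g), env) for g in genes)


genes = [
    ["*", "+", "x1", "x1", "x2", "x2", "x1"],     # decodes to (x1 + x2) * x1
    ["sqrt", "x2", "+", "x1", "x1", "x2", "x2"],  # decodes to sqrt(x2); "+" lies outside the ORF
]
print(tail_length(8))  # 9, for the head size of 8 in Table 2
print(chromosome_value(genes, {"x1": 2.0, "x2": 4.0}, head=3))  # 14.0
```

The second toy gene shows why GEP genes can be of fixed length yet encode trees of different sizes: the symbols after the open reading frame are carried along but not expressed.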

Figure 5. Workflow diagram for MEP approach [41].
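In MEP, a single linear chromosome encodes several candidate expressions at once: each gene is either a terminal or an operator whose arguments point to earlier genes, and the chromosome is scored by its best gene. The minimal Python sketch below evaluates every gene on the data in one bottom-up pass and returns the gene with the lowest MAE, the error measure listed for MEP in Table 2. It assumes a reduced function set (+, −, ×) and a five-gene toy chromosome instead of the code length of 50 in Table 2; the gene encoding and data are hypothetical, and this is not the tool used in the study.

```python
from typing import List, Sequence, Tuple

# A gene is ("var", column), ("const", value) or (op, i, j) with i, j < the gene's own index.
Gene = Tuple


def eval_genes(chrom: List[Gene], X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Evaluate every gene on every sample in one bottom-up pass, reusing earlier genes."""
    values: List[List[float]] = []
    for k, g in enumerate(chrom):
        if g[0] == "var":
            values.append([float(row[g[1]]) for row in X])
        elif g[0] == "const":
            values.append([float(g[1])] * len(X))
        else:
            op, i, j = g
            assert i < k and j < k, "arguments must point to earlier genes"
            a, b = values[i], values[j]
            if op == "+":
                values.append([p + q for p, q in zip(a, b)])
            elif op == "-":
                values.append([p - q for p, q in zip(a, b)])
            elif op == "*":
                values.append([p * q for p, q in zip(a, b)])
            else:
                raise ValueError(f"unsupported operator {op!r}")
    return values


def best_gene(chrom: List[Gene], X: Sequence[Sequence[float]],
              y: Sequence[float]) -> Tuple[int, float]:
    """Score each gene's expression by MAE and return (index, MAE) of the best one."""
    maes = [sum(abs(p - t) for p, t in zip(v, y)) / len(y) for v in eval_genes(chrom, X)]
    k = min(range(len(maes)), key=maes.__getitem__)
    return k, maes[k]


chrom = [("var", 0), ("var", 1), ("*", 0, 1), ("+", 2, 0), ("-", 3, 1)]
X = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0]]
y = [3.0, 8.0, 6.0]  # hypothetical targets equal to x0*x1 + x0
print(best_gene(chrom, X, y))  # (3, 0.0): gene 3 encodes x0*x1 + x0
```

Because gene 3 reuses the product computed by gene 2, each sub-expression is evaluated only once per sample, which is what makes scoring all encoded expressions cheap.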

Figure 6. Schematic of the GEP expression tree for CS.

Figure 7. GEP model for concrete CS: (a) relationship between estimated and experimental CS; (b) distribution of errors across the dataset.

Figure 8. GEP model error spread (CS).

Figure 9. MEP model for concrete CS: (a) relationship between estimated and experimental CS; (b) distribution of errors across the dataset.

Figure 10. MEP model error spread (CS).

Figure 11. Schematic of the GEP expression tree for flow.

Figure 12. GEP model for concrete flow: (a) relationship between estimated and experimental flow; (b) distribution of errors across the dataset.

Figure 13. GEP model error spread (flow).

Figure 14. MEP model for concrete flow: (a) relationship between estimated and experimental flow; (b) distribution of errors across the dataset.

Figure 15. MEP model error spread (flow).

Figure 16. Taylor diagrams: (a) CS models; (b) flow models.
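A Taylor diagram (Taylor, 2001) plots each model at a radius equal to the standard deviation of its predictions, σ_m, and at an angle arccos(R) from the horizontal axis, while the observations sit at (σ_o, 0). By the law of cosines, the distance between the two points equals the centered root-mean-square difference E′, because E′² = σ_o² + σ_m² − 2σ_oσ_mR. The minimal Python sketch below computes these coordinates; the CS values are hypothetical toy numbers rather than the study's data, and 1/N normalization is assumed throughout so that the identity holds exactly.

```python
import math
from typing import List, Tuple


def taylor_stats(obs: List[float], pred: List[float]) -> Tuple[float, float, float, float]:
    """Return (sigma_obs, sigma_model, R, centered RMS difference) using 1/N statistics."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred) / n)
    r = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / (n * so * sp)
    e = math.sqrt(sum(((p - mp) - (o - mo)) ** 2 for o, p in zip(obs, pred)) / n)
    return so, sp, r, e


def polar_to_xy(sigma: float, r: float) -> Tuple[float, float]:
    """Place a model at radius sigma and angle arccos(R) from the horizontal axis."""
    theta = math.acos(max(-1.0, min(1.0, r)))  # clamp guards against rounding just outside [-1, 1]
    return sigma * math.cos(theta), sigma * math.sin(theta)


obs = [30.0, 35.0, 40.0, 45.0]   # hypothetical measured CS values (MPa)
pred = [31.0, 34.0, 41.0, 44.0]  # hypothetical model predictions (MPa)
so, sp, r, e = taylor_stats(obs, pred)
x, y = polar_to_xy(sp, r)
# The distance from the model point (x, y) to the reference point (so, 0) equals e.
print(e, math.hypot(x - so, y))
```

A model closer to the reference point therefore has simultaneously a higher correlation, a better-matched spread, and a smaller centered error, which is why a single diagram can rank the GEP and MEP models.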

Figure 17. SHAP plots illustrating the impact of the inputs on CS.

Figure 18. Feature interdependencies for CS: (a) CM, (b) Sl, (c) FA, (d) Wa, (e) WCR, (f) WBR, (g) SP, (h) SPCR, (i) CA, (j) Sa.

Figure 19. SHAP plots illustrating the impact of the inputs on flow.

Figure 20. Feature interdependencies for flow: (a) CM, (b) Sl, (c) FA, (d) Wa, (e) WCR, (f) WBR, (g) SP, (h) SPCR, (i) CA, (j) Sa.
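The SHAP plots in Figures 17–20 rest on Shapley values, which attribute a prediction to the inputs by averaging each feature's marginal contribution over all coalitions of the other features. Because the GEP and MEP models are closed-form expressions, the attributions for a single mix can in principle be computed exactly by enumeration; with ten inputs this needs only 2⁹ = 512 coalitions per feature. The minimal Python sketch below does this under the assumption of a single baseline point for "absent" features (the shap library instead averages over a background dataset and uses faster estimators), and the model is a hypothetical toy expression, not one of the study's models.

```python
import itertools
import math
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[Sequence[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition are held at the baseline."""
    m = len(x)

    def value(coalition: set) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(m)]
        return f(z)

    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        total = 0.0
        for size in range(m):  # coalition sizes 0 .. m-1 drawn from the other features
            weight = math.factorial(size) * math.factorial(m - size - 1) / math.factorial(m)
            for subset in itertools.combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phi.append(total)
    return phi


def model(z: Sequence[float]) -> float:
    """Hypothetical closed-form model with one interaction term."""
    return 2 * z[0] - 3 * z[1] + z[0] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print([round(v, 6) for v in phi])  # [3.5, -6.0, 1.5]; they sum to model(x) - model(baseline)
```

The interaction term z0·z2 contributes 3 to the prediction, and the Shapley weights split it equally between features 0 and 2; this sharing of interaction effects is what the dependence plots in Figures 18 and 20 help to reveal.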

Table 1. Statistical summary of concrete mix samples.

Statistical Metrics	CM (kg/m³)	Sl (kg/m³)	FA (kg/m³)	Wa (kg/m³)	W/C	W/B	SP (kg/m³)	SP/C	CA (kg/m³)	Sa (kg/m³)	CS (MPa)	Flow (mm)
Mean	223.1	82.0	150.5	196.3	1.0	0.4	8.6	0.0	885.6	739.4	35.7	488.7
Standard Error	3.4	2.7	3.8	0.9	0.0	0.0	0.1	0.0	3.8	2.9	0.4	7.7
Median	172.8	101.0	164.0	195.0	1.0	0.4	8.0	0.0	881.0	741.0	35.0	530.0
Mode	159.0	0.0	0.0	183.0	1.2	0.6	6.0	0.1	884.0	757.0	33.8	200.0
Standard Deviation	77.2	61.9	86.5	19.9	0.3	0.1	2.8	0.0	87.1	65.3	8.1	175.5
Sample Variance	5963.4	3832.3	7480.4	394.7	0.1	0.0	8.1	0.0	7583.7	4263.9	65.4	30,790.4
Kurtosis	−1.6	−1.3	−0.8	−0.7	−1.3	−0.5	1.7	1.0	−0.8	−0.7	0.1	−1.0
Skewness	0.3	−0.2	−0.7	0.3	0.2	0.3	1.1	1.2	0.0	0.3	0.1	−0.5
Range	237.0	193.0	260.0	80.0	1.2	0.4	14.6	0.1	341.9	261.4	41.3	580.0
Minimum	137.0	0.0	0.0	160.0	0.5	0.3	4.4	0.0	708.0	640.6	17.2	200.0
Maximum	374.0	193.0	260.0	240.0	1.7	0.7	19.0	0.1	1049.9	902.0	58.5	780.0
Sum	114,880.1	42,216.2	77,493.5	101,096.3	507.1	228.0	4419.0	22.6	456,066.1	380,805.6	18,393.6	251,681.0
Count	515.0	515.0	515.0	515.0	515.0	515.0	515.0	515.0	515.0	515.0	515.0	515.0
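Several entries in Table 1 can be cross-checked by hand: the standard error is the standard deviation divided by √n (77.2/√515 ≈ 3.4 for CM), the mean is the sum divided by the count (114,880.1/515 ≈ 223.1), and the range is the maximum minus the minimum (374.0 − 137.0 = 237.0). The minimal Python sketch below produces all thirteen rows for one column. It assumes the bias-corrected sample skewness and excess-kurtosis formulas used by common spreadsheet functions such as SKEW and KURT, since the paper does not state which definitions were applied; the toy column is hypothetical.

```python
import math
import statistics
from typing import Dict, List


def describe(xs: List[float]) -> Dict[str, float]:
    """Thirteen summary rows in the order of Table 1 (requires at least 4 values)."""
    n = len(xs)
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)  # sample standard deviation, n - 1 denominator
    z = [(x - mean) / sd for x in xs]
    # Bias-corrected sample skewness and excess kurtosis (spreadsheet SKEW/KURT style).
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(xs),
        "Mode": statistics.mode(xs),  # first most common value (Python 3.8+)
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(xs),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(xs) - min(xs),
        "Minimum": min(xs),
        "Maximum": max(xs),
        "Sum": sum(xs),
        "Count": n,
    }


stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical toy column
print(round(stats["Kurtosis"], 6))  # -1.2
```

Negative kurtosis values such as those for CM, Sl, and W/C in Table 1 indicate flatter-than-normal distributions, consistent with inputs that were varied over a designed range rather than sampled around a single target.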

Table 2. Hyperparameter configuration for MEP and GEP models.

MEP
Hyperparameter	Setting
Data type	Real numbers
Problem type	Symbolic regression
Crossover type	Uniform
Replication number	15
Mutation probability	0.01
Number of threads	2
Number of generations	250
Operators/variables	0.5
Error	MAE
Function set	Addition, subtraction, multiplication, division, power, and square root
Number of sub-populations	50
Sub-population size	200
Number of runs	10
Crossover probability	0.9
Code length	50

GEP
Hyperparameter	Setting
Stumbling mutation	0.00141
Lower bound	−10
Inversion rate	0.00546
Chromosomes	150
Data type	Floating number
IS transposition rate	0.00546
Head size	8
Linking function	Addition
Constant per gene	10
Upper bound	10
Mutation rate	0.00138
Genes	8
Leaf mutation	0.00546
General	CS and flow
Two-point recombination rate	0.00277
RIS transposition rate	0.00546
One-point recombination rate	0.00277
Function set	Addition, subtraction, multiplication, division, power, and square root
Gene recombination rate	0.00277
Gene transposition rate	0.00277
Random chromosomes	0.0026

Table 3. Model evaluation through statistical indicators.

Metric	CS (MPa)		Flow (mm)
	GEP	MEP	GEP	MEP
MAE	1.725	1.308	52.183	30.503
MAPE (%)	5.000	3.700	13.000	6.400
RMSE	2.405	1.656	63.746	41.518
R	0.954	0.975	0.939	0.961
RSE	0.342	0.262	0.286	0.242
NSE	0.910	0.947	0.875	0.909
RRMSE	0.792	0.712	0.632	0.512
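The minimal Python sketch below computes five of the Table 3 indicators from paired observations and predictions, assuming their common textbook definitions (MAPE in percent); RSE and RRMSE are left out because their exact definitions are not stated in this part of the text, and the toy data are hypothetical. It also clarifies how the indicators relate: the R² values quoted in the abstract agree, to rounding, with the squares of R in Table 3 (e.g., 0.975² ≈ 0.951 and 0.954² ≈ 0.910), whereas NSE cannot exceed R² because it also penalizes systematic bias and scale errors, which is consistent with the slightly lower NSE of the MEP CS model (0.947).

```python
import math
from typing import Dict, Sequence


def metrics(y: Sequence[float], yhat: Sequence[float]) -> Dict[str, float]:
    """MAE, MAPE (%), RMSE, Pearson R and Nash-Sutcliffe efficiency for paired data."""
    n = len(y)
    err = [p - t for t, p in zip(y, yhat)]
    my, mp = sum(y) / n, sum(yhat) / n
    sse = sum(e * e for e in err)                        # sum of squared errors
    sst = sum((t - my) ** 2 for t in y)                  # total variance of observations
    cov = sum((t - my) * (p - mp) for t, p in zip(y, yhat))
    spp = sum((p - mp) ** 2 for p in yhat)
    return {
        "MAE": sum(abs(e) for e in err) / n,
        "MAPE": 100.0 * sum(abs(e / t) for e, t in zip(err, y)) / n,  # assumes no zero targets
        "RMSE": math.sqrt(sse / n),
        "R": cov / math.sqrt(sst * spp),
        "NSE": 1.0 - sse / sst,
    }


y = [10.0, 20.0, 30.0, 40.0]     # hypothetical observations
yhat = [12.0, 18.0, 33.0, 37.0]  # hypothetical predictions
m = metrics(y, yhat)
print(m["MAE"])  # 2.5
```

Because RMSE squares the errors before averaging, it is always at least as large as MAE, as in every column of Table 3; a large gap between the two signals a few large errors rather than uniformly moderate ones.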

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Alqurashi, M. Data-Driven Insights into Concrete Flow and Strength: Advancing Smart Material Design Using Machine Learning Strategies. Buildings 2025, 15, 2244. https://doi.org/10.3390/buildings15132244