**1. Introduction**

Self-compacting concrete (SCC), a type of high-performance concrete (HPC) with superior deformability and resistance to segregation, was first developed in Japan in 1986 [1]. SCC has been used in Japan for major office buildings as well as innovative types of extruded tunnels combined with steel fibers [2]. The use of SCC reduced construction site noise and its impact on the environment. SCC offers several advantages over regular concrete, including (1) eliminating the need for vibration; (2) lowering construction duration and labor costs; (3) minimizing noise pollution; (4) enhancing the filling of highly congested structural elements; (5) improving the transition zone between the cement paste and the reinforcement or aggregate; and (6) limiting the concrete's permeability and increasing its durability [3,4]. The introduction of SCC also allows for the use of replacement materials, industrial waste, and other secondary resources, such as mineral additives, and has generated interest in doing so [5–7].

**Citation:** Amin, M.N.; Al-Hashem, M.N.; Ahmad, A.; Khan, K.; Ahmad, W.; Qadir, M.G.; Imran, M.; Al-Ahmad, Q.M.S. Application of Soft-Computing Methods to Evaluate the Compressive Strength of Self-Compacting Concrete. *Materials* **2022**, *15*, 7800. https://doi.org/10.3390/ma15217800

Academic Editor: Krzysztof Schabowicz

Received: 1 September 2022 Accepted: 28 September 2022 Published: 4 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In general, the quality of SCC is judged by its compressive strength (CS), which provides a basic indication of concrete quality because it is linked to the structure of the hardened mixture [8,9]. Typically, the CS of SCC is determined by costly and time-consuming physical trials, which keeps work productivity low [10]. On account of its complicated composition, SCC requires a suitable mix design procedure to achieve its desired qualities [11]. For the selected design procedure, the materials used must be balanced with at least one mineral additive and one or more chemical additives [12]. The difficulty of improving grain size distribution and particle packing to obtain stronger cohesion in SCC is met by seeking the optimal balance among the coarse and fine components and the admixtures [13–15]. For this reason, technological advancements make it possible to solve such engineering challenges at lower cost by employing empirical regression, simulation techniques, and machine learning algorithms [16–18]. These approaches enable the CS of SCC to be forecast from the proportions of the mixture components, such as aggregate, cement, superplasticizers, and water [19–21].

In recent decades, machine learning (ML) approaches have emerged as an appealing modelling tool applicable to a broad array of scientific fields, including materials engineering [22–27]. Existing data sets can be used to build an appropriate surrogate model for predetermined model parameters, hence reducing the need for costly and time-consuming trials [28]. Accordingly, the use of ML techniques to predict the CS of concrete has surged in recent years [29–35]. These methods can be utilized for a number of applications, including regression, classification, correlation, and clustering [36–40]. With the advancement of ML approaches, it has become more straightforward to investigate the CS of SCC along with the concrete's other properties [41,42]. To investigate the strength properties of SCC, Asteris et al. [43] employed the artificial neural network (ANN) algorithm. Their study was based on predicting the 28-day CS of SCC in a limited time period. Awoyera et al. [44] investigated the predictive accuracy of ANN and gene expression programming (GEP) approaches for the strength properties of SCC. It was reported that both ANN and GEP successfully predicted the required properties of SCC.

The purpose of this research is to investigate and evaluate the prediction capabilities of three distinct machine learning techniques for the CS of superplasticized self-compacting concrete (SCC). The novelty of this research is that it predicts the CS of SCC on the chosen data set by employing both an ensemble machine learning method (bagging regressor) and individual machine learning approaches (SVM, MLP). The research involves a descriptive analysis of the variables, the application of Python code for running the employed models, statistical checks of the models' legitimacy, a validation approach for the models, and a sensitivity analysis to check the impact of the variables on the predicted outcome. This study has the potential to contribute to the construction industry's use of novel tools and approaches for investigating the properties of construction materials economically, in a limited amount of time, and without physical effort in the laboratory.

#### **2. Research Significance**

This study presents the implementation of individual machine learning algorithms in addition to an ensemble machine learning approach to estimate the compressive strength of self-compacting concrete (SCC). The models were written in the Python programming language and run in the Anaconda Navigator software. Twenty bagging sub-models were trained on the data, and those models were then tuned to give the maximum R<sup>2</sup> value. In addition, the test results were confirmed by employing k-fold cross-validation in conjunction with R<sup>2</sup>, MAPE, MAE, and RMSE. Moreover, statistical performance indices (MAPE, MAE, and RMSE) were used to contrast the individual models with the ensemble model. Furthermore, the obtained results were compared with the results of similar published articles in order to identify an accurate model for forecasting the concrete's strength. Finally, a sensitivity analysis was included in order to analyze the contribution of each input parameter to the strength prediction of SCC.

### **3. Materials and Methods**

Python code (attached in the Supplementary Data), run in the Anaconda Navigator software, was used for all the employed models. The data set of self-compacting concrete (SCC) used to predict the compressive strength (CS) was retrieved from the literature [45–62]. A total of 169 data points (attached in the Supplementary Data) were used for running the selected models. The data were split automatically, with 70% used for training the models and 30% for testing them, and the k-fold cross-validation approach was adopted to validate the models. To reduce the complexity of the data, a data preprocessing method was adopted. Data preprocessing addresses one of the most crucial challenges in the knowledge discovery from data process. Data preparation covers data reduction strategies that try to reduce the data's complexity by recognizing and deleting irrelevant and noisy data items. The models were analyzed by using regression and error distribution processes. Eleven input variables (cement, limestone powder, coarse aggregate, fly ash, water, fine aggregate, GGBS, silica fume, RHA, superplasticizers, and VMA) were used for a single output, the compressive strength. These parameters were selected based on the importance of their effect on the concrete material, and their influence on the predicted CS of SCC was assessed through sensitivity analysis. The descriptive statistics of these parameters are listed in Table 1. A validation method was also adopted to evaluate the precision of the employed models. Moreover, the histograms in Figure 1 give the relative frequency distribution of all the variables.
A frequency distribution describes how often different values occur in a complete data set. Relative frequency distributions are valuable because they show how common a value is in a data set in comparison to all other values. In addition, the violin plot distributions of all the variables are shown in Figure 2.

**Table 1.** Descriptive statistics of variables.


**Figure 1.** Input parameters' relative frequency distribution.

**Figure 2.** Violin plots distribution of the input parameters.
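The 70/30 split and the descriptive statistics described above can be illustrated with a minimal sketch. It is not the authors' supplementary code: the helper names `train_test_split_indices` and `describe`, the random seeds, and the synthetic cement values are assumptions made for illustration, and only the 169-point data set size and the 70/30 ratio come from the text.

```python
import random
import statistics

def train_test_split_indices(n_samples, test_fraction=0.30, seed=0):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

def describe(values):
    """Basic descriptive statistics for one variable."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),
    }

# 169 data points, as in the study; the values themselves are synthetic.
rng = random.Random(42)
cement = [rng.uniform(150.0, 550.0) for _ in range(169)]  # hypothetical range

train_idx, test_idx = train_test_split_indices(len(cement))
print(len(train_idx), len(test_idx))  # 118 training and 51 test points
print(describe([cement[i] for i in train_idx]))
```

With 169 points, rounding 30% gives a 51-point test set, which leaves 118 points for training.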

#### **4. Employed Machine Learning Algorithms**

*4.1. Multilayer Perceptron (MLP)*

An MLP is a type of feedforward ANN that maps a set of inputs to outputs. Between the input and output layers, a directed graph connects multiple layers of nodes, with signals moving in only one direction across the nodes. Every node, with the exception of the input nodes, has its own nonlinear activation function. MLPs are a form of supervised learning and are trained with backpropagation. MLP is often called a deep learning approach when it uses many layers of neurons. MLP is often used in studies of supervised learning, imputation, parallel distributed processing, and pure science. Machine translation, image recognition, and speech recognition are all examples of applications. To begin, the algorithm selects the predictors that will be used throughout the regression phase in order to determine the variance inflation factor (VIF). The VIF then quantifies how much the variance of an estimated regression coefficient is inflated because of collinearity. Figure 3 is a flowchart that shows the whole process of predicting the results of the MLP model.

**Figure 3.** Multilayer perceptron model execution process [63].
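As a rough illustration of the feedforward structure described above, the sketch below propagates one input vector through a small fully connected network. The layer sizes, the tanh activation, the random weights, and the linear output node (a common choice for regression) are all assumptions for illustration. Training by backpropagation is omitted, so the printed value has no physical meaning.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, layers):
    """Propagate one input vector through the network, layer by layer."""
    activation = list(x)
    for depth, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        is_output = depth == len(layers) - 1
        # Hidden nodes use a nonlinear activation; the output is kept linear here.
        activation = z if is_output else [math.tanh(v) for v in z]
    return activation

rng = random.Random(0)
# 11 inputs (the mix constituents), two hidden layers, one output (CS).
sizes = [11, 8, 4, 1]  # hidden layer sizes are illustrative assumptions
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

x = [rng.random() for _ in range(11)]  # one synthetic, scaled mix design
prediction = forward(x, layers)
print(prediction)
```

Signals only move from one layer to the next, which is what distinguishes a feedforward network from one with loops.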

## *4.2. Support Vector Machine (SVM)*

SVM is a supervised learning algorithm used to analyze data for both regression and classification. An SVM represents the samples as points in space, mapped in such a way that the different classes are separated by a boundary (a line or hyperplane) with the largest possible gap. Figure 4 depicts the classification of additional cases based on the side of the boundary on which they lie. Figure 5 displays the implementation approach for the SVM model. This model is used to assess the material's strength, taking into account the influence of multiple factors. An optimization strategy is used to determine the SVM model's parameters.

**Figure 4.** Model mapping of the support vector machine algorithm [64].

**Figure 5.** Execution process of the SVM model [65].
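The idea of classifying a case by the side of the boundary on which it lies can be sketched in a few lines. The weight vector `w` and offset `b` below are hand-picked hypothetical values, not fitted parameters. In an actual SVM they would be found by the optimization step, and the study itself uses the regression form of the method.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the boundary x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 or -1 depending on the side of the boundary."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Distance between the margin hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical boundary in two dimensions: x1 + x2 = 3.
w, b = [1.0, 1.0], -3.0
print(classify(w, b, [3.0, 2.0]))  # point on the positive side
print(classify(w, b, [0.5, 1.0]))  # point on the negative side
print(margin_width(w))
```

Maximizing the gap corresponds to making the norm of `w` as small as the data allow, since the margin width is inversely proportional to it.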

#### *4.3. Bagging Regressor (BR)*

BR, also referred to as bootstrap aggregation, is a method for combining multiple versions of a predictive model. Each model is trained independently, and the results are then averaged. BR's primary objective is to attain a lower variance than any single model. The process of producing bootstrap samples from a given data set is known as bootstrapping. The samples are formed by drawing data points at random with replacement. Each resampled data set therefore differs from the original data, while still reflecting how the data are spread out, which keeps the bootstrapped samples from becoming too similar. This aids in the development of robust models. In addition, bootstrapping helps prevent overfitting. When constructing a model, the use of a large number of training data sets results in a decreased likelihood of errors and improved performance when applied to test data. Averaging over multiple versions of the model ensures that the prediction is not biased towards an inaccurate outcome. The BR model's flowchart can be seen in Figure 6.

**Figure 6.** Bagging model execution process for required output.
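The bootstrap-and-average procedure can be sketched as follows. This is a simplified stand-in, not the study's model: the base learner is assumed to be a one-variable least-squares line, the data are synthetic, and only the count of 20 sub-models is taken from the text.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:  # degenerate resample: fall back to the mean
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def bagging_fit(xs, ys, n_models=20, seed=0):
    """Train each sub-model on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Average the predictions of all sub-models."""
    return statistics.mean(a + b * x for a, b in models)

# Synthetic data: strength rises linearly with one input, plus noise.
rng = random.Random(1)
xs = [rng.uniform(200.0, 500.0) for _ in range(100)]
ys = [10.0 + 0.1 * x + rng.gauss(0.0, 3.0) for x in xs]

models = bagging_fit(xs, ys)
print(len(models), bagging_predict(models, 350.0))
```

Because each sub-model sees a slightly different resample, averaging their outputs smooths out the noise that any single fit would pick up.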

#### **5. Results and Discussion**

*5.1. MLP Model Outcome*

Figure 7 shows the relationship between the actual and predicted values of the compressive strength of self-compacting concrete (SCC). This relationship gives a coefficient of determination (R<sup>2</sup>) of 0.86. Figure 8 illustrates the difference between the actual and predicted results. In the tabulated information in the figure, x is the explanatory variable and y is the dependent variable. The slope of the line is denoted by b, and a is the intercept (the value of y when x is equal to 0). The differences have maximum and minimum values of 21.50 MPa and 0.18 MPa, respectively. Moreover, 41.18% of the differences were found between the minimum value (0.18 MPa) and 5 MPa, and 45.10% were between 5 MPa and 10 MPa. Only 13.73% of the differences were above 10 MPa.

**Figure 7.** Experimental and predicted outcomes relationship of CS from MLP model.

**Figure 8.** Indication of the error's difference between the actual and forecasted CS result of SCC from MLP model.
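The error breakdown reported for each model (minimum, maximum, and the share of errors below 5 MPa, between 5 and 10 MPa, and above 10 MPa) can be computed as in the sketch below. The function name `error_distribution` and the four sample values are hypothetical. The bin edges follow the text.

```python
def error_distribution(actual, predicted, edges=(5.0, 10.0)):
    """Min and max absolute error, plus the share (%) of errors in three bins."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    low, high = edges
    counts = [
        sum(e < low for e in errors),          # below 5 MPa
        sum(low <= e < high for e in errors),  # 5 MPa up to 10 MPa
        sum(e >= high for e in errors),        # 10 MPa and above
    ]
    shares = [100.0 * c / len(errors) for c in counts]
    return min(errors), max(errors), shares

# Hypothetical actual and predicted strengths in MPa.
actual = [40.0, 55.0, 62.0, 30.0]
predicted = [42.0, 49.0, 50.0, 30.5]

smallest, largest, shares = error_distribution(actual, predicted)
print(smallest, largest, shares)
```

As a consistency check on the reported figures, the MLP shares of 41.18%, 45.10%, and 13.73% correspond to 21, 23, and 7 points out of a 51-point test set.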

The box plot in Figure 9 gives further statistical information, such as the minimum, maximum, median, mean, and first and third quartile values, for both the experimental and predicted outcomes from the test set. The values on the graph clearly indicate the difference between the predicted and actual results.

**Figure 9.** Box plot for predicted and experimental outcomes from MLP model.

#### *5.2. SVM Model Output*

As shown in Figure 10, the SVM model provides a closer relationship between the experimental CS of SCC and the predicted outcome than the MLP model, with an R<sup>2</sup> value of 0.90. Figure 11 illustrates the distribution of the differences between the actual and predicted values. The maximum, minimum, and average values of this distribution are 14.81 MPa, 0.21 MPa, and 5.72 MPa, respectively. In addition, 50.98% of these differences were between 0.21 MPa and 5 MPa, 33.33% were between 5 MPa and 10 MPa, and only 15.61% were at or above 10 MPa.

**Figure 10.** Experimental and predicted outcomes relationship of CS from SVM model.

**Figure 11.** Indication of the error's difference between the actual and forecasted CS result of SCC from SVM model.

In addition, Figure 12 provides additional statistical information, including the minimum, maximum, median, mean, first quartile, and third quartile values for both the experimental and projected outcomes from the test set. The data on the graph make it abundantly evident that there is a disparity between the results that were projected and those that were actually achieved.

**Figure 12.** Box plot for predicted and experimental outcomes from SVM model.

#### *5.3. BR Model Outcome*

As can be seen in Figure 13, the output of the bagging model shows a stronger relationship with the experimental CS of the self-compacting concrete than the predictions of the MLP and SVM models, giving an R<sup>2</sup> value of 0.95. Figure 14 shows the distribution of the errors. The errors have a maximum of 13.05 MPa, a minimum of 0.16 MPa, and an average of 3.87 MPa. Additionally, 72.55% of the errors fell between 0.16 MPa and 5 MPa, while 19.61% fell between 5 MPa and 10 MPa. Only 5.88% of the values were higher than 10 MPa.

**Figure 13.** Experimental and predicted outcomes relationship of CS from BR model.

**Figure 14.** Indication of the error's difference between the actual and forecasted CS result of SCC from BR model.

Moreover, further statistical information is provided in Figure 15, including the minimum, maximum, median, mean, and first and third quartile values for both the experimental and predicted test set results. The graph shows the discrepancy between the predicted and actual outcomes. The actual and predicted results of the bagging model are closer to one another than those of the SVM and MLP models.

**Figure 15.** Box plot for predicted and experimental outcomes from BR model.

#### *5.4. K-Fold Cross Validation Outcomes and Statistical Metrics*

K-fold cross-validation and statistical tests were applied to validate the ML algorithms in use. Typically, the k-fold method tests the validity of a model by randomly dividing the relevant data into 10 groups. As shown in Figure 17, nine groups are used to train the machine learning models, while one is used to validate them. The ML approach is more accurate when the errors (MAPE, MAE, and RMSE) are small and R<sup>2</sup> is high. The procedure must be repeated 10 times, so that each group is used once for validation. This repetition contributes to a more reliable estimate of model accuracy. The statistical metrics obtained from the models are listed in Table 2. In addition, Figure 16 gives statistical information about the accuracy of the employed models for the CS of SCC. This Taylor diagram also indicates the better performance of the bagging model compared to the SVM and MLP models. The error for the BR model is less than 8 MPa, whereas the MLP and SVM models give higher values of 12.96 MPa and 11.44 MPa, respectively.

**Table 2.** Statistics derived from the employed models.


**Figure 16.** Statistical indication of the model's performance by Taylor diagram.
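The 10-fold procedure can be sketched as an index generator, shown below under stated assumptions: the name `k_fold_indices`, the seed, and the interleaved way of forming the folds are illustrative choices, and only the value k = 10 and the 169-point data set size come from the text.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal groups
    for i, val_idx in enumerate(folds):
        # The other k - 1 groups together form the training set.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx

splits = list(k_fold_indices(169, k=10))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

In each of the ten rounds, a model would be fitted on `train_idx` and scored on `val_idx`, and the metrics would then be summarized across rounds.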

Using Equations (1)–(3) derived from previous research [66], the statistical prediction performance of the techniques was evaluated.

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| P_i - T_i \right| \tag{1}$$

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( P_i - T_i \right)^2} \tag{2}$$

$$\text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{A_i - F_i}{A_i} \right| \tag{3}$$

where *n* = number of data points, *P<sub>i</sub>* = predicted values, *T<sub>i</sub>* = experimental values, *A<sub>i</sub>* = actual values, and *F<sub>i</sub>* = forecasted values from the data set. MAPE is expressed as a percentage by multiplying the result by 100.
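The error metrics can be written directly in Python, as in the sketch below. MAPE is implemented in its usual absolute-value form and returned as a percentage. The R<sup>2</sup> function uses the standard definition (one minus the ratio of residual to total sum of squares), which is an assumption here because the text does not state its formula. The three sample values are hypothetical.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(actual, predicted)) / len(actual)

def r2(actual, predicted):
    """Coefficient of determination (standard definition, assumed here)."""
    mean_t = sum(actual) / len(actual)
    ss_res = sum((t - p) ** 2 for t, p in zip(actual, predicted))
    ss_tot = sum((t - mean_t) ** 2 for t in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical strengths in MPa.
actual = [40.0, 50.0, 60.0]
predicted = [42.0, 47.0, 60.0]
print(mae(actual, predicted), rmse(actual, predicted))
print(mape(actual, predicted), r2(actual, predicted))
```

MAE and RMSE carry the units of the target (MPa), whereas MAPE and R<sup>2</sup> are dimensionless, which matters when comparing values across studies.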

**Figure 17.** Statistical evaluations of the models used for this investigation [67].

Statistical and k-Fold Analysis

To check the validity of the models, k-fold cross-validation was implemented as a standard check. The results were investigated with the statistical metrics R<sup>2</sup>, MAE, and RMSE. According to the k-fold study, the highest values of R<sup>2</sup>, MAE, and RMSE for the MLP model were 0.86, 18.53 MPa, and 24.46 MPa, respectively, as shown in Figure 18. Similarly, the highest values of the same metrics for the SVM model were 0.90, 19.20 MPa, and 20.98 MPa, as shown in Figure 19. The corresponding values of R<sup>2</sup>, MAE, and RMSE for the bagging model were 0.95, 19.74 MPa, and 18.94 MPa, respectively, as can be seen in Figure 20.

**Figure 18.** Statistical analysis of MLP model.

**Figure 19.** Statistical analysis of SVM model.

**Figure 20.** Statistical analysis of BR model.

#### *5.5. Discussion on the Main Findings*

This study describes the predictive performance of three different types of ML algorithms for the CS of SCC. The multilayer perceptron (MLP), SVM, and bagging regressor (BR) were investigated. Even though MLP and SVM are individual ML techniques, the precision of their predictions was within acceptable limits. The BR is an ensemble ML approach, in which the model is split into 20 sub-models that are optimized to give a strong outcome. The results of the bagging sub-models can be seen in Figure 21. The input parameters and the number of data points have a significant effect on the outcomes. Therefore, the descriptive statistics of the input variables, the relative frequency distribution of the input data, and a sensitivity analysis of their influence on the outcome were incorporated into the study. The correlation between the experimental CS and the predicted CS was satisfactory for all employed models. The k-fold cross-validation approach was also used to check the validity of the models. The present study was also compared with other relevant studies and was found to compare reasonably and favorably with them.

**Figure 21.** Result of the bagging sub-models.

#### *5.6. Comparison with Other Studies*

The results of the present study are compared in Table 3 with those reported in published articles that applied ML approaches to predict the same type of outcome.


**Table 3.** Result comparison with published studies.

#### **6. Sensitivity Analysis**

This approach was introduced to investigate the impact of each input variable on the predicted CS of SCC. The analysis reveals that the highest contribution was made by the binding material (cement), at 16.25% towards the prediction of the CS of SCC, whereas rice husk ash contributed the least (4.25%). The contributions of the other variables, in descending order, were superplasticizers (13.44%), silica fume (11%), fly ash (9.94%), coarse aggregate (9.50%), limestone powder (8.90%), fine aggregate (8.80%), VMA (6.65%), and water (6.40%), as depicted in Figure 22. Equations (4) and (5) were used to calculate the percent contribution of each parameter towards the required outcome.

$$N_i = f_{\max}(x_i) - f_{\min}(x_i) \tag{4}$$

$$S_i = \frac{N_i}{\sum_{j=1}^{n} N_j} \tag{5}$$

where *f<sub>max</sub>*(*x<sub>i</sub>*) and *f<sub>min</sub>*(*x<sub>i</sub>*) are the highest and lowest values of the predicted output over the range of the *i*-th input variable, and *n* is the number of input variables.

**Figure 22.** Influence of the input parameters towards the predicted output.
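A minimal sketch of this sensitivity calculation is given below. It assumes that one input is varied across its range while the others are held at a baseline value, which is a common reading of Equations (4) and (5) but is not spelled out in the text. The linear `toy_model`, the baseline, and the ranges are hypothetical stand-ins for a trained model and the real data.

```python
def sensitivity(model, baseline, ranges, steps=20):
    """Relative contribution S_i of each input, from the output span N_i."""
    spans = []
    for i, (lo, hi) in enumerate(ranges):
        outputs = []
        for s in range(steps + 1):
            x = list(baseline)  # other inputs stay at the baseline (assumption)
            x[i] = lo + (hi - lo) * s / steps
            outputs.append(model(x))
        spans.append(max(outputs) - min(outputs))  # N_i
    total = sum(spans)
    return [n_i / total for n_i in spans]  # S_i

def toy_model(x):
    """Hypothetical stand-in for a trained strength model."""
    return 0.08 * x[0] - 0.05 * x[1] + 2.0 * x[2]

baseline = [350.0, 180.0, 5.0]
ranges = [(200.0, 500.0), (150.0, 210.0), (0.0, 10.0)]

shares = sensitivity(toy_model, baseline, ranges)
print([round(100.0 * s, 2) for s in shares])
```

By construction the shares sum to one, so they can be read as percentage contributions of the kind plotted in a sensitivity chart.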

#### **7. Limitations and Future Perspective**

The following are the limitations regarding the application of machine learning approaches along with recommendations for future studies.


#### **8. Conclusions**

This investigation applied supervised ML techniques to estimate the CS of SCC. The MLP, SVM, and BR models were all investigated for this purpose. The following inferences can be drawn from the findings of the study:


**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ma15217800/s1.

**Author Contributions:** M.N.A.: conceptualization, funding acquisition, resources, project administration, supervision, writing—reviewing and editing. M.N.A.-H.: funding acquisition, resources, visualization, writing—reviewing and editing. A.A.: conceptualization, data curation, software, methodology, investigation, validation, writing—original draft. K.K.: methodology, investigation, writing reviewing and editing. W.A.: resources, visualization, writing—reviewing and editing. M.G.Q.: funding acquisition, writing—reviewing and editing. M.I.: visualization, resources. Q.M.S.A.-A.: visualization, resources. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Project No. GRANT752]. The APC was funded by the same "Project No. GRANT752".

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data used in this research have been properly cited and reported in the main text.

**Acknowledgments:** The authors acknowledge the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia (Project No. GRANT752). The authors extend their appreciation for the financial support that has made this study possible.

**Conflicts of Interest:** The authors declare no conflict of interest.
