Machine Learning Approach for Assessment of Compressive Strength of Soil for Use as Construction Materials

Mustafa, Yassir M. H.; Wudil, Yakubu Sani; Zami, Mohammad Sharif; Al-Osta, Mohammed A.

doi:10.3390/eng6050084


by Yassir M. H. Mustafa ¹,*, Yakubu Sani Wudil ¹,², Mohammad Sharif Zami ¹,³ and Mohammed A. Al-Osta ¹,⁴

¹ Interdisciplinary Research Center for Construction and Building Materials, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia

² Laser Research Group, Physics Department, King Fahd University of Petroleum and Minerals (KFUPM), Mailbox 5047, Dhahran 31261, Saudi Arabia

³ Architecture and City Design Department, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia

⁴ Civil and Environmental Engineering Department, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia

* Author to whom correspondence should be addressed.

Eng 2025, 6(5), 84; https://doi.org/10.3390/eng6050084

Submission received: 26 February 2025 / Revised: 18 April 2025 / Accepted: 18 April 2025 / Published: 23 April 2025

(This article belongs to the Section Chemical, Civil and Environmental Engineering)


Abstract

This study investigates the use of machine learning techniques to predict the unconfined compressive strength (UCS) of both stabilized and unstabilized soils. This research focuses on analyzing key soil parameters that significantly impact the strength of earth materials, such as grain size distribution and Atterberg limits. Machine learning models, specifically Support Vector Regression (SVR) and Decision Trees (DT), were employed to predict UCS. Model performance was evaluated using key metrics, including the Pearson coefficient of correlation (r²), coefficient of determination (R²), mean absolute error, and root mean square error. The findings reveal that, for unstabilized soils, both SVR and DT models exhibit remarkable performance with r² values of 0.9948 and 0.9947, respectively, with the DT model surpassing the SVR model in estimating UCS. Validation was conducted using data from four types of locally available soils in the Najd region of Saudi Arabia, although some disparities were noted between actual and predicted results due to limitations in the training data. The analysis indicates that, for unstabilized soil, grain size distribution and moisture content during testing are primary influencers of strength, whereas, for stabilized soil, factors such as stabilizer type and content, as well as density and moisture during testing, are pivotal. This research demonstrates the potential of machine learning for developing a robust classification system to enhance earth material utilization.

Keywords:

rammed earth; soil stabilization; unconfined compressive strength; machine learning; decision tree; support vector regression

1. Introduction

The rapid growth in the human population around the world has increased the damage imposed on the environment [1]. Earth materials (soil, rocks, etc.), which are thought to be one of the oldest types of building materials (dating back to 7500 BC) [2], were recently reconsidered as one of the best methods for addressing energy and environmental concerns in residential buildings [3]. Previous studies have shown the capability of using soil as a construction material that significantly reduces CO₂ emissions as compared to conventional construction materials [4,5], and provides better thermal comfort [6]. Moreover, these materials have been proven to be both durable [7] and environmentally sound [8]. Furthermore, the availability of earth resources, with cheaper construction and maintenance costs, has made such materials acceptable for the fabrication of load-bearing residential walls, bricks, and other construction units [9].

The strength of such materials could originate from different sources. Unstabilized soil, for example, derives its strength from physical characteristics, such as density and grain size distribution, which primarily determine particle interlocking and, thus, soil strength. Moreover, soil strength is primarily affected by moisture content because the strength developed by the capillary action of water (capillary forces) plays a significant role in keeping the soil skeleton intact [10]. Another factor is clay minerals [11], which influence the soil’s strength in terms of both grain size and interaction with water.

In the case of stabilized soils, the formation of chemical compounds like the hydration products (i.e., calcium silicate hydrate) could be considered the primary source of strength. Moreover, the strength of stabilized soil is affected primarily by the characteristics of the raw soil. The grain size distribution, for instance, plays a major role in strength development even in the case of stabilization. Increasing the sand content enhances the strength of cement-stabilized soil as reported in a previous study [12]. This study analyzed the effect of adding sand to cement-stabilized soil to enhance its mechanical strength. The raw soil had a fines content of 12.6% and a plasticity index of 29%. It was found that a soil sample stabilized with 5% cement and 50% sand could achieve a compressive strength of 2.89 MPa, exceeding the required 2 MPa rammed earth threshold value [12]. Furthermore, higher gravel content in coarse-grained soil enhances stabilized soil strength, as reported previously [13]. The effects of the aforementioned characteristics were thoroughly discussed in a previous review [14].

Therefore, in order to use earth as a construction material, each of these characteristics should be adjusted and added to the overall mix in particular proportions to achieve the optimum performance. Several studies have been carried out to determine the best set of characteristics for earth materials. Given the immense variety and complexity of soil properties (due to their different origins), large variations are to be expected, resulting in a high level of uncertainty [15]. For instance, Houben & Guillaud [16] reported a plasticity index (PI) of 15–20% as sufficient, while Alley [17] reported a lower value (6%). Similarly, Burroughs [18] recommended a PI below 15% as optimal. For grain size distribution, Houben & Guillaud [16] recommended a maximum silt/clay content of 50% for optimal mixes, which agrees with Norton’s suggestion (55%) [19]. Other studies reported a lower fines content (35%) [18,20]. Consequently, engineers and researchers around the world are conducting comprehensive experimental programs to obtain the optimal soil mix based on previous studies.

Recently, several advanced tools have been adopted to analyze soil behavior, such as artificial intelligence (AI) methods [21,22], finite and discrete element modeling [23], and molecular dynamic simulation [24]. Regarding artificial intelligence, many methods have been utilized in various areas of geotechnical engineering. These include artificial neural network (ANN) [25,26,27,28,29,30], support vector machine [31,32,33,34,35], and decision trees [36,37]. These methods have been applied to several geotechnical problems, including the grain size distribution [38], soil liquefaction assessment [31], compaction characteristics [39], slope stability [34], foundation settlement [32,33,40], and blast-induced ground vibration [41]. Baghbani et al. [42] presented a thorough review of artificial intelligence applications in geotechnical engineering.

The unconfined compressive strength (UCS) of stabilized soil has received the majority of the research interest in the area of artificial intelligence [43]. Several studies predicted the strength of stabilized soil using ANNs [30,44,45,46,47,48], multivariable regression analysis [49,50], SVM [51,52,53,54,55], and DT [56]. Most of these studies assessed the unconfined compressive strength while incorporating different input variables into their models, chosen according to the specific requirements of each study. For some studies, the grain size distribution, Atterberg limits, and the stabilizer content were the main input parameters [44,57,58]; others included the mineralogical composition of the material [45] and the curing period and condition [46]. The size of the dataset varied from one study to another, depending on whether the data were experimental or collected from the literature. Table 1 highlights some of the recent studies on the application of artificial intelligence in earth materials analysis. As shown in Table 1, most of the studies used artificial neural networks to predict the performance of stabilized soils or rocks. In the reviewed studies, the coefficient of determination (R²) was commonly used to evaluate model performance, indicating how well the predicted values match the actual measurements. In all these studies, ANN proved effective in predicting the strength of the materials, whether rocks [59] or soils. However, other studies used different machine learning approaches, such as SVM [55,56,60,61], and achieved excellent results, as shown in Table 1, which demonstrates the capability of machine learning in the analysis of soil properties.

Finally, the studies shown in Table 1 provide insights into which parameters most affect the strength. The parameters identified as most important differ according to the input features used in each study. Some studies suggested the grain size distribution as the main feature affecting the strength [39,44,45], while others pointed to the stabilizer content [61,62]. However, it is difficult to determine which parameters most affect the strength without considering several features in the analysis. Therefore, the current study included many features that are believed, based on the literature, to affect the strength.

Table 1. Summary of some of the previous studies on the application of artificial intelligence and machine learning in predicting the strength of stabilized soils/rocks (compiled by the authors).

Study	Input Parameters	Output Parameters	Method	Dataset Size	Performance	General Remarks
[39]	Grain size distribution; Atterberg limits; Linear shrinkage; Cement, lime, and asphalt content.	OMC and MDD	ANN	192	R² = 0.990 and 0.987 for MDD and OMC	It was found that both the grain size and plasticity are the main affecting parameters.
[44]	Grain size distribution; Atterberg limits; Water content; Cement content.	MDD and UCS	ANN	55	R² = 0.828	Both the moisture and clay contents were found to be the most effective parameters.
[45]	Atterberg limits; Grain size distribution; Stabilizer content; Mineralogical compositions.	UCS	ANN	283	R² = 0.964	Sand and stabilizer content are the most effective parameters.
[57]	Grain size distribution; Atterberg limits; water content; Cement content.	MDD and UCS	ANN	55	R² = 0.828 (MDD) R² = 0.865 (UCS)	Both the moisture and gravel contents were found to be the most effective parameters.
[58]	Grain size distribution; Cement content; Density; Moisture content.	UCS	ANN	373	R² = 0.943	Lime and micro-silica contents are the most effective parameters.
[63]	Mineralogical components	UCS; plasticity coefficient; drying shrinkage; shaping moist.	ANN	139	R² = 0.785	Both the moisture and cement contents were found to be the most effective parameters.
[56]	Grain size distribution; MDD; OMC; Lime content; Curing period; 7-Day soaked CBR	UCS	M5 model tree, RF, ANN, SVM, and Gaussian processes (GP)	255	R² = 0.967 (M5) R² = 0.989 (RF) R² = 0.991 (ANN) R² = 0.994 (SVM) R² = 0.997 (GP)	It was found that the GP analysis was the most efficient method. Moreover, for stabilized pond ash, GP can reduce the tedious lab work and help in identifying the UCS of the material.
[64]	Grain size distribution; Chemical composition; Cement content; curing period.	UCS	ANN	80	R² = 0.994	Both the cement content and sand less than 0.5 mm are the most effective parameters.
[65]	Wet density; Dry density; Moisture content; Brazilian tensile strength (BTS)	UCS	GBR, Catboost, LightGBM, and XGBoost	106	R² = 0.928 (GBR) R² = 0.879 (Catboost) R² = 0.357 (LightGBM) R² = 0.999 (XGBoost)	The UCS of sedimentary rocks was evaluated using different ML approaches. The XGBoost algorithm achieved the best performance. The sensitivity analysis showed correlation between the BTS and wet density.
[66]	Size of stabilizer (recycled tiles); Content of RT; MDD; OMC; pH; PI; Curing time	UCS	Optimized ANN with PSO and ICA algorithms	156	R² = 0.956 (ANN-PSO) R² = 0.954 (ANN-ICA)	Both models were able to predict the strength of treated clay at high level of accuracy.
[61]	Cement content; Moisture content; Wet density; Soil type; Sample depth; Sample dimensions (i.e., diameter, height, and area); Curing condition and period.	UCS	Gradient boosting (GB); ANNs; SVM	216	R = 0.929, 0.93, and 0.87 for GB, ANN and SVM	Cement content and soil smaller than 0.5 mm
[67]	Cement content; Dry density; Suction	UCS	Optimized ANN with GA, PSO, and ICA algorithms	96	R² = 0.9 (ICA-ANN) R² = 0.921 (GA-ANN) R² = 0.941 (PSO-ANN)	The developed model was intended to assess the UCS for unsaturated cemented soil. Hybrid models were developed and high prediction accuracies were achieved.
[68]	Temperature; P-wave velocity; Density; Porosity; Dynamic Young modulus	UCS and E	MLR; ANNs; RF; KNN	60	R² = 0.9 (MLR) R² = 0.95 (ANN) R² = 0.94 (KNN) R² = 0.97 (RF)	The study was carried out on rock samples. The different models gave similar results.
[69]	Grain size distribution; Cement and lime content; Atterberg limits	MDD, OMC, UCS	Optimizable ensemble algorithms (OE) (bagging and boosting regression), and ANN	162	R² = 0.56 (OMC); 0.21 (MDD); 0.61 (UCS) for OE R² = 0.55 (OMC); 0.25 (MDD); 0.65 (UCS) for ANN	For MDD, both methods were not able to predict the results. Marginally acceptable results were achieved for OMC and UCS. For UCS, it is found that the density is the most important factor affecting the strength.
[62]	Grain size distribution; Atterberg limits; Organic content; Lime and cement contents	UCS	Optimized SVR with SA, RRHC, PSO, HGS, and SMA	227 (167 for lime and 60 for cement)	R² = 0.853 (SVR-HGS) R² = 0.869 (SVR-SMA) R² = 0.874 (SVR-RRHC) R² = 0.875 (SVR-SA) R² = 0.880 (SVR-PSO)	The hybrid model of SVR-PSO gave the best performance. The study revealed that the stabilizer’s content is the main parameter affecting the strength, followed by the organic content.
[50]	Grain size distribution; Atterberg limits; MDD and OMC; Aspect ratio; Sample condition (wet/dry); Cement and lime content	UCS	Multilinear regression (MLR) and ANN	488	For unstabilized soil: R² = 0.223 (MLR) R² = 0.988 (ANN) For stabilized soil: R² = 0.766 (MLR) R² = 0.846 (ANN)	The study showed that MLR could not predict the strength of unstabilized soil and marginally predicted the stabilized soil strength. The ANN outperformed the MLR for both stabilized and unstabilized soils.
[70]	Reinforcement type; Column diameter; Area replacement ratio; Column penetration ratio; Max-deviator stress	UCS	Several machine learning algorithms were utilized including, but not limited to, DT, logistic regression, ANN, etc.	52	R² = 0.894	Soils were treated using polypropylene columns. The random forest (RF) achieved the best performance among the selected models.
[71]	Consistency limits; Stabilizer dosage; Chemical compositions and ratios	UCS	ANN, GP, EPR	149	R² = 0.992 (ANN) R² = 0.925 (GP) R² = 0.944 (EPR)	The study revealed the superiority of the ANN over the other methods. Both the LL and Na/Al ratio are the main parameters affecting the strength of geopolymer cement granular sand.
The current study	Grain size distribution; Atterberg limits; Compaction characteristics; Aspect ratio of tested samples; Moisture condition during testing; Stabilizer (cement, lime, or unstabilized) dosage	UCS	Support vector regression (SVR) and Decision tree (DT)	488	-	-

Based on the previous discussion and literature review, to our knowledge, none of the previous studies have addressed this topic in the context of earth construction, particularly rammed earth, except for the study reported by Anysz & Narloch [58], which focused on using artificial neural networks for developing earth material mixes for compressed earth bricks (CEBs). Their study focused on predicting the strength of cement-stabilized CEBs while modifying the water and grain size proportions. However, their study was limited to only cement-stabilized earth materials. Moreover, none of the previous studies have assessed the strength of unstabilized soils that are still used in some parts of the world. Therefore, this study investigates the feasibility of using machine learning approaches—specifically decision trees (DTs) and support vector regression (SVR)—to analyze the unconfined compressive strength of unstabilized, cement-, and lime-stabilized soils. This paper is structured as follows: Section 2 details the materials and methods, including data collection, model development, and validation approaches. Section 3 evaluates the performance of SVR and DT models, validates predictions with experimental results, and examines feature importance. Section 4 discusses conclusions and future research directions. The study aims to establish a foundation for future studies to apply such techniques for the development of soil classification systems that serve the purpose of earth construction, and it is a part of a research project that is intended to analyze the behavior of earth materials as valuable construction materials.

2. Materials and Methods

To meet the current study’s objectives, several steps were followed. First, a comprehensive literature review was conducted to identify the most relevant parameters affecting the soil compressive strength. Based on this review, soil characteristics known to affect compressive strength were selected, including grain size distribution and Atterberg limits. Next, data from published literature were collected covering both stabilized and unstabilized soils. Subsequently, these data were analyzed using machine learning approaches (SVR and DT). Model performance was evaluated using statistical indicators and further validated with independent experimental results [72]. Finally, an importance analysis of the input features was conducted to assess their influence on UCS prediction. This approach provides a holistic assessment of the applicability of machine learning methods in predicting the strength of both stabilized and unstabilized soils.

The following sections describe the data collection and preparation methods, as well as the support vector regression and decision tree implementation. The scikit-learn library (version 1.6.1), run in Python via Google Colab, was used for the analysis [73]. This library implements a range of machine learning algorithms, such as SVM and random forest. Moreover, it provides comprehensive tools for data preprocessing and enables effective data visualization [74].

2.1. Model Input Variables

As presented earlier, several characteristics define the behavior of earth materials, including the grain size distribution [16] and consistency limits [18]. Previous studies have considered such parameters along with others, such as cement content [57,58] and organic matter [48], to predict the unconfined compressive strength of soil. For this study, a total of twelve parameters were considered. Each of these parameters was believed to affect the strength of both stabilized and unstabilized soils based on a comprehensive literature review. Table 2 provides a description of the selected parameters. Regarding the strength, only the 28-day unconfined compressive strength was considered in the current study.

This study included two additional parameters: aspect ratio (AR) and testing condition (DW), as detailed in Table 2. The AR served as a quantitative variable characterizing sample size and shape. Furthermore, it was demonstrated that there is a substantial difference in strength whether the samples tested were oven-dried or at their natural moisture content during testing [75,76]. Therefore, the parameter DW was included to indicate whether the sample is wet (1) or oven-dried (2).

In this investigation, only two types of stabilizers were investigated in the case of stabilization: cement and lime. The parameter ST denotes the stabilizer type, whereas SD denotes its dosage (by the dry weight of the soil). For simplicity, stabilizer types were coded numerically: 1 for unstabilized soil, 2 for lime-stabilized soil, and 3 for cement-stabilized soil.
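The numeric coding described above can be expressed as a small lookup. The following is a minimal sketch, not the authors' code: the names `STABILIZER_CODES`, `TEST_CONDITION_CODES`, and `encode_case` are hypothetical, and the rule that unstabilized soil carries a zero dosage is an assumption made for illustration. Only the code values themselves (1/2/3 for ST, 1/2 for DW) come from the text.

```python
# Code values are taken from the text; every identifier is illustrative.
STABILIZER_CODES = {"unstabilized": 1, "lime": 2, "cement": 3}  # ST
TEST_CONDITION_CODES = {"wet": 1, "oven-dried": 2}              # DW


def encode_case(stabilizer: str, dosage_percent: float, condition: str) -> dict:
    """Return the ST, SD, and DW model inputs for one soil sample."""
    # Assumption: an unstabilized sample has no stabilizer dosage.
    if stabilizer == "unstabilized" and dosage_percent != 0:
        raise ValueError("unstabilized soil cannot have a stabilizer dosage")
    return {
        "ST": STABILIZER_CODES[stabilizer],
        "SD": dosage_percent,  # dosage by dry weight of the soil, in percent
        "DW": TEST_CONDITION_CODES[condition],
    }


example = encode_case("cement", 5.0, "oven-dried")
print(example)
```

Because ST is a category rather than a quantity, the integer codes impose an ordering (unstabilized < lime < cement) that a model may or may not exploit; tree-based models are generally less sensitive to this than distance-based ones.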

2.2. Model Datasets

For this study, data from previous studies were compiled, resulting in a total of 488 cases: 143 for unstabilized soils and 345 for cement- and lime-stabilized soils [50]. Since most of the reviewed studies did not report complete data, several correlations from the literature were used to estimate the missing variables. Although other techniques, such as multiple imputation [77], can be adopted to estimate missing variables, estimation by correlation was preferred because such correlations can account for the variation of soil characteristics with stabilization. Table 3 presents the equations used to estimate the missing variables along with their application criteria to ensure prediction accuracy.

In addition to the literature data, four distinct types of soils were collected from the Najd area in Saudi Arabia’s central region. The basic geotechnical parameters of the selected soils are shown in Table 4. The soils were stabilized with cement and lime at six dosages (2.5, 5, 7.5, 10, 12.5, and 15% by dry weight of the soil). Then, the UCS samples were prepared in small molds (38 mm diameter × 76 mm height) at their maximum dry density and optimum moisture content as reported in a previous study [72]. For each stabilizer dosage, three samples were prepared for UCS assessment, and the average value was recorded. All samples were cured by wrapping them in double plastic sheets for 28 days. Following curing, samples were unsealed and oven-dried for 48 h at 60 °C to remove the moisture. Then, the stabilized soils’ 28-day compressive strength was tested, and the results were utilized to validate the models developed in this work. The detailed results of the experimental program can be found in the referenced study [72].

Based on this information, the following three datasets were prepared for the current study:

Unstabilized Soil (Dataset A): Only the unstabilized cases (i.e., 143) were considered. Such a dataset could help in understanding the role of the basic characteristics in affecting the strength of earth materials. This is especially important since some parts of the world still use unstabilized soil for earth construction [82].
Stabilized/Unstabilized Soil (Dataset B): All cases (both stabilized and unstabilized) were included. Such an approach makes the model general, which could help in incorporating different types of stabilizers in future studies and, hence, in building more general models.
Validation Data (Dataset C): The experimental results of the soils shown in Table 4 were included. The data were used to validate the final model and test its capability of evaluating the strength of stabilized soil. The dataset points are shown in Table A1.

Table 5 shows the statistical description of the data for the three datasets after estimating the missing variables. The data in Table 5 show the ranges of the different input parameters, which cover a wide range of soil properties. For instance, the plasticity index for stabilized and unstabilized soils shows a wide range covering the different plasticity conditions (high, medium, low, and non-plastic soils). Similarly, other input variables provide a wide range of values that enhance the efficiency of the developed models.

Prior to applying machine learning techniques, each dataset was randomized to mitigate potential biases and enhance model robustness. The data were then split into training and testing sets, with 80% allocated for training and the remaining 20% reserved for testing. Both the SVR and DT models were trained on the training set and subsequently evaluated on the test set to assess their performance.
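The randomization and 80/20 split can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the study's implementation: the function name `shuffle_split`, the fixed seed, and the choice to round the test-set size up are all assumptions. scikit-learn's `train_test_split` provides comparable functionality.

```python
import math
import random


def shuffle_split(cases, test_fraction=0.2, seed=0):
    """Shuffle a copy of `cases` and split it into (train, test) lists."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    # Assumption: the test-set size is rounded up to a whole case.
    n_test = math.ceil(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


# Integer placeholders standing in for the 488 compiled cases (Dataset B).
train, test = shuffle_split(range(488))
print(len(train), len(test))
```

Fixing the seed makes the split repeatable, so that both models are trained and tested on exactly the same cases.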

Figure 1 and Figure 2 illustrate the correlation coefficients between the input factors and the output (UCS) for unstabilized and stabilized soils, respectively. ST and SD were not considered in Figure 1 because they do not apply to unstabilized soil and therefore cannot affect its strength. For unstabilized soil, there are strong relationships among the grain size distribution parameters. Similarly, the consistency limits are highly correlated with one another, with the OMC, and with the grain size (especially the fines content). Such a result is logical because of the effect of fines content on soil consistency. Moreover, the data in Figure 1 show that specific parameters are highly correlated with the unconfined compressive strength, specifically the fines content, plasticity index, and linear shrinkage. This finding supports the hypothesis presented by Burroughs [18], which indicates the importance of these three parameters in the strength development of earth materials. Similar patterns were observed for stabilized soil, as shown in Figure 2. Using the correlation equations from Table 3 to estimate missing variables increases the correlation between parameters. Such strongly correlated features can enhance the accuracy of the developed machine learning models, as reported by previous studies [83]. However, high multicollinearity could degrade model accuracy and lead to overfitted results. Therefore, machine learning approaches that reduce the effect of multicollinearity are needed [50].
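For readers who want to reproduce a correlation analysis of this kind, each entry in such a correlation matrix is a Pearson coefficient between two variables. The sketch below shows that calculation for one pair; the function name `pearson_r` is hypothetical and the input values are made up for illustration, not taken from the study's dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only (not data from the study).
fines_content = [10.0, 20.0, 30.0, 40.0, 50.0]
plasticity_index = [4.0, 9.0, 13.0, 19.0, 24.0]
r = pearson_r(fines_content, plasticity_index)
print(round(r, 3))
```

Applying the same function to every pair of input variables, and to each input paired with UCS, would fill a matrix like those in Figures 1 and 2.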

For the current study, SVR and DT were selected because of their ability to handle highly correlated data [84,85], whereas other methods, such as multilinear regression, have proved deficient when dealing with such data [50]. In addition, both methods are simpler than the hybrid models shown in Table 1 and are commonly used in engineering applications. The following sections describe the models and how they were prepared.

2.3. Model-I: Support Vector Regression

Support vector regression is a supervised machine learning approach originally introduced by Drucker et al. [86] as an extension of the support vector machine (SVM) algorithm developed by Boser et al. [87]. The method has proven to be a useful tool for estimating real-valued functions. As a supervised-learning strategy, it trains with a symmetrical ($\varepsilon$-insensitive) loss function that penalizes high and low misestimates equally [88]. The fundamental principle of SVR involves creating a hyperplane to model the relationship between the input features and the outputs [89]. For a given training dataset $\{(x_i, y_i)\}$, where $x_i$ represents the input features and $y_i$ is the output, the hyperplane could be linear, which is the simplest form of SVR and is described by Equation (1):

$$y = f(x_i) = \omega x_i + b \qquad (1)$$

where $y$ is the predicted output, $\omega$ is the weight vector, and $b$ is the bias. For the hyperplane to hold the maximum number of training observations within the $\varepsilon$-tolerance, the norm of the weight vector should be minimized (i.e., minimum $\frac{1}{2}\|\omega\|^2$) while preserving the conditions:

$$\begin{cases} y_i - \omega x_i - b \leq \varepsilon \\ \omega x_i + b - y_i \leq \varepsilon \end{cases}$$

However, such a model does not tolerate observations outside the $\varepsilon$-tolerance. Hence, the model is modified by adding slack variables ($\xi_i$, $\xi_i^*$) to guard against outliers. The model optimization is then conducted as follows:

$$\min \; \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$$

$$\text{subject to} \quad \begin{cases} y_i - \omega x_i - b \leq \varepsilon + \xi_i \\ \omega x_i + b - y_i \leq \varepsilon + \xi_i^* \\ \xi_i, \xi_i^* \geq 0 \end{cases}$$

where $C$ is the regularization parameter that determines the trade-off between the complexity of the function $f(x_i)$ and the prediction errors.
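To make the role of the slack variables concrete, the sketch below evaluates the objective for a fixed linear model. It relies on a standard property of this formulation: at the optimum, the combined slack for each sample equals its ε-insensitive loss, max(0, |y − f(x)| − ε). The function names and the toy data are assumptions introduced for illustration; this evaluates the objective only and does not solve the optimization.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, growing linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_primal_objective(weights, bias, X, y, C, epsilon):
    """0.5 * ||w||^2 + C * (total slack) for the linear model w.x + b."""
    regularization = 0.5 * sum(w * w for w in weights)
    slack_total = 0.0
    for features, target in zip(X, y):
        prediction = sum(w * x for w, x in zip(weights, features)) + bias
        slack_total += epsilon_insensitive_loss(target, prediction, epsilon)
    return regularization + C * slack_total


# Toy one-feature data, invented for illustration.
X = [[1.0], [2.0], [3.0]]
y = [1.0, 2.0, 4.0]
objective = svr_primal_objective([1.0], 0.0, X, y, C=10.0, epsilon=0.1)
print(objective)
```

Here only the third sample lies outside the tube (residual 1.0 against ε = 0.1), so it alone contributes slack; raising C makes that violation more costly relative to the weight norm.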

While the above equations can yield high-accuracy models, the training data will not always be linear. Therefore, non-linear transformation functions (kernels) are used to map the input variables into a high-dimensional space, where a hyperplane can be created so that the regression function can be applied effectively [89]. Hence, Equation (1) is modified as follows:

$$y = f(x) = \sum_{i=1}^{n} (\lambda_i' - \lambda_i) K(x, x_i) + b \qquad (2)$$

where $\lambda_i'$ and $\lambda_i$ are non-negative Lagrange multipliers, $b$ is the bias, and $K(x, x_i)$ is the kernel function, which can be polynomial, Gaussian, Gaussian radial basis (RBF), Laplace RBF, hyperbolic tangent, sigmoid, Bessel, or linear spline. For the current study, the Gaussian radial basis function was selected because several studies confirmed its superiority over the other kernel functions [90,91,92] and because it is commonly used for geotechnical applications [93]. The function is expressed as follows:

$$K(x, x_i) = \exp\left(-\gamma \|x - x_i\|^2\right) \qquad (3)$$

where $\gamma$ controls the width of the kernel.
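The kernel in Equation (3) can be evaluated directly. The sketch below assumes the usual Gaussian RBF form with a squared Euclidean distance in the exponent, which is also the form scikit-learn uses for its `rbf` kernel. The function name `rbf_kernel` and the input vectors are illustrative; γ = 0.01 is the value reported for unstabilized soil in Section 2.5.

```python
import math


def rbf_kernel(x, x_i, gamma):
    """Gaussian RBF kernel: exp(-gamma * ||x - x_i||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-gamma * squared_distance)


# The vectors are made up; gamma = 0.01 is the value quoted in Section 2.5.
k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.01)
k_far = rbf_kernel([1.0, 2.0], [11.0, 2.0], gamma=0.01)
print(k_same, k_far)
```

Identical points give a kernel value of 1, and the value decays towards 0 as the points move apart; a larger γ makes that decay faster, so each training sample influences a smaller neighbourhood.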

2.4. Model-II: Decision Tree

Decision trees are flexible machine learning algorithms capable of handling classification, regression, and multi-output tasks. The model type, whether classification or regression, depends on the outcome variable. Such models are powerful algorithms that can fit complex datasets [73]. The model is based on a sequential decision scheme, or tree-like structure, which is composed of the following parts [94]: the root node, which contains all the data; internal nodes; and terminal nodes (i.e., leaves). The decision process is carried out from top to bottom through the layers of the tree until a leaf is reached. At each node, a binary decision is made that separates some of the classes from the remaining ones. In the end, the classes are determined by the sequence of decisions made along the way, and a final solution that is easier to interpret is reached [95].

The initial tree structure is determined by all the training data. The algorithm evaluates every feasible binary split of the data and chooses the split that minimizes the sum of the squared deviations from the mean in the two resulting portions. Each new branch is then split in the same way. A node becomes a terminal node when it reaches a user-specified minimum node size (number of training samples) [94]. To describe the mathematics behind decision trees, the following equations are presented [96]:

Let $m$ denote a specific node and $\alpha$ a candidate split at that node, defined by an input variable $\nu$ and a threshold value $t_m$. The two functions $f_l$ and $f_r$, which decide the left and right sides (conditions) of the split, are then described as follows:

$$f_l(\alpha) = \{(x, y) \mid x_\nu \leq t_m\} \qquad (4)$$

$$f_r(\alpha) = \{(x, y) \mid x_\nu > t_m\} \qquad (5)$$

where $x$ and $y$ are the model predictors (inputs) and target, respectively.
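The split rule in Equations (4) and (5), combined with the squared-deviation criterion described earlier, can be sketched as an exhaustive search at a single node. This is a simplified illustration, not scikit-learn's implementation: the function names are hypothetical, the toy data are invented, and taking candidate thresholds at midpoints between consecutive distinct feature values is an assumption.

```python
def sum_squared_deviation(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y):
    """Return (feature_index, threshold, cost) of the best binary split."""
    best = None
    n_features = len(X[0])
    for feature in range(n_features):
        # Candidate thresholds: midpoints between consecutive distinct values.
        levels = sorted({row[feature] for row in X})
        for low, high in zip(levels, levels[1:]):
            threshold = (low + high) / 2
            left = [t for row, t in zip(X, y) if row[feature] <= threshold]
            right = [t for row, t in zip(X, y) if row[feature] > threshold]
            cost = sum_squared_deviation(left) + sum_squared_deviation(right)
            if best is None or cost < best[2]:
                best = (feature, threshold, cost)
    return best


# Invented toy data: one informative feature and one constant feature.
X = [[1.0, 5.0], [2.0, 5.0], [8.0, 5.0], [9.0, 5.0]]
y = [1.0, 1.0, 3.0, 3.0]
split = best_split(X, y)
print(split)
```

A full regression tree would apply `best_split` recursively to the left and right subsets until a stopping rule, such as a maximum depth or minimum node size, is met.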

To validate the results of the tree and ensure the accuracy of the output, fitness functions are applied, and the errors are calculated at each trial to determine the accuracy of the model. Such functions include the coefficient of determination and the mean absolute error as will be presented in Section 3.1.

2.5. Model Optimization

For both SVR and DT models, certain parameters can be modified to optimize the models and achieve higher accuracy. Several parameters are optimized for SVR analysis to ensure better prediction capability, as presented in Section 2.3. Those are the kernel function, ε, the regularization parameter (C), and gamma (γ). Since the Gaussian radial basis function was implemented in the current study, the other three parameters were optimized. The optimization strategy followed in this study was the test-set-cross validation method which fixes one parameter and changes the others systematically until the minimum error is achieved. The method was repeated until the optimum set of parameters was selected [96]. Table 6 summarizes the optimum selected parameters for both the stabilized and unstabilized soil scenarios that were selected as per the trial results shown in Figure 3 and Figure 4. Figure 3 presents the optimization of the SVR model. For unstabilized soil (Figure 3a), the best results were obtained with ε, C, and γ equal to 0.0001, 10, and 0.01, respectively. These settings help the model make precise predictions while avoiding overfitting. For stabilized and unstabilized soil combined (Figure 3b), different values worked better: ε = 0.1, C = 1000, and γ = 0.001. The higher C value accounts for the greater complexity introduced by soil stabilizers.
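The systematic search described above could be organized along the following lines. This is only a sketch of one plausible reading of the procedure (vary a single parameter over its candidate values while the others are held fixed, then repeat until no further improvement is found). The function `one_at_a_time_search`, the candidate grids, and the error surface `toy_error` are hypothetical; in practice `evaluate` would train an SVR with the given parameters and return its validation error.

```python
import math


def one_at_a_time_search(evaluate, grid, start, max_sweeps=10):
    """Coordinate-wise search: vary one parameter over its grid while the
    others stay fixed, keep the best value, and repeat until a full sweep
    over all parameters no longer lowers the error."""
    params = dict(start)
    best_error = evaluate(params)
    for _ in range(max_sweeps):
        improved = False
        for name, candidates in grid.items():
            for value in candidates:
                trial = {**params, name: value}
                error = evaluate(trial)
                if error < best_error:
                    params, best_error, improved = trial, error, True
        if not improved:
            break
    return params, best_error


def toy_error(params):
    """Hypothetical stand-in for a validation error (for example RMSE).

    It is smallest at epsilon=0.01, C=100, gamma=0.1. These are arbitrary
    illustration values, not results from the study.
    """
    return (
        (math.log10(params["epsilon"]) + 2) ** 2
        + (math.log10(params["C"]) - 2) ** 2
        + (math.log10(params["gamma"]) + 1) ** 2
    )


grid = {
    "epsilon": [0.0001, 0.001, 0.01, 0.1],
    "C": [1, 10, 100, 1000],
    "gamma": [0.001, 0.01, 0.1, 1],
}
start = {"epsilon": 0.1, "C": 1, "gamma": 1}
best_params, best_error = one_at_a_time_search(toy_error, grid, start)
print(best_params, best_error)
```

A search of this kind is cheap, but it can stop at a local minimum when the parameters interact strongly; an exhaustive grid over all combinations avoids that at a higher computational cost.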

In the case of DT analysis, the depth of the tree is the most essential element for the decision tree. Therefore, this parameter was modified until the minimum error was achieved and, hence, the depth was selected [96]. Figure 4 shows the optimization of the DT model’s maximum depth. For unstabilized soil, limiting the tree depth to 20 provided the right balance between accuracy and simplicity. For stabilized and unstabilized soil, a slightly deeper tree with a maximum depth of 25 was needed to capture the more complex relationships in the data. Finally, Figure 5 shows the graphical presentation of the research methodology and the algorithm development for both models.

3. Results and Discussion

3.1. Model Development

For both unstabilized and stabilized soils, SVR and DT analyses were carried out as discussed in Section 2. To test the performance of the developed models, the following numerical indicators were used: the squared Pearson coefficient of correlation (r²), the coefficient of determination (R²), the root mean square error (RMSE), and the mean absolute error (MAE). The four indicators are calculated using Equations (6)–(9) as follows [97,98]:

r^{2} = {(\frac{n (\sum y_{m} y_{p}) - (\sum y_{m}) (\sum y_{p})}{\sqrt{[n \sum {y_{m}}^{2} - {(\sum y_{m})}^{2}] [n \sum {y_{p}}^{2} - {(\sum y_{p})}^{2}]}})}^{2}

(6)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{m} - y_{p})}^{2}}{\sum_{i = 1}^{n} {(y_{m} - \bar{y_{m}})}^{2}}

(7)

RMSE = \sqrt{\frac{\sum_{i = 1}^{n} {(y_{m} - y_{p})}^{2}}{n}}

(8)

MAE = \frac{1}{n} \sum_{i = 1}^{n} |y_{m} - y_{p}|

(9)

where y_{m} is the measured output (from the experiment), y_{p} is the predicted output from the model, \bar{y_{m}} is the average of the measured output, and n is the number of measured data. Both r² and R² were used as model evaluation metrics. Although they often yield similar numerical values in regression analysis, they are conceptually different: r² measures the strength of the linear relationship between observed and predicted values, while R² measures the proportion of variance in the dependent variable that is predictable from the independent variables.
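As an illustration of Equations (6)–(9), the following sketch computes the four indicators with the Python standard library. The function name `evaluation_metrics` and the sample values are hypothetical and chosen only to show how r² and R² can differ.

```python
import math


def evaluation_metrics(y_m, y_p):
    """Return r^2 (squared Pearson), R^2, RMSE, and MAE for measured y_m
    and predicted y_p, following Equations (6)-(9)."""
    n = len(y_m)
    sum_m, sum_p = sum(y_m), sum(y_p)
    sum_mp = sum(m * p for m, p in zip(y_m, y_p))
    sum_mm = sum(m * m for m in y_m)
    sum_pp = sum(p * p for p in y_p)

    # Equation (6): squared Pearson correlation coefficient.
    numerator = n * sum_mp - sum_m * sum_p
    denominator = math.sqrt((n * sum_mm - sum_m ** 2) * (n * sum_pp - sum_p ** 2))
    r2 = (numerator / denominator) ** 2

    # Equation (7): coefficient of determination.
    mean_m = sum_m / n
    ss_res = sum((m - p) ** 2 for m, p in zip(y_m, y_p))
    ss_tot = sum((m - mean_m) ** 2 for m in y_m)
    big_r2 = 1 - ss_res / ss_tot

    rmse = math.sqrt(ss_res / n)                          # Equation (8)
    mae = sum(abs(m - p) for m, p in zip(y_m, y_p)) / n   # Equation (9)
    return {"r2": r2, "R2": big_r2, "RMSE": rmse, "MAE": mae}


# Hypothetical UCS values (MPa): every prediction is 0.5 MPa too high.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
metrics = evaluation_metrics(measured, predicted)
print(metrics)
```

In this example the predictions carry a constant bias, so r² equals 1 while R² is 0.8 and both RMSE and MAE are 0.5 MPa. A high r² alone therefore does not guarantee accurate predictions.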

The performance metrics for unstabilized and stabilized soils are summarized in Table 7 and Table 8 for the training and testing phases, respectively. For unstabilized soils, the DT-model demonstrated strong performance during training, generating significantly lower errors than the SVR model (Table 7). However, both models achieved comparable accuracy during testing (Table 8). These results indicate that both DT and SVR models can be utilized to analyze the unstabilized soil data with high levels of accuracy.

For stabilized soil, the DT-model outperformed the SVR-model and marginal errors were produced (similar to the unstabilized case). Similarly, better results were achieved using the DT-model as compared to the SVR models in the testing phase, as shown in Table 8. These findings demonstrate that the DT approach effectively captured the complex relationships among input variables for stabilized soil.

The results summarized in Table 7 and Table 8 are validated by the cross-plots between the predicted and actual UCS presented in Figure 6, Figure 7, Figure 8 and Figure 9. Figure 6 and Figure 7 show graphically the cross-plots for both SVR and DT analyses in the case of unstabilized soils. As seen in these two figures, both models yielded similar performance, which was proved in Table 7 and Table 8. On the other hand, Figure 8 and Figure 9 show that, for stabilized soils, the DT model (Figure 9) surpassed the SVR model in the training phase. The insignificant errors presented in Table 7 confirm this conclusion. Moreover, in the testing phase, the DT model showed better performance as compared to the SVR model. This could be ascribed to the nature of the data for stabilized soil, which are categorized into different groups (i.e., unstabilized, lime-, and cement-stabilized). Hence, it is easier to estimate the compressive strength based on a categorical approach (i.e., decision tree approach) as described in Section 2.4.

A review of existing literature reveals that hybrid SVR models (e.g., SVR with particle swarm optimization (PSO)) can predict the UCS with high accuracy (R² greater than 0.85) [62]. Such results are higher than those achieved in the testing phase, as shown in Table 8. This performance gap could be attributed to several causes, such as (1) the data size and nature (whether the data are all obtained through experiments or estimated through correlations) and (2) the input features, since the current study included many features, like the AR and DW, that were not included in previous studies. Nevertheless, it is expected that optimizing the SVR model with other algorithms would enhance the efficiency of the current model. On the other hand, the use of decision trees for UCS prediction has been limited. The M5 model presented by Suthar [56] showed results comparable to those of the current model (R² equal to 0.967), even though that model only considered lime stabilization and a 7-day curing period, as shown earlier in Table 1.

Comparative analysis with previous ANN and MLR models [50] demonstrates that the SVR and DT methods surpassed MLR analysis. As for the ANN, it was found that, for unstabilized soils, the ANN achieved similar results as compared with the SVR and DT models in the testing phase (r² = 0.9698 compared with 0.9947 and 0.9948, respectively). On the other hand, the ANN achieved a lower r² (i.e., 0.6622) in the case of stabilized soil (in testing phase). Yet, the ANN outperformed both the SVR and DT in validation, as will be discussed in Section 3.2.

3.2. Model Validation (Experimental Predictions)

Model validation was performed using Dataset C (Section 2.2) to verify the predictive accuracy of the SVR and DT models developed in Section 3.1. The Wilcoxon Signed-Rank test [99] was used to determine whether the differences between the actual UCS and the values generated by the SVR and DT approaches are statistically significant. The Wilcoxon Signed-Rank test is a nonparametric statistical test that compares two paired (dependent) samples to determine whether there is a difference between them. The p-value was used to assess the statistical difference between the two dependent samples. If the p-value is less than a chosen threshold (e.g., 0.05), the difference is statistically significant; otherwise, the difference is not statistically significant and the null hypothesis cannot be rejected [99].
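For small validation sets, the test can be computed exactly by enumerating all sign patterns of the ranked differences. The sketch below is a minimal standard-library version written for illustration; the function name `wilcoxon_signed_rank_exact` and the paired values are hypothetical, and it is not claimed to be the routine used in the study. For routine work, `scipy.stats.wilcoxon` provides the same test.

```python
from itertools import product


def wilcoxon_signed_rank_exact(actual, predicted):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences share their average rank. All 2**n sign patterns
    are enumerated, so this is only practical for small n.
    """
    diffs = [a - p for a, p in zip(actual, predicted) if a != p]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2  # mean of W+ under the null hypothesis
    observed_dev = abs(w_plus - expected)

    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical paired UCS values (MPa) for five validation samples.
actual = [1.50, 2.10, 1.33, 0.62, 2.66]
predicted = [1.40, 2.30, 1.03, 0.22, 2.16]
w_plus, p_value = wilcoxon_signed_rank_exact(actual, predicted)
print(w_plus, p_value)
```

One property of the exact test is worth keeping in mind when reading small-sample results: with five pairs there are only 32 sign patterns, so the smallest attainable two-sided p-value is 2/32 = 0.0625, and such a sample cannot fall below the 0.05 threshold.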

For unstabilized soil, data from published studies were used [100,101]. The data in Figure 10a show that the SVR model was able to predict the strength of one sample (out of five). For sample 1, a marginal difference was observed between the experimental and model prediction, which could be acceptable if the model was intended to give a general idea about the soil strength. Comparing the data for both experimental and SVR models, a p-value of 0.188 was obtained, which is higher than the significance level (0.05), indicating that the difference between the experimental data and the SVR model’s predictions is statistically insignificant. This implies that the null hypothesis, that there is no significant discrepancy between the predicted and observed values, is not rejected. Therefore, the SVR model can be considered reliable for unstabilized soil.

On the other hand, the DT model (Figure 10b) demonstrated better performance in strength prediction, with a p-value equal to 0.313 and relatively smaller errors as compared to SVR (Table 9). Generally, both models (SVR and DT) produced acceptable results in prediction, which is expected considering the models’ performance in the testing phase, as shown in Table 8. However, we can conclude that the DT performed marginally better than SVR in the case of unstabilized soil. Such performance could be ascribed to the way in which each method deals with the inputs and to which features affect the models the most, as will be illustrated in Section 3.3.

Figure 11, Figure 12, Figure 13, Figure 14, Figure 15, Figure 16, Figure 17 and Figure 18 present a comparison of experimental and predicted strength values for the four stabilized soils (Table 4), while Table 9 displays the test for significance for the validation analysis performance. The sample numbers in the figures correspond to the stabilizer dosages (0, 2.5, 5, 7.5, 10, 12.5, and 15%, respectively). As evidenced in Figure 11a, Figure 13a, Figure 15a and Figure 17a, the SVR model was able to reflect the basic pattern of strength development in the case of cement stabilization for the four soils (A, B, C, and D). However, the model achieved p-values below 0.05 (Table 9). This indicates that the model’s predictions are significantly different from the experimental results. On the other hand, the DT model was able to reflect the general strength pattern successfully for the four soils, as shown in Figure 11b, Figure 13b, Figure 15b and Figure 17b, and achieved p-values of 0.779. The DT model also shows a stronger correlation with the experimental data, particularly for soils A and D, where the DT model’s R values are 0.937 and 0.949, respectively, compared to the SVR model’s 0.603 and 0.314. For Soil B, the SVR model exhibited substantial errors, as indicated by the RMSE and MAE compared with the DT model. The SVR model underestimated the strength of the soil. This could reflect unexpected strength development for Soil B, which was reported in a previous study [72]. The SVR model also underestimated Soil D’s high strength with cement stabilization. Examination of Dataset B reveals few training examples with high UCS values (>11 MPa), with only one study reporting such values [102]. Therefore, it could be expected that the SVR model could not predict such a high strength in the current study. The DT model also exhibited significant differences from the experimental results, with a very low R value of 0.036, suggesting poor predictive performance for this soil type.

In the case of lime stabilization, the SVR model achieved acceptable accuracy in predicting strength for Soils A, B, and C (Table 10). However, for Soil D, the model overestimated the strength while capturing the same pattern of strength development. Specifically, for Soil B, Figure 14 demonstrates minimal discrepancies between experimental and predicted strengths. The SVR model was able to predict the strength up to 7.5% lime dosage, while it underestimated the strength at higher dosages, which explains the relatively high RMSE shown in Table 10 (i.e., 0.77). Similarly, the DT model underestimated strength for some lime-stabilized Soil B samples, producing higher errors than SVR (Table 10). Consequently, we conclude that the DT model was inadequate for predicting this soil’s strength.

Generally, for lime stabilization, the p-value for the DT model across all soil types (A, B, C, and D) was above the 0.05 threshold, indicating no significant difference between the models’ predictions and the experimental data (Table 9). This suggests that the DT model was capable of predicting UCS for lime-stabilized soils better than the SVR model.

Comparing the current models with other models presented in a previous study [50], it was found that the ANN model performed better in validation using the experimental results. This agrees with the findings in the studies presented in Table 1. It can be seen that both models (SVR and DT) gave varying results when predicting the experimental work presented in this study. The results shown in Table 9 reflect the need to enhance the capability of the models developed in the current study. This can be achieved by incorporating hybrid machine learning approaches, as shown earlier in Table 1. Moreover, the accuracy of the database used for model creation should be enhanced by including full experimental results rather than estimating missing variables through correlations.

3.3. Input Features Importance

The “permutation importance” or “Mean Decrease Accuracy (MDA)” technique was used to evaluate the variable importance for both stabilized and unstabilized cases [103]. The algorithm applies a simple procedure: it measures the reduction in the model’s score (r²) after the values of one feature at a time are randomly permuted, which breaks the link between that feature and the target. The analyses were carried out with both the test and training data to ensure robustness.
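In the standard formulation of permutation importance, a feature is not dropped from the model; its column is shuffled and the drop in score is recorded. A minimal standard-library sketch of this idea follows. The names `squared_pearson`, `permutation_importance`, and `toy_model`, as well as the data, are hypothetical, and the toy model merely stands in for a fitted SVR or DT.

```python
import random


def squared_pearson(y_true, y_pred):
    """Squared Pearson correlation (r^2) between two equal-length lists."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in r^2 when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = squared_pearson(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            score = squared_pearson(y, [predict(row) for row in shuffled])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical fitted model: uses feature 0 and ignores feature 1."""
    return 2.0 * row[0] + 1.0


X = [[float(i), float((7 * i) % 10)] for i in range(10)]
y = [toy_model(row) for row in X]
importances = permutation_importance(toy_model, X, y)
print(importances)
```

Because the toy model ignores feature 1, shuffling that column leaves the score unchanged and its importance is zero, whereas shuffling feature 0 lowers r² substantially. Averaging over several shuffles reduces the dependence on any single random permutation.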

Figure 19 shows the importance of the different features for both the SVR and DT models in the case of the unstabilized scenario. Figure 19a shows the feature importance when using the SVR model. The analysis shows that the MDD is the main factor affecting the strength of the soil. Next, the grain size distribution, mainly the fines and sand content, plays a major role in affecting the strength of unstabilized soils. Logically, both the density and grain size distribution affect the strength of unstabilized soil due to the interlocking between the particles. After both density and grain size distribution come the consistency limits, which describe the water–soil relation and, hence, are directly correlated to the capillary strength developed between the soil particles [15].

On the other hand, the DT model gave different results, as shown in Figure 19b. The testing condition, which determines whether the soil sample was wet or dry during the testing, plays the main role in determining the strength of unstabilized soils. Such a result could be ascribed to the categorical nature of the DT. In simple words, for the unstabilized soil data, there are only two categories for DW: wet or dry condition, which forces the DT to categorize the data based on their testing conditions, especially when there is a clear difference between the wet and dry samples’ strength, as shown in Figure 20. The second parameter is the LL, which might be considered the main parameter defining the soil moisture condition. Other parameters show marginal importance and similar effects on the strength of the unstabilized soils.

Looking at Figure 19, we can see that many features affect the strength of unstabilized soil in the case of SVR (with permutation importance more than 0.2). This could increase the complexity of the model and, hence, it could be expected that it affects the prediction accuracy of the model as compared with the DT model, which only requires the testing condition and LL for predicting the strength. Therefore, we can conclude that the superiority of the DT model over the SVR is ascribed to the number of features affecting the strength of the soil (i.e., the outcome).

In summary, based on the information presented in this research, the superiority of the DT model over the SVR model for predicting the strength of unstabilized soils can be summarized as follows:

Performance Metrics: For unstabilized soils, the DT model achieved superior results in both training and testing phases, as shown in Table 7 and Table 8. Moreover, the DT model performed better with the validation data, as summarized earlier in Table 9.
Simplicity and Feature Relevance: As per the aforementioned discussion, the DT model requires fewer features to predict the strength of unstabilized soils, specifically only the testing condition and LL, compared to the SVR model, which is influenced by many features with permutation importance more than 0.2. This simplicity could lead to a more straightforward model that is easier to interpret and use.

For the stabilized soil, Figure 21a shows that both the MDD and SD are the main sources of strength, regardless of the stabilizer type (ST), which has a marginal effect on the strength according to the SVR model. This is similar to the results presented earlier by Ngo et al. [62], who used optimized SVR with other algorithms for their analysis (Table 1). However, the literature suggests that higher strength could be achieved in cement stabilization compared to lime, especially in low-plasticity soils, as confirmed by a previous study [72]. Such results explain the high errors shown in Table 7 when using the SVR analysis.

On the other hand, the DT model produced more logical results by introducing the stabilizer type as the main parameter influencing the strength of the stabilized soil, followed by the dry density and testing condition. This could explain why the DT model performed better in the testing phase compared to the SVR model, due to the categorical approach where the ST was the first parameter used to categorize the data and analyze the strength, as mentioned in Section 3.1.

Moreover, by examining the performance of both the SVR and DT results presented in Section 3.2 (i.e., validation with experimental data), it can be seen that the DT model performed better than the SVR, as shown in Table 9 earlier. It could be concluded that such performance was due to the variation in both density and stabilizers’ dosages in the experimental data shown in Table A1, which, as per Figure 21a, are considered as the main parameters affecting the strength as per the SVR model criteria. On the other hand, the DT model was mainly affected by the stabilizer type, while the dosage had a marginal effect.

4. Conclusions

The current study compares the performance of machine learning methods, specifically support vector regression and decision trees, in assessing the compressive strength of stabilized and unstabilized soils. The data, collected from several sources in the literature, along with the experimental data prepared by the authors, were used to train, test, and validate the models. A total of 12 input features were considered for the analysis. Using both the SVR and DT methods, two models were developed to analyze the strength of unstabilized and stabilized soils, and their performance was compared in terms of r², R², MAE, and RMSE. Based on these results, the following conclusions were reached:

For unstabilized soil, both the SVR and DT models demonstrated strong predictive capability. In the testing phase, the DT model achieved better performance with r² = 0.9383 and R² = 0.9311, compared to the SVR model, which achieved r² = 0.9135 and R² = 0.9108. This indicates that both models are suitable for strength prediction.
In the case of stabilized soils, the models exhibited moderate accuracy. The DT model again outperformed SVR with r² = 0.7530 and R² = 0.6898, while the SVR model yielded r² = 0.5474 and R² = 0.5412. The lower performance is attributed to the complexity of the dataset, which was expected because of the nature of the data (i.e., categorized into three different groups of unstabilized, cement-, and lime-stabilized soils).
The models (for both unstabilized and stabilized scenarios) were validated using the experimental results from previous work. For unstabilized soils, the DT model generally performed better than the SVR model. This was supported by the p-values obtained from the Wilcoxon Signed-Rank test, which suggest that the predictions from the DT model are not significantly different from the experimental results, indicating a good model fit. Additionally, the SVR model showed a p-value lower than that obtained with the DT model. Moreover, the importance of the input variables was estimated using the permutation importance analysis, and it was found that the soil density, along with the grain size distribution, were the main features affecting the strength according to the SVR model. However, many features obtained a high importance value (more than 0.2), which increases the model complexity and, hence, could reduce the prediction accuracy. On the other hand, the DT model showed that the testing condition was the main strength feature, followed by the liquid limit. These differences in performance and the variables’ effects could be ascribed to the way in which each model deals with the data, as the DT follows a more categorical approach and, hence, is highly affected by the testing condition categories. Therefore, it is concluded that the superiority of the DT model over the SVR is ascribed to the number of features affecting the strength of the soil (i.e., the outcome) in the case of unstabilized soil.
While validating the stabilized/unstabilized models, it was found that, in most cases, the models predicted the strength pattern of both lime and cement stabilization, which could aid future studies to examine the effect of stabilization on the general strength of the stabilized soils. Moreover, the models performed better in the case of cement stabilization as compared to lime stabilization. The models were able to marginally predict the high strength of the low-plasticity soil (i.e., Soil D) at an acceptable level of accuracy, which proves the ability of the models to distinguish the prediction results for different types of soils.
In the case of stabilized soils, the density and stabilizer content were the main sources of strength as per the SVR model. However, such a conclusion contradicts the literature, which has proven that there is a clear difference between the strength of cement- and lime-stabilized soil. On the other hand, the DT model considered the stabilizer type as the main feature influencing the strength of the stabilized soil, followed by the density and testing conditions. Therefore, based on both model analyses, it can be concluded that the stabilizer type and content, along with the density and moisture at the time of testing, are the main strength features for stabilized soils.

The current study aimed to investigate the possibility of machine learning utilization in strength prediction for both stabilized and unstabilized soils. This study showed that such tools can be used in future studies to analyze the strength of earth materials and, hence, would help in producing proper classification systems that consider the usage of the earth as a construction material. The developed models provide a quick method to analyze the strength of earth material, which will be efficient in designing the earth mix effectively. Also, the presented model (for unstabilized soil) could be modified to include the possibility of adding different stabilizers to the raw soil, which could help in reducing the amount of experimental work in future studies. Moreover, the models can be used to optimize other input parameters like gravel, moisture content, aspect ratio, and linear shrinkage, which would help in defining the optimum ranges of such parameters when designing the earth mix. Several other parameters could be incorporated in such tools, such as clay mineralogy and other stabilizers like fly ash, cement kiln dust, rice husk, straws, etc. It should be noted, however, that the current models were developed based on the specific features that were selected for this study. Hence, changing these features or adding additional input parameters might not produce the same results. In addition, some differences in the prediction capacities of the SVR and DT models were observed while validating the models. Such performance is likely ascribed to the nature of the data used in building the models, part of which were based on correlation equations from previous studies. Therefore, it is recommended that models be built using actual experimental results in future studies.

Author Contributions

Conceptualization, Y.M.H.M.; data curation, Y.S.W.; formal analysis, Y.M.H.M.; funding acquisition, M.S.Z.; investigation, Y.M.H.M.; methodology, Y.M.H.M., M.S.Z. and M.A.A.-O.; project administration, M.S.Z.; software, Y.M.H.M. and Y.S.W.; supervision, M.S.Z. and M.A.A.-O.; validation, Y.M.H.M., M.S.Z., M.A.A.-O. and Y.S.W.; visualization, Y.M.H.M.; writing—original draft, Y.M.H.M.; writing—review and editing, M.S.Z., M.A.A.-O. and Y.S.W. All authors have read and agreed to the published version of the manuscript.

Funding

This publication is based upon work supported by King Fahd University of Petroleum & Minerals (KFUPM). The authors at KFUPM acknowledge the Deanship of Research Oversight and Coordination (DROC) for the support received under Grant no. IN-171030.

Data Availability Statement

The data used in the current study were collected and compiled by the authors using the references presented in a previous study [50]. The experimental data were taken from a previous study conducted by the authors [72]. The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors acknowledge the support provided by the Deanship of Research Oversight and Coordination (DROC) and the Interdisciplinary Research Center for Construction and Building Materials (IRC-CBM) at King Fahd University of Petroleum and Minerals.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
ANN	Artificial neural network
AR	Aspect ratio
CBR	California bearing ratio
DT	Decision tree
E	Young modulus
EPR	Evolutionary polynomial regression
F	Fines content (%)
G	Gravel content (%)
GA	Genetic algorithm
GB	Gradient boosting
GBR	Gradient boosted regression
GP	Genetic programming
HGS	Hunger games search
ICA	Imperialist competitive algorithm
KNN	K-nearest neighbor
LightGBM	Light gradient boosting machine
MDD	Maximum dry density (kg/m³)
MLR	Multilinear regression
OMC	Optimum moisture content (%)
PSO	Particle swarm optimization
RF	Random forest
RRHC	Random restart hill climbing
S	Sand content (%)
SA	Simulated annealing
SD	Stabilizer dosage (%)
SMA	Slime mould algorithm
ST	Stabilizer type
SVM	Support vector machine
SVR	Support vector regression
UCS	Unconfined compressive strength (MPa)
XGBoost	Extreme gradient boosting

Appendix A

Dataset C, employed for experimental validation in Section 3.2, comprises UCS measurements from cement- and lime-stabilized soil samples. As summarized in Table A1, this dataset was extracted from the comprehensive experimental study reported in [72].

Table A1. Experimental results used for validation.

Soil Type	G (%)	S (%)	F (%)	SD (%)	LL (%)	PI (%)	SL (%)	OMC (%)	MDD (kg/m³)	AR	D/W	UCS (MPa)
Soil A Cement	0.00	84.00	16.00	0	43.74	17.01	7.99	20.00	1692.15	2	2	1.496
	0.00	84.00	16.00	2.50	29.83	23.47	9.67	19.29	1711.62	2	2	1.715
	0.00	84.00	16.00	5.00	29.79	16.65	5.97	19.08	1718.35	2	2	1.884
	0.00	84.00	16.00	7.50	27.13	4.55	3.09	18.87	1725.08	2	2	1.641
	0.00	84.00	16.00	10.00	19.71	2.18	0.09	18.65	1731.80	2	2	2.008
	0.00	84.00	16.00	12.50	18.70	1.92	0.09	18.44	1738.53	2	2	1.940
	0.00	84.00	16.00	15.00	18.62	2.47	0.09	18.23	1745.26	2	2	2.660
Soil A Lime	0.00	84.00	16.00	2.50	29.83	13.21	6.20	19.42	1702.14	2	2	0.667
	0.00	84.00	16.00	5.00	29.84	11.87	5.57	20.63	1676.25	2	2	0.773
	0.00	84.00	16.00	7.50	30.34	5.72	2.69	21.85	1650.36	2	2	0.596
	0.00	84.00	16.00	10.00	32.89	0.92	0.43	23.07	1624.46	2	2	1.020
	0.00	84.00	16.00	12.50	32.93	0.42	0.20	24.29	1598.57	2	2	0.780
	0.00	84.00	16.00	15.00	32.95	0.91	0.43	25.51	1572.68	2	2	0.607
Soil B Cement	0.00	84.70	15.30	0	33.49	17.78	8.35	18.95	1726.81	2	2	2.117
	0.00	84.70	15.30	2.50	29.83	15.55	7.30	18.24	1733.10	2	2	2.251
	0.00	84.70	15.30	5.00	29.78	17.72	8.32	18.80	1736.85	2	2	2.999
	0.00	84.70	15.30	7.50	26.96	3.19	1.50	19.36	1740.60	2	2	3.185
	0.00	84.70	15.30	10.00	19.80	3.35	1.57	19.92	1744.34	2	2	3.179
	0.00	84.70	15.30	12.50	18.74	2.12	1.00	20.48	1748.09	2	2	3.511
	0.00	84.70	15.30	15.00	18.64	2.79	1.31	21.04	1751.83	2	2	3.570
Soil B Lime	0.00	84.70	15.30	2.50	29.83	17.90	8.40	19.78	1703.70	2	2	0.963
	0.00	84.70	15.30	5.00	29.84	13.90	6.53	20.85	1679.15	2	2	1.415
	0.00	84.70	15.30	7.50	30.39	4.96	2.33	21.93	1654.61	2	2	1.367
	0.00	84.70	15.30	10.00	32.89	5.49	2.58	23.00	1630.07	2	2	2.364
	0.00	84.70	15.30	12.50	32.93	3.98	1.87	24.07	1605.53	2	2	2.209
	0.00	84.70	15.30	15.00	32.94	1.00	0.47	25.14	1580.99	2	2	1.997
Soil C Cement	0.00	73.80	26.20	0	34.05	15.05	7.07	17.60	1689.09	2	2	1.330
	0.00	73.80	26.20	2.50	29.83	21.34	10.02	17.78	1696.25	2	2	1.291
	0.00	73.80	26.20	5.00	29.80	15.62	7.33	17.97	1706.88	2	2	2.399
	0.00	73.80	26.20	7.50	27.49	1.05	0.49	18.16	1717.51	2	2	2.152
	0.00	73.80	26.20	10.00	19.85	1.00	0.47	18.36	1728.13	2	2	2.293
	0.00	73.80	26.20	12.50	18.71	0.74	0.35	18.55	1738.76	2	2	2.886
	0.00	73.80	26.20	15.00	18.62	0.34	0.16	18.74	1749.39	2	2	2.964
Soil C Lime	0.00	73.80	26.20	2.50	29.83	12.13	5.69	17.37	1679.74	2	2	1.122
	0.00	73.80	26.20	5.00	29.84	9.84	4.62	18.61	1653.16	2	2	1.224
	0.00	73.80	26.20	7.50	29.91	1.42	0.67	19.85	1626.58	2	2	1.200
	0.00	73.80	26.20	10.00	31.08	0.29	0.14	21.09	1600.00	2	2	1.785
	0.00	73.80	26.20	12.50	33.03	1.02	0.48	22.33	1573.42	2	2	2.322
	0.00	73.80	26.20	15.00	33.68	0.43	0.20	23.58	1546.84	2	2	1.831
Soil D Cement	0.00	74.00	26.00	0	17.52	2.28	1.07	13.48	1800.20	2	2	0.621
	0.00	74.00	26.00	2.50	29.83	4.41	2.07	14.85	1772.60	2	2	1.200
	0.00	74.00	26.00	5.00	29.80	4.39	2.06	15.25	1778.64	2	2	2.784
	0.00	74.00	26.00	7.50	27.56	5.14	2.41	15.65	1784.68	2	2	4.408
	0.00	74.00	26.00	10.00	19.81	3.96	1.86	16.04	1790.72	2	2	6.254
	0.00	74.00	26.00	12.50	18.70	2.98	1.40	16.44	1796.76	2	2	7.892
	0.00	74.00	26.00	15.00	18.62	0.91	0.43	16.83	1802.80	2	2	10.320
Soil D Lime	0.00	74.00	26.00	2.50	29.83	3.84	1.80	15.55	1769.70	2	2	0.704
	0.00	74.00	26.00	5.00	29.83	3.41	1.60	16.49	1748.78	2	2	0.983
	0.00	74.00	26.00	7.50	29.82	5.65	2.65	17.43	1727.85	2	2	0.975
	0.00	74.00	26.00	10.00	29.52	2.63	1.23	18.38	1706.93	2	2	1.114
	0.00	74.00	26.00	12.50	28.11	4.85	2.28	19.32	1686.01	2	2	1.665
	0.00	74.00	26.00	15.00	27.82	2.00	0.94	20.26	1665.09	2	2	1.536

References

Kim, T.Y.; Cho, S.B. Predicting residential energy consumption using CNN-LSTM neural networks. Energy 2019, 182, 72–81. [Google Scholar] [CrossRef]
Berge, B. The Ecology Building Materials; Routledge: London, UK, 2009. [Google Scholar]
Reddy, B.V.V.; Mani, M.; Walker, P. Earthen Dwellings and Structures: Current Status in Their Adoption; Springer: Berlin/Heidelberg, Germany, 2019. [Google Scholar]
Kosarimovahhed, M.; Toufigh, V. Sustainable usage of waste materials as stabilizer in rammed earth structures. J. Clean. Prod. 2020, 277, 123279. [Google Scholar] [CrossRef]
Ávila, F.; Puertas, E.; Gallego, R. Characterization of the mechanical and physical properties of unstabilized rammed earth: A review. Constr. Build. 2021, 270, 121435. [Google Scholar] [CrossRef]
Cuitiño-Rosales; Guadalupe, M.; Rotondaro, R.; Esteves, A. Comparative Analysis of Thermal Aspects and Mechanical Resistance of Building Materials and Elements with Earth. Rev. Arquit. 2020, 22, 138–151. [Google Scholar]
Ghasemalizadeh, S.; Toufigh, V. Durability of Rammed Earth Materials. Int. J. Geomech. 2020, 20, 04020201. [Google Scholar] [CrossRef]
Samadianfard, S.; Toufigh, V. Energy Use and Thermal Performance of Rammed-Earth Materials. J. Mater. Civ. Eng. 2020, 32, 0003364. [Google Scholar] [CrossRef]
Taffese, W.Z.; Abegaz, K.A. Artificial intelligence for prediction of physical and mechanical properties of stabilized soil for affordable housing. Appl. Sci. 2021, 11, 7503. [Google Scholar] [CrossRef]
Santamarina, J.C. Soil behavior at the microscale: Particle forces. Soil Behav. Soft Ground Constr. 2003, 25–56. [Google Scholar] [CrossRef]
Narloch, P.; Woyciechowski, P.; Kotowski, J.; Gawriuczenkow, I.; Wójcik, E. The effect of soil mineral composition on the compressive strength of cement stabilized rammed earth. Materials 2020, 13, 324. [Google Scholar] [CrossRef]
Zami, M.S.; Ewebajo, A.O.; Al-Amoudi, O.S.B.; Al-Osta, M.A.; Mustafa, Y.M.H. Strength and durability improvement of cement-stabilized Al-Qatif soil by the addition of sand. Arab. J. Geosci. 2022, 15, 1339. [Google Scholar] [CrossRef]
Tan, E.H.; Zahran, E.M.M.; Tan, S.J. A comparative experimental investigation into the chemical stabilisation of sandstone aggregates using cement and styrene-butadiene copolymer latex for road sub-base construction. Transp. Geotech. 2022, 37, 100864. [Google Scholar] [CrossRef]
Mustafa, Y.M.H.; Al-Amoudi, O.S.B.; Zami, M.S.; Al-Osta, M.A. Assessment of the Suitability of Earth as a Construction Material from Experimental and Numerical Perspectives: A Critical Review. Int. J. Geomech. 2023, 23, 0002646. [Google Scholar] [CrossRef]
Mitchell, J.K.; Soga, K. Fundamentals of Soil Behavior, 3rd ed.; John Wiley and Sons, Inc.: Hoboken, NJ, USA, 2005. [Google Scholar]
Houben, H.; Guillaud, H. Earth Construction: A Comprehensive Guide; Earth Construction Series; Craterre-Eag: Grenoble, France, 1994. [Google Scholar]
Alley, P.J. Rammed earth construction. N. Z. Eng. 1948, 3, 582. [Google Scholar]
Burroughs, S. Soil property criteria for rammed earth stabilization. J. Mater. Civ. Eng. 2008, 20, 264–273. [Google Scholar] [CrossRef]
Norton, J. Handbook on Building with Earth; Intermediate Technology Publications Ltd.: London, UK, 1997. [Google Scholar]
McHenry, P.G. Adobe and Rammed Earth Buildings: Design and Construction; Wiley-Interscience Publication: New York, NY, USA, 1984. [Google Scholar]
Moayedi, H.; Mosallanezhad, M.; Rashid, A.S.A.; Jusoh, W.A.W.; Muazu, M.A. A systematic review and meta-analysis of artificial neural network application in geotechnical engineering: Theory and applications. Neural Comput. Appl. 2020, 32, 495–518. [Google Scholar] [CrossRef]
Zhang, W.; Li, H.; Li, Y.; Liu, H.; Chen, Y.; Ding, X. Application of Deep Learning Algorithms in Geotechnical Engineering: A Short Critical Review; Springer: Dordrecht, The Netherlands, 2021. [Google Scholar] [CrossRef]
Villard, P.; Chevalier, B.; Le Hello, B.; Combe, G. Coupling between finite and discrete element methods for the modelling of earth structures reinforced by geosynthetic. Comput. Geotech. 2009, 36, 709–717. [Google Scholar] [CrossRef]
Ahmed, H.R.; Abduljauwad, S. Significance of molecular-level behaviour incorporation in the constitutive models of expansive clays—A review. Geomech. Geoengin. 2018, 13, 115–138. [Google Scholar] [CrossRef]
Lee, S.J.; Lee, S.R.; Kim, Y.S. An approach to estimate unsaturated shear strength using artificial neural network and hyperbolic formulation. Comput. Geotech. 2003, 30, 489–503. [Google Scholar] [CrossRef]
Neaupane, K.M.; Achet, S.H. Use of backpropagation neural network for landslide monitoring: A case study in the higher Himalaya. Eng. Geol. 2004, 74, 213–226. [Google Scholar] [CrossRef]
Choobbasti, A.J.; Farrokhzad, F.; Barari, A. Prediction of slope stability using artificial neural network (Case study: Noabad, mazandaran, iran). Arab. J. Geosci. 2009, 2, 311–319. [Google Scholar] [CrossRef]
Kiran, S.; Lal, B.; Tripathy, S.S. Shear strength prediction of soil based on probabilistic neural network. Indian J. Sci. Technol. 2016, 9, 1–6. [Google Scholar] [CrossRef]
Moayedi, H.; Hayati, S. Artificial intelligence design charts for predicting friction capacity of driven pile in clay. Neural Comput. Appl. 2019, 31, 7429–7445. [Google Scholar] [CrossRef]
Pham, B.T.; Singh, S.K.; Ly, H.-B. Using Artificial Neural Network (ANN) for prediction of soil coefficient of consolidation. Vietnam J. Earth Sci. 2020, 42, 311–319. [Google Scholar]
Goh, A.T.C.; Goh, S.H. Support vector machines: Their use in geotechnical engineering as illustrated using seismic liquefaction data. Comput. Geotech. 2007, 34, 410–421. [Google Scholar] [CrossRef]
Samui, P. Support vector machine applied to settlement of shallow foundations on cohesionless soils. Comput. Geotech. 2008, 35, 419–427. [Google Scholar] [CrossRef]
Wang, Y.; Zhao, X.; Wang, B. LS-SVM and Monte Carlo methods based reliability analysis for settlement of soft clayey foundation. J. Rock Mech. Geotech. Eng. 2013, 5, 312–317. [Google Scholar] [CrossRef]
Xue, X.H.; Yang, X.G.; Chen, X. Application of a support vector machine for prediction of slope stability. Sci. China Technol. Sci. 2014, 57, 2379–2386. [Google Scholar] [CrossRef]
Zhang, Y.; Qiu, J.; Zhang, Y.; Xie, Y. The adoption of a support vector machine optimized by GWO to the prediction of soil liquefaction. Environ. Earth Sci. 2021, 80, 360. [Google Scholar] [CrossRef]
Gandomi, A.H.; Fridline, M.M.; Roke, D.A. Decision Tree Approach for Soil Liquefaction Assessment. Sci. World J. 2013, 2013, 346285. [Google Scholar] [CrossRef]
Garcia, E.M.; Alberti, M.G.; Alvarez, A.A.A. Measurement-While-Drilling Based Estimation of Dynamic Penetrometer Values Using Decision Trees and Random Forests. Appl. Sci. 2022, 12, 4565. [Google Scholar] [CrossRef]
Kurup, P.U.; Griffin, E.P. Prediction of Soil Composition from CPT Data Using General Regression Neural Network. J. Comput. Civ. Eng. 2006, 20, 281–289. [Google Scholar] [CrossRef]
Alavi, A.H.; Gandomi, A.H.; Mollahassani, A.; Heshmati, A.A.; Rashed, A. Modeling of maximum dry density and optimum moisture content of stabilized soil using artificial neural networks. J. Plant Nutr. Soil Sci. 2010, 173, 368–379. [Google Scholar] [CrossRef]
Shahin, M.A. Load-settlement modeling of axially loaded steel driven piles using CPT-based recurrent neural networks. Soils Found. 2014, 54, 515–522. [Google Scholar] [CrossRef]
Komadja, G.C.; Rana, A.; Glodji, L.A.; Anye, V.; Jadaun, G.; Onwualu, P.A.; Sawmliana, C. Assessing Ground Vibration Caused by Rock Blasting in Surface Mines Using Machine-Learning Approaches: A Comparison of CART, SVR and MARS. Sustainability 2022, 14, 11060. [Google Scholar] [CrossRef]
Baghbani, A.; Choudhury, T.; Costa, S.; Reiner, J. Application of artificial intelligence in geotechnical engineering: A state-of-the-art review. Earth Sci. Rev. 2022, 228, 103991. [Google Scholar] [CrossRef]
Khatti, J.; Grover, K.S. A Scientometrics Review of Soil Properties Prediction Using Soft Computing Approaches. Arch. Comput. Methods Eng. 2024, 31, 1519–1553. [Google Scholar] [CrossRef]
Das, S.K.; Samui, P.; Sabat, A.K. Application of Artificial Intelligence to Maximum Dry Density and Unconfined Compressive Strength of Cement Stabilized Soil. Geotech. Geol. Eng. 2011, 29, 329–342. [Google Scholar] [CrossRef]
Mozumder, R.A.; Laskar, A.I. Prediction of unconfined compressive strength of geopolymer stabilized clayey soil using Artificial Neural Network. Comput. Geotech. 2015, 69, 291–300. [Google Scholar] [CrossRef]
Ghorbani, A.; Hasanzadehshooiili, H. Prediction of UCS and CBR of Microsilica-Lime Stabilized Sulfate Silty Sand using ANN and EPR Models: Application to the Deep Soil Mixing. Soils Found. 2018, 58, 34–49. [Google Scholar] [CrossRef]
Narloch, P.; Hassanat, A.; Tarawneh, A.S.; Anysz, H.; Kotowski, J.; Almohammadi, K. Predicting compressive strength of cement-stabilized rammed earth based on SEM images using computer vision and deep learning. Appl. Sci. 2019, 9, 5131. [Google Scholar] [CrossRef]
Tinoco, J.; Alberto, A.; Venda, P.; Correia, A.G.; Lemos, L. A Novel Approach based on Soft Computing Techniques for Unconfined Compression Strength Prediction of Soil Cement Mixtures. Neural Comput. Appl. 2020, 32, 8985–8991. [Google Scholar] [CrossRef]
Khatti, J.; Grover, K.S. Prediction of UCS of fine-grained soil based on machine learning part 1: Multivariable regression analysis, gaussian process regression, and gene expression programming. Multiscale Multidiscip. Model. Exp. Des. 2023, 6, 199–222. [Google Scholar] [CrossRef]
Mustafa, Y.M.H.; Zami, M.S.; Al-Amoudi, O.S.B.; Al-Osta, M.A.; Wudil, Y.S. Analysis of Unconfined Compressive Strength of Rammed Earth Mixes Based on Artificial Neural Network and Statistical Analysis. Materials 2022, 15, 9029. [Google Scholar] [CrossRef] [PubMed]
Tinoco, J.; Correia, A.G.; Cortez, P. Support vector machines applied to uniaxial compressive strength prediction of jet grouting columns. Comput. Geotech. 2014, 55, 132–140. [Google Scholar] [CrossRef]
Barzegar, R.; Sattarpour, M.; Reza, M. Comparative evaluation of artificial intelligence models for prediction of uniaxial compressive strength of travertine rocks, Case study: Azarshahr area, NW Iran. Model. Earth Syst. Environ. 2016, 2, 76. [Google Scholar] [CrossRef]
Mozumder, R.A.; Laskar, A.I.; Hussain, M. Empirical approach for strength prediction of geopolymer stabilized clayey soil using support vector machines. Constr. Build. Mater. 2017, 132, 412–424. [Google Scholar] [CrossRef]
Tabarsa, A.; Latif, N.; Osouli, A.; Bagheri, Y. Unconfined compressive strength prediction of soils stabilized using artificial neural networks and support vector machines. Front. Struct. Civ. Eng. 2021, 15, 520–536. [Google Scholar] [CrossRef]
Khatti, J.; Grover, K.S. Prediction of UCS of fine-grained soil based on machine learning part 2: Comparison between hybrid relevance vector machine and Gaussian process regression. Multiscale Multidiscip. Model. Exp. Des. 2024, 7, 123–163. [Google Scholar] [CrossRef]
Suthar, M. Applying several machine learning approaches for prediction of unconfined compressive strength of stabilized pond ashes. Neural Comput. Appl. 2020, 32, 9019–9028. [Google Scholar] [CrossRef]
Suman, S.; Mahamaya, M.; Das, S.K. Prediction of Maximum Dry Density and Unconfined Compressive Strength of Cement Stabilised Soil Using Artificial Intelligence Techniques. Int. J. Geosynth. Ground Eng. 2016, 2, 11. [Google Scholar] [CrossRef]
Anysz, H.; Narloch, P. Designing the composition of cement stabilized rammed earth using artificial neural networks. Materials 2019, 12, 1396. [Google Scholar] [CrossRef]
Ceryan, N. Application of support vector machines and relevance vector machines in predicting uniaxial compressive strength of volcanic rocks. J. Afr. Earth Sci. 2014, 100, 634–644. [Google Scholar] [CrossRef]
Kumar, A.; Sinha, S.; Saurav, S.; Chauhan, V.B. Prediction of unconfined compressive strength of cement–fly ash stabilized soil using support vector machines. Asian J. Civ. Eng. 2024, 25, 1149–1161. [Google Scholar] [CrossRef]
Ngo, H.T.T.; Pham, T.A.; Vu, H.L.T.; Van Giap, L. Application of artificial intelligence to determined unconfined compressive strength of cement-stabilized soil in Vietnam. Appl. Sci. 2021, 11, 1949. [Google Scholar] [CrossRef]
Ngo, T.Q.; Nguyen, L.Q.; Tran, V.Q. Novel hybrid machine learning models including support vector machine with meta-heuristic algorithms in predicting unconfined compressive strength of organic soils stabilised with cement and lime. Int. J. Pavement Eng. 2022, 24, 2136374. [Google Scholar] [CrossRef]
Vasić, M.V.; Pezo, L.L.; Radojević, Z. Optimization of adobe clay bricks based on the raw material properties (mathematical analysis). Constr. Build. Mater. 2020, 244, 118342. [Google Scholar] [CrossRef]
Pham, V.; Do, H.; Oh, E.; Ong, D.E.L. Prediction of Unconfined Compressive Strength of Cement-Stabilized Sandy Soil in Vietnam using Artificial Neural Networks (ANNs) Model. Int. J. Geotech. Eng. 2021, 15, 1177–1187. [Google Scholar] [CrossRef]
Shahani, N.M.; Kamran, M.; Zheng, X.; Liu, C.; Guo, X. Application of gradient boosting machine learning algorithms to predict uniaxial compressive strength of soft sedimentary rocks at Thar coalfield. Adv. Civ. Eng. 2021, 2021, 2565488. [Google Scholar] [CrossRef]
Al-Bared, M.A.M.; Mustaffa, Z.; Armaghani, D.J.; Marto, A.; Yunus, N.Z.M.; Hasanipanah, M. Application of hybrid intelligent systems in predicting the unconfined compressive strength of clay material mixed with recycled additive. Transp. Geotech. 2021, 30, 100627. [Google Scholar] [CrossRef]
Kardani, N.; Zhou, A.; Shen, S.L.; Nazem, M. Estimating unconfined compressive strength of unsaturated cemented soils using alternative evolutionary approaches. Transp. Geotech. 2021, 29, 100591. [Google Scholar] [CrossRef]
Khan, N.M.; Cao, K.; Yuan, Q.; Hashim, M.H.B.M.; Rehman, H.; Hussain, S.; Emad, M.Z.; Ullah, B.; Shah, K.S.; Khan, S. Application of Machine Learning and Multivariate Statistics to Predict Uniaxial Compressive Strength and Static Young’s Modulus Using Physical Properties under Different Thermal Conditions. Sustainability 2022, 14, 9901. [Google Scholar] [CrossRef]
Taffese, W.Z.; Abegaz, K.A. Prediction of Compaction and Strength Properties of Amended Soil Using Machine Learning. Buildings 2022, 12, 613. [Google Scholar] [CrossRef]
Hoque, M.I.; Hasan, M.; Islam, M.S.; Houda, M.; Abdallah, M.; Sobuz, M.H.R. Machine Learning Methods to Predict and Analyse Unconfined Compressive Strength of Stabilised Soft Soil with Polypropylene Columns. Cogent Eng. 2023, 10, 2220492. [Google Scholar] [CrossRef]
Onyelowe, K.C.; Ebid, A.M.; Hanandeh, S. Advanced machine learning prediction of the unconfined compressive strength of geopolymer cement reconstituted granular sand for road and liner construction applications. Asian J. Civ. Eng. 2023, 25, 1027–1041. [Google Scholar] [CrossRef]
Mustafa, Y.M.H.; Al-Amoudi, O.S.B.; Zami, M.S.; Al-Osta, M.A. Strength and durability assessment of stabilized Najd soil for usage as earth construction materials. Bull. Eng. Geol. Environ. 2023, 82, 55. [Google Scholar] [CrossRef]
Géron, A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2022. [Google Scholar]
Scikit-Learn, Scikit-Learn: Machine Learning in Python. 2023. Available online: https://scikit-learn.org/stable/ (accessed on 26 June 2023).
Malkanthi, S.N.; Balthazaar, N.; Perera, A.A.D.A.J. Lime stabilization for compressed stabilized earth blocks with reduced clay and silt. Case Stud. Constr. Mater. 2020, 12, e00326. [Google Scholar] [CrossRef]
Ciancio, D.; Beckett, C.T.S.; Carraro, J.A.H. Optimum lime content identification for lime-stabilised rammed earth. Constr. Build. Mater. 2014, 53, 59–65. [Google Scholar] [CrossRef]
Fu, Y.; Liao, H.; Lv, L. A Comparative Study of Various Methods for Handling Missing Data in UNSODA. Agriculture 2021, 11, 727. [Google Scholar] [CrossRef]
Di Matteo, L.; Bigotti, F.; Ricco, R. Best-Fit Models to Estimate Modified Proctor Properties of Compacted Soil. J. Geotech. Geoenviron. Eng. 2009, 135, 992–996. [Google Scholar] [CrossRef]
Widjaja, B.; Chriswandi, C. New relationship between linear shrinkage and shrinkage limit for expansive soils. In IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2020; Volume 1007. [Google Scholar] [CrossRef]
Sivrikaya, O. Models of compacted fine-grained soils used as mineral liner for solid waste. Environ. Geol. 2008, 53, 1585–1595. [Google Scholar] [CrossRef]
Polidori, E. Relationship between the atterberg limits and clay content. Soils Found. 2007, 47, 887–896. [Google Scholar] [CrossRef]
Liu, Q.; Tong, L. Engineering properties of unstabilized rammed earth with different clay contents. J. Wuhan Univ. Technol.—Mater. Sci. Ed. 2017, 32, 914–920. [Google Scholar] [CrossRef]
Khatti, J.; Grover, K.S. A Study of Relationship among Correlation Coefficient, Performance, and Overfitting using Regression Analysis. Int. J. Sci. Eng. Res. 2022, 13, 1074–1085. [Google Scholar]
Piramuthu, S. Input data for decision trees. Expert Syst. Appl. 2008, 34, 1220–1226. [Google Scholar] [CrossRef]
Shafiee, S.; Lied, L.M.; Burud, I.; Dieseth, J.A.; Alsheikh, M.; Lillemo, M. Sequential forward selection and support vector regression in comparison to LASSO regression for spring wheat yield prediction based on UAV imagery. Comput. Electron. Agric. 2021, 183, 106036. [Google Scholar] [CrossRef]
Drucker, H.; Burges, C.J.C.; Kaufman, L.; Smola, A.; Vapnik, V. Support vector regression machines. Adv. Neural Inf. Process. Syst. 1997, 1, 155–161. [Google Scholar]
Boser, B.E.; Guyon, I.M.; Vapnik, V.N. Training Algorithm for Optimal Margin Classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA, 27–29 July 1992; pp. 144–152. [Google Scholar]
Awad, M.; Khanna, R. Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers; Springer Nature: Berlin/Heidelberg, Germany, 2015. [Google Scholar]
Zhang, F.; O’Donnell, L.J. Support Vector Regression; Elsevier Inc.: Amsterdam, The Netherlands, 2019. [Google Scholar] [CrossRef]
Tbarki, K.; Said, S.B.; Ksantini, R.; Lachiri, Z. RBF kernel based SVM classification for landmine detection and discrimination. In Proceedings of the 2016 International Image Processing, Applications and Systems (IPAS), Hammamet, Tunisia, 5–7 November 2016; pp. 1–6. [Google Scholar] [CrossRef]
Nguyen, H. Support vector regression approach with different kernel functions for predicting blast-induced ground vibration: A case study in an open-pit coal mine of Vietnam. SN Appl. Sci. 2019, 1, 283. [Google Scholar] [CrossRef]
Karal, O. Performance comparison of different kernel functions in SVM for different k value in k-fold cross-validation. In Proceedings of the Innovations in Intelligent Systems and Applications Conference (ASYU), Istanbul, Turkey, 15–17 October 2020; IEEE: New York, NY, USA, 2020. [Google Scholar]
Mustafa, M.R.; Rezaur, R.B.; Rahardjo, H.; Isa, M.H. Prediction of pore-water pressure using radial basis function neural network. Eng. Geol. 2012, 135–136, 40–47. [Google Scholar] [CrossRef]
Xu, M.; Watanachaturaporn, P.; Varshney, P.K.; Arora, M.K. Decision tree regression for soft classification of remote sensing data. Remote Sens. Environ. 2005, 97, 322–336. [Google Scholar] [CrossRef]
Breiman, L. Classification and Regression Trees, 1st ed.; Routledge: London, UK, 1984. [Google Scholar] [CrossRef]
Alrebdi, T.A.; Wudil, Y.S.; Ahmad, U.F.; Yakasai, F.A.; Mohammed, J.; Kallas, F.H. Predicting the thermal conductivity of Bi2Te3-based thermoelectric energy materials: A machine learning approach. Int. J. Therm. Sci. 2022, 181, 107784. [Google Scholar] [CrossRef]
Mishra, S.; Mishra, D.; Santra, G.H. Adaptive boosting of weak regressors for forecasting of crop production considering climatic variability: An empirical assessment. J. King Saud Univ. Comput. Inf. Sci. 2020, 32, 949–964. [Google Scholar] [CrossRef]
Kent State University Libraries. SPSS Tutorials: Pearson Correlation; Kent State University: Kent, OH, USA, 2023. [Google Scholar]
Demir, S.; Sahin, E.K. Comparison of tree-based machine learning algorithms for predicting liquefaction potential using canonical correlation forest, rotation forest, and random forest based on CPT data. Soil Dyn. Earthq. Eng. 2022, 154, 107130. [Google Scholar] [CrossRef]
Zami, M.S.; Ewebajo, A.; Al-Amoudi, O.S.B.; Al-Osta, M.A.; Mustafa, Y.M.H. Compressive Strength and Wetting–Drying Cycles of Al-Hofuf “Hamrah” Soil Stabilized with Cement and Lime. Arab. J. Sci. Eng. 2022, 47, 13249–13264. [Google Scholar] [CrossRef]
Zami, M.S.; Oladapo, A.; Omar, E.; Al, S.B.; Al Osta, M.A.; Mustafa, Y.M.H. Geotechnical properties and strength of Al-Hassa White Soil suitable for stabilized earth construction. Arab. J. Geosci. 2022, 15, 698. [Google Scholar] [CrossRef]
Porter, H.; Blake, J.; Dhami, N.K.; Mukherjee, A. Rammed earth blocks with improved multifunctional performance. Cem. Concr. Compos. 2018, 92, 36–46. [Google Scholar] [CrossRef]
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]

Figure 1. Correlation heatmap for the unstabilized soil case.

Figure 2. Correlation heatmap for the stabilized soil case.

Figure 3. Hyperparameter optimization trials for both (a) unstabilized and (b) stabilized/unstabilized scenarios for SVR model. The red dotted boxes represent the optimum performance.

Figure 4. Hyperparameter optimization trials for both unstabilized and stabilized/unstabilized scenarios for DT model. The red dashed lines represent the optimum performance.

Figure 5. Research methodology.

Figure 6. SVR analysis of unstabilized soils during (a) training and (b) testing phases.

Figure 7. DT analysis of unstabilized soils during (a) training and (b) testing phases.

Figure 8. SVR analysis of stabilized/unstabilized soils during (a) training and (b) testing phases.

Figure 9. DT analysis of stabilized/unstabilized soils during (a) training and (b) testing phases.

Figure 10. Model validation of unstabilized soils for (a) SVR and (b) DT analyses.

Figure 11. Validation plot for cement-stabilized soil A using (a) SVR and (b) DT.

Figure 12. Validation plot for lime-stabilized soil A using (a) SVR and (b) DT.

Figure 13. Validation plot for cement-stabilized soil B using (a) SVR and (b) DT.

Figure 14. Validation plot for lime-stabilized soil B using (a) SVR and (b) DT.

Figure 15. Validation plot for cement-stabilized soil C using (a) SVR and (b) DT.

Figure 16. Validation plot for lime-stabilized soil C using (a) SVR and (b) DT.

Figure 17. Validation plot for cement-stabilized soil D using (a) SVR and (b) DT.

Figure 18. Validation plot for lime-stabilized soil D using (a) SVR and (b) DT.

Figure 19. Permutation importance of input variables for the unstabilized soil for (a) SVR, and (b) DT models.

Figure 20. Strength distribution between wet and dry soil samples for unstabilized soils (from Dataset A).

Figure 21. Permutation importance of input variables for the stabilized soil for (a) SVR, and (b) DT models.
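
Figures 19 and 21 report permutation importance. As a rough illustration of the general technique (not the authors' code), the sketch below shuffles one input column at a time and records the mean increase in mean squared error. The function names, the toy predictor, and the choice of MSE as the scoring metric are assumptions made for this example only.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled.

    predict: callable mapping one feature row (list) to a prediction.
    X: list of feature rows (lists of floats); y: list of targets.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving all other columns intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy model (hypothetical): the output depends on feature 0 only.
def toy_predict(row):
    return 2.0 * row[0]


X_demo = [[float(i), float(i % 3)] for i in range(20)]
y_demo = [2.0 * row[0] for row in X_demo]
scores = permutation_importance(toy_predict, X_demo, y_demo)
```

In this toy case the second feature receives an importance of zero because the predictor never reads it, while shuffling the first feature degrades the fit.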

Table 2. Selected parameters for model development.

Parameter	Description	Unit
AR	Aspect ratio (length/width ratio)	-
DW	Testing condition (oven-dried or wet)	-
ST	Stabilizer type	-
SD	Stabilizer’s dosage	%
G	Gravel	%
S	Sand	%
F	Fines (silt and clay)	%
LL	Liquid limit	%
PI	Plasticity index	%
LS	Linear shrinkage	%
OMC	Optimum moisture content	%
MDD	Maximum dry density	kg/m³

Table 3. Some relationships for estimating soil properties.

No.	Missing Data	Available Data	Suggested Correlation	Condition	Reference
1	OMC and MDD	Consistency limits (LL, PI)	$\mathrm{OMC} = -0.86\,\mathrm{LL} + 3.04\left(\frac{\mathrm{LL}}{G_{s}}\right) + 2.2$	10% < FC < 100%; MDD > 2038.74 kg/m³; OMC < 10%	[78]
1	OMC and MDD	Consistency limits (LL, PI)	$\mathrm{MDD} = \left(40.316 \times \mathrm{OMC}^{-0.295}\right) \times \mathrm{PI}^{0.032} - 2.4$	10% < FC < 100%; MDD > 2038.74 kg/m³; OMC < 10%	[78]
2	LS	PI	$\mathrm{LS} = \frac{\mathrm{PI}}{2.13}$		[79]
3	OMC and MDD	LL, PL	$\mathrm{OMC} = 0.94\,\mathrm{PL}$	41% < FC < 99%	[80]
3	OMC and MDD	LL, PL	$\mathrm{OMC} = 0.52\,\mathrm{LL}$	41% < FC < 99%	[80]
3	OMC and MDD	LL, PL	$\mathrm{MDD} = 0.22\,(96.32 - \mathrm{PL})$	41% < FC < 99%	[80]
3	OMC and MDD	LL, PL	$\mathrm{MDD} = 0.09\,(225.78 - \mathrm{LL})$	41% < FC < 99%	[80]
4	LL and PI	FC	$\mathrm{LL} = 0.67\,\mathrm{FC}$	Better used for sand-kaolinite mixtures	[81]
4	LL and PI	FC	$\mathrm{PI} = 0.96\,\mathrm{LL} - (0.26\,\mathrm{FC} + 10)$	Better used for sand-kaolinite mixtures	[81]
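
Some of the Table 3 correlations can be applied directly when a record lacks a property. The following is a minimal Python sketch, not the authors' implementation: the function names are hypothetical, all quantities are taken in percent as in Table 3, and the validity conditions listed in the table are not enforced.

```python
def ls_from_pi(pi):
    """Linear shrinkage (%) from plasticity index (%): LS = PI / 2.13 [79]."""
    return pi / 2.13


def omc_from_pl(pl):
    """Optimum moisture content (%) from plastic limit (%): OMC = 0.94 PL [80]."""
    return 0.94 * pl


def omc_from_ll(ll):
    """Optimum moisture content (%) from liquid limit (%): OMC = 0.52 LL [80]."""
    return 0.52 * ll


def ll_from_fc(fc):
    """Liquid limit (%) from fines content (%): LL = 0.67 FC [81]."""
    return 0.67 * fc


def pi_from_ll_fc(ll, fc):
    """Plasticity index (%): PI = 0.96 LL - (0.26 FC + 10) [81]."""
    return 0.96 * ll - (0.26 * fc + 10.0)


# Example inputs (illustrative values only).
ls_example = ls_from_pi(4.39)            # about 2.06 %
ll_example = ll_from_fc(50.0)            # 33.5 %
pi_example = pi_from_ll_fc(ll_example, 50.0)
```

Because the relations for LL and PI are noted as better suited to sand-kaolinite mixtures, results for other soils should be treated as rough estimates.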

Table 4. Geotechnical properties of the selected soils [72].

Test	Soil A	Soil B	Soil C	Soil D
Specific gravity	2.45	2.50	2.43	2.45
LL (%)	43.74	33.49	34.05	17.52
PI (%)	17.01	17.78	15.05	2.28
Grain size distribution
G (%)	0	0	0	0
S (%)	84	84.7	73.8	74
F (%)	16	15.3	26.2	26
MDD (kg/m³)	1692.15	1730.89	1690.11	1800.20
OMC (%)	20	19.2	17.6	13.48
UCS (MPa)	1.50	2.12	1.33	0.62

Table 5. Descriptive statistics of the input variables.

Dataset		Gravel (%)	Sand (%)	Fines (%)	LL (%)	PI (%)	LS (%)	OMC (%)	MDD (kg/m³)	SD (%)	UCS (MPa)
A	Min	0	0	0	13.40	1.00	0.00	5.80	1216.00	-	0.00
	Max	62.00	100.00	100.00	100.00	58.00	27.23	121.00	2750.00	-	4.26
	Average	4.09	29.92	65.99	43.28	20.87	8.55	20.56	1705.57	-	0.98
	Std. Deviation	12.32	26.29	30.95	15.19	11.06	6.31	10.56	257.88	-	0.92
B	Min	0	0.00	0.00	3.35	1.00	0.00	4.68	641.09	0.00	0.00
	Max	62.00	100.00	100.00	100.00	63.00	29.58	121.00	2750.00	15.00	11.50
	Average	4.51	36.04	59.44	40.36	17.76	6.26	22.04	1672.53	5.16	1.69
	Std. Deviation	11.49	29.76	33.17	17.36	10.91	6.03	12.91	280.84	4.27	1.58
C	Min	0	73.80	15.30	17.52	0.29	0.09	13.48	1546.84	0.00	0.60
	Max	0	84.70	26.20	43.74	23.47	10.02	25.51	1802.80	15.00	10.32
	Average	0	79.13	20.88	28.10	6.96	3.18	19.19	1702.99	7.50	2.14
	Std. Deviation	0	5.23	5.23	6.13	6.66	3.07	2.65	63.05	5.00	1.71

Table 6. Optimized hyperparameters used for the model development.

	Unstabilized Soil		Stabilized/Unstabilized Soil
	SVR	DT	SVR	DT
ε	0.0001	-	0.1	-
C	10	-	1000	-
gamma	0.01	-	0.001	-
Max Depth	-	20	-	25
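
The hyperparameters in Table 6 map onto scikit-learn estimators roughly as sketched below. This is an illustration under stated assumptions rather than the authors' code: the RBF kernel is assumed (the table lists gamma but not the kernel), `random_state` and the dictionary and function names are hypothetical, and the data are synthetic placeholders, not the study's datasets.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Hyperparameter values as reported in Table 6.
TABLE_6 = {
    "unstabilized": {
        "svr": {"epsilon": 0.0001, "C": 10, "gamma": 0.01},
        "dt": {"max_depth": 20},
    },
    "stabilized_unstabilized": {
        "svr": {"epsilon": 0.1, "C": 1000, "gamma": 0.001},
        "dt": {"max_depth": 25},
    },
}


def build_models(scenario, random_state=0):
    """Return (SVR, DecisionTreeRegressor) configured for one scenario."""
    params = TABLE_6[scenario]
    # kernel="rbf" is an assumption; Table 6 does not name the kernel.
    svr = SVR(kernel="rbf", **params["svr"])
    dt = DecisionTreeRegressor(random_state=random_state, **params["dt"])
    return svr, dt


# Synthetic stand-in data: 60 samples, 8 features (placeholder only).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = 0.5 * X[:, 0] + rng.normal(scale=0.1, size=60)

svr, dt = build_models("unstabilized")
svr.fit(X, y)
dt.fit(X, y)
svr_pred = svr.predict(X)
dt_pred = dt.predict(X)
```

Real soil inputs span very different ranges (for example, MDD in kg/m³ versus percentages), so feature scaling before SVR is usually advisable. Whether and how the inputs were scaled is not stated in this section.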

Table 7. Performance indicators at the training phase.

	Unstabilized Soil		Stabilized/Unstabilized Soil
	SVR	DT	SVR	DT
r²	0.9886	1.00	0.9973	1.00
R²	0.9883	1.00	0.9965	1.00
MAE	0.0258	4.3825 × 10⁻¹⁸	0.0943	6.9389 × 10⁻¹⁸
RMSE	0.0937	2.0136 × 10⁻¹⁷	0.0986	5.5096 × 10⁻¹⁷

Table 8. Performance indicators at the testing phase.

	Unstabilized Soil		Stabilized/Unstabilized Soil
	SVR	DT	SVR	DT
r²	0.9135	0.9383	0.5474	0.7530
R²	0.9108	0.9311	0.5412	0.6898
MAE	0.1534	0.1246	0.5447	0.3206
RMSE	0.3127	0.2747	0.9885	0.8127
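
The four indicators in Tables 7 and 8 can be computed from paired measured and predicted values as in the sketch below. It assumes that r² denotes the squared Pearson correlation and that R² is the usual coefficient of determination, 1 − SS_res/SS_tot. The function name and the sample values are illustrative and are not taken from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return a dict with r2_pearson, r2_determination, mae and rmse."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n

    # Squared Pearson correlation between measured and predicted values.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r2_pearson = cov ** 2 / (var_t * var_p)

    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    r2_determination = 1.0 - ss_res / var_t

    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    return {
        "r2_pearson": r2_pearson,
        "r2_determination": r2_determination,
        "mae": mae,
        "rmse": rmse,
    }


# Illustrative UCS values in MPa (made up for this example).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
metrics = regression_metrics(measured, predicted)
```

The squared Pearson correlation only measures linear association, whereas R² also penalizes systematic offset, which is one reason the two values can differ.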

Table 9. Statistical significance of both models in validation.

	Stabilizer	Model	p-Value	Significant at 95%
Unstabilized	-	SVR	0.188	No
Unstabilized	-	DT	0.313	No
Stabilized	Cement	SVR	<0.05	Yes
	Cement	DT	0.779	No
	Lime	SVR	0.013	Yes
	Lime	DT	0.295	No

Table 10. Numerical indicators for validation.

	Stabilizer	Model	MAE	RMSE
Unstabilized	-	SVR	0.287	0.345
Unstabilized	-	DT	0.282	0.303
Stabilized/Unstabilized	Cement	A-SVR	1.274	1.476
	Cement	A-DT	0.408	0.433
	Lime	A-SVR	0.560	0.593
	Lime	A-DT	0.563	0.579
	Cement	B-SVR	1.505	1.528
	Cement	B-DT	0.957	1.022
	Lime	B-SVR	0.814	1.005
	Lime	B-DT	0.599	0.729
	Cement	C-SVR	1.100	1.470
	Cement	C-DT	1.324	1.725
	Lime	C-SVR	0.562	0.678
	Lime	C-DT	0.429	0.532
	Cement	D-SVR	1.080	1.277
	Cement	D-DT	2.338	3.000
	Lime	D-SVR	0.924	1.458
	Lime	D-DT	0.283	0.372

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Mustafa, Y.M.H.; Wudil, Y.S.; Zami, M.S.; Al-Osta, M.A. Machine Learning Approach for Assessment of Compressive Strength of Soil for Use as Construction Materials. Eng 2025, 6, 84. https://doi.org/10.3390/eng6050084
