1. Introduction
Ground improvement by shallow or deep soil mixing has received much interest and acceptance in recent years, largely owing to its extensive applications in construction projects. In the UK, EU and US, where uptake of the technology has increased rapidly over the past three decades, environmental policies and laws, taxes, landfill directives and the ever-increasing cost of excavating and moving poor soils have made this method of ground stabilisation even more attractive [1,2,3,4].
Cementitious materials, such as cement and lime, have traditionally been used over the past five decades as hydraulic binders to stabilise poor soils. However, the negative environmental impacts associated with the production of these energy-intensive binders are a present global concern. Hence, based on current developments in knowledge and research, attention is gradually shifting from an over-reliance on cement and lime alone to the utilisation of waste materials, industrial and agricultural by-products, organics, etc., in soil stabilisation [5,6].
Agro-based, environmentally friendly pozzolanic materials, such as rice husk ash, palm oil fuel ash, bagasse ash, coconut shell ash, coconut husk ash, corn cob ash and almond shell ash, have gained considerable attention in soil stabilisation given the ever-growing costs of their disposal [7,8,9,10,11,12,13,14,15,16,17,18]. The major chemical constituents of these plant-based pozzolans are alumino-silicates [19,20,21]. Moreover, in order to achieve the desired effect on the mechanical properties of soils, most applications in soil stabilisation have tended towards the partial substitution of calcium-based agents by agro-based pozzolans. The soil-binder mix in this regard can speed up the development of the calcium alumino-silicate hydrated gel (CAH or CASH) as well as the sodium alumino-silicate hydrated gel (NASH) [22]. These binding gels develop inside the soil voids and aid the formation of a more compact soil-binder mix, thereby further improving the strength of the stabilised soil. A reduction of approximately 50–80% in the quantity of calcium-oxide-based additives as a result of the addition of agro-based pozzolanic materials has been reported [23]. Recent literature surveys cover the composite mixes of calcium-based and agro-based agents used in soil stabilisation [15,24,25].
Table 1 lists some of the binders used in the recent past and the target strength properties considered in the improvement.
In general, determining the strength properties of soils stabilised with a composite binder mixture is often a crucial first step towards establishing the correct design mix guideline for field application [26,27,28]. For soils stabilised with multiple binder combinations, establishing a parameter such as the unconfined compressive strength (UCS) may require laborious laboratory experimentation and time-consuming trial batching of binder type, quantities, optimal combinations, choice of curing duration, and the determination of other influencing factors.
Moreover, conventional techniques for predicting or modelling the UCS of stabilised soils rely essentially on relationships developed empirically from statistical concepts, employing mostly linear, and occasionally nonlinear, regression methods [29,30,31]. The equations generated analytically by these methods require the determination of several unknown coefficients that govern the relationships between the dependent and independent variables. Hence, the resulting models, although effective in certain situations, have inherent shortcomings due mainly to the complexities of the stabilised soil mix.
In recent times, artificial intelligence paradigms relying on several machine learning (ML) techniques have begun to gain traction as alternatives for the determination of the UCS of soils [32,33,34]. Nevertheless, the adoption of ML methods for the predictive modelling of improved ground properties has only been reported in a few studies [32,35,36,37,38,39,40,41,42]. Moreover, an application involving predictive modelling of the compressive strength of soils stabilised by eco-friendly pozzolans enriched by cementitious additives has not been reported.
In this study, the gradient boosting (GB) machine learning technique is utilised for the predictive modelling of the compressive strength of soils stabilised by cementitious additive-enriched eco-friendly pozzolans. The research considers both regression and multinomial classification of the compressive strength of the stabilised soils. Rigorous sensitivity-driven diagnostic testing is also undertaken to validate the algorithm used and the corresponding statistical outcomes. Finally, it is recommended that the concepts derived from this study be applied during the preliminary stages of soil stabilisation for civil construction and related ground improvement applications.
Table 1. Studies on soil stabilisation using agro-based and cementitious additive blends.
Pozzolanic Ash | Cementitious Additive | Target Strength | Reference |
---|---|---|---|
Rice husk ash | Lime | CBR | [43] |
Bagasse ash | Lime | UCS, CBR | [44] |
Rice husk ash | Cement, lime | UCS | [45] |
Rice husk ash | Cement, lime | UCS, shear | [23] |
Bagasse ash | Lime | CBR | [46] |
Rice husk ash | Cement | UCS, CBR | [47] |
Palm oil fuel ash | Lime | UCS | [21] |
Rice husk ash | Lime | CBR, shear | [48] |
Bagasse ash | Calcium carbide | UCS, shear | [49] |
Rice husk ash | Cement | UCS, tensile strength, flexural strength | [50] |
Rice husk ash | Cement | CBR | [51] |
Sawdust ash | Lime | CBR | [17] |
Coconut shell ash, coconut husk ash | Lime | UCS, CBR | [52] |
Palm oil fuel ash | High calcium pulverized fuel ash | UCS, CBR | [53] |
Rice husk ash | Cement | UCS, tensile strength | [54] |
Palm oil fuel ash, rice husk ash | Calcium carbide | UCS, shear | [55] |
Rice husk ash | Lime | UCS, shear, tensile strength, CBR | [56] |
Rice husk ash | Lime | UCS, CBR | [7] |
Rice husk ash | Lime | UCS, CBR | [57] |
Bagasse ash | Cement | UCS, CBR | [58] |
Rice husk ash | Lime | UCS, shear, CBR | [59] |
Palm oil fuel ash | Cement | Shear | [60] |
Rice husk ash | Cement | Shear, CBR | [11] |
Rice husk ash | Lime | UCS | [61] |
Rice husk ash | Lime | CBR | [62] |
Bagasse ash | Lime | UCS, CBR | [63] |
Rice husk ash | Lime, calcium chloride | UCS, CBR | [64] |
Corn cob ash | Calcium carbide | UCS | [65] |
Bagasse ash | Cement | UCS | [66] |
Almond shell ash | Lime | UCS | [67] |
Plant ash | Cement, calcium chloride | UCS | [14] |
Rice husk ash | Cement, lime | UCS | [35] |
2. Methodology
2.1. Database Development, Pre-Processing, and Exploratory Analysis
A dataset of 392 soils stabilised using cementitious additive-enriched agro-based pozzolans in various proportions and combinations, compacted and cured for 7, 14 and 28 days, was compiled from an intensive literature search [7,17,21,22,35,44,47,49,50,53,54,55,56,63,67]. As stated previously, most agro-based pozzolanic materials are composed mainly of alumino-silicates. The ranges of the main chemical constituents of the pozzolans and of the cementitious materials (consisting mostly of calcium-oxide compounds) utilised to stabilise the soils are given in Table 2. A very broad range of agro-based pozzolans (rice husk ash, palm oil fuel ash, bagasse ash, coconut shell ash, coconut husk ash, corn cob ash, and almond shell ash) was used to stabilise the soils, while the cementitious materials consist of cement, lime, cement kiln dust, high calcium fly ash and calcium carbide. As can be observed in Table 2, X-ray fluorescence (XRF) measurements conducted on the binding agents used in this study indicate that the maximum proportion of alumino-silicates is about 93% in the agro-based pozzolans and about 25% in the cementitious additives. Conversely, the highest amount of calcium oxide is about 14% in the pozzolanic ashes compared to 95% in the cementitious binders. Table 2 therefore suggests that the proportions of both agents used to stabilise the soil involve an important trade-off between their innate chemical compositions, the target compressive strength to be achieved and the environmental impact of their usage. It is also pertinent to bear in mind that the clay soils to be stabilised are themselves mostly siliceous, as indicated by their chemical compositions in Table 2; hence, they should be taken into account in deriving a suitable design mix. Standard preparation methods, including some involving slight modifications of traditional or standardised measurement procedures to reflect special laboratory testing conditions, were followed to achieve the aims of stabilisation. Since the UCS data are diverse in this regard, it was necessary to normalise them in order to enhance the significance of the overall modelling and the reliability of the findings. A two-step inverse-normal data transformation approach was applied to the UCS dataset, regarded in this study as the target variable [68]. As can be observed from Figure 1 and Table 3, the normally distributed data and the relatively low values of kurtosis (−0.16) and skewness (1.63 × 10⁻⁶) suggest that the dataset is reliable for ML modelling.
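The two-step idea (fractional ranks first, then the inverse normal CDF) can be illustrated with a minimal sketch. It is not the study's own script: the UCS values are hypothetical, and the `(rank - 0.5) / n` percentile convention, the population-moment formulas and the function names are assumptions made for illustration.

```python
from statistics import NormalDist, mean


def two_step_normalise(values):
    """Step 1: fractional (percentile) ranks. Step 2: inverse normal CDF."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the run
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    mu = mean(values)
    sigma = (sum((v - mu) ** 2 for v in values) / (n - 1)) ** 0.5
    dist = NormalDist(mu, sigma)
    # Assumed convention: (rank - 0.5) / n keeps percentiles inside (0, 1).
    return [dist.inv_cdf((r - 0.5) / n) for r in ranks]


def skewness(values):
    """Population skewness: third central moment over variance^1.5."""
    mu = mean(values)
    m2 = mean((v - mu) ** 2 for v in values)
    m3 = mean((v - mu) ** 3 for v in values)
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Population excess kurtosis: fourth moment over variance^2, minus 3."""
    mu = mean(values)
    m2 = mean((v - mu) ** 2 for v in values)
    m4 = mean((v - mu) ** 4 for v in values)
    return m4 / m2 ** 2 - 3.0


# Hypothetical, right-skewed UCS values in kPa (not taken from the database).
ucs = [120.0, 150.0, 180.0, 210.0, 260.0, 320.0, 410.0, 560.0, 800.0, 1500.0]
transformed = two_step_normalise(ucs)
print("skewness before:", skewness(ucs))
print("skewness after: ", skewness(transformed))
print("excess kurtosis after:", excess_kurtosis(transformed))
```

Because the transform depends only on ranks, it preserves the ordering of the samples while pulling the long right tail towards a symmetric shape.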
A total of eight independent variables are used as input features in the ML modelling, namely, the values of agro-based pozzolans, cementitious additives, soil class, liquid limit, plasticity index, plastic limit, curing duration and strength class. Exploratory analysis of these variables yields the statistical metrics and distributions shown in Table 3 and Figure 2, respectively.
Given that the independent variables will be used in their raw form for the ML modelling, it is reassuring to note how reasonably low their kurtosis and skewness scores are, as observed in Table 3. Raw data for the independent variables are used in the predictive modelling to preserve an accurate representation of their naturally occurring distributions. As Table 3 shows, the range and proportion (calculated by weight of dry soil) of the agro-based pozzolans (lowest of 0.1% and highest of 25%) and the cementitious materials (lowest of 0.5% and highest of 11.25%) demonstrate a very diverse mix of binder quantities used to stabilise the soils. Figure 2a,b shows a fairly uniform frequency distribution for the binders used, which is not the case for the soil plasticity properties. The distribution of the soil class is skewed towards soils of lower plasticity. However, the soil classes are almost balanced if both high plasticity classes (CH and MH) are considered together and compared with the lower plasticity class (CL). The frequency distribution of curing duration (Figure 2g) seems fair, whereas that of the UCS strength classes (Figure 2h) is imbalanced and skewed towards the hardened stabilised soils. Thus, the frequency distribution of the compressive strength class (or consistency) indicates that most of the stabilised soils in the dataset had a UCS greater than approximately 400 kPa, as Table 4 shows. The UCS class shall serve as the target (or dependent) variable in the multinomial classification ML prediction, while, as already stated, the actual numerical values of the UCS shall be used as the dependent variable in the ML regression modelling considered subsequently in this research.
2.2. Gradient Boosting Machine (GBM)
Boosting is an ensemble machine learning technique that aggregates base learners to obtain better predictions, mostly in classification and regression problems. The gradient boosting (GB) machine works by optimising a differentiable loss function (for example, the ‘squared error’ for regression and the ‘logarithmic’ loss for classification) through additive modelling, in which a weighted sum of several suitable base learners is taken in order to minimise the loss function. In its simplest form, the GBM as an additive model can be represented mathematically as
$$F_m(X) = F_{m-1}(X) + \eta f_m(X)$$
where $F$ = ensemble model, $f$ = base (weak) learner, $\eta$ = rate of learning or shrinkage and $X$ is the input vector.
The term $f_m(X)$ added at each iteration is obtained by minimising a loss function and can therefore be considered as a directional vector ($r_{m-1}$), computed from $F_{m-1}$, that points in the direction of steepest descent. Hence, the GB machine can be expressed alternatively as
$$F_m(X) = F_{m-1}(X) + \eta\, r_{m-1}$$
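If the function that is being approximated is given as
$$\hat{y} = F(x)$$
and there are $n$ samples in a dataset ($D$), with each sample having a set of $m$ features in a vector $x$ and a real target or dependent value $y$, all expressed as
$$D = \{(x_i, y_i)\}, \quad |D| = n, \; x_i \in \mathbb{R}^m, \; y_i \in \mathbb{R}$$
then an ensemble of trees considering additive modelling can be given as
$$\hat{y}_i = F_M(x_i) = F_0(x_i) + \eta \sum_{m=1}^{M} f_m(x_i), \quad f_m \in \mathcal{F}$$
where $M$ = number of base learners, $F_0$ = initial model and $\mathcal{F}$ = regression tree space. In the summation, $m$ indexes the boosting iteration rather than the number of features.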
Additionally, if the differentiable loss function is given as
$$L(y, F(x))$$
then the first step, initialising the model with a constant value by minimising the loss function, becomes
$$F_0(x) = b_0 = \arg\min_{b} \sum_{i=1}^{n} L(y_i, b)$$
where $b_0$ = minimiser of the loss function at the 0th iteration.
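For $m = 1$ to $M$, the following is computed for all the $n$ samples, $i = 1, \ldots, n$:
$$r_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F = F_{m-1}}$$
Next, a regression (or classification) tree is fitted to $r_{im}$, with the terminal regions of each tree denoted by $R_{jm}$ for $j = 1, \ldots, J_m$, where $J_m$ is the number of leaves in the tree created in the $m$th iteration.
For $j = 1, \ldots, J_m$, the leaf coefficients are then computed and the model is updated as
$$F_m(x) = F_{m-1}(x) + \eta\, \rho_m \sum_{j=1}^{J_m} b_{jm}\, \mathbb{1}(x \in R_{jm})$$
where $b_{jm}$ = least-squares coefficient of the basis function (the mean of $r_{im}$ over the samples falling in $R_{jm}$) and $\rho_m$ = leaf weight or scaling factor.
Writing $\gamma_{jm} = \rho_m b_{jm}$ for the combined leaf value, the equation then simplifies to
$$F_m(x) = F_{m-1}(x) + \eta \sum_{j=1}^{J_m} \gamma_{jm}\, \mathbb{1}(x \in R_{jm})$$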
It is important to note that the hyperparameters required (among other factors) have to be carefully selected in order to obtain optimised or desirable results from the ML prediction. The following section describes the methods adopted to optimise the GB machine to ensure higher performance on multinomial classification and regression.
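The boosting procedure of this section can be illustrated with a short from-scratch sketch, which is not the implementation used in the study. It assumes the squared-error loss, for which the negative gradient is the ordinary residual and the scaling factor equals one, and it uses two-leaf trees (stumps) as base learners. The feature rows, the synthetic target and all function names are hypothetical.

```python
from statistics import mean


def fit_stump(X, r):
    """Fit a two-leaf regression tree to residuals r by least squares."""
    best = None
    for feat in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so that both leaves always contain at least one sample.
        for thr in sorted({row[feat] for row in X})[:-1]:
            left = [ri for row, ri in zip(X, r) if row[feat] <= thr]
            right = [ri for row, ri in zip(X, r) if row[feat] > thr]
            b_left, b_right = mean(left), mean(right)  # leaf means of r
            sse = (sum((ri - b_left) ** 2 for ri in left)
                   + sum((ri - b_right) ** 2 for ri in right))
            if best is None or sse < best[0]:
                best = (sse, feat, thr, b_left, b_right)
    _, feat, thr, b_left, b_right = best
    return lambda row: b_left if row[feat] <= thr else b_right


def gradient_boost(X, y, n_trees=50, eta=0.1):
    """Additive model: constant start plus eta times each fitted stump."""
    b0 = mean(y)  # constant that minimises the squared-error loss
    trees = []
    F = [b0] * len(y)
    for _ in range(n_trees):
        # Negative gradient of 0.5 * (y - F)^2 with respect to F.
        r = [yi - Fi for yi, Fi in zip(y, F)]
        tree = fit_stump(X, r)
        trees.append(tree)
        F = [Fi + eta * tree(row) for Fi, row in zip(F, X)]
    return lambda row: b0 + eta * sum(t(row) for t in trees)


def mse(predict, X, y):
    """Mean squared error of a prediction function on (X, y)."""
    return mean((yi - predict(row)) ** 2 for row, yi in zip(X, y))


# Hypothetical rows: [pozzolan %, cementitious additive %, curing days].
X = [[p, c, d] for p in (0, 5, 10, 15) for c in (2, 4, 8) for d in (7, 14, 28)]
# Synthetic target in kPa, invented purely for illustration.
y = [200.0 + 40.0 * c + 5.0 * d - 3.0 * p for p, c, d in X]

baseline = gradient_boost(X, y, n_trees=0)  # constant model only
model = gradient_boost(X, y, n_trees=50, eta=0.1)
print("baseline MSE:", mse(baseline, X, y))
print("boosted MSE: ", mse(model, X, y))
```

A smaller `eta` slows the reduction in training error per tree, which is the shrinkage trade-off that hyperparameter tuning has to balance against the number of trees.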
2.3. Model Optimisation
2.3.1. Hyperparameter Tuning
In order to ensure the best performance of the GB model, a series of stepwise randomised searches was implemented to select the best-performing hyperparameters using the ‘RandomizedSearchCV’ class of Python’s scikit-learn (sklearn) library. Table 5 shows the hyperparameters eventually chosen to optimise the algorithm on both the training and testing datasets.
2.3.2. Cross-Validation
The k-fold cross-validation technique was applied to enhance learning and validation on 80% of the dataset. Cross-validation also helps to prevent undue overfitting of the algorithm on the training set. After several trials, 10-fold cross-validation was regarded as the most effective for the ML modelling. The remaining 20% of the data was set aside for model testing.
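The tuning and validation workflow of Sections 2.3.1 and 2.3.2 might look roughly like the sketch below for the regression case. The data are a synthetic stand-in generated with `make_regression`, and the search space, `n_iter` and random seeds are assumptions chosen for illustration; the hyperparameters actually selected in the study are those reported in Table 5.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in with the same shape as the database (392 x 8).
X, y = make_regression(n_samples=392, n_features=8, noise=10.0, random_state=0)

# Hold out 20% for final testing; tuning uses only the remaining 80%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical search space, not the one used in the study.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "max_depth": [2, 3, 4],
    "subsample": [0.6, 0.8, 1.0],
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,        # number of sampled hyperparameter combinations
    cv=10,           # 10-fold cross-validation on the training portion
    scoring="r2",
    random_state=0,
)
search.fit(X_train, y_train)

# The refitted best model is scored once on the held-out 20%.
test_r2 = search.score(X_test, y_test)
print("best hyperparameters:", search.best_params_)
print("held-out R2:", test_r2)
```

For the multinomial classification task the same pattern would apply with `GradientBoostingClassifier` and a classification scorer in place of the regressor and `"r2"`.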
2.4. ML Performance Evaluation Metrics
To assess the performance of the ML model on the multinomial classification problem, accuracy, precision, recall and F1 score were used. Additionally, to depict the capacity of the model to predict the probability of the compressive strength of the stabilised soils belonging to different categories across decision thresholds, the receiver operating characteristic (ROC) curve and the corresponding area under the curve (AUC) were used. The ROC curve is a plot of the true positive rate (TPR), or sensitivity, against the false positive rate (FPR), or one minus specificity, over a range of threshold values, thereby separating “signal” from “noise”. The AUC measures the ability of a model to distinguish between class labels. For the regression problem, the coefficient of determination (R²) and the mean absolute percentage error (MAPE) were adopted.
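As a minimal sketch of how these metrics are computed, the functions below implement accuracy, precision, recall, F1, R² and MAPE from their standard definitions. Macro-averaging across classes is an assumption (the averaging scheme is not stated here), the class labels and numbers are hypothetical, and ROC/AUC are omitted because they require predicted probabilities.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical strength classes and UCS values (kPa), for illustration only.
cls_true = ["hard", "hard", "hard", "stiff", "stiff", "soft"]
cls_pred = ["hard", "hard", "stiff", "stiff", "stiff", "soft"]
scores = classification_metrics(cls_true, cls_pred)

ucs_true = [100.0, 200.0, 400.0]
ucs_pred = [110.0, 190.0, 400.0]
print(scores)
print(r_squared(ucs_true, ucs_pred), mape(ucs_true, ucs_pred))
```

With an imbalanced class distribution such as the one described for the strength classes, macro-averaged precision and recall are more informative than accuracy alone because each class contributes equally.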
4. Study Significance
The importance of machine learning to civil and environmental engineering design and construction cannot be overemphasised, especially in ground improvement works including, but not limited to, road subgrades, building foundations, embankments and cut slopes, bridge abutments, exclusion barriers, liquefaction mitigation, backfills and contaminated ground remediation. The concept of artificial intelligence as applied in this study can save time and cost during the planning and design stages of ground improvement. Examples of the preliminary exercises that can be circumvented are laborious laboratory experimentation and time-consuming trial batching of binder type, quantities, optimum combinations, choice of curing duration, and the determination of other influencing factors. In order to implement the model studied herein in practice, all its resources, including background scripting, would have to be deployed and persisted on an organisation’s server so that it can be trained and tested on new data on the unconfined compressive strength of stabilised soils.
However, it should be borne in mind that only a few of the plethora of factors that can directly or indirectly influence soil stabilisation were considered. Hence, it is suggested that the principles applied in this research be extended to include modelling of not just soil strength behaviours but also other important serviceability design parameters that involve settlement and soil swelling or expansion. Moreover, predictive modelling using artificial intelligence is recommended for stabilised soils subjected to different curing and environmental durability conditions in future studies. It is also suggested that various other environmentally friendly materials be utilised to stabilise the soils and predictions made using machine learning models.