Article

Cutting-Edge Machine Learning Techniques for Accurate Prediction of Agglomeration Size in Water–Alumina Nanofluids

1 Department of Chemical Engineering, Shiraz Branch, Islamic Azad University, Shiraz 7198774731, Iran
2 Institute of Physics, Center for Science and Education, Silesian University of Technology, Konarskiego 22B, 44-100 Gliwice, Poland
3 Department of Petroleum Engineering, Faculty of Engineering, Soran University, Soran 44008, Kurdistan Region, Iraq
* Authors to whom correspondence should be addressed.
Symmetry 2024, 16(7), 804; https://doi.org/10.3390/sym16070804
Submission received: 7 May 2024 / Revised: 15 June 2024 / Accepted: 20 June 2024 / Published: 27 June 2024
(This article belongs to the Special Issue Machine Learning and Data Analysis II)

Abstract
Nanoparticle agglomeration is one of the most problematic phenomena during nanofluid synthesis by a two-step procedure. Understanding and accurately estimating agglomeration size is crucial, as it significantly affects nanofluids’ properties, behavior, and successful applications. To the best of our knowledge, machine learning methods have not yet been applied in the literature to estimate alumina agglomeration size in water-based nanofluids. Accordingly, this research employs a range of machine learning models—Random Forest, Adaptive Boosting, Extra Trees, Categorical Boosting, and Multilayer Perceptron Neural Networks—to predict alumina agglomeration sizes in water-based nanofluids. To this end, a comprehensive experimental database, including 345 alumina agglomeration sizes in water-based nanofluids compiled from 29 literature sources, is utilized to train these models and monitor their generalization ability in the testing stage. The models estimate agglomeration size based on multiple factors: alumina concentration, ultrasonic time, power, frequency, temperature, surfactant type and concentration, and pH levels. The relevancy test based on the Pearson method clarifies that Al2O3 agglomeration size in water primarily depends on ultrasonic frequency, ultrasonic power, alumina concentration in water, and surfactant concentration. Comparative analyses based on numerical and graphical techniques reveal that the Categorical Boosting model surpasses the others in accurately simulating this complex phenomenon. It effectively captures the intricate relationships between key features and alumina agglomeration size, achieving an average absolute relative deviation of 6.75%, a relative absolute error of 12.83%, and a correlation coefficient of 0.9762. Furthermore, applying the leverage method to the experimental data helps identify two problematic measurements within the database.
These results validate the effectiveness of the Categorical Boosting model and contribute to the broader goal of enhancing our understanding and control of nanofluid properties, thereby aiding in improving their practical applications.

1. Introduction

Nanofluids, a relatively new class of working fluids, consist of stable, suspended solid particles ranging from 1 to 100 nm in size [1,2]. These nanoscale particles are typically derived from metals [3], metal oxides [4], and carbon-based materials [5,6]. Already utilized in various engineering [7] and everyday applications [8], nanofluids offer significant advantages in fields such as heat and mass transfer [9,10], thermal energy storage [11], solar energy harvesting [12], medical and biomedical technologies [13,14], lubrication [15], and wastewater/environmental remediation [16]. These areas have particularly benefited from the enhanced properties provided by this innovative technology.
Alumina (Al2O3)–water is not only a non-toxic and low-cost nanofluid [17], but it also benefits from relatively good stability [18] and suitable thermophysical properties [19]. These interesting characteristics of alumina–water nanofluid make it a promising candidate for application in heat exchangers [20], solar collectors [21], enhanced oil recovery [22], electronic liquid cooling systems [23], and air conditioning units [24].
While alumina–water nanofluids offer numerous potential benefits, a comprehensive understanding of their challenges is essential before widespread adoption in industrial applications. Nanoparticle agglomeration is a critical issue that can significantly compromise the stability and performance of nanofluids [25,26] and may even damage operating equipment [27]. This problem occurs when alumina nanoparticles cluster together, forming larger particles. These resultant agglomerates, being too heavy to remain suspended, can settle out of the base liquid. Consequently, large (>100 nm) and dense alumina particles reduce the concentration of nanoparticles in suspension, diminishing the effectiveness of the nanofluids [28].
Hence, many laboratory-scale tests have been suggested to monitor nanoparticle size after adding to the base fluid. Dynamic light scattering [29], transmission electron microscopy [30], atomic force microscopy [31], scanning electron microscopy [32], light extinction spectroscopy [33], and tunable resistive pulse sensing [34] are available tools to measure nanoparticle size in nanofluids. The choice of the appropriate method depends on the nanoparticle type, the size range of interest, and the availability of equipment and expertise. Reliable measurement of nanoparticle size using these methods is expensive, time-consuming, and requires careful sample preparation, appropriate instrumentation, and selection of a proper measurement technique that considers the nanofluids’ properties.
Estimating agglomeration size from the properties of nanoparticles and preparation techniques that are readily and consistently available is a valuable strategy to address the limitations of experimental measurements [35]. Such an estimation model enables researchers to discern how each specific property of nanoparticles and each synthesis technique influences nanoparticle agglomeration [35]. Furthermore, accurately predicting nanoparticle agglomeration size in nanofluids has practical implications, contributing to developing more efficient and reliable nanofluid-based systems.
The Derjaguin–Landau–Verwey–Overbeek (DLVO) theory is instrumental in predicting the stability of nanofluids and estimating agglomerate size based on the attractive and repulsive forces between nanoparticles in a fluid [36]. Additionally, the Smoluchowski equation provides another method to calculate the agglomeration rate and size distribution [37]. Monte Carlo simulations employ statistical techniques to model the behavior of nanoparticles in a fluid and predict their agglomeration patterns [38]. Furthermore, fractal analysis is utilized to forecast agglomerate structure and size distribution [39].
Machine learning (ML) methods are adept at analyzing complex phenomena using historical data to uncover the inherent relationships between dependent and independent variables [40,41,42,43,44]. Given that alumina agglomeration size is influenced by various factors—including nanoparticle properties, preparation techniques such as surfactant use [45] and sonication [46], as well as operating conditions like temperature and pH [45]—it may not be accurately modeled using traditional statistical methods. Also, the relationships between these factors and agglomeration size are complex and poorly understood. Furthermore, the literature includes no attempt to apply machine learning methods to estimate alumina agglomeration size in water-based nanofluids. So, the present study utilizes Random Forest, Adaptive Boosting (AdaBoost), Extra Trees, Categorical Boosting (CatBoost), and Multilayer Perceptron Neural Networks to predict the agglomeration size in water-based nanofluids as a function of alumina size and dose in the base fluid, ultrasonication properties (time, power, and frequency), pH, surfactant type and concentration, and temperature. These models can provide valuable insights into the behavior of nanoparticles in nanofluids and help optimize their performance for various applications. Moreover, the stability of nanofluids is directly related to the agglomeration of nanoparticles, and determining this stability in the laboratory is time-consuming and often expensive. Our work utilizes simple, readily available features to estimate agglomeration size and thereby help decide whether a water–alumina nanofluid is stable.

2. Literature Data for the Agglomeration Size in Water–Alumina Nanofluids

Table 1 provides a detailed summary of 345 experimental datasets meticulously compiled from the existing literature. This table includes essential information relevant to alumina agglomeration in water-based environments, such as alumina dosage, operational temperature, pH levels, and ultrasonication properties (duration, power, and frequency). It also details the observed agglomeration sizes (Da) in these studies. For sources that reported no Da values, the nanoparticle size was taken as the agglomeration size. Notably, the literature sources frequently used surfactants such as sodium dodecylbenzene sulfonate (SDBS), sodium dodecyl sulfate (SDS), polyvinyl alcohol (PVA), and polyvinylpyrrolidone (PVP). The interaction and variation of these parameters significantly influence the size of alumina agglomerates in aquatic media, providing crucial insights for this research.

3. Description of Machine Learning Models

This section describes the working process of the intelligent models involved in the present study.

3.1. CatBoost

CatBoost is a gradient boosting algorithm [74] that sequentially incorporates decision trees into a model, each designed to rectify the errors of its predecessors [75]. As the left-hand side of Figure 1 shows, the process initiates with a basic model, typically a single leaf, that predicts the average outcome for the entire dataset. Moving in a left–right direction, as shown in Figure 1, new trees are gradually integrated, with their predictions amalgamated with those from the existing trees. It must be noted that samples with higher errors are allocated more weight, depicted by expanded circles. As the model evolves, additional trees are fitted to the reweighted samples, and estimation errors are reduced. This splitting continues until the minimum number of samples required for a split is reached or the trees reach their maximum depth. Several variables, including the splitting criterion, the number of trees or estimators, the learning rate, the loss function, and the leaf regularization coefficient, are crucial in shaping the tree structure and need fine-tuning before the model is deployed. Interested readers may refer to Prokhorenkova et al.’s work for a deeper understanding of this topic [75]. CatBoost employs two distinct methods for feature splitting. Numerical features utilize a non-symmetric split method, where all potential split points are considered. Conversely, the symmetric method assigns two child nodes to each node, ensuring similar subtrees on both sides. This balanced approach not only aids in creating symmetric trees but also enhances the model’s accuracy.
A standout feature of CatBoost is its ‘ordered boosting’ technique, an innovation in gradient boosting algorithms. Unlike the traditional sequential or random training sequences, ordered boosting trains trees in a specific order based on the importance of their features. This strategic approach effectively constructs more precise models by giving precedence to the most impactful features, thus leading to a more informed and accurate predictive model.
CatBoost is a supervised learning algorithm trained on samples provided as (X, y) pairs. The objective is to develop a mapping function, f(X), capable of precisely estimating the outputs (y) from the inputs (X). To design such a mapping function, the difference between the actual and predicted outputs (L) must be minimized according to Equation (1).
L(f) = \sum_{s=1}^{N} L(y_s, f(X_s))   (1)
where N is the number of training samples.
The optimization problem can be formulated as follows:
\min L(f) = \min \sum_{s=1}^{N} L(y_s, f(X_s))   (2)
Considering M gradient boosting steps, new estimators (hM) can be added to the model.
f_{M+1}(X_s) = f_M(X_s) + h_M(X_s)   (3)
where fM+1(Xs) is the new model.
The role of gradient boosting is to find the new estimator by steepest descent, i.e., h_M = -\alpha_M g_M, where \alpha_M is the learning rate, and g_M is the gradient of the loss function [74].
g_M = \left[ \partial L(y_s, f(X_s)) / \partial f(X_s) \right]_{f(X_s) = f_{M-1}(X_s)}   (4)
Now, the new tree/estimator is found as follows:
f_{M+1}(X) = f_M(X) + \arg\min_{h_M} \sum_{s=1}^{N} L(y_s, f_M(X_s) + h_M(X_s))   (5)
f_{M+1}(X) = f_M(X) - \alpha_M g_M   (6)
These steps are repeated M times as there are M gradient boosting steps.
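Under a squared loss, the update sequence of Equations (1)–(6) can be sketched in a few lines of numpy. This is a deliberately simplified, pointwise version: where CatBoost fits a decision tree h_M to the gradient at each step, the sketch below applies the steepest-descent correction directly, so that the update rule of Equation (6) can be seen in isolation.

```python
import numpy as np

def squared_loss(y, f):
    # L(f) = sum_s L(y_s, f(X_s)) with L = 0.5 * (y - f)^2   (Eq. 1)
    return 0.5 * np.sum((y - f) ** 2)

def boost(y, n_steps=100, lr=0.1):
    """Pointwise steepest-descent sketch of Eqs. (3)-(6).

    In CatBoost the correction h_M is a decision tree fitted to the
    gradient; here h_M = -lr * g_M is applied directly so that the
    update f_{M+1} = f_M - alpha_M * g_M (Eq. 6) is visible on its own.
    """
    f = np.full_like(y, y.mean())   # initial model: a single leaf
    for _ in range(n_steps):
        g = f - y                   # gradient of 0.5*(y-f)^2 w.r.t. f (Eq. 4)
        f = f - lr * g              # Eq. (6)
    return f

y = np.array([3.0, -1.0, 2.0, 7.0])
f_hat = boost(y)
print(np.round(f_hat, 3))           # converges toward y itself
```

Each step shrinks the residual by the factor (1 − lr), so the loss of Equation (1) decreases monotonically toward zero.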

3.2. AdaBoost

Adaptive Boosting, commonly known as AdaBoost, is a sophisticated ensemble boosting learning algorithm applicable to both classification and regression tasks [76]. AdaBoost’s core principle involves amalgamating the outputs of numerous weak learners, such as decision stumps, into a weighted sum that forms a powerful, more effective learner. The algorithm is termed ‘adaptive’ because each new weak learner is specifically tailored to address the mistakes of its predecessors, thereby progressively refining the model’s accuracy in areas where it is most deficient. This characteristic is depicted in Figure 2, which illustrates the AdaBoost learning process. AdaBoost often experiences lower overfitting than other learning methods on specific problems [77].
The AdaBoost algorithm unfolds through the following sequential steps:
  • Initial stage: All data points are assigned equal weight.
  • Iterative training: Weak learners are trained in sequence, with an increased emphasis on data points that previously yielded high prediction errors.
  • Performance evaluation: At the end of each cycle, the weak learner’s effectiveness is assessed, and higher weights are assigned to more accurate learners.
  • Adjusting weights: The algorithm amplifies the weights of incorrectly predicted data points, making them pivotal in subsequent iterations.
  • Final output: The ultimate prediction target is determined by the weighted average of the predictions made by all the weak learners.
A typical loss function employed in AdaBoost training is the mean squared error (MSE) [78], i.e., Equation (7).
L(f) = (1/N) \sum_{s=1}^{N} (y_s - f(X_s))^2   (7)
The weights of the samples (W) are updated based on the value of the loss function. Equation (8) introduces the weights update process in the presence of the observed error of Δ.
W_{t+1} = W_t \times e^{\pm\Delta}   (8)
where W_{t+1} represents the new weights, and W_t is the old weights. The sign of Δ is negative for correctly predicted samples (reducing their weight) and positive for samples with high prediction errors (amplifying their weight, consistent with the adjusting-weights step above).
The process concludes once a predetermined number of learners (estimators) have been trained or when optimal performance is attained. Therefore, key control parameters in AdaBoost include the number of estimators, the loss function, and the learning rate. The learning rate, often multiplied by a factor α , regulates the extent of weight adjustments in each iteration.
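The bulleted steps and Equation (8) can be sketched as follows; the renormalization of the weights to sum to one is a common convention assumed here, not stated above.

```python
import numpy as np

def update_weights(w, delta, mispredicted):
    """Eq. (8) sketch: W_{t+1} = W_t * exp(+delta) for mispredicted
    samples (their weight grows) and W_t * exp(-delta) for correctly
    predicted ones (their weight shrinks), then renormalized."""
    sign = np.where(mispredicted, 1.0, -1.0)
    w_new = w * np.exp(sign * delta)
    return w_new / w_new.sum()

w = np.full(4, 0.25)   # initial stage: all data points weighted equally
w = update_weights(w, delta=0.5,
                   mispredicted=np.array([True, False, False, False]))
print(np.round(w, 3))  # the mispredicted sample now dominates
```

The next weak learner is then trained on the reweighted data, which is what makes the misclassified points "pivotal in subsequent iterations".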

3.3. Random Forest (RF)

Random Forest, a well-known ensemble learning algorithm, utilizes the bagging method, where the outputs of multiple decision trees are aggregated to produce the final prediction. Each decision tree—acting as a learner—is trained on a unique random subset of the initial dataset [79]. The structure of a decision tree includes root, branch, and leaf nodes. Branch nodes, or internal nodes, categorize features based on specific criteria like squared error, absolute error, Friedman’s mean squared error, or Poisson error, thus establishing the decision rules. However, final decisions are made at the leaf nodes, not at these internal nodes [80,81]. Unlike boosting algorithms, which focus on correcting errors from previous iterations, Random Forest trains trees independently on different subsets of the data. This process continues until a predefined number of trees, known as estimators, is achieved. During training, the dataset is divided into ‘bag’ and ‘out-of-bag’ (OOB) segments, with the ‘bag’ part used to build the trees and the OOB part serving as a validation set for each tree, as depicted in Figure 3.
In Random Forest regression tasks, the final output is determined by averaging the predictions from all individual trees (\hat{y}_1, \ldots, \hat{y}_T).
\hat{y} = (1/T) \sum_{t=1}^{T} \hat{y}_t   (9)
where T is the number of estimators, and \hat{y}_t is the output of estimator t.
This averaging process contributes to Random Forest’s enhanced stability over traditional decision trees [82]. Training proceeds until the maximum number of estimators is trained, an optimal performance is attained, or the minimum sample count required for a split is met.
Therefore, key control parameters in Random Forests include the number of estimators, the criterion for determining split quality, the maximum depth allowed for trees, the maximal number of features considered for tree construction, and the minimum number of samples essential for a split.
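Equation (9) and the bagging scheme can be illustrated with a minimal numpy sketch; the per-tree learner below is a 1-nearest-neighbour stand-in (an assumption to keep the example self-contained), whereas Random Forest grows full decision trees on each bootstrap sample.

```python
import numpy as np

rng = np.random.default_rng(0)

def bagged_predictions(X_train, y_train, x_query, n_trees=50):
    """Minimal bagging sketch of Eq. (9). Each 'tree' is replaced by a
    1-nearest-neighbour rule fitted on a bootstrap ('bag') sample; the
    final output averages the per-learner predictions."""
    preds = []
    n = len(y_train)
    for _ in range(n_trees):
        idx = rng.integers(0, n, n)                # bootstrap sample
        Xb, yb = X_train[idx], y_train[idx]
        nearest = np.argmin(np.abs(Xb - x_query))  # stand-in weak learner
        preds.append(yb[nearest])
    return np.mean(preds)                          # y_hat = (1/T) sum_t y_hat_t

X_train = np.linspace(0.0, 1.0, 20)
y_train = 2.0 * X_train                            # noiseless y = 2x toy data
print(round(bagged_predictions(X_train, y_train, x_query=0.5), 2))
```

The samples left out of each bootstrap draw form the out-of-bag (OOB) set for that learner, which is what Figure 3 uses as a per-tree validation set.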

3.4. Extra Trees (ET)

Extra Trees, an abbreviation of Extremely Randomized Trees, serves as an ensemble learning algorithm applicable for both regression and classification tasks [83]. Originating from the Random Forest algorithm, Extra Trees distinguishes itself through its unique tree construction methodology. This method selects a random subset of features for each split within a tree, and the split threshold is also chosen randomly. This introduces a higher level of randomness than traditional Random Forests, where the optimal feature from a subset is selected for each split.
Despite these differences, Extra Trees shares several characteristics with Random Forests. Both algorithms employ bagging, training numerous trees on various random subsets of the data. Furthermore, they utilize averaging and voting mechanisms for regression and classification outputs, respectively. A significant advantage of Extra Trees is its reduced variance and enhanced resilience against overfitting compared to singular decision trees. However, a notable drawback is the complexity of interpreting the model due to the less straightforward nature of feature importance.
In terms of hyperparameters and the tree-building process, Extra Trees and Random Forests are similar. Key parameters affecting model performance include the number of trees (estimators), the criterion for split quality, the maximum depth of trees, the maximum number of features considered, and the minimum number of samples required for a node split. Fine-tuning these hyperparameters is essential to optimize the performance of both algorithms.
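The extra-randomized split rule that distinguishes this method can be sketched directly: a random feature and a uniformly random threshold, instead of a search for the best cut. The data shapes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def extra_trees_split(X):
    """The split rule described above: pick a random feature, then pick
    the threshold uniformly at random between that feature's min and max
    (Random Forest would instead search for the optimal cut point)."""
    feature = rng.integers(X.shape[1])
    lo, hi = X[:, feature].min(), X[:, feature].max()
    threshold = rng.uniform(lo, hi)
    return feature, threshold

X = rng.normal(size=(30, 5))   # 30 samples, 5 features (illustrative)
f, t = extra_trees_split(X)
print(f, round(float(t), 3))
```

Because no split-quality search is performed, tree construction is cheaper, and the extra randomness is what drives the variance reduction noted above.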

3.5. Multilayer Perceptron (MLP) Neural Networks

The Multilayer Perceptron is probably the best-known type of artificial neural network. It consists of multiple layers of nodes, with each node connected to the nodes of the next layer, and is mainly applied to classification and regression tasks. Each node computes its output (O) from its input signals through the mathematical operations of Equations (10) and (11).
\eta = W \cdot X + b   (10)
O = \phi(\eta)   (11)
Here, W, b, and \phi are the weights, bias, and activation function, respectively.
Multilayer Perceptron (MLP) Neural Networks typically consist of input, hidden, and output layers, although the number of layers can vary. The input layer receives the initial data, acting as the interface for the influential features, which are then connected to the nodes in the hidden layer through weighted connections. These hidden nodes execute mathematical operations on the input data and relay the processed information to the output layer. The output layer is responsible for generating the final output of the MLP Neural Network. The dimensions of the input and output layers are determined by the number of influential and target features, respectively. By contrast, the number of nodes in the hidden layer is variable and must be carefully selected to optimize performance. Key hyperparameters in an MLP Neural Network include the number of hidden nodes, the activation functions in the hidden and output layers, and the choice of training algorithm.
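Equations (10) and (11), stacked over an input, hidden, and output layer, amount to a short forward pass; the layer sizes and tanh/linear activations below are illustrative choices, not those reported in the paper.

```python
import numpy as np

def forward(X, params):
    """One-hidden-layer MLP forward pass built from Eqs. (10)-(11):
    eta = W.X + b, then O = phi(eta), with tanh in the hidden layer and
    a linear output node (a common regression setup)."""
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)   # hidden layer: phi = tanh
    return h @ W2 + b2         # output layer: linear phi

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 8))                      # 5 samples, 8 input features
params = (rng.normal(size=(8, 4)), np.zeros(4),  # 4 hidden nodes
          rng.normal(size=(4, 1)), np.zeros(1))  # 1 output (agglomeration size)
print(forward(X, params).shape)  # (5, 1)
```

Training adjusts W and b for every layer so that the outputs match the targets; the number of hidden nodes and the activation functions are the hyperparameters named above.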

4. Results and Discussion

A step-by-step description of machine learning model development, selection, and reliability evaluation is given in this section.

4.1. Training and Testing Series

Machine learning (ML) models are typically developed through a two-step process—k-fold cross-validation followed by testing. The initial step combines training and validation, where the ML models’ parameters are adjusted, and hyperparameters are determined using an internal dataset (containing known values for both independent and dependent variables). Parameter adjustment involves minimizing the discrepancy between predicted and actual values of the target variable using an optimization technique. Hyperparameters are often determined through a trial-and-error process or via a search method. Once trained, the model can predict target values from unseen datasets. During the testing stage, the model is given the numerical values of independent variables from these new datasets to evaluate its generalizability.
This study used 283 out of the 345 datasets for five-fold cross-validation, while the remaining 62 unseen datasets were allocated to the testing group.
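The 283/62 split and the five-fold arrangement can be sketched with numpy index bookkeeping (the random seed and fold sizes below are illustrative; sklearn's KFold would serve equally well).

```python
import numpy as np

rng = np.random.default_rng(42)

N = 345
idx = rng.permutation(N)
test_idx, cv_idx = idx[:62], idx[62:]   # 62 unseen / 283 internal samples

# five-fold cross-validation over the 283 internal samples
folds = np.array_split(cv_idx, 5)
for k in range(5):
    valid = folds[k]
    train = np.concatenate([folds[j] for j in range(5) if j != k])
    # fit the model on `train` and score it on `valid` here
    assert len(train) + len(valid) == 283
```

Each of the 283 internal samples serves as a validation point exactly once, while the 62 test samples are never seen until the final generalizability check.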

4.2. Statistical Accuracy Monitoring

Measuring the deviation between actual and predicted values of a target variable is necessary to adjust the ML model’s parameters, determine proper hyperparameters, and assess generalizability. In this study, we measure the deviation between actual and predicted values of the alumina agglomeration size by regression coefficient (R), relative absolute error (RAE%), absolute average relative deviation (AARD%), and root mean squares error (RMSE). R, RAE%, AARD%, and RMSE can be obtained from Equations (12)–(15), respectively [84].
R = 1 - \sum_{s=1}^{N} (D_a^{act} - D_a^{pred})_s^2 / \sum_{s=1}^{N} (D_a^{act} - D_a^{ave})_s^2, \quad where \ D_a^{ave} = (1/N) \sum_{s=1}^{N} D_a^{act}   (12)
RAE\% = 100 \times \sum_{s=1}^{N} |D_a^{act} - D_a^{pred}|_s / \sum_{s=1}^{N} |D_a^{act} - D_a^{ave}|_s   (13)
AARD\% = (100/N) \times \sum_{s=1}^{N} |D_a^{act} - D_a^{pred}|_s / D_a^{act}   (14)
RMSE = \sqrt{(1/N) \times \sum_{s=1}^{N} (D_a^{act} - D_a^{pred})_s^2}   (15)
In these equations, the actual and predicted values of the alumina agglomeration size are denoted by D_a^{act} and D_a^{pred}, respectively.
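Equations (12)–(15) translate directly into a small numpy routine (the R expression follows Equation (12) as written):

```python
import numpy as np

def metrics(da_act, da_pred):
    """R, RAE%, AARD%, and RMSE of Eqs. (12)-(15) for the
    agglomeration-size predictions."""
    n = len(da_act)
    da_ave = da_act.mean()
    ss_res = np.sum((da_act - da_pred) ** 2)
    ss_tot = np.sum((da_act - da_ave) ** 2)
    r = 1 - ss_res / ss_tot                                   # Eq. (12)
    rae = 100 * (np.sum(np.abs(da_act - da_pred))
                 / np.sum(np.abs(da_act - da_ave)))           # Eq. (13)
    aard = (100 / n) * np.sum(np.abs(da_act - da_pred)
                              / da_act)                       # Eq. (14)
    rmse = np.sqrt(ss_res / n)                                # Eq. (15)
    return r, rae, aard, rmse

act = np.array([100.0, 200.0, 300.0])   # toy agglomeration sizes (nm)
pred = np.array([110.0, 210.0, 310.0])
print(metrics(act, pred))
```

A perfect model gives R = 1 and zero RAE%, AARD%, and RMSE; all four deviations worsen together as predictions drift from the measurements.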

4.3. Models’ Design

This study utilizes grid search to determine the proper hyperparameters of the ML models. Hence, the grid search must be provided with feasible ranges of hyperparameters for each ML tool. Table 2 introduces the hyperparameters of the ML models involved in the present study, along with the ranges investigated by the grid search. The grid search technique evaluates every possible combination of each ML model’s hyperparameters through the cross-validation and testing stages. Finally, the best hyperparameters can be selected based on the ML performances (low AARD%, RAE%, and RMSE and high R) in the cross-validation and testing stages.
The last column of this table presents the most suitable hyperparameters identified by this search technique.
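The exhaustive search described above can be sketched generically; the hyperparameter names and the toy scoring function below are placeholders for a model's cross-validation error.

```python
import itertools

def grid_search(grid, score):
    """Exhaustive grid search: every combination of hyperparameter
    values is scored, and the best-scoring one is kept (here lower
    score = better, e.g. AARD%)."""
    best, best_score = None, float('inf')
    for combo in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        s = score(params)
        if s < best_score:
            best, best_score = params, s
    return best, best_score

# toy objective standing in for a model's cross-validation error
grid = {'learning_rate': [0.01, 0.1, 0.3], 'depth': [4, 6, 8]}
best, s = grid_search(grid,
                      lambda p: (p['learning_rate'] - 0.1) ** 2
                                + (p['depth'] - 6) ** 2)
print(best)  # {'learning_rate': 0.1, 'depth': 6}
```

The cost grows multiplicatively with the number of values per hyperparameter, which is why Table 2 restricts each search range to a feasible set.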

4.4. The Best Model Selection

The previous analysis determined the best combination of hyperparameters for each ML model. The next step is to measure and report the deviation between the actual and predicted values of the alumina agglomeration size. The observed deviations between the actual agglomeration sizes and the AdaBoost, CatBoost, ET, RF, and MLP predictions were computed in the cross-validation and testing stages and are reported in Table 3. This table also introduces the observed uncertainty for predicting the whole database, i.e., cross-validation + testing.
It can easily be concluded that the AdaBoost, ET, and RF models are not accurate enough even to predict the cross-validation datasets. In contrast, the MLP Neural Network and CatBoost provide reliable predictions for the alumina agglomeration size under laboratory-scale conditions. The acceptable performance of these two ML models was confirmed by the observed deviations between their results and the actual data in terms of AARD%, RAE%, RMSE, and R.
Figure 4 compares the cumulative frequency versus absolute relative deviation (ARD%, Equation (16)) for the designed ML models.
ARD\% = 100 \times |D_a^{act} - D_a^{pred}|_s / D_a^{act}   (16)
This figure shows what percentage of the experimental data is estimated within a given ARD%. Indeed, it helps visually determine which model offers the most accurate predictions for the alumina agglomeration size. This visual analysis proves that CatBoost best estimates the alumina agglomeration size in aquatic media, estimating only a small fraction of the experimental database with an ARD higher than 2%. The MLP Neural Network is the second-best ML model for simulating the considered problem. On the other hand, AdaBoost is the worst model for this task (it predicts ~56% of the actual agglomeration sizes with ARD < 10%, while the remaining 44% of the datasets are estimated with ARD > 10%).
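The quantity plotted in Figure 4 — the percentage of samples predicted within a given ARD% (Equation (16)) — can be computed as follows; the toy actual/predicted values are illustrative.

```python
import numpy as np

def cumulative_frequency(da_act, da_pred, thresholds):
    """For each ARD% threshold, the percentage of samples whose ARD%
    (Eq. 16) falls at or below it -- the curve plotted in Figure 4."""
    ard = 100 * np.abs(da_act - da_pred) / da_act
    return [float(100 * np.mean(ard <= t)) for t in thresholds]

act = np.array([100.0, 100.0, 100.0, 100.0])
pred = np.array([101.0, 105.0, 110.0, 120.0])   # ARD% = 1, 5, 10, 20
print(cumulative_frequency(act, pred, [1, 5, 10, 20]))  # [25.0, 50.0, 75.0, 100.0]
```

A model whose curve rises toward 100% at small ARD% values, as CatBoost's does, predicts most of the database with small relative errors.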
The violin plot, which displays the distribution of a data series along with their key statistical characteristics (such as the average and median), is another visual technique to compare the performance of ML models toward predicting alumina agglomeration size in water-based nanofluids. Figure 5 introduces the violin plots of the actual alumina agglomeration size and predictions by the five designed ML models. Considering shape, spread, outliers, and consistency, this investigation demonstrates that the CatBoost model aligns closely with the actual data. On the other hand, the AdaBoost and actual violin plots have the highest difference in their shapes.

4.5. Inspection of the Performance of the Best Model

This part of the study employs various visual inspection methods—such as cross-plots, histograms of residual errors, Bland–Altman plots, and Kernel Density Estimation (KDE) against magnitude profiles—to evaluate the congruence between actual alumina agglomeration sizes and their predictions made by the CatBoost model.
Figure 6 presents a scatter plot of the predicted alumina agglomeration sizes versus their laboratory-measured counterparts. The close alignment of both cross-validation and testing data points with the diagonal line in this figure highlights the CatBoost model’s exceptional ability to accurately predict Al2O3 agglomeration sizes in real-world scenarios. Notably, the regression coefficients for the cross-validation and testing phases are 0.9745 and 0.9991, respectively. Additionally, the overall regression coefficient for the entire database is 0.9762, further demonstrating the model’s robust predictive capability. The CatBoost model also achieves excellent average absolute relative deviations (AARDs%) of 6.38 and 8.86 for the cross-validation and testing groups, respectively.
The difference between actual and predicted values of the alumina agglomeration size (Δ) can be obtained from Equation (17).
\Delta_s = (D_a^{act} - D_a^{pred})_s, \quad s = 1, 2, \ldots, N   (17)
The histogram of the observed residual errors in the cross-validation and testing stages is shown in Figure 7. The CatBoost model estimates the vast majority of cross-validation (~250) and testing (~45) samples with a residual error close to zero. This investigation also shows that the error histogram has an approximately normal distribution.
The results of analyzing the CatBoost model’s performance in simulating Al2O3 agglomeration size in water using the Bland–Altman technique are presented in Figure 8. This technique plots the residual errors against their average value (\Delta_{ave}, Equation (18)).
\Delta_{ave} = \sum_{s=1}^{N} \Delta_s / N   (18)
The Bland–Altman method bounds the feasible region with the upper and lower limits of agreement (LoA), which can be obtained from Equation (19).
LoA = \Delta_{ave} \pm 1.96 \, SD   (19)
Equation (20) can be used to calculate the standard deviation (SD).
SD = \sqrt{\sum_{s=1}^{N} (\Delta_s - \Delta_{ave})^2 / (N - 1)}   (20)
Figure 8 shows that only one of the CatBoost model’s predictions falls outside the feasible region. This analysis can be viewed as a validation of the CatBoost model.
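Equations (17)–(20) reduce to a few lines of numpy; the toy actual/predicted values below are illustrative.

```python
import numpy as np

def limits_of_agreement(da_act, da_pred):
    """Eqs. (17)-(20): residuals, their mean, the sample standard
    deviation (N-1 in the denominator), and the lower/upper limits of
    agreement that bound the Bland-Altman feasible region."""
    delta = da_act - da_pred                       # Eq. (17)
    delta_ave = delta.mean()                       # Eq. (18)
    sd = np.sqrt(np.sum((delta - delta_ave) ** 2)
                 / (len(delta) - 1))               # Eq. (20)
    return delta_ave - 1.96 * sd, delta_ave + 1.96 * sd  # Eq. (19)

act = np.array([10.0, 12.0, 9.0, 11.0, 10.0])
pred = np.array([9.0, 13.0, 9.0, 10.0, 11.0])
lo, hi = limits_of_agreement(act, pred)
print(round(lo, 2), round(hi, 2))
```

Predictions whose residuals fall outside [lo, hi] are the points flagged as disagreeing with the measurements in Figure 8.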
Figure 9 exhibits/compares the actual and predicted KDE magnitude profiles of the alumina agglomeration size in aqueous suspensions. This analysis clarifies that only two predicted results have an observable deviation from their associated actual records.

4.6. Validity Domain Inspection

As the last analysis, this section applies the leverage technique to classify the records collected from the literature for the alumina agglomeration size as valid, outlier, or out-of-leverage samples. As Figure 10 shows, this technique plots the standardized residual (STR, Equation (21)) as a function of the Hat index (HI, Equation (22)).
STR_s = \Delta_s / SD, \quad s = 1, 2, \ldots, N   (21)
HI = diag\left( X (X^T X)^{-1} X^T \right)   (22)
The HI values are the diagonal elements of the above matrix; X is the matrix of input features of the ML models, and the superscripts −1 and T denote matrix inversion and transposition, respectively.
The feasible region of the leverage method is determined by the STR = ±3 and HI < leverage limit (Equation (23)).
Leverage \ limit = 3 \times (p + 1) / N   (23)
Here, p designates the number of input features of ML models (i.e., 8).
Figure 10 shows only two outliers and one out-of-leverage record among the 345 alumina agglomeration sizes collected from the literature.
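Equations (21)–(23) can be implemented directly; the random feature matrix below merely stands in for the 345 × 8 database. A useful sanity check is that the Hat indices sum to the number of features.

```python
import numpy as np

def leverage_diagnostics(X, delta):
    """Eqs. (21)-(23): standardized residuals, Hat indices, and the
    leverage limit used to flag outlier / out-of-leverage samples."""
    n, p = X.shape
    sd = np.sqrt(np.sum((delta - delta.mean()) ** 2) / (n - 1))
    str_ = delta / sd                                # Eq. (21)
    hi = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # Eq. (22)
    limit = 3 * (p + 1) / n                          # Eq. (23)
    valid = (np.abs(str_) <= 3) & (hi < limit)       # feasible region
    return str_, hi, limit, valid

rng = np.random.default_rng(0)
X = rng.normal(size=(345, 8))    # stand-in for the 8 input features
delta = rng.normal(size=345)     # stand-in residuals
str_, hi, limit, valid = leverage_diagnostics(X, delta)
print(round(float(limit), 4))
```

Samples with |STR| > 3 are outliers, and samples with HI above the leverage limit lie outside the model's applicability domain.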

4.7. Relevancy Test

This section employs the Pearson method to rank the independent variables by the importance of their role in determining the agglomeration size. The Pearson method shows the strength and direction of a relationship between two series of variables using a factor ranging from −1 to +1 [85]. The sign of the Pearson factor clarifies the relationship type, i.e., direct or inverse, while its magnitude shows the relationship’s strength. Table 4 reports the Pearson factors between the agglomeration size and each independent variable. This table also introduces the relationship type and relevancy degree for all target–feature pairs. It can be seen that ultrasonic frequency, ultrasonic power, alumina dose in water, and surfactant concentration play the most crucial roles in the agglomeration size. On the other hand, temperature, ultrasonic time, and pH have the least impact on the agglomeration size.
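A Pearson-based ranking of this kind can be sketched with numpy's corrcoef; the feature names and toy data below are illustrative.

```python
import numpy as np

def pearson_ranking(features, target, names):
    """Pearson factor between each input feature and the target, sorted
    by absolute magnitude (strength); the sign gives the direction."""
    ranked = []
    for name, x in zip(names, features.T):
        r = np.corrcoef(x, target)[0, 1]
        ranked.append((name, r))
    return sorted(ranked, key=lambda nr: abs(nr[1]), reverse=True)

rng = np.random.default_rng(3)
t = rng.normal(size=200)
F = np.column_stack([-2.0 * t,                        # perfect inverse relation
                     rng.normal(size=200),            # pure noise
                     t + 0.1 * rng.normal(size=200)]) # strong direct relation
ranked = pearson_ranking(F, t, ['a', 'b', 'c'])
print(ranked)
```

The strongly (inversely) related feature tops the ranking with a factor near −1, while the noise feature falls to the bottom with a factor near zero.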
If a high level of scattering exists in the databank, the Pearson method may wrongly identify the direction and strength of a relationship between dependent and independent variables [86]. For example, pH is one of the most important parameters governing agglomeration, and zeta-potential measurement devices are designed around this property. However, due to the high level of data scattering, the Pearson method failed to correctly capture the dependence of agglomeration size on pH.

4.8. Trend Analysis

The concentration of nanoparticles in water-based nanofluids and the duration of ultrasonic treatment can influence the size of agglomerates in aquatic media. Figure 11 demonstrates the effect of ultrasonic time (at 200 W and 24 kHz) on agglomeration size across three alumina concentration levels, incorporating both experimental and modeling insights. The figure clearly shows an outstanding agreement between the laboratory-measured results and their corresponding predictions by the CatBoost model. The observed average absolute relative deviations (AARDs%) for the 1, 2, and 3 vol% alumina concentrations are 9.18%, 9.83%, and 12.18%, respectively.
Additionally, the figure reveals that agglomeration size increases with alumina concentration in water. This trend may be attributed to the increased likelihood of nanoparticle collisions and subsequent adhesion. Initially, ultrasonic mixing disrupts the agglomerates, significantly reducing their size. However, as sonication time extends, a slight increase in agglomeration size is observed due to continued collisions and particle adhesion.
Figure 12 investigates the effect of pH on the agglomeration size at two alumina concentrations in aqueous suspension using experimental and modeling results. The agreement between the measured and modeled profiles is confirmed by the observed AARD% values of 6.67% (0.01 vol%) and 11.8% (0.03 vol%). The nanofluid's pH can affect the alumina nanoparticles' surface charge, thereby influencing their size distribution. This analysis shows that the maximum agglomeration size of ~500 nm occurs at pH = 10, meaning that the alumina nanoparticles have the strongest tendency to adhere to each other under this condition.

5. Conclusions

This study utilized five machine learning models to accurately estimate the agglomeration size of alumina in aquatic media. The models (Random Forest, AdaBoost, Extra Trees, CatBoost, and a Multilayer Perceptron Neural Network) relied on parameters such as alumina dose, temperature, pH, surfactant type and concentration, and ultrasonication properties (time, power, and frequency). According to the Pearson method, ultrasonic frequency, ultrasonic power, alumina dose, and surfactant concentration were identified as the most influential factors on agglomeration size in water-based nanofluids. Conversely, temperature, ultrasonic time, and pH had minimal impact.
A combination of statistical analyses and grid search tuning showed that the CatBoost model outperformed its counterparts in predicting alumina agglomeration sizes. This efficacy is highlighted by its performance metrics: an average absolute relative deviation (AARD%) of 6.75%, a relative absolute error (RAE%) of 12.83%, and a correlation coefficient (R) of 0.9762. Various graphical analyses confirmed the excellent alignment between the actual agglomeration sizes and the CatBoost predictions. The leverage method was also applied to validate the experimental data, confirming the reliability of 342 of the 345 data points, i.e., over 99% validity. This method also identified two suspect samples and one outlier, pinpointing a total of three problematic data points in the database.
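The leverage screening mentioned above is conventionally built from the hat matrix of the input features. A minimal sketch under the usual conventions follows; the warning leverage h* = 3(p + 1)/n and the ±3 band on standardized residuals are the standard choices, assumed rather than quoted from the study:

```python
import numpy as np

def leverage_screen(X, residuals, r_band=3.0):
    """Flag out-of-leverage points and outliers (Williams plot style).

    X: (n, p) matrix of input features; residuals: (n,) model residuals.
    Returns the leverages, the warning leverage h*, and two boolean masks.
    """
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])        # design matrix with intercept
    H = A @ np.linalg.pinv(A.T @ A) @ A.T       # hat matrix
    h = np.diag(H)                              # leverage of each data point
    h_star = 3 * (p + 1) / n                    # warning leverage threshold
    r_std = residuals / residuals.std(ddof=1)   # standardized residuals
    return h, h_star, h > h_star, np.abs(r_std) > r_band
```

Points with leverage below h* and |standardized residual| within the band count as valid; the remainder correspond to the suspect and outlier records discussed above.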
Future research in this field may explore the interactions between alumina nanoparticles and water molecules to understand further the factors controlling agglomeration size and the stability of nanofluids.

Author Contributions

B.V.: Conceptualization, Writing—original draft, Data curation, Investigation, Supervision, Final approval. M.D.: Writing—review and editing, Software, Methodology, Analysis, Project administration, Final approval. A.H.A.: Writing—review and editing, Final approval. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

This study analyzes a gathered database from the literature. The corresponding authors are ready to provide this database upon reasonable request.

Conflicts of Interest

The authors declare that they have no competing interests.

References

  1. Javadpour, R.; Heris, S.Z.; Mohammadfam, Y.; Mousavi, S.B. Optimizing the heat transfer characteristics of MWCNTs and TiO2 water-based nanofluids through a novel designed pilot-scale setup. Sci. Rep. 2022, 12, 15154. [Google Scholar] [CrossRef] [PubMed]
  2. Ahmad, S.; Takana, H.; Ali, K.; Akhtar, Y.; Hassan, A.M.; Ragab, A.E. Role of localized magnetic field in vortex generation in tri-hybrid nanofluid flow: A numerical approach. Nanotechnol. Rev. 2023, 12, 20220561. [Google Scholar] [CrossRef]
  3. Shahzad, A.; Liaqat, F.; Ellahi, Z.; Sohail, M.; Ayub, M.; Ali, M.R. Thin film flow and heat transfer of Cu-nanofluids with slip and convective boundary condition over a stretching sheet. Sci. Rep. 2022, 12, 14254. [Google Scholar] [CrossRef] [PubMed]
  4. Sun, L.; Liang, T.; Zhang, C.; Chen, J. The rheological performance of shear-thickening fluids based on carbon fiber and silica nanocomposite. Phys. Fluids 2023, 35, 032002. [Google Scholar] [CrossRef]
  5. Sharif, M.; Heidari, A.; Aghaeinejad Meybodi, A. Polythiophene/Zinc Oxide/Graphene Oxide Ternary Photocatalyst: Synthesis, characterization and application. Polym. Technol. Mater. 2021, 60, 1450–1460. [Google Scholar] [CrossRef]
  6. Bazireh, E.; Sharif, M. Polythiophene-coated multi-walled carbon nanotube-reinforced epoxy nanocomposites for enhanced mechanical, electrical and thermal properties. Polym. Bull. 2020, 77, 4537–4553. [Google Scholar] [CrossRef]
  7. Subramanian, K.R.V.; Rao, T.N.; Balakrishnan, A. Nanofluids and Their Engineering Applications; CRC Press: Boca Raton, FL, USA, 2019; ISBN 0429886993. [Google Scholar]
  8. Gakare, A. A review on nanofluids: Preparation and applications. J. Nanotechnol. Its Appl. 2019, 21, 21–35. [Google Scholar]
  9. Ma, H.; Liu, J.; Liang, W.; Li, J.; Zhao, W.; Sun, P.; Ji, Q. Effects of PEMFC cooling channel insulation coating on heat transfer and electrical discharge characteristics of nanofluid coolants. Appl. Energy 2024, 357, 122514. [Google Scholar] [CrossRef]
  10. Yuan, B.; Zhan, G.; Xing, L.; Li, Y.; Huang, Z.; Chen, Z.; Wang, L.; Li, J. Boosting CO2 absorption and desorption of biphasic solvent by nanoparticles for efficient carbon dioxide capture. Sep. Purif. Technol. 2024, 329, 125108. [Google Scholar] [CrossRef]
  11. Akanda, M.A.M.; Shin, D. A synthesis parameter of molten salt nanofluids for solar thermal energy storage applications. J. Energy Storage 2023, 60, 106608. [Google Scholar] [CrossRef]
  12. Mousavi, S.M.; Alborzi, Z.S.; Raveshiyan, S.; Amini, Y. Applications of nanotechnology in the harvesting of solar energy. In Nanotechnology Applications for Solar Energy Systems; Wiley: Hoboken, NJ, USA, 2023; pp. 239–256. [Google Scholar]
  13. Toledo, C.; Gambaro, R.C.; Padula, G.; Vela, M.E.; Castro, G.R.; Chain, C.Y.; Islan, G.A. Binary medical nanofluids by combination of polymeric Eudragit nanoparticles for vehiculization of tobramycin and resveratrol: Antimicrobial, hemotoxicity and protein corona studies. J. Pharm. Sci. 2021, 110, 1739–1748. [Google Scholar] [CrossRef] [PubMed]
  14. Sheikhpour, M.; Arabi, M.; Kasaeian, A.; Rabei, A.R.; Taherian, Z. Role of nanofluids in drug delivery and biomedical technology: Methods and applications. Nanotechnol. Sci. Appl. 2020, 13, 47–59. [Google Scholar] [CrossRef] [PubMed]
  15. He, J.; Sun, J.; Meng, Y.; Tang, H.; Wu, P. Improved lubrication performance of MoS2-Al2O3 nanofluid through interfacial tribochemistry. Colloids Surf. A Physicochem. Eng. Asp. 2021, 618, 126428. [Google Scholar] [CrossRef]
  16. Mahian, O.; Bellos, E.; Markides, C.N.; Taylor, R.A.; Alagumalai, A.; Yang, L.; Qin, C.; Lee, B.J.; Ahmadi, G.; Safaei, M.R. Recent advances in using nanofluids in renewable energy systems and the environmental implications of their uptake. Nano Energy 2021, 86, 106069. [Google Scholar] [CrossRef]
  17. Manikandan, S.; Chinnusamy, P.; Thangamani, R.; Palaniraj, S.; Ravichandran, P.; Karuppasamy, S.; Sanmugam, Y. Heat transfer performance of Al2O3-water-methanol nanofluid in a plate heat exchanger. Chem. Ind. Chem. Eng. Q. 2023, 30, 257–264. [Google Scholar] [CrossRef]
  18. Mukherjee, S.; Jana, S.; Mishra, P.C.; Chaudhuri, P.; Chakrabarty, S. Experimental investigation on thermo-physical properties and subcooled flow boiling performance of Al2O3/water nanofluids in a horizontal tube. Int. J. Therm. Sci. 2021, 159, 106581. [Google Scholar] [CrossRef]
  19. Mehta, B.; Subhedar, D.; Panchal, H.; Sadasivuni, K.K. Stability and thermophysical properties enhancement of Al2O3-water nanofluid using cationic CTAB surfactant. Int. J. Thermofluids 2023, 20, 100410. [Google Scholar] [CrossRef]
  20. Hamidatou, S.; Nadir, M.; Togun, H.; Abed, A.M.; Deghoum, K.; Hadjad, A.; Ahmadi, G. Experimental investigation of a novel heat exchanger for optimizing heat transfer performance using Al2O3-water nanofluids. Heat Mass Transf. 2023, 59, 1635–1646. [Google Scholar] [CrossRef]
  21. Harrabi, I.; Hamdi, M.; Hazami, M. Long-term performances and technoeconomic and environmental assessment of Al2O3/water and MWCNT/oil nanofluids in three solar collector technologies. J. Nanomater. 2021, 2021, 6461895. [Google Scholar] [CrossRef]
  22. Han, S.; Gomez-Flores, A.; Choi, S.; Kim, H.; Lee, Y. A study of nanofluid stability in low–salinity water to enhance oil recovery: An extended physicochemical approach. J. Pet. Sci. Eng. 2022, 215, 110608. [Google Scholar] [CrossRef]
  23. Khaleduzzaman, S.S.; Sohel, M.R.; Saidur, R.; Mahbubul, I.M.; Shahrul, I.M.; Akash, B.A.; Selvaraj, J. Energy and exergy analysis of alumina–water nanofluid for an electronic liquid cooling system. Int. Commun. Heat Mass Transf. 2014, 57, 118–127. [Google Scholar] [CrossRef]
  24. Ahmed, M.S.; Hady, M.R.A.; Abdallah, G. Experimental investigation on the performance of chilled-water air conditioning unit using alumina nanofluids. Therm. Sci. Eng. Prog. 2018, 5, 589–596. [Google Scholar] [CrossRef]
  25. Ilyas, S.U.; Pendyala, R.; Marneni, N. Preparation, sedimentation, and agglomeration of nanofluids. Chem. Eng. Technol. 2014, 37, 2011–2021. [Google Scholar] [CrossRef]
  26. Zerradi, H.; Mizani, S.; Loulijat, H.; Dezairi, A.; Ouaskit, S. Population balance equation model to predict the effects of aggregation kinetics on the thermal conductivity of nanofluids. J. Mol. Liq. 2016, 218, 373–383. [Google Scholar] [CrossRef]
  27. Bałdyga, J.; Orciuch, W.; Makowski, Ł.; Malski-Brodzicki, M.; Malik, K. Break up of nano-particle clusters in high-shear devices. Chem. Eng. Process. Process Intensif. 2007, 46, 851–861. [Google Scholar] [CrossRef]
  28. Timofeeva, E.V.; Gavrilov, A.N.; McCloskey, J.M.; Tolmachev, Y.V.; Sprunt, S.; Lopatina, L.M.; Selinger, J.V. Thermal conductivity and particle agglomeration in alumina nanofluids: Experiment and theory. Phys. Rev. E-Stat. Nonlinear Soft Matter Phys. 2007, 76, 28–39. [Google Scholar] [CrossRef]
  29. Dwornick, B.L.; Jeyashekar, N.S.; Johnson, J.E.; Schall, J.D.; Comfort, A.S.; Zou, Q.; Dusenbury, J.S.; Thrush, S.J.; Powers, C.C.; Hutzler, S.A. Application of Dynamic Light Scattering to Characterize Nanoparticle Agglomeration in Alumina Nanofluids and its Effect on Thermal Conductivity. J. Artic. 2012, 11, 2. [Google Scholar]
  30. Meriläinen, A.; Seppälä, A.; Saari, K.; Seitsonen, J.; Ruokolainen, J.; Puisto, S.; Rostedt, N.; Ala-Nissila, T. Influence of particle size and shape on turbulent heat transfer characteristics and pressure losses in water-based nanofluids. Int. J. Heat Mass Transf. 2013, 61, 439–448. [Google Scholar] [CrossRef]
  31. Chicea, D. Nanoparticles and nanoparticle aggregates sizing by DLS and AFM. J. Optoelectron. Adv. Mater. 2010, 4, 1310–1315. [Google Scholar]
  32. Maheshwary, P.B.; Handa, C.C.; Nemade, K.R. A comprehensive study of effect of concentration, particle size and particle shape on thermal conductivity of titania/water based nanofluid. Appl. Therm. Eng. 2017, 119, 79–88. [Google Scholar] [CrossRef]
  33. Eneren, P.; Sergievskaya, A.; Aksoy, Y.T.; Umek, P.; Konstantinidis, S.; Vetrano, M.R. Time-resolved in situ nanoparticle size evolution during magnetron sputtering onto liquids. Nanoscale Adv. 2023, 5, 4809–4818. [Google Scholar] [CrossRef]
  34. Cohen, J.M.; DeLoid, G.M.; Demokritou, P. A critical review of in vitro dosimetry for engineered nanomaterials. Nanomedicine 2015, 10, 3015–3032. [Google Scholar] [CrossRef]
  35. Krishna, K.H.; Neti, S.; Oztekin, A.; Mohapatra, S. Modeling of particle agglomeration in nanofluids. J. Appl. Phys. 2015, 117, 094304. [Google Scholar] [CrossRef]
  36. Zhang, T.; Zou, Q.; Cheng, Z.; Chen, Z.; Liu, Y.; Jiang, Z. Effect of particle concentration on the stability of water-based SiO2 nanofluid. Powder Technol. 2021, 379, 457–465. [Google Scholar] [CrossRef]
  37. Kundan, L.; Mallick, S.S.; Pal, B. An investigation into the effect of nanoclusters growth on perikinetic heat conduction mechanism in an oxide based nanofluid. Powder Technol. 2017, 311, 273–286. [Google Scholar] [CrossRef]
  38. Feng, Y.; Yu, B.; Feng, K.; Xu, P.; Zou, M. Thermal conductivity of nanofluids and size distribution of nanoparticles by Monte Carlo simulations. J. Nanopart. Res. 2008, 10, 1319–1328. [Google Scholar] [CrossRef]
  39. Du, M.; Tang, G.H. Optical property of nanofluids with particle agglomeration. Sol. Energy 2015, 122, 864–872. [Google Scholar] [CrossRef]
  40. Sharma, P.; Said, Z.; Kumar, A.; Nizetic, S.; Pandey, A.; Hoang, A.T.; Huang, Z.; Afzal, A.; Li, C.; Le, A.T. Recent advances in machine learning research for nanofluid-based heat transfer in renewable energy system. Energy Fuels 2022, 36, 6626–6658. [Google Scholar] [CrossRef]
  41. Zhang, H.; Zou, Q.; Ju, Y.; Song, C.; Chen, D. Distance-based Support Vector Machine to Predict DNA N6-methyladenine Modification. Curr. Bioinform. 2022, 17, 473–482. [Google Scholar] [CrossRef]
  42. Yin, L.; Lin, S.; Sun, Z.; Li, R.; He, Y.; Hao, Z. A game-theoretic approach for federated learning: A trade-off among privacy, accuracy and energy. Digit. Commun. Netw. 2024, 10, 389–403. [Google Scholar] [CrossRef]
  43. Alazeb, A.; Chughtai, B.R.; Al Mudawi, N.; AlQahtani, Y.; Alonazi, M.; Aljuaid, H.; Jalal, A.; Liu, H. Remote intelligent perception system for multi-object detection. Front. Neurorobot. 2024, 18, 1398703. [Google Scholar]
  44. Mohammadzadeh, A.; Zhang, C.; Alattas, K.A.; El-Sousy, F.F.M.; Vu, M.T. Fourier-based type-2 fuzzy neural network: Simple and effective for high dimensional problems. Neurocomputing 2023, 547, 126316. [Google Scholar] [CrossRef]
  45. Zareei, M.; Yoozbashizadeh, H.; Hosseini, H.R.M. Investigating the effects of pH, surfactant and ionic strength on the stability of alumina/water nanofluids using DLVO theory. J. Therm. Anal. Calorim. 2019, 135, 1185–1196. [Google Scholar] [CrossRef]
  46. Mahbubul, I.M.; Shahrul, I.M.; Khaleduzzaman, S.S.; Saidur, R.; Amalina, M.A.; Turgut, A. Experimental investigation on effect of ultrasonication duration on colloidal dispersion and thermophysical properties of alumina–water nanofluid. Int. J. Heat Mass Transf. 2015, 88, 73–81. [Google Scholar] [CrossRef]
  47. Sadeghi, R.; Etemad, S.G.; Keshavarzi, E.; Haghshenasfard, M. Investigation of alumina nanofluid stability by UV–vis spectrum. Microfluid. Nanofluid. 2015, 18, 1023–1030. [Google Scholar] [CrossRef]
  48. Wang, R.-T.; Wang, J.-C. Intelligent dimensional and thermal performance analysis of Al2O3 nanofluid. Energy Convers. Manag. 2017, 138, 686–697. [Google Scholar] [CrossRef]
  49. Lai, W.Y.; Duculescu, B.; Phelan, P.E.; Prasher, R.S. Convective heat transfer with nanofluids in a single 1.02-mm tube. Am. Soc. Mech. Eng. Heat Transf. Div. HTD 2009, 131, 112401. [Google Scholar]
  50. Syarif, D.G.; Prajitno, D.H. Synthesis and characterization of Al2O3 nanoparticles and water-Al2O3 nanofluids for nuclear reactor Coolant. Adv. Mater. Res. 2015, 1123, 270–273. [Google Scholar] [CrossRef]
  51. Cacua, K.; Ordoñez, F.; Zapata, C.; Herrera, B.; Pabón, E.; Buitrago-Sierra, R. Surfactant concentration and pH effects on the zeta potential values of alumina nanofluids to inspect stability. Colloids Surf. A Physicochem. Eng. Asp. 2019, 583, 123960. [Google Scholar] [CrossRef]
  52. Liu, L.; Stetsyuk, V.; Kubiak, K.J.; Yap, Y.F.; Goharzadeh, A.; Chai, J.C. Nanoparticles for convective heat transfer enhancement: Heat transfer coefficient and the effects of particle size and zeta potential. Chem. Eng. Commun. 2019, 206, 761–771. [Google Scholar] [CrossRef]
  53. Zhao, W.L.; Zhu, B.J.; Li, J.K.; Guan, Y.X.; Li, D.D. Suspension stability and thermal conductivity of oxide based nanofluids with low volume concentration. Adv. Mater. Res. 2011, 160, 802–808. [Google Scholar] [CrossRef]
  54. Pastoriza-Gallego, M.J.; Casanova, C.; Páramo, R.; Barbés, B.; Legido, J.L.; Piñeiro, M.M. A study on stability and thermophysical properties (density and viscosity) of Al2O3 in water nanofluid. J. Appl. Phys. 2009, 106, 64301. [Google Scholar] [CrossRef]
  55. Hung, Y.-H.; Chen, J.-H.; Teng, T.-P. Feasibility assessment of thermal management system for green power sources using nanofluid. J. Nanomater. 2013, 2013, 321261. [Google Scholar] [CrossRef]
  56. Lu, S.; Song, J.; Li, Y.; Xing, M.; He, Q. Improvement of CO2 absorption using Al2O3 nanofluids in a stirred thermostatic reactor. Can. J. Chem. Eng. 2015, 93, 935–941. [Google Scholar] [CrossRef]
  57. Huang, J.; Wang, X.; Long, Q.; Wen, X.; Zhou, Y.; Li, L. Influence of pH on the stability characteristics of nanofluids. In Proceedings of the 2009 Symposium on Photonics and Optoelectronics, Wuhan, China, 14–16 August 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 1–4. [Google Scholar]
  58. Pare, A.; Ghosh, S.K. Rheological Analyses of Aluminum Oxide Based Water Nanofluid. In Proceedings of the International Conference on Thermal Engineering, Gandhinagar, India, 23–26 February 2019; pp. 3–5. [Google Scholar]
  59. Vakilinejad, A.; Aroon, M.A.; Al-Abri, M.; Bahmanyar, H.; Al-Ghafri, B.; Myint, M.T.Z.; Vakili-Nezhaad, G.R. Experimental investigation and modeling of the viscosity of some water-based nanofluids. Chem. Eng. Commun. 2021, 208, 1054–1068. [Google Scholar] [CrossRef]
  60. Zhao, W.; Li, J.; Liu, Z.; Guan, Y. Thermal Conductivities and Viscosities of Al2O3-Water Nanofluids with Low Volume Concentrations. In Proceedings of the International Conference on Micro/Nanoscale Heat Transfer, Shanghai, China, 18–21 December 2009; Volume 1, pp. 491–496. [Google Scholar]
  61. Okonkwo, E.C.; Wole-Osho, I.; Kavaz, D.; Abid, M.; Al-Ansari, T. Thermodynamic evaluation and optimization of a flat plate collector operating with alumina and iron mono and hybrid nanofluids. Sustain. Energy Technol. Assess. 2020, 37, 100636. [Google Scholar] [CrossRef]
  62. Lee, S.W. Effects of Graphene and SiC Nanofluids on Critical Heat Flux and Quenching for Advanced Nuclear Reactors. Ph.D. Thesis, Department of Nuclear Engineering, Ulsan National Institute of Science & Technology, Ulsan, Republic of Korea, 2013. [Google Scholar]
  63. Kim, H. Effective Dynamic Conductivity Correlation of Nanofluids in Convective Flow. Ph.D. Thesis, School of Mechanical and Aerospace Engineering, Seoul National University, Seoul, Republic of Korea, 2016. [Google Scholar]
  64. Mukherjee, S.; Chakrabarty, S.; Mishra, P.C.; Chaudhuri, P. Transient heat transfer characteristics and process intensification with Al2O3-water and TiO2-water nanofluids: An experimental investigation. Chem. Eng. Process. Intensif. 2020, 150, 107887. [Google Scholar] [CrossRef]
  65. Wang, X.-J.; Li, X.-F.; Wang, N.; Wen, X.-Y.; Long, Q. Influence of SDBS on stability of Al2O3 nano-suspensions. In Proceedings of the Nanophotonics, Nanostructure, and Nanometrology II, Beijing, China, 12–14 November 2007; Volume 6831, pp. 164–169. [Google Scholar]
  66. Zhu, D.; Wang, X.; Li, X. Influence of SDBS on dispersive stability of Al2O3 nano-suspensions. In Proceedings of the 2008 ASME Micro/Nanoscale Heat Transfer International Conference, Tainan, Taiwan, 6–9 June 2008; Volume 42924, pp. 575–580. [Google Scholar]
  67. Zhu, B.J.; Zhao, W.L.; Li, J.K.; Guan, Y.X.; Li, D.D. Thermophysical properties of Al2O3-water nanofluids. Mater. Sci. Forum 2011, 688, 266–271. [Google Scholar] [CrossRef]
  68. Jang, S.P.; Hwang, K.S.; Lee, J.-H.; Kim, J.H.; Lee, B.H.; Choi, S.U.S. Effective thermal conductivities and viscosities of water-based nanofluids containing Al2O3 with low concentration. In Proceedings of the 2007 7th IEEE Conference on Nanotechnology (IEEE NANO), Hong Kong, China, 2–5 August 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 1011–1014. [Google Scholar]
  69. Mahbubul, I.M.; Saidur, R.; Amalina, M.A.; Elcioglu, E.B.; Okutucu-Ozyurt, T. Effective ultrasonication process for better colloidal dispersion of nanofluid. Ultrason. Sonochem. 2015, 26, 361–369. [Google Scholar] [CrossRef] [PubMed]
  70. Wang, X.; Zhu, D. Investigation of pH and SDBS on enhancement of thermal conductivity in nanofluids. Chem. Phys. Lett. 2009, 470, 107–111. [Google Scholar] [CrossRef]
  71. Zhu, D.; Li, X.; Wang, N.; Wang, X.; Gao, J.; Li, H. Dispersion behavior and thermal conductivity characteristics of Al2O3–H2O nanofluids. Curr. Appl. Phys. 2009, 9, 131–139. [Google Scholar] [CrossRef]
  72. Wang, X.-J.; Li, H.; Li, X.-F.; Wang, Z.-F.; Lin, F. Stability of TiO2 and Al2O3 nanofluids. Chin. Phys. Lett. 2011, 28, 86601. [Google Scholar] [CrossRef]
  73. Choudhary, R.; Khurana, D.; Kumar, A.; Subudhi, S. Stability analysis of Al2O3/water nanofluids. J. Exp. Nanosci. 2017, 12, 140–151. [Google Scholar] [CrossRef]
  74. Fahmy, O.M.; Eissa, R.A.; Mohamed, H.H.; Eissa, N.G.; Elsabahy, M. Machine learning algorithms for prediction of entrapment efficiency in nanomaterials. Methods 2023, 218, 133–140. [Google Scholar] [CrossRef]
  75. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. Catboost: Unbiased boosting with categorical features. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; Volume 2018, pp. 6638–6648. [Google Scholar]
  76. Schapire, R.E. Explaining adaboost. In Empirical Inference: Festschrift in Honor of Vladimir N. Vapnik; Springer: Berlin/Heidelberg, Germany, 2013; pp. 37–52. [Google Scholar] [CrossRef]
  77. Yousefzadeh, R.; Bemani, A.; Kazemi, A.; Ahmadi, M. An Insight into the Prediction of Scale Precipitation in Harsh Conditions Using Different Machine Learning Algorithms. SPE Prod. Oper. 2023, 38, 286–304. [Google Scholar] [CrossRef]
  78. Çolak, A.B. A novel comparative analysis between the experimental and numeric methods on viscosity of zirconium oxide nanofluid: Developing optimal artificial neural network and new mathematical model. Powder Technol. 2021, 381, 338–351. [Google Scholar] [CrossRef]
  79. Livingston, F. Implementation of Breiman’s Random Forest Machine Learning Algorithm. Mach. Learn. J. Pap. 2005, 1–13. Available online: https://datajobs.com/data-science-repo/Random-Forest-[Frederick-Livingston].pdf (accessed on 1 May 2024).
  80. Patel, H.H.; Prajapati, P. Study and analysis of decision tree based classification algorithms. Int. J. Comput. Sci. Eng. 2018, 6, 74–78. [Google Scholar] [CrossRef]
  81. Gholami, M.; Ranjbargol, M.; Yousefzadeh, R.; Ghorbani, Z. Integrating three smart predictive models using a power-law committee machine for the prediction of compressive strength in masonry made of clay bricks and cement mortar. Structures 2023, 55, 951–964. [Google Scholar] [CrossRef]
  82. Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22. [Google Scholar]
  83. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42. [Google Scholar] [CrossRef]
  84. Sahin, F.; Genc, O.; Gökcek, M.; Çolak, A.B. An experimental and new study on thermal conductivity and zeta potential of Fe3O4/water nanofluid: Machine learning modeling and proposing a new correlation. Powder Technol. 2023, 420, 118388. [Google Scholar] [CrossRef]
  85. Feng, L.; Zhong, K.; Liu, J.; Ghanbari, A. Applying supervised intelligent scenarios to numerical investigate carbon dioxide capture using nanofluids. J. Clean. Prod. 2022, 381, 135088. [Google Scholar] [CrossRef]
  86. Hosseini, S.; Khandakar, A.; Chowdhury, M.E.H.; Ayari, M.A.; Rahman, T.; Chowdhury, M.H.; Vaferi, B. Novel and robust machine learning approach for estimating the fouling factor in heat exchangers. Energy Rep. 2022, 8, 8767–8776. [Google Scholar] [CrossRef]
Figure 1. CatBoost tree building.
Figure 2. AdaBoost learning algorithm.
Figure 3. Random Forest tree building.
Figure 4. Comparing the cumulative ARD% frequency of the designed ML methodologies.
Figure 5. Comparing the violin shapes of actual and predicted values of alumina agglomeration size.
Figure 6. Linear correlation between actual values of Da and their related predictions by the CatBoost model for (a) all experimental data and (b) zoomed version between 0 and 500 nm.
Figure 7. Histograms of the training/testing residual errors presented by the CatBoost model.
Figure 8. Applying the Bland–Altman technique to evaluate the reliability of the CatBoost model’s predictions.
Figure 9. The actual and predicted profiles of KDE versus magnitude for nano-alumina agglomeration size.
Figure 10. Discrimination among valid, outlier, and out-of-leverage nano-alumina agglomeration records.
Figure 11. The effect of alumina concentration in water and sonication time on the agglomeration size (without surfactant, pH = 7, and 308 K).
Figure 12. Experimental and modeling profiles of the agglomeration size versus pH at different alumina concentrations in water (without surfactant, 298 K, 10 min sonication, 200 W, and 20 kHz).
Table 1. Summary of the literature data for the alumina agglomeration size in water-based nanofluids [45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73].
| Variable | Min | Average | Max | Std. Deviation | Observations |
|---|---|---|---|---|---|
| Alumina dose (vol%) | 0 | 0.41 | 3 | 0.64 | 345 |
| Temperature (K) | 288 | 298.8 | 320 | 4.9 | 345 |
| pH (-) | 1.0 | 6.9 | 13.4 | 2.25 | 345 |
| Ultrasonic time (min) | 0 | 1042.6 | 40,505 | 5085.8 | 345 |
| Ultrasonic power (W) | 50 | 303.9 | 750 | 239.6 | 345 |
| Ultrasonic frequency (kHz) | 19 | 28.9 | 60 | 10.6 | 345 |
| Surfactant concentration (wt%) | 0 | 0.04 | 0.99 | 0.14 | 345 |
| Agglomeration size (nm) | 5.5 | 170.1 | 1173 | 541.6 | 345 |
Table 2. The key topological features (hyperparameters) of the ML methodologies, their investigated ranges/types, and the best candidate for each hyperparameter.
| ML Methodology | Hyperparameter | Investigated Domain | The Best One |
|---|---|---|---|
| Random Forest | No. of estimators | 10–100 (with an increment size of 10) | 20 |
| | Split quality condition | squared error, absolute error, Friedman MSE, Poisson | Friedman MSE |
| | Max. depth of trees | 2–9 (with an increment size of 1) | 8 |
| | Max. features | sqrt, log2, 1, 2 | sqrt |
| | Min. samples split | 2–4 (with an increment size of 1) | 2 |
| AdaBoost | No. of estimators | 10–100 (with an increment size of 10) | 30 |
| | Learning rate | 0.1–0.7 (with an increment size of 0.2) | 0.5 |
| Extra Trees | No. of estimators | 10–100 (with an increment size of 10) | 20 |
| | Split quality condition | linear, absolute error, exponential, square | Absolute error |
| | Max. depth of trees | 2–8 (with an increment size of 1) | 8 |
| | Max. features | sqrt, log2, 1, 2 | log2 |
| | Min. samples split | 2–4 (with an increment size of 1) | 3 |
| CatBoost | No. of estimators | 10–100 (with an increment size of 10) | 20 |
| | Model size regularization | 0–30 (with an increment size of 5) | 0 |
| | Max. depth of trees | 2–9 (with an increment size of 1) | 8 |
| | L2 regularization | 0–30 (with an increment size of 5) | 0 |
| | Learning rate | 0.1–0.7 (with an increment size of 0.2) | 0.5 |
| MLP | No. of hidden neurons | 1–10 (with an increment size of 1) | 9 |
| | Activation function of hidden layer | Linear, radial basis function, tangent, and logarithm sigmoid | Tangent sigmoid |
| | Activation function of output layer | - | Logarithm sigmoid |
| | Training algorithm | Gradient descent, Scaled conjugate gradient, Bayesian regularization, Levenberg–Marquardt | Scaled conjugate gradient |
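The exhaustive search over the grids in Table 2 amounts to scoring every hyperparameter combination and keeping the best one. A sketch for the CatBoost grid follows; `score_fn` is a hypothetical stand-in for the cross-validation score used in the study:

```python
from itertools import product

# CatBoost search space from Table 2
grid = {
    "n_estimators": list(range(10, 101, 10)),   # 10-100, step 10
    "model_size_reg": list(range(0, 31, 5)),    # 0-30, step 5
    "depth": list(range(2, 10)),                # 2-9, step 1
    "l2_leaf_reg": list(range(0, 31, 5)),       # 0-30, step 5
    "learning_rate": [0.1, 0.3, 0.5, 0.7],      # 0.1-0.7, step 0.2
}

def grid_search(grid, score_fn):
    """Score every combination in the grid; return the best parameters."""
    keys = list(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

For the grid above this evaluates 15,680 candidate configurations; in practice `score_fn` would train the model with `params` and return, for example, the mean cross-validation R.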
Table 3. Comparing the performance of involved ML methodology by four statistical uncertainty indices.
| ML Methodology | Category | AARD% | RAE% | RMSE | R |
|---|---|---|---|---|---|
| AdaBoost | Cross-validation | 39.10 | 50.17 | 857.65 | 0.7486 |
| | Testing | 38.57 | 50.88 | 765.47 | 0.9659 |
| | Whole database | 39.02 | 50.28 | 844.40 | 0.7785 |
| CatBoost | Cross-validation | 6.38 | 14.11 | 271.96 | 0.9745 |
| | Testing | 8.86 | 6.23 | 45.31 | 0.9991 |
| | Whole database | 6.75 | 12.83 | 251.25 | 0.9762 |
| Extra Trees | Cross-validation | 25.14 | 50.77 | 892.29 | 0.8660 |
| | Testing | 25.94 | 49.30 | 813.78 | 0.9241 |
| | Whole database | 25.26 | 50.53 | 880.90 | 0.8727 |
| RF | Cross-validation | 13.02 | 34.38 | 724.06 | 0.8710 |
| | Testing | 18.39 | 28.09 | 399.54 | 0.9960 |
| | Whole database | 13.83 | 33.36 | 685.06 | 0.8744 |
| MLP | Cross-validation | 9.75 | 33.93 | 646.23 | 0.8558 |
| | Testing | 9.88 | 19.92 | 300.35 | 0.9967 |
| | Whole database | 9.77 | 31.65 | 606.84 | 0.8717 |
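The uncertainty indices in Table 3 follow common definitions; a small pure-Python sketch is given below, assuming the standard formulas (the exact expressions used by the authors are not reproduced in this section):

```python
import math

def aard_pct(y, yhat):
    # average absolute relative deviation, in percent
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(y, yhat)) / len(y)

def rae_pct(y, yhat):
    # relative absolute error: total absolute error over total deviation from the mean
    ybar = sum(y) / len(y)
    return 100 * sum(abs(a - p) for a, p in zip(y, yhat)) / sum(abs(a - ybar) for a in y)

def rmse(y, yhat):
    # root mean squared error
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, yhat)) / len(y))

def r_value(y, yhat):
    # Pearson correlation between measured and predicted values
    n = len(y)
    my, mp = sum(y) / n, sum(yhat) / n
    num = sum((a - my) * (p - mp) for a, p in zip(y, yhat))
    den = math.sqrt(sum((a - my) ** 2 for a in y) * sum((p - mp) ** 2 for p in yhat))
    return num / den
```

Note that AARD% weights every sample equally by its relative error, whereas RAE% and RMSE are dominated by the largest absolute errors, which is why a model can rank differently on the two families of indices.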
Table 4. The relevancy between agglomeration size and independent variables.
| Variable | Pearson Coefficient | Meaning | Relevancy (%) |
|---|---|---|---|
| Alumina dose (vol%) | 0.0427 | Direct relation | 15.0 |
| Surfactant concentration (wt%) | −0.0276 | Inverse relation | 9.7 |
| pH (-) | 0.0183 | Direct relation | 6.4 |
| Ultrasonic time (min) | −0.0079 | Inverse relation | 2.8 |
| Ultrasonic power (W) | 0.0719 | Direct relation | 25.3 |
| Ultrasonic frequency (kHz) | −0.1088 | Inverse relation | 38.3 |
| Temperature (K) | 0.0071 | Direct relation | 2.5 |

Share and Cite

MDPI and ACS Style

Vaferi, B.; Dehbashi, M.; Alibak, A.H. Cutting-Edge Machine Learning Techniques for Accurate Prediction of Agglomeration Size in Water–Alumina Nanofluids. Symmetry 2024, 16, 804. https://doi.org/10.3390/sym16070804
