## *3.1. Case Study and Input Parameters*

Mahshahr, in Khuzestan Province in southwest Iran near the Persian Gulf, has been a site of petrochemical industry development over the past three decades. Various types of precast piles have been installed in completed and ongoing projects in this area. The native soil of the region is clay to silty clay with a plasticity index typically between 8 and 20 and SPT blow counts of 2 to 15 down to a depth of at least 30 m. To determine the pile bearing capacity precisely and to optimize the required pile embedment depths, a number of test piles were driven at different points on the sites. Pile Driving Analyzer (PDA) equipment was used to perform a dynamic load test (DLT) on all test piles at End-of-Driving (EOD) and Beginning-of-Restrike (BOR) conditions. The DLT program was performed in three phases to track the variation of pile capacity with time. The first phase of the DLT was carried out while the test piles were being driven (EOD). The next phase of tests was performed at different times after the initial driving of the piles. In addition, some axial static load tests (SLTs) were carried out, loading the piles to their ultimate capacities. The test results show that significant soil setup occurred, that is, the pile capacity increased with time after driving.

A database of 256 data samples was used to develop the pile bearing capacity models. Five independent variables are used to predict the target variable, pile setup. These variables describe pile and soil properties: pile diameter (PD, m), length of pile (LOP, m), initial bearing capacity (IC, kPa), time after EOD (T, days), and undrained shear strength (Su, kPa). The dependent variable in this database is the ultimate capacity (UC, kPa) of the pile, measured on site together with the other parameters listed above.

## *3.2. Statistical Information on the Data*

Five relevant factors (LOP, PD, IC, Su, and T) were measured to build a database for developing the intelligent model to forecast UC. The database is composed of 256 data samples, and statistical analysis was applied to it. Figure 4 presents the box plots of the input and output variables. The box plots are not symmetrical, the data are not normally distributed, and many data points fall beyond the upper and lower whiskers of the box plots. Because the underlying data distribution is unknown, these outliers cannot be eliminated. As shown in Figure 5, the Pearson correlation coefficient defined in Equation (1) is calculated for every pair of variables [45]. In the heatmap, the deeper the color, the stronger the positive correlation, whereas the lighter the color, the stronger the negative correlation. Among the five input variables, LOP and PD are negatively correlated with UC.

$$\sigma = \frac{\sum_{i=1}^{n} \left(X_i - \overline{X}\right) \left(Y_i - \overline{Y}\right)}{\sqrt{\sum_{i=1}^{n} \left(X_i - \overline{X}\right)^2} \sqrt{\sum_{i=1}^{n} \left(Y_i - \overline{Y}\right)^2}} \tag{1}$$

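where $\sigma$ is the Pearson correlation coefficient, $X_i$ and $Y_i$ are the $i$-th observations of the two variables, $\overline{X}$ and $\overline{Y}$ are their mean values, and $n$ is the total number of data points.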
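As an illustration of Equation (1), the following is a minimal sketch of how the pairwise coefficients behind a heatmap such as Figure 5 could be computed. The function names `pearson` and `correlation_matrix` are hypothetical, and the numbers in the example are synthetic placeholders, not values from the study database.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient, written term by term as in Equation (1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # X_i minus the mean of X
    dy = y - y.mean()  # Y_i minus the mean of Y
    numerator = np.sum(dx * dy)
    denominator = np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2))
    return float(numerator / denominator)


def correlation_matrix(columns):
    """Pairwise coefficients for a dict mapping variable name -> 1-D sequence."""
    names = list(columns)
    matrix = np.array(
        [[pearson(columns[a], columns[b]) for b in names] for a in names]
    )
    return names, matrix


# Synthetic placeholder data (NOT the 256-sample database from the study).
rng = np.random.default_rng(0)
t = rng.uniform(1.0, 30.0, size=50)
demo = {
    "T": t,
    "UC": 100.0 + 5.0 * t + rng.normal(0.0, 10.0, size=50),
}
names, matrix = correlation_matrix(demo)
print(names)
print(np.round(matrix, 3))
```

The result should agree with `numpy.corrcoef` up to floating-point rounding; the explicit form is shown only to mirror the terms of Equation (1).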

**Figure 4.** Box plots of the five input parameters and the output parameter.
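The points lying beyond the whiskers in box plots like these can be counted with a short sketch such as the one below. It assumes the common Tukey convention, in which the whiskers extend to 1.5 times the interquartile range beyond the quartiles; the text does not state which convention was used for Figure 4, and the function name `whisker_outliers` and the sample values are hypothetical.

```python
import numpy as np


def whisker_outliers(values, k=1.5):
    """Return the lower fence, upper fence, and points beyond them.

    Assumes Tukey fences at Q1 - k*IQR and Q3 + k*IQR (k = 1.5 by default).
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    mask = (values < lower) | (values > upper)
    return float(lower), float(upper), values[mask]


# Made-up sample for illustration only (not data from the study).
sample = [1, 2, 3, 4, 5, 100]
lower, upper, outliers = whisker_outliers(sample)
print(lower, upper, outliers)
```

Flagging points this way does not by itself justify removing them; as noted in the text, the outliers are retained because the underlying distribution is unknown.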

**Figure 5.** The heatmap of inputs and their correlations.
