1. Introduction
Grounding systems come in various forms and sizes, with their primary function being the safe dissipation of electric current into the surrounding soil. This dissipated current may originate from human activities (including direct current, alternating current, or transient current) or natural phenomena (such as lightning currents). Lightning currents [1], or transient currents in general, including those generated by switching operations [2], uniquely impact grounding systems. Specifically, when a grounding system is subjected to a transient current, only a portion of the grounding system effectively dissipates the current into the surrounding soil, while the remaining part plays a passive role. Over the years, numerous researchers have studied this phenomenon in grounding systems of varying complexity. For example, effective lengths of simple horizontal or vertical grounding electrodes are examined in [3,4] and, more recently, in [5,6,7,8], to name a few. More complex grounding electrode configurations and their effective areas are analyzed in [9] and, more recently, in [10,11,12]. In addition to analyzing the effect of this phenomenon, some researchers have developed approximation formulas to predict the area (or length) of the grounding system that participates in the dissipation of transient current depending on several input parameters such as soil resistivity, the rise time of the dissipated lightning current, and the peak value of the lightning current [9,13,14,15]. These approximation formulas typically use the least-squares method to fit the selected formula to the analytically computed data. In this paper, we improve upon this approach by providing our own approximation formulas and by additionally using advanced machine learning (ML) regression models to achieve a much better fit.
In the scope of this paper, only one type of grounding system is considered: a radial-based grounding system as described in the international standard IEC 62305-3 [16]. This grounding system, the so-called type-A grounding system, is characterized by a set of horizontal ground electrodes, buried parallel to the Earth’s surface, all radiating from a central point. The simplest form of this grounding system, which consists of two grounding electrodes, is the object of observation in this paper. This configuration is mainly used in the lightning protection of relatively low, isolated objects, such as meteorological stations, that have only one down-conductor. The standard [16] states that in the type-A arrangement, the minimum number of ground electrodes should be one for each down-conductor and at least two for the entire lightning protection system. Note that this grounding electrode with a central point lightning injection is also observed and analyzed in [11,15]. In this paper, we model the mentioned type-A grounding system using a well-tested frequency-domain-based algorithm [17] and subject the grounding system model to a number of different lightning strikes with the purpose of finding the effective length of the grounding system. To further refine our effective length analysis, we vary two additional input parameters alongside the lightning current: the burial depth of the grounding system and the soil resistivity. Note that the influence of the burial depth of the grounding electrode on its effective length was not analyzed in any of the available references. This is to be expected, since the electrodes are almost always buried at a standard prescribed depth. However, in practice the actual burial depth may deviate from the prescribed depth, so we included it as an input parameter to our model. It is also important to note that in our analysis we disregarded the frequency dependence of the soil parameters and the beneficial effect of the soil ionization phenomenon [18]. This nonlinear effect, in reality, reduces the effective length of the grounding electrode, so our analysis conservatively overestimates the effective length values. Because soil ionization is neglected, the peak value of the lightning current does not influence the effective length in our approximation. In addition, we assumed a constant soil permittivity value in all considered cases since, in our preliminary analysis, we found that the effect of this parameter on the solution was less significant than the effect of the chosen input parameters. By varying the chosen parameters (soil resistivity, rise time of the lightning current, and the burial depth of the grounding system), we aim to generate a comprehensive dataset containing effective length values of the type-A grounding system for various input parameter combinations. This dataset serves as a foundation for applying various regression algorithms with the aim of developing predictive models that enable users to accurately estimate the effective length of the type-A grounding system in most practical cases.
To address this research goal, we explored multiple approaches. Initially, we tested various mathematical regression functions on the input dataset, utilizing nonlinear least-squares estimation to optimize the function parameters for a better fit. Through this process, we identified two regression functions that provided satisfactory results and are relatively easy to calculate. To further improve our approximations, we then applied a range of ML regression models, training them on the dataset. Our testing revealed that only a subset of these ML models delivered superior approximation results. In this paper, we will present only the best-performing ML regression model to avoid redundancy. We validated both the mathematical regression functions and the ML regression model using standard regression quality metrics. This will be elaborated on in the corresponding section.
The paper is organized as follows. In the second section, following the Introduction, we provide a summary of the methodological approach used in this study. This includes a description of the observed type-A grounding system [16], the definition of the effective length used in the paper, and an explanation of the iterative procedure for determining the effective lengths of the type-A grounding system for various input parameters. The third section features an overview of the input parameters that were varied to generate the effective length dataset. It includes a detailed description of each input parameter along with its observation interval and an initial feature analysis to identify which parameters most significantly impact the effective length. Additionally, this section describes the resulting dataset. In the fourth section, on the basis of the described dataset, we develop two simple mathematical regression functions followed by a detailed regression quality analysis. These formulas are intended for quick engineering use. It is important to emphasize that the developed formulas take into account the burial depth of the grounding electrodes, unlike the formulas available in the literature. In the fifth section, we explore the possibility of applying a specific ML regression model to our dataset in order to further increase the accuracy of the effective length approximation. The Gaussian process regression model was used since it performs extremely well on smooth functions, especially when a suitable kernel is selected. The model is trained and tested in that section, and the optimized values of the model hyperparameters are provided so that the results may be replicated without the training procedure.
3. Analysis of Input Parameters and an Overview of the Effective Length Dataset
Using the previously described methodology, we performed a series of numerical simulations to determine the influence of various input parameters (features) on the effective length of the type-A grounding system. Specifically, we performed a total of 880 simulations, varying three key features: soil resistivity, burial depth of the ground electrode, and the rise time of the lightning current dissipated into the surrounding soil. The results of the simulations are available for download as a public dataset [24].
In the international standard [16], a type-A grounding system is recommended for use in soils with resistivity values up to 3000 Ωm, effectively establishing an upper limit for our selected feature set. As for the lower limit of our soil resistivity feature set, reference [25] and several other sources indicate that soils originating from the Cretaceous period exhibit “unusually low” resistivity values, with loam and clay reaching as low as 10 Ωm. With the lower and upper limits established, the other values in our soil resistivity feature set were selected (more or less) uniformly within the defined interval, resulting in a set of 11 soil resistivity values:
$\rho$ = {10, 50, 100, 250, 500, 750, 1000, 1500, 2000, 2500, 3000} Ωm.
Note that more soil resistivity values were selected in the interval up to 1000 Ωm for our feature set, which is consistent with reference [25], where a larger number of soil types are also listed in the mentioned interval. Note that in [25], soil types with resistivity values of 1000 Ωm and 3000 Ωm are characterized as having “high” and “very high” resistivity, respectively, indicating that soils with such resistivities are less common, which is consistent with our feature set selection.
The second feature we varied in the mentioned simulations that potentially influenced the effective length of the grounding electrode was the electrode burial depth. Similar to the soil resistivity value selection, we chose a realistic interval of electrode burial depths, ranging from 0.25 m to 2 m (see, for example, [26]). This resulted in a set of eight type-A grounding system burial depths:
d = {0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0} m.
The third and final feature we varied in the simulations was the rise time of the lightning current that is dissipated into the surrounding soil via the type-A grounding system. Based on the recent statistical analysis of real lightning strikes in [27] and the rise times prescribed for positive and negative subsequent strokes in the international standard [28], the following set of ten rise time values was selected:
{1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0} μs.
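The three feature sets above multiply out to the 880 simulated cases (11 × 8 × 10). A minimal sketch of how such a full-factorial grid could be enumerated is given below; the variable names are illustrative assumptions, and the actual dataset [24] may order its rows differently.

```python
from itertools import product

# Feature sets as listed in the text (variable names are illustrative).
rho_values = [10, 50, 100, 250, 500, 750, 1000, 1500, 2000, 2500, 3000]  # soil resistivity, ohm-metres
depth_values = [0.25 * k for k in range(1, 9)]                            # burial depth, metres
rise_time_values = [float(k) for k in range(1, 11)]                       # rise time, microseconds

# Every combination of the three features is one simulation case.
grid = list(product(rho_values, depth_values, rise_time_values))
print(len(grid))  # number of feature combinations
```

Each tuple in `grid` holds one (resistivity, depth, rise time) combination that would be passed to the effective length procedure.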
An excerpt from the dataset [24] containing all combinations of previously described features along with the computed values of the type-A grounding system effective length (of each electrode) is provided in Table 1. A visualization of these data is slightly problematic since the dataset contains three feature columns and one response column, which would correspond to a 4D plot. Although possible, we found that the readers would best benefit if we isolated, for example, all the data for the burial depth of 0.5 m. Note that a 3D surface plot has been generated in Figure 3 instead of only the data points. This visual interpolation was performed solely to improve the reader’s viewing experience.
After performing all 880 simulations and determining the effective lengths of each type-A grounding electrode, a preliminary analysis of feature importance was performed using the Minimum Redundancy Maximum Relevance (MRMR) algorithm [29]. As expected, it was determined that not every selected feature has a significant impact on the effective length of the grounding electrode. Importance scores of all three selected features on the effective length are depicted in Figure 4. Similar feature importance was also obtained by the ANOVA F-test, but these results are not provided here to avoid redundancy.
Observing Figure 4, it can be seen that the soil resistivity is by far the most important feature influencing the effective length of the grounding electrode, with an MRMR importance score of 1.7347. Rise time trails significantly behind, with a score of 0.0659, whereas the burial depth of the grounding electrode is practically inconsequential as a factor affecting its effective length, as evident from its MRMR importance score of 0.0015.
4. Nonlinear Regression of the Type-A Grounding System Effective Length
The first approach to developing a predictive interpolant model of the effective length of each electrode based on the available dataset [24] is the selection of a suitable mathematical model function with which we try to approximate the effect of the independent variables (features) on the electrode effective length. In order to test our approximation function properly, we first divided the 880 simulation results into a randomly chosen training set comprising 90% of the simulation results, with the remaining 10% forming the test set. The main idea, as usual, is to use the training set to perform the regression procedure and then use the test set to validate the quality of the produced approximation function and its estimated coefficients. After a number of numerical and mathematical experiments, a simple product of power functions was selected as the model function (Equation (2)) for the prediction of the effective length of each electrode of the type-A grounding system. We fitted the selected model function to the input data from the training set using a nonlinear least-squares regression algorithm [30] and thus obtained optimized values of its unknown regression coefficients. These estimated coefficients, along with their standard errors, are given in Table 2.
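To illustrate the train/test workflow for a product-of-power-functions model, the sketch below splits 880 samples 90/10 and estimates the coefficients by ordinary least squares on log-transformed data. This log-linearization is a swapped-in simplification, not the nonlinear least-squares algorithm of [30]; it is often used to obtain starting values for such a fit. The model form `c0 * rho**c1 * t**c2 * d**c3`, the coefficient values in `TRUE_COEF`, and the synthetic targets are all hypothetical stand-ins for the real dataset.

```python
import math
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_power_product(features, targets):
    """Fit ln(y) = ln(c0) + c1 ln(x1) + c2 ln(x2) + c3 ln(x3) via normal equations."""
    rows = [[1.0] + [math.log(v) for v in x] for x in features]
    logs = [math.log(y) for y in targets]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * y for r, y in zip(rows, logs)) for i in range(k)]
    coef = solve_linear(ata, atb)
    coef[0] = math.exp(coef[0])  # back-transform the multiplicative constant
    return coef

def predict(coef, x):
    c0, c1, c2, c3 = coef
    return c0 * x[0] ** c1 * x[1] ** c2 * x[2] ** c3

# Hypothetical coefficients used only to generate noise-free synthetic targets.
TRUE_COEF = (2.0, 0.45, 0.35, -0.02)
rho_values = [10, 50, 100, 250, 500, 750, 1000, 1500, 2000, 2500, 3000]
rise_values = [float(k) for k in range(1, 11)]
depth_values = [0.25 * k for k in range(1, 9)]
samples = [(r, t, d) for r in rho_values for t in rise_values for d in depth_values]
targets = [predict(TRUE_COEF, s) for s in samples]

# Random 90/10 split of the 880 samples.
indices = list(range(len(samples)))
random.Random(0).shuffle(indices)
n_train = len(indices) * 9 // 10
train_idx, test_idx = indices[:n_train], indices[n_train:]

coef = fit_power_product([samples[i] for i in train_idx],
                         [targets[i] for i in train_idx])
test_errors = [targets[i] - predict(coef, samples[i]) for i in test_idx]
```

Note that least squares on logarithms minimizes relative rather than absolute errors, so on real (noisy) data its coefficients will generally differ from those of a direct nonlinear fit.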
In order to confirm the validity of the derived regression function, we conducted a comprehensive analysis of our regression model’s performance using several regression quality metrics. This included creating scatter plots of actual vs. predicted values, as well as a residuals plot for both the training and test sets. Additionally, we evaluated three standard regression metrics [31,32]: the Root Mean Squared Error (RMSE), the $R^2$ value, and the Mean Absolute Error (MAE), for both datasets.
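For reference, the three metrics can be computed directly from their standard definitions. The sketch below is a minimal illustration; the function name and the four sample values are made up for the example and are not taken from the dataset.

```python
import math

def regression_metrics(actual, predicted):
    """Return RMSE, MAE and the coefficient of determination R^2."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in residuals)                 # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }

# Illustrative effective lengths in metres (made-up numbers).
actual = [20.0, 40.0, 60.0, 80.0]
predicted = [22.0, 38.0, 63.0, 79.0]
metrics = regression_metrics(actual, predicted)
```

RMSE penalizes large residuals more heavily than MAE, which is why both are usually reported together.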
Figure 5a,b depict the actual vs. predicted values of the effective length of the type-A grounding system obtained using Equation (2) for the training set and the test set, respectively. Similarly, Figure 6a,b depict the residuals of predicted values obtained using Equation (2) for the training set and the test set, respectively. Observing Figure 5a for this simplest approximation function, when the regression formula given by Equation (2) is applied to the training set with the optimized coefficient values given in Table 2, the produced results deviate moderately from the perfect-score line. They are, nevertheless, relatively accurate, although the data show clear signs of heteroscedasticity [33], i.e., heterogeneity of variance. The moderately accurate results provided by the regression function are further validated on the test set (Figure 5b), where the regression function defined by Equation (2) performs similarly. A different visualization of this model error is presented in Figure 6a,b, where the residuals of the regression function are depicted relative to the predicted values. It is evident that, for this case, the residuals reach up to ±10 m, especially for greater values of the grounding system’s effective length. Note that the regression function displays heteroscedasticity in both plot types: the variance progressively increases as the predicted values increase. A Breusch–Pagan test was performed for the regression function given by Equation (2) with parameters given in Table 2, and the p-values obtained for both the training set and the test set were well below the usual cut-off value of 0.05. Attempts to increase these p-values using the usual methods, such as input data transformation, yielded little to no effect, although this was to be expected considering the simplicity of the regression function in question.
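The idea behind the Breusch–Pagan test can be sketched in a few lines. The version below is a simplified assumption: it uses the studentized (n·R²) form of the statistic and takes the predicted value as the only explanatory variable of the auxiliary regression, so the statistic has one degree of freedom. The paper does not state which regressors or which variant were used, and the residual patterns below are synthetic.

```python
import math

def breusch_pagan_single(predicted, residuals):
    """Studentized Breusch-Pagan statistic with one explanatory variable.

    Regresses squared residuals on the predicted values and returns
    (LM, p_value), where LM = n * R^2 is compared with chi-square(1).
    """
    n = len(residuals)
    squared = [e * e for e in residuals]
    mean_x = sum(predicted) / n
    mean_y = sum(squared) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    syy = sum((y - mean_y) ** 2 for y in squared)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, squared))
    r2 = (sxy * sxy) / (sxx * syy) if sxx > 0 and syy > 0 else 0.0
    lm = n * r2
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

predicted = [float(i) for i in range(20)]
# Residual spread grows with the prediction: heteroscedastic pattern.
growing = [(-1) ** i * 0.1 * (i + 1) for i in range(20)]
# Residual spread unrelated to the prediction: homoscedastic pattern.
steady = [1.0 if i % 2 == 0 else -2.0 for i in range(20)]

lm_growing, p_growing = breusch_pagan_single(predicted, growing)
lm_steady, p_steady = breusch_pagan_single(predicted, steady)
```

A p-value below 0.05 rejects the hypothesis of constant residual variance, which is how the result reported above is read.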
The numerical regression quality metrics mentioned before confirm the previous visual analysis (Table 3). The $R^2$ value of the regression function is high for both the training and test sets, which is to be expected given that the results cluster around the perfect-score line. The average error, measured by the RMSE, is consistent across both sets, at approximately 4 m. In conclusion, this simplest regression function yields moderately satisfactory results.
In addition to this simplest regression formula, we also fitted the input data using a slightly more complex model function, a sum of two products of power functions (Equation (3)), in the hope of obtaining more accurate results. As before, we fitted the selected model function to the input data from the training set using a nonlinear least-squares regression algorithm and obtained optimized values of the unknown coefficients. These values, along with their standard errors, are given in Table 4.
Figure 7a,b depict the actual vs. predicted values obtained using Equation (3) for the training set and the test set, respectively. Figure 8a,b depict the residuals vs. predicted results obtained using Equation (3) for the training set and the test set, respectively. It is immediately noticeable that by introducing a sum of two products of power functions, more accurate predicted results were obtained. Results are clustered much closer to the perfect-score line for both the training set and the test set (Figure 7a,b). This is also seen in the residuals plot (Figure 8a,b), where the maximum residual values peak at approximately 4 m. Note that heteroscedasticity is still present in this model, which is observable from the previous figures and from the p-values of the Breusch–Pagan test, which remain below 0.05 for both sets (0.000237 for the test set).
Again, the regression quality metrics confirm this visual analysis (Table 5): the $R^2$ value is closer to 1, and the RMSE is almost three times lower for both the training and test sets. Similarly, other metrics also reflect this level of approximation quality.
5. Using Advanced Supervised ML Regression Models for a Better Fit
In an attempt to enhance the accuracy of the predictive model and, possibly, reduce the heteroscedasticity present in the models from the previous section, we tested a comprehensive range of supervised ML regression algorithms. These included various types of regression trees (fine, medium, and coarse) [34], support vector machines with diverse kernels such as quadratic, cubic, and Gaussian kernels of varying coarseness [35], Gaussian process (GP) regression models with different kernels [36], kernel approximation regression models, ensembles of trees [37], and neural networks [38]. A random search procedure was used to optimize the hyperparameters of each tested ML regression model. We obtained the best results using the GP regression models, followed closely by the neural network approach, whereas the other ML algorithms did not prove suitable for our interpolation problem. Therefore, to avoid redundancy, we present only the best fit, obtained using GP regression, in this section.
The mentioned GP regression is a well-known non-parametric Bayesian approach to regression that utilizes stochastic Gaussian processes and is generally considered an excellent choice for multidimensional interpolation problems. It completely eliminates the usage of the least-squares method in the regression analysis, which could possibly eliminate the previously observed heteroscedasticity of predicted results. In this paper, the GP regression model is implemented using the scikit-learn library [39].
Basically, a GP regression model applied to any input data as an approximation model can be briefly described as a combination of a prior mean function and a covariance (kernel) function:
$$f(\mathbf{x}) \sim \mathcal{GP}\left(m(\mathbf{x}),\, k(\mathbf{x}, \mathbf{x}')\right) \tag{4}$$
where $m(\mathbf{x})$ represents the mean function, $k(\mathbf{x}, \mathbf{x}')$ represents the covariance (kernel) function, and $\mathbf{x}$ and $\mathbf{x}'$ represent the input data points. In our analysis, we assumed that the prior mean is the training data’s mean. The kernel function $k$ accepts two points in the input space and returns a measure of how similar these points are, based on some notion of distance between them. A whole array of kernel functions was tested in this paper, including the radial basis function (RBF) kernel, Matérn kernel, rational quadratic kernel, and Exp-Sine-Squared kernel, but the best results were obtained using the non-isotropic Matérn kernel function.
The Matérn covariance (kernel) function [40] is a popular choice in GP regression analysis due to its flexibility in controlling the smoothness of the function. Unlike the RBF kernel, it does not over-smooth the predicted values. Moreover, its non-isotropic version can additionally handle different length scales for different dimensions. The general form of the Matérn kernel function is:
$$k(r) = \sigma^2\, \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\, r}{l}\right)^{\nu} K_{\nu}\!\left(\frac{\sqrt{2\nu}\, r}{l}\right) \tag{5}$$
where $\sigma^2$ is the kernel variance, $l$ is the length-scale parameter, $r$ is the Euclidean distance between input points, $K_{\nu}$ is a modified Bessel function of the second kind of order $\nu$, and $\Gamma$ is the Gamma function. Modification of the $\nu$ parameter significantly changes the kernel function. For example, as $\nu$ approaches infinity, the Matérn kernel converges to the RBF kernel. Of special interest are two values of the parameter $\nu$ for which the modified Bessel function is eliminated from the kernel: $\nu = 3/2$ and $\nu = 5/2$. For these cases, the modified Bessel function vanishes from the kernel expression, and the kernel yields very good results without over-smoothing the predicted values. After the numerical testing procedure, we found that the Matérn kernel with $\nu = 3/2$ yielded the best results when applied to our dataset:
$$k(r) = \sigma^2 \left(1 + \frac{\sqrt{3}\, r}{l}\right) \exp\!\left(-\frac{\sqrt{3}\, r}{l}\right) \tag{6}$$
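As a brief check of why the modified Bessel function drops out for this half-integer order, the following worked reduction may help. It assumes the standard Matérn form, with $\sigma^2$ the kernel variance, $l$ the length scale, and $r$ the distance between input points, and it introduces the shorthand $z = \sqrt{3}\,r/l$, which is notation chosen for this sketch only.

```latex
% Standard identities used for \nu = 3/2:
%   \Gamma(3/2) = \sqrt{\pi}/2
%   K_{3/2}(z)  = \sqrt{\pi/(2z)}\, e^{-z} (1 + 1/z)
\begin{align*}
k(r) &= \sigma^2\,\frac{2^{1-\nu}}{\Gamma(\nu)}\, z^{\nu} K_{\nu}(z),
        \qquad \nu = \tfrac{3}{2},\quad z = \frac{\sqrt{2\nu}\,r}{l} = \frac{\sqrt{3}\,r}{l} \\
     &= \sigma^2\,\frac{2^{-1/2}}{\sqrt{\pi}/2}\; z^{3/2}\,\sqrt{\frac{\pi}{2z}}\; e^{-z}\left(1 + \frac{1}{z}\right) \\
     &= \sigma^2\,\sqrt{\tfrac{2}{\pi}}\,\sqrt{\tfrac{\pi}{2}}\; z\, e^{-z}\left(1 + \frac{1}{z}\right) \\
     &= \sigma^2\,(1 + z)\, e^{-z}
\end{align*}
```

The final line is the closed-form Matérn 3/2 kernel: a linear term multiplied by an exponential decay in the scaled distance.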
The length-scale parameter $l$ can be set differently for each dimension of the input data, which makes the kernel anisotropic. We use this anisotropic variant of the Matérn 3/2 kernel given by Equation (6).
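A minimal sketch of the anisotropic Matérn 3/2 kernel is shown below, written from the standard closed form. The function name, the ordering of the features, and the length-scale values are hypothetical and serve only to show how per-dimension length scales enter the distance.

```python
import math

def matern32(x_a, x_b, length_scales, variance=1.0):
    """Anisotropic Matern 3/2 kernel value for two input points.

    Each coordinate difference is divided by its own length scale
    before the Euclidean distance is taken.
    """
    scaled_distance = math.sqrt(
        sum(((a - b) / l) ** 2 for a, b, l in zip(x_a, x_b, length_scales))
    )
    z = math.sqrt(3.0) * scaled_distance
    return variance * (1.0 + z) * math.exp(-z)

# Hypothetical length scales for (resistivity, rise time, burial depth).
length_scales = (1000.0, 5.0, 50.0)
point = (500.0, 5.0, 0.5)
near = (600.0, 5.0, 0.5)
far = (2500.0, 5.0, 0.5)

k_same = matern32(point, point, length_scales, variance=2.0)
k_near = matern32(point, near, length_scales, variance=2.0)
k_far = matern32(point, far, length_scales, variance=2.0)
```

A large length scale in one dimension means that changes along that dimension barely reduce the similarity, which is how an anisotropic kernel can down-weight a weakly influential feature.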
The hyperparameters $\sigma^2$ and $l$ of the selected kernel are optimized in the scikit-learn library by maximizing the log-marginal-likelihood (LML). Since the LML may have multiple local optima, we conducted a number of optimizer restarts and kept the best hyperparameter values found. The first optimizer run starts from the initial hyperparameter values of the kernel, whereas the subsequent runs start from hyperparameter values chosen randomly from the range of allowed values.
The training of the GP regression model and the optimization of its hyperparameters were performed, as before, on the training set containing 90% of the dataset samples. This procedure yielded the optimal values of the kernel variance $\sigma^2$ and the length scales $l$ of the anisotropic Matérn 3/2 kernel for our dataset.
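The setup described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the exact configuration used in the paper: the synthetic targets, the standardization of the features, the hyperparameter bounds, the `alpha` value, and the number of restarts are all choices made for this example. `normalize_y=True` makes the prior mean equal to the training data’s mean, and `n_restarts_optimizer` triggers the repeated LML optimization.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern

rng = np.random.default_rng(0)
n_samples = 80

# Synthetic stand-in for (resistivity, rise time, burial depth) samples.
X = np.column_stack([
    rng.uniform(10.0, 3000.0, n_samples),
    rng.uniform(1.0, 10.0, n_samples),
    rng.uniform(0.25, 2.0, n_samples),
])
# Hypothetical smooth response used only to have something to fit.
y = 2.0 * X[:, 0] ** 0.45 * X[:, 1] ** 0.35 * X[:, 2] ** -0.02

# 90/10 split, then standardize features with training statistics (assumption).
n_train = int(0.9 * n_samples)
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
Z_train, Z_test = (X_train - mean) / std, (X_test - mean) / std

# Variance term times an anisotropic Matern 3/2 kernel (one length scale per feature).
kernel = ConstantKernel(1.0, (1e-3, 1e3)) * Matern(
    length_scale=[1.0, 1.0, 1.0], length_scale_bounds=(1e-2, 1e2), nu=1.5
)
gpr = GaussianProcessRegressor(
    kernel=kernel,
    alpha=1e-8,              # small jitter for numerical stability
    normalize_y=True,        # prior mean = training data's mean
    n_restarts_optimizer=3,  # extra LML optimizations from random starts
    random_state=0,
)
gpr.fit(Z_train, y_train)

train_r2 = gpr.score(Z_train, y_train)
test_r2 = gpr.score(Z_test, y_test)
fitted_length_scales = gpr.kernel_.k2.length_scale
```

After fitting, `gpr.kernel_` holds the optimized hyperparameters, so a reader who is given those values could rebuild the kernel with fixed parameters and skip the optimization step.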
Similar to the previous section, the trained GP regression model was validated on the test set containing the remaining 10% of dataset samples. Figure 9a,b depict the actual vs. predicted values obtained using the trained GP regression model for the training set and the test set, respectively. Figure 10a,b depict the residuals vs. predicted results obtained using the trained GP regression model for the training set and the test set, respectively. It is immediately noticeable that introducing an advanced regression model yielded an even better fit, with maximum residual values peaking at approximately 1 m.
The regression quality metrics also reflect this visual analysis (Table 6): the $R^2$ value is much closer to 1, and the RMSE is almost nine times lower for both the training and test sets. Although the predicted results appear homoscedastic, a Breusch–Pagan test was again performed to confirm the elimination of the previously observed heteroscedasticity. The trained and optimized GP regression model yielded p-values of 0.18147 for the training set and 0.087989 for the test set, both of which are above the usual cut-off value of 0.05. Therefore, the test gives no evidence of heteroscedasticity when the trained GP regression model is used to interpolate effective length values within the prescribed input data intervals.
6. Discussion and Conclusions
In this section, we briefly summarize the main features of the three regression procedures previously presented. We developed two mathematical regression formulas and an ML regression model based on GP to predict the effective length of a subset of the type-A grounding system with varying degrees of accuracy. All developed procedures are interpolant predictive models, which are valid for the following ranges of input parameters: soil resistivity from 10 Ωm to 3000 Ωm, lightning current rise time from 1 μs to 10 μs, and burial depth from 0.25 m to 2 m. We believe these intervals encompass most practical cases of potential use.
The analyzed subset of the type-A grounding system consists of two opposing grounding electrodes. This configuration is used mainly in grounding systems of very small, isolated objects such as hydrological or meteorological stations. These kinds of objects, due to their size, most often have only one down-conductor, whereas the standard [16] prescribes a minimum of two ground electrodes per grounding system, hence the centrally fed system observed here. The standard [16] also prescribes curves of minimal length $l_1$ of type-A grounding electrodes depending only on the soil resistivity and the Lightning Protection Level (LPL) chosen. In our paper, we analyzed the effective lengths of the electrodes for different values of soil resistivity, rise time, and even burial depth for this subset of the type-A grounding system. Although a comparison between these two lengths would be interesting, it is somewhat difficult to perform since the standard provides, for example, a curve of minimal length for LPL I independent of the lightning current rise time and the burial depth. If we, for example, consider our subset of the type-A grounding system buried at a depth of 0.5 m in a soil of resistivity 3000 Ωm, our analysis yields effective lengths of each of the electrodes ranging from 78.25 m for a 1 μs rise time to 134 m for a 10 μs rise time. The standard [16] yields a minimal length $l_1$ of the grounding electrode of 80 m for our chosen scenario for all cases. On the other hand, our proposed regression formula and the ML regression model based on GP provide the readers with optimal lengths of the grounding electrodes for the considered subset of the type-A grounding system.
All the presented regression approaches yield satisfactory results, especially when the accuracy of the prediction is put into context with the complexity of usage of that particular approach. Naturally, the simplest regression formula, which is the easiest to use, produces results with the highest deviations between the predicted effective length values and the actual values, although these deviations are not substantial. The slightly more complex regression formula produces significantly lower deviations, as demonstrated in the previous sections. Both formulas are intended for quick engineering estimates.
In contrast, the last approach, a GP-based ML regression model, is designed for more complex estimations of the effective length of the considered grounding system where accuracy is more important. This model provides significantly more accurate prediction results compared to the other two models, as seen in the previous section. We presented only one ML regression model in the paper—the one that consistently yielded the best prediction results in our numerical experiments. Many other models were tested but were intentionally excluded from the paper to maintain clarity and avoid clutter.
Finally, in our future work we plan to extend our effective length analysis to complex grounding systems. These include, for example, end-fed simple grounding rods as well as various interpretations of the type-A and type-B grounding systems according to [16]. In addition to this, we will explore the effective area analysis of practical grounding systems that feature a more or less typical layout. In our opinion, this would benefit engineers when designing and sizing grounding grids in a number of scenarios in which it is difficult to achieve a satisfactorily low grounding resistance.