Article

Modeling Height–Diameter Relationship Using Artificial Neural Networks for Durango Pine (Pinus durangensis Martínez) Species in Mexico

by Yuduan Ou 1 and Gerónimo Quiñónez-Barraza 2,*
1 College of Coastal Agricultural Sciences, Guangdong Ocean University, Zhanjiang 524088, China
2 Campo Experimental Valle del Guadiana, Instituto Nacional de Investigaciones Forestales, Agrícolas y Pecuarias (INIFAP), Carretera Durango—Mezquital km 4.5, Durango 34170, Dgo., Mexico
* Author to whom correspondence should be addressed.
Forests 2023, 14(8), 1544; https://doi.org/10.3390/f14081544
Submission received: 8 July 2023 / Revised: 25 July 2023 / Accepted: 26 July 2023 / Published: 28 July 2023
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)

Abstract: The total tree height (h) and diameter at breast height (dbh) relationship is an essential tool in forest management and planning. Nonlinear mixed effect modeling (NLMEM) has been extensively used, and lately the artificial neural network (ANN), particularly the resilient backpropagation artificial neural network (RBPANN), has become a trending approach for modeling this relationship. The objective of this study was to evaluate and contrast the NLMEM and RBPANN approaches for modeling the h-dbh relationship for the Durango pine species (Pinus durangensis Martínez) on both training and testing datasets in a mixed-species forest in Mexico. Knowledge of this relationship is important for forest management and planning in Mexican forestry. The total dataset comprised 1000 plots (0.10 ha each; 11,472 measured trees) randomly selected from 14,390 temporary forest inventory plots, and it was randomly divided into two parts: 50% for training and 50% for testing. An unsupervised clustering analysis was used to group the dataset into 10 cluster-groups based on the k-means clustering method. The RBPANN was trained with the tangent hyperbolicus (RBPANN-tanh), softplus (RBPANN-softplus), and logistic (RBPANN-logistic) activation functions applied to the cross product of the covariates (neurons) and the weights. Different vectors of hidden layers were also tested during ANN training. For both training and testing, 10 classical statistics (e.g., RMSE, AIC, BIC, and logLik) were computed on the residual values to assess the approaches for the h-dbh relationship. In both training and testing, the ANN approach outperformed the NLMEM approach, and RBPANN-tanh performed best among the ANNs.

1. Introduction

Machine learning (ML) is a subset of artificial intelligence (AI) that focuses on the development of algorithms and models that enable a computer to learn from a specific dataset and make predictions or take actions without being explicitly programmed [1,2]. One of the most widely used ML techniques is the artificial neural network (ANN); the resilient backpropagation artificial neural network (RBPANN) performs supervised ML in multi-layer perceptrons, and its main principle is to eliminate the harmful influence of the size of the partial derivative on the weight step [3,4,5]. ANNs are computational models inspired by natural neurons, and they represent a generalization of mathematical models of human cognition or neural biology [1,6,7]. In ANNs, a training dataset is used to fit the network, while a separate testing dataset is used to evaluate the performance of the trained network [1,8,9].
One of the most important relationships in forest modeling is that between total tree height and diameter at breast height (h-dbh), which is usually applied in forest inventory or for height estimation in forest management and planning [10]. Knowledge of the h-dbh relationship is fundamental to both developing and applying many growth and yield models [11,12]. This relationship has mainly been studied with nonlinear mixed effect modeling (NLMEM), using fixed and random parameters for several species at the group level or under different ecological conditions [10,11,13,14,15,16]. Lately, this relationship has been studied with AI, with ML implemented through ANNs [7,17]. Other variables, such as crown width [18], biomass [19], volume [20], forest fire [21], and annual radial growth with competition indices [22], have also been studied with different ML algorithms. Occasionally, a clustering analysis based on unsupervised ML has been included to group similar data points together based on their inherent characteristics or similarities [1,23,24,25]. An unsupervised clustering analysis can identify patterns or structures in datasets to improve the fitted models in forest modeling. For AI algorithms, prediction is more important than inference. In this context, models or algorithms based on ANNs could give better estimations than the NLMEM approach, and this is worth reporting.
Specifically, in Mexican forestry the h-dbh relationship has been extensively studied with NLMEM for local and generalized models, occasionally including an unsupervised cluster analysis in the modeling [12,26,27]. NLMEM outperforms models fitted with ordinary least squares because its random parameters explain the variability between groups, sites, or ecological regions. Lately, ML algorithms have been used in forestry research, and their results outperform the NLMEM approach for the h-dbh relationship [7]. However, in Mexican forestry, ANN algorithms have not been applied to model the h-dbh relationship, and it is necessary to evaluate and compare these approaches. The main model used with NLMEM has been the Chapman–Richards model [28], which describes sigmoid growth as a function of age [29]. The significance of Durango pine (Pinus durangensis Martínez) extends beyond its ecological value; it also plays a pivotal role in wood production in the region of Durango, Mexico. Moreover, the state of Durango is the most important hub for timber production in all of Mexico. In the study area, Durango pine is the most frequent species in mixed-species forests.
Considering the above schemes and the context of AI in forestry research, this study aims to model the h-dbh relationship for the Durango pine species by NLMEM and ANNs with an unsupervised clustered dataset for training and testing. The algorithms were compared in both the training and testing phases, and conventional statistics such as the root mean square error, coefficient of determination, Akaike information criterion, Bayesian information criterion, and log-likelihood were used to compare the approaches. The resilient backpropagation ANN (RBPANN) was employed, and three activation functions were computed and evaluated: tangent hyperbolicus (RBPANN-tanh), softplus (RBPANN-softplus), and logistic (RBPANN-logistic), all trained by resilient backpropagation with maximum likelihood estimation. Finally, the primary objective of this research was to assess and contrast the efficacy of ANNs and NLMEM in modeling the h-dbh relationship for the Durango pine species. Additionally, a novel algorithm is proposed for accurately estimating total tree height using the diameter at breast height and cluster-groups as predictor variables.

2. Materials and Methods

2.1. Study Area

This study was developed in a forest community in Northern Mexico, specifically in the state of Durango. The forest community is called San Diego de Tezains (24°48.2′ to 25°19.5′ N and −105°52.2′ to −106°12.9′ W), and the total area is around 61,098 ha, of which 30,000 ha are used for forest management and timber harvesting. The main applied silvicultural treatments are based on continuous cover forestry (CCF) and rotation forest management (RFM) [30]. The silvicultural treatments for the CCF areas are based on selection, while RFM entails tree thinning and shelterwood cutting treatments with a 15-year forest cutting cycle [31]. The location of the study area is shown in Figure 1. The mean annual temperature ranges from 5 to 18 °C; the lowest temperature occurs in January (−6 °C) and the highest in May (28 °C). The altitude varies from 1500 to 3033 m. The mixed-species stands are represented by seven genera: Pinus, Quercus, Juniperus, Cupressus, Pseudotsuga, Arbutus, and Alnus. The main species are Pinus durangensis Martínez and Quercus sideroxyla Bonpl. Lately, improved forest management combines forest management and carbon credit offsets according to the Mexican Protocol developed by the Climate Action Reserve [32].

2.2. Dataset Description

The dataset came from temporary forest inventory plots with a random sampling design over a framework of 30,000 ha. A total of 14,390 temporary forest inventory plots were considered, and the Durango pine species was selected. A random sample of 1000 plots (0.10 ha each) was selected with the sampling R package [33], and 11,472 measured trees were considered. Firstly, an unsupervised clustering analysis was used to group the dataset [1,23] according to k-means clustering with the kmeans R function [33]. Ten clusters were generated according to density (N, trees per hectare), basal area (BA, m²), mean diameter (Dm, cm), mean total height (Hm, m), quadratic mean diameter (QMD, cm), altitude (A, m), slope (S, %), and aspect (As, categorical variable). All variables were standardized so that their values were bounded between 0.0 and 1.0 [19,34,35]. The standardization was performed according to Milligan and Cooper [36] and Equation (1).
$$Z = \frac{x - \mathrm{Min}(x)}{\mathrm{Max}(x) - \mathrm{Min}(x)}$$
where $Z$ is the standardized variable, $x$ is the variable, and $\mathrm{Min}(x)$ and $\mathrm{Max}(x)$ are the minimum and maximum values of $x$.
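As a concrete illustration, the min-max standardization of Equation (1) can be sketched in a few lines. The study itself used R; this pure-Python version is only an equivalent sketch, and the altitude values are hypothetical:

```python
def minmax_standardize(values):
    """Standardize a variable to [0, 1]: Z = (x - min(x)) / (max(x) - min(x))."""
    lo, hi = min(values), max(values)
    if hi == lo:               # constant variable: map everything to 0.0
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]

# Hypothetical altitude values (m) spanning the study range:
altitudes = [2032, 2500, 2978]
z = minmax_standardize(altitudes)
```

The extremes map to 0.0 and 1.0, so every clustering variable contributes on the same scale regardless of its original units.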
In the clustering analysis, ten clusters (k = 10) were generated, explaining 61.8812% of the variance. The dataset is shown in Figure 2 for the total dataset and the clusters. The unsupervised clustering analysis allows the objective grouping of the overall dataset and the fitting of the NLMEM approach with fixed and random parameters. These cluster-groups were also used in the ANN algorithms.
Employing an unsupervised clustering analysis to group the dataset for modeling the h-dbh relationship using either the NLMEM approach or ANNs yielded favorable outcomes. This clustering allowed for the standardization of variables and led to enhanced results for the random parameters in NLMEM [15]. Subsequently, the cluster-groups obtained were utilized as input variables to evaluate the ANN activation functions.
The site-specific variables for the clustering analysis are recorded as descriptive statistics in Table 1. All variables were standardized with Equation (1) to improve the clustering analysis [36]. The dataset encompassed an altitudinal range spanning from 2032 to 2978 m and land slopes varying from 0 to 96%. These diverse geographical and climatic conditions gave rise to varied vegetation compositions and soil characteristics within the study area.
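The cluster-groups were generated with the kmeans function in R; as an illustration of the underlying algorithm, a minimal pure-Python Lloyd's iteration might look as follows. This is a sketch of the method, not the authors' code:

```python
import math
import random

def kmeans(points, k, iters=100, seed=42):
    """Minimal Lloyd's k-means on equal-length feature vectors."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            new_centers.append(centers[c] if not members else
                               [sum(dim) / len(members) for dim in zip(*members)])
        if new_centers == centers:           # converged
            break
        centers = new_centers
    return labels, centers
```

Run on the standardized site variables with k = 10, the returned labels would play the role of the cluster-group covariate used later by both NLMEM and the ANNs.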
The total dataset (11,472 pairs of the h-dbh relationship) was randomly divided into two sets: 50% for training and 50% for testing or validation. The main statistics for both the training and testing datasets are shown in Table 2 for total tree height and diameter at breast height (h-dbh).
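The random 50/50 split can be sketched as follows (a Python sketch with hypothetical pairs; the study performed the split in R):

```python
import random

def train_test_split(pairs, train_frac=0.5, seed=1):
    """Randomly split (dbh, h) pairs into training and testing subsets."""
    rng = random.Random(seed)
    shuffled = pairs[:]                 # copy so the original order is preserved
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# 50/50 split, as in the study, on 100 hypothetical (dbh, h) pairs:
data = [(d, 0.5 * d) for d in range(10, 110)]
train, test = train_test_split(data)
```

Fixing the seed makes the split reproducible, which matters when comparing approaches trained on the same partition.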

2.3. Nonlinear Mixed Effect Modeling (NLMEM)

The base growth model developed by Richards [28] was used to model the nonlinear h-dbh relationship. This model is based on a sigmoid curve, and it is represented by Equation (2). This model has been extensively used in this kind of relationship [7,11,26].
$$h_{ij} = A_0 + \alpha_0 \left(1 - e^{-\alpha_1 \, dbh_{ij}}\right)^{\alpha_2} + e_{ij}$$
where $h_{ij}$ = total tree height of tree $j$ in plot $i$; $A_0$ = lower asymptote parameter; $\alpha_0$, $\alpha_1$, and $\alpha_2$ = upper asymptote, growth rate, and slope of growth parameters; $e$ = the exponential function; $dbh_{ij}$ = diameter at breast height of tree $j$ in plot $i$; and $e_{ij}$ = residual of tree $j$ in plot $i$. The $A_0$ parameter was fixed at 1.3, meaning the total tree height equals 1.3 m when the diameter at breast height equals 0, as noted by Fang and Bailey [37].
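A direct translation of Equation (2) into code is straightforward. The parameter values below are hypothetical, for illustration only, and are not the fitted estimates reported later:

```python
import math

def chapman_richards_height(dbh, a0, a1, a2, A0=1.3):
    """Predict total height (m): h = A0 + a0 * (1 - exp(-a1 * dbh)) ** a2."""
    return A0 + a0 * (1.0 - math.exp(-a1 * dbh)) ** a2

# Hypothetical parameters: asymptote 22 m above breast height,
# growth rate 0.05, shape 1.2.
h_at_25cm = chapman_richards_height(25.0, a0=22.0, a1=0.05, a2=1.2)
```

The curve passes through 1.3 m at dbh = 0 and approaches A0 + a0 asymptotically, which is the sigmoid behavior the model is chosen for.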
For NLMEM, the parameter vector of the nonlinear model was defined according to Pinheiro et al. [38] and summarized as follows [7,11,13] (Equation (3)):
$$\Phi_k = A_k \lambda + B_k b_k$$
where $\lambda$ is the $p \times 1$ vector of fixed parameters ($p$ is the number of fixed parameters in the model), $b_k$ is the $q \times 1$ vector of random effects associated with the $k$th cluster ($q$ is the number of random parameters in the model), and $A_k$ and $B_k$ are the design matrices of size $r \times p$ and $r \times q$ ($r$ is the total number of parameters in the model) for the fixed and random effects specific to each cluster. The residual vector ($e_{ij}$) and the random effect vector ($b_k$) are frequently assumed to be uncorrelated and normally distributed with mean zero and variance–covariance matrices $R_k$ and $D$, respectively.
The upper asymptote parameter ($\alpha_0$) was treated as a random parameter for each cluster ($\alpha_{0l}$, $l = 1, 2, \ldots, 10$), which explains the maximum relationship between h and dbh. The random effect vector represents the variability between clusters for the asymptote parameter.

2.4. Artificial Neural Network (ANN)

ANNs are inspired by early models of sensory processing by the brain. An ANN can easily be created by simulating a network of model neurons in a computer or specific programming language. By applying mathematical algorithms that mimic the processes of real neurons, we can make the network “learn” to solve many types of problems [39]. ANNs can learn by themselves. Because they share information-processing features with the human brain (nonlinearity, high parallelism, and the capability to generalize), this modeling technique has the potential to solve problems that are difficult to formalize, such as problems of a biological nature [7,39]. The resilient backpropagation artificial neural network (RBPANN) is a local adaptive learning scheme that performs supervised batch learning in multi-layer perceptrons. The basic principle of RBPANN is to eliminate the harmful influence of the size of the partial derivative on the weight steps [3,4].
According to Anastasiadis et al. [40], the RBP for an ANN employs a sign-based scheme to update the weights in order to eliminate harmful influences of the derivatives’ magnitude on the weight updates. The size of the update step along the weight direction is exclusively determined by a weight-specific “update-value” as follows:
$$\Delta w_{ij}^{(k)} = \begin{cases} -\Delta_{ij}^{(k)} & \text{if } \dfrac{\partial E^{(k)}}{\partial w_{ij}} > 0 \\[4pt] +\Delta_{ij}^{(k)} & \text{if } \dfrac{\partial E^{(k)}}{\partial w_{ij}} < 0 \\[4pt] 0 & \text{otherwise} \end{cases}$$
where $\partial E^{(k)}/\partial w_{ij}$ denotes the partial derivative of the batch error with respect to weight $w_{ij}$ at the $k$th iteration.
The second step of RBP learning is to determine the new update values, as follows:
$$\Delta_{ij}^{(k)} = \begin{cases} \eta^{+} \, \Delta_{ij}^{(k-1)} & \text{if } \dfrac{\partial E^{(k-1)}}{\partial w_{ij}} \cdot \dfrac{\partial E^{(k)}}{\partial w_{ij}} > 0 \\[4pt] \eta^{-} \, \Delta_{ij}^{(k-1)} & \text{if } \dfrac{\partial E^{(k-1)}}{\partial w_{ij}} \cdot \dfrac{\partial E^{(k)}}{\partial w_{ij}} < 0 \\[4pt] \Delta_{ij}^{(k-1)} & \text{otherwise} \end{cases}$$
where $0 < \eta^{-} < 1 < \eta^{+}$.
The total number of parameters in RBPANN is five: (i) the increase factor, set to $\eta^{+} = 1.2$; (ii) the decrease factor, set to $\eta^{-} = 0.5$; (iii) the initial update value, set to $\Delta_0 = 0.1$; (iv) the maximum step; and (v) the minimum step.
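The two update rules above can be combined into a single per-weight step. The sketch below follows the sign-based scheme of Equations (4) and (5); the step-size bounds `delta_max = 50.0` and `delta_min = 1e-6` are assumed conventional defaults, since the article fixes only $\eta^{+}$, $\eta^{-}$, and $\Delta_0$, and the sign-flip branch follows the variant that skips the update rather than backtracking the weight:

```python
def rprop_step(w, grad, prev_grad, delta,
               eta_plus=1.2, eta_minus=0.5, delta_max=50.0, delta_min=1e-6):
    """One resilient backpropagation update for a single weight."""
    sign = (grad > 0) - (grad < 0)
    if grad * prev_grad > 0:            # same sign as last step: grow the step
        delta = min(delta * eta_plus, delta_max)
        w -= sign * delta
        return w, delta, grad
    if grad * prev_grad < 0:            # sign flip: the step was too large
        delta = max(delta * eta_minus, delta_min)
        return w, delta, 0.0            # shrink, skip the update, reset memory
    w -= sign * delta                   # zero product: keep the current step
    return w, delta, grad

# Demo: minimize f(w) = w**2 (gradient 2w) starting from w = 5.0.
w, delta, prev = 5.0, 0.1, 0.0
for _ in range(100):
    w, delta, prev = rprop_step(w, 2.0 * w, prev, delta)
```

Only the sign of the gradient enters the update, which is exactly the point of the method: the magnitude of the partial derivative cannot distort the step size.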
According to Cartwright [41], the first step in using ANNs is to determine a suitable, ideally optimal, topology (number of inputs and outputs, number of hidden layers, and neurons in each layer) together with optimal weights, biases, and activation functions. The process begins by initializing the weights as small random values. Then, each input pattern undergoes a feedforward phase, where the input signal is received and transmitted to all nodes in the hidden layer. Every hidden node calculates the sum of its weighted input signals, applies an activation function to determine its output signal, and transmits this signal to the output node. At the output node, the final output signal is computed from the signals received from the hidden nodes [7]. Within the context of RBPANN, the associated error ($\delta_k$) is computed and used to adjust the weights: the weight correction term is determined from the error and then employed to update the corresponding weights. Additionally, $\delta_k$ is transmitted to each hidden node, and each hidden node calculates its error information term by summing the inputs received from the output node, multiplied by the derivative of its activation function [7,41,42]. According to Fausett [43] and Cartwright [41], the general formulation for RBPANN can be written as follows (Equation (6)):
$$\Delta w_{ij}(t+1) = \alpha \, \delta_k \, z_j + \mu \, \Delta w_{ij}(t)$$
where $\Delta w_{ij}$ is the correction to the weight on output unit $k$; $\alpha$ is the learning rate; $\delta_k$ is the error correction term for $w_{ij}$ due to an error at output $O_k$, i.e., the information about the error at unit $O_k$ that is propagated back to the hidden units feeding into unit $O_k$; $z_j$ is the output activation of hidden unit $j$; and $\mu$ is the momentum parameter (the contribution of the gradient calculated at the previous time step to the correction of the weights) [41].
The activation functions used for smoothing the h-dbh relationship through RBPANN were tangent hyperbolicus (RBPANN-tanh), softplus (RBPANN-softplus), and logistic (RBPANN-logistic) functions [4,40]. These activation functions occur between the hidden layers or between the input layer and hidden layer. These functions are defined for RBPANN-tanh, RBPANN-softplus, and RBPANN-logistic in Equations (7)–(9), respectively. Also, the derivatives are in Equations (10)–(12), respectively.
$$f(s) = \frac{e^{s} - e^{-s}}{e^{s} + e^{-s}}$$
$$f(s) = \log(1 + e^{s})$$
$$f(s) = \frac{1}{1 + e^{-s}}$$
$$f'(s) = 1 - \left(\frac{e^{s} - e^{-s}}{e^{s} + e^{-s}}\right)^{2}$$
$$f'(s) = \frac{e^{s}}{1 + e^{s}}$$
$$f'(s) = \frac{e^{-s}}{\left(1 + e^{-s}\right)^{2}}$$
where $s = \sum_i w_i x_i$ is the information the node transmits, in which $w_i$ are the weights and $x_i$ are the input values, with $f(s) \in (-1, 1)$, $f(s) \in (0, \infty)$, and $f(s) \in (0, 1)$ for RBPANN-tanh, RBPANN-softplus, and RBPANN-logistic, respectively.
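The three activation functions and their stated derivatives are easy to cross-check numerically. The sketch below defines them and a central-difference helper for verification:

```python
import math

# Activation functions used in the RBPANN (tanh, softplus, logistic).
def tanh_act(s):
    return (math.exp(s) - math.exp(-s)) / (math.exp(s) + math.exp(-s))

def softplus_act(s):
    return math.log(1.0 + math.exp(s))

def logistic_act(s):
    return 1.0 / (1.0 + math.exp(-s))

def numeric_derivative(f, s, h=1e-6):
    """Central difference, used here to check the closed-form derivatives."""
    return (f(s + h) - f(s - h)) / (2.0 * h)
```

Note that the derivative of softplus is the logistic function itself, which is one reason these two are often paired in backpropagation implementations.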
In the ANN learning process, vectors with 1 to 10 nodes per hidden layer were tested in a preliminary analysis, and the best results were obtained with 10 nodes in each of three hidden layers, c(10, 10, 10). In Figure 3, the ANN plots are presented for the hidden-layer vectors c(3, 3, 3), c(5, 5, 5), and c(10, 10, 10). These vectors were evaluated during the training process for each RBPANN activation function.
For the RBPANN-tanh, RBPANN-softplus, and RBPANN-logistic activation functions, the topology was as follows: (i) two inputs (dbh and cluster); (ii) one output (h); (iii) the hidden-layer vector c(10, 10, 10); and (iv) 2 nodes in the first layer, 11 nodes in each of the second, third, and fourth layers (bias node included), and 2 nodes in the fifth layer. The ANNs for the three activation functions are presented in Figure 4, where input variables are represented by “I”, hidden nodes by “H”, the output variable by “O”, and bias nodes by “B”.
For the RBPANN-tanh, RBPANN-softplus, and RBPANN-logistic activation functions, the number of repetitions was 10, the maximum number of training steps was 1 × 10⁷, and the threshold was 0.1, similar to the arguments used by Özçelik, Diamantopoulou, Crecente-Campo, and Eler [7]. The training algorithm for the ANNs was resilient backpropagation with weight backtracking [4,40,41].

2.5. Model Fitting

For NLMEM, the total tree height and diameter at breast height (h-dbh) relationship was fitted with the “nlme” R package [33], and the maximum likelihood estimation method [38] was used for fixed and random parameters within cluster-groups. For the ANN models, the “neuralnet” R package [33] was used. For the ANNs, resilient backpropagation (RPROP) with the tangent hyperbolicus (tanh), softplus, and logistic functions was programmed to smooth the result of the cross product of the covariates (neurons) and the weights [3,4,43]. All fitting-statistic functions for both training and testing were programmed in the R environment [33].

2.6. Model Performance Criteria

For both training and testing steps, the fitting statistics were obtained at two levels, the first one for the entire dataset and the second one for each cluster. The statistics were the root mean square error (RMSE), standard error of estimate (SEE), relative SEE (RSEE), fitting index (FI), mean error (E), relative E (RE), Akaike information criterion (AIC), Bayesian information criterion (BIC), and the log-likelihood (logLik). The statistics were computed as follows:
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}(O_i - P_i)^2}{n}}$$
$$SEE = \sqrt{\frac{\sum_{i=1}^{n}(O_i - P_i)^2}{n - p}}$$
$$RSEE = \frac{SEE}{\bar{O}}$$
$$FI = 1 - \frac{\sum_{i=1}^{n}(O_i - P_i)^2}{\sum_{i=1}^{n}(O_i - \bar{O})^2}$$
$$E = \frac{\sum_{i=1}^{n}(O_i - P_i)}{n}$$
$$RE = \frac{E}{\bar{O}}$$
$$AIC = n \log\left(\frac{\sum_{i=1}^{n}(O_i - P_i)^2}{n}\right) + 2p$$
$$BIC = n \log\left(\frac{\sum_{i=1}^{n}(O_i - P_i)^2}{n}\right) + p \log n$$
$$logLik = n \log\left(\frac{\sum_{i=1}^{n}(O_i - P_i)^2}{n}\right)$$
where $O_i$, $P_i$, and $\bar{O}$ are the observed, predicted, and average values of the h variable; $n$ = number of observations; $p$ = number of estimated parameters; and log = logarithm function.
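These statistics can be computed together from the observed and predicted heights. The sketch below follows the article's formulas as written (note that its AIC/BIC expressions omit the additive constants of the full likelihood form):

```python
import math

def fit_statistics(obs, pred, p):
    """Goodness-of-fit statistics for observed (obs) vs. predicted (pred) heights."""
    n = len(obs)
    o_bar = sum(obs) / n
    rss = sum((o - pr) ** 2 for o, pr in zip(obs, pred))  # residual sum of squares
    stats = {
        "RMSE": math.sqrt(rss / n),
        "SEE": math.sqrt(rss / (n - p)),
        "FI": 1.0 - rss / sum((o - o_bar) ** 2 for o in obs),
        "E": sum(o - pr for o, pr in zip(obs, pred)) / n,
        "AIC": n * math.log(rss / n) + 2 * p,
        "BIC": n * math.log(rss / n) + p * math.log(n),
    }
    stats["RSEE"] = stats["SEE"] / o_bar
    stats["RE"] = stats["E"] / o_bar
    return stats
```

Applied to each cluster-group separately and to the overall dataset, this yields the two levels of evaluation used in the results.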
In all cases, the residual values were obtained from the NLMEM or RBPANN models, and the statistics were programmed in the R environment [33]. The NLMEM and ANN models were ranked based on the overall dataset and on each cluster-group for all fitting statistics, using the ranking system of Kozak and Smith [44]. All fitting statistics were equally weighted, and rank 1 was assigned to the best model and 4 to the poorest.
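A minimal sketch of this ranking scheme, assuming rank 1 for the best value of each statistic and a simple sum of ranks per model:

```python
def rank_models(stats_by_model, lower_is_better=("RMSE", "AIC", "BIC")):
    """Rank models per statistic (1 = best) and sum the ranks per model."""
    totals = {m: 0 for m in stats_by_model}
    metrics = next(iter(stats_by_model.values())).keys()
    for metric in metrics:
        reverse = metric not in lower_is_better      # e.g., FI: higher is better
        ordering = sorted(stats_by_model,
                          key=lambda m: stats_by_model[m][metric],
                          reverse=reverse)
        for rank, model in enumerate(ordering, start=1):
            totals[model] += rank
    return totals

# Hypothetical statistics for two models:
totals = rank_models({"NLMEM": {"RMSE": 3.1, "FI": 0.66},
                      "RBPANN-tanh": {"RMSE": 2.9, "FI": 0.71}})
```

The model with the smallest rank sum is declared best overall, which is how the tables of ranks in the results are summarized.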

3. Results

3.1. Training Phase

3.1.1. NLMEM

The fitted growth equation for the h-dbh relationship by NLMEM performed well and all parameters were significantly different from zero at a 5% significance level. The relationship between total tree height and diameter at breast height can be explained with fixed and random parameters. In Table 3, the estimated parameters and their statistical properties can be found for the entire training dataset. Furthermore, the confidence interval for each parameter is recorded at a 95% confidence level.
The training phase’s fitting statistics are presented in Table 4, which includes the overall training dataset as well as each cluster-group individually. For both the overall training dataset and the cluster-groups, the fitting statistics were accurate and supported the NLMEM approach for the h-dbh relationship. The RMSE value for the overall training dataset was 3.1085 m; the best value was 2.4735 m for cluster-group 1 (C1), and the worst was 3.8688 m for cluster-group 4 (C4). Additionally, the overall training dataset exhibited an E value of −0.0005. The highest value was observed in C4, indicating poorer performance, while the lowest value was found in C3, indicating the best performance. In terms of the AIC, C4 demonstrated the best performance, whereas C6 performed relatively poorly. The FI values ranged from 0.5378 to 0.6525 for C2 and C3, respectively, while the FI value for the overall dataset was 0.6588.

3.1.2. RBPANN

The results for the ANNs with the RBPANN-tanh, RBPANN-softplus, and RBPANN-logistic activation functions are shown in Table 5. The statistics are reported for standardized variables over the 10 repetitions of the learning process. In this scenario, all three activation functions exhibit favorable outcomes. Specifically, the RBPANN-tanh and RBPANN-softplus delivered comparable performances, while the RBPANN-logistic exhibited the lowest performance among the three. Additionally, the RBPANN-logistic required the minimum number of steps (88) for convergence.
The fitting statistics for the three ANNs applied to the h-dbh relationship in Durango pine are presented for both the overall dataset and each cluster-group in Table 6 for the training phase. The nine fitting statistics illustrate the accuracy of the RBPANN-tanh, RBPANN-softplus, and RBPANN-logistic activation functions in the ANN models. The topology of each ANN, as depicted in Figure 4, produced satisfactory results in predicting h based on dbh and an unsupervised clustering analysis with ten groups. Overall, the estimations demonstrate similar characteristics across the three activation functions; however, RBPANN-tanh exhibits certain advantages over the other activation functions in the training phase. All ANNs were trained using the resilient backpropagation learning algorithm, and the likelihood function was employed. Across all ANNs, the FI values were consistently and significantly higher than those of NLMEM. This suggests that the ANNs outperformed NLMEM in terms of training fit or model performance in the evaluated scenarios.
The residual and predicted values are presented in Figure 5 for each cluster-group by the RBPANN-tanh, which was the best approach in the training phase to model the h-dbh relationship. In general, the residuals ranged between −6 m and 6 m. Enhancing the training phase could involve increasing the number of repetitions or epochs; however, the computational process would be significantly time consuming. In the context of the analysis, cluster-groups 6 and 10 demonstrated notable variations or spread in the residual values. When there is a considerable dispersion in the residuals, it suggests that the ANN predictions differ widely from the actual data points within these specific cluster-groups. Despite the significant accuracy of predictions during the training phase for the RBPANN-tanh activation function, it is essential to carefully assess its performance during the testing and validation stages to ensure its robustness and generalizability.
Finally, the ranks and sums of ranks for the hierarchy of NLMEM and the ANNs are presented in Table 7. The statistics for both the overall dataset and the cluster-groups were ranked from 1 to 4. In terms of the overall dataset, the RBPANN-softplus exhibited the best performance during the training phase, followed by the RBPANN-tanh in second place and the RBPANN-logistic in third. The NLMEM approach had the highest rank sum, indicating poorer performance compared to the other models. A similar pattern was observed within the cluster-groups, where the sums of ranks were 176, 241, 269, and 304 for the RBPANN-tanh, RBPANN-softplus, NLMEM, and RBPANN-logistic, respectively. It is worth noting that only the RBPANN-logistic demonstrated lower performance than the NLMEM approach. The numbers in parentheses indicate the ranking of the models for the overall dataset.

3.2. Testing or Validation Phase

3.2.1. NLMEM

During the testing phase, 5736 pairs of heights and diameters (50% of the dataset) were utilized. Height estimations were performed using the fixed and random parameters for each cluster-group provided by the NLMEM approach. The nine testing statistics were computed at two levels: for the overall dataset and for each cluster-group. Table 8 presents the statistics for testing the NLMEM approach at both levels. All the statistics displayed satisfactory performance, which depended on the number of observations, and the cluster-groups with limited information exhibited the lowest values. Among the cluster-groups, C4 had the maximum number of observations, whereas C9 had the minimum.

3.2.2. RBPANN

During the testing phase, the results for the ANNs were similar to the training phase. The ANN utilizing the tangent hyperbolicus activation function (RBPANN-tanh) exhibited the highest performance, followed by the RBPANN-logistic, and finally the RBPANN-softplus. Table 9 shows the statistics for the testing dataset, both in the overall dataset and within each cluster-group. The FI values for the overall dataset were higher than 0.7029, with the RBPANN-tanh demonstrating the best performance and RBPANN-logistic exhibiting the poorest performance. A similar pattern was observed for other statistics such as the AIC, BIC, and logLik. Furthermore, in this instance, the ANNs demonstrated superior performance compared to the NLMEM approach.
Figure 6 displays the residual and predicted values for each cluster-group obtained through the RBPANN-tanh. In this scenario, the residual dispersion appears larger compared to the training phase; however, this can be attributed to the use of a new dataset, where predictions are made under conditions different from training. Cluster-groups 4 and 10 exhibited higher levels of dispersion than the remaining cluster-groups. The statistics obtained for the testing dataset were slightly lower than those of the training dataset, but this disparity can be attributed to the fact that the approaches were tested on a different dataset. Despite this variation, the results are still considered satisfactory, indicating that both approaches exhibit promising performance in generalizing to new and unseen data (Table 8 and Table 9).
Finally, the ranks and sums of ranks for the hierarchy of NLMEM and the ANNs are presented in Table 10. During the testing phase, the RBPANN-tanh achieved the highest rank of 1 with a sum of ranks of 9, while the NLMEM approach performed the poorest, ranking 4th with a sum of ranks of 36. The RBPANN-softplus and RBPANN-logistic exhibited relatively similar performance. Among the tested ANNs, the RBPANN-logistic exhibited the poorest performance. In terms of the sum of ranks for the combined overall dataset and cluster-groups, the RBPANN-tanh demonstrated the best performance with a sum of ranks of 181. Following that, the RBPANN-logistic, RBPANN-softplus, and NLMEM were ranked 2nd, 3rd, and 4th, with sums of ranks of 201, 240, and 368, respectively. These findings provide compelling evidence that the ANN algorithms yielded satisfactory results with the RBPANN-tanh, RBPANN-softplus, and RBPANN-logistic activation functions.

4. Discussion

Knowledge of the total tree height and diameter at breast height relationship is essential for both the development and application of many growth and yield models. Models of the h-dbh relationship serve as valuable tools for accurately predicting tree height from dbh measurements: dbh can be measured quickly, easily, and accurately, whereas measuring total tree height is comparatively complex, time consuming, and expensive [11]. NLMEM has been a capable approach for modeling the h-dbh relationship for different species, assuming fixed and random parameters for specific groups or covariables to study the variability between and within plots, ecological regions, or cluster-groups [10,16,37]. These models have also been studied in local and generalized formulations with the NLMEM approach [12,13,16,26]. In this case study, the NLMEM approach performed strongly in modeling the h-dbh relationship for the Durango pine, and the inclusion of an unsupervised clustering analysis improved the estimated parameters and their statistical properties [34,45]. This involves fixed parameters for the overall dataset in the training phase and random parameters for each cluster-group, in addition to parameters that provide information about general variability and variability within cluster-groups.
The NLMEM demonstrated outstanding performance during the training phase, with the fitting process converging quickly and effortlessly. The maximum likelihood approach yielded favorable and suitable results, particularly when the asymptote parameter was expressed with mixed effects (Table 3 and Table 4). All parameters in the fitting process were significantly different from zero at the 5% significance level; the random parameters allowed suitable estimations in the training phase and were then used for the cluster-groups in the testing phase. Applying the NLMEM approach to the testing dataset produced successful outcomes in line with expectations (Table 8), supported by appropriate statistical measures: the root mean square error (RMSE) for the overall dataset during the testing phase was 3.1438 m, with an average of 3.3773 m within the cluster-groups (Table 8). With mixed effects and cluster-group inclusion, the Chapman–Richards growth equation [28] (Equation (2)) proved to be a highly effective model for predicting the height of Durango pine trees. Similar results have been found for several species and different conditions [11,16,26]. Even though the NLMEM method is accurate for height prediction based on diameter measurements, ANNs can be a suitable alternative for modeling the h-dbh relationship under several dataset conditions and with grouped strategies [7,14,46]. In recent times, AI and ML techniques have been increasingly applied in biology and forestry; these advanced approaches have proven valuable for problems that require substantial computational resources and unsupervised learning methods [1,39].
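As an illustration, a Chapman–Richards h-dbh curve can be evaluated with the fixed-effect estimates of Table 3. Equation (2) is not reproduced in this section, so the exact form used below — h = 1.3 + α0(1 − exp(−α1·dbh))^α2, with a 1.3 m breast-height offset — is an assumption based on the common formulation of the Chapman–Richards function [28,29]:

```python
import math

def chapman_richards_height(dbh, a0=26.409060, a1=0.029786, a2=1.083133):
    """Predicted total height (m) from dbh (cm), assuming the usual
    Chapman-Richards form with a 1.3 m breast-height offset.
    Defaults a0-a2 are the fixed-effect estimates from Table 3."""
    return 1.3 + a0 * (1.0 - math.exp(-a1 * dbh)) ** a2

for dbh in (10.0, 30.0, 60.0):
    print(dbh, round(chapman_richards_height(dbh), 2))
```

The curve passes through 1.3 m at dbh = 0 and approaches the asymptote 1.3 + α0 ≈ 27.7 m for very large diameters; with these estimates a tree of roughly the mean dbh (about 21.5 cm, Table 2) is predicted at about 13 m, close to the observed mean height.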
Several of these approaches have been employed to study the height–diameter at breast height (h-dbh) relationship, with notable outcomes reported for various species under diverse forest management conditions, demonstrating their versatility and effectiveness [7,14,15,46]. In this context, the ANN models developed here outperformed the NLMEM approach.
In this study, the ANNs were evaluated and compared with the traditional NLMEM method. The ANNs utilized the RBP learning algorithm along with three activation functions. In most cases, the ANNs employing the RBPANN-tanh, RBPANN-softplus, and RBPANN-logistic activation functions (Equations (7)–(9), respectively) outperformed the NLMEM, both during the training and testing phases. The training statistics for the three ANNs (Table 6) showed better fitting performance than those obtained by the NLMEM (Table 4), in both the overall dataset and the cluster-group analyses. These findings provide evidence that the clustering analysis using the k-means algorithm effectively grouped the dataset used in this study [34,45]. The RBPANN-tanh model, employing a hyperbolic tangent activation function, demonstrated the highest performance in predicting height during both the training and testing phases (Table 6 and Table 9). Furthermore, the ranks and sums of ranks, based on the ranking system proposed by Kozak and Smith [44], confirmed the advantages of the ANN models over the NLMEM approach. Özçelik, Diamantopoulou, Crecente-Campo, and Eler [7] likewise reported that models such as the RBPANN-logistic exhibited advantages over NLMEM when predicting the growth of Crimean juniper in the southern and southwestern regions of Turkey. Similar advantages of ANNs or deep learning algorithms over ordinary least squares models and NLMEM have been reported in both training and testing or validation phases [14,15,46,47]. In all cases, the implementation of ANNs exhibited significant advantages over traditional approaches when modeling the h-dbh relationship.
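The three activation functions compared here have the standard forms sketched below. Equations (7)–(9) are not reproduced in this section, so the exact parameterization used in the networks is assumed to be the textbook one:

```python
import math

def tanh(x):        # tangent hyperbolicus, output in (-1, 1)
    return math.tanh(x)

def softplus(x):    # smooth ramp, output in (0, inf)
    return math.log1p(math.exp(x))

def logistic(x):    # sigmoid, output in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

for f in (tanh, softplus, logistic):
    print(f.__name__, [round(f(x), 4) for x in (-2.0, 0.0, 2.0)])
```

All three are smooth and monotonic; they differ mainly in output range and in gradient behavior away from zero, which is one plausible reason the three RBPANN variants converge in very different numbers of steps (Table 5).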
In this study, based on the implemented ranking system, the RBPANN-tanh model emerged as the top performer (residual and predicted values are shown in Figure 5 and Figure 6). It achieved a sum of ranks of 176 in the training phase and 81 in the testing phase, accounting for both the overall dataset and the cluster-groups (Table 7 and Table 10, respectively). In training, the RBPANN-softplus model ranked second, whereas in the testing phase the RBPANN-logistic model exhibited the second-best performance. Conversely, the RBPANN-logistic model performed least effectively in the training phase, while the NLMEM demonstrated comparatively lower performance in the testing phase. The ANNs developed in this study (Figure 4) were trained with the RBP algorithm and evaluated with three activation functions: RBPANN-tanh, RBPANN-softplus, and RBPANN-logistic. These models comprised five layers, three of them hidden, and the training process involved ten repetitions to ensure robustness and accuracy. Even though the RBPANN-logistic converged in only 88 steps, it performed relatively poorly compared with the RBPANN-tanh, which achieved better results within 301 steps; the RBPANN-softplus required the longest convergence time, at 1885 steps (Table 5). As a result, the developed ANN models showed a high capability for predicting total tree height. This highlights the potential application of AI in modeling the h-dbh relationship, not only for Durango pine trees but also for general forest modeling purposes or other variables [6,19,48]. The ANNs could be used to improve estimations in forest inventory and in forest management and planning in mixed-species forests in Durango, Mexico.
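The resilient backpropagation (RBP/Rprop) rule that trained these networks adapts a per-weight step size from the sign of successive gradients rather than from their magnitude [3,4]. A minimal sketch of the update is given below; the Rprop− variant shown (gradient zeroed after a sign flip, no weight backtracking) is an assumption, since the exact variant used by the training software is not stated here.

```python
import numpy as np

def rprop_update(w, grad, prev_grad, step,
                 eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    """One Rprop step per weight: grow the step while the gradient sign
    is stable, shrink it when the sign flips (Rprop- variant)."""
    same = grad * prev_grad > 0
    flip = grad * prev_grad < 0
    step = np.where(same, np.minimum(step * eta_plus, step_max), step)
    step = np.where(flip, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(flip, 0.0, grad)        # Rprop-: skip the update after a flip
    w = w - np.sign(grad) * step
    return w, grad, step

# Toy use: minimize f(w) = sum(w^2), whose gradient is 2w
w = np.array([3.0, -2.0])
prev = np.zeros_like(w)
step = np.full_like(w, 0.1)
for _ in range(50):
    g = 2 * w
    w, prev, step = rprop_update(w, g, prev, step)
print(w)  # close to the minimum at [0, 0]
```

Because only the sign of the gradient is used, the step sizes self-tune per weight, which is the property that makes Rprop training fast and largely insensitive to learning-rate choice.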
The findings from this study offer substantial evidence that ANNs can be effectively applied to predict the total tree height of Durango pine trees, utilizing both the diameter at breast height and cluster-groups as predictive variables. The ANNs demonstrated robust performance for Durango pine trees and hold the potential for utilization across diverse species and ecological regions not only in Mexico but also worldwide. Furthermore, ANNs could find valuable applications in forest management and planning scenarios.

5. Conclusions

Nonlinear mixed effect modeling (NLMEM) and artificial neural networks (ANNs) with resilient backpropagation (RBP) were employed to model the height–diameter at breast height (h-dbh) relationship for the Durango pine species. An unsupervised clustering analysis was conducted to enhance the capability of the trained and tested models. Three activation functions, namely tangent hyperbolicus (RBPANN-tanh), softplus (RBPANN-softplus), and logistic (RBPANN-logistic), were used in the RBPANN and were trained and tested on both the overall dataset and each cluster-group. In general, the ANNs outperformed the NLMEM for height predictions in the training and testing phases. The best model in both phases was the RBPANN-tanh, which assumed five layers in the ANN, three of them hidden. The use of ANNs proves to be a suitable and effective approach for estimating the total tree height of the Durango pine species, and incorporating an unsupervised clustering analysis enhances the estimation accuracy of ANN models, highlighting the capabilities of artificial intelligence (AI) in this context. AI techniques such as ANNs are suitable, modern statistical tools for forest modeling, and the findings could be used to improve the forest management and planning of the studied species. In conclusion, the ANNs demonstrated superior performance compared with the NLMEM approach and are well suited for height predictions for the Durango pine species.

Author Contributions

Conceptualization, G.Q.-B. and Y.O.; methodology, G.Q.-B. and Y.O.; validation, G.Q.-B. and Y.O.; formal analysis, G.Q.-B. and Y.O.; investigation, G.Q.-B. and Y.O.; resources, G.Q.-B.; writing—original draft preparation, G.Q.-B. and Y.O.; writing—review and editing, G.Q.-B. and Y.O.; visualization, G.Q.-B.; supervision, G.Q.-B.; project administration, G.Q.-B.; funding acquisition, Y.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a Guangdong Science and Technology Department Project.

Data Availability Statement

Not applicable.

Acknowledgments

The authors of this article express their gratitude to the Ejido San Diego de Tezains community in Durango, Mexico, for data support.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wodecki, A. Artificial Intelligence Methods and Techniques. In Artificial Intelligence in Value Creation; Wodecki, A., Ed.; Springer International Publishing: Cham, Switzerland, 2019; pp. 71–132. [Google Scholar] [CrossRef]
  2. Holzinger, A.; Keiblinger, K.; Holub, P.; Zatloukal, K.; Müller, H. AI for life: Trends in artificial intelligence for biotechnology. New Biotechnol. 2023, 74, 16–24. [Google Scholar] [CrossRef]
  3. Riedmiller, M.; Braun, H. A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm; IEEE: Piscataway, NJ, USA, 1993; pp. 586–591. [Google Scholar]
  4. Riedmiller, M.; Braun, H. Rprop—Description and Implementation Details; Technical Report; University of Karlsruhe: Karlsruhe, Germany, 1994. [Google Scholar]
  5. Hecht-Nielsen, R. III.3—Theory of the Backpropagation Neural Network. In Neural Networks for Perception, Proceedings of the International Joint Conference on Neural Networks 1, Washington, DC, USA, 18–21 June 1989; Wechsler, H., Ed.; Academic Press: Cambridge, MA, USA, 1992; pp. 65–93. [Google Scholar] [CrossRef]
  6. Chen, W.; Shirzadi, A.; Shahabi, H.; Ahmad, B.B.; Zhang, S.; Hong, H.; Zhang, N. A novel hybrid artificial intelligence approach based on the rotation forest ensemble and naïve Bayes tree classifiers for a landslide susceptibility assessment in Langao County, China. Geomat. Nat. Hazards Risk 2017, 8, 1955–1977. [Google Scholar] [CrossRef] [Green Version]
  7. Özçelik, R.; Diamantopoulou, M.J.; Crecente-Campo, F.; Eler, U. Estimating Crimean juniper tree height using nonlinear regression and artificial neural network models. For. Ecol. Manag. 2013, 306, 52–60. [Google Scholar] [CrossRef]
  8. Russell, S.J. Artificial Intelligence: A Modern Approach; Pearson Education, Inc.: Hoboken, NJ, USA, 2010. [Google Scholar]
  9. Gui, Y.; Li, D.; Fang, R. A fast adaptive algorithm for training deep neural networks. Appl. Intell. 2023, 53, 4099–4108. [Google Scholar] [CrossRef]
  10. Mehtätalo, L.; de-Miguel, S.; Gregoire, T.G. Modeling height-diameter curves for prediction. Can. J. For. Res. 2015, 45, 826–837. [Google Scholar] [CrossRef] [Green Version]
  11. Sharma, M.; Parton, J. Height–diameter equations for boreal tree species in Ontario using a mixed-effects modeling approach. For. Ecol. Manag. 2007, 249, 187–198. [Google Scholar] [CrossRef]
  12. Santiago-García, W.; Jacinto-Salinas, A.H.; Rodríguez-Ortiz, G.; Nava-Nava, A.; Santiago-García, E.; Ángeles-Pérez, G.; Enríquez-del Valle, J.R. Generalized height-diameter models for five pine species at Southern Mexico. For. Sci. Technol. 2020, 16, 49–55. [Google Scholar] [CrossRef]
  13. Szurszewski, J.H. Mechanism of action of pentagastrin and acetylcholine on the longitudinal muscle of the canine antrum. J. Physiol. 1975, 252, 335–361. [Google Scholar] [CrossRef] [PubMed]
  14. Elliott, R.J.; Gardner, D.L. Proline determination with isatin, in the presence of amino acids. Anal. Biochem. 1976, 70, 268–273. [Google Scholar] [CrossRef]
  15. Ogana, F.N.; Ercanli, I. Modelling height-diameter relationships in complex tropical rain forest ecosystems using deep learning algorithm. J. For. Res. 2021, 33, 883–898. [Google Scholar] [CrossRef]
  16. Raptis, D.I.; Kazana, V.; Kazaklis, A.; Stamatiou, C. Mixed-effects height–diameter models for black pine (Pinus nigra Arn.) forest management. Trees 2021, 35, 1167–1183. [Google Scholar] [CrossRef]
  17. Diamantopoulou, M.J. Predicting fir trees stem diameters using Artificial Neural Network models. South. Afr. For. J. 2005, 205, 39–44. [Google Scholar] [CrossRef]
  18. Qin, Y.; Wu, B.; Lei, X.; Feng, L. Prediction of tree crown width in natural mixed forests using deep learning algorithm. For. Ecosyst. 2023, 10, 100109. [Google Scholar] [CrossRef]
  19. Vahedi, A.A. Artificial neural network application in comparison with modeling allometric equations for predicting above-ground biomass in the Hyrcanian mixed-beech forests of Iran. Biomass Bioenergy 2016, 88, 66–76. [Google Scholar] [CrossRef]
  20. Diamantopoulou, M.J.; Milios, E. Modelling total volume of dominant pine trees in reforestations via multivariate analysis and artificial neural network models. Biosyst. Eng. 2010, 105, 306–315. [Google Scholar] [CrossRef]
  21. Wu, Z.; Wang, B.; Li, M.; Tian, Y.; Quan, Y.; Liu, J. Simulation of forest fire spread based on artificial intelligence. Ecol. Indic. 2022, 136, 108653. [Google Scholar] [CrossRef]
  22. Richards, M.; McDonald, A.J.S.; Aitkenhead, M.J. Optimisation of competition indices using simulated annealing and artificial neural networks. Ecol. Model. 2008, 214, 375–384. [Google Scholar] [CrossRef]
  23. Öhman, K.; Lämås, T. Clustering of harvest activities in multi-objective long-term forest planning. For. Ecol. Manag. 2003, 176, 161–171. [Google Scholar] [CrossRef]
  24. Akay, A.O.; Akgül, M.; Esin, A.I.; Demir, M.; Şentürk, N.; Öztürk, T. Evaluation of occupational accidents in forestry in Europe and Turkey by k-means clustering analysis. Turk. J. Agric. For. 2021, 45, 495–509. [Google Scholar] [CrossRef]
  25. Phillips, P.D.; Yasman, I.; Brash, T.E.; van Gardingen, P.R. Grouping tree species for analysis of forest data in Kalimantan (Indonesian Borneo). For. Ecol. Manag. 2002, 157, 205–216. [Google Scholar] [CrossRef]
  26. Corral-Rivas, S.; Álvarez-González, J.; Crecente-Campo, F.; Corral-Rivas, J. Local and generalized height-diameter models with random parameters for mixed, uneven-aged forests in Northwestern Durango, Mexico. For. Ecosyst. 2014, 1, 6. [Google Scholar] [CrossRef] [Green Version]
  27. Castillo-Gallegos, E.; Jarillo-Rodríguez, J.; Escobar-Hernández, R. Diameter-height relationships in three species grown together in a commercial forest plantation in eastern tropical Mexico. Rev. Chapingo Ser. Cienc. For. Y Del Ambiente 2018, 24, 33–48. [Google Scholar] [CrossRef]
  28. Richards, F.J. A Flexible Growth Function for Empirical Use. J. Exp. Bot. 1959, 10, 290–300. [Google Scholar] [CrossRef]
  29. Burkhart, H.E.; Tomé, M. Modeling Forest Trees and Stands; Springer Science & Business Media: Berlin, Germany, 2012. [Google Scholar]
  30. Von Gadow, K.; Nagel, J.; Saborowski, J. Continuous Cover Forestry: Assessment, Analysis, Scenarios; Springer Science & Business Media: Berlin, Germany, 2002; Volume 4. [Google Scholar]
  31. Quiñonez-Barraza, G.; Zhao, D.; De Los Santos Posadas, H.M.; Corral-Rivas, J.J. Considering neighborhood effects improves individual dbh growth models for natural mixed-species forests in Mexico. Ann. For. Sci. 2018, 75, 78. [Google Scholar] [CrossRef] [Green Version]
  32. Climate Action Reserve [CAR]. Protocolo Forestal para México, Versión 3.0; Climate Action Reserve: Los Angeles, CA, USA, 2022; 2080p. [Google Scholar]
  33. R Core Team. A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2012; Available online: https://www.r-project.org/ (accessed on 7 July 2023).
  34. Hartigan, J.A.; Wong, M.A. Algorithm AS 136: A K-means clustering algorithm. Appl. Stat. 1979, 28, 100–108. [Google Scholar]
  35. Forgy, E.W. Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications. Biometrics 1965, 21, 768–769. [Google Scholar]
  36. Milligan, G.W.; Cooper, M.C. A study of standardization of variables in cluster analysis. J. Classif. 1988, 5, 181–204. [Google Scholar] [CrossRef]
  37. Fang, Z.; Bailey, R.L. Height–diameter models for tropical forests on Hainan Island in southern China. For. Ecol. Manag. 1998, 110, 315–327. [Google Scholar] [CrossRef]
  38. Pinheiro, J.C.; Bates, D.M.; Lindstrom, M.J. Model Building for Nonlinear Mixed Effects Models; University of Wisconsin, Department of Biostatistics Madison: Madison, WI, USA, 1995. [Google Scholar]
  39. Krogh, A. What are artificial neural networks? Nat. Biotechnol. 2008, 26, 195–197. [Google Scholar] [CrossRef]
  40. Anastasiadis, A.D.; Magoulas, G.D.; Vrahatis, M.N. New globally convergent training scheme based on the resilient propagation algorithm. Neurocomputing 2005, 64, 253–270. [Google Scholar] [CrossRef]
  41. Cartwright, H. Artificial Neural Networks: Methods in Molecular Biology, 3rd ed.; Springer Human Press: New York, NY, USA, 2021; Volume 2190, 368p. [Google Scholar]
  42. Han, W.; Nan, L.; Su, M.; Chen, Y.; Li, R.; Zhang, X. Research on the prediction method of centrifugal pump performance based on a double hidden layer BP neural network. Energies 2019, 12, 2709. [Google Scholar] [CrossRef] [Green Version]
  43. Fausett, L.V. Fundamentals of Neural Networks: Architectures, Algorithms and Applications; Pearson Education India: Chennai, India, 2006. [Google Scholar]
  44. Kozak, A.; Smith, J.H.G. Standards for evaluating taper estimating systems. For. Chron. 1993, 69, 438–444. [Google Scholar] [CrossRef]
  45. Bicego, M. K-Random Forests: A K-means style algorithm for Random Forest clustering. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–8. [Google Scholar]
  46. Ercanlı, İ. Innovative deep learning artificial intelligence applications for predicting relationships between individual tree height and diameter at breast height. For. Ecosyst. 2020, 7, 12. [Google Scholar] [CrossRef] [Green Version]
  47. Skudnik, M.; Jevšenak, J. Artificial neural networks as an alternative method to nonlinear mixed-effects models for tree height predictions. For. Ecol. Manag. 2022, 507, 120017. [Google Scholar] [CrossRef]
  48. Thanh, T.N.; Tien, T.D.; Shen, H.L. Height-diameter relationship for Pinus koraiensis in Mengjiagang Forest Farm of Northeast China using nonlinear regressions and artificial neural network models. J. For. Sci. 2019, 65, 134–143. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Study area location in Northern Mexico.
Figure 2. Scatter plot of the h-dbh relationship for the full dataset (a) and the dataset grouped by cluster (b).
Figure 3. Plots of ANNs for hidden-layer vectors c(3, 3, 3) (left), c(5, 5, 5) (center), and c(10, 10, 10) (right).
Figure 4. Plots of RBPANN-tanh (left), RBPANN-softplus (center), and RBPANN-logistic (right) for h-dbh relationship with unsupervised clustering analysis. Bias is included in nodes “B”. Positive weight values in the visual representation are denoted by blue lines, while negative weight values are represented by red lines.
Figure 5. Residual versus predicted values for RBPANN-tanh model in training phase and each cluster-group.
Figure 6. Residual versus predicted values for RBPANN-tanh model in testing phase for each cluster-group.
Table 1. Descriptive statistics for plot-specific variables used in clustering analysis.
Variable  n     Minimum    Mean       Maximum    SD
N         1000  10.0000    114.7200   570.0000   87.7167
BA        1000  0.0636     5.3375     28.5217    4.3161
Dm        1000  8.5000     22.9636    75.0000    7.6788
Hm        1000  4.0000     12.8062    35.0000    3.9813
QMD       1000  8.5147     24.7478    76.9484    8.0708
A         1000  2032.0000  2588.2170  2978.0000  137.3215
S         1000  0.0000     43.0499    96.0000    20.0551
As        1000  1          5          9          2
N = density (trees per hectare); BA = basal area (m2); Dm = mean diameter (cm); Hm = mean total tree height (m); QMD = quadratic mean diameter (cm); A = altitude (m); S = slope (%); As = aspect (categorical variable; 1 = plain, 2 = N, 9 = SW); n = observations; SD = standard deviation.
Table 2. Descriptive statistics for both training and testing datasets.
Dataset   Variable  n     Minimum  Mean     Maximum  SD
Training  h         5736  3.0000   12.3900  35.0000  5.3217
Training  dbh       5736  7.5000   21.5362  95.0000  11.4394
Testing   h         5736  2.0000   12.1742  35.0000  5.2871
Testing   dbh       5736  7.5000   21.3846  98.0000  11.5267
h = total tree height (m); dbh = diameter at breast height (cm); n = observations; SD = standard deviation.
Table 3. Estimated parameter for h-dbh relationship in Durango pine by NLMEM.
Parameter  Estimate   SE        DF    t-Value     p-Value   Lower      Upper
α0         26.409060  1.100113  5724  24.005770   <0.00001  24.252985  28.565134
α1         0.029786   0.002534  5724  11.754320   <0.00001  0.024820   0.034752
α2         1.083133   0.040518  5724  26.732200   <0.00001  1.003723   1.162543
sd(α0)     1.928939   0.583997  5724  3.302992    0.000962  1.210579   3.073574
σ          3.110839   0.029338  5724  106.033502  <0.00001  3.054379   3.168342
α01        −3.371745  0.106547  407   −31.645570  <0.00001  −3.580578  −3.162913
α02        −2.840826  0.089770  320   −31.645570  <0.00001  −3.016775  −2.664877
α03        −0.601879  0.019019  631   −31.645570  <0.00001  −0.639157  −0.564601
α04        3.572580   0.112894  133   31.645570   <0.00001  3.351309   3.793851
α05        0.773802   0.024452  925   31.645570   <0.00001  0.725876   0.821729
α06        0.478565   0.015123  1109  31.645570   <0.00001  0.448925   0.508206
α07        0.945549   0.029879  364   31.645570   <0.00001  0.886985   1.004113
α08        −0.308794  0.009758  876   −31.645570  <0.00001  −0.327919  −0.289668
α09        0.012327   0.000390  654   31.645570   <0.00001  0.011564   0.013091
α010       1.340420   0.042357  317   31.645570   <0.00001  1.257400   1.423440
SE = asymptotic standard error; DF = freedom degrees; sd = standard deviation for random effect between cluster groups; σ = standard error within cluster-group.
Table 4. Fitting statistics for h-dbh relationship in Durango pine by NLMEM.
Dataset  n     RMSE    SEE     RSEE     FI      E        RE       AIC        BIC        logLik
Overall  5736  3.1085  3.1123  25.1193  0.6588  −0.0005  −0.0042  13,039.75  13,139.57  −13,009.75
C1       631   2.4735  2.4889  6.4322   0.6182  −0.0324  −0.3161  8834.14    772.23     −736.18
C2       407   2.5289  2.5489  6.1485   0.5378  0.0050   0.0515   7113.33    627.39     −592.78
C3       925   2.9631  2.9749  8.4267   0.6525  −0.1111  −0.9526  16,437.91  1408.51    −1369.83
C4       1109  3.8688  3.9442  2.9876   0.5610  0.3416   1.7378   4306.53    388.22     −358.88
C5       320   3.1341  3.1426  10.6971  0.6312  −0.0072  −0.0610  25,348.12  2153.32    −2112.34
C6       654   2.8727  2.8792  10.3014  0.6134  0.0555   0.4528   28,074.42  2381.60    −2339.53
C7       364   3.4345  3.4584  6.5911   0.6329  −0.0640  −0.4879  10,767.05  932.64     −897.25
C8       876   3.3842  3.3939  10.2592  0.6015  0.0210   0.1632   25,618.61  2175.54    −2134.88
C9       133   3.2098  3.2221  8.8088   0.6138  −0.0229  −0.1862  18,292.63  1563.28    −1524.39
C10      317   3.6168  3.6458  5.1452   0.6196  −0.0058  −0.0353  9768.73    848.61     −814.06
RMSE = root mean square error (m); SEE = standard error of estimate (m); RSEE = relative SEE (%); FI = fitting index; E = mean error (m); RE = relative E (%); AIC = Akaike information criterion; BIC = Bayesian information criterion; logLik = log-likelihood value; C = cluster-group.
Table 5. Main fitting statistics for RBPANN-tanh, RBPANN-softplus, and RBPANN-logistic activation functions tested for h-dbh relationship.
ANN              Error    Reached Threshold  Steps  AIC     BIC
RBPANN-tanh      27.8455  0.0775             301    577.69  2314.52
RBPANN-softplus  27.3939  0.0838             1885   576.79  2313.62
RBPANN-logistic  28.4113  0.0994             88     578.82  2315.65
AIC = Akaike information criterion; BIC = Bayesian information criterion.
Table 6. Fitting statistics for h-dbh relationship in Durango pine by ANNs and different backpropagation activation functions.
Dataset  RMSE    SEE     RSEE     FI      E        RE       AIC        BIC        logLik
RBPANN-tanh
Overall  2.8122  2.8134  22.7071  0.7208  0.0001   −0.0003  11,872.59  11,912.52  −11,860.59
C1       2.6322  2.6427  8.0184   0.7258  0.0002   0.0018   14,644.89  1259.09    −1220.41
C2       2.2127  2.2264  6.1634   0.6945  0.0073   0.0717   7745.69    681.53     −645.47
C3       2.8170  2.8246  10.2988  0.7020  −0.0087  −0.0741  22,979.70  1955.95    −1914.98
C4       2.5795  2.5853  9.9082   0.6883  0.0636   0.5191   25,209.16  2142.83    −2100.76
C5       2.4178  2.4370  6.2966   0.5776  −0.6268  −6.4583  6768.31    598.64     −564.03
C6       2.8907  2.9019  8.4977   0.6867  0.2559   2.0805   16,649.44  1426.35    −1387.45
C7       3.1340  3.1558  6.4423   0.6943  0.3677   2.8037   9967.09    865.97     −830.59
C8       3.0652  3.0740  9.9535   0.6731  −0.3691  −2.8636  23,537.45  2002.11    −1961.45
C9       3.7470  3.8200  3.0995   0.5882  1.5107   7.6864   4204.43    379.71     −350.37
C10      3.2490  3.2751  4.9509   0.6930  −0.1391  −0.8426  8952.96    780.63     −746.08
RBPANN-softplus
Overall  2.8431  2.8443  22.9565  0.7143  −0.0013  −0.0107  11,997.88  12,037.81  −11,985.88
C1       2.6516  2.6621  8.0772   0.7218  0.0489   0.4190   14,755.60  1268.32    −1229.63
C2       2.4591  2.4744  6.8498   0.6226  −0.9459  −9.2408  8777.19    767.49     −731.43
C3       2.8529  2.8606  10.4301  0.6944  0.4200   3.5697   23,260.85  1979.38    −1938.40
C4       2.6003  2.6062  9.9882   0.6833  0.3109   2.5351   25,423.15  2160.66    −2118.60
C5       2.4956  2.5154  6.4993   0.5499  −0.9140  −9.4178  7011.62    618.91     −584.30
C6       2.8841  2.8952  8.4781   0.6882  −0.1042  −0.8472  16,613.23  1423.33    −1384.44
C7       3.1030  3.1246  6.3787   0.7003  0.1339   1.0213   9880.37    858.75     −823.36
C8       3.0748  3.0837  9.9847   0.6711  −0.3712  −2.8799  23,603.31  2007.59    −1966.94
C9       3.8187  3.8931  3.1588   0.5723  1.6461   8.3754   4264.92    384.75     −355.41
C10      3.2550  3.2811  4.9600   0.6919  0.0992   0.6009   8966.90    781.80     −747.24
RBPANN-logistic
Overall  2.8486  2.8498  23.0012  0.7135  −0.0052  −0.0420  12,020.19  12,060.12  −12,008.19
C1       2.6570  2.6676  8.0938   0.7206  0.0785   0.6729   14,786.65  1270.90    −1232.22
C2       2.4675  2.4828  6.8732   0.6201  −0.9254  −9.0403  8810.45    770.26     −734.20
C3       2.8586  2.8664  10.4510  0.6932  0.4146   3.5237   23,305.34  1983.09    −1942.11
C4       2.6024  2.6083  9.9962   0.6827  0.2868   2.3386   25,444.48  2162.44    −2120.37
C5       2.5183  2.5382  6.5583   0.5417  −0.9468  −9.7561  7081.01    624.69     −590.08
C6       2.9055  2.9167  8.5411   0.6835  −0.1415  −1.1503  16,729.32  1433.01    −1394.11
C7       3.0957  3.1172  6.3636   0.7017  0.1066   0.8129   9859.77    857.03     −821.65
C8       3.0740  3.0828  9.9818   0.6712  −0.3751  −2.9095  23,597.18  2007.08    −1966.43
C9       3.8192  3.8936  3.1592   0.5721  1.6756   8.5253   4265.34    384.79     −355.45
C10      3.2584  3.2845  4.9652   0.6912  0.1835   1.1115   8974.85    782.46     −747.90
RMSE = root mean square error (m); SEE = standard error of estimate (m); RSEE = relative SEE (%); FI = fitting index; E = mean error (m); RE = relative E (%); AIC = Akaike information criterion; BIC = Bayesian information criterion; logLik = log-likelihood value; C = cluster-group.
Table 7. Ranks and sum of ranks based on the fitting statistics for fitted models in training phase.
Model            Dataset  RMSE  SEE  RSEE  FI  E  RE  AIC  BIC  logLik  Rank
NLMEM            Overall  4  4  4  4  3  2  4  4  4  33 (4)
RBPANN-tanh      Overall  1  1  1  1  4  4  2  1  2  17 (2)
RBPANN-softplus  Overall  2  2  2  2  1  3  1  2  1  16 (1)
RBPANN-logistic  Overall  3  3  3  3  2  1  3  3  3  24 (3)
NLMEM            C1   1  1  1  4  2  2  1  1  1  14
RBPANN-tanh      C1   2  2  2  1  1  1  2  2  2  15
RBPANN-softplus  C1   3  3  3  2  3  3  3  3  3  26
RBPANN-logistic  C1   4  4  4  3  4  4  4  4  4  35
NLMEM            C2   4  4  1  4  1  1  1  1  1  18
RBPANN-tanh      C2   1  1  2  1  2  2  2  2  2  15
RBPANN-softplus  C2   2  2  3  2  4  4  3  3  3  26
RBPANN-logistic  C2   3  3  4  3  3  3  4  4  4  31
NLMEM            C3   4  4  1  4  2  2  1  1  1  20
RBPANN-tanh      C3   1  1  2  1  1  1  2  2  2  13
RBPANN-softplus  C3   2  2  3  2  4  4  3  3  3  26
RBPANN-logistic  C3   3  3  4  3  3  3  4  4  4  31
NLMEM            C4   4  4  1  4  4  2  1  1  1  22
RBPANN-tanh      C4   1  1  2  1  1  1  2  2  2  13
RBPANN-softplus  C4   2  2  3  2  3  4  3  3  3  25
RBPANN-logistic  C4   3  3  4  3  2  3  4  4  4  30
NLMEM            C5   4  4  4  1  1  1  4  4  4  27
RBPANN-tanh      C5   1  1  1  2  2  2  1  1  1  12
RBPANN-softplus  C5   2  2  2  3  3  3  2  2  2  21
RBPANN-logistic  C5   3  3  3  4  4  4  3  3  3  30
NLMEM            C6   1  1  4  4  1  1  4  4  4  24
RBPANN-tanh      C6   3  3  2  2  4  4  2  2  2  24
RBPANN-softplus  C6   2  2  1  1  2  2  1  1  1  13
RBPANN-logistic  C6   4  4  3  3  3  3  3  3  3  29
NLMEM            C7   4  4  4  4  1  1  4  4  4  30
RBPANN-tanh      C7   3  3  3  3  4  4  3  3  3  29
RBPANN-softplus  C7   2  2  2  2  3  3  2  2  2  20
RBPANN-logistic  C7   1  1  1  1  2  2  1  1  1  11
NLMEM            C8   4  4  4  4  1  1  4  4  4  30
RBPANN-tanh      C8   1  1  1  1  2  2  1  1  1  11
RBPANN-softplus  C8   3  3  3  3  3  3  3  3  3  27
RBPANN-logistic  C8   2  2  2  2  4  4  2  2  2  22
NLMEM            C9   1  1  4  1  1  1  4  4  4  21
RBPANN-tanh      C9   2  2  1  2  2  2  1  1  1  14
RBPANN-softplus  C9   3  3  2  3  3  3  2  2  2  23
RBPANN-logistic  C9   4  4  3  4  4  4  3  3  3  32
NLMEM            C10  4  4  4  4  1  1  4  4  4  30
RBPANN-tanh      C10  1  1  1  1  3  3  1  1  1  13
RBPANN-softplus  C10  2  2  2  2  2  2  2  2  2  18
RBPANN-logistic  C10  3  3  3  3  4  4  3  3  3  29
RMSE = root mean square error; SEE = standard error of estimate; RSEE = relative SEE; FI = fitting index; E = mean error; RE = relative E; AIC = Akaike information criterion; BIC = Bayesian information criterion; logLik = log-likelihood value; C = cluster-group.
Table 8. Testing statistics for h-dbh relationship in Durango pine by NLMEM in testing phase.
Dataset  n     RMSE    SEE     RSEE     FI      E        RE        AIC        BIC        logLik
Overall  5736  3.1438  3.1476  25.8549  0.6464  −0.1611  −1.3229   13,169.29  13,269.10  −13,139.29
C1       631   2.7037  2.7141  24.1931  0.6893  −0.0987  −0.8795   15,671.32  1344.87    −1305.94
C2       407   2.9045  2.9228  29.6973  0.4680  1.2223   12.4194   10,249.76  890.11     −854.15
C3       925   3.0941  3.1029  27.5443  0.6136  −0.7206  −6.3964   23,870.14  2029.86    −1989.18
C4       1109  3.0509  3.0578  25.0690  0.6102  −0.0942  −0.7725   29,971.18  2539.72    −2497.60
C5       320   2.7563  2.7797  28.6992  0.5175  0.9194   9.4926    7263.59    639.50     −605.30
C6       654   3.1718  3.1834  25.4080  0.6085  0.1428   1.1398    19,047.71  1626.51    −1587.31
C7       364   3.4512  3.4752  26.2781  0.6545  −0.7996  −6.0462   10,839.28  938.67     −903.27
C8       876   3.0615  3.0704  24.6303  0.6290  −0.1456  −1.1680   23,216.40  1975.28    −1934.70
C9       133   5.0061  5.1138  25.4219  0.2128  −2.4332  −12.0959  4665.31    417.55     −388.78
C10      317   3.8668  3.8957  24.6024  0.5555  −0.7956  −5.0245   10,991.37  950.90     −915.95
RMSE = root mean square error (m); SEE = standard error of estimate (m); RSEE = relative SEE (%); FI = fitting index; E = mean error (m); RE = relative E (%); AIC = Akaike information criterion; BIC = Bayesian information criterion; logLik = log-likelihood value; C = cluster-group.
Table 9. Testing statistics for both overall dataset and each cluster-group in testing phase with ANN approaches.
Dataset  RMSE    SEE     RSEE     FI      E        RE       AIC        BIC        logLik
RBPANN-tanh
Overall  2.8693  2.8706  23.5793  0.7055  0.6603   5.4241   12,103.39  12,143.32  −12,091.39
C1       2.5090  2.5186  8.1057   0.7324  0.6015   5.3617   14,492.43  1246.63    −1207.70
C2       2.4793  2.4949  7.1292   0.6124  0.8309   8.4421   8726.27    763.15     −727.19
C3       2.7294  2.7372  10.1704  0.6994  0.4046   3.5917   21,218.12  1808.86    −1768.18
C4       2.8842  2.8907  11.1931  0.6516  0.8969   7.3531   28,460.68  2413.85    −2371.72
C5       2.3880  2.4083  6.0227   0.6378  0.0825   0.8517   6234.38    553.73     −519.53
C6       3.0888  3.1001  9.1436   0.6287  1.2080   9.6415   18,609.70  1590.01    −1550.81
C7       3.1665  3.1885  6.4643   0.7091  0.8613   6.5128   10,085.03  875.82     −840.42
C8       2.7710  2.7790  9.2458   0.6961  0.1519   1.2183   21,146.63  1802.80    −1762.22
C9       4.2322  4.3232  3.2613   0.4374  1.8156   9.0259   4177.60    376.91     −348.13
C10      3.4732  3.4992  5.7063   0.6414  0.5225   3.2997   10,118.01  878.12     −843.17
RBPANN-softplus
Overall  2.8764  2.8776  23.6371  0.7040  0.6578   5.4029   12,131.48  12,171.41  −12,119.48
C1       2.5181  2.5278  8.1353   0.7305  0.6756   6.0219   14,550.00  1251.43    −1212.50
C2       2.3388  2.3536  6.7253   0.6551  −0.0814  −0.8266  8164.99    716.38     −680.42
C3       2.8312  2.8393  10.5500  0.6765  0.8131   7.2176   21,992.85  1873.42    −1832.74
C4       2.9657  2.9724  11.5095  0.6317  1.1148   9.1400   29,209.97  2476.29    −2434.16
C5       2.4000  2.4204  6.0529   0.6342  −0.2679  −2.7660  6270.23    556.72     −522.52
C6       2.9772  2.9881  8.8135   0.6550  0.7866   6.2785   18,002.41  1539.40    −1500.20
C7       3.1029  3.1244  6.3344   0.7207  0.6147   4.6485   9907.23    861.00     −825.60
C8       2.7748  2.7828  9.2584   0.6952  0.1450   1.1632   21,174.89  1805.15    −1764.57
C9       4.3045  4.3971  3.3171   0.4180  2.0025   9.9547   4226.81    381.01     −352.23
C10      3.5266  3.5530  5.7940   0.6303  0.8555   5.4026   10,242.05  888.46     −853.50
RBPANN-logistic
Overall  2.8820  2.8832  23.6832  0.7029  0.6484   5.3263   12,153.85  12,193.77  −12,141.85
C1       2.5071  2.5168  8.0998   0.7328  0.6480   5.7760   14,481.07  1245.68    −1206.76
C2       2.3476  2.3624  6.7505   0.6525  −0.0800  −0.8123  8200.94    719.38     −683.41
C3       2.8287  2.8368  10.5405  0.6771  0.8418   7.4726   21,973.90  1871.84    −1831.16
C4       2.9718  2.9785  11.5331  0.6302  1.1385   9.3342   29,264.97  2480.87    −2438.75
C5       2.3833  2.4035  6.0108   0.6392  −0.2199  −2.2702  6220.20    552.55     −518.35
C6       2.9674  2.9783  8.7844   0.6573  0.8301   6.6255   17,947.97  1534.87    −1495.66
C7       3.1024  3.1240  6.3335   0.7208  0.6342   4.7956   9905.96    860.90     −825.50
C8       2.7763  2.7843  9.2634   0.6949  0.1438   1.1539   21,186.18  1806.09    −1765.51
C9       4.2538  4.3453  3.2780   0.4316  1.9452   9.6698   4192.42    378.14     −349.37
C10      3.4913  3.5174  5.7360   0.6377  0.7841   4.9519   10,160.28  881.65     −846.69
RMSE = root mean square error (m); SEE = standard error of estimate (m); RSEE = relative SEE (%); FI = fitting index; E = mean error (m); RE = relative E (%); AIC = Akaike information criterion; BIC = Bayesian information criterion; logLik = log-likelihood value; C = cluster-group.
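For readers who wish to reproduce the evaluation, the statistics reported above follow standard definitions. The following is a minimal sketch (not the authors' code) that computes RMSE, SEE, RSEE, FI, E, and RE from observed and predicted heights; the function name and the degrees-of-freedom argument `n_params` are illustrative assumptions:

```python
import math

def fit_statistics(obs, pred, n_params=0):
    """Compute standard goodness-of-fit statistics for height predictions.

    obs, pred: sequences of observed and predicted total heights (m).
    n_params: number of estimated model parameters (hypothetical; used
    only in the degrees-of-freedom correction for SEE).
    """
    n = len(obs)
    resid = [o - p for o, p in zip(obs, pred)]
    mean_obs = sum(obs) / n
    sse = sum(r * r for r in resid)
    sst = sum((o - mean_obs) ** 2 for o in obs)
    rmse = math.sqrt(sse / n)              # root mean square error (m)
    see = math.sqrt(sse / (n - n_params))  # standard error of estimate (m)
    rsee = 100.0 * see / mean_obs          # relative SEE (%)
    fi = 1.0 - sse / sst                   # fitting index (R^2 analogue)
    e = sum(resid) / n                     # mean error (m)
    re = 100.0 * e / mean_obs              # relative mean error (%)
    return {"RMSE": rmse, "SEE": see, "RSEE": rsee, "FI": fi, "E": e, "RE": re}
```

AIC, BIC, and logLik additionally require the fitted model's likelihood and parameter count, so they are omitted from this sketch.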
Table 10. Ranks and sum of ranks based on the fitting statistics for fitted models in testing phase.
Model Dataset RMSE SEE RSEE FI E RE AIC BIC logLik Rank
NLMEM Overall 4 4 4 4 4 4 4 4 4 36 (4)
RBPANN-tanh Overall 1 1 1 1 1 1 1 1 1 9 (1)
RBPANN-softplus Overall 2 2 2 2 2 2 2 2 2 18 (2)
RBPANN-logistic Overall 3 3 3 3 3 3 3 3 3 27 (3)
NLMEM C1 4 4 4 4 1 1 4 4 4 30
RBPANN-tanh C1 2 2 2 2 2 2 2 2 2 18
RBPANN-softplus C1 3 3 3 3 4 4 3 3 3 29
RBPANN-logistic C1 1 1 1 1 3 3 1 1 1 13
NLMEM C2 4 4 4 4 4 4 4 4 4 36
RBPANN-tanh C2 3 3 3 3 3 3 3 3 3 27
RBPANN-softplus C2 1 1 1 1 2 2 1 1 1 11
RBPANN-logistic C2 2 2 2 2 1 1 2 2 2 16
NLMEM C3 4 4 4 4 2 2 4 4 4 32
RBPANN-tanh C3 1 1 1 1 1 1 1 1 1 9
RBPANN-softplus C3 3 3 3 3 3 3 3 3 3 27
RBPANN-logistic C3 2 2 2 2 4 4 2 2 2 22
NLMEM C4 4 4 4 4 1 1 4 4 4 30
RBPANN-tanh C4 1 1 1 1 2 2 1 1 1 11
RBPANN-softplus C4 2 2 2 2 3 3 2 2 2 20
RBPANN-logistic C4 3 3 3 3 4 4 3 3 3 29
NLMEM C5 4 4 4 4 4 4 4 4 4 36
RBPANN-tanh C5 2 2 2 2 1 1 2 2 2 16
RBPANN-softplus C5 3 3 3 3 3 3 3 3 3 27
RBPANN-logistic C5 1 1 1 1 2 2 1 1 1 11
NLMEM C6 4 4 4 4 1 1 4 4 4 30
RBPANN-tanh C6 3 3 3 3 4 4 3 3 3 29
RBPANN-softplus C6 2 2 2 2 2 2 2 2 2 18
RBPANN-logistic C6 1 1 1 1 3 3 1 1 1 13
NLMEM C7 4 4 4 4 3 3 4 4 4 34
RBPANN-tanh C7 3 3 3 3 4 4 3 3 3 29
RBPANN-softplus C7 2 2 2 2 1 1 2 2 2 16
RBPANN-logistic C7 1 1 1 1 2 2 1 1 1 11
NLMEM C8 4 4 4 4 3 3 4 4 4 34
RBPANN-tanh C8 1 1 1 1 4 4 1 1 1 15
RBPANN-softplus C8 2 2 2 2 2 2 2 2 2 18
RBPANN-logistic C8 3 3 3 3 1 1 3 3 3 23
NLMEM C9 4 4 4 4 4 4 4 4 4 36
RBPANN-tanh C9 1 1 1 1 1 1 1 1 1 9
RBPANN-softplus C9 3 3 3 3 3 3 3 3 3 27
RBPANN-logistic C9 2 2 2 2 2 2 2 2 2 18
NLMEM C10 4 4 4 4 3 3 4 4 4 34
RBPANN-tanh C10 1 1 1 1 1 1 1 1 1 9
RBPANN-softplus C10 3 3 3 3 4 4 3 3 3 29
RBPANN-logistic C10 3 3 3 3 4 4 3 3 3 29
RMSE = root mean square error (m); SEE = standard error of estimate; RSEE = relative SEE; FI = fitting index; E = mean error; RE = relative E; AIC = Akaike information criterion; BIC = Bayesian information criterion; logLik = log-likelihood value; C = cluster-group.
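The rank table above assigns each model a rank per statistic (1 = best) and sums the nine ranks per dataset. A minimal sketch of that procedure follows; it is an illustration, not the authors' code, and the tie-breaking and sign conventions (treating FI and logLik as larger-is-better, and ranking E and RE by absolute value) are assumptions:

```python
def sum_of_ranks(stats_by_model):
    """Rank competing models on each statistic and sum the ranks.

    stats_by_model: {model_name: {stat_name: value}}. Rank 1 is best:
    error-type statistics are ranked ascending, while FI and logLik
    are negated so larger values rank first.
    """
    larger_is_better = {"FI", "logLik"}
    models = list(stats_by_model)
    stat_names = list(next(iter(stats_by_model.values())))
    totals = {m: 0 for m in models}
    for s in stat_names:
        def key(m):
            v = stats_by_model[m][s]
            # Rank bias measures by magnitude so under- and
            # over-prediction are penalized symmetrically.
            if s in {"E", "RE"}:
                v = abs(v)
            return -v if s in larger_is_better else v
        for rank, m in enumerate(sorted(models, key=key), start=1):
            totals[m] += rank
    return totals
```

Applied to the overall testing statistics in Table 9, such a procedure yields the ordering RBPANN-tanh < RBPANN-softplus < RBPANN-logistic < NLMEM reported in Table 10.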
Ou, Y.; Quiñónez-Barraza, G. Modeling Height–Diameter Relationship Using Artificial Neural Networks for Durango Pine (Pinus durangensis Martínez) Species in Mexico. Forests 2023, 14, 1544. https://doi.org/10.3390/f14081544