## 2.5. Feature Screening

### 2.5.1. Feature Screening Based on Akaike Information Criterion (AIC) Method

The Akaike information criterion (AIC), which is an information criterion based on the concept of entropy, can be used to select statistical models by evaluating the accuracy and complexity of the model [38,39]. In general, AIC must be combined with a logistic regression model to achieve feature selection. Therefore, in this study, we introduced binary logistic regression to describe the relationships between features, which can also help predict the probability of lodging.

In this study, the independent variables of the binary logistic regression model were the maize lodging recognition features, and the dependent variables were binary variables with a value of 1 or 0, representing lodging or nonlodging maize, respectively. The maximum likelihood method was used to estimate the parameters of the binary logistic regression model. Following this approach, the binary logistic regression model for predicting the occurrence probability of lodging is

$$P = \frac{e^{\beta_0 + \beta_1 z_1 + \dots + \beta_m z_m}}{1 + e^{\beta_0 + \beta_1 z_1 + \dots + \beta_m z_m}} \tag{2}$$

where $z_1, z_2, \dots, z_m$ are the features to be screened, $\beta_0$ is the intercept, and $\beta_1, \beta_2, \dots, \beta_m$ are the regression coefficients.
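As a minimal numerical sketch of Equation (2), the snippet below evaluates the lodging probability for one sample; the function name, feature values, and coefficients are illustrative assumptions, not values from the study.

```python
import numpy as np

def lodging_probability(z, beta0, beta):
    """Occurrence probability of lodging from Equation (2).

    z     : 1-D array of feature values z_1, ..., z_m
    beta0 : intercept beta_0
    beta  : 1-D array of regression coefficients beta_1, ..., beta_m
    """
    eta = beta0 + np.dot(beta, z)              # linear predictor
    return np.exp(eta) / (1.0 + np.exp(eta))   # logistic transform

# Hypothetical coefficients and feature values for illustration only:
p = lodging_probability(np.array([0.4, 1.2]), beta0=-1.0, beta=np.array([0.8, 0.5]))
```

A probability above a chosen threshold (e.g., 0.5) would classify the sample as lodged.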

The maize samples were divided into a training set and a testing set, and the AIC value of the model was calculated as follows:

$$AIC = -2\ln L(\hat{\beta}_k \mid y) + 2k \tag{3}$$

where $\ln$ denotes the natural logarithm, $L(\hat{\beta}_k \mid y)$ is the maximum value of the likelihood function of the logistic regression model, which indicates the probability that the model yields a correct classification, and $k$ is the number of parameters in the logistic regression model.

When calculating the AIC value of the binary logistic regression model, three-tenths of the 1297 maize samples were randomly selected as the training set, and the remaining samples were used as the testing set. The independent variables (features) were eliminated stepwise, each elimination following the direction of decreasing AIC, until the AIC value reached its minimum. Finally, the features retained in the model with the lowest AIC value were regarded as the optimal features.
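The backward elimination by AIC described above can be sketched as follows. This is a self-contained illustration under stated assumptions: the Newton-Raphson logistic fit, the feature names `f1` and `noise`, and the simulated data are all hypothetical, not the study's implementation or data.

```python
import numpy as np

def fit_logit_loglik(X, y, iters=30):
    """Fit a logistic regression by Newton-Raphson and return the
    maximized log-likelihood. X includes an intercept column; y is 0/1."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)                                        # IRLS weights
        H = X.T @ (X * W[:, None]) + 1e-8 * np.eye(X.shape[1])   # small ridge for stability
        beta += np.linalg.solve(H, X.T @ (y - p))
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1.0 - 1e-12)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def aic(X, y):
    """AIC = -2 ln L + 2k, with k the number of model parameters (Equation (3))."""
    return -2.0 * fit_logit_loglik(X, y) + 2.0 * X.shape[1]

def backward_select(features, y):
    """Eliminate features one at a time as long as a removal decreases the AIC."""
    cols = list(features)
    design = lambda cs: np.column_stack([np.ones(len(y))] + [features[c] for c in cs])
    best = aic(design(cols), y)
    improved = True
    while improved and len(cols) > 1:
        improved = False
        for c in list(cols):
            trial = [x for x in cols if x != c]
            a = aic(design(trial), y)
            if a < best:                       # removal lowered AIC: accept it
                best, cols, improved = a, trial, True
    return cols, best

# Simulated example: "f1" drives lodging, "noise" does not (assumed data).
rng = np.random.default_rng(0)
f1, noise = rng.normal(size=400), rng.normal(size=400)
y = (rng.uniform(size=400) < 1.0 / (1.0 + np.exp(-2.0 * f1))).astype(float)
kept, best_aic = backward_select({"f1": f1, "noise": noise}, y)
```

In this sketch an informative feature is retained, while an uninformative one is typically dropped because its likelihood gain does not offset the $2k$ penalty.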

### 2.5.2. Feature Screening Based on Index Method

Variation coefficients and relative differences are widely used to screen and analyze the image features of crop lodging [6,7]. The variation coefficient directly reflects the degree of dispersion of an image feature within the lodging and nonlodging crop areas, while the relative difference indicates how much the feature differs between the lodging and nonlodging areas. Features suitable for lodging identification should have low variation coefficients and high relative differences.

This study adopted the variation coefficient and relative difference as two evaluation indicators for feature selection. First, the variation coefficient and relative difference between the lodging and nonlodging areas were calculated for each feature, and the ten features with the largest relative differences were selected as candidate predictors for lodging recognition. Among these predictors, the features whose variation coefficients were less than 20.57% were selected as the optimal features to ensure classification stability.

The formulas to calculate the variation coefficient and relative difference are as follows:

$$CV = \frac{sd}{mn} \times 100\% \tag{4}$$

$$RD = \frac{\mathrm{ABS}(mn_1 - mn_2)}{mn_1} \times 100\% \tag{5}$$

where *CV* denotes the variation coefficient; *sd* and *mn* represent the standard deviation and mean of the maize samples, respectively; *RD* denotes the relative difference; *mn*1 and *mn*2 represent the mean values of the lodging and nonlodging maize samples, respectively; and *ABS* is the absolute value function.
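Equations (4) and (5) can be computed directly; in the sketch below, the two sample arrays are hypothetical pixel values of a single image feature, not data from the study.

```python
import numpy as np

def cv_percent(samples):
    """Variation coefficient CV = sd / mn * 100% (Equation (4))."""
    return np.std(samples, ddof=1) / np.mean(samples) * 100.0

def rd_percent(lodged, nonlodged):
    """Relative difference RD = |mn_1 - mn_2| / mn_1 * 100% (Equation (5))."""
    mn1, mn2 = np.mean(lodged), np.mean(nonlodged)
    return abs(mn1 - mn2) / mn1 * 100.0

# Hypothetical feature values in lodging vs. nonlodging areas:
lodged = np.array([0.62, 0.58, 0.65, 0.60])
nonlodged = np.array([0.40, 0.44, 0.38, 0.42])
cv = cv_percent(lodged)
rd = rd_percent(lodged, nonlodged)
```

Under the screening rule above, a feature would be retained when its *RD* ranks among the largest and its *CV* stays below the 20.57% threshold.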
