2.1. Study Area and Planting Date Data
The study area comprises the major maize cultivation zones in China. Planting dates are the observed planting dates of maize at 188 agricultural meteorological experiment stations of the Chinese Meteorological Administration (CMA) [23] for the 1992–2010 period. These stations follow standardized guidelines and methods for observing and recording planting dates [23]. We assume that planting at these experiment stations takes place at an optimal time, i.e., that it is not constrained by labor, supplies, or management. Tao et al. [24] made the same assumption about this dataset in their study of maize phenology in China.
China has been divided into crop cultivation zones based on soil and climate [25]. This concept is similar to the agro-ecological zones of the Food and Agriculture Organization (FAO) [26]. The 188 agricultural meteorological experiment stations are located in six zones (Figure 1), with 51, 21, 36, 27, 32, and 21 stations in Zones I, II, III, IV, V, and VI, respectively. There are two types of maize, spring and summer: spring maize is sown in spring, and summer maize is sown in summer.
Table 1 shows the basic information about maize cultivation in each zone. Zone I is northeast China, characterized by cold winters, warm summers, moderate precipitation, and a relatively short growing season. Precipitation is concentrated from May to September [27]. Zone II is the semiarid agricultural area of the Inner Mongolia Autonomous Region, north of the Great Wall. In Zones I and II, most maize is spring maize. Zone III is the North China Plain, which has a temperate continental climate with hot, rainy summers and cold, dry winters. In Zone III, most maize is summer maize. Zone IV is southwest China, which has a subtropical monsoon climate. Zone V is the Loess Plateau, which has a temperate continental monsoon climate. Zone VI is the arid northwest of China, including Gansu province and the Xinjiang Uighur Autonomous Region. In Zone VI, growing-period precipitation is generally less than 50 mm [24], so maize is irrigated. In Zones IV, V, and VI, both spring and summer maize are grown.
The distributions of observed maize planting dates are shown in Figure 2. Note the narrow range of planting dates for spring maize in Zone V, and for summer maize in Zones III, V, and especially VI. Note also the wide range of planting dates for both types of maize in Zone IV (southwest China), and the overlap between the two types, although summer maize is typically planted later than spring maize.
2.3. Analyses
To answer the research question of whether there was a trend in climate variables over the 1992–2010 period, we fit a linear model by ordinary least squares (OLS) regression for each month of each maize type's season in each zone, based on the experiment stations in that zone and assuming temporal independence between years. Coefficients were tested for statistical significance at p < 0.05 with t tests based on the standard error of the regression coefficient. We checked for serial autocorrelation of the OLS residuals, which would have required a generalized least squares (GLS) regression with a model of temporal autocorrelation for the error term, but found none.
To answer the research question of whether there was a trend in planting dates over the 1992–2010 period, we fit a linear trend by OLS regression for each maize type at each station, again confirming temporal independence between years.
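The trend test and residual check described above can be sketched in a few lines. This is an illustrative Python sketch, not the authors' code: the function names `ols_trend` and `lag1_autocorrelation`, the planting dates, and the hard-coded critical value (two-sided 5% t value for 17 degrees of freedom, valid only for a 19-year series) are all assumptions made for the example.

```python
import math

def ols_trend(years, values):
    """Fit values = a + b * year by OLS.

    Returns the slope, its standard error, the t statistic and the residuals.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(years, values)]
    rss = sum(e ** 2 for e in residuals)
    # Standard error of the slope: sqrt(residual variance / Sxx)
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, se_slope, slope / se_slope, residuals

def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation of the residuals (near 0 suggests independence)."""
    n = len(residuals)
    mean_e = sum(residuals) / n
    num = sum((residuals[t] - mean_e) * (residuals[t - 1] - mean_e)
              for t in range(1, n))
    den = sum((e - mean_e) ** 2 for e in residuals)
    return num / den

# Hypothetical planting dates (day of year) at one station, 1992-2010.
years = list(range(1992, 2011))
planting_doy = [121, 119, 122, 118, 120, 117, 119, 116, 118, 117,
                115, 117, 114, 116, 113, 115, 112, 114, 111]

T_CRIT_DF17 = 2.110  # assumed: two-sided 5% critical t value for n - 2 = 17

slope, se, t_stat, resid = ols_trend(years, planting_doy)
print("slope (days/year):", round(slope, 3))
print("significant at p < 0.05:", abs(t_stat) > T_CRIT_DF17)
print("lag-1 residual autocorrelation:", round(lag1_autocorrelation(resid), 3))
```

A lag-1 autocorrelation far from zero would indicate that the independence assumption does not hold and that a GLS fit with an autocorrelated error term is the safer choice.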
To assess the degree to which planting dates are related to climate, we performed principal component analysis (PCA) data reduction on the standardized predictors, followed by OLS multivariate regression over the years, using the most important derived principal components (PCs) as predictors.
PCA was used to transform a large number of original variables into a small number of uncorrelated principal components [30,31]. PCA is a data transformation that replaces the original multivariate space with a transformed space whose axes are uncorrelated and sorted by variance explained. The later components typically represent only a minor part of the data variability and so are eliminated, thereby reducing the dimensionality of the dataset [32], i.e., the number of predictors to be used in subsequent regression modelling. We determined the number of principal components to retain for modelling by the parallel analysis method [33]. PCA was indicated because the predictors are highly correlated, especially T and Tmin of the same month. The PCs were interpreted in terms of the original variables by examining the PC loadings. The function “principal” in the R package psych [34] was used for the PCA.
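To illustrate why PCA helps with highly correlated predictors, the following Python sketch works through the simplest case of two standardized variables, where the eigen-decomposition of the correlation matrix has a closed form. It is a teaching example under stated assumptions, not a reproduction of the `principal` function: the function names, the temperature values, and the restriction to two variables are all hypothetical.

```python
import math

def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pca_two_variables(x1, x2):
    """PCA of two variables via their 2 x 2 correlation matrix.

    The matrix [[1, r], [r, 1]] has eigenvalues 1 + |r| and 1 - |r| with
    eigenvectors (1, s) / sqrt(2) and (1, -s) / sqrt(2), where s = sign(r).
    Returns the PC1 scores, the PC2 scores and the share of total variance
    explained by PC1.
    """
    z1, z2 = standardize(x1), standardize(x2)
    n = len(z1)
    r = sum(a * b for a, b in zip(z1, z2)) / (n - 1)
    s = 1.0 if r >= 0 else -1.0
    root2 = math.sqrt(2.0)
    pc1 = [(a + s * b) / root2 for a, b in zip(z1, z2)]
    pc2 = [(a - s * b) / root2 for a, b in zip(z1, z2)]
    share_pc1 = (1.0 + abs(r)) / 2.0  # total variance of two z-scores is 2
    return pc1, pc2, share_pc1

# Hypothetical monthly mean and minimum temperatures (deg C), six station-years.
t_mean = [18.2, 19.5, 17.1, 20.3, 18.9, 21.0]
t_min = [12.1, 13.4, 11.0, 14.6, 12.5, 15.2]

pc1, pc2, share = pca_two_variables(t_mean, t_min)
print("share of variance explained by PC1:", round(share, 3))
```

When the two predictors are strongly correlated, PC1 carries almost all of the variance and PC2 can be dropped, which is the dimension reduction described in the text.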
We assume that the planting date decision made at each station depends only on that year's weather, i.e., there is no serial correlation with previous years, which we confirmed by examining the autocorrelation of the regression residuals. For spring maize in five zones and summer maize in four zones, all observations were considered independent, and the data from the stations in each zone were pooled in an ordinary least squares (OLS) multivariate model:
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$
where the dependent variable $y_{it}$, an element of $\mathbf{y}$, is the annual planting date observation at station $i$ in year $t$, from 1992 to 2010. The coefficient $\beta_0$, the first element of $\boldsymbol{\beta}$, denotes the intercept; the design matrix $\mathbf{X}$ has an initial column of 1's and then one column for each principal component score for the maize type, at station $i$ in year $t$. The equation is solved for the coefficient vector $\boldsymbol{\beta}$. The error vector $\boldsymbol{\varepsilon}$ assumes temporal independence between years (confirmed with autocorrelation analysis) and spatial independence between stations.
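The pooled regression of planting date on principal component scores can be sketched as follows. This is a minimal Python illustration that solves the normal equations directly and reports the adjusted coefficient of determination; the helper names, the PC scores, and the planting dates are hypothetical, and a production analysis would rely on an established statistics package.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_ols(pc_scores, y):
    """Regress y on PC scores; returns coefficients, R2 and adjusted R2."""
    # Design matrix: a leading column of 1's, then one column per PC.
    design = [[1.0] + list(row) for row in pc_scores]
    n, p = len(design), len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    mean_y = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    return beta, r2, adj_r2

# Hypothetical pooled station-year records: (PC1, PC2) scores and
# planting date (day of year). Not data from the study.
scores = [(-1.2, 0.3), (0.4, -0.8), (1.1, 0.5),
          (-0.3, 1.0), (0.9, -0.4), (-0.9, -0.6)]
planting_doy = [124, 118, 113, 121, 115, 123]

beta, r2, adj_r2 = fit_ols(scores, planting_doy)
print("coefficients (intercept, PC1, PC2):", [round(b, 2) for b in beta])
print("adjusted R2:", round(adj_r2, 3))
```

Each row of `scores` stands for one station-year, so pooling stations simply stacks their rows, which is only appropriate under the independence assumptions stated in the text.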
In some zones there were outliers, i.e., planting dates far from the majority. Therefore, we re-fitted the regression of planting date on the independent variables by robust regression [35], using the Huber and bisquare methods with the default parameters of the rlm function of the MASS R package [36]. All coefficients of determination were reduced by less than 3%, and all but two (Zones II and IV spring maize) by less than 0.5%, indicating that the outliers did not have high leverage in the OLS multivariate regression model. Therefore, we report the adjusted coefficient of determination of the OLS multivariate regression as the proportion of variance explained.
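The idea behind the robust re-fit can be sketched with iteratively reweighted least squares (IRLS) and Huber weights for a single predictor. This Python sketch is an illustration only and does not reproduce `rlm`: the tuning constant k = 1.345 and the MAD-based scale are the commonly used Huber settings, while the function names, the convergence rule, and the data are assumptions. The bisquare method is not shown.

```python
import statistics

def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def huber_line_fit(x, y, k=1.345, max_iter=50, tol=1e-8):
    """Huber M-estimate of a line by IRLS, starting from the OLS fit."""
    intercept, slope = weighted_line_fit(x, y, [1.0] * len(x))
    for _ in range(max_iter):
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual, rescaled for normal errors.
        scale = statistics.median([abs(e) for e in resid]) / 0.6745
        if scale == 0:
            break
        # Huber weights: 1 inside the threshold, shrinking outside it.
        w = [1.0 if abs(e) <= k * scale else k * scale / abs(e)
             for e in resid]
        new_intercept, new_slope = weighted_line_fit(x, y, w)
        change = abs(new_intercept - intercept) + abs(new_slope - slope)
        intercept, slope = new_intercept, new_slope
        if change < tol:
            break
    return intercept, slope

# Hypothetical predictor scores and planting dates with one late outlier.
pc1 = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
doy = [126, 125, 122, 122, 119, 118, 116, 114, 113, 150]

ols_fit = weighted_line_fit(pc1, doy, [1.0] * len(pc1))
huber_fit = huber_line_fit(pc1, doy)
print("OLS   (intercept, slope):", [round(v, 2) for v in ols_fit])
print("Huber (intercept, slope):", [round(v, 2) for v in huber_fit])
```

If the robust and OLS coefficients are close, the outliers have little influence on the fit, which is the comparison the text draws using the coefficients of determination.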
Several other studies of the effect of climate on planting dates or yield [37,38,39] used panel regression models. We did not use such models in this study. We did not treat stations as fixed effects, because we are not interested in the differences between specific stations. There could well be systematic bias over the years at individual stations (e.g., a conservative or aggressive approach to planting by the farm management); however, there is no way to distinguish any such traits from the site-specific effects of the climate. We assume that any such bias is minimal, because these agricultural experiment stations operate under the same administration and regulations.