1. Introduction
Viticulture is an important sector within the New Zealand horticultural system, with a total vineyard production area extending over 41,000 ha [1]. In New Zealand, conventional vineyards typically employ uniform management practices. However, studies have demonstrated spatial and temporal variation in vine vigor, grape yield, and quality at the vineyard scale [2,3]. With uniform management, a single application rate of fertilizers, pesticides, and irrigation is used for the entire vineyard. As a consequence, some parts of the field are likely to receive too little input and others too much. This could have negative impacts on the environment, such as groundwater pollution, soil degradation, and increased pressure from weeds and pests [4]. Precision viticulture (PV) provides an opportunity for grape growers to understand and manage vineyard spatial variability by using remote sensing data and geostatistical analysis. Remote sensing allows viticulturists to continuously monitor spatial and temporal variation in soil properties and vine growth status [5]. This is particularly relevant for New Zealand wine growers, who need to understand and manage vineyard variability to increase grape quality and achieve sustainable viticulture.
Remote sensing is the process of acquiring spectral data remotely from platforms including ground vehicles, aircraft, uncrewed aerial vehicles (UAVs), and satellites. In recent years, remote sensing has been widely applied in PV to evaluate the spatial variability of vine vigor, nutrient status, vine water status, and grape yield and quality within vineyards [5,6,7]. Satellite platforms with multispectral cameras, such as Sentinel-2 with 10 m pixel resolution and Landsat 8 with 30 m pixel resolution, have been used on a regional scale to predict grape yield [8,9]. However, the low spatial resolution of satellite imagery makes it difficult to monitor vine growth status on a vineyard block scale without bias, as a single pixel often mixes vine canopy with inter-row crops and bare soil. UAV and aircraft sensors carrying multispectral or hyperspectral cameras provide high spatial resolution imagery on a vineyard scale. For example, Carrillo et al. found a linear relationship between berry weight and the normalized difference vegetation index (NDVI) derived from multispectral airborne imagery [10]. In addition, Lamb et al. found a strong correlation between berry quality parameters and multispectral airborne imagery obtained at veraison [7]. However, the operational cost of aircraft is high. Compared to other platforms, UAVs provide an attractive alternative for PV, as they can provide high spatial resolution images at a lower cost [11]. For example, García-Fernández et al. found a significant correlation between RGB-based vegetation indices derived from UAV imagery and berry quality [12]. Wei et al. found that vegetation indices (VIs) derived from UAV multispectral imagery, combined with other environmental variables, can predict grapevine water status (stem water potential, Ψstem) with an RMSE of 138 kPa [13]. In addition to remote sensing, proximal sensors have demonstrated their capability to explore the spatial and temporal variation of vine growth status at the vineyard scale. Bramley et al. used a handheld proximal sensor (Crop Circle™, Holland Scientific, Lincoln, Nebraska, USA) and an EM38 electromagnetic soil sensor (Geonics Ltd, Mississauga, Ontario, Canada) to explore the spatial variability of vine vigor and yield as well as soil texture on a vineyard block scale; neither soil apparent electrical conductivity (ECa) nor VIs derived from the proximal sensor were good predictors of grape yield [3]. Their results also indicated that ECa and VIs were significantly correlated with vine vigor. Furthermore, a portable hyperspectral spectroradiometer can be used to predict vine growth status and grape quality at the vineyard scale [14,15].
One of the main objectives of PV is selective harvesting. Given the variability within vineyards, a major motivation is to harvest berries of consistent quality, resulting in higher profit margins from wine [16]. In New Zealand, sugar content is commonly used to describe the quality of wine grapes at harvest. The sugar content relates to the alcohol level of the wine produced during fermentation. The traditional method of monitoring wine grape sugar content is sample-based destructive laboratory analysis, which can be time consuming and expensive. In recent years, remote and proximal sensors have been used to monitor grape sugar content [12,17,18]. For example, Benelli et al. used a push-broom hyperspectral camera mounted on a vehicle to predict the sugar content of wine grapes [17]. In addition, Kasimati et al. used NDVI obtained from proximal and remote sensing during different growing stages to predict grape sugar content with an R² value of 0.61. Presently, increases in computing power and advances in sensing techniques enable more accurate prediction of grape quality, helping grape growers assess grape quality before harvest and thus develop a selective harvesting plan.
Previous research has used Pearson’s correlation coefficient and machine learning techniques to explore the relationship between spectral index data and vine growth status, yield, and berry quality parameters. Pearson’s correlation coefficients have been calculated to select key spectral indices relevant to grape sugar content [12]. Machine learning techniques have been used to model both linear and non-linear relationships between spectral index data and vine growth status, yield, and berry quality. One study used an artificial neural network (ANN) to predict table grape yield from different vegetation indices (VIs) obtained from satellite remote sensing [9]. With regard to berry quality, many studies have used hyperspectral imaging systems with different machine learning techniques to predict quality parameters such as sugar content, pH, and titratable acidity [19]. However, most of these studies used remote or proximal sensing to directly measure berries or clusters. Few studies have predicted grape quality from VIs derived at the canopy or leaf level. For example, Kasimati et al. achieved good prediction accuracy, with an R² value of 0.65 for grape sugar content, using NDVI and automated machine learning [20]. VIs measured from the vine canopy or leaves can reflect vine vigor, water status, and nutrient status [5,14,21]. Thus, it is important to explore the possibility of combining VIs and machine learning techniques to predict grape quality parameters.
The aims of this study are (1) to create an alternative method to predict grape sugar content through VIs derived from proximal and remote sensing, together with other ancillary environmental variables, during different growth stages, and (2) to determine the ability of different machine learning techniques to predict grape sugar content.
2. Methods
2.1. Study Sites
The study was conducted in two commercial wine grape vineyard blocks on the Palliser Estate located in Martinborough, New Zealand (175.45235°E 41.21119°S, WGS 84). The study vineyards are named Hua Nui (HN) and Pencarrow (PN). The variety grown in the study areas was Pinot Noir, which was planted in 1998–2000 for winemaking. The research sites selected for data collection were 3.31 ha for HN and 7.51 ha for PN. The vines in the study sites were trained with two-cane vertical shoot positioning. Inter- and intra-row planting spaces are 2.2 and 1.7 m for HN, and 2.2 and 1.8 m for PN. The region of Martinborough has a mild coastal climate with an average temperature of 18 °C. The soils in the vineyards are mostly clay and silty loams, which are known to have moderate soil water holding capacity. Vine phenology in the research vineyards is shown in Table 1. The vineyard manager was responsible for managing the vines and applying all inputs.
2.2. Grape Sugar Content Data Acquisition
Wine grape berries were manually sampled between 20 February and 14 March 2023 (from veraison to harvest time). Three berries from a single bunch were randomly selected from each sampling vine. Table 2 shows the number of sampling vines on each measurement date. During each measurement, the location of the sampling vine’s trunk was recorded by a global navigation satellite system (GNSS) receiver with real-time kinematic (RTK) correction (model: GPS1200+, Leica Geosystems AG., Heerbrugg, Switzerland). Figure 1 shows the sampling locations in each vineyard. After the berries were collected, total soluble solids (TSS), expressed in °Brix, was chosen as a proxy for grape sugar content and measured using a portable digital refractometer (PAL-ALPHA Digital Refractometer, ATAGO CO., LTD, Tokyo, Japan). The sugar content of each sampling vine was taken as the mean of the three berry measurements from that vine. Weather conditions during the sampling period are shown in Figure 2.
2.3. Canopy and Leaf Reflectance Data Acquisition
Multispectral imagery was acquired with a DJI P4 Multispectral (Da-Jiang Innovations, Shenzhen, China) between 11:00 and 14:00 under stable weather conditions and a clear sky on 14 December 2022. The DJI P4 Multispectral carries a gimbal-mounted array of six cameras. One is an RGB camera for standard visible-light photos, while the remaining five capture 2 MP images in the near-infrared, red-edge, red, green, and blue bands, with centre wavelengths of 840 nm, 730 nm, 650 nm, 560 nm, and 450 nm and bandwidths of ±26 nm, ±16 nm, ±16 nm, ±16 nm, and ±16 nm, respectively. The DJI P4 was flown at a height of 80 m, giving a spatial resolution of 4.2 cm for PN, and at 84.9 m, giving a spatial resolution of 4.5 cm for HN. The DJI P4 is equipped with an integrated sunlight sensor that records irradiance during flight in the same bands as the multispectral sensor; these data are used for radiometric calibration. In addition, radiometric calibration was performed on the image blocks using reference images acquired from a calibrated reflectance panel (Seattle, WA, USA). To ensure the geometric accuracy of the imagery, several ground control points were recorded with the RTK-corrected GNSS receiver in each vineyard during flight for geometric correction. Photogrammetric processing was applied to the georeferenced multispectral images using Pix4Dmapper (Pix4D SA, Lausanne, Switzerland). The digital surface models (DSM), digital terrain models (DTM), and reflectance maps of the study sites were created in Pix4Dmapper.
Because vineyards have discontinuous vegetation surfaces, it is essential to differentiate between pixels belonging to the vine canopy and those in the inter-row space. The vine rows were detected in the following steps: (1) NDVI was calculated with the band math function in ENVI 5.6 (Research Systems Inc., Boulder, CO, USA), producing a greyscale image for each study site; (2) a global threshold based on Otsu’s method (threshold value 0.535) was applied to the NDVI images to generate a binary image; (3) a greyscale height image was calculated by subtracting the DTM from the DSM for each study site; (4) a height threshold of 0.9 m, chosen according to the vineyards’ architecture, was applied to the height images to generate a second binary image [13]; (5) the two binary images were converted to feature maps in ArcGIS Pro 2.9 (ESRI, Redlands, CA, USA); (6) the “Clip” tool in ArcGIS Pro was used to extract the overlapping portion of the two feature maps; and (7) the resulting vine row feature map was used as a mask layer to retain only the vine row pixels of the original multispectral images (Figure 3).
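The masking logic of steps (1) to (7) can be illustrated with a minimal Python sketch. It is not the ENVI/ArcGIS workflow used in the study: the Otsu implementation, the function names `otsu_threshold` and `vine_row_mask`, and the six toy pixels are all illustrative assumptions, and the threshold is recomputed from the toy data rather than set to the reported value of 0.535.

```python
def otsu_threshold(values, bins=64):
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))
    best_t, best_var = lo, -1.0
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, lo + (i + 1) * width
    return best_t


def vine_row_mask(red, nir, dsm, dtm, height_threshold=0.9):
    """Keep pixels that pass both the NDVI (Otsu) and the height (DSM - DTM) test."""
    ndvi = [(n - r) / (n + r) for r, n in zip(red, nir)]
    t = otsu_threshold(ndvi)
    height = [s - g for s, g in zip(dsm, dtm)]
    mask = [nd > t and h > height_threshold for nd, h in zip(ndvi, height)]
    return mask, t


# Hypothetical reflectance and elevation values for six pixels (not study data).
red = [0.05, 0.06, 0.20, 0.22, 0.05, 0.21]
nir = [0.50, 0.55, 0.25, 0.24, 0.52, 0.26]
dsm = [101.5, 101.6, 100.1, 100.0, 100.2, 100.1]
dtm = [100.0] * 6
mask, ndvi_threshold = vine_row_mask(red, nir, dsm, dtm)
```

In this toy example the fifth pixel has a high NDVI but a height of only 0.2 m, so it is excluded, which is how the height test removes green inter-row vegetation that the NDVI test alone would keep.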
We located the image pixels that corresponded to the 236 sampling points by selecting pixels on both sides of each sampling point, considering the vine spacing within each vineyard. From these pixels, we extracted reflectance values, averaged them, and subsequently calculated the VIs using the formulas outlined in Table 3.
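As a hedged illustration of this step, the sketch below averages band reflectance over the pixels assigned to one sampling vine and then computes three indices. Table 3 is not reproduced here, so the formulas are the commonly published forms of NDVI, OSAVI (with a soil adjustment of 0.16), and NGBDI and may differ in detail from those used in the study; the pixel values are invented.

```python
def mean(values):
    return sum(values) / len(values)


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def osavi(nir, red, soil_adjustment=0.16):
    """Optimized soil-adjusted vegetation index (common form; assumed here)."""
    return (nir - red) / (nir + red + soil_adjustment)


def ngbdi(green, blue):
    """Normalized green-blue difference index (common form; assumed here)."""
    return (green - blue) / (green + blue)


# Hypothetical reflectance of three canopy pixels around one sampling vine.
pixels = [
    {"blue": 0.03, "green": 0.08, "red": 0.05, "nir": 0.45},
    {"blue": 0.04, "green": 0.09, "red": 0.06, "nir": 0.50},
    {"blue": 0.05, "green": 0.10, "red": 0.04, "nir": 0.55},
]

# Average reflectance per band first, then compute the indices from the means.
band_means = {band: mean([p[band] for p in pixels]) for band in pixels[0]}
vine_ndvi = ndvi(band_means["nir"], band_means["red"])
vine_osavi = osavi(band_means["nir"], band_means["red"])
vine_ngbdi = ngbdi(band_means["green"], band_means["blue"])
```

Averaging reflectance before computing the index, as described above, is not identical to averaging per-pixel index values, because the indices are non-linear in reflectance.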
Leaf reflectance was measured on 17 February 2023 to record the NDVI of the grapevine leaves. Three leaves on each sampling vine were randomly selected and scanned using a handheld RapidSCAN CS-45 (Holland Scientific Inc., Lincoln, NE, USA) to record their NDVI values (NDVI_proximal). The average of the three leaf NDVI values was used to represent the growth status of the sampling vine during veraison.
2.4. Vine Vigour Parameter Acquisition
The trunk circumference (TC) was chosen as a proxy for vine vigor. The trunk circumference of all of the vines was measured 10 cm above the graft union and 10 cm below the head of the vine with a paper ruler during the bud and leaf growing stage. The two measurements per sampled vine were averaged to represent vine vigor [33].
2.5. Soil and Terrain Data Acquisition
Many studies have indicated that soil apparent electrical conductivity (ECa) can be used to assess soil texture and water content [34,35]. Soil ECa is widely used to explore the spatial variation in soil properties within vineyards [33]. In addition, one study showed that soil ECa is directly related to vine water status, berry weight, and sugar content [36]. In this study, an electromagnetic induction sensor, the EM38-MK2 (Geonics Ltd., Mississauga, ON, Canada), was used to survey soil ECa in the study area on 27 May 2021. The EM38-MK2 was operated in the vertical dipole mode, capturing integrated ECa measurements to a depth of approximately 1.5 m. The EM38-MK2 was towed behind an all-terrain vehicle, maintaining a distance of less than 0.2 m between the instrument and the vehicle. To georeference all point data from the ECa (mS/m) survey, a Trimble Yuma tablet equipped with an onboard GPS receiver (model: Yuma, Trimble), accurate to 2–4 m, was used. Soil ECa points were measured at intervals of approximately 3–10 m along transects, with a 10 m spacing between individual measurements.
The elevation (m) and slope (degree) data for the study site were obtained from the ‘Wellington LiDAR 1 m DEM (2013–2014)’ layer. This dataset was made accessible through the Land Information New Zealand data service (https://data.linz.govt.nz/, accessed on 28 July 2023). This digital elevation model (DEM) has a resolution of 1 m and was created using aerial LiDAR data captured between 2013 and 2014, encompassing the study vineyards.
2.6. Geostatistical Analysis
Geostatistics has been applied in PV in many studies [3,33]. The purpose of geostatistics is to describe the spatial autocorrelation of a regionalized variable (the georeferenced data) and to use this information to predict the values of the variable across an entire field by Kriging. The central tool in geostatistics is the variogram, which is a set of semi-variances plotted against the lag distances between measurements and describes the way in which a property varies from place to place. Experimental variograms were computed in R statistical software (R Core Team, version 4.2.2) with the package “gstat” using Matheron’s method of moments, as in the following formula:
$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ z(x_i) - z(x_i + h) \right]^2$$
where $N(h)$ is the number of paired comparisons at the lag interval $h$, and $z(x_i)$ and $z(x_i + h)$ are the values of the property at two locations separated by distance $h$.
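To make the method-of-moments estimator concrete, the following minimal Python sketch bins all point pairs by separation distance and computes half the mean squared difference in each bin. It is only an illustration of the estimator, not the “gstat” implementation used in the study; the function name `empirical_variogram`, the binning scheme, and the four toy points are assumptions.

```python
import math


def empirical_variogram(coords, values, lag_width, n_lags):
    """Method-of-moments semivariance per lag bin.

    Returns a list of (mean pair distance, semivariance, number of pairs).
    """
    sq_diff = [0.0] * n_lags
    dist_sum = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])
            k = int(h // lag_width)  # index of the lag bin for this pair
            if k < n_lags:
                sq_diff[k] += (values[i] - values[j]) ** 2
                dist_sum[k] += h
                counts[k] += 1
    return [
        (dist_sum[k] / counts[k], sq_diff[k] / (2 * counts[k]), counts[k])
        for k in range(n_lags)
        if counts[k] > 0
    ]


# Four hypothetical ECa readings (mS/m) along a straight transect, 1 m apart.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
variogram = empirical_variogram(coords, values, lag_width=1.0, n_lags=3)
```

For these toy values the semivariance rises from about 2.33 at a 1 m lag to 8.5 at a 2 m lag, the pattern expected when nearby measurements are more alike than distant ones.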
After computing the experimental variogram, a suitable mathematical model was fitted to it. The best-fitting model was selected based on the lowest residual sum of squares (RSS). The parameters of the fitted variogram were used to interpolate soil ECa, slope, and elevation by ordinary Kriging in ArcGIS Pro 2.9 (ESRI, Redlands, CA, USA). Soil ECa values less than 0 mS/m were removed before the geostatistical analysis. The Kriging interpolation maps were then exported to raster layers with the same grid size as the multispectral UAV imagery. For each sampling vine, the mean values of soil ECa, elevation, and slope were computed using “zonal statistics as table” in ArcGIS Pro 2.9.
2.7. Machine Learning Model
Different machine learning models were used to predict the sugar content. The input variables included canopy reflectance data, leaf reflectance data, TC, soil ECa, elevation, slope, and the day of year of the sampling date (DOY). The machine learning models included regularized regression, k-nearest neighbors (KNN), support vector regression (SVR), random forest regression (RFR), XGBoost, and ANN.
Regularized regression: Regularized regression is used to explore the linear relationship between input and output variables when the data set contains more features than observations. In addition, it is suitable when there is multicollinearity among the features. The objective function of a regularized regression is given by the following formula:
$$\min \left( \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + P \right)$$
where $n$ is the sample number, $y_i$ is the measured truth value of the $i$th sample, $\hat{y}_i$ is the predicted value of the $i$th sample, and $P$ is the penalty term.
In this study, we used the ridge penalty and the lasso penalty to build the linear relationship between input and output variables. With the ridge penalty, the penalty term is
$$P = \lambda \sum_{j=1}^{p} \beta_j^2$$
With the lasso penalty, the penalty term is
$$P = \lambda \sum_{j=1}^{p} \left| \beta_j \right|$$
where $\lambda$ is a tuning parameter, $p$ is the feature number, and $\beta_j$ is the regression coefficient of the $j$th feature.
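The difference between the two penalties can be seen by evaluating the objective for a fixed set of coefficients. The sketch below is a minimal illustration only: the function name `penalized_objective`, the unpenalized intercept, and the toy data are assumptions, and it evaluates the objective rather than minimizing it.

```python
def penalized_objective(X, y, beta, intercept, lam, penalty="ridge"):
    """Residual sum of squares plus a ridge or lasso penalty on the coefficients."""
    rss = 0.0
    for row, yi in zip(X, y):
        prediction = intercept + sum(b * x for b, x in zip(beta, row))
        rss += (yi - prediction) ** 2
    if penalty == "ridge":
        p = lam * sum(b ** 2 for b in beta)
    elif penalty == "lasso":
        p = lam * sum(abs(b) for b in beta)
    else:
        raise ValueError("penalty must be 'ridge' or 'lasso'")
    return rss + p


# Hypothetical data: three observations, two features.
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
y = [5.0, 4.0, 8.0]
beta = [2.0, 1.0]

ridge_value = penalized_objective(X, y, beta, intercept=0.0, lam=0.5, penalty="ridge")
lasso_value = penalized_objective(X, y, beta, intercept=0.0, lam=0.5, penalty="lasso")
```

With a residual sum of squares of 2.0, the ridge objective is 2.0 + 0.5 × (4 + 1) = 4.5 and the lasso objective is 2.0 + 0.5 × (2 + 1) = 3.5. Because the ridge penalty squares the coefficients, large coefficients are penalized more heavily than under the lasso penalty.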
KNN: This method identifies the k most similar instances in a training dataset to predict the target value of a new data point. The Euclidean distance metric is commonly used to determine the similarity between observations. The Euclidean distance is
$$d(x_a, x_b) = \sqrt{\sum_{j=1}^{p} \left( x_{aj} - x_{bj} \right)^2}$$
where $x_a$ and $x_b$ represent the observations, $j$ represents the feature, and $p$ represents the feature number.
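A minimal sketch of KNN regression with the Euclidean distance is given below. The function names, the choice of k = 3, the unweighted mean of the neighbours, and the toy data are illustrative assumptions rather than the configuration used in the study.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two observations with the same features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict the target as the mean of the k nearest training targets."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    nearest_targets = [target for _, target in ranked[:k]]
    return sum(nearest_targets) / len(nearest_targets)


# Hypothetical training data: two features per vine and a TSS value in °Brix.
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_y = [20.0, 21.0, 22.0, 30.0]
prediction = knn_predict(train_X, train_y, query=[0.2, 0.1], k=3)
```

Because the distance sums squared differences across features, features measured on larger numeric scales dominate unless the inputs are standardized first.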
SVR: A support vector machine (SVM) tries to find the hyperplane in an N-dimensional space (where N is the number of features) that “best” separates two classes. The hyperplane represents a decision boundary that helps classify the data points. SVR extends SVM to regression problems.
RFR: This is an ensemble learning model that trains multiple decision trees on different subsets of the training data (bootstrap samples) and then averages their predictions to achieve more accurate and robust regression results.
XGBoost: It combines gradient boosting with regularization techniques, creating a boosted ensemble of decision trees. XGBoost optimizes model predictions by fitting negative gradients of the loss function during each boosting iteration.
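The idea of fitting negative gradients can be sketched with a bare-bones gradient boosting loop. For squared-error loss the negative gradient is simply the residual, so each round fits a one-split tree (a stump) to the current residuals. This sketch is not XGBoost: it omits the regularization terms and second-order information that XGBoost uses, and the function names, learning rate, and toy data are assumptions.

```python
def fit_stump(x, residuals):
    """Return (threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keeps both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(x, y, n_rounds=50, learning_rate=0.3):
    """Boosted stumps for squared-error loss on a single feature."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of 0.5 * (y - f)^2 with respect to f is the residual.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            pi + learning_rate * (left_mean if xi <= threshold else right_mean)
            for xi, pi in zip(x, predictions)
        ]
    return base, stumps, predictions


def sum_squared_error(y, predictions):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))


# Hypothetical single feature and TSS values (°Brix).
x = [1, 2, 3, 4, 5, 6]
y = [18.0, 18.5, 19.0, 21.0, 21.5, 22.0]
base, stumps, fitted = boost(x, y)
initial_sse = sum_squared_error(y, [base] * len(y))
final_sse = sum_squared_error(y, fitted)
```

Each round shrinks the training error a little; the learning rate controls how much of each stump is added, trading the number of rounds against the risk of overfitting.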
ANN: It consists of interconnected nodes called neurons organized into layers: an input layer, one or more hidden layers, and an output layer. Each neuron receives input data, performs computations, and passes the results to the next layer through weighted connections. Activation functions introduce non-linearity and enable the ANN to learn complex relationships in data. In this study, we used a single-layer neural network to predict the TSS based on the input variables.
To evaluate prediction performance, the dataset was divided into training and test sets with a ratio of 7:3. This process was repeated 20 times with different data splits to obtain a more reliable estimate of model performance. The performance of machine learning models is affected by their hyperparameters, so it was important to tune them. Bayesian hyperparameter optimization was used on the training set with 10-fold cross-validation to search for the best combination of hyperparameters based on the root mean square error (RMSE). In addition, the coefficient of determination (R²) and RMSE were used to compare the performance of the different machine learning models on the test set. The Waller–Duncan test was used to conduct multiple comparisons between the machine learning models. R² and RMSE were calculated as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$
where $n$ is the number of sampling vines, $y_i$ is the measured TSS value of the $i$th vine sample, $\bar{y}$ is the mean measured TSS value, and $\hat{y}_i$ is the model predicted TSS value of the $i$th vine sample.
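The evaluation scheme can be sketched in a few lines of Python: the two metrics, followed by 20 random 7:3 splits of the sample indices. The function names, the random seed, and the rounding of the training-set size are assumptions, and hyperparameter tuning and the Waller–Duncan test are not shown.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def repeated_splits(n_samples, n_repeats=20, train_fraction=0.7, seed=0):
    """Return n_repeats random (train_indices, test_indices) pairs."""
    rng = random.Random(seed)
    n_train = round(n_samples * train_fraction)
    splits = []
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        splits.append((indices[:n_train], indices[n_train:]))
    return splits


# Toy check of the metrics with hypothetical TSS values (°Brix).
example_r2 = r2_score([20.0, 22.0, 24.0], [21.0, 22.0, 23.0])
example_rmse = rmse([20.0, 22.0, 24.0], [21.0, 22.0, 23.0])

# 236 sampling vines, as in this study, split 7:3 twenty times.
splits = repeated_splits(236)
```

With 236 samples, each split under this rounding gives 165 training and 71 test vines; the per-split test R² and RMSE would then be collected for each model and compared.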
4. Discussion
In this study, we explored the potential of using VIs, soil ECa, elevation, slope, and TC data as input variables to predict grape sugar content in a non-destructive way. TSS was measured by destructive methods on a total of 236 samples of the Pinot Noir cultivar from two commercial vineyards and used as the output variable in the regression models. Grape TSS was measured on five different days in the period from veraison to harvest. During the veraison stage, the berries start to mature, changing color, softening, accumulating sugar, and losing acidity [37]. From veraison, grape growers start to measure grape quality parameters such as TSS, pH, and titratable acidity in order to determine the best harvest day. Among them, TSS is an important parameter for assessing grape maturity, as it can determine the alcohol concentration and flavor of the subsequent wine.
Figure 4 shows that the sugar content of grapes initially decreased and then increased during the study period. In a previous study, however, the sugar concentration of grapes increased strongly from veraison and eventually reached a plateau at the harvest stage [38]. One possible reason for this difference is that the sampling vines were randomly selected on each of the five measurement days, without repeated sampling of the same vines. This differed from the sampling strategy used in the previous study. Several studies have shown that there is considerable spatial and temporal variation in grape TSS [2,39]. Thus, grapes in different geographic locations may accumulate sugar at different rates [2]. On each measurement day, the large interquartile range and the outlier in Figure 4 showed the large spatial variability in grape TSS within the vineyard blocks. Thus, it is inappropriate to use a single measurement, or the average of measurements collected in the vineyard, to represent the grape maturity stage.
Due to the spatial variability of grape quality at the vineyard scale, it is important to measure grape quality across the entire vineyard. However, the traditional method relies on destructive measurement, which makes it impossible to measure the quality of every berry. Thus, many studies have explored the potential of using advanced sensing techniques to measure grape quality non-destructively [15,17,18,40]. However, most of these studies used direct measurements of berries or clusters to estimate grape quality parameters [15,17,31]. Few studies have predicted grape quality at the canopy or leaf level [18]. In this study, we propose an alternative method to predict grape TSS based on VIs and other ancillary data. Different biological stress conditions can cause changes in canopy status, such as heightened pigment levels or altered canopy structure, which will affect the quality of subsequent berries. These changes can affect the way plants interact with light of different wavelengths [41]. Based on these characteristics, many studies have used hyperspectral and multispectral images to predict crop growth status, yield, and quality [14]. Rather than a hyperspectral camera, this study used a cheaper multispectral UAV imaging system to acquire canopy reflectance data in the red, red-edge, blue, green, and near-infrared bands. The red, red-edge, and near-infrared bands have been reported to be related to chlorophyll content and leaf structure. The blue and green bands have been shown to be associated with changes in canopy pigments. In this study, we calculated 23 VIs from the reflectance values of different bands. Because grape TSS was measured at different locations on each sampling date, without repeats, the data were grouped by measurement date, and Pearson’s correlation coefficient between the VIs and grape TSS was calculated in each group (Figure 5). A previous study calculated Pearson’s correlation coefficient between spectral indices acquired from UAV RGB imagery and grape quality parameters [12]. That study found significant correlations between spectral indices and berry weight, malic acid, alpha amino nitrogen, phenolic maturation index, and total polyphenol index. However, the spectral indices calculated from RGB imagery showed a poor correlation with TSS in their study. In this study, the multispectral camera provided additional spectral values in the near-infrared and red-edge bands. VIs including OSAVI, RDVI, EVI, ARI, and MSAVI, which are computed from spectral bands including the near-infrared and red-edge, showed a strong correlation with grape TSS during the harvest stage (Figure 5). Furthermore, simple linear regression between each of the 23 VIs and grape TSS during the harvest stage was examined. The prediction performance of VIs using only RGB bands was lower than that of VIs that incorporated the near-infrared and red-edge bands.
MSAVI, RDVI, and OSAVI are calculated from reflectance in the red and near-infrared bands. The red and near-infrared bands are the most common band combination for monitoring biomass and vegetation density as well as biophysical parameters [42]. PCD and NDVI, which are calculated from reflectance values in the red and near-infrared bands, are widely used in PV [18,21,43,44]. For example, one study used PCD as an indicator of vine vigor to explore its spatial variability within a vineyard [43]. In addition, NDVI is commonly used to identify vine rows in high spatial resolution imagery [45]. Due to the discontinuity of the vegetation surface in the vineyard, it is necessary to extract the vine canopy from high-resolution images to reduce the influence of soil and weeds between rows. In this study, a simple threshold method on NDVI and DSM-DTM was used to identify the vine row area. However, the Pearson’s correlation coefficient between NDVI and grape TSS was low in this study, whereas a previous study showed that NDVI obtained at the veraison stage had a strong correlation with grape TSS [18]. One possible reason is that the UAV imagery was acquired during the flowering stage. Most studies have shown a strong correlation between NDVI and grape quality parameters at the late development stages of vine growth [46,47]. It is worth noting that vineyards use anti-bird netting to protect the grapes after veraison. This makes it difficult to use UAVs and satellites to obtain accurate canopy reflectance data during this period.
Furthermore, the potential of using machine learning models to predict grape TSS was evaluated. In addition to the VIs collected during flowering, the input variables of the machine learning models included the NDVI_proximal value measured with a handheld proximal sensor during the post-veraison stage, the soil ECa value measured with the EM38-MK2, elevation and slope values obtained from LiDAR data, and TC measured in the field. The dataset was used to train and test machine learning models, evaluating the performance of linear and nonlinear regression models including lasso regression, ridge regression, KNN, SVR, RFR, XGBoost, and ANN. When all input variables were used, the ensemble methods, RFR and XGBoost, showed similar prediction accuracy for grape TSS, with the best-fitted model achieving R² = 0.52 and RMSE = 1.19 °Brix. These results are consistent with the findings of [18], which compared linear and nonlinear regression models for predicting grape TSS and concluded that the AdaBoost, RFR, and Extra Trees models outperformed the other machine learning models. In addition, another study showed that XGBoost and RFR had greater capability for modeling crop yield than a linear regression model and ANN [48]. Compared with [48], this study tested each machine learning model 20 times with different test sets and used the Waller–Duncan test to analyze the differences in performance between models. Furthermore, this study used OSAVI or NGBDI as the main input variable, together with other ancillary data, to predict grape TSS with different machine learning models. We chose OSAVI and NGBDI as the main input variables because they represent the VIs that can be obtained when different sensors are available. Similar results were obtained with the 23 VIs used as input variables: RFR showed the best prediction performance (Table 4, Table 5 and Table 6). Therefore, ensemble learning techniques show potential for predicting grape TSS non-destructively from remotely sensed data. However, it should be noted that berry quality is affected by environmental conditions during the harvest stage (e.g., radiation, temperature). Further studies should continue to explore these relationships using different input variables to predict grape TSS. In addition, only three berries were sampled per vine, which cannot represent the TSS of all grapes on the vine but was used to obtain a range of TSS values. The sample size may influence the prediction performance of different machine learning models. Thus, further studies should increase the number of berries sampled.