Article

Machine Learning-Based Hourly Frost-Prediction System Optimized for Orchards Using Automatic Weather Station and Digital Camera Image Data

National Center for AgroMeteorology, Seoul 08826, Korea
*
Author to whom correspondence should be addressed.
Atmosphere 2021, 12(7), 846; https://doi.org/10.3390/atmos12070846
Submission received: 20 May 2021 / Revised: 22 June 2021 / Accepted: 28 June 2021 / Published: 29 June 2021
(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)

Abstract
Spring frosts damage crops whose freezing resistance is weakened after germination. We developed a machine learning (ML)-based frost-classification model and optimized it for orchard farming environments. First, logistic regression, decision tree, random forest, and support vector machine models were trained using balanced Korea Meteorological Administration (KMA) Automated Synoptic Observing System (ASOS) frost observation data for March over the last 10 years (2008–2017). The random forest and support vector machine models showed good classification performance and were selected as the main techniques, which were then optimized for orchard fields based on assumed initial frost occurrence times. The training period was subsequently extended to March–April over 20 years (2000–2019). Finally, the model was applied to KMA ASOS frost observation data from March to April 2020, which were not used in the previous steps, and to RGB image data obtained from digital cameras installed in an orchard in Gyeonggi-do. The developed model successfully classified 117 of 139 frost observation cases from the domestic ASOS data and 35 of 37 orchard camera observations. The assumption of the initial frost occurrence time in training contributed the most to improving the frost-classification model. These results clearly indicate that the ML-based frost-classification model has sufficient accuracy for application in orchard farming.

1. Introduction

Frost is a phenomenon in which water vapor in the atmosphere crystallizes when the temperature falls below zero. It occurs at a small scale near the surface and is difficult to predict, owing to its complicated growth process and the nonlinear interaction between the contact surface and the atmosphere. Late frost occurs in spring and considerably damages crops whose freezing resistance is weakened after germination. As the risk of late-spring frosts increases, it is important to predict spring frost and share the information with farmers. For example, Mosedale et al. [1] projected that the risk of late-spring frosts would increase because of the earlier timing of grapevine bud break in the UK under future climate scenarios. Temperature-based frost indices were subsequently developed for frost warnings [2,3,4].
Chevalier et al. [5] developed a frost-alarm system using a fuzzy expert model. Alongside the development of artificial intelligence (AI), studies have been conducted to predict frost days by applying weather information to machine learning (ML) techniques [6,7,8]. In recent years, there have been attempts to mitigate frost risk with hybrid AI methods that combine various internet-of-things (IoT) sensors [9,10,11,12]. Most ML models using the IoT showed good accuracy and precision, but IoT devices were mainly used in greenhouses, owing to the availability of internet and power.
Radiation frost is caused by radiative cooling on the surface of the ground at night, and advection frost is caused by the advection of cold air [13,14]. South Korea comprises small and complex agricultural lands, and many corresponding studies on frost mechanisms and trends have been conducted. Kwon et al. [15] analyzed the meteorological characteristics of frost occurrence over the past 30 years and concluded that frost is predominantly caused by radiative cooling in South Korea. The frost predominantly occurs from October to April, with the first and last frost days occurring late. Notably, the number of late-frost phenomena tends to increase; thus, it is difficult to predict crop damage caused by late frost [16]. Bae et al. [17] analyzed the temporal and spatial variations in the number of frost days using a climate-change scenario. As in previous studies that used observational data, the first frost days were delayed, and the late-frost days arrived sooner than expected. Kim et al. [18] expected that the flowering period for the growth of pears, apples, and peaches would occur earlier if it were calculated based on the climate-change scenario. This increases the frost risk for flowers that have very weak freezing resistance compared with the dormant period, as deviations in low temperature increase after flowering.
Using frost observation data in South Korea for training, Lee et al. [19] attempted to predict frost with logistic regression (LR) and decision tree (DT) techniques, and Kim et al. [20] estimated the occurrence of frost using artificial neural networks, random forests (RFs), and support vector machines (SVMs). However, these studies did not verify the proposed models against field observations and had a low temporal resolution, offering only daily frost prediction. One of the most common anti-frost techniques, sprinkler irrigation, requires approximately 2.5–5.1 mm/h of water; additionally, the wind-machine technique requires a 65–75-kW power source for each 4.0–4.5 ha [21]. Daily predictions were therefore insufficient for orchard managers to plan such frost-mitigation measures.
In this study, we employ four ML methods (i.e., LR, DT, RF, and SVM) to develop frost-classification models based on meteorological data uniformly observed at the 24 h-manned synoptic weather observation stations of the Korea Meteorological Administration (KMA) Automated Synoptic Observing System (ASOS). Subsequently, the model with the highest classification accuracy is ultimately selected and optimized for application in actual farming environments. Then, the performance of the developed model is verified using the KMA ASOS frost observation information from March to April 2020 and the frost image information obtained from an orchard in Gyeonggi-do, Korea.

2. Data and Frost-Classification Model

2.1. Input Data

Currently, frost observations in South Korea are performed twice daily (a.m. and p.m.) at 22 of the 24 h-manned synoptic weather observation stations of the KMA ASOS. Data were collected from a total of 19 inland stations (Figure 1). Nighttime was set as 17:00–06:00 LST instead of 18:00–06:00 LST to link with the 17:00 LST weather forecast of the KMA. Focusing on the late frosts in spring, which directly cause significant damage to crops, nighttime (17:00–06:00 LST) temperature, subzero duration, precipitation, wind speed, humidity, snowfall, three-hourly fresh snowfall, and ground temperature over 10 years (2008–2017) were used for model training. A value of “1” was assigned to all nighttime hours if frost was observed in the morning, that is, between 17:00 LST on the previous day and 06:00 LST on the present day, and a value of “0” was assigned otherwise. The subzero duration is a secondary variable derived from temperature: it accumulates, in 1 h increments, the duration of subzero temperatures from 17:00 LST on the previous day to 06:00 LST on the present day, and it is reset to “0” whenever the observed temperature rises above zero. This variable was devised, based on the frost indices [2,3,4] that use subzero duration, from the consideration that frost crystals must be cooled for a critical time to develop into a frost layer [22,23].
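The subzero-duration accumulator described above can be sketched as follows (a minimal Python illustration; the original model was implemented in R, and treating exactly 0 °C as "above zero" for the reset is our assumption):

```python
def subzero_duration(hourly_temps):
    """Accumulate consecutive hours of subzero temperature in 1 h steps.

    The counter is reset to 0 whenever the temperature returns above zero
    (exactly 0 degrees C is treated as above zero here; this boundary
    choice is an assumption)."""
    durations = []
    run = 0
    for t in hourly_temps:
        run = run + 1 if t < 0 else 0
        durations.append(run)
    return durations
```

For example, a night of hourly temperatures 1.0, −0.5, −1.2, 0.3, −2.0 °C yields durations 0, 1, 2, 0, 1, with the reset at the hour the temperature recovers above zero.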
For onsite verification, orchards in Buk-myeon (point A in Figure 1; located in Gapyeong-gun, Gyeonggi-do) and Wabu-eup (point B in Figure 1; located in Namyangju-si, Gyeonggi-do), each equipped with an automatic weather station (AWS), were selected as verification target sites. To observe the frost phenomenon, the fruit’s growth stage, and farming activities that can affect weather observation, a camera was installed on each AWS to capture 2560 × 1440-pixel images 10 times daily (05:00, 06:00, 07:00, 08:00, 09:00, 10:00, 12:00, 14:00, 16:00, and 18:00 LST) (Figure 2). Meteorological observation data for March–April 2020 from the verification target sites, the Buk-myeon station in Gapyeong-gun and the Wabu-eup station in Namyangju-si, were provided by the Gyeonggi-do Agricultural Research and Extension Services (GARES) (http://nongup.gg.go.kr (accessed on 29 June 2021)) and were used for verification.

2.2. Preprocessing of Input Data

Data quality is the most important factor in classification algorithm training. As an input data preprocessing step to increase the frost occurrence classification accuracy of the model, errors and missing values were processed, and data were categorized. By referring to the quality-control (QC) flag (0: normal, 1: error, 9: missing), hourly records with missing or erroneous values of precipitation, wind speed, humidity, or ground temperature were deleted. Snowfall and three-hourly fresh snowfall data with no observations (null) were replaced with “0.” According to Eltahir [24], wet soil conditions tend to enhance the net terrestrial radiation at the surface via cooling, and precipitation increases the water content in the soil. The KMA’s observation policy distinguishes rain days from no-rain days: precipitation has a null value on a no-rain day, whereas 0 mm signifies light rainfall that cannot be measured by the sensor. Precipitation was therefore classified into three categories to distinguish precipitation from non-precipitation hours and to serve as a proxy for sky conditions. Among the observations with a normal QC flag (0), hours with no observed precipitation value (null) were classified as “no rain”; observations of 0 mm or more, and less than 1 mm, were classified as “light rain”; and those of 1 mm or more were classified as “rain” (Table 1). Light rain wets the surface, and the subsequent evaporation causes heat loss at the land surface; therefore, light rain can contribute to nighttime cooling compared with a dry surface. Days with continuous precipitation, by contrast, feature overcast clouds and high relative humidity; such days were classified as “rain” because they are distinct from the weather conditions prone to radiation frost.
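The three-way precipitation categorization can be expressed directly (illustrative Python; `None` stands in for the KMA null value on no-rain hours, and the function name is ours):

```python
def categorize_precip(hourly_precip_mm):
    """Categorize an hourly precipitation observation with a normal QC flag.

    None (no observed value) -> "no rain"
    0 mm <= p < 1 mm         -> "light rain" (0 mm = below the sensor's limit)
    p >= 1 mm                -> "rain"
    """
    if hourly_precip_mm is None:
        return "no rain"
    if hourly_precip_mm < 1.0:
        return "light rain"
    return "rain"
```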
The criteria for “balanced” and “unbalanced” data depend on the amount of frost observation data. The full data set for the past 10 years used for model development was unbalanced, containing approximately 5.5-times more non-frost days than frost events (Figure 3A). When the frost and non-frost data sets are at a 50:50 ratio, the data are balanced. The training data set considerably affects the accuracy of ML classification models, for which balanced data are particularly important: unbalanced data can degrade model accuracy because the model is trained on disproportionately many non-frost days [25,26]. General methods of resolving data imbalance include assigning more weight to the minority class and resampling techniques such as up-sampling, down-sampling, and synthetic minority over-sampling [27]. Here, the ratio of frost to non-frost events was adjusted to 50:50 by down-sampling, which completely preserves the observed values of frost days (Figure 3B). The balanced data were then randomly divided into training and testing sets at a ratio of 70:30 and used to quantitatively diagnose performance by calculating the evaluation indicators of model training and of the trained model. Furthermore, verification was also performed with the unsampled, unbalanced data, as frost phenomena occur disproportionately in the real world.
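The down-sampling and 70:30 split described above can be sketched as follows (illustrative Python; the function name, seed, and row representation are ours):

```python
import random

def balance_and_split(frost_rows, nonfrost_rows, seed=42):
    """Down-sample the majority (non-frost) class to a 50:50 ratio, keeping
    every frost row, then randomly split the balanced set 70:30 into
    training and test sets."""
    rng = random.Random(seed)
    kept_nonfrost = rng.sample(nonfrost_rows, len(frost_rows))  # down-sampling
    balanced = list(frost_rows) + kept_nonfrost                 # 50:50 ratio
    rng.shuffle(balanced)                                       # randomization
    cut = int(round(len(balanced) * 0.7))                       # 70:30 split
    return balanced[:cut], balanced[cut:]
```

Down-sampling was preferred here because it preserves every observed frost day unchanged, at the cost of discarding some non-frost observations.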

2.3. Variable Setting and ML Technique

The complex model proposed in this study includes eight variables of input data (i.e., temperature, subzero temperature duration, precipitation, wind speed, humidity, snowfall, three-hourly fresh snowfall, and ground temperature) based on the observation factors of the KMA ASOS. The simple model includes five variables of input data (i.e., temperature, subzero temperature duration, precipitation, wind speed, and humidity), which are major observation factors of AWS. Originally, a model in which dew-point temperature was included as a variable was selected; however, the variance inflation factor was the highest at the dew-point temperature in the multicollinearity test. Therefore, models with the corresponding factor removed were selected. The frost-classification model was built in the R language; the packages used according to the ML technique are summarized in Table 2 [28,29,30,31,32,33].
Frost_complex = temperature + subzero duration + precipitation + wind speed + humidity + snowfall + three-hourly fresh snowfall + ground temperature
Frost_simple = temperature + subzero duration + precipitation + wind speed + humidity
Frost-classification models classify the presence or absence of frost phenomena. DT [34], RF [35,36], and SVM [37] methods are known to perform well with binary classification problems. LR models have lower predictability and accuracy than other ML classification methods; however, the prediction result is a probability value rather than a zero or a one. Therefore, it has the advantage of allowing the threshold to be adjusted by verification to improve the prediction accuracy of frost occurrence.
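The threshold-tuning advantage of LR noted above can be sketched as follows (illustrative Python; the function name and default threshold are ours):

```python
def classify_with_threshold(frost_probabilities, threshold=0.5):
    """Convert LR frost-occurrence probabilities into 0/1 classes.

    Lowering the threshold raises the probability of detection at the
    cost of more false alarms; it can be tuned on verification data."""
    return [1 if p >= threshold else 0 for p in frost_probabilities]
```

For example, the same probabilities [0.2, 0.55, 0.9] yield [0, 1, 1] at a threshold of 0.5 but [0, 0, 1] at 0.7, which is exactly the adjustment that binary classifiers such as the DT cannot offer.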
We employed the tree, rpart, and party packages in the R language for DT. The packages differ in their pruning methods: the tree package uses binary recursive partitioning; the rpart package uses the CART methodology, determining the pruning variables based on entropy and Gini coefficients; and the party package uses unbiased recursive partitioning based on permutation tests, determining the variables to be pruned from importance measures that pass a significance test. The Gaussian radial basis function kernel was used for the SVM.

2.4. Model Evaluation

2.4.1. Performance Evaluation Indicators

In this study, a confusion matrix (Table 3) was prepared to evaluate the performance of the frost-classification model. The matrix comprises data of unclassified frost (true negative (TN)), classified frost (false positive (FP)) when frost is not observed, unclassified frost (false negative (FN)), and classified frost (true positive (TP)) when frost is observed. As frost is classified as the presence or absence of the phenomenon, accuracy (ACC), false-alarm ratio (FAR), probability of detection (POD), and critical success index (CSI) are selected as the verification indicators. Their respective equations are as follows:
ACC = (TP + TN) / (TP + TN + FN + FP),
FAR = FP / (TP + FP),
POD = TP / (TP + FN),
CSI = TP / (TP + FP + FN).
The ACC is the ratio of correct classifications to all classifications, and the FAR is the fraction of false alarms among all cases classified as frost. The POD is the ratio of the frost cases classified by the model to the number of observed frost occurrences. The CSI is the hit rate of frost occurrence classifications excluding TN. In natural conditions, there are far fewer cases of frost than of nonoccurrence and, because predicting frost occurrence is more important than predicting nonoccurrence, the CSI is considered the most important indicator. The area under the curve (AUC) of the receiver operating characteristic curve was also calculated. The AUC takes a value between 0.5 and 1; the closer it is to 1, the better the model performance [38].
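The four verification indices follow directly from the confusion-matrix counts, as in this short Python sketch (the function name is ours):

```python
def verification_indices(tp, fp, fn, tn):
    """Compute ACC, FAR, POD, and CSI from confusion-matrix counts
    (Table 3 notation: TP/FP/FN/TN)."""
    return {
        "ACC": (tp + tn) / (tp + tn + fn + fp),  # fraction correct overall
        "FAR": fp / (tp + fp),                   # false alarms among frost calls
        "POD": tp / (tp + fn),                   # detected fraction of observed frost
        "CSI": tp / (tp + fp + fn),              # hit rate excluding TN
    }
```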

2.4.2. Performance Result

The confusion matrix (Table 4) and verification index (Table 5) for the test data for each classification model were calculated. The results of the DT technique were denoted Tree 1, Tree 2, and Tree 3 in the order of tree, rpart, and party packages. In the case of the tree package (Tree 1), the same confusion matrix was obtained for the complex model and the simple model.
The ACC of each derived classification model was particularly high in the SVM complex model, and that of the RF complex model was the second highest. The FAR was the highest for the RF complex model, and the POD was the highest for the SVM complex model. The CSI was in the following order: SVM (0.637), RF (0.62), LR (0.605), and DT series (Tree 3: 0.596, Tree 2: 0.575, Tree 1: 0.568). The AUC was in the order of RF (0.853), tree 3 (0.836), LR (0.816), SVM (0.771), Tree 2 (0.753), and Tree 1 (0.708).
For all techniques, the complex model had a higher verification index value than the simple model; however, if the input data of the frost-classification model were to be replaced with the numerical weather prediction output value in the future to account for the numerical model error, the reliability would not necessarily increase with the input variables.
When the performance indicators on the test data were considered together, the RF and SVM techniques were determined to be the most appropriate for frost classification, and they were selected as the final classification techniques for the frost-classification model (v1.0).

3. Application and Optimization

Frost is a meteorological phenomenon that is significantly affected by topography and the local environment at small spatial scales. The frost-classification model developed in this study aims to predict frost in the natural environment of an orchard, rather than at a weather station, which is sited so as to minimize topographic and environmental effects. To produce frost information at a level usable on farms, the frost-classification model (v1.0) based on the two ML techniques selected in Section 2.4 (RF and SVM) was optimized and applied to the orchard in the pilot service target site (Gyeonggi-do) to verify model performance.

3.1. Model Optimization Method

To optimize the ML-based frost-classification model and improve its classification performance, first, an assumption was introduced regarding the initial frost occurrence time of the frost occurrence date, which was used as the learning data. Second, the night minimum temperature variable was added to the learning data. Third, the period of observation data used for learning was expanded from March for the previous 10 years (2008–2017) to March–April for 20 years (2000–2019). Section 3.1.1, Section 3.1.2 and Section 3.1.3 provide detailed descriptions of each method.

3.1.1. Assuming the Initial Time of Frost Occurrence

Frost observation data in South Korea record occurrences for the morning and the afternoon. As the time at which frost begins cannot be known from the observation information, the pre-optimization model (v1.0) assigned the same frost value to all nighttime hours when constructing the hourly training data. The output of a model trained in this way can only be interpreted as information on whether frost will occur the next morning, not as a classification of frost occurrence over time. Considering that several previous studies assumed frost days to be days when the minimum temperature was below 0 °C [5,7,14,15,16,17], the time at which the temperature reached 0 °C at night was assumed to be the initial frost occurrence time when frost was observed in the morning (Figure 4). When the temperature remained above 0 °C all night, frost was assumed to exist only at 06:00 LST, the last time step of the input data for the day.
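The relabeling rule above can be sketched as follows (illustrative Python; the function name and list representation of the nighttime hours are ours):

```python
def hourly_frost_labels(night_temps, frost_observed_next_morning):
    """Label each nighttime hour (17:00-06:00 LST) for training.

    When morning frost was observed, hours from the first time the
    temperature reaches 0 degrees C or below are labeled 1 (the assumed
    initial frost occurrence time); if the temperature never reaches
    0 degrees C, only the final hour (06:00 LST) is labeled 1."""
    labels = [0] * len(night_temps)
    if not frost_observed_next_morning:
        return labels
    for i, t in enumerate(night_temps):
        if t <= 0:
            return labels[:i] + [1] * (len(labels) - i)
    labels[-1] = 1  # temperature stayed above 0 degrees C all night
    return labels
```

Before this change, every hour of a frost night was labeled 1; under the assumption, only the hours from the first 0 °C crossing onward carry the frost label.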

3.1.2. Minimum Temperature at Night

When the verification results of the frost-classification model were analyzed for real cases, one factor affecting classification accuracy was the occurrence of frost when the daily minimum temperature was above 0 °C. Such cases occurred frequently; their presumed causes are errors arising from the difference between the height of the station thermometer and the location at which frost is observed, and the fact that the hourly observation data do not capture lower temperatures occurring between observation times. To compensate, the minimum nighttime temperature was added as an input variable to the training data. It was defined as the lowest value among the daily minimum temperature and the hourly temperatures observed at night (17:00–06:00 LST). The daily minimum temperature occurs primarily in the morning; however, to reflect cases where the morning temperature was relatively high, the lower of the two values was used.
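The definition of the added variable reduces to a single comparison, as in this Python sketch (the function name is ours):

```python
def minimum_nighttime_temperature(hourly_night_temps, daily_minimum):
    """Return the lower of the reported daily minimum temperature and the
    lowest hourly temperature observed at night (17:00-06:00 LST), so that
    a relatively warm morning cannot mask nighttime cooling."""
    return min(min(hourly_night_temps), daily_minimum)
```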

3.1.3. Extension of Training Period

Increasing the amount of training data is a simple way to improve the performance of an ML model; however, extending the training period too far can lead to overfitting and degrade performance. For this reason, the training data, which originally consisted of weather observation data for March over 10 years, were gradually extended to data for March and April over 20 years (2000–2019). Although there are only a few days when frost occurs in April, the training period was extended because April is the flowering period of fruit trees in South Korea, when frost causes considerable damage. The initial plan was to include 30 years of data (1990–2019), but observations before 1999 contained temperature values missing for more than 2 h in a row. As missing temperatures would significantly affect the calculation of the subzero duration and the assumption of the initial frost occurrence time, only weather observation data from 2000 onward were used.

3.2. Optimization Results: Case Period March–April 2020

Table 6 summarizes the phased optimization of the frost-classification model. The performance evaluation index values for each version, calculated using the test data reconstructed as balanced data, are shown in Table 7. When the initial frost occurrence time assumption was added to the pre-optimization version (v1.0) to produce Version 2.0, all indicators improved significantly. Furthermore, for the same ML technique, the difference in performance between the complex model, requiring eight input variables, and the simple model, requiring five, was considerably reduced, and the performances of the two ML techniques became almost identical. In Version 2.1, in which the training period was extended to March–April 2008–2019, all verification indicators of both ML techniques improved. Version 2.2, which added the minimum nighttime temperature to the training data to increase classification accuracy for cases in which frost was observed although the daily minimum temperature was above 0 °C, provided a slight improvement in accuracy over Version 2.1. In Version 2.3, in which the training period was extended from 2008–2019 to 2000–2019, the verification indices decreased compared with the previous version.
Figure 5 presents the performance evaluation indicators for the ASOS stations by version. The station-by-station confusion matrix was calculated from the frost observations and the frost occurrence classified by the model, where frost occurrence was assigned only when the outputs of both ML techniques signified frost. As in Table 7, most stations show noticeable performance improvements in Version 2.0. Pohang, Changwon, Busan, and Yeosu showed lower indicators than the total; these regions share a common topographical characteristic, lying on the southern coast of the Korean peninsula (Figure 1). Kwon et al. [15] determined that the average daily minimum temperature on spring frost days in this southern coastal region during 1973–2007 was 1.0 °C. As mentioned in Section 3.1.2, the limitations of the current model are most prominent in these areas, which have many cases that the models did not classify well (i.e., frost observed when the daily minimum temperature was above 0 °C).

3.2.1. Case Verification Using KMA ASOS Data

The classification results for each version were compared with the actual frost observation data, using the 17:00–06:00 LST weather observation values at 18 KMA ASOS stations from March to April 2020 as input data. For the model output, frost occurrence was classified as true only when both ML techniques classified frost as occurring. As the morning frost observation data were recorded daily, hourly verification was conducted under the initial frost occurrence time assumption, and daily verification was conducted using only the 06:00 LST classification results. Table 8 shows the classification results of each version as a confusion matrix with verification indices. Unlike the test data of the training database, which are balanced, the real cases contain far fewer frost days than non-frost days, so the ACC was high for all versions. The FAR, a limitation of the pre-optimization model (Version 1.0), was considerably improved from Version 2.0 onward, which reflected the assumption of the initial frost occurrence time. Because the training data of the pre-optimization model (Version 1.0) labeled frost as occurring throughout the night before an observed frost morning, even at hours when frost occurrence could not plausibly be predicted, the existing frost occurrence data likely contributed excessively to frost classification. The versions from 2.0 onward, which reflected the initial frost occurrence time assumption, classified fewer occurrences of frost overall, producing a decrease in TP and an increase in FN and, consequently, a lower POD.
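The rule that frost is flagged only when both techniques agree is a logical AND of the two outputs, as in this Python sketch (the function name is ours):

```python
def combined_frost_classification(rf_flags, svm_flags):
    """Flag frost occurrence (1) only where both the RF and the SVM
    outputs classify frost; otherwise 0 (logical AND of the techniques)."""
    return [int(a == 1 and b == 1) for a, b in zip(rf_flags, svm_flags)]
```

Requiring agreement of both classifiers trades some detections for fewer false alarms, consistent with the reduced FAR of the optimized versions.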
Among the versions that reflected the initial frost occurrence time assumption (Version 2.0 onward), Version 2.2 had the highest POD and the highest TP, that is, the most days on which frost was both classified to occur and actually observed. However, because the model still tends to overestimate frost occurrence, both the frost events themselves and their durations were classified as longer on the days when frost occurrence was classified. Version 2.3, trained on 20 years of meteorological observation data, classified fewer frost occurrence days than Version 2.2, trained on 10 years of data. The FN of Version 2.3, indicating failures to classify actually observed frost, was higher for the simple model than in Version 2.2. The overestimation seen in Versions 2.1 and 2.2 was reduced in Version 2.3 (a decline in FP); thus, the ACC and FAR improved in Version 2.3.

3.2.2. Verification of Orchard Cases in March–April 2020 Using Digital Cameras and AWS Observations

The frost-classification model was verified with meteorological observation data from the GARES AWSs and digital camera image data for March–April 2020 from the orchards selected as verification targets. The verification period was March 1 to April 19 in Buk-myeon, Gapyeong-gun, and March 1 to April 22 in Wabu-eup, Namyangju-si. Five days that could not be identified owing to strong fog (Buk-myeon, Gapyeong-gun: March 1 and March 22; Wabu-eup, Namyangju-si: March 22, April 18, and April 20) were excluded from the verification. For the orchard AWSs, unlike the KMA observations, a value of 0 mm is not used to distinguish precipitation days from non-precipitation days; therefore, rainfall of 0.1 mm or more, and less than 1 mm, was categorized as “light rain”.
While estimating the frost occurrence date in the orchard using digital camera images, the days when frost heave occurred and the days when a thick frost layer occurred were first classified (Figure 6). The normalized difference snow index (RGB-NDSI), which uses RGB values to analyze snow cover [39,40], was calculated and used to determine frost on other days (Figure 7). The method of calculating RGB-NDSI presented in Hinkler et al. [39] is given as follows:
RGB-NDSI = (RGB − MIRReplacement) / (RGB + MIRReplacement),
RGB = (R + G + B) / 3,
RGBHigh = (B − R − G) / 3,
τ = 200{a·(RGBHigh)Mean + b},
MIRReplacement = (τ/4) · (RGBMax − RGB) / 4.
Here, RGB is the average of the R, G, and B values of a pixel, and RGBMax is the highest RGB value among the pixels. In the equation for τ, a and b are empirical constants specific to each camera; Fedorov et al. [40] replaced the expression with τ = (RGBHigh)Mean. In this study, the average of the RGBHigh values was likewise used as τ because the empirical constants a and b of the camera could not be obtained. The mid-infrared spectral band (MIR) value in the NDSI formula was replaced with MIRReplacement, calculated from the RGB values in the RGB-NDSI formula; the analysis was performed in the Python language with the opencv-python library [41]. Because not all input data of the complex model were observed at the orchard AWSs, only the simple model was used for verification.
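As an illustration of the pixel-level computation, a simplified sketch is given below (plain Python rather than opencv-python; the exact forms used for RGBHigh and MIRReplacement follow our reading of the published equations and should be checked against Hinkler et al. [39]):

```python
def rgb_ndsi(pixels):
    """Simplified RGB-NDSI for a list of (R, G, B) pixel tuples.

    tau is taken as the mean of the RGBHigh values (after Fedorov et al.),
    since the camera constants a and b are unavailable. The forms used for
    RGBHigh and MIRReplacement below are assumptions, not confirmed by the
    source."""
    rgb_means = [(r + g + b) / 3 for r, g, b in pixels]      # per-pixel RGB mean
    rgb_high = [(b - r - g) / 3 for r, g, b in pixels]       # blue-weighted term (assumed form)
    tau = sum(rgb_high) / len(rgb_high)                      # tau = (RGBHigh)_Mean
    rgb_max = max(rgb_means)
    indices = []
    for m in rgb_means:
        mir = (tau / 4) * (rgb_max - m) / 4                  # MIR replacement from RGB only (assumed form)
        denom = m + mir
        indices.append((m - mir) / denom if denom else 0.0)
    return indices
```

Bright (frost- or snow-covered) pixels, whose mean RGB approaches RGBMax, receive a small MIR replacement and hence a high index, mirroring the low MIR reflectance of snow in the original NDSI.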
Table 9 shows the classification results of the simple model for each version as a confusion matrix with verification indices. As with the verification using the KMA ASOS, the pre-optimization version (1.0) showed a high POD but also a high FAR, and the ACC and FAR improved after optimization (Version 2.0 onward). After optimization, all versions classified 35 of 37 cases correctly in the daily verification. As in the KMA ASOS verification, the overestimation was smallest for Version 2.3, which was judged to be the most stable version.

4. Summary and Future Works

We developed an hourly frost-classification model using four ML methods (i.e., LR, DT, RF, and SVM) based on the frost observation information of the KMA ASOS for the past 20 years, and the RF- and SVM-based frost-classification models were selected. The basic assumptions of the model were revised, and the training data were expanded, to optimize the model for farming environments, which are considerably affected by topography and environment. The frost-classification model was then verified using the frost observation information of the KMA's 24 h-manned synoptic weather observation stations and camera-based frost observations from the GARES AWS-installed orchard for March–April 2020; both data sets were unbalanced. In this verification, the pre-optimization version (1.0) classified more frost occurrence days; however, it was difficult to apply to farms because of its excessive classification of the frost phenomenon itself. In the optimized versions, these vulnerabilities were reduced, and the ACC, POD, and CSI improved. A maximum of 117 of 139 domestic frost observation cases in the spring of 2020 were classified correctly, as were 35 of 37 cases in the camera-based orchard verification.
The assumption of the initial frost occurrence time greatly improved the performance of the frost-classification model using the ML method. If initial frost occurrence time-observation data or hourly frost observation data are used for training, the performance of the frost-classification model can be improved. However, frost observation using a digital camera, as in this study, has a limitation in terms of hourly frost observations. It is impossible to capture pictures at night, and it is difficult to distinguish between reflected sunlight and frost crystals after sunrise. For this reason, frost observation using a thermal imaging camera may be a good alternative.
In this study, hourly observation data and secondary variables derived from them were used as input. Using hourly input allowed the training to reflect the characteristics of variables with diurnal variation, as well as discontinuous variables such as wind speed and precipitation. The secondary variables were subzero temperature duration and categorized precipitation. Categorized precipitation can be regarded as a variable that reflects sky conditions more than the precipitation amount itself; however, the thresholds for light rain and rain were determined empirically. The contribution of wet surfaces to radiative cooling and frost occurrence therefore requires further discussion.
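As an illustration, the two secondary variables could be derived roughly as follows. This is a sketch under our own assumptions, not the study's exact procedure: subzero duration is taken as the running count of consecutive hours below 0 °C, a missing precipitation record (NA) is treated as no rain per Table 1, and both function names are ours:

```python
def subzero_duration(temps_degc):
    """Running count of consecutive hours with air temperature below 0 degC
    (assumed definition of the 'subzero temperature duration' input)."""
    hours, out = 0, []
    for t in temps_degc:
        hours = hours + 1 if t < 0 else 0  # reset once temperature recovers
        out.append(hours)
    return out

def categorize_precip(x_mm):
    """Categorized precipitation per Table 1:
    NA (no rain) -> 0, 0 <= x < 1 mm (light rain) -> 1, 1 mm <= x (rain) -> 2."""
    if x_mm is None:  # no precipitation observed
        return 0
    return 1 if x_mm < 1 else 2

print(subzero_duration([1.2, -0.5, -1.1, -0.2, 0.3]))   # [0, 1, 2, 3, 0]
print([categorize_precip(x) for x in (None, 0.4, 1.6)])  # [0, 1, 2]
```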
The RF and SVM techniques have evolved to perform nonlinear classification. Nevertheless, the frost-classification model performs poorly for frost cases in which the daily minimum temperature is above 0 °C; frost occurrence is generally dominated by temperature, but these cases are strongly nonlinear. As a next step, we will classify such cases with a deep neural network-based algorithm, which allows more diverse attempts at nonlinear classification through the adjustment of hidden layers and activation functions. Furthermore, the current frost-classification model already produces frost predictions using the 17:00 LST KMA weather forecast as input. The frost-prediction system can be further improved by considering frost-retaining conditions, which were not treated in the present study but will be addressed in the near future.
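The study trained these models in R with the packages listed in Table 2. A purely illustrative, analogous Python sketch of the same binary frost/no-frost setup is shown below; the synthetic data, feature names, and toy frost rule are our assumptions, not the study's:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400
temp = rng.normal(2.0, 4.0, n)    # hourly air temperature (degC), synthetic
rh = rng.uniform(30, 100, n)      # relative humidity (%), synthetic
wind = rng.exponential(1.5, n)    # wind speed (m/s), synthetic

# Toy labeling rule: frost favoured by subzero temperature, moist air, calm wind
frost = ((temp < 0) & (rh > 60) & (wind < 2)).astype(int)
X = np.column_stack([temp, rh, wind])

# Fit the two classifiers used in the study (here via scikit-learn)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, frost)
svm = SVC(kernel="rbf", probability=True).fit(X, frost)
print("RF training accuracy:", rf.score(X, frost))
```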

Author Contributions

Conceptualization, I.N. and S.-O.K.; methodology, H.-W.D. and S.-J.L.; software, I.N. and H.-W.D.; validation, I.N.; data curation, I.N., S.-O.K., and S.-H.K.; writing—original draft preparation, I.N.; writing—review and editing, S.-O.K., S.S., and S.-J.L.; supervision, S.-J.L.; project administration, S.-O.K. and S.S.; funding acquisition, S.-O.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Korea Meteorological Administration Research and Development Program under Grant KMI2018-04811 and Development of Production Techniques on User-Customized Weather Information (KMA2018-00622).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Figure 1. KMA 24 h-manned synoptic weather observation stations ((left), red dots) and field verification sites ((right), blue dots).
Figure 2. AWS and digital camera (red circle) at the Gapyeong (A) and Namyangju (B) sites.
Figure 3. The number of events in unbalanced data set (A) and balanced data set (B).
Figure 4. Example of assumed initial frost occurrence time.
Figure 5. Performance evaluation indicators for each weather station shown in Figure 1.
Figure 6. Digital camera images on a frost day at Gapyeong (left) and Namyangju (right) shown in Figure 1.
Figure 7. Normalized difference snow index (NDSI) (red box) calculated from a frost day camera image at Namyangju.
Table 1. Categorization of precipitation observations.
| Observations | Definition | Variable |
|---|---|---|
| NA | No rain | 0 |
| 0 ≤ x < 1 mm | Light rain | 1 |
| 1 mm ≤ x | Rain | 2 |
Table 2. Package names and source codes for each classification method.
| Classification Method | Package Name in R | Source Code |
|---|---|---|
| Logistic regression | caret | https://CRAN.R-project.org/package=caret (accessed on 29 June 2021) |
| Decision tree | tree | https://CRAN.R-project.org/package=tree (accessed on 29 June 2021) |
| | rpart | https://CRAN.R-project.org/package=rpart (accessed on 29 June 2021) |
| | party | https://CRAN.R-project.org/package=party (accessed on 29 June 2021) |
| Random forest | randomForest | https://CRAN.R-project.org/package=randomForest (accessed on 29 June 2021) |
| Support vector machine | e1071 | https://CRAN.R-project.org/package=e1071 (accessed on 29 June 2021) |
Table 3. Confusion matrix.
| Classification \ Observation | No Frost | Frost |
|---|---|---|
| No Frost | TN | FN |
| Frost | FP | TP |
Table 4. Confusion matrix of original version for each classification model.
| Model | Type | Classification | Observed No Frost | Observed Frost |
|---|---|---|---|---|
| Logistic | Complex | No frost | 2176 | 652 |
| | | Frost | 1043 | 2591 |
| | Simple | No frost | 2248 | 794 |
| | | Frost | 971 | 2449 |
| Tree 1 | Complex | No frost | 2100 | 764 |
| | | Frost | 1119 | 2479 |
| | Simple | No frost | 2100 | 764 |
| | | Frost | 1119 | 2479 |
| Tree 2 | Complex | No frost | 2485 | 957 |
| | | Frost | 734 | 2286 |
| | Simple | No frost | 2295 | 842 |
| | | Frost | 924 | 2401 |
| Tree 3 | Complex | No frost | 2463 | 860 |
| | | Frost | 756 | 2383 |
| | Simple | No frost | 2509 | 990 |
| | | Frost | 710 | 2253 |
| RF | Complex | No frost | 2467 | 767 |
| | | Frost | 752 | 2476 |
| | Simple | No frost | 2455 | 903 |
| | | Frost | 764 | 2340 |
| SVM | Complex | No frost | 2382 | 645 |
| | | Frost | 837 | 2598 |
| | Simple | No frost | 2396 | 820 |
| | | Frost | 823 | 2423 |
Table 5. Performance evaluation indicators of the original version for each classification model.
| Model | Type | ACC | FAR | POD | CSI | AUC |
|---|---|---|---|---|---|---|
| Logistic | Complex | 0.738 | 0.713 | 0.799 | 0.605 | 0.816 |
| | Simple | 0.727 | 0.716 | 0.755 | 0.581 | 0.798 |
| Tree 1 | Complex | 0.709 | 0.689 | 0.764 | 0.568 | 0.708 |
| | Simple | 0.709 | 0.689 | 0.764 | 0.568 | 0.708 |
| Tree 2 | Complex | 0.738 | 0.757 | 0.705 | 0.575 | 0.753 |
| | Simple | 0.727 | 0.722 | 0.740 | 0.576 | 0.750 |
| Tree 3 | Complex | 0.750 | 0.759 | 0.735 | 0.596 | 0.836 |
| | Simple | 0.737 | 0.760 | 0.695 | 0.570 | 0.826 |
| RF | Complex | 0.765 | 0.767 | 0.764 | 0.620 | 0.854 |
| | Simple | 0.742 | 0.754 | 0.722 | 0.584 | 0.834 |
| SVM | Complex | 0.771 | 0.756 | 0.801 | 0.637 | 0.771 |
| | Simple | 0.746 | 0.747 | 0.747 | 0.596 | 0.746 |
Table 6. Update note of frost-classification models.
| Version | Update Note |
|---|---|
| 1.0 | Original version |
| 2.0 | Assumption of initial frost occurrence time |
| 2.1 | Extension of training period (March 2008–2017 → March and April 2008–2019) |
| 2.2 | Addition of nighttime minimum temperature |
| 2.3 | Extension of training period (March and April 2008–2019 → March and April 2000–2019) |
Table 7. Performance evaluation indicators of the original and optimized models.
| Version | Model | Type | ACC | FAR | POD | CSI |
|---|---|---|---|---|---|---|
| 1.0 | RF | Complex | 0.7649 | 0.7670 | 0.7635 | 0.6198 |
| | | Simple | 0.7420 | 0.7539 | 0.7216 | 0.5835 |
| | SVM | Complex | 0.7707 | 0.7563 | 0.8011 | 0.6368 |
| | | Simple | 0.7457 | 0.7465 | 0.7471 | 0.5959 |
| 2.0 | RF | Complex | 0.9152 | 0.9105 | 0.9237 | 0.8468 |
| | | Simple | 0.9191 | 0.9175 | 0.9237 | 0.8528 |
| | SVM | Complex | 0.9215 | 0.9195 | 0.9266 | 0.8570 |
| | | Simple | 0.9186 | 0.9182 | 0.9217 | 0.8518 |
| 2.1 | RF | Complex | 0.9430 | 0.9504 | 0.9374 | 0.8937 |
| | | Simple | 0.9422 | 0.9488 | 0.9374 | 0.8922 |
| | SVM | Complex | 0.9458 | 0.9461 | 0.9477 | 0.8991 |
| | | Simple | 0.9422 | 0.9488 | 0.9374 | 0.8922 |
| 2.2 | RF | Complex | 0.9435 | 0.9474 | 0.9417 | 0.8949 |
| | | Simple | 0.9426 | 0.9503 | 0.9365 | 0.8929 |
| | SVM | Complex | 0.9470 | 0.9470 | 0.9494 | 0.9015 |
| | | Simple | 0.9426 | 0.945 | 0.9425 | 0.8935 |
| 2.3 | RF | Complex | 0.9416 | 0.9457 | 0.9392 | 0.8912 |
| | | Simple | 0.9408 | 0.9463 | 0.9369 | 0.8896 |
| | SVM | Complex | 0.9361 | 0.9349 | 0.9400 | 0.8822 |
| | | Simple | 0.9353 | 0.9382 | 0.9346 | 0.8803 |
Table 8. Confusion matrix and performance evaluation indicators for the real case (KMA ASOS).
| Version | Type | Period | TN | FN | FP | TP | ACC | FAR | POD | CSI |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.0 | Complex | Hourly | 12,538 | 11 | 2441 | 382 | 0.8405 | 0.1353 | 0.9720 | 0.1348 |
| | | Daily | 710 | 4 | 249 | 135 | 0.7696 | 0.3516 | 0.9712 | 0.3479 |
| | Simple | Hourly | 12,752 | 14 | 2227 | 379 | 0.8542 | 0.1454 | 0.9644 | 0.1447 |
| | | Daily | 689 | 6 | 270 | 133 | 0.7486 | 0.3300 | 0.9568 | 0.3252 |
| 2.0 | Complex | Hourly | 14,704 | 49 | 275 | 344 | 0.9789 | 0.5557 | 0.8753 | 0.5150 |
| | | Daily | 912 | 41 | 47 | 98 | 0.9199 | 0.6759 | 0.7050 | 0.5269 |
| | Simple | Hourly | 14,701 | 55 | 278 | 338 | 0.9783 | 0.5487 | 0.8601 | 0.5037 |
| | | Daily | 911 | 46 | 48 | 93 | 0.9144 | 0.6596 | 0.6691 | 0.4973 |
| 2.1 | Complex | Hourly | 14,547 | 37 | 432 | 356 | 0.9695 | 0.4518 | 0.9059 | 0.4315 |
| | | Daily | 899 | 30 | 60 | 109 | 0.9180 | 0.6450 | 0.7842 | 0.5477 |
| | Simple | Hourly | 14,541 | 43 | 438 | 350 | 0.9687 | 0.4442 | 0.8906 | 0.4212 |
| | | Daily | 895 | 37 | 64 | 102 | 0.9080 | 0.6145 | 0.7338 | 0.5025 |
| 2.2 | Complex | Hourly | 14,589 | 30 | 390 | 363 | 0.9727 | 0.4821 | 0.9237 | 0.4636 |
| | | Daily | 889 | 22 | 70 | 117 | 0.9162 | 0.6257 | 0.8417 | 0.5598 |
| | Simple | Hourly | 14,608 | 35 | 371 | 358 | 0.9736 | 0.4911 | 0.9109 | 0.4686 |
| | | Daily | 888 | 27 | 71 | 112 | 0.9107 | 0.6120 | 0.8058 | 0.5333 |
| 2.3 | Complex | Hourly | 14,640 | 33 | 339 | 360 | 0.9758 | 0.5150 | 0.9160 | 0.4918 |
| | | Daily | 893 | 26 | 66 | 113 | 0.9162 | 0.6313 | 0.8129 | 0.5512 |
| | Simple | Hourly | 14,653 | 46 | 326 | 347 | 0.9758 | 0.5156 | 0.8830 | 0.4826 |
| | | Daily | 894 | 39 | 65 | 100 | 0.9053 | 0.6061 | 0.7194 | 0.4902 |
Table 9. Confusion matrix and performance evaluation indicators for real case (orchard).
| Version | Type | Period | TN | FN | FP | TP | ACC | FAR | POD | CSI |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.0 | Simple | Hourly | 723 | 2 | 445 | 188 | 0.6708 | 0.2970 | 0.9895 | 0.2961 |
| | | Daily | 14 | 0 | 46 | 37 | 0.5258 | 0.4458 | 1.0000 | 0.4458 |
| 2.0 | Simple | Hourly | 956 | 3 | 212 | 187 | 0.8417 | 0.4687 | 0.9842 | 0.4652 |
| | | Daily | 20 | 2 | 40 | 35 | 0.5670 | 0.4667 | 0.9459 | 0.4545 |
| 2.1 | Simple | Hourly | 907 | 2 | 261 | 188 | 0.8063 | 0.4187 | 0.9895 | 0.4169 |
| | | Daily | 21 | 2 | 39 | 35 | 0.5773 | 0.4730 | 0.9459 | 0.4605 |
| 2.2 | Simple | Hourly | 956 | 2 | 212 | 188 | 0.8424 | 0.4700 | 0.9895 | 0.4677 |
| | | Daily | 20 | 2 | 40 | 35 | 0.5670 | 0.4667 | 0.9459 | 0.4545 |
| 2.3 | Simple | Hourly | 975 | 2 | 193 | 188 | 0.8564 | 0.4934 | 0.9895 | 0.4909 |
| | | Daily | 22 | 2 | 38 | 35 | 0.5876 | 0.4795 | 0.9459 | 0.4667 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Noh, I.; Doh, H.-W.; Kim, S.-O.; Kim, S.-H.; Shin, S.; Lee, S.-J. Machine Learning-Based Hourly Frost-Prediction System Optimized for Orchards Using Automatic Weather Station and Digital Camera Image Data. Atmosphere 2021, 12, 846. https://doi.org/10.3390/atmos12070846
