Article

Prediction of Sensor Data in a Greenhouse for Cultivation of Paprika Plants Using a Stacking Ensemble for Smart Farms

1 IT Application Research Center, Korea Electronics Technology Institute, Jeonju 54853, Republic of Korea
2 Department of Computer Science and Engineering, Sogang University, Seoul 04107, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(18), 10464; https://doi.org/10.3390/app131810464
Submission received: 10 August 2023 / Revised: 6 September 2023 / Accepted: 12 September 2023 / Published: 19 September 2023
(This article belongs to the Section Agricultural Science and Technology)

Abstract

Ensuring food security has become of paramount importance due to the rising global population. The agriculture sector in South Korea, in particular, faces several challenges, such as an aging farming population and a declining labor force, and smart farms have come to be recognized as a potential solution. In South Korea, smart farms are divided into three generations. The first generation primarily concentrates on monitoring and controlling precise cultivation environments by leveraging information and communication technology (ICT), with the aim of enhancing convenience for farmers. The second generation takes advantage of big data and artificial intelligence (AI) to improve productivity through precise cultivation management and automated control of various farming processes. The most advanced level is the third generation, an intelligent robotic farm in which the entire farming process is autonomously managed without human intervention, made possible through energy management systems and the use of robots for various farm operations. In the current Korean context, however, smart farm adoption is primarily limited to the first generation, so advanced technologies such as AI, big data, and cloud computing remain underutilized. This research therefore aims to develop a second generation smart farm within a first generation smart farm environment. To accomplish this, data were collected from nine sensors over the period from 20 June to 30 September. We then conducted kernel density estimation analysis, data analysis, and correlation heatmap analysis on the collected data, and constructed a stacking ensemble model using LSTM, BI-LSTM, and GRU as base models.
To assess the performance of the proposed model, we compared it against standalone LSTM, BI-LSTM, and GRU models. The stacking ensemble model outperformed LSTM, BI-LSTM, and GRU in all performance metrics for predicting one of the sensor data variables, air temperature. However, this study collected nine sensor data streams over a relatively short period of three months, so it cannot account for long-term data collection and analysis reflecting the distinct seasonal characteristics of Korea. Additional challenges remain: incorporating environmental factors that influence crops beyond the nine sensors, and conducting experiments in diverse cultivation environments with different crops to generalize the model. In the future, we plan to address these limitations by extending the data collection period, acquiring diverse additional sensor data, and conducting further research that considers various environmental variables.

1. Introduction

Agriculture is a significant economic activity worldwide, and it is an essential element for ensuring food security. According to recent estimates by the UN, the global population is projected to exceed 8.5 billion by 2030, over 9.7 billion by 2050, and reach approximately 10.4 billion by 2080. With the ongoing population growth, the importance of food security is increasingly recognized as a crucial factor for national security and social stability [1,2].
In recent years, South Korea has been experiencing deteriorating growth momentum attributed to structural issues in agriculture. Challenges such as aging farmers, a declining agricultural workforce, decreasing production area, and a rise in imported agricultural products have led to stagnation in agricultural growth, income, and exports. These challenges are accompanied by declining food self-sufficiency and dependence on food imports, problems likely to worsen as food supplies grow more insecure for the world’s growing population. Smart farms are gaining recognition as a solution to address these problems and revitalize the agricultural industry [3,4,5]. A smart farm is a system that applies 4th industrial revolution technologies such as sensors, information and communication technology (ICT), the Internet of Things (IoT), and drones in the agricultural field. Its primary purpose is to collect and analyze essential data pertaining to crop growth and environmental conditions. By monitoring and controlling factors such as soil quality, climate conditions, and disease occurrences, smart farms aim to optimize agricultural processes through automation and mechanization [6,7,8,9,10].
According to the MAFRA (Ministry of Agriculture, Food and Rural Affairs Korea), the country’s smart agriculture market is projected to grow from USD 240 million in 2020 to USD 490 million by 2025, with a compound annual growth rate of 15.5% [11]. Furthermore, Figure 1 illustrates the smart farm adoption status in South Korea, revealing a notable trend of continuous expansion in the smart farming sector. The area dedicated to smart farms has grown significantly over the years, increasing from 405 hectares in 2014 to 5383 hectares by 2020.
Furthermore, the MAFRA classifies smart farms into three generations based on technological advancements. The first generation smart farm involves the use of information and communication technology to monitor and control the cultivation environment, such as the automatic opening and closing of vinyl houses and real-time control, enhancing convenience. The second generation of smart farms utilizes big data and artificial intelligence to enhance productivity by implementing precise cultivation management, automated control, and cloud services. This model analyzes the conditions of crop cultivation and pest infestation to proactively take preventive measures and optimize management, leading to improved crop quality and productivity. The third generation smart farm represents an intelligent robot farm with the capability of autonomously managing the entire cultivation process. This includes complex energy management and various robotic farm operations, all accomplished without human intervention. This approach is anticipated as a strategy to address the current rural challenges of aging and decreasing populations, offering the potential to overcome these issues [12].
However, a report, The Analysis of the Status and Future Development of Smart Farming Projects, released by the Korean National Assembly Budget Office in 2022, presents a breakdown of the proportion of Korean smart farms in 2020. According to the analysis, the first generation of smart farms makes up 84.2% of the total, whereas the second generation comprises 15.8% [13]. It is evident that the first generation, characterized by relatively low technological levels, is primarily distributed in small-scale operations, and the utilization of advanced technologies such as artificial intelligence, big data, and cloud computing remains limited. Therefore, in this paper, we conducted a study in a greenhouse where paprika was grown hydroponically in the first generation of smart farms.
The paprika (Capsicum annuum L.) cultivated in greenhouses is classified under the Solanaceae family and is considered an economically important crop in most regions of the world [14]. Paprika plants prefer mild climates and are sensitive to high air temperatures and intense sunlight, with the optimal air temperature for their growth being approximately 20–25 °C [15]. In regions with distinct seasons like South Korea, traditional cultivation practices historically deemed paprika cultivation unsuitable. However, with the advancement of agricultural technology, methods such as greenhouses, hydroponics, aquaponics, and aeroponics have made it possible to cultivate specialty crops like paprika [16,17]. This technological introduction has led to a significant increase in paprika production in South Korea, from 7500 tons in 2000 to approximately 78,000 tons in 2017, making it one of the major export vegetables in the horticultural industry and a significant commodity in the Japanese paprika import market [18,19]. Nonetheless, these specialty crops are highly sensitive to environmental changes, and unexpected shifts in their growth environment can lead to hindered growth, pest outbreaks, and decreased yields. Furthermore, recent abrupt climate changes have caused extensive agricultural damage worldwide [20,21]. In response, recent research aims to enhance productivity by modernizing agricultural practices and analyzing and understanding the growth environment. In a similar vein to recent work, in this paper we aim to accurately predict environmental factors that impact crop growth in a greenhouse where paprika is cultivated. To achieve this goal, we established a testbed in the greenhouse for paprika cultivation and collected sensor data. Following that, we conducted kernel density estimation analysis, data analysis, and correlation heatmap analysis based on the collected data.
Subsequently, we utilized LSTM, BI-LSTM, and GRU as base models to construct a stacking ensemble model. To assess the performance of the proposed model based on the analyzed results, we utilized LSTM, BI-LSTM, and GRU as the existing models for predicting one of the sensor data variables, air temperature.
This research signifies a transition from traditional agricultural methods relying on experience to a modern approach employing data analysis and artificial intelligence. It lays the foundation for moving from the first generation of smart farms to the second generation, making it a valuable reference in the field of agriculture.

2. Related Work

In this section, we introduce models related to sensor data prediction, such as LSTM, BI-LSTM, and GRU.

2.1. Long Short-Term Memory (LSTM)

Long Short-Term Memory (LSTM) is a model introduced in 1997 as a type of Recurrent Neural Network (RNN) that addresses the long-term dependency problem by incorporating a cell state structure [22,23]. The structure of the LSTM cell is shown in Figure 2.
The structure of LSTM typically consists of an input gate, forget gate, cell state, and output gate. In the forget gate, the incoming information from the previous cell is processed through a sigmoid function to determine whether to remember or forget it [24,25,26]. The forget gate is given in Equation (1):
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$,
where $x_t$ represents the input of the current neuron, $h_{t-1}$ represents the output of the previous neuron, $W_f$ denotes the weights, $b_f$ the biases, and $\sigma$ is the sigmoid function [27,28]. The input gate determines the significance of the current input information through a sigmoid function and controls which information to store in the cell state [29]. The input gate is shown in Equation (2):
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$,
where $W_i$ denotes the weights and $b_i$ the biases. The candidate cell state $\tilde{c}_t$ is used to incorporate the current input into the cell state. It is calculated using the $\tanh$ activation function and is used in conjunction with the forget gate and input gate to regulate the cell state $c_t$ [30,31]. The cell state is given in Equations (3) and (4):
$\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$,
$c_t = c_{t-1} \odot f_t + i_t \odot \tilde{c}_t$,
where $W_c$ denotes the weights, $b_c$ the biases, and $c_{t-1}$ is the previous time step’s cell state. The output gate adjusts the final output based on the current cell state using a sigmoid function. The resulting output $h_t$ is computed by multiplying the output of the output gate with the $\tanh$ of the current cell state [32,33]. The output gate is shown in Equations (5) and (6):
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$,
$h_t = o_t \odot \tanh(c_t)$,
where $W_o$ denotes the weights and $b_o$ the biases.
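As a concrete illustration, Equations (1)–(6) can be sketched as a single NumPy step. This is a toy single-step implementation with illustrative dimensions and random weights, not the trained models evaluated later in this paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step implementing Equations (1)-(6).

    W and b hold weights/biases for the forget (f), input (i),
    candidate (c), and output (o) gates; each weight matrix acts
    on the concatenation [h_{t-1}, x_t].
    """
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])      # Eq. (1): forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])      # Eq. (2): input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # Eq. (3): candidate cell state
    c_t = c_prev * f_t + i_t * c_tilde      # Eq. (4): new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])      # Eq. (5): output gate
    h_t = o_t * np.tanh(c_t)                # Eq. (6): new hidden state
    return h_t, c_t

# Toy dimensions: 3 input features, 4 hidden units.
rng = np.random.default_rng(0)
n_in, n_h = 3, 4
W = {k: rng.normal(size=(n_h, n_h + n_in)) * 0.1 for k in "fico"}
b = {k: np.zeros(n_h) for k in "fico"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), W, b)
```

In practice the gates are computed in one fused matrix multiplication by deep learning frameworks; the per-gate form above mirrors the equations for readability.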

2.2. Bidirectional Long Short-Term Memory (BI-LSTM)

Since LSTM is a unidirectional model that relies solely on previous information to predict the current state, it has limitations in simultaneously processing both past and future information. To overcome this limitation, the Bidirectional-LSTM (BI-LSTM) model was proposed in 2005. BI-LSTM incorporates an additional LSTM that operates in the reverse direction during the forward learning process of LSTM [34,35]. The structure of BI-LSTM enables bidirectional information processing by sequentially processing input data from front to back through the Forward LSTM Layer and from back to front through the Backward LSTM Layer. The outputs of these layers are connected together with an activation function, facilitating the integration of bidirectional information processing, as shown in Figure 3 [36].
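The bidirectional wiring of Figure 3 can be sketched as follows. For brevity, a plain tanh recurrent step stands in for the LSTM cell, since the forward/backward wiring is identical regardless of the cell used:

```python
import numpy as np

def bidirectional(seq, step, h0):
    """Run the same recurrent step front-to-back and back-to-front,
    then concatenate the two hidden states at each time step."""
    fwd, h = [], h0
    for x in seq:                # Forward LSTM layer
        h = step(x, h)
        fwd.append(h)
    bwd, h = [], h0
    for x in reversed(seq):      # Backward LSTM layer
        h = step(x, h)
        bwd.append(h)
    bwd.reverse()                # re-align with time order
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

# A simple tanh RNN step stands in for the LSTM cell in this sketch.
rng = np.random.default_rng(1)
W_x = rng.normal(size=(4, 3)) * 0.1
W_h = rng.normal(size=(4, 4)) * 0.1
step = lambda x, h: np.tanh(W_x @ x + W_h @ h)

seq = [rng.normal(size=3) for _ in range(5)]
out = bidirectional(seq, step, np.zeros(4))   # 5 outputs of size 8 (4 fwd + 4 bwd)
```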

2.3. Gated Recurrent Unit (GRU)

Gated Recurrent Unit (GRU) is a model introduced in 2014. It belongs to the category of RNNs and, like LSTM, is designed to address the long-term dependency problem. GRU exhibits a simpler structure than LSTM, incorporating only two gates, the update gate and the reset gate, as shown in Figure 4 [37,38].
The update gate determines how much of the previous cell’s information should be updated to the current state [39]. The update gate is given in Equation (7):
$z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z)$,
where $x_t$ represents the input of the current neuron, $h_{t-1}$ represents the output of the previous neuron, $W_z$ denotes the weights, $b_z$ the biases, and $\sigma$ is the sigmoid function [40,41]. The reset gate is responsible for determining the extent to which the previous hidden state should be reset based on the current input and the previous hidden state [42]. The reset gate is shown in Equation (8):
$r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r)$,
where $W_r$ denotes the weights and $b_r$ the biases. The candidate hidden state $\tilde{h}_t$ is computed using the $\tanh$ activation function from the current input and the output of the reset gate, and the final output $h_t$ is computed from $\tilde{h}_t$ and the output of the update gate [43,44]. The candidate state and final output are given in Equations (9) and (10), respectively:
$\tilde{h}_t = \tanh(W_h \cdot [r_t \odot h_{t-1}, x_t] + b_h)$,
$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$,
where $W_h$ denotes the weights and $b_h$ the biases.
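Equations (7)–(10) can likewise be sketched as a single NumPy step (toy dimensions and random weights, for illustration only):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, b):
    """One GRU step implementing Equations (7)-(10)."""
    z_in = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W["z"] @ z_in + b["z"])        # Eq. (7): update gate
    r_t = sigmoid(W["r"] @ z_in + b["r"])        # Eq. (8): reset gate
    h_in = np.concatenate([r_t * h_prev, x_t])   # reset applied to h_{t-1}
    h_tilde = np.tanh(W["h"] @ h_in + b["h"])    # Eq. (9): candidate state
    h_t = (1 - z_t) * h_prev + z_t * h_tilde     # Eq. (10): new hidden state
    return h_t

# Toy dimensions: 3 input features, 4 hidden units.
rng = np.random.default_rng(2)
n_in, n_h = 3, 4
W = {k: rng.normal(size=(n_h, n_h + n_in)) * 0.1 for k in "zrh"}
b = {k: np.zeros(n_h) for k in "zrh"}
h = gru_step(rng.normal(size=n_in), np.zeros(n_h), W, b)
```

Note the contrast with LSTM: the GRU maintains a single hidden state rather than separate hidden and cell states, which is why it needs only two gates.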

3. Proposed Method

3.1. Data Acquisition

The primary crops cultivated in the Jeonbuk region of South Korea include paprika, tomatoes, and strawberries. The Fruit and Vegetable Research Institute at Jeonbuk Agricultural Research & Extension Services generously offered us access to their paprika greenhouse, which led to the selection of paprika as the focal crop for our study. A paprika testbed was established to collect environmental data for the smart farm; it was equipped with a main board, sensor boards, a router, a converter, and sensors. The structure of the established paprika testbed is depicted in Figure 5.
The main board plays a pivotal role in collecting data from the three sensor nodes and transmitting it to the database. The router facilitates data transmission, acting as a communication hub, while the converter transforms measured data into the appropriate format and the sensors gather environmental data. Lastly, each sensor node transmits the information collected by its sensors to the main board.
To acquire environmental data, a total of nine sensors (air temperature, humidity, CO2, soil temperature, soil moisture, insolation, soil electrical conductivity (EC), drainage electrical conductivity (EC), drainage pH) were installed. Figure 6 shows the appearance of the established smart farm testbed.

3.1.1. Process of Acquiring Environmental Sensor Data

Figure 7 illustrates the process of acquiring environmental sensor data, which operates through RS-485 communication between the main board and the three sensor boards for data transmission and reception. The first sensor board is responsible for acquiring four types of data: air temperature, humidity, CO2, and insolation. The second sensor board is dedicated to acquiring three types of data: soil temperature, soil moisture, and soil EC. Lastly, the third sensor board is responsible for acquiring two types of data: drainage EC and drainage pH.

3.1.2. Establishment of Smart Farm Environment Database

Figure 8 represents the process of storing data in a database. The main board receives nine types of environmental sensor data from the three sensor boards. These data are then transmitted to the web server using HTTP requests. Subsequently, the web server stores the transmitted data in the database. During this process, the web server sends an HTTP response to indicate the success of data storage.
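A minimal sketch of the kind of payload the main board could send to the web server is shown below. The actual endpoint, field names, and message format used in this study are not specified, so everything here is hypothetical:

```python
import json
from datetime import datetime, timezone

# Hypothetical reading the main board might POST to the web server;
# the study's real field names and endpoint are not given.
reading = {
    "timestamp": datetime(2023, 6, 20, 0, 0, tzinfo=timezone.utc).isoformat(),
    "air_temperature": 25.1, "humidity": 91.2, "co2": 274,
    "soil_temperature": 26.4, "soil_moisture": 26.9, "insolation": 88.0,
    "soil_ec": 0.52, "drainage_ec": 3.1, "drainage_ph": 6.8,
}
body = json.dumps(reading).encode("utf-8")

# An HTTP POST would then carry `body` to the web server, e.g. with
# urllib.request.Request(url, data=body,
#                        headers={"Content-Type": "application/json"}),
# and the server would reply with an HTTP response indicating success.
```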

3.1.3. Preprocessing of Environmental Sensor Data

We collected environmental sensor data at 5-min intervals from 20 June to 20 September. The stored data were extracted from the database and imported into Excel sheets for preprocessing. After excluding missing data, approximately 220,000 data points were obtained. A part of the preprocessed data is shown in Figure 9.
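The missing-data exclusion step can be sketched with pandas. The column name and values below are illustrative, not the study's actual export:

```python
import numpy as np
import pandas as pd

# Illustrative raw export: one row per 5-minute reading, with gaps
# (the real dataset has nine sensor columns; one suffices here).
idx = pd.date_range("2023-06-20", periods=12, freq="5min")
raw = pd.DataFrame(
    {"air_temperature": [25.0, np.nan, 25.2, 25.1, np.nan, 25.3,
                         25.4, 25.2, np.nan, 25.5, 25.6, 25.4]},
    index=idx,
)

clean = raw.dropna()   # exclude rows with missing readings
```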

3.2. Stacking Ensemble

After acquiring the sensor data, an ensemble model is applied to predict it. An ensemble is a method of combining multiple individual models to create a more powerful model, resulting in improved performance, including prediction accuracy, compared to a single model [45]. Within the realm of ensembles, methodologies such as bagging, boosting, and stacking are prevalent [46,47]. In this study, we employed the stacking ensemble method for predicting the sensor data. Stacking involves training multiple base models and using their prediction outputs as input to train a final meta-model [48]. The architecture of the stacking ensemble proposed in this study is shown in Figure 10.
In this architecture, the input consists of data from the sensors that the correlation heatmap analysis showed to be strongly correlated with the sensor we want to predict. In the Base-Learners Layer, these data serve as input to the LSTM, BI-LSTM, and GRU models configured as base models. Each base model learns the unique characteristics of the time-series data and generates meta-data based on this knowledge. Within the Meta-Learner Layer, the generated meta-data are used to train a meta-learner [49], which combines the prediction outputs of the base models to produce the final sensor data prediction.
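The stacking idea can be sketched as follows. Three noisy synthetic predictors stand in for the trained LSTM, BI-LSTM, and GRU base models, and a least-squares linear model stands in for the meta-learner (the study trains a learned meta-model; least squares keeps the sketch short):

```python
import numpy as np

rng = np.random.default_rng(3)
y_true = np.linspace(20, 30, 50)            # target series, e.g. air temperature

# Base-Learners Layer: stand-ins for LSTM / BI-LSTM / GRU predictions,
# each an unbiased but noisy estimate of the target.
base_preds = np.stack(
    [y_true + rng.normal(0, s, 50) for s in (0.5, 0.8, 0.6)], axis=1
)

# Meta-Learner Layer: fit a linear combiner on the stacked predictions.
X = np.column_stack([base_preds, np.ones(50)])   # add bias column
w, *_ = np.linalg.lstsq(X, y_true, rcond=None)
y_meta = X @ w                                   # final stacked prediction
```

In a real stacking setup the meta-learner is fitted on held-out base-model predictions to avoid leaking training data; that split is omitted here for brevity.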

4. Results and Discussion

4.1. Kernel Density Estimation Analysis

To analyze the distribution of environmental sensor data, we utilized the non-parametric density estimation method known as Kernel Density Estimation (KDE). Among the various kernel functions available, we specifically employed the commonly used Gaussian kernel density estimation [50,51] in this study. The Gaussian function and the Gaussian kernel density estimation are shown in Equations (11) and (12), respectively:
$K(x) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2}\right)$,
$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$,
where $x$ represents a data point and $x_i$ represents the $i$-th data point in the dataset of size $n$. The parameter $h$ denotes the bandwidth and $K$ represents the kernel function, here the Gaussian kernel [52,53].
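Equations (11) and (12) translate directly to NumPy. The data below are synthetic, and the bandwidth used in the study is not specified, so h = 0.8 is an illustrative choice:

```python
import numpy as np

def gaussian_kernel(x):
    """Eq. (11): the Gaussian kernel K(x)."""
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def kde(x_eval, data, h):
    """Eq. (12): Gaussian kernel density estimate with bandwidth h."""
    n = len(data)
    u = (x_eval[:, None] - data[None, :]) / h     # pairwise (x - x_i) / h
    return gaussian_kernel(u).sum(axis=1) / (n * h)

# Toy check on synthetic "air temperature" readings.
rng = np.random.default_rng(4)
data = rng.normal(25.0, 2.0, 500)
grid = np.linspace(15, 35, 401)
density = kde(grid, data, h=0.8)
area = density.sum() * (grid[1] - grid[0])        # should be close to 1
```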
Figure 11 shows a Gaussian KDE of the environmental sensor data, comprising air temperature, humidity, CO2, soil temperature, soil moisture, insolation, soil EC, drainage EC, and drainage pH, arranged from the top left. During the data distribution analysis, specific patterns were observed in the environmental variables. Air temperature data displayed a notable concentration around 25 °C, while humidity exhibited a high distribution between 90% and 100%. CO2 concentrations demonstrated a significant presence between 200 and 300 ppm, and soil temperature displayed a high distribution between 25 °C and 27.5 °C. Soil moisture levels ranged from 26% to 28%, while insolation varied from 0 to 100 W/m². Additionally, soil EC and drainage EC showed prominent distributions centered around 0.5 dS/m and 3 dS/m, respectively. Lastly, drainage pH exhibited a high distribution between 6.5 and 7.0.

4.2. Data Analysis

After performing data distribution analysis, the monthly average, minimum, and maximum values were computed and plotted to assess data trends over time. In the graphs, blue represents the average values, orange the minimum values, and gray the maximum values.

4.2.1. Air Temperature

As can be seen in Figure 12, the average air temperature values were 27.3 °C, 27.8 °C, 27.1 °C, and 24.7 °C for June through September, respectively. The average air temperature remained relatively stable from June to August but decreased in September. The minimum air temperatures were 22.3 °C, 21.4 °C, 17.8 °C, and 17.1 °C, respectively, with September recording the lowest minimum air temperature. The maximum air temperatures were 37.3 °C, 40.0 °C, 40.1 °C, and 35.7 °C, respectively, with August recording the highest maximum air temperature.

4.2.2. Humidity

As shown in Figure 13, the average humidity values for the respective months were found to be 81.7%, 84.5%, 89.5%, and 88.1%. Humidity exhibited an increasing trend from June to August, followed by a decrease in September, reaching an observed value of 88.1%. The minimum humidity levels were observed to be 50.2%, 45.8%, 41.2%, and 49.4% for the respective months, with August recording the lowest minimum humidity. On the other hand, the maximum humidity levels were observed to be 99.4%, 99.3%, 99.4%, and 99.5% for the respective months, with the maximum values averaging over 99.0%.

4.2.3. CO2

As can be seen in Figure 14, the average values of CO2 data were observed to be 316.6 ppm, 303.4 ppm, 280.4 ppm, and 252.5 ppm, respectively. The average CO2 levels showed a decreasing trend from June to September. The minimum CO2 levels were observed to be 226 ppm, 183 ppm, 109 ppm, and 154 ppm, respectively, with August recording the lowest minimum CO2. The maximum CO2 levels were observed to be 454 ppm, 494 ppm, 478 ppm, and 410 ppm, respectively, with July recording the highest maximum CO2.

4.2.4. Soil Temperature

As shown in Figure 15, the average values of soil temperature data were observed to be 27.6 °C, 27.9 °C, 27.4 °C, and 25.0 °C, respectively. Similar to the air temperature, the average soil temperature remained relatively stable at around 27 °C from June to August but decreased to 25.0 °C in September. The minimum soil temperatures were observed to be 23.5 °C, 22.5 °C, 19 °C, and 18.4 °C for the respective months, with September recording the lowest minimum soil temperature. On the other hand, the maximum soil temperatures were observed to be 33.4 °C, 34.5 °C, 35.9 °C, and 32.8 °C, respectively, with August recording the highest maximum soil temperature.

4.2.5. Soil Moisture

As can be seen in Figure 16, the average values of soil moisture data were observed to be 25.8%, 26.6%, 27.1%, and 27.0% for the respective months. The average soil moisture exhibited an increasing trend from 25.8% in June to 27.1% in August and then remained stable at 27.0% in September. The minimum soil moisture levels were observed to be 24.5%, 24.8%, 25.0%, and 19.2% for the respective months, with September recording the lowest minimum soil moisture. Conversely, the maximum soil moisture levels were observed to be 27.0%, 28.9%, 30.5%, and 32.1%, respectively, with September recording the highest maximum soil moisture.

4.2.6. Insolation

As shown in Figure 17, the average values of insolation data were recorded as 89.1 W/ m 2 , 97.0 W/ m 2 , 104.3 W/ m 2 , and 119.2 W/ m 2 for respective months. The average insolation displayed a consistent increasing trend from 89.1 W/ m 2 in June to 119.2 W/ m 2 in September. Throughout the specified months, the minimum insolation levels remained constant at 0 W/ m 2 . On the other hand, the maximum insolation levels were observed to be 567.5 W/ m 2 , 674.5 W/ m 2 , 923.1 W/ m 2 , and 830.8 W/ m 2 , respectively, with August recording the highest maximum insolation.

4.2.7. Soil EC

As can be seen in Figure 18, the average values of soil electrical conductivity data were observed to be 0.53 dS/m, 0.51 dS/m, 0.50 dS/m, and 0.63 dS/m, respectively. The average soil EC showed a decreasing trend from 0.53 dS/m in June to 0.50 dS/m in August but experienced a sharp increase to 0.63 dS/m in September. The minimum soil EC levels were observed to be 0.46 dS/m, 0.4 dS/m, 0.41 dS/m, and 0.41 dS/m, respectively, with July recording the lowest minimum soil EC. The maximum soil EC levels were observed to be 0.73 dS/m, 0.76 dS/m, 0.87 dS/m, and 1.17 dS/m, respectively, with September recording the highest maximum soil EC.

4.2.8. Drainage EC

As shown in Figure 19, the average values of drainage EC data were observed to be 3.5 dS/m, 3.2 dS/m, 3.0 dS/m, and 3.8 dS/m, respectively. Similar to the soil EC, the average drainage EC showed a decreasing trend from 3.5 dS/m in June to 3.0 dS/m in August but experienced a sharp increase to 3.8 dS/m in September. The minimum drainage EC levels were observed to be 2.96 dS/m, 2.44 dS/m, 2.38 dS/m, and 2.59 dS/m, respectively, with August recording the lowest minimum drainage EC. The maximum drainage EC levels were observed to be 4.78 dS/m, 4.66 dS/m, 5.43 dS/m, and 6.92 dS/m, respectively, with September recording the highest maximum drainage EC.

4.2.9. Drainage pH

Figure 20 depicts the drainage pH data, and the average values were observed to be 7.3 pH, 6.9 pH, 6.8 pH, and 6.6 pH, respectively. The average drainage pH continued to fall from June to September. The minimum drainage pH levels were observed to be 6.33 pH, 5.8 pH, 5.52 pH, and 4.93 pH, respectively, with September recording the lowest minimum drainage pH. The maximum drainage pH levels were observed to be 8.21 pH, 7.93 pH, 8.1 pH, and 8.0 pH, respectively, with June recording the highest maximum drainage pH.

4.3. Correlation Heatmap Analysis

After analyzing the data, it was observed that certain pairs of environmental sensor data, such as air temperature and soil temperature, or soil EC and drainage EC, exhibited similar monthly changes. To examine these relationships, we computed the correlations among the data and constructed separate heatmaps for the day and night periods to provide a clear overview. The Pearson correlation is given in Equation (13):
$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$,
where $n$ represents the size of the data, $x$ and $y$ represent the two data series, $x_i$ and $y_i$ represent the $i$-th values of $x$ and $y$, and $\bar{x}$ and $\bar{y}$ are their respective means [54,55,56].
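Equation (13) translates directly to NumPy. The check below uses synthetic series rather than the study's sensor data:

```python
import numpy as np

def pearson(x, y):
    """Eq. (13): Pearson correlation coefficient between two series."""
    xd, yd = x - x.mean(), y - y.mean()
    return (xd * yd).sum() / np.sqrt((xd**2).sum() * (yd**2).sum())

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
r_pos = pearson(x, 2 * x + 1)   # perfectly linearly related → 1.0
r_neg = pearson(x, -x)          # perfectly anti-correlated → -1.0
```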
Table 1 illustrates the interpretation of correlation coefficient values. These values range from −1 to 1, with values closer to 1 suggesting a strong correlation, and values closer to 0 indicating minimal to no correlation between the two datasets [57,58].
The analysis of the correlation heatmap in Figure 21 revealed the correlations among various environmental sensor data. Among these, we focused on the analysis of data related to air temperature for the evaluation of the stacking ensemble model’s performance.
From Figure 21, it can be observed that a strong positive correlation exists between air temperature and soil temperature, as well as between soil EC and drainage EC. Moreover, a strong negative correlation was identified between air temperature and humidity. Moderate positive or negative correlations were found between air temperature and CO2, air temperature and insolation, humidity and CO2, humidity and soil temperature, humidity and insolation, CO2 and insolation, and soil temperature and drainage pH. The remaining pairs showed weak or very weak correlations, or none at all. Based on these results, we proceeded to predict air temperature using five sensor data series: humidity, CO2, soil temperature, insolation, and drainage pH.

4.4. Performance Evaluation

For the performance comparison of the stacking ensemble model, we selected and designed LSTM, BI-LSTM, and GRU as the performance evaluation targets for the base models. The input data included humidity, CO2, soil temperature, insolation, and drainage pH, which exhibited significant correlations with air temperature.
The base models of the stacking ensemble had an input sequence length of 5 and a layer configuration of 64, 32, 8, 1. Consequently, the LSTM, BI-LSTM, and GRU models, which were the subjects of performance evaluation, were designed with the same input sequence and layer configuration. Figure 22 shows the model architectures of LSTM, BI-LSTM, and GRU. The dataset was divided into training, testing, and validation sets, following an 8:1:1 ratio. The models were trained using the Adam optimizer for 100 epochs, with a batch size of 16. To mitigate overfitting, early stopping was incorporated as a regularization technique during the training phase.
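The input windowing and 8:1:1 split described above can be sketched as follows. The data are random placeholders, and any windowing detail beyond the sequence length of 5 is an assumption on our part:

```python
import numpy as np

SEQ_LEN = 5   # input sequence length used by all models in this study

# Placeholder series: five correlated features (humidity, CO2,
# soil temperature, insolation, drainage pH) and the target (air temperature).
rng = np.random.default_rng(5)
features = rng.normal(size=(1000, 5))
target = rng.normal(size=1000)

# Sliding windows of SEQ_LEN past steps predict the next target value.
X = np.stack([features[i:i + SEQ_LEN] for i in range(len(features) - SEQ_LEN)])
y = target[SEQ_LEN:]

# Chronological 8:1:1 split into training, testing, and validation sets.
n = len(X)
n_train, n_test = int(n * 0.8), int(n * 0.1)
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:n_train + n_test], y[n_train:n_train + n_test]
X_val, y_val = X[n_train + n_test:], y[n_train + n_test:]
```

A chronological split (rather than a shuffled one) is the usual choice for time-series data, since it prevents future readings from leaking into the training set.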
The model’s performance was assessed using the validation data, employing evaluation metrics such as mean squared error (MSE), mean absolute error (MAE), and R-squared. The MSE, MAE, and R-squared are shown in Equations (14), (15) and (16), respectively:
$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$,
$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$,
$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$,
where $n$ represents the size of the data, and $y$, $\hat{y}$, and $\bar{y}$ represent the actual air temperature values, the model-predicted air temperature values, and the mean of the actual air temperature values, respectively; $y_i$ and $\hat{y}_i$ represent the $i$-th values of $y$ and $\hat{y}$ [59,60].
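Equations (14)–(16) can be implemented as follows (toy values, not the study's results):

```python
import numpy as np

def mse(y, y_hat):
    """Eq. (14): mean squared error."""
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    """Eq. (15): mean absolute error."""
    return np.mean(np.abs(y - y_hat))

def r_squared(y, y_hat):
    """Eq. (16): coefficient of determination."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

# Toy air temperature values and predictions.
y = np.array([24.0, 25.0, 26.0, 27.0])
y_hat = np.array([24.5, 25.0, 25.5, 27.5])
# mse(y, y_hat) → 0.1875, mae(y, y_hat) → 0.375, r_squared(y, y_hat) ≈ 0.85
```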
Figure 23, Figure 24, Figure 25 and Figure 26 show the air temperature prediction results using the validation data for the Stacking ensemble, LSTM, BI-LSTM, and GRU models, respectively. The corresponding MSE, MAE, and R-squared values for each model are presented in Table 2.
The performance evaluation results of the stacking ensemble, LSTM, BI-LSTM, and GRU models indicate that the stacking ensemble model achieved an MSE of 0.594, MAE of 0.601, and an R-squared value of 0.958. The LSTM model achieved an MSE of 0.668, MAE of 0.623, and an R-squared value of 0.953. The BI-LSTM model achieved an MSE of 0.772, MAE of 0.670, and an R-squared value of 0.946. The GRU model achieved an MSE of 0.720, MAE of 0.639, and an R-squared value of 0.950. Overall, the stacking ensemble model exhibited superior performance compared to the LSTM, GRU, and BI-LSTM models.

5. Conclusions

Smart farms in South Korea are categorized into three generations: the first, second, and third. However, an examination of the proportion of smart farms in South Korea in 2020 shows that most are first generation smart farms, which are relatively low-tech. In this paper, a study was conducted on the prediction of environmental sensor data in a first generation smart farm environment to facilitate the adoption of second generation smart farms. To do this, we first established a smart farm testbed and collected environmental sensor data from 20 June to 30 September. Following that, we conducted kernel density estimation analysis, data analysis, and correlation heatmap analysis based on the collected data. Subsequently, we utilized LSTM, BI-LSTM, and GRU as base models to construct a stacking ensemble model, and assessed its performance against standalone LSTM, BI-LSTM, and GRU models for predicting one of the sensor data variables, air temperature. As a result, the stacking ensemble model exhibited superior performance compared to the LSTM, BI-LSTM, and GRU models, with an MSE of 0.594, an MAE of 0.601, and an R-squared value of 0.958.
However, several limitations were identified in this study. First, the data used to build the prediction model were collected over a relatively short period of about three months, from June to September. South Korea has four distinct seasons, with significant seasonal variation in crop growth and environmental conditions; a model trained primarily on summer data may therefore fail to capture the variability of other seasons, leading to prediction errors. To enhance the reliability of both the data and the model, a larger and more diverse dataset should be collected over an extended period covering all four seasons. Second, data were collected from nine sensors, covering air temperature, humidity, CO2, soil temperature, soil moisture, insolation, soil electrical conductivity (EC), drainage EC, and drainage pH. However, ensuring optimal crop growth involves additional variables beyond these nine, including wind speed, precipitation, crop growth stage, and seasonal variation, and incorporating these factors should further improve the prediction model. Lastly, this study was conducted in a single smart farming environment, cultivating paprika hydroponically inside a greenhouse; changes in the cultivation environment or the type of crop being grown could reduce the accuracy and performance of the prediction model.
Therefore, to ensure its generalizability, it is crucial to validate and adjust the model’s performance across a variety of cultivation environments and crop types.
In summary, long-term data collection and analysis covering all seasons, together with additional environmental variables such as wind and precipitation, are needed to improve the model's accuracy, and experiments in diverse cultivation environments with different crops are necessary for generalization. Addressing these limitations would enable an environmental sensor data prediction model that effectively incorporates South Korea's distinct seasonal characteristics across all four seasons. Applying such a model across various cultivation environments and crops holds the potential to enhance agricultural productivity and efficiency.
This study is anticipated to offer solutions to rural challenges like an aging and dwindling population through technological progress. It also seeks to address sustainability concerns by modernizing agriculture and laying the groundwork for the transition from first generation smart farms to second generation ones. Additionally, these advancements in smart farm technology are poised to make significant contributions to the improvement of domestic crop productivity and cultivation efficiency. Consequently, they are expected to play a pivotal role in bolstering food security in the region.

Author Contributions

Conceptualization, S.-H.H. and H.-S.J.; methodology, S.-H.H. and H.-S.J.; software, S.-H.H.; validation, H.-S.J.; writing—original draft preparation, S.-H.H. and H.-S.J.; writing—review and editing, H.-S.J. and H.M.; supervision, H.-S.J.; funding acquisition, H.-S.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-00751, Development of multi-dimensional visualization digital twin framework technology for displaying visible and invisible information with lower than 0.5 mm precision).

Data Availability Statement

The data are unavailable due to privacy restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Adoption of Smart Farm in South Korea.
Figure 2. LSTM cell structure.
Figure 3. BI-LSTM structure.
Figure 4. GRU cell structure.
Figure 5. Testbed structure.
Figure 6. Testbed for Smart Farm ((a): Main board and Router, (b): Drainage EC Sensor, (c): Drainage pH Sensor, (d): Soil temperature, moisture, EC, Drainage EC, pH Sensor node, (e): Soil temperature, moisture, EC Sensor, (f): Air temperature, Humidity, CO2, Insolation Sensor node and Air temperature, Humidity Sensor, (g): CO2 Sensor, (h): Insolation Sensor).
Figure 7. Data acquisition process.
Figure 8. The process of storing data in a database.
Figure 9. Preprocessed data.
Figure 10. Architecture of stacking ensemble.
Figure 11. Gaussian kernel density estimation.
Figure 12. Air Temperature data.
Figure 13. Humidity data.
Figure 14. CO2 data.
Figure 15. Soil Temperature data.
Figure 16. Soil Moisture data.
Figure 17. Insolation data.
Figure 18. Soil EC data.
Figure 19. Drainage EC data.
Figure 20. Drainage pH data.
Figure 21. Correlation Heatmap.
Figure 22. Model architectures for LSTM, BI-LSTM, and GRU.
Figure 23. A prediction result of air temperature using stacking ensemble.
Figure 24. A prediction result of air temperature using LSTM.
Figure 25. A prediction result of air temperature using BI-LSTM.
Figure 26. A prediction result of air temperature using GRU.
Table 1. Correlation coefficient value.

Value (Absolute Value) | Strength of Correlation
0.9~1.0 | Very strong positive (negative) correlation
0.7~0.9 | Strong positive (negative) correlation
0.5~0.7 | Moderate positive (negative) correlation
0.3~0.5 | Weak positive (negative) correlation
0.1~0.3 | Very weak positive (negative) correlation
0.0~0.1 | Almost no positive (negative) correlation
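The Pearson correlation coefficient underlying this interpretation scale can be computed directly (a minimal NumPy sketch; the paired readings below are hypothetical, not the collected sensor data):

```python
import numpy as np

# Hypothetical paired readings: air temperature (°C) and soil temperature (°C)
air = np.array([24.0, 25.5, 27.1, 28.3, 26.4, 23.8])
soil = np.array([21.0, 21.9, 23.0, 23.8, 22.6, 20.9])

# Pearson correlation: covariance normalized by both standard deviations;
# np.corrcoef returns the 2x2 correlation matrix, so take the off-diagonal.
r = np.corrcoef(air, soil)[0, 1]
# An |r| falling in 0.9~1.0 would be read as "very strong" per Table 1.
```

Computing this coefficient for every pair of sensor variables and arranging the values in a matrix yields the correlation heatmap of Figure 21.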
Table 2. Performance comparison for existing and proposed models.

Model | MSE | MAE | R²
GRU | 0.720 | 0.639 | 0.950
LSTM | 0.668 | 0.623 | 0.953
BI-LSTM | 0.772 | 0.670 | 0.946
Ensemble | 0.594 | 0.601 | 0.958
Share and Cite

Han, S.-H.; Mutahira, H.; Jang, H.-S. Prediction of Sensor Data in a Greenhouse for Cultivation of Paprika Plants Using a Stacking Ensemble for Smart Farms. Appl. Sci. 2023, 13, 10464. https://doi.org/10.3390/app131810464