1. Introduction
The rapid advancement of smart agricultural systems has revolutionized how environmental conditions are monitored and controlled, optimizing crop growth and yield [1,2,3]. Among these innovations, smart greenhouses have emerged as a pivotal component of precision agriculture, leveraging advanced technologies to create controlled environments for sustainable food production [4,5,6]. Within this context, fan actuators play a critical role in regulating airflow to maintain optimal temperature and humidity levels, ensuring a stable microclimate for plant growth [7]. Predicting the Fan Actuator Activation State is not merely a technical challenge but also a necessity for efficient and sustainable greenhouse management [8]. Efficient fan control affects multiple facets of greenhouse operations, including energy efficiency, crop health, and environmental sustainability [9]. Unnecessary fan operation increases energy consumption, raising operational costs and carbon footprints [10]. Conversely, failure to activate fans when required can result in adverse conditions such as overheating or excessive humidity, which can damage crops, promote pest infestations, or reduce yields [11]. Therefore, developing an intelligent system capable of accurately predicting fan activation is essential for achieving energy-efficient and climate-resilient agricultural practices.
Despite the increasing adoption of IoT-enabled greenhouses, many current systems rely on rudimentary rule-based algorithms or manual interventions, which fail to adapt dynamically to environmental changes [12,13]. Traditional threshold-based methods often overlook the complex non-linear interactions among environmental parameters, leading to suboptimal control decisions [14]. While machine learning (ML) and deep learning (DL) techniques offer promising alternatives by capturing intricate dependencies, existing models exhibit notable limitations in handling spatiotemporal data effectively [15]. Numerous studies have explored ML and DL models in agricultural applications. For instance, decision trees have been used to predict irrigation needs, achieving moderate improvements over static methods [16]. Similarly, Convolutional Neural Networks (CNNs) have been applied to pest detection in image data, demonstrating their robustness in handling unstructured inputs [17]. However, the use of these models for real-time actuator control, particularly of fan actuators in smart greenhouses, remains underexplored. Furthermore, most studies focus on single environmental parameters, neglecting the synergistic effects of multiple variables such as temperature, humidity, and soil nutrients [18,19,20].
A fundamental challenge in smart greenhouse management is class imbalance in actuator state data, as fan activations occur less frequently than non-activations. Many machine learning models struggle to generalize under such imbalanced distributions, resulting in biased predictions that compromise energy efficiency and crop health [21,22,23]. Deep learning-based models improve predictive accuracy but are typically computationally expensive, posing challenges for real-time deployment in resource-constrained environments [24]. Another key limitation of previous work is the reliance on fixed rule-based fan control mechanisms, which cannot adapt dynamically to varying environmental conditions [25,26,27]. Many greenhouse systems employ simplistic ON/OFF heuristics, leading to either excessive fan usage or delayed activation, both of which harm energy efficiency and plant health. These challenges highlight the need for an advanced predictive model that seamlessly integrates spatial and temporal features while addressing data imbalance and computational efficiency.
This research addresses these limitations by developing a hybrid CNN-LSTM model for predicting the Fan Actuator Activation State in smart greenhouses. The CNN component extracts spatial dependencies among sensor readings, capturing intricate patterns in temperature and humidity distributions, while the LSTM component models temporal variations, ensuring that actuator state predictions account for time-dependent fluctuations in greenhouse conditions. By combining these two architectures, the hybrid model leverages both spatial and temporal dependencies, significantly improving predictive accuracy compared to standalone CNN or LSTM models. A notable contribution in this field is the dataset presented in [28], which provides a rich collection of IoT sensor data from a fully operational smart greenhouse and enables the development of precise predictive models tailored to real-world environments. To address the class imbalance in this dataset, this study applies the Synthetic Minority Oversampling Technique (SMOTE), ensuring that the model does not develop a bias toward the dominant actuator state. Additionally, a custom activation function and a custom loss function are introduced to enhance the model's learning efficiency and improve generalization across varying environmental conditions. These modifications enable the model to minimize errors on rare activation instances, ensuring robust decision-making for energy-efficient fan control.
To further enhance model performance, hyperparameter tuning is conducted using Keras Tuner, refining model parameters for maximum accuracy. This study also integrates K-fold cross-validation to provide a robust evaluation framework, ensuring reliable results across different data partitions. The novel contributions of this study include the development of a hybrid deep learning model that combines CNNs and LSTMs, specifically designed to capture both spatial and temporal dependencies in smart greenhouse data. The introduction of a custom loss function, which optimizes a combination of mean squared error and binary cross-entropy losses, further enhances prediction performance. A comparative analysis is conducted, benchmarking the hybrid model against traditional machine learning approaches such as Random Forest and Gradient Boosting, as well as standalone deep learning architectures such as CNNs and LSTMs. The results demonstrate the superior predictive capabilities of the proposed model, highlighting its potential impact on smart greenhouse management. The remainder of this article is organized as follows. The next section describes the dataset and preprocessing techniques, including imputation, scaling, and SMOTE-based augmentation. The Methodology section outlines the model architectures, training processes, and evaluation metrics. The Results section presents the performance comparison of various models and highlights the advantages of the proposed hybrid approach. The Discussion section elaborates on the implications of the findings, addressing challenges, limitations, and future directions. Finally, this study concludes by summarizing the contributions and their significance in advancing sustainable smart greenhouse management.
2. Dataset Description and Preprocessing Techniques
The dataset used in this research originated from a master's thesis by Mohammed Ismail Lifta (2023–2024) at Tikrit University, Iraq [28]. It is a comprehensive collection of real-time environmental and actuator data from a smart greenhouse equipped with advanced IoT sensors, containing 37,922 rows and 13 columns, each row corresponding to a recorded instance of environmental measurements and actuator states. This dataset forms the foundation for developing predictive models to optimize greenhouse operations, specifically for controlling the fan actuator. The dataset comprises temporal, environmental, and actuator-related variables. The date column records the timestamp of each measurement in datetime64 format, indicating when the data were captured. Environmental conditions such as temperature (in degrees Celsius), humidity (percentage of environmental humidity), and water_level (percentage of water level) are recorded as integer values. Soil nutrient levels, represented by N, P, and K (nitrogen, phosphorus, and potassium, respectively), are scaled to a common range, ensuring consistency in their representation. The actuator states are captured as binary indicators: Fan_actuator_ON and Fan_actuator_OFF denote the operational status of the fan actuator, while similar pairs describe the states of the watering plant pump and water pump actuators.
To prepare the dataset for machine learning and deep learning models, extensive preprocessing was performed. The primary steps involved handling temporal data, scaling numerical features, addressing class imbalance in the target variable, and ensuring the integrity of the dataset through imputation. The date column, initially in datetime64 format, was processed to extract two additional features, hour and minute, which capture the temporal variations in the data and reflect periodic changes in environmental conditions. The transformation is mathematically expressed as (1):

$$\text{hour} = \left\lfloor \frac{t}{3600} \right\rfloor \bmod 24, \qquad \text{minute} = \left\lfloor \frac{t}{60} \right\rfloor \bmod 60, \quad (1)$$

where t represents the timestamp in seconds since the epoch. After extracting these features, the original date column was dropped to simplify the feature set. Numerical features, including temperature, humidity, water_level, N, P, and K, were standardized using Z-score normalization to ensure a uniform scale across features. This process is defined as (2):

$$Z = \frac{X - \mu}{\sigma}, \quad (2)$$

where X represents the original feature value, $\mu$ is the mean, and $\sigma$ is the standard deviation. This transformation centers the data around zero with unit variance, which is crucial for effective training of machine learning models. The target variable Fan_actuator_ON exhibited class imbalance, which can adversely affect model performance. To address this, the Synthetic Minority Oversampling Technique (SMOTE) was applied. SMOTE generates synthetic samples for the minority class using the k-nearest neighbors algorithm. For two nearest neighbors $x_i$ and $x_j$, a synthetic sample $x_{\text{new}}$ is generated as (3):

$$x_{\text{new}} = x_i + \lambda \, (x_j - x_i), \qquad \lambda \sim U(0, 1). \quad (3)$$

This technique ensures a balanced class distribution, enhancing the model's ability to generalize across both classes. Missing values in the dataset were minimal and occurred primarily in the date column; these were imputed using linear interpolation, leveraging the temporal order of the data. Any remaining missing values in numerical features were replaced by the feature's mean $\mu$, and categorical features were imputed using the mode. The final dataset, after preprocessing, included the standardized environmental variables, the temporal features hour and minute, and the actuator states excluding Fan_actuator_ON. This preprocessing pipeline ensured that the data were balanced, scaled, and free of inconsistencies, making them suitable for predictive modeling. The target variable Fan_actuator_ON serves as the binary classification objective, representing whether the fan actuator is active (1) or inactive (0). The prepared dataset provides a robust basis for developing machine learning models aimed at enhancing the efficiency and sustainability of smart greenhouse operations.
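For illustration, the preprocessing steps above translate into a short pipeline. The following sketch is not the authors' published code; it assumes a pandas DataFrame df with the columns described earlier, and uses scikit-learn and imbalanced-learn for scaling and SMOTE (in practice SMOTE should be fitted on training data only, as done in Section 4).

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE

def preprocess(df: pd.DataFrame):
    df = df.copy()
    # Temporal features: extract hour and minute, then drop the raw timestamp (Eq. (1)).
    # (The study imputed the few missing timestamps by linear interpolation
    # over the temporal order; that step is omitted here for brevity.)
    df["date"] = pd.to_datetime(df["date"])
    df["hour"] = df["date"].dt.hour
    df["minute"] = df["date"].dt.minute
    df = df.drop(columns=["date"])

    # Mean-impute any remaining missing numeric values.
    numeric = ["temperature", "humidity", "water_level", "N", "P", "K"]
    df[numeric] = df[numeric].fillna(df[numeric].mean())

    # Z-score standardization of the environmental features (Eq. (2)).
    df[numeric] = StandardScaler().fit_transform(df[numeric])

    # Separate the binary target from the predictors; Fan_actuator_OFF is
    # dropped here as well since it is the exact complement of the target.
    y = df.pop("Fan_actuator_ON")
    X = df.drop(columns=["Fan_actuator_OFF"])

    # SMOTE oversampling of the minority class (Eq. (3)).
    X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
    return X_res, y_res
```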
3. Hybrid CNN-LSTM Architecture with Custom Activation Function
The hybrid CNN-LSTM model combines the strengths of Convolutional Neural Networks (CNNs) for spatial feature extraction and Long Short-Term Memory (LSTM) networks for capturing temporal dependencies. This architecture is specifically designed to predict the fan activation state in a smart greenhouse. Advanced techniques such as a custom loss function and a custom activation function further enhance its performance. This section provides a detailed mathematical and conceptual overview of the architecture.
3.1. Hybrid CNN-LSTM Model Flowchart
Figure 1 illustrates the architectural flow of the proposed hybrid CNN-LSTM model designed for predicting the activation state of the fan actuator in a smart greenhouse. This model processes sensor data sequentially through multiple layers, each serving a specific role in extracting spatial and temporal features. The process begins with the input layer, which receives raw sensor readings such as temperature, humidity, soil moisture, and CO2 levels. These inputs undergo preprocessing steps, including normalization and imputation, to ensure data consistency before being passed into the neural network. The first major processing component of the model is the Conv1D layer, which extracts localized spatial patterns from the sensor data. By applying convolutional filters, this layer captures meaningful correlations among environmental variables, enhancing predictive accuracy for actuator behavior. Next, an activation layer applies a custom activation function that integrates the properties of both the tanh and sigmoid functions, introducing non-linearity into the network. To further refine the extracted features, a max-pooling operation reduces the spatial dimensions while retaining the most significant information. This operation improves computational efficiency by downsampling the feature maps, mitigating overfitting, and ensuring that only the most relevant features are carried forward. The resulting feature maps are then passed through a flattening layer, which converts them into a one-dimensional vector in preparation for sequential processing by the LSTM component.
The bidirectional LSTM layer forms the core of the temporal modeling process. Unlike conventional LSTM architectures, which process time-series data in a single direction, the bidirectional nature of this layer enables the model to learn from both past and future time steps within the sensor data sequence. This capability enhances the model’s ability to recognize recurring patterns and trends, leading to improved predictive performance. The LSTM network maintains a cell state and a hidden state, which are iteratively updated using gating mechanisms such as the input gate, forget gate, and output gate. These mechanisms regulate the flow of information within the LSTM cell, ensuring that relevant dependencies are preserved while redundant or less important details are discarded. The updated cell state is then modulated using the custom activation function to introduce additional non-linearity, further enhancing the model’s capacity for learning complex sequential dependencies. Once the spatiotemporal features have been fully processed, they are passed to a fully connected dense layer, which serves as the classification stage of the model. The dense layer applies weighted transformations to the learned feature representations, refining them to improve prediction accuracy. Finally, the output layer produces a binary classification decision, determining whether the fan actuator should be turned on. This final output is generated using a sigmoid activation function, which produces a probability score between zero and one, representing the likelihood of fan activation. By applying an appropriate threshold to this probability score, the model makes a definitive binary decision about the actuator state.
The proposed hybrid CNN-LSTM model effectively combines convolutional and recurrent learning techniques to improve fan actuator state prediction. The CNN component ensures robust spatial feature extraction, while the LSTM component captures temporal dependencies within the greenhouse sensor data. The use of a custom activation function enhances the model’s non-linearity, improving its ability to model complex relationships in environmental conditions. Additionally, max-pooling and flattening operations ensure computational efficiency by reducing the model’s dimensional complexity, and the bidirectional nature of the LSTM layer allows for better sequence modeling by capturing both forward and backward dependencies in time-series data. By integrating these components, the hybrid CNN-LSTM model provides an advanced predictive framework tailored for smart greenhouse applications. Its ability to incorporate spatial and temporal features enables it to make highly accurate predictions, leading to more efficient energy usage, improved climate control, and enhanced crop health. The structured processing of data, from the input layer through the convolutional and recurrent layers to the final classification output, ensures that the model effectively leverages patterns in sensor readings to inform real-time actuator control decisions.
Figure 1 provides an intuitive visual representation of this process, illustrating how raw sensor data are transformed step by step into an informed actuator decision that optimizes greenhouse conditions.
3.2. Convolutional Layers for Spatial Feature Extraction
The first stage of the hybrid architecture involves convolutional layers that process the input feature tensor $X \in \mathbb{R}^{n \times d \times 1}$, where n is the number of samples, d is the number of features, and the last dimension represents the channel. These layers apply learnable filters W to extract spatial patterns. The convolution operation is computed as (4):

$$z_i = \sum_{j=1}^{k} W_j \, x_{i+j-1} + b, \quad (4)$$

where k is the kernel size, W represents the filter weights, and b is the bias term. This operation slides the filter across the input tensor, capturing local dependencies between features. The output of the convolution is passed through a non-linear activation function. In this architecture, a custom activation function replaces the standard ReLU to introduce additional flexibility in learning complex relationships, as (5):

$$a_i = f_{\text{custom}}(z_i) = \tanh(z_i) \cdot \sigma(z_i), \quad (5)$$

where the custom activation is the combination of the tanh and sigmoid activation functions, as presented in (6) and (7):

$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, \quad (6)$$

$$\sigma(z) = \frac{1}{1 + e^{-z}}. \quad (7)$$

This custom activation function combines the properties of tanh, which allows outputs in the range $(-1, 1)$, and sigmoid, which compresses values into $(0, 1)$. The result is a smooth, non-linear function that enhances the model's ability to learn subtle variations in the input. Following the convolution and activation, max-pooling is applied to reduce the spatial dimensions of the feature maps as (8):

$$y_i = \max_{0 \le m < p} a_{ip + m}, \quad (8)$$

where p is the pooling size. This step focuses on the most salient features, reducing computational complexity while maintaining essential information.
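A minimal TensorFlow realization of Equations (4)–(8) could look as follows; the filter count, kernel size, and pooling size are illustrative choices, not values reported in the paper.

```python
import tensorflow as tf

def custom_activation(z):
    # f(z) = tanh(z) * sigmoid(z), Eqs. (5)-(7)
    return tf.math.tanh(z) * tf.math.sigmoid(z)

# Convolutional front-end: Conv1D applies Eq. (4), the custom activation
# adds non-linearity (Eq. (5)), and max-pooling implements Eq. (8).
conv_block = tf.keras.Sequential([
    tf.keras.layers.Conv1D(filters=64, kernel_size=3, padding="same",
                           activation=custom_activation),
    tf.keras.layers.MaxPooling1D(pool_size=2),
])
```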
3.3. LSTM Layers for Temporal Dependency Modeling
The feature maps from the CNN layers are flattened and passed to an LSTM layer to model temporal dependencies. At each time step t, the LSTM cell maintains a cell state $c_t$ and a hidden state $h_t$, updated through gating mechanisms. The input gate, forget gate, and output gate are defined as (9)–(11):

$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i), \quad (9)$$

$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f), \quad (10)$$

$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o), \quad (11)$$

where $\sigma$ is the sigmoid activation function, and $W_i$, $W_f$, $W_o$ are trainable weight matrices. The cell state and hidden state are updated as (12) and (13):

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c [h_{t-1}, x_t] + b_c), \quad (12)$$

$$h_t = o_t \odot f_{\text{custom}}(c_t). \quad (13)$$

Here, the custom activation function modulates the cell state output, enhancing the network's ability to model complex temporal patterns.
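In Keras, the LSTM layer's activation argument is the function applied when forming the hidden state from the cell state (Eq. (13)), while recurrent_activation is the gate non-linearity of Eqs. (9)–(11). Under that mapping, the custom modulation can be sketched as below (an approximation: Keras also applies the same function to the candidate state, whereas Eq. (12) keeps tanh for the candidate). It reuses custom_activation from the previous sketch.

```python
from tensorflow.keras.layers import LSTM, Bidirectional

# Bidirectional LSTM whose cell-state output is modulated by the
# custom activation function.
temporal = Bidirectional(LSTM(units=64,
                              activation=custom_activation,    # Eq. (13)
                              recurrent_activation="sigmoid",  # Eqs. (9)-(11)
                              return_sequences=False))
```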
3.4. Dense Output Layer and Prediction
The output of the LSTM layer, $h_T$, is passed to a dense layer for binary classification. The dense layer computes the final prediction as (14):

$$\hat{y} = f_{\text{custom}}(W_d \, h_T + b_d), \quad (14)$$

where $W_d$ and $b_d$ are the weights and biases of the dense layer. By using the custom activation function in the output layer, the model provides well-calibrated probabilities while maintaining sensitivity to subtle variations in the input features.
3.5. Custom Loss Function
The training process minimizes a custom loss function that combines binary cross-entropy (BCE) with mean squared error (MSE). The loss function is defined as (15):

$$\mathcal{L} = \mathcal{L}_{\text{BCE}} + \lambda \, \mathcal{L}_{\text{MSE}}. \quad (15)$$

The binary cross-entropy term penalizes incorrect predictions, as presented in (16):

$$\mathcal{L}_{\text{BCE}} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]. \quad (16)$$

The mean squared error term encourages the predicted probabilities to be closer to the true labels, as presented in (17):

$$\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2. \quad (17)$$

The regularization parameter $\lambda$ balances the two components, ensuring both accurate classification and probabilistic calibration.
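For concreteness, Equation (15) can be written as a Keras-compatible loss in a few lines. This sketch is illustrative; the default value of λ is a placeholder, as the paper does not report the weighting used.

```python
import tensorflow as tf

def custom_loss(lam: float = 0.5):
    """BCE + lam * MSE, Eqs. (15)-(17). The value of lam is a placeholder."""
    bce = tf.keras.losses.BinaryCrossentropy()
    mse = tf.keras.losses.MeanSquaredError()
    def loss(y_true, y_pred):
        # Weighted sum of the classification and calibration terms.
        return bce(y_true, y_pred) + lam * mse(y_true, y_pred)
    return loss
```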
3.5.1. Motivation for Combining Tanh and Sigmoid
The selection of an appropriate activation function is a critical component in deep learning architectures, as it directly influences gradient propagation, convergence behavior, and the ability to model complex relationships within data. In this study, a novel activation function is proposed, defined as the product of the hyperbolic tangent (tanh) and the sigmoid function. The motivation behind this combination stems from the complementary properties of the two functions. The tanh function produces outputs in the range $(-1, 1)$ and is zero-centered, which is advantageous for stabilizing weight updates and preventing imbalanced activations during training. However, tanh suffers from the vanishing gradient problem for large values of $|z|$, where the gradient approaches zero, leading to slow convergence and ineffective learning in deep networks. Conversely, the sigmoid function, which maps inputs to the range $(0, 1)$, is widely used in probabilistic modeling but is not zero-centered, which can result in biased gradient updates and slower convergence. Additionally, sigmoid suffers from saturation issues, where gradients become negligible for very large positive or negative values of z.
By combining the two functions multiplicatively, the proposed activation function inherits desirable characteristics from both. The presence of tanh keeps the activation zero-centered, which helps maintain balanced gradient updates across layers, while the incorporation of sigmoid introduces probabilistic properties, making the function particularly suitable for classification tasks. The output range of the proposed activation function is effectively constrained within $(-1, 1)$, which limits extreme activations and reduces the risk of exploding gradients, thereby improving numerical stability during training. Furthermore, this formulation mitigates the vanishing gradient issue present in both tanh and sigmoid alone, as the derivative of the custom activation function retains non-zero values across a broader range of inputs than either standard activation function.
3.5.2. Theoretical Properties and Mathematical Justification
To further analyze the properties of the proposed activation function, its derivative is computed via the product rule as (18):

$$f'_{\text{custom}}(z) = \left(1 - \tanh^2(z)\right)\sigma(z) + \tanh(z)\,\sigma(z)\left(1 - \sigma(z)\right). \quad (18)$$

This derivative reveals several important characteristics. First, when z is near zero, the gradient remains moderate, preventing gradient explosion and ensuring stable weight updates. Second, when z is large, the output remains bounded due to the natural saturation properties of sigmoid and tanh, which prevents uncontrolled activations. Unlike the standard tanh or sigmoid functions alone, which individually suffer from gradient saturation, the proposed function maintains a non-zero gradient across a wider range, allowing for more effective backpropagation. Additionally, when z is negative, the function continues to produce meaningful gradients, unlike the rectified linear unit (ReLU), which clips negative inputs to zero, effectively preventing any updates to the corresponding neurons.
These mathematical properties make the proposed activation function particularly advantageous in deep networks that require stable and efficient gradient propagation. In architectures with recurrent components, such as the hybrid CNN-LSTM model utilized in this study, maintaining effective gradient flow is essential to capturing long-range dependencies in sequential data. The combination of the smoothness of sigmoid and the zero-centered nature of tanh makes the proposed function well suited for handling complex spatiotemporal relationships, as required for fan actuator state prediction in smart greenhouse environments.
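The derivative in Equation (18) follows directly from the product rule and can be verified numerically; the sketch below compares the analytic expression against a central finite-difference estimate.

```python
import numpy as np

def f(z):
    # f(z) = tanh(z) * sigmoid(z)
    return np.tanh(z) / (1.0 + np.exp(-z))

def f_prime(z):
    # Eq. (18): (1 - tanh^2 z) * sigma(z) + tanh(z) * sigma(z) * (1 - sigma(z))
    s = 1.0 / (1.0 + np.exp(-z))
    t = np.tanh(z)
    return (1.0 - t**2) * s + t * s * (1.0 - s)

z = np.linspace(-6, 6, 1001)
numeric = np.gradient(f(z), z)  # finite-difference approximation
assert np.allclose(f_prime(z), numeric, atol=1e-3)
```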
3.6. Training and Optimization
The model is trained using the Adam optimizer, which adjusts the learning rate for each parameter based on the gradients and their second moments. The update rule is given by (19):

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t} + \epsilon} \, g_t, \quad (19)$$

where $\eta$ is the learning rate, $g_t$ is the gradient of the loss function, $v_t$ is the exponentially decaying average of squared gradients, and $\epsilon$ is a small constant for numerical stability. The hybrid CNN-LSTM architecture integrates convolutional layers for spatial feature extraction, LSTM layers for temporal dependency modeling, and a dense output layer for prediction. The custom activation function enhances the model's ability to learn complex relationships, while the custom loss function balances accurate classification and probabilistic calibration. This combination makes the architecture well suited to the dynamic and non-linear nature of smart greenhouse data, enabling robust and reliable predictions of the Fan_actuator_ON state.
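Putting the pieces together, the end-to-end architecture of this section can be sketched as follows. Layer widths are assumptions, not reported values; the pooled feature maps are fed directly to the bidirectional LSTM (Keras preserves the (steps, channels) layout, standing in for the flatten-and-reshape step described above), and the model is compiled with Adam and the custom loss of Eq. (15). It reuses custom_activation and custom_loss from the earlier sketches.

```python
import tensorflow as tf

def build_model(n_features: int, lam: float = 0.5) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features, 1)),
        # Spatial feature extraction, Eqs. (4)-(8).
        tf.keras.layers.Conv1D(64, 3, padding="same",
                               activation=custom_activation),
        tf.keras.layers.MaxPooling1D(2),
        # Temporal dependency modeling, Eqs. (9)-(13).
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(64, activation=custom_activation)),
        tf.keras.layers.Dense(32, activation=custom_activation),
        # Probability of fan activation.
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # Eq. (19)
                  loss=custom_loss(lam),                                   # Eq. (15)
                  metrics=["accuracy", tf.keras.metrics.Precision(),
                           tf.keras.metrics.Recall()])
    return model
```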
4. Experiment Setup
The experimental setup for evaluating the hybrid CNN-LSTM architecture was designed to ensure robustness and reproducibility. The experiments were implemented in Python (version 3.13.3), utilizing TensorFlow as the primary framework. All computations were performed on an NVIDIA GPU with CUDA support, enabling efficient training of the deep learning models. The dataset, comprising 37,922 samples with the features described in Section 2, was split into training and testing sets in an 80:20 ratio. The training set was further divided into ten folds for cross-validation, a technique chosen to ensure the generalizability of the results. To address the inherent class imbalance in the target variable, the Synthetic Minority Oversampling Technique (SMOTE) was applied to the training data. SMOTE creates synthetic samples for the minority class by interpolating between existing samples, as defined in (3). Furthermore, the training process minimized the custom loss function defined in Section 3.5, which combines binary cross-entropy (BCE) and mean squared error (MSE), as expressed in (15). The optimization of the model parameters was performed using the Adam optimizer, with the parameter update rule for each weight $\theta$ at iteration t given by (19). The performance of the hybrid CNN-LSTM model was evaluated using a comprehensive set of metrics. Accuracy measures the proportion of correctly classified samples and is defined as (20):

$$\text{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}(\hat{y}_i = y_i), \quad (20)$$

where $\mathbb{1}(\cdot)$ is the indicator function, returning 1 if its argument is true and 0 otherwise. Precision quantifies the proportion of true positive predictions among all positive predictions, given by (21):

$$\text{Precision} = \frac{TP}{TP + FP}, \quad (21)$$

where TP and FP denote the true positives and false positives, respectively. Recall evaluates the model's ability to identify all actual positive cases and is computed as (22):

$$\text{Recall} = \frac{TP}{TP + FN}, \quad (22)$$

where FN represents false negatives. The F1 score, the harmonic mean of precision and recall, balances these two metrics and is defined as (23):

$$F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}. \quad (23)$$
To ensure robust and unbiased evaluation, a ten-fold cross-validation approach was adopted. In this procedure, the dataset was divided into ten equal subsets; for each fold k, the model was trained on nine subsets and validated on the remaining one. The average loss across all folds was computed as (24):

$$\bar{\mathcal{L}} = \frac{1}{K} \sum_{k=1}^{K} \mathcal{L}_k(\theta_k), \quad (24)$$

where K is the number of folds, $\mathcal{L}_k$ is the loss for the k-th fold, and $\theta_k$ are the model parameters learned during the k-th training iteration.
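A sketch of the ten-fold procedure is given below, reusing build_model from Section 3. SMOTE is fitted inside each training fold so that no synthetic samples leak into validation data (a standard precaution consistent with applying SMOTE to training data only); the epoch count and batch size are assumptions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from imblearn.over_sampling import SMOTE

def cross_validate(X: np.ndarray, y: np.ndarray, k: int = 10) -> float:
    fold_losses = []
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=42)
    for train_idx, val_idx in skf.split(X, y):
        # Oversample the minority class within the training fold only (Eq. (3)).
        X_tr, y_tr = SMOTE(random_state=42).fit_resample(X[train_idx], y[train_idx])
        model = build_model(n_features=X.shape[1])
        model.fit(X_tr[..., None], y_tr, epochs=10, batch_size=64, verbose=0)
        loss, *_ = model.evaluate(X[val_idx][..., None], y[val_idx], verbose=0)
        fold_losses.append(loss)
    return float(np.mean(fold_losses))  # average loss across folds, Eq. (24)
```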
To validate the effectiveness of the hybrid CNN-LSTM architecture, its performance was compared against baseline models, including Random Forest (RF), Gradient Boosting (GB), standalone CNN, and standalone LSTM. All models were trained and evaluated under identical conditions, using the same dataset splits, cross-validation framework, and evaluation metrics. The computational complexity of the hybrid CNN-LSTM model was analyzed in terms of the total number of trainable parameters, training duration, and runtime performance. The model's trainable parameters were distributed across convolutional filters, LSTM cells, and dense layers. The training process for a single fold of the ten-fold cross-validation took approximately 18 min on an NVIDIA RTX 3090 GPU, resulting in a total training duration of approximately 3 h. The average inference time per sample was on the order of milliseconds, demonstrating the feasibility of real-time predictions in deployment scenarios. The computational cost of the hybrid CNN-LSTM model was also compared against the baseline models. Traditional machine learning models, such as Random Forest and Gradient Boosting, had significantly fewer parameters and lower computational demands but exhibited lower predictive accuracy. Conversely, standalone deep learning models, including the CNN and LSTM architectures, required similar training time but failed to achieve the same level of predictive performance as the hybrid approach. These results indicate that while the hybrid CNN-LSTM model is computationally more expensive than traditional ML models, it provides superior accuracy and generalization capabilities. The experimental code was implemented modularly to facilitate reproducibility and scalability. Preprocessing steps, such as SMOTE application, feature scaling, and data splitting, were encapsulated in reusable functions, and the model architecture, training pipeline, and evaluation metrics were implemented to allow seamless experimentation with different configurations. During training, real-time monitoring of loss and metrics was achieved through visualization tools, ensuring transparency and early detection of overfitting. Additionally, hyperparameter tuning was conducted to optimize the architectural design and learning parameters, ensuring that the final model achieved optimal performance under the given computational constraints.
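The hyperparameter search mentioned above can be run with Keras Tuner along the following lines; the search space and trial budget are illustrative assumptions, not the configuration reported by the authors, and custom_activation and custom_loss are reused from the earlier sketches.

```python
import keras_tuner as kt
import tensorflow as tf

N_FEATURES = 14  # placeholder; set to the actual number of input features

def build_tunable(hp: kt.HyperParameters) -> tf.keras.Model:
    # Hypothetical search space over filters, LSTM units, and learning rate.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(N_FEATURES, 1)),
        tf.keras.layers.Conv1D(hp.Int("filters", 32, 128, step=32), 3,
                               padding="same", activation=custom_activation),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(hp.Int("lstm_units", 32, 128, step=32),
                                 activation=custom_activation)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(
                      hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])),
                  loss=custom_loss(), metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(build_tunable, objective="val_accuracy", max_trials=10)
# tuner.search(X_train, y_train, validation_split=0.2, epochs=10)
```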
5. Results and Discussion
This section presents the results of the experiments conducted to evaluate the proposed hybrid CNN-LSTM model and its comparisons with various baseline models, including both traditional machine learning and deep learning methods. The discussion elaborates on the performance of these models using the evaluation metrics of accuracy, precision, recall, F1 score, training time, and model size, as presented in Table 1.
5.1. Performance of the Proposed Hybrid CNN-LSTM Model
The proposed hybrid CNN-LSTM model achieved remarkable performance, demonstrating its ability to effectively capture both spatial and temporal dependencies in the smart greenhouse data. The model attained an accuracy of 0.9992, precision of 0.9989, recall of 0.9996, and an F1 score of 0.9992. These results indicate near-perfect predictions, with minimal false positives and false negatives. The high recall value of 0.9996 reflects the model’s exceptional ability to correctly identify positive instances of fan actuator state. This is particularly significant in the context of smart greenhouse management, where missing a true positive could result in suboptimal environmental control. The precision of 0.9989 further ensures that almost all predicted positives are indeed correct, minimizing unnecessary activations of the fan actuator. The harmonic mean of precision and recall, as represented by the F1 score, confirms the balance achieved by the model across these metrics.
5.2. Comparison with Traditional Machine Learning Models
Among the traditional machine learning models, XGBoost achieved the highest performance, with an accuracy of 0.9447, precision of 0.9713, recall of 0.9165, and an F1 score of 0.9431. Random Forest followed closely with an accuracy of 0.9429, precision of 0.9744, recall of 0.9098, and an F1 score of 0.9410. These results highlight the effectiveness of ensemble-based methods in capturing the non-linear relationships in the data. Gradient Boosting exhibited comparable performance to Random Forest, with an accuracy of 0.9384, precision of 0.9740, recall of 0.9007, and an F1 score of 0.9360. However, its slightly lower recall indicates a minor reduction in its ability to identify true positives compared to XGBoost and Random Forest.
Logistic Regression and SVM, while still achieving reasonable performance, were outperformed by the ensemble-based models. Logistic Regression attained an accuracy of 0.9097, precision of 0.9140, recall of 0.9045, and an F1 score of 0.9092. Similarly, SVM achieved an accuracy of 0.9122, precision of 0.9194, recall of 0.9037, and an F1 score of 0.9115. These results suggest that simpler linear and kernel-based methods are less effective at capturing the complex patterns in the data compared to ensemble methods and deep learning models. Interestingly, the Stacking Classifier performed poorly in comparison, with an accuracy of 0.7849, precision of 0.7323, recall of 0.8981, and an F1 score of 0.8068. This underperformance may indicate suboptimal integration of the base models or insufficient diversity among the ensemble’s components.
5.3. Comparison with Deep Learning Models
The standalone deep learning models, including Multilayer Perceptron (MLP), CNN, and LSTM, also exhibited strong performance, albeit slightly lower than the proposed hybrid CNN-LSTM model. The MLP achieved an accuracy of 0.9363, precision of 0.9721, recall of 0.8983, and an F1 score of 0.9337. The CNN model achieved comparable results, with an accuracy of 0.9334, precision of 0.9662, recall of 0.8983, and an F1 score of 0.9310. The LSTM model similarly performed well, with an accuracy of 0.9345, precision of 0.9708, recall of 0.8960, and an F1 score of 0.9319.
The hybrid CNN-LSTM model outperformed the standalone CNN and LSTM models, demonstrating the effectiveness of combining spatial feature extraction from CNNs with temporal dependency modeling from LSTMs. The integration of these two paradigms allowed the hybrid model to capture both local and sequential patterns in the data, leading to its superior performance.
5.4. Computational Efficiency and Training Time Analysis
The experimental results provide a comprehensive evaluation of the proposed hybrid CNN-LSTM model in comparison with traditional machine learning models and standalone deep learning architectures. A key aspect of this evaluation is the training time, which significantly influences the feasibility of model deployment in real-time or resource-constrained environments. The proposed hybrid CNN-LSTM model required 37.05 s to train, which is considerably longer than traditional machine learning models such as SVM (0.716 s), Random Forest (1.11 s), and Logistic Regression (0.0157 s). This is expected due to the computational complexity involved in deep learning models, particularly those incorporating sequential dependencies like LSTMs. When compared to standalone deep learning architectures, the hybrid model required more time than CNN (34.40 s) but was notably more efficient than LSTM (76.82 s). This suggests that while LSTMs excel at capturing sequential dependencies, their high computational cost is mitigated when combined with CNNs, which provide efficient feature extraction. In addition to training time, model size is another critical factor, especially for deployment on edge devices or systems with limited memory. The hybrid CNN-LSTM model exhibited the largest size (1455.33 KB) among all models, confirming that the combination of CNN and LSTM results in a substantial increase in the number of parameters. In contrast, traditional machine learning models such as Logistic Regression (0.999 KB), SVM (310.81 KB), and XGBoost (82.28 KB) had significantly smaller footprints, making them more suitable for lightweight applications. Among deep learning models, CNN had a model size of 278.96 KB, while LSTM was slightly larger at 382.54 KB, reinforcing the notion that sequence modeling incurs additional parameter storage. The model size of the hybrid approach is considerably larger than its standalone counterparts, suggesting that while it offers improved predictive power, it may not be ideal for deployment in environments with strict memory constraints unless optimized using techniques such as pruning or quantization.
A key statistical measure in this analysis is the p-value, which determines the significance of differences in performance among the models. The proposed hybrid CNN-LSTM model achieved a p-value of 0.02202, indicating that its superior performance is statistically significant compared to baseline models. This suggests that the observed improvements in accuracy, precision, recall, and F1 score are unlikely to be due to random variations in the data. Similarly, Random Forest (0.02080), Gradient Boosting (0.02491), and Logistic Regression (0.0213) also yielded p-values below 0.05, signifying that their performance differences are statistically significant. Among all models, XGBoost had the lowest p-value (0.01082), reinforcing its reliability as a high-performing model with strong statistical validity. Conversely, deep learning models such as CNN (0.37390), LSTM (0.12456), and MLP (0.12269) had p-values greater than 0.05, suggesting that their performance differences may not be statistically significant compared to the baseline. The Stacking Classifier exhibited the highest p-value (0.2521), indicating that its performance variations were not significant in comparison to other models. From these results, several key insights emerge regarding the trade-offs between accuracy, computational efficiency, and statistical significance. The hybrid CNN-LSTM model achieved the highest accuracy (0.9992) but required substantially longer training time (37.05 s) and the largest model size (1455.33 KB). This suggests that while it provides superior classification performance, its computational demands may make it less practical for real-time applications unless optimized. Traditional machine learning models such as XGBoost, Random Forest, and Gradient Boosting offered competitive accuracy while maintaining significantly lower training time and model size, making them strong alternatives for deployment in constrained environments. Among deep learning models, CNN demonstrated a balanced trade-off between performance and efficiency, whereas LSTM incurred the highest computational cost.
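The text does not specify the statistical test behind these p-values; one conventional choice for comparing models evaluated on the same folds is a paired t-test on the per-fold scores, sketched below with hypothetical numbers for illustration only.

```python
from scipy import stats

# Hypothetical per-fold accuracies for two models on the same 10 folds
# (illustrative values, not results from the paper).
hybrid_scores   = [0.9991, 0.9993, 0.9990, 0.9994, 0.9992,
                   0.9992, 0.9991, 0.9993, 0.9992, 0.9991]
baseline_scores = [0.9440, 0.9460, 0.9430, 0.9450, 0.9440,
                   0.9450, 0.9430, 0.9460, 0.9440, 0.9450]

# Paired test: each fold yields one paired observation per model.
t_stat, p_value = stats.ttest_rel(hybrid_scores, baseline_scores)
print(f"paired t-test: t = {t_stat:.3f}, p = {p_value:.5f}")
```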
5.5. Discussion and Insights
The results clearly indicate the superiority of the proposed hybrid CNN-LSTM model over both traditional machine learning and standalone deep learning approaches. The high accuracy, precision, recall, and F1 score suggest that the hybrid model effectively generalizes the underlying patterns in the data. This performance can be attributed to the complementary strengths of CNNs and LSTMs, which allow the model to process spatial and temporal features simultaneously. Ensemble-based machine learning models, such as XGBoost and Random Forest, also performed well, highlighting their robustness in handling complex data. However, their performance was slightly lower than that of the hybrid model, suggesting that deep learning techniques are better suited for capturing the intricate relationships present in this dataset. The relatively poor performance of the Stacking Classifier may indicate issues with overfitting or the selection of base models. This result emphasizes the importance of careful design and validation of ensemble methods to ensure their effectiveness.
5.5.1. Real-World Applications and Deployment Challenges
Despite the promising results, deploying the hybrid CNN-LSTM model in real-world smart greenhouse environments presents several challenges. One of the primary concerns is the computational complexity of deep learning models. Unlike traditional machine learning approaches, which can be executed on low-power edge devices, deep learning models often require high-performance GPUs or TPUs for inference. This poses a challenge for resource-constrained greenhouse setups that may rely on embedded systems with limited processing power. A potential solution is model optimization through techniques such as quantization, pruning, and knowledge distillation, which can significantly reduce computational demands while maintaining accuracy. Another key consideration is the real-time adaptability of the model. Environmental conditions in greenhouses change dynamically, requiring predictive models that can update and adapt to new data continuously. Implementing an adaptive learning mechanism, such as online learning or periodic retraining with fresh data, can enhance model robustness. Additionally, integrating the hybrid model with IoT-enabled sensors and edge computing frameworks would allow decentralized decision-making, reducing reliance on cloud-based processing and improving response times for actuator control.
5.5.2. Possible Adaptations for Real-Time Greenhouse Monitoring
For successful deployment, the model should be incorporated into a real-time greenhouse monitoring system that seamlessly interacts with IoT sensors and actuators. This requires an efficient data pipeline that preprocesses sensor readings in real time, feeds them into the model, and triggers actuator responses based on the model’s predictions. Implementing a hierarchical decision-making approach where low-complexity rule-based systems handle routine tasks and the deep learning model intervenes in complex scenarios can further optimize energy efficiency. Furthermore, external factors such as network latency and data transmission failures must be addressed. Deploying the model on edge computing devices closer to the greenhouse environment minimizes dependency on continuous internet connectivity. Additionally, ensuring robustness against noisy or missing sensor data through advanced imputation techniques can enhance real-world applicability.
6. Implications and Limitations
The findings of this study, particularly the superior performance of the hybrid CNN-LSTM model, have several important implications for both research and practical applications in smart greenhouse management. At the same time, this study has certain limitations that highlight areas for future work.
6.1. Implications
The hybrid CNN-LSTM model demonstrated exceptional predictive capabilities, with near-perfect performance across all evaluation metrics. This suggests that such architectures are well suited for capturing both spatial and temporal dependencies in complex datasets. In the context of smart greenhouses, this translates to highly accurate and reliable control of environmental conditions, which can improve crop yields, reduce resource wastage, and enhance sustainability. One of the key implications of this study is the potential for generalization to other domains that involve spatiotemporal data. For example, similar architectures could be applied to predictive maintenance in industrial settings, where sensor data combine spatial and temporal features. Additionally, this approach could benefit healthcare applications, such as monitoring patient vitals over time, or transportation systems, where traffic patterns are inherently temporal and spatial.
The integration of a custom activation function and a custom loss function into the hybrid CNN-LSTM architecture further underscores the importance of tailoring deep learning models to the specific characteristics of the problem. These customizations not only improved the performance of the proposed model but also provided better-calibrated predictions, which are crucial for decision-making in critical systems like smart greenhouses. Moreover, this study highlights the complementary strengths of CNNs and LSTMs, showing that hybrid architectures can outperform standalone deep learning models and traditional machine learning methods. This has implications for researchers and practitioners aiming to develop state-of-the-art predictive models, as it provides a clear direction for combining multiple paradigms to enhance performance.
6.2. Limitations
Despite its strengths, this study is not without limitations. First, the dataset used for training and evaluation, while extensive, was collected from a single smart greenhouse environment. As a result, the generalizability of the findings to other greenhouses with different environmental conditions or control systems remains uncertain. The dataset may also contain inherent biases in environmental conditions, which could influence model performance in real-world applications. Future work should validate the model on datasets collected from diverse settings, incorporating variations in climate, greenhouse architecture, and crop types to establish its robustness. Additionally, methods such as domain adaptation or transfer learning could be explored to improve the model’s adaptability to new environments without requiring extensive retraining. Second, the computational cost of training the hybrid CNN-LSTM model is relatively high compared to traditional machine learning models. The reliance on GPU acceleration for efficient training may limit the applicability of this approach in resource-constrained environments. While the high accuracy achieved demonstrates the effectiveness of the model, the trade-off between complexity and real-time deployment needs further consideration. Strategies such as model quantization, pruning, or knowledge distillation could be explored to optimize the architecture for faster inference and lower resource consumption, enabling deployment on edge devices or embedded systems.
Third, while this study introduces a novel activation function that combines the hyperbolic tangent and sigmoid functions, and employs a custom loss function that integrates binary cross-entropy (BCE) with mean squared error (MSE), a dedicated numerical evaluation of these components was not performed. Although theoretical justifications have been provided to explain the expected benefits of these design choices, an empirical ablation study comparing the proposed activation and loss functions against conventional alternatives (such as ReLU, standard BCE, and standard MSE) was not conducted due to computational constraints and scope limitations. Performing such experiments would require additional extensive model training runs with multiple configurations, which was not feasible within the current study's resource allocation and timeframe. Furthermore, since the primary objective of this study was to develop an integrated deep learning framework tailored for greenhouse actuator control, the focus remained on optimizing the overall model rather than isolating individual component contributions. Future research should conduct controlled experiments comparing the hybrid activation function and custom loss function with standard approaches to quantify their exact contributions. Fourth, the interpretability of deep learning models remains a significant challenge, particularly in applications where decision transparency is crucial. While the proposed CNN-LSTM architecture achieved exceptional predictive performance, understanding its internal decision-making process is non-trivial. Deep learning models remain largely black boxes, and this study did not incorporate post hoc explainability techniques to interpret how specific environmental conditions influence predictions. Approaches such as SHAP (SHapley Additive exPlanations), integrated gradients, or attention mechanisms could be explored in future work to enhance model interpretability. Additionally, assessing model reliability under varying input distributions is essential to ensure robustness in real-world deployments.
Fifth, while the model outperformed traditional classifiers, no formal error analysis was conducted to assess potential failure cases and sensitivity to noise or adversarial inputs. In practical greenhouse implementations, sensor readings may be affected by external noise, calibration errors, or missing data. The impact of such perturbations on model stability was not rigorously evaluated in this study. Future research should examine the resilience of the proposed approach by incorporating synthetic noise or adversarial attacks into the dataset to assess robustness. Finally, the relatively poor performance of the Stacking Classifier highlights the need for a more rigorous ensemble design. The stacking approach was not extensively tuned or optimized in this study, which may have limited its effectiveness. Future work could explore advanced ensemble techniques or hybridization strategies that integrate the strengths of stacking with deep learning architectures. Additionally, a comparative analysis of ensemble learning using different feature representations may provide insights into the optimal configurations for improving predictive performance. While these limitations acknowledge the areas requiring further refinement, the proposed model remains a promising step toward developing intelligent climate control solutions for smart greenhouses. Addressing these challenges in future research could further enhance the practicality, interpretability, and generalizability of deep learning-based actuator control systems.
7. Conclusions
This study proposed a hybrid CNN-LSTM architecture designed for predicting the activation state of fan actuators in smart greenhouses, leveraging deep learning techniques to enhance predictive accuracy and operational efficiency. By integrating Convolutional Neural Networks (CNNs) for spatial feature extraction and Long Short-Term Memory (LSTM) networks for temporal dependency modeling, the hybrid model effectively captured both localized patterns and long-range dependencies within environmental sensor data. Additionally, a custom activation function combining the tanh and sigmoid functions was introduced to improve gradient propagation and model stability, while a custom loss function incorporating binary cross-entropy (BCE) and mean squared error (MSE) was developed to enhance prediction calibration. Experimental results demonstrated the superior performance of the proposed model, achieving an accuracy of 0.9992, precision of 0.9989, recall of 0.9996, and an F1 score of 0.9992, significantly outperforming traditional machine learning models such as Random Forest, Gradient Boosting, and XGBoost, as well as standalone CNN and LSTM architectures. A key advantage of the proposed hybrid CNN-LSTM model lies in its ability to optimize actuator control decisions by effectively integrating spatial and temporal dependencies in sensor readings, thereby reducing unnecessary energy consumption and improving environmental regulation within the greenhouse. However, this study also highlights several challenges, including the computational complexity of deep learning-based approaches, the need for extensive hyperparameter tuning, and the difficulty of deploying high-parameter models in resource-constrained environments. Additionally, while theoretical justifications were provided for the custom activation and loss functions, an empirical ablation study comparing them against standard alternatives was not conducted due to computational constraints. Addressing these limitations is essential for future advancements in the field.
To further refine and extend this research, several directions are proposed for future work. First, the generalizability of the model should be assessed on datasets collected from multiple greenhouse environments with varying climatic conditions, sensor configurations, and crop types. This would provide a more comprehensive evaluation of model robustness and adaptability. Domain adaptation techniques and transfer learning strategies could also be explored to reduce the need for retraining when applying the model to different agricultural settings. Second, computational efficiency remains a critical consideration for real-time deployment. Future research should investigate techniques such as model quantization, pruning, and knowledge distillation to reduce the hybrid model’s computational footprint without compromising predictive accuracy. Deploying lightweight versions of the model on edge computing devices, such as embedded systems or microcontrollers, could enable real-time inference without relying on cloud-based infrastructure. Third, explainability and interpretability of deep learning models in greenhouse control systems require further investigation. Techniques such as SHAP (SHapley Additive exPlanations), attention mechanisms, and integrated gradients could be employed to provide insights into how sensor data influence the model’s predictions, ensuring transparency and trust in automated decision-making. Enhancing model interpretability would also facilitate its adoption in agricultural management systems where explainability is essential for user acceptance. Fourth, a dedicated ablation study should be conducted to empirically validate the contribution of the custom activation function and loss function. This analysis would involve training the model with standard activation functions (e.g., ReLU, Leaky ReLU) and loss functions (e.g., standalone BCE or MSE) to quantify the performance improvements introduced by the proposed modifications. Additionally, alternative activation functions with adaptive properties could be explored to further optimize gradient flow in deep architectures. Finally, real-time adaptability remains an open challenge in smart greenhouse automation. Future research could investigate the integration of reinforcement learning-based control strategies that dynamically adjust actuator behavior based on continuous feedback from sensor data. Hybrid AI systems combining deep learning with rule-based control mechanisms could offer a balance between predictive power and operational reliability, ensuring optimal climate regulation under diverse environmental conditions.