1. Introduction
The rapid advancement of smart agricultural systems has revolutionized how environmental conditions are monitored and controlled, optimizing crop growth and yield [1,2,3]. Among these innovations, smart greenhouses have emerged as a pivotal component of precision agriculture, leveraging advanced technologies to create controlled environments for sustainable food production [4,5,6]. Within this context, fan actuators play a critical role in regulating airflow to maintain optimal temperature and humidity levels, ensuring a stable microclimate for plant growth [7]. Predicting the Fan Actuator Activation State is not merely a technical challenge but also a necessity for efficient and sustainable greenhouse management [8]. Efficient fan control affects multiple facets of greenhouse operations, including energy efficiency, crop health, and environmental sustainability [9]. Unnecessary fan operation increases energy consumption, raising operational costs and carbon footprints [10]. Conversely, failure to activate fans when required can result in adverse conditions such as overheating or excessive humidity, which can damage crops, promote pest infestations, or reduce yields [11]. Therefore, developing an intelligent system capable of accurately predicting fan activation is essential for achieving energy-efficient and climate-resilient agricultural practices.
Despite the increasing adoption of IoT-enabled greenhouses, many current systems rely on rudimentary rule-based algorithms or manual interventions, which fail to adapt dynamically to environmental changes [12,13]. Traditional threshold-based methods often overlook the complex non-linear interactions among environmental parameters, leading to suboptimal control decisions [14]. While machine learning (ML) and deep learning (DL) techniques offer promising alternatives by capturing intricate dependencies, existing models exhibit notable limitations in handling spatiotemporal data effectively [15]. Numerous studies have explored ML and DL models in agricultural applications. For instance, decision trees have been used to predict irrigation needs, achieving moderate improvements over static methods [16]. Similarly, Convolutional Neural Networks (CNNs) have been applied to pest detection in image data, demonstrating their robustness in handling unstructured inputs [17]. However, the use of these models for real-time actuator control, particularly of fan actuators in smart greenhouses, remains underexplored. Furthermore, most studies focus on single environmental parameters, neglecting the synergistic effects of multiple variables such as temperature, humidity, and soil nutrients [18,19,20].
A fundamental challenge in smart greenhouse management is class imbalance in actuator state data, as fan activations occur less frequently than non-activations. Many machine learning models struggle to generalize under such imbalanced distributions, resulting in biased predictions that compromise energy efficiency and crop health [21,22,23]. Deep learning-based models improve predictive accuracy but are typically computationally expensive, posing challenges for real-time deployment in resource-constrained environments [24]. Another key limitation of previous work is the reliance on fixed rule-based fan control mechanisms, which cannot adapt dynamically to varying environmental conditions [25,26,27]. Many greenhouse systems employ simplistic ON/OFF heuristics, leading to either excessive fan usage or delayed activation, both of which harm energy efficiency and plant health. These challenges highlight the need for an advanced predictive model that seamlessly integrates spatial and temporal features while addressing data imbalance and computational efficiency.
This research addresses these limitations by developing a hybrid CNN-LSTM model for predicting the Fan Actuator Activation State in smart greenhouses. The CNN component extracts spatial dependencies among sensor readings, capturing intricate patterns in temperature and humidity distributions, while the LSTM component models temporal variations, ensuring that actuator state predictions account for time-dependent fluctuations in greenhouse conditions. By combining these two architectures, the hybrid model leverages both spatial and temporal dependencies, significantly improving predictive accuracy compared to standalone CNN or LSTM models. A notable contribution in this field is the dataset presented in [28], which provides a rich collection of IoT sensor data from a fully operational smart greenhouse and enables the development of precise predictive models tailored to real-world environments. To address the class imbalance in this dataset, this study applies the Synthetic Minority Oversampling Technique (SMOTE), ensuring that the model does not develop a bias toward the dominant actuator state. Additionally, a custom activation function and a custom loss function are introduced to enhance the model's learning efficiency and improve generalization across varying environmental conditions. These modifications enable the model to minimize errors on rare activation instances, ensuring robust decision-making for energy-efficient fan control.
To further enhance model performance, hyperparameter tuning is conducted using Keras Tuner, refining model parameters for maximum accuracy. This study also integrates K-fold cross-validation to provide a robust evaluation framework, ensuring reliable results across different data partitions. The novel contributions of this study include the development of a hybrid deep learning model that combines CNNs and LSTMs, specifically designed to capture both spatial and temporal dependencies in smart greenhouse data. The introduction of a custom loss function, which optimizes a combination of mean squared error and binary cross-entropy losses, further enhances prediction performance. A comparative analysis is conducted, benchmarking the hybrid model against traditional machine learning approaches such as Random Forest and Gradient Boosting, as well as standalone deep learning architectures such as CNNs and LSTMs. The results demonstrate the superior predictive capabilities of the proposed model, highlighting its potential impact on smart greenhouse management. The remainder of this article is organized as follows. The next section describes the dataset and preprocessing techniques, including imputation, scaling, and SMOTE-based augmentation. The Methodology section outlines the model architectures, training processes, and evaluation metrics. The Results section presents the performance comparison of various models and highlights the advantages of the proposed hybrid approach. The Discussion section elaborates on the implications of the findings, addressing challenges, limitations, and future directions. Finally, this study concludes by summarizing the contributions and their significance in advancing sustainable smart greenhouse management.
2. Dataset Description and Preprocessing Techniques
The dataset used in this research originated from a master's thesis by Mohammed Ismail Lifta (2023–2024) at Tikrit University, Iraq [28]. It is a comprehensive collection of real-time environmental and actuator data from a smart greenhouse equipped with advanced IoT sensors, containing 37,922 rows and 13 columns, each row corresponding to a recorded instance of environmental measurements and actuator states. This dataset forms the foundation for developing predictive models to optimize greenhouse operations, specifically for controlling the fan actuator. The dataset comprises temporal, environmental, and actuator-related variables. The date column records the timestamp of each measurement in datetime64 format, indicating when the data were captured. Environmental conditions such as temperature (in degrees Celsius), humidity (percentage of environmental humidity), and water_level (percentage of water level) are recorded as integer values. Soil nutrient levels, represented by N, P, and K (nitrogen, phosphorus, and potassium, respectively), are scaled to a common range, ensuring consistency in their representation. The actuator states are captured as binary indicators: Fan_actuator_ON and Fan_actuator_OFF denote the operational status of the fan actuator, while similar pairs describe the states of the watering plant pump and water pump actuators.
To prepare the dataset for machine learning and deep learning models, extensive preprocessing was performed. The primary steps involved handling temporal data, scaling numerical features, addressing class imbalance in the target variable, and ensuring the integrity of the dataset through imputation. The date column, initially in datetime64 format, was processed to extract two additional features, hour and minute, which capture the temporal variations in the data and reflect periodic changes in environmental conditions. The transformation is mathematically expressed as (1):

$$\text{hour} = \left\lfloor \frac{t}{3600} \right\rfloor \bmod 24, \qquad \text{minute} = \left\lfloor \frac{t}{60} \right\rfloor \bmod 60, \quad (1)$$

where t represents the timestamp in seconds since the epoch. After extracting these features, the original date column was dropped to simplify the feature set. Numerical features, including temperature, humidity, water_level, N, P, and K, were standardized using Z-score normalization to ensure a uniform scale across features. This process is defined as (2):

$$Z = \frac{X - \mu}{\sigma}, \quad (2)$$

where X represents the original feature value, $\mu$ is the mean, and $\sigma$ is the standard deviation. This transformation centers the data around zero with unit variance, which is crucial for effective training of machine learning models. The target variable Fan_actuator_ON exhibited class imbalance, which can adversely affect model performance. To address this, the Synthetic Minority Oversampling Technique (SMOTE) was applied. SMOTE generates synthetic samples for the minority class using the k-nearest neighbors algorithm. For two nearest neighbors $x_i$ and $x_j$, a synthetic sample $x_{\text{new}}$ is generated as (3):

$$x_{\text{new}} = x_i + \lambda \, (x_j - x_i), \qquad \lambda \sim U(0, 1). \quad (3)$$

This technique ensures a balanced class distribution, enhancing the model's ability to generalize across both classes. Missing values in the dataset were minimal and occurred primarily in the date column; these were imputed using linear interpolation, leveraging the temporal order of the data. Any remaining missing values in numerical features were replaced by the feature's mean $\mu$, and categorical features were imputed using the mode. The final dataset, after preprocessing, included the standardized environmental variables, the temporal features hour and minute, and the actuator states excluding Fan_actuator_ON. This preprocessing pipeline ensured that the data were balanced, scaled, and free of inconsistencies, making them suitable for predictive modeling. The target variable Fan_actuator_ON serves as the binary classification objective, representing whether the fan actuator is active (1) or inactive (0). The prepared dataset provides a robust basis for developing machine learning models aimed at enhancing the efficiency and sustainability of smart greenhouse operations.
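For illustration, the preprocessing steps above translate into a short pipeline. The following sketch is not the authors' published code; it assumes a pandas DataFrame df with the columns described earlier, and uses scikit-learn and imbalanced-learn for scaling and SMOTE (in practice SMOTE should be fitted on training data only, as done in Section 4).

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE

def preprocess(df: pd.DataFrame):
    df = df.copy()
    # Temporal features: extract hour and minute, then drop the raw timestamp (Eq. (1)).
    # (The study imputed the few missing timestamps by linear interpolation
    # over the temporal order; that step is omitted here for brevity.)
    df["date"] = pd.to_datetime(df["date"])
    df["hour"] = df["date"].dt.hour
    df["minute"] = df["date"].dt.minute
    df = df.drop(columns=["date"])

    # Mean-impute any remaining missing numeric values.
    numeric = ["temperature", "humidity", "water_level", "N", "P", "K"]
    df[numeric] = df[numeric].fillna(df[numeric].mean())

    # Z-score standardization of the environmental features (Eq. (2)).
    df[numeric] = StandardScaler().fit_transform(df[numeric])

    # Separate the binary target from the predictors; Fan_actuator_OFF is
    # dropped here as well since it is the exact complement of the target.
    y = df.pop("Fan_actuator_ON")
    X = df.drop(columns=["Fan_actuator_OFF"])

    # SMOTE oversampling of the minority class (Eq. (3)).
    X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
    return X_res, y_res
```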
3. Hybrid CNN-LSTM Architecture with Custom Activation Function
The hybrid CNN-LSTM model combines the strengths of Convolutional Neural Networks (CNNs) for spatial feature extraction and Long Short-Term Memory (LSTM) networks for capturing temporal dependencies. This architecture is specifically designed to predict the fan activation state in a smart greenhouse. Advanced techniques such as a custom loss function and a custom activation function further enhance its performance. This section provides a detailed mathematical and conceptual overview of the architecture.
3.1. Hybrid CNN-LSTM Model Flowchart
Figure 1 illustrates the architectural flow of the proposed hybrid CNN-LSTM model designed for predicting the activation state of the fan actuator in a smart greenhouse. This model processes sensor data sequentially through multiple layers, each serving a specific role in extracting spatial and temporal features. The process begins with the input layer, which receives raw sensor readings such as temperature, humidity, soil moisture, and CO2 levels. These inputs undergo preprocessing steps, including normalization and imputation, to ensure data consistency before being passed into the neural network. The first major processing component of the model is the Conv1D layer, which extracts localized spatial patterns from the sensor data. By applying convolutional filters, this layer captures meaningful correlations among environmental variables, enhancing predictive accuracy for actuator behavior. Next, an activation layer applies a custom activation function that integrates the properties of both the tanh and sigmoid functions, introducing non-linearity into the network. To further refine the extracted features, a max-pooling operation reduces the spatial dimensions while retaining the most significant information. This operation improves computational efficiency by downsampling the feature maps, mitigating overfitting, and ensuring that only the most relevant features are carried forward. The resulting feature maps are then passed through a flattening layer, which converts them into a one-dimensional vector in preparation for sequential processing by the LSTM component.
The bidirectional LSTM layer forms the core of the temporal modeling process. Unlike conventional LSTM architectures, which process time-series data in a single direction, the bidirectional nature of this layer enables the model to learn from both past and future time steps within the sensor data sequence. This capability enhances the model’s ability to recognize recurring patterns and trends, leading to improved predictive performance. The LSTM network maintains a cell state and a hidden state, which are iteratively updated using gating mechanisms such as the input gate, forget gate, and output gate. These mechanisms regulate the flow of information within the LSTM cell, ensuring that relevant dependencies are preserved while redundant or less important details are discarded. The updated cell state is then modulated using the custom activation function to introduce additional non-linearity, further enhancing the model’s capacity for learning complex sequential dependencies. Once the spatiotemporal features have been fully processed, they are passed to a fully connected dense layer, which serves as the classification stage of the model. The dense layer applies weighted transformations to the learned feature representations, refining them to improve prediction accuracy. Finally, the output layer produces a binary classification decision, determining whether the fan actuator should be turned on. This final output is generated using a sigmoid activation function, which produces a probability score between zero and one, representing the likelihood of fan activation. By applying an appropriate threshold to this probability score, the model makes a definitive binary decision about the actuator state.
The proposed hybrid CNN-LSTM model effectively combines convolutional and recurrent learning techniques to improve fan actuator state prediction. The CNN component ensures robust spatial feature extraction, while the LSTM component captures temporal dependencies within the greenhouse sensor data. The use of a custom activation function enhances the model’s non-linearity, improving its ability to model complex relationships in environmental conditions. Additionally, max-pooling and flattening operations ensure computational efficiency by reducing the model’s dimensional complexity, and the bidirectional nature of the LSTM layer allows for better sequence modeling by capturing both forward and backward dependencies in time-series data. By integrating these components, the hybrid CNN-LSTM model provides an advanced predictive framework tailored for smart greenhouse applications. Its ability to incorporate spatial and temporal features enables it to make highly accurate predictions, leading to more efficient energy usage, improved climate control, and enhanced crop health. The structured processing of data, from the input layer through the convolutional and recurrent layers to the final classification output, ensures that the model effectively leverages patterns in sensor readings to inform real-time actuator control decisions.
Figure 1 provides an intuitive visual representation of this process, illustrating how raw sensor data are transformed step by step into an informed actuator decision that optimizes greenhouse conditions.
3.2. Convolutional Layers for Spatial Feature Extraction
The first stage of the hybrid architecture involves convolutional layers that process the input feature tensor $X \in \mathbb{R}^{n \times d \times 1}$, where n is the number of samples, d is the number of features, and the last dimension represents the channel. These layers apply learnable filters W to extract spatial patterns. The convolution operation is computed as (4):

$$z_i = \sum_{j=1}^{k} W_j \, x_{i+j-1} + b, \quad (4)$$

where k is the kernel size, W represents the filter weights, and b is the bias term. This operation slides the filter across the input tensor, capturing local dependencies between features. The output of the convolution is passed through a non-linear activation function. In this architecture, a custom activation function replaces the standard ReLU to introduce additional flexibility in learning complex relationships, as (5):

$$a_i = f_{\text{custom}}(z_i) = \tanh(z_i) \cdot \sigma(z_i), \quad (5)$$

where the custom activation is the combination of the tanh and sigmoid activation functions, as presented in (6) and (7):

$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, \quad (6)$$

$$\sigma(z) = \frac{1}{1 + e^{-z}}. \quad (7)$$

This custom activation function combines the properties of tanh, which allows outputs in the range $(-1, 1)$, and sigmoid, which compresses values into $(0, 1)$. The result is a smooth, non-linear function that enhances the model's ability to learn subtle variations in the input. Following the convolution and activation, max-pooling is applied to reduce the spatial dimensions of the feature maps as (8):

$$y_i = \max_{0 \le m < p} a_{ip + m}, \quad (8)$$

where p is the pooling size. This step focuses on the most salient features, reducing computational complexity while maintaining essential information.
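A minimal TensorFlow realization of Equations (4)–(8) could look as follows; the filter count, kernel size, and pooling size are illustrative choices, not values reported in the paper.

```python
import tensorflow as tf

def custom_activation(z):
    # f(z) = tanh(z) * sigmoid(z), Eqs. (5)-(7)
    return tf.math.tanh(z) * tf.math.sigmoid(z)

# Convolutional front-end: Conv1D applies Eq. (4), the custom activation
# adds non-linearity (Eq. (5)), and max-pooling implements Eq. (8).
conv_block = tf.keras.Sequential([
    tf.keras.layers.Conv1D(filters=64, kernel_size=3, padding="same",
                           activation=custom_activation),
    tf.keras.layers.MaxPooling1D(pool_size=2),
])
```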
3.3. LSTM Layers for Temporal Dependency Modeling
The feature maps from the CNN layers are flattened and passed to an LSTM layer to model temporal dependencies. At each time step t, the LSTM cell maintains a cell state $c_t$ and a hidden state $h_t$, updated through gating mechanisms. The input gate, forget gate, and output gate are defined as (9)–(11):

$$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i), \quad (9)$$

$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f), \quad (10)$$

$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o), \quad (11)$$

where $\sigma$ is the sigmoid activation function, and $W_i$, $W_f$, $W_o$ are trainable weight matrices. The cell state and hidden state are updated as (12) and (13):

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c [h_{t-1}, x_t] + b_c), \quad (12)$$

$$h_t = o_t \odot f_{\text{custom}}(c_t). \quad (13)$$

Here, the custom activation function modulates the cell state output, enhancing the network's ability to model complex temporal patterns.
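In Keras, the LSTM layer's activation argument is the function applied when forming the hidden state from the cell state (Eq. (13)), while recurrent_activation is the gate non-linearity of Eqs. (9)–(11). Under that mapping, the custom modulation can be sketched as below (an approximation: Keras also applies the same function to the candidate state, whereas Eq. (12) keeps tanh for the candidate). It reuses custom_activation from the previous sketch.

```python
from tensorflow.keras.layers import LSTM, Bidirectional

# Bidirectional LSTM whose cell-state output is modulated by the
# custom activation function.
temporal = Bidirectional(LSTM(units=64,
                              activation=custom_activation,    # Eq. (13)
                              recurrent_activation="sigmoid",  # Eqs. (9)-(11)
                              return_sequences=False))
```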
3.4. Dense Output Layer and Prediction
The output of the LSTM layer, $h_T$, is passed to a dense layer for binary classification. The dense layer computes the final prediction as (14):

$$\hat{y} = f_{\text{custom}}(W_d \, h_T + b_d), \quad (14)$$

where $W_d$ and $b_d$ are the weights and biases of the dense layer. By using the custom activation function in the output layer, the model provides well-calibrated probabilities while maintaining sensitivity to subtle variations in the input features.
3.5. Custom Loss Function
The training process minimizes a custom loss function that combines binary cross-entropy (BCE) with mean squared error (MSE). The loss function is defined as (15):

$$\mathcal{L} = \mathcal{L}_{\text{BCE}} + \lambda \, \mathcal{L}_{\text{MSE}}. \quad (15)$$

The binary cross-entropy term penalizes incorrect predictions, as presented in (16):

$$\mathcal{L}_{\text{BCE}} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]. \quad (16)$$

The mean squared error term encourages the predicted probabilities to be closer to the true labels, as presented in (17):

$$\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2. \quad (17)$$

The regularization parameter $\lambda$ balances the two components, ensuring both accurate classification and probabilistic calibration.
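For concreteness, Equation (15) can be written as a Keras-compatible loss in a few lines. This sketch is illustrative; the default value of λ is a placeholder, as the paper does not report the weighting used.

```python
import tensorflow as tf

def custom_loss(lam: float = 0.5):
    """BCE + lam * MSE, Eqs. (15)-(17). The value of lam is a placeholder."""
    bce = tf.keras.losses.BinaryCrossentropy()
    mse = tf.keras.losses.MeanSquaredError()
    def loss(y_true, y_pred):
        # Weighted sum of the classification and calibration terms.
        return bce(y_true, y_pred) + lam * mse(y_true, y_pred)
    return loss
```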
3.5.1. Motivation for Combining Tanh and Sigmoid
The selection of an appropriate activation function is a critical component in deep learning architectures, as it directly influences gradient propagation, convergence behavior, and the ability to model complex relationships within data. In this study, a novel activation function is proposed, defined as the product of the hyperbolic tangent (tanh) and the sigmoid function. The motivation behind this combination stems from the complementary properties of the two functions. The tanh function produces outputs in the range $(-1, 1)$ and is zero-centered, which is advantageous for stabilizing weight updates and preventing imbalanced activations during training. However, tanh suffers from the vanishing gradient problem for large values of $|z|$, where the gradient approaches zero, leading to slow convergence and ineffective learning in deep networks. Conversely, the sigmoid function, which maps inputs to the range $(0, 1)$, is widely used in probabilistic modeling but is not zero-centered, which can result in biased gradient updates and slower convergence. Additionally, sigmoid suffers from saturation issues, where gradients become negligible for very large positive or negative values of z.
By combining the two functions multiplicatively, the proposed activation function inherits desirable characteristics from both. The presence of tanh keeps the activation zero-centered, which helps maintain balanced gradient updates across layers, while the incorporation of sigmoid introduces probabilistic properties, making the function particularly suitable for classification tasks. The output range of the proposed activation function is effectively constrained within $(-1, 1)$, which limits extreme activations and reduces the risk of exploding gradients, thereby improving numerical stability during training. Furthermore, this formulation mitigates the vanishing gradient issue present in both tanh and sigmoid alone, as the derivative of the custom activation function retains non-zero values across a broader range of inputs than either standard activation function.
3.5.2. Theoretical Properties and Mathematical Justification
To further analyze the properties of the proposed activation function, its derivative is computed via the product rule as (18):

$$f'_{\text{custom}}(z) = \left(1 - \tanh^2(z)\right)\sigma(z) + \tanh(z)\,\sigma(z)\left(1 - \sigma(z)\right). \quad (18)$$

This derivative reveals several important characteristics. First, when z is near zero, the gradient remains moderate, preventing gradient explosion and ensuring stable weight updates. Second, when z is large, the output remains bounded due to the natural saturation properties of sigmoid and tanh, which prevents uncontrolled activations. Unlike the standard tanh or sigmoid functions alone, which individually suffer from gradient saturation, the proposed function maintains a non-zero gradient across a wider range, allowing for more effective backpropagation. Additionally, when z is negative, the function continues to produce meaningful gradients, unlike the rectified linear unit (ReLU), which clips negative inputs to zero, effectively preventing any updates to the corresponding neurons.
These mathematical properties make the proposed activation function particularly advantageous in deep networks that require stable and efficient gradient propagation. In architectures with recurrent components, such as the hybrid CNN-LSTM model utilized in this study, maintaining effective gradient flow is essential to capturing long-range dependencies in sequential data. The combination of the smoothness of sigmoid and the zero-centered nature of tanh makes the proposed function well suited for handling complex spatiotemporal relationships, as required for fan actuator state prediction in smart greenhouse environments.
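The derivative in Equation (18) follows directly from the product rule and can be verified numerically; the sketch below compares the analytic expression against a central finite-difference estimate.

```python
import numpy as np

def f(z):
    # f(z) = tanh(z) * sigmoid(z)
    return np.tanh(z) / (1.0 + np.exp(-z))

def f_prime(z):
    # Eq. (18): (1 - tanh^2 z) * sigma(z) + tanh(z) * sigma(z) * (1 - sigma(z))
    s = 1.0 / (1.0 + np.exp(-z))
    t = np.tanh(z)
    return (1.0 - t**2) * s + t * s * (1.0 - s)

z = np.linspace(-6, 6, 1001)
numeric = np.gradient(f(z), z)  # finite-difference approximation
assert np.allclose(f_prime(z), numeric, atol=1e-3)
```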
3.6. Training and Optimization
The model is trained using the Adam optimizer, which adjusts the learning rate for each parameter based on the gradients and their second moments. The update rule is given by (19):

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t} + \epsilon} \, g_t, \quad (19)$$

where $\eta$ is the learning rate, $g_t$ is the gradient of the loss function, $v_t$ is the exponentially decaying average of squared gradients, and $\epsilon$ is a small constant for numerical stability. The hybrid CNN-LSTM architecture integrates convolutional layers for spatial feature extraction, LSTM layers for temporal dependency modeling, and a dense output layer for prediction. The custom activation function enhances the model's ability to learn complex relationships, while the custom loss function balances accurate classification and probabilistic calibration. This combination makes the architecture well suited to the dynamic and non-linear nature of smart greenhouse data, enabling robust and reliable predictions of the Fan_actuator_ON state.
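Putting the pieces together, the end-to-end architecture of this section can be sketched as follows. Layer widths are assumptions, not reported values; the pooled feature maps are fed directly to the bidirectional LSTM (Keras preserves the (steps, channels) layout, standing in for the flatten-and-reshape step described above), and the model is compiled with Adam and the custom loss of Eq. (15). It reuses custom_activation and custom_loss from the earlier sketches.

```python
import tensorflow as tf

def build_model(n_features: int, lam: float = 0.5) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features, 1)),
        # Spatial feature extraction, Eqs. (4)-(8).
        tf.keras.layers.Conv1D(64, 3, padding="same",
                               activation=custom_activation),
        tf.keras.layers.MaxPooling1D(2),
        # Temporal dependency modeling, Eqs. (9)-(13).
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(64, activation=custom_activation)),
        tf.keras.layers.Dense(32, activation=custom_activation),
        # Probability of fan activation.
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # Eq. (19)
                  loss=custom_loss(lam),                                   # Eq. (15)
                  metrics=["accuracy", tf.keras.metrics.Precision(),
                           tf.keras.metrics.Recall()])
    return model
```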
4. Experiment Setup
The experimental setup for evaluating the hybrid CNN-LSTM architecture was designed to ensure robustness and reproducibility. The experiments were implemented in Python (version 3.13.3), utilizing TensorFlow as the primary framework. All computations were performed on an NVIDIA GPU with CUDA support, enabling efficient training of the deep learning models. The dataset, comprising 37,922 samples with the features described in Section 2, was split into training and testing sets in an 80:20 ratio. The training set was further divided into ten folds for cross-validation, a technique chosen to ensure the generalizability of the results. To address the inherent class imbalance in the target variable, the Synthetic Minority Oversampling Technique (SMOTE) was applied to the training data. SMOTE creates synthetic samples for the minority class by interpolating between existing samples, as defined in (3). Furthermore, the training process minimized the custom loss function defined in Section 3.5, which combines binary cross-entropy (BCE) and mean squared error (MSE), as expressed in (15). The optimization of the model parameters was performed using the Adam optimizer, with the parameter update rule for each weight $\theta$ at iteration t given by (19). The performance of the hybrid CNN-LSTM model was evaluated using a comprehensive set of metrics. Accuracy measures the proportion of correctly classified samples and is defined as (20):

$$\text{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}(\hat{y}_i = y_i), \quad (20)$$

where $\mathbb{1}(\cdot)$ is the indicator function, returning 1 if its argument is true and 0 otherwise. Precision quantifies the proportion of true positive predictions among all positive predictions, given by (21):

$$\text{Precision} = \frac{TP}{TP + FP}, \quad (21)$$

where TP and FP denote the true positives and false positives, respectively. Recall evaluates the model's ability to identify all actual positive cases and is computed as (22):

$$\text{Recall} = \frac{TP}{TP + FN}, \quad (22)$$

where FN represents false negatives. The F1 score, the harmonic mean of precision and recall, balances these two metrics and is defined as (23):

$$F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}. \quad (23)$$
To ensure robust and unbiased evaluation, a ten-fold cross-validation approach was adopted. In this procedure, the dataset was divided into ten equal subsets; for each fold k, the model was trained on nine subsets and validated on the remaining one. The average loss across all folds was computed as (24):

$$\bar{\mathcal{L}} = \frac{1}{K} \sum_{k=1}^{K} \mathcal{L}_k(\theta_k), \quad (24)$$

where K is the number of folds, $\mathcal{L}_k$ is the loss for the k-th fold, and $\theta_k$ are the model parameters learned during the k-th training iteration.
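A sketch of the ten-fold procedure is given below, reusing build_model from Section 3. SMOTE is fitted inside each training fold so that no synthetic samples leak into validation data (a standard precaution consistent with applying SMOTE to training data only); the epoch count and batch size are assumptions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from imblearn.over_sampling import SMOTE

def cross_validate(X: np.ndarray, y: np.ndarray, k: int = 10) -> float:
    fold_losses = []
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=42)
    for train_idx, val_idx in skf.split(X, y):
        # Oversample the minority class within the training fold only (Eq. (3)).
        X_tr, y_tr = SMOTE(random_state=42).fit_resample(X[train_idx], y[train_idx])
        model = build_model(n_features=X.shape[1])
        model.fit(X_tr[..., None], y_tr, epochs=10, batch_size=64, verbose=0)
        loss, *_ = model.evaluate(X[val_idx][..., None], y[val_idx], verbose=0)
        fold_losses.append(loss)
    return float(np.mean(fold_losses))  # average loss across folds, Eq. (24)
```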
To validate the effectiveness of the hybrid CNN-LSTM architecture, its performance was compared against baseline models, including Random Forest (RF), Gradient Boosting (GB), standalone CNN, and standalone LSTM. All models were trained and evaluated under identical conditions, using the same dataset splits, cross-validation framework, and evaluation metrics. The computational complexity of the hybrid CNN-LSTM model was analyzed in terms of the total number of trainable parameters, training duration, and runtime performance. The model's trainable parameters were distributed across convolutional filters, LSTM cells, and dense layers. The training process for a single fold of the ten-fold cross-validation took approximately 18 min on an NVIDIA RTX 3090 GPU, resulting in a total training duration of approximately 3 h. The average inference time per sample was on the order of milliseconds, demonstrating the feasibility of real-time predictions in deployment scenarios. The computational cost of the hybrid CNN-LSTM model was also compared against the baseline models. Traditional machine learning models, such as Random Forest and Gradient Boosting, had significantly fewer parameters and lower computational demands but exhibited lower predictive accuracy. Conversely, standalone deep learning models, including the CNN and LSTM architectures, required similar training time but failed to achieve the same level of predictive performance as the hybrid approach. These results indicate that while the hybrid CNN-LSTM model is computationally more expensive than traditional ML models, it provides superior accuracy and generalization capabilities. The experimental code was implemented modularly to facilitate reproducibility and scalability. Preprocessing steps, such as SMOTE application, feature scaling, and data splitting, were encapsulated in reusable functions, and the model architecture, training pipeline, and evaluation metrics were implemented to allow seamless experimentation with different configurations. During training, real-time monitoring of loss and metrics was achieved through visualization tools, ensuring transparency and early detection of overfitting. Additionally, hyperparameter tuning was conducted to optimize the architectural design and learning parameters, ensuring that the final model achieved optimal performance under the given computational constraints.
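The hyperparameter search mentioned above can be run with Keras Tuner along the following lines; the search space and trial budget are illustrative assumptions, not the configuration reported by the authors, and custom_activation and custom_loss are reused from the earlier sketches.

```python
import keras_tuner as kt
import tensorflow as tf

N_FEATURES = 14  # placeholder; set to the actual number of input features

def build_tunable(hp: kt.HyperParameters) -> tf.keras.Model:
    # Hypothetical search space over filters, LSTM units, and learning rate.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(N_FEATURES, 1)),
        tf.keras.layers.Conv1D(hp.Int("filters", 32, 128, step=32), 3,
                               padding="same", activation=custom_activation),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(hp.Int("lstm_units", 32, 128, step=32),
                                 activation=custom_activation)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(
                      hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])),
                  loss=custom_loss(), metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(build_tunable, objective="val_accuracy", max_trials=10)
# tuner.search(X_train, y_train, validation_split=0.2, epochs=10)
```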
5. Results and Discussion
This section presents the results of the experiments conducted to evaluate the proposed hybrid CNN-LSTM model and its comparisons with various baseline models, including both traditional machine learning and deep learning methods. The discussion elaborates on the performance of these models using the evaluation metrics of accuracy, precision, recall, F1 score, training time, and model size, as presented in Table 1.
5.1. Performance of the Proposed Hybrid CNN-LSTM Model
The proposed hybrid CNN-LSTM model achieved remarkable performance, demonstrating its ability to effectively capture both spatial and temporal dependencies in the smart greenhouse data. The model attained an accuracy of 0.9992, precision of 0.9989, recall of 0.9996, and an F1 score of 0.9992. These results indicate near-perfect predictions, with minimal false positives and false negatives. The high recall value of 0.9996 reflects the model’s exceptional ability to correctly identify positive instances of fan actuator state. This is particularly significant in the context of smart greenhouse management, where missing a true positive could result in suboptimal environmental control. The precision of 0.9989 further ensures that almost all predicted positives are indeed correct, minimizing unnecessary activations of the fan actuator. The harmonic mean of precision and recall, as represented by the F1 score, confirms the balance achieved by the model across these metrics.
5.2. Comparison with Traditional Machine Learning Models
Among the traditional machine learning models, XGBoost achieved the highest performance, with an accuracy of 0.9447, precision of 0.9713, recall of 0.9165, and an F1 score of 0.9431. Random Forest followed closely with an accuracy of 0.9429, precision of 0.9744, recall of 0.9098, and an F1 score of 0.9410. These results highlight the effectiveness of ensemble-based methods in capturing the non-linear relationships in the data. Gradient Boosting exhibited comparable performance to Random Forest, with an accuracy of 0.9384, precision of 0.9740, recall of 0.9007, and an F1 score of 0.9360. However, its slightly lower recall indicates a minor reduction in its ability to identify true positives compared to XGBoost and Random Forest.
Logistic Regression and SVM, while still achieving reasonable performance, were outperformed by the ensemble-based models. Logistic Regression attained an accuracy of 0.9097, precision of 0.9140, recall of 0.9045, and an F1 score of 0.9092. Similarly, SVM achieved an accuracy of 0.9122, precision of 0.9194, recall of 0.9037, and an F1 score of 0.9115. These results suggest that simpler linear and kernel-based methods are less effective at capturing the complex patterns in the data compared to ensemble methods and deep learning models. Interestingly, the Stacking Classifier performed poorly in comparison, with an accuracy of 0.7849, precision of 0.7323, recall of 0.8981, and an F1 score of 0.8068. This underperformance may indicate suboptimal integration of the base models or insufficient diversity among the ensemble’s components.
5.3. Comparison with Deep Learning Models
The standalone deep learning models, including Multilayer Perceptron (MLP), CNN, and LSTM, also exhibited strong performance, albeit slightly lower than the proposed hybrid CNN-LSTM model. The MLP achieved an accuracy of 0.9363, precision of 0.9721, recall of 0.8983, and an F1 score of 0.9337. The CNN model achieved comparable results, with an accuracy of 0.9334, precision of 0.9662, recall of 0.8983, and an F1 score of 0.9310. The LSTM model similarly performed well, with an accuracy of 0.9345, precision of 0.9708, recall of 0.8960, and an F1 score of 0.9319.
The hybrid CNN-LSTM model outperformed the standalone CNN and LSTM models, demonstrating the effectiveness of combining spatial feature extraction from CNNs with temporal dependency modeling from LSTMs. The integration of these two paradigms allowed the hybrid model to capture both local and sequential patterns in the data, leading to its superior performance.
5.4. Computational Efficiency and Training Time Analysis
The experimental results provide a comprehensive evaluation of the proposed hybrid CNN-LSTM model in comparison with traditional machine learning models and standalone deep learning architectures. A key aspect of this evaluation is the training time, which significantly influences the feasibility of model deployment in real-time or resource-constrained environments. The proposed hybrid CNN-LSTM model required 37.05 s to train, which is considerably longer than traditional machine learning models such as SVM (0.716 s), Random Forest (1.11 s), and Logistic Regression (0.0157 s). This is expected due to the computational complexity involved in deep learning models, particularly those incorporating sequential dependencies like LSTMs. When compared to standalone deep learning architectures, the hybrid model required more time than CNN (34.40 s) but was notably more efficient than LSTM (76.82 s). This suggests that while LSTMs excel at capturing sequential dependencies, their high computational cost is mitigated when combined with CNNs, which provide efficient feature extraction. In addition to training time, model size is another critical factor, especially for deployment on edge devices or systems with limited memory. The hybrid CNN-LSTM model exhibited the largest size (1455.33 KB) among all models, confirming that the combination of CNN and LSTM results in a substantial increase in the number of parameters. In contrast, traditional machine learning models such as Logistic Regression (0.999 KB), SVM (310.81 KB), and XGBoost (82.28 KB) had significantly smaller footprints, making them more suitable for lightweight applications. Among deep learning models, CNN had a model size of 278.96 KB, while LSTM was slightly larger at 382.54 KB, reinforcing the notion that sequence modeling incurs additional parameter storage. The model size of the hybrid approach is considerably larger than its standalone counterparts, suggesting that while it offers improved predictive power, it may not be ideal for deployment in environments with strict memory constraints unless optimized using techniques such as pruning or quantization.
A key statistical measure in this analysis is the p-value, which determines the significance of differences in performance among the models. The proposed hybrid CNN-LSTM model achieved a p-value of 0.02202, indicating that its superior performance is statistically significant compared to baseline models. This suggests that the observed improvements in accuracy, precision, recall, and F1 score are unlikely to be due to random variations in the data. Similarly, Random Forest (0.02080), Gradient Boosting (0.02491), and Logistic Regression (0.0213) also yielded p-values below 0.05, signifying that their performance differences are statistically significant. Among all models, XGBoost had the lowest p-value (0.01082), reinforcing its reliability as a high-performing model with strong statistical validity. Conversely, deep learning models such as CNN (0.37390), LSTM (0.12456), and MLP (0.12269) had p-values greater than 0.05, suggesting that their performance differences may not be statistically significant compared to the baseline. The Stacking Classifier exhibited the highest p-value (0.2521), indicating that its performance variations were not significant in comparison to other models. From these results, several key insights emerge regarding the trade-offs between accuracy, computational efficiency, and statistical significance. The hybrid CNN-LSTM model achieved the highest accuracy (0.9992) but required substantially longer training time (37.05 s) and the largest model size (1455.33 KB). This suggests that while it provides superior classification performance, its computational demands may make it less practical for real-time applications unless optimized. Traditional machine learning models such as XGBoost, Random Forest, and Gradient Boosting offered competitive accuracy while maintaining significantly lower training time and model size, making them strong alternatives for deployment in constrained environments. Among deep learning models, CNN demonstrated a balanced trade-off between performance and efficiency, whereas LSTM incurred the highest computational cost.
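The text does not specify the statistical test behind these p-values; one conventional choice for comparing models evaluated on the same folds is a paired t-test on the per-fold scores, sketched below with hypothetical numbers for illustration only.

```python
from scipy import stats

# Hypothetical per-fold accuracies for two models on the same 10 folds
# (illustrative values, not results from the paper).
hybrid_scores   = [0.9991, 0.9993, 0.9990, 0.9994, 0.9992,
                   0.9992, 0.9991, 0.9993, 0.9992, 0.9991]
baseline_scores = [0.9440, 0.9460, 0.9430, 0.9450, 0.9440,
                   0.9450, 0.9430, 0.9460, 0.9440, 0.9450]

# Paired test: each fold yields one paired observation per model.
t_stat, p_value = stats.ttest_rel(hybrid_scores, baseline_scores)
print(f"paired t-test: t = {t_stat:.3f}, p = {p_value:.5f}")
```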
5.5. Discussion and Insights
The results clearly indicate the superiority of the proposed hybrid CNN-LSTM model over both traditional machine learning and standalone deep learning approaches. The high accuracy, precision, recall, and F1 score suggest that the hybrid model effectively generalizes the underlying patterns in the data. This performance can be attributed to the complementary strengths of CNNs and LSTMs, which allow the model to process spatial and temporal features simultaneously. Ensemble-based machine learning models, such as XGBoost and Random Forest, also performed well, highlighting their robustness in handling complex data. However, their performance was slightly lower than that of the hybrid model, suggesting that deep learning techniques are better suited for capturing the intricate relationships present in this dataset. The relatively poor performance of the Stacking Classifier may indicate issues with overfitting or the selection of base models. This result emphasizes the importance of careful design and validation of ensemble methods to ensure their effectiveness.
5.5.1. Real-World Applications and Deployment Challenges
Despite the promising results, deploying the hybrid CNN-LSTM model in real-world smart greenhouse environments presents several challenges. One of the primary concerns is the computational complexity of deep learning models. Unlike traditional machine learning approaches, which can be executed on low-power edge devices, deep learning models often require high-performance GPUs or TPUs for inference. This poses a challenge for resource-constrained greenhouse setups that may rely on embedded systems with limited processing power. A potential solution is model optimization through techniques such as quantization, pruning, and knowledge distillation, which can significantly reduce computational demands while maintaining accuracy. Another key consideration is the real-time adaptability of the model. Environmental conditions in greenhouses change dynamically, requiring predictive models that can update and adapt to new data continuously. Implementing an adaptive learning mechanism, such as online learning or periodic retraining with fresh data, can enhance model robustness. Additionally, integrating the hybrid model with IoT-enabled sensors and edge computing frameworks would allow decentralized decision-making, reducing reliance on cloud-based processing and improving response times for actuator control.
5.5.2. Possible Adaptations for Real-Time Greenhouse Monitoring
For successful deployment, the model should be incorporated into a real-time greenhouse monitoring system that seamlessly interacts with IoT sensors and actuators. This requires an efficient data pipeline that preprocesses sensor readings in real time, feeds them into the model, and triggers actuator responses based on the model’s predictions. Implementing a hierarchical decision-making approach where low-complexity rule-based systems handle routine tasks and the deep learning model intervenes in complex scenarios can further optimize energy efficiency. Furthermore, external factors such as network latency and data transmission failures must be addressed. Deploying the model on edge computing devices closer to the greenhouse environment minimizes dependency on continuous internet connectivity. Additionally, ensuring robustness against noisy or missing sensor data through advanced imputation techniques can enhance real-world applicability.
6. Implications and Limitations
The findings of this study, particularly the superior performance of the hybrid CNN-LSTM model, have several important implications for both research and practical applications in smart greenhouse management. At the same time, this study has certain limitations that highlight areas for future work.
6.1. Implications
The hybrid CNN-LSTM model demonstrated exceptional predictive capabilities, with near-perfect performance across all evaluation metrics. This suggests that such architectures are well suited for capturing both spatial and temporal dependencies in complex datasets. In the context of smart greenhouses, this translates to highly accurate and reliable control of environmental conditions, which can improve crop yields, reduce resource wastage, and enhance sustainability. One of the key implications of this study is the potential for generalization to other domains that involve spatiotemporal data. For example, similar architectures could be applied to predictive maintenance in industrial settings, where sensor data combine spatial and temporal features. Additionally, this approach could benefit healthcare applications, such as monitoring patient vitals over time, or transportation systems, where traffic patterns are inherently temporal and spatial.
The integration of a custom activation function and a custom loss function into the hybrid CNN-LSTM architecture further underscores the importance of tailoring deep learning models to the specific characteristics of the problem. These customizations not only improved the performance of the proposed model but also provided better-calibrated predictions, which are crucial for decision-making in critical systems like smart greenhouses. Moreover, this study highlights the complementary strengths of CNNs and LSTMs, showing that hybrid architectures can outperform standalone deep learning models and traditional machine learning methods. This has implications for researchers and practitioners aiming to develop state-of-the-art predictive models, as it provides a clear direction for combining multiple paradigms to enhance performance.
6.2. Limitations
Despite its strengths, this study is not without limitations. First, the dataset used for training and evaluation, while extensive, was collected from a single smart greenhouse environment. As a result, the generalizability of the findings to other greenhouses with different environmental conditions or control systems remains uncertain. The dataset may also contain inherent biases in environmental conditions, which could influence model performance in real-world applications. Future work should validate the model on datasets collected from diverse settings, incorporating variations in climate, greenhouse architecture, and crop types to establish its robustness. Additionally, methods such as domain adaptation or transfer learning could be explored to improve the model’s adaptability to new environments without requiring extensive retraining. Second, the computational cost of training the hybrid CNN-LSTM model is relatively high compared to traditional machine learning models. The reliance on GPU acceleration for efficient training may limit the applicability of this approach in resource-constrained environments. While the high accuracy achieved demonstrates the effectiveness of the model, the trade-off between complexity and real-time deployment needs further consideration. Strategies such as model quantization, pruning, or knowledge distillation could be explored to optimize the architecture for faster inference and lower resource consumption, enabling deployment on edge devices or embedded systems.
Third, while this study introduces a novel activation function that combines the hyperbolic tangent and sigmoid functions, and employs a custom loss function that integrates binary cross-entropy (BCE) with mean squared error (MSE), a dedicated numerical evaluation of these components was not performed. Although theoretical justifications have been provided to explain the expected benefits of these design choices, an empirical ablation study comparing the proposed activation and loss functions against conventional alternatives (such as ReLU, standard BCE, and standard MSE) was not conducted due to computational constraints and scope limitations. Performing such experiments would require additional extensive model training runs with multiple configurations, which was not feasible within the current study's resource allocation and timeframe. Furthermore, since the primary objective of this study was to develop an integrated deep learning framework tailored for greenhouse actuator control, the focus remained on optimizing the overall model rather than isolating individual component contributions. Future research should conduct controlled experiments comparing the hybrid activation function and custom loss function with standard approaches to quantify their exact contributions. Fourth, the interpretability of deep learning models remains a significant challenge, particularly in applications where decision transparency is crucial. While the proposed CNN-LSTM architecture achieved exceptional predictive performance, understanding its internal decision-making process is non-trivial. Deep learning models remain largely black boxes, and this study did not incorporate post hoc explainability techniques to interpret how specific environmental conditions influence predictions. Approaches such as SHAP (SHapley Additive exPlanations), integrated gradients, or attention mechanisms could be explored in future work to enhance model interpretability. Additionally, assessing model reliability under varying input distributions is essential to ensure robustness in real-world deployments.
Fifth, while the model outperformed traditional classifiers, no formal error analysis was conducted to assess potential failure cases and sensitivity to noise or adversarial inputs. In practical greenhouse implementations, sensor readings may be affected by external noise, calibration errors, or missing data. The impact of such perturbations on model stability was not rigorously evaluated in this study. Future research should examine the resilience of the proposed approach by incorporating synthetic noise or adversarial attacks into the dataset to assess robustness. Finally, the relatively poor performance of the Stacking Classifier highlights the need for a more rigorous ensemble design. The stacking approach was not extensively tuned or optimized in this study, which may have limited its effectiveness. Future work could explore advanced ensemble techniques or hybridization strategies that integrate the strengths of stacking with deep learning architectures. Additionally, a comparative analysis of ensemble learning using different feature representations may provide insights into the optimal configurations for improving predictive performance. While these limitations acknowledge the areas requiring further refinement, the proposed model remains a promising step toward developing intelligent climate control solutions for smart greenhouses. Addressing these challenges in future research could further enhance the practicality, interpretability, and generalizability of deep learning-based actuator control systems.
7. Conclusions
This study proposed a hybrid CNN-LSTM architecture designed for predicting the activation state of fan actuators in smart greenhouses, leveraging deep learning techniques to enhance predictive accuracy and operational efficiency. By integrating Convolutional Neural Networks (CNNs) for spatial feature extraction and Long Short-Term Memory (LSTM) networks for temporal dependency modeling, the hybrid model effectively captured both localized patterns and long-range dependencies within environmental sensor data. Additionally, a custom activation function combining the tanh and sigmoid functions was introduced to improve gradient propagation and model stability, while a custom loss function incorporating binary cross-entropy (BCE) and mean squared error (MSE) was developed to enhance prediction calibration. Experimental results demonstrated the superior performance of the proposed model, achieving an accuracy of 0.9992, precision of 0.9989, recall of 0.9996, and an F1 score of 0.9992, significantly outperforming traditional machine learning models such as Random Forest, Gradient Boosting, and XGBoost, as well as standalone CNN and LSTM architectures. A key advantage of the proposed hybrid CNN-LSTM model lies in its ability to optimize actuator control decisions by effectively integrating spatial and temporal dependencies in sensor readings, thereby reducing unnecessary energy consumption and improving environmental regulation within the greenhouse. However, this study also highlights several challenges, including the computational complexity of deep learning-based approaches, the need for extensive hyperparameter tuning, and the difficulty of deploying high-parameter models in resource-constrained environments. Additionally, while theoretical justifications were provided for the custom activation and loss functions, an empirical ablation study comparing them against standard alternatives was not conducted due to computational constraints. Addressing these limitations is essential for future advancements in the field.
To further refine and extend this research, several directions are proposed for future work. First, the generalizability of the model should be assessed on datasets collected from multiple greenhouse environments with varying climatic conditions, sensor configurations, and crop types. This would provide a more comprehensive evaluation of model robustness and adaptability. Domain adaptation techniques and transfer learning strategies could also be explored to reduce the need for retraining when applying the model to different agricultural settings. Second, computational efficiency remains a critical consideration for real-time deployment. Future research should investigate techniques such as model quantization, pruning, and knowledge distillation to reduce the hybrid model’s computational footprint without compromising predictive accuracy. Deploying lightweight versions of the model on edge computing devices, such as embedded systems or microcontrollers, could enable real-time inference without relying on cloud-based infrastructure. Third, explainability and interpretability of deep learning models in greenhouse control systems require further investigation. Techniques such as SHAP (SHapley Additive exPlanations), attention mechanisms, and integrated gradients could be employed to provide insights into how sensor data influence the model’s predictions, ensuring transparency and trust in automated decision-making. Enhancing model interpretability would also facilitate its adoption in agricultural management systems where explainability is essential for user acceptance. Fourth, a dedicated ablation study should be conducted to empirically validate the contribution of the custom activation function and loss function. This analysis would involve training the model with standard activation functions (e.g., ReLU, Leaky ReLU) and loss functions (e.g., standalone BCE or MSE) to quantify the performance improvements introduced by the proposed modifications. Additionally, alternative activation functions with adaptive properties could be explored to further optimize gradient flow in deep architectures. Finally, real-time adaptability remains an open challenge in smart greenhouse automation. Future research could investigate the integration of reinforcement learning-based control strategies that dynamically adjust actuator behavior based on continuous feedback from sensor data. Hybrid AI systems combining deep learning with rule-based control mechanisms could offer a balance between predictive power and operational reliability, ensuring optimal climate regulation under diverse environmental conditions.