1. Introduction
The Internet of Things (IoT) has transformed modern communication systems by enabling seamless interconnectivity among billions of devices in diverse applications, including smart cities, healthcare, industrial automation, and environmental monitoring [1,2,3]. IoT enhances automation and decision-making by leveraging real-time data exchange, improving operational efficiency and user convenience [4,5]. However, despite its growing adoption, several challenges hinder its large-scale deployment and practicality. Energy consumption remains a critical issue, as most IoT devices are battery-powered and require efficient power management to extend their operational lifespan; battery replacement is often impractical in remote or massive deployments, increasing maintenance costs [6,7,8,9]. Moreover, IoT devices are typically low-resource systems with limited computational and memory capabilities, making it difficult to process and transmit large amounts of data efficiently [10,11,12]. Connectivity is another challenge, as IoT networks must support seamless communication across varying network conditions and device densities [13]. Additionally, coverage limitations in conventional wireless architectures restrict reliable data transmission, particularly in highly distributed IoT environments [14,15,16,17]. Finally, security vulnerabilities pose significant threats, as IoT systems are prone to cyberattacks, unauthorized access, and data breaches due to their decentralized nature [18,19,20,21].
To address the aforementioned challenges, cell-free massive multiple-input multiple-output (CF m-MIMO) has been proposed as a promising solution for IoT communication networks [22,23,24]. CF m-MIMO leverages a distributed network of access points (APs) connected to a centralized processing unit, collaboratively serving IoT devices without traditional cell boundaries. By eliminating inter-cell interference and distributing network resources efficiently, CF m-MIMO enhances the quality of service (QoS) by improving spectral efficiency, coverage, and user fairness [25,26,27]. The decentralized nature of CF m-MIMO significantly enhances connectivity, ensuring robust communication even in ultra-dense IoT environments. Unlike conventional cellular architectures, CF m-MIMO dynamically adapts network resources based on device locations and communication demands, providing uniform signal quality across a wide geographical area. By leveraging coordinated beamforming and joint signal processing, CF m-MIMO optimizes energy efficiency and mitigates interference, making it an ideal candidate for scalable, energy-efficient, and secure IoT communications [28,29,30].
1.1. Related Works
The rapid growth of IoT has led to significant research in CF m-MIMO to enhance network scalability, energy efficiency (EE), and security. One of the primary challenges in CF m-MIMO IoT systems is ensuring efficient power usage while maintaining high QoS. The authors in [28] analyzed a wirelessly powered IoT system using CF m-MIMO, where IoT sensors harvest energy from distributed APs during the downlink phase and use the harvested power for uplink transmission. Their optimization strategy minimized the total transmit energy while meeting signal-to-interference-plus-noise ratio (SINR) constraints, demonstrating that CF IoT significantly outperforms collocated m-MIMO and small-cell IoT in energy efficiency. Similarly, Lee et al. [29] investigated CF m-MIMO in low-power IoT networks and proposed an energy-efficient power control scheme that significantly reduces the transmission power of IoT devices while maintaining connectivity. Their work demonstrated power savings of 90% but highlighted challenges in balancing spectral efficiency (SE) and power consumption. Yan et al. [30] extended this work by developing a scalable CF m-MIMO IoT system that utilizes optimal power control strategies and neural network-based solutions to improve EE. Their results indicated multifold EE improvements, but security considerations were not explicitly addressed.
Despite advances in EE, security remains a critical challenge in CF m-MIMO IoT due to the broadcast nature of wireless transmissions. Unauthorized eavesdropping poses a severe threat to confidential communications. Zhang et al. [31] explored a non-orthogonal multiple access (NOMA)-based CF m-MIMO IoT system and derived closed-form SE and EE expressions under pilot contamination and interference conditions. However, while optimizing power control for NOMA users, their work did not incorporate physical layer security (PLS) techniques to enhance the secrecy rate. Similarly, Rao et al. [32] examined pilot contamination in CF m-MIMO IoT, an issue exacerbated by the inability to assign orthogonal pilots to a massive number of IoT devices. They proposed an optimal linear minimum mean-square-error (LMMSE) channel estimation method to mitigate interference and improve uplink throughput, indirectly improving security but without explicitly considering PLS.
Beyond EE and security, recent studies have focused on improving network reliability and localization in CF m-MIMO IoT. Wei et al. [33] proposed a fingerprint-based channel estimation and localization framework in which location awareness is leveraged to enhance channel estimation accuracy. Their framework introduced a two-phase localization approach and a pilot reassignment scheme to improve positioning accuracy and channel quality. Similarly, Lan et al. [34] introduced a reconfigurable intelligent surface (RIS)-assisted CF m-MIMO framework for IoT networks, optimizing power control, precoding, and RIS phase shifts to improve the sum rate and EE. Their work demonstrated that RIS could significantly enhance CF m-MIMO performance, although security issues were not addressed. Li et al. [35] extended this line of work by focusing on user-centric CF m-MIMO in highly dynamic IoT environments. They studied the effects of imperfect channel state information (CSI), channel aging, and non-line-of-sight (NLoS) conditions, proposing a soft handover scheme to enhance mobility support. Their findings emphasized the importance of preconfigured pilot overhead and CSI estimation in mitigating performance degradation.
Mahmoud et al. [36] investigated CF m-MIMO for indoor factory environments in industrial IoT, analyzing the effects of centralized and distributed AP cooperation on spectral efficiency. Their proposed AP selection and pilot assignment schemes reduced pilot contamination while maintaining connectivity in highly dense deployments. Ke et al. [37] studied massive access in CF m-MIMO IoT, addressing challenges in active user detection and channel estimation by proposing a structured sparsity-based generalized approximate message passing (SS-GAMP) algorithm. Their work compared cloud and edge computing paradigms, demonstrating that edge computing could reduce processing latency while maintaining similar performance levels. Yan et al. [38] investigated the optimization of CF m-MIMO for IoT by developing power control algorithms that enhance both SE and EE. Their work focused on the uplink transmission scenario, where IoT devices require optimal power allocation to maintain reliable connectivity under stringent energy constraints. The authors proposed max-min power control algorithms leveraging random matrix (RM) theory, which provided accurate SINR approximations based on large-scale fading coefficients. They also introduced a neural network (NN)-based power control algorithm for the downlink, which significantly reduced computational complexity while achieving near-optimal power allocation. Their results demonstrated that machine learning-based power control in CF m-MIMO networks enhances scalability by reducing computational overhead at the APs without degrading system performance.
Li et al. [39] explored the integration of RIS and backscatter devices (BDs) in CF m-MIMO symbiotic radio (CF-m-MIMO-SR) to improve spectral efficiency and energy efficiency in IoT systems. Their study analyzed the performance trade-offs between RIS-aided and BD-aided CF-m-MIMO-SR systems, focusing on how RIS can mitigate the double fading effect inherent in BD communications. The authors derived closed-form SE expressions for different levels of cooperation among the APs and investigated various signal cancellation schemes based on the available CSI. Their simulation results demonstrated that RIS significantly improves the SE of the backscatter link due to its ability to control reflection elements, whereas BD-aided systems require additional signal processing for enhanced direct-link performance. Their findings highlight the potential of RIS in CF-m-MIMO-SR networks but emphasize the need for optimized signal processing techniques to maximize performance across both backscatter and direct links.
1.2. Research Gaps and Motivations
While CF m-MIMO has demonstrated significant improvements in QoS and connectivity in IoT networks, existing schemes often lack sufficient EE approaches. Most CF m-MIMO frameworks enhance performance by increasing the number of antennas in IoT APs. However, this comes at the cost of higher power consumption, which is particularly concerning for battery-operated IoT devices. As the number of antennas grows, the overall energy expenditure of the network also increases, creating a trade-off between EE and communication performance. Without proper EE optimization, the large-scale adoption of CF m-MIMO for IoT remains impractical due to its growing power demands.
In addition to energy concerns, security vulnerabilities in CF m-MIMO-based IoT networks present another major challenge. The broadcast nature of wireless communications makes legitimate transmissions susceptible to eavesdropping attacks, in which malicious entities attempt to intercept confidential data. Security has, therefore, become an unavoidable issue in wireless communication systems, requiring advanced protection mechanisms. Over the past decade, PLS has gained significant attention as a complement to traditional cryptographic encryption techniques. PLS leverages signal processing and transmission design to secure data at the physical layer, reducing the dependency on computationally expensive encryption methods. Interestingly, increasing the number of antennas in CF m-MIMO networks brings more spatial diversity gain, which has the potential to enhance PLS performance by improving secure beamforming and jamming techniques. However, despite these advantages, the existing literature has not yet fully explored secure CF m-MIMO IoT architectures, leaving security concerns largely unaddressed. To comprehensively evaluate both PLS and EE in CF m-MIMO IoT networks, this paper adopts secrecy energy efficiency (SEE) [40,41] as the key performance metric. SEE is defined as the ratio of the sum secrecy rate (SR) to the total power consumption, providing a unified measure of both secure communication and energy efficiency [42,43,44,45]. Taking advantage of SEE, our objective is to develop an optimization framework that jointly improves security and EE, ensuring the feasibility of CF m-MIMO IoT deployments in various practical settings. To clearly highlight the positioning of our work within the existing literature, Table 1 presents a comparative analysis of recent studies based on different criteria for designing secure and energy-efficient CF m-MIMO IoT systems, where EE denotes optimization for energy efficiency, SE denotes enhancement of spectral efficiency through CF m-MIMO strategies or optimization, AB denotes explicit consideration of antenna scaling and its impact on energy consumption, PLS denotes the incorporation of physical layer security techniques for secure transmission, SEE denotes the use of secrecy energy efficiency as a unified performance metric balancing power and security, and TO denotes the use of adaptive, multi-objective optimization methods to balance competing goals (e.g., security vs. efficiency).
1.3. Paper Contributions and Organization
Based on the aforementioned research gaps and the importance of EE and secure communication in CF m-MIMO-enabled IoT networks, the main contributions of this paper are summarized as follows:
This work employs SEE as a key performance metric to jointly optimize both EE and SR in CF m-MIMO-based IoT networks. Unlike traditional approaches that focus on either EE, QoS, or security, our framework ensures a balanced trade-off by quantifying the impact of power consumption on secure communication.
We propose a novel hybrid deep learning (DL) model based on a convolutional neural network (CNN) and long short-term memory (LSTM) to improve SEE optimization in CF m-MIMO. The CNN is responsible for extracting spatial features from the IoT network environment, while the LSTM network captures temporal dependencies, allowing the system to adapt dynamically to changing network conditions and security threats.
To further improve the efficiency and effectiveness of the deep learning framework, we incorporate a multi-objective improved biogeography-based optimization (MOIBBO) algorithm for hyperparameter tuning. MOIBBO improves training performance by optimizing key model parameters, accelerating convergence, and enhancing both classification accuracy and model robustness for secure IoT communications.
We conduct an extensive simulation and performance evaluation comparing our proposed model with existing CF m-MIMO security and energy efficiency frameworks. The simulation results demonstrate that our approach achieves higher SEE, lower power consumption, and better security performance compared to conventional methods.
The remainder of this paper is organized as follows: Section 2 describes the system model and problem formulation for SEE maximization in CF m-MIMO-enabled IoT networks. Section 3 presents the materials and methodology of the proposed hybrid DL framework. Section 4 provides the simulation results, assessing the performance of the proposed solution. Section 5 presents the discussion and comparison of the results with benchmark approaches, and finally, Section 6 concludes the paper.
2. System Model and Problem Formulation
We consider a CF m-MIMO-based IoT network consisting of K distributed APs, each equipped with N antennas, serving I single-antenna IoT devices in the presence of J active eavesdroppers, as shown in Figure 1. All APs are linked to a centralized processing unit (CPU) and operate over the same time-frequency resources to simultaneously serve the IoT devices while mitigating interference from the eavesdroppers. The system follows a time-division duplex (TDD) protocol, where each coherence interval is divided into two phases: uplink training and downlink data transmission [46,47].
During the uplink training phase, each device and each eavesdropper transmit a pilot sequence to the APs to facilitate channel estimation. The received pilot signals at the APs are used to estimate the CSI, which is then utilized for precoding during downlink transmission. The channel fading coefficient between an AP and a device (or an eavesdropper) is modeled as the product of a large-scale fading coefficient, which incorporates both path loss and shadowing, and a small-scale Rayleigh fading term whose elements are independent and identically distributed (i.i.d.) circularly symmetric complex Gaussian random variables with zero mean and unit variance. The small-scale fading remains constant over a single coherence interval but varies independently across different coherence blocks. In contrast, large-scale fading evolves at a much slower rate and remains unchanged for multiple coherence intervals [48].
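To make the adopted channel model concrete, the short Python sketch below draws one coherence block of channel realizations under the assumptions above; the dimensions K, N, and I are illustrative values, and the variable names beta (large-scale fading) and h (small-scale Rayleigh fading) are our own rather than the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)

K, N, I = 16, 4, 8          # illustrative numbers of APs, antennas per AP, and IoT devices

# Large-scale fading (path loss + shadowing), fixed over many coherence blocks.
beta = rng.uniform(1e-3, 1e-1, size=(K, I))

def small_scale(K, N, I, rng):
    """i.i.d. CN(0, 1) small-scale Rayleigh fading, redrawn every coherence block."""
    return (rng.standard_normal((K, N, I)) + 1j * rng.standard_normal((K, N, I))) / np.sqrt(2)

# Composite channel from AP k (all N antennas) to device i for one coherence block.
h = small_scale(K, N, I, rng)
g = np.sqrt(beta)[:, None, :] * h   # shape (K, N, I)
```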
For the uplink training, we assume that all I IoT devices simultaneously transmit mutually orthogonal pilot sequences to the K APs; as noted in [49], the pilot sequences must be long enough for mutual orthogonality to hold, i.e., the pilot length (in symbols) must be at least the number of devices. Each device is assigned a unit-norm pilot sequence, and the sequences of different devices are mutually orthogonal. Since pilot sequences are publicly known, eavesdroppers may exploit them to launch pilot contamination attacks by transmitting sequences identical to those of legitimate devices. As a result, the received pilot signal at the kth AP is the superposition of the pilot transmissions of the legitimate devices and of the contaminating eavesdroppers, corrupted by noise.
In this model, the pilot transmissions of the legitimate devices and of the eavesdroppers are weighted by their respective normalized signal-to-noise ratios (SNRs), and the additive white Gaussian noise (AWGN) at each AP is modeled as a zero-mean complex Gaussian process. For channel estimation at AP k, the received pilot signal is projected onto the pilot sequence of the device of interest, yielding the projected received signal.
Applying the linear minimum mean-square-error (LMMSE) estimator [50], AP k obtains an estimate of its channel coefficient to device i from the projected pilot signal. For notational simplicity, we denote the mean-square value of this estimated channel coefficient by a separate parameter. By the properties of the LMMSE estimator, the estimation error is independent of the estimate and follows a zero-mean Gaussian distribution. The channel from AP k to eavesdropper j is estimated analogously, with its own mean-square value, and the corresponding estimation error likewise follows an independent zero-mean Gaussian distribution.
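As a rough illustration of the estimation step, the following sketch applies a scalar LMMSE filter to the pilot observation projected onto a device's pilot sequence, assuming the commonly used form in which the eavesdropper's pilot-contamination term enters the denominator; the coefficient structure is a generic placeholder and may differ from the paper's exact expression.

```python
import numpy as np

def lmmse_estimate(y_proj, beta_dev, beta_eve, rho_p, rho_e, tau):
    """Scalar LMMSE estimate of the AP-to-device channel from the projected pilot signal.

    Assumes y_proj = sqrt(tau*rho_p)*g_dev + sqrt(tau*rho_e)*g_eve + noise (unit variance),
    where the eavesdropper reuses the device's pilot (pilot contamination attack).
    """
    num = np.sqrt(tau * rho_p) * beta_dev
    den = tau * rho_p * beta_dev + tau * rho_e * beta_eve + 1.0
    g_hat = (num / den) * y_proj          # channel estimate
    gamma = num**2 / den                  # mean-square value of the estimate
    return g_hat, gamma
```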
For downlink data transmission, each AP transmits a superposition of precoded symbols intended for the IoT devices, scaled by the maximum normalized transmit signal-to-noise ratio (SNR) at each AP, i.e., the ratio of the maximum allowable transmit power per AP to the noise power. The symbol transmitted to each device has unit average energy, the power allocated by AP k to device i is governed by a power control factor, and the associated precoding (beamforming) vector is normalized based on the corresponding channel estimate. The total power used by AP k can then be expressed in terms of the transmit SNR, the power control factors, and the mean-square values of the channel estimates.
Each AP transmits confidential messages to the I IoT devices, while the J eavesdroppers attempt to intercept the legitimate downlink signals. The signal received at device i is the superposition of the transmissions of all APs plus AWGN at the device, and the signal received by eavesdropper j while attempting to decode the transmission intended for device i has an analogous form, corrupted by AWGN at the eavesdropper. Next, we formulate the SEE maximization problem under constraints on the available transmission power and the QoS requirements. SEE serves here as a unified performance metric that captures the trade-off between secure data transmission and energy consumption, which is especially critical in power-constrained IoT systems. In the context of CF m-MIMO IoT networks, SEE is defined as the ratio of the sum secrecy rate to the total power consumption across all APs and devices. A higher SEE value indicates that more confidential information is successfully transmitted per unit of energy consumed. Unlike traditional metrics that consider only energy efficiency or spectral efficiency in isolation, SEE integrates security into the resource allocation framework by incorporating the difference between the achievable rate at the legitimate receiver and the maximum rate achievable by potential eavesdroppers. This allows the system not only to optimize power usage but also to actively protect against information leakage. Therefore, optimizing SEE ensures that the network uses energy in a way that maximizes confidentiality while maintaining efficiency, a key requirement for the secure deployment of large-scale IoT environments [42,43,44,45].
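The SEE metric described above can be evaluated numerically as sketched below, where the per-device secrecy rate is taken as the positive part of the gap between the legitimate rate and the strongest eavesdropper rate; the SINR values, bandwidth, and total power are assumed inputs rather than quantities derived in this snippet.

```python
import numpy as np

def secrecy_energy_efficiency(sinr_dev, sinr_eve, p_total, bandwidth=20e6):
    """SEE = sum secrecy rate / total power consumption.

    sinr_dev : (I,) SINR of each legitimate IoT device
    sinr_eve : (J, I) SINR of eavesdropper j wiretapping device i
    p_total  : total power consumed by all APs (transmit + circuit + backhaul) [W]
    """
    rate_dev = np.log2(1.0 + np.asarray(sinr_dev))               # bits/s/Hz per device
    rate_eve = np.log2(1.0 + np.asarray(sinr_eve)).max(axis=0)   # strongest eavesdropper per device
    secrecy_rate = np.maximum(rate_dev - rate_eve, 0.0)          # per-device secrecy rate
    sum_secrecy_rate = bandwidth * secrecy_rate.sum()            # bits/s
    return sum_secrecy_rate / p_total                            # bits per Joule

# Example: three devices, two eavesdroppers, 5 W total consumption.
see = secrecy_energy_efficiency([10.0, 6.0, 8.0],
                                [[1.0, 0.5, 0.8], [0.7, 0.9, 0.4]],
                                p_total=5.0)
```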
Thus, to efficiently solve this optimization problem, we develop a hybrid DL framework that integrates CNN and LSTM networks for joint EE and security optimization. CNN is utilized for spatial feature extraction, while LSTM captures temporal dependencies, enabling more effective modeling of dynamic IoT communication patterns. Additionally, to improve the training efficiency and performance of the CNN’s fully connected layers, we employ the MOIBBO algorithm for hyperparameter optimization, given its effectiveness in handling multi-objective problems with improved convergence speed. Therefore, the primary objective of this work is to enhance the SEE of the CF m-MIMO-based IoT system through optimized power allocation while satisfying the following constraints:
First, the transmission power of each AP is limited by its maximum normalized transmit SNR, so the power control factors at each AP must satisfy the per-AP power budget. Second, to ensure the QoS requirements of each IoT device, the achievable secrecy rate must exceed a predefined threshold. Third, to limit the wiretapping capability of any eavesdropper, its achievable rate when wiretapping any device is restricted to remain below a specified bound. With these constraints, the SEE maximization problem is formulated as the maximization of SEE over the power control factors, subject to the per-AP power budget, the minimum secrecy rate requirement, and the eavesdropper rate limit.
In this work, QoS-aware communication signifies the system’s capability to sustain a satisfactory SR. Consequently, our primary goal in optimizing SEE is to enhance SR while adhering to a predefined energy constraint.
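Because the displayed formulation could not be reproduced here, the following schematic restates the SEE maximization problem in generic notation consistent with the constraints listed above; the symbols (power control factors η_{ki}, mean-square estimate values γ_{ki}, secrecy-rate threshold R_min, and eavesdropper rate cap) are illustrative placeholders rather than the paper's exact expressions.

```latex
\begin{aligned}
\max_{\{\eta_{ki}\ge 0\}} \quad & \mathrm{SEE}(\boldsymbol{\eta})
    = \frac{\sum_{i=1}^{I} R_i^{\mathrm{sec}}(\boldsymbol{\eta})}{P_{\mathrm{total}}(\boldsymbol{\eta})} \\
\text{s.t.} \quad
 & \text{(C1) per-AP power budget: } \sum_{i=1}^{I} \eta_{ki}\,\gamma_{ki} \le 1, \qquad k = 1,\dots,K, \\
 & \text{(C2) QoS: } R_i^{\mathrm{sec}}(\boldsymbol{\eta}) \ge R_{\min}, \qquad i = 1,\dots,I, \\
 & \text{(C3) wiretap cap: } R_{ji}^{\mathrm{eve}}(\boldsymbol{\eta}) \le R_{\mathrm{eve}}^{\max}, \qquad \forall j,\, i.
\end{aligned}
```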
3. Materials and Methods
In this section, we present the methodology and framework used to develop and evaluate the proposed model. The section is structured into four key subsections, each addressing a critical component of the study. Section 3.1 provides an overview of CNNs, explaining their feature extraction capabilities and role in processing spatial dependencies. Section 3.2 describes the LSTM architecture and its ability to model sequential dependencies, making it essential for handling temporal variations in IoT data. Section 3.3 introduces the novel MOIBBO algorithm, explaining its migration, mutation, and Pareto-based selection mechanisms for optimizing DL architectures. Finally, Section 3.4 presents the integration of CNN, LSTM, and MOIBBO into a unified framework, detailing the architecture, optimization strategy, and advantages of the hybrid model for solving the SEE problem in CF m-MIMO-based IoT networks.
3.1. Standard CNN
A CNN is a specialized class of DL models designed primarily for processing structured grid data, such as images. First introduced by Yann LeCun in 1989, CNNs were inspired by the organization of the animal visual cortex, where neurons in the brain respond to overlapping regions of the visual field. The most well-known early CNN architecture, LeNet-5, was developed in 1998 and demonstrated its effectiveness in handwritten digit recognition. Since then, CNNs have become the foundation of modern computer vision, driving applications such as image classification, object detection, medical imaging, and facial recognition. CNNs leverage spatial hierarchies of features, extracting relevant patterns from input images through the application of convolutional filters. Instead of relying solely on fully connected layers, CNNs utilize local connectivity and weight sharing to drastically reduce the number of parameters, making them more efficient in handling large-scale image datasets [51,52,53,54].
Each convolutional filter scans over an input matrix, capturing small localized patterns such as edges or textures, which are then combined across layers to build more complex representations. As shown in Figure 2, a standard CNN consists of multiple essential layers that progressively transform the input data into high-level feature representations. The model consists of convolutional layers, pooling layers, a flattening stage, and fully connected layers, which ultimately produce the final output. The first stage is the convolutional layer, where each filter (kernel) slides over the input image and performs element-wise multiplications followed by summation. Mathematically, this operation can be expressed as Y(i, j) = Σ_m Σ_n X(i + m, j + n) · W(m, n), where X represents the input image, W denotes the kernel weights, and Y is the resulting feature map. The extracted feature maps are then passed through an activation function, commonly the Rectified Linear Unit (ReLU), which introduces nonlinearity into the model, allowing it to learn complex patterns.
Following the convolutional operation, CNNs employ pooling layers to reduce the spatial dimensions of feature maps while preserving key information. Max pooling is the most commonly used method, selecting the maximum value from each local region, thereby achieving translational invariance and improving computational efficiency. The first pooling operation in Figure 2 demonstrates this process, in which the spatial size of the feature maps is reduced while important details are retained. After multiple convolutional and pooling layers, the flattening layer converts the extracted high-dimensional features into a one-dimensional vector. This flattened representation is then fed into fully connected (dense) layers, which perform the final decision-making. These layers resemble traditional neural networks, where each neuron is connected to every neuron in the previous layer. The final output layer typically applies a softmax function for multi-class classification tasks or a sigmoid function for binary classification. CNNs have revolutionized deep learning due to their ability to automatically learn feature hierarchies from raw data, reducing the need for manual feature engineering. Their success extends beyond computer vision to areas such as natural language processing (NLP) and speech recognition, where CNNs extract spatial representations from text and audio signals. Using convolutional operations, pooling, and deep feature extraction, CNNs have become indispensable tools for modern AI applications.
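A minimal Keras sketch of the layer sequence just described (convolution with ReLU, max pooling, flattening, and dense layers ending in a softmax output); the input shape and layer sizes are purely illustrative.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative CNN following the convolution -> pooling -> flatten -> dense pipeline.
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                       # example input grid
    layers.Conv2D(16, kernel_size=3, activation="relu"),  # local feature extraction
    layers.MaxPooling2D(pool_size=2),                     # spatial down-sampling
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),                                     # high-level features -> vector
    layers.Dense(64, activation="relu"),                  # fully connected decision layers
    layers.Dense(10, activation="softmax"),               # class probabilities
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```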
3.2. Standard LSTM
LSTM networks, first introduced by Hochreiter and Schmidhuber in 1997, are a specialized type of recurrent neural network (RNN) designed to address the vanishing gradient problem. Unlike traditional RNNs, which struggle to capture long-range dependencies in sequential data, LSTMs incorporate a memory cell mechanism that selectively retains and discards information over extended time steps. This key advantage makes LSTMs particularly effective for applications such as time-series forecasting, natural language processing, speech recognition, and anomaly detection in IoT systems. The LSTM architecture, as illustrated in Figure 3, consists of multiple gates and memory cells that regulate the flow of information. In Figure 3, the architecture visually represents how information propagates through the LSTM unit. The blue circles indicate element-wise operations such as multiplication and addition, while the yellow sigmoid and tanh activations control information processing. The previous hidden state and the current input are processed through weight matrices, and the updated values propagate through the memory cell, ultimately generating the new hidden state and cell state [55,56,57,58].
Unlike conventional RNNs, which rely solely on hidden states, LSTMs maintain a cell state that enables long-term storage of relevant information. This memory cell is updated through three primary gates: the forget gate, the input gate, and the output gate. Each of these components is governed by specific activation functions and weight matrices. The forget gate, defined in Equation (28), determines how much of the past cell state should be retained or discarded. It takes as input the previous hidden state and the current input, applies a sigmoid activation function, and produces an output between 0 and 1: if the value is close to 0, the past memory is mostly forgotten; if it is close to 1, the memory is preserved. The forget gate is thus computed from the previous hidden state and the current input through its weight matrices and bias terms, followed by the sigmoid activation function. The input gate and the candidate memory update, given in Equations (29) and (30), work together to decide how much new information should be stored in the memory cell. The input gate applies a sigmoid activation function, while the candidate update uses a hyperbolic tangent function to generate new candidate values; both are computed from the current input and the previous hidden state through their own weight matrices and bias terms, and their outputs are multiplied element-wise to regulate how much information is added to the cell state. The cell state update, expressed in Equations (31) and (32), combines the previous cell state with the newly computed candidate state, which is obtained by multiplying the candidate values with the input gate; the forget gate determines how much of the past information is retained, the input gate modulates the newly added information, and ⊙ denotes the element-wise multiplication operator. Finally, the output gate and the final hidden state determine the output at the current time step. The output gate, computed using Equations (33) and (34), applies a sigmoid activation function to the current input and the previous hidden state (again through dedicated weight matrices and bias terms) and controls how much of the updated cell state, passed through the tanh activation function, is emitted as the new hidden state.
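The gate computations referenced in Equations (28)–(34) follow the standard LSTM formulation; the NumPy sketch below implements one time step using the usual single-bias convention, which may differ slightly from the paper's notation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b are dicts keyed by gate: 'f', 'i', 'c', 'o'."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])        # forget gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])        # input gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])    # candidate memory
    c_t = f_t * c_prev + i_t * c_tilde                            # cell state update
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])        # output gate
    h_t = o_t * np.tanh(c_t)                                      # new hidden state
    return h_t, c_t
```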
3.3. MOIBBO
The biogeography-based optimization (BBO) algorithm was first introduced by Dan Simon in 2008 as an evolutionary optimization technique inspired by the natural distribution of species across different habitats [59]. The fundamental idea of BBO is derived from biogeography, which studies the migration, mutation, and selection processes that govern species distribution in various ecosystems. In an optimization context, each habitat represents a potential solution to a given problem, and the quality of a solution is measured by its habitat suitability index (HSI). The key principle of BBO is that solutions with higher HSI tend to share features with lower-HSI solutions through a migration mechanism, allowing knowledge transfer and convergence toward optimal solutions. In BBO, migration plays a crucial role in the exchange of information among candidate solutions. Each habitat has an immigration rate and an emigration rate, which determine how solutions share information. Migration rates are typically modeled as linear functions of HSI, meaning that high-HSI habitats (better solutions) have a higher emigration rate, while low-HSI habitats (worse solutions) have a higher immigration rate; the rates depend on i, the rank of the habitat in terms of suitability, and N, the total number of habitats. The migration process involves selecting high-HSI solutions as donors and low-HSI solutions as recipients, enabling the exploration of new promising regions of the search space. In addition to migration, mutation is another critical operator in BBO, ensuring diversity and preventing premature convergence. Mutation is inspired by sudden environmental changes or random genetic variations in species. The mutation probability is often inversely proportional to the HSI of a habitat, meaning that worse solutions have a higher probability of undergoing mutation. The mutation rate of a habitat is determined by the maximum mutation rate and by the ratio of the habitat's species-count probability to the highest species-count probability in the population. This mechanism introduces randomness to the optimization process, enhancing global search capability and allowing exploration beyond the solutions obtained through migration. The interaction between host and guest habitats in BBO is based on the principle that high-HSI habitats act as knowledge sources for lower-HSI habitats. When migration occurs, characteristics of a host (high-HSI habitat) replace some characteristics of a guest (low-HSI habitat), allowing gradual improvement of weaker solutions. This biogeographical interaction mechanism allows the algorithm to balance exploration (through mutation) and exploitation (through migration), leading to efficient optimization in complex search spaces.
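For reference, the sketch below computes the classical linear BBO migration rates and the rank-based mutation rate described above, assuming the maximum immigration and emigration rates are normalized to one and approximating the species-count probabilities by the normalized emigration rates; the exact scaling used in the paper may differ.

```python
import numpy as np

def bbo_rates(num_habitats, m_max=0.05):
    """Linear BBO migration rates and mutation rates for habitats ranked 1 (worst) .. N (best)."""
    rank = np.arange(1, num_habitats + 1)
    mu = rank / num_habitats              # emigration: better habitats share more
    lam = 1.0 - mu                        # immigration: worse habitats accept more
    # Species-count probability approximated as proportional to habitat quality;
    # the mutation rate is inversely related to it, so worse solutions mutate more.
    p = mu / mu.sum()
    m = m_max * (1.0 - p / p.max())
    return lam, mu, m
```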
The standard linear migration model in BBO suffers from several limitations when dealing with complex optimization landscapes. Linear migration assumes a fixed and simple relationship between immigration and emigration rates, which does not adequately capture the dynamic and nonlinear nature of real-world species migration. This rigid structure limits the adaptability of the algorithm, leading to premature convergence in multimodal search spaces and reducing its ability to explore and exploit solutions effectively. Moreover, in highly non-convex problems, a static migration model fails to maintain diversity across generations, which is crucial for avoiding stagnation in local optima. To overcome these challenges, we propose a six-stage adaptive migration model, which dynamically adjusts migration rates across different optimization phases. Instead of relying on a single static rule, the six-rule migration strategy divides the population into distinct categories based on their fitness levels and assigns a separate nonlinear migration function to each subset. This ensures that solutions at different evolutionary stages experience customized migration behavior, improving both exploration and exploitation. The new migration model is defined in Equations (38) and (39). In this nonlinear migration model, the first rule applies a polynomial-based migration rate for the best solutions (high HSI), ensuring that they contribute strongly to the population. The second rule introduces logarithmic and exponential functions, balancing the trade-off between exploration and exploitation for middle-range solutions. Finally, the third rule employs a hyperbolic tangent function, which dynamically adjusts the migration behavior of weaker solutions, ensuring gradual and stable convergence. By implementing this multi-stage migration model, the algorithm benefits from adaptive migration rates across different generations and iterations, leading to smarter and context-aware migration. This approach allows the BBO algorithm to self-regulate its migration strategy based on the evolutionary state of the population, resulting in improved diversity maintenance, convergence speed, and robustness in solving complex optimization problems [60,61,62].
In this paper, we have implemented a MOIBBO algorithm, which is an extension of the standard BBO designed to handle multi-objective optimization problems. Unlike the single-objective version, MOIBBO aims to optimize multiple conflicting objectives simultaneously using the nondominated sorting approach. This strategy organizes the population into different nondominated frontiers, allowing the algorithm to maintain a diverse set of solutions that represent various trade-offs between the objectives. Nondominated sorting ensures that solutions in the population are ranked based on Pareto dominance, where no solution in a given frontier is worse than another in all objectives. The algorithm then evolves solutions across these fronts, ensuring an efficient balance between exploration and exploitation in the search space.
The MOIBBO algorithm works by incorporating the concepts of migration and mutation, similar to standard BBO, but with modifications to account for the multi-objective nature. Each solution is represented by a vector of objective values, and the fitness is evaluated using the Pareto dominance concept, where solutions are compared based on their ability to dominate others in all objectives. The algorithm uses nondominated sorting to categorize solutions into different Pareto fronts. After sorting, the migration process happens between habitats within the same Pareto front, and solutions are updated by a nondominated migration mechanism, ensuring that the population continuously moves towards a set of diverse solutions that best approximate the Pareto front. The mutation process is also adapted to maintain diversity within and across the Pareto fronts, preventing premature convergence and ensuring that all regions of the search space are explored.
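A compact sketch of the Pareto-dominance test and the extraction of the first nondominated front that underlies this sorting step (all objectives are assumed to be minimized); the full MOIBBO sorting additionally ranks the remaining fronts iteratively.

```python
def dominates(a, b):
    """True if solution a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def first_front(population):
    """Return indices of nondominated solutions (the first Pareto front)."""
    front = []
    for i, sol in enumerate(population):
        if not any(dominates(other, sol) for j, other in enumerate(population) if j != i):
            front.append(i)
    return front

# Example with objective tuples (RMSE, number of layers, number of neurons):
pop = [(0.08, 10, 520), (0.12, 6, 300), (0.08, 12, 700), (0.20, 4, 150)]
print(first_front(pop))   # indices of the nondominated trade-offs
```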
3.4. Proposed MOIBBO-CNN–LSTM
The proposed MOIBBO-CNN–LSTM model, as illustrated in Figure 4, represents a hybrid DL framework optimized using the MOIBBO algorithm. The figure is divided into two main sections: the upper part depicts the optimization process carried out by MOIBBO, while the lower part shows the CNN–LSTM architecture used for processing input data. The optimization pipeline begins with parameter initialization, followed by the computation of a fitness function, after which nondominated sorting is applied to categorize solutions into different Pareto fronts. If the stopping condition is not met, migration and mutation operators refine the population, generating new candidate solutions in each iteration. Once convergence is reached, the optimal solutions are finalized.
The CNN–LSTM component in the lower section of the figure consists of convolutional layers for feature extraction, LSTM layers for sequence modeling, and fully connected layers for final prediction. The MOIBBO optimizer enhances the overall framework by simultaneously optimizing three objective functions (Equations (42)–(44)): the root mean squared error (RMSE), which ensures that predictions are accurate; the total number of CNN layers, which controls the network complexity; and the number of neurons in hidden layers, which balances computational cost and expressiveness. The nondominated sorting approach in MOIBBO ensures that the optimization process explores multiple trade-offs between these objectives, enabling an adaptive and efficient search for the best model configuration.
where N is the number of observations, the calculated (predicted) and observed values of the target parameter enter the RMSE term, L represents the total number of hidden layers in the CNN, and a binary indicator denotes the presence of a neuron in layer l.
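To illustrate how the three objectives in Equations (42)–(44) can be evaluated for a candidate architecture, the sketch below returns an (RMSE, layer count, neuron count) tuple of the kind a multi-objective optimizer such as MOIBBO, or DEAP's evolutionary tools, would minimize; the candidate encoding shown is hypothetical.

```python
import numpy as np

def fitness(candidate, y_true, y_pred):
    """Three-objective fitness for one candidate CNN-LSTM configuration.

    candidate : dict with a hypothetical encoding, e.g.
                {"layers": 10, "neurons_per_layer": [64, 64, 32]}
    Returns (RMSE, total hidden layers, total active neurons) -- all to be minimized.
    """
    rmse = float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))
    n_layers = candidate["layers"]
    n_neurons = int(sum(candidate["neurons_per_layer"]))
    return rmse, n_layers, n_neurons
```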
In the CNN–LSTM framework, input data undergo a hierarchical transformation, where CNN first extracts spatial dependencies. The convolutional layers apply filters to detect localized patterns, while the pooling layers reduce dimensionality and retain the most important features. After multiple layers of convolution and pooling, the extracted feature maps are flattened and passed into the LSTM component, where long-term dependencies in the data are learned. Unlike CNN, which only captures spatial features, LSTM processes sequential information by maintaining memory over past observations. This allows the model to handle time-dependent variations, making it particularly effective for applications where historical dependencies play a critical role, such as predictive modeling in IoT networks. The integration of MOIBBO with CNN–LSTM offers multiple advantages. Instead of relying on conventional backpropagation-based training, which often leads to overfitting or suboptimal convergence, MOIBBO optimizes the hyperparameters in a global search manner, ensuring that the CNN–LSTM model is not only accurate but also computationally efficient. The migration and mutation mechanisms in MOIBBO actively refine the network architecture, allowing it to self-adjust according to the complexity of the problem. This optimization process ensures that the resulting model does not suffer from unnecessary complexity while still maintaining high predictive accuracy.
For our SEE optimization in CF-mMIMO-based IoT networks, the proposed model is particularly well-suited. IoT networks require a balance between EE and security, which is often challenging due to power constraints and dynamic communication environments. The CNN–LSTM framework provides a data-driven approach to optimize SEE, where CNN identifies spatial characteristics of network signals, and LSTM captures temporal variations, making the model highly effective in understanding and predicting network behavior. MOIBBO further enhances this by optimizing the structure of the DL model, ensuring that the trade-off between computational efficiency, accuracy, and model complexity is properly managed. By leveraging multi-objective Pareto optimization, the proposed model does not settle for a single solution but instead provides a set of optimal solutions, allowing decision-makers to select the best configuration based on real-world constraints. This adaptability is crucial in wireless networks, where environmental factors constantly change, and a fixed model may not be optimal for different scenarios. The ability of MOIBBO to dynamically tune the model structure ensures that it remains robust and efficient across different IoT deployment conditions.
The flow of data through the model further highlights its effectiveness. Raw sensor data from IoT devices enter the CNN, where initial transformations occur, removing noise and irrelevant features while preserving essential spatial structures. These refined features then pass through the LSTM layers, where short-term and long-term temporal correlations are modeled. The final layers of the network map the learned features to specific outputs, predicting optimal network configurations that maximize SEE. By consistently optimizing this process using MOIBBO, the model can adapt to different network conditions, ensuring that both energy efficiency and security requirements are met. Overall, the proposed MOIBBO-CNN–LSTM model presents a highly adaptive and efficient solution for optimizing SEE in CF-mMIMO-based IoT networks. It combines the CNN's capability to extract spatial patterns, the LSTM's ability to learn sequential dependencies, and MOIBBO's strength in global optimization to provide a powerful framework that balances precision, efficiency, and robustness. The integration of multi-objective optimization ensures that no single performance metric is prioritized at the expense of others, making the model flexible for real-world deployment, where trade-offs between energy consumption, security, and computational constraints must be carefully managed.
4. Results
This section presents the simulation results of the proposed SEE maximization (SEEM) framework, detailing the system setup, implementation framework, evaluation metrics, and comparative analysis with other state-of-the-art models, and evaluating its SEE performance against different optimization strategies within a multi-device, multi-eavesdropper CF m-MIMO IoT network. The simulation environment assumes a distributed deployment of all nodes within a fixed square area, where each node is randomly positioned within this region. To visually represent this deployment, we illustrate the topology of the simulated system, in which the APs, IoT devices, and eavesdroppers are randomly placed, as depicted in Figure 5. For consistency, our simulations adopt the same large-scale fading coefficients and noise power values as those outlined in [48]. The pilot transmission powers of the IoT devices and the eavesdroppers are fixed, with the respective normalized SNRs computed as the ratio of pilot power to noise power. Furthermore, the internal energy consumption of each antenna is assigned a fixed value, the backhaul system incurs a fixed power consumption per AP, and a traffic-dependent backhaul energy term (in W/(Gbit/s)) accounts for dynamic data transmission rates [63]. The system operates with a fixed bandwidth and utilizes orthogonal pilot sequences for CSI estimation, and the power amplifier efficiency per AP is fixed. Finally, for iterative optimization-based evaluations, a maximum number of iterations is configured to ensure algorithmic convergence.
The implementation was carried out in Python 3.8, leveraging TensorFlow and Keras for the DL components, ensuring efficient training and inference of the CNN–LSTM architecture. The multi-objective optimization process was conducted using the Distributed Evolutionary Algorithms in Python (DEAP) library, which provides robust evolutionary computation techniques tailored for multi-objective problems. To ensure computational consistency and scalability, all experiments were executed on a high-performance computing cluster equipped with an Intel Xeon processor and 128 GB of RAM, providing an optimized environment for deep learning and optimization-based simulations. To assess the performance of the proposed MOIBBO-CNN–LSTM model, a comparative analysis was conducted against five widely recognized benchmark models in the field of DL and optimization-driven frameworks. The first comparative model is NSGA-II-CNN–LSTM, which integrates the nondominated sorting genetic algorithm-II (NSGA-II) with a CNN–LSTM architecture. This model was chosen because NSGA-II is one of the most well-established multi-objective evolutionary algorithms, enabling an alternative approach to optimizing deep neural network (DNN) architectures. Another model used in the comparison is the vision transformer (ViT), a DL architecture based on the self-attention mechanism, which has demonstrated remarkable success in capturing complex spatial and sequential dependencies. Given its ability to extract spatial and temporal features, ViT serves as a strong competitor to hybrid CNN–LSTM architectures.
Additionally, a deep reinforcement learning (DRL) model was included as a reference, since reinforcement learning techniques are commonly used for energy efficiency and security optimization in IoT networks. DRL-based models dynamically learn optimal policies over time, making them valuable in real-time IoT environments. Furthermore, to evaluate the contribution of the hybrid CNN–LSTM integration, two individual deep learning models, CNN and LSTM, were used in the comparison. The standalone CNN model processes spatial features extracted from raw data without considering temporal dependencies, while the LSTM model captures sequential correlations in data without leveraging convolutional feature extraction. By including both CNN and LSTM individually, the comparative analysis demonstrates the advantages of their combined usage, particularly in scenarios where both spatial and temporal dependencies influence decision-making. This selection of benchmark models provides a comprehensive evaluation, ensuring that the proposed approach is assessed not only against evolutionary optimization methods but also against standalone and alternative deep learning frameworks.
To evaluate the effectiveness of the proposed MOIBBO-CNN–LSTM framework, a synthetic dataset was constructed by simulating a wide range of CF-mMIMO-based IoT network scenarios. Each scenario was generated by randomly initializing the key system parameters, including the number of APs, the number of IoT devices and eavesdroppers, the channel fading coefficients, the SNRs, and the transmission power levels. Based on these randomly generated parameters, the theoretical values of SEE were calculated using Equation (22), which encapsulates both the physical layer security and the energy efficiency of the system. The resulting dataset consisted of input–output pairs where the input captures the network configuration and the output corresponds to the calculated SEE. The generated dataset was used to train and validate the hybrid CNN–LSTM model, allowing it to learn the nonlinear mapping between network parameters and the corresponding SEE values.
To quantify the effectiveness of the models, five key evaluation metrics were employed, ensuring a rigorous and well-rounded assessment. The first metric is the root mean squared error (RMSE), which measures the square root of the average squared difference between predicted and observed values. RMSE is particularly relevant for assessing prediction accuracy, as it penalizes larger errors more heavily, making it ideal for evaluating the ability of the CNN–LSTM model to accurately predict SEE in CF-mMIMO-based IoT networks. A lower RMSE value indicates that the model provides more precise estimations, reducing deviations from actual values. The second metric is the mean absolute percentage error (MAPE), which quantifies the relative prediction error in percentage form, as expressed in Equation (44). This metric is particularly useful for assessing generalization across different data scales, as it provides an interpretable measure of accuracy in real-world scenarios. Given the variability in IoT network parameters, MAPE ensures that the model remains effective across diverse conditions.
In addition to RMSE and MAPE, the coefficient of determination (R²) was used as a statistical measure to evaluate the strength of the correlation between predicted and actual values. This metric, computed using Equation (45), determines how well the predictions align with real data trends. A higher R² score, ideally close to 1, suggests that the model captures the variance in the dataset effectively, indicating strong predictive capabilities. This is particularly important in IoT applications, where understanding the underlying relationships between security, energy efficiency, and network parameters is critical for optimizing performance.
where N is the number of observations, the calculated and observed values of the parameter are compared against their respective average values, and the standard deviations of the predictions and of the observations normalize the correlation term.
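The three accuracy metrics can be computed as follows; this is a direct NumPy restatement of the definitions above, using the squared-Pearson-correlation form of R² suggested by the listed quantities (means and standard deviations of the calculated and observed values).

```python
import numpy as np

def evaluate(y_obs, y_calc):
    """RMSE, MAPE (%), and coefficient of determination R^2 (squared Pearson correlation)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_calc = np.asarray(y_calc, dtype=float)
    rmse = np.sqrt(np.mean((y_calc - y_obs) ** 2))
    mape = 100.0 * np.mean(np.abs((y_obs - y_calc) / y_obs))   # assumes no zero observations
    r = np.mean((y_obs - y_obs.mean()) * (y_calc - y_calc.mean())) / (y_obs.std() * y_calc.std())
    return rmse, mape, r ** 2
```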
Beyond accuracy-based metrics, the evaluation also considers the convergence trend of the models. The rate at which the model reaches its optimal performance is a crucial factor in determining its efficiency. A model that converges quickly requires fewer computational resources and can adapt to dynamic environments in real time, which is essential for IoT applications where rapid decision-making is required. The convergence behavior of each model is analyzed across multiple generations of optimization, ensuring that the proposed method is not only accurate but also computationally feasible for real-world deployment. Another critical factor in the evaluation process is execution time, which reflects the total computational cost of training and inference. Since IoT networks operate under strict latency and power constraints, it is necessary to ensure that the proposed model remains computationally efficient while delivering high performance. A model that requires excessive training time may not be practical for real-time network management, particularly in large-scale CF-mMIMO systems where quick adaptation is essential. Therefore, execution time is carefully monitored to assess the trade-off between performance and computational feasibility. These evaluation metrics were carefully selected to align with the core objective of the study, which is to enhance SEE in CF-mMIMO-based IoT networks.
Hyperparameter calibration is a critical aspect of DL and optimization-based models, as it directly influences the performance, convergence speed, and generalization ability of an algorithm. In complex architectures such as CNN–LSTM and transformer-based models, selecting the appropriate hyperparameters ensures that the model can efficiently extract spatial and temporal patterns while maintaining computational efficiency. Poorly tuned hyperparameters may lead to underfitting or overfitting, where the model either fails to learn meaningful representations or memorizes training data without generalizing well to new inputs. In optimization-driven frameworks such as MOIBBO and NSGA-II, hyperparameter calibration affects the balance between exploration and exploitation, determining how efficiently the search space is navigated for optimal solutions. Several approaches exist for hyperparameter tuning, with Grid Search, Random Search, and Bayesian optimization being the most commonly used techniques. Grid Search is a systematic method that exhaustively tests all possible combinations of hyperparameters within a predefined range, ensuring that the optimal set of parameters is identified. This method is highly effective for structured search spaces but becomes computationally expensive as the number of hyperparameters increases.
Table 2 provides the optimized hyperparameter configurations for the proposed MOIBBO-CNN–LSTM model and the baseline comparison methods, including ViT, DRL, CNN, LSTM, and NSGA-II, fine-tuned using the grid search approach to ensure optimal performance. For MOIBBO-CNN–LSTM, key hyperparameters such as the learning rate (0.003), dropout rate (0.2), batch size (64), number of convolutional layers (10), and kernel size (5 × 5) were fine-tuned to maximize performance. Additionally, the mutation rate (0.06), population size (100), and iteration limit (300) of the MOIBBO optimizer were calibrated to balance search efficiency and convergence speed. The ViT model was configured with six attention heads, ten transformer layers, and a GELU activation function, which are suitable settings for handling complex spatial and temporal dependencies. For DRL, hyperparameters such as the ε-greedy exploration rate (ranging from 0.19 to 0.89), discount factor (0.91), and batch size (64) were tuned to ensure an optimal balance between learning and exploration. The NSGA-II-based model was configured with a combination (crossover) probability of 0.94 and a mutation probability of 0.08, with a population size of 100 and 300 iterations, ensuring robust performance in multi-objective optimization. The results of this calibration process demonstrate the effectiveness of grid search in identifying optimal configurations, as each model was fine-tuned to achieve maximum predictive accuracy, computational efficiency, and stability. The hyperparameter tuning process significantly improved the convergence behavior and final performance of the models, ensuring that they operate at peak efficiency when applied to real-world IoT network optimization.
Figure 6 illustrates the convergence behavior of all evaluated models by plotting the RMSE values against training epochs, providing insight into their learning efficiency and optimization stability. The curves represent how each algorithm improves over time, with a steeper decline indicating faster convergence and better learning performance. The MOIBBO-CNN–LSTM model (blue line) demonstrates the most rapid and stable convergence, reaching near-optimal RMSE values within the first 100 epochs and continuing to refine its accuracy until it achieves the lowest RMSE by epoch 300. In contrast, the other models exhibit significantly slower convergence rates and higher final error values, emphasizing the advantage of the proposed multi-objective optimization strategy. The NSGA-II-CNN–LSTM model (green line) shows a moderate convergence rate, steadily decreasing its RMSE over training iterations but ultimately stabilizing at a higher error level compared to MOIBBO-CNN–LSTM. This suggests that while evolutionary optimization aids the CNN–LSTM structure, it lacks the adaptability and efficiency of MOIBBO’s dynamic search mechanisms. The ViT and DRL models demonstrate a slower learning process, requiring significantly more epochs to reduce their error rates. Their final RMSE values remain considerably higher, indicating that while these architectures capture useful patterns, they struggle to generalize SEE trade-offs as effectively as the hybrid CNN–LSTM framework. The standalone CNN and LSTM models exhibit the weakest convergence performance, with high initial RMSE values and slow reduction over epochs. Even after 300 epochs, their RMSE values remain significantly higher than other approaches, confirming that neither CNN nor LSTM alone is sufficient for handling the complexities of SEE optimization in CF-mMIMO-based IoT networks. These results reinforce that the MOIBBO-CNN–LSTM model achieves superior learning efficiency and accuracy through its integrated optimization approach, ensuring rapid convergence and minimal prediction error compared to alternative state-of-the-art methods.
Figure 7 illustrates the evolution of SEE over training epochs for the different algorithms, showcasing their optimization efficiency. The performance trends provide insight into the trade-off between convergence speed and final SEE. MOIBBO-CNN–LSTM achieves the highest SEE, improving steadily from its initial value to its final value at epoch 200. In particular, it converges quickly, reaching near-optimal SEE (12 Mb/J) by epoch 80. This rapid improvement highlights its efficient learning capacity and superior optimization strategy. NSGA-II-CNN–LSTM exhibits a similar trend, although with slightly lower performance, stabilizing around 12 Mb/J at epoch 90. This suggests that NSGA-II-CNN–LSTM provides competitive energy efficiency, making it a viable alternative when trade-offs between accuracy and computational complexity are considered. ViT and DRL demonstrate moderate SEE improvements, stabilizing at approximately 10 Mb/J and 9 Mb/J, respectively. These methods exhibit a more gradual learning curve and require more training epochs to achieve satisfactory performance. CNN and LSTM, on the other hand, show the slowest progression. LSTM starts with the lowest SEE and stabilizes later than all other models, at around epoch 140, before reaching its final value at epoch 200. CNN follows a similar trajectory but with slightly better SEE convergence. The delayed improvement of these models highlights their limited optimization efficiency for security-aware energy management.
Figure 8 illustrates the impact of the maximum AP transmit power on the SEE performance of the different algorithms. The results demonstrate a general increasing trend in SEE as the transmit power rises, indicating that higher transmission power enhances overall energy efficiency. However, the rate of improvement varies across the algorithms. For lower power levels (i.e., 50–200 mW), the SEE of all models exhibits logarithmic-like growth, increasing rapidly at first before stabilizing. In this range, the proposed MOIBBO-CNN–LSTM and NSGA-II-CNN–LSTM achieve the highest SEE, reaching approximately 8.0 Mb/J and 7.5 Mb/J at 200 mW, respectively. The ViT and DRL models also show improvements, though with relatively lower SEE values. CNN and LSTM demonstrate the worst performance, with LSTM only reaching 5.5 Mb/J at 200 mW. As the maximum transmit power increases beyond 200 mW (up to 500 mW), the growth rate of SEE slows down across all models, following a diminishing-return effect. This indicates that, beyond a certain power threshold, increasing the transmit power does not significantly improve energy efficiency. The proposed MOIBBO-CNN–LSTM consistently maintains the highest SEE, reaching 9.5 Mb/J at the highest power level considered (500 mW), demonstrating its superior energy efficiency. The NSGA-II-CNN–LSTM model remains competitive with an SEE of approximately 9.0 Mb/J, while ViT and DRL follow closely behind. The LSTM model, however, struggles to exceed 7.0 Mb/J even at higher transmit powers.
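The diminishing-return behavior described above can be illustrated with a toy SEE model in which the achievable sum rate grows logarithmically with transmit power while the total consumed power grows linearly. The bandwidth, channel-gain, and fixed-power values below are illustrative assumptions, not the system parameters of this paper.

```python
import numpy as np

def see_vs_power(p_max_mw, bandwidth_hz=5e6, gain_per_mw=0.05, p_fixed_w=2.0):
    """Illustrative SEE (Mb/J) versus maximum AP transmit power (mW).

    Sum rate ~ B*log2(1 + g*p) saturates with power, while consumed power
    (transmit plus fixed circuit/backhaul) grows linearly, so SEE rises
    quickly at low power and then flattens. All parameters are placeholders.
    """
    sum_rate_bps = bandwidth_hz * np.log2(1.0 + gain_per_mw * p_max_mw)
    total_power_w = p_max_mw / 1e3 + p_fixed_w
    return sum_rate_bps / total_power_w / 1e6       # (bit/s)/W -> Mb/J

for p in (50, 100, 200, 350, 500):
    print(f"p_max = {p:3d} mW -> SEE ~ {see_vs_power(p):.2f} Mb/J")
```

Running this toy model shows SEE climbing steeply between 50 mW and 200 mW and then gaining little beyond 350 mW, mirroring the qualitative trend in Figure 8.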
The results depicted in Figure 9 illustrate a downward trend in SEE across all evaluated methods as the number of APs increases. This behavior suggests that while adding more APs enhances SR, the associated power consumption grows at a relatively higher rate, ultimately leading to a net reduction in SEE. Essentially, the benefits of improving SR through additional APs come at the cost of increased energy expenditure, particularly due to the fixed power consumption of the backhaul infrastructure. A crucial implication of these findings is the need for intelligent AP management strategies to mitigate power inefficiencies. One potential solution is the dynamic selection of active APs, where redundant APs are deactivated during periods of lower traffic demand. By strategically putting underutilized APs into sleep mode during off-peak hours, the fixed circuit power consumption can be significantly reduced, leading to improved energy efficiency without compromising system performance. This adaptive AP selection will be explored in our future research. Among the tested models, the proposed MOIBBO-CNN–LSTM consistently outperforms the baseline methods, achieving the highest SEE across all AP configurations. This further highlights the effectiveness of our approach in balancing SR improvements with energy efficiency, making it a compelling solution for optimizing large-scale CF-mMIMO networks.
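A minimal sketch of this trade-off is given below, assuming a sum rate that grows roughly logarithmically with the number of active APs and a power budget that grows linearly with it. All parameter values and the simple sleep-mode rule are illustrative assumptions, not the system model used in this work.

```python
import numpy as np

def see_vs_aps(n_aps, active_fraction=1.0, bandwidth_hz=5e6,
               p_tx_w=0.2, p_backhaul_w=1.0):
    """Illustrative SEE (Mb/J) versus the number of deployed APs.

    Sum rate is assumed to grow roughly logarithmically with the number of
    *active* APs, while consumed power (per-AP transmit plus fixed backhaul
    power) grows linearly with it, so SEE declines as APs are added.
    Sleeping APs contribute neither rate nor power in this toy model.
    """
    m_active = max(1, round(n_aps * active_fraction))
    sum_rate_bps = bandwidth_hz * np.log2(1.0 + 2.0 * m_active)
    total_power_w = m_active * (p_tx_w + p_backhaul_w)
    return sum_rate_bps / total_power_w / 1e6

for m in (10, 20, 40, 80):
    print(f"{m:3d} APs: SEE ~ {see_vs_aps(m):.2f} Mb/J; "
          f"half asleep: ~ {see_vs_aps(m, active_fraction=0.5):.2f} Mb/J")
```

In this toy model, switching half of the APs to sleep mode raises SEE because the fixed backhaul power dominates; the accompanying loss in SR is not captured here and is only indicative.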
5. Discussion
Table 3 presents a comparative analysis of different models in optimizing SEE in CF-mMIMO-based IoT networks. The table evaluates each algorithm based on RMSE, R², MAPE, and the average execution time (in seconds). The results indicate that the proposed MOIBBO-CNN–LSTM model consistently outperforms the baseline approaches, achieving the lowest RMSE (0.08), the highest R² score (0.97), and the lowest MAPE (1.03%), demonstrating its superior predictive performance in accurately modeling SEE trade-offs in CF-mMIMO networks. The comparative analysis highlights the limitations of standalone DL models such as CNN and LSTM, which exhibit significantly higher RMSE values (11.27 and 13.91, respectively) and lower R² scores (0.82 and 0.80), confirming their inability to fully capture the complex relationships between energy efficiency and security constraints. The NSGA-II-CNN–LSTM model, while performing better than conventional DL models, still struggles with an RMSE of 3.27 and a MAPE of 6.96%, indicating that the multi-objective optimization strategy of MOIBBO plays a critical role in improving model accuracy. The ViT and DRL models, which use transformer-based and reinforcement learning techniques, also fail to achieve the same level of accuracy, showing higher error rates and suboptimal SEE predictions compared to the proposed model.
From a computational efficiency perspective, the results demonstrate that MOIBBO-CNN–LSTM not only achieves superior accuracy but also maintains a reasonable execution time of 962 s, outperforming NSGA-II-CNN–LSTM (1241 s) and DRL (1317 s) in terms of computational cost. While the CNN and LSTM models execute faster (652 s and 743 s, respectively), their inferior accuracy makes them impractical for real-world deployment. The ViT model, though positioned between these extremes, still incurs higher computational demands (1012 s) while underperforming in accuracy (RMSE = 6.12, MAPE = 8.32%). The superior performance of MOIBBO-CNN–LSTM can be attributed to its integration of CNN for spatial feature extraction, LSTM for sequential modeling, and MOIBBO for multi-objective optimization, which collectively ensure a favorable trade-off between accuracy, efficiency, and model complexity. Unlike traditional optimization techniques, MOIBBO dynamically adjusts the network's architecture, hyperparameters, and feature selection, leading to faster convergence, improved generalization, and lower prediction errors. These results confirm that MOIBBO-CNN–LSTM is the most effective framework for optimizing SEE in CF-mMIMO-based IoT networks: it provides both high predictive accuracy and efficient computation, making it well-suited for real-time IoT network optimization and secure energy management applications.
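For clarity, the three accuracy metrics reported in Table 3 follow their standard definitions and can be computed as in the short sketch below; the function name is ours and the snippet is not the evaluation code used in this work.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, R^2, and MAPE (%) as used to compare the SEE predictors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    mape = float(100.0 * np.mean(np.abs(err / y_true)))   # assumes no zero targets
    return {"RMSE": rmse, "R2": float(r2), "MAPE_%": mape}
```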
Table 4 presents a comparative analysis of algorithm execution times based on RMSE termination conditions, demonstrating how efficiently each model converges to different levels of prediction accuracy. The table records the run time (in seconds) required for each model to reach four predefined RMSE thresholds, from the loosest (RMSE below 20) to the tightest. These values indicate the rate of convergence of each approach, with lower execution times reflecting higher efficiency in achieving a given accuracy level. Notably, MOIBBO-CNN–LSTM is the only method that successfully reaches the tightest RMSE threshold, requiring just 386 s, while all other models fail to converge to this level within a reasonable time frame. The results indicate that MOIBBO-CNN–LSTM consistently outperforms all baseline models in both convergence speed and final accuracy. For the loosest threshold (RMSE below 20), the proposed model achieves convergence in just 75 s, significantly faster than NSGA-II-CNN–LSTM (201 s), ViT (312 s), and DRL (365 s). As the RMSE threshold becomes more restrictive, MOIBBO-CNN–LSTM maintains its advantage, requiring only 124 s and 263 s for the two intermediate thresholds, while the other models experience a substantial increase in execution time. The NSGA-II-CNN–LSTM model, for example, takes 381 s and 694 s for the same two thresholds, more than double the time required by the proposed model, highlighting the superior optimization strategy of MOIBBO. The inability of ViT, DRL, CNN, and LSTM to reach the tightest threshold further underscores their limitations in predictive accuracy and computational efficiency. CNN and LSTM, despite their relatively fast initial convergence, fail to reach any threshold tighter than the loosest one, indicating their restricted capacity to model the complex SEE relationships in CF-mMIMO-based IoT networks. DRL and ViT, while capable of improving accuracy over time, exhibit significantly longer execution times, with DRL requiring more than 1000 s to reach an intermediate threshold, demonstrating inefficiency in real-time applications. In contrast, the MOIBBO-CNN–LSTM model efficiently balances accuracy and computational cost, making it the most practical and effective solution for optimizing security-aware energy efficiency in CF-mMIMO-based IoT networks. The RMSE values from Table 4 are also presented visually in Figure 10 to facilitate a clearer comparison between the models. As shown in the bar chart, the proposed MOIBBO-CNN–LSTM consistently achieves the lowest error across all training stages, while the baseline models exhibit notably higher RMSE values.
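A minimal sketch of how such per-threshold run times can be collected during training is shown below. The training hooks (`model`, `train_one_epoch`) and the specific threshold values are illustrative assumptions, not the exact termination conditions of Table 4.

```python
import time
import numpy as np

def time_to_rmse_thresholds(model, x_val, y_val, train_one_epoch,
                            thresholds=(20.0, 10.0, 5.0, 1.0), max_epochs=300):
    """Record the elapsed wall-clock seconds at which validation RMSE first
    drops below each threshold; thresholds here are illustrative cut-offs."""
    reached = {}
    start = time.perf_counter()
    for _ in range(max_epochs):
        train_one_epoch(model)
        err = np.asarray(y_val, float) - np.asarray(model.predict(x_val), float)
        rmse = float(np.sqrt(np.mean(err ** 2)))
        for th in thresholds:
            if th not in reached and rmse < th:
                reached[th] = time.perf_counter() - start
        if len(reached) == len(thresholds):   # every threshold met: stop early
            break
    return reached                            # thresholds never met are simply absent
```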
Table 5 presents the RMSE values recorded in different training epochs (50, 100, 200, and 300) for all evaluated models, illustrating their convergence behavior and overall learning efficiency. The results highlight how each model’s prediction accuracy improves as training progresses, with lower RMSE values indicating better performance. The MOIBBO-CNN–LSTM model achieves the lowest RMSE across all epochs, demonstrating its ability to efficiently learn complex relationships in SEE optimization. By epoch 300, the proposed model attains an RMSE of just 0.08, significantly outperforming all other methods. The NSGA-II-CNN–LSTM model, although benefiting from evolutionary optimization, converges more slowly and stabilizes at a higher RMSE of 3.27 after 300 epochs. Although this approach improves over standalone DL methods, its performance remains inferior to MOIBBO-CNN–LSTM, reinforcing the advantage of MOIBBO’s multi-objective optimization strategy. The ViT and DRL models, despite incorporating advanced architectures, struggle to reach competitive error rates, with RMSE values of 6.12 and 9.39 at epoch 300, respectively. These results suggest that while transformer-based and reinforcement learning approaches capture certain patterns, they are less effective for SEE prediction in CF-mMIMO-based IoT networks compared to the proposed hybrid model. The CNN and LSTM models show the weakest performance, with RMSE values of 11.27 and 13.91 at epoch 300, confirming their limitations in handling SEE trade-offs when used independently. Their slow convergence and high final error rates indicate that neither spatial feature extraction (CNN) nor sequential modeling (LSTM) alone is sufficient for this optimization task. The results reinforce that the MOIBBO-CNN–LSTM model achieves superior predictive accuracy through its integration of CNN for spatial features, LSTM for sequential dependencies, and MOIBBO for hyperparameter optimization, leading to a faster and more effective learning process compared to other state-of-the-art methods.