1. Introduction
Innovations are increasingly being adopted in urban settings as more importance is placed on smart technologies in infrastructure design. Intelligent Transportation Systems (ITS), smart power grids, and multi-sensor networks are components of this change and, combined, they are referred to as smart cities [1]. These innovations hinge on the processing and sharing of data, which is central to the effective functioning of city services and thus to sustainably improving the populace's quality of life. Moreover, technological development has given IoT technology a significant role in smart cities' functionality and performance, enabling better connectivity, resources, and services. Under these conditions, Named Data Networking (NDN) offers an approach that can improve the performance of data transmission and data access in the IoT [2]. As opposed to IP-based networking, where attention is paid to the location where content is hosted, NDN emphasizes the content itself, which makes it more suitable for IoT networks characterized by their dynamic and data-oriented nature [3,4]. As the smart city is shaped by the interconnected things around it, the problem of how to store and manage the data gathered by such things has become more crucial [5].
Traditional methodologies significantly limit the speed and sustainability of smart city use cases. This paper outlines a novel caching approach for federated learning-based NDN IoT networks that leverages robust learning algorithms to determine data popularity and proactively update cache placement. Federated learning is a distributed learning paradigm in which edge nodes participate in training a global model while no raw data are exchanged. This technique helps to maintain data privacy and cuts communication overhead; it is therefore suitable for a smart city ecosystem that values data privacy and minimal resource utilization. In the proposed model, federated learning predicts the popularity of data items using local data and aggregates the individual predictions to make caching decisions. This approach not only improves the speed of retrieving information from large data sets but also reduces energy usage, which in turn supports the sustainability of smart cities. Thus, NDN is considered a promising approach to managing data challenges in smart cities [6,7]. NDN does not place as much emphasis on host addresses as traditional IP-based networking, but rather on the content, which enhances data delivery and dissemination. This is especially true for IoT applications, where data-oriented communication results in lower latency and better network performance [8]. However, the billions of interconnected devices in a smart city create a big data management problem for storage and energy consumption [9]. To manage these data, effective caching strategies are required. Conventional centralized caching strategies are not optimal because they are highly power intensive and cause delays, which is unrealistic for the dynamic and distributed characteristics of smart city applications [10,11]. To meet these challenges, we propose an energy-efficient caching scheme for IoT NDN with predictive analysis based on federated learning in smart cities. Federated learning distributes the model training process among edge nodes while keeping the data local, without sending them to a centralized location, so that privacy is preserved and inter-node communication is minimized [12]. By predicting how popular each data item will be, its placement in the cache is adjusted to minimize data retrieval time and energy use [13].
The proposed model incorporates several key components: small smart devices such as sensors and actuators, smart gateways and routers, and a central server. Each edge node contains a local machine learning model that predicts future data requests in its part of the network, and the results of these computations are periodically transferred to a central server for global fine-tuning of the model. This federated approach guarantees that the caching strategy remains dynamic and can adjust to fluctuations in data and the network environment. Our goal is therefore to develop a model that combines these predictive models with optimization for energy-efficient caching. The proposed model considers both retrieval delay and energy consumption to improve the efficiency of smart city networks. The key contributions are given below.
- We present the design of the proposed caching model, the federated learning framework, and the formulation of the system's optimization problem.
- We validate our method through simulations that demonstrate a significant increase in data retrieval efficiency and energy savings compared to conventional methods.
- The proposed caching strategy is evaluated in a simulation environment against a benchmark strategy in terms of the cache hit ratio, energy consumption, and content retrieval delay.
- This research provides a novel caching strategy as part of the larger movement toward making cities more intelligent and resource efficient through the further development of data management tools.
The rest of the paper is organized as follows. Section 2 reviews related work, describing existing caching strategies and analyzing their drawbacks. Section 3 explores the challenges of caching in a smart city. Section 4 details the federated learning-based caching mechanism and its architecture, algorithms, and data models. Section 5 describes the simulation environment and the procedure used to benchmark the caching approach. Section 6 concludes the paper and discusses future research avenues.
2. Related Studies
This section presents several caching strategies and their impacts on NDN-based IoT networks. Among the different caching techniques, probabilistic, centrality-based, content- and node-based, and popularity-based caching schemes have been developed to address different criteria for enhancing caching. For instance, Cache Everything Everywhere [14] aims for high availability and consequently replicates content, which consumes many resources. Likewise, besides adding validity checks to the content and optimizing the filtering procedure, CCS and TC add new pressure on the clients and the storage systems [15]. Caching techniques based on a node's centrality, such as betweenness centrality, aim to reduce response time by caching the contents most often requested by users [4,16]. However, such strategies are disadvantageous in that they cause congestion and increase the path stretch ratios in a network. Another caching policy is probabilistic caching, in which the caching probability is adjusted based on caching parameters and request rates, as in ProbCache and pCASTING [17]. These approaches aim to reduce energy and latency and focus on cache placement. Moreover, several studies have applied the FL concept of [18,19,20] to edge computing, the IoT, vehicular networks [21,22,23], and blockchain [24].
The authors in [25] define Probabilistic Cache (ProbCache), which tries to cache data near consumers. A probability distribution is employed to store records, where content is cached with a probability inversely proportional to the distance between the consumer and the producer. Nevertheless, ProbCache distributes resources unfairly among nodes, involves a high computation cost, and requires many parameters to be set and tuned before it can be applied. In [15], the betweenness centrality (Btw) strategy was introduced; it caches data once on the reverse path, at the node with the highest betweenness centrality. This strategy assesses the centrality of nodes on the paths between pairs of nodes, but computing betweenness centrality is complex for resource-scarce nodes. The edge caching strategy described in [26] is specific to tree topologies; the idea is to cache content at the leaves. This leads to high duplication in neighboring leaves; however, similar to PoolCache [27], it offloads the core network and decreases its caching overhead. These properties serve as general requirements against which new caching schemes can be measured. In [28], consumer caching stores data on the routers connected to consumers; this resembles edge caching on a tree topology, but it behaves like Leave Copy Everywhere (LCE) [2] when every consumer is connected to all the routers on the reverse path. Its performance depends on the network architecture, consumer distribution, and content popularity. The in-network caching scheme for inter-cache cooperation with Content Space Partitioning and Hash Routing (CPHR) in [29] lets a dominating node partition the content space and assign these partitions to caches so as to maximize the hit ratio. However, it increases propagation latency, necessitates additional content tables, and assumes a predetermined content space, so it is not compatible with dynamic IoT networks. The HCC strategy suggested in [30] splits the contents to be cached into two layers and uses a weighted clustering algorithm (WCA) for the hierarchical cluster solution. This strategy incurs a large communication overhead because of the frequent exchange of information and does not work properly when cluster heads are unavailable. It also assumes a static network, which is unsuitable for IoT cases. A study by Yahui et al. [31], formulated for NDN-based IoT networks, examines the use of caching, with benefits such as receiver and sender decoupling, minimized unnecessary data delivery, and improved Internet scalability. The paper expounds on different caching ideas and difficulties in the NDN-IoT context while stressing the significance of the NDN and IoT architectures and current caching approaches. However, the study also revealed drawbacks, such as poor resource utilization, computational bottlenecks, and the difficulty of tuning discretionary caching parameters to suit different IoT scenarios.
Energy consumption is a key factor, but it has received limited attention from the NDN-IoT caching community. Most current caching strategies focus more on parameters such as latency and the cache hit ratio than on the energy aspect, which is critical for the long-term functionality of IoT networks. In this regard, the proposed caching strategy incorporates an optimal method of selecting contents to cache and stresses energy efficiency. It also helps reduce data retrieval latency, since high-demand data are placed closer to consumers, thus minimizing the energy consumption of IoT devices. Such a balance between performance and power saving allows the network to handle its data traffic while maximizing the longevity of battery-powered IoT devices used throughout smart city applications, thus increasing the overall efficiency and sustainability of smart city services.
3. Problem Statement
The ever-advancing technology and increased adoption of IoT devices in smart cities have led to a data explosion, necessitating optimal and effective data management and retrieval to improve service delivery [32]. NDN uses a content-centric approach that can significantly improve data delivery in the IoT. Nevertheless, conventional NDN caching paradigms do not necessarily meet the latency and energy challenges that are crucial in smart city scenarios, where IoT devices are frequently resource constrained and geographically dispersed [33]. Traditional caching techniques in NDN may not be very effective in addressing the temporal variability of data requests, resulting in cache misses whereby devices are forced to fetch data from distant servers, causing high request response times. This delay reduces the efficacy of latency-critical applications such as smart traffic control, smart emergency response, and smart health services [34]. Also, IoT devices in smart cities are often battery powered, and operations such as continuously retrieving data or fetching data from remote servers may drain the batteries [35]. High energy consumption shortens the life of IoT devices, increases maintenance costs, and disrupts smart city services. The number of IoT devices in smart cities increases day by day, so caching strategies play a vital role in handling them efficiently. Traditional centralized caching strategies may become bottlenecks, resulting in inefficiency and high power consumption. When scaled up, inefficiencies in some sub-systems can lead to poor performance, high latency, and high power consumption in smart city applications. Smart city environments are indeed rather dynamic, and the amount of data generated in a specific period, at a particular place, and in a given context may differ significantly [32]. Static caching mechanisms do not adapt to these changes and thus offer rather low efficiency.
Caching shapes network usage when devices request data that have been cached, and it can create congestion when many devices access the cached data at the same time during rush hours [36,37]. Delay and energy waste result from overload, because congestion causes repeated transmission and processing of data in the network. If caching decisions fail to respond to dynamic data requirements, there will be latencies and inadequate network utilization, affecting the quality of smart city services [38,39]. Sub-optimal caching policies can also result in high utilization of network bandwidth, especially when information has to be copied from remote servers [40]. High bandwidth utilization causes data transmission delays, network congestion, high latency, and high operational costs, all of which affect the efficiency of smart city networks [41]. The two objectives of a high cache hit rate and low energy consumption often conflict, as a high hit rate can be attained at the cost of more energy consumption [31]. The tension between these objectives is difficult to manage: a cache optimized for high performance often consumes more power than necessary, which can negatively influence service provision in IoT networks and the longevity of IoT devices [42].
4. Proposed Model
The proposed energy-efficient caching model based on federated learning for NDN is developed to solve the issues above. Federated learning is used to train a model at each node independently so that the cache placement of data items can be predicted and cache retrieval latency minimized. Cached contents are refreshed dynamically through frequent cache updates based on data access patterns and the predicted usage patterns of the smart city. Introducing the energy consumption rate into the caching decision and optimizing cache hits for energy conservation increases the battery life of IoT devices and retains efficiency as the number of IoT devices grows across the numerous domains of a smart city. To this end, the proposed model improves various aspects of the caching architecture for NDN-based IoT in smart cities and makes smart city services more effective. When designing an energy-efficient caching model for NDN-based IoT networks in a smart city using federated learning, we follow a set of structured functional activities for the data. The system definition comes first; then the nodes, contents, and interactions within the network are determined. The set of IoT devices, including sensors and actuators, is denoted by $D$, and the set of network nodes is represented by $N$. The contents in the network are represented as $C$, and the packets of prospective requests in the network are requests for these contents. The set of requests generated for those contents is represented as $R$. For content selection, the next step is to form a federated learning process for the prediction of content popularity. From the request packets, each edge node observes the local request rate $\lambda_{n,c}(t)$ for each node $n \in N$ and content $c \in C$. Each node $n$ uses its local federated learning model to estimate the future request rate $\hat{\lambda}_{n,c}(t)$ for each content $c$. These local models are then combined at the central server into a global model, which is sent back to the edges to be refined.
Table 1 shows the symbols used in the model and their corresponding descriptions.
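To make the system model concrete, the device, node, content, and request sets described above can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation; the class name, capacity values, and request tuples are our assumptions, with $D$ read as the devices generating requests, $N$ the nodes, $C$ the contents, and $R$ the request set.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the system model: devices (D) generate requests (R)
# for contents (C), which are observed by network nodes (N).

@dataclass
class Node:
    node_id: int
    capacity: int                                      # cache capacity of the node
    cache: set = field(default_factory=set)            # cache state: content ids held now
    request_rate: dict = field(default_factory=dict)   # observed requests per content

contents = [f"c{i}" for i in range(5)]                   # the content set C
nodes = [Node(node_id=i, capacity=2) for i in range(3)]  # the node set N

# R: each request is (node id, content id, timestamp)
requests = [(0, "c1", 0.0), (1, "c1", 0.5), (2, "c3", 1.0), (0, "c1", 1.5)]
for n_id, c, _t in requests:
    counts = nodes[n_id].request_rate
    counts[c] = counts.get(c, 0) + 1                   # tally per-content demand
```

Each node thus accumulates the local demand statistics from which its request rates are later derived.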
To identify a cacheable node, several factors need to be considered, such as the cache state and the capacity of a network node. Algorithm 1 shows how the FL-based model is trained to identify the future demands of the end users. The cache state at node $n$ is defined as follows:
Algorithm 1: Federated Learning Model for Popularity Prediction

1. Data Collection. Input: edge nodes $N$; contents $C$. Output: local request rates $\lambda_{n,c}(t)$. Procedure: Find the Request Rate
- (a) For each edge node $n \in N$:
- (b) Monitor the incoming Interest packets for each content $c \in C$.
- (c) Count the requests for $c$ observed in the current time window.
- (d) Compute the local request rate $\lambda_{n,c}(t)$.
- (e) Store the rates as the node's local training data.
2. Local Model Training. Input: local request rates $\lambda_{n,c}(t)$. Output: local model parameters $w_n$. Procedure: Find the Local ML model
- (a) Initialize the local model $w_n$ from the current global model.
- (b) Train $w_n$ on the locally observed request rates.
- (c) Predict the future request rate $\hat{\lambda}_{n,c}(t)$ for each content $c$.
- (d) Validate the local predictions against recent requests.
- (e) Send the updated parameters $w_n$ (not the raw data) to the central server.
3. Federated Learning Aggregation. Input: local models $w_n$, $n \in N$. Output: global model $w$. Procedure: Find the Global model and Distribute
- (a) The central server collects the local model updates from all edge nodes.
- (b) Aggregate the local models into a global model $w$.
- (c) Distribute the global model back to the edge nodes.
- (d) Each node refines the received global model with its local data.
- (e) Repeat from Step 2 at the next update interval.
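The three phases of Algorithm 1 can be sketched as follows. This is a minimal illustration assuming the simplest possible local model, namely each node's vector of observed request frequencies, with FedAvg-style averaging at the server; the paper does not commit to a specific model family, and the function names are our assumptions.

```python
# Sketch of Algorithm 1 with a deliberately simple local model: each node's
# "model" is its vector of observed request frequencies, and the server
# aggregates by averaging the local models (a FedAvg-style step).

def local_training(local_requests, contents):
    """Phases 1-2: estimate a per-content request frequency from a node's own log."""
    total = max(len(local_requests), 1)
    return {c: local_requests.count(c) / total for c in contents}

def federated_aggregation(local_models):
    """Phase 3: average the local predictions into a global popularity model."""
    contents = local_models[0].keys()
    k = len(local_models)
    return {c: sum(m[c] for m in local_models) / k for c in contents}

contents = ["a", "b", "c"]
logs = [["a", "a", "b"], ["a", "c", "c"], ["b", "b", "a"]]  # one request log per node
local_models = [local_training(log, contents) for log in logs]
global_model = federated_aggregation(local_models)
# Each node would then refine the received global model with its local data.
```

Note that only the model parameters (here, the frequency vectors) travel to the server; the raw request logs stay on the nodes, which is the privacy property the paper relies on.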
$S_n(t)$ refers to the collection of content available at node $n$ at a particular time $t$. Caching means storing content, and this notation captures the fact that the contents stored in the cache may change over time. Each network node $n$ has a predefined cache capacity, denoted by $B_n$. This capacity constrains the amount of content that can be stored at node $n$. Therefore, the number of contents that can be cached at a node is bounded by its cache capacity, as given by the following Equation (1):

$$|S_n(t)| \le B_n \quad (1)$$

It shows that the number of cached contents $|S_n(t)|$ must not exceed the cache capacity $B_n$ of node $n$. This constraint prevents cache overflow and plays an important role in controlling the storage capacity. The placement of content in the cache is controlled by a decision variable $x_{n,c}(t)$, which determines whether a given content $c$ is cached at node $n$ at time $t$. A value of $x_{n,c}(t) = 1$ represents the presence of content $c$ at node $n$; otherwise, the value is 0. This binary variable facilitates managing and regulating the placement of content at the various nodes in the network. Thus, the amount of content cached at a node can be constrained by the following Equations (2) and (3):

$$\sum_{c \in C} x_{n,c}(t) \le B_n \quad (2)$$

$$x_{n,c}(t) \in \{0, 1\} \quad (3)$$
It shows that the sum of the decision variables $x_{n,c}(t)$ over all contents $c$ cannot exceed the cache capacity. This constraint limits the number of contents in the cache to the node's capacity, making the caching strategy feasible. To evaluate the efficiency of a caching strategy, we consider two key factors: content retrieval latency and energy consumption. These characteristics determine the degree of cache utility and the power consumption rate. To determine the latency of retrieving content $c$ by node $n$ at time $t$, two cases must be distinguished: the content is either locally cached or has to be fetched from a remote server. The latency $L_{n,c}(t)$ is defined by the following Equation (4):

$$L_{n,c}(t) = x_{n,c}(t) \, L_{local} + \left(1 - x_{n,c}(t)\right)\left(L_{remote} + L_{net}\right) \quad (4)$$

If content $c$ is cached at node $n$ (i.e., $x_{n,c}(t) = 1$), then the retrieval latency for content $c$ is $L_{local}$. This is the time taken to obtain the content from the local cache, which is normally very short. Conversely, if content $c$ is not cached at node $n$ (i.e., $x_{n,c}(t) = 0$), the latency consists of two components: the time $L_{remote}$ for the content to transfer from the remote server and the network transfer time $L_{net}$. Therefore, the total latency is the sum of the remote latency and the network latency, $L_{remote} + L_{net}$. By assessing the data retrieval latency attainable through the caching strategy, the time usually taken to obtain frequently required content can be minimized. This improves the efficiency of the network, the responsiveness of services, and the overall network performance. Analogously, the energy expended in accessing content $c$ from node $n$ at time $t$ depends on whether the content is held at the node itself or has to be downloaded from a server. The energy consumption $E_{n,c}(t)$ is defined by the following Equation (5):

$$E_{n,c}(t) = x_{n,c}(t) \, E_{local} + \left(1 - x_{n,c}(t)\right)\left(E_{remote} + E_{net}\right) \quad (5)$$
Therefore, if content $c$ is cached at node $n$ (i.e., $x_{n,c}(t) = 1$), the energy consumption is $E_{local}$. This represents the energy the system uses to retrieve the data from the local cache, which is mostly small. Conversely, when content $c$ is not cached at node $n$ (i.e., $x_{n,c}(t) = 0$), the energy consumption comprises two factors: the energy $E_{remote}$ for fetching the data from a remote server and the network communication energy $E_{net}$. Modeling energy consumption in this way makes it possible to design caching strategies that are efficient in terms of energy use. Energy consumption remains a main concern in many applications, in particular IoT and mobile networks, since they are always restricted by energy and battery constraints. Hence, through optimal use of energy, the network can be made more efficient and cheaper to run.

Considering both latency and energy consumption therefore provides a balanced approach to caching. Some strategies aim primarily at minimal latency, while others aim at conserving energy; the trade-off between the two costs can be tuned to fit the specifics of the application at hand. This step helps in the management of resources by avoiding situations where the caching technique places too much load on the network in terms of latency or energy usage. It is also useful for managing and scheduling the physical architecture to accommodate different loads and conditions. Algorithm 3 shows the mechanism for identifying energy consumption at the local and global levels.
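Under illustrative cost constants, the latency and energy definitions in Equations (4) and (5) reduce to a simple branch on the binary placement variable. The following sketch shows this; the numeric values are our assumptions, not values from the paper.

```python
# Per-request cost model of Equations (4)-(5): a local cache hit pays the
# local cost, a miss pays the remote plus network cost. Constants are
# illustrative (latency in ms, energy in mJ).

L_LOCAL, L_REMOTE, L_NET = 1.0, 20.0, 5.0   # latency terms
E_LOCAL, E_REMOTE, E_NET = 0.1, 2.0, 0.5    # energy terms

def latency(cached: bool) -> float:
    """L_{n,c}(t): Eq. (4), with `cached` playing the role of x_{n,c}(t)."""
    return L_LOCAL if cached else L_REMOTE + L_NET

def energy(cached: bool) -> float:
    """E_{n,c}(t): Eq. (5)."""
    return E_LOCAL if cached else E_REMOTE + E_NET
```

With these example constants, a miss costs 25x the latency and 25x the energy of a hit, which is why raising the hit ratio on popular content dominates both objectives.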
The objective of the caching model is to minimize the weighted sum of the average data retrieval latency and the energy consumption. This reduces latency while at the same time minimizing energy consumption, which is a measure of efficiency. The objective function is defined as follows:

$$J = \sum_{n \in N} \sum_{c \in C} \lambda_{n,c}(t) \left[\alpha \, L_{n,c}(t) + \beta \, E_{n,c}(t)\right] \quad (6)$$

Here, $\lambda_{n,c}(t)$ represents the request rate, i.e., the probability that node $n$ asks for content $c$ at time $t$; it weights the latency and energy terms by the frequency of requests for each content. $\alpha$ and $\beta$ are the latency weighting factor and the energy consumption weighting factor, respectively. They define how much of the latency trade-off is preferred over the energy trade-off. By unifying latency and energy consumption into one function, the optimization can be controlled easily to find a balance between the two factors; the weighting factors $\alpha$ and $\beta$ adjust their respective shares depending on the requirements of the given network. Moreover, the objective function preserves the correlation between the two vital factors, delay and energy, making the optimization plan more inclusive. Different applications and network scenarios have different priorities: for instance, a real-time application may require low latency as its key characteristic, while an IoT network comprising battery-powered devices may require low energy utilization. The weighting factors $\alpha$ and $\beta$ accommodate these different priorities in the caching strategy.
The objective function helps in the optimum utilization of network resources by minimizing both latency and energy consumption. It results in longer battery life for the devices and low operational costs, in addition to a general improvement in network operation. Quality of service is used to measure the improvement in data access latency, which increases user satisfaction, while effective management of energy usage enhances the durability of the network devices so that they stay productive for a longer duration, leading to higher network availability. Hence, the combined objective function is the single quantity that considers all caching tiers while accounting for data retrieval latency and energy consumption at the same time. As it provides a unified view of the overall caching system and its requirements, it improves efficacy, resource utilization, flexibility, and the end-user experience in caching networks. Algorithm 2 shows the cache state of the network nodes and the cache placement.
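The capacity constraint that Algorithm 2 must respect can be expressed as a simple feasibility check over the binary placement variables. This is a minimal sketch; the function and variable names are our assumptions.

```python
# Feasibility check for the capacity constraints: for every node n, the sum
# of the binary placement variables x[n][c] must not exceed its capacity B_n.

def is_feasible(x, capacity):
    """x: {node: {content: 0 or 1}}; capacity: {node: B_n}."""
    return all(sum(x[n].values()) <= capacity[n] for n in x)

# Node 1 tries to cache two contents with capacity 1, so x_bad is infeasible.
x_bad = {0: {"a": 1, "b": 1, "c": 0}, 1: {"a": 0, "b": 1, "c": 1}}
x_ok  = {0: {"a": 1, "b": 1, "c": 0}, 1: {"a": 0, "b": 1, "c": 0}}
capacity = {0: 2, 1: 1}
```

Any placement produced by the cache-update step should pass this check before it is applied.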
Algorithm 2: Cache Capacity and Placement

1. Cache Initialization. Input: devices $D$; nodes $N$; central server; contents $C$. Output: initial cache state $S_n(t)$. Procedure:
- (a) For each network node $n \in N$:
- (b) Set the cache capacity $B_n$.
- (c) Initialize the cache state $S_n(t) \leftarrow \emptyset$. // Initialize Cache State
- (d) Set $x_{n,c}(t) \leftarrow 0$ for all contents $c \in C$.
2. Cache Placement. Input: predicted request rates $\hat{\lambda}_{n,c}(t)$; capacities $B_n$. Output: cache placement $x_{n,c}(t)$. Procedure:
- (a) For each node $n \in N$:
- (b) Rank the contents $c \in C$ by their predicted request rates.
- (c) While $|S_n(t)| < B_n$:
- (d) Take the next-ranked content $c$.
- (e) Add $c$ to $S_n(t)$ and set $x_{n,c}(t) \leftarrow 1$.
- (f) Verify the capacity constraint $\sum_{c \in C} x_{n,c}(t) \le B_n$.
- (g) Update the cache state for the next time slot.
Algorithm 3: Energy-Efficient Objective Function and Combined Objective Function

1. Energy-Efficient Objective Function. Input: devices $D$; nodes $N$; central server; contents $C$; cache placement $x_{n,c}(t)$. Output: energy consumption $E_{n,c}(t)$. Procedure: Energy-Efficiency
- (a) For each node $n \in N$ and content $c \in C$:
- (b) If $x_{n,c}(t) = 1$ (local cache hit):
- (c) Set $E_{n,c}(t) \leftarrow E_{local}$.
- (d) Otherwise (content fetched remotely):
- (e) Set $E_{n,c}(t) \leftarrow E_{remote} + E_{net}$.
- (f) Accumulate the local and global energy consumption over all nodes and contents.
2. Combined Objective Function. Input: $L_{n,c}(t)$, $E_{n,c}(t)$, $\lambda_{n,c}(t)$, weights $\alpha$ and $\beta$. Output: objective value $J$. Procedure: Objective function
- (a) Compute $J = \sum_{n \in N} \sum_{c \in C} \lambda_{n,c}(t) \left[\alpha \, L_{n,c}(t) + \beta \, E_{n,c}(t)\right]$.
- (b) end Algorithm
4.1. Optimization Problem
To identify the optimal caching strategy, we pose it as an optimization problem. The aim is to minimize the objective function that relates the content access time and the energy consumption. The optimization problem is defined as follows:

$$\min_{x_{n,c}(t)} \; \sum_{n \in N} \sum_{c \in C} \lambda_{n,c}(t) \left[\alpha \, L_{n,c}(t) + \beta \, E_{n,c}(t)\right]$$
$$\text{s.t.} \quad \sum_{c \in C} x_{n,c}(t) \le B_n \;\; \forall n \in N, \qquad x_{n,c}(t) \in \{0, 1\}$$

The objective is to minimize the energy and latency; the cache capacity constraint guarantees that the number of contents stored in the cache of node $n$ does not exceed its cache limit $B_n$, while the binary constraint guarantees that the decision variable $x_{n,c}(t)$ takes binary values indicating whether content $c$ is cached at node $n$. Posing the problem as an optimization problem establishes a well-structured set of steps that leads to the discovery of the optimal caching strategy. It provides a logical framework through which the problem can be broken down and solved. The objective function contributes to a reduction in both delay and energy, which in turn enhances network resource utilization; this eliminates the chances of the network being sluggish or wasting energy. The cache capacity constraint guarantees that the caching strategy is realistic and does not exceed any node's caching capacity; this prevents cache congestion and ensures that the network's operations are not disrupted. The optimization framework is flexible enough to adapt to network conditions, request rates, and node capacities, so the caching strategy remains effective over time. The weighting factors $\alpha$ and $\beta$ in the objective function let the decision maker control the trade-off between delay and energy costs. This flexibility means the caching strategy can be fine tuned in accordance with the needs of the particular network.
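To see how the objective discriminates between candidate placements, the following sketch evaluates the weighted latency-energy sum for a given cache placement; all constants, weights, and the function name are illustrative assumptions.

```python
# Evaluating the combined objective for a candidate placement: the weighted
# sum of latency and energy over all nodes and contents, weighted by the
# request rates. Cost constants and alpha/beta are illustrative.

ALPHA, BETA = 0.5, 0.5
L_LOCAL, L_MISS = 1.0, 25.0   # latency: hit vs. miss (remote + network)
E_LOCAL, E_MISS = 0.1, 2.5    # energy: hit vs. miss

def objective(rates, placement):
    """rates[n][c] = request rate of content c at node n;
    placement[n] = set of contents cached at node n."""
    total = 0.0
    for n, node_rates in rates.items():
        for c, lam in node_rates.items():
            hit = c in placement[n]
            lat = L_LOCAL if hit else L_MISS
            en = E_LOCAL if hit else E_MISS
            total += lam * (ALPHA * lat + BETA * en)
    return total

rates = {0: {"a": 0.7, "b": 0.3}}
# With one cache slot, caching the more popular content "a" gives a lower
# objective value than caching "b".
```

Because the per-node sums are independent once the rates are fixed, caching each node's most-requested contents up to its capacity minimizes this objective, which is exactly the popularity-driven placement developed in the next subsection.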
4.2. Caching Based on Popularity Prediction
The caching strategy based on popularity prediction aims at determining the best positions for content by targeting the most requested content. This approach uses anticipated future requests to keep the cache as relevant as possible and to reduce retrieval time and energy consumption. Each edge node contains a local machine learning model, one part of the federated learning system, to predict the future request rate $\hat{\lambda}_{n,c}(t)$ for each content $c$. This request rate gives the probability, or frequency, with which a particular content $c$ associated with node $n$ will be accessed at time $t$ by the IoT devices connected to that node.
To obtain the overall popularity of each content $c$ in the entire network, the predicted request rates from all nodes are accumulated. The aggregated predicted popularity $P_c(t)$ for a given content $c$ is calculated as follows:

$$P_c(t) = \sum_{n \in N} \hat{\lambda}_{n,c}(t) \quad (7)$$

This sum gives the total demand for content $c$ across all the network nodes, providing the global popularity of the given content. Once the popularities of the available contents are identified, the contents are sorted in descending order of their popularity values. This ranking indicates which contents are likely to be required most often in the network. Based on the ranked list, each edge node caches the top $k$ contents, where $k$ is determined by the node's cache capacity $B_n$. This approach guarantees that content with a higher access frequency is cached; thus, the number of remote accesses is minimized, and the decision variable $x_{n,c}(t)$ is updated accordingly with binary values (0, 1). Therefore, by caching frequently required contents, the probability that the requested content is available in the local cache is enhanced. This greatly reduces the mean delay in accessing the required content, as frequently accessed content is actively cached and available for quick access rather than being obtained from the remote server. Storing a large share of frequently accessed content locally also minimizes the energy used in content retrieval. Algorithm 4 describes the method used to find the popularity of transmitted contents.
Algorithm 4: Predicted Popularity and Cache Update

1: Find the popularity
- (a) Collect the predicted request rates $\hat{\lambda}_{n,c}(t)$ from all nodes $n \in N$.
- (b) For each content $c \in C$:
- (c) Compute the aggregated popularity $P_c(t) = \sum_{n \in N} \hat{\lambda}_{n,c}(t)$. // Equation (7)
- (d) Sort the contents in descending order of $P_c(t)$.
2: Update Cache based on popularity
- (a) For each node $n \in N$, select the top $k$ contents, where $k$ is given by the cache capacity $B_n$.
- (b) Evict cached contents that are no longer among the top $k$.
- (c) Set $x_{n,c}(t) \leftarrow 1$ for each newly cached content $c$.
- (d) Set $x_{n,c}(t) \leftarrow 0$ for all other contents.
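The aggregation and cache-update steps of Algorithm 4 can be sketched as follows; this is a minimal illustration, and the function names and rate values are our assumptions.

```python
# Popularity-driven cache update: aggregate the per-node predicted rates into
# a global popularity score per content (Eq. (7)), rank the contents, and
# keep the top-k at each node.

def aggregate_popularity(predicted):
    """predicted[n][c] = predicted request rate of content c at node n.
    Returns the global popularity P_c(t) per content."""
    pop = {}
    for node_pred in predicted.values():
        for c, rate in node_pred.items():
            pop[c] = pop.get(c, 0.0) + rate
    return pop

def update_cache(popularity, capacity):
    """Return the new cache state: the top-capacity contents by popularity."""
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    return set(ranked[:capacity])

predicted = {0: {"a": 0.5, "b": 0.2}, 1: {"a": 0.1, "b": 0.6, "c": 0.4}}
popularity = aggregate_popularity(predicted)  # roughly a: 0.6, b: 0.8, c: 0.4
cache = update_cache(popularity, capacity=2)  # the two most popular contents
```

In the full system this update would run at every refresh interval, after the predicted rates have been revised by the federated learning round.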
The predicted request rate is thus updated dynamically at each node using the popularity ranks of the available contents. The available cache capacity is always used effectively, since the contents are ranked according to their expected popularity and only content likely to be frequently accessed is cached. This makes the use of the cache very effective by ensuring that no cache space is wasted on rarely requested content. Moreover, users can quickly access the content they look up most often, enhancing the user experience, especially in application areas with frequent content requests, such as smart city applications. The strategy scales to large networks with numerous edge nodes and contents: each node manages its own cache, while the decision on which content to store is made from the aggregated popularity values computed across the nodes. Using the predicted request rates, the cache contents are updated more effectively. The strategy prioritizes contents in high demand, which in turn improves the response rate, lowers power usage, adapts to changing request patterns, and optimizes cache space, ultimately improving the network performance.
6. Conclusions
In this paper, we have designed and analyzed a new caching strategy based on federated learning for NDN in the IoT-enabled smart city. The recommended strategy provides an energy-efficient and optimal method to locate and cache the contents that are most commonly reused. It is meant to optimize the effectiveness of the overall network, decrease the time it takes to access content, and elevate the cache hit probability while minimizing energy consumption. The proposed caching strategy is compared in terms of the cache hit ratio, energy consumption, and content retrieval delay with the benchmark strategies CEE, SCS, and EACP to demonstrate its efficiency. The outcomes show that the proposed strategy achieves a higher cache hit ratio, meaning the chance of finding the requested content in the cache is higher, so less data are fetched from the source. It also reduces, to a very significant degree, the time it takes to obtain high-demand content, so the speed of the network is equally enhanced. Additionally, the energy results reveal that FLEEC minimizes energy utilization, prolonging the longevity of the devices in the smart city environment and therefore cutting the cost of operation. Since energy is used efficiently, the strategy is also relevant to the concept of sustainable development, satisfactorily enhancing the resilience of the IoT devices in the network. Therefore, the proposed caching strategy for NDN in the IoT using federated learning constitutes a viable solution for smart city applications.

Compared to the benchmark strategies, it achieves higher cache hit ratios, lower content acquisition delays, and less energy consumption. This work provides direction and a reference for subsequent research and development of intelligent caching strategies for building strong smart city networks.