1. Introduction
The Internet of Things (IoT) is a stimulating and revolutionary ecosystem that is changing the way we live, think, and work. In such a smart ecosystem, intelligent objects or things perceive, learn, communicate, and interact effectively to respond to contingencies, issues, and challenges. Many IoT applications already exist, such as e-healthcare, smart cities, and e-education [
1]. The recent proliferation of innovative emerging IoT applications and media streaming has led to an unprecedented growth in the amount of generated traffic [
2]. Cisco has forecast a global data traffic growth at a rate of 46% per year that would yield around 77.5 exabytes of traffic per month in 2023 [
3]. The majority of connected devices are constrained objects, and the majority of traffic concerns the dissemination and retrieval of content. Video content generated from media entertainment streaming applications and IoT devices forms up to 82% of the current total Internet traffic [
4]. As a result, the current Internet system architecture, with its heavy TCP/IP protocol stack and its host-centric operating model, is increasingly ill-suited to such content-dominated traffic. A new vision of the Internet, centered instead on the data content, was proposed a decade ago by Van Jacobson and called Information-Centric Networking (ICN) [
5]. Since then, several ICN architectures have been proposed; Named Data Networking, however, has gained momentum and is expected to become a key technology for data dissemination in IoT [
6,
7,
8]. NDN is a receiver-driven and connection-less communication model providing easy and scalable data access, security, energy efficiency, and mobility support [
9,
10]. NDN’s features make it also a viable solution to satisfy the peculiarities of IoT [
11], and to better serve web content distribution [
12] and emerging video services [
13]. NDN uses content naming and name-based routing, which eliminate the need for IP addressing of the things in the network. This has led to the emergence of NDN-based IoT among the research community [
7,
14,
15,
16,
17,
18]. For instance, Amadeo et al. in [
16] provided examples of implementing IoT over NDN, including building management systems, smart home applications, and an access control solution to secure devices in an IoT environment. Meddeb et al. in [
7] compared different ICN projects to investigate and evaluate the suitability of these approaches to IoT requirements. Based on a comprehensive qualitative analysis, they showed the suitability of the NDN architecture to meet IoT requirements.
In-network caching is the main feature of the NDN architecture, and is performed as an on-path caching where data packets are cached in nodes along the forwarding path. Many researchers have investigated ways to develop and improve caching schemes for NDN, as in [
14,
15,
19,
20]. Caching decision and cache replacement policies are fundamental leverages to provide scalable, coherent, and accurate networking capabilities. Caching schemes can be classified, based on the criteria used for making the decision, into autonomous (or implicit) and collaborative (or explicit) [
21]. In autonomous schemes, the caching decision is computed locally, as only information about the data available at the current node is considered. In collaborative schemes, on the other hand, the caching decision is based on collaboration between nodes, either on a local or a global scale. Collaborative schemes aim to improve the cache hit rate by increasing the data content diversity within the network and, consequently, to reduce the content retrieval time. In this paper, we propose an innovative collaborative caching approach that takes into account content popularity and freshness to deliver a high cache hit ratio and a low retrieval time for popular content, while increasing the data content diversity within the network by avoiding any caching redundancy at the edge of the network.
As IoT traffic is transient in nature, data freshness has been considered one of the requirements when implementing IoT over NDN. Many studies addressed this issue with different freshness meanings and for different purposes [
22,
23,
24,
25]. It is worth mentioning that the authors in [
24] were the first to consider IoT traffic freshness and proposed the use of a statistical prediction model, Auto-Regressive Moving Average (ARMA), to predict the generation instant of the next content for event-based IoT traffic. Moreover, the replacement technique needs to highly prioritize popular data when the cache is full, to avoid a high cache miss rate and, thus, optimize the network traffic. NDN has some well-known replacement strategies, such as Least Frequently Used (LFU) and Least Recently Used (LRU), which implicitly consider data popularity. Data freshness was also considered during the content eviction process, as in the Least Fresh First (LFF) replacement strategy [
24]. The authors in [
23] proposed an implicit caching approach that combines both data lifetime and popularity. They extended their work in [
25] to add an availability factor for caching data in vehicular NDN. The approach of [23], named CFPC, will later serve as a comparison baseline for our proposed PF-ClusterCache explicit caching scheme.
The most challenging issue remains how to control and lower the redundancy in caching with minimum collaboration effort and communication overhead. In this article, we propose PF-ClusterCache, a caching approach that shares the storage resources available in a cluster of nodes at the edge of the network and manages these resources as a unique global pooled storage. As such, PF-ClusterCache can enforce zero redundancy within any cluster of nodes, while it caches only the most recent popular data content. Consequently, a greater amount of different content can be cached within the network, achieving a higher caching diversity. PF-ClusterCache caches only at the edge of the network and avoids caching in internal nodes, allowing them to serve as fast name-based routing structures. It uses a simple hashing technique to map a requested content name to a node within a cluster. This tacitly results in a reduced content retrieval time and much lower communication overhead.
Few works have proposed explicit caching strategies based on managing and controlling in a distributed manner the caches of local nodes to decrease the caching redundancy and increase the content diversity. However, none of these collaborative approaches, to our knowledge, have also taken into account the popularity and freshness of the content as with the proposed PF-ClusterCache. This work advances the state of the art by offering a simple and efficient distributed caching technique that explicitly and simultaneously considers content popularity and freshness, while remaining fully compliant with the NDN specifications.
The main contributions of this article can be summarized as follows:
Design of PF-ClusterCache, a novel collaborative popularity and freshness-aware caching policy that operates at the edge of the network. PF-ClusterCache regards the content stores of nodes in a cluster as a unique global distributed storage.
Design of a popularity and freshness-aware replacement technique that retains only the most recent popular content in the caches.
Collaboration among nodes of the same cluster is reduced to a minimum without any information exchange by using a simple hashing of content names to surrogate caching nodes within the cluster.
Integration of PF-ClusterCache into ndnSIM and the evaluation of its performance using a large Transit Stub network topology including a large number of consumers and producers to account for different workloads and IoT scenarios.
Benchmarking PF-ClusterCache against other representative caching schemes using a mix of popular and unpopular traffic flows and under various scenarios.
The remainder of the paper is organized as follows:
Section 2 provides the necessary background and reviews relevant related works.
Section 3 describes the details of the proposed PF-ClusterCache approach, and provides illustrative examples of its operation.
Section 4 first presents the performance evaluation of PF-ClusterCache, detailing the caching evaluation metrics, the topology used, and the simulation scenarios. It then presents a detailed comparative analysis with the LCE, PoolCache, and CFPC approaches.
Section 5 concludes the paper.
3. PF-ClusterCache: Popularity and Freshness-Aware Collaborative Cache Management
PF-ClusterCache is a collaborative caching technique to manage and control the caches of a set of nodes as a global distributed storage pool. As such, PF-ClusterCache controls one large global Content Store, which is partitioned among the different nodes of an edge local cluster. In NDN, the size of the Content Store (CS) at each node is very limited compared with the huge amount of available named content (i.e., the catalogue size). The ratio of the cache size over the catalogue size should be within the interval
, as suggested in [
37]. PF-ClusterCache then represents a new way to have a much larger CS. This tacitly equates to better performance, as will be explored in the next section.
PF-ClusterCache performs the caching of only popular and fresh retrieved content. It forces the caching to be done only at the edge of the network; core routers are then left as a fast switching fabric. Unlike the caching scheme proposed in [
23], PF-ClusterCache caches popular content in only one node within the local cluster. A zero redundancy is then achieved within any cluster of nodes. The local caching node of any requested content, hereafter called the Surrogate Caching Node (SCN), is selected according to a hashing function applied on the requested content name. The hash function maps any requested content name to a node ID within the cluster. This node designates the unique SCN where the requested content might be cached or is to be cached if it turns out to be fresh content and of sufficient popularity. The hash function has to evenly distribute the requested content among the different nodes of the cluster. For the sake of this paper, we consider stable clusters where node membership is rarely changed. Clusters could be defined in many different ways. A cluster defines a set of nodes that are interconnected and geographically close to each other. Network service providers, knowing their needs, may define the various clusters in an ad hoc manner according to the requirements of different applications and working environments (e.g., enterprise, video streaming, schools, etc.). The underlying network topology being flat or hierarchical should not hinder the designation of nodes within the same cluster as long as these nodes are interconnected and within the same geographical neighborhood. However, certain network topologies, such as the Transit Stub (TS) topology, may tacitly designate the stubs as clusters. It is worth noting that not all nodes within a cluster should be members of the PF-ClusterCache. We may also adopt a clustering or partitioning algorithm to decide the clusters or the partitions [
33,
34,
38]. However, doing so is profitable only for small networks; it is time-consuming and, moreover, does not exclude network core routers from serving as caching nodes.
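The name-to-node mapping described above can be sketched in a few lines. The following is an illustrative sketch, not the paper's implementation: the node IDs, the cluster list, and the use of SHA-256 are our assumptions; the scheme only requires a deterministic hash that spreads names evenly across cluster members.

```python
import hashlib

def surrogate_caching_node(content_name: str, cluster: list[str]) -> str:
    """Deterministically map a content name to one node ID in the cluster."""
    digest = hashlib.sha256(content_name.encode()).digest()
    # Take 8 bytes of the digest as an integer and reduce modulo cluster size.
    index = int.from_bytes(digest[:8], "big") % len(cluster)
    return cluster[index]

# Hypothetical six-router cluster; every router applying the same function
# agrees on the SCN for a name without exchanging any messages.
cluster = ["R1", "R2", "R3", "R4", "R5", "R6"]
scn = surrogate_caching_node("/smarthome/temp/room1", cluster)
```

Because the mapping is purely local and deterministic, no coordination traffic is needed to locate the unique SCN of a content name.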
The system design of PF-ClusterCache requires the identification of the current popularity of each piece of requested content. At each SCN node, a popularity table is defined to maintain the current popularity of the requested content within the cluster that is directed to this SCN. Unlike [
23], where the table may grow to very large, impractical sizes, PF-ClusterCache uses an efficient popularity table of a small fixed size. This popularity table is maintained as a Least Frequently Used (LFU) storage, supporting the insertion, retrieval, and eviction (deletion) of content names. The table can be easily and efficiently implemented using a min heap data structure together with a standard collision-free hash map. The min heap is ordered by the popularity counts of the content names, and the hash map is indexed by the content names. Since all operations on a collision-free hash map are O(1), the time complexity of the insert, retrieve, and evict operations is dominated by the heap and is O(log n), where n is the actual number of stored names, capped at the size of the table. Upon the arrival of an Interest packet with a given content name, the SCN node consults its popularity table. If the content name already exists in the table, it increments its popularity count. If the name is not there, the SCN inserts the name in the table; this insertion may first require the eviction of the oldest, least frequently used item (the oldest, least popular content name) from the table. Alternatively, the insert, retrieve, and evict operations can all be done in O(1) time by maintaining two doubly linked lists: one ordered by popularity count, and, for each count, another holding all name prefixes that share that popularity count [
39].
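The O(1) LFU design above can be sketched as follows. This is an illustrative sketch rather than the paper's implementation: a Python dict plays the role of the collision-free hash map, and per-count ordered buckets stand in for the two doubly linked lists (an `OrderedDict` preserves insertion order, so the oldest name within the least popular count is evicted first).

```python
from collections import defaultdict, OrderedDict

class PopularityTable:
    """Fixed-capacity LFU table: O(1) insert, retrieve, and evict."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.count = {}                          # name -> popularity count
        self.buckets = defaultdict(OrderedDict)  # count -> names, oldest first
        self.min_count = 0

    def _bump(self, name: str) -> None:
        """Move a name from its current count bucket to the next one."""
        c = self.count[name]
        del self.buckets[c][name]
        if not self.buckets[c]:
            del self.buckets[c]
            if self.min_count == c:
                self.min_count = c + 1
        self.count[name] = c + 1
        self.buckets[c + 1][name] = None

    def insert_or_increment(self, name: str) -> None:
        if name in self.count:
            self._bump(name)
            return
        if len(self.count) >= self.capacity:
            # Evict the oldest, least frequently used name.
            victim, _ = self.buckets[self.min_count].popitem(last=False)
            if not self.buckets[self.min_count]:
                del self.buckets[self.min_count]
            del self.count[victim]
        self.count[name] = 1
        self.buckets[1][name] = None
        self.min_count = 1

    def get(self, name: str) -> int:
        return self.count.get(name, 0)

pt = PopularityTable(capacity=2)
for n in ["/a", "/a", "/b", "/c"]:   # /b is evicted when /c arrives
    pt.insert_or_increment(n)
```

All operations touch only dict and bucket heads, so each runs in constant time, matching the O(1) bound cited from [39].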
To keep the popularity table current, it is periodically adjusted and smoothed using the Exponentially Weighted Moving Average (EWMA) algorithm. At the end of each time interval $i$, the popularity count $P_i(n)$ of each name prefix $n$ currently in the table is updated as follows:

$$P_i(n) = \alpha \, R_i(n) + (1 - \alpha) \, P_{i-1}(n) \qquad (1)$$

where $\alpha$ is a predefined weight parameter ($0 < \alpha < 1$), $R_i(n)$ is the number of Interest packets with name prefix $n$ received by the SCN during the current time interval $i$, and $P_{i-1}(n)$ is the calculated popularity count during the previous time interval.
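The periodic EWMA smoothing step can be sketched as follows; the weight value and the per-interval request counts below are illustrative assumptions, not values from the paper.

```python
def ewma_update(prev_counts: dict, interval_counts: dict, alpha: float) -> dict:
    """Smooth popularity counts at the end of a time interval:
    new = alpha * requests_this_interval + (1 - alpha) * previous_count."""
    names = set(prev_counts) | set(interval_counts)
    return {
        n: alpha * interval_counts.get(n, 0) + (1 - alpha) * prev_counts.get(n, 0.0)
        for n in names
    }

prev = {"/video/clip1": 8.0}                 # smoothed counts so far
seen = {"/video/clip1": 4, "/iot/temp": 2}   # Interests seen this interval
smoothed = ewma_update(prev, seen, alpha=0.5)
```

A name that stops being requested decays geometrically toward zero, so the table tracks current popularity rather than all-time request totals.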
PF-ClusterCache defines a popularity threshold used to decide whether to cache incoming data packets at their designated SCNs. Incoming data content is considered for caching in the SCN content store only if it is valid and its popularity exceeds the defined popularity threshold. Then, the data content is passed to the replacement technique to see whether there is a free place in the CS or content with lower popularity to be evicted. The PF-ClusterCache replacement technique is based on popularity and called the Least Popular First Replacement policy (LPF). LPF ensures that all admitted data content is valid (i.e., fresh), and that only the most popular content occupies the CS of any SCN.
Algorithm 1 illustrates the proposed LPF replacement scheme. The incoming popular data content is automatically stored in the CS if there is still space. Otherwise, it is cached only if its popularity count is greater than that of the least popular content currently in the CS.
Algorithm 1: Least Popular First (LPF) Replacement Scheme
Input: PT: Popularity Table; CS: Content Store; PIT: Pending Interest Table; D: popular Data packet
Output: Caching decision in the CS and forwarding according to the PIT
1: function CacheData(D)
2:   name = D.name1
3:   if D is valid then
4:     if CS is full then
5:       pNew = PT.GetPopCount(name)
6:       minName = name of the least popular content in CS
7:       pMin = PT.GetPopCount(minName)
8:       if pNew > pMin then
9:         CS.erase(minName)
10:      end if
11:    end if
12:    CS.insert(D)
13:    Forward D according to PIT
14:  end if
15: end function
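A minimal sketch of the LPF admission logic of Algorithm 1 follows. The class and parameter names are our assumptions; here, when the store is full and the newcomer is not more popular than the least popular cached entry, the newcomer is simply dropped, which keeps the store bounded, in line with the admission rule described in the text.

```python
class LPFContentStore:
    """Least Popular First replacement: keep only the most popular content."""

    def __init__(self, capacity: int, pop):
        self.capacity = capacity
        self.pop = pop      # callable: name -> current popularity count
        self.store = {}     # name -> cached data

    def cache(self, name: str, data, valid: bool) -> bool:
        """Try to cache fresh data; returns True if it was admitted."""
        if not valid:
            return False                       # stale content is never cached
        if len(self.store) >= self.capacity:
            victim = min(self.store, key=self.pop)
            if self.pop(name) <= self.pop(victim):
                return False                   # newcomer not more popular: drop
            del self.store[victim]             # evict the least popular entry
        self.store[name] = data
        return True

pops = {"/a": 5, "/b": 1, "/c": 3}
cs = LPFContentStore(capacity=2, pop=lambda n: pops[n])
cs.cache("/a", b"A", True)
cs.cache("/b", b"B", True)
admitted = cs.cache("/c", b"C", True)   # evicts /b (popularity 1 < 3)
```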
The system design of PF-ClusterCache requires only one additional field, carrying a second name, within the NDN packet format of both Interest and Data packets. The two names shall hereafter be referred to as Pkt.name1 and Pkt.name2. As such, each Interest or Data packet carries both the requested content name and its associated SCN name. Interest forwarding as well as Data forwarding are always accomplished using Pkt.name1. No additional field is required in the FIB or in the PIT; namely, the forwarding of Interest and Data packets is done exactly as specified by the NDN Forwarding Daemon (NFD). PF-ClusterCache has no specific requirements for the underlying routing, which could well be the Named Data Link State Routing Protocol (NLSR) or any other routing protocol.
As illustrated in Figure 2 and Figure 3, we consider a local cluster composed of six nodes (or routers). Consumer C2 sends a request (an Interest) to its edge router asking for a given content name, as indicated by the first red arrow in both figures. The edge router starts by hashing the requested content name to find the corresponding Surrogate Caching Node (SCN) within the local cluster; the hashing function maps the content name to the unique SCN where this content may be cached, or is to be cached if it turns out to be fresh and sufficiently popular. The edge router then takes the requested content name from Interest.name1, puts it in Interest.name2, and places the designated SCN name in Interest.name1. The Interest is then forwarded according to the FIB, as indicated by the second red arrow in both figures. Upon the arrival of this Interest at the SCN, the latter first verifies whether the requested content name already has an entry in its popularity table PT. This is indeed the case, as shown in both figures (a count of 2 in Figure 2, and a count of 3 in Figure 3). The SCN increments this count and then checks whether a copy of the requested content is already in its CS. Here, we distinguish two different cases, illustrated, respectively, in Figure 2 and Figure 3.
Figure 2 illustrates the case where the requested content is to be fetched from its producer, as no copy of it is already cached within the cluster. The SCN first exchanges Interest.name1 and Interest.name2 and then forwards the Interest using its FIB. The Interest is consequently forwarded hop by hop until reaching the producer. Keeping the name of the SCN in Interest.name2 is meant to prevent the caching of the content on its way back from the producer to the consumer: the only router allowed to cache a copy is this SCN. When receiving the Interest, the producer responds by sending the requested content after putting the content name in Data.name1 and the SCN name, which was received in Interest.name2, in Data.name2. As such, the Data packet follows the reverse path towards consumer C2, as prescribed by the PITs. On its way back, the Data packet passes through its SCN, which consults its popularity table PT to verify whether the content name has a popularity count greater than or equal to the PF-ClusterCache threshold (see Algorithm 4). As this is the case, the SCN first caches a copy in its CS, as illustrated in green in Figure 2, and then forwards the Data packet towards consumer C2 according to its PIT.
The second case is when there is indeed a cached copy of the requested content in the CS of the SCN. This is illustrated in Figure 3. The SCN responds with its cached copy, carrying the content name in Data.name1. The Data packet then follows the reverse path towards consumer C2, as indicated by the PITs of the traversed routers, and as illustrated in Figure 3 by the successive green arrows.
Algorithm 2 illustrates the PF-ClusterCache strategy at any given router in the network upon the arrival of an Interest packet. The router first checks whether the Interest comes directly from an attached consumer. If so, it applies the predefined hashing function H to map the content name (Interest.name1) to its Surrogate Caching Node (SCN). It then places the received content name in Interest.name2 and the SCN name in Interest.name1, and forwards the Interest normally according to its FIB. Since Interest.name1 contains the SCN name, the Interest will consequently be directed toward the SCN router. However, if the incoming Interest is not from an attached consumer, the receiving router checks whether it itself is the designated SCN. If so, it first verifies whether Interest.name2 (the name of the requested data) already has an entry in the popularity table PT. If it does, it increments its count; otherwise, it creates a new entry with a popularity count equal to 1. Then, it performs either of the above-described cases depending on whether or not its CS contains a cached copy of the requested data content. Recall that the popularity table PT is managed as a finite-capacity LFU storage system; Algorithm 2 calls Algorithm 3 for this very purpose. Furthermore, note that the last step in Algorithm 2 represents the case of a receiving router that is neither the SCN nor the edge router of the requesting consumer. Such a router simply forwards the Interest according to its FIB. This ensures that Interest forwarding is always accomplished using Interest.name1, in line with the specification of the NDN Forwarding Daemon (NFD).
Algorithm 2: PF-ClusterCache: Interest Forwarding Algorithm
Input: PT: Popularity Table; CS: Content Store; PIT: Pending Interest Table; FIB: Forwarding Information Base; H: hash function; SCN: Surrogate Caching Node; I: Interest packet
Output: Interest forwarding decision
1: if I received from an attached consumer then
2:   SCN = H(I.name1)
3:   I.name2 = I.name1
4:   I.name1 = SCN
5:   I.setMustBeFresh()
6:   Forward I according to FIB
7: else
8:   if Receiving-Node-Id = I.name1 then
9:     AddOrIncrementPopCount(I.name2)
10:    if cached copy in CS then
11:      Send back the Data packet to the consumer according to PIT
12:    else
13:      temp = I.name1
14:      I.name1 = I.name2
15:      I.name2 = temp
16:      Forward I according to FIB
17:    end if
18:  else
19:    Forward I according to FIB
20:  end if
21: end if
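The Interest pipeline of Algorithm 2 can be exercised with a runnable sketch. The class, node IDs, and hash below are our illustrative assumptions, and FIB/PIT forwarding is reduced to returned action strings; the sketch only demonstrates the two-name swap and the per-role branching.

```python
import hashlib

def hash_to_scn(name: str, cluster: list) -> str:
    """Deterministic name-to-node mapping (stand-in for H)."""
    h = int(hashlib.sha256(name.encode()).hexdigest(), 16)
    return cluster[h % len(cluster)]

class Interest:
    def __init__(self, name1, name2=None, must_be_fresh=False):
        self.name1, self.name2, self.must_be_fresh = name1, name2, must_be_fresh

def on_interest(node_id, interest, cluster, from_consumer, pop_table, cs):
    """Return the action a router takes for an incoming Interest."""
    if from_consumer:
        # Edge router: redirect the Interest to the designated SCN.
        scn = hash_to_scn(interest.name1, cluster)
        interest.name2, interest.name1 = interest.name1, scn
        interest.must_be_fresh = True
        return "forward-to-scn"
    if node_id == interest.name1:            # this router is the SCN
        pop_table[interest.name2] = pop_table.get(interest.name2, 0) + 1
        if interest.name2 in cs:
            return "reply-from-cache"
        # No cached copy: swap names back and forward toward the producer.
        interest.name1, interest.name2 = interest.name2, interest.name1
        return "forward-to-producer"
    return "forward"                         # plain intermediate router

cluster = ["R1", "R2", "R3"]
pt, cs = {}, set()
i = Interest("/sensor/t1")
edge_action = on_interest("R0", i, cluster, True, pt, cs)
scn = i.name1
scn_action = on_interest(scn, i, cluster, False, pt, cs)
```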
Algorithm 3: PF-ClusterCache: Popularity Update Algorithm
Input: PT: Popularity Table; I: Interest packet
Output: Insert or increment the popularity count of the name prefix in PT
1: function AddOrIncrementPopCount(name)
2:   if PT.Find(name) then
3:     PT[name].PopCount++
4:   else
5:     PT.Add(name)
6:     PT[name].PopCount = 1
7:   end if
8: end function
Algorithm 4 illustrates the strategy of PF-ClusterCache at any router of the network upon the arrival of a Data packet. A Data packet contains the two names: the content name in Data.name1 and its Surrogate Caching Node in Data.name2. The producer, before sending back the data packet, inserts these two names. Recall that the producer obtains the SCN name from Interest.name2 of the requesting Interest. The Data packet travels backward following the reverse path indicated by the PITs of traversed routers until reaching the SCN router, where a copy of it is cached if its popularity count is greater than or equal to the PF-ClusterCache predefined popularity threshold. Then, the data packet pursues its path towards the consumer. It is here worth noting that the unique router authorized to cache a copy of the Data packet is its designated SCN whose name is prescribed in Data.name2 of the received Data packet. This enforces a zero redundancy within local clusters and amounts to much better diversity within the network, yet the caching is only performed at the edge of the network, leaving the core routers for the sole role of named packet switching.
Algorithm 4: PF-ClusterCache: Data Forwarding and Caching Decision Algorithm
Input: PT: Popularity Table; PIT: Pending Interest Table; θ: predefined popularity threshold; D: Data packet with D.name1 set to the content name prefix and D.name2 set to the SCN name
Output: Caching of D is only possible at its designated SCN
1: SCN = D.name2
2: if SCN = nodeID then
3:   if D is valid then
4:     if PT.GetPopCount(D.name1) ≥ θ then
5:       CacheData(D)
6:     end if
7:   end if
8: end if
9: Send D back to the consumer according to PIT
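The Data-side decision of Algorithm 4 reduces to three guards: only the designated SCN, only valid (fresh) content, and only above the popularity threshold. A hedged sketch, with illustrative parameter names and `cache_data` standing in for the LPF admission routine of Algorithm 1:

```python
def on_data(node_id, name1, name2, valid, pop_count, threshold, cache_data):
    """Decide whether this router caches an incoming Data packet.
    name1: content name; name2: SCN name carried in the packet."""
    cached = False
    if name2 == node_id and valid and pop_count >= threshold:
        cached = cache_data(name1)
    # In all cases the Data packet is then forwarded along the PIT path.
    return cached

always_admit = lambda name: True
result = on_data("R4", "/a/b", "R4", True, 3, 2, always_admit)
```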
4. Performance Evaluation and Comparative Analysis
In this section, we first evaluate the performance of our proposed caching technique, PF-ClusterCache, and then we perform a comparative analysis with three different caching approaches. These are the well-known Leave Copy Everywhere (LCE) [
27], which is used as a yardstick for comparison in many studies on caching in NDN; the Caching Fresh and Popular Content (CFPC) [
23], which is, similar to PF-ClusterCache, an approach based on both popularity and freshness; and the PoolCache [
29], which is, similar to PF-ClusterCache, a collaborative caching scheme based on pooling the cache storage of several nodes but with no explicit treatment of data popularity and no consideration of data freshness. The performance evaluation and the comparative analysis are conducted by simulation using the ndnSIM simulator [
40,
41].
4.1. Simulation Scenario
Most works on caching in NDN used simplistic topologies of a few nodes, which cannot represent an IoT environment. For our simulations, we opted instead for the Transit Stub (TS) network structure, whose properties closely mimic those of a wide-area IoT. The TS used is a 3-level hierarchical topology of interconnected stub and transit domains. The stubs carry either originating or terminating traffic, while the transit domains efficiently interconnect the stub domains. The TS topology has been used to represent named data networking of things in works such as [
10,
24,
29]. We used the BRITE library for ndnSIM to generate a TS topology integrated within the ndnSIM simulator [
40,
41].
Figure 4 illustrates a graphical example of a 3-level TS topology. The internal green-colored nodes are level 0 backbone routers, to which 2 transit domains comprising level 1 nodes (colored in yellow) are connected; the stubs contain the blue- and red-colored nodes. The actual network topology used in our simulations comprises around 632 nodes in total: a core domain containing 2 level 0 backbone routers, 2 transit domains each containing (on average) 15 level 1 backbone routers, and 30 stubs per transit domain, each containing (on average) 10 routers. Each stub is connected to either 1 or 2 transit nodes. Consumers are connected to nodes of the stubs in one transit domain, while producers are connected to nodes in the stubs of the other transit domain. Twenty producers are spread randomly across the 30 stubs of one transit domain, and 30 consumers are each attached to a node in only 3 out of the 30 stubs of the other transit domain.
We used a large catalogue of half a million data items. This content is assigned uniformly to the 20 producers, so that each producer is the home of twenty-five thousand data items. At each producer, the data content is divided into one unpopular group and zero or more popular groups. This serves to create a mix of popular and unpopular flows within the network and to investigate the influence of the unpopular group traffic on the performance of the studied caching strategies. Requests for data content (i.e., Interests) are generated for each group and each consumer according to a Poisson process. The requested data content is selected uniformly at random if it belongs to the unpopular group, and according to a Zipf probability distribution if it belongs to a popular group. The cumulative traffic rate submitted to the network depends on the number of data content groups used in the simulation. The traffic rate submitted per consumer is defined as the number of requests submitted per second per consumer; the traffic rate per consumer per group of data content is defined analogously.
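The workload model above can be sketched as follows. This is a hedged sketch, not the paper's ndnSIM configuration: the group sizes, the rate, and the Zipf exponent are illustrative examples; it only shows Poisson inter-arrival gaps with Zipf-ranked selection for popular groups and uniform selection for the unpopular group.

```python
import random

def zipf_weights(n: int, s: float) -> list:
    """Unnormalized Zipf weights for ranks 1..n with exponent s."""
    return [1.0 / (k ** s) for k in range(1, n + 1)]

def next_request(group: list, popular: bool, s: float = 1.6) -> str:
    """Pick the next requested content name within a group."""
    if popular:
        return random.choices(group, weights=zipf_weights(len(group), s))[0]
    return random.choice(group)          # unpopular group: uniform choice

def next_arrival_gap(rate_per_s: float) -> float:
    """Poisson arrivals: exponentially distributed inter-request gap (s)."""
    return random.expovariate(rate_per_s)

catalogue = [f"/prod1/content{k}" for k in range(100)]
name = next_request(catalogue, popular=True)
gap = next_arrival_gap(5.0)
```

Rank-1 content dominates a popular group, so such flows benefit strongly from caching, while the uniform unpopular flow exercises cache pollution.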
We shall investigate the effect of the cache size on the performance of the selected caching strategies; all nodes within the network are assumed to have the same constant cache size. The popularity table adjustment and smoothing process is performed periodically, once every freshness period, using EWMA. The simulation parameters used are summarized in
Table 1. The question remains at which value to fix the popularity threshold of PF-ClusterCache. Recall that we are using a large catalogue of half a million content items. As such, the chance of requesting the same content more than once within its freshness period is considerably low, unless the content is popular. As a result, we fix the PF-ClusterCache popularity threshold to 2. Interestingly, the LPF replacement technique thereby retains the most popular data content in the caches; recall that LPF evicts content from the CS only if it has a lower popularity count than the newly arriving candidate.
4.2. Evaluation Metrics
To ascertain the efficiency of the proposed PF-ClusterCache and position its performance against the selected caching strategies, we consider four metrics: the server hit reduction ratio, the average retrieval delay, the average hop count, and the number of evictions.
A server hit occurs when an Interest could not be answered by an intermediate node and is instead answered by the producer. A cache hit occurs when an Interest is answered by an intermediate router along the path to the server or producer. The server hit reduction ratio, or equivalently the cache hit ratio, represents the reduction of the rate of access to the server: it is the ratio of the total number of Interests answered by intermediate nodes (by caches) to the total number of generated Interests. It stands as one of the most fundamental metrics used to evaluate caching performance in NDN. Equation (2) gives this metric:

$$SHR = \frac{\sum_{i=1}^{N} C_i}{\sum_{i=1}^{N} \left( S_i + C_i \right)} \qquad (2)$$

where $N$ represents the number of consumers, $S_i$ stands for the number of requests sent by consumer $i$ and satisfied by the server, and $C_i$ is the number of requests sent by consumer $i$ and satisfied by an intermediate router along the path to the producer.
The content average retrieval delay represents the average time between the generation instant of an Interest packet and the instant of receiving the corresponding Data packet. The content may be sent by an intermediate caching node or, ultimately, by its producer. The content average retrieval delay is calculated by Equation (3):

$$\bar{D} = \frac{\sum_{i=1}^{N} \sum_{r=1}^{I_i} d_{i,r}}{\sum_{i=1}^{N} I_i} \qquad (3)$$

where $I_i$ stands for the number of Interests generated by consumer $i$, $d_{i,r}$ denotes the retrieval delay of Interest number $r$ generated by consumer $i$, and $N$ represents the number of consumers.
The hop count is another basic and important performance metric that measures the number of hops traversed to satisfy an Interest, by either an intermediate cache or, ultimately, its producer. The average hop count is then the average number of hops needed to answer the generated Interests. It is given by Equation (4):

$$\bar{H} = \frac{\sum_{i=1}^{N} \sum_{r=1}^{I_i} h_{i,r}}{\sum_{i=1}^{N} I_i} \qquad (4)$$

where $I_i$ stands for the number of Interests generated by consumer $i$, $h_{i,r}$ denotes the number of hops traversed by Interest number $r$ generated by consumer $i$ until being satisfied either by an intermediate cache or by its producer, and $N$ represents the number of consumers.
The cache eviction count represents the total number of data contents evicted from all network nodes during the simulation time. A content eviction takes place at a given node when an arriving piece of data eligible for caching finds the node cache full; in this case, cached data content has to be evicted to make room for the newly arriving content. Recall that for the proposed LPF, the eviction happens only if the cache is full and the incoming packet has a higher popularity count than the least popular cached data content, which is evicted. The total number of evictions is calculated using Equation (5):

$$E = \sum_{j=1}^{M} e_j \qquad (5)$$

where $M$ is the number of network nodes and $e_j$ is the number of evictions performed at node $j$ during the simulation.
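The four metrics above are straightforward to compute from per-consumer simulation logs. The field names in the following sketch are illustrative assumptions, not part of the paper's tooling.

```python
def evaluation_metrics(consumers, evictions_per_node):
    """consumers: list of dicts with 'server_hits', 'cache_hits', and
    per-satisfied-Interest 'delays' (seconds) and 'hops' lists."""
    cache_hits = sum(c["cache_hits"] for c in consumers)
    total = cache_hits + sum(c["server_hits"] for c in consumers)
    delays = [d for c in consumers for d in c["delays"]]
    hops = [h for c in consumers for h in c["hops"]]
    return {
        "server_hit_reduction": cache_hits / total,       # Equation (2)
        "avg_retrieval_delay": sum(delays) / len(delays), # Equation (3)
        "avg_hop_count": sum(hops) / len(hops),           # Equation (4)
        "total_evictions": sum(evictions_per_node),       # Equation (5)
    }

# Toy log for one consumer: 10 Interests, 8 answered by caches.
log = [{"server_hits": 2, "cache_hits": 8,
        "delays": [0.02] * 10, "hops": [3] * 10}]
m = evaluation_metrics(log, evictions_per_node=[5, 7])
```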
4.3. PF-ClusterCache Performance Analysis
Different parameters directly impact the performance of PF-ClusterCache and any other caching scheme. We concentrate here on the performance of PF-ClusterCache and we present a comparative analysis with other caching schemes in the next section.
To ascertain the quality and efficiency of PF-ClusterCache, we consider here only popular traffic, using different numbers of content groups to generate different traffic flows; a mix of popular and unpopular traffic flows will be considered in the comparative analysis of the next section. We vary the number of flows from 1 to 40 popular flows. We also consider different values of the Zipf popularity parameter to obtain different popularity levels. The size of the Content Store plays an important role and is varied to investigate its impact, and different freshness values are investigated as well. Recall that we fixed the PF-ClusterCache popularity threshold to 2 and set the EWMA period, used for smoothing the popularity counts, to the same value as the freshness period. The traffic intensity submitted to the network depends on the number of consumers, the number of flows, and the simulation time. Recall from
Table 1 that we are considering 30 consumers and a given traffic rate per flow per consumer (in Interests per second), which amounts to a large total of 750,000 generated Interests. We performed ten different simulation replications for each considered scenario. The obtained 95% confidence intervals are very small, and we chose not to draw them on the different curves for better clarity of the figures.
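The EWMA smoothing of popularity counts mentioned above can be illustrated as follows. The smoothing factor alpha and the count sequence are assumed values for illustration, not parameters from the paper:

```python
def ewma_update(previous, new_count, alpha=0.5):
    """Exponentially weighted moving average of a content's popularity
    count, recomputed once per EWMA period (alpha is illustrative)."""
    return alpha * new_count + (1 - alpha) * previous

# Counts observed over four successive EWMA periods for one name prefix:
# the smoothed value tracks the drop in popularity gradually.
smoothed = 0.0
for count in [8, 8, 2, 2]:
    smoothed = ewma_update(smoothed, count)
print(smoothed)  # 3.0
```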
Figure 5 illustrates the server hit reduction (SHR) as a function of the CS size and for different numbers of popular traffic flows. Here, we used a Zipf parameter equal to 1.6 and a freshness period equal to 50 s. A remarkable server hit reduction above 90% is obtained when using only one traffic flow. The SHR decreases with the increase in the number of traffic flows, and this decrease becomes smaller as the CS size becomes larger. Recall that PF-ClusterCache regards the caches of the different nodes of the local cluster as a unique distributed pool of storage. This pool of caches is shared among the different traffic flows generated by consumers connected to the nodes of this cluster. Moreover, the underlying LPF replacement policy enforces that only the most popular data content of these various traffic flows can persist within the pool for the duration of its freshness. At the extreme, when there is only one popular traffic flow, the most popular data content of this flow obtains the entire pool, which is large enough to host it without any LPF eviction. As we can observe from
Figure 5, even a CS size equal to 5, which is a pool of 50 places, is enough to attain the maximum SHR value. More traffic flows require a larger pool and a larger CS size. For a number of flows equal to 10, we need a CS size larger than, say, 20 to approach the highest SHR value. Freshness also plays a role as any cached content becomes stale when its validity expires, and if requested, must be retrieved from its producer. Therefore, PF-ClusterCache provides a different maximum SHR for different numbers of flows; the smaller the number of streams (or flows), the higher the maximum attainable SHR.
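The SHR metric discussed above can be expressed, under one plausible reading (not necessarily the paper's exact equation), as the fraction of Interests satisfied by in-network caches rather than by producers:

```python
def server_hit_reduction(total_interests, producer_hits):
    """Fraction of Interests satisfied by in-network caches rather than
    reaching the producer (one plausible formulation of the SHR metric)."""
    return 1.0 - producer_hits / total_interests

# Illustrative numbers: 750,000 Interests, 60,000 of which reached producers.
print(server_hit_reduction(750_000, 60_000))  # 0.92
```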
Figure 6 depicts the SHR as a function of the Zipf popularity parameter and for various numbers of popular traffic flows. Here also, the freshness of data content is fixed to 50 s. We observe the great efficiency of PF-ClusterCache as the content popularity becomes larger. An increase in the Zipf popularity parameter gives higher frequencies to the most popular content, which narrows the range of this most popular content, and consequently less caching storage is required. In this figure, the CS size is fixed to 10, which is adequate for a few traffic streams. However, this CS capacity is too small to deliver a high SHR for low Zipf popularity parameter values when a large number of traffic streams are deployed. A low Zipf parameter value amounts to a broadening of the range of popularity among data content. The CS size plays an important role, as was discussed earlier and illustrated in
Figure 5.
Table 2 provides the server hit reduction for different freshness period values and for different numbers of traffic streams. The content freshness impacts the server hit reduction ratio, as cached popular content must be retrieved again from its producer once it becomes stale (expired freshness). This is repeated every freshness period and for every piece of cached content. As such, the server hit reduction ratio decreases as the freshness period becomes shorter. However, the impact of the freshness on the SHR remains relatively limited, as the number of content retrievals due to freshness expiration is very small relative to the number of data contents requested by consumers connected to the same cluster. For instance, for a CS size equal to 10, we obtain an increase of only 0.014 in the SHR when increasing the freshness period from 10 s to 250 s. The largest increase in the SHR is obtained when a CS size equal to 40 is used.
Figure 7 depicts the content average retrieval delay as a function of CS size for different Zipf popularity parameter values. This figure clearly shows the great efficiency of the proposed PF-ClusterCache, which requires a very short average delay to retrieve popular content. The content average delay decreases as content popularity increases, thanks to the underlying LPF that keeps the most popular content in the cache. The content average delay also decreases as the CS size increases. Recall from
Figure 5 that increasing the CS size provides better SHR. The PF-ClusterCache is very efficient for IoT and delay-sensitive applications, as it requires a very small content retrieval delay for popular content, even with a small CS size.
Figure 8 depicts the average hop count to retrieve popular content as a function of CS size and for different values of the Zipf parameter. Recall that content may be retrieved from its surrogate node or ultimately from its producer. The strict minimum content average hop count is obtained when the most popular content has infinite freshness (i.e., the freshness value is greater than the simulation time). In such a case, the most popular content resides in the pool of caches of the cluster, and therefore the average hop count is equal to the average path length from consumers to SCNs within the same cluster.
Figure 7 and
Figure 8 clearly illustrate the great efficiency of the proposed PF-ClusterCache. With a small but sufficient CS size, the most popular content is kept near the edge of the network, thus yielding a very small average retrieval delay.
Figure 9 illustrates the eviction rate as a function of CS size and for different values of the Zipf parameter. First, we observe that the eviction rate is higher for lower values of the Zipf parameter for all considered CS sizes. Recall that a lower value of the Zipf parameter yields a broader range of popular content and therefore requires more CS space; otherwise, it leads to more evictions. Second, the eviction rate exhibits a remarkable behavior: it starts increasing with the CS size until attaining its maximum, and then declines with a further increase in the CS size. This is essentially due to the specifics of the underlying LPF replacement scheme. The LPF does not allow a replacement (and therefore an eviction) unless the incoming data content has a higher popularity count than the least popular data in the cache. As such, the CS of any SCN within the cluster is always occupied by the most popular data. As a result, when the CS size is small, only the most popular content is cached and therefore hardly any replacement is permitted. As the CS size becomes larger, more popular content (or, equivalently, more content with a lower popularity count) is cached, and consequently more replacements can take place. When the CS size becomes sufficient to cache all of the most popular content, the eviction rate becomes very small, as virtually all requested content is retrieved from its SCNs within its cluster. Once again, PF-ClusterCache is very efficient: a rather small CS size is enough to allow the most frequently requested (most popular) data content to be cached at its SCNs. PF-ClusterCache caches requested popular content near the edge of the network. It does not allow caching at the core routers, leaving them as fast name-based forwarding structures.
Now, we turn to investigating the required maximum popularity table size as a function of the Zipf parameter value. Recall that PF-ClusterCache uses a finite, predefined popularity table size at each SCN. Recall also that the popularity table is managed as a finite LFU storage. An Interest that finds the popularity table of its SCN full, and whose name prefix does not have an entry, replaces the oldest least frequently used prefix name found in the table.
Table 3 shows, for 10 popular traffic streams, the maximum popularity table size needed (used in the simulation) for different values of the content popularity. No more than a few hundred places are needed, even for low Zipf parameter values.
4.4. PF-ClusterCache Comparative Analysis
Within a network, we usually have a mix of popular and unpopular traffic. In the previous section, we ascertained the efficiency of the proposed popularity and freshness-aware caching scheme, PF-ClusterCache, using only popular flows. Indeed, any SCN in PF-ClusterCache discovers popular data content using its popularity table, and allows only the most popular content to be cached thanks to the underlying LPF replacement scheme. PF-ClusterCache does not cache unpopular content, and even low-popularity content may not have a chance to become cached unless the CS size allows it.
When considering a mix of popular and unpopular flows, the question arises as to the size of the popularity table used at each SCN. Recall here that for each arriving Interest, an SCN verifies whether the requested content name already has an entry in the table. In the affirmative, its count is incremented; otherwise, it is inserted with a count set to one. This is done for every incoming Interest, whether the requested content is popular or not. The remarkable fact is that this does not necessitate a larger popularity table. The LFU strategy of the table replaces the oldest, least frequently used name, which normally belongs to non-popular content.
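The per-Interest popularity-table update described above can be sketched as a finite LFU table; the class name, prefixes, and table size are illustrative:

```python
class PopularityTable:
    """Finite popularity table kept at each SCN and managed as LFU storage:
    when full, a new name prefix replaces the least frequently used entry
    (ties resolved in favor of the oldest entry)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.counts = {}  # name prefix -> request count (insertion-ordered)

    def on_interest(self, prefix):
        if prefix in self.counts:
            self.counts[prefix] += 1      # existing entry: increment its count
        elif len(self.counts) < self.max_entries:
            self.counts[prefix] = 1       # room left: insert with count one
        else:
            lfu = min(self.counts, key=self.counts.get)
            del self.counts[lfu]          # replace oldest least-frequently-used prefix
            self.counts[prefix] = 1
        return self.counts[prefix]

table = PopularityTable(max_entries=2)
for name in ["/video/a", "/video/a", "/iot/temp", "/news/x"]:
    table.on_interest(name)
print(table.counts)  # {'/video/a': 2, '/news/x': 1}
```

Because `min` over an insertion-ordered dict returns the earliest entry among ties, the replacement victim is the oldest of the least frequently used prefixes, matching the behavior described in the text.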
The comparative analysis of PF-ClusterCache is performed with three different caching approaches: the well-known Leave Copy Everywhere (LCE) [27], customarily used as a yardstick for comparison in many studies on caching in NDN; the Caching Fresh and Popular Content (CFPC) [23], which is based on both popularity and freshness; and PoolCache [29], a collaborative caching scheme based on pooling the cache storage of several nodes but with no explicit treatment of data popularity and no consideration of data freshness. These three caching policies use the Least Recently Used (LRU) replacement scheme.
We divide the content catalogue into 10 groups of 50,000 pieces of content each. Requested content from each group is selected either randomly, to generate unpopular traffic (unpopular flow), or using the Zipf probability distribution, to generate popular traffic (popular group or flow). Interests are generated at each consumer, for each group of flows, according to a Poisson distribution with a given rate of requests per second.
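The workload just described can be emulated as follows; a stdlib-only sketch in which the group size, exponent, and request counts are placeholders (the paper uses 50,000 contents per group):

```python
import random

def make_zipf_sampler(n_items, s, rng):
    """Return a sampler drawing ranks 1..n_items with P(k) proportional to 1/k^s."""
    weights = [1.0 / (k ** s) for k in range(1, n_items + 1)]
    return lambda: rng.choices(range(1, n_items + 1), weights=weights)[0]

rng = random.Random(42)
group_size = 1000          # placeholder; the paper uses 50,000 per group
popular = make_zipf_sampler(group_size, s=1.6, rng=rng)

# Popular flow: Zipf-ranked selection; unpopular flow: uniform selection.
popular_requests = [popular() for _ in range(10_000)]
unpopular_requests = [rng.randint(1, group_size) for _ in range(10_000)]

# The Zipf flow concentrates requests on the lowest (most popular) ranks.
share_top10 = sum(r <= 10 for r in popular_requests) / len(popular_requests)
print(share_top10 > 0.5)  # True for s = 1.6
```

With s = 1.6, well over half of the popular-flow requests target the ten most popular contents, which is exactly the skew that makes a small shared Content Store pool effective.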
Figure 10 depicts the server hit reduction ratio (SHR) for the popular traffic as a function of the number of popular flows among the 10 flows and for the different considered caching approaches. PF-ClusterCache clearly outperforms all the other approaches. LCE performs the worst, as it maintains copies everywhere, which amounts to excessive evictions and replacements, resulting in cache misses and a lower server hit reduction. In CFPC, content popularity control and verification are only performed when a cache is full. This allows it to intermittently cache unpopular content, resulting in a high cache miss rate and a low server hit reduction ratio. PoolCache performs much less successfully than PF-ClusterCache, although it uses a similar collaborative caching principle. This is essentially due to its inherent design as an approach that caches both popular and unpopular content. Its efficiency relative to CFPC is essentially due to two facts: its underlying LRU replacement scheme, which normally ends up evicting and replacing unpopular content, and its caching principle, which enforces zero redundancy inside any pool of nodes. PF-ClusterCache clearly outperforms PoolCache thanks to its explicit treatment of content popularity and its underlying LPF replacement scheme. It is also worth noting in
Figure 10 that the SHR for all the considered caching schemes is insensitive to the number of popular flows, as it is the SHR for popular traffic only. This SHR is, however, sensitive to the values used for the CS size and the Zipf parameter, as depicted in the next two figures.
Figure 11 illustrates the SHR as a function of CS size for the different considered caching approaches, when using two popular flows among the 10 flows. Again, PF-ClusterCache outperforms all the other considered caching approaches. It is interesting to note that PF-ClusterCache necessitates a much smaller CS size. While PF-ClusterCache delivers its highest SHR even with a CS size of 5, the three other caching approaches require a CS size larger than 40 to attain their maximum SHR.
Figure 12 depicts the SHR of popular traffic as a function of the Zipf parameter value, when using two popular flows among the 10 flows. PF-ClusterCache outperforms all the other approaches and delivers high SHR even for a low value of the Zipf parameter. Recall that the lower the Zipf parameter, the broader the range of popularity within each popular group of content. In turn, the lower the Zipf popularity, the higher the required CS size to attain high SHR ratios. PF-ClusterCache, as it requires a much smaller CS size, delivers much better SHR even at low values of the Zipf parameter for the popular flows.
Figure 13 and
Figure 14 illustrate the average retrieval time of popular content and the average number of hops required to retrieve popular content for the different considered schemes. We clearly observe the efficiency of PF-ClusterCache, as it requires less than half the retrieval delay and hop count of the next best approach, PoolCache.
This is a direct result of the popularity-aware caching and replacement of PF-ClusterCache. Although CFPC is also a popularity-aware caching scheme, its performance is rather close to that of LCE. This is primarily due to its method of treating content popularity, which allows it to cache unpopular content; its underlying replacement scheme; and, most importantly, its caching of the same content at multiple nodes along the paths towards consumers. As explained earlier, PoolCache and CFPC attain a lower server hit reduction and necessitate a much larger CS size. These facts directly translate into a much larger retrieval delay and hop count for popular content.
Last but not least,
Figure 15 depicts the number of evicted pieces of content from the two considered popular flows per second using a Zipf parameter value of 1.6 and a CS size of 10. PF-ClusterCache provides the smallest eviction rate as it considers only popular content for caching and uses the LPF replacement scheme. PoolCache provides the second lowest eviction rate, though much larger than that of PF-ClusterCache, as it enforces a zero redundancy within each neighborhood, and thanks also to its LRU, which maintains a good SHR, as shown in
Figure 10.
5. Conclusions
Efficient in-network caching with limited resources remains a significant challenge. The founding idea of PF-ClusterCache is to increase the storage available for caching without adding physical storage, while taking into account the freshness and popularity of cached data content.
PF-ClusterCache aggregates the caching storage of individual nodes into a global shareable storage and enforces zero caching redundancy across any cluster of nodes. PF-ClusterCache thereby makes it possible to cache much more popular content in clusters and across the network. As a result, it achieved remarkable performance and conserved valuable network resources. Using a mixture of popular and unpopular traffic flows and a three-level hierarchical topology, which is a close representation of the IoT network architecture, we compared the proposed scheme with the well-known LCE scheme and two recent schemes, CFPC and PoolCache. The results showed that PF-ClusterCache outperformed all these schemes on all metrics. Moreover, the use of the LPF replacement scheme maintained the efficiency of the caching process by reducing the eviction rate while keeping only the most popular content in the cache, even with a small popularity threshold.
PF-ClusterCache, at the extreme when there is only one node per cluster, behaves similarly to the consumer caching strategy, but with the added consideration of freshness and popularity. However, even with a small number of nodes per cluster and a limited CS size, the scheme retains good performance thanks to the clustering procedure and the added value of popularity awareness. As the LPF replacement scheme keeps only the popular content within the cache, it can be combined with other caching schemes to enhance their performance. We aim to study the impact of integrating LPF with caching schemes that support producer mobility, by keeping the freshest and most popular content closer to the mobile producer and reducing the impact of handoff delay on network performance.