1. Introduction
The Internet of Things (IoT) is a stimulating and revolutionary ecosystem that is changing the way we live, think, and work. In such a smart ecosystem, intelligent objects or things perceive, learn, communicate, and interact effectively to respond to contingencies, issues, and challenges. Many IoT applications already exist, such as e-healthcare, smart cities, and e-education [
1]. The recent proliferation of innovative emerging IoT applications and media streaming has led to an unprecedented growth in the amount of generated traffic [
2]. Cisco has forecast a global data traffic growth at a rate of 46% per year that would yield around 77.5 exabytes of traffic per month in 2023 [
3]. The majority of connected devices are constrained objects, and the majority of traffic concerns the dissemination and retrieval of content. Video content generated from media entertainment streaming applications and IoT devices forms up to 82% of the current total Internet traffic [
4]. As a result, the current Internet system architecture, with its heavy TCP/IP protocol stack and its host-centric operating model, is increasingly ill-suited to such content-dominated traffic. A new vision of the Internet, centered instead on the data content, was proposed a decade ago by Van Jacobson and called Information-Centric Networking (ICN) [
5]. Since then, several ICN architectures have been proposed; Named Data Networking, however, has gained momentum and is expected to become a key technology for data dissemination in IoT [
6,
7,
8]. NDN is a receiver-driven and connection-less communication model providing easy and scalable data access, security, energy efficiency, and mobility support [
9,
10]. NDN’s features make it also a viable solution to satisfy the peculiarities of IoT [
11], and to better serve web content distribution [
12] and emerging video services [
13]. NDN uses content naming and name-based routing, which eliminate the need for IP addressing of the things in the network. This has led to the emergence of NDN-based IoT among the research community [
7,
14,
15,
16,
17,
18]. For instance, Amadeo et al. in [
16] provided examples of implementing IoT over NDN, including building management systems, smart home applications, and an access control solution to secure devices in an IoT environment. Meddeb et al. in [
7] compared different ICN projects to investigate and evaluate the suitability of these approaches to IoT requirements. Based on a comprehensive qualitative analysis, they showed the suitability of the NDN architecture to meet IoT requirements.
In-network caching is the main feature of the NDN architecture, and is performed as an on-path caching where data packets are cached in nodes along the forwarding path. Many researchers have investigated ways to develop and improve caching schemes for NDN, as in [
14,
15,
19,
20]. Caching decision and cache replacement policies are fundamental leverages to provide scalable, coherent, and accurate networking capabilities. Caching schemes can be classified, based on the criteria used for making the decision, into autonomous (or implicit) and collaborative (or explicit) [
21]. In autonomous schemes, the caching decision is computed locally, as only information about the data available at the current node is considered. In collaborative schemes, on the other hand, the caching decision is based on collaboration between nodes, either on a local or a global scale. Collaborative schemes aim to improve the cache hit rate by increasing the data content diversity within the network and, consequently, to reduce the content retrieval time. In this paper, we propose an innovative collaborative caching approach that takes into account content popularity and freshness to deliver a high cache hit ratio and a low retrieval time for popular content, while increasing the data content diversity within the network by avoiding any caching redundancy at the edge of the network.
As IoT traffic is transient in nature, data freshness has been considered one of the requirements when implementing IoT over NDN. Many studies addressed this issue with different freshness meanings and for different purposes [
22,
23,
24,
25]. It is worth mentioning that the authors in [
24] were the first to consider IoT traffic freshness and proposed the use of a statistical prediction model, Auto-Regressive Moving Average (ARMA), to predict the generation instant of the next content for event-based IoT traffic. Moreover, the replacement technique needs to highly prioritize popular data when the cache is full, to avoid a high cache miss rate and, thus, optimize the network traffic. NDN has some well-known replacement strategies, such as Least Frequently Used (LFU) and Least Recently Used (LRU), which implicitly consider data popularity. Data freshness was also considered during the content eviction process, as in the Least Fresh First (LFF) replacement strategy [
24]. The authors in [
23] proposed an implicit caching approach that combines both data lifetime and popularity. They extended their work in [
25] to add an availability factor for caching data in vehicular NDN. The approach of [23], named CFPC, will later serve as a comparison baseline for our proposed PF-ClusterCache explicit caching scheme.
The most challenging issue remains how to control and lower the redundancy in caching with minimum collaboration effort and communication overhead. In this article, we propose PF-ClusterCache, a caching approach that shares the storage resources available in a cluster of nodes at the edge of the network and manages these resources as a unique global pooled storage. As such, PF-ClusterCache can enforce zero redundancy within any cluster of nodes, while it caches only the most recent popular data content. Consequently, a greater amount of different content can be cached within the network, achieving a higher caching diversity. PF-ClusterCache caches only at the edge of the network and avoids caching in internal nodes, allowing them to serve as fast name-based routing structures. It uses a simple hashing technique to map a requested content name to a node within a cluster. This tacitly results in a reduced content retrieval time and much lower communication overhead.
Few works have proposed explicit caching strategies based on managing and controlling in a distributed manner the caches of local nodes to decrease the caching redundancy and increase the content diversity. However, none of these collaborative approaches, to our knowledge, have also taken into account the popularity and freshness of the content as with the proposed PF-ClusterCache. This work advances the state of the art by offering a simple and efficient distributed caching technique that explicitly and simultaneously considers content popularity and freshness, while remaining fully compliant with the NDN specifications.
The main contributions of this article can be summarized as follows:
Design of PF-ClusterCache, a novel collaborative popularity and freshness-aware caching policy that operates at the edge of the network. PF-ClusterCache regards the content stores of nodes in a cluster as a unique global distributed storage.
Design of a popularity and freshness-aware replacement technique that retains only the most recent popular content in the caches.
Collaboration among nodes of the same cluster is reduced to a minimum without any information exchange by using a simple hashing of content names to surrogate caching nodes within the cluster.
Integration of PF-ClusterCache into ndnSIM and the evaluation of its performance using a large Transit Stub network topology including a large number of consumers and producers to account for different workloads and IoT scenarios.
Benchmarking PF-ClusterCache against other representative caching schemes using a mix of popular and unpopular traffic flows and under various scenarios.
The remainder of the paper is organized as follows:
Section 2 provides the necessary background and reviews relevant related works.
Section 3 describes the details of the proposed PF-ClusterCache approach, and provides illustrative examples of its operation.
Section 4 first presents the performance evaluation of PF-ClusterCache, detailing the caching evaluation metrics, the topology used, and the simulation scenarios. It then presents a detailed comparative analysis with the LCE, PoolCache, and CFPC approaches.
Section 5 concludes the paper.
3. PF-ClusterCache: Popularity and Freshness-Aware Collaborative Cache Management
PF-ClusterCache is a collaborative caching technique to manage and control the caches of a set of nodes as a global distributed storage pool. As such, PF-ClusterCache controls one large global Content Store, which is partitioned among the different nodes of an edge local cluster. In NDN, the size of the Content Store (CS) at each node is very limited compared with the huge amount of available named content (i.e., the catalogue size). The ratio of the cache size over the catalogue size should be within the interval
, as suggested in [
37]. PF-ClusterCache then represents a new way to have a much larger CS. This tacitly equates to better performance, as will be explored in the next section.
PF-ClusterCache performs the caching of only popular and fresh retrieved content. It forces the caching to be done only at the edge of the network; core routers are then left as a fast switching fabric. Unlike the caching scheme proposed in [
23], PF-ClusterCache caches popular content in only one node within the local cluster. A zero redundancy is then achieved within any cluster of nodes. The local caching node of any requested content, hereafter called the Surrogate Caching Node (SCN), is selected according to a hashing function applied on the requested content name. The hash function maps any requested content name to a node ID within the cluster. This node designates the unique SCN where the requested content might be cached or is to be cached if it turns out to be fresh content and of sufficient popularity. The hash function has to evenly distribute the requested content among the different nodes of the cluster. For the sake of this paper, we consider stable clusters where node membership is rarely changed. Clusters could be defined in many different ways. A cluster defines a set of nodes that are interconnected and geographically close to each other. Network service providers, knowing their needs, may define the various clusters in an ad hoc manner according to the requirements of different applications and working environments (e.g., enterprise, video streaming, schools, etc.). The underlying network topology being flat or hierarchical should not hinder the designation of nodes within the same cluster as long as these nodes are interconnected and within the same geographical neighborhood. However, certain network topologies, such as the Transit Stub (TS) topology, may tacitly designate the stubs as clusters. It is worth noting that not all nodes within a cluster should be members of the PF-ClusterCache. We may also adopt a clustering or partitioning algorithm to decide the clusters or the partitions [
33,
34,
38]. However, doing so is profitable only for small networks; it is time-consuming and, moreover, does not exclude network core routers from serving as caching nodes.
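The name-to-node mapping described above can be sketched in a few lines. The following is an illustrative sketch, not the paper's implementation: the node IDs, the cluster list, and the use of SHA-256 are our assumptions; the scheme only requires a deterministic hash that spreads names evenly across cluster members.

```python
import hashlib

def surrogate_caching_node(content_name: str, cluster: list[str]) -> str:
    """Deterministically map a content name to one node ID in the cluster."""
    digest = hashlib.sha256(content_name.encode()).digest()
    # Take 8 bytes of the digest as an integer and reduce modulo cluster size.
    index = int.from_bytes(digest[:8], "big") % len(cluster)
    return cluster[index]

# Hypothetical six-router cluster; every router applying the same function
# agrees on the SCN for a name without exchanging any messages.
cluster = ["R1", "R2", "R3", "R4", "R5", "R6"]
scn = surrogate_caching_node("/smarthome/temp/room1", cluster)
```

Because the mapping is purely local and deterministic, no coordination traffic is needed to locate the unique SCN of a content name.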
The system design of PF-ClusterCache requires the identification of the current popularity of each piece of requested content. At each SCN node, a popularity table is defined to maintain the current popularity of the requested content within the cluster that is directed to this SCN. Unlike [
23], where the table may grow to very large, impractical sizes, PF-ClusterCache uses an efficient popularity table of a small fixed size. This popularity table is maintained as a Least Frequently Used (LFU) storage, supporting the insertion, retrieval, and eviction (deletion) of content names. The table can be easily and efficiently implemented using a min heap data structure together with a standard collision-free hash map. The min heap is ordered by the popularity counts of the content names, and the hash map is indexed by the content names. Since all operations on a collision-free hash map are O(1), the time complexity of the insert, retrieve, and evict operations is dominated by the heap and is O(log n), where n is the actual number of stored names, capped at the size of the table. Upon the arrival of an Interest packet with a given content name, the SCN node consults its popularity table. If the content name already exists in the table, it increments its popularity count. If the name is not there, the SCN inserts the name in the table; this insertion may first require the eviction of the oldest, least frequently used item (the oldest, least popular content name) from the table. Alternatively, the insert, retrieve, and evict operations can all be done in O(1) time by maintaining two doubly linked lists: one ordered by popularity count, and, for each count, another holding all name prefixes that share that popularity count [
39].
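The O(1) LFU design above can be sketched as follows. This is an illustrative sketch rather than the paper's implementation: a Python dict plays the role of the collision-free hash map, and per-count ordered buckets stand in for the two doubly linked lists (an `OrderedDict` preserves insertion order, so the oldest name within the least popular count is evicted first).

```python
from collections import defaultdict, OrderedDict

class PopularityTable:
    """Fixed-capacity LFU table: O(1) insert, retrieve, and evict."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.count = {}                          # name -> popularity count
        self.buckets = defaultdict(OrderedDict)  # count -> names, oldest first
        self.min_count = 0

    def _bump(self, name: str) -> None:
        """Move a name from its current count bucket to the next one."""
        c = self.count[name]
        del self.buckets[c][name]
        if not self.buckets[c]:
            del self.buckets[c]
            if self.min_count == c:
                self.min_count = c + 1
        self.count[name] = c + 1
        self.buckets[c + 1][name] = None

    def insert_or_increment(self, name: str) -> None:
        if name in self.count:
            self._bump(name)
            return
        if len(self.count) >= self.capacity:
            # Evict the oldest, least frequently used name.
            victim, _ = self.buckets[self.min_count].popitem(last=False)
            if not self.buckets[self.min_count]:
                del self.buckets[self.min_count]
            del self.count[victim]
        self.count[name] = 1
        self.buckets[1][name] = None
        self.min_count = 1

    def get(self, name: str) -> int:
        return self.count.get(name, 0)

pt = PopularityTable(capacity=2)
for n in ["/a", "/a", "/b", "/c"]:   # /b is evicted when /c arrives
    pt.insert_or_increment(n)
```

All operations touch only dict and bucket heads, so each runs in constant time, matching the O(1) bound cited from [39].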
To keep the popularity table current, it is periodically adjusted and smoothed using the Exponentially Weighted Moving Average (EWMA) algorithm. At the end of each time interval $i$, the popularity count $P_i(n)$ of each name prefix $n$ currently in the table is updated as follows:

$$P_i(n) = \alpha \, R_i(n) + (1 - \alpha) \, P_{i-1}(n) \qquad (1)$$

where $\alpha$ is a predefined weight parameter ($0 < \alpha < 1$), $R_i(n)$ is the number of Interest packets with name prefix $n$ received by the SCN during the current time interval $i$, and $P_{i-1}(n)$ is the calculated popularity count during the previous time interval.
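The periodic EWMA smoothing step can be sketched as follows; the weight value and the per-interval request counts below are illustrative assumptions, not values from the paper.

```python
def ewma_update(prev_counts: dict, interval_counts: dict, alpha: float) -> dict:
    """Smooth popularity counts at the end of a time interval:
    new = alpha * requests_this_interval + (1 - alpha) * previous_count."""
    names = set(prev_counts) | set(interval_counts)
    return {
        n: alpha * interval_counts.get(n, 0) + (1 - alpha) * prev_counts.get(n, 0.0)
        for n in names
    }

prev = {"/video/clip1": 8.0}                 # smoothed counts so far
seen = {"/video/clip1": 4, "/iot/temp": 2}   # Interests seen this interval
smoothed = ewma_update(prev, seen, alpha=0.5)
```

A name that stops being requested decays geometrically toward zero, so the table tracks current popularity rather than all-time request totals.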
PF-ClusterCache defines a popularity threshold used to decide whether to cache incoming data packets at their designated SCNs. Incoming data content is considered for caching in the SCN content store only if it is valid and its popularity exceeds the defined popularity threshold. Then, the data content is passed to the replacement technique to see whether there is a free place in the CS or content with lower popularity to be evicted. The PF-ClusterCache replacement technique is based on popularity and called the Least Popular First Replacement policy (LPF). LPF ensures that all admitted data content is valid (i.e., fresh), and that only the most popular content occupies the CS of any SCN.
Algorithm 1 illustrates the proposed LPF replacement scheme. The incoming popular data content is automatically stored in the CS if there is still space. Otherwise, it is cached only if its popularity count is greater than that of the least popular content currently in the CS.
Algorithm 1: Least Popular First (LPF) Replacement Scheme
Input: PT: Popularity Table; CS: Content Store; PIT: Pending Interest Table; D: popular Data packet
Output: Caching decision in the CS and forwarding according to the PIT
1: function CacheData(D)
2:   name = D.name1
3:   if D is valid then
4:     if CS is full then
5:       pNew = PT.GetPopCount(name)
6:       minName = name of the least popular content in CS
7:       pMin = PT.GetPopCount(minName)
8:       if pNew > pMin then
9:         CS.erase(minName)
10:      end if
11:    end if
12:    CS.insert(D)
13:    Forward D according to PIT
14:  end if
15: end function
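A minimal sketch of the LPF admission logic of Algorithm 1 follows. The class and parameter names are our assumptions; here, when the store is full and the newcomer is not more popular than the least popular cached entry, the newcomer is simply dropped, which keeps the store bounded, in line with the admission rule described in the text.

```python
class LPFContentStore:
    """Least Popular First replacement: keep only the most popular content."""

    def __init__(self, capacity: int, pop):
        self.capacity = capacity
        self.pop = pop      # callable: name -> current popularity count
        self.store = {}     # name -> cached data

    def cache(self, name: str, data, valid: bool) -> bool:
        """Try to cache fresh data; returns True if it was admitted."""
        if not valid:
            return False                       # stale content is never cached
        if len(self.store) >= self.capacity:
            victim = min(self.store, key=self.pop)
            if self.pop(name) <= self.pop(victim):
                return False                   # newcomer not more popular: drop
            del self.store[victim]             # evict the least popular entry
        self.store[name] = data
        return True

pops = {"/a": 5, "/b": 1, "/c": 3}
cs = LPFContentStore(capacity=2, pop=lambda n: pops[n])
cs.cache("/a", b"A", True)
cs.cache("/b", b"B", True)
admitted = cs.cache("/c", b"C", True)   # evicts /b (popularity 1 < 3)
```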
The system design of PF-ClusterCache requires only one additional field, carrying a second name, within the NDN packet format of both Interest and Data packets. The two names shall hereafter be referred to as Pkt.name1 and Pkt.name2. As such, each Interest or Data packet carries both the requested content name and its associated SCN name. Interest forwarding as well as Data forwarding are always accomplished using Pkt.name1. No additional field is required in the FIB or in the PIT; namely, the forwarding of Interest and Data packets is done exactly as specified by the NDN Forwarding Daemon (NFD). PF-ClusterCache has no specific requirements for the underlying routing, which could well be the Named Data Link State Routing Protocol (NLSR) or any other routing protocol.
As illustrated in Figure 2 and Figure 3, we consider a local cluster composed of six nodes (or routers). Consumer C2 sends a request (an Interest) to its edge router asking for a given content name, as indicated by the first red arrow in both figures. The edge router starts by hashing the requested content name to find the corresponding Surrogate Caching Node (SCN) within the local cluster; the hashing function maps the content name to the unique SCN where this content may be cached, or is to be cached if it turns out to be fresh and sufficiently popular. The edge router then takes the requested content name from Interest.name1, puts it in Interest.name2, and places the designated SCN name in Interest.name1. The Interest is then forwarded according to the FIB, as indicated by the second red arrow in both figures. Upon the arrival of this Interest at the SCN, the latter first verifies whether the requested content name already has an entry in its popularity table PT. This is indeed the case, as shown in both figures (a count of 2 in Figure 2, and a count of 3 in Figure 3). The SCN increments this count and then checks whether a copy of the requested content is already in its CS. Here, we distinguish two different cases, illustrated, respectively, in Figure 2 and Figure 3.
Figure 2 illustrates the case where the requested content is to be fetched from its producer, as no copy of it is already cached within the cluster. The SCN first exchanges Interest.name1 and Interest.name2 and then forwards the Interest using its FIB. The Interest is consequently forwarded hop by hop until reaching the producer. Keeping the name of the SCN in Interest.name2 is meant to prevent the caching of the content on its way back from the producer to the consumer: the only router allowed to cache a copy is this SCN. When receiving the Interest, the producer responds by sending the requested content after putting the content name in Data.name1 and the SCN name, which was received in Interest.name2, in Data.name2. As such, the Data packet follows the reverse path towards consumer C2, as prescribed by the PITs. On its way back, the Data packet passes through its SCN, which consults its popularity table PT to verify whether the content name has a popularity count greater than or equal to the PF-ClusterCache threshold (see Algorithm 4). As this is the case, the SCN first caches a copy in its CS, as illustrated in green in Figure 2, and then forwards the Data packet towards consumer C2 according to its PIT.
The second case is when there is indeed a cached copy of the requested content in the CS of the SCN. This is illustrated in Figure 3. The SCN responds with its cached copy, carrying the content name in Data.name1. The Data packet then follows the reverse path towards consumer C2, as indicated by the PITs of the traversed routers, and as illustrated in Figure 3 by the successive green arrows.
Algorithm 2 illustrates the PF-ClusterCache strategy at any given router in the network upon the arrival of an Interest packet. The router first checks whether the Interest comes directly from an attached consumer. If so, it applies the predefined hashing function H to map the content name (Interest.name1) to its Surrogate Caching Node (SCN). It then places the received content name in Interest.name2 and the SCN name in Interest.name1, and forwards the Interest normally according to its FIB. Since Interest.name1 contains the SCN name, the Interest will consequently be directed toward the SCN router. However, if the incoming Interest is not from an attached consumer, the receiving router checks whether it itself is the designated SCN. If so, it first verifies whether Interest.name2 (the name of the requested data) already has an entry in the popularity table PT. If it does, it increments its count; otherwise, it creates a new entry with a popularity count equal to 1. Then, it performs either of the above-described cases depending on whether or not its CS contains a cached copy of the requested data content. Recall that the popularity table PT is managed as a finite-capacity LFU storage system; Algorithm 2 calls Algorithm 3 for this very purpose. Furthermore, note that the last step in Algorithm 2 represents the case of a receiving router that is neither the SCN nor the edge router of the requesting consumer. Such a router simply forwards the Interest according to its FIB. This ensures that Interest forwarding is always accomplished using Interest.name1, in line with the specification of the NDN Forwarding Daemon (NFD).
Algorithm 2: PF-ClusterCache: Interest Forwarding Algorithm
Input: PT: Popularity Table; CS: Content Store; PIT: Pending Interest Table; FIB: Forwarding Information Base; H: hash function; SCN: Surrogate Caching Node; I: Interest packet
Output: Interest forwarding decision
1: if I received from an attached consumer then
2:   SCN = H(I.name1)
3:   I.name2 = I.name1
4:   I.name1 = SCN
5:   I.setMustBeFresh()
6:   Forward I according to FIB
7: else
8:   if Receiving-Node-Id = I.name1 then
9:     AddOrIncrementPopCount(I.name2)
10:    if cached copy in CS then
11:      Send back the Data packet to the consumer according to PIT
12:    else
13:      temp = I.name1
14:      I.name1 = I.name2
15:      I.name2 = temp
16:      Forward I according to FIB
17:    end if
18:  else
19:    Forward I according to FIB
20:  end if
21: end if
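The Interest pipeline of Algorithm 2 can be exercised with a runnable sketch. The class, node IDs, and hash below are our illustrative assumptions, and FIB/PIT forwarding is reduced to returned action strings; the sketch only demonstrates the two-name swap and the per-role branching.

```python
import hashlib

def hash_to_scn(name: str, cluster: list) -> str:
    """Deterministic name-to-node mapping (stand-in for H)."""
    h = int(hashlib.sha256(name.encode()).hexdigest(), 16)
    return cluster[h % len(cluster)]

class Interest:
    def __init__(self, name1, name2=None, must_be_fresh=False):
        self.name1, self.name2, self.must_be_fresh = name1, name2, must_be_fresh

def on_interest(node_id, interest, cluster, from_consumer, pop_table, cs):
    """Return the action a router takes for an incoming Interest."""
    if from_consumer:
        # Edge router: redirect the Interest to the designated SCN.
        scn = hash_to_scn(interest.name1, cluster)
        interest.name2, interest.name1 = interest.name1, scn
        interest.must_be_fresh = True
        return "forward-to-scn"
    if node_id == interest.name1:            # this router is the SCN
        pop_table[interest.name2] = pop_table.get(interest.name2, 0) + 1
        if interest.name2 in cs:
            return "reply-from-cache"
        # No cached copy: swap names back and forward toward the producer.
        interest.name1, interest.name2 = interest.name2, interest.name1
        return "forward-to-producer"
    return "forward"                         # plain intermediate router

cluster = ["R1", "R2", "R3"]
pt, cs = {}, set()
i = Interest("/sensor/t1")
edge_action = on_interest("R0", i, cluster, True, pt, cs)
scn = i.name1
scn_action = on_interest(scn, i, cluster, False, pt, cs)
```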
Algorithm 3: PF-ClusterCache: Popularity Update Algorithm
Input: PT: Popularity Table; I: Interest packet
Output: Insert or increment the popularity count of the name prefix in PT
1: function AddOrIncrementPopCount(name)
2:   if PT.Find(name) then
3:     PT[name].PopCount++
4:   else
5:     PT.Add(name)
6:     PT[name].PopCount = 1
7:   end if
8: end function
Algorithm 4 illustrates the strategy of PF-ClusterCache at any router of the network upon the arrival of a Data packet. A Data packet contains the two names: the content name in Data.name1 and its Surrogate Caching Node in Data.name2. The producer, before sending back the data packet, inserts these two names. Recall that the producer obtains the SCN name from Interest.name2 of the requesting Interest. The Data packet travels backward following the reverse path indicated by the PITs of traversed routers until reaching the SCN router, where a copy of it is cached if its popularity count is greater than or equal to the PF-ClusterCache predefined popularity threshold. Then, the data packet pursues its path towards the consumer. It is here worth noting that the unique router authorized to cache a copy of the Data packet is its designated SCN whose name is prescribed in Data.name2 of the received Data packet. This enforces a zero redundancy within local clusters and amounts to much better diversity within the network, yet the caching is only performed at the edge of the network, leaving the core routers for the sole role of named packet switching.
Algorithm 4: PF-ClusterCache: Data Forwarding and Caching Decision Algorithm
Input: PT: Popularity Table; PIT: Pending Interest Table; θ: predefined popularity threshold; D: Data packet with D.name1 set to the content name prefix and D.name2 set to the SCN name
Output: Caching of D is only possible at its designated SCN
1: SCN = D.name2
2: if SCN = nodeID then
3:   if D is valid then
4:     if PT.GetPopCount(D.name1) ≥ θ then
5:       CacheData(D)
6:     end if
7:   end if
8: end if
9: Send D back to the consumer according to PIT
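The Data-side decision of Algorithm 4 reduces to three guards: only the designated SCN, only valid (fresh) content, and only above the popularity threshold. A hedged sketch, with illustrative parameter names and `cache_data` standing in for the LPF admission routine of Algorithm 1:

```python
def on_data(node_id, name1, name2, valid, pop_count, threshold, cache_data):
    """Decide whether this router caches an incoming Data packet.
    name1: content name; name2: SCN name carried in the packet."""
    cached = False
    if name2 == node_id and valid and pop_count >= threshold:
        cached = cache_data(name1)
    # In all cases the Data packet is then forwarded along the PIT path.
    return cached

always_admit = lambda name: True
result = on_data("R4", "/a/b", "R4", True, 3, 2, always_admit)
```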
4. Performance Evaluation and Comparative Analysis
In this section, we first evaluate the performance of our proposed caching technique, PF-ClusterCache, and then we perform a comparative analysis with three different caching approaches. These are the well-known Leave Copy Everywhere (LCE) [
27], which is used as a yardstick for comparison in many studies on caching in NDN; the Caching Fresh and Popular Content (CFPC) [
23], which is, similar to PF-ClusterCache, an approach based on both popularity and freshness; and the PoolCache [
29], which is, similar to PF-ClusterCache, a collaborative caching scheme based on pooling the cache storage of several nodes but with no explicit treatment of data popularity and no consideration of data freshness. The performance evaluation and the comparative analysis are conducted by simulation using the ndnSIM simulator [
40,
41].
4.1. Simulation Scenario
Most works on caching in NDN used simplistic topologies of a few nodes, which cannot represent an IoT environment. For our simulations, we opted instead for the Transit Stub (TS) network structure, whose properties closely mimic those of a wide-area IoT. The TS used is a 3-level hierarchical topology of interconnected stub and transit domains. The stubs carry either originating or terminating traffic, while the transit domains efficiently interconnect the stub domains. The TS topology has been used to represent named data networking of things in works such as [
10,
24,
29]. We used the BRITE library for ndnSIM to generate a TS topology integrated within the ndnSIM simulator [
40,
41].
Figure 4 illustrates a graphical example of a 3-level TS topology. The internal green-colored nodes are level 0 backbone routers, to which 2 transit domains comprising level 1 nodes (colored in yellow) are connected; the stubs contain the blue- and red-colored nodes. The actual network topology used in our simulations comprises around 632 nodes in total: a core domain containing 2 level 0 backbone routers, 2 transit domains each containing (on average) 15 level 1 backbone routers, and 30 stubs per transit domain, each containing (on average) 10 routers. Each stub is connected to either 1 or 2 transit nodes. Consumers are connected to nodes of the stubs in one transit domain, while producers are connected to nodes in the stubs of the other transit domain. Twenty producers are spread randomly across the 30 stubs of one transit domain, and 30 consumers are each attached to a node in only 3 out of the 30 stubs of the other transit domain.
We used a large catalogue of half a million data items. This content is assigned uniformly to the 20 producers, so that each producer is the home of twenty-five thousand data items. At each producer, the data content is divided into one unpopular group and zero or more popular groups. This serves to create a mix of popular and unpopular flows within the network and to investigate the influence of the unpopular group traffic on the performance of the studied caching strategies. Requests for data content (i.e., Interests) are generated for each group and each consumer according to a Poisson process. The requested data content is selected uniformly at random if it belongs to the unpopular group, and according to a Zipf probability distribution if it belongs to a popular group. The cumulative traffic rate submitted to the network depends on the number of data content groups used in the simulation. The traffic rate submitted per consumer is defined as the number of requests submitted per second per consumer; the traffic rate per consumer per group of data content is defined analogously.
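The workload model above can be sketched as follows. This is a hedged sketch, not the paper's ndnSIM configuration: the group sizes, the rate, and the Zipf exponent are illustrative examples; it only shows Poisson inter-arrival gaps with Zipf-ranked selection for popular groups and uniform selection for the unpopular group.

```python
import random

def zipf_weights(n: int, s: float) -> list:
    """Unnormalized Zipf weights for ranks 1..n with exponent s."""
    return [1.0 / (k ** s) for k in range(1, n + 1)]

def next_request(group: list, popular: bool, s: float = 1.6) -> str:
    """Pick the next requested content name within a group."""
    if popular:
        return random.choices(group, weights=zipf_weights(len(group), s))[0]
    return random.choice(group)          # unpopular group: uniform choice

def next_arrival_gap(rate_per_s: float) -> float:
    """Poisson arrivals: exponentially distributed inter-request gap (s)."""
    return random.expovariate(rate_per_s)

catalogue = [f"/prod1/content{k}" for k in range(100)]
name = next_request(catalogue, popular=True)
gap = next_arrival_gap(5.0)
```

Rank-1 content dominates a popular group, so such flows benefit strongly from caching, while the uniform unpopular flow exercises cache pollution.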
We shall investigate the effect of the cache size on the performance of the selected caching strategies; all nodes within the network are assumed to have the same constant cache size. The popularity table adjustment and smoothing process is performed periodically, once every freshness period, using EWMA. The simulation parameters used are summarized in
Table 1. The question remains at which value to fix the popularity threshold of PF-ClusterCache. Recall that we are using a large catalogue of half a million content items. As such, the chance of requesting the same content more than once within its freshness period is considerably low, unless the content is popular. As a result, we fix the PF-ClusterCache popularity threshold to 2. Interestingly, the LPF replacement technique thereby retains the most popular data content in the caches; recall that LPF evicts content from the CS only if it has a lower popularity count than the newly arriving candidate.
4.2. Evaluation Metrics
To ascertain the efficiency of the proposed PF-ClusterCache and position its performance against the selected caching strategies, we consider four metrics: the server hit reduction ratio, the average retrieval delay, the average hop count, and the number of evictions.
A server hit occurs when an Interest could not be answered by an intermediate node and is instead answered by the producer. A cache hit occurs when an Interest is answered by an intermediate router along the path to the server or producer. The server hit reduction ratio, or equivalently the cache hit ratio, represents the reduction of the rate of access to the server: it is the ratio of the total number of Interests answered by intermediate nodes (by caches) to the total number of generated Interests. It stands as one of the most fundamental metrics used to evaluate caching performance in NDN. Equation (2) gives this metric:

$$SHR = \frac{\sum_{i=1}^{N} C_i}{\sum_{i=1}^{N} \left( S_i + C_i \right)} \qquad (2)$$

where $N$ represents the number of consumers, $S_i$ stands for the number of requests sent by consumer $i$ and satisfied by the server, and $C_i$ is the number of requests sent by consumer $i$ and satisfied by an intermediate router along the path to the producer.
The content average retrieval delay represents the average time between the generation instant of an Interest packet and the instant of receiving the corresponding Data packet. The content may be sent by an intermediate caching node or, ultimately, by its producer. The content average retrieval delay is calculated by Equation (3):

$$\bar{D} = \frac{\sum_{i=1}^{N} \sum_{r=1}^{I_i} d_{i,r}}{\sum_{i=1}^{N} I_i} \qquad (3)$$

where $I_i$ stands for the number of Interests generated by consumer $i$, $d_{i,r}$ denotes the retrieval delay of Interest number $r$ generated by consumer $i$, and $N$ represents the number of consumers.
The hop count is another basic and important performance metric that measures the number of hops traversed to satisfy an Interest, by either an intermediate cache or, ultimately, its producer. The average hop count is then the average number of hops needed to answer the generated Interests. It is given by Equation (4):

$$\bar{H} = \frac{\sum_{i=1}^{N} \sum_{r=1}^{I_i} h_{i,r}}{\sum_{i=1}^{N} I_i} \qquad (4)$$

where $I_i$ stands for the number of Interests generated by consumer $i$, $h_{i,r}$ denotes the number of hops traversed by Interest number $r$ generated by consumer $i$ until being satisfied either by an intermediate cache or by its producer, and $N$ represents the number of consumers.
The cache eviction count represents the total number of data contents evicted from all network nodes during the simulation time. A content eviction takes place at a given node when an arriving piece of data eligible for caching finds the node cache full; in this case, cached data content has to be evicted to make room for the newly arriving content. Recall that for the proposed LPF, the eviction happens only if the cache is full and the incoming packet has a higher popularity count than the least popular cached data content, which is evicted. The total number of evictions is calculated using Equation (5):

$$E = \sum_{j=1}^{M} e_j \qquad (5)$$

where $M$ is the number of network nodes and $e_j$ is the number of evictions performed at node $j$ during the simulation.
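The four metrics above are straightforward to compute from per-consumer simulation logs. The field names in the following sketch are illustrative assumptions, not part of the paper's tooling.

```python
def evaluation_metrics(consumers, evictions_per_node):
    """consumers: list of dicts with 'server_hits', 'cache_hits', and
    per-satisfied-Interest 'delays' (seconds) and 'hops' lists."""
    cache_hits = sum(c["cache_hits"] for c in consumers)
    total = cache_hits + sum(c["server_hits"] for c in consumers)
    delays = [d for c in consumers for d in c["delays"]]
    hops = [h for c in consumers for h in c["hops"]]
    return {
        "server_hit_reduction": cache_hits / total,       # Equation (2)
        "avg_retrieval_delay": sum(delays) / len(delays), # Equation (3)
        "avg_hop_count": sum(hops) / len(hops),           # Equation (4)
        "total_evictions": sum(evictions_per_node),       # Equation (5)
    }

# Toy log for one consumer: 10 Interests, 8 answered by caches.
log = [{"server_hits": 2, "cache_hits": 8,
        "delays": [0.02] * 10, "hops": [3] * 10}]
m = evaluation_metrics(log, evictions_per_node=[5, 7])
```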
4.3. PF-ClusterCache Performance Analysis
Different parameters directly impact the performance of PF-ClusterCache and any other caching scheme. We concentrate here on the performance of PF-ClusterCache and we present a comparative analysis with other caching schemes in the next section.
To ascertain the quality and efficiency of PF-ClusterCache, we consider here only popular traffic, using different numbers of content groups to generate different traffic flows; a mix of popular and unpopular traffic flows will be considered in the comparative analysis of the next section. We vary the number of flows from 1 to 40 popular flows. We also consider different values of the Zipf popularity parameter to obtain different popularity levels. The size of the Content Store plays an important role and is varied to investigate its impact, and different freshness values are investigated as well. Recall that we fixed the PF-ClusterCache popularity threshold to 2 and set the EWMA period, used for smoothing the popularity counts, to the same value as the freshness period. The traffic intensity submitted to the network depends on the number of consumers, the number of flows, and the simulation time. Recall from
Table 1 that we are considering 30 consumers and a given traffic rate per flow per consumer (in Interests per second), which amounts to a large total of 750,000 generated Interests. We performed ten different simulation replications for each considered scenario. The obtained 95% confidence intervals are very small, and we chose not to draw them on the different curves for better clarity of the figures.
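The EWMA smoothing of popularity counts mentioned above can be illustrated as follows. The smoothing factor alpha and the count sequence are assumed values for illustration, not parameters from the paper:

```python
def ewma_update(previous, new_count, alpha=0.5):
    """Exponentially weighted moving average of a content's popularity
    count, recomputed once per EWMA period (alpha is illustrative)."""
    return alpha * new_count + (1 - alpha) * previous

# Counts observed over four successive EWMA periods for one name prefix:
# the smoothed value tracks the drop in popularity gradually.
smoothed = 0.0
for count in [8, 8, 2, 2]:
    smoothed = ewma_update(smoothed, count)
print(smoothed)  # 3.0
```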
Figure 5 illustrates the server hit reduction (SHR) as a function of the CS size and for different numbers of popular traffic flows. Here, we used a Zipf parameter equal to 1.6 and a freshness period equal to 50 s. A remarkable server hit reduction above 90% is obtained when using only one traffic flow. The SHR decreases with the increase in the number of traffic flows, and this decrease becomes smaller as the CS size becomes larger. Recall that PF-ClusterCache regards the caches of the different nodes of the local cluster as a unique distributed pool of storage. This pool of caches is shared among the different traffic flows generated by consumers connected to the nodes of this cluster. Moreover, the underlying LPF replacement policy enforces that only the most popular data content of these various traffic flows can persist within the pool for the duration of its freshness. At the extreme, when there is only one popular traffic flow, the most popular data content of this flow obtains the entire pool, which is large enough to host it without any LPF eviction. As we can observe from
Figure 5, even a CS size equal to 5, which is a pool of 50 places, is enough to attain the maximum SHR value. More traffic flows require a larger pool and a larger CS size. For a number of flows equal to 10, we need a CS size larger than, say, 20 to approach the highest SHR value. Freshness also plays a role as any cached content becomes stale when its validity expires, and if requested, must be retrieved from its producer. Therefore, PF-ClusterCache provides a different maximum SHR for different numbers of flows; the smaller the number of streams (or flows), the higher the maximum attainable SHR.
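The SHR metric discussed above can be expressed, under one plausible reading (not necessarily the paper's exact equation), as the fraction of Interests satisfied by in-network caches rather than by producers:

```python
def server_hit_reduction(total_interests, producer_hits):
    """Fraction of Interests satisfied by in-network caches rather than
    reaching the producer (one plausible formulation of the SHR metric)."""
    return 1.0 - producer_hits / total_interests

# Illustrative numbers: 750,000 Interests, 60,000 of which reached producers.
print(server_hit_reduction(750_000, 60_000))  # 0.92
```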
Figure 6 depicts the SHR as a function of the Zipf popularity parameter and for various numbers of popular traffic flows. Here also, the freshness of data content is fixed to 50 s. We observe the great efficiency of PF-ClusterCache as the content popularity becomes larger. An increase in the Zipf popularity parameter gives higher frequencies to the most popular content, which narrows the range of this most popular content, and consequently less caching storage is required. In this figure, the CS size is fixed to 10, which is adequate for a few traffic streams. However, this CS capacity is too small to deliver a high SHR for low Zipf popularity parameter values when a large number of traffic streams are deployed. A low Zipf parameter value amounts to a broadening of the range of popularity among data content. The CS size plays an important role, as was discussed earlier and illustrated in
Figure 5.
Table 2 provides the server hit reduction for different freshness period values and for different numbers of traffic streams. The content freshness impacts the server hit reduction ratio, as cached popular content must be retrieved again from its producer once it becomes stale (expired freshness). This is repeated every freshness period and for every piece of cached content. As such, the server hit reduction ratio decreases as the freshness period becomes shorter. However, the impact of the freshness on the SHR remains relatively limited, as the number of content retrievals due to freshness expiration is very small relative to the number of data contents requested by consumers connected to the same cluster. For instance, for a CS size equal to 10, we obtain an increase of only 0.014 in the SHR when increasing the freshness period from 10 s to 250 s. The largest increase in the SHR is obtained when a CS size equal to 40 is used.
Figure 7 depicts the content average retrieval delay as a function of CS size for different Zipf popularity parameter values. This figure clearly shows the great efficiency of the proposed PF-ClusterCache, which requires a very short average delay to retrieve popular content. The content average delay decreases as content popularity increases, thanks to the underlying LPF that keeps the most popular content in the cache. The content average delay also decreases as the CS size increases. Recall from
Figure 5 that increasing the CS size provides better SHR. The PF-ClusterCache is very efficient for IoT and delay-sensitive applications, as it requires a very small content retrieval delay for popular content, even with a small CS size.
Figure 8 depicts the average hop count to retrieve popular content as a function of CS size and for different values of the Zipf parameter. Recall that content may be retrieved from its surrogate node or ultimately from its producer. The strict minimum content average hop count is obtained when the most popular content has infinite freshness (i.e., the freshness value is greater than the simulation time). In such a case, the most popular content resides in the pool of caches of the cluster, and therefore the average hop count is equal to the average path length from consumers to SCNs within the same cluster.
Figure 7 and
Figure 8 clearly illustrate the great efficiency of the proposed PF-ClusterCache. With a small but sufficient CS size, the most popular content is kept near the edge of the network, thus yielding a very small average retrieval delay.
Figure 9 illustrates the eviction rate as a function of CS size and for different values of the Zipf parameter. First, we observe that the eviction rate is higher for lower values of the Zipf parameter for all considered CS sizes. Recall that a lower value of the Zipf parameter yields a broader range of popular content and therefore requires more CS space; otherwise, it leads to more evictions. Second, the eviction rate exhibits a remarkable behavior: it starts increasing with the CS size until attaining its maximum, and then declines with a further increase in the CS size. This is essentially due to the specifics of the underlying LPF replacement scheme. The LPF does not allow a replacement (and therefore an eviction) unless the incoming data content has a higher popularity count than the least popular data in the cache. As such, the CS of any SCN within the cluster is always occupied by the most popular data. As a result, when the CS size is small, only the most popular content is cached and therefore hardly any replacement is permitted. As the CS size becomes larger, more popular content (or, equivalently, more content with a lower popularity count) is cached, and consequently more replacements can take place. When the CS size becomes sufficient to cache all of the most popular content, the eviction rate becomes very small, as virtually all requested content is retrieved from its SCNs within its cluster. Once again, PF-ClusterCache is very efficient: a rather small CS size is enough to allow the most frequently requested (most popular) data content to be cached at its SCNs. PF-ClusterCache caches requested popular content near the edge of the network. It does not allow caching at the core routers, leaving them as fast name-based forwarding structures.
Now, we turn to investigating the required maximum popularity table size as a function of the Zipf parameter value. Recall that PF-ClusterCache uses a finite, predefined popularity table size at each SCN. Recall also that the popularity table is managed as a finite LFU storage. An Interest that finds the popularity table of its SCN full, and whose name prefix does not have an entry, replaces the oldest least frequently used prefix name found in the table.
Table 3 shows, for 10 popular traffic streams, the maximum popularity table size needed (used in the simulation) for different values of the content popularity. No more than a few hundred places are needed, even for low Zipf parameter values.
4.4. PF-ClusterCache Comparative Analysis
Within a network, we usually have a mix of popular and unpopular traffic. In the previous section, we ascertained the efficiency of the proposed popularity and freshness-aware caching scheme, PF-ClusterCache, using only popular flows. Indeed, any SCN in PF-ClusterCache discovers popular data content using its popularity table, and allows only the most popular content to be cached thanks to the underlying LPF replacement scheme. PF-ClusterCache does not cache unpopular content, and even low-popularity content may not have a chance to become cached unless the CS size allows it.
When considering a mix of popular and unpopular flows, the question arises as to the size of the popularity table used at each SCN. Recall here that for each arriving Interest, an SCN verifies whether the requested content name already has an entry in the table. In the affirmative, its count is incremented; otherwise, it is inserted with a count set to one. This is done for every incoming Interest, whether the requested content is popular or not. The remarkable fact is that this does not necessitate a larger popularity table. The LFU strategy of the table replaces the oldest, least frequently used name, which normally belongs to non-popular content.
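The per-Interest popularity-table update described above can be sketched as a finite LFU table; the class name, prefixes, and table size are illustrative:

```python
class PopularityTable:
    """Finite popularity table kept at each SCN and managed as LFU storage:
    when full, a new name prefix replaces the least frequently used entry
    (ties resolved in favor of the oldest entry)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.counts = {}  # name prefix -> request count (insertion-ordered)

    def on_interest(self, prefix):
        if prefix in self.counts:
            self.counts[prefix] += 1      # existing entry: increment its count
        elif len(self.counts) < self.max_entries:
            self.counts[prefix] = 1       # room left: insert with count one
        else:
            lfu = min(self.counts, key=self.counts.get)
            del self.counts[lfu]          # replace oldest least-frequently-used prefix
            self.counts[prefix] = 1
        return self.counts[prefix]

table = PopularityTable(max_entries=2)
for name in ["/video/a", "/video/a", "/iot/temp", "/news/x"]:
    table.on_interest(name)
print(table.counts)  # {'/video/a': 2, '/news/x': 1}
```

Because `min` over an insertion-ordered dict returns the earliest entry among ties, the replacement victim is the oldest of the least frequently used prefixes, matching the behavior described in the text.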
The comparative analysis of PF-ClusterCache is performed with three different caching approaches: the well-known Leave Copy Everywhere (LCE) [27], customarily used as a yardstick for comparison in many studies on caching in NDN; the Caching Fresh and Popular Content (CFPC) [23], which is based on both popularity and freshness; and PoolCache [29], a collaborative caching scheme based on pooling the cache storage of several nodes but with no explicit treatment of data popularity and no consideration of data freshness. These three caching policies use the Least Recently Used (LRU) replacement scheme.
We divide the content catalogue into 10 groups of 50,000 pieces of content each. Requested content from each group is selected either randomly, to generate unpopular traffic (unpopular flow), or using the Zipf probability distribution, to generate popular traffic (popular group or flow). Interests are generated at each consumer, for each group of flows, according to a Poisson distribution with a given rate of requests per second.
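The workload just described can be emulated as follows; a stdlib-only sketch in which the group size, exponent, and request counts are placeholders (the paper uses 50,000 contents per group):

```python
import random

def make_zipf_sampler(n_items, s, rng):
    """Return a sampler drawing ranks 1..n_items with P(k) proportional to 1/k^s."""
    weights = [1.0 / (k ** s) for k in range(1, n_items + 1)]
    return lambda: rng.choices(range(1, n_items + 1), weights=weights)[0]

rng = random.Random(42)
group_size = 1000          # placeholder; the paper uses 50,000 per group
popular = make_zipf_sampler(group_size, s=1.6, rng=rng)

# Popular flow: Zipf-ranked selection; unpopular flow: uniform selection.
popular_requests = [popular() for _ in range(10_000)]
unpopular_requests = [rng.randint(1, group_size) for _ in range(10_000)]

# The Zipf flow concentrates requests on the lowest (most popular) ranks.
share_top10 = sum(r <= 10 for r in popular_requests) / len(popular_requests)
print(share_top10 > 0.5)  # True for s = 1.6
```

With s = 1.6, well over half of the popular-flow requests target the ten most popular contents, which is exactly the skew that makes a small shared Content Store pool effective.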
Figure 10 depicts the server hit reduction ratio (SHR) for the popular traffic as a function of the number of popular flows among the 10 flows and for the different considered caching approaches. PF-ClusterCache clearly outperforms all the other approaches. LCE performs the worst, as it maintains copies everywhere, which amounts to excessive evictions and replacements, resulting in cache misses and a lower server hit reduction. In CFPC, content popularity control and verification are only performed when a cache is full. This allows it to intermittently cache unpopular content, resulting in a high cache miss rate and a low server hit reduction ratio. PoolCache performs much less successfully than PF-ClusterCache, although it uses a similar collaborative caching principle. This is essentially due to its inherent design as an approach that caches both popular and unpopular content. Its efficiency relative to CFPC is essentially due to two facts: its underlying LRU replacement scheme, which normally ends up evicting and replacing unpopular content, and its caching principle, which enforces zero redundancy inside any pool of nodes. PF-ClusterCache clearly outperforms PoolCache thanks to its explicit treatment of content popularity and its underlying LPF replacement scheme. It is also worth noting in
Figure 10 that the SHR for all the considered caching schemes is insensitive to the number of popular flows, as it is the SHR for popular traffic only. This SHR is, however, sensitive to the values used for the CS size and the Zipf parameter, as depicted in the next two figures.
Figure 11 illustrates the SHR as a function of CS size for the different considered caching approaches, when using two popular flows among the 10 flows. Again, PF-ClusterCache outperforms all the other considered caching approaches. It is interesting to note that PF-ClusterCache necessitates a much smaller CS size. While PF-ClusterCache delivers its highest SHR even with a CS size of 5, the three other caching approaches require a CS size larger than 40 to attain their maximum SHR.
Figure 12 depicts the SHR of popular traffic as a function of the Zipf parameter value, when using two popular flows among the 10 flows. PF-ClusterCache outperforms all the other approaches and delivers high SHR even for a low value of the Zipf parameter. Recall that the lower the Zipf parameter, the broader the range of popularity within each popular group of content. In turn, the lower the Zipf popularity, the higher the required CS size to attain high SHR ratios. PF-ClusterCache, as it requires a much smaller CS size, delivers much better SHR even at low values of the Zipf parameter for the popular flows.
Figure 13 and
Figure 14 illustrate the average retrieval time of popular content and the average number of hops required to retrieve popular content for the different considered schemes. We clearly observe the efficiency of PF-ClusterCache, as it requires less than half the retrieval delay and hop count of the next best approach, PoolCache.
This is a direct result of the popularity-aware caching and replacement of PF-ClusterCache. Although CFPC is also a popularity-aware caching scheme, its performance is rather close to that of LCE. This is primarily due to its method of treating content popularity, which allows it to cache unpopular content; its underlying replacement scheme; and, most importantly, its caching of the same content at multiple nodes along the paths towards consumers. As explained earlier, PoolCache and CFPC attain a lower server hit reduction and necessitate a much larger CS size. These facts directly translate into a much larger retrieval delay and hop count for popular content.
Last but not least,
Figure 15 depicts the number of evicted pieces of content from the two considered popular flows per second using a Zipf parameter value of 1.6 and a CS size of 10. PF-ClusterCache provides the smallest eviction rate as it considers only popular content for caching and uses the LPF replacement scheme. PoolCache provides the second lowest eviction rate, though much larger than that of PF-ClusterCache, as it enforces a zero redundancy within each neighborhood, and thanks also to its LRU, which maintains a good SHR, as shown in
Figure 10.
5. Conclusions
Efficient in-network caching with limited resources remains a significant challenge. The founding idea of PF-ClusterCache is to increase the storage available for caching without adding physical storage, while taking into account the freshness and popularity of cached data content.
PF-ClusterCache aggregates the caching storage of individual nodes into a global shareable storage and enforces zero caching redundancy across any cluster of nodes. PF-ClusterCache thereby makes it possible to cache much more popular content in clusters and across the network. As a result, it achieved remarkable performance and conserved valuable network resources. Using a mixture of popular and unpopular traffic flows and a three-level hierarchical topology, which is a close representation of the IoT network architecture, we compared the proposed scheme with the well-known LCE scheme and two recent schemes, CFPC and PoolCache. The results showed that PF-ClusterCache outperformed all these schemes on all metrics. Moreover, the use of the LPF replacement scheme maintained the efficiency of the caching process by reducing the eviction rate while keeping only the most popular content in the cache, even with a small popularity threshold.
PF-ClusterCache, at the extreme when there is only one node per cluster, behaves similarly to the consumer caching strategy, but with the added consideration of freshness and popularity. However, even with a small number of nodes per cluster and a limited CS size, the scheme retains good performance thanks to the clustering procedure and the added value of popularity awareness. As the LPF replacement scheme keeps only the popular content within the cache, it can be combined with other caching schemes to enhance their performance. We aim to study the impact of integrating LPF with caching schemes that support producer mobility, by keeping the freshest and most popular content closer to the mobile producer and reducing the impact of handoff delay on network performance.