Article

An Efficient Multipath-Based Caching Strategy for Information-Centric Networks

1 National Network New Media Engineering Research Center, Institute of Acoustics, Chinese Academy of Sciences, No. 21, North Fourth Ring Road, Haidian District, Beijing 100190, China
2 School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, No. 19(A), Yuquan Road, Shijingshan District, Beijing 100049, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(3), 439; https://doi.org/10.3390/electronics14030439
Submission received: 16 December 2024 / Revised: 13 January 2025 / Accepted: 21 January 2025 / Published: 22 January 2025

Abstract:
The growing demand for large-scale data distribution and sharing presents significant challenges to content transmission within the current TCP/IP network architecture. To address these challenges, Information-Centric Networking (ICN) has emerged as a promising alternative, offering inherent support for multipath forwarding and in-network caching to improve data transmission performance. However, most existing ICN caching strategies primarily focus on utilizing resources along the default transmission path and its neighboring nodes, without fully exploiting the additional resources provided by multipath forwarding. To address this gap, we propose an efficient multipath-based caching strategy that optimizes cache placement by decomposing the problem into two steps: multipath selection and cache node selection along the paths. First, multipath selection considers both transmission and caching resources across multiple paths, prioritizing the caching of popular content while efficiently transmitting less popular content. Next, along the selected paths, cache node selection evaluates cache load based on cache utilization and available capacity, prioritizing nodes with the lowest cache load. Extensive simulations across diverse topologies demonstrate that, compared with neighborhood collaborative caching strategies, the proposed strategy reduces data transmission latency by at least 12.22%, improves the cache hit rate by at least 16.44%, and enhances cache node load balancing by at least 18.77%.

1. Introduction

In recent years, advancements in technologies such as 5G, the Internet of Things (IoT), and artificial intelligence have led to an exponential growth in data generation and distribution. This trend is especially pronounced in emerging applications and data-intensive scientific domains, where vast datasets are generated. The global distribution and sharing of these datasets are crucial for enabling collaborative processing and analysis across enterprises and research institutions worldwide [1,2,3]. For instance, the Thomas Jefferson National Accelerator Facility (JLab) has over 1800 active users from 324 institutions in 39 countries, and generates approximately 100 petabytes (PB) of data annually through raw scientific observations, data analysis, simulations, and backups [4]. The large scale of these datasets, coupled with the need for timely distribution, presents significant challenges for network transmission, particularly in efficiently utilizing limited network resources and infrastructure.
Traditional IP-based network architectures have inherent limitations in handling large-scale data transmissions [5]. As host-centric networks, they tightly couple transmission paths with data sources, creating substantial bandwidth pressure on wide-area networks due to the repeated access to popular content. Moreover, the lack of efficient content management mechanisms hampers real-time, distributed collaboration, which is crucial for modern, data-intensive applications. Content Delivery Networks (CDNs) have been widely adopted to address these challenges by improving content distribution and delivery. CDNs deploy distributed caches at strategic locations, storing copies of frequently accessed content closer to end-users. This reduces the load on origin servers and minimizes latency by serving data from nearby cache nodes. However, while CDNs have improved content delivery, they remain limited by their host-centric model, where data retrieval is bound to specific servers.
To overcome these limitations, researchers have explored using Information-Centric Networking (ICN) to provide faster access to large-scale data for geographically distributed users [6,7,8,9]. Unlike conventional host-centric IP-based networks, ICN focuses on the content rather than the host, enabling data to be addressed directly by name. This means users can request data based on unique content identifiers, bypassing the need to reference a specific host address. As a result, this architecture fundamentally alters the data distribution model, leading to significant improvements in the efficiency of large-scale data dissemination and reducing the bottlenecks typically associated with traditional networks [10].
ICN inherently supports multipath forwarding and in-network caching. In-network caching allows intermediate nodes along network paths to store data replicas, enabling localized data access and significantly reducing latency for subsequent requests that would otherwise rely on remote sources [11]. Multipath forwarding not only helps address mobility and multihoming issues but also enables traffic engineering by aggregating resources from multiple paths, thereby improving network resource utilization [12].
Despite these advantages, in-network caching and multipath forwarding introduce new challenges, particularly regarding how to efficiently combine multipath forwarding with in-network caching. The distributed and ubiquitous nature of the ICN in-network caching system means that every ICN node has the capability to cache and provide services. However, the caching capacity of individual nodes is limited compared to the vast amounts of data being transmitted. Traditional ICN caching strategies primarily utilize the limited cache resources along the default transmission path, which often fails to meet the demands of large-scale data distribution. Furthermore, cache nodes frequently experience heavy loads, resulting in frequent cache replacements, reduced efficiency, and limited network performance. While multipath forwarding can effectively aggregate resources from multiple paths, current research in this area has mainly focused on transmission resources, such as bandwidth and latency, without fully considering the cache resources along these paths. This limits the full potential of in-network caching.
To address this, we propose a multipath-based caching strategy built on the Name Resolution System-based multipath transmission mechanism [13]. This strategy considers both transmission and caching resources along paths, aggregating resources from multiple paths to enhance overall in-network resource utilization and reduce transmission latency. Specifically, we decompose the caching placement problem into two steps: multipath selection and cache node selection along the paths. For multipath selection, we introduce a content-aware hybrid multipath selection method, which adjusts multipath scheduling strategies based on content popularity. For high-popularity content, a cache-prioritized strategy favors paths with more available cache space, optimizing in-network caching and reducing latency for future repeated requests. For low-popularity content, a transmission-prioritized strategy selects paths based on their transmission state, considering factors such as latency and congestion levels to avoid congestion and minimize transmission delay. Additionally, for cache node selection, we propose a cache load-based strategy that evaluates each node’s cache load by considering both the utilization of cached content and the available cache space at the node. The node with the lowest cache load is prioritized for caching, ensuring balanced load distribution across cache nodes and improving caching efficiency. The main contributions of this paper are as follows:
  • We introduce a multipath caching architecture based on the Name Resolution System, and we formulate a cache placement optimization problem aimed at minimizing average content transmission latency.
  • By analyzing the optimization problem, we introduce a multipath-based caching strategy that decomposes the optimization problem into two steps: multipath selection and cache node selection along the paths. For multipath selection, we propose a content-aware hybrid strategy that prioritizes high-popularity content for caching and low-popularity content for efficient transmission. For cache node selection along the selected paths, we introduce a cache load-based node selection strategy that evaluates node cache load based on cache utilization and available capacity, prioritizing nodes with the lowest cache load.
  • We conduct simulations to evaluate the proposed strategy, analyzing its performance under various experimental parameters and comparing it with other caching strategies. The results demonstrate that the proposed strategy effectively reduces content transmission latency, improves cache hit rates, and balances the load across cache nodes.
The remainder of this paper is organized as follows: Section 2 introduces and discusses related work. Section 3 presents the multipath caching architecture based on the Name Resolution System and the optimization function for the multipath-based caching placement problem. Section 4 describes the proposed multipath-based caching strategy and its implementation. Section 5 provides the performance evaluation of the proposed strategy. Finally, Section 6 concludes the paper and discusses directions for future work.

2. Related Works

In recent years, in-network caching in ICN has been widely studied, with numerous caching placement strategies proposed to optimize network performance. Caching placement strategies determine which objects are stored on which cache nodes, which is critical for improving data transmission efficiency. Depending on the cache location, these strategies can be categorized into on-path caching and off-path caching.
On-path caching strategies make caching decisions along the default data delivery path to improve access efficiency. LCE [14] leaves copies on every node along the path, ensuring fast content access but leading to substantial redundancy. LCD [15] caches content only at the next-hop node where the request is satisfied, reducing redundancy but slowing content diffusion. CL4M [16] utilizes topological information to cache at the node with the highest betweenness centrality on the path, effectively utilizing key nodes’ cache space but underutilizing non-core nodes. For large-scale content transmission, WAVE [17] adjusts the number of chunks cached for large-scale content to enable popularity-driven diffusion, but its diffusion speed is slow. To improve this, IPEC [18] prioritizes caching more popular chunks at edge routers, enhancing network performance. Additionally, a method based on content relevance [19] analyzes the storage correlation between target content and nodes, making caching decisions that improve cache efficiency. PB-NCC [20] leverages replication control techniques, determining cache node positions based on the popularity of cached content along the path, further optimizing cache utilization. In [21], an interaction-based caching strategy focuses on preserving the most popular and largest contents for a longer time in local networks, increasing cache hit rates and reducing the need to fetch data from external sources. To address dynamic content distribution with frequent content updates, a fast on-path caching placement strategy is proposed [22], which incorporates users’ interest preferences for different content categories.
Off-path caching strategies enhance overall performance by utilizing nodes outside the default path. For static long-video content distribution scenarios, where popularity and content updates change slowly, the authors in [23] proposed an active caching algorithm combined with nearest-replica routing to maximize cache utilization and improve load balancing, but its complexity makes it impractical for large-scale networks. The hash-based caching strategy [24] maps content names to specific nodes using hashing, ensuring fast cache location identification. However, it neglects content popularity and node placement, potentially increasing path lengths and link load. NCR-BN [25] enhances cache resource utilization by enabling neighboring off-path routers to collaboratively make caching and routing decisions, but its transmission and caching remain limited to nodes around the default path. PeNCache [26] performs lightweight collaborative content search and caching by considering both local and global popularity, but like other strategies, it is still constrained to paths around the default route. In [27], an additional layer of caching is introduced by deploying an off-path central cache router, which helps offload excess content from routers along the transmission path. However, node allocation in large-scale networks remains a challenging issue. In [28], caching popular content at designated nodes is used to leverage the network’s cache capacity and enhance content diversity. However, in large-scale networks, selecting the appropriate designated nodes presents significant challenges. Recently, ref. [29] combined SDN with cache states and multipath forwarding to enhance transmission efficiency. However, it focuses only on cache space when selecting nodes, neglecting content popularity and cache node load, which can lead to inefficient cache use and node overload. Additionally, the reliance on SDN for global decision-making poses a single-point-of-failure risk.
The on-path caching strategies discussed above are limited in large-scale data transmission scenarios due to their reliance on default path resources, which fail to meet the comprehensive caching needs. Off-path caching strategies, on the other hand, often fail to adequately account for the impact of content popularity. In large-scale data transmission, limited in-network cache resources are prone to being occupied by less popular content, increasing transmission latency. Furthermore, most off-path caching strategies rely on a routing-by-name mechanism, making it difficult to effectively leverage resources in off-path nodes. Research by [30] suggests that Name Resolution System (NRS)-based mechanisms can inherently support content retrieval from cache nodes outside the default data path, providing distinct advantages. Building on this insight, this paper proposes a caching placement strategy that leverages the NRS-based multipath transmission mechanism. By comprehensively considering multipath caching, transmission resources, and the load status of cache nodes, the proposed approach effectively reduces transmission latency, improves cache hit rates, and balances cache node loads.

3. System Model

In this section, we first introduce the NRS-based multipath transmission mechanism [13], followed by a description of the content caching and retrieval process integrated with the multipath transmission mechanism. Subsequently, we formulate the cache placement optimization problem for multipath transmission, with the objective of minimizing the average content transmission latency.

3.1. System Description

The current dominant architecture of the Internet is based on the IP network. A complete overhaul of the existing infrastructure to deploy a new network architecture is not practical. Therefore, the ICN architecture must be compatible with existing IP networks to leverage the current infrastructure and minimize deployment costs. In our previous research, we proposed a multipath transmission architecture based on the Name Resolution System [13], which enables incremental deployment by adding an ID layer above the IP layer. In this architecture, data, network devices, services, and other entities are identified using a globally unique and immutable Entity ID (EID). The EID is flat and semantically neutral, capable of taking various forms, such as numbers or characters. The mapping between the EID and its corresponding Network Address (NA) is managed by the Name Resolution System, which is deployed in a distributed manner to provide low-latency resolution services [31]. By maintaining the mapping between IDs and IP addresses, the NRS decouples ID-based addressing from IP-based routing. At the network layer, a protocol called the Identifier Protocol (IDP) [31] is employed. The IDP defines a set of rules on how to operate the NA based on the packet’s ID, including actions such as adding, deleting, and modifying the NA. Above the ID layer, the Transport Layer protocol [32] is designed to handle the ICN packet format, transmission process, congestion control mechanisms, and retransmission mechanisms. This transport layer protocol ensures efficient, reliable, and scalable data transmission services that are essential for ICN operations.
The multipath transmission system proposed in [13] utilizes multiple ICN routers to establish multiple single-hop relay paths between two endpoints. Specifically, multipath transmission is treated as a service, generating a corresponding Multipath Transmission Service ID (MPSID). ICN routers offering multipath services register their IP addresses and MPSID mappings in the NRS, forming a candidate relay node set. Data service nodes can query the NRS using the MPSID to obtain the IP addresses of these relay nodes. Based on a relay node selection strategy, several appropriate relay nodes are chosen to form multiple relay paths. Upon receiving the transmitted data packets, a relay node can query the NRS using the receiver’s device ID to obtain the destination IP address. The relay node then performs the necessary IP address translation and forwards the data packet to the destination. This multipath transmission architecture based on the NRS not only supports the default IP path but also allows for the use of multiple relay paths, thus improving both robustness and efficiency. Unlike overlay network routing methods, this multipath transmission mechanism operates at the network layer using the MPSID, rather than at the application layer.
In this paper, we propose a multipath-based caching architecture based on the multipath transmission mechanism supported by the NRS. As shown in Figure 1, the user obtains the desired content by sending a request message to the network. The source represents the entity storing the data, which provides data to satisfy the user’s content retrieval needs. In this paper, Named Data Chunks (NDCs) are the fundamental units of in-network caching. These chunks are generated by dividing the original application layer content objects into fixed-size segments. Each NDC has a globally unique name. In addition to forwarding functions, the ICN routers also have the capability to support multipath transmission, cache chunks, and respond to user requests. The NRS is responsible for storing the mapping between EIDs and IP addresses. The details of the multipath transmission, caching, and retrieval processes are as follows:
  • EID Registration: Nodes supporting multipath services register their NA and MPSID mapping relationship with the NRS (Step 1). Simultaneously, the source registers the mapping relationship between its NA and the names of all the stored NDCs (Step 2).
  • Chunk Retrieval: When user A wants to retrieve a chunk, the system queries the NRS using the chunk’s name to resolve its location and obtain the NA of the node storing the chunk. In this case, we assume that the user requests chunks 1 and 2 (Step 3). The system then selects a service node based on the replica node selection policy. Here, we employ the nearest-replica routing policy [33], where the closest replica node to the user is chosen. The chunk request is then forwarded to the selected service node, which, in this case, is the source node, as it is the only node storing the chunks (Step 4). The source node then returns the requested chunks (Step 5).
  • Multipath Transmission: During the transmission of chunks, the edge node R5 queries the NRS using the MPSID to obtain the list of supporting multipath transmission intermediate nodes’ NAs (Step 6). It then selects several appropriate intermediate nodes, such as R2 and R4, to form a candidate relay node set, establishing multiple transmission paths (Step 7). The edge node schedules the chunks for transmission across the selected paths. In this case, chunk 1 is transmitted via the relay path R5 → R2 → R1, and chunk 2 is transmitted via the relay path R5 → R4 → R3 → R1. Upon receiving the data, the multipath service node queries the NRS for the receiver’s IP address using the receiver’s device ID and performs the corresponding IP address conversion to forward the data to the destination (Step 8).
  • Caching Decisions: As the chunks are transmitted along the selected paths, intermediate nodes decide whether to cache the chunks based on the caching policies. Assume that each intermediate node can store only one chunk. In this case, we assume the edge caching policy [33], where chunks are cached only at nodes closer to the receiver. Thus, in path R5 → R2 → R1, node R2 caches chunk 1, while in path R5 → R4 → R3 → R1, node R3 caches chunk 2 (Step 9). Once the caching operation is complete, the caching nodes must register the mapping between their NA and the cached chunk name with the NRS (Step 10).
  • Subsequent Retrieval: When User B wants to retrieve chunks 1 and 2 again, the same procedure is followed. First, the NAs of the chunks are resolved through the NRS (Step 11), which returns the NAs of the intermediate nodes caching the chunks. Then, based on the nearest-replica routing policy, the closest replica nodes (R2, R3) are selected, and chunk requests are sent to the corresponding nodes (Step 12). The nodes then return the requested chunks (Step 13).
Note that if only the default single path R5 → R2 → R1 is used for transmission, due to the limited cache space at node R2, it can only store one chunk. Therefore, the request for the other chunk must be fetched again from the source node. However, the multipath-based caching mechanism can leverage the resources of multiple paths, enabling both chunks to be cached within the network. This allows the requests for both chunks to be served by nearby cache nodes, effectively reducing content retrieval latency and improving cache resource utilization.
Additionally, it is important to note that in this paper, NDCs are used as the basic units of in-network caching. In the remainder of this paper, the terms content and chunk will be used interchangeably to refer to NDCs.
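To make the registration and resolution steps in this walkthrough concrete, the following is a minimal Python sketch of an NRS mapping table together with the nearest-replica routing policy; the class, method names, and hop-count inputs are illustrative assumptions rather than the actual system interface.

```python
class NameResolutionSystem:
    """Toy NRS: maps an EID (chunk name or service ID) to network addresses."""

    def __init__(self):
        self.mappings = {}  # EID -> set of NAs

    def register(self, eid, na):
        # Steps 1, 2, and 10: sources, relay nodes, and cache nodes register mappings.
        self.mappings.setdefault(eid, set()).add(na)

    def resolve(self, eid):
        # Steps 3, 6, and 11: return all NAs currently registered for this EID.
        return self.mappings.get(eid, set())


def nearest_replica(nrs, chunk_name, distance_to_user):
    """Nearest-replica routing policy [33]: pick the replica closest to the user.

    distance_to_user: dict mapping NA -> hop count (assumed known from routing).
    """
    replicas = nrs.resolve(chunk_name)
    return min(replicas, key=lambda na: distance_to_user[na]) if replicas else None


# Example mirroring Figure 1: the source registers chunk 1, then R2 caches it.
nrs = NameResolutionSystem()
nrs.register("chunk1", "NA_source")
nrs.register("chunk1", "NA_R2")
print(nearest_replica(nrs, "chunk1", {"NA_source": 4, "NA_R2": 1}))  # -> NA_R2
```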

3.2. Problem Description

In this section, we construct a network model to define and illustrate the optimization problem for multipath-based cache placement.
We consider an arbitrary network topology represented by G = (V, E), where V denotes the set of nodes in the network, including the cache nodes and source servers, and E represents the set of edges connecting the nodes. Additionally, we assume that users are connected to edge nodes, with each edge node aggregating requests from the users linked to it. The set of edge nodes is denoted as V_e, where V_e ⊆ V.
Each cache node v ∈ V has a limited cache capacity C_v. When the cache is full, nodes must evict cached content to accommodate new content. Transmission paths between a node v and a user u are denoted as P_{v,u} and are established using the multipath transmission mechanism introduced in Section 3.1.
The set of available content in the network is represented as M, where each content item m ∈ M has a size s_m. We assume that the source servers store all content, ensuring that user requests are always fulfilled.
The request rate for content is assumed to remain constant over a period of time. We use r_m^u to represent the request rate of user u for content m. The objective is to minimize the average transmission latency of user requests across the network. Table 1 summarizes the symbols used.
The goal is to minimize the average transmission latency for user requests, expressed as:
$$\min D = \frac{1}{R} \sum_{u \in V_e} \sum_{m \in M} \sum_{v \in V} \sum_{p \in P_{v,u}} r_m^u \cdot x_v^m \cdot a_{u,v}^m \cdot z_p^{v,u,m} \cdot t_p^{v,u,m} \quad (1)$$
Here, R denotes the total request rate over all users and contents, and t_p^{v,u,m} is the latency of delivering content m to user u from node v along path p. The variable x_v^m is a binary cache decision variable, indicating whether content m is cached at node v (1 if cached, 0 otherwise). The variable a_{u,v}^m is a binary replica selection variable, determining whether node v serves as the response node for user u’s request for content m. This variable follows the replica selection strategy, and we assume that each request is served by only one replica node. The variable z_p^{v,u,m} represents the transmission path selection, indicating whether path p from node v to user u is selected for transmitting content m. Only one path is chosen for each content transmission.
This paper primarily focuses on cache placement decisions, including path selection and cache decision-making. The replica selection strategy adopted is the commonly used nearest replica selection strategy [33] in ICN architectures based on the NRS. Under this strategy, when content m has multiple replica nodes, the request for content m by user u is handled by the nearest replica node storing content m. The objective function based on the nearest replica selection strategy is expressed as follows:
$$\min D(X, Z) = \frac{1}{R} \sum_{m \in M} D_m(X, Z) \quad (2)$$

$$D_m(X, Z) = \begin{cases} \displaystyle\sum_{v \in V} \sum_{u \in U_v^m} \sum_{p \in P_{v,u}} r_m^u \cdot x_v^m \cdot z_p^{v,u,m} \cdot t_p^{v,u,m}, & \text{if } x_v^m > 0 \text{ for some } v \in V \\[2ex] \displaystyle\sum_{u \in U_{\mathrm{src}}^m} \sum_{p \in P_{\mathrm{src},u}} r_m^u \cdot z_p^{\mathrm{src},u,m} \cdot t_p^{\mathrm{src},u,m}, & \text{if } x_v^m = 0 \text{ for all } v \in V \end{cases} \quad (3)$$
subject to
$$\sum_{m \in M} s_m \cdot x_v^m \leq C_v, \quad \forall v \in V \quad (4)$$

$$\sum_{p \in P_{v,u}} z_p^{v,u,m} = 1, \quad \forall u \in V_e,\ m \in M,\ v \in V \quad (5)$$

$$z_p^{v,u,m} \leq x_v^m, \quad \forall u \in V_e,\ m \in M,\ v \in V,\ p \in P_{v,u} \quad (6)$$

$$x_v^m \leq \sum_{p \in P_v} z_p^m, \quad \forall v \in V,\ m \in M \quad (7)$$
Here, U_v^m represents the set of users routed to node v when requesting content m, while U_src^m represents the set of users routed to the source service node src when requesting content m. Equation (4) represents the cache capacity constraint, which ensures that the total amount of cached content at each node does not exceed its maximum allowed cache capacity. Equation (5) expresses the path selection constraint, stating that for each content request, if node v is chosen as the responding node, a valid transmission path must be selected to transmit content m. Equation (6) represents the request response and path availability constraint, which requires that content m must be cached at node v and the path p must be available before it can be used for transmission. Equation (7) illustrates the dependency between path selection and cache placement, stating that if node v caches content m, at least one path passing through node v must be selected for the transmission of content m.

4. Multipath-Based Caching Strategy

The optimization objective of multipath-based caching placement is influenced by both multipath selection and cache node placement. Directly solving the problem leads to extremely high computational complexity. Furthermore, the objective of this paper is not to compute the optimal solution offline, as this would introduce unacceptable computational and communication overhead, particularly in large-scale distributed networks. To address this, we propose a heuristic approach, dividing the problem into two steps: multipath selection and cache node selection along the paths.
In the path selection phase, the goal is to determine the optimal transmission path for each requested content, minimizing total transmission latency. This decision is based on the overall transmission performance and the cache availability across all candidate paths. Once the paths are selected, the next step is to identify which nodes along those paths will cache the content. The objective of the cache node selection phase is to optimize content placement, enhancing caching efficiency, reducing transmission latency, and balancing the load across cache nodes.
In the remainder of this section, we first introduce the content-aware hybrid multipath selection strategy, which includes a content popularity estimation algorithm, a cache-prioritized scheduling strategy, and a transmission-prioritized scheduling strategy, to address the path selection problem. Next, we present the cache load-based node selection algorithm to decide where to place content along the selected paths. Finally, we discuss the implementation of the proposed strategy and provide an analysis of the overhead involved.

4.1. Content-Aware Hybrid Multipath Selection Strategy

This section explores the strategy of selecting the most appropriate transmission path from multiple candidates for content delivery. Traditional multipath selection mechanisms typically focus on aggregating the bandwidth resources of multiple paths to avoid network congestion and reduce data transmission latency. However, they often neglect the potential benefits of leveraging in-network caching resources. By caching content closer to users, future requests for the same content can bypass remote servers, effectively reducing both latency and transmission load. It can be expected that, even though the current path selection may lead to increased transmission latency, the potential caching benefits can reduce latency for subsequent content requests, ultimately improving overall transmission performance.
However, in-network cache resources are typically limited compared to the volume of content being transmitted. Therefore, maximizing cache utility is crucial, with content popularity playing a central role in determining caching benefits. Highly popular content, which is requested more frequently, offers significant caching advantages. Prioritizing such content for caching reduces the need for repeated transmissions from the source. In contrast, less popular content provides limited caching benefits. Excessive caching of low-popularity content risks cache pollution, degrading overall caching efficiency [34]. Additionally, the available bandwidth on transmission paths significantly impacts data delivery speed. When large volumes of content traverse the same link, congestion can occur, severely affecting transmission efficiency [35]. Therefore, a robust multipath selection strategy should comprehensively consider factors such as content popularity, cache availability along the transmission paths, and path congestion to optimize in-network cache utilization and minimize overall transmission latency.
To make full use of the limited in-network caching and bandwidth resources, we propose a content-aware hybrid multipath selection strategy based on the principle of prioritizing high-popularity content for caching and low-popularity content for transmission. This strategy consists of a content popularity estimation algorithm, a cache-prioritized scheduling strategy, and a transmission-prioritized scheduling strategy. As illustrated in Figure 2, we first analyze the popularity of content to classify it into high-popularity or low-popularity categories. High-popularity content is scheduled with a cache-prioritized strategy to maximize cache resource utilization, while low-popularity content is scheduled with a transmission-prioritized strategy to avoid congestion and reduce transmission latency.

4.1.1. Content Popularity Estimation Algorithm

Determining content popularity is a challenging task, particularly in large-scale deployments. Tracking the popularity of all content across such environments is both difficult and impractical. A common method is tracking the request frequency at each node. However, in ICN, upstream nodes are subject to a filtering effect: only requests that experience cache misses at downstream nodes are forwarded to higher-level nodes. Therefore, relying solely on local request frequency cannot accurately reflect the true popularity of content. To address this, ref. [36] proposed a local popularity estimation method. In this approach, edge routers collect request frequencies and forward the data to upstream gateways. Intermediate nodes then aggregate this data to rank content popularity. While this method improves accuracy, it still faces challenges when dealing with large volumes of content. Specifically, collecting and reporting request frequencies for each piece of content generates significant communication overhead, which can negatively impact network efficiency.
To reduce this overhead, we propose a lightweight content popularity estimation method. Rather than tracking the precise popularity of every piece of content, our approach focuses on identifying whether content qualifies as high-popularity content within a specific area. To achieve this, we utilize the differentiation threshold information from intermediate nodes along multiple paths to comprehensively assess the high-popularity content threshold. Since only the threshold values at the nodes need to be recorded, this significantly reduces communication overhead. Specifically, each cache node calculates a local threshold based on the request frequency of its cached content, using a differentiation ratio ρ. The highest threshold from each sub-path is selected, and these values are averaged across sub-paths to form a comprehensive threshold for content classification.
For content heat calculation, traditional methods based on cumulative historical access counts fail to accurately evaluate content heat, as popularity often changes over time. To address this, following the approach outlined in [22], we incorporate content freshness into the calculation of content heat. The content heat is calculated using Equation (8):
$$\mathrm{Heat}_{i,j}(t) = \begin{cases} 1, & \text{if the request is the first arrival} \\ 1 + \mathrm{Heat}_{i,j}(t') \cdot e^{-\lambda (t - t')}, & \text{otherwise} \end{cases} \quad (8)$$

where t is the time when the request for content i reaches node j, t′ is the arrival time of the previous request for content i, and e^{−λ(t−t′)} is a monotonically decreasing function representing content freshness. Algorithm 1 shows the detailed process of estimating the popularity of content.
Algorithm 1: Content Popularity Estimation
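As a concrete illustration of Algorithm 1, the following minimal Python sketch updates content heat per Equation (8) and classifies a chunk against the multipath threshold. The names are illustrative, and the rule mapping the differentiation ratio ρ to a local threshold (the heat separating a node’s top-ρ cached contents) is our interpretation of the description above.

```python
import math
import time

RHO = 0.1      # differentiation ratio (Section 5.1 uses 0.1)
LAMBDA = 0.5   # popularity decay coefficient (Section 5.1 uses 0.5)

def update_heat(heat_table, content_id, now=None):
    """Update Heat_{i,j}(t) per Equation (8) when a request arrives at node j."""
    now = time.time() if now is None else now
    if content_id not in heat_table:
        heat_table[content_id] = (1.0, now)  # first arrival
    else:
        heat, t_prev = heat_table[content_id]
        heat_table[content_id] = (1.0 + heat * math.exp(-LAMBDA * (now - t_prev)), now)
    return heat_table[content_id][0]

def local_threshold(heat_table, rho=RHO):
    """A node's local threshold: the heat value separating its top-rho cached contents."""
    heats = sorted((h for h, _ in heat_table.values()), reverse=True)
    if not heats:
        return float("inf")
    return heats[max(0, int(len(heats) * rho) - 1)]

def comprehensive_threshold(per_path_node_thresholds):
    """Take the highest local threshold on each sub-path, then average across sub-paths."""
    return (sum(max(thresholds) for thresholds in per_path_node_thresholds)
            / len(per_path_node_thresholds))

def is_high_popularity(heat, threshold):
    """Classify a chunk: cache-prioritized if True, transmission-prioritized otherwise."""
    return heat >= threshold
```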

4.1.2. Cache-Prioritized Scheduling Strategy

High-popularity content provides greater caching benefits, so in multipath selection, paths that can most effectively cache this popular data should be prioritized. Accurately evaluating the available cache resources on a path is crucial for optimizing caching performance. Traditional methods evaluate cache sufficiency solely based on cache space. However, as noted in [37], cache space alone does not account for the actual workload of nodes; that work instead estimates the caching capability of nodes by observing recent cache consumption, focusing mainly on the consumption of cache space. This approach, however, neglects data access frequency: simply evaluating the cache consumption ratio does not accurately reflect the effectiveness of caching. For instance, if a node caches a large amount of unpopular content, even if it consumes a significant amount of cache space, it does not improve caching efficiency.
To address this, we propose a method that incorporates content access frequency into the evaluation of available cache space. We calculate unused or inefficiently used cache resources by considering content that has not been requested during a given period. The available cache space for a node i is defined as
$$C_{i,\mathrm{available}} = C_{i,\mathrm{total}} - C_{i,\mathrm{requested}} \quad (9)$$
where C_{i,total} is the total physical cache capacity of node i and C_{i,requested} is the amount of cached content that has been requested within the observation period. To prevent newly cached content from being prematurely classified as inefficiently used, a minimum waiting time is set: content that has been cached for less than this waiting time is not counted as unrequested, even if it has not yet been accessed. The total available cache space along a path path_i is calculated as follows:

$$\mathrm{PathAvailCache}(path_i) = \sum_{j \in path_i} C_{j,\mathrm{available}} \quad (10)$$

Using this method, we evaluate cache resource effectiveness on multiple paths and select the path with the highest available cache space for transmitting high-popularity content. Algorithm 2 shows the detailed process of the cache-prioritized scheduling strategy.
Algorithm 2: Cache-Prioritized Scheduling
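A minimal Python sketch of Algorithm 2, assuming each node is represented as a dict with a "capacity" field and a "contents" map of (size, cached_at, was_requested) entries; Equation (9) is evaluated per node and Equation (10) per candidate path.

```python
import time

def available_cache(node, min_wait=5.0, now=None):
    """Equation (9): C_available = C_total - C_requested.

    A content entry counts toward C_requested if it was requested during the
    observation period, or was cached less than min_wait seconds ago (the
    grace period that keeps fresh content from being written off too early).
    """
    now = time.time() if now is None else now
    requested = sum(size for size, cached_at, was_requested in node["contents"].values()
                    if was_requested or (now - cached_at) < min_wait)
    return node["capacity"] - requested

def select_cache_prioritized_path(paths):
    """Algorithm 2: choose the candidate path maximizing PathAvailCache (Equation (10))."""
    return max(paths, key=lambda path: sum(available_cache(node) for node in path))
```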

4.1.3. Transmission-Prioritized Scheduling Strategy

The caching benefit of low-popularity content is limited. Therefore, in multipath selection for such content, the primary focus should be on transmission efficiency, with an emphasis on the transmission resources of the paths rather than caching. Unlike traditional IP networks, ICN involves not only forwarding traffic but also caching service traffic and multipath transmission traffic, which makes the network traffic more complex and dynamic [32]. This complexity presents challenges in accurately assessing path states. Moreover, as highlighted in [38], path congestion has a significant impact on transmission performance. Therefore, accurately sensing changes in path states and avoiding congestion are critical for designing an efficient scheduling strategy.
Traditional IP scheduling strategies rely on end-to-end state information, often neglecting the status of in-network nodes, which leads to inaccurate path state assessments. To address this, ref. [39] suggests evaluating intermediate node congestion by monitoring average queue lengths. Inspired by this, we estimate the congestion level by periodically monitoring the average queue length of each port at a node. The overall congestion level for a path is the sum of the congestion levels at all nodes along it. To comprehensively assess path performance, the path status value is calculated by combining three metrics: bandwidth_i, the minimum available bandwidth along the path; RTT_i, the round-trip time of the path; and CL_i, the congestion level of the path. The path status value is defined as follows:
$$\mathrm{PathStatus}(path_i) = \left( c_1 \cdot \mathrm{bandwidth}_i + \frac{c_2}{\mathrm{RTT}_i} \right) \cdot \mathrm{sigmoid}(\mathrm{CL}_i) \quad (11)$$

$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{\gamma x}} \quad (12)$$
where c_1 and c_2 are weighting factors for bottleneck bandwidth and transmission delay, respectively, and γ is the congestion impact factor, controlling the sensitivity of the path status value to congestion. The path bottleneck bandwidth, path latency, and path congestion level can be obtained through periodic probing combined with in-band telemetry techniques [40]. The sub-path with the highest path status value is selected for transmitting low-popularity content, as it indicates better conditions for handling traffic with higher efficiency. Algorithm 3 shows the detailed process of the transmission-prioritized scheduling strategy.
Algorithm 3: Transmission-prioritized scheduling strategy
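Similarly, a minimal sketch of Algorithm 3, assuming each candidate path carries its probed metrics in a dict; the constants mirror the parameter values used later in the evaluation (Section 5.1).

```python
import math

C1, C2, GAMMA = 2.0, 500.0, 1.0  # weights from the evaluation setup (Section 5.1)

def sigmoid(x, gamma=GAMMA):
    """Equation (12): decreasing in x, so heavier congestion lowers the path score."""
    return 1.0 / (1.0 + math.exp(gamma * x))

def path_status(bandwidth, rtt, congestion_level, c1=C1, c2=C2):
    """Equation (11): combine bottleneck bandwidth, RTT, and congestion level."""
    return (c1 * bandwidth + c2 / rtt) * sigmoid(congestion_level)

def select_transmission_prioritized_path(paths):
    """Algorithm 3: pick the candidate path with the highest status value.

    Each path is assumed to carry its probed metrics, e.g.,
    {"bandwidth": ..., "rtt": ..., "congestion": ...}.
    """
    return max(paths, key=lambda p: path_status(p["bandwidth"], p["rtt"], p["congestion"]))
```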

4.2. Cache Load-Based Node Selection Strategy

In Section 4.1, we discussed the selection of an appropriate transmission path from multiple candidates. In this section, we focus on the selection of suitable cache nodes along the chosen path, with the aim of balancing the load on cache nodes, improving caching efficiency, and reducing transmission latency.
In large-scale data transmission scenarios, the volume of content to be transmitted is immense. If content is cached indiscriminately at all nodes along a path, it can lead to substantial cache redundancy, where multiple nodes store the same content. This redundancy wastes valuable cache resources and undermines overall caching efficiency. Moreover, excessively relying on high-performing cache nodes, such as edge nodes, can lead to node overload. This often results in cached content being replaced before it has had a chance to serve subsequent requests, thus diminishing the effectiveness of caching. Therefore, selecting appropriate caching nodes and optimizing cache resource utilization are crucial considerations in the caching strategy along paths.
To address these challenges, we propose a cache node selection algorithm based on cache load, which comprehensively considers both the utilization of already cached content and the occupancy of the node’s cache space. The cache load of a node i is defined as
$$\mathrm{CacheLoad}_i = 1 - \frac{C_{i,\mathrm{available}}}{C_{i,\mathrm{total}}} \quad (13)$$

where C_{i,available} is the available cache space of node i, as determined by the method in Section 4.1.2, and C_{i,total} is the total cache capacity of node i. The cache load measures the utilization level of the node’s cache resources. A higher cache load indicates efficient use of cached resources, where the cached content is actively serving requests and providing fast responses. A lower cache load indicates inefficient use of cached resources, where cached content occupies space but fails to effectively serve requests; such content should be replaced by new content.
To maximize cache resource utilization and balance the load across nodes, the algorithm selects the node with the lowest cache load along the path for caching new content. This selection ensures that cache resources are efficiently utilized without overloading specific nodes. The cache node is determined as follows:
$$\mathrm{CacheNode} = \arg\min_{i} \mathrm{CacheLoad}_i \quad (14)$$
Algorithm 4 shows the detailed process of the cache node selection algorithm based on cache load.    
Algorithm 4: Cache node selection algorithm based on cache load
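A minimal sketch of Algorithm 4, reusing the illustrative available_cache helper and node representation from the Section 4.1.2 sketch.

```python
def cache_load(node):
    """Equation (13): CacheLoad_i = 1 - C_available / C_total."""
    return 1.0 - available_cache(node) / node["capacity"]

def select_cache_node(path_nodes):
    """Equation (14) / Algorithm 4: cache new content at the least-loaded node."""
    return min(path_nodes, key=cache_load)
```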

4.3. Implementation

4.3.1. Packet Header Design

Figure 3 illustrates the design of the packet header. To ensure compatibility with the existing IP network architecture, the ID message header is placed in the payload of the IP packet, while the cache message header resides in the payload of the ID packet. The ID message header contains key information such as the ReceiverID, which identifies the receiving node; the ChunkID, which specifies the unique identifier for the chunk being transmitted; and the MultiPathID, which identifies the transmission path. The cache message header indicates whether the packet is a request or data message and includes fields essential for implementing the cache load-based selection strategy. The Type field specifies whether the packet is a request (req) or data (data) packet. The MinCacheLoad field records the minimum cache load along the transmission path, applicable for both request and data packets. The NodeID field identifies the selected cache node for storing the chunk. The Flag field determines the caching strategy, where a value of 0 indicates the default caching strategy (i.e., the content is cached whenever there is available space in the cache node, aiming to quickly fill the in-network cache), and a value of 1 indicates the cache load-based strategy. By default, the flag for high-popularity content is set to 1, while the flag for low-popularity content is set to 0.
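As a rough, purely illustrative rendering of the header fields in Figure 3 (field widths and on-wire encodings are omitted):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IDHeader:
    """ID message header carried in the IP payload."""
    receiver_id: str    # ReceiverID: identifies the receiving node
    chunk_id: str       # ChunkID: unique identifier of the transmitted chunk
    multipath_id: str   # MultiPathID: identifies the selected transmission path

@dataclass
class CacheHeader:
    """Cache message header carried in the ID packet payload."""
    type: str                        # "req" or "data"
    min_cache_load: Optional[float]  # minimum cache load seen along the path
    node_id: Optional[str]           # NodeID: node selected to cache the chunk
    flag: int                        # 0 = default caching, 1 = cache load-based
```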

4.3.2. Multipath Selection Strategy Process

Figure 4 illustrates the implementation process of the content-aware hybrid multipath selection strategy. Upon receiving a chunk request packet, the system constructs candidate multipaths using the NRS-based multipath establishment algorithm from [41]. The popularity of the requested chunk is then updated and calculated using Equation (8). Then, based on Algorithm 1, the chunk is classified as high-popularity or low-popularity content. For high-popularity content, the cache-prioritized scheduling strategy is applied (Algorithm 2). For low-popularity content, the transmission-prioritized scheduling strategy is used (Algorithm 3). Once the transmission path is selected, the corresponding multipath ID is embedded into the data packet. Simultaneously, the mapping between the chunk ID and the selected transmission path ID is recorded locally. The data packet is then transmitted following the multipath transmission process based on the NRS. The proposed multipath selection strategy operates at the granularity of chunks. For each complete chunk, the strategy is executed only once, upon receiving the first request packet. Subsequent packets for the same chunk are transmitted using the recorded mapping between the chunk ID and the transmission path ID.

4.3.3. Cache Node Selection Strategy Process

As mentioned in Section 3.1, large-scale content is uniformly divided into equal-sized chunks, typically measured in megabytes (MB), for transmission. Due to the Maximum Transmission Unit (MTU) limitations of underlying network infrastructure, the size of a single data packet cannot exceed the MTU, which is typically set to 1500 bytes. To comply with this limitation, the transmission of chunks requires splitting them into multiple packets. In ICN networks, receiver-driven transport control protocols are widely used. These protocols, initiated by the receiver, allow for dynamic adjustment of transmission rates, making them well-suited for content distribution, cache management, and bandwidth allocation. In our implementation, the first packet of each chunk is responsible for identifying the node with the minimum cache load along the transmission path. The ID of this node is recorded for subsequent operations. The detailed process of cache decision for a given chunk is as follows:
(1)
Forwarding process of the first request packet
As illustrated in Figure 5, the receiver initiates a request for a chunk. In the first request packet, the minCacheLoad field and the CacheNode field are both initialized to Null.
(2)
Forwarding process of the first data packet
As shown in Figure 6, upon receiving the first request packet, the service node generates the corresponding first data packet. After selecting the transmission path based on the multipath selection strategy, the caching strategy flag is set. The MinCacheLoad field in the data packet is initialized to MaxValue, and the CacheNode field is set to Null. During the forwarding of the first data packet, each cache node along the path calculates its cache load. If a node’s cache load is less than the value in the MinCacheLoad field, the packet’s MinCacheLoad field is updated with the node’s cache load, and the CacheNode field is updated with the node’s ID. Additionally, the current node allocates a temporary buffer to cache the first data packet.
(3)
Forwarding process of the subsequent request packets
As shown in Figure 7, the forwarding process for subsequent request packets involves using the values of the MinCacheLoad and CacheNode fields determined by the first data packet. These fields are populated with the corresponding values recorded during the first data packet’s forwarding process. The MinCacheLoad field reflects the minimum cache load along the path, while the CacheNode field identifies the selected cache node.
(4)
Forwarding process of subsequent data packets
As shown in Figure 8, subsequent data packets initialize their MinCacheLoad and CacheNode fields based on the values carried by the corresponding request packet. During the forwarding process, each cache node checks whether its node ID matches the CacheNode value in the data packet. If the node ID does not match the CacheNode value, the node clears the temporary buffer allocated for the first data packet. If the node ID matches the CacheNode value, the node proceeds with the caching process for the data packets.
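Putting the four forwarding steps together, the following minimal sketch covers the data-packet side of the process, reusing the illustrative CacheHeader fields from Section 4.3.1 and the cache_load helper from Section 4.2:

```python
def forward_first_data_packet(pkt, path_nodes):
    """Figure 6: as the first data packet travels the path, every cache node
    bids for the caching role and keeps a temporary buffer for the chunk."""
    for node in path_nodes:
        load = cache_load(node)              # Equation (13), computed locally
        if load < pkt.min_cache_load:
            pkt.min_cache_load = load        # lightest node seen so far
            pkt.node_id = node["id"]
        node["temp_buffer"] = [pkt]          # temporary buffer for the first packet
    return pkt

def forward_subsequent_data_packet(pkt, node):
    """Figures 7 and 8: subsequent packets carry the chosen MinCacheLoad and
    CacheNode; non-selected nodes free their buffers, the selected node caches."""
    if node["id"] != pkt.node_id:
        node.pop("temp_buffer", None)        # not selected: release the buffer
    elif "temp_buffer" in node:
        node["temp_buffer"].append(pkt)      # selected: keep assembling the chunk

# e.g., the service node initializes the first data packet as
# CacheHeader(type="data", min_cache_load=float("inf"), node_id=None, flag=1).
```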

4.4. Overhead Analysis

The communication overhead introduced by our scheme mainly arises from the multipath probing process. The cost of multipath probing increases linearly with the number of multipath candidates. However, the multipath establishment method we use [41] effectively limits the number of candidate paths. Additionally, the multipath probing information is sent only once per period. Considering the performance improvements our scheme provides, we argue that the overhead from multipath probing is acceptable.
Regarding computational overhead, the time complexity of the cache-prioritized scheduling strategy, the transmission-prioritized scheduling strategy, and the content popularity estimation algorithm is O(K), where K is the number of candidate paths. The cache load-based cache node selection algorithm requires each node to compute its cache load and update packet fields. These calculations involve simple arithmetic with a few parameters and do not require complex computations. Furthermore, these algorithms operate at the granularity of chunks, meaning calculations are only performed during the transmission of the first packet of each chunk. For subsequent packets, only simple field matching is needed, without further computation. Additionally, we adopt a Least Recently Used (LRU) cache replacement policy with O(1) time complexity, which ensures that our scheme has low computational overhead.
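For reference, the O(1) LRU behavior mentioned above can be sketched with Python's OrderedDict, whose ordering is backed by a doubly linked list; this is a generic illustration rather than the Icarus implementation:

```python
from collections import OrderedDict

class LRUCache:
    """O(1) LRU replacement via a hash map plus doubly linked list (OrderedDict)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, chunk_name):
        if chunk_name not in self.store:
            return None                          # cache miss
        self.store.move_to_end(chunk_name)       # mark as most recently used, O(1)
        return self.store[chunk_name]

    def put(self, chunk_name, chunk):
        if chunk_name in self.store:
            self.store.move_to_end(chunk_name)
        self.store[chunk_name] = chunk
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)       # evict least recently used, O(1)
```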

5. Performance Evaluation

This section presents simulation experiments for the proposed Multipath-based Caching Placement Strategy (MCPS). We first introduce the experimental setup. Then, we investigate the performance of our proposed caching strategy and compare it against several representative related works under various experimental parameters.

5.1. Experimental Setup

We use Icarus [42] as the simulation software, a Python-based ICN caching simulator licensed under the GNU GPLv2; the version used in this paper is 0.8.0. Icarus is well-suited for simulating various caching strategies and network topologies, offering object-oriented encapsulation that enables easy extension of its functionality and modules. The simulator can process millions of warm-up and request events within a short time, making it particularly effective for simulating large-scale data requests. Our simulations were conducted on a Dell PowerEdge R720 server, powered by an Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00 GHz with 32 cores and equipped with 16 GB of ECC DDR3 memory.
In Icarus, we implemented the Name Resolution System-based multipath transmission mechanism and the nearest replica routing mechanism. Furthermore, our proposed MCPS was developed, incorporating the content-aware hybrid multipath selection strategy and the cache load-based cache node selection strategy. The MCPS configuration included a path probing interval of 5 s, a threshold ratio of 0.1 for differentiating high-popularity and low-popularity content, a content popularity decay coefficient λ of 0.5, and a minimum waiting time of 5 s for newly cached content. For path status calculations, the bandwidth weight factor c 1 , latency weight factor c 2 , and congestion weight factor γ were set to 2, 500, and 1, respectively. These algorithm parameters can be adjusted based on the specific scenarios encountered.
In addition to MCPS, several representative caching mechanisms were implemented for comparison, including the on-path caching strategy with content popularity and replica number control (PB-NCC) [20], the hash-based caching mechanism (HR_SYMM) [24], and the neighborhood cooperative caching mechanism (NCR-BN) [25]. The performance of the proposed architecture was evaluated using metrics such as cache hit ratio, data transmission latency, and cache node load balancing. The LRU algorithm was selected as the cache replacement strategy.
The experimental setup included 1 × 10^3 network contents, with each content containing 100 equal-sized chunks, resulting in a total of 1 × 10^5 chunks. A total of 4 × 10^5 user requests were generated, with 1 × 10^5 requests used for warm-up at the start. We assume that the arrival process of requests at the edge nodes follows a Poisson distribution, and the cache size ratio ranged from 0.01 to 0.1. The popularity of different content items is assumed to follow a Zipf distribution [43], with the Zipf Alpha parameter set between 0.6 and 1.2. According to [44], the popularity of the different chunks within a content item also follows a Zipf distribution, with the Zipf Alpha parameter set to 1.1. The in-network link bandwidth is uniformly set to 10 chunks/s, meaning that 10 chunks can be transmitted per second, simulating the limited in-network bandwidth resources in a large-scale data transmission scenario. The bandwidth between the source node and the edge nodes, as well as the bandwidth between users and the edge nodes, is set to 100 chunks/s, ensuring that the links to the edge nodes are not the bottleneck. The user request rate varies from 10 to 100 requests per second. To validate the scalability of the proposed strategy in different network topologies, we use three network topologies: the European academic network GEANT [45], and two networks from Rocketfuel [46], Sprintlink and Exodus. Each experiment was performed five times, with the results averaged to ensure robustness.
Table 2 lists the main parameters involved in the simulation.
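As a generic sketch (not the Icarus configuration itself) of how such a workload can be generated, the following samples (content, chunk) request pairs using the Zipf parameters above:

```python
import random

def zipf_weights(n, alpha):
    """Unnormalized Zipf weights: the item of rank k gets weight 1 / k^alpha."""
    return [1.0 / (k ** alpha) for k in range(1, n + 1)]

def generate_requests(num_requests, num_contents=1000, chunks_per_content=100,
                      alpha_content=0.8, alpha_chunk=1.1, seed=42):
    """Sample (content, chunk) request pairs matching the simulation workload:
    content popularity ~ Zipf(alpha_content), and chunk popularity within a
    content ~ Zipf(alpha_chunk), as described in Section 5.1."""
    rng = random.Random(seed)
    contents = rng.choices(range(num_contents),
                           weights=zipf_weights(num_contents, alpha_content),
                           k=num_requests)
    chunk_w = zipf_weights(chunks_per_content, alpha_chunk)
    return [(c, rng.choices(range(chunks_per_content), weights=chunk_w)[0])
            for c in contents]
```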

5.2. Simulation Results

We configure various experimental scenarios by adjusting parameters such as network topology, cache size ratio, the Zipf distribution parameter Alpha, and request rates. For each scenario, we evaluate the performance of different strategies using metrics including data transmission latency, cache hit ratio, and cache node load balancing. The experimental results are presented based on these performance evaluation metrics to facilitate the analysis.

5.2.1. Data Transmission Latency

Data Transmission Latency (DTL) effectively reflects the network’s transmission performance. Lower latency results in better user experiences, aligning with the goals of network optimization. DTL is measured as the average time required for the requested content to reach the user.
Figure 9 illustrates the variation in data transmission latency across different topologies as the cache size ratio increases, with a Zipf distribution parameter α set to 0.8, a request rate of 50 requests per second, and cache size ratios ranging from 0.01 to 0.1. As the cache size ratio increases, transmission latency decreases for all strategies across all topologies. Among them, the MCPS strategy exhibits significantly lower latency compared to the other strategies. This improvement is mainly attributed to the MCPS strategy’s effective integration of multipath transmission and cache resource utilization. In contrast, the HR_SYMM strategy shows the worst performance in terms of latency due to its random selection of cache nodes, leading to an increase in transmission path length. The PB-NCC strategy, which relies solely on the default transmission path, has limited access to in-network cache resources and does not account for path congestion, resulting in higher latency. Although the NCR-BN strategy improves cache utilization through cooperative caching, it still relies on the default path and does not exploit alternative paths, leading to higher latency.
Figure 10 shows the variation in data transmission latency for different strategies as the Zipf distribution parameter α changes, with the cache size ratio set to 0.05 and a request rate of 50 requests per second. The MCPS strategy maintains its advantage in terms of data transmission latency. Notably, as  α increases, the differences in latency between strategies gradually diminish. This occurs because higher α values concentrate more requests on a small set of popular content, leading to a high cache hit ratio. Under these conditions, most requests are served by cache nodes located closer to the users, reducing transmission latency differences between strategies.
Figure 11 illustrates the variation in data transmission latency across different topologies and caching strategies as the request rate increases. The cache size ratio is fixed at 0.05, and the Zipf distribution parameter α is set to 0.8. As the request rate increases, all caching strategies experience increased latency due to higher data transmission loads and more frequent network congestion. The MCPS strategy maintains stable latency even under higher request rates, primarily due to its effective use of multipath transmission and cache resources. By optimally selecting paths and cache nodes, MCPS avoids congestion and maximizes in-network resource usage, reducing transmission latency. In contrast, the PB-NCC strategy, which only uses default transmission paths, experiences gradual latency increases as congestion occurs. Similarly, while NCR-BN improves cache utilization through neighborhood cooperation, it still relies on default paths and fails to exploit alternative lighter paths, resulting in higher latency. The HR_SYMM strategy, which randomly selects cache nodes and paths using a hash mechanism, neglects the actual state of paths and cache nodes, often resulting in inefficient detours or congested paths, severely affecting transmission efficiency.

5.2.2. Cache Hit Ratio

The cache hit ratio (CHR) is a key indicator for evaluating the effectiveness of content caching. A higher cache hit ratio signifies better caching performance. It is defined as the ratio of the number of requests served by in-network nodes to the total number of requests.
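A minimal sketch of the CHR computation follows (illustrative only; the counter values are hypothetical):

```python
# Illustrative sketch: cache hit ratio (CHR) from simple counters.
def cache_hit_ratio(served_by_cache_nodes, total_requests):
    """Fraction of requests answered by in-network caches rather than the source."""
    return served_by_cache_nodes / total_requests

print(cache_hit_ratio(12_000, 40_000))  # hypothetical counts -> 0.3
```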
Figure 12 illustrates how the cache hit ratio changes as the cache size ratio increases, with the Zipf distribution parameter α set to 0.8 and a request rate of 50 requests per second. As the cache size ratio increases, the cache hit ratio improves for all strategies across all topologies. Among them, the MCPS strategy achieves a cache hit ratio second only to the HR_SYMM strategy, outperforming the other strategies. The high cache hit ratio of HR_SYMM is mainly due to its hash-based routing mechanism, which ensures that content is cached in randomly designated nodes, maximizing cache diversity. In contrast, the PB-NCC strategy exhibits a lower cache hit ratio as it only utilizes resources along the default transmission path, limiting its caching scope. The NCR-BN strategy improves cache utilization through neighborhood cooperative caching, but since it does not exploit caching resources along alternative paths, its cache hit ratio remains lower than that of the MCPS strategy.
Figure 13 illustrates the variation in cache hit ratio for different caching strategies as the Zipf distribution parameter α changes. The cache size ratio is set to 0.05, and the request rate is 50 requests per second. MCPS provides approximately 20.3%, 16.2%, and 10.4% improvement in cache hit ratio over NCR-BN in the three topologies, respectively. As the value of α increases, the cache hit ratio gradually improves for all strategies. This trend is due to the increasing concentration of requests on a small subset of popular content as α grows.
Figure 14 shows the variation in cache hit ratio for different caching strategies as the request rate increases. The cache size ratio is uniformly set to 0.05, and the Zipf distribution parameter α is set to 0.8. Across all strategies, the cache hit ratio remains stable as the request rate increases. Among the strategies, MCPS consistently achieves a cache hit ratio second only to that of HR_SYMM, outperforming the other strategies.

5.2.3. Cache Node Load Balancing

Cache node load balancing is evaluated using Jain’s Fairness Index [47], an established metric for the fairness of resource allocation that measures how uniformly a resource is distributed. In this study, it quantifies how evenly the request load is spread across cache nodes. The index is defined as follows:
$$\text{Jain's fairness index} = \frac{\left(\sum_{j=1}^{N} p_j\right)^{2}}{N \cdot \sum_{j=1}^{N} p_j^{2}}$$
where $p_j$ is the number of requests served by cache node $j$, and $N$ is the total number of cache nodes. The closer the index is to 1, the more evenly the load is distributed across nodes, indicating better load balancing.
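As a worked example, the sketch below (illustrative; the per-node counts are hypothetical) implements the index and shows its two extremes:

```python
# Jain's Fairness Index over per-node served-request counts.
def jain_index(served):
    """served[j] = number of requests served by cache node j."""
    n = len(served)
    total = sum(served)
    return total * total / (n * sum(p * p for p in served))

print(jain_index([100, 100, 100, 100]))  # perfectly balanced -> 1.0
print(jain_index([400, 0, 0, 0]))        # one overloaded node -> 1/N = 0.25
```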
Figure 15 illustrates the variation in cache node load balancing across different topologies as the cache size ratio changes. The Zipf distribution parameter α is set to 0.8, and the request rate is 50 requests per second. As the cache size ratio increases, the load balancing among cache nodes fluctuates. The MCPS strategy consistently outperforms others in load balancing due to its comprehensive consideration of cache node load along transmission paths, which ensures efficient resource utilization. The PB-NCC strategy also shows relatively good load balancing, aided by its control over content popularity and replica distribution. In contrast, HR_SYMM performs poorly due to its random cache node selection, which ignores content popularity and cache node load, leading to imbalanced cache distribution across the network.
Figure 16 illustrates the variation in cache node load balancing for different caching strategies as the Zipf distribution parameter α increases. The cache size ratio is set to 0.05, and the request rate is 50 requests per second. As α grows, load balancing decreases for all strategies, as requests become concentrated on a small set of popular content, causing cache nodes storing these items to handle disproportionate traffic. The MCPS strategy consistently maintains an advantage in cache node load balancing. As α increases, HR_SYMM experiences a sharper decline in load balancing. At lower α values, the more uniform request distribution benefits HR_SYMM’s random cache node selection, leading to better load balancing. However, as α grows and requests concentrate on a few popular items, HR_SYMM’s disregard for content popularity worsens load imbalances, resulting in a performance decline.
Figure 17 illustrates the variation in cache node load balancing across different topologies as the request rate increases. The cache size ratio is uniformly set to 0.05, and the Zipf distribution parameter α is 0.8. The MCPS strategy consistently maintains an advantage in cache node load balancing. Specifically, MCPS provides approximately 5.3%, 4.9%, and 14.1% improvement in cache node load balancing over PB-NCC across the three topologies, respectively.

5.3. Discussion

The MCPS strategy adapts well to various scenarios and demonstrates superior performance. By considering both multipath caching and transmission resources, and by incorporating content popularity awareness, the strategy applies different multipath scheduling approaches to content of varying popularity. This enables MCPS to fully utilize in-network resources, improve cache hit rates, and reduce data transmission latency. Additionally, the cache node selection algorithm, which is based on node cache load, effectively prevents cache redundancy and node overload, further optimizing cache resource allocation. These mechanisms ensure that MCPS achieves minimal data transmission latency while maintaining a high cache hit rate and a balanced load across cache nodes. In contrast, although PB-NCC effectively balances cache node load by controlling content popularity and replica count, its dependence on the cache and transmission resources of the default path leads to lower cache hit rates and higher transmission latency. NCR-BN enhances cache hit rates by improving the utilization of neighboring cache resources through collaborative caching; however, its performance is still limited by its reliance on the default path, which restricts resource utilization from other paths and limits transmission efficiency. HR_SYMM achieves the highest cache hit rate by drawing on cache nodes across the entire network through a hash-routing mechanism, but it fails to consider content popularity, node status, or path conditions, resulting in high transmission latency and poor load balancing across cache nodes. Quantitatively, MCPS reduces data transmission latency by 50.05% and improves cache node load balancing by 46.59% relative to HR_SYMM.
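To make the cache node selection idea concrete, the sketch below picks the least-loaded node along a candidate path. The specific load score, a weighted mix of cache occupancy and hit utilization, is our illustrative assumption rather than the paper’s exact formula:

```python
# Simplified sketch (not the exact MCPS algorithm): choose the node with the
# lowest cache load along a selected path. The load score is an assumption:
# a weighted mix of how full the cache is and how poorly its contents are used.
def cache_load(node, w=0.5):
    occupancy = node["used"] / node["capacity"]            # fraction of cache filled
    utilization = node["hits"] / max(node["requests"], 1)  # usefulness of cached items
    return w * occupancy + (1 - w) * (1 - utilization)

def select_cache_node(path_nodes):
    return min(path_nodes, key=cache_load)

path = [
    {"id": "n1", "used": 80, "capacity": 100, "hits": 20, "requests": 50},
    {"id": "n2", "used": 30, "capacity": 100, "hits": 40, "requests": 50},
]
print(select_cache_node(path)["id"])  # -> "n2": emptier and more effective cache
```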

6. Conclusions

This paper proposes a caching strategy based on multipath transmission for ICN networks. The cache placement problem is decomposed into two steps: multipath selection and cache node selection along the paths. For multipath selection, we introduce a content-aware hybrid multipath selection strategy that comprehensively considers the transmission and caching resources of paths; by adapting scheduling to content popularity, it enhances cache resource utilization and transmission efficiency. For cache node selection, we propose a cache load-based node selection strategy that combines the utilization efficiency of cached content with the usage of cache resources; by prioritizing the nodes with the lowest cache load, it balances cache node load and improves caching efficiency. The effectiveness of this approach was verified through simulation experiments. The results demonstrate that the proposed strategy performs strongly in terms of data transmission latency, cache hit rate, and cache node load balancing. In particular, it reduces data transmission latency by 53.4% compared to the on-path caching strategy PB-NCC.
However, there are some limitations in our current approach. The use of the Least Recently Used cache replacement policy can lead to cache pollution and cyclic replacement issues, which reduce caching efficiency. Additionally, while the strategy optimizes the selection of cache nodes, the caching capacity of individual nodes is still limited. This limitation becomes particularly significant in large-scale data distribution scenarios, especially when dealing with high-volume content. In future work, we will focus on developing more efficient cache replacement strategies and investigate multi-tiered caching architectures to enhance the caching capacity of individual nodes and better handle large-scale data distributions.
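For reference, a minimal LRU cache (a textbook sketch, not the simulator’s implementation) makes the pollution weakness concrete: eviction tracks only recency, so a burst of one-off items can displace popular content:

```python
from collections import OrderedDict

# Textbook LRU cache: recency-only eviction, with no notion of frequency.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)         # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used item
```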

Author Contributions

Conceptualization, W.Z. and R.H.; methodology, W.Z. and R.H.; software, W.Z.; writing—original draft preparation, W.Z.; writing—review and editing, W.Z. and R.H.; supervision, R.H.; project administration, R.H.; funding acquisition, R.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Key R&D Program of China, “Application Demonstration of Polymorphic Network Environment for Computing from the Eastern Areas to the Western” (Project No. 2023YFB2906404).

Data Availability Statement

Data are contained within the article.

Acknowledgments

We would like to express our gratitude to Jianping Song and Yuqi Liu for their valuable support of this work.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ICN: Information-Centric Networking
IoT: Internet of Things
JLab: Thomas Jefferson National Accelerator Facility
PB: Petabytes
CDN: Content Delivery Network
EID: Entity ID
NA: Network Address
IDP: Identifier Protocol
MPSID: Multipath Transmission Service ID
NDC: Named Data Chunk
MB: Megabytes
MTU: Maximum Transmission Unit
LRU: Least Recently Used
DTL: Data Transmission Latency
CHR: Cache Hit Ratio

References

  1. Sim, C.; Wu, K.; Sim, A.; Monga, I.; Guok, C.; Würthwein, F.; Davila, D.; Newman, H.; Balcas, J. Effectiveness and predictability of in-network storage cache for Scientific Workflows. In Proceedings of the 2023 International Conference on Computing, Networking and Communications (ICNC), Honolulu, HI, USA, 20–22 February 2023; IEEE: New York, NY, USA, 2023; pp. 226–230. [Google Scholar]
  2. Deng, Z.; Sim, A.; Wu, K.; Guok, C.; Hazen, D.; Monga, I.; Andrijauskas, F.; Würthwein, F.; Weitzel, D. Analyzing Transatlantic Network Traffic over Scientific Data Caches. In Proceedings of the 2023 on Systems and Network Telemetry and Analytics, Orlando, FL, USA, 20 June 2023; pp. 19–22. [Google Scholar]
  3. Sim, C.; Wu, K.; Sim, A.; Monga, I.; Guok, C.; Hazen, D.; Würthwein, F.; Davila, D.; Newman, H.; Balcas, J. Predicting Resource Utilization Trends with Southern California Petabyte Scale Cache. In Proceedings of the EPJ Web of Conferences, Norfolk, VA, USA, 8–12 May 2023; EDP Sciences: Les Ulis, France, 2024; Volume 295, p. 01044. [Google Scholar]
  4. Zurawski, J.; Brown, B.; Rai, G.; Dart, E.; Dawson, C.; Hawk, C.; Mantica, P.; Margetis, S.; Miller, K.; Miller, N.; et al. Nuclear Physics Network Requirements Review Final Report; Report #: LBNL-2001602; Lawrence Berkeley National Laboratory: Berkeley, CA, USA, 2024. Available online: https://escholarship.org/uc/item/4qx1b4x8 (accessed on 20 January 2025).
  5. Shannigrahi, S.; Fan, C.; Papadopoulos, C. Request aggregation, caching, and forwarding strategies for improving large climate data distribution with ndn: A case study. In Proceedings of the 4th ACM Conference on Information-Centric Networking, Berlin, Germany, 26–28 September 2017; pp. 54–65. [Google Scholar]
  6. Shannigrahi, S.; Fan, C.; Papadopoulos, C. Named data networking strategies for improving large scientific data transfers. In Proceedings of the 2018 IEEE International Conference on Communications Workshops (ICC Workshops), Kansas City, MO, USA, 20–24 May 2018; IEEE: New York, NY, USA, 2018; pp. 1–6. [Google Scholar]
  7. Wu, Y.; Mutlu, F.V.; Liu, Y.; Yeh, E.; Liu, R.; Iordache, C.; Balcas, J.; Newman, H.; Sirvinskas, R.; Lo, M.; et al. N-DISE: NDN-based data distribution for large-scale data-intensive science. In Proceedings of the 9th ACM Conference on Information-Centric Networking, Osaka, Japan, 19–21 September 2022; pp. 103–113. [Google Scholar]
  8. Fan, C.; Shannigrahi, S.; DiBenedetto, S.; Olschanowsky, C.; Papadopoulos, C.; Newman, H. Managing scientific data with named data networking. In Proceedings of the Fifth International Workshop on Network-Aware Data Management, Austin, TX, USA, 16 November 2015; pp. 1–7. [Google Scholar]
  9. Ogle, C.; Reddick, D.; McKnight, C.; Biggs, T.; Pauly, R.; Ficklin, S.P.; Feltus, F.A.; Shannigrahi, S. Named data networking for genomics data management and integrated workflows. Front. Big Data 2021, 4, 582468. [Google Scholar] [CrossRef] [PubMed]
  10. Asaeda, H.; Matsuzono, K.; Hayamizu, Y.; Hlaing, H.H.; Ooka, A. A survey of information-centric networking: The quest for innovation. IEICE Trans. Commun. 2024, 107, 139–153. [Google Scholar] [CrossRef]
  11. Sim, A.; Kissel, E.; Hazen, D.; Guok, C. Experiences in deploying in-network data caches. In Proceedings of the EPJ Web of Conferences, Norfolk, VA, USA, 8–12 May 2023; EDP Sciences: Les Ulis, France, 2024; Volume 295, p. 07018. [Google Scholar]
  12. Li, W.; Li, Y.; Wang, W.; Xin, Y.; Lin, T. A popularity-driven caching scheme with dynamic multipath routing in CCN. In Proceedings of the 2016 IEEE Symposium on Computers and Communication (ISCC), Messina, Italy, 27–30 June 2016; IEEE: New York, NY, USA, 2016; pp. 633–638. [Google Scholar]
  13. Xu, Y.; Ni, H.; Zhu, X. A Novel Multipath Transmission Scheme for Information-Centric Networking. Future Internet 2023, 15, 80. [Google Scholar] [CrossRef]
  14. Zhang, G.; Li, Y.; Lin, T. Caching in information centric networking: A survey. Comput. Netw. 2013, 57, 3128–3141. [Google Scholar] [CrossRef]
  15. Laoutaris, N.; Che, H.; Stavrakakis, I. The LCD interconnection of LRU caches and its analysis. Perform. Eval. 2006, 63, 609–634. [Google Scholar] [CrossRef]
  16. Chai, W.K.; He, D.; Psaras, I.; Pavlou, G. Cache “less for more” in information-centric networks. In Proceedings of the NETWORKING 2012: 11th International IFIP TC 6 Networking Conference, Prague, Czech Republic, 21–25 May 2012; Proceedings, Part I 11. Springer: Berlin/Heidelberg, Germany, 2012; pp. 27–40. [Google Scholar]
  17. Cho, K.; Lee, M.; Park, K.; Kwon, T.T.; Choi, Y.; Pack, S. WAVE: Popularity-based and collaborative in-network caching for content-oriented networks. In Proceedings of the 2012 Proceedings IEEE INFOCOM Workshops, Orlando, FL, USA, 25–30 March 2012; IEEE: New York, NY, USA, 2012; pp. 316–321. [Google Scholar]
  18. Lim, S.H.; Ko, Y.B.; Jung, G.H.; Kim, J.; Jang, M.W. Inter-chunk popularity-based edge-first caching in content-centric networking. IEEE Commun. Lett. 2014, 18, 1331–1334. [Google Scholar] [CrossRef]
  19. Man, D.; Lu, Q.; Wang, H.; Guo, J.; Yang, W.; Lv, J. On-path caching based on content relevance in information-centric networking. Comput. Commun. 2021, 176, 272–281. [Google Scholar] [CrossRef]
  20. Li, Y.; Wang, J.; Han, R. PB-NCC: A popularity-based caching strategy with number-of-copies control in information-centric networks. Appl. Sci. 2022, 12, 653. [Google Scholar] [CrossRef]
  21. Derakhshan, F.; Timm-Giel, A.; Agüero, R. On Popularity-and Volume-Based Reduction of Logistic Costs in ICN. In Proceedings of the 2024 IEEE International Conference on Communications Workshops (ICC Workshops), Denver, CO, USA, 9–13 June 2024; IEEE: New York, NY, USA, 2024; pp. 908–913. [Google Scholar]
  22. Shan, S.; Feng, C.; Zhang, T.; Liu, Y. A user interest preferences based on-path caching strategy in named data networking. In Proceedings of the 2017 IEEE/CIC International Conference on Communications in China (ICCC), Qingdao, China, 22–24 October 2017; IEEE: New York, NY, USA, 2017; pp. 1–6. [Google Scholar]
  23. Shan, S.; Feng, C.; Zhang, T.; Loo, J. Proactive caching placement for arbitrary topology with multi-hop forwarding in ICN. IEEE Access 2019, 7, 149117–149131. [Google Scholar] [CrossRef]
  24. Saino, L.; Psaras, I.; Pavlou, G. Hash-routing schemes for information centric networking. In Proceedings of the 3rd ACM SIGCOMM Workshop on Information-Centric Networking, Hong Kong, China, 12 August 2013; pp. 27–32. [Google Scholar]
  25. Wu, T.; Zheng, Q.; Shi, Q.; Yang, F.; Xu, Z. NCR-BN Cooperative Caching for ICN Based on Off-Path Cache. In Proceedings of the 2022 5th International Conference on Hot Information-Centric Networking (HotICN), Guangzhou, China, 24–26 November 2022; IEEE: New York, NY, USA, 2022; pp. 42–47. [Google Scholar]
  26. Chaudhary, P.; Hubballi, N. PeNCache: Popularity based cooperative caching in Named Data Networks. Comput. Netw. 2024, 257, 110995. [Google Scholar] [CrossRef]
  27. Rath, H.K.; Panigrahi, B.; Simha, A. On cooperative on-path and off-path caching policy for information centric networks (ICN). In Proceedings of the 2016 IEEE 30th International Conference on Advanced Information Networking and Applications (AINA), Crans-Montana, Switzerland, 23–25 March 2016; IEEE: New York, NY, USA, 2016; pp. 842–849. [Google Scholar]
  28. Hara, T.; Shigeyasu, T. Development of new off-path caching algorithm for reducing both of network traffic and content acquisition time on NDN. In Proceedings of the International Conference on Network-Based Information Systems, Ho Chi Minh City, Vietnam, 17–19 January 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 217–225. [Google Scholar]
  29. Alhowaidi, M.; Nadig, D.; Hu, B.; Ramamurthy, B.; Bockelman, B. Cache management for large data transfers and multipath forwarding strategies in Named Data Networking. Comput. Netw. 2021, 199, 108437. [Google Scholar] [CrossRef]
  30. Abdullahi, I.; Arif, S.; Hassan, S. Survey on caching approaches in information centric networking. J. Netw. Comput. Appl. 2015, 56, 48–59. [Google Scholar] [CrossRef]
  31. Wang, J.; Chen, G.; You, J.; Sun, P. Seanet: Architecture and technologies of an on-site, elastic, autonomous network. J. Netw. New Media 2020, 6, 1–8. [Google Scholar]
  32. Xu, Y.; Ni, H.; Zhu, X. An effective transmission scheme based on early congestion detection for information-centric network. Electronics 2021, 10, 2205. [Google Scholar] [CrossRef]
  33. Zhang, F.; Zhang, Y.; Raychaudhuri, D. Edge caching and nearest replica routing in information-centric networking. In Proceedings of the 2016 IEEE 37th Sarnoff Symposium, Newark, NJ, USA, 19–21 September 2016; IEEE: New York, NY, USA, 2016; pp. 181–186. [Google Scholar]
  34. Janaszka, T.; Bursztynowski, D.; Dzida, M. On popularity-based load balancing in content networks. In Proceedings of the 2012 24th International Teletraffic Congress (ITC 24), Krakow, Poland, 4–7 September 2012; IEEE: New York, NY, USA, 2012; pp. 1–8. [Google Scholar]
  35. Chao, Y.; Ni, H.; Han, R. A Path Load-Aware Based Caching Strategy for Information-Centric Networking. Electronics 2022, 11, 3088. [Google Scholar] [CrossRef]
  36. Xin, Y.; Li, Y.; Wang, W.; Li, W.; Chen, X. Content aware multi-path forwarding strategy in Information Centric Networking. In Proceedings of the 2016 IEEE Symposium on Computers and Communication (ISCC), Messina, Italy, 27–30 June 2016; IEEE: New York, NY, USA, 2016; pp. 816–823. [Google Scholar]
  37. Kim, D.; Lee, S.W.; Ko, Y.B.; Kim, J.H. Cache capacity-aware content centric networking under flash crowds. J. Netw. Comput. Appl. 2015, 50, 101–113. [Google Scholar] [CrossRef]
  38. Cardwell, N.; Cheng, Y.; Gunn, C.S.; Yeganeh, S.H.; Jacobson, V. BBR: Congestion-based congestion control: Measuring bottleneck bandwidth and round-trip propagation time. Queue 2016, 14, 20–53. [Google Scholar] [CrossRef]
  39. Duan, Y.; Ni, H.; Zhu, X. Reliable Multicast Based on Congestion-Aware Cache in ICN. Electronics 2021, 10, 1579. [Google Scholar] [CrossRef]
  40. Tan, L.; Su, W.; Zhang, W.; Lv, J.; Zhang, Z.; Miao, J.; Liu, X.; Li, N. In-band network telemetry: A survey. Comput. Netw. 2021, 186, 107763. [Google Scholar] [CrossRef]
  41. Xu, Y.; Ni, H.; Zhu, X. A Multipath Data-Scheduling Strategy Based on Path Correlation for Information-Centric Networking. Future Internet 2023, 15, 148. [Google Scholar] [CrossRef]
  42. Saino, L.; Psaras, I.; Pavlou, G. Icarus: A caching simulator for information centric networking (icn). In Proceedings of the SimuTools, ICST, Lisbon, Portugal, 17–19 March 2014; Volume 7, pp. 66–75. [Google Scholar]
  43. Ioannou, A.; Weber, S. A survey of caching policies and forwarding mechanisms in information-centric networking. IEEE Commun. Surv. Tutor. 2016, 18, 2847–2886. [Google Scholar] [CrossRef]
  44. Yu, J.; Chou, C.T.; Du, X.; Wang, T. Internal popularity of streaming video and its implication on caching. In Proceedings of the 20th International Conference on Advanced Information Networking and Applications-Volume 1 (AINA’06), Vienna, Austria, 18–20 April 2006; IEEE: New York, NY, USA, 2006; Volume 1, p. 6. [Google Scholar]
  45. Capone, V.; Usman, M. The GÉANT network: Addressing current and future needs of the HEP community. J. Phys. Conf. Ser. 2015, 664, 052005. [Google Scholar] [CrossRef]
  46. Spring, N.; Mahajan, R.; Wetherall, D. Measuring ISP topologies with Rocketfuel. ACM SIGCOMM Comput. Commun. Rev. 2002, 32, 133–145. [Google Scholar] [CrossRef]
  47. Chiu, D.M.; Jain, R. Analysis of the increase and decrease algorithms for congestion avoidance in computer networks. Comput. Netw. ISDN Syst. 1989, 17, 1–14. [Google Scholar] [CrossRef]
Figure 1. Caching architecture based on multi-path transmission.
Figure 2. Diagram of the content-aware hybrid multipath selection strategy.
Figure 3. Packet header design.
Figure 4. The process of the multipath selection strategy.
Figure 5. Forwarding process of the first request packet.
Figure 6. Forwarding process of the first data packet.
Figure 7. Forwarding process of subsequent request packets.
Figure 8. Forwarding process of subsequent data packets.
Figure 9. Data transmission latency results from different caching strategies when the cache size ratio varies.
Figure 10. Data transmission latency results from different caching strategies when the value of alpha varies.
Figure 11. Data transmission latency results from different caching strategies when the request rate varies.
Figure 12. Cache hit ratio results from different caching strategies when the cache size ratio varies.
Figure 13. Cache hit ratio results from different caching strategies when the value of alpha varies.
Figure 14. Cache hit ratio results from different caching strategies when the request rate varies.
Figure 15. Cache node load balancing degree results from different caching strategies when the cache size ratio varies.
Figure 16. Cache node load balancing degree results from different caching strategies when the value of alpha varies.
Figure 17. Cache node load balancing degree results from different caching strategies when the request rate varies.
Table 1. Summary of the notations.
$M$: Content set, representing all requested content.
$V$: The set of in-network nodes.
$P_{v,u}$: Transmission path set between node $v$ and user $u$.
$R$: The set of user requests.
$V_e$: The set of edge nodes.
$r_m^u$: Request rate of user $u$ for content $m$.
$C$: Storage capacities of all nodes in $V$.
$x_v^m$: Cache decision variable, indicating whether content $m$ is cached at node $v$.
$t_p^{v,u,m}$: Path transmission latency, representing the latency to transmit content $m$ from node $v$ to user $u$ along path $p$.
$a_{u,v}^m$: Node selection variable, indicating whether node $v$ is selected to respond to user $u$’s request for content $m$.
$z_p^{v,u,m}$: Transmission path selection variable, indicating whether path $p$ from node $v$ to user $u$ is selected to transmit content $m$.
$s_m$: Size of content $m$.
$U_v^m$: The set of users routed to node $v$ for content $m$.
$U_{src}^m$: The set of users routed to the source servers for content $m$.
$P_v$: Transmission path set passing through node $v$.
Table 2. Experiment parameters.
Topology: GEANT/Sprintlink/Exodus
Number of Chunks: $1 \times 10^{5}$
Total User Requests: $4 \times 10^{5}$
Bandwidth: 10 chunks/s
Request Rate: 10–100 requests/s
Request Distribution: Poisson distribution
Content Popularity Distribution: Zipf distribution
Zipf Parameter α: [0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
Cache Size Ratio: [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1]
Number of Repetitions: 5