Article

Decentralized Mechanism for Edge Node Allocation in Access Network: An Experimental Evaluation

by Jesus Calle-Cancho 1, Carlos Cañada 2, Rafael Pastor-Vargas 3, Mercedes E. Paoletti 2 and Juan M. Haut 2,*

1 Department of Computing and Telematics Engineering, University of Extremadura, 10001 Cáceres, Spain
2 Department of Technology of Computers and Communications, University of Extremadura, 10001 Cáceres, Spain
3 Department of Communication and Control Systems, School of Computer Science Engineering, National Distance Education University (UNED), 28040 Madrid, Spain
* Author to whom correspondence should be addressed.
Future Internet 2024, 16(9), 342; https://doi.org/10.3390/fi16090342
Submission received: 31 August 2024 / Revised: 12 September 2024 / Accepted: 19 September 2024 / Published: 20 September 2024
(This article belongs to the Special Issue Distributed Storage of Large Knowledge Graphs with Mobility Data)

Abstract: With the rapid advancement of the Internet of Things and the emergence of 6G networks in smart city environments, a surge in data generation, commonly known as big data, is expected, which in turn leads to higher latency. To mitigate this latency, mobile edge computing has been proposed to offload a portion of the workload from mobile devices to nearby edge servers equipped with appropriate computational resources. However, existing solutions often exhibit poor performance and lengthy execution times when confronted with complex network topologies. Thus, this paper introduces a decentralized mechanism for determining the locations of network edge nodes in such complex network topologies. Our proposal improves performance and offers scalability and flexibility as networks become more complex. Experimental evaluations are conducted using the Shanghai Telecom dataset to validate the proposed approach.

1. Introduction

In recent years, there has been a significant evolution in mobile communications due to the widespread proliferation of mobile devices, which generate an unprecedented amount of data traffic. According to a recent Ericsson forecast, mobile subscriptions worldwide are expected to reach 8.6 billion by the end of 2028. Furthermore, global mobile data traffic is projected to increase nearly fourfold between 2023 and 2028, reaching around 53 exabytes per month by the end of 2028. This significant growth is driven by the ongoing expansion of 5G networks and the increasing use of data-intensive applications [1]. Thus, the new generation of mobile communications has to evolve to cope with this growth and ensure that emerging services and applications meet the specific demands of mobile users. Advanced 5G and 6G are being proposed as technologies that will address complex challenges related to ultra-dense deployments with strict latency and reliability requirements.
In these next-generation network environments, simultaneously achieving high reliability and low latency is complex. Cutting-edge technologies such as mobile edge computing (MEC) enable advancements in this regard, although bringing computing capabilities closer to end mobile users is still required. The main MEC principle is to locate an edge data center (EDC) at the edge of the mobile network, distributing computation and storage capabilities to reduce the communication distance and, consequently, the communication delay between the mobile network and end users. The MEC approach seeks the joint optimization of mobile access and core network functions localized together in the edge cloud [2]. MEC has been standardized by the European Telecommunications Standards Institute (ETSI) [3] with the goal of delivering computing services closer to the end user. This makes it particularly useful in scenarios where low latency and locality are critical [4].
The placement of EDCs in mobile edge computing environments presents a significant challenge. EDCs are positioned near wireless components, such as base stations (BSs), within infrastructure owned by a mobile network operator (MNO) to ensure effective radio coverage [5]. The locations of these EDCs are crucial for minimizing access delays for mobile users and optimizing resource utilization, particularly in smart cities, where hundreds or thousands of base stations connect users to EDCs. Due to the vast size of these networks, poor placement of edge servers may lead to long access delays and uneven workload distribution, with some servers becoming significantly overloaded while others remain underutilized or even idle. Thus, strategically placing edge servers is essential for enhancing the performance of various mobile applications, including reducing edge server access delays.
Additionally, some research studies have identified a strong correlation between user mobility and the physical infrastructure of smart cities and the cellular access network [6,7]. Therefore, it is essential to analyze the deployment of EDCs and their interconnection with the cellular network. This interconnection has been extensively studied and is classified as an NP-hard problem [8], indicating that solving it requires significant computational resources. As the complexity of the problem increases exponentially with the number of base stations in the network, this challenge becomes even more pronounced in smart cities, where a dense 5G/6G network of base stations is typical. This is because these cities rely on an ultra-dense deployment of base stations, meaning many more stations compared with less urbanized areas, with the goal of providing coverage for all the services and applications that are part of the smart city ecosystem [9,10].
An efficient management of this interconnection is crucial, as it directly impacts the performance and scalability of proposed solutions.
Most existing studies have focused on offloading mobile users’ workloads to edge data centers to enable energy savings for mobile devices [11,12]. However, these approaches assume that the edge nodes have already been deployed. Other studies focus on the placement of EDCs in mobile edge environments, paying attention to access delays and workloads [13,14,15]. However, these proposals have overlooked scalability, which becomes increasingly important as networks grow more complex, due to the significant increase in the number of base stations required to provide coverage in 5G/6G environments [9]. With the advent of 5G and MIMO technologies [16], it is crucial that placement strategies are scalable, given the significant increase in the number of access devices and the corresponding deterioration in performance when addressing the EDC placement problem. Therefore, this paper proposes a decentralized cloud-based placement solution that achieves significant improvements in terms of scalability.
The rest of the paper is structured as follows. Section 2 outlines this work’s proposed methodology and techniques. Section 3 provides a detailed description of the experimental setup used for evaluation. In Section 4, we present the results of the comparative performance evaluation, and, finally, in Section 5, we conclude the paper.

2. Methodology

In this section, we introduce the system modeling and define the problem of EDC placement. Additionally, we explain the proposed decentralized methodology to address the problem.

2.1. System Model

In mobile edge computing environments, the EDC placement problem can be modeled as a network represented by an undirected graph $G = (V, E)$. This graph comprises numerous mobile users, base stations, and potential EDC locations. Specifically, $V = B \cup L$, where $B$ denotes the set of base stations and $L$ denotes the set of potential EDC locations. The edges, $E$, represent the connections between base stations and EDCs located at the points in $L$. We assume that there are $K$ EDCs, each designated to a different location, where $K$ is a constant, and that each base station connects directly to its assigned EDC. We also consider that each EDC is responsible for a subset of base stations in $B$ to process the mobile user requests, and the same base station is not shared between any two EDCs.
Each base station is assigned to an EDC, and the access delay to the EDC is proportional to the distance between the base station and the EDC. To optimize EDC placement in mobile edge computing networks, we need to account for this access delay.
Figure 1 depicts an example of the EDC placement problem, where EDC $L_1$ manages a group of base stations: $\{b_1, b_2, b_3, b_4, b_5\}$. Therefore, we need to determine the optimal placement of $L_1$ among the base stations to minimize access delay. Along with solving this problem, it is also crucial to define the dominant area of each EDC to achieve a balanced workload.
The formal definition of the EDC placement problem has been established in [15] and is detailed below. A binary decision variable, $x_{i,j} \in \{0, 1\}$, is used to indicate whether a base station, $b_i$, is assigned to an EDC, $L_j$. Specifically, $x_{i,j} = 1$ if base station $b_i$ is assigned to $L_j$; otherwise, $x_{i,j} = 0$. This applies for all $i$ and $j$ where $1 \le i \le |B|$ (with $|B|$ denoting the total number of base stations) and $1 \le j \le K$. This approach assumes that each EDC is co-located with one of the base stations. Since there are $|B|$ base stations, the number of possible locations for placing the EDCs is $|B|$. For each assignment, we assign a score, $c_{i,j}$, to evaluate its suitability for an EDC. This score takes into account the distance from the EDC to the group of base stations it manages (latency) and the load of the users connected to all those base stations (workload).
For the distance, a smaller distance from each base station to its assigned EDC is preferred, while, for the workload, a more balanced distribution of the workload among each EDC is preferred.
The EDC placement model is as follows:
$$\min \sum_{i=1}^{|B|} \sum_{j=1}^{K} c_{i,j}\, x_{i,j} \quad \text{subject to:} \quad \sum_{j=1}^{K} x_{i,j} = 1 \;\; \forall i, \qquad x_{i,j} \in \{0, 1\}$$
where the constraint $\sum_{j=1}^{K} x_{i,j} = 1$ ensures that each base station is assigned to one and only one EDC.
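To make the model concrete, the following plain-Python sketch builds a hypothetical score $c_{i,j}$ from distance and workload and greedily assigns each base station to its cheapest EDC. The coordinates, workloads, and weighting factors are invented for illustration; they are not values from the paper:

```python
import math

# Hypothetical toy instance: 5 base stations, 2 candidate EDC sites.
stations = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
workloads = [10, 20, 15, 30, 25]
edc_sites = [(0.5, 0.5), (5.5, 5.0)]

def cost(i, j, alpha=1.0, beta=0.01):
    """Assumed score c_{i,j}: a weighted sum of the station-to-EDC
    distance (latency proxy) and the station's workload contribution."""
    (xs, ys), (xe, ye) = stations[i], edc_sites[j]
    dist = math.hypot(xs - xe, ys - ye)
    return alpha * dist + beta * workloads[i]

# Each base station is assigned to exactly one EDC (the constraint
# sum_j x_{i,j} = 1), here greedily to its cheapest candidate.
assignment = [min(range(len(edc_sites)), key=lambda j: cost(i, j))
              for i in range(len(stations))]
print(assignment)  # [0, 0, 0, 1, 1]
```

A real solver would optimize the total cost jointly rather than greedily, but the sketch shows how the decision variables and the per-assignment score interact.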

2.2. EDC Deployment Strategies

This section presents the proposed decentralized strategy based on clustering for solving the EDC placement problem. Clustering offers an unsupervised alternative that has been widely used in various fields. A commonly used group of clustering algorithms is the centroid-based methods, such as K-Means [17], which operate on the assumption that similar points naturally form clusters in feature space. While these methods can yield satisfactory results, their high computational complexity often limits their applicability. This paper presents a distributed framework for large-scale base station deployment using cloud computing. Such extensive network deployments demand significant computing power due to the vast amounts of data involved. As a case study, we focus on unsupervised clustering, specifically the K-Means algorithm, to illustrate how cloud computing technologies can be effectively leveraged for distributed parallel processing in large-scale deployments.
The proposed distributed implementation will be compared with the reference K-Means algorithm to verify its effectiveness, and its advantages will be discussed in the following sections.

K-Means

K-Means [17] effectively identifies well-separated clusters by minimizing the inertia criterion. Specifically, K-Means requires prior knowledge of the number of clusters to detect, and it then groups points to minimize the within-cluster sum of squares, as defined by:
$$\sum_{i=0}^{n} \| x_i - \mu_j \|^2$$
where $\mu_j$ is the mean of the samples in the cluster and $x_i$ is the $i$-th sample.
This method is often used to partition a dataset into $K$ groups automatically. It begins with the selection of $K$ initial cluster centers (centroids), which are then iteratively refined by minimizing the distance between each point of the cluster and its centroid. In the EDC placement problem, K-Means clustering is used to identify $K$ clusters of base stations and place $K$ edge servers at their centers, minimizing the within-cluster sum of squares [18].
The selection of the initial centers highly influences the effectiveness of the K-Means algorithm, as it may converge to a local minimum. Therefore, proper initialization is crucial for achieving the best final solution. To obtain a set of high-quality initial cluster centers, several methods have been proposed, including the K-Means++ method [19].
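The iterative refinement described above can be sketched with a minimal NumPy implementation of Lloyd's algorithm. This is an illustrative sketch with random initialization (not the K-Means++ scheme, and not the paper's implementation); the toy coordinates stand in for base station positions:

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: assign points to nearest centroid,
    then recompute each centroid as the mean of its cluster."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assignment step: nearest centroid per point.
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: mean of the points assigned to each cluster.
        centroids = np.array([points[labels == j].mean(axis=0)
                              for j in range(k)])
    return centroids, labels

# Two well-separated groups of "base stations" yield two EDC centers.
pts = np.array([[0., 0.], [0., 1.], [1., 0.],
                [10., 10.], [10., 11.], [11., 10.]])
centers, labels = kmeans(pts, k=2)
```

With well-separated groups the two returned centers land at the group means, which in the EDC setting would be the two server locations.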

2.3. Parallel/Distributed K-Means

Apache Spark [20] is a powerful data engine designed for the efficient management of large-scale datasets. It is engineered to provide the computational speed, scalability, and flexibility needed for complex data analysis tasks. A key distinction between Spark and MapReduce lies in Spark’s ability to process and retain data in memory across multiple computational steps, avoiding the need for repeated disk I/O operations. This in-memory processing capability significantly enhances Spark’s processing speed compared with traditional MapReduce frameworks. Apache Spark also offers a distinctive data structure, the DataFrame, which defines a distributed collection of data organized into named columns, akin to a table in a relational database. DataFrames in Spark provide a higher-level abstraction over the Resilient Distributed Dataset (RDD), offering an optimized framework for distributed data processing. In contrast to RDDs, DataFrames possess a schema, i.e., stored metadata describing the data structure; this schema defines the column names and data types, which enables Spark to perform optimizations such as predicate pushdown or projection pruning. When a DataFrame operation is performed, Spark first constructs a logical plan, an abstract representation of the computation. Then, the Catalyst optimizer applies various rule-based and cost-based optimizations to produce an optimized physical plan that specifies how the computation will be executed across the computing cluster. DataFrames are distributed across the cluster in the form of partitions, each holding a subset of the DataFrame’s data. As a result, Spark applies transformations on DataFrames (e.g., map and filter operations) to each partition in parallel, leveraging the distributed nature of the data. However, these transformations are lazy: they are not executed immediately but instead extend the logical plan, and only actions trigger the execution of that plan.
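The lazy-evaluation idea can be illustrated without Spark itself: Python generator pipelines behave analogously, in that "transformations" build a pipeline and no work happens until a terminal operation consumes it. This is only an analogy to the Spark behavior described above, not Spark code:

```python
# Lazy evaluation in miniature: generator pipelines, like Spark
# transformations, describe work without performing it.
data = range(1, 11)

# "Transformations": nothing is computed when these lines execute.
squared = (x * x for x in data)              # analogous to a map
evens = (x for x in squared if x % 2 == 0)   # analogous to a filter

# "Action": consuming the pipeline finally triggers the computation.
result = sum(evens)
print(result)  # 220
```

In Spark, the equivalent trigger would be an action such as `count()` or `collect()`, at which point the whole optimized plan runs across the partitions.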
We consider a DataFrame representing a large dataset, $D$, with columns $c_1, c_2, \ldots, c_n$, where each $c_i$ is a vector of features in a multi-dimensional space. In the K-Means algorithm, we seek to partition this dataset into $K$ clusters by minimizing the within-cluster sum of squares (WCSS). For a given cluster, $C_j$, the centroid, $\mu_j$, is computed as the mean of all its points, which can be expressed as:
$$\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i$$
K-Means aims to minimize the sum described in (1) across all clusters. In Spark, this can be efficiently executed across the computing cluster by representing dataset $D$ as a DataFrame and, for each data point, $x_i$, determining the closest centroid, $\mu_j$, as shown in Algorithm 1. This operation is distributed across the partitions of the DataFrame: each partition computes the closest centroid for its subset of data points and, after all points have been assigned to their closest centroids, the centroid of each cluster is recomputed. This involves filtering the DataFrame by cluster assignment (i.e., grouping by cluster ID) and then calculating the mean of the points in each cluster. In Spark, this is achieved through a combination of groupBy and aggregation operations over the DataFrame. Finally, the WCSS is computed by summing the squared distances of the points from their assigned centroids. The workflow is described in Figure 2.
$$\text{WCSS} = \sum_{x_i \in D} \left\| x_i - \mu_{\text{closest\_centroid}(x_i)} \right\|^2$$
This operation is performed by applying a map operation to calculate the squared distance for each point, followed by a reduction to sum these values across the entire dataset.
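One iteration of this partitioned scheme can be sketched in plain Python, simulating Spark's per-partition map side (partial per-cluster sums and counts), the reduce side that merges them (the groupBy/aggregation step), and the final WCSS map-reduce. The data and current centroids are invented for the example:

```python
import math
from functools import reduce

# Two "partitions" of points, as a DataFrame would be split in Spark.
partitions = [
    [(0.0, 0.0), (1.0, 0.0), (9.0, 9.0)],    # partition 1
    [(0.0, 1.0), (10.0, 9.0), (9.0, 10.0)],  # partition 2
]
centroids = [(0.0, 0.0), (10.0, 10.0)]  # current centroid estimates

def closest(p):
    """Index of the nearest current centroid for point p."""
    return min(range(len(centroids)),
               key=lambda j: math.dist(p, centroids[j]))

def partial_stats(part):
    """Map side: per-cluster (sum_x, sum_y, count) within one partition."""
    stats = {}
    for p in part:
        j = closest(p)
        sx, sy, n = stats.get(j, (0.0, 0.0, 0))
        stats[j] = (sx + p[0], sy + p[1], n + 1)
    return stats

def merge(a, b):
    """Reduce side: combine partial statistics from two partitions."""
    out = dict(a)
    for j, (sx, sy, n) in b.items():
        ox, oy, on = out.get(j, (0.0, 0.0, 0))
        out[j] = (ox + sx, oy + sy, on + n)
    return out

totals = reduce(merge, map(partial_stats, partitions))
new_centroids = {j: (sx / n, sy / n) for j, (sx, sy, n) in totals.items()}

# WCSS: squared distance of every point to its assigned (old) centroid.
wcss = sum(math.dist(p, centroids[closest(p)]) ** 2
           for part in partitions for p in part)
```

In actual PySpark, `partial_stats`/`merge` correspond to a `groupBy` over the cluster ID followed by mean aggregation, executed in parallel over the DataFrame partitions.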
In the context of node allocation in complex network topologies, such as those found in MEC, each base station can be treated as a point, $x_i$; the initial centroids, $\mu_j$, represent the initial guess of the EDC locations; and the parameter $K$ is the number of edge data centers to deploy. The final centroid positions after the execution are the optimal locations for the data centers to ensure efficient, low-latency service. Further details of the implementation are described in [21].
Algorithm 1: Distributed K-Means in Apache Spark

3. Experimental Setup

This section describes the dataset used in the performance tests and the infrastructure utilized to conduct the experiments.

3.1. Dataset Description

In our experiments, we utilized the Shanghai Telecom base station dataset, which provides internet access information for mobile users connecting to 3233 base stations. After a preliminary analysis, 3000 effective base stations were identified, as some were idle or contained invalid data. The dataset includes detailed records of the start and end times of base station access for each mobile user. Figure 3 shows the distribution of these 3233 base stations. As a densely populated city, Shanghai is well suited for mobile edge computing networks. Each base station’s workload is represented by the number of mobile user requests it handles, which can be estimated from population density or historical access data through linear regression techniques. Specifically, the dataset comprises 4.6 million call records and 7.5 million flow records from approximately 10,000 mobile users over six months, detailing the exact times of base station access.
Table 1 provides information on 11 randomly selected base stations from the Shanghai Telecom dataset, detailing the workload of each station. The workload is calculated as the total request time, based on the start and end times of mobile user activities. From the data in Table 1, it is evident that there is a significant workload imbalance among these base stations. If the EDC placement problem focuses solely on communication delay without considering the workload, it could result in an uneven distribution of workload across the EDCs.
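The "total request time" workload measure can be sketched as follows; the session records and field layout below are invented for illustration and do not reproduce the dataset's actual format:

```python
from datetime import datetime

# Hypothetical session records for one base station: (start, end)
# timestamps of each mobile user's access, as described in the text.
records = [
    ("2014-06-01 08:00:00", "2014-06-01 08:30:00"),
    ("2014-06-01 09:15:00", "2014-06-01 10:00:00"),
    ("2014-06-01 10:05:00", "2014-06-01 10:20:00"),
]

fmt = "%Y-%m-%d %H:%M:%S"
# Workload = total request time: sum of session durations, in minutes.
workload_minutes = sum(
    (datetime.strptime(end, fmt) - datetime.strptime(start, fmt))
    .total_seconds() / 60
    for start, end in records
)
print(workload_minutes)  # 90.0
```

Aggregating this quantity per station is what exposes the imbalance visible in Table 1.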

3.2. Physical Infrastructure and Capabilities

For the performance evaluation discussed in this article, a range of node configurations were designed to evaluate different scenarios. The experimental setup consistently included a single primary node, referred to as the master, which was responsible for coordinating the tasks across the cluster. In addition to the master node, varying numbers of task nodes, also known as Spark workers, were employed. The numbers of task nodes were scaled across different configurations, utilizing 1, 2, 4, 8, or 16 nodes to evaluate the impact of cluster size on performance. To clarify the role and purpose of each node type mentioned, Table 2 provides a brief description of each node’s name and the specific task it performs.
Each task node was responsible for executing a portion of the distributed workload, supporting parallel processing and a thorough analysis of how the performance of the cluster scaled as more nodes were added. A central node was also included in the configuration to support the data-intensive nature of the experiments. This node was dedicated to data storage and management through the Hadoop Distributed File System (HDFS), ensuring efficient data access and management throughout the experiments.
This architecture, comprising a master node for orchestration, varying task nodes for distributed computation, and a dedicated central node for data storage, provided a robust framework to analyze system performance under different conditions. It enabled detailed observation of how performance scaled with the number of nodes and the effectiveness of HDFS in managing large data volumes. The m5.xlarge Amazon EC2 instance, with its four Intel Xeon vCPUs, 16 GB of DDR4 memory, and support for up to 10 Gbps network bandwidth via the Enhanced Networking Adapter (ENA), offered balanced compute resources for the tests. Data storage relied on Amazon Elastic Block Store (EBS), with read/write speeds of up to 4750 Mbps and 2375 Mbps, respectively, ensuring efficient handling of data-intensive tasks. The Nitro System hypervisor further enhanced performance by reducing virtualization overhead. The cluster, built with Amazon’s EMR (v.7.2.0) service, ran Hadoop v.3.3.6 and Spark v.3.5.1, using Spark-specific Python 3.9 code and the PySpark library. Additional dependencies, such as NumPy, were installed using EMR bootstrap actions to ensure all nodes had the necessary software for running the experiments, particularly for clustering tasks with PySpark’s K-Means class.
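The dependency provisioning mentioned above can be expressed as an EMR bootstrap action, i.e., a small shell script run on every node at cluster startup. The script below is a hypothetical sketch of such an action; the exact package list and flags are assumptions, not the paper's actual script:

```shell
#!/bin/bash
# Hypothetical EMR bootstrap action (executed on each node at startup).
# Installs the extra Python libraries the PySpark clustering job needs.
set -euo pipefail
sudo python3 -m pip install --quiet numpy
```

Spark and Hadoop themselves are installed by EMR when the corresponding applications are selected at cluster creation, so only additional Python libraries need bootstrapping.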

4. Performance Evaluation

This section presents the performance evaluation results, comparing the serial implementation with the distributed methodology proposed in this article for addressing the EDC placement problem. These performance tests were conducted using the dataset that encompasses the cellular network deployment and usage in the city of Shanghai, as represented in the Shanghai Telecom dataset [13,14,15].

4.1. Comparison of EDC Deployment Strategies

First, a study of latency and workload was conducted using the distributed methodology to ensure that these metrics are not adversely affected by the proposed approach. Table 3 compares the latency and workload results between the serial version and the distributed Spark version for different EDC configurations (20, 30, and 40 EDCs), presenting the latency and workload, along with their standard deviations, for both versions. Additionally, these results are shown in Figure 4 to graphically illustrate the trend of both metrics based on the number of EDCs deployed across the city of Shanghai.
Figure 4a shows that the latency values are almost identical between the serial and Spark versions, with minor differences in their standard deviations, indicating that both methods deliver similar performance in terms of latency. On the other hand, as shown in Figure 4b, there is a slight variation in the workload between the two versions, with the Spark version generally showing a higher workload, particularly for the 20 EDCs configuration. This suggests that the distributed method may handle a slightly higher load, although the difference decreases as the number of EDCs increases.
Figure 5 illustrates the deployment of different EDC configurations across the city of Shanghai, providing coverage for the 3000 base stations included in the dataset. This visual representation highlights how various configurations of EDCs are distributed throughout the city to ensure optimal coverage and connectivity. Figure 5 shows that both the serial and distributed strategies for EDC placement yield comparable results. This similarity is expected, given that both approaches aim to effectively manage and balance the load among the base stations. The distribution patterns demonstrate that, regardless of the methodology used, the coverage and performance metrics achieved are consistently aligned. However, as will be shown in the following sections, the distributed methodology offers better performance in terms of execution times. Thus, we confirm that both strategies are capable of providing similar coverage and handling the demands of the extensive network effectively, with the distributed approach providing an advantage in execution efficiency.
Figure 6 offers a comprehensive and detailed analysis of the performance characteristics of the EDC placement algorithm implemented within the Apache Spark framework, elucidating how the algorithm behaves regarding the number of worker nodes and the size of the datasets. These visualizations not only demonstrate the algorithm’s effectiveness in reducing execution time but also provide critical insights into its scalability, efficiency, and overall performance in a distributed computing environment, making them a valuable resource for understanding the practical implications of the algorithm’s deployment in real-world scenarios.

4.2. Absolute Execution Time vs. Number of Worker Nodes

Figure 6a provides a direct representation of execution time as a function of the number of worker nodes, offering a straightforward comparison of how different dataset sizes respond to variations in cluster size. These results reinforce the findings from the logarithmic plot (Figure 6b), further emphasizing the significant reductions in execution time achieved by increasing the number of worker nodes, particularly for larger datasets. For example, with the 5045.5 MB dataset, the execution time decreases markedly as more worker nodes are added. These runtimes demonstrate the algorithm’s capacity to distribute and process large workloads across a distributed cluster effectively.
This significant reduction in execution time for larger datasets highlights the algorithm’s ability to scale effectively within the Spark framework, making it potentially applicable to big data scenarios, where fast processing of large volumes of data is essential. In practical terms, the algorithm can be deployed in environments where data processing speed is critical, such as real-time analytics, streaming data processing, and large-scale simulations. In addition, Figure 6a also highlights the diminishing returns associated with adding more worker nodes for smaller datasets. For datasets around 108.1 MB, the reduction in execution time is less pronounced, indicating that the additional overhead of managing more nodes may outweigh the performance gains beyond a certain point. This insight is vital for optimizing resource allocation in distributed systems, where it is essential to balance the number of worker nodes with the dataset size to achieve the best possible performance. It suggests that a more conservative approach to node allocation may be more efficient for smaller workloads, avoiding unnecessary costs and complexity.

4.3. Execution Time (Logarithmic Scale) vs. Number of Worker Nodes

Figure 6b, which employs a logarithmic scale to represent execution time as a function of the number of worker nodes, is particularly insightful in illustrating the efficiency gains achieved through parallel processing. The use of a logarithmic scale is significant because it allows for a clearer visualization of the relative changes in execution time across different scales, especially when dealing with varying dataset sizes. These results reveal a substantial reduction in execution time as the number of worker nodes increases, a trend that is particularly pronounced for larger datasets. For instance, with the dataset sized at 10,854.4 MB, the execution time decreases sharply as the number of worker nodes increases, indicating that the algorithm is highly effective at parallelizing tasks across multiple nodes. This capability is critical in distributed computing environments, where the ability to efficiently distribute workloads can lead to significant reductions in processing time, thereby enhancing overall system performance.
The impressive reduction in execution time when handling larger datasets also suggests that the algorithm is well suited for substantial data volumes, a common requirement in massive data applications, such as deployments of complex networks in smart cities. However, the results also reveal that, for smaller datasets, such as those around 108.1 MB, the reduction in execution time is less significant, and the curve flattens more quickly. This indicates that the overhead associated with managing multiple worker nodes may begin to outweigh the benefits of parallel processing when the dataset size is relatively small. This observation is crucial for understanding the trade-offs involved in using Apache Spark for smaller workloads and highlights the importance of optimizing node allocation to avoid unnecessary computational overhead.

4.4. Speedup vs. Number of Worker Nodes

Figure 6c illustrates the speedup achieved by increasing the number of worker nodes, and provides a deeper understanding of the algorithm’s scalability and its ability to leverage additional computational resources. Speedup is a critical metric in distributed computing, as it reflects the degree to which the algorithm can reduce execution time relative to the smallest cluster size. Speedup is calculated relative to the execution time using the smallest number of worker nodes, and the results show a clear trend, i.e., as the dataset size increases, the speedup approaches linearity, particularly for the largest datasets.
For example, the dataset sized at 10,854.4 MB exhibits a nearly linear growth in speedup as more worker nodes are added, indicating that the algorithm efficiently utilizes the additional computational power provided by the larger cluster. This linearity suggests that the algorithm has low overhead and high parallel efficiency, which are crucial characteristics for scalability in distributed systems. In practical terms, this means that the algorithm is capable of scaling effectively across a wide range of cluster sizes, making it suitable for deployment in large-scale, data-intensive applications where maximizing computational efficiency is paramount.
The ability to achieve near-linear speedup is particularly important in edge computing and big data environments, where the ability to scale efficiently across multiple nodes can significantly improve processing time and overall system performance. However, the plot also shows that, for smaller datasets, the speedup curve begins to flatten as the number of worker nodes increases, indicating that the benefits of adding more nodes are diminishing. This flattening suggests a point where adding more nodes does not contribute significantly to improving performance and may even introduce unnecessary overhead. This highlights the need to carefully assess the trade-off between the size of the problem to be addressed and the number of distributed nodes, where an unnecessary number of nodes results in computational overhead dedicated to managing communications and distribution across the computing cluster.
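The speedup metric discussed in this subsection is straightforward to compute relative to the smallest cluster size. The sketch below uses invented execution times purely to illustrate the calculation; these are not the paper's measured values:

```python
# Execution times (seconds) per worker-node count: placeholder
# figures for illustration, not the paper's measurements.
exec_time = {1: 400.0, 2: 210.0, 4: 110.0, 8: 60.0, 16: 35.0}

# Speedup is defined relative to the smallest cluster size.
baseline = exec_time[min(exec_time)]
speedup = {n: baseline / t for n, t in exec_time.items()}

for n in sorted(speedup):
    print(f"{n:>2} workers: speedup {speedup[n]:.2f}x")
```

Plotting this dictionary against the node count reproduces the shape of Figure 6c: near-linear growth while the workload dominates, then flattening once coordination overhead takes over.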

5. Conclusions

This work focuses on the EDC placement problem, providing a novel solution based on machine learning (specifically, unsupervised clustering) and the Apache Spark framework, emphasizing its feasibility and effectiveness in distributed computing environments. Throughout the study, it has been demonstrated that integrating the K-Means algorithm for the EDC placement problem into Apache Spark is feasible and highly beneficial for scenarios requiring efficient workload distribution in large-scale data processing systems.
The research highlights that combining Apache Spark’s distributed processing capabilities with a specific edge node placement algorithm offers a robust solution to scalability and efficiency challenges in edge computing environments. This integration significantly enhances performance in managing and processing large volumes of data, which is increasingly critical in smart city environments.
Moreover, the proposed approach makes a unique contribution to the existing literature by offering a novel method for optimizing the deployment of edge nodes. Its implementation within Spark not only enhances the algorithm’s adaptability and flexibility with various distributed system architectures but also maximizes the utilization of computational resources in scalable platforms.

Author Contributions

Conceptualization, J.M.H. and J.C.-C.; methodology, M.E.P.; software, C.C.; validation, J.C.-C., C.C. and R.P.-V.; formal analysis, M.E.P.; investigation, C.C.; resources, R.P.-V.; data curation, J.M.H.; writing—original draft preparation, J.C.-C. and C.C.; writing—review and editing, M.E.P.; visualization, J.M.H.; supervision, J.M.H.; project administration, J.C.-C.; funding acquisition, J.M.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Consejería de Economía, Ciencia y Agenda Digital of the Junta de Extremadura and the European Regional Development Fund (ERDF) of the European Union under Grant GR21040, in part by the European Regional Development Fund (ERDF) of the European Union Interreg V-A España–Portugal (POCTEP) 2021–2027 program, under Grant 0206_RAT_EOS_PC_6_E, and in part by the Spanish Ministry of Science and Innovation under the project “TED2021-131699B-I00/MCIN/AEI/10.13039/501100011033/” and European Union NextGenerationEU/PRTR.

Data Availability Statement

The original data presented in the study are openly available at http://sguangwang.com/TelecomDataset.html (accessed on 18 September 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. An example of mobile edge computing with EDC placement.
Figure 2. Graphical scheme of Spark architecture for K-Means II algorithm.
Figure 3. Deployment of the 3233 base stations from the dataset over the city of Shanghai.
Figure 4. Performance evaluation results based on the selected metrics: (a) Latency (km); (b) Workload (min).
Figure 5. EDC deployments with base stations grouped based on the number of EDCs, using the serial version (ac) and the proposed methodology (df).
Figure 6. Performance evaluation results of the EDC placement problem implemented with the Apache Spark framework: (a) execution time as a function of the number of worker nodes; (b) execution time as a function of the number of worker nodes on a logarithmic scale; (c) achieved speedup by increasing the number of worker nodes.
Table 1. Information and workload of some base stations included in Shanghai Telecom base station dataset.
Base Station ID   Longitude    Latitude    User Number   Workload (min)
1                 121.422303   31.180175            14           18,841
277               121.306923   31.206547            21           25,506
330               121.369095   31.121363           185          252,354
361               121.387532   31.324464            22           29,467
1349              121.448422   31.162868           589          706,841
1448              121.471519   30.824719           103          133,736
1889              121.768142   31.168720            31           42,071
1919              121.341904   30.733903           476          613,174
1994              121.009542   31.099755            55           69,600
2564              121.513919   31.246946           335          448,576
2978              121.182589   31.152749            86          113,609
Table 2. Description of node types used during Spark execution.
Node Type             Description
Single Primary Node   Also known as the master node. It coordinates the execution of tasks across the cluster.
Task Nodes            These nodes are responsible for executing portions of the workload assigned by the master node.
Central Node          This node manages the administration of the Hadoop Distributed File System (HDFS) operations, such as read and write tasks.
Table 3. Comparison of latency (km) and workload (min) results between the serial and distributed versions.
EDCs   Serial Latency        Serial Workload (×10^7)   Spark Latency         Spark Workload (×10^7)
20     4.63944 ± 0.04750     2.620 ± 0.314             4.58038 ± 0.08287     2.618 ± 0.265
30     3.68743 ± 0.04739     1.762 ± 0.058             3.69201 ± 0.04122     1.784 ± 0.134
40     3.18106 ± 0.06106     1.314 ± 0.142             3.13490 ± 0.02953     1.303 ± 0.039