1. Introduction
In 5G networks serving a wide range of mobile terminals and Internet of Things devices, many new applications (such as autonomous driving, augmented and virtual reality, and large-scale online games) have led to explosive growth of data communication and computing tasks in mobile networks [1]. Most of these new applications are computationally intensive and delay-sensitive, and therefore place higher requirements on service delay and reliability. However, the existing mobile cloud computing paradigm, which offloads computing tasks generated by mobile devices to the cloud center for processing, struggles to meet the real-time requirements of mobile users. The main reason is that this model not only consumes a large amount of core network bandwidth but also degrades the overall performance of the core network.
To address these problems, the European Telecommunications Standards Institute proposed the concept of mobile edge computing [2]. The basic idea of mobile edge computing is to sink computing and storage capacity to edge servers in the mobile access network near the users (such as 5G base stations), with the goal of deeply integrating traditional communication networks and Internet services [3,4]. Mobile edge computing can therefore not only greatly reduce the computing delay [5] but also significantly save core network bandwidth compared with the existing cloud computing architecture and improve the economic benefits of Telcos.
Generally, each edge node is located near the base station, and users access the edge node only through the base station. Logically, the base station and the edge node are unified. Although 5G edge nodes can be connected to multiple base stations through the core network, we only consider edge nodes that are physically close to the base station. Because other edge nodes need to connect to this base station through the core network, in our research scenario, the coverage of the 5G base station is equal to the coverage of the edge node on the base station.
Unlike previous mobile networks, the 5G network uses much higher frequency bands. 3GPP defines two frequency ranges for 5G, FR1 and FR2, and our work focuses on FR2. The frequency range of FR2 is roughly 24 GHz to 52 GHz (24.25 GHz to 52.6 GHz in the 3GPP specification). The existing literature shows that the higher the frequency, the smaller the coverage area. With the same transmit power and at the same geographical location, a 5G base station using FR2 therefore covers a smaller area than a 4G base station, so more stations are needed to cover the same area. In addition, 5G base stations are equipped with edge servers supporting computing and storage, so the cost of a 5G base station is higher.
A well-designed 5G base station deployment scheme can reduce investment costs and increase service benefits for telecom operators while meeting user requirements. Considering the cost, the placement of 5G base stations can therefore no longer follow the full-coverage mode used for 4G base stations; instead, base stations should be placed in key areas with large populations and heavy computing demand (such as stadiums, railway stations, and residential areas).
Reasonably configuring 5G base stations (called edge nodes in this article) in key areas can not only maximize the utilization rate of 5G base stations but also reduce the load on the core network and improve the overall benefit of Telcos. This is the purpose of our research in this paper.
Finding these key areas is therefore our basic task. Since the key areas are generally the hotspot areas where users stay, we first mine the users' stay areas by analyzing the characteristics of mobile users, such as their stay areas and data flows. Then, we determine the deployment locations of 5G base stations according to these user characteristics.
1.1. Related Works
Early base station placement for 3G and 4G considered only coverage [6,7,8], which easily causes communication congestion in hotspot areas. Therefore, in the 5G era, how to place base stations scientifically and choose reasonable locations has attracted widespread attention.
Zheng et al. [9] proposed a machine learning framework based on user behavior for the placement problem of 5G small base stations and designed a hypergraph construction algorithm to solve it. Liu et al. [10] proposed an indoor small base station placement strategy for commercial buildings based on power management to extend the use time of mobile devices.
Considering both indoor and outdoor users, Qutqut et al. [11] proposed a dynamic placement strategy to minimize data delivery costs and the utilization of the micro base station. Lyu et al. [12] proposed an optimal placement algorithm for UAV base stations based on a rotational placement strategy, which used a minimum number of stations to cover the user terminals. Galkin et al. [13] first proposed a terminal classification method based on the k-means algorithm and then deployed UAV base stations and ground base stations according to the classification results. Bor-Yaliniz et al. [14] used a probabilistic line-of-sight model to study the placement of a single UAV to support the traffic diversion of ground base stations. To improve communication quality and coverage in UAV-enabled systems, Carvajal-Rodriguez et al. [15] presented a systematic study on 3D placement in UAV-enabled communication systems and introduced the threat analysis of this placement. Kalantari et al. [16] proposed a delay-tolerant and delay-sensitive 3D placement algorithm for UAV base stations and studied the association relationship between users and base stations. Mozaffari et al. [17] studied the optimization problem of joint placement of micro base stations and UAV base stations to minimize the average delay of the network. The above research focuses mainly on communication coverage.
With the rapid evolution of the 5G network, application services through mobile intelligent terminals have become a hotspot in mobile computing. In this scenario, an efficient and reasonable 5G edge node deployment scheme can effectively reduce the computing cost and communication delay of mobile terminals and significantly improve investment efficiency and resource utilization of Telcos.
Considering user requests and resource constraints, Zhai et al. [18] proposed an edge node deployment method based on the Dueling-DQN algorithm to improve the service requests and service responses. To reduce the overall cost of deploying edge networks, Santoyo-González et al. [19] proposed an edge node placement framework (EdgeON). This framework implemented an optimization strategy for the edge node placement problem in the delay tolerance network.
To balance the workload of the edge server and the communication delay between the client and the edge server, Ye et al. [20] proposed an edge server deployment method based on the genetic algorithm. This method transformed the edge server deployment problem into a two-objective optimization problem under three constraints.
In recent years, location services in autonomous driving and public security based on 5G edge computing have become a new research hotspot. Albanese et al. [21] proposed a 5G base station placement scheme to address the deployment problem of 5G edge nodes of telecom operators with low investment and high positioning accuracy. This scheme selected the locations of the 5G base stations from the candidate sites by a given throughput-positioning ratio (TPR) so as to maximize both throughput and positioning accuracy. Li et al. [22] proposed a placement algorithm of edge servers based on the access point suitability assessment in a 5G network. This algorithm used the Analytic Hierarchy Process (AHP) and the entropy weight method to evaluate the suitability of each access point based on its features and determine whether the access point is suitable for placing an edge server.
Through the cluster analysis of the locations of a large number of users, the distribution characteristics and behavioral characteristics of users can be obtained. These characteristics can improve the location selection of the base station deployment.
Yang et al. [23] put forward a spatiotemporal activity model, including the users' spatiotemporal characteristics and users' activity features. Fatima et al. [24] analyzed the check-in data of social networks with geographic locations to extract the movement characteristics and similarity characteristics of users and combined these characteristics into a supervised learning algorithm to predict future locations. Noulas et al. [25] extracted the spatiotemporal check-in data of users and used the linear regression method to predict the next location of users.
Eric et al. [26] combined the user mobility characteristics with the user application characteristics to predict the usage pattern of mobile applications. Shafiq et al. [27] constructed a fine-grained model of geographical space and application data in cellular networks and studied the correlation degree between applications and geographic locations. Xu et al. [28] studied how, when, and where users used various applications to obtain the spatiotemporal distribution of requests from different applications.
1.2. Our Contributions
To deploy 5G edge networks rapidly, the first problem to address is how to select the placement locations of 5G edge nodes. Because the 5G network architecture is more complex, more factors must be considered when 5G edge nodes are configured. Deploying 5G edge nodes reasonably with low cost and high service quality is, in general, an NP-hard problem.
Considering the relationship between the 5G edge node and user hotspot, this paper proposes a placement method for 5G base stations. Our major contributions include:
- (1) We first extract the user trajectory data from the given area and cluster all locations from these trajectories to get the cluster areas.
- (2) We calculate the characteristics of each cluster area, such as the number of users and the duration time, and extract the hotspots from all cluster areas based on a threshold on the number of users and a threshold on the number of time slices.
- (3) We define two parameters, the high load utilization rate of the base station and the bandwidth reduction rate of the core network, take the weighted sum of the two parameters as the optimization objective to build a mathematical model, and design a greedy algorithm to solve this model.
The remainder of this paper is organized as follows. Section 2 gives our placement framework of the 5G edge nodes. Section 3 describes the extraction algorithm of cluster areas based on DBSCAN, the hotspot extraction algorithm based on the sliding time window, and the location selection algorithm of the 5G base station based on the hotspot and benefit constraints. In Section 4, we conduct a series of experiments to evaluate our proposed methods. Section 5 provides the conclusions.
3. Our Proposed Method
At present, we mainly use density-based clustering algorithms to analyze the user distribution in a certain target area. However, the density-based clustering algorithm does not consider the timestamp of each location, so the clustering result may not be a real hotspot. For example, the users who quickly pass an area without stopping (such as the subway station entrances) can also be treated as the clustering objects in this algorithm.
In order to obtain more accurate user hotspots, it is necessary to take into account the timestamp contained in the user’s location. Therefore, we must further analyze the time characteristics of the clustering results so as to eliminate the redundant locations obtained by the users in the non-stay state.
Based on the above ideas, we propose a three-stage method to extract the user distribution features in the deployed area. Firstly, the density-based clustering algorithm (such as DBSCAN) is used to obtain the cluster area. Then, the user characteristics (such as the number of users) and the time characteristics (such as the duration time) of all cluster areas are computed based on the sliding time window. Finally, the hotspot areas are extracted to support the placement of the 5G base station based on the user characteristics and the time characteristics of cluster areas.
To better understand our method, the following are some definitions of terms.
Definition 1. Set of candidate locations for base station deployment.
It refers to the set of candidate placement locations for deploying base stations, $S = \{P_1, P_2, \ldots, P_{|S|}\}$, where $P_i$ is the latitude and longitude of the ith candidate location and |S| is the size of the set S, that is, the number of candidate locations.
Definition 2. User trajectory.
It refers to the time-ordered series of locations recorded as a user moves through geographical space. A location is usually composed of latitude, longitude, altitude, and timestamp. We define the trajectory of the ith user as $Tr_i = \{L_i^1, L_i^2, \ldots, L_i^{|T|}\}$. Here, $L_i^j$ represents the location of the ith user at time Tj, and |T| represents the number of locations in the trajectory.
Based on the above definition, we further use $D = \{Tr_1, Tr_2, \ldots, Tr_{|D|}\}$ to record all user trajectories, where |D| represents the number of user trajectories in the trajectory set. $L_i^j$ refers to the jth location in $Tr_i$, and $t_i^j$ is the timestamp of the jth location in $Tr_i$.
Definition 3. User cluster areas.
It refers to an area where user locations are concentrated within a small distance, that is, an area with a large density of locations. We use $C = \{C_1, C_2, \ldots, C_{|C|}\}$ to record all user cluster areas, where |C| represents the number of user cluster areas.
Each location in a cluster area is identified using three parameters: the coordinates of this location, the timestamp of this location, and the user number corresponding to this location. For example, the jth location of the ith cluster area is represented by ($loc_i^j$, $t_i^j$, $u_i^j$).
Definition 4. User hotspot areas.
It refers to the cluster areas that meet certain conditions (such as duration time and user number), for example, supermarkets, hospitals, and stadiums. Different hotspot areas usually correspond to different user interest scenarios, so a hotspot area is also called an interest area.
To improve the cost-effectiveness of deploying 5G edge nodes, we should first choose hotspot areas, especially those with a long duration time, to place 5G base stations.
3.1. The Extraction Algorithm of User Cluster Areas
Compared with other clustering algorithms, the DBSCAN algorithm can quickly cluster and effectively process noise points, and it does not predefine the number of clusters. Therefore, we use the DBSCAN algorithm to extract the cluster areas of users in the first stage. In the DBSCAN algorithm, the following parameters are involved, which have a decisive effect on the clustering result.
- (1) ε-neighborhood: Given a point p, all the points whose distance from p is within Eps form the ε-neighborhood of p.
- (2) MinPts: This is the threshold on the number of points in the ε-neighborhood, that is, the minimum number of points that the ε-neighborhood of p must contain for p to be a core point.
- (3) Core point: If the ε-neighborhood of a location point contains at least MinPts location points, then the point is called a core point.
- (4) Boundary point: If the location p is not a core point but falls in the ε-neighborhood of a core point, then p is called a boundary point.
- (5) Noise point: If the point p is neither a core point nor a boundary point, then p is called a noise point.
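The three point types above can be illustrated with a minimal brute-force sketch. It is not the clustering implementation used in this paper; the function name `classify_points`, the sample coordinates, and the choice to count a point as a member of its own ε-neighborhood are assumptions made for illustration.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'boundary', or 'noise' (brute force, O(n^2))."""
    def neighbors(idx):
        # Assumption: the eps-neighborhood of a point includes the point itself.
        return [j for j, q in enumerate(points)
                if math.dist(points[idx], q) <= eps]

    neighborhoods = [neighbors(i) for i in range(len(points))]
    core = {i for i, nb in enumerate(neighborhoods) if len(nb) >= min_pts}

    labels = []
    for i, nb in enumerate(neighborhoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("boundary")   # not core, but inside a core point's neighborhood
        else:
            labels.append("noise")
    return labels

# Hypothetical 2D points: a dense group, one point on its edge, one isolated point.
points = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.25, 0.0), (5.0, 5.0)]
labels = classify_points(points, eps=0.2, min_pts=3)
print(labels)
```

With these sample values the first three points are core points, the fourth is a boundary point, and the isolated point is noise.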
When using the DBSCAN algorithm to cluster a dataset, the distance parameter Eps and the density threshold MinPts need to be determined first. The selection of these two parameters directly determines the quality of the clustering results. When MinPts is constant, if Eps is too large, most points will converge into the same cluster; if Eps is too small, one cluster will be split into multiple clusters. When Eps is constant, if MinPts is too large, more points will be marked as noise points; if MinPts is too small, more points will become core points. Because the two parameters are closely related to the sample locations in the dataset, we tune them through a series of experiments in Section 4.
Algorithm 1 gives the user cluster area extraction algorithm based on DBSCAN. The input of this algorithm is the trajectory dataset D of all users, Eps, and MinPts. The output is the set C of user cluster areas obtained by clustering.
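Algorithm 1. The extraction algorithm of the user cluster areas based on DBSCAN
Input: D, Eps, MinPts
Output: C
1: Initialize k to 0
2: For each user trajectory in D (user number i) Do
3:   For each location in the trajectory Do
4:     Put the location, its timestamp, and i into a new set NTS
5:     Put the location coordinates into a new location set NLS
6: Execute DBSCAN clustering, ClusterLabels ← DBSCAN(Eps, MinPts).fit_predict(NLS)
7: Calculate the length of ClusterLabels, CL ← |ClusterLabels|
8: Extract the cluster number from ClusterLabels, CN ← |set(ClusterLabels)| − 1
9: For each cluster label i < CN Do
10:   For each j ≤ CL Do
11:     If ClusterLabels[j] = i Then
12:       Put the jth record of NTS into Ck:
13:         its coordinates # Locations
14:         its timestamp # Timestamps
15:         its user number # User's number
16:   Put Ck into C
17:   k = k + 1
18: Return C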
In Algorithm 1, the locations, timestamps, and user numbers of all user trajectories are combined into a new set NTS. At the same time, the location coordinates of all user trajectories are collected into a new set NLS, which is used as the input of the DBSCAN clustering algorithm in Python.
Then, the algorithm calls the DBSCAN instance for clustering and obtains the clustering result for all locations, named the cluster labels.
Each element of the cluster labels corresponds one-to-one to a location in NLS. If the label value is −1, the location is a noise (discrete) point. If the label value is a non-negative integer, the location has been successfully clustered. All locations with the same label value belong to the same cluster area.
Finally, according to the label values of the clustering result, the locations with the same label value, together with their timestamps and user numbers, are stored in the set C.
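The grouping step described above can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name `group_by_cluster` and the sample records are hypothetical, and the labels are written by hand here. In practice they would come from a clustering call such as scikit-learn's `DBSCAN(eps=..., min_samples=...).fit_predict(NLS)`, which marks noise with −1.

```python
from collections import defaultdict

def group_by_cluster(nts, cluster_labels):
    """Group (coordinates, timestamp, user) records by cluster label.

    Records labelled -1 (noise) are dropped; the result is a list of
    cluster areas ordered by label.
    """
    clusters = defaultdict(list)
    for record, label in zip(nts, cluster_labels):
        if label != -1:
            clusters[label].append(record)
    return [clusters[label] for label in sorted(clusters)]

# Hypothetical NTS records: ((latitude, longitude), timestamp in seconds, user number).
nts = [
    ((39.900, 116.400), 1000, 1),
    ((39.901, 116.401), 1060, 2),
    ((40.200, 117.000), 1100, 3),   # isolated location
    ((39.950, 116.300), 1200, 1),
    ((39.951, 116.301), 1260, 4),
]
cluster_labels = [0, 0, -1, 1, 1]   # hand-written stand-in for the DBSCAN output

C = group_by_cluster(nts, cluster_labels)
print(len(C), [len(area) for area in C])
```

Keeping NTS and the label array in the same order is what makes the one-to-one correspondence between labels and records work.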
3.2. The Hotspot Extraction Algorithm Based on the Sliding Time Window
The DBSCAN algorithm clusters all the locations in a target area without considering the time factor. Therefore, when a small number of users (such as 10 users) accumulate locations in a certain area over a long period of time, the area will be identified as a cluster area, even though it may not be a real gathering area for users.
For example, suppose an area has a small number of user locations per hour (about 10). The cumulative number of locations reaches 240 in 24 h and 1680 in a week, so when the DBSCAN algorithm is applied, the area may be identified as a cluster area. But if the 1680 locations belong to only 10 users, this is clearly not a real gathering area and therefore cannot be called a hotspot.
In the real world, a user hotspot is always associated with time. For example, the student canteen is a hotspot only in the morning, noon, and evening; the student classroom is generally a hotspot during non-mealtimes; a hospital is a hotspot area from 7 a.m. to 10 p.m.; and a dormitory is a hotspot area from 7 p.m. to 11 p.m.
Therefore, we need to determine whether each cluster area based on DBSCAN is a hotspot area according to duration time and number of users in each cluster area.
For example, if a student canteen is a hotspot in a day from 7:30 to 8:30, 11:30 to 12:30, and 18:00 to 19:00, then the duration time of this hotspot area is 3 h.
In order to extract the user number and duration time of each cluster, we designed a feature extraction algorithm of cluster area based on the sliding time window. This algorithm is shown in Algorithm 2.
Firstly, this algorithm defines a time window, denoted by Td, and sets a threshold (Pt) on the number of users in the cluster area. The threshold Pt is defined as the minimum number of users in a given area within a given time window. A cluster area can be identified as a hotspot only when the number of users in the cluster area reaches Pt within a period of time. Based on historical data, this threshold needs to be less than the parameter MinPts in DBSCAN.
In practice, if a cluster area meets the user number threshold only accidentally for a short time, it is not an actual hotspot. To avoid fake hotspots among the cluster areas, we also set a time slice threshold, called St.
The threshold St is used as follows: a cluster area is considered a hotspot area only when the number of users in at least St consecutive time slices is no less than Pt.
Algorithm 2 shows the process of extracting user features and time features of each cluster based on a sliding time window. The idea of this algorithm is as follows:
For each cluster area Ci, the calculation time interval is set to [Ts, Te], where Ts is the start time and Te is the end time.
Starting from Ts, the number of users corresponding to the current cluster within [Ts, Ts + Td] is counted, and the result is saved to UserNum.
If UserNum ≥ Pt, the number of consecutive time slices (called SliceNum) is increased by 1, and the total number of users in this cluster area (UNi) is increased by UserNum.
Further, if SliceNum is equal to St, the duration time of the ith cluster area is updated as DTi ← DTi + SliceNum * Td. If SliceNum is greater than St, DTi ← DTi + Td.
Then, the time window is slid forward by Td, and the above processes are repeated until (Ts + Td) > Te.
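The counting step of this procedure can be sketched as below. It is a minimal sketch under stated assumptions: the function name `users_per_slice` and the sample records are hypothetical, windows are treated as half-open intervals [start, start + Td), and each user is counted once per window.

```python
def users_per_slice(records, ts, te, td):
    """Count distinct users in each time window [start, start + td).

    records: iterable of (timestamp, user) pairs belonging to one cluster area.
    ts, te:  start and end of the calculation interval.
    td:      width of the sliding time window.
    """
    counts = []
    start = ts
    while start < te:
        end = start + td
        # A set keeps each user only once per window.
        users = {user for timestamp, user in records if start <= timestamp < end}
        counts.append(len(users))
        start = end          # slide the window forward by td
    return counts

# Hypothetical records in minutes since the start of the interval.
records = [(0, "a"), (5, "b"), (5, "a"), (12, "a"), (25, "c"), (29, "d")]
counts = users_per_slice(records, ts=0, te=30, td=10)
print(counts)
```

The resulting per-slice counts are what the thresholds on the number of users and on consecutive time slices are then applied to.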
Algorithm 2. The feature extraction algorithm of cluster areas based on the sliding time window
Input: C = {C1, C2, …, C|C|}, Pt, St, Td
Output: CF = {(UNi, DTi)}
1: Initialize UserNum, SliceNum, UNi, DTi
2: For each cluster Ci in C Do
3:   Calculate the maximum value of all timestamps, Te ← max(timestamps)
4:   Calculate the minimum value of all timestamps, Ts ← min(timestamps)
5: For each cluster Ci in C Do
6:   StartTime ← Ts # Set the start time of scanning
7:   EndTime ← Ts + Td # Set the end time of scanning
8:   While StartTime < Te Do
9:     For each location in Ci Do
10:       If StartTime ≤ timestamp of the location < EndTime Then
11:         UserNum ← UserNum + 1
12:     If UserNum ≥ Pt Then
13:       SliceNum ← SliceNum + 1
14:     StartTime ← EndTime
15:     EndTime ← StartTime + Td
16:     If SliceNum ≥ St Then
17:       DTi ← DTi + SliceNum * Td
18:       SliceNum ← 0
19:     UNi ← UNi + UserNum
20:   Put UNi, DTi into CF
21: Return CF
Based on the above processes, we can get the user distribution characteristics (UNi, DTi) of the ith cluster area.
Here, UNi is the sum of the number of users in all time slices, and each user is recorded only once in each time slice.
If Ci meets the two conditions UserNum ≥ Pt and SliceNum ≥ St, the cluster area is called a hotspot area. If DTi is equal to zero, the area is only a cluster area, not a hotspot area, and will not be considered as a candidate location when the base stations are configured.
Table 1 shows the feature extraction process for a cluster area Ci. Here, we assume that Pt = 5, St = 3, Td = 10 min, and the initial number of users is 0.
As we can see from this table, there are 5 users in time slice 1, which meets the Pt requirement. We mark SliceNum as 1, and the number of users is accumulated, that is, UNi = UNi + UserNum = 5.
Sliding the time window forward, we can see there are 6 users in time slice 2, which is greater than Pt, so SliceNum is marked as 2, and the number of users is accumulated again, that is, UNi = UNi + UserNum = 11.
Sliding the time window forward again, there are 7 users in time slice 3, which is greater than Pt. In this case, SliceNum reaches 3 and meets St. The duration time is calculated as follows: DTi = DTi + SliceNum × Td = 3 × 10 = 30 min.
Continuing to slide the time window, there are 2 users in time slice 4, which is less than the Pt threshold. Therefore, the duration time is unchanged, and the count of consecutive time slices starts again.
The above process is repeated until all time slices are processed.
Finally, two characteristics of the cluster area are obtained: UNi = 55, DTi = 70 min.
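The first four time slices of this walk-through can be replayed with a short sketch. It follows the textual rule (add SliceNum × Td when the run of qualifying slices first reaches the slice threshold, then Td for each further qualifying slice) and assumes that the run restarts after a slice below the user threshold. The function name `cluster_features` is hypothetical, and only the four slice counts given in the text (5, 6, 7, 2) are used, with a user threshold of 5, a slice threshold of 3, and a 10 min window.

```python
def cluster_features(counts, pt, st, td):
    """Return (total users, hotspot duration) from per-slice user counts."""
    user_total = 0   # accumulated users over qualifying slices
    duration = 0     # accumulated hotspot duration
    slice_num = 0    # current run of consecutive qualifying slices
    for user_num in counts:
        if user_num >= pt:
            slice_num += 1
            user_total += user_num
            if slice_num == st:
                duration += slice_num * td   # the whole run becomes hotspot time
            elif slice_num > st:
                duration += td               # the run is extended by one slice
        else:
            slice_num = 0                    # assumption: a weak slice ends the run
    return user_total, duration

features = cluster_features([5, 6, 7, 2], pt=5, st=3, td=10)
print(features)
```

For these four slices the sketch accumulates 18 users and 30 min, matching the intermediate values in the walk-through; the remaining slices of Table 1 are not reproduced here.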
3.3. The Location Selection Algorithm of 5G Edge Nodes Based on Hotspots and Benefit Constraint
In practical applications, the primary factor that Telcos consider in building 5G base stations is investment and return. Since 5G base stations adopt edge computing technology, their main function is not only to provide traditional voice communication but also to provide data storage and computing services.
Since the placement of 5G base stations (or edge nodes) involves a variety of factors (such as personnel density, geographical conditions, and economic constraints), this placement work is a complex systems engineering task.
Based on the above analysis, we propose the following placement target of 5G base stations:
- (1) Maximize the high load utilization rate of the base station in the target area.
- (2) Maximize the bandwidth reduction rate of the core network in the target area.
Definition 5. High load utilization rate of base stations.
It refers to the proportion of the time that the base station is in a high load state in one day. We describe it as the ratio of the hotspot duration time to the service time in one day. That is, U(i) = DTi/T.
Here, DTi is the hotspot duration time of the ith base station, and T is the total service time of the ith base station in a day.
So that the high load utilization rates of all base stations are comparable values between 0 and 1, the high load utilization rate of each base station needs to be normalized. Here, we assume that all base stations have the same service time T in a day. The normalized U(i) of the ith base station is represented as follows:
$$U(i) = \frac{DT_i / T}{\sum_{j=1}^{M} DT_j / T} = \frac{DT_i}{\sum_{j=1}^{M} DT_j} \quad (1)$$
Here, M is the number of base stations, and 1 ≤ i ≤ M.
Definition 6. Bandwidth reduction rate of the core network.
It refers to the ratio of the data generated by all users at a new base station to the data generated by all users at all candidate base stations in a day. The reason for this definition is that the data traffic generated in a hotspot area with a new base station will be processed at the edge server without occupying the bandwidth of the core network.
We use B(i) to represent the bandwidth reduction rate of the core network after the base station i is placed. That is,
$$B(i) = \frac{F_i}{\sum_{j=1}^{M} F_j} \quad (2)$$
Here, we assume that the data traffic of hotspot area i is $F_i$ in a day, M is the number of candidate base stations, and 1 ≤ i ≤ M.
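The two rates can be computed with a few lines of code. The sketch below assumes that both quantities are normalized as shares of the total over all M candidates, which is how Definition 6 describes the bandwidth reduction rate; applying the same share-of-total normalization to the utilization rate is an assumption. The function name `normalized_shares` and the sample durations and traffic volumes are hypothetical.

```python
def normalized_shares(values):
    """Divide each value by the sum over all candidates (0.0 if the sum is 0)."""
    total = sum(values)
    if total == 0:
        return [0.0 for _ in values]
    return [v / total for v in values]

# Hypothetical per-hotspot figures for M = 3 candidate areas.
durations_min = [70, 30, 100]     # hotspot duration time per day
traffic_gb = [4.0, 1.0, 5.0]      # data traffic generated per day

U = normalized_shares(durations_min)   # assumed normalized high load utilization rate
B = normalized_shares(traffic_gb)      # bandwidth reduction rate of the core network
print(U, B)
```

Because the common service time T appears in both the numerator and the denominator, it cancels out and only the duration times are needed.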
According to the above description, if we need to deploy N base stations after extracting M(N ≤ M) hotspots in one target area, the base station placement problem can now be described as a mathematical problem: select N locations from M candidate hotspots to deploy base stations to maximize the base station high load utilization rate and core network bandwidth reduction rate of the communication system in the entire target area. That is, the problem can be formalized as the 0–1 linear programming problem.
Firstly, we define a decision variable Si to indicate whether a base station needs to be placed in the candidate hotspot area i. When Si = 1, a base station is placed in the candidate hotspot area i. When Si = 0, it means that the candidate hotspot area i does not place a base station.
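Then, according to the decision variables Si, the total high load utilization rate $\sum_{i=1}^{M} S_i U(i)$ and the total bandwidth reduction rate $\sum_{i=1}^{M} S_i B(i)$ of the selected locations can be calculated.
Finally, by introducing the weighting factors α and β, the problem of selecting the deployment location of 5G base stations can be formalized into Formula (3).
$$\max \sum_{i=1}^{M} S_i \left( \alpha U(i) + \beta B(i) \right) \quad (3)$$
subject to:
$$C1: \sum_{i=1}^{M} S_i \le N$$
$$C2: \alpha + \beta = 1, \quad 0 \le \alpha \le 1, \quad 0 \le \beta \le 1$$
$$C3: S_i \in \{0, 1\}, \quad 1 \le i \le M$$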
In the above formula, the objective function is to maximize the investment return of Telcos by deploying base stations in hotspots with high demand of computing and data storage.
Constraint C1: ensure that the number of base stations deployed is less than or equal to the number that can be built by Telcos.
Constraint C2: α is the weight of the high load utilization rate, β is the weight of the core network bandwidth reduction rate, both values range from 0 to 1, and the sum of α and β is 1. If the telecom operator expects the base station to be in a state of efficient computation for a long time, α should be set to a relatively large value. If the operator wants the base station to handle more data traffic and reduce the load on the core network, β should be set to a relatively large value.
Constraint C3: it specifies the value of the decision variable.
As we can see from Formula (3), this problem is a 0–1 integer linear programming problem, a class of problems that is NP-hard in general. Although it is possible to find the optimal solution by exhaustive search, this is impractical when M is large. Therefore, we design a heuristic greedy algorithm to solve this formula.
This heuristic greedy algorithm is shown in Algorithm 3.
Algorithm 3. The heuristic greedy algorithm of location selection of 5G base station
Input: the features of the M candidate hotspot areas, N, and the weights α and β
Output: S = {P1, P2, …, PN}
1: Initialize RS ← ø, SS ← ø
2: For 1 ≤ i ≤ M Do
3:   Calculate the normalized high load utilization rate U(i)
4:   Calculate the bandwidth reduction rate B(i)
5: For 1 ≤ i ≤ M Do
6:   Calculate the revenue after each base station is placed, Ri ← αU(i) + βB(i)
7:   Put Ri into RS, RS ← RS + {Ri}
8: For 1 ≤ i ≤ N Do
9:   Select the largest element from RS, R' = Max(RS)
10:   Remove element R' from RS, RS ← RS − {R'}
11:   Put R' into SS
12: For 1 ≤ i ≤ N Do
13:   Calculate the center position Pi of the cluster corresponding to each R' from SS
14:   Put Pi into S
15: Return S
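A minimal sketch of the greedy selection is given below, together with an exhaustive check on a toy instance. It assumes that the revenue of candidate i is the weighted sum αU(i) + βB(i) with β = 1 − α; the function names `greedy_select` and `brute_force_best` and the sample rates are hypothetical, and the step that maps a selected hotspot to its cluster center is omitted.

```python
from itertools import combinations

def revenues(U, B, alpha):
    """Weighted revenue of each candidate; beta is taken as 1 - alpha."""
    beta = 1.0 - alpha
    return [alpha * u + beta * b for u, b in zip(U, B)]

def greedy_select(U, B, n, alpha=0.5):
    """Pick the n candidates with the largest revenue; return (indices, total)."""
    r = revenues(U, B, alpha)
    ranked = sorted(range(len(r)), key=lambda i: r[i], reverse=True)
    chosen = sorted(ranked[:n])
    return chosen, sum(r[i] for i in chosen)

def brute_force_best(U, B, n, alpha=0.5):
    """Best total revenue over all subsets of size n (only viable for small M)."""
    r = revenues(U, B, alpha)
    return max(sum(r[i] for i in combo)
               for combo in combinations(range(len(r)), n))

# Hypothetical normalized rates for M = 3 candidate hotspots, N = 2 stations.
U = [0.35, 0.15, 0.5]
B = [0.4, 0.1, 0.5]
chosen, total = greedy_select(U, B, n=2, alpha=0.5)
print(chosen, total)
```

Because the objective is a sum of independent, non-negative per-candidate revenues and the only coupling constraint is the number of stations, selecting the N largest revenues reaches the same total as the exhaustive search on this toy instance.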
4. Our Experiments
To evaluate the research work in this paper, we conducted experiments using a location dataset collected by Microsoft Research Asia (MSRA) in the GeoLife project [29,30]. The GeoLife dataset includes trajectory data of 182 users during 2007–2012, most of which were generated in Beijing. The dataset consists of 17,621 tracks covering 1,292,951 km and 50,176 h.
In the GeoLife dataset, the recording periods of the 182 users differ: some users carried a GPS logger for years, while others have only a few weeks of trajectory data. Each user's trajectories are stored in multiple files according to the recording period. Each location in a trajectory is composed of latitude, longitude, altitude, timestamp, etc.
For the deployment of 5G base stations, we only need to focus on the distribution of users over a single day in a target area. However, because the location distribution of each user differs greatly between working days and non-working days, clustering the user locations of only one day would bias the results.
In addition, the GeoLife dataset does not include enough locations belonging to the same day, so we need to reconstruct the experimental dataset to evaluate our research. Our method is to convert the weekly or monthly location data in the GeoLife dataset into single-day experimental data of different users. With this data reconstruction method, the number of users and locations available for a single day in our experiment is greatly increased.
For example, we can sample 202,693 locations of 20 original users in the GeoLife dataset from a week (such as 23 October 2008, 00:00:00 to 30 October 2008, 23:59:59). These locations can be reconstructed into our experimental dataset including 20 × 7 = 140 users in one day.
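One way to picture this reconstruction is to treat every (original user, calendar day) pair as a separate synthetic user and to keep only the time of day of each record. The sketch below illustrates that idea under these assumptions; the function name `fold_into_one_day` and the sample records are hypothetical, and this is not necessarily the exact procedure used to build the experimental dataset.

```python
from datetime import datetime

def fold_into_one_day(records):
    """Map (user_id, datetime) records onto a single reference day.

    Each (user_id, date) pair becomes one synthetic user, and the timestamp
    is reduced to seconds since midnight.
    """
    folded = []
    for user_id, ts in records:
        synthetic_user = (user_id, ts.date())
        seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
        folded.append((synthetic_user, seconds))
    return folded

# Hypothetical records of two original users over two days.
records = [
    (1, datetime(2008, 10, 23, 8, 0, 0)),
    (1, datetime(2008, 10, 24, 8, 30, 0)),
    (2, datetime(2008, 10, 23, 12, 15, 0)),
    (2, datetime(2008, 10, 23, 18, 0, 0)),
]
folded = fold_into_one_day(records)
synthetic_users = {user for user, _ in folded}
print(len(synthetic_users))
```

Under this mapping, 20 original users observed on 7 different days yield at most 20 × 7 = 140 synthetic users in the single reconstructed day.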
4.1. The Experiment of Clustering Parameters Based on DBSCAN
By analyzing the coordinates of the previous 202,693 locations, we can find that the geographical coordinates of these locations are distributed in the range of latitude 39.8~40.3 and longitude 115.9~117.3.
In terms of geographical (great-circle) distance, the distance between [39.80, 115.9] and [40.30, 117.3] is about 131.5 km. In terms of the Euclidean distance on raw coordinates used in DBSCAN, the distance between the same two points is about 1.49. The ratio of geographical distance to Euclidean distance is therefore approximately 90 km per unit of coordinate distance (one degree).
Therefore, we can approximately correlate DBSCAN’s parameter Eps with the geographic distance. That is, when Eps = 0.005, the corresponding geographical distance is approximately 450 m, and when Eps = 0.003, the corresponding geographical distance is approximately 270 m.
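The conversion between Eps and geographical distance can be checked with a short calculation. The sketch below uses the haversine formula with a mean Earth radius of 6371 km, which is an assumption about how the geographical distance is obtained; the function name `haversine_km` is hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Corner points of the area covered by the sampled locations.
geo_km = haversine_km(39.8, 115.9, 40.3, 117.3)        # geographical distance
deg = math.hypot(40.3 - 39.8, 117.3 - 115.9)           # Euclidean distance in degrees
km_per_degree = geo_km / deg                           # conversion factor

for eps in (0.003, 0.005):
    print(eps, round(eps * km_per_degree * 1000), "m")
```

The factor comes out slightly below 90 km per degree, so the 270 m and 450 m figures are rounded approximations. The factor is also an average: at this latitude one degree of longitude is shorter than one degree of latitude, so the real distance covered by a given Eps depends on direction.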
- (1) The influence of MinPts on clustering results
According to the previous description, the coverage of 5G base stations is smaller than that of 4G base stations in the same geographical location, usually within 200~500 m. So, we select three different values of Eps (0.003, 0.004, and 0.005) to examine the influence of MinPts on clustering results. The experimental results are shown in Figure 2.
Through the above experiment, we can find that the clustering results fluctuate up and down when MinPts is in [50, 400], and the clustering results are in a descending state when MinPts is in [400, 1600]. When MinPts = 1200, the clustering results are similar for Eps = 0.003, 0.004, and 0.005.
- (2)
The influence of Eps on clustering results
To further verify the influence of Eps on clustering results, we conducted clustering experiments with Eps ranging from 0.001 to 0.006 and MinPts equal to 200, 400, and 600. The experimental results are shown in Figure 3.
This experiment shows that when MinPts is in [600, 700], the clustering results for different Eps fluctuate only slightly and remain relatively stable. When Eps = 0.0055 (about 500 m), the clustering results are similar for MinPts = 200, 400, and 600. When Eps = 0.005 (about 450 m), the clustering results are similar for MinPts = 800 and 1200.
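The two parameter sweeps above follow the same pattern: run DBSCAN for each (Eps, MinPts) pair and record the number of clusters. The sketch below illustrates that pattern with a small brute-force DBSCAN written in plain Python. It is not the implementation used in the experiments; the synthetic points, the tiny MinPts values, and the function names are assumptions chosen so that the example runs in well under a second.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (brute force)."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points)
            if math.hypot(x - xi, y - yi) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0..k-1 for clusters, NOISE otherwise."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE              # may later become a border point
            continue
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels

def count_clusters(labels):
    return len(set(labels) - {NOISE})

def sweep(points, eps_values, min_pts_values):
    """Number of clusters for every (Eps, MinPts) combination."""
    return {(eps, m): count_clusters(dbscan(points, eps, m))
            for eps in eps_values for m in min_pts_values}

# Synthetic data: two dense groups of 30 points plus two isolated points.
points = [(39.90 + 0.0001 * k, 116.40) for k in range(30)]
points += [(40.00 + 0.0001 * k, 116.50) for k in range(30)]
points += [(39.50, 116.00), (40.20, 117.00)]

results = sweep(points, eps_values=[0.001, 0.003], min_pts_values=[5, 10, 40])
for (eps, m), n in sorted(results.items()):
    print(f"Eps={eps}, MinPts={m}: {n} clusters")
```

On real data with hundreds of thousands of locations, the brute-force neighbor search would be too slow, and a spatial index would normally replace `region_query`.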
4.2. The Feature Extraction Experiment of Clustering Results
Our experiment environment is a Personal Computer, Windows 10, Intel (R) Core (TM)
[email protected] processor, 16 GB RAM, and Python programming language.
- (1)
Test the runtime of the clustering algorithm based on DBSCAN
With Eps = 0.002 and 0.004 and MinPts = 800, we separately tested the clustering time of DBSCAN. The test results are shown in Figure 4. In this test, we select the data of 20 users from 1 to 9 days in the GeoLife dataset; the numbers of locations are 13,060, 37,517, 86,130, 126,173, 149,941, 175,440, 202,693, 228,854, 259,522, and 316,436.
We can observe from Figure 4 that the clustering time increases exponentially when Eps = 0.004. Therefore, we only selected Eps = 0.002 and extracted about 200,000 locations to test our methods in the following experiments.
- (2)
The influence of Pt on the hotspot extraction
When Eps = 0.002 and MinPts = 400, we cluster 202,693 locations using the DBSCAN algorithm. The clustering result includes 31 cluster areas and 1 discrete area.
We test the number of hotspots extracted at different Pt with Td = 0.5 h, 1 h, and 2 h, respectively. The test results are shown in Figure 5a for St = 2 and in Figure 5b for St = 3.
In Figure 5, we can observe the following: only 1 hotspot area is extracted from the 31 clustering areas when St = 2 and Pt ≥ 8; 8~10 hotspots are extracted from the 31 clusters when St = 2 and Pt = 2; only 1 hotspot area is extracted from the 31 clustering areas when St = 3 and Pt ≥ 6; and 5~8 hotspots are extracted from the 31 clusters when St = 3 and Pt = 2.
Clearly, the larger St is, the fewer hotspots can be extracted, and the larger Pt is, the fewer hotspots can be extracted. So, we can control the number of hotspots by setting St and Pt.
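The way the thresholds interact can be illustrated with a small sketch. The exact definitions of St, Pt, and Td are given earlier in the paper and are not repeated in this section, so the code below rests on an explicit assumption: Td is the length of a time slot in hours, Pt is the minimum number of distinct users that must be present in a cluster during a slot, and St is the minimum number of consecutive slots that must meet the Pt threshold. The function name and the input layout are likewise hypothetical.

```python
import math
from collections import defaultdict

def extract_hotspots(visits, td_hours, pt, st):
    """Return the ids of clusters that qualify as hotspots.

    visits: iterable of (cluster_id, user_id, hour_of_day), 0 <= hour < 24.
    Assumed rule: a cluster is a hotspot when at least `pt` distinct users
    are present in each of at least `st` consecutive slots of `td_hours`.
    """
    users_per_slot = defaultdict(set)  # (cluster, slot) -> set of user ids
    for cluster_id, user_id, hour in visits:
        slot = int(hour // td_hours)
        users_per_slot[(cluster_id, slot)].add(user_id)

    n_slots = math.ceil(24 / td_hours)
    clusters = sorted({c for c, _ in users_per_slot})
    hotspots = []
    for c in clusters:
        run = best = 0
        for slot in range(n_slots):
            if len(users_per_slot.get((c, slot), ())) >= pt:
                run += 1
                best = max(best, run)
            else:
                run = 0  # the streak of busy slots is broken
        if best >= st:
            hotspots.append(c)
    return hotspots

# Toy data: cluster 0 is busy for three hours, cluster 1 only briefly.
visits = [
    (0, 1, 8.0), (0, 2, 8.5), (0, 1, 9.2), (0, 2, 9.7), (0, 1, 10.1), (0, 2, 10.3),
    (1, 3, 8.0), (1, 4, 8.2), (1, 3, 9.0),
]
for pt in (1, 2, 3):
    print("Pt =", pt, "->", extract_hotspots(visits, td_hours=1, pt=pt, st=2))
```

Under this assumed rule, raising either Pt or St can only shrink the set of hotspots, which matches the trend reported for Figure 5.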
- (3)
The influence of Td on the hotspot extraction
We cluster our experimental dataset with Eps = 0.002 and MinPts = 400 and test the number of hotspots at different Td. Figure 6a and Figure 6b show the experimental results when St = 2 and St = 3, respectively. From this experiment, we can determine that Td is also a key factor in extracting the hotspots.
When Pt = 2 and Pt = 3, the maximum number of hotspots is extracted at Td = 6 in Figure 6a and at Td = 2 in Figure 6b. Because the duration of a hotspot can be computed as St × Td, these maxima correspond to durations of 2 × 6 = 12 h and 3 × 2 = 6 h, respectively. This experiment preliminarily verifies the rule that hotspots should last 6–12 h.
4.3. The Location Selection Experiment of Base Station Placement
Our experiments test the high load utilization rate of the base station and the bandwidth reduction rate of the core network under different methods. The baseline methods [19,20] used for comparison include:
- (1)
Top-U. This method [20] takes the hotspots with the Top N number of users as candidate locations of base stations. In this strategy, more users mean more data traffic in the base station, so the bandwidth savings on the core network are the greatest when 5G edge nodes are placed at these candidate locations.
- (2)
Top-T. In this method [20], the hotspots with the Top N duration time are used as candidate locations of base stations. The longer a hotspot lasts, the longer it remains in a state of high demand, so edge nodes deployed at the corresponding 5G base stations obtain a higher utilization rate.
- (3)
Random. This method randomly selects N hotspots in the candidate hotspots to deploy the 5G base stations.
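The three baselines can be sketched in a few lines. The `Hotspot` record and its fields are hypothetical names introduced for this illustration; the sketch only assumes that each candidate hotspot carries a user count and a duration.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Hotspot:
    name: str
    users: int         # number of users observed in the hotspot
    duration_h: float  # how long the hotspot lasts, in hours

def top_u(hotspots, n):
    """Top-U: the n hotspots with the most users."""
    return sorted(hotspots, key=lambda h: h.users, reverse=True)[:n]

def top_t(hotspots, n):
    """Top-T: the n hotspots with the longest duration."""
    return sorted(hotspots, key=lambda h: h.duration_h, reverse=True)[:n]

def random_pick(hotspots, n, seed=None):
    """Random: n hotspots drawn uniformly without replacement."""
    return random.Random(seed).sample(list(hotspots), n)

candidates = [
    Hotspot("A", users=40, duration_h=6),
    Hotspot("B", users=25, duration_h=12),
    Hotspot("C", users=10, duration_h=8),
    Hotspot("D", users=30, duration_h=4),
]
print("Top-U :", [h.name for h in top_u(candidates, 2)])
print("Top-T :", [h.name for h in top_t(candidates, 2)])
print("Random:", [h.name for h in random_pick(candidates, 2, seed=0)])
```

In this toy data, Top-U and Top-T pick disjoint sets, which shows why a method that weighs both criteria can fall between the two baselines on each metric.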
We cluster our experimental dataset with Eps = 0.002 and MinPts = 400 and extract some hotspots from these clusters using St = 2, Pt = 1, and Td = 2. Based on these hotspots, we test the high load utilization rate and the bandwidth reduction rate under four strategies.
Figure 7a shows the effect of the number of 5G edge nodes deployed on the high load utilization rate of base stations. As we can see from this figure, when the number of edge nodes deployed is the same, Top-T has the highest high load utilization rate, followed by our method (here, α = β = 0.5); Top-U ranks third, and the random method is the lowest. When α = 1, our method is equivalent to Top-T. Further tests show that when the number of deployed edge nodes is the same, the high load utilization rate of our method can be up to 7.69% higher than that of Top-U.
Figure 7b shows the effect of the number of edge nodes deployed on the bandwidth reduction rate of the core network. As we can see from Figure 7b, when the same number of edge nodes are deployed, the core network bandwidth reduction rate of Top-U is the highest; our method is the second highest (here, the weight parameters are set to α = β = 0.5), Top-T is the third, and the random method is the lowest. When β = 1, our method is equivalent to Top-U. Compared with Top-T, the bandwidth reduction rate of the core network in our method can be improved by up to 6.34%.
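The role of the two weights can be illustrated with a simple scoring sketch. The actual scoring function is defined earlier in the paper and is not reproduced in this section, so the linear combination of normalized duration and normalized user count below is an assumption, as are the function name and the tuple layout. It is chosen only because it reproduces the limiting behavior described in the text: a duration weight of 1 gives the Top-T ranking, and a user weight of 1 gives the Top-U ranking.

```python
def weighted_selection(hotspots, n, alpha=0.5, beta=0.5):
    """Pick the n hotspots with the highest weighted score.

    hotspots: list of (name, users, duration_h) tuples.
    alpha weighs the normalized duration, beta the normalized user count
    (an assumed scoring form, not taken from the paper).
    """
    max_users = max(h[1] for h in hotspots)
    max_duration = max(h[2] for h in hotspots)

    def score(h):
        return alpha * h[2] / max_duration + beta * h[1] / max_users

    return sorted(hotspots, key=score, reverse=True)[:n]

candidates = [("A", 40, 6), ("B", 25, 12), ("C", 10, 8), ("D", 30, 4)]

balanced = [h[0] for h in weighted_selection(candidates, 2, 0.5, 0.5)]
like_top_t = [h[0] for h in weighted_selection(candidates, 2, 1.0, 0.0)]
like_top_u = [h[0] for h in weighted_selection(candidates, 2, 0.0, 1.0)]
print(balanced, like_top_t, like_top_u)
```

With equal weights, the toy example selects one hotspot favored by duration and one favored by user count, which is the kind of compromise between the two baselines that Figure 7 reports.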
In addition, further analysis of the two experiments shows that as the number of edge nodes increases, the high load utilization rate and the bandwidth reduction rate both increase rapidly at the beginning. However, when the number of 5G edge nodes deployed reaches about 80% of the number of candidate base stations, the growth of the bandwidth reduction rate slows down. The benefit of deploying each additional 5G edge node therefore decreases beyond this point.
Figure 8a shows the clustering results of our reconstructed experimental data based on DBSCAN with Eps = 0.002 and MinPts = 400. Figure 8b shows the center coordinates of hotspot areas extracted from the clustering results. These coordinates will eventually be used as candidate locations for deploying 5G edge nodes.
In Figure 8a, the dark blue lines are the trajectories formed by the users’ locations, and the colored dots are the clustering centers. In Figure 8b, the green lines are the users’ trajectories, and the blue dots are the center coordinates of hotspot areas extracted from clusters.