1. Introduction
In recent decades, satellite constellation networks have been developed to provide ground traffic services, such as air traffic monitoring [1] and ship trajectory identification [2], with continuous global coverage, effectively supplementing coverage-limited terrestrial networks. Since the traffic distribution is geographically non-uniform, e.g., aircraft traffic is dense over population agglomerations while scarce over vast ocean regions, the satellite coverage traffic volume (SCTV) changes drastically as the satellite moves. To better fulfill the ground traffic service task, it is necessary to predict SCTV in advance so that the resource-limited satellite network can allocate onboard resources dynamically, e.g., provide more power for payloads to work at full capacity. With the auxiliary information of the predicted upcoming ground traffic, He et al. [3] optimize the frequency reuse and onboard transmit power. Moreover, Yu et al. [4] adaptively adjust the onboard receiver configuration to improve the overall signal detection probability. To predict SCTV, traditionally a global SCTV distribution data table is first statistically constructed on the ground from historical data and uploaded to the satellite. SCTV is then predicted onboard by table lookup. To update the table with dynamically accumulated data, the SCTV data table must be uploaded each time the satellite passes over the ground station. Moreover, better SCTV prediction calls for more data to construct a table with fine resolution. The resulting large data transmission and storage requirements are prohibitive for satellite communication and onboard data handling. To solve these problems, this paper proposes to distill the data into a surrogate model to be uploaded to the satellite, which can both save the valuable communication link resource (as far fewer surrogate model parameters need to be uploaded) and improve the SCTV prediction accuracy compared to table lookup.
Recently, surrogate modeling, also known as meta-modeling, has been widely studied in data-driven modeling; common methods include Polynomial Response Surface (PRS) [5], Kriging [6,7], Radial Basis Function (RBF) [8], and Support Vector Regression (SVR) [9]. Song et al. [10] compare the performance of PRS, RBF, Kriging, and SVR in the design optimization of foam-filled tapered structures, and their results show that no single model is the best for approximating all objective functions in the considered problems. Forrester and Keane [11] review different meta-modeling methods used in surrogate-based optimization and recommend that the choice of surrogate be based on the problem size, the expected complexity, and the cost of the analyses. Similarly, Bhosekar et al. [12] investigate recent advances in surrogate models for problems in modeling, feasibility analysis, and optimization, and conclude that the correct selection of surrogates should consider the type of problem at hand. The consensus of this previous research is that each surrogate has its own strengths and drawbacks, and different surrogates are suitable for approximating different objective functions [13]. For the SCTV prediction problem with geographically changing distribution features, the experimental study in Section 4 likewise shows that a single surrogate can hardly perform universally well.
To increase the approximation quality of the surrogate model, much research has been conducted into integrating multiple surrogates into a single ensemble to exploit the advantages of different surrogates for better approximation accuracy and robustness. One popular ensemble method is the weighted sum approach [14]. Goel et al. [15] study the effectiveness of the weighted aggregation method for the approximation of helicopter vibrations. Wang et al. [16] employ the weighted average surrogate to solve the problem of computationally expensive function evaluations in optimization. Gu et al. [17] construct an ensemble of PRS, Kriging, RBF, and SVR for the approximation of an occupant protection system. In ensemble modeling, on the one hand, the prediction accuracy of the ensemble is greatly influenced by the performance of the contributing surrogates. Viana et al. [18] find that adding inaccurate surrogates to the ensemble is likely to result in loss of accuracy. On the other hand, the weight factors also have a significant effect on the prediction accuracy of the ensemble. To obtain an ensemble with better performance, some research focuses on how to determine appropriate weights considering the regional characteristics of the objective function to be approximated or predicted [19]. Zhang et al. [20] determine the weight of each contributing surrogate based on the local measure of accuracy in the pertinent trust region. Yin et al. [21] divide the design space into multiple sub-domains, each of which is assigned a set of optimized weights. These optimized weights are determined by minimizing the error metric of the training points in the corresponding sub-domains. Lee et al. [22] propose a pointwise ensemble which calculates the weights based on the v-nearest points cross-validation error. Although these studies can enhance the positive effects of the accurate contributing surrogates by increasing their weights in the local area, they do not completely eliminate the negative influences of the inaccurate contributing surrogates, leading to relatively low accuracy of the ensemble model [23]. Moreover, for the SCTV prediction problem, the specific features of the practical problem should be considered for effective ensemble modeling.
In this paper, an effective surrogate ensemble modeling method is proposed for SCTV prediction. First, the global earth surface domain is split into multiple sub-domains according to the prior geographical knowledge of the SCTV distribution, and multiple different candidate surrogates are then constructed on each sub-domain. Second, to fully exploit these surrogates and combine them into a more accurate ensemble, a partial weighted aggregation method (PWTA) is developed. Because each sub-domain has distinct SCTV features and different surrogates perform differently, PWTA adaptively selects, for each sub-domain, the candidate surrogates with higher accuracy as the contributing models (the inaccurate surrogates, which would contribute negatively, are eliminated), and the ultimate ensemble for that sub-domain is constructed from them. In this way, each sub-domain has its own positively contributing surrogates and weights, so that each ensemble is better suited to its sub-domain and captures the regional SCTV features better. The proposed method makes two main contributions: (a) instead of constructing candidate surrogates in the global domain, multiple independent candidate surrogates are built for each sub-domain; (b) in each sub-domain, rather than integrating all the candidate surrogates into an ensemble, the candidate surrogates are adaptively selected as contributing surrogates to construct a single ensemble.
The rest of the paper is organized as follows. Section 2 briefly reviews PRS, Kriging, RBF, the weighted aggregation method, and the BestGMSE surrogate. Section 3 describes the satellite coverage traffic volume model and develops the partial weighted surrogate ensemble modeling method in detail. In Section 4, the proposed surrogate ensemble modeling method is tested on the SCTV prediction problem with engineering data, followed by the conclusions in the final section.
3. Satellite Coverage Traffic Volume Modeling and Prediction Approximation
In this section, SCTV is described as the objective function to be modeled with respect to the ground sites as the input based on the historical data. To improve the balance between the SCTV prediction accuracy and data transmission as well as storage efficiency, an effective surrogate ensemble modeling method is proposed to approximate the objective function, which mainly includes two parts. First, the global earth surface domain is divided into multiple sub-domains according to specific SCTV distribution features. Second, for each sub-domain, multiple different candidate surrogates are established, and a multi-surrogate management method is developed to adaptively select the contributing surrogates and combine them into a single ensemble with better performance.
3.1. Satellite Coverage Traffic Volume Modeling
Given any ground site
, where
and
are the geographical longitude and latitude, the ground traffic density can be statistically obtained according to the historical data, denoted as
. For the SCTV calculation, the ground traffic density in the satellite coverage region
around the site
should all be considered. First, the area
of the coverage region can be calculated by
where
is the radius of the earth,
and
are the altitude and half-beam angle of the satellite with the corresponding nadir point site
.
is the geocentric half-cone-angle of the coverage region. The diagram of the satellite coverage region on the earth’s surface is presented in
Figure 1. The SCTV
is defined as
where
are the ground sites which belong to the satellite coverage region
around the nadir point site
. A detailed solution process for SCTV can be found in the literature [25]. Note that for most ground traffic services, SCTV varies greatly with geographical location. Taking air traffic monitoring as an example, the air traffic SCTV data downloaded from TianTuo-3 (National University of Defense Technology, Changsha, China) are shown in Figure 2 (the National University of Defense Technology developed and launched the TianTuo-3 micro-satellite in May 2014, which achieves worldwide collection of the air traffic SCTV data [26]). It can be seen that aircraft traffic is dense over population agglomerations while scarce over vast ocean regions.
To predict SCTV, traditionally a global SCTV distribution data table is first statistically constructed on the ground from historical data and uploaded to the satellite. SCTV is then predicted onboard by table lookup. Where the SCTV distribution is scarce, the satellite payload is preferably kept at low power or shut down to save onboard resources. Where the SCTV distribution is dense, however, the payload should be maintained at high power (at full capacity) for better reception of the real-time signals. To update the data table with dynamically accumulated data, the SCTV data table must be uploaded each time the satellite passes over the ground station. Moreover, better SCTV prediction calls for more data to construct a table with fine resolution. The resulting large data transmission and storage requirements are prohibitive for satellite communication and onboard data handling. To solve these problems, this paper proposes to distill the data into a surrogate model to be uploaded to the satellite. By sampling a small number of training points with the corresponding SCTV responses, the surrogate can be constructed on the ground. The surrogate is then uploaded to the satellite instead of the SCTV data table. When the satellite passes over the ground station, only a few parameters of the surrogate need to be updated. In this way, the valuable communication link resource is saved, and the SCTV prediction accuracy can be improved compared to table lookup. Moreover, for satellite missions that collect ground traffic data, such as air traffic monitoring and ship trajectory identification, the SCTV is related only to the ground traffic. Thus, the proposed method and SCTV prediction results can be directly generalized to more complex satellite constellations.
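To make the upload-size argument concrete, the following sketch counts the values in a global lookup table at a given resolution and the coefficients of a full polynomial surrogate, and shows a simple containing-cell table lookup. The grid layout, the function names, and the choice of a quadratic PRS as the comparison are illustrative assumptions; the parameter counts of Kriging or RBF models depend on the number of training points and are not covered here.

```python
import math

def table_entries(resolution_deg):
    """Number of cells in a global longitude/latitude table at the given resolution."""
    n_lon = round(360.0 / resolution_deg)
    n_lat = round(180.0 / resolution_deg)
    return n_lon * n_lat

def prs_coefficients(n_inputs, degree):
    """Number of coefficients of a full polynomial of the given degree."""
    return math.comb(n_inputs + degree, degree)

def table_lookup(table, lon_deg, lat_deg, resolution_deg):
    """Return the SCTV of the cell containing the site.

    table[i][j] is assumed to hold latitude band i (from -90 deg upward)
    and longitude band j (from -180 deg eastward).
    """
    n_lat, n_lon = len(table), len(table[0])
    i = min(int((lat_deg + 90.0) / resolution_deg), n_lat - 1)
    j = min(int((lon_deg + 180.0) / resolution_deg), n_lon - 1)
    return table[i][j]

print(table_entries(1.0))      # 64800 cells at 1 degree resolution
print(prs_coefficients(2, 2))  # 6 coefficients for a quadratic in (longitude, latitude)
```

Halving the cell size quadruples the table, whereas the number of polynomial coefficients depends only on the polynomial degree.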
3.2. Surrogate Ensemble Modeling for SCTV Prediction
To further improve the SCTV prediction accuracy, an enhanced surrogate ensemble modeling method is investigated by dividing the global earth surface domain into multiple sub-domains and managing multiple surrogates in each sub-domain. The main idea of the proposed surrogate ensemble modeling method is to allow each sub-domain to have independent contributing surrogates and weight factors so that the ensembles are more suitable for the corresponding sub-domains. Compared with direct modeling in the global domain, the proposed method seeks to better capture the local characteristics of the objective function in each sub-domain with different SCTV features.
The prior knowledge of the SCTV features in the global domain, namely the geographical knowledge of the SCTV distribution, is generally known from historical experience and can be used to guide the sub-domain division effectively [27]. Since aircraft traffic is dense over population agglomerations while scarce over ocean regions, the global earth surface domain is split into 12 sub-domains: (a) North America; (b) Pacific Ocean; (c) Antarctica; (d) Western Europe; (e) Caribbean Sea; (f) South America; (g) Atlantic and Western Africa; (h) Russia; (i) Middle East; (j) Indian Ocean; (k) East Asia; (l) Oceania, as shown in Figure 3 (the global coastline is drawn in MATLAB (MathWorks, Natick, MA, USA)). Of these, (b) the Pacific Ocean, (c) Antarctica, (g) Atlantic and Western Africa, and (j) the Indian Ocean have scarce traffic due to the large marine area. Hence, the satellite payload can be kept at low power or shut down over these sub-domains, and there is no need to construct surrogates for them. For the other eight regions, the onboard resources need to be allocated dynamically, and in this paper surrogates are built for these eight regions. Note that for different satellite missions, the geographical distribution of SCTV presents different characteristics. Thus, the division of the global earth surface domain with prior knowledge should be based on the specific mission background. For example, ship traffic is dense in vast ocean regions where aircraft traffic is scarce, and therefore the focus of the division should differ for these two missions. In this paper, surrogate modeling for air traffic SCTV prediction is studied for illustration.
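As an illustration of how a ground site could be mapped to its sub-domain onboard, the sketch below uses rectangular longitude/latitude boxes. The box coordinates are hypothetical placeholders chosen only for this example (the actual boundaries are those of Figure 3), and only three of the 12 sub-domains are listed; the set of low-traffic sub-domains follows the text above.

```python
# Hypothetical (lon_min, lon_max, lat_min, lat_max) boxes in degrees.
# These numbers are placeholders for illustration, not the paper's boundaries.
SUBDOMAIN_BOXES = {
    "East Asia": (100.0, 150.0, 15.0, 55.0),
    "Oceania": (100.0, 180.0, -50.0, 15.0),
    "Indian Ocean": (40.0, 100.0, -50.0, 15.0),
}

# Sub-domains with scarce air traffic, for which no surrogate is built.
LOW_TRAFFIC = {
    "Pacific Ocean",
    "Antarctica",
    "Atlantic and Western Africa",
    "Indian Ocean",
}

def subdomains_of(lon_deg, lat_deg, boxes=SUBDOMAIN_BOXES):
    """Return every sub-domain whose box contains the site (bounds inclusive)."""
    return [
        name
        for name, (lon0, lon1, lat0, lat1) in boxes.items()
        if lon0 <= lon_deg <= lon1 and lat0 <= lat_deg <= lat1
    ]

def needs_surrogate(name):
    """True if the payload must be managed dynamically in this sub-domain."""
    return name not in LOW_TRAFFIC
```

Because the bounds are inclusive, a site on a shared edge is reported for every sub-domain that touches it, which is the case that requires special handling at sub-domain boundaries.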
Due to the geographical distribution variances of SCTV, each sub-domain has distinct SCTV features. Ensemble modeling is an effective way to enhance the surrogate accuracy in each sub-domain [18]. The commonly used approach is the weighted aggregation method described in Section 2.4. However, when a weighted ensemble is formed by Equation (13) as the weighted sum of all the candidate surrogates, an inaccurate surrogate may be included, which leads to loss of accuracy. Therefore, instead of employing all the candidate surrogates, this paper proposes to select only the more accurate ones as contributing models to constitute the ensemble, which is named the partial weighted aggregation method (PWTA). The details of PWTA are as follows.
First, for the
sub-domain, construct multiple different candidate surrogates and, as the criterion to measure accuracy, calculate the GMSE of each surrogate by leave-one-out cross-validation using Equation (15). The surrogates are then ranked in ascending order of GMSE, and the ranked candidate model set is denoted as
with the corresponding GMSE and predictor sets denoted as
and
. Here,
is the total number of the candidate surrogates. To choose the relatively more accurate surrogates from the candidate set
so as to compose a more accurate ensemble, the first issue is to define the number of contributing surrogates
to be selected. Two points should be taken into consideration when setting the threshold for the “more accurate” candidate selection. On the one hand, because each sub-domain has distinct SCTV features and different surrogates perform differently, the threshold should be decided adaptively in each sub-domain rather than simply fixed as a specific number or ratio. On the other hand, considering that the GMSE values of the inaccurate surrogates might deviate greatly from those of the accurate surrogates, the threshold can be determined by borrowing the idea of outlier identification [28] so as to rationally screen out the surrogates with low accuracy (i.e., comparatively large GMSE values). According to these considerations, the number of contributing surrogates
to be selected in the
sub-domain is set as
where
is a user-defined control parameter. From our numerical experience, it is appropriate to take 1 to 3 for
. When the value of
is small, fewer candidate surrogates may be chosen as positively contributing models. With a larger
value, more candidate surrogates are likely to be selected into the ultimate ensemble.
and
are the mean and standard deviation of the GMSE set in the
sub-domain. To eliminate the negative effect of the GMSE values of inaccurate models and thus obtain a robust estimate,
and
are solved using the first
(
denotes the rounding operation) elements of the set
[29]
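The selection step can be sketched as follows. This is a minimal illustration that assumes the threshold takes the usual outlier-screening form "robust mean plus a multiple of the robust standard deviation", with both statistics computed from the better half of the ranked GMSE values. The use of the ceiling for the rounding operation, the population standard deviation, the control parameter name `c`, and the function name are all assumptions made for this example.

```python
import math
import statistics

def select_contributing(gmse_by_name, c=2.0):
    """Return the names of the contributing surrogates, best first.

    gmse_by_name: mapping from surrogate name to its leave-one-out GMSE.
    c: user-defined control parameter (the text suggests values from 1 to 3).
    """
    # Rank the candidates in ascending order of GMSE.
    ranked = sorted(gmse_by_name.items(), key=lambda item: item[1])
    # Robust statistics from the better half only (assumed rounding: ceiling).
    k = math.ceil(len(ranked) / 2)
    head = [g for _, g in ranked[:k]]
    mu = statistics.mean(head)
    sigma = statistics.pstdev(head)  # population standard deviation (assumption)
    threshold = mu + c * sigma
    # Keep every candidate whose GMSE does not exceed the threshold.
    return [name for name, g in ranked if g <= threshold]

gmse = {"PRS": 0.90, "Kriging": 0.100, "RBF": 0.112, "SVR": 0.110}
print(select_contributing(gmse, c=2.0))  # ['Kriging', 'SVR', 'RBF']
```

In this toy example the PRS model, whose GMSE is far above the others, is screened out, while a smaller `c` such as 0.5 keeps only the Kriging model. The best-ranked surrogate is always retained because its GMSE cannot exceed the mean of the better half.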
From Equations (19) and (20) it can be observed that different sub-domains may have different types as well as different numbers of contributing surrogates. To combine the independent contributing models into the ultimate ensemble for SCTV prediction in each sub-domain, the associated weights are calculated for the
sub-domain by
and the prediction value of the ensemble in the
sub-domain is
After the surrogate ensemble modeling procedure, eight ensembles are obtained for the eight sub-domains. Notice that for points at the boundary of the sub-domains which belong to two or more sub-domains at the same time, their SCTV values are determined by averaging the predictions of the ensembles from the sub-domains sharing these boundaries.
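The aggregation and boundary handling can be sketched as below. The weight formula here is an assumption: weights inversely proportional to GMSE and normalized to sum to one, a common heuristic for weighted-average surrogates, used only to make the example concrete. The predictor interface (a callable of longitude and latitude) and all function names are likewise hypothetical.

```python
def inverse_gmse_weights(gmse_values):
    """Weights inversely proportional to GMSE, normalized to sum to 1 (assumed form)."""
    inverse = [1.0 / g for g in gmse_values]
    total = sum(inverse)
    return [v / total for v in inverse]

def ensemble_predict(predictors, weights, lon_deg, lat_deg):
    """Weighted sum of the contributing surrogates' predictions at one site."""
    return sum(w * f(lon_deg, lat_deg) for f, w in zip(predictors, weights))

def predict_sctv(ensembles, lon_deg, lat_deg):
    """Average the ensembles of all sub-domains that contain the site.

    ensembles: callables (lon, lat) -> SCTV, one per sub-domain containing
    the site; more than one only for sites on a shared boundary.
    """
    values = [ensemble(lon_deg, lat_deg) for ensemble in ensembles]
    return sum(values) / len(values)

# Toy usage with constant predictors standing in for trained surrogates.
weights = inverse_gmse_weights([0.1, 0.3])
east = lambda lon, lat: ensemble_predict(
    [lambda x, y: 4.0, lambda x, y: 8.0], weights, lon, lat
)
south = lambda lon, lat: 7.0
boundary_value = predict_sctv([east, south], 120.0, 15.0)
```

For an interior site the list holds a single ensemble, so the average reduces to that ensemble's prediction.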