A New Robust Lunar Landing Selection Method Using the Bayesian Optimization of Extreme Gradient Boosting Model (BO-XGBoost)

Wen, Shibo; Wang, Yongzhi; Gong, Qizhou; Liu, Jianzhong; Kang, Xiaoxi; Liu, Hengxi; Chen, Rui; Zhu, Kai; Zhang, Sheng

doi:10.3390/rs16193632

by Shibo Wen ¹, Yongzhi Wang ¹,²,*, Qizhou Gong ³, Jianzhong Liu ⁴,⁵, Xiaoxi Kang ⁶,⁷, Hengxi Liu ¹, Rui Chen ⁸, Kai Zhu ⁴,⁵ and Sheng Zhang ⁴,⁵

¹ College of Geoexploration Science and Technology, Jilin University, Changchun 130026, China
² Institute of Integrated Information for Mineral Resources Prediction, Jilin University, Changchun 130026, China
³ College of Instrumentation and Electrical Engineering, Jilin University, Changchun 130061, China
⁴ Center for Lunar and Planetary Science, Institute of Geochemistry, Chinese Academy of Sciences, Guiyang 550081, China
⁵ CAS Center for Excellence in Comparative Planetology, Chinese Academy of Sciences, Hefei 230026, China
⁶ Deep Space Exploration Laboratory, Beijing 100043, China
⁷ Lunar Exploration and Space Engineering Centre, China National Space Administration, Beijing 100190, China
⁸ State Key Laboratory of Lunar and Planetary Sciences, Macau University of Science and Technology, Macau 999078, China
* Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(19), 3632; https://doi.org/10.3390/rs16193632 (registering DOI)

Submission received: 1 September 2024 / Revised: 23 September 2024 / Accepted: 27 September 2024 / Published: 29 September 2024

(This article belongs to the Section Satellite Missions for Earth and Planetary Exploration)

Abstract:

The safety of lunar landing sites directly impacts the success of lunar exploration missions. This study develops a data-driven predictive model based on machine learning, focusing on engineering safety to assess the suitability of lunar landing sites and provide insights into key factors and feature representations. Six critical engineering factors were selected as constraints for evaluation: slope, elevation, roughness, hillshade, optical maturity, and rock abundance. The XGBoost model was employed to simulate and predict the characteristics of landing areas, and Bayesian optimization was used to fine-tune the model’s key hyperparameters, enhancing its predictive performance. The results demonstrate that this method effectively extracts relevant features from multi-source remote sensing data and quantifies the suitability of landing zones, achieving an accuracy of 96% in identifying landing sites (at a resolution of 0.1° × 0.1°), with AUC values exceeding 95%. Notably, slope was recognized as the most critical factor affecting safety. Compared to assessment processes based on Convolutional Neural Networks (CNNs) and Random Forest (RF) models, XGBoost showed superior performance in handling missing values and in the accuracy of its feature importance evaluation. The findings suggest that the BO-XGBoost model shows notable classification performance in evaluating the suitability of lunar landing sites, which may provide valuable support for future landing missions and contribute to optimizing lunar exploration efforts.

Keywords:

moon; landing site prediction; feature importance; XGBoost; Bayesian optimization

1. Introduction

Driven by scientific curiosity, technological advancement, and the potential for utilizing lunar resources and establishing future human habitation, lunar exploration has remained a top priority in the global space arena. As early as the 1950s, multiple countries had proposed concepts for lunar exploration and resource utilization [1]. The first successful soft landing, by the Soviet Union’s Luna 9 spacecraft in 1966, marked the beginning of humanity’s lunar landing effort [2]. The subsequent Apollo missions also conducted multiple soft landings on the Moon and manned explorations [3]. Over the decades, numerous successful robotic and crewed lunar landing missions, along with laboratory analyses of returned lunar samples and meteorites, have greatly enhanced our understanding of the Moon’s geology, topography, and environment [4]. Notably, in 2020, the Chang’e-5 mission accomplished the first unmanned sample return since the Apollo era, collecting 1731 g of lunar samples that revealed the distribution and evolutionary history of lunar materials and provided new evidence for studying the origin and evolution of the Moon [5,6].

Currently, lunar landing exploration is shifting towards a more comprehensive and forward-looking direction. In addition to ensuring safe landings, scientists are increasingly focused on the potential of landing sites to provide valuable scientific information and resources, such as geological structure, heat flow, and mineral deposits. This focus aims to enhance our understanding of the Moon’s formation and evolution while laying the groundwork for future lunar development and utilization [7,8]. Therefore, accurately predicting, evaluating, and identifying the optimal landing area that meets scientific exploration objectives, engineering constraints, and resource potential has become a critical step in ensuring the success of lunar exploration missions. However, the selection of optimal landing sites on the lunar surface is a complex and multifaceted challenge, requiring the integration of various factors, including terrain characteristics, environmental conditions, and resource availability.

From an engineering perspective, landing areas must ensure safe and reliable operations [9,10]. Remote sensing and geological surveys provide a thorough understanding of a target region’s terrain and rock distribution, helping to identify relatively flat and safe landing sites. This ensures successful landing, mobile exploration, and instrument operation [11]. Scientific value is equally important. Different regions may be suitable for different scientific exploration targets, and areas with a smaller spatial range containing multiple scientific objectives are considered ideal landing sites. For example, the near side of the Moon is suitable for studying the Moon–Earth plasma, while the far side is better for very-low-frequency radio surveys [12,13]. The youngest or oldest regions on the surface can help establish an accurate absolute chronology [14]. Although the polar regions pose challenges due to their complex terrain and lighting conditions, their unique scientific value has drawn widespread attention [15]. While choosing lunar landing sites requires balancing multiple factors, engineering feasibility remains crucial.

Currently, methods for selecting lunar landing sites primarily rely on manual analyses of lunar exploration data. This involves multi-factor overlay and weighted scoring of high-resolution images, topographic maps, slope, temperature, and other data, combined with expert knowledge and judgment to assess landing suitability [16]. However, this process is often tedious and time-consuming, making it challenging to identify optimal landing points quickly. Moreover, subjective factors may influence human analysis, complicating the precise identification and evaluation of subtle but critical topographic features. Researchers are also exploring the use of machine learning to enhance the selection process. As a data-driven technique, machine learning can automatically and effectively learn features from large and complex datasets, executing higher-level abstractions to reveal potential patterns and trends [17], thereby aiding in the prediction and classification of lunar landing zones. Even with small sample datasets, intelligent feature learning can significantly improve classification performance [18,19,20].

Various machine learning-based assessment methods have emerged for evaluating the suitability of potential landing areas, including fuzzy cognitive and selection models, hierarchical clustering analyses based on topographic data, K-means clustering models based on slope data, and optimal selection models based on multi-factor evidence layer weighting and fractals [18,19,20]. New methods, such as CNN-based assessments of lunar south pole landing suitability, also show great promise. Generative adversarial networks (GANs) can also simulate potential landing scenarios, providing further insights into site viability [21,22]. These optimized methods have preliminarily established a practical quantitative evaluation framework for lunar landing site selection, indicating that automated optimal solutions for lunar landing site selection are indeed feasible. However, existing methods are mainly limited by their inability to apply uniform standards to assess the complex and diverse lunar environments and features, as well as their coarse analytical resolution, which fails to capture rapidly changing surface conditions. Furthermore, rapid and accurate predictions are crucial when selecting suitable lunar landing points from large, complex datasets, especially when the interpretability of the results is a crucial requirement.

The XGBoost algorithm is a particularly effective approach among various machine learning models. As an ensemble learning method based on Gradient Boosting Decision Trees (GBDT), XGBoost typically offers high predictive accuracy when dealing with intricate datasets [23]. XGBoost significantly enhances training speed through parallel computation and cache optimization. In contrast, CNNs, which rely on complex convolution operations and extensive parameter tuning, often require longer training on large-scale image datasets. Additionally, XGBoost incorporates built-in regularization techniques that effectively mitigate overfitting, improving the model’s generalization capability and making it particularly suitable for data-driven lunar landing site prediction. Another advantage of XGBoost is its ability to handle missing values automatically, which reduces the need for extensive data preprocessing. Furthermore, XGBoost provides feature importance assessments, allowing users to gain insights into the model’s decision-making process. Its support for distributed computing enables the handling of larger datasets effectively. Given these advantages, this study employed the XGBoost algorithm for the prediction and evaluation of lunar landing sites. The main objectives of this study were as follows:

To develop a machine learning-based predictive model for future lunar landing site selection from an engineering safety perspective and conduct a landing suitability assessment.
To provide insights into the key factors and feature representations that significantly enhance both landing exploration safety and the overall predictive accuracy of the model.
To evaluate the performance of the proposed model against established criteria and expert-driven methods, as well as compare it with other models such as the CNN model.

The results of this research are expected to enhance the optimization of future lunar exploration missions, improve their safety and success rates, and improve long-term planning and strategic decision-making regarding lunar settlement and resource utilization.

2. Materials and Methods

2.1. Engineering Dataset

To ensure engineering safety and data availability, this study selected six key topographic factors to quantify the favorability of exploration missions, as shown in Table 1. These factors include the Digital Elevation Model (DEM), slope, roughness, hillshade, rock abundance, and optical maturity (OMAT).

The rationale behind selecting these factors lies in their direct impact on the safety and stability of landing sites. The DEM and slope provide an overall understanding of the lunar surface’s topographic characteristics, helping to assess terrain undulation and the flatness of a landing area, which are critical for ensuring the stability and safety of the landing process. The DEM data were derived from joint mapping by the Lunar Orbiter Laser Altimeter (LOLA) and Selenological and Engineering Explorer (SELENE), producing a near-global lunar DEM with a resolution of approximately 512 pixels per degree (ppd) [24], thus providing high-precision topographic information essential for landing site selection. The slope data were based on global maximum slope data (in degrees) from the SLDEM shape data at a resolution of 512 ppd [25].

Excessive hillshade can obstruct communication and solar power generation, limiting lander performance. Thus, a hillshade map derived from the LOLA data was included [26]. High rock density poses risks to the lander, so rock abundance data were obtained from maps of the lunar surface temperature generated by the Diviner Lunar Radiometer Experiment on the Lunar Reconnaissance Orbiter (LRO), covering the region of 70°S–70°N, 180°W–180°E from July 2009 to July 2022 [26]. To cover more of the lunar surface, Diviner rock abundance data from 5 July 2009 to 30 November 2010 for the regions of 70°S–80°S and 70°N–80°N were used [27]. Excessive surface roughness can impact landing stability, so roughness data derived from LOLA at a 100 m scale were included [28]. OMAT indicates the weathering degree of a landing area surface, which affects the performance of the visual navigation system. The OMAT data were derived from the lunar images captured by the Ultraviolet/Visible (UVVIS) camera on the Clementine satellite [29].

Table 1. The six key engineering datasets used in this study and their sources.

	Name	Initial Resolution (ppd)	Source
1	LRO LOLA SLDEM	512	https://pgda.gsfc.nasa.gov/products/54, accessed on 7 July 2024
2	LRO LOLA Slope	512	https://imbrium.mit.edu/DATA/SLDEM2015_SLOPE/, accessed on 7 July 2024
3	LRO LOLA Hillshade	512	https://trek.nasa.gov/moon/TrekWS/rest/cat/metadata/fgdc/html?label=LRO_LOLAKaguya_ClrHillshade_60N60S_512ppd, accessed on 7 July 2024
4	LRO Diviner Rock Abundance	128	https://dataverse.ucla.edu/dataset.xhtml?persistentId=doi:10.25346/S6/LFAVXU, accessed on 7 July 2024 https://pds-geosciences.wustl.edu/lro/urn-nasa-pds-lro_diviner_derived1/data_derived_gdr_l3/, accessed on 7 July 2024
5	LRO LOLA Roughness	256	https://pds-geosciences.wustl.edu/lro/lro-l-lola-3-rdr-v1/lrolol_1xxx/data/lola_gdr/, accessed on 7 July 2024
6	Clementine UVVIS OMAT	512	[25]

2.2. Data Standardization and Preprocessing

Using ArcGIS 10.8 software, the rock abundance and roughness data were resampled to 512 ppd using the bilinear interpolation method, and all data were standardized to the GCS_MOON_2000 coordinate system (EPSG: 103881). The original data were further processed to quantify the vague engineering constraint indicators to effectively train the dataset, facilitating a more in-depth analysis. Given the lack of a unified evaluation standard for lunar engineering constraints and the insufficient quantification of the relationship between different indicators, this study proposes a constraint indicator evaluation method based on existing lunar research [5,8,11,30,31,32,33].

Each constraint indicator was divided into five levels, and a non-linear scoring system was adopted to assign the scores of different levels: highly suitable (10), suitable (9), moderately suitable (3), slightly unsuitable (2), and unsuitable (1). The specific quantification standards for each engineering constraint are provided in Table 2. The final total score was obtained by taking the arithmetic mean of the scores for each constraint. This scoring method allowed for iterative adjustments of indicator weights during model training, enabling the model to learn scientifically reasonable relationships between the indicators. However, the existing scoring method resulted in discontinuities in the final scores, making it impossible to apply the XGBoost algorithm directly for modeling. To address this issue, the original scores were re-encoded so that the target variable could be directly input into the XGBoost model for multi-class prediction.
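The scoring and re-encoding steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-factor scores for the three cells are hypothetical (they are not taken from Table 2), and mapping each distinct mean score to a consecutive integer label is only one plausible reading of "re-encoded". The sketch reflects the fact that multi-class XGBoost objectives expect integer labels 0, 1, ..., K-1.

```python
# Hypothetical per-factor level scores for three grid cells (NOT from Table 2).
# Assumed order: slope, DEM, roughness, hillshade, rock abundance, OMAT.
cells = [
    [10, 9, 9, 10, 9, 9],
    [3, 9, 2, 9, 3, 9],
    [1, 2, 1, 3, 2, 1],
]

def total_score(factor_scores):
    """Arithmetic mean of the constraint scores, rounded to limit float noise."""
    return round(sum(factor_scores) / len(factor_scores), 4)

def encode_scores(scores):
    """Map each distinct score to a consecutive integer class label (0, 1, 2, ...)."""
    classes = sorted(set(scores))
    lookup = {score: label for label, score in enumerate(classes)}
    return [lookup[s] for s in scores], classes

totals = [total_score(c) for c in cells]
labels, classes = encode_scores(totals)
print(totals)   # mean score per cell
print(labels)   # integer class label per cell
```

The `classes` list keeps the mapping back from label to score, so predicted labels can be converted to suitability scores again when mapping the results.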

2.3. Prediction Model Construction

As shown in Figure 1, the research first selected an area with comprehensive data coverage and extensive prior study (65°W–65°E, 45°N–15°S) and assigned scores to the regional data (Table 2) to create a dataset for training and testing the model. The rationale for this selection included the following:

The original data from various datasets had good coverage for this region, providing a comprehensive and reliable training dataset for the model.
This area has been frequently chosen as a landing site for lunar missions, such as those of the Apollo, Surveyor, Luna, and Chang’e 3 programs, indicating its suitability for successful landings.
Extensive research and landing activities in this region suggest favorable engineering conditions, allowing the model to learn from previous explorations and improve its understanding of optimal landing site selection.

Following common practices in machine learning, the dataset corresponding to the training area was randomly divided into a training set and a testing set at a 7:3 ratio. The choice of 70% for the training set struck a balance between effective model training and adequate validation. This proportion allowed the model to learn diverse patterns while retaining 30% of the data for rigorous testing, helping to assess generalization and detect overfitting [34]. The model’s performance was evaluated using various metrics on the testing set to predict its potential effectiveness on external datasets. After completing model training and evaluation, the optimized model was used to predict landing suitability across the entire lunar surface. Subsequently, the feature weights for each characteristic were also calculated to quantify the impact of various engineering constraint indicators on the final landing site scores. During training, the Bayesian optimization method was employed to optimize hyperparameters, such as the number of iterations, learning rate, and decision tree behavior, guiding the XGBoost model towards optimal performance. The effectiveness of the Bayesian optimization was assessed using 5-fold cross-validation. Model predictions were implemented using Jupyter and Python 3.10.14.
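A random 7:3 split of this kind can be sketched with the standard library alone. The function name, the seed, and the sample count below are assumptions made for illustration; the article does not state a seed or whether the split was stratified by class.

```python
import random

def train_test_split_indices(n_samples, train_fraction=0.7, seed=42):
    """Shuffle sample indices and cut them into training and testing subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # seeded for reproducibility (assumed)
    cut = int(round(n_samples * train_fraction))
    return indices[:cut], indices[cut:]

# Hypothetical dataset of 1000 grid cells.
train_idx, test_idx = train_test_split_indices(1000)
print(len(train_idx), len(test_idx))
```

Splitting index lists rather than the data itself keeps the feature rows and their labels aligned when both are later selected with the same indices.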

2.3.1. XGBoost Algorithm

XGBoost is a machine learning algorithm based on the gradient boosting framework that is widely used for classification and regression tasks. It utilizes a second-order Taylor expansion to optimize the loss function, allowing for a more accurate approximation of the true loss function [23]. Compared to traditional GBDT, XGBoost features several optimizations, including support for multi-threaded CPU parallelization and a regularization term in the loss function to prevent overfitting. These improvements give XGBoost significant advantages in computational efficiency and generalization performance. The mathematical expression of the XGBoost model is as follows:

{\hat{y}}_{i} = \sum_{k = 1}^{K} f_{k} (x_{i}), f_{k} \in F

(1)

In Equation (1), $\hat{y}_{i}$ represents the predicted value for the $i$-th sample, $K$ denotes the number of decision trees, $x_{i}$ represents the input data for the $i$-th sample, and $f_{k}(x_{i})$ represents the output of the $k$-th decision tree, generated in the $k$-th iteration, where $f_{k}$ is a function in the function space $F$ of the tree ensemble.

The objective of the model is to iteratively optimize the weak classifiers $f_{k}$ to minimize the overall loss function:

L(\phi) = \sum_{i = 1}^{N} l(y_{i}, \hat{y}_{i}) + \sum_{k = 1}^{K} \Omega(f_{k}) = \sum_{i = 1}^{N} l[y_{i}, \hat{y}_{i}^{(t - 1)} + f_{t}(x_{i})] + \sum_{k = 1}^{K} \Omega(f_{k})

(2)

In Equation (2), $l$ is the loss function that describes the error between the predicted and actual values, while $\Omega$ is the regularization term used to control model complexity, constrain the tree structure, and prevent overfitting [35]. XGBoost trains the model parameters by iteratively optimizing this objective function.
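For reference, the second-order Taylor expansion mentioned at the start of this subsection can be written out explicitly. The expressions below follow the standard XGBoost formulation in [23] and are not stated in this article. The symbols $g_{i}$ and $h_{i}$ (first and second derivatives of the loss), $T$ (number of leaves), $w$ (leaf weights), $I_{j}$ (samples falling in leaf $j$), and the penalties $\gamma$ and $\lambda$ are introduced here for illustration.

```latex
L^{(t)} \approx \sum_{i=1}^{N} \left[ l(y_{i}, \hat{y}_{i}^{(t-1)}) + g_{i} f_{t}(x_{i}) + \tfrac{1}{2} h_{i} f_{t}^{2}(x_{i}) \right] + \Omega(f_{t}),
\qquad
g_{i} = \frac{\partial l(y_{i}, \hat{y}_{i}^{(t-1)})}{\partial \hat{y}_{i}^{(t-1)}},
\quad
h_{i} = \frac{\partial^{2} l(y_{i}, \hat{y}_{i}^{(t-1)})}{\partial (\hat{y}_{i}^{(t-1)})^{2}}

\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^{2},
\qquad
w_{j}^{*} = - \frac{\sum_{i \in I_{j}} g_{i}}{\sum_{i \in I_{j}} h_{i} + \lambda}
```

Under this formulation, a larger $\lambda$ shrinks every optimal leaf weight $w_{j}^{*}$ towards zero, which is one way the regularization term limits overfitting.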

2.3.2. Bayesian Optimization Algorithm

The Bayesian optimization algorithm is an effective global optimization technique that is widely used to find the global optimal solution for complex problems [35,36]. In this study, the Bayesian optimization algorithm (Algorithm 1) was employed to determine the optimal hyperparameters of the XGBoost landing zone prediction model. This algorithm utilizes a Gaussian process model, comprehensively considering the previous parameter information and continuously updating prior knowledge to identify the best combination of hyperparameters.

Algorithm 1 Main steps of Bayesian optimization process.

1. Initialize the hyperparameter vector $x_{0}$.
2. For the iteration step $t = 1, 2, \dots$:
a. Select the next “most promising” evaluation point by maximizing the acquisition function: $x_{t} = \arg\max_{x \in \chi} \alpha(x \mid D_{1:t-1})$
b. Evaluate the target function at the selected point: $y_{t} = f(x_{t}) + \varepsilon_{t}$
c. Update the dataset: $D_{1:t} = D_{1:t-1} \cup \{(x_{t}, y_{t})\}$, and update the probabilistic proxy model.
3. End for.

To enhance the optimization process, we specifically chose ‘Expected Improvement’ as the acquisition function. This choice allowed the algorithm to balance exploration and exploitation by considering the uncertainty of the current model. Additionally, we set the Bayesian optimization process to run for 50 iterations to sufficiently explore the hyperparameter space and identify the best combinations. Through this iterative process, Bayesian optimization effectively explored the parameter space to find the globally optimal combination of hyperparameters. Compared to simple grid or random search, this method determined the optimal parameters more efficiently and has been widely applied in various complex optimization problems.
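The way Expected Improvement trades off exploration and exploitation can be illustrated with its closed form under a Gaussian posterior. This is a sketch under stated assumptions, not the authors' code: the candidate names, posterior means and standard deviations, the incumbent value, and the `xi` margin are all hypothetical, and the Gaussian process fitting that would normally supply these values is omitted.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best, xi=0.0):
    """EI for maximization, given the surrogate's posterior mean and std at a point."""
    if sigma <= 0.0:
        return max(0.0, mu - best - xi)   # no uncertainty: plain improvement
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * normal_cdf(z) + sigma * normal_pdf(z)

# Hypothetical posterior (mean, std) of cross-validated accuracy for three
# candidate hyperparameter settings; values are illustrative only.
candidates = {
    "A": (0.950, 0.002),   # slightly below the incumbent, low uncertainty
    "B": (0.940, 0.030),   # lower mean, high uncertainty
    "C": (0.958, 0.001),   # slightly above the incumbent, very low uncertainty
}
best_so_far = 0.955

ei = {name: expected_improvement(m, s, best_so_far) for name, (m, s) in candidates.items()}
next_point = max(ei, key=ei.get)   # step 2a of Algorithm 1: maximize the acquisition
print(ei, next_point)
```

In this toy case the uncertain candidate B is selected even though its posterior mean is the lowest, because its wide posterior leaves more room for a large improvement. That is the exploration side of the balance described above.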

2.3.3. Evaluation Metrics

This study primarily employed two evaluation metrics: accuracy and the Receiver Operating Characteristic (ROC) curve [35]. In classification tasks, accuracy is commonly used to evaluate model performance. Specifically, accuracy refers to the proportion of samples the model correctly classifies out of all samples. It reflects the overall performance across all classes, and the formula is as follows:

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(3)

where $TP$ (True Positive) is the number of truly positive samples correctly predicted as positive, $TN$ (True Negative) is the number of truly negative samples correctly predicted as negative, $FP$ (False Positive) is the number of samples truly negative but incorrectly predicted as positive, and $FN$ (False Negative) is the number of samples truly positive but incorrectly predicted as negative. In multi-class classification, accuracy is typically calculated as the proportion of correctly classified samples across all classes relative to the total number of samples.
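The multi-class form of accuracy described above reduces to counting exact label matches. The following minimal sketch uses hypothetical true and predicted class labels for eight samples.

```python
def accuracy(y_true, y_pred):
    """Proportion of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical class labels for eight samples.
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]
acc = accuracy(y_true, y_pred)
print(acc)  # 6 of 8 correct -> 0.75
```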

The ROC curve is a widely used method for evaluating the performance of classification models. It plots the relationship between the True Positive Rate (TPR) and the False Positive Rate (FPR), demonstrating the model’s performance at different thresholds. TPR, also known as sensitivity or recall, represents the proportion of positive samples that are correctly identified (Equation (4)), and FPR represents the proportion of negative samples incorrectly identified as positive (Equation (5)).

\mathrm{TPR} = \frac{TP}{TP + FN}

(4)

\mathrm{FPR} = \frac{FP}{FP + TN}

(5)
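Equations (4) and (5) can be evaluated for one class of a multi-class problem by treating that class as positive and all others as negative. The sketch below uses hypothetical labels and a function name chosen for illustration. It yields a single (FPR, TPR) point, whereas a full ROC curve is obtained by repeating the count at every score threshold.

```python
def one_vs_rest_rates(y_true, y_pred, positive_class):
    """Return (TPR, FPR) with `positive_class` as positive and all other classes as negative."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive_class and p == positive_class:
            tp += 1
        elif t != positive_class and p == positive_class:
            fp += 1
        elif t == positive_class:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

# Hypothetical class labels for eight samples.
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]
tpr, fpr = one_vs_rest_rates(y_true, y_pred, positive_class=2)
print(tpr, fpr)  # TP=2, FN=1, FP=1, TN=4
```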

The Area Under the Curve (AUC) for each class indicates the overall performance of the model, with values ranging from 0 to 1. A higher AUC signifies better model performance, an AUC of 0.5 indicates no classification ability, and an AUC of 1 indicates perfect classification.

While the ROC curve is primarily used in binary classification to evaluate the relationship between TPR and FPR, its calculation and interpretation become more complex in multi-class classification. In such cases, individual ROC curves can be computed for each class (one-vs-rest), or micro-averaged and macro-averaged ROC curves can be calculated. The micro-averaged ROC aggregates the TP, FP, FN, and TN across all samples to derive the overall TPR and FPR, generating a comprehensive performance curve. The macro-averaged ROC calculates the ROC curve and AUC for each class individually and then averages these values to reflect overall performance across classes. To effectively represent the classification performance of the model, this study computed individual ROC curves and AUC values for each class. Additionally, the micro-averaged ROC curve was calculated by aggregating TP, FP, FN, and TN across all samples, providing an overall ROC curve and AUC value to represent the model’s performance.
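The difference between per-class and micro-averaged AUC can be sketched as follows. This is an illustrative stand-in, not the authors' evaluation code: the class probabilities are hypothetical, and AUC is computed through its rank interpretation (the probability that a positive sample is scored above a negative one, with ties counted as one half), which equals the area under the ROC curve.

```python
def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def per_class_auc(y_true, proba, n_classes):
    """One-vs-rest AUC for each class separately."""
    return [
        binary_auc([1 if t == c else 0 for t in y_true], [row[c] for row in proba])
        for c in range(n_classes)
    ]

def micro_average_auc(y_true, proba, n_classes):
    """Pool every (sample, class) pair into one binary problem, then compute AUC."""
    flat_labels, flat_scores = [], []
    for t, row in zip(y_true, proba):
        for c in range(n_classes):
            flat_labels.append(1 if t == c else 0)
            flat_scores.append(row[c])
    return binary_auc(flat_labels, flat_scores)

# Hypothetical predicted class probabilities for four samples and three classes.
y_true = [0, 1, 2, 1]
proba = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
]
class_aucs = per_class_auc(y_true, proba, 3)
micro_auc = micro_average_auc(y_true, proba, 3)
print(class_aucs, micro_auc)
```

In this toy example every per-class AUC is 1.0 while the micro-averaged AUC is lower, because pooling compares scores across classes and the last sample assigns a higher probability to a wrong class than to its true one. The pairwise loop is quadratic in the sample count, so a sorted-rank implementation would be preferable for datasets of realistic size.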

2.3.4. Feature Importance

Feature importance is a key metric for evaluating the contribution of features in a model’s classification and prediction process [37]. It reflects the degree of influence a feature has on a model’s predictive performance, with higher importance indicating a greater contribution to the model’s decision-making. Tree-based machine learning algorithms provide an effective tool for computing feature importance. The underlying principle is to measure the importance of a feature based on the information gained from its use as a split point in constructing the decision tree. Specifically, during the model training process, the more a feature is used as a split point, the higher its importance, as the choice of the split point reflects the feature’s ability to differentiate the target variable. By analyzing and ranking the importance of the feature, we can gain deeper insights into the relative influence of various engineering factors on the prediction of the target variable. This not only aids in feature selection and model optimization but also contributes to a better understanding of the problem domain and the acquisition of valuable insights.
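A split-based importance of this kind can be sketched by accumulating a value per feature over all splits and normalizing. The split records below are hypothetical, and the article does not state which importance type was used. XGBoost itself exposes several, including split count ("weight") and gain-based variants, so the summed-gain version here is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical split records: (feature used at the split, loss reduction achieved).
splits = [
    ("slope", 4.0), ("hillshade", 3.0), ("slope", 2.0),
    ("rock_abundance", 2.5), ("omat", 2.0), ("roughness", 1.5), ("dem", 1.0),
]

def importance(split_records, by="gain"):
    """Normalized importance per feature, by summed gain or by split count."""
    totals = defaultdict(float)
    for feature, gain in split_records:
        totals[feature] += gain if by == "gain" else 1.0
    norm = sum(totals.values())
    return {feature: value / norm for feature, value in totals.items()}

gain_importance = importance(splits, by="gain")
count_importance = importance(splits, by="count")
ranking = sorted(gain_importance, key=gain_importance.get, reverse=True)
print(ranking)
```

The two variants can rank features differently: counting splits rewards features that are used often, while summing gain rewards features whose splits reduce the loss the most.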

2.3.5. Baseline Methods

To evaluate the performance of the BO-XGBoost model and gain further insights into the effectiveness of different machine learning approaches, we compared it with two commonly used models: the Attention Mechanism Convolutional Neural Network (Attn-CNN) and the Random Forest model [38,39,40]. Both models have broad applications in classification and regression problems, providing valuable benchmarks for our research. We utilized the same training dataset as that for XGBoost. We employed the evaluation metrics outlined in Section 2.3.3 to assess the performance of these methods in the context of engineering-constrained lunar landing site selection and suitability prediction.

The Attn-CNN model is a deep learning architecture that is well suited to processing image data. The model constructed in this study incorporated an attention mechanism and consisted of two convolutional layers. We employed the Adam optimizer for gradient descent, optimizing the learning rate through a cosine annealing algorithm to enhance the model’s performance. To enable a detailed comparison, the Attn-CNN model was trained for 100 epochs using the same dataset as that for BO-XGBoost.
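The cosine annealing schedule mentioned above follows a simple closed form, sketched here without any deep learning framework. Only the 100 training epochs come from the text. The maximum and minimum learning rates are hypothetical, and the schedule is assumed to run once over the whole training period without restarts.

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate decaying from lr_max to lr_min along half a cosine period."""
    cos_term = 1.0 + math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term

# 100 epochs as stated in the text; lr_max = 1e-3 is an assumed value.
schedule = [cosine_annealing_lr(epoch, 100, lr_max=1e-3) for epoch in range(101)]
print(schedule[0], schedule[50], schedule[100])
```

The rate changes slowly at the start and end of training and fastest in the middle, which gives large early steps and fine adjustments near convergence.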

Random Forest is an ensemble learning technique that utilizes multiple decision trees for classification and regression, offering advantages in handling high-dimensional data, assessing feature importance, and mitigating overfitting. However, it overemphasizes highly correlated features and may perform poorly on small sample datasets. To optimize the classification performance of the Random Forest, we implemented a Bayesian optimization algorithm over 50 iterations, evaluating the optimization outcomes through 5-fold cross-validation in each iteration. Bayesian optimization focused on key model parameters, such as the number of trees, maximum depth, and the minimum samples required for node splitting, aiming to balance model performance with computational efficiency. This approach helped identify the optimal parameter settings suitable for various feature combinations and data structures.
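The 5-fold cross-validation used to score each candidate hyperparameter setting can be sketched as below. This is a minimal illustration under stated assumptions: the function names, the seed, and the placeholder `evaluate` callback are hypothetical, and the actual model fitting is left out. The mean score returned by `cross_val_mean` plays the role of the target function value in the Bayesian optimization loop.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # near-equal fold sizes
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def cross_val_mean(evaluate, n_samples, k=5):
    """Average a user-supplied score over the k train/validation splits."""
    scores = [evaluate(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

# Placeholder evaluation: a real one would fit a model on `train` and
# return its accuracy on `val`. Here it returns the validation share.
def validation_share(train, val):
    return len(val) / (len(train) + len(val))

splits = list(k_fold_indices(103, k=5))
mean_score = cross_val_mean(validation_share, 103, k=5)
print([len(val) for _, val in splits], mean_score)
```

Each sample serves as validation data exactly once, so the averaged score uses all of the training data while never testing a model on samples it was fitted to.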

3. Results

3.1. Model Performance

After constructing the XGBoost model based on Bayesian optimization, we evaluated its performance using accuracy and the ROC curve. The accuracy was 96%, indicating strong classification performance. As shown in Figure 2, the eight class labels correspond to the specific scores in the landing area assessment process. The AUC values for the model were all greater than 95% across different classes, and the ROC curves generally converged towards the top-left corner of the plot, indicating that the BO-XGBoost model could accurately perform the scoring tasks for the different classes within the training region (Figure 2a). Furthermore, Figure 2b demonstrates the model’s overall scoring capability for the area. The micro-averaged ROC calculation yielded a high AUC value of 97.46%, with the curve lying close to the top-left corner, confirming excellent scoring capabilities within the training region. These results showcase the strong classification performance of the BO-XGBoost model, both in terms of individual class assessments and the overall scoring capability for the region of interest. However, while there were highly suitable landing zones based on individual data (Table 2), the ROC curve evaluation results (Figure 2a) contained no class with a score of 10. This indicated that, when considering the six categories of engineering data in this study together, no landing zones fully met all criteria. This limitation must be taken into account in future site selection efforts.

3.2. Analysis of Feature Importance

The model evaluation indicated that the BO-XGBoost model performed well in the lunar landing site suitability assessment, effectively identifying suitable landing sites and explaining the relationships between various engineering factors and landing sites. However, different engineering factors had varying degrees of influence on landing site selection, making it essential to understand their importance.

Figure 3 shows the feature importance of each engineering factor. Among the six influencing factors selected in this study, the order of importance was slope > hillshade > rock abundance > OMAT > roughness > DEM. Among them, slope was the most critical, accounting for 26.14% of the weight, highlighting its significant impact on landing safety. The importance of slope has been demonstrated in multiple studies on landing site selection [41,42,43,44], and the weight in this study further confirms its significance. Steep slopes pose challenges for vertical descent, horizontal position control, and lander stability. Additionally, rugged terrain can cause the lander to tilt or roll over, jeopardizing the entire landing process. Therefore, a landing area with a relatively gentle slope is crucial for safe landings.

The importance of hillshade (20.24%) indicates that terrain occlusion significantly affects landing site selection. The lunar surface features complex terrain with numerous mountains, craters, and other geomorphological elements. A landing area surrounded by excessive terrain occlusion can severely impact a lander’s positioning, navigation, wireless communication, and solar panel illumination—all critical for mission success. Thus, choosing a relatively flat area with minimal terrain occlusion is essential to ensure that a lander can complete its critical tasks.

The weights of rock abundance and OMAT were 17.37% and 17.08%, respectively, indicating that geological characteristics are also significant factors in landing site selection. High rock abundance increases the risk of obstacles during landing, while low optical maturity suggests a thicker regolith layer, potentially causing the lander to sink into the lunar soil. Selecting an area with relatively dispersed rocks and a thinner regolith layer can help ensure a smooth landing and safe post-landing activities.

Roughness carried a weight of 13.42%, making it another important consideration. An excessively rugged and uneven surface can create significant challenges for a lander’s descent and subsequent operations, increasing the risk of landing failure. Choosing a relatively flat and smooth area is crucial for stable landings. DEM had a lower weight of only 5.76%. This may be attributed to the relatively small absolute elevation differences on the lunar surface, making elevation less critical than other factors. However, selecting an appropriate DEM range remains important to ensure a safe landing and adequate power supply for landers.

3.3. Prediction Results

3.3.1. Lunar-Wide Prediction

After evaluating the model’s performance, we mapped the predicted landing suitability for both the training region and the entire lunar surface (Figure 4) to demonstrate the model’s ability to assess landing suitability based on engineering constraints. In the selected area containing multiple verified landing sites (65°W–65°E, 45°N–15°S, within the black square in Figure 4), most regions received high suitability scores, indicating that they are appropriate for landing. All successfully landed missions are located within these suitable areas. Suitable landing zones are primarily found in flat regions, such as large plains and open spaces, where the terrain is relatively even, facilitating safe landings and subsequent operations. Additionally, low-slope areas help reduce impact risks during landing, and all confirmed landing sites are situated in these low-slope regions. In contrast, areas deemed unsuitable by the model generally corresponded to complex terrains, such as crater edges and heavily impacted regions. Steep slopes and canyons were high-risk areas, consistent with existing research on landing site selection.
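To make the role of slope in these scores concrete, the sketch below looks up the score of a single slope value using the slope grades listed in Table 2. It is an illustrative assumption about how such a lookup could be written, not the study's implementation; the function name `slope_score` and the handling of out-of-range values (returning `None`) are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of the slope grades and their scores, copied from Table 2.
# Grades are half-open [lower, upper), except the last, which is closed at 81.
SLOPE_EDGES = [0.001, 3.2, 7.9, 12.8, 19.9]
SLOPE_SCORES = [10, 9, 3, 2, 1]
SLOPE_MAX = 81.0


def slope_score(slope_value):
    """Return the Table 2 score for a slope value, or None if out of range."""
    if slope_value < SLOPE_EDGES[0] or slope_value > SLOPE_MAX:
        return None  # hypothetical choice: values outside the table get no score
    # bisect_right counts how many lower bounds are <= slope_value.
    return SLOPE_SCORES[bisect_right(SLOPE_EDGES, slope_value) - 1]


for value in (2.0, 3.2, 10.0, 15.0, 81.0):
    print(value, slope_score(value))
```

A value of 3.2 falls in the second grade because each grade includes its lower bound and excludes its upper bound, as the interval notation in Table 2 indicates.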

In the context of China’s lunar exploration program, over a hundred potential landing sites have been proposed. To avoid overlap with existing missions and with sites of similar geological structure, 50 landing sites were retained, forming a database for crewed lunar exploration [19,45]. According to the results, all retained landing sites fall outside unsuitable areas. However, three pre-selected landing zones received lower scores (indicated by the red circles), suggesting that their suitability needs further consideration. Additionally, four pre-selected sites lay outside the scope of this study and could not be analyzed. The planned landing region of the U.S. Artemis program near the south pole also falls outside the prediction area of this study and therefore could not be analyzed.

3.3.2. Evaluation of Recommended Landing Zones

This study evaluated regions repeatedly recommended for their high scientific value as pre-selected landing sites, including the Aristarchus crater on the Moon’s near side and the Moscoviense basin and Orientale basin on the far side [46,47]. We utilized resolutions of 0.5° × 0.5° and 0.1° × 0.1° to assess these areas and calculate their corresponding suitable landing areas (Figure 5).

The Aristarchus crater (23.7°N, 312.6°E) is a well-preserved Copernican impact crater with a diameter of 40 km, located on the Aristarchus plateau. Accurate dating of this crater will help constrain the crater size–frequency distribution (CSFD). This area exhibits evidence of explosive and effusive volcanic activity, including volcanic breccias, lunar rilles, and intrusive features [48,49,50]. The rich volcanic units provide a unique perspective for studying partially molten materials within the Moon. Due to its relatively young formation history, the crater has experienced minimal erosion, and its central peak and walls display layered rock formations along with ejecta from the Mare Imbrium, which may expose mantle materials. Thus, as a landing site, the Aristarchus crater offers insights into questions regarding volcanic activity, magma ocean differentiation [51], and the Moon’s deep composition. According to the results shown in Figure 5, 93% of the area around the Aristarchus crater was deemed suitable for landing at a resolution of 0.5° × 0.5°. This share increased to 96% when assessed at a resolution of 0.1° × 0.1°. These assessments, based on six engineering factors, indicated extremely favorable geological features and environmental conditions that could minimize landing risks.

The Moscoviense basin (27.2°N, 147.6°E) is a multi-ring impact basin from the Nectarian period, characterized by geological diversity including lunar mare deposits, volcanic breccias, and lunar swirls [52]. Its composition consists of pyroxene, olivine, and magnesium–aluminum spinel, with a history of multiple magmatic filling events [46]. As a landing site, the Moscoviense basin can provide important information regarding magnetic anomalies of lunar swirls, the geological age and structure of far-side basins, and processes of large-scale impacts [46,47,53]. Figure 5 indicates that 35% of the Moscoviense basin and its surroundings were classified as suitable for landing at a resolution of 0.5° × 0.5°, while, at a resolution of 0.1° × 0.1°, the suitable share increased to 43%. Although this share is lower than that of the Aristarchus crater, it still offers a significant area for exploration, particularly for investigating geological diversity.

The Orientale basin (20.0°S, 265°E) is the youngest and most well-preserved multi-ring impact basin on the lunar surface, featuring three concentric rings and over 20 mare ponds. Its ejecta are divided into three distinct units, with the Maunder Formation comprised of ancient lunar mare materials [54]. The scientific potential of this landing site includes acquiring information on KREEP rocks, studying the lower lunar crust and possibly the lunar mantle, and obtaining insights from volcanic breccias or ancient lunar mare samples [46]. This could further constrain the impact history of the solar system, help analyze the Moon’s internal structure and composition, and refine our understanding of its thermal evolution and weathering processes. According to the results in Figure 5, 25% of the Orientale basin was considered suitable for landing at a resolution of 0.5° × 0.5°. At a more detailed resolution of 0.1° × 0.1°, the suitability improved to 32%. While this percentage is lower, it still presents significant potential for exploration, especially regarding the Moon’s early history and impact events.

As the resolution increased, the area identified as suitable for landing using multiple engineering factors also expanded. This may be attributed to the higher resolution’s ability to capture subtle variations in terrain, other engineering factors, and local features, allowing for more precise identification of suitable regions—such as minor topographical changes or specific slope conditions—that might be overlooked at lower resolutions. Additionally, certain areas may have been misclassified as low suitability zones at a resolution of 0.5° × 0.5° due to the coarse nature of the data, whereas, at a resolution of 0.1° × 0.1°, these areas may have met more engineering criteria, thereby being classified as suitable landing zones.
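The effect described above can be illustrated with a toy grid. The sketch below is not the resampling procedure used in this study: it assumes a hypothetical, deliberately conservative rule in which a 0.5° cell counts as suitable only if every 0.1° cell inside it is suitable, and the hazard locations are invented for illustration. The grid dimensions are assumed to be divisible by the coarsening factor.

```python
def suitable_fraction(grid):
    """Share of cells flagged suitable (True) in a 2D boolean grid."""
    cells = [cell for row in grid for cell in row]
    return sum(cells) / len(cells)


def coarsen(grid, factor):
    """Merge factor x factor blocks; a block is suitable only if all its cells are.

    This 'all cells' rule is a hypothetical stand-in for the effect of
    coarse data, not the resampling used in the study.
    """
    rows, cols = len(grid), len(grid[0])
    return [
        [
            all(grid[r + dr][c + dc] for dr in range(factor) for dc in range(factor))
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]


# Toy 10 x 10 grid of 0.1-degree cells: suitable everywhere except a few hazards.
fine = [[True] * 10 for _ in range(10)]
for r, c in [(1, 1), (1, 2), (2, 1), (7, 8), (8, 8)]:
    fine[r][c] = False

coarse = coarsen(fine, 5)  # 2 x 2 grid of 0.5-degree cells

print(suitable_fraction(fine))    # 0.95
print(suitable_fraction(coarse))  # 0.5
```

In this toy case, a handful of small hazards removes half of the coarse cells while leaving 95% of the fine cells suitable, which mirrors the direction of the change reported for the three regions.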

3.4. Comparative Analysis

This study further compared the performance of three models—BO-XGBoost, Attn-CNN, and Bayesian Optimized Random Forest (BO-RF)—in predicting the suitability of lunar landing sites using accuracy and ROC curves. The comparison results indicate that the ROC curve of the BO-XGBoost model was the smoothest, demonstrating stable classification performance, effectively distinguishing between different categories, and capturing complex geographical and environmental features, leading to excellent suitability recognition (Section 3.1). In contrast, the Attn-CNN model achieved an accuracy of 91%, but its macro-ROC curve showed relative fluctuations (Figure 6a), indicating inconsistent performance at certain thresholds. Nevertheless, this model achieved a micro-AUC value of 0.9764 (Figure 6b), highlighting its potential for recognizing and processing complex features, making it suitable for lunar geographic information analysis. The BO-RF model attained an accuracy of 93%, with its macro-ROC curve reflecting a degree of stability (Figure 6c), suggesting that this model possessed good sample differentiation ability for the study dataset, likely due to the ensemble effect of the Random Forest approach. Its highest micro-AUC value was 0.9876 (Figure 6d). All three models exhibited micro-AUC values close to 1, indicating overall strong performance in classification tasks.
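Because the comparison relies on both per-class (macro) and pooled (micro) ROC summaries, a small sketch of how the two differ may be helpful. The code below is a dependency-free illustration, not the evaluation code used in this study: it treats macro-AUC as the mean of the one-vs-rest AUCs, computes each AUC from pairwise rankings, and uses an invented three-class toy example (the number of classes and all scores are assumptions).

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_micro_auc(y_true, y_score, n_classes):
    """One-vs-rest macro and micro AUC from class ids and per-class scores."""
    onehot = [[int(y == k) for k in range(n_classes)] for y in y_true]
    per_class = [
        binary_auc([row[k] for row in onehot], [row[k] for row in y_score])
        for k in range(n_classes)
    ]
    macro = sum(per_class) / n_classes
    # Micro-averaging pools every (sample, class) pair into one binary problem.
    micro = binary_auc(
        [v for row in onehot for v in row],
        [v for row in y_score for v in row],
    )
    return macro, micro


# Invented toy data: six samples, three classes, one score per class.
y_true = [0, 1, 2, 0, 1, 2]
y_score = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.3, 0.4],
]
macro, micro = macro_micro_auc(y_true, y_score, 3)
print(macro, micro)
```

In this toy example every class is separated perfectly on its own, so the macro value is 1, whereas the pooled micro value is slightly lower because scores from different classes are ranked against each other. This is one reason the two summaries can behave differently for the same model.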

All three models generated suitability maps for lunar landing sites. The results from the BO-XGBoost model were clear and detailed, effectively identifying optimal landing sites, thereby providing a basis for subsequent engineering site selection (Figure 4). Feature importance analysis showed that slope was the most significant factor, consistent with current scientific understanding (Section 3.2). This may be attributed to the gradient boosting method dynamically optimizing feature usage, allowing for a more accurate reflection of feature importance. In contrast (Figure 7), the suitability map generated by the Attn-CNN model demonstrated advantages in handling spatial features, emphasizing critical visual feature areas; however, it exhibited ambiguity in assessing suitability in complex regions, such as the edges of large impact basins in the distance. Feature importance evaluation indicated that roughness was the most significant, followed by slope (Figure 8), which contradicts current scientific understanding. This discrepancy may have arisen from the deep learning characteristics of the model and the complexity of feature interactions [38,39], possibly leading to the underestimation or neglect of important features. The suitability map produced by the BO-RF model primarily displayed large contiguous suitable or unsuitable areas, with some critical regions being overlooked or misclassified, particularly in the distal highland areas, where southern regions with missing data were incorrectly classified as highly suitable landing sites. This could have been due to the way the BO-RF model calculates feature importance, typically averaging the results across trees, thereby masking or underestimating the significance of certain features.

Overall, the Attn-CNN model, with its complex structure, requires significantly more computational resources and time for training and tuning compared to the BO-XGBoost model (approximately 2–3 times in this study). Although the introduced attention mechanism can extract important information from high-dimensional features, the deep learning characteristics render its decision-making process challenging to interpret. On the other hand, while the BO-RF model is structurally simpler and achieves robust classification through a combination of multiple decision trees, its feature importance assessment falls short when addressing complex relationships. In comparison, the BO-XGBoost model effectively merges the advantages of ensemble learning and tree models, dynamically adjusting hyperparameters through Bayesian optimization, resulting in moderate model complexity. It provides a certain level of interpretability through feature importance scoring, offering a reliable basis for the suitability analysis of lunar landing sites. Additionally, it performs exceptionally well on structured data, effectively addressing the complex nonlinear problem of lunar landing site selection within large-scale datasets.

4. Discussion

Lunar landing site selection is a critical task in current space exploration. Traditionally, this process has relied on empirical judgment or simple qualitative analysis, lacking a systematic quantitative approach. This study attempted to apply data-driven machine learning methods to predict lunar landing sites. By combining Bayesian optimization with the XGBoost model, we could more objectively and scientifically evaluate the landing conditions on the lunar surface, providing technical support for future exploration missions. However, due to data availability constraints, this study focused on six engineering factors, primarily related to lunar surface topography, that may significantly impact the landing process (Table 1). Suitable landing areas were predicted across the entire lunar surface, excluding the poles, but specific landing areas will still need to be refined for each particular mission. The absence of OMAT data resulted in striped predictions in the south-central region (Figure 4). In addition, roughness data did not cover the north and south poles, limiting predictions to the region of 80°N–80°S, 180°W–180°E.

This study did not consider several important lunar scientific characteristics, such as mineral composition and the radiation environment, which are critical in the landing process. The mineral composition and distribution significantly influence landing strategies and site selection and are closely related to resource development and scientific research, making them essential factors in lunar site assessment [8,55]. The radiation environment on the Moon presents significant challenges for both robotic and crewed missions. High levels of cosmic radiation and solar particle events can affect the reliability of onboard systems and pose health risks to astronauts [56,57]. Understanding these factors is crucial for developing effective shielding and operational protocols. Future research should prioritize the integration of these scientific characteristics into the selection process and develop a more comprehensive framework that accurately captures the complex features of the lunar surface. By exploring the use of multi-source heterogeneous data, such as remote sensing images, in situ measurements, and historical mission data, we can improve prediction accuracy and ensure that selected landing sites are technically viable and scientifically promising. This holistic approach will pave the way for more successful exploration missions and enable deeper insights into the Moon’s geological and environmental conditions.

The combination of Bayesian optimization and XGBoost represents the core of this research. Bayesian optimization adaptively adjusted the hyperparameters of XGBoost, enhancing the model’s performance in lunar landing site prediction. XGBoost excels at handling complex nonlinear relationships and interactions between features. This capability allowed it to effectively manage outliers and missing values in the intricate lunar environment, demonstrating exceptional robustness and making it particularly suitable for analyzing multidimensional lunar data. Additionally, XGBoost provided feature importance evaluation tools that helped identify the key factors influencing landing site suitability. While all factors contribute to landing site selection, understanding their relative importance allows for more informed decision-making, ultimately enhancing the safety and success of lunar landing missions. Future work could also explore the interactions between these factors, providing deeper insights into their combined effects on landing site suitability. Another significant advantage of the BO-XGBoost model is its excellent scalability, enabling it to adapt to increasing data volumes and converge quickly, significantly reducing computation time. This also helps address the issue we mentioned earlier regarding the lack of scientific constraints. With more comprehensive data in the future, the BO-XGBoost model can adaptively perform analysis and provide results within the specified requirements. These qualities make this method more effective and reliable for lunar landing site selection.

Beyond the XGBoost model, relevant studies could explore other machine learning models [30,58,59], conducting comparative analyses to identify the optimal prediction model. To improve the interpretability, feature importance analysis and partial dependence plots can be employed to examine how each input factor influences prediction results. This study adopted cross-validation to evaluate the generalization performance of the model, which is a valued practice. In the future, introducing an independent test set would provide a realistic assessment of the model’s predictive ability on new data. The research results can support landing site selection for lunar exploration missions and inform the design of landing vehicles and strategies. This approach can offer more scientific and reliable support for future lunar exploration. It may also be extended to landing site selection for other celestial bodies, such as Mars and asteroids.
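The difference between cross-validation and an independent test set can be sketched as follows. This is an assumed, simplified illustration rather than the procedure used in this study: the function name, the 20% hold-out fraction, the five folds, and the fixed seed are all hypothetical choices.

```python
import random


def holdout_and_folds(n_samples, test_fraction=0.2, k=5, seed=0):
    """Split sample indices into a held-out test set and k cross-validation folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    test, rest = indices[:n_test], indices[n_test:]
    # Deal the remaining indices into k folds of near-equal size.
    folds = [rest[i::k] for i in range(k)]
    return test, folds


test_idx, folds = holdout_and_folds(100)
for i, val_idx in enumerate(folds):
    train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
    # A model would be fitted on train_idx and scored on val_idx here.
    print(i, len(train_idx), len(val_idx))

# test_idx is never used during tuning; it is scored once at the very end.
print(len(test_idx))
```

Keeping the test indices out of every fold means that hyperparameter tuning, including Bayesian optimization, never sees the samples used for the final assessment.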

5. Conclusions

This study presented a Bayesian-optimized Extreme Gradient Boosting model to assess the suitability of lunar landing sites from the perspective of landing engineering safety. The model achieved promising results, with an accuracy of 96% and AUC values exceeding 95%, suggesting the potential effectiveness of machine learning in evaluating complex lunar environments. Utilizing the XGBoost ensemble learning model for feature modeling and prediction, this approach addresses some limitations associated with traditional manual analyses of large and complex datasets. Additionally, the integration of the Bayesian optimization algorithm allows for the adaptive tuning of the XGBoost model’s key hyperparameters, enhancing its predictive performance and robustness. By considering important engineering factors such as slope, DEM, and roughness, the BO-XGBoost model may contribute to the identification of safe and stable lunar landing areas, offering valuable insights for the optimization of future lunar exploration missions. While this method shows promise for supporting future lunar resource exploration, lunar base construction, and crewed lunar landing tasks, there is still room for improvement. Future work could involve integrating additional data, further refining the model structure, and broader validation of the method’s applicability and robustness to enhance its predictive capabilities and expand its potential applications.

Author Contributions

Conceptualization, Y.W. and S.W.; Data curation, Q.G.; Formal analysis, S.W.; Funding acquisition, Y.W., J.L. and X.K.; Investigation, S.W., Q.G., H.L., R.C., K.Z. and S.Z.; Methodology, S.W., Y.W. and Q.G.; Project administration, S.W., J.L. and X.K.; Resources, S.Z.; Software, S.W. and Q.G.; Supervision, S.W., Y.W., J.L. and K.Z.; Validation, S.W. and Y.W.; Visualization, Q.G.; Writing—original draft, S.W. and Q.G.; Writing—review and editing, S.W. and Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Key Research and Development Program of China (2022YFF0503100) and the Graduate Innovation Fund of Jilin University (2024CX112, 2024CX108).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

We would like to thank the anonymous referee for his/her constructive comments that improved the manuscript. We also appreciate the researchers from all participating institutions involved in this project.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Lin, Y.; Yang, W.; Zhang, H.; Hui, H.; Hu, S.; Xiao, L.; Liu, J.; Xiao, Z.; Yue, Z.; Zhang, J.; et al. Return to the Moon: New perspectives on lunar exploration. Sci. Bull. 2024, 69, 2136–2148.
2. Pei, Z.; Liu, J.; Wang, Q.; Kang, Y.; Zou, Y.; Zhang, H.; Zhang, Y.; He, H.; Wang, Q.; Yang, R.; et al. Overview of lunar exploration and international lunar research station. Chin. Sci. Bull. 2020, 65, 2577–2586.
3. Canup, R.M.; Righter, K.; Dauphas, N.; Pahlevan, K.; Ćuk, M.; Lock, S.J.; Stewart, S.T.; Salmon, J.; Rufu, R.; Nakajima, M.; et al. Origin of the Moon. Rev. Mineral. Geochem. 2023, 89, 53–102.
4. Jolliff, B.L. Introduction to special section: New Views of the Moon II, a series of papers related to the lunar science initiative New views of the moon enabled by combined remotely sensed and lunar sample data sets. J. Geophys. Res. Planets 2000, 105, 20275–20276.
5. Li, C.; Wang, C.; Wei, Y.; Lin, Y. China’s present and future lunar exploration program. Science 2019, 365, 238–239.
6. Li, C.; Hu, H.; Yang, M.F.; Pei, Z.Y.; Zhou, Q.; Ren, X.; Liu, B.; Liu, D.; Zeng, X.; Zhang, G.; et al. Characteristics of the lunar samples returned by the Chang’E-5 mission. Natl. Sci. Rev. 2021, 9, nwab188.
7. Ye, P.; Huang, J.; Sun, Z.; Yang, M.; Meng, L. The process and experience in the development of Chinese lunar probe. Sci. Sin. Technol. 2014, 44, 543–558.
8. Liu, J.; Zeng, X.; Li, C.; Ren, X.; Yan, W.; Tan, X.; Zhang, X.; Chen, W.; Wei, Z.; Liu, Y.; et al. Landing site selection and overview of China’s lunar landing missions. Space Sci. Rev. 2021, 217, 1–25.
9. Qiao, L.; Liu, X.; Zhao, J.; Wei, Y.; Xiao, L. Geological investigations of Luna 17, Apollo 15 and Chang’E-3 landing sites at Mare Imbrium of the Moon. Sci. Sin. Phys. Mech. Astron. 2016, 46, 029603.
10. Lu, Y.; Wu, Y.; Michael, G.G.; Ma, J.; Cai, W.; Qin, N. Chronological sequence of Chang’E-4 landing zone within Von Kármán crater. Icarus 2021, 354, 114086.
11. Sun, Z.; Zhang, T.; Zhang, H.; Jia, Y.; Zhang, H.; Chen, J.; Wu, X.; Shen, Z. The technical design and achievements of Chang’E-3 probe. Sci. Sin. Technol. 2014, 44, 331–343.
12. Xu, X.; Angelopoulos, V.; Wang, Y.; Zuo, P.; Wong, H.; Cui, J. The energetic particle environment of the lunar nearside: SEP Influence. Astrophys. J. 2017, 849, 151.
13. Chen, X.; Gao, F.; Wu, F.; Zhang, Y.; Wang, T.; Liu, W.; Zou, D.; Deng, F.; Gong, Y.; He, K.; et al. Large-scale array for radio astronomy on the farside (LARAF). Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2024, 382, 20230094.
14. Yue, Z.; Di, K.; Wan, W.; Liu, Z.; Gou, S.; Liu, B.; Peng, M.; Wang, Y.; Jia, M.; Liu, J.; et al. Updated lunar cratering chronology model with the radiometric age of Chang’e-5 samples. Nat. Astron. 2022, 6, 541–545.
15. Wu, B.; Li, F.; Ye, L.; Qiao, S.; Huang, J.; Wu, X.; Zhang, H. Topographic modeling and analysis of the landing site of Chang’E-3 on the Moon. Earth Planet. Sci. Lett. 2014, 405, 257–273.
16. Lemelin, M.; Blair, D.M.; Roberts, C.E.; Runyon, K.D.; Nowka, D.; Kring, D.A. High-priority lunar landing sites for in situ and sample return studies of polar volatiles. Planet. Space Sci. 2014, 101, 149–161.
17. Preethi, P.; Mamatha, H.R. Region-based convolutional neural network for segmenting text in epigraphical images. Artif. Intell. Appl. 2022, 1, 119–127.
18. Zeng, X.; Mu, L. Lunar spatial environmental indicators dynamically modeling based exploration area selection. Geomat. Inf. Sci. Wuhan Univ. 2017, 42, 91–96.
19. Cao, Y.; Wang, Y.; Liu, J.; Zeng, X.; Wang, J. Selection of whole-moon landing zones based on weights of evidence and fractals. Remote Sens. 2022, 14, 4623.
20. Jia, Y.; Liu, L.; Wang, X.; Guo, N.; Wan, G. Selection of lunar south pole landing site based on constructing and analyzing fuzzy cognitive maps. Remote Sens. 2022, 14, 4863.
21. Liu, Y.; Wang, Y.; Di, K.; Peng, M.; Wan, W.; Liu, Z. A generative adversarial network for pixel-scale lunar DEM generation from high-resolution monocular imagery and low-resolution DEM. Remote Sens. 2022, 14, 5420.
22. Tao, Y.; Muller, J.-P.; Conway, S.J.; Xiong, S.; Walter, S.H.G.; Liu, B. Large area high-resolution 3D mapping of the Von Kármán crater: Landing site for the Chang’E-4 lander and Yutu-2 rover. Remote Sens. 2023, 15, 2643.
23. Sagi, O.; Rokach, L. Approximating XGBoost with an interpretable decision tree. Inf. Sci. 2021, 572, 522–542.
24. Barker, M.K.; Mazarico, E.; Neumann, G.A.; Zuber, M.T.; Haruyama, J.; Smith, D.E. A new lunar digital elevation model from the Lunar Orbiter Laser Altimeter and SELENE Terrain Camera. Icarus 2016, 273, 346–355.
25. Rosenburg, M.A.; Aharonson, O.; Head, J.W.; Kreslavsky, M.A.; Mazarico, E.; Neumann, G.A.; Smith, D.E.; Torrence, M.H.; Zuber, M.T. Global surface slopes and roughness of the Moon from the Lunar Orbiter Laser Altimeter. J. Geophys. Res. Planets 2011, 116, E02001.
26. Powell, T.M.; Horvath, T.; Robles, V.L.; Williams, J.P.; Hayne, P.O.; Gallinger, C.L.; Greenhagen, B.T.; McDougall, D.S.; Paige, D.A. High-resolution nighttime temperature and rock abundance mapping of the Moon using the Diviner lunar radiometer experiment with a model for topographic removal. J. Geophys. Res. Planets 2023, 128, e2022JE007532.
27. Bandfield, J.L.; Ghent, R.R.; Vasavada, A.R.; Paige, D.A.; Lawrence, S.J.; Robinson, M.S. Lunar surface rock abundance and regolith fines temperatures derived from LRO Diviner Radiometer data. J. Geophys. Res. Planets 2011, 116, E00H02.
28. Smith, D.E.; Zuber, M.T.; Neumann, G.A.; Mazarico, E.; Lemoine, F.G.; Head, J.W., III; Lucey, P.G.; Aharonson, O.; Robinson, M.S.; Sun, X.; et al. Summary of the results from the lunar orbiter laser altimeter after seven years in lunar orbit. Icarus 2017, 283, 70–91.
29. Lucey, P.G.; Blewett, D.T.; Taylor, G.J.; Hawke, B.R. Imaging of lunar surface maturity. J. Geophys. Res. Planets 2000, 105, 20377–20386.
30. Feng, Y.; Li, H.; Tong, X.; Li, P.; Wang, R.; Chen, S.; Xi, M.; Sun, J.; Wang, Y.; He, H.; et al. Optimized landing site selection at the lunar south pole: A convolutional neural network approach. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 1–18.
31. Jia, Y.; Zou, Y.; Ping, J.; Xue, C.; Yan, J.; Ning, Y. The scientific objectives and payloads of Chang’E−4 mission. Planet. Space Sci. 2018, 162, 207–215.
32. Li, C.; Mu, L.; Zou, X.; Liu, J.; Ren, X.; Zeng, X.; Yang, Y.; Zhang, Z.; Liu, Y.; Wei, Z.; et al. Analysis of the geomorphology surrounding the Chang’e-3 landing site. Res. Astron. Astrophys. 2014, 14, 1514.
33. Flahaut, J.; Carpenter, J.; Williams, J.P.; Anand, M.; Crawford, I.A.; van Westrenen, W.; Füri, E.; Xiao, L.; Zhao, S. Regions of interest (ROI) for future exploration missions to the lunar South Pole. Planet. Space Sci. 2020, 180, 104750.
34. Foody, G.M.; McCulloch, M.B.; Yates, W.B. The effect of training set size and composition on artificial neural network classification. Int. J. Remote Sens. 1995, 16, 1707–1723.
35. Xie, L.; Zhang, R.; Zhan, J.; Li, S.; Shama, A.; Zhan, R.; Wang, T.; Lv, J.; Bao, X.; Wu, R. Wildfire risk assessment in Liangshan prefecture, China based on an integration machine learning algorithm. Remote Sens. 2022, 14, 4592.
36. Liu, X.; Tang, H.; Zhang, X.; Chen, M. Gaussian process model-based performance uncertainty quantification of a typical turboshaft engine. Appl. Sci. 2021, 11, 8333.
37. Zheng, H.; Yuan, J.; Chen, L. Short-term load forecasting using EMD-LSTM neural networks with a Xgboost algorithm for feature importance evaluation. Energies 2017, 10, 1168.
38. Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62.
39. Duan, Y.; Li, H.; Zhang, K.; Zhang, S.; Wu, S. Channel-spatial attention network for lunar image super-resolution. In Proceedings of the 2022 5th International Conference on Image and Graphics Processing, Beijing, China, 7–9 January 2022; pp. 333–338.
40. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
41. De Rosa, D.; Bussey, B.; Cahill, J.T.; Lutz, T.; Crawford, I.A.; Hackwill, T.; van Gasselt, S.; Neukum, G.; Witte, L.; McGovern, A.; et al. Characterisation of potential landing sites for the European Space Agency’s Lunar Lander project. Planet. Space Sci. 2012, 74, 224–246.
42. Djachkova, M.V.; Litvak, M.L.; Mitrofanov, I.G.; Sanin, A.B. Selection of Luna-25 landing sites in the South Polar Region of the Moon. Sol. Syst. Res. 2017, 51, 185–195.
43. Hashimoto, T.; Hoshino, T.; Tanaka, S.; Otsuki, M.; Otake, H.; Morimoto, H. Japanese moon lander SELENE-2—Present status in 2009. Acta Astronaut. 2011, 68, 1386–1391.
44. Amitabh, S.; Srinivasan, T.P.; Suresh, K. Potential Landing Sites for Chandrayaan-2 Lander in Southern Hemisphere of Moon. In Proceedings of the 49th Annual Lunar and Planetary Science Conference, The Woodlands, TX, USA, 1 March 2018; p. 1975.
45. Niu, R.; Zhang, G.; Mu, L.; Lin, Y.-T.; Liu, J.; Bo, Z.; Dai, W.; Qin, Z.; Zhang, P. Scientific objectives and suggestions on landing site selection of Manned Lunar Exploration Engineering. Adv. Astronaut. Sci. Technol. 2024, 7, 37–50.
46. Jawin, E.R.; Valencia, S.N.; Watkins, R.N.; Crowell, J.M.; Neal, C.R.; Schmidt, G. Lunar science for landed missions workshop findings report. Earth Space Sci. 2019, 6, 2–40.
47. Xiao, L.; Qiao, L.; Xiao, Z.; Huang, Q.; He, Q.; Zhao, J.; Xue, Z.; Huang, J. Major scientific objectives and candidate landing sites suggested for future lunar explorations. Sci. Sin. Phys. Mech. Astron. 2016, 46, 029602.
48. Ling, Z.; Zhang, J.; Wu, Z.; Sun, L.; Liu, J. The compositional distribution and rock types of the Aristarchus region on the Moon. Sci. Sin. Phys. Mech. Astron. 2013, 43, 1403.
49. Mustard, J.F.; Pieters, C.M.; Isaacson, P.J.; Head, J.W.; Besse, S.; Clark, R.N.; Klima, R.L.; Petro, N.E.; Staid, M.I.; Sunshine, J.M.; et al. Compositional diversity and geologic insights of the Aristarchus crater from Moon Mineralogy Mapper data. J. Geophys. Res. Planets 2011, 116, E00G12.
50. Zisk, S.H.; Hodges, C.A.; Moore, H.J.; Shorthill, R.W.; Thompson, T.W.; Whitaker, E.A.; Wilhelms, D.E. The Aristarchus-Harbinger region of the Moon: Surface geology and history from recent remote-sensing observations. Moon 1977, 17, 59.
51. Lucey, P.G.; Hawke, B.R.; Pieters, C.M.; Head, J.W.; McCord, T.B. A compositional study of the Aristarchus region of the Moon using near-infrared reflectance spectroscopy. J. Geophys. Res. Solid Earth 1986, 91, 344–354.
52. Wieczorek, M.A.; Neumann, G.A.; Nimmo, F.; Kiefer, W.S.; Taylor, G.J.; Melosh, H.J.; Phillips, R.J.; Solomon, S.C.; Andrews-Hanna, J.C.; Asmar, S.W.; et al. The crust of the moon as seen by GRAIL. Science 2013, 339, 671–675.
53. Thaisen, K.G.; Head, J.W.; Taylor, L.A.; Kramer, G.Y.; Isaacson, P.; Nettles, J.; Petro, N.; Pieters, C.M. Geology of the Moscoviense Basin. J. Geophys. Res. Planets 2011, 116, E00G07.
54. Whitten, J.; Head, J.W.; Staid, M.; Pieters, C.M.; Mustard, J.; Clark, R.; Nettles, J.; Klima, R.L.; Taylor, L. Lunar mare deposits associated with the Orientale impact basin: New insights into mineralogy, history, mode of emplacement, and relation to Orientale Basin evolution from Moon Mineralogy Mapper (M3) data from Chandrayaan-1. J. Geophys. Res. Planets 2011, 116, E00G09.
55. Ling, Z.; Jolliff, B.L.; Wang, A.; Li, C.; Liu, J.; Zhang, J.; Li, B.; Sun, L.; Chen, J.; Xiao, L.; et al. Correlated compositional and mineralogical investigations at the Chang’e-3 landing site. Nat. Commun. 2015, 6, 8880.
56. Cucinotta, F.A.; Saganti, P.B. Radiation Environment of the Moon. In Encyclopedia of Lunar Science; Cudnik, B., Ed.; Springer International Publishing: Cham, Switzerland, 2023; pp. 997–1006.
57. Carpenter, J.D.; Fisackerly, R.; De Rosa, D.; Houdou, B. Scientific preparations for lunar exploration with the European Lunar Lander. Planet. Space Sci. 2012, 74, 208–223.
58. Salman, H.; Kalakech, A.; Steiti, A. Random forest algorithm overview. Babylon. J. Mach. Learn. 2024, 2024, 69–79.
59. Petrakis, G.; Partsinevelos, P. Lunar ground segmentation using a modified U-net neural network. Mach. Vis. Appl. 2024, 35, 50.

Figure 1. Workflow of the XGBoost model for lunar landing site prediction.

Figure 2. ROC curves and corresponding AUC values. (a) ROC curve for each class. (b) Micro-averaged ROC curve for BO-XGBoost model.

Figure 3. The feature importance of each engineering factor.

Figure 4. Lunar surface suitability map for landing site predictions (0.1° × 0.1°).

Figure 5. LROC WAC images and landing suitability of three recommended landing sites. Left column: LROC images; middle column: landing suitability assessment map (0.5° × 0.5°); right column: landing suitability assessment map (0.1° × 0.1°).

Figure 6. Performance comparison of baseline methods. (a) ROC curve for each class of the Attn-CNN model. (b) Micro-averaged ROC curve of the Attn-CNN model. (c) ROC curve for each class of the BO-RF model. (d) Micro-averaged ROC curve of the BO-RF model.

Figure 7. Comparison of landing suitability maps calculated using baseline methods.

Figure 8. Comparison of feature importance.

Table 2. Quantification standards for engineering constraint indicators (10-point scale).

Data	Grade (Value Interval)	Suitability Level	Score
SLDEM	[−9127, −5930)	Slightly Unsuitable	2
	[−5930, 533)	Highly Suitable	10
	[533, 1102)	Suitable	9
	[1102, 3166)	Moderately Suitable	3
	[3166, 10772]	Unsuitable	1
Slope	[0.001, 3.2)	Highly Suitable	10
	[3.2, 7.9)	Suitable	9
	[7.9, 12.8)	Moderately Suitable	3
	[12.8, 19.9)	Slightly Unsuitable	2
	[19.9, 81]	Unsuitable	1
Hillshade	[1, 76.4)	Slightly Unsuitable	2
	[76.4, 134)	Slightly Unsuitable	2
	[134, 164.7)	Highly Suitable	10
	[164.7, 184)	Suitable	9
	[184, 254]	Unsuitable	1
Rock Abundance	[0.001, 0.015)	Highly Suitable	10
	[0.015, 0.031)	Suitable	9
	[0.031, 0.13)	Moderately Suitable	3
	[0.13, 0.36)	Slightly Unsuitable	2
	[0.36, 2.29]	Unsuitable	1
Roughness	[0.003, 0.008)	Slightly Unsuitable	2
	[0.008, 0.13)	Suitable	9
	[0.13, 0.25)	Highly Suitable	10
	[0.25, 0.35)	Slightly Unsuitable	2
	[0.35, 1.06]	Unsuitable	1
Optical Maturity	[1, 62)	Slightly Unsuitable	2
	[62, 103)	Highly Suitable	10
	[103, 114)	Suitable	9
	[114, 176)	Slightly Unsuitable	2
	[176, 254]	Unsuitable	1
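
The quantification in Table 2 amounts to a piecewise-constant lookup from a raw factor value to a score on the 10-point scale. The following is a minimal sketch of that lookup, not the authors' implementation. The dictionary keys, the function name `score`, and the choice to return `None` for values outside the tabulated range are all assumptions made for illustration. The upper SLDEM bound is read as 10772 and the fourth roughness interval as [0.25, 0.35).

```python
from bisect import bisect_right

# Interval edges and scores transcribed from Table 2.
# Each factor has five bins [lo, hi); the last bin is closed on the right.
# Key names are hypothetical labels, not identifiers from the paper.
TABLE2 = {
    "sldem": ([-9127, -5930, 533, 1102, 3166, 10772], [2, 10, 9, 3, 1]),
    "slope": ([0.001, 3.2, 7.9, 12.8, 19.9, 81], [10, 9, 3, 2, 1]),
    "hillshade": ([1, 76.4, 134, 164.7, 184, 254], [2, 2, 10, 9, 1]),
    "rock_abundance": ([0.001, 0.015, 0.031, 0.13, 0.36, 2.29], [10, 9, 3, 2, 1]),
    "roughness": ([0.003, 0.008, 0.13, 0.25, 0.35, 1.06], [2, 9, 10, 2, 1]),
    "optical_maturity": ([1, 62, 103, 114, 176, 254], [2, 10, 9, 2, 1]),
}


def score(factor, value):
    """Return the Table 2 score for one factor value, or None if out of range."""
    edges, scores = TABLE2[factor]
    if not edges[0] <= value <= edges[-1]:
        # Assumption: values outside the tabulated range are left unscored.
        return None
    # bisect_right finds the bin whose lower edge is <= value; the min()
    # keeps the closed upper bound of the last interval inside the last bin.
    idx = min(bisect_right(edges, value) - 1, len(scores) - 1)
    return scores[idx]
```

For example, `score("slope", 5.0)` falls in [3.2, 7.9) and returns 9, while a value on an interior boundary such as `score("slope", 3.2)` is assigned to the upper bin, consistent with the half-open intervals in the table.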

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wen, S.; Wang, Y.; Gong, Q.; Liu, J.; Kang, X.; Liu, H.; Chen, R.; Zhu, K.; Zhang, S. A New Robust Lunar Landing Selection Method Using the Bayesian Optimization of Extreme Gradient Boosting Model (BO-XGBoost). Remote Sens. 2024, 16, 3632. https://doi.org/10.3390/rs16193632
