*4.1. Predictive Model Building*

To fully explore the nonlinear relationship between the 3D ore-controlling factors and the known mineralization, this paper uses the sample dataset established above and selects two machine learning methods, logistic regression and random forest, to carry out 3D mineral prospectivity prediction in the deep part of the mining area.

In addition to a large number of valid samples, a machine learning model requires its parameters to be set for the dataset at hand, because these parameters largely determine the model's performance. The two most important parameters of the random forest algorithm are the number of decision trees, M, and the number of attributes, K, in the randomly selected attribute subset. In this paper, appropriate values of M and K for the random forest model are determined by cross-validation on the sample dataset. Because a regression model is adopted in this paper, the error of each cross-validation set is estimated first, and the standard deviation of these errors is then used to evaluate the consistency of the model across different data subsets (Figure 7).

*Minerals* **2022**, *12*, 1174 9 of 14

**Figure 7.** Standard deviation maps of the random forest algorithm under different parameters.
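The tuning step can be sketched as a small grid search that records, for every (M, K) pair, the standard deviation of the per-fold errors. This is a minimal illustration under stated assumptions, not the authors' code: the per-fold error is taken to be the mean squared error (the paper does not name its error measure), the estimator is assumed to expose scikit-learn-style `fit`/`predict` methods, and the function names `cv_error_std` and `grid_cv_std` are hypothetical.

```python
import numpy as np


def kfold_indices(n_samples, n_folds, seed=0):
    """Shuffle sample indices and split them into n_folds roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)


def cv_error_std(make_model, X, y, n_folds=5, seed=0):
    """Standard deviation of the per-fold mean squared error (assumed error measure).

    make_model is a zero-argument callable returning an unfitted estimator
    with scikit-learn-style fit(X, y) and predict(X) methods.
    """
    folds = kfold_indices(len(y), n_folds, seed)
    fold_errors = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except fold i, then measure the error on fold i.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = make_model()
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        fold_errors.append(np.mean((pred - y[test_idx]) ** 2))
    return float(np.std(fold_errors))


def grid_cv_std(make_model_mk, X, y, tree_counts, attr_counts, n_folds=5):
    """Map each (M, K) pair to the standard deviation of its fold errors.

    make_model_mk(M, K) is a hypothetical factory returning a fresh estimator
    with M trees and K candidate attributes per split. The same seed is used
    for every pair, so all pairs are compared on identical folds.
    """
    return {
        (m, k): cv_error_std(lambda: make_model_mk(m, k), X, y, n_folds)
        for m in tree_counts
        for k in attr_counts
    }
```

With scikit-learn installed, one could pass `lambda m, k: RandomForestRegressor(n_estimators=m, max_features=k)` (from `sklearn.ensemble`) as the factory; a smaller standard deviation would then be read as more consistent behaviour across folds, which is the role the standard deviation maps in Figure 7 play.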

Based on these results, this paper uses logistic regression and random forest (M = 200, K = 12) to carry out 3DMPM on the deep edge of the Xuancheng–Magushan area and obtains distribution maps of the favorable areas (Figure 8).


**Figure 8.** Distribution map of favorable areas, (**a**) Random forest model results; (**b**) Logistic regression model results.

#### *4.2. Model Performance Analysis*

The confusion matrix is a standard format for expressing accuracy evaluation and is commonly used in binary classification. Each column of the matrix represents the predicted class of the samples, and each row represents their true class. To express the models' performance more intuitively, three metrics are derived from the matrix: accuracy, recall, and specificity. The trained models are applied to the test set defined above to evaluate their performance. Blocks in the test set with a predicted favorability of mineralization greater than 0.5 are classified as favorable units for mineralization. Finally, the true and predicted values of each block in the test set are compared, and these three indicators are used to compare the models (Table 3).
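The thresholding and the three indicators in Table 3 can be sketched in plain Python. This is an illustrative version with hypothetical function names, assuming a block is labelled favorable only when its score is strictly greater than 0.5; precision is included only for reference and is not one of the reported indicators.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Return (TP, FN, FP, TN); a block counts as favorable if score > threshold."""
    tp = fn = fp = tn = 0
    for is_ore, score in zip(y_true, scores):
        predicted = score > threshold
        if is_ore and predicted:
            tp += 1
        elif is_ore:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def classification_metrics(tp, fn, fp, tn):
    """Accuracy, recall and specificity as in Table 3, plus precision for reference.

    Assumes at least one ore unit, one non-ore unit and one positive prediction,
    otherwise a denominator is zero.
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "recall": tp / (tp + fn),          # share of known ore units predicted favorable
        "specificity": tn / (tn + fp),     # share of non-ore units predicted unfavorable
        "precision": tp / (tp + fp),       # share of favorable predictions that are ore
    }
```

For example, `classification_metrics(*confusion_counts([1, 1, 1, 0, 0, 0, 0, 0], [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]))` gives an accuracy of 0.75, a recall of 2/3 and a specificity of 0.8 on this toy data.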



Comparing the three performance indicators shows that the random forest model performs better than the logistic regression model: it predicts more of the known ore bodies in the test set while still effectively distinguishing non-ore-body units, indicating good generalization ability.

The ROC curve is also commonly used to evaluate the performance of binary classifiers [46]. It shows the model's ability to distinguish samples as the decision threshold varies: the vertical and horizontal coordinates of each point on the curve are the true positive rate (TPR) and the false positive rate (FPR) of the output at a given threshold. Here, the TPR is the proportion of known mineralization units that the model predicts as positive at that threshold. The area under the curve is called the AUC; the larger the AUC, the better the model. Comparing the ROC curves of the two models (Figure 9) shows that the curve of the MPM method based on the random forest lies closer to the upper-left corner than that of the logistic regression model. The AUC values of the random forest and logistic regression models are 0.989 and 0.969, respectively, indicating that the random forest model performs better and gives more reliable results.

**Table 3.** Comparison of performance indicators.

| Model | Accuracy | Recall | Specificity |
| --- | --- | --- | --- |
| Logistic regression | 90.625% | 83.62% | 93.33% |
| Random forest | 96.63% | 93.97% | 97.67% |

**Figure 9.** Comparison of ROC curves.
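The threshold sweep behind an ROC curve and its AUC can be sketched as follows. This is an illustrative implementation with hypothetical function names, not the authors' code: every distinct predicted favorability is used as a threshold, and the area is integrated with the trapezoidal rule. It rescans the data for each threshold, which keeps it readable but makes it quadratic in the number of blocks; `roc_curve` and `roc_auc_score` in `sklearn.metrics` are efficient equivalents.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs, sweeping the threshold over every distinct score, high to low."""
    n_pos = sum(1 for t in y_true if t)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t and s >= thr)
        fp = sum(1 for t, s in zip(y_true, scores) if not t and s >= thr)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

A perfectly separating model, such as `auc_trapezoid(roc_points([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))`, yields an AUC of 1.0, which is why curves hugging the upper-left corner in Figure 9 correspond to larger AUC values.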

The performance of the two models was further evaluated quantitatively by plotting capture efficiency curves [47,48] (Figure 10). First, the predicted metallogenic favorability of all blocks is sorted in descending order. Then, a series of thresholds is set according to this ranking to reclassify the unit blocks in the study area. Finally, the capture efficiency is calculated by counting the number of known ore-body units in each section. The statistics are computed over all blocks in the study area. The capture efficiency curves show that the blocks in the top 4‰ of metallogenic favorability predicted by the random forest model cover all known ore bodies in the study area, whereas the logistic regression model needs the top 20‰ of blocks to cover all known ore bodies. This shows that the random forest places more known ore-body units in blocks with high posterior probability and can screen out metallogenic prospective areas more finely.

**Figure 10.** Capture efficiency curves.
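The ranking-and-counting procedure can be sketched with NumPy, assuming the usual form of a capture efficiency curve: the horizontal axis is the share of top-ranked blocks and the vertical axis is the share of known ore-body units they contain. The function names are hypothetical, and ties in favorability are broken by block order.

```python
import numpy as np


def capture_efficiency(scores, is_ore):
    """Capture efficiency curve from block favorability scores.

    Returns (fraction_of_blocks, fraction_of_ore): entry i gives the share of
    blocks in the top i+1 of the descending ranking and the share of known
    ore-body units those blocks contain. Assumes at least one ore unit.
    """
    scores = np.asarray(scores, dtype=float)
    is_ore = np.asarray(is_ore, dtype=bool)
    order = np.argsort(-scores, kind="stable")   # highest favorability first
    captured = np.cumsum(is_ore[order])          # ore units within the top n blocks
    n_blocks = scores.size
    fraction_of_blocks = np.arange(1, n_blocks + 1) / n_blocks
    fraction_of_ore = captured / is_ore.sum()
    return fraction_of_blocks, fraction_of_ore


def top_fraction_for_full_capture(scores, is_ore):
    """Smallest top-ranked fraction of blocks that contains every known ore unit."""
    fraction_of_blocks, fraction_of_ore = capture_efficiency(scores, is_ore)
    first_full = int(np.argmax(fraction_of_ore >= 1.0))
    return float(fraction_of_blocks[first_full])
```

Multiplying the value returned by `top_fraction_for_full_capture` by 1000 expresses it in per mille, the unit used for the 4‰ and 20‰ figures above.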

#### **5. Discussion**

After analyzing the indicators of the logistic regression and random forest models, it can be seen that the random forest model gives better predictions. Its accuracy on the test set is 96.63%, which is 6.005 percentage points higher than that of the logistic regression model, and its recall and specificity are 10.35 and 4.34 percentage points higher, respectively. This indicates that the random forest can better characterize the ore-controlling features of the study area. At the same time, compared with logistic regression, the random forest model better identifies ore-body characteristics and covers more known ore-body units within the same number of block units with high metallogenic favorability. Comparing the distributions and shapes of the favorable areas predicted by the two methods shows that the random forest constrains the specific locations of the prospectivity targets more finely, thereby effectively improving the efficiency of prospecting and exploration.

In this paper, the prediction results of the random forest are used to delineate metallogenic target areas, and unit blocks with a metallogenic favorability greater than 0.5 are selected as potential metallogenic units.

According to the prediction results, there are 7652 favorable unit blocks in the study area, accounting for 1.08% of the whole study area and containing 96.71% of the known ore bodies. Therefore, the random forest model can not only effectively identify the known ore bodies but also screen out blocks with greater metallogenic potential. On this basis, five metallogenic potential areas are delineated (Figure 11).

**Figure 11.** Delineation of prospecting target areas.
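The summary statistics used for delineation (number of favorable blocks, their share of the study area and their share of known ore bodies) follow directly from the 0.5 threshold. A minimal sketch with a hypothetical function name, assuming all unit blocks have equal volume so that a share of blocks equals a share of the study area, and that at least one known ore unit exists:

```python
def summarize_favorable(scores, is_ore, threshold=0.5):
    """Count favorable blocks (score > threshold) and report area and ore coverage.

    Assumes equal-volume blocks, so the block fraction equals the area fraction.
    """
    favorable = [s > threshold for s in scores]
    n_favorable = sum(favorable)
    n_ore = sum(1 for o in is_ore if o)
    ore_in_favorable = sum(1 for f, o in zip(favorable, is_ore) if f and o)
    return {
        "favorable_blocks": n_favorable,
        "pct_of_study_area": 100 * n_favorable / len(scores),
        "pct_of_known_ore": 100 * ore_in_favorable / n_ore,
    }
```

Applied to the random forest output, these are the quantities reported above as 7652 blocks, 1.08% of the study area and 96.71% of the known ore bodies.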

The five metallogenic prospective areas delineated in this paper all have high metallogenic potential. The No. I and No. II target areas are located in the Magushan prospecting area. The burial depth of the No. I target area is about −900 m to −1200 m, and that of the No. II target area is about −1500 m to −2000 m. As a whole, these target areas lie in the middle of the Magushan high gravity anomaly, which trends nearly east–west, and the contours change rapidly on its north and south sides. In the reduced-to-pole aeromagnetic anomaly, Magushan appears as a distinct magnetic high with a roughly pear-shaped distribution trending north–south: the contours change smoothly, the gradient changes rapidly on the north side, and the anomaly extends to the south, indicating the plunge direction of the concealed rock mass. The 1:200,000 stream sediment survey shows Cu, Hg, and W anomalies in the vicinity of the Magushan deposit. The No. III target area is located at the top of the high-density body, with a burial depth of about −2100 m to −2800 m. The No. IV target area is mainly controlled by structural features such as uplifts and depressions of the rock mass, with a burial depth of about −1100 m to −1500 m. The No. V target area is located at the intersection of faults, where some magnetic anomalies occur at the surface, and its burial depth is about −2100 m to −2900 m. Therefore, the five prospectivity targets delineated in this paper can serve as priority targets for future mineral exploration in this area.

#### **6. Conclusions**

(1) 3DMPM is an important tool for delineating deep targets for future exploration. This paper delineates five prospectivity targets with good mineralization potential in the deep area of the Xuancheng–Magushan area, which can be used to guide future exploration.

(2) In the Xuancheng–Magushan area, the favorable areas delineated by the random forest model contain 96.71% of the known ore bodies while accounting for only 1.08% of the study area. Together with the performance comparison, this shows that the random forest model performs better than the logistic regression model in 3DMPM using the dataset of the study area, which means that the random forest model could provide more effective and accurate support for integrating predictive data during 3DMPM.
