**1. Introduction**

Geosciences is a data-intensive science, and geological survey and mineral exploration has accumulated a large amount of multi-source geosciences data, and the introduction of big data and its application of machine learning and deep learning can effectively support mineral resources exploration [1–6].

3D ore-induced information has multiphase, multisource, and multivariable characteristics, which derives the difficulty of information extraction, identification, and deduction, posing challenges for its classification and prediction. Big data intelligent algorithms in geosciences provide efficient tools for mineral exploration with machine learning and deep learning methods [7–9]. The deep learning algorithms, including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Multi-Layer perceptron (MLP), Deep Auto-encoder (DAE), and Deep Belief Networks (DBN) [10–16], have been successfully used in geological, geochemical, geophysical, and remote sensing data analysis and quantitative mineral resources prediction [17–31].

Convolutional Neural Network, as one of the most successful neural network models of deep learning methods, has a number of achievements in geosciences, such as lithology identification [32], geological mapping [33], and 3D geological structure inversion [34–36]. In recent years, many scholars have also achieved excellent results by applying CNN to the study of mineral prediction [37–44].

**Citation:** Yu, Z.; Liu, B.; Xie, M.; Wu, Y.; Kong, Y.; Li, C.; Chen, G.; Gao, Y.; Zha, S.; Zhang, H.; et al. 3D Mineral Prospectivity Mapping of Zaozigou Gold Deposit, West Qinling, China: Deep Learning-Based Mineral Prediction. *Minerals* **2022**, *12*, 1382. https://doi.org/10.3390/ min12111382

Academic Editor: Martiya Sadeghi

Received: 31 July 2022 Accepted: 27 October 2022 Published: 30 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Auto-encoder network is a typical unsupervised learning algorithm for ore-induced anomaly detection. The auto-encoder network contains two parts, that is Encoder and Decoder, which can be used for data dimensional reduction to reconstruct sample population with unknown complex multivariate probability distributions, especially for big data [45,46]. With the rise in deep learning, auto-encoder networks and related variant networks are widely used for image classification [47–49], anomaly detection [34,50], and data generation [51], etc.

The idea of Deep Auto-encoder network for anomaly detection in geosciences is to learn the reconstruction ability of anomaly-free samples, and to achieve anomaly detection by reconstructing anomaly samples with a large error. Related achievements include the integration of auto-encoder network with density-based spatial clustering for geochemical anomaly detection [52], physically constrained variational auto-encoder for geochemical pattern recognition [53], and detection of geochemical anomalies related to mineralization using the GANomaly network [54]. DAE is an unsupervised learning method, which achieves more efficient detection of ore-induced comprehensive anomalies, and does not need to manually label positive and negative samples, saving valuable labor costs.

Knowledge distillation (KD) is an important method to quickly deal with the problems caused by labeled data (positive and negative sample imbalance, number of labeled samples, etc.), which can efficiently achieve information transfer between networks, and it is also called Student–Teacher learning framework. KD is widely used in computer vision, speech recognition, natural language processing, etc., to achieve model compression [55] and knowledge transfer [56]. KD is composed of the Teacher and Student network, the Teacher network usually has certain learning ability through training, and the Student network is used to imitate the ability of the Teacher network and thus achieve knowledge transfer. If the Student network has a smaller network structure compared to the Teacher network, it also achieves model compression. So far, KD methods have been less applied in geological and geochemical data analysis or mineral resources prediction.

The main task of quantitative mineral prediction is to conduct comprehensive analysis of geological, geophysical, geochemical, and remote sensing data and drilling engineering data in the study area based on the research of geological background and metallogenic regularity, and then construct mathematical models to effectively extract and identify favorable information on mineralization, carry out data fusion of quantitative mineralization information, construct mineral prediction models, perform mineral prospectivity mapping, and exploration targets delineation.

Compared with using the DAE to reconstruct the anomaly-free samples directly and then detecting the test samples, the Teacher network of KD model can learn the reconstruction ability of all samples and then use the Student network to learn the reconstruction ability of the Teacher network for the normal samples. The test process achieves the detection of anomaly data by calculating the Mean Square Error (MSE) in the results of Student and Teacher networks. Therefore, we propose the STOAD model based on the module of CNN, the framework of KD and the reconstruction ability of DAE, to achieve 3D Mineral Prospectivity Mapping in Zaozigou gold deposit. Results show the STOAD model performs more efficiently and more robustly compared to the DAE model, and three main mineral exploration targets are delineated.

#### **2. Geological Setting and Datasets**

#### *2.1. Geological Setting*

The West Qinling is located at the western part of the Qinling orogenic belt, with the Qilian orogenic belt to the north, the Qaidam block to the west, and the songpan block to the south (Figure 1a) [57–60].

The geotectonic position of the Xiahe-Hezuo area is located at the northwestern part of the West Qinling orogenic belt and at the western extension of the Qinling–Qilian–Kunlun central orogenic belt, whose complex geological structure creates a superior mineralization environment [61].

The Zaozigou Gold deposit is located within the West Qinling fold belt and is a typical epithermal-type gold deposit in the Xiahe-Hezuo area (Figure 1a). The main controlling factors for mineral resources formation within the region are tectonic movement and magmatism [62], with the regional tectonics spreading in a NW direction with developing folds and fracturs. Complex geological structure and magmatism are dominated by Yanshan period intermediate-acid intrusive rocks, which are widely distributed in the form of the batholith, stock, apophysis, and veins [62] (Figure 1b). The Zaozigou Gold deposit is located within the West Qinling fold belt and is a typical epithermal-type gold deposit in the Xiahe-Hezuo area (Figure 1a). The main controlling factors for mineral resources formation within the region are tectonic movement and magmatism [62], with the regional tectonics spreading in a NW direction with developing folds and fracturs. Complex geological structure and magmatism are dominated by Yanshan period intermediate-acid intrusive rocks, which are widely distributed in the form of the batholith, stock, apophysis, and veins [62] (Figure 1b).

The West Qinling is located at the western part of the Qinling orogenic belt, with the Qilian orogenic belt to the north, the Qaidam block to the west, and the songpan block to

The geotectonic position of the Xiahe-Hezuo area is located at the northwestern part

of the West Qinling orogenic belt and at the western extension of the Qinling–Qilian– Kunlun central orogenic belt, whose complex geological structure creates a superior min-

*Minerals* **2022**, *12*, x FOR PEER REVIEW 3 of 22

**2. Geological Setting and Datasets**

the south (Figure 1a) [57–60].

eralization environment [61].

*2.1. Geological Setting*

**Figure 1.** (**a**) Geotectonic location of the study area. NQL: North Qinling tectonic belt; SDS: Shangdan suture zone; CBS: Caibei suture zone; AMS: A'nyemaqen suture zone. (**b**) Geological map of Xiahe-Hezuo area (modified from [62]). 1. Quaternary; 2. Neogene; 3. Cretaceous;4. Upper Triassic Huari group; 5. Middle-Lower Triassic Daheba group; 6. Lower Triassic Jiangligou group; 7. Lower Triassic Guomugou group; 8. Upper Permian Shiguan group; 9. Lower Permian Daguanshan group; 10. Upper Carboniferous Badu group; 11. Intermediate-acid intrusive rocks; 12. Fracture; 13. Angular unconformity; 14. Gold depsit; 15. Copper deposit; 16. Lead depsit; 17. Antimony deposit; 18. Mercury deposit; 19. Iron deposit; 20. Iron–copper deposit; 21. Copper–molybdenum deposit; 22. Copper-tungsten deposit; 23. Polymetallic deposit. **Figure 1.** (**a**) Geotectonic location of the study area. NQL: North Qinling tectonic belt; SDS: Shangdan suture zone; CBS: Caibei suture zone; AMS: A'nyemaqen suture zone. (**b**) Geological map of Xiahe-Hezuo area (modified from [62]). 1. Quaternary; 2. Neogene; 3. Cretaceous;4. Upper Triassic Huari group; 5. Middle-Lower Triassic Daheba group; 6. Lower Triassic Jiangligou group; 7. Lower Triassic Guomugou group; 8. Upper Permian Shiguan group; 9. Lower Permian Daguanshan group; 10. Upper Carboniferous Badu group; 11. Intermediate-acid intrusive rocks; 12. Fracture; 13. Angular unconformity; 14. Gold depsit; 15. Copper deposit; 16. Lead depsit; 17. Antimony deposit; 18. Mercury deposit; 19. Iron deposit; 20. Iron–copper deposit; 21. Copper–molybdenum deposit; 22. Copper-tungsten deposit; 23. Polymetallic deposit.

The Triassic strata are the main stratigraphy for gold deposit. The genesis and spatialtemporal evolution of the intermediate-acidic dike is closely related to gold mineralization in the area, and during the mineralization process, the magmatic rocks not only provide the mineralized material, but also their internal environment is very suitable as an ore-

bearing space, which can be regarded as a significant mineralization indicator for gold mineralization [63]. mineralization [63]. The spatial distribution of mineral deposits is directly controlled by the geotectonic position in the Xiahe-Hezuo area, which plays a major role in the formation of different

The Triassic strata are the main stratigraphy for gold deposit. The genesis and spatialtemporal evolution of the intermediate-acidic dike is closely related to gold mineralization in the area, and during the mineralization process, the magmatic rocks not only provide the mineralized material, but also their internal environment is very suitable as an orebearing space, which can be regarded as a significant mineralization indicator for gold

*Minerals* **2022**, *12*, x FOR PEER REVIEW 4 of 22

The spatial distribution of mineral deposits is directly controlled by the geotectonic position in the Xiahe-Hezuo area, which plays a major role in the formation of different types of gold deposits and is the boundary of the belt from a spatial perspective. The most important types of mineral-controlling structures are fractures and folds in this area [64,65]. The main orebodies of the Zaozigou deposit are produced in NE, NW, and near-SN oriented fracture zones, with two apparent mineralization periods, and post-formation fractures have modified and destroyed the orebodies [66–70]. types of gold deposits and is the boundary of the belt from a spatial perspective. The most important types of mineral-controlling structures are fractures and folds in this area [64,65]. The main orebodies of the Zaozigou deposit are produced in NE, NW, and near-SN oriented fracture zones, with two apparent mineralization periods, and post-formation fractures have modified and destroyed the orebodies [66–70].

The Zaozigou gold deposit is a typical representative of gold deposits associated with intermediate-acid dike rocks in the south of the Xiahe-Hezuo fracture. It is located at approximately 9 km southwest of the Hezuo city, Gansu Province, with convenient access to the mine site (Figure 2a). The main ore-bearing position is between Gully 1 and Gully 4, with a total area of approximately 2.6 km<sup>2</sup> (Figure 2b). The Zaozigou gold deposit is a typical representative of gold deposits associated with intermediate-acid dike rocks in the south of the Xiahe-Hezuo fracture. It is located at approximately 9 km southwest of the Hezuo city, Gansu Province, with convenient access to the mine site (Figure 2a). The main ore-bearing position is between Gully 1 and Gully 4, with a total area of approximately 2.6 km<sup>2</sup> (Figure 2b).

**Figure 2.** (**a**) Regional location of study area. QLA = Qilian tectonic belt; WQL: West Qinling tectonic belt; NQL: North Qaidam tectonic belt; SF1: Wushan–Tianshui–Shangdan suture zone; SF2: Maqu-Nanping–Lueyang suture zone; SG: Songpan–Ganzi tectonic belt; YZ: Yangtze block. (**b**) Geological map of Zaozigou gold deposit (modified from [71]). 1. Quartz diorite porphyrite; 2. Granodioriteporphyry; 3. Plagioclase granite porphyry; 4. Gold orebody; 5. Fracture; 6. Mineral exploration line; 7. Drilling Hole. **Figure 2.** (**a**) Regional location of study area. QLA = Qilian tectonic belt; WQL: West Qinling tectonic belt; NQL: North Qaidam tectonic belt; SF1: Wushan–Tianshui–Shangdan suture zone; SF2: Maqu-Nanping–Lueyang suture zone; SG: Songpan–Ganzi tectonic belt; YZ: Yangtze block. (**b**) Geological map of Zaozigou gold deposit (modified from [71]). 1. Quartz diorite porphyrite; 2. Granodioriteporphyry; 3. Plagioclase granite porphyry; 4. Gold orebody; 5. Fracture; 6. Mineral exploration line; 7. Drilling Hole.

There are only Triassic and Quaternary strata exposed in the Zaozigou gold deposit (Figure 2b). The strata of the area are mainly the lower part of the Gulangdi group (T2*g* of the Middle Triassic formation, and the Quaternary (Q<sup>h</sup> alp) is developed in the intermountain valley. The lower part of the Gulangdi group (T2*g* 1 ) is the main ore-bearing rock, There are only Triassic and Quaternary strata exposed in the Zaozigou gold deposit (Figure 2b). The strata of the area are mainly the lower part of the Gulangdi group (T2*g* 1 ) of the Middle Triassic formation, and the Quaternary (Q<sup>h</sup> alp) is developed in the intermountain valley. The lower part of the Gulangdi group (T2*g* 1 ) is the main ore-bearing rock, which is a set of fine clastic rocks consisting of siliceous slate, clastic feldspathic fine sandstone

which is a set of fine clastic rocks consisting of siliceous slate, clastic feldspathic fine sandstone with interbedded siltstone slate and argillaceous slate. The formation is composed

1 ) with interbedded siltstone slate and argillaceous slate. The formation is composed of a large sedimentary cycle from bottom to top consisting of siltstone → argillaceous slate → calcareous slate [66,72].

The fractures are well developed and have a complex morphology, forming an intersecting trend, indicating that the area has experienced multiple phases of geological activity. Fractures strictly control the distribution of orebodies and vein rocks within the deposit. They can be classified into five groups of orientations, namely NW, N-S, NE, E-W, and NNE [73,74]. Fracture structures are the structural surfaces of mineralization, and these structural surfaces control the spreading characteristics of the orebodies [75].

Intermediate-acid dike, including fine crystalline diorite, diorite porphyrite, biotite dioritic porphyrite, and quartz diorite porphyrite densely produced with a porphyritic structure, spreading in NNE direction, turning to a nearly N-S direction in the western part of the deposit and a few NW directions. Influenced by the regional multi-period deep fracture activity, the multi-period magmatism has overlapped on multi-phases mineralization within the deposit [76].

There are 147 gold orebodies have been found in the Zaozigou gold deposit, of which 17 orebodies are main ore bodies with gold reserves greater than 1 tonne, and the total gold reserves of more than 100 tonnes [71]. According to the spatial distribution and combination of the mineralization, Zaozigou deposit can be divided into eastern and western ore groups.

The Eastern ore group is mainly located between Gully 1 and Gully 3 with the strike of NE orientation, containing Au1 controlled by F24, Au9 controlled by F21, and Au15 controlled by F25. These orebodies extend over 1000 m long and 300 m wide, with a NW direction tendency and steep dip near the ground surface, locally nearly upright. In the deep, these orebodies have been staggered by gently dipping fracture, causing the tendency to change to SE orientation (Figure 2b). In addition, orebodies M4 and M6 are laying underground, with the strike of NWW orientation, the tendency of SW orientation, and the dip of 8◦~26◦ . These orebodies cross the NE-striking orebodies obliquely, staggering them, and their own mineralization behavior occurs simultaneously [77].

The Western ore group is mainly distributed between Gully 3 and Gully 4, spreading in a nearly N-S direction, with August 29~August 31 as the main orebodies. The orebodies extend over 1000 m long and wider than 500 m, with a strike of 350◦~10◦ , varying tendency and dips greater than 75◦ , locally subvertical. These orebodies extend, and the mineralization is weaker in the steeper parts and stronger in the shallower parts.

#### *2.2. Datesets Description*

Historical geological and geochemical data were completed, including geological reports, geological exploration maps, and drills geochemical data, from the Development Research Center of China Geological Survey and No.3 Geological and Mineral Exploration team, Gansu Provincial Bureau of Geology and Mineral Exploration and Development, the coordinate system used in the mine-scale is Gaussian Kruger projection coordinates.

This study collected 72 drillings data in the Zaozigou gold deposit, established a drilling location database, an assay database, an inclinometry database, and a lithology database. The 3D model of drillings was constructed based on the drill hole data database (Figure 3).

The primary geochemical halo data from the drillings were collected from the "Zaozigou gold successive resources exploration project in Hezuo city, Gansu Province", with a total of 72 drillings and 5028 samples with 12 elements of Ag, As, Au, Cu, Hg, Pb, Zn, Sb, W, Bi, Co, and Mo (the element detection methods can refer literature [71]). The sampling method was the continuous picking method with sample intervals generally within 10 m. Some orebodies or strongly altered areas being sampled at decreased intervals.

**Figure 3.** 3D model of Drillings in Zaozigou gold deposit. (**a**) Plan view of drillings distribution. (**b**) Side view of drillings distribution. (**c**) Sample grade distribution of drillings. **Figure 3.** 3D model of Drillings in Zaozigou gold deposit. (**a**) Plan view of drillings distribution. (**b**) Side view of drillings distribution. (**c**) Sample grade distribution of drillings. *3.1. Deep Auto-Encoder Network*

#### **3. Methodology** 3.1.1. Network Structure Designing

**3. Methodology**

tection.

The primary geochemical halo data from the drillings were collected from the "Zaozi-*3.1. Deep Auto-Encoder Network* The basic end-to-end structure of DAE and STOAD consists of the input layer (Input),

*Minerals* **2022**, *12*, x FOR PEER REVIEW 6 of 22

gou gold successive resources exploration project in Hezuo city, Gansu Province", with a 3.1.1. Network Structure Designing convolutional layer (CONV), activation function layer, and output layer (Output).

total of 72 drillings and 5028 samples with 12 elements of Ag, As, Au, Cu, Hg, Pb, Zn, Sb, W, Bi, Co, and Mo (the element detection methods can refer literature [71]). The sampling The basic end-to-end structure of DAE and STOAD consists of the input layer (Input), convolutional layer (CONV), activation function layer, and output layer (Output). DAE is a method that uses neural networks to compress and reconstruct data, and its core is to use an Encoder to compress the original data *X* to obtain the intermediate vari-

method was the continuous picking method with sample intervals generally within 10 m. Some orebodies or strongly altered areas being sampled at decreased intervals. **3. Methodology** DAE is a method that uses neural networks to compress and reconstruct data, and its core is to use an Encoder to compress the original data *X* to obtain the intermediate variable Z, and then use a Decoder to reconstruct Z to obtain *X*' (Figure 4). It can be widely applied to tasks such as data dimensional reduction, feature extraction, and anomaly detection. able Z, and then use a Decoder to reconstruct Z to obtain *X*' (Figure 4). It can be widely applied to tasks such as data dimensional reduction, feature extraction, and anomaly de-

**Figure 4.** Network structure of DAE. **Figure 4.** Network structure of DAE.

#### 3.1.2. Model Training

In the DAE model, the objective function is to measure the error between the reconstructed sample and the real sample and back propagate to adjust the network parameters. Mean Square Error (MSE) is usually used as the objective function in the reconstruction problem, and the optimization process is mainly divided into three steps: (i) forward propagation for calculating the output, (ii) calculating the loss of reconstruction, and

(iii) back propagation to calculating the gradient according to the loss for optimizing the model parameters.

$$\mathbf{x}^{l} = \sigma\left(\mathbf{z}^{l}\right) = \sigma\left(\mathbf{W}^{l}\mathbf{x}^{l-1} + \mathbf{b}^{l}\right) \tag{1}$$

Equation (1) express the calculation process from the *l* − 1th layer to the *l*th layer of CNN, where *x l* represents the features of the *l*th layer (e.g., *x* 1 represents the input data, *x* 2 represents the output obtained after the first layer of convolution and activation calculation), *b l* represents the bias term of the *l*th layer, *W<sup>l</sup>* represents the weight parameter of the convolution of the *l*th layer, *Z l* represents the intermediate variables of the *l*th layer without the activation function calculation, and *σ* is the activation function. Rectified Linear Unit (ReLU) function is used as the activation function in this paper.

Assuming that the DAE model has a total of *L* layers, the errors of the input data and the reconstructed data are calculated by MSE to obtain the loss value.

$$\text{Loss} = f(W, b, \mathbf{x}, y) = \frac{1}{2} \parallel \sigma \left(\mathbf{z}^{L}\right) - \mathbf{x}^{1} \parallel\_{2}^{2} = \frac{1}{2} \parallel \sigma \left(W^{L}\mathbf{x}^{L-1} + b^{L}\right) - \mathbf{x}^{1} \parallel\_{2}^{2} \tag{2}$$

*δ L* is the partial derivatives of *Loss* to *z L* ,

$$\delta^L = \frac{\partial f(\mathcal{W}, b, \mathbf{x}, y)}{\partial \mathbf{z}^L} = \frac{\partial f(\mathcal{W}, b, \mathbf{x}, y)}{\partial \mathbf{x}^L} \odot \sigma' \left(\mathbf{z}^L\right) \tag{3}$$

$$\delta^{l} = \frac{\partial I(\mathcal{W}, b, \mathbf{x}, y)}{\partial z^{l+1}} \frac{\partial z^{l+1}}{\partial z^{l}} = \left(\frac{\partial z^{l+1}}{\partial z^{l}}\right)^{T} \delta^{l+1} = \left(\mathcal{W}^{l+1}\right)^{T} \delta^{l+1} \odot \sigma' \left(z^{l}\right) \tag{4}$$

The partial derivatives of *W* and *b* are calculated from the losses, and  is the Hadamard product.

$$\frac{\partial f(\mathcal{W}, b, \mathbf{x}, \mathbf{y})}{\partial \mathcal{W}^{l}} = \frac{\partial f(\mathcal{W}, b, \mathbf{x}, \mathbf{y})}{\partial \mathbf{z}^{l}} \frac{\partial \mathbf{z}^{l}}{\partial \mathbf{w}^{l}} = \delta^{l} \left(\mathbf{x}^{l-1}\right)^{\mathsf{T}};\\\frac{\partial f(\mathcal{W}, b, \mathbf{x}, \mathbf{y})}{\partial \mathbf{b}^{l}} = \frac{\partial f(\mathcal{W}, b, \mathbf{x}, \mathbf{y})}{\partial \mathbf{z}^{l}} \frac{\partial \mathbf{z}^{l}}{\partial \mathbf{b}^{l}} = \delta^{l} \tag{5}$$

The gradient of the back-propagation of each convolutional layer can be calculated by Equation (5), and finally the convolutional kernel parameters are updated by the following two equations. *α* is the learning rate.

$$\mathcal{W}^{l+1} = \mathcal{W}^{l+1} - \alpha \frac{\partial f(\mathcal{W}, b, \mathbf{x}, y)}{\partial \mathcal{W}^{l+1}} \tag{6}$$

$$b^{l+1} = b^{l+1} - a \frac{\partial I(W, b, \mathbf{x}, y)}{\partial b^{l+1}} \tag{7}$$

where *l* = 1, 2, 3 . . . , *L* − 1, the above steps are repeated to achieve the training of the DAE network.

#### 3.1.3. Model Testing

In the testing process, the test data *X* are reconstructed by DAE to obtain *X*', MSE between *X* and *X*' is normalized. Setting the threshold value *µ,* when the MSE value is greater than *µ*, corresponding samples are judged as anomalies, otherwise, the sample is normal. The calculation equations are as follows.

$$\text{Error}(X) = \|\|X - X'\|\|\_2^2 \tag{8}$$

$$\text{Predict}(X) = \text{MinMax}\_{0-1}\left\{ \|\begin{array}{c} X - X' \end{array}\|\_2^2 \right\} = \begin{cases} 1, & \text{Predict}(X) \ge \mu \\ 0, & \text{Predict}(X) < \mu \end{cases} \tag{9}$$

#### *3.2. KD-Based STOAD Model 3.2. KD-Based STOAD Model* 3.2.1. Network Structure Design

### 3.2.1. Network Structure Design

*Minerals* **2022**, *12*, x FOR PEER REVIEW 8 of 22

KD [78] is a mainstream method applied to model compression, which mainly consists of the Teacher network (T) and Student network (S). The larger T network is usually trained first for the tasks of image classification, target detection, semantic segmentation, etc., and then the smaller S network is trained to learn the generalization ability of the T network. Compared with T-networks, S-networks have fewer parameters and faster computation speed. KD [78] is a mainstream method applied to model compression, which mainly consists of the Teacher network (T) and Student network (S). The larger T network is usually trained first for the tasks of image classification, target detection, semantic segmentation, etc., and then the smaller S network is trained to learn the generalization ability of the T network. Compared with T-networks, S-networks have fewer parameters and faster computation speed. Different information will be contained at different depth of the network. For fully

Different information will be contained at different depth of the network. For fully extract metallogenic information, the STOAD model is proposed to achieve anomaly detection of layers based on calculating the differences between T encoder and S network. STOAD is divided into two structures: The T network and the S network, where the T network has the same Encoder and Decoder as the DAE, and the S network has the same structure as the Encoder in the T network. Especially, three scales output are designing to express different information at different depth of the network. extract metallogenic information, the STOAD model is proposed to achieve anomaly detection of layers based on calculating the differences between T encoder and S network. STOAD is divided into two structures: The T network and the S network, where the T network has the same Encoder and Decoder as the DAE, and the S network has the same structure as the Encoder in the T network. Especially, three scales output are designing to express different information at different depth of the network.

In the training process, STOAD model uses all layers to pre-train the T network for reconstruction. The encoder in the T network can be recognized as having the ability to compress the layers, and thus can be seen as having the ability to handle both normal and abnormal data [79]. If the T network is not trained, the anomaly detection can be achieved, but the feature extraction of the T network is not clear, and the interpretability is poor. Sequentially, taking normal data as input, training the S network, calculating the three scale output differences of T encoder and S network using MSE, while *h<sup>i</sup>* , *i* = 1, 2, 3, are the scales represents different depth of the network. Finally, back-propagates to update the parameters of the S network, the S network only has the capacity of handling normal data learned from the encoder of T network. In the training process, STOAD model uses all layers to pre-train the T network for reconstruction. The encoder in the T network can be recognized as having the ability to compress the layers, and thus can be seen as having the ability to handle both normal and abnormal data [79]. If the T network is not trained, the anomaly detection can be achieved, but the feature extraction of the T network is not clear, and the interpretability is poor. Sequentially, taking normal data as input, training the S network, calculating the three scale output differences of T encoder and S network using MSE, while *hi*, *i* = 1,2,3, are the scales represents different depth of the network. Finally, back-propagates to update the parameters of the S network, the S network only has the capacity of handling normal data learned from the encoder of T network.

In the testing process, taking all data as input. Since the S network only learns the ability of dealing with normal data from the encoder of T network, the S network is difficult to maintain the similar multiscale output features with the T network because of a larger MSE. Therefore, the anomaly detection can be achieved by the MSE values (Figure 5). In contrast to DAE, STOAD not only has three outputs, but also uses multiple levels of semantic and detailed information to make full use of the information in the data for anomaly detection. In the testing process, taking all data as input. Since the S network only learns the ability of dealing with normal data from the encoder of T network, the S network is difficult to maintain the similar multiscale output features with the T network because of a larger MSE. Therefore, the anomaly detection can be achieved by the MSE values (Figure 5). In contrast to DAE, STOAD not only has three outputs, but also uses multiple levels of semantic and detailed information to make full use of the information in the data for anomaly detection.

**Figure 5. Figure 5.** STOAD network structure. STOAD network structure.

### 3.2.2. Model Training and Testing

The training loss of STOAD consists of two parts, which are the training loss of the T network and the S network

The overall T-network can be considered as a DAE, and MSE is used as the reconstruction *loss* of the T-network, so that the encoder part of the T-network holds the data compress capability. The training *loss* of the T network can be calculated as follow equation.

$$Loss\_{\text{teacher}} = \frac{1}{2} \parallel X - X\prime\parallel\_2^2 \tag{10}$$

The *loss* of the S network consists of the MSEs of the three scales output of the T network. In order for the S network to learn the normal data compression ability from the T network, let *h*<sup>1</sup> = *h*<sup>2</sup> = *h*<sup>3</sup> = 1/3, and the training *loss* of the S network is calculated by Equation (8).

$$Loss\_{\text{student}} = \sum\_{i=1}^{3} \frac{1}{2} h\_i \parallel X\_{\text{t}\_i} - X\_{\text{s}\_i} \parallel\_2^2 \tag{11}$$

The STOAD model training and testing process is mainly as follows.


$$Loss\_{\text{student}} = f(\mathcal{W}, b, \mathbf{x}, \mathbf{y}) = \sum\_{i=1}^{3} \frac{1}{2} h\_i \parallel \sigma \left( \mathcal{W}\_{\text{t}}^{i+1} \mathbf{x}^i + b\_{\mathbf{t}}^{i+1} \right) - \sigma \left( \mathcal{W}\_{\text{s}}^{i+1} \mathbf{x}^i + b\_{\mathbf{s}}^{i+1} \right) \parallel\_2^2 \tag{12}$$

*δ L* is the partial derivatives of *z <sup>L</sup>* of S network.

$$\delta^L = \frac{\partial J(\mathcal{W}, b, \mathbf{x}, y)}{\partial \mathbf{z}^L} = \sum\_{i=1}^3 \frac{1}{2} h\_i \frac{\partial J(\mathcal{W}, b, \mathbf{x}, y)}{\partial \mathbf{x}^L} \odot \sigma' \left(\mathbf{z}^L\right) \tag{13}$$

and the *δ l* is, *i* is 1 to *L* − 1:

$$\boldsymbol{\delta}^{l} = \frac{\partial \boldsymbol{I}(\boldsymbol{W}, \boldsymbol{b}, \boldsymbol{x}, \boldsymbol{y})}{\partial \boldsymbol{z}^{l+1}} \frac{\partial \boldsymbol{z}^{l+1}}{\partial \boldsymbol{z}^{l}} = \left(\frac{\partial \boldsymbol{z}^{l+1}}{\partial \boldsymbol{z}^{l}}\right)^{T} \boldsymbol{\delta}^{l+1} = \left(\boldsymbol{W}^{l+1}\right)^{T} \boldsymbol{\odot} \boldsymbol{\sigma}^{\prime} \left(\boldsymbol{z}^{l}\right) \boldsymbol{\delta}^{l+1} \tag{14}$$

$$\frac{\partial J(\mathcal{W}, b, \mathbf{x}, y)}{\partial \mathcal{W}^l} = \frac{\partial J(\mathcal{W}, b, \mathbf{x}, y)}{\partial z^l} \frac{\partial z^l}{\partial w^l} = \delta^l \left(\mathbf{x}^{l-1}\right)^T \tag{15}$$

$$\frac{\partial J(\mathcal{W}, b, \mathbf{x}, y)}{\partial b^{l}} = \frac{\partial J(\mathcal{W}, b, \mathbf{x}, y)}{\partial z^{l}} \frac{\partial z^{l}}{\partial b^{l}} = \delta^{l} \tag{16}$$

Convolutional module is updated by the following two equations. *α* is the learning rate.

$$\mathcal{W}^{l+1} = \mathcal{W}^{l+1} - \alpha \frac{\partial f(\mathcal{W}, b, \mathbf{x}, y)}{\partial \mathcal{W}^{l+1}} \tag{17}$$

$$b^{l+1} = b^{l+1} - \alpha \frac{\partial I(\mathcal{W}, b, \mathbf{x}, y)}{\partial b^{l+1}} \tag{18}$$

where *l* = 1, 2, . . . , *L* − 1.

In the testing process, we used T and S networks to calculate the testing dataset, separately. Defining the weights of three scales are *Wh*<sup>1</sup> = *Wh*<sup>2</sup> = *Wh*<sup>3</sup> = 1/3, the weighted sum of the output can be calculated by Equation (19), where *X*t<sup>i</sup> and *X*s<sup>i</sup> represent the output of the Encoder part of the T network and the S network, respectively. Predict (*x*) is

the normalized Error (*x*), selecting a threshold *µ*, when the Predict (*x*) is greater than *µ* the sample point is judged to be an abnormal sample, otherwise it is a normal sample.

$$\text{Error}(\mathbf{x}) = \sum\_{i=1}^{3} h\_i \parallel X\_{t\_i} - X\_{s\_i} \parallel\_2^2 \tag{19}$$

$$\text{Predict}(\mathbf{x}) = \text{MinMax}\_{\mathbf{0}-1} \{ \sum\_{i=1}^{3} h\_i \parallel \mathbf{X}\_{\mathbf{t}\_i} - \mathbf{X}\_{\mathbf{s}\_i} \parallel ^2\_2 \} = \begin{cases} 1, & \text{Predict}(\mathbf{x}) \ge \mu \\ 0, & \text{Predict}(\mathbf{x}) < \mu \end{cases} \tag{20}$$

#### **4. Results** the normalized Error (*x*), selecting a threshold *μ*, when the Predict (*x*) is greater than *μ* the

#### *4.1. Geological and Geochemical Quantitative Prediction Model at Depth of Zaozigou Gold Deposit* sample point is judged to be an abnormal sample, otherwise it is a normal sample.

Before carrying out the deep quantitative mineral prospectivity mapping in the Zaozigou gold deposit, it is necessary to review the geological and geochemical quantitative prediction model established in previous work [80]. Error() = ∑ℎ ∥ t<sup>i</sup> − s<sup>i</sup> ∥2 2 3 =1 (19)

Orebodies are strictly controlled by fractures in the Zaozigou gold deposit. The 30 m buffer zone of the fractures can effectively reflect the influence range of fracture, which can be used as a mineral prediction indicator. Factor F4 is an element association of Sb-Hg, which has a close relationship with fractures, and can be used as a favorable indicator for inferring deep fractures. The middle anomaly area of Au extracted by the multiple fractal C-V model can well reflect the spatial distribution of orebodies, which should be used as an important quantitative indicator. Near-ore halo element association B2 is extracted by the knowledge-driven CoDA also can express the location of orebodies well, it should be another mineral prediction indicator. The ratio of front halo to tail halo is an important geochemical parameter for predicting orebodies, B1/B3 is regarded as a prediction indicator accordingly (Table 1; Figure 6). The detailed data analysis and interpretation can refer to the literature [80]. Predict () = 0−1 {∑ℎ ∥ t<sup>i</sup> − s<sup>i</sup> ∥2 2 3 =1 } = { 1, () ≥ 0, () < (10) **4. Results** *4.1. Geological and Geochemical Quantitative Prediction Model at Depth of Zaozigou Gold Deposit* Before carrying out the deep quantitative mineral prospectivity mapping in the Zaozigou gold deposit, it is necessary to review the geological and geochemical quantitative prediction model established in previous work [80]. Orebodies are strictly controlled by fractures in the Zaozigou gold deposit. The 30 m

**Table 1.** Geological and geochemical quantitative mineral resource prediction model at depth of the Zaozigou gold deposit. buffer zone of the fractures can effectively reflect the influence range of fracture, which can be used as a mineral prediction indicator. Factor F4 is an element association of Sb-Hg, which has a close relationship with fractures, and can be used as a favorable indicator


can refer to the literature [80].

**Figure 6.** *Cont*.

**Figure 6.** (**a**) Fracture buffer zone; (**b**) Element association of fracture; (**c**) Au anomaly; (**d**) Element association of near-ore halo; (**e**) Geochemical parameter (Front halo/tail halo). **Figure 6.** (**a**) Fracture buffer zone; (**b**) Element association of fracture; (**c**) Au anomaly; (**d**) Element association of near-ore halo; (**e**) Geochemical parameter (Front halo/tail halo).

#### **Table 1.** Geological and geochemical quantitative mineral resource prediction model at depth of the *4.2. Training Sample Selection*

Zaozigou gold deposit. **Ore-Forming Factor Description Prediction Indicator Variables** Geology fracture Influence range of fracture 30 m buffer zone Element association of fracture Hg-Sb (F4) Ore-forming For mineralization prediction, the target variable (i.e., the label of training samples) is either mineralized or nonmineralized (denoted by 1 and 0, respectively). A total of 39,306 positive samples extracted from known orebodies and 49,686 negative samples extracted from the nonmineralized position confirmed by drillings are used as training dataset. In contrast to mineralization generates concentrated in limited space, the nonmineralization is a widespread phenomenon, and negative samples are selected to be distributed as randomly and uniformly as possible throughout the study area [81] (Figure 7). *Minerals* **2022**, *12*, x FOR PEER REVIEW 12 of 22

(B3)

**Figure 7.** Distribution of positive samples and negative samples. **Figure 7.** Distribution of positive samples and negative samples.

#### *4.3. DAE-Based 3D MPM 4.3. DAE-Based 3D MPM*

GeForce RTX 3090.

(Figure 9).

DAE is a widely used unsupervised model in deep learning. In this study, the DAE model is constructed by convolution, and the training process automatically learns to extract the semantic features of the normal evidence layer to compress and reconstruct the corresponding layers. In the testing process, detecting ore-induced anomalies by MSE value, the higher the MSE value is, the greater the loss of reconstruction is, which illus-DAE is a widely used unsupervised model in deep learning. In this study, the DAE model is constructed by convolution, and the training process automatically learns to extract the semantic features of the normal evidence layer to compress and reconstruct the corresponding layers. In the testing process, detecting ore-induced anomalies by MSE value, the higher the MSE value is, the greater the loss of reconstruction is, which illustrates the high-probability area of ore-forming.

trates the high-probability area of ore-forming. This DAE model is built using python 3.8, PyTorch 1.71 framework, the optimizer is Adam, the batch size is 16, the learning rate is 0.001, the learning rate decay is 10% per 5 This DAE model is built using python 3.8, PyTorch 1.71 framework, the optimizer is Adam, the batch size is 16, the learning rate is 0.001, the learning rate decay is 10% per

Epochs, the CPU is 12th Gen Intel(R) Core(TM) i7-127OOKF, and the GPU is NVIDIA

the remaining 30% of the normal samples and the relatively small number of anomaly samples. The DAE model is trained for 500 Epochs and the loss function is plotted as follows. The model relatively converges at an MSE value of about 0.025, but the MSE fluctuates wider (Figure 8). In Figures 4 and 6, the horizontal axis indicates the sampling order and the vertical axis indicates the loss of DAE. The prediction parameters of precision value reaches 93%, Recall value is 88%, and F1-measure is 90%, which demonstrate the high accuracy and reliability of DAE-based MPM. It clearly shows that the high ore-forming probability at depth of Zaozigou gold deposit, which is worth for further investigation 5 Epochs, the CPU is 12th Gen Intel(R) Core(TM) i7-127OOKF, and the GPU is NVIDIA GeForce RTX 3090.

DAE model uses 70% of the normal data for training, and the testing dataset includes the remaining 30% of the normal samples and the relatively small number of anomaly samples. The DAE model is trained for 500 Epochs and the loss function is plotted as follows. The model relatively converges at an MSE value of about 0.025, but the MSE fluctuates wider (Figure 8). In Figures 4 and 6, the horizontal axis indicates the sampling order and the vertical axis indicates the loss of DAE. The prediction parameters of precision value reaches 93%, Recall value is 88%, and F1-measure is 90%, which demonstrate the high accuracy and reliability of DAE-based MPM. It clearly shows that the high ore-forming probability at depth of Zaozigou gold deposit, which is worth for further investigation (Figure 9). *Minerals* **2022**, *12*, x FOR PEER REVIEW 13 of 22 *Minerals* **2022**, *12*, x FOR PEER REVIEW 13 of 22

**Figure 8.** The training loss of DAE model. **Figure 8.** The training loss of DAE model. **Figure 8.** The training loss of DAE model.

*4.4. STOAD-Based 3D Mineral Prospectivity Mapping* **Figure 9.** DAE-based 3D MPM. **Figure 9.** DAE-based 3D MPM.

*4.4. STOAD-Based 3D Mineral Prospectivity Mapping*

struction ability of Decoder.

Epochs.

Epochs.

anomaly samples.

struction ability of Decoder.

anomaly samples.

ReLU activation function is used for its capability of dealing with nonlinear data.

The STOAD model is implemented using python 3.8, and PyTorch 1.71 framework. The T network is mainly composed of two parts: encoder and Decoder. The structure of T

The S network only has the encoder part, which is consistent with the T encoder.

The same training parameters are used in the T and S networks, with an optimizer of Adam, a batch size of 16, a learning rate of 0.001, and a learning rate decay of 10% per 5

Similarly, STOAD uses 70% of the normal data for training, and the testing dataset includes the remaining 30% of the normal samples and the relatively small number of

ReLU activation function is used for its capability of dealing with nonlinear data.

The S network only has the encoder part, which is consistent with the T encoder.

The same training parameters are used in the T and S networks, with an optimizer of Adam, a batch size of 16, a learning rate of 0.001, and a learning rate decay of 10% per 5

Similarly, STOAD uses 70% of the normal data for training, and the testing dataset includes the remaining 30% of the normal samples and the relatively small number of

The T network is mainly composed of two parts: encoder and Decoder. The structure of T network is symmetric for letting the number of parameters of both parts identical, and thus the data compression ability of encoder is approximately equal to the data recon-

#### *4.4. STOAD-Based 3D Mineral Prospectivity Mapping*

The STOAD model is implemented using python 3.8, and PyTorch 1.71 framework. The T network is mainly composed of two parts: encoder and Decoder. The structure of T network is symmetric for letting the number of parameters of both parts identical, and thus the data compression ability of encoder is approximately equal to the data reconstruction ability of Decoder.

The S network only has the encoder part, which is consistent with the T encoder. ReLU activation function is used for its capability of dealing with nonlinear data.

The same training parameters are used in the T and S networks, with an optimizer of Adam, a batch size of 16, a learning rate of 0.001, and a learning rate decay of 10% per 5 Epochs. *Minerals* **2022**, *12*, x FOR PEER REVIEW 14 of 22

> Similarly, STOAD uses 70% of the normal data for training, and the testing dataset includes the remaining 30% of the normal samples and the relatively small number of anomaly samples. The STOAD model is trained with 500 Epochs, and the Teacher loss function and the

> The STOAD model is trained with 500 Epochs, and the Teacher loss function and the Student loss function are plotted in Figure 10, the T network converges at about MSE value of 0.025 (similar to the above DAE), and fluctuates wider. The MSE curve of the S network converges at a lower value of about 0.0125, and distributes more stable than T network. The loss of the T network is the same as DAE indicating the ability to reconstruct all training data, the loss of the S network represents the difference in output between the S and T networks, and the more stable loss of the S network means that the S network can learn better the ability of the T network to reconstruct normal data. Student loss function are plotted in Figure 10, the T network converges at about MSE value of 0.025 (similar to the above DAE), and fluctuates wider. The MSE curve of the S network converges at a lower value of about 0.0125, and distributes more stable than T network. The loss of the T network is the same as DAE indicating the ability to reconstruct all training data, the loss of the S network represents the difference in output between the S and T networks, and the more stable loss of the S network means that the S network can learn better the ability of the T network to reconstruct normal data.

**Figure 10.** The training loss of STOAD model. **Figure 10.** The training loss of STOAD model.

11).

The prediction parameter of Precision value reaches 96%, Recall value is 90%, and F1 The prediction parameter of Precision value reaches 96%, Recall value is 90%, and F1 value is 93%.

value is 93%. The 3D MPM based on STOAD model shows the high ore-forming probability at depth of Zaozigou gold deposit which has the similar characteristics ore-forming proba-The 3D MPM based on STOAD model shows the high ore-forming probability at depth of Zaozigou gold deposit which has the similar characteristics ore-forming probability distribution at depth. It gives evidence for deep mineral resources prediction (Figure 11).

bility distribution at depth. It gives evidence for deep mineral resources prediction (Figure

**Figure 11.** STOAD-based 3D MPM. **Figure 11.** STOAD-based 3D MPM. **Figure 11.** STOAD-based 3D MPM.

#### **5. Discussion 5. Discussion 5. Discussion**

and 11).

In this study, the deep learning methods of DAE and STOAD model are carried out for 3D MPM in Zaozigou gold deposit. In this study, the deep learning methods of DAE and STOAD model are carried out for 3D MPM in Zaozigou gold deposit. In this study, the deep learning methods of DAE and STOAD model are carried out for 3D MPM in Zaozigou gold deposit.

Compared with the DAE model, STOAD model has a higher precision in detection of ore-induced anomalies, which demonstrates a higher reliability of 3D MPM (Figures 9 Compared with the DAE model, STOAD model has a higher precision in detection of ore-induced anomalies, which demonstrates a higher reliability of 3D MPM (Figures 9 and 11). Compared with the DAE model, STOAD model has a higher precision in detection of ore-induced anomalies, which demonstrates a higher reliability of 3D MPM (Figures 9 and 11).

Meanwhile, the ROCs of the two methods were used to evaluate the models, and the AUC values of the two methods were 0.92 of DAE and 0.97 of STOAD. Observably, the STOAD model expresses a higher performance (Figure 12). Meanwhile, the ROCs of the two methods were used to evaluate the models, and the AUC values of the two methods were 0.92 of DAE and 0.97 of STOAD. Observably, the STOAD model expresses a higher performance (Figure 12). Meanwhile, the ROCs of the two methods were used to evaluate the models, and the AUC values of the two methods were 0.92 of DAE and 0.97 of STOAD. Observably, the STOAD model expresses a higher performance (Figure 12).

**Figure 12.** ROC curves of the DAE and STOAD. **Figure 12. Figure 12.** ROC curves of the DAE and STOAD. ROC curves of the DAE and STOAD.

From the points of algorithm theory, the STOAD model is proposed based on the KD idea combined with the DAE network, which is an improvement of DAE. So, it is better than the conventional DAE model in terms of algorithm performance. From the points of algorithm theory, the STOAD model is proposed based on the KD idea combined with the DAE network, which is an improvement of DAE. So, it is better than the conventional DAE model in terms of algorithm performance.

From the points of application, the results of quantitative mineral resources prediction show that STOAD is more accurate and robust, which is demonstrated by prediction parameters, the loss curve (Figures 8 and 10) and the ROC curve (Figure 12). From the points of application, the results of quantitative mineral resources prediction show that STOAD is more accurate and robust, which is demonstrated by prediction parameters, the loss curve (Figures8 and 10) and the ROC curve (Figure 12).

Furthermore, a total of 4 methods, two machine learning methods (MaxEnt model and GMM) and two deep learning methods (DAE and STOAD model), have been carried out for 3D MPM (Figure 13). The machine learning-based 3D MPM can refer to the literature [80], which is the previous work of our research team. Furthermore, a total of 4 methods, two machine learning methods (MaxEnt model and GMM) and two deep learning methods (DAE and STOAD model), have been carried out for 3D MPM (Figure 13). The machine learning-based 3D MPM can refer to the literature [80], which is the previous work of our research team.

**Figure 13.** Comparison of the 3D MPM of 4 methods. (**a**) MaxEnt mode. (**b**) GMM. (**c**) DAE model. (**d**) STOAD model. **Figure 13.** Comparison of the 3D MPM of 4 methods. (**a**) MaxEnt mode. (**b**) GMM. (**c**) DAE model. (**d**) STOAD model.

The prediction results of the four methods exhibit a high correlation with the known orebodies in the shallow part and a slight difference in the deep part (Figure 13). From the comparative analysis of the prediction results and model performance among the four methods, the STOAD takes advantage of mathematical theory and has the highest AUC value (Figure 14). Especially, the high probability area can better reflect the metallogenic regularity at large depth. Therefore, it is decided to delineate a comprehensive mineral exploration targets at large depth of the Zaozigou based on the prediction results of STOAD model and supported by the prediction results of other methods. In order to improve the prediction accuracy, the area with a logical probability greater than 0.75 and less than 0.998 is defined as a medium potential area, and the area with a logical probability greater than 0.998 is defined as a high potential area, based on which three main mineral exploration targets are circled (Figure 15). The prediction results of the four methods exhibit a high correlation with the known orebodies in the shallow part and a slight difference in the deep part (Figure 13). From the comparative analysis of the prediction results and model performance among the four methods, the STOAD takes advantage of mathematical theory and has the highest AUC value (Figure 14). Especially, the high probability area can better reflect the metallogenic regularity at large depth. Therefore, it is decided to delineate a comprehensive mineral exploration targets at large depth of the Zaozigou based on the prediction results of STOAD model and supported by the prediction results of other methods. In order to improve the prediction accuracy, the area with a logical probability greater than 0.75 and less than 0.998 is defined as a medium potential area, and the area with a logical probability greater than 0.998 is defined as a high potential area, based on which three main mineral exploration targets are circled (Figure 15).

Target I is located at elevation 1600~2000 m, belongs to the NE-orientation orebody group, which should be the extension of the Au1 orebody. In this position, we developed the main ore-forming fracture. The Au and Sb concentration in this position, and the ratio of front halo to tail halo has been increasing. Especially, it appears high logical probability calculated by all four methods at this position, indicating the Au1 orebody may extends deeper than the elevation of 1800 m or a concealed orebody exist therein.

Target II is located in the NW-orientation orebody group at the elevation of 2000~2400 m, where the fractures distribute complex, and the tail halo element Co has a significant vertical overlap with the front halo elements Sb and As. The elevated geochemical parameters starting from 2600 m to deep indicate that there may be an extension of ore bodies or new blind orebodies at depth. According to the theory of primary halo zonation, the front halo indicators generally appear at the leading edge of the ore body 100–300 m, and the blind ore body will appear at an altitude of 2400 m deep.

**Figure 14.** ROC curves of 4 methods, MaxEnt model, GMM, DAE model and STOAD model. **Figure 14.** ROC curves of 4 methods, MaxEnt model, GMM, DAE model and STOAD model. **Figure 14.** ROC curves of 4 methods, MaxEnt model, GMM, DAE model and STOAD model.

**Figure 15.** Comprehensive mineral exploration targets. **Figure 15.** Comprehensive mineral exploration targets. **Figure 15.** Comprehensive mineral exploration targets.

Target I is located at elevation 1600~2000 m, belongs to the NE-orientation orebody group, which should be the extension of the Au1 orebody. In this position, we developed the main ore-forming fracture. The Au and Sb concentration in this position, and the ratio of front halo to tail halo has been increasing. Especially, it appears high logical probability calculated by all four methods at this position, indicating the Au1 orebody may extends deeper than the elevation of 1800 m or a concealed orebody exist therein. Target II is located in the NW-orientation orebody group at the elevation of 2000~2400 m, where the fractures distribute complex, and the tail halo element Co has a Target I is located at elevation 1600~2000 m, belongs to the NE-orientation orebody group, which should be the extension of the Au1 orebody. In this position, we developed the main ore-forming fracture. The Au and Sb concentration in this position, and the ratio of front halo to tail halo has been increasing. Especially, it appears high logical probability calculated by all four methods at this position, indicating the Au1 orebody may extends deeper than the elevation of 1800 m or a concealed orebody exist therein. Target II is located in the NW-orientation orebody group at the elevation of 2000~2400 m, where the fractures distribute complex, and the tail halo element Co has a Target III is located below 1600 m, it can be divided into targets III-1, III-2, and III-3. Target III-1 is located at elevation of about 1300 m, target III-2 located at elevation of about 1000 m., target III-3 located at elevation of 700 m. The deep fracture is the main indicator. It is worth mentioning that the deep drill for scientific research has achieved in 2021, which is the deepest drill of Zaozigou gold deposit, which reaches to the elevation of 700 m (Figure 15). In previous study, there may be an "upper veins and lower layers" distribution feature of orebodies [82]. The shallow stratigraphy is strongly folded, and steeply dipping fracture structures are developed; the orebody is controlled by fractures and vein rocks,

while the deep orebody may be mainly controlled by intrusive mass to form laminated ore bodies. This understanding is also reflected in the present prediction targets.

#### **6. Conclusions**

To address the scientific problem of quantitative mineralization prediction at large depth, the geological and geochemical quantitative mineral resources prediction model at depth of the Zaozigou gold deposit is constructed based on the quantitative extraction of mineralization signatures, which consists of five quantitative prediction indicators, including 30 m fracture buffer zone, element association (Sb-Hg) of ore-bearing structures, metallogenic element Au, near-ore halo element association Au-Ag-Cu-Pb-Zn, and the ratio of front halo to tail halo (As-Sb-Hg)/(W-Mo-Bi). Then, two machine learning methods (MaxEnt model and GMM) and two deep learning methods (DAE and STOAD model) are carried out for 3D MPM, and mineral exploration targets are delineated at depth (Figure 15).

Because the primary halo of drills is limited by the engineering-controlled boundaries, this study adopts the idea of quantitative expression of ore-controlling factors by element associations and limit the deep inference range within the modeling space to perform deep quantitative mineral prediction. However, the accuracy of the prediction of machine learning and deep learning methods inevitably decreases rapidly with increasing depth. This shortcoming can be supplemented by two aspects of quantitative inversion of geophysical data and numerical simulation of deep mineralization dynamics, which is also a direction worth further research.

In summary, a series of methods are developed in this study of 3D Mineral Prospectivity Mapping of Zaozigou gold deposit, West Qinling, China. The achieved results by machine learning and deep learning methods shows well performance in quantitative mineral resources prediction at depth, which is worth being promoted to similar ore deposits. Moreover, it was concluded that the deep learning-based approach works better with larger amounts of data, and the STOAD proposed in this paper has a higher accuracy in anomaly detection. May the relevant ideas of this study provide a reference for related researchers.

**Author Contributions:** Ideas, B.L. and Z.Y.; Methodology, Z.Y. and M.X.; software, M.X., Y.W., Y.K., C.L. and G.C.; writing—original draft preparation, Z.Y., B.L., C.L., Y.W., Y.K. and Y.G.; writing review and editing, Z.Y. and B.L.; visualization, M.X., Y.K., S.Z., H.Z., L.W. and R.T.; supervision, B.L.; funding acquisition, B.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Key Research and Development Program of China (Grants 2017YFC0601505); the National Natural Science Foundation of China (Grants 41602334; 42072322); the Key Laboratory of Geochemical Exploration, Ministry of Natural Resources (Grant AS2019P02-01); Sichuan Science and Technology Program (Grant 2022NSFSC0510); and the Opening Fund of the Geomathematics Key Laboratory of Sichuan Province (Grant scsxdz2020yb06, scsxdz2021zd04).

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors thank the anonymous reviewers and the editors for their hard work to this paper. We are grateful to the Development Research Center of China Geological Survey and No. 3 Geological and Mineral Exploration team, Gansu Provincial Bureau of Geology and Mineral Exploration and Development for their data support.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

