**1. Introduction**

The main task of quantitative mineral prediction is to conduct comprehensive analysis of geological, geophysical, geochemical and remote sensing data and drilling engineering data in the study area based on the research of the geological background and metallogenic regularity, and then construct mathematical models to effectively extract and identify favorable information on mineralization, carry out data fusion of quantitative mineralization information, construct mineral prediction models, perform potential mineral resources quantitative assessment and exploration targets delineation [1–3].

**Citation:** Kong, Y.; Chen, G.; Liu, B.; Xie, M.; Yu, Z.; Li, C.; Wu, Y.; Gao, Y.; Zha, S.; Zhang, H.; et al. 3D Mineral Prospectivity Mapping of Zaozigou Gold Deposit, West Qinling, China: Machine Learning-Based Mineral Prediction. *Minerals* **2022**, *12*, 1361. https://doi.org/10.3390/ min12111361

Academic Editor: Martiya Sadeghi

Received: 31 July 2022 Accepted: 16 October 2022 Published: 26 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Quantitative mineral resource prediction is developing toward deep 3D quantitative prediction. The framework of 3D quantitative mineral prediction has been basically completed, among which the representative works include Xiao [4–10], who studied the characteristics of large-scale three-dimensional mineralization prediction and summarized the combined prediction process of mine concession structure, mineral exploration model, metallogenic series and quantitative analysis methods, which laid the foundation for three-dimensional mineral resource prediction; Chen [11–14] proposed the "cubic prediction model" for mineralization prediction by 3D-comprehensive 3D modeling of geology, geophysics, geochemistry and remote sensing [15–17]; Mao [18–22] proposed a three-dimensional prediction process for deep mineral resources prediction, the threedimensional morphological analysis of geological bodies, geological anomaly extraction and three-dimensional quantitative prediction methods [23]; Yuan [24,25] proposed a "four-step" three-dimensional mineralization prediction method, which summarizes the three-dimensional mineralization prediction work into four key steps: three-dimensional geological database construction, three-dimensional geological modeling and mapping, three-dimensional prediction information deep mining and three-dimensional geosciences data fusion and prediction.

In this study, three-dimensional quantitative mineral resources prediction is executed from a geological and geochemical perspective. It is a common understanding in the last century that the distribution of rock geochemical elements follows normal or lognormal distribution, based on which classical statistical theories and methods have been widely applied in the field of geochemical data analysis [26]. Since the 1990s, fractal and multifractal methods have been developed rapidly, with representative achievement from Cheng's team, who proposed the content-area (C-A) model [26,27], the spectrum-area (S-A) model [28] and Local Singularity Analysis (LSA) [26,29–32], and it is believed that the multifractal distribution has the ability to simultaneously portray the normal distribution, lognormal distribution, Zipf's law, Pareto's law, etc., and can be used as the basic law of geochemical element distribution [26,29]. Based on the theoretical research of fractal and multifractal methods, its application in geochemical spatial distribution pattern research is blossoming [10,33–37]. The multifractal-related methods have achieved many successful cases in regional geochemical anomaly extraction [32,38–45]. With the improvement of 3D software and hardware performance, as well as the development of 3D modeling technology, fractal models used in 3D data have made some progress [34,46,47]. Three-dimensional primary halo geochemical survey is an effective tool for deep mineral resources prediction. In fact, the spatial zonation pattern of primary haloes is the pattern of element clustering regularity in 3D space, and the anomalous structure of elements is an intuitive indicator of deep mineral prediction. Especially, Cheng [27] proposed that the distribution of geochemical elements in either horizontal or vertical directions is consistent with the fractal distribution, based on which Afzal [34] proposed the C-V model, which is a powerful tool to delineate the nonlinear characteristics of element distribution in a 3D space.

Geochemical data are typically compositional data, which gives rise to the problem of "closure effects" due to the lack of scale consistency in the covariance matrix of the compositional data [48–50], and, in practice, classical geostatistical methods that ignore the compositional properties of geochemical data often yield poor results [51–60]. Since the introduction of Aitchison geometry and the application of additive log-ratio transformation (alr) [61], the theory and methods of log-ratio transformation of compositional data have been gradually improved, making the application of classical statistical methods more reasonable [33,35]. The geochemical elemental association extraction method based on Sequential Binary Partition (SBP) [50] can design elemental associations based on the geological background and mineralization knowledge, which is easy to interpret, and provides solutions for geochemical inference of lithology, tectonics, alteration, mineralization, etc. Therefore, the CoDA method provides new ideas for analyzing the relationship of geochemical data and extracts geochemical associations for evaluating mineral resources rapidly [51,55,62–67].

Geosciences is a data-intensive science, and geological survey and mineral exploration have accumulated a large amount of multi-source geosciences data, and the introduction of big data and its application of machine learning-related methods can effectively support mineral resources exploration [26,68–73]. During the past decade or so, a large number of achievements related to machine learning have been published in geosciences [37,57,74–85]. For example, based on metallogenic regularity, an intelligent geology and intelligent mine model is established on a big data platform combined with high-performance computers [70,86]; big data methods are used to realize the automatic collection of geological prospecting thematic data, machine learning- and deep learningrelated methods development for deep mining of geological data, intelligent prospecting, etc. [37,87–89]; and three-dimensional prediction methods of hidden orebodies can be based on deep learning [17,72,90–98]. Numerous studies have shown that big data and machine learning have good performance in mineralization prediction, and machine learning-related algorithm research has greatly enriched the approaches of geological data processing and analysis. Meanwhile, data cleaning, data screening and deep mining of geological big data can enrich the information sources for mineralization information inference and quantitative mineral prediction at depth.

This study focuses on the scientific problems of deep extraction and inference of favorable geological and geochemical information about mineralization at depth and quantitative mineralization prediction at large depth. Specifically, the C-V model will be used to extract the spatial distribution pattern of primary halo geochemical element anomalies, and the CoDA will be employed to identify and infer the element associations for tracing the extension of deep ore-controlling information. Moreover, the machine learning methods of the MaxEnt model and GMM will be carried out for mineral resources prediction at the large depth of the Zaozigou gold deposit.

#### **2. Geological Setting and Datasets**

#### *2.1. Geological Setting*

West Qinling is located at the western part of the Qinling orogenic belt, with the Qilian orogenic belt to the north, the Qaidam block to the west and the songpan block to the south (Figure 1a) [99–102].

The geotectonic position of the Xiahe–Hezuo area is located at the northwestern part of the West Qinling orogenic belt and at the western extension of the Qinling–Qilian–Kunlun central orogenic belt, whose complex geological structure creates a superior mineralization environment [103].

The Zaozigou gold deposit is located within the West Qinling fold belt and is a typical epithermal-type gold deposit in the Xiahe–Hezuo area (Figure 1a). The main controlling factors for mineral resources formation within the region are tectonic movement and magmatism [104], with the regional tectonics spreading in a NW direction with developing folds and fractures. Complex geological structure and magmatism are dominated by Yanshan period intermediate-acid intrusive rocks, which are widely distributed in the form of the batholith, stock, apophysis and veins [104] (Figure 1b).

The Triassic strata are the main stratigraphy for gold deposits. The genesis and spatialtemporal evolution of the intermediate-acidic dike is closely related to gold mineralization in the area, and during the mineralization process, the magmatic rocks not only provide the mineralized material, but also their internal environment is very suitable as an orebearing space, which can be regarded as a significant mineralization indicator for gold mineralization [105].

The spatial distribution of mineral deposits is directly controlled by the geotectonic position in the Xiahe–Hezuo area, which plays a major role in the formation of different types of gold deposits and is the boundary of the belt from a spatial perspective. The most important types of mineral-controlling structures are fractures and folds in this area [106,107]. The main orebodies of the Zaozigou deposit are produced in NE, NW

Copper-tungsten deposit.

gold mineralization [105].

a total area of approximately 2.6 km<sup>2</sup>

and near-SN oriented fracture zones, with two apparent mineralization periods, and postformation fractures have modified and destroyed the orebodies [108–112]. *Minerals* **2022**, *12*, x FOR PEER REVIEW 4 of 32

**Figure 1.** (**a**) Geotectonic location of the study area. NQL: North Qinling tectonic belt; SDS: Shangdan suture zone; CBS: Caibei suture zone; AMS: A'nyemaqen suture zone. (**b**) Geological map of Xiahe–Hezuo area (modified from [104]). 1. Quaternary; 2. Neogene; 3. Cretaceous; 4. Upper Triassic Huari group; 5. Middle-Lower Triassic Daheba group; 6. Lower Triassic Jiangligou group; 7. Lower Triassic Guomugou group; 8. Upper Permian Shiguan group; 9. Lower Permian Daguanshan group; 10. Upper Carboniferous Badu group; 11. Intermediate-acid intrusive rocks; 12. Fracture; 13. Angular unconformity; 14. Gold deposit; 15. Copper deposit; 16. Lead deposit; 17. Antimony deposit; 18. Mercury deposit; 19. Iron deposit; 20. Iron-copper deposit; 21. Copper-molybdenum deposit; 22. **Figure 1.** (**a**) Geotectonic location of the study area. NQL: North Qinling tectonic belt; SDS: Shangdan suture zone; CBS: Caibei suture zone; AMS: A'nyemaqen suture zone. (**b**) Geological map of Xiahe–Hezuo area (modified from [104]). 1. Quaternary; 2. Neogene; 3. Cretaceous; 4. Upper Triassic Huari group; 5. Middle-Lower Triassic Daheba group; 6. Lower Triassic Jiangligou group; 7. Lower Triassic Guomugou group; 8. Upper Permian Shiguan group; 9. Lower Permian Daguanshan group; 10. Upper Carboniferous Badu group; 11. Intermediate-acid intrusive rocks; 12. Fracture; 13. Angular unconformity; 14. Gold deposit; 15. Copper deposit; 16. Lead deposit; 17. Antimony deposit; 18. Mercury deposit; 19. Iron deposit; 20. Iron-copper deposit; 21. Copper-molybdenum deposit; 22. Copper-tungsten deposit.

The Triassic strata are the main stratigraphy for gold deposits. The genesis and spatial-temporal evolution of the intermediate-acidic dike is closely related to gold mineralization in the area, and during the mineralization process, the magmatic rocks not only The Zaozigou gold deposit is a typical representative of gold deposits associated with intermediate-acid dike rocks in the south of the Xiahe–Hezuo fracture. It is located approximately 9 km southwest of Hezuo city, Gansu Province, with convenient access to the mine site (Figure 2a). The main ore-bearing position is between Gully 1 and Gully 4, with a total area of approximately 2.6 km<sup>2</sup> (Figure 2b).

important types of mineral-controlling structures are fractures and folds in this area [106,107]. The main orebodies of the Zaozigou deposit are produced in NE, NW and near-SN oriented fracture zones, with two apparent mineralization periods, and post-for-

intermediate-acid dike rocks in the south of the Xiahe–Hezuo fracture. It is located approximately 9 km southwest of Hezuo city, Gansu Province, with convenient access to the mine site (Figure 2a). The main ore-bearing position is between Gully 1 and Gully 4, with

(Figure 2b).

mation fractures have modified and destroyed the orebodies [108–112].

The spatial distribution of mineral deposits is directly controlled by the geotectonic

The Zaozigou gold deposit is a typical representative of gold deposits associated with

provide the mineralized material, but also their internal environment is very suitable as an ore-bearing space, which can be regarded as a significant mineralization indicator for

7. Drilling Hole.

zation within the deposit [118].

eastern and western ore groups.

**Figure 2.** (**a**) Regional location of study area. QLA = Qilian tectonic belt; WQL: West Qinling tectonic belt; NQL: North Qaidam tectonic belt; SF1: Wushan–Tianshui–Shangdan suture zone; SF2: Maqu– Nanping–Lueyang suture zone; SG: Songpan–Ganzi tectonic belt; YZ: Yangtze block. (**b**) Geological map of Zaozigou gold deposit (modified from [113]). 1. Quartz diorite porphyrite; 2. Granodioriteporphyry; 3. Plagioclase granite porphyry; 4. Gold orebody; 5. Fracture; 6. Mineral exploration line; **Figure 2.** (**a**) Regional location of study area. QLA = Qilian tectonic belt; WQL: West Qinling tectonic belt; NQL: North Qaidam tectonic belt; SF1: Wushan–Tianshui–Shangdan suture zone; SF2: Maqu– Nanping–Lueyang suture zone; SG: Songpan–Ganzi tectonic belt; YZ: Yangtze block. (**b**) Geological map of Zaozigou gold deposit (modified from [113]). 1. Quartz diorite porphyrite; 2. Granodioriteporphyry; 3. Plagioclase granite porphyry; 4. Gold orebody; 5. Fracture; 6. Mineral exploration line; 7. Drilling Hole.

There are only Triassic and Quaternary strata exposed in the Zaozigou gold deposit (Figure 2b). The strata of the area are mainly the lower part of the Gulangdi group (T2*g* 1 ) of the Middle Triassic formation, and the Quaternary (Q*<sup>h</sup> alp*) is developed in the intermountain valley. The lower part of the Gulangdi group (T2*g* 1 ) is the main ore-bearing rock, which is a set of fine clastic rocks consisting of siliceous slate, clastic feldspathic fine sandstone with interbedded siltstone slate and argillaceous slate. The formation is composed of a large sedimentary cycle from bottom to top consisting of siltstone → argillaceous There are only Triassic and Quaternary strata exposed in the Zaozigou gold deposit (Figure 2b). The strata of the area are mainly the lower part of the Gulangdi group (T2*g* 1 ) of the Middle Triassic formation, and the Quaternary (Q*<sup>h</sup> alp*) is developed in the intermountain valley. The lower part of the Gulangdi group (T2*g* 1 ) is the main ore-bearing rock, which is a set of fine clastic rocks consisting of siliceous slate, clastic feldspathic fine sandstone with interbedded siltstone slate and argillaceous slate. The formation is composed of a large sedimentary cycle from bottom to top consisting of siltstone → argillaceous slate → calcareous slate [108,114].

slate → calcareous slate [108,114]. The fractures are well developed and have a complex morphology, forming an intersecting trend, indicating that the area has experienced multiple phases of geological activity. Fractures strictly control the distribution of orebodies and vein rocks within the deposit. They can be classified into five groups of orientations, namely NW, N-S, NE, E-The fractures are well developed and have a complex morphology, forming an intersecting trend, indicating that the area has experienced multiple phases of geological activity. Fractures strictly control the distribution of orebodies and vein rocks within the deposit. They can be classified into five groups of orientations, namely NW, N-S, NE, E-W and NNE [115,116]. Fracture structures are the structural surfaces of mineralization and these structural surfaces control the spreading characteristics of the orebodies [117].

W and NNE [115,116]. Fracture structures are the structural surfaces of mineralization and these structural surfaces control the spreading characteristics of the orebodies [117]. Intermediate-acid dike, including fine crystalline diorite, diorite porphyrite, biotite dioritic porphyrite and quartz diorite porphyrite densely produced with a porphyritic structure, spreads in NNE direction, turning to a nearly N-S direction in the western part Intermediate-acid dike, including fine crystalline diorite, diorite porphyrite, biotite dioritic porphyrite and quartz diorite porphyrite densely produced with a porphyritic structure, spreads in NNE direction, turning to a nearly N-S direction in the western part of the deposit and a few NW directions. Influenced by the regional multi-period deep fracture activity, the multi-period magmatism has overlapped on multi-phases mineralization within the deposit [118].

There are 147 gold orebodies that have been found in the Zaozigou gold deposit, of which 17 orebodies are main orebodies with gold reserves greater than 1 tonne and the total gold reserves amount to more than 100 tonnes [113]. According to the spatial distribution and combination of the mineralization, the Zaozigou deposit can be divided into

of the deposit and a few NW directions. Influenced by the regional multi-period deep

There are 147 gold orebodies that have been found in the Zaozigou gold deposit, of which 17 orebodies are main orebodies with gold reserves greater than 1 tonne and the total gold reserves amount to more than 100 tonnes [113]. According to the spatial distribution and combination of the mineralization, the Zaozigou deposit can be divided into eastern and western ore groups. *Minerals* **2022**, *12*, x FOR PEER REVIEW 6 of 32 The eastern ore group is mainly located between Gully 1 and Gully 3 with the strike

The eastern ore group is mainly located between Gully 1 and Gully 3 with the strike of NE orientation, containing Au1 controlled by F24, Au9 controlled by F21 and Au15 controlled by F25. These orebodies extend over 1000m long and 300m wide, with a NW direction tendency and steep dip near the ground surface, locally nearly upright. In the deep, these orebodies have been staggered by a gently dipping fracture, causing the tendency to change to a SE orientation (Figure 2b). In addition, orebodies M4 and M6 are laying underground, with the strike of NWW orientation, the tendency of SW orientation and a dip of 8◦~26◦ . These orebodies cross the NE-striking orebodies obliquely, staggering them, and their own mineralization behavior occurs simultaneously [119]. of NE orientation, containing Au1 controlled by F24, Au9 controlled by F21 and Au15 controlled by F25. These orebodies extend over 1000m long and 300m wide, with a NW direction tendency and steep dip near the ground surface, locally nearly upright. In the deep, these orebodies have been staggered by a gently dipping fracture, causing the tendency to change to a SE orientation (Figure 2b). In addition, orebodies M4 and M6 are laying underground, with the strike of NWW orientation, the tendency of SW orientation and a dip of 8°~26°. These orebodies cross the NE-striking orebodies obliquely, staggering them, and their own mineralization behavior occurs simultaneously [119]. The western ore group is mainly distributed between Gully 3 and Gully 4, spreading

The western ore group is mainly distributed between Gully 3 and Gully 4, spreading in a nearly N-S direction, with Au29~Au31 as the main orebodies. The orebodies extend over 1000 m long and are wider than 500 m, with a strike of 350◦~10◦ , varying tendency and dips greater than 75◦ , locally subvertical. These orebodies extend, and the mineralization is weaker in the steeper parts and stronger in the shallower parts. in a nearly N-S direction, with Au29~Au31 as the main orebodies. The orebodies extend over 1000 m long and are wider than 500 m, with a strike of 350°~10°, varying tendency and dips greater than 75°, locally subvertical. These orebodies extend, and the mineralization is weaker in the steeper parts and stronger in the shallower parts.

#### *2.2. Datesets Description 2.2. Datesets Description* Historical geological and geochemical data were completed, including geological re-

Historical geological and geochemical data were completed, including geological reports, geological exploration maps, drills geochemical data, etc., from the Development Research Center of China Geological Survey and No. 3 Geological and Mineral Exploration team, Gansu Provincial Bureau of Geology and Mineral Exploration and Development; the coordinate system used in the mine-scale is Gaussian Kruger projection coordinates. ports, geological exploration maps, drills geochemical data, etc., from the Development Research Center of China Geological Survey and No. 3 Geological and Mineral Exploration team, Gansu Provincial Bureau of Geology and Mineral Exploration and Development; the coordinate system used in the mine-scale is Gaussian Kruger projection coordinates.

This study collects data from 72 drillings in the Zaozigou gold deposit and establishes a drilling location database, an assay database, an inclinometry database and a lithology database. The 3D model of drillings is constructed based on the drill hole data database (Figure 3). This study collects data from 72 drillings in the Zaozigou gold deposit and establishes a drilling location database, an assay database, an inclinometry database and a lithology database. The 3D model of drillings is constructed based on the drill hole data database (Figure 3).

**Figure 3.** 3D model of Drillings in Zaozigou gold deposit. (**a**) Plan view of drillings distribution. (**b**) Side view of drillings distribution. (**c**) Sample grade distribution of drillings. **Figure 3.** 3D model of Drillings in Zaozigou gold deposit. (**a**) Plan view of drillings distribution. (**b**) Side view of drillings distribution. (**c**) Sample grade distribution of drillings.

The primary geochemical halo data from the drillings are collected from the "Zaozigou gold successive resources exploration project in Hezuo city, Gansu Province", with a

The primary geochemical halo data from the drillings are collected from the "Zaozigou gold successive resources exploration project in Hezuo city, Gansu Province", with a total of 72 drillings and 5028 samples with 12 elements of Ag, As, Au, Cu, Hg, Pb, Zn, Sb, W, Bi, Co and Mo (the element detection methods can be seen in the literature [113]). The sampling method was the continuous picking method with sample intervals generally within 10m. Some orebodies or strongly altered areas were sampled at decreased intervals.

#### **3. Methodology**

#### *3.1. Concentration-Volume (C-V) Model*

Geochemical anomalies are a concept relative to geochemical background, and the multifractal approach provides an effective tool to separate anomalies from the background. For hydrothermal mineralization processes, multi-phase mineralization is common, which will result in multi-phase superposed element spatial distribution [38,120,121]. The C-V model can process the nonlinear primary geochemical halo data with the following equation:

$$V(\rho < v) \propto k v^{-D\_1} \ V(\rho \ge v) \propto k v^{-D\_2} \tag{1}$$

where *V*(*ρ* < *v*) and *V*(*ρ* ≥ *v*) are the volumes of the element content less than *v* and greater than *v*; *D*<sup>1</sup> and *D*<sup>2</sup> are the fractal dimension values, also called the singularity index; *k* is a constant coefficient, which can be calculated by the least square method; *v* is the threshold value of element contents, and several element content intervals are divided by *v*. The curves *V*(*ρ* < *v*) and *V*(*ρ* ≥ *v*) of the volumes corresponding to different *v* follow a power-law relationship. Taking the natural logarithm of both sides of the equation, the *lg-lg* plots have a linear relationship in different *v* intervals. The fractal dimensions in different *v* intervals can be calculated by the least square method. The geochemical anomalies and backgrounds can be extracted by different fractal dimensions [39].

In practice, the C-V model is used on the primary geochemical halo data volume model for anomaly identification, where the threshold of fractal dimensions may indicate the boundary between different mineralization.

## *3.2. Compositional Data Analysis*

Geochemical data, as typical of compositional data, should be properly transformed before data analysis [122–125] to eliminate the effects of "closure effects". The "opened (transformed)" data can often be analyzed by classical statistical methods to obtain performance improvements [36,53,61,122].

#### 3.2.1. Central Log-Ratio Transformation (clr)

The calculation of this method is: (i) calculate the geometric mean of all compositional subvectors; (ii) divide each subvector by the geometric mean separately; (iii) take the natural logarithm. Its calculation formula is shown as follows.

$$\text{clr}(\mathbf{x}) = \left[ \ln \frac{\mathbf{x}\_1}{\sqrt[D]{\prod\_{i=1}^D \mathbf{x}\_i}}, \ln \frac{\mathbf{x}\_2}{\sqrt[D]{\prod\_{i=1}^D \mathbf{x}\_i}}, \dots, \ln \frac{\mathbf{x}\_D}{\sqrt[D]{\prod\_{i=1}^D \mathbf{x}\_i}} \right] \tag{2}$$

#### 3.2.2. Sequential Binary Partition (SBP)

The common isometric log-ratio transformation (ilr) is difficult to interpret. Egozcue [50] proposed the sequential binary partition (SBP) technique based on the ilr transformation, which can provide geochemical interpretation reasonably [66,123,124].

The sequential binary partition technique performs non-overlapping dichotomous classification continuously by the relative information between variables. In practice, positive (+) and negative (−) signs are used to represent two different classifications of compositional variables, and '0' is used to represent the unconcerned variables in one time. By performing continuous non-overlapping dichotomy in a simplex space, a basis vector is formed and then transformed. The results, called compositional balance, can be

geologically interpreted according to dichotomy clusters. Especially, the technique provides an important tool to identify element associations [66,123,124,126]. clr and SBP have their advantages in the geochemical associations' extraction, especially in inferring lithology, faults, alteration and mineralization, and the corresponding frame-

*Minerals* **2022**, *12*, x FOR PEER REVIEW 8 of 32

The relevant formula is [50]:

 

+

(∏<sup>+</sup> ) 1 

(∏<sup>−</sup>

) 1 

where denotes the new compositional vector defined by the standard orthogonal ba-

3.2.3. Geochemical Compositional Data Analysis Framework Based on CoDA

is the product of the variables labeled (+) involved in the *i*th binary partition

is the product of the variables labeled (−) involved in the *i*th binary partition.

The log-ratio transformation of geochemical data can solve the problem of the "clo-

sure effect" caused by the lack of scale consistency in the covariance matrix of the compositional data. The clr-biplot and compositional balance methods developed based on the

= 1,2, . . . , − 1; = 1,2, . . . , ; = 1,2, . . . , (3)

= √

sis, ∏<sup>+</sup>

and ∏<sup>−</sup>

The relevant formula is [50]: works have been well applied [65,66,124].

$$B\_{i} = \sqrt{\frac{r\_{i}s\_{i}}{r\_{i} + s\_{i}}} \ln \frac{\left(\prod\_{+} x\_{j}\right)^{\frac{1}{r}}}{\left(\prod\_{-} x\_{k}\right)^{\frac{1}{s}}} \\ i = 1, 2, \dots, D-1; \; j = 1, 2, \dots, r; \; k = 1, 2, \dots, s \\ \tag{3}$$

where *B<sup>i</sup>* denotes the new compositional vector defined by the standard orthogonal basis, ∏<sup>+</sup> *x<sup>j</sup>* is the product of the variables labeled (+) involved in the *i*th binary partition and <sup>∏</sup><sup>−</sup> *<sup>x</sup><sup>k</sup>* is the product of the variables labeled (−) involved in the *i*th binary partition. to gain the element associations, while the knowledge-driven framework is based on geological and geochemical understanding to design the element associations (Figure 4).

#### 3.2.3. Geochemical Compositional Data Analysis Framework Based on CoDA In Figure 4, the closure and centering are necessary preprocess steps, which can can

The log-ratio transformation of geochemical data can solve the problem of the "closure effect" caused by the lack of scale consistency in the covariance matrix of the compositional data. The clr-biplot and compositional balance methods developed based on the clr and SBP have their advantages in the geochemical associations' extraction, especially in inferring lithology, faults, alteration and mineralization, and the corresponding frameworks have been well applied [65,66,124]. be seen in the literature [51]. The data-driven framework in this study used clr transformation to "open" geochemical data, and the factor analysis and correlativity methods are used to explore the relationship among elements and extract element associations. Meanwhile, the knowledge-driven framework designs the element associations through deep

This study uses a data-driven and knowledge-driven framework of compositional data analysis to identify the geochemical associations of the primary halo. The data-driven framework infers the data characteristics by measuring elemental statistical correlations to gain the element associations, while the knowledge-driven framework is based on geological and geochemical understanding to design the element associations (Figure 4). research of the geological features, such as the element concentrations in different rocks, the primary geochemical halo associations, and so on. Then, the SBP is performed to quantitatively extract the value of corresponding geochemical associations. Eventually, the results of the CoDA can be employed to infer metallogenic information in an unknown area.

**Figure 4.** Geochemical CoDA framework. **Figure 4.** Geochemical CoDA framework.

*3.3. Machine Learning-Based Quantitative Mineral Prediction Methods* Machine learning algorithms could highlight hidden details in datasets without ex-In Figure 4, the closure and centering are necessary preprocess steps, which can can be seen in the literature [51]. The data-driven framework in this study used clr transformation to "open" geochemical data, and the factor analysis and correlativity methods are used to explore the relationship among elements and extract element associations. Meanwhile, the

plicit search and have the ability to identify complex spatial patterns [127]. Given this, scholars have attempted to use machine learning algorithms to extract mineralization knowledge-driven framework designs the element associations through deep research of the geological features, such as the element concentrations in different rocks, the primary geochemical halo associations, and so on. Then, the SBP is performed to quantitatively extract the value of corresponding geochemical associations. Eventually, the results of the CoDA can be employed to infer metallogenic information in an unknown area.

#### *3.3. Machine Learning-Based Quantitative Mineral Prediction Methods*

Machine learning algorithms could highlight hidden details in datasets without explicit search and have the ability to identify complex spatial patterns [127]. Given this, scholars have attempted to use machine learning algorithms to extract mineralization information by integrating multi-source geosciences data for identifying mineralization-related anomalies and their variation that cannot be detected by traditional methods [128–135]. Machine learning algorithms can be roughly categorized as supervised learning algorithms and unsupervised algorithms. For better validating the suitability of 3D MPM, this study discusses the application of the supervised algorithm MaxEnt model and unsupervised algorithm GMM.

## 3.3.1. MaxEnt Model

The principle of MaxEnt is a criterion of probabilistic model learning for making predictions based on incomplete information. The main idea is that, when predicting the probability distribution of a random event, all known constraints need to be satisfied without subjective assumptions so that the probability distribution is most uniform, the prediction risk is minimal and the entropy value of the probability distribution is maximum [136].

Let *T* = {(*x*1, *y*1),(*x*2, *y*2), . . .(*xn*, *yn*} be the training dataset and *fi*(*x*, *y*), *i* = 1, 2, . . . , *n* be the eigenfunction, and the learning process of the MaxEnt model is equivalent to the constrained optimization problem:

$$\begin{aligned} \max\_{P \in \mathcal{C}} H(P) &= -\sum\_{\mathbf{x}, \mathbf{y}} \bar{P}(\mathbf{x}) P(\mathbf{y}|\mathbf{x}) \ln(P(\mathbf{y}|\mathbf{x})) \\ \text{s.t.} & E\_P(f\_i) = E\_{\tilde{P}}(f\_i) \, i = 1, 2, \dots, n \\ & \sum\_{\mathbf{y}} P(\mathbf{y}|\mathbf{x}) = 1 \end{aligned} \tag{4}$$

where *<sup>E</sup>P*e(*fi*) <sup>=</sup> <sup>∑</sup>*x*,*<sup>y</sup> <sup>P</sup>*e(*x*, *<sup>y</sup>*)*fi*(*x*, *<sup>y</sup>*) is the expected value of n eigenfunction *<sup>f</sup>i*(*x*, *<sup>y</sup>*) related to the empirical distribution *<sup>P</sup>*e(*x*, *<sup>y</sup>*); *<sup>E</sup>P*e(*fi*) <sup>=</sup> <sup>∑</sup>*x*,*<sup>y</sup> <sup>P</sup>*e(*x*)*P*(*y*|*<sup>x</sup>* )*fi*(*x*, *<sup>y</sup>*) is the expected value of n eigenfunctions *fi*(*x*, *y*) related to the *P*(*Y*|*X* ) with the empirical distribution *P*e(*X*).

Following the custom of optimization problems, the problem of finding the maximum value is rewritten as the equivalent problem of finding the minimum value:

$$\begin{aligned} \min\_{P \in \mathbb{C}} -H(P) &= \sum\_{\mathbf{x}, \mathbf{y}} P(\mathbf{x}) P(\mathbf{y}|\mathbf{x}) \ln(P(\mathbf{y}|\mathbf{x})) \\ \text{s.t.} & E\_P(f\_i) - E\_{\tilde{P}}(f\_i) = 0 \, i = 1, 2, \dots, n \\ & \sum\_{\mathbf{y}} P(\mathbf{y}|\mathbf{x}) = 1 \end{aligned} \tag{5}$$

The solution resulting from solving the constrained optimization problem is the solution learned by the MaxEnt model. However, the empirical distribution expectation is usually not equal to the true expectation but will be approximate to the true expectation. If the solution is solved strictly according to the above constraints, it is easy to cause overfitting of the training data during the learning process. Therefore, the constraints can be appropriately relaxed in practice, such as replacing the above equation with *<sup>E</sup>P*(*fi*) <sup>−</sup> *<sup>E</sup>P*e(*fi*) <sup>≤</sup> *<sup>β</sup><sup>i</sup>* , *β<sup>i</sup>* is the modulation multiplier, which is a constant.

#### 3.3.2. Gaussian Mixture Model (GMM)

GMM is a quantified model which generated through Gaussian probability density function fitting. This model decomposes objective distribution into several Gaussian probability density functions. By using enough Gaussian functions and tuning the parameters, the model can generate a very complex probability density to approximate to almost any continuous probability. Theoretically, any objective distribution can be fitted by a combination of multiple Gaussian density functions, and the higher the number of probability density sub-functions, the more closely it approximates to the actual data distribution. GMM has the advantages of good flexibility, is not limited by the sample size and can accurately describe the data structure.

When the dataset *X* = (*X*1, *X*2, . . . , *XT*) of the training data can be divided into *k* classes and each class obeys Gaussian distribution, the k order probability distribution of the GMM is

$$P(X|\theta \mid) = \sum\_{i=1}^{k} w\_i P(X|\theta\_i \mid) \tag{6}$$

$$P(X|\theta\_i) = \frac{1}{(2\pi)^{\frac{D}{2}} |\sum\_{i}|^{\frac{1}{2}}} \exp\left(-\frac{1}{2}(X-\mu\_i)^T \sum\_{i}^{-1} (X-\mu\_i)\right) \tag{7}$$

$$\sum\_{i=1}^{k} w\_i = 1, \ w\_i > 0 \tag{8}$$

where *P*(*X*|*θi*) is the probability density of the *i*th Gaussian model, *w<sup>i</sup>* is the weight of the *i*th model in the whole GMM, *µ<sup>i</sup>* is the mean vector and ∑*<sup>i</sup>* is the covariance matrix. After the parameter initialization, the Expectation-Maximization (EM) algorithm based on the maximum likelihood estimation is often chosen for the parameter estimation of the GMM [137].

In the geological point of view, the GMM is an unsupervised machine learning algorithm; it divides the data into two categories (mineralization and non-mineralization) and then makes category judgments on samples by learning information about unlabeled samples.

As for the dataset *D* = {*x*1, *x*2, . . . , *xm*}, which contains *k* Gaussian mixture components, the algorithm steps are as follows.

