*Article* **Delineation and Analysis of Regional Geochemical Anomaly Using the Object-Oriented Paradigm and Deep Graph Learning—A Case Study in Southeastern Inner Mongolia, North China**

**Bo Zhao 1, Dehui Zhang 2,\*, Rongzhen Zhang 2,3,\*, Zhu Li 2,4, Panpan Tang <sup>1</sup> and Haoming Wan <sup>1</sup>**


**Abstract:** This research describes an advanced workflow of an object-based geochemical graph learning approach, termed OGE, which includes five key steps: (1) conduct the mean removal operation on the multi-elemental geochemical data and then normalize them; (2) data gridding and multiresolution segmentation; (3) calculate the Moran's I value and construct the geochemical topology graph; (4) unsupervised deep graph learning; (5) the within-object statistical analysis. The final product of OGE is an object-based anomaly score map. The performance of OGE was demonstrated by a case study involving eighteen ore-forming elements (Cu, Pb, Zn, W, Sn, Mo, F, Au, Fe2O3, etc.) in stream sediment samples in the Bayantala-Mingantu district, North China. The results showed that the OGE analysis performed at lower levels of scale greatly improved the quality of anomaly recognition: more than 80% of the known ore spots, no matter what their scales and mineral species, were predicted in less than 45% of the study area, and most of the ore spots falling outside the delineated anomalous regions occur nearby them. OGE can extract both the spatial features and compositional relationships of geochemical variables collected at irregularly distributed centroids in irregularly shaped image objects, and it outperforms other convolutional autoencoder models such as GAUGE in anomaly detection.

**Keywords:** object-based image analysis; graph neural network; geochemical anomalies; fractal dimension

## **1. Introduction**

Regional geochemical surveys are an important part of geoscience investigations in both environmental and mineral exploration studies [1]. By discrete data gridding, the original X-Y-Z (longitude–latitude-element content) regional geochemical data can be converted into a big matrix, which is analogous to a remote sensing (RS) image: e.g., a matrix element vs. an image pixel, element content vs. the reflectance value or digital number [2], a multi-element dataset vs. a multiband image, delineating geochemical anomalies vs. classifying ground objects, etc. Intuitively, some algorithms developed for processing the remotely sensed data may be equally applicable to geochemical data analysis. The object-based image analysis (OBIA) [3] is such an example.

In the RS field, there has been a very rapid growth in the use of OBIA ever since its documented introduction in late 1990s [3]. The first step of this method is the grouping of spatially contiguous pixels with similar spectral/textural characteristics into meaningful image objects, which is a process often termed "segmentation". An object is usually composed of a group of pixels that are spatially continuous and have high spectral homogeneity, and the shape of an adaptive object is usually consistent with the shape of a

**Citation:** Zhao, B.; Zhang, D.; Zhang, R.; Li, Z.; Tang, P.; Wan, H. Delineation and Analysis of Regional Geochemical Anomaly Using the Object-Oriented Paradigm and Deep Graph Learning—A Case Study in Southeastern Inner Mongolia, North China. *Appl. Sci.* **2022**, *12*, 10029. https://doi.org/10.3390/ app121910029

Academic Editors: Zeming Shi and Qingjie Gong

Received: 1 September 2022 Accepted: 3 October 2022 Published: 6 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

ground target [4]. Once the objects are formed, the next step is to assign appropriate labels to them by using a supervised or unsupervised classifier [5]. The object-based approach is a widely accepted solution to process high spatial resolution RS data [6], and obviously, it applies as well to analyzing regional geochemical dataset if the X-Y-Z matrix is plotted as an image. However, current geochemical data analysis conducted in the literature is still on a per-point basis [7], and every sampling point (matrix element) is treated as an individual unit and is processed without any textural, contextual, neighborhood, and shape consideration. Therefore, except identification and separation of the anomalies from background, traditional geochemical analysis rarely contributes new insight [7–9]. In practice, the geological object, e.g., sedimentary stratum, tectonic line, igneous intrusion, etc., is usually the minimum mapping unit required for regional geological analysis [9]. Likewise, it is feasible and better to use the geochemical objects produced by OBIA to do regional geochemical analysis.

According to [10], OBIA can be carried out in two ways: (1) the object-level analysis: most of the current object-oriented analysis approaches rely on object-level summary statistics, such as median, mean, and standard deviation of the pixel values within each segment. Such summary measures provide one value per band for each segment to describe its data central tendency or dispersion [11]. (2) The within-object analysis. The pixel values within an object usually have their own distribution pattern—unimodal Gaussianity or multiple peaks [3], and it reflects the internal structure characteristics of the object. In theory, these data-analysis/-mining techniques apply as well to analyzing regional geochemical datasets characterized by multi-source, multi-stage and multi-genesis [7], and a large amount of geological information can thus be rediscovered, not just anomaly and background. Moreover, OBIA is often performed at multiple levels of scale to capture variably sized image objects, which may help with discriminating between classes that are differentiable by size (for example buildings versus forests) [12]. At a given scale parameter, homogeneous areas (background) result in larger objects, and heterogeneous areas (targets/anomalies) result in smaller objects. The larger the segmentation scale, the larger the segmented objects grow, and vice versa. Obviously, such a multi-scale design will help to establish a better understanding of the regional geochemical processes at different scales.

In OBIA-based image analysis, the standard practice is to conduct "multiresolution segmentation + machine learning-based classification" [5]. Actually, over the last few years, machine learning techniques have become an essential tool to advance different branches of science and engineering, including geochemical anomaly recognition. For example, a hybrid machine learning method was proposed in [13], which is based on combining K-Nearest Neighbor Regression and Random Forest Regression to predict Pb and Zn grades in the Irankuh Mining District (IMD), Iran. Reference [14] goes deeper: it trained four regression machine learning algorithms, i.e., K neighbor regressor, support vector regressor, gradient boosting regressor, and random forest regressor, to build a hybrid model to predict Pb and Zn grades of IMD. After that, the multifractal model [15] was used to classify Pb-Zn anomalies. Despite the success of the examples in the literature, few studies have explored the advantages of introducing traditional machine learning algorithms into OBIA-based geochemical prospecting.

At the same time, over past years, deep learning has yielded impressive results in image analysis, e.g., the land cover mapping. In particular, neural networks with FCN (fully convolution networks) structures are widely used in image analysis tasks [16]. In recent years, there has become an increased focus on the combination of OBIA and FCN. For example, in [17], the object-based convolutional neural network (OCNN) was proposed for urban land use classification using high-resolution RS images. Rather than pixel-wise convolutional processes, OCNNs rely on segmented objects as their functional units, and then, FCNs were used to analyze and label objects such as to partition within-object and between-object variation [10]. Reference [18] presented a multiscale OCNN framework for large-scale RS land cover classification, in which FCNs pretrained at multiple scales were

applied for final classification. Additional examples which adopt OCNN as the basic image analyzer can be found in [19–22]. Although OCNN approaches have achieved astonishing performance, they require an enormous amount of training data [23]. In fact, supervised OCNN are unsuitable for regional geochemical exploration because we cannot have a priori knowledge to make per-point or per-object annotations for multi-element geochemical data. Unsupervised learning methods, on the other hand, do not rely on labeled samples, and have become the mainstream methodology in geochemical anomaly recognition [24,25]. GAUGE (i.e., recognition of Geochemical Anomalies Using Graph Learning) [24], One-class Support Vector Machine [25], and Autoencoders [26] are recent examples. However, there is still a research gap regarding the combination of OBIA and unsupervised deep learning, and we wonder if such a combination could produce even more exploration geochemical information and provide new insights into regional geologic process.

This study mainly follows the idea of GAUGE [24]. GAUGE is a three-step procedure including: (1) building up a multi-elemental geochemical topology graph at a group of randomly located sampling points. (2) constructing a symmetric GAT (Graph Attention Network) autoencoder, namely an attributed graph encoder combined with an attributed reconstruction decoder [27]. The former is used to model both the spatial structure and compositional relationships of geochemical variables simultaneously, and the latter is responsible for reconstructing the input variables with the obtained node embeddings. (3) conduct the anomaly detection. To be specific, the Euclidean distance between pairs of original multi-elemental content values and the reconstructed background values at each sampling point are calculated as the anomaly score, and on this basis, the anomaly map is created [24]. Inspired by the success of GAUGE, in this article we present a novel object-based graph learning architecture (OGE) for geochemical anomaly recognition and analysis, and there are three main differences between GAUGE and OGE: (1) the OGE approach involves a three-step pre-processing procedure, which is (i) grid the multi-element geochemical sampling data, and generate a multi-channel image in "Geo-Tiff" format; (ii) segment the original image into homogeneous objects; and (iii) extract the centroids of each object, and transform them into topology graphs according to their adjacency relations, while GAUGE does not. (2) The autoencoder of OGE comprises a GAT-based encoder and a GCN (Graph Convolution Network)-based decoder [28], while GAUGE involves a symmetric GAT-based autoencoder. (3) the OGE approach involves post-processing and post-analysis. For example, the nodal anomaly scores must be assigned back to the corresponding image objects as the summary statistics. And then, the within-object anomaly analysis can be conducted. To the best of our knowledge, we are the first to introduce in the OBIA tasks the unsupervised GNN (Graph Neural Network).

The remainder of this work is organized as follows: Section 2 reviewed the geological settings of the study area and described the data materials; Section 2 also gave the detailed methodology of OGE; Section 3 described and analyzed the experiment results; in Section 4, we discussed the implications and extensions of the results of Section 3; and we concluded the research in Section 5.

## **2. Materials and Methods**

## *2.1. Materials*

## 2.1.1. Geological Settings

The study area is located in Eastern Inner Mongolia, North China, and its regional geology is shown in Figure 1. It can be seen from Figure 1 that: (1) the study area is heavily covered with Neogene and Quaternary sedimentary layers [29], especially N2b red purple variegated mudstone and clay rock. In addition, J3mk (acid volcanic rocks), J3mn (tuff), P1sm (limestone, tuff, greywacke, and fine sandstone), and P1e (andesite) are also sparsely distributed. (2) The igneous intrusive rocks are widely developed in the study area, dominated by intermediate and acidic granitoids, and their rock-forming ages extend mainly from Permian to Cretaceous, with a few Neoarchean and Paleozoic intrusive rocks. Note that there are both I-type (formed by the igneous) and S-type (formed

by melting of metasedimentary rocks) granitoids in this area. Therein, S-type magmas are in general more evolved chemically than I-types, are relatively reduced, and when highly fractionated, are related to Sn-W mineralization; I-types are opposite and are related to Cu-Mo mineralization [30]. (3) The geological structure in this area is very complex, including NE- and NW-trending fault zones, NE-trending folds (anticline and syncline), etc. However, due to the large magmatic events and the widely distributed Cenozoic sedimentary layers, it is difficult to identify the characteristics of geological structure occurring before the Mesozoic.

**Figure 1.** Generalized geologic map of the study area. Note: 1. Quaternary; 2. Baogedawula Formation, Pliocene Series, Neogene; 3. Manitu Formation, Upper Jurassic Series; 4. Manketouebo Fromation, Upper Jurassic Series; 5. Sanmianjing Formation, Lower Permian Series; 6. Elitu Formation, Upper Permian Series; 7. Early Cretaceous alkali-feldspar granite; 8. Early Cretaceous syenogranite; 9. Early Cretaceous granite porphyry; 10. Late Permian granodiorite; 11. Middle Permian monzonitic granite; 12. Middle Permian quartz diorite; 13. Early Permian quartz diorite; 14. Early Permian monzonitic granite; 15. Late Ordovician-Middle Silurian monzonitic granite; 16. Neo-Archean Liushugou Formation; 17. fault zone; 18. syncline; 19. anticline; 20. W-Nb-Ta-F deposit; 21. U-mineralized deposit. The study area is located in Inner Mongolia, North China, as shown in the overview.

## 2.1.2. Data Materials

We collected a set of 1:200,000 stream sediments survey data matching the study area shown in Figure 1, which were provided by China Geological Survey. As the study area is situated in arid and semiarid regions, the sampling media include soil and stream sediments (10–40 mesh). The surface organic layer was avoided during the sampling process. There were 0.25 to 1 sampling points per km2, and samples collected within 4 km2 were combined into one composite testing sample. If the stream-sediment samples cannot be easily collected, the soil samples will be used instead. The sample collection, preparation, storage, chemical tests, quality control and data preprocessing adhered strictly to the "criterion of regional geochemical exploration" released by the Ministry of Land and Resources of China in 2006. For example, the general quality control flow involved the following steps: (i) analyses of the Geochemical-Standard-Drainage sediment samples; (ii) analyses of the coded Geochemical-Reference-Drainage sediment samples developed

by the provincial laboratory; (iii) duplicate analyses of the randomly selected and coded samples; (iv) rechecked analyses of the samples with anomalous values; and (v) rechecked analyses conducted by external laboratories, if necessary.

Finally, we got 1325 composite samples, and 39 elements were accurately analyzed by multiple methods, including 32 elements and 7 oxides. In this article, we mainly focus on eighteen main ore-forming elements, which are Ag, As, Au, B, Be, Bi, Cu, F, Hg, Mo, Nb, Pb, Sb, Sn, U, W, Zn, and Fe2O3. Elemental contents of each composite sample are determined mainly by the ICP-MS (i.e., inductively coupled plasma mass spectrometry) method [7]. The statistical parameters of them are given in Table 1.


**Table 1.** Statistical parameters of 17 main ore-forming elements.

CV: coefficient of variation; Unit of elemental concentration: Ag/Au/Hg: 10−9; Fe2O3: %, others: 10−6.

Previous studies [29] have indicated that the mineralization of Nb, Ta, Li, Be, REE, U, etc. in the study area is most related to highly fractionated peraluminous granitoids (S-type); while the mineralization of W, Pb, Zn, Ag, fluorite, tourmaline (B), Au, Cu, etc. is mainly of quartz-vein type, often occurring within the syenogranitic wall rocks (highly fractionated I-type) surrounding the peraluminous granitoids. Based on these facts and according to the metallogenic specialization of granitoids [31], the relevant ore-forming elements can be divided into two groups: (1) Ag, Au, As, B, Cu, Hg, Mo, Pb, U, Zn, and Fe2O3, which are closely related to magnetite series granites (I-type); and (2) Be, Bi, F, Mo, Nb, Pb, Sb, Sn, U, W, and Fe2O3, which are closely related to ilmenite series granites (S-type) [29]. At the same time, in consideration of the extensive development of granitic complexes in this area and the complex mineral paragenesis [30] such as Cu-Mo versus W-Mo, Pb-Zn versus W, Sn-Pb, U-Pb versus Th-U, and pyrite versus Fe2O3, we make Mo, Pb, U, and Fe2O3 appear in both groups.

## *2.2. Methodology*

The proposed algorithm is a complex multi-step procedure, which involves several different methodologies and datasets, and the details of each step are given below. At the center of this algorithm is OGE—a graph network-based autoencoder, and other subalgorithms can be regarded as the pre-processing and post-processing for OGE.

## 2.2.1. Data Pre-Processing

#### (1) Pre-Processing of Original Geochemical Data

First, based on Python, a mean removal operation was conducted on the original X-Y-Z geochemical data of the ore-forming elements to remove the background noise, that is, subtracting the mean values from each value in the X-Y-Z dataset. And then, normalized them into the range [0, 1]. After that, based on Surfer v11.0 and using the Kriging interpolation algorithm, the normalized data of each element were expanded into a matrix of size 405 × 500, and then it was saved as a single-band image in "Info ASCII Grid

(\*.grd)" format, so that it is readable for the eCognition Developer and Python platforms. Finally, these \*.grd files were concatenated along the dimension of channels and saved as a multi-band \*.geotif image, termed Image I.

(2) Multiresolution Segmentation

Using the business image processing software eCognition, the FNEA (Fractal Net Evolution Approach) multiresolution segmentation was conducted on Image I. The scale parameter is empirically set as 3.0, the shape parameter is set as 0.1, and the compactness as 0.5. Finally, we got 663 segmented objects as shown in Figure 2. Note that the Fe2O3-band in Image I was not involved in the segmentation process. This is because: as the only major element (or say rock-forming element), the incorporation of Fe2O3 may misguide FNEA to pay more attention to regional diagenesis, rather than metallogenesis.

**Figure 2.** The multiresolution segmentation result and the centroids within each image object.

(3) Find the Centroid of Each Object

First, draw the minimum enclosing rectangle (MER) for each image object, and the centroids of the MERs will act as the centroids of the corresponding objects, see Figure 2. However, if the centroid falls on or outside the boundary of an image object, it will be recalculated in the following way: (i) generating the skeleton lines for each object; (ii) removing the suspension lines along the main skeleton line; and (iii) finding the midpoint of the main skeleton line, and taking it as the centroid of the corresponding object. Thanks to the skimage.morphology, numpy, and other Python packages, these processing steps can be easily implemented on a computer.

Finally, we note that image segmentation is the process by which an original image is partitioned into some homogeneous regions/objects [6], so the spectral features of the centroids can be used to represent the spectral features of the corresponding image objects. Our proposed algorithm is developed based on this assumption.

## 2.2.2. Constructing the Geochemical Topology Graph

As shown in Figure 2, the geochemical dataset can be collected as a group of centroids in an area. At the suggestion of [24], for regional geochemical exploration, only capturing the multivariable (multi-elemental) geochemical anomalies may be insufficient because their spatial distribution also reflects complex geological processes such as mineralization. To achieve this, the undirected graph (*G*) [24] which is used to connect closely related centroids to represent the spatial structure from point data is constructed as:

$$X = \{ \vec{X\_{1}}, \vec{X\_{2}}, \dots, \vec{X\_{N}} \}, \vec{X\_{i}} \in R^{F} \tag{1}$$

$$A = \{A\_{1,1}, A\_{1,2}, \dots, A\_{N,N}\}, \ A\_{i,j} = \begin{cases} 1, & d\_{i,j} \le K \\ 0, & d\_{i,j} > K \end{cases} \tag{2}$$

where *X* and *A* represent the set of nodes/centroids, and edges, respectively; *N* represents the number of nodes in the graph; *F* denotes the number of features, namely geochemical variables (elements), of each node; *d*i,j denotes the spatial distance between node *j* and *i*; *K* represents the distance threshold when constructing the geochemical topology graph, and generally, the smaller the distance between these two centroids is, the more related they are [24]; 1—connected edge and 0—disconnected.

In our case, *N* = 663; *F* = 11, which means there are 11 I-series or S-series ore-forming elements; and *K* is determined by the global Moran's I analysis according to [24].

Global Moran's I is a measure of the overall clustering of the spatial data. Figure 3 gives the global Moran's I values versus different distance bands of all the geochemical variables. According to [32], if the Moran's I value decreases rapidly as the distance band increases, there is a strong dependency relationship between the spatial structure and distance; on the contrary, if the Moran's I value decreases slowly as the distance band increases, this indicates a stable spatial structure [32]. Obviously, the inflection point of the curve can act as the optimal threshold *K* to balance partial heterogeneity (i.e., anomaly) and the background features. An edge connects two adjacent centroids when the distance between them is less than *K*, and remains disconnected otherwise.

**Figure 3.** Variation in global Moran's I index with different distance bands (namely *K*).

It can be seen from Figure 3 that there is a clear turning point at about 20 km for most of the ore-forming elements. For insurance purposes, here we use *K* = 22 as the distance threshold to construct the geochemical topology graph for subsequent network training and anomaly detection. Figure 4 is the generated topology graph, and each node of it corresponds to *F* = 11 features or geochemical variables [33]. Obviously, such a topology graph integrates both the spatial structural features and multi-elemental concentrations for a group of centroids in a given area.

**Figure 4.** The geochemical topology graph in the study area. Red dot—the graph node; black line—edge.

## 2.2.3. The Graph Network Architecture

Figure 5 gives the pipeline of the OGE architecture. Its basic components include a GAT-dominated encoder, a GCN-dominated decoder, the loss function, and so on. A detailed description of them is provided below.

(1) The GAT-Dominated Encoder.

GAT is one of the most popular GNN (Graph Neural Network) architectures and is considered as the state-of-the-art architecture for representation learning with graphs [33]. As illustrated in Figure 6, GAT is a two-step procedure: first, it calculates the attention coefficients (or say weights) of each graph edge according to the masked attention mechanism; afterwards, it aggregates the neighboring node features according to their weights [27].

**Figure 5.** The overall OGE framework for geochemical anomaly identification. GATv2: Graph Attention Networks (Version 2); GCN: Graph Convolution Networks; D\_in/D\_out: Number of input/output channels.

**Figure 6.** Illustration of the GAT layer (Modified after [24]): (**a**) how to determine the attention coefficient, corresponding to the Formula (3); (**b**) aggregating features of the node 2-based attention mechanism, with neighborhood {1,3,4,7}, corresponding to the Formula 4.

Given an input topology graph *G* (*X*, *A*) containing *N* nodes/centroids, each node has a feature vector <sup>→</sup> *xi* and dimension *F*. The attention coefficients *αij* between nodes *i* and *j* is calculated as: 

$$\begin{aligned} \mathfrak{a}\_{ij} &= \operatorname{softmax} \left( \operatorname{atantion} \left( \left< \boldsymbol{\mathcal{W}} \cdot \stackrel{\rightarrow}{\mathbf{x}}\_{i} \; \; \; \boldsymbol{\mathcal{W}} \cdot \stackrel{\rightarrow}{\mathbf{x}}\_{j} \right> \right) \\ &= \operatorname{softmax} \left( \operatorname{LeakyReLU} \left( \stackrel{\rightarrow}{a}^{T} \left[ \left< \boldsymbol{\mathcal{W}} \cdot \stackrel{\rightarrow}{\mathbf{x}}\_{i} \right| \; \; \; \stackrel{\rightarrow}{\mathbf{W}} \cdot \stackrel{\rightarrow}{\mathbf{x}}\_{j} \right] \right) \right) \end{aligned} \tag{3}$$

where *attention*() represents a single-layer feedforward neural network with a weight vector → *a* ; *W* represents the weight matrix for transforming the input features into higher-level features; *<sup>T</sup>* and || represent the transposition and concatenation operations, respectively; LeakyReLU is a non-linear activation function; and *softmax*() is used to normalize the neighbors of node *i* between 0 and 1.

After that, GAT aggregates each neighboring node and obtains the embedded features as follows:

$$\stackrel{\rightarrow}{\mathbf{x}\_i^{\prime}} = \sigma \left( \sum\_{j \in N\_i} \boldsymbol{\alpha}\_{ij} \cdot \boldsymbol{\mathcal{W}} \cdot \stackrel{\rightarrow}{\mathbf{x}\_j^{\prime}} \right) \stackrel{\rightarrow}{\mathbf{x}\_i^{\prime}} \in \ \mathcal{N}\_i \tag{4}$$

where <sup>→</sup> *xi* and → *xi* represent the input data and embedded features, respectively; *σ*-*sigmoid*() is used to normalize the output between 0 and 1; *N*<sup>i</sup> denotes the neighbors of node *i*. From the geological point of view, the geochemical signatures of mineralization-favored spaces inherited from multiple geochemical processes are often anisotropic [7], and Formula (4) indicates that the anisotropy of the node *i* in terms of geochemical signatures can be characterized by the relationship weights *αij* of node *i*'s neighborhood points {*x*j, *j* = 1, 2, . . . , *N*i}, including itself, see Figure 6b.

Authors in [34] further pointed out that: in GAT, every node attends to its neighbors given its own representation as the query, but for any query, the neighbor-scoring is monotonic with respect to per-node scores. This could be problematic in real-world applications. To remove this limitation, they proposed GATv2 by just modifying the order of operations in GAT. Their experiments demonstrated that GATv2 outperforms GAT in all benchmarks while having the same parametric cost. Inasmuch, as shown in Figure 5, this study used two GATv2 layers to construct the encoder of OGE, and then, the input

geochemical topology graph is learned and mapped to a low-dimensional vector space Z. Our encoder can seamlessly learn the compositional relationships of eleven geochemical variables (i.e., nodal attributes) and their spatial structural features. On the contrary, traditional encoders (backbone networks) based on convolutional layers or self-attention can only process data with regular spatial arrangements, and they cannot be applied to extract features from graph structured data.

## (2) The GCN-Dominated Decoder.

GCN is a type of convolutional neural network that can work directly on graphs and take advantage of their structural information [33]. To put it simply, for each node, first we get the feature information from all its neighbors including itself, and then, we take the average of all its neighbors (assuming we use the *average*() function), after that, these average values will be feed into a fully connected neural network [33]. Note that in practice, we usually use more sophisticated aggregate functions rather than *average*(). In Figure 5, a GCN-based decoder, which is symmetric to the encoder architecture, was proposed to reconstruct the nodal attributes from the encoded latent representations Z. Finally, we use the *sigmoid*() function to restrict the reconstructed background values within the range of [0, 1].

Maybe it is better to use GAT layers to construct the decoder structurally symmetric to the encoder. If so, D\_out of the second GAT layer in the decoder must be divisible by the number of nodal features (11), or say D\_out \* 8 (the number of heads) must be equal to 11 and D\_out must be an integer, but it is obviously not possible in our case. So instead, GCN was applied to obtain the nodal reconstructed features.

## (3) The Loss Function.

Our OGE is an unsupervised machine learning approach, so its loss function does not involve manually annotated labels. At the suggestion of Guan et al. [24], here we use the unsupervised SPRE (i.e., sampling point reconstruction error) function as our loss function, which is formulized as:

$$L\_{SPRE} = \frac{1}{N} \sum\_{i=1}^{N} A\_i \tag{5}$$

$$\mathcal{A}\_{i} = \sqrt{\sum\_{k=1}^{F} \left(\boldsymbol{\mathfrak{x}}\_{i}^{k} - \boldsymbol{\mathfrak{x}}\_{i}^{\prime k}\right)^{2}} \tag{6}$$

where *x<sup>k</sup> <sup>i</sup>* and *<sup>x</sup><sup>k</sup> <sup>i</sup>* represent the *k*th original and reconstructed nodal features of *i*th node, respectively; *N* and *F* denote the number of nodes/centroids and variable categories, respectively; *Ai* denotes the reconstruction error (or say the anomaly score) for each centroid, namely the multivariate Euclidean distance between the original geochemical values and the reconstructed background values.

By minimizing the Formula (5), OGE can continuously learn the spatial structural features and compositional relationship from the input graph structured data by back propagation, and then approximate iteratively the input attributed graph with encoded latent features until the *LSPRE* function converges [24]. After that, we input the original geochemical topology data into the trained model to obtain the reconstructed geochemical background values for each node or centroid. And then, we calculate the reconstruction error (namely *Ai*) for each node. Finally, the geochemical anomaly score map is generated by assigning the nodal reconstruction errors back to the corresponding image objects as the summary statistics. This process is summarized and pictured in Figure 5.

## 2.2.4. Data Post-Processing

After obtaining the object-oriented anomaly score map, we need a binarization threshold defining whether a given image object is anomalous or not. In this article, the threshold value is empirically set as the 50% quantile (median) of the object-oriented anomaly score

matrix, which is 0.089434 for the I-type elemental association, 0.091733 for the S-type association, and 0.12807 for all the ore-forming elements.

In addition, as the anomaly score is an object-level summary statistics, so our postprocessing operations will focus on the within-object statistical analysis as mentioned in the Introduction. From the geological perspective, differentiation-dominant processes are believed to be the pivotal cause of metallogenesis [30], and the fractal dimension (*D*) [15] is introduced here to describe the differentiation degree of each anomalous patch. As a within-object statistical parameter, *D* is used to quantify the power-law relationship between multi-elemental contents and their cumulative summation in a given image object. It can be computed and interpreted in the following way:

$$N(\geq r) = Cr^{-D} \text{ ( $r > 0$ )}\tag{7}$$

where *r* is the multi-elemental content value within an image object, and *N*(≥*r*) denotes the summation of content values larger than or equal to a given *r*; *C* > 0 is a proportionality coefficient, and the exponent *D* is known as the fractal dimension.

Mathematically, a number of straight-line segments (namely scaleless intervals) can be derived from Formula (7) on the log-log paper. It aims to cluster a dataset into most similar groups in the same segment and most dissimilar groups in different segments [15]. For two adjacent straight lines with different slopes, *D*<sup>n</sup> & *D*n+1, the inflection point (*T*) is routinely determined by RSS:

$$\text{RSS} = \sum\_{i=1}^{i\_0} \left[ \ln \mathcal{N}(r\_i) + \text{D}\_1 \ln r\_i - \ln \mathbb{C}\_1 \right]^2 + \sum\_{i=i\_0+1}^{\mathcal{N}} \left[ \ln \mathcal{N}(r\_i) + \text{D}\_2 \ln r\_i - \ln \mathbb{C}\_2 \right]^2 \to \text{Min} \tag{8}$$

where RSS denotes the "residual sum of squares", and *ri*<sup>0</sup> is the dividing threshold (namely *T*n, n = 1, 2,... ). In a similar fashion, the slopes of several scaleless segments, as well as the thresholds between them, can be quantitatively determined.

As shown in Figure 7, in this step, they are customarily classified into two segments, *D*<sup>1</sup> and *D*2. According to Zhao et al. [15], the dimension *D* can be used to measure the clustering degree of a dataset, which is a diagnostic spatial distribution pattern for orerelated geological objects. The larger the *D* value, the steeper is the line, the greater is the rate of change in elemental content, the lower is the degree of clustering, and vice versa. Note that *D*<sup>1</sup> usually represents the low-content background information, and it does not change much over different image objects, so we use *D*<sup>2</sup> as an indicator to quantify the multielemental clustering. Suppose there are *L* pixels in an image object, and each object has *F* data layers representing different geochemical variables, and then, all of the *L* × *F* content values will be fed into the fractal algorithm aforementioned. As the selected elements are paragenetically associated with each other, and their content values are normalized to [0, 1] by subtracting the mean values, so if the metallogenesis or differentiation occurs, the content values of these elements within a given object will become more clustered, which corresponds to a smaller *D*2; otherwise, they remain scattered because of the weak feature of elemental association, which corresponds to a larger *D*<sup>2</sup> value. In addition, we do not have to make a strict assumption that the multi-elemental contents follow a multi-fractal distribution when conducting the within-object statistical analysis, for the algorithm can adaptively process or yield new insights for those non-fractal data [35]. Finally, every image object will be assigned a dimension value that can be used to modify the anomalous regions delineated by OGE.

**Figure 7.** The fractal schema ln N(≥*r*) versus ln *r*.

## **3. Results**

## *3.1. Implementation Details*

We implement the proposed OGE model as well as the relevant pre- and post- processing using the Python framework which is open source. The most important packages involved in the programming experiments include but not limited to Pytorch, Pytorch Geometric, Numpy, Networkx, and so on. OGE was conducted on a single NVIDIA Tesla V100 with 32 GB of GPU memory and train for 2000 epochs to make the model converge. We use the Adam optimizer to train our models from scratch, and the initial learning rate is set to 0.005, and the weight decay coefficient is set to 0.0005 By the way, the primary purpose of regional geochemical exploration is to delimit areas of interest for further exploration, and thus, the performance of OGE can be evaluated or described from two perspectives: the percentage of discovered mineral deposits to the total mineral deposits—the recognition rate, and the size of the anomalous area.

For a better illustration, taking the I-series elemental association as example, Figure 8 gives the recognition rate versus loss curves during the training phase. As can be seen, before completing 160 epochs, both the recognition rate and the training loss decrease sharply. And then, the decreasing trend of loss become slow, eventually stabilizing at 0.124 or so; meanwhile, the growth of the recognition rate rises and became oscillating, centered at 0.837. So, in order to jump over the oscillation area, here we set the training epoch as 2000.

89

**Figure 8.** Variation in loss and the recognition rate with number of training epochs. Note that: here we set the threshold of the anomaly score map as median, and the recognition rate is calculated as the ratio of the number of ore spots falling within the anomalous areas to the total number of known ore spots.

## *3.2. The Object-Based Anomaly Score Map*

For the I-series elements, it can be seen from Figure 9 that: (1) most of the known ore spots/deposits, no matter what their scales and mineral species, fall within the image objects having higher anomaly scores (brighter colored areas). (2) Using the median value as the threshold, there are 30 ore spots falling within the delineated anomalous regions, and 13 spots falling outside. The recognition rate is 69.767%. (3) Using the 60th percentile value as the threshold, there are 26 ore spots falling within the Level I anomalies, and the recognition rate is 60.465%. (3) The Level II anomalous area accounts for 49.778% of the total area, and the Level I area accounts for 40.079%. (4) Many of the anomalous objects are barren, especially those on the northwest corner of the study area. (5) Most of the mineral spots except Au falling outside the anomalous regions occur nearby the anomalous boundaries. If we create a 4-pixel buffer zone around the Level II anomalous objects, there are only 4 ore spots falling outside, and the ore-spot recognition rate comes up to 90.70%. (6) If we create a buffer zone around the Level I anomalies, the recognition rate increases to 74.419%.

For the S-series elements, it can be seen from Figure 10 that: (1) there are 40 ore spots falling within the Level II anomalous regions, and only 7 spots falling outside. The recognition rate is 85.106%. (2) there are 33 ore spots falling within the Level I anomalies, and the recognition rate is 70.213% (3) most of the ore spots falling outside the anomalous regions occur nearby the anomalous boundaries, and relevant anomaly scores fluctuate between 0.030 and 0.070. If we create a 4-pixel buffer zone around the Level II anomalous patches, there are only one fluorite ore-spot falling outside, and the recognition rate comes up to 97.87%. (4) If we create a buffer zone around the Level I anomalies, the recognition rate increases to 78.723%. (5) The Level II anomalous area without the buffer zone accounts for 50.054% of the total area, and the buffered anomalous area accounts for 68.24%. (6) The spatial distribution of the anomalous regions in Figure 10 is generally consistent with that in Figure 9.

**Figure 9.** (**upper**) the object-based anomaly-score base map with known I-series ore spots (red squares) overlaid on it. The image objects with larger scores are represented by brighter tones, while image objects with smaller scores are represented by darker tones. (**lower**) the delineated anomalous regions (green-colored patches—Level II anomalies, orange-colored patches—Level I anomalies, delineated by the 60% quantile) with ore spots overlaid on it. We also created a 4-pixel buffering zone (white colored belts) around the known anomalous regions. Note: the mineral species include Cu, Au, Fe, Mo, Hg, Ag-Pb-Zn-Fe, Ag-Pb-Zn, Pb-Zn, Cu-Zn, Ag-Pb, Pb, Zn, and U. In the lower picture, the ore spots not falling in the anomalous objects are labelled with their mineral species. The size of the ore spots is proportional to the mineralized scale ranging from mineralized spots to small- and medium-scaled ore deposits. Actually, most of the ore spots shown in Figure 9 are mineralized spots, namely micro-mineralization outcrops. The ore-spot data are provided by reference [29].

**Figure 10.** (**upper**) the object-based anomaly-score base map with known S-series ore spots (red squares) overlaid on it; (**lower**) the delineated anomalous regions (green-colored patches—Level II anomalies, orange-colored patches—Level I anomalies, delineated by the 60% quantile) with mineral spots overlaid on it. We also created a 4-pixel buffering zone (white colored belts) around the known anomalous regions. Note: the mineral species include Fe, Mo, Ag-Pb-Zn-Fe, Ag-Pb-Zn, Pb-Zn, Ag-Pb, Pb, W, Nb-Ta, Rb-Nb-Ta, Fluorite, W-Mo, Beryl, U, and etc. Other legends in this figure are consistent with those in Figure 9 if not specified.

Figure 11 gives the anomaly score map of all the ore-forming elements, and we can observe that: (1) The Level II anomalous area accounts for 49.881% of the total area. (2) There are 50 ore spots falling within the Level II anomalous regions, and 18 ore spots falling outside. (3) The recognition rate of the Level I anomalies is 66.176%. (4) In the buffered anomalous area, only 8 mineral spots falling outside, and the recognition rate increases to 88.24%. (3) Only a few ore spots fall within the image objects with the brightest tone, namely with the largest anomaly scores.

**Figure 11.** (**upper**) the object-based anomaly-score base map with all the known ore spots (red squares) overlaid on it; (**lower**) the buffered anomalous regions (green- and orange- colored patches edged with white) with ore spots overlaid on it. Note: the mineral species of the ore spots include Cu, Au, Fe, Mo, Hg, Ag-Pb-Zn-Fe, Ag-Pb-Zn, Pb-Zn, Cu-Zn, Ag-Pb, Pb, Zn, W, Nb-Ta, Rb-Nb-Ta, Fluorite, W-Mo, Beryl, U, etc. Other legends in this figure are consistent with those in Figure 9 if not specified.

Given the above, OGE's performance on S-series elements is even better than that on I-Series elements. The main drawback of our methodology is that the delineated anomalous area is too large and many of the anomalous patches are barren, which is not helpful for guiding the follow-up anomaly verification. As stated before, these results may demonstrate the possible pitfalls of using object-level anomaly scores due to their possible misrepresentation of the within-object geochemical characteristics [10]. That is to say, the anomalies delineated by OGE must be further modified through the within-object multi-elemental statistical analysis.

## *3.3. Elemental Within-Object Separability*

Figure 12 gives the object-based map of dimension (*D*2) for the I-series and S-series elemental associations, respectively. We note that as the geochemical behaviors of the I-series elements are quite different from those of the S-series elements [30], so the objectbased *D*<sup>2</sup> map for all elements was not calculated here. From Figures 12 and 13, we have delivered three important observations as follows:


**Figure 12.** (**upper**) the object-based D2 map for I-series elements with relevant ore spots (red squares) overlaid on it. (**lower**) the D2 map for S-series elements with relevant ore spots overlaid on it. The green colored patches represent the delineated non-anomalous regions.

**Figure 13.** (**upper**) the buffered anomalous regions (green colored patches edged with white, delineated by the 50% quantile) for I-series elements with relevant ore spots overlaid on it. (**lower**) the buffered anomalous regions for S-series elements with relevant ore spots overlaid on it. The ore spots not falling in the anomalous regions are labelled with their mineral species. Other legends in this figure are consistent with those in Figures 9 and 10.

## *3.4. Comparison and Validation by Factor Analysis*

The best way to validate the effectiveness of our OGE algorithm is to observe how many ore spots can fall into the anomalous image objects, but this is not enough because most of the anomalous patches in Figure 13 are barren. Comparing Figure 13 to Figure 1, we also discover that the presence of ore spots and anomalies is not completely controlled by the spatial distribution of the outcropped bedrocks and known geological structure. For example, over a third of the known ore spots occur within the N2b and Quaternarycovered areas with less geological exposure. Inasmuch, it is difficult to directly validate the anomalies from the perspective of regional geology. In this Section, the factor analysis is used to do so. As we know, factor analysis is a well-validated technique that is widely used to reduce a large number of variables into fewer numbers of factors [32]. Although irrespective of the spatial structure of geochemical patterns, it facilitates the identification of multivariate geochemical anomalies. This is consistent with the major aim of OGE. So, supposing the results of factor analysis are tenable and close to truth, we can cross-validate the correctness and robustness of our model's outputs by observing if the factor-score anomalies overlap considerably with the anomalies delineated in Figure 13.

We can make the following observations from Figure 14: (1) Nearly all the factor-score anomalies, no matter what the elemental association each factor represents, reside within the OGE-derived anomalous areas, and occupy most of the interior space. This result strongly supports the basic correctness of our model's outputs. (2) Our OGE model is advantageous since quite a few ore spots that cannot be identified by the factor-score anomalies are well-identified by OGE. (3) The I-series elemental association can be further divided into five factors: Factor 1: As-Pb-Zn, Factor 2: Ag-B-Cu, Factor 3: Au-Hg, Factor 4: Mo-U, and Factor 5: Fe2O3. Likewise, the S-series elements can be divided into: Factor 1: Bi-W, Factor 2: Pb-Sb-Sn, Factor 3: Mo-U, Factor 4: Nb-Fe2O3, and Factor 5: Be-F. These divisions improve the interpretability of the extracted anomalies and facilitate the mineral species-specific geochemical exploration.

**Figure 14.** The geochemical anomalous areas (white) delineated in Figure 13 with relevant factorscore anomalies (colored polygons) overlaid on them. (Left): the I-series elements. (Right) the S-series elements. Note that: the factor-score anomalies were obtained by setting the binarization threshold as mean + 1 × standard deviation. Other legends in this figure are consistent with those in Figures 9 and 10.

Given the above, by calculating the ore-spot recognition rate and conducting factor analysis, the OGE-derived anomalies in Figure 13 get validated.

#### **4. Discussion**

In our opinion, there are six meaningful aspects around the proposed network architecture worth to discuss, which are listed in the following paragraphs.

First, many factors, e.g., steep topography, the weathering and transportation of chemical compositions in stream sediment samples, surface hydrological processes, etc., can cause displacement of geochemical anomalies from the source area [7], and that is why most of the mineral spots falling outside the delineated anomalous regions occur nearby them. Inasmuch, it is highly necessary to create buffer zones for the known anomalous regions, although the total anomalous area increases accordingly.

Second, Figure 15 gives the anomalous score map using the geochemical dataset consisting of seven major rock-forming elements (Fe2O3, Al2O3, SiO2, MgO, Na2O, K2O, and CaO) as input to OGE. Here we suppose that the anomalies delineated in Figure 15 reflect the regional diagenesis. It can be seen from Figure 15 that only 6 ore-spots fall outside the anomalous regions, and the ore-spot recognition rate is as high as 91.176%, exceeding Figure 11's 88.24%. The spatial distribution of the anomalous regions in Figure 15 is also consistent with that in Figures 9–11. Obviously, owing to the strong control of bedrock geology may exert on the chemical composition of stream sediments or soil samples, the determination of geochemical anomalies can be heavily affected by the lithology background in areas with variable lithologies. That is to say, the mineralization is attached to the diagenesis, and the delineated anomalies in Figures 9–11 primarily reflects the regional diagenesis, and, to a lesser extent, the mineralization. Zhao et al. [36] further pointed out that, due to the lower spatial resolution, the 1:200,000 geochemical data cannot directly link to small-/medium-scaled ore occurrences, especially the micro-mineralization outcrops. Worse still, taking the image objects as basic analysis units tends to accentuate these trends rather than mitigate them. These explained why our delineated anomalous area is so large and is largely controlled by the outcropped bedrocks as displayed in Figure 1.

**Figure 15.** The buffered anomalous regions (green colored patches edged with white) with all the known ore spots overlaid on it. Note that the anomalous regions are generated by using the geochemical dataset consisting of the major rock-forming elements as input to OGE. We set the binarization threshold as the 50% quantile: 0.66982. Other legends in this figure are consistent with those in Figure 11 if not specified.

Third, in the within-object analysis stage, we use fractal dimension *D*<sup>2</sup> to measure the degree of multi-elemental clustering, but its performance in terms of reducing the prospecting-target-area is not as good as expected. We also observed that: the overall distribution pattern of the multi-elemental contents of the "pixels" in an object is usually right-skewed, thus it can be approximately modelled by a bimodal Gaussian distribution: one population deals with the normally distributed part, and the other one deals with the long right tail—although sometimes it is multi-peaked. And then, we can separate them and obtain their shape parameters, namely *m*<sup>1</sup> and *m*2: the mean values of two adjacent Gaussian distributions, respectively, and *m*<sup>1</sup> < *m*2; *σ*<sup>1</sup> and *σ*2: the corresponding standard deviations. On this basis, the separability (*J*) [37] is calculated as:

$$J = 2(1 - e^{-B}), J \in [0, 2], \tag{9}$$

$$B = \frac{1}{8}(m\_1 - m\_2)^2 \frac{2}{\sigma\_1^2 + \sigma\_2^2} + \frac{1}{2} \ln\left[\frac{\sigma\_1^2 + \sigma\_2^2}{2\sigma\_1\sigma\_2}\right] \tag{10}$$

If the metallogenesis or differentiation occurs, the multi-elemental content values within an image object will have a small separability because of the strong feature of elemental association; otherwise, they will have a larger *J* value.

Unfortunately, like the case in Figure 13, only a moderately reduced prospectingtarget-area was generated based on the object-oriented *J* map. These experiments imply that: *D*2, *J*, as well as other manually designed features may not be a good indicator to separate multi-elemental anomalies from background, and alternatively, deep features produced by, e.g., OGE usually perform well in this case.

Fourth, the delineated anomalous area can be dramatically reduced by increasing the binarization threshold. For example, for the S-series elements, when setting the binarization threshold as the 60% quantile (0.10562), the anomalous area decreases to 40.005% of the total area, but 6 ore-spots are excluded from the buffered anomalous regions; when setting the threshold as the 70% quantile (0.12625), the anomalous area decreases to 29.872%, but 14 ore-spots are excluded; when setting the threshold as the 80% quantile (0.16079), the anomalous area decreases to 20.157%, but 22 ore-spots are excluded; when setting the threshold as 0.2202 – the 90% quantile, the anomalous area decreases to 10.022%, but 34 ore-spots fall outside the buffered anomalous regions, and only 13 ore-spots fall within. The case is similar for the I-series elements. That is to say, the anomaly scores of the ore-containing image objects vary in a wide range, from the 50% quantile to 100%. In our opinion, this can be mainly attributed to the low resolution of the geochemical data in use and its weak ability in detecting the micro-mineralization outcrops. To prove this, Figure 16 further gives the anomalous regions extracted by 1:50,000 soil/stream-sediment survey data, which is generated using the geochemical dataset consisting of 12 ore-forming elements (Ag, As, Au, Bi, Cu, Hg, Mo, Pb, Sb, Sn, W, and Zn) as input to OGE. In Figure 16, we raise the binarization threshold from the 50% quantile (median) to the 60% quantile, and the anomalous area decreases to 38.11% of the total area accordingly. Comparing to Figure 11, a 11.77% reduction in anomalous area has been achieved without affecting the recognition rate: only 7 ore-spots fall outside the buffered anomalous area. If we raise the threshold to the 70% quantile, the anomalous area decreases to 29.54%, and only 11 orespots fall outside (the recognition rate remains above 80%). Obviously, due to the improved spatial resolution, our OGE model is guided to focus more on regional mineralization, rather than diagenesis.

**Figure 16.** The buffered anomalous regions (blue colored patches edged with white) with all the known ore spots overlaid on it. Note that the anomalous regions are generated by using the 1:50,000 geochemical dataset consisting of 12 ore-forming elements (Ag, As, Au, Bi, Cu, Hg, Mo, Pb, Sb, Sn, W, and Zn) as input to OGE. We set the binarization threshold as the 60% quantile: 0.10515. Some ore spots, e.g., fluorite, Nb-Ta, etc., are removed from this figure because of the absence of relevant geochemical data. The 1:50,000 soil/stream-sediment geochemical data was provided by the Development Research Center, China Geological Survey. To ensure consistency, the 1:200,000 multiresolution segmentation result shown in Figure 2 is reused to generate the 1:50,000 anomaly score map. Other legends in this figure are consistent with those in Figure 11 if not specified.

Fifth, in the pre-processing stage, first we interpolate sampled point data into raster grids, and then, they are aggregated into different image objects, which inevitably introduces uncertainties into the data. Naturally, people may question if a reduced segmentation scale can help improve the anomaly-detection performance. To answer this question, we use the scale parameter 2.0 to segment the input geochemical image, and finally we get 2465 objects as displayed in Figure 17. Taking the S-series elements as example, comparing to Figure 10 (upper), more details about the anomalies have emerged, but their spatial distribution patterns are generally consistent with those in Figure 10. We still have to set the binarization threshold as median to keep the ore-spot recognition rate greater than 90%, whereas if we raise the threshold to the 70% quantile, only 9 mineral spots fall outside the buffered anomalous area, and the recognition rate remains greater than 80%. Figure 18 gives the big picture: when we set the threshold smaller than or equal to the 60% quantile, the OGE analysis performed at higher levels of scale has higher recognition rate; whereas when we set the threshold greater than the 60% quantile, lower levels of scale may perform

better. Figure 17 is such an example: when the threshold is set as the 70% quantile and the scale parameter is set as 2, 81.4% of the I-series mineral spots can fall within the corresponding buffered-anomalous-area which accounts for <45% of the study area, and 80.9% of the S-series ore spots can be included. By contrast, if the scale parameter increases to 3, only 72% of the I-series ore spots and 70% of the S-series ore spots can be included in the relevant anomalous areas. Given the above, in order to balance the recognition rate and anomalous area, it is better to use a smaller scale parameter to segment the multivariable geochemical image.

**Figure 17.** (**upper**) the object-based anomaly score map for I-series elements with relevant ore spots (red squares) overlaid on it. (**lower**) the object-based anomaly score map for S-series elements with relevant ore spots overlaid on it. The image objects with larger scores are represented by brighter tones, while the image objects with smaller scores are represented by darker tones. Here we set the binarization threshold as the 70th percentile value, and the regions delineated by green lines are the anomalous regions with a 3-pixel buffer zone, which account for approximately 44.375% of the total area.

**Figure 18.** Plot of the recognition rate versus threshold and anomalous area. (**left**): the I-series elements; (**right**): the S-series elements.

Sixth, as our OGE architecture was developed as an updated version of GAUGE by combining OBIA and GNN (GAT & GCN) into a singular network, so it is necessary to compare the performance between them. As shown in Figure 19, which is generated by using the geochemical dataset consisting of all 18 ore-forming elements as input to GAUGE, we found that the spatial distribution patterns of the anomalous areas are generally accorded with those in Figure 11, but more details have emerged; if setting the binarization threshold as the 50% quantile, there are more than 19 ore-spots falling outside but being close to the anomalous regions; if setting the threshold as the 70% quantile, there are 25 missed ore-spots. As far as the recognition rate is concerned, the performance of GAUGE is no better than OGE. Nevertheless, even the threshold is set as the 70% quantile, most of the missed ore spots could fall within the 2 km (i.e., the controlling distance of one sampling point) buffered anomalous regions, and the ore- spot recognition rate approaches 100%. However, the resulting anomalous area will increase to >75% of the total area. In addition, the within-object statistical analysis is not supported by GAUGE. Given the above, our OGE performed better in anomaly detection: in Figure 17, more than 80% of the known ore spots were predicted in the buffered anomalous area accounting for less than 45% of the total area. In fact, both GAUGE and OGE were designed to extract both the spatial features and compositional relationships of geochemical variables collected at irregularly distributed locations. For OGE, such irregularity is implemented by the multiresolution segmentation whose output is a series of irregularly distributed centroids in irregularly shaped image objects [6], which may reflect the variation of geochemical background across the space. On the other hand, as shown in Figure 19, the sampling grid is relatively regular, and very few geochemical sampling points are missing in this area. This implies that GAUGE could not capture the practical spatial distribution characteristics of geochemical variables, which reflects complex geologic processes (i.e., mineralization). That is why the visual performance of GAUGE is not as good as expected. Other convolutional autoencoder models available in the literature, such as SCMA (Spatially Constrained Multi-Autoencoder) [38], were not discussed here because they cannot be applied to an irregular area for anomaly identification.

From the above, it is still difficult for us to reduce the anomalous area to below 30% of the study area. However, previous studies on separation of geochemical anomalies in the literature mainly focus on a singular mineral species, while our study seeks to delineate the all-mineral-species anomalies. In this sense, a relatively large anomalous-target-area may be necessary.

**Figure 19.** Anomaly map obtained by GAUGE. (**left**): threshold = the 50% quantile; (**right**) threshold = the 70% quantile. Note: the colored dots represent different geochemical sampling points; yellow dots denote anomalies and blue dots denote the background. All the known ore-spots (red squares), no matter what their scales and mineral species, are overlaid on the anomaly maps. The anomaly score maps are generated by using the geochemical dataset consisting of all 18 ore-forming elements as input to GAUGE.

#### **5. Conclusions**

The main contribution of this article is fourfold as follows: (1) to the best of our knowledge, this is the first time that the OBIA framework has been applied in regional geochemical prospecting; (2) a novel GNN architecture (OGE) has been designed to extract multi-elemental geochemical anomalies which can locate most of the known ore spots, no matter what their scales and mineral species; (3) for the first time the within-object multi-elemental statistical analysis has been conducted to modify the anomalous regions delineated by object-level summary statistics; and (4) most studies on separation of geochemical anomalies in the literature focus on a singular mineral species, while our study attempts to delineate the all-mineral-species anomalies.

Taking the 1:200,000 stream-sediment geochemical survey data in the Bayantala-Mingantu district, North China as case study, our findings indicated that:

(1) Most of the ore spots falling outside the delineated anomalous regions occur nearby them. (2) Due to the low resolution of the geochemical data at hand and its weak ability in detecting the micro-mineralization outcrops, the anomalous area produced by OGE is relatively large and mainly reflects the regional diagenesis. (3) The withinobject statistical analysis based on handcrafted features, e.g., D2 and J, has a limited effect on reducing the anomalous target area. (4) OGE can be guided to focus more on regional mineralization, rather than diagenesis, by using fine-scale geochemical data as input. (5) Smaller segmentation scales can greatly improve the application of OGE. Finally, more than 80% of the known ore spots were predicted in less than 45% of the study area. (6) OGE is designed to extract both the spatial features and compositional relationships of geochemical variables collected at irregularly distributed centroids in irregularly shaped image objects, and it performs slightly better in anomaly detection than other convolutional autoencoder models such as GAUGE.

Future research will focus on improving the OGE in the following respects: (1) to alleviate the spatial heterogeneity of geochemical backgrounds, it is better to conduct the mean removal operation within every image object, rather than removing the global mean values for each element. (2) The optimal segmenting scale can be determined by using the ESP (estimation of scale parameter) tool proposed in [39]. (3) The possible approaches of integrating multiple datasets (e.g., remote sensing data and geophysical data) in the framework of OGE will also be explored. (4) Improve the within-object statistical analysis method, so that those strong but false or meaningless anomalies can be eliminated, while retaining those weak but important or meaningful anomalies.

**Author Contributions:** Conceptualization, B.Z. and D.Z.; methodology, B.Z. and R.Z.; software, B.Z.; validation, P.T. and H.W.; formal analysis, B.Z., H.W. and P.T.; investigation, B.Z. and Z.L.; resources, Z.L.; data curation, Z.L.; writing—original draft preparation, B.Z.; writing—review and editing, D.Z. and R.Z.; visualization, B.Z.; supervision, D.Z.; project administration, R.Z. and Z.L.; funding acquisition, D.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was financially supported by the project of "Study of the Ore-Forming Regularity and Ore Prediction for Key Metallic Deposits in the Bayantala-Mingantu District, Inner Mongolia, China" under Grant 2020-KY03.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The 1:50,000 and 1:200,000 geochemical data used in this study were provided by the Development Research Center, China Geological Survey.

**Acknowledgments:** Constructive criticism and helpful review comments on this manuscript were provided by several anonymous reviewers, and the authors would like to thank them here.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**


**Jie Li 1, Qingjie Gong 1,\*, Bimin Zhang 2, Ningqiang Liu 1, Xuan Wu 3, Taotao Yan 2, Xiaolei Li <sup>3</sup> and Yuan Wu <sup>1</sup>**


**Abstract:** Geochemical gene is a new promising concept proposed recently in the discrimination and traceability of geological materials and is also a useful tool to recognize geochemical anomalies in mineral exploration. Based on the lithogenes of LG01 and LG03, geological materials can be classified into nine types of LG\_CR compositionally. With respect to geological materials with 11 types of LG\_CR, in order to eliminate the lithological influence and to further narrow the prospecting target area, a tungsten metallogene named MGW11 is proposed for geochemical tungsten exploration after the tungsten metallogene MGW. Six weathering profiles of 11 types of LG\_CR developed on granitic intrusions in different areas in China are selected to test the stable properties such as heredity and inheritance of MGW11 and MGW. The results indicate that MGW11 and MGW metallogenes illustrate stable properties during rock weathering regardless of weathering degrees, although gene variations of MGW11 and MGW are also observed during extreme weathering. Based on the regional geochemistry survey data in the Lianyang area in south China, where stream sediments are mostly 11 types of LG\_CR compositionally, geochemical maps of mineralization similarities of MGW11 and MGW are contoured, and the anomaly areas are determined on the mineralization similarity value of ≥40%. Comparing the tungsten deposits and anomaly areas determined on MGW11 and MGW metallogenes spatially, a total of six polymetallic W deposits recognized in the study area are all located in the anomaly areas. Therefore, mineralization similarities of MGW11 and MGW can be viewed as useful integrated indices on geochemical tungsten exploration. In areas with 11 types of LG\_CR compositionally, anomaly areas determined on the MGW11 are smaller than those on the MGW, which indicates that MGW11 is more efficient than MGW in targeting W deposits during tungsten prospecting because of the elimination of the lithological influence.

**Keywords:** geochemical gene; MGW; LG\_CR; mineralization similarity; Lianyang

## **1. Introduction**

Geochemical gene is a new promising concept proposed recently in the discrimination and traceability of geological materials [1,2] and is also a useful tool to recognize geochemical anomalies in mineral exploration [3,4]. Geochemical gene is proposed and illustrated firstly as a lithogene [5] named LG02, then followed by lithogenes called LG01 [6] and LG03 [7], gold metallogene (MGAu) [3], and tungsten metallogene (MGW) [4], and REE (rare earth elements) genes called REEG01 and REEG02 [6]. Therefore, there are a total of seven geochemical genes reported till now, which were introduced and reviewed by Gong et al. [1] recently.

With respect to lithogenes LG01 and LG03, their gene properties of heredity and inheritance during weathering have been tested on lots of weathering profiles developed over different lithological rocks in different climate zones in China [6–8] according to the similar gene criterion of ≥80% on gene similarity [1]. Their application in classifying

**Citation:** Li, J.; Gong, Q.; Zhang, B.; Liu, N.; Wu, X.; Yan, T.; Li, X.; Wu, Y. Construction, Test and Application of a Tungsten Metallogene Named MGW11: Case Studies in China. *Appl. Sci.* **2023**, *13*, 606. https://doi.org/ 10.3390/app13010606

Academic Editor: Ricardo Castedo

Received: 6 December 2022 Revised: 29 December 2022 Accepted: 30 December 2022 Published: 2 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

geological materials is useful and suitable for fresh and altered rocks and weathered products such as soils and sediments [2]. The classification method is introduced here briefly. The ideal acidic rock in China (a virtual rock sample represented by the elemental abundance of acidic rock in China compositionally) has the same gene code of 10202020202 on LG01 and LG03, and the ideal basic rock in China also has the same gene code of 12020202020 on these two lithogenes [6,7]. The gene similarity of a sample relative to the ideal acidic rock is called the sample's acidic similarity and can be labeled as *R*Acidic. According to this definition, the *R*Acidic of the ideal acidic rock in China is 100%, while the *R*Acidic of the ideal basic rock in China is 0%. Geological materials can be classified into three groups acidic-like composition with *R*Acidic ≥ 80% labeled 1, intermediate-like composition with *R*Acidic between 75% and 25 labeled 2, and basic-like composition with *R*Acidic ≤ 20% labeled 3. In order to integrate the classification results of LG01 and LG03, LG\_CR (classification results of lithogenes) with a double-digit is proposed by Wu et al. [2]. The classification result of LG01 is put as the first digit, and the result of LG03 is the second digit. There is a total of nine types theoretically of LG\_CR classified based on LG01 and LG03 as 11, 12, 13, 21, 22, 23, 31, 32, and 33 types. The 11 types of LG\_CR of a sample indicates that the values of *R*Acidic of LG01 and LG03 of the sample are all ≥80%. Therefore, the result of LG\_CR can be used to classify geological materials [2].

With respect to the metallogenes of MGAu and MGW, their gene properties of heredity and inheritance during weathering have also been tested in China [3,4]. The ideal geochemical background samples (all indicator elements are clearly lower than their immobile elements in gene spectral lines) have the same metallogene code of 10202020202 on MGAu and MGW. In contrast, the ideal ore samples (in which indicator elements are enriched clearly in gene spectral lines) will have the same metallogene code of 12020202020 on these two metallogenes. Like the above mentioned on lithogenes, the gene similarity of a sample relative to the ideal ore sample is called the sample's mineralization similarity and can be labeled as *R*IdealOre. According to this definition, the *R*IdealOre of the ideal ore sample is 100%, while the *R*IdealOre of the ideal background sample is 0%. The application of metallogenes is that their *R*IdealOre can be viewed as an integrated index of recognizing geochemical anomalies for mineral prospecting, and the mineralization similarity (*R*IdealOre) value of 40% can be viewed as the criterion to discriminate samples with or without mineralization [1,3,4]. Although the metallogenes have been used well in geochemical exploration [4,9], the anomaly area determined on the *R*IdealOre is commonly too large than deposit areas, which is unfavorable to targeting deposits promptly and precisely during prospecting. This large anomaly area determined on *R*IdealOre resulted from the elemental reference values during the metallogenes' construction. The reference values are determined by the elemental abundances of acidic, intermediate, and basic rocks in China. Therefore, the *R*IdealOre index is applicable to geological materials ignoring the lithology of basic-like, intermediate-like, or acidic-like compositions. If the target deposits, such as tungsten deposits, are located in the acidic-like composition area, such as the 11 types of LG\_CR area, the determined anomaly areas on the *R*IdealOre of MGW are certainly larger than the target area. Therefore, a tungsten metallogene aiming at 11 types of LG\_CR materials should be constructed to substitute the MGW to determine anomaly areas more efficiently.

In this paper, a tungsten metallogene named MGW11 aiming at 11 types of LG\_CR materials is constructed firstly. Then the heredity and inheritance properties of MGW11 are tested on lots of weathering profiles developed over 11 types of LG\_CR rocks in different climate zones in China. Finally, the *R*IdealOre of MGW11 is used to determine geochemical anomaly areas in the Lianyang area of south China, and the results are compared with those derived from the MGW.

#### **2. Construction of MGW11**

A geochemical gene is commonly constructed on five steps as a selection of elements, determination of reference values, spectral line and codes, calculation of similarity, and sequence of elements [1,3]. On the basis of the MGW metallogene proposed by Gong et al. [4], the selected elements and their sequences of MGW can be adopted to construct the new tungsten metallogene named MGW11 here. In addition, the methods of coding spectral lines and calculating gene similarities are also adopted during the construction of the MGW11. Therefore, the main task or key step to construct the MGW11 is the determination of reference values for each selected element.

In MGW and MGAu metallogenes, reference values are determined on the abundances of five geological materials in China, which are acidic rock, intermediate rock, basic rock, soil, and stream sediment [10]. The reference values of the six immobile elements, Ti, Th, Nb, Zr, La, and Y, were calculated as

$$C\_{\rm ref} = 10^{(1\lg C\_{\rm min} - 0.1)} \tag{1}$$

where *C*ref is the reference value and *C*min is the minimum abundance of each element in the five materials [3]. While references values of the five indicator elements as Cu, W, Sn, Zn, Mo in MGW were calculated as

$$\mathbb{C}\_{\text{ref}} = 10^{\left(\lg \mathbb{C}\_{\text{max}} + 0.1\right)} \tag{2}$$

where *C*max is the maximum abundance of each element in the five materials [4]. The spectral lines of MGW of the five materials are listed in Figure 1a. The gene codes of the five materials are the same as 10202020202, which is the ideal background material's metallogene code.

**Figure 1.** Spectral lines of MGW (**a**) and MGW11 (**b**) metallogenes for geological materials in China. Sample data in (**a**) are from Chi and Yan [10] and data in (**b**) are from Chi and Yan [10], Hou et al. [11], Xiang et al. [12] respectively.

With respect to the MGW11, reference values are determined on elemental abundances of a total of 190 geological materials in China, including 85 records of acidic rocks from Chi and Yan [10], 48 records of soils from Hou et al. [11], and 57 records of stream sediments from Xiang et al. [12] which are all 11 types of LG\_CR compositionally. The reference values of the six immobile elements, Ti, Th, Nb, Zr, La, and Y, were calculated as

$$C\_{\rm ref} = 10^{(\lg C\_{\rm min} + 0.1)} \tag{3}$$

The references values of the five indicator elements, Cu, W, Sn, Zn, and Mo, in MGW, were calculated as

$$\mathbb{C}\_{\text{ref}} = 10^{(\lg \mathbb{C}\_{\text{max}} - 0.1)} \tag{4}$$

where *C*ref is the reference value and *C*min and *C*max are the minimum and maximum abundances of each element in the 190 materials in China. The spectral lines of MGW11 of the three materials are listed in Figure 1b with the same gene code of 10202020202, which is the ideal background material's gene code. The elemental sequence and reference values of MGW and MGW11 are listed in Table 1.


**Table 1.** Reference values for tungsten metallogenes of MGW and MGW11.

Note: Unit in μg/g.

If a sample of 11 types of LG\_CR is mineralized during tungsten ore-forming processes, indicator elements will be enriched relative to the other six immobile elements. If the five indicator elements were all enriched clearly, the sample would have the MGW11 code of 12020202020 and is called the ideal ore sample. Therefore, the MGW11 similarity between the ideal ore and the ideal background sample is 0%. As aforementioned, the genetic similarity between a sample and the ideal ore is called the sample's mineralization similarity labeled as *R*IdealOre, which can be used as an index to discriminate a geological material as an anomaly or background sample. The 40% value of the *R*IdealOre was suggested as the criterion to discriminate samples with or without mineralization or to classify anomaly or background samples [3,4] which is also adopted here to the MGW11 gene.

## **3. Test of MGW11**

## *3.1. Materials*

The test of a geochemical gene focuses on stable properties such as heredity and inheritance during rock weathering [1]. Here six weathering profiles of 11 types of LG\_CR are selected from literature to test the properties of the MGW11. The six weathering profiles are labeled as DH31, TL19D04, LHK55, and LC19 from northeast to southwest in China and LY18D13 and LY18D06 in Liangyang area of south China (Figure 2, Table 2).

**Figure 2.** Locations of weathering profiles and Lianyang area in China. LY18D13 and LY18D06 weathering profiles are located in Lianyang area.

**Profiles Parent Rock Longitude Latitude Depth (m) Sample Count References** DH31 Monzogranite E 128◦22 22 N 43◦18 44 11 13 [8] TL19D04 Monzogranite E 123◦56 44 N 42◦26 16 6 9[13] LHK55 Granite E 112◦07 09 N 33◦49 06 5.2 11 [14] LC19 Granite E 100◦18 05 N 22◦12 24.2 14 20 [15] LY18D13 Granite E 112◦15 57.6 N 24◦23 00 14.6 23 [16] LY18D06 Granite E 112◦4 26.76 N 24◦9 39.96 25 29 [17]

**Table 2.** Information on weathering profiles.

The DH31 profile (E 128◦22 22, N 43◦18 44) developed over the Dunhua monzogranite in a temperate monsoon climate. The depth of DH31 profile is ca. 11 m, and 13 samples are collected sequentially from the topsoil downward to the monzogranite. Details, including descriptions, elemental concentrations, and the analytical qualities of these samples, can be found in Reference [8]. The TL19D04 profile (E 123◦56 44, N 42◦26 16) formed on the Tieling monzogranite in a temperate continental monsoon climate. The depth of TL19D04 profile is ca. 6 m, and 9 samples are collected sequentially from the topsoil downward to the monzogranite. Details of these samples can be found in Reference [13]. The LHK55 profile (E 112◦07 09", N 33◦49 06) developed over the Taishanmiao granite in a warm temperate continental monsoon climate. The depth of LHK55 profile is ca. 5.2 m, and 11 samples are collected sequentially from the topsoil down to the granite. Details of these samples can be found in Reference [14]. The LC19 profile (E 100◦18 05, N 22◦12 24.2) formed on the Lincang granite in a subtropical monsoon climate. The depth of LC19 profile is ca. 14 m, and 20 samples are collected sequentially from the topsoil down to the granite. Details of these samples can be found in Reference [15].

The LY18D13 profile (E 112◦15 57.6, N 24◦23 ), developed over the Lianyang granite in a subtropical monsoon climate. The depth of LY18D13 profile is ca. 14.6 m, and 23 samples are collected sequentially from the topsoil down to the granite. Details of these samples can be found in Reference [16]. The LY18D06 profile (E 112◦4 26.76, N 24◦9 39.96), also developed over the Lianyang granite, is ca. 25 m, and 29 samples are collected sequentially from the topsoil down to the granite, which details including descriptions, elemental concentrations, and their analytical qualities can be found in Reference [17].

## *3.2. Results*

Based on the reported elemental concentrations of samples from each weathering profile, weathering indices including CIA and WIG, acidic similarities (*R*Acidic) of LG01 and LG03 lithogenes, samples' similarities relative to the top soil and the bottom bedrock, mineralization similarities (*R*IdealOre) of MGW and MGW11 metallogenes are calculated for each weathering profile and illustrated in Figures 3 and 4.

The CIA (chemical index of alteration) was developed by Nesbit and Young [18] in reconstructing paleoclimate from Early Proterozoic sediments, and the WIG (weathering index of granite) was proposed by Gong et al. [19] to describe the weathering degrees of weathered granitic products in the absence of CO2 contents. The calculation methods used here were detailed and described by Wu et al. [8]. The calculation methods on gene codes and similarities used here can be found in Reference [1]. Acidic similarities (*R*Acidic) of LG01 and LG03 in each profile were calculated firstly to check whether their products are 11 types of LG\_CR or not. The gene similarity (*R*) of samples is calculated relative to their bedrock (heredity property) labeled as *R*Bedrock, their top soil (inheritance property) labeled as *R*Topsoil, and the ideal ore (mineralization similarity) labeled as *R*IdealOre respectively in each weathering profile (Figures 3 and 4).

**Figure 3.** Weathering indices and gene similarities of samples from weathering profiles of DH31, TL19D04, LHK55, and LC19. (**a**–**e**) are the weathering indices, acidic similarities, similarities of MGW, similarities of MGW11, and mineralization similarities respectively in profile DH31. (**f**–**j**) are those in profile TL19D04, (**k**–**o**) are those in LHK55, and (**p**–**t**) are those in LC19.

**Figure 4.** Weathering indices and gene similarities of samples from weathering profiles of LY18D13 and LY18D06 in Lianyang area. (**a**–**e**) are the weathering indices, acidic similarities, similarities of MGW, similarities of MGW11, and mineralization similarities respectively in profile LY18D13. (**f**–**j**) are those in profile LY18D06.

The CIA values of samples from DH31, TL19D04, and LHK55 profiles range from 49.8 to 56.6, from 50.9 to 57.4, and from 49.2 to 57.6, respectively (Figure 3a,f,k). The WIG values of samples from these three profiles range from 68.5 to 89.8, from 64.6 to 86.9, and from 63.6 to 94.3, respectively (Figure 3a,f,k). According to the classification values of 60 and 80 on CIA [20] and values of 20 and 60 on WIG [6,8], these three profiles have undergone incipient weathering. While CIA values of samples from LC19, LY18D13, and LY18D06 profiles in south China range from 53.0 to 94.1, from 59.8 to 83.6, and from 50.7 to 93.1, respectively (Figures 3p and 4a,f). The WIG values from these three profiles in south China range from 5.2 to 76.0, from 15.6 to 67.8, and from 6.3 to 86.1, respectively (Figures 3p and 4a,f). These values indicate that the profiles in south China have undergone extreme weathering.

The *R*Acidic of samples from all profiles vary from 80% to 100% on LG01 and LG03 (Figures 3b,g,l,q and 4b,g). According to the classification method proposed by Wu et al. [2], which is also introduced above, samples from the six weathering profiles are all 11 types of LG\_CR compositionally.

In Figure 3, all values of *R*Bedrock and *R*Topsoil of MGW and MGW11 in the four profiles of DH31, TL19D04, LHK55, and LC19 are ≥80% which indicates good heredity and inheritance of MGW and MGW11 metallogenes in each profile (Figure 3c,d,h,i,m,n,r,s). Values of *R*IdealOre on MGW are all ≤25%, and values of *R*IdealOre on MGW11 are all ≤15% (Figure 3e,j,o,t). This indicates that all samples from these four profiles are background samples (rather than anomaly samples) according to the discrimination criterion of 40% of *R*IdealOre.

In Figure 4, values of *R*IdealOre on MGW are all ≤30%, and values of *R*IdealOre on MGW11 are all ≤20% (Figure 4e,j). This indicates that all samples from weathering profiles in the Lianyang area are also background samples according to the discrimination criterion of 40% of *R*IdealOre.

In the LY18D13 profile, values of *R*Bedrock and *R*Topsoil of MGW11 (Figure 4d) are all ≥85% which indicates stable heredity and inheritance of MGW11. Except for some upper soil samples, values of *R*Bedrock of MGW are higher than 80% (Figure 4c), which indicates good heredity of MGW except for some variations in the upper soils with extreme weathering degrees. On the other hand, values of *R*Topsoil of MGW are higher than 80% except for only the bottom bedrock sample (Figure 4c), which indicates a good inheritance of MGW, excluding the bottom bedrock sample.

In the LY18D06 profile, values of *R*Bedrock of MGW and MGW11 are ≥80% except in some upper soil samples (Figure 4h,i) with extreme weathering degrees, which indicates good heredities of MGW and MGW11 except in some variations in the upper soils. However, values of *R*Topsoil of MGW and MGW11 are higher than 80% only in the upper soils (Figure 4h,i), which indicates good inheritance of MGW and MGW11 only in upper soils and gene variations appear relative to the lower or parent samples in this profile.

In summary, stable properties such as heredity and inheritance of MGW11 and MGW are verified in six weathering profiles which are all 11 types of LG\_CR compositionally and are all geochemical background samples with different weathering degrees. However, gene variations of MGW11 and MGW are also observed in some extremely weathered samples. With respect to the MGW, MGW11 shows better stability in the 11 types of LG\_CR geological materials.

#### **4. Application of Geochemical Exploration in Lianyang Area**

The main aim of constructing the tungsten metallogenes such as MGW and MGW11 is to determine geochemical anomalies using their mineralization similarities (*R*IdealOre) as an integrated index for tungsten exploration. With respect to the MGW, MGW11 should be more precise on anomaly determination in the tungsten mineralized area with geological materials of 11 types of LG\_CR compositionally on its construction. Therefore, the Lianyang area in south China is selected here to test or illustrate the applications of MGW and MGW11, where some tungsten deposits have been found, and geological materials such as stream sediments are most 11 types of LG\_CR compositionally.

## *4.1. Geographical and Geological Settings*

The Lianyang area is located in south China with an area of ca. 5400 km2 ranging from E 111◦47 24 to E 112◦40 12 with a distance of 90 km from west to east and N 24◦02 24 to N 24◦34 48 with a distance of 60 km from north to south (Figure 5a), which is situated in a typical subtropical monsoon zone with a humid climate. The mean annual temperature is ca. 15.5~20.4 ◦C. The annual rainfall is ca. 1500~2200 mm, most of which falls in summer, according to the public network data. The topography in the Lianyang area is characterized by foothills, and the elevation is low in the middle, high in the north and south, and with a value of 50 to 1900 m a.s.l. Soils are thickly developed, and the regolith thickness commonly varies from 10 to 25 m depending on the relief. The land is commonly covered by crops or arbors.

The strata in the study area belong to Cambrian, Devonian, Carboniferous, Permian, Triassic, Jurassic, and Quaternary periods, respectively, which petrological descriptions are illustrated briefly in Figure 5a as notes. Faults are mainly trending N-S in the center and NE-SW in the northwestern, and NW-SE in the northeastern. The main intrusion in the study area is the Lianyang granitic complex in which two types of lithology can be recognized gradationally: the medium-grained biotite granite as the main body and the medium-coarse-grained porphyritic-like biotite granite outcropped locally [21].

**Figure 5.** Geological map of Lianyang area (**a**) and geochemical maps of the LG\_CR (**b**), *R*IdealOre of MGW (**c**), and *R*IdealOre of MGW11 (**d**). Figure 5a is modified after the G49 with a scale of 1:1000,000 from China Geological Survey with notes are as the followings. 1—Quaternary sand and gravel mixed with clay silty sand; 2—Cretaceous sandy conglomerate, pebbled sandstone and siltstone; 3—Jurassic sandstone, siltstone, grit stone and mudstone; 4—Triassic limestone and mudstone; 5—Permian limestone and mudstone; 6—Carboniferous dolomitic limestone and dolomite; 7—Devonian sandstone and dolomitic limestone; 8—Cambrian sandstone, slate and siltstone; 9—Proterozoic sandstone, siltstone and slate; 10—Monzogranite; 11—Granodiorite; 12—Quartzdioritic porphyrite; 13—Peridotite; 14—Petrological boundary; 15—Fault; 16—Main residential place; 17—Locations of LY18D13 and LY18D06 weathering profiles; 18—Au deposit; 19—Cu deposit; 20—Polymetallic W deposit; 21—Mo-Fe deposit.

Six polymetallic W deposits have been recognized in the study area [22], although they have not been receiving much attention [23]. The polymetallic W deposits are mainly distributed around the Lianyang granitic complex contacting with strata and faults (Figure 5a). Among the six polymetallic W deposits, three deposits are located in the north margin of the Lianyang granitic complex near the Zhainan town, two deposits in the south margin of the Lianyang granitic complex near the Zhongzhou town, and one in the northeast margin of the Lianyang granitic complex near the Yangshan town. In addition, three Cu deposits, one Au deposit, and one Mo-Fe deposit are also recognized in this area [16,24].

#### *4.2. Materials, Results and Discussion*

In the Lianyang area, a total of 1393 geochemical records (or samples) of stream sediments were retrieved from the database of the RGNR (Regional Geochemistry–National Reconnaissance) project [12]. In this project or database, stream sediment is the sampling media with a scale of 1:200,000 and was analyzed with 10 major components (SiO2, Al2O3, Fe2O3 or TFe2O3, K2O, Na2O, CaO, MgO, Ti, P, and Mn) and 29 trace elements (W, Sn, Mo, Bi, Cu, Pb, Zn, Cd, Au, Ag, As, Sb, Hg, Li, Be, Sr, Ba, B, F, V, Cr, Co, Ni, Zr, Nb, Th, U, Y, and La) [25].

Based on the geochemical data of each record which represents an area of four square kilometers, the gene codes and their acidic similarities (*R*Acidic) of LG01 and LG03 are calculated on the GGC software [1] firstly. Then the LG\_CR results of the 1393 samples are derived according to the method proposed by Wu et al. [2]. The geochemical map of the LG\_CR in the Lianyang area is illustrated in Figure 5b. Although nine types of LG\_CR can be classified theoretically on double-digit numbers based on the acidic similarities of LG01 and LG03 lithogenes, only four types are recognized in the Lianyang area as 11, 12, 21, and 22 types. The 11 types of LG\_CR are dominant in the whole study area, and the other types of 12, 21, and 22 are located sporadically in the east and north of the area. Therefore, the Lianyang area can be viewed as an area of 11 types of LG\_CR compositionally by and large.

Based on the geochemical data of the 1393 records, the gene codes and their mineralization similarities (*R*IdealOre) of MGW and MGW11 are calculated. The geochemical maps of the *R*IdealOre of MGW and MGW11 in the Lianyang area are illustrated in Figure 5c,d. In these maps, the blue areas with 0 ≤ *R*IdealOre ≤ 20 can be viewed as the normal background area, the yellow area with 20 < *R*IdealOre < 40 can be viewed as the higher background area, and the red area with 40 ≤ *R*IdealOre ≤ 100 are the anomaly areas for tungsten exploration. Furthermore, the anomaly areas can be divided into three zones; the outer zone with 40 ≤ *R*IdealOre < 60, the middle zone with 60 ≤ *R*IdealOre < 80, and the inner zone with 80 ≤ *R*IdealOre ≤ 100 (Figure 5c,d).

By comparing the deposits and the anomaly areas spatially in the Lianyang area, we can find that a total of six polymetallic W deposits in the study area are all located in the anomaly areas determined on the *R*IdealOre of MGW (Figure 5c) and MGW11 (Figure 5d). This indicated that the mineralization similarities of MGW and MGW11 can be viewed as useful integrated indices on geochemical tungsten exploration.

With respect to the three Cu deposits in the study area, one is located in the anomaly area, one is near the anomaly area, and one is in the background area determined on MGW (Figure 5c) and MGW11 (Figure 5d). With respect to the Au deposit and the Mo-Fe deposit in the study area, the Au deposit is located in the higher background areas determined on MGW and MGW11 (Figure 5c,d), and the Mo-Fe deposit is located in the background areas (or higher background area determined on the MGW in Figure 5c and normal background area determined on the MGW11 in Figure 5d. This indicates that the mineralization similarities of MGW and MGW11 are invalid in Cu, Au, and Mo (Fe) geochemical exploration.

By comparing the anomaly areas locating the six tungsten deposits, we can find that the areas determined on the MGW11 are all smaller than the anomaly areas determined on the MGW, especially in the south and northeast margins of the Lianyang granitic complex. This is helpful in targeting the tungsten deposits efficiently in geochemical exploration and is consistent with the aim of this study. Therefore, the mineralization similarity of MGW11 is an integrated index for recognizing the tungsten anomalies, which eliminates not only the closure effect of compositional data such as the spider diagrams [26] and elemental correlations [27] but also the weathering and lithology influences during geochemical exploration.

In addition, two anomaly areas are determined in the south margin of the Lianyang granitic complex near the Qiashui town, both on the MGW (Figure 5c) and on the MGW11 (Figure 5d) with outer and middle anomaly zones (or *R*IdealOre ≥ 60%) which may be potential targets for further tungsten prospecting.

## **5. Conclusions**

(1) A tungsten metallogene named MGW11 is proposed for geochemical tungsten exploration in areas with 11 types of LG\_CR compositionally.

(2) The MGW11 and MGW metallogenes illustrate stable properties such as heredity and inheritance during weathering of rocks with 11 types of LG\_CR compositionally regardless of weathering degrees. However, gene variations of MGW11 and MGW are also observed during extreme weathering.

(3) The mineralization similarities of MGW11 and MGW can be viewed as useful integrated indices on geochemical tungsten exploration. In areas with 11 types of LG\_CR compositionally, anomaly areas determined on the MGW11 are all smaller than those on the MGW because the MGW11 eliminates the lithological influence on recognizing tungsten anomalies during tungsten prospecting.

**Author Contributions:** J.L.: Conceptualization, Data curation, Writing—original draft. Q.G.: Conceptualization, Methodology, Writing—review and editing. B.Z.: Conceptualization, Data curation. N.L.: Methodology, Formal analysis, Data curation. X.W.: Conceptualization, Data curation. T.Y.: Conceptualization, Formal analysis. X.L.: Data curation. Y.W.: Data curation. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was financially supported by the Fund from the Key Laboratory of Geochemical Ex-ploration, Ministry of Natural Resources (IGGEW2021030).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** We greatly appreciate the comments from the anonymous reviewers and editors for their valuable suggestions to improve the quality of this manuscript.

**Conflicts of Interest:** The authors declare that they have no known competing financial interest or personal relationships that could have appeared to influence the work reported in this paper.

## **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
