Data Evaluation for Cassiterite and Coltan Fingerprinting

Gäbler, Hans-Eike; Schink, Wilhelm; Gawronski, Timo

doi:10.3390/min10100926


by Hans-Eike Gäbler *, Wilhelm Schink and Timo Gawronski

Federal Institute for Geosciences and Natural Resources (BGR), Geology of Mineral Resources, D-30655 Hannover, Germany

* Author to whom correspondence should be addressed.

Minerals 2020, 10(10), 926; https://doi.org/10.3390/min10100926

Submission received: 15 September 2020 / Revised: 14 October 2020 / Accepted: 16 October 2020 / Published: 19 October 2020

(This article belongs to the Special Issue Analytical Tools to Constrain the Origin of Minerals)


Abstract

Within due diligence concepts for raw material supply chains, the traceability of a shipment is a major aspect that has to be taken into account. Cassiterite and coltan are two so-called conflict minerals for which traceability systems have been established. To provide additional credibility to document-based traceability systems the German Federal Institute for Geosciences and Natural Resources (BGR) has developed the analytical fingerprint (AFP) for the minerals coltan, cassiterite, and wolframite. AFP is based on the analysis of a sample from a shipment with a declared origin and evaluates whether the declared origin is plausible or not. This is done by comparison to reference samples previously taken at the declared mine site. In addition to the generation of the analytical data, the data evaluation step, with the aim to state whether the declared origin is plausible or not, is of special importance. Two data evaluation approaches named “Kolmogorov–Smirnov distance (KS-D) approach” and “areas ratio approach” are applied to coltan and cassiterite and result in very low rates of false negative results, which is desired for AFP. The areas ratio approach based on hypothesis testing and a more sophisticated evaluation of the multivariate data structure has some advantages in terms of producing lower rates of false positive results compared to the KS-D approach.

Keywords:

coltan; cassiterite; analytical fingerprint; data evaluation

1. Introduction

The growing interest of consumers and the general public in the social and environmental conditions under which raw materials used to manufacture goods for daily life are produced has provoked regulations by governments and business actors and triggered the development of due diligence concepts for responsible raw material supply chains [1,2,3]. This is of special interest for the Great Lakes region in Central Africa where resource-related conflicts occur [4] and due diligence measures affect the livelihood of artisanal miners as well as the international trade in so-called conflict minerals (coltan, cassiterite, wolframite, and gold) [5]. Within due diligence concepts for raw material supply chains the traceability of a shipment is one of the issues that have to be taken into account [6]. The usual “bag and tag” traceability systems depend on document-based, therefore artificial, information on the origin of a shipment [7,8] and thus are highly susceptible to fraud attempts. However, the credibility of those traceability systems is enhanced significantly if an independent tool based on the analysis of intrinsic properties of the minerals verifies the declared origin and thereby strengthens the plausibility of the documentation of the respective shipment.

Analytical tools that provide information on the origin of minerals or raw materials have been developed for different materials such as gem stones [9,10,11,12], gold [13,14], yellow cake [15] or base metals [16] amongst others. For several years, the German Federal Institute for Geosciences and Natural Resources (BGR) has developed a tool called the analytical fingerprint (AFP), which aims to check independently whether the declared origin of a coltan, cassiterite or wolframite shipment is credible or not [17,18,19,20,21,22]. Here, special attention is given to ores mined and traded in the Great Lakes region in Central Africa. AFP is based on analytical data of a sample in question, a reference sample database and a data evaluation step. The typical application case for AFP is the investigation of an ore concentrate shipment from the trading chain, for which a specific mine site is given as its origin. AFP provides evidence whether the given origin is credible or not.

In addition to the development of analytical methods to obtain data for AFP, the data evaluation step is of particular importance. The data of an ore concentrate are multivariate, not normally-distributed and due to the process of mining, the samples cannot be regarded as representative aliquots of a common population (e.g., geological formation) [21,22]. This has to be kept in mind when data evaluation concepts are developed. Two different data evaluation concepts have been proposed and evaluated for wolframite, namely the Kolmogorov–Smirnov distance (KS-D) approach [21] and the likelihood ratio approach [22]. This study expands the application of both concepts to the commodities coltan and cassiterite, presents performance investigations and illustrates both concepts by two case studies.

2. Materials and Methods

2.1. Samples, Database, Sample Preparation and Analysis

2.1.1. Samples and Database

Within the framework of this study, a sample refers to an aliquot of a coltan or cassiterite ore concentrate, which contains several hundred or several thousand individual mineral grains. The majority of those grains are coltan or cassiterite grains accompanied by grains of gangue minerals. In this study, samples collected from the same ore body (even at different dates) are called “brother samples” while samples collected from different ore bodies are called “non-brother samples”. These ore bodies usually are pegmatites and quartz veins of limited lateral extent. However, since a deposit might have experienced more than one mineralization, a mine site might also comprise different ore bodies.

This study comprises a database of 288 cassiterite and 302 coltan concentrate samples taken from different countries (7 countries for cassiterite and 17 countries for coltan) with an emphasis on the African Great Lakes region. From this region, 273 cassiterite concentrates and 253 coltan concentrates originate. For cassiterite, from 48 locations at least two independently taken ore concentrates are available, resulting in 268 pairs of brother samples. The appropriate numbers for coltan are 29 locations and 142 pairs of brother samples.

2.1.2. Sample Preparation

Polished sections are prepared from the sampled ore concentrates and are used for (i) grain identification by scanning electron microscopy combined with mineral liberation analysis (SEM-MLA) and (ii) analysis of the identified cassiterite or coltan grains by laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) [18,20]. Ore concentrates contain coltan and/or cassiterite grains as well as gangue material grains. The analysis of gangue material is not straightforward as its portion in ore concentrate is not mine specific but depends on the varying ore material processing on-site, which is mainly done manually by the miners. Therefore, the grain identification step is necessary to limit the analysis to cassiterite and coltan grains.

2.1.3. Analysis

The analysis of coltan and cassiterite grains is done by LA-ICP-MS. Details of the analytical protocols for coltan and cassiterite are given by Gäbler et al. [18] and Gäbler et al. [20], respectively. In brief, the grains are ablated by a 193 nm excimer laser using 50 µm spots or short 50 µm wide lines. The ablated material is transported into a sector field ICP-MS instrument where different suites of elements are monitored for cassiterite and coltan. For cassiterite, 45 elements were monitored initially [20], but at a later stage of the project, the list of monitored elements was reduced to 28 elements (for the list of elements, see Section 3.1), because some elements such as the rare earth elements did not show detectable concentrations in the vast majority of the analysed cassiterite grains. For coltan, 42 elements are monitored (for the list of elements, see Section 3.1). The ICP-MS signals from cassiterite grains are calibrated using the calibration material NIST SRM 610 (National Institute of Standards and Technology, U.S. Department of Commerce, Standard Reference Material) with ¹¹⁸Sn as internal standard [20]. For coltan, an in-house coltan reference sample is used for calibration and a calibration strategy based on a 100% m/m normalisation of the sum of the main oxides combined with internal standardisation for the trace elements using ⁵⁵Mn is applied [18].

2.2. Data Evaluation

AFP data evaluation is done by two different data evaluation approaches. The aim of both approaches is to evaluate whether a sample in question (E) and a corresponding reference sample (D) from the database (i.e., from the declared origin of the sample in question) are from the same origin. If both samples D and E can be regarded as coming from the same origin, the declared origin of the sample in question is credible.

The first one, called KS-D approach, is based on the evaluation and combination of element-specific cumulative distribution functions combined with an empirically deduced decision criterion. The second one is based on the likelihood ratio concept. Both approaches have been used for the AFP of wolframite [21,22].

2.2.1. KS-D Approach

The KS-D approach initially described for wolframite [21] is also suitable for other minerals and is applied to cassiterite and coltan in this study. Element-specific distribution functions were used to characterize the concentrations of a given element of a sample and a non-parametric statistical tool was used for the comparison of samples E and D. The chosen tool is the Kolmogorov–Smirnov statistic KS-D (maximum vertical distance between two empirical cumulative distribution functions, range 0–1), which is used as a measure of similarity to compare the samples E and D for a given element. The median of all element-specific KS-D values of a two-sample comparison is then used as a measure for the degree of similarity of the two samples. A small median KS-D therefore reflects a high degree of similarity, whereas a large median KS-D indicates a low degree of similarity. Element-specific empirical distribution functions for samples are not calculated if more than 30% of the element-specific data of a sample fall below the respective detection limit. This was done to avoid calculating pointless KS-D values as a result of varying detection limits due to differences in the day-to-day performance of the LA-ICP-MS instrument.
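As an illustration only, the median KS-D of two samples could be computed along the following lines. This is a minimal Python sketch and not the authors' implementation: the dictionary layout, the function names and the use of a single detection limit per element are assumptions made for the example, and the treatment of the remaining values below the detection limit is left open here.

```python
from bisect import bisect_right
from statistics import median


def ks_distance(a, b):
    """Maximum vertical distance between the empirical CDFs of a and b (0 to 1)."""
    a, b = sorted(a), sorted(b)
    # Both empirical CDFs are step functions, so the maximum difference
    # is reached at one of the observed values.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def median_ks_distance(sample_e, sample_d, limits, max_below=0.30):
    """Median of the element-specific KS-D values of two samples.

    sample_e, sample_d: dict mapping element -> list of grain concentrations
    limits: dict mapping element -> detection limit (assumed layout)
    """
    distances = []
    for element in sorted(sample_e.keys() & sample_d.keys()):
        usable = True
        for values in (sample_e[element], sample_d[element]):
            share_below = sum(v < limits[element] for v in values) / len(values)
            if share_below > max_below:
                usable = False  # more than 30% non-detects: element is skipped
        if usable:
            distances.append(ks_distance(sample_e[element], sample_d[element]))
    return median(distances) if distances else None
```

A small median indicates similar samples; elements dominated by non-detects in either sample do not contribute to it.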

To put the resulting median KS-D of two individual samples in relation and decide whether it is a characteristic of two samples with common origin or not, a decision criterion (DC) is required that is empirically deduced. For mine sites with several reference samples, a deposit-specific decision criterion can be applied by calculating median KS-D values for all possible comparisons of reference samples from that mine site. For this mine site, the number of comparisons is n(n-1) for which mean and standard deviation can be calculated. The decision criterion is then calculated as follows:

DC = X + 3σ    (1)

where DC is the decision criterion, X is the mean and σ is the standard deviation calculated from the median KS-D values. If this decision criterion is applied, samples are only accepted to originate from the declared origin if their median KS-D value is smaller than the DC when compared to the single reference samples of the declared origin.

However, if only a few reference samples (e.g., two or three) are available from the same mine site, this method is not suitable for obtaining a DC because the number of KS-D values is too low. In this case, the DC is derived by taking into account all available mine sites with more than one reference sample, and median KS-D values are calculated for all reference sample pairs from common mine sites. The DC is then calculated as given in Equation (1), but with X as the mean and σ as the standard deviation of all median KS-D values of reference sample pairs with common origin.
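Equation (1) can be sketched in a few lines of Python. The function name, the callable `median_ksd` and the factor `k` are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator is used.

```python
from itertools import permutations
from statistics import mean, stdev


def decision_criterion(median_ksd, samples, k=3.0):
    """DC = mean + k * standard deviation of the median KS-D values.

    median_ksd: callable (sample_a, sample_b) -> median KS-D of that pair
    samples: reference samples sharing a common origin
    k: 3 for Equation (1); 2 gives the 2-sigma variation
    """
    # All n(n-1) ordered pairs, following the count given in the text.
    values = [median_ksd(a, b) for a, b in permutations(samples, 2)]
    # Assumption: sample standard deviation.
    return mean(values) + k * stdev(values)
```

For the standard DC, the same calculation would be run over the pooled brother pairs of all mine sites with more than one reference sample instead of the pairs of a single site.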

2.2.2. Areas Ratio Approach

Martyna et al. [22] have developed a likelihood ratio and an areas ratio approach for the AFP of wolframite, which can be used for cassiterite and coltan as well. They demonstrate that both approaches give comparable results. As the computing time necessary for the areas ratio approach is much shorter, for the present study on cassiterite and coltan the areas ratio approach is applied.

The areas ratio approach considers two contrasting hypotheses:

Hypothesis 1 (H1).

The sample in question (E) and a corresponding reference sample (D) are from the same origin.

Hypothesis 2 (H2).

The sample in question (E) and a corresponding reference sample (D) are from different origins.

Mineral concentrate data analyzed by LA-ICP-MS result in data properties, which have to be taken into account prior to application of likelihood ratio approaches. The multidimensionality and the lack of data normality within samples are those properties, which require that classical likelihood ratio approaches have to be modified [22]. The solution proposed by Martyna et al. [22] is to “distill” a single score from the data, which can be used to calculate the two probabilities that the two samples E and D are either from the same origin or from different origins.

The general idea to develop that score is to compare in a first step the reference sample (D) to all other reference samples of the database X_f (with f = 1 to m, m = number of samples in the database that are not from the same origin as sample D, see [22]). Here, special attention is given to identify the differences between D and each X_f. In a second step, the data of sample E are compared to the results obtained for sample D. If both samples D and E are from the same origin, they should behave very similarly in this exercise and vice versa. The differences between sample D and E are summarized in a score and compared to scores, which are obtained from sample pairs with a common origin and sample pairs with different origins.

The above-mentioned score is obtained by procedures detailed in Martyna et al. [22]. First, the original variables (element concentrations of the mineral grains) were log-transformed to reduce the huge data ranges. The data for each pair of D and X_f are combined and a robust principal component analysis (rPCA) [23] was applied to reduce data dimensionality. In a second step, linear discriminant analysis (LDA) was applied on those principal components, which explain 95% of the square of the median absolute deviation (MAD² [23]) to identify the direction (t_f) that separates best the grains from D and X_f. The log-transformed data of samples D, X_f and E are projected on the obtained rPCA directions and then on the LDA direction t_f. This results in three distributions (built up by the different mineral grains in each sample) of D, X_f and E over the LDA direction t_f [22]. As t_f was chosen to separate D and X_f best, the distributions for D and X_f should be maximally separated. If samples D and E are similar to each other the distribution for E is expected to be close to that of D, if this is not the case, the distribution of E should appear somewhere else (see Figure 2 in [22]). The Kolmogorov–Smirnov distance (KSD) is computed to characterize the difference between the distributions for E and D (KSD(ED)) and the difference between the distributions for E and X_f (KSD(EX_f)). Both KSD values are combined to give the ΔKSD defined as ΔKSD = KSD(ED) − KSD(EX_f). If samples D and E are from the same origin, they are expected to have a similar geochemical composition. In this case, KSD(ED) is expected to give a small value and KSD(EX_f) is expected to give a large value because t_f was designed to maximize the difference between D (and E, if D and E are from the same origin) and X_f. Therefore, the expected value for ΔKSD should be <0 if sample E and D are from the same origin and ≈0 or >0 if they are not (see Table 1 in [22]).
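The final step of this procedure, the calculation of ΔKSD from the three projected distributions, can be illustrated by the following Python sketch. It assumes that the rPCA and LDA steps have already produced a single direction t_f; those steps are not reproduced here, and all function names are hypothetical.

```python
from bisect import bisect_right


def ks_distance(a, b):
    """Maximum vertical distance between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def project(grains, direction):
    """Project each grain (sequence of log-transformed values) onto a direction."""
    return [sum(g * t for g, t in zip(grain, direction)) for grain in grains]


def delta_ksd(scores_e, scores_d, scores_xf):
    """Delta-KSD = KSD(E, D) - KSD(E, X_f) for one reference sample X_f."""
    return ks_distance(scores_e, scores_d) - ks_distance(scores_e, scores_xf)


# Toy scores along t_f: E lies close to D and far from X_f.
d = [0.0, 0.1, 0.2, 0.3]
xf = [5.0, 5.1, 5.2, 5.3]
e = [0.05, 0.15, 0.25, 0.35]
print(delta_ksd(e, d, xf))  # negative, as expected for a common origin
```

Repeating the calculation for every X_f in the database gives the m ΔKSD values of a single case.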

Looking at a single case where a sample E is declared to originate from the location represented by sample D in the database, m different ΔKSD values are obtained and must be evaluated together. The m different ΔKSD values result in a distribution of ΔKSD values for that single case. This distribution is interpreted in the light of a likelihood ratio approach, which means comparing it to two ΔKSD distributions obtained from (i) comparisons of two samples with a common origin and (ii) comparisons of two samples with different origins.

This is achieved by computing the common area below the ΔKSD distribution curve of the single case and (i) the ΔKSD distribution obtained by comparisons of two samples from the same origin and (ii) the ΔKSD distribution obtained by comparisons of two samples with different origins [22]. The area ratio (AR) of both common areas is computed to indicate which hypothesis H1 (AR > 1 or log AR > 0) or H2 (AR < 1 or log AR < 0) is supported.

In Martyna et al. [22], this model is called ΔKSD-AR (AR: areas ratio). It looks similar to a likelihood ratio model, but is not a likelihood ratio model in a strict sense. Martyna et al. [22] explain this in detail, with the consequence that an improved model, ΔKSD-AR-LR (LR = likelihood ratio), was developed.
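The common-area calculation behind the AR can be sketched as follows. This Python example is a simplification under stated assumptions: the distribution curves are approximated by normalised histograms with 20 bins over [−1, 1] (the range ΔKSD can take), and the logarithm is taken to base 10. Neither choice is confirmed by the text, and the function names are hypothetical.

```python
import math


def histogram_density(values, edges):
    """Normalised histogram (total area 1) over the given bin edges."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return [c / (len(values) * (edges[i + 1] - edges[i]))
            for i, c in enumerate(counts)]


def common_area(p, q, edges):
    """Area lying below both density curves."""
    return sum(min(pi, qi) * (edges[i + 1] - edges[i])
               for i, (pi, qi) in enumerate(zip(p, q)))


def log_area_ratio(case, same_origin, different_origin, n_bins=20):
    """log10 of (common area with same-origin) / (common area with different-origin)."""
    edges = [-1 + 2 * i / n_bins for i in range(n_bins + 1)]
    p_case = histogram_density(case, edges)
    a_same = common_area(p_case, histogram_density(same_origin, edges), edges)
    a_diff = common_area(p_case, histogram_density(different_origin, edges), edges)
    if a_same == 0 and a_diff == 0:
        return math.nan  # no overlap with either training distribution
    if a_diff == 0:
        return math.inf
    if a_same == 0:
        return -math.inf
    return math.log10(a_same / a_diff)
```

A positive return value supports H1 and a negative one supports H2; values close to zero indicate weak support either way.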

3. Results and Discussion

3.1. Reference Samples Database

For each commodity, a database has been established that is updated regularly. The database contains reference samples from Central Africa and additional samples collected worldwide. All samples are ore concentrates and have been analyzed by the methods given above.

The database for cassiterite consists of 13,538 individual cassiterite grains, each analyzed for 28 elements. Summary statistics for cassiterite are given in Table 1. The coltan database includes data of 13,238 individual coltan grains analyzed for 42 elements. Summary statistics for coltan are given in Table 2.

3.2. Performance of the Data Evaluation Approaches

For the statistical evaluation of ore concentrate data in terms of an AFP, some properties of those ore concentrates have to be taken into account [21,22]. The data of an ore concentrate are multivariate, not normally-distributed and due to the process of mining, the samples cannot be regarded as representative aliquots of a population (e.g., geological formation). Brother samples are similar to each other, but they are not representative aliquots of their geological formation. This means that non-parametric statistical tools should be preferred for data evaluation and tools for decision-making should be based on empirical data obtained from geological formations. Both data evaluation approaches applied in this study consider these thoughts.

In order to evaluate the performance of both data evaluation approaches, training and test sets of sample pairs for cassiterite and coltan are specified. The training sets are used to develop the criteria for AFP decision-making from empirical data while the test sets are used to deduce performance data. The test sets are not used to develop the AFP decision-making criteria and provide known brother and non-brother properties. The data evaluation approaches with the decision-making parameters (DC for the KS-D approach and AR for the areas ratio approach) obtained from the training sets are applied on the test set sample pairs. The results are compared to the known brother and non-brother properties of the test set sample pairs, which enables the calculation of the performance data.

The cassiterite database provides information on 268 brother sample pairs, while for coltan, 142 brother sample pairs are available. For the training set, 201 brother sample pairs are randomly selected for cassiterite while 102 brother sample pairs are selected for coltan. Accordingly, the test sets consisted of the remaining 67 and 40 brother sample pairs for cassiterite and coltan, respectively. For the areas ratio approach, the training and test sets are completed by the same numbers of randomly selected non-brother sample pairs as are used for brother sample pairs. For the KS-D approach this has to be done only for the test set.

For the performance tests, the random selection of training and test sets was repeated 10 times. This resulted in the simulation of 1340 single cases for cassiterite (670 for known brother sample pairs and 670 for known non-brother sample pairs) and 800 single cases for coltan (400 for known brother sample pairs and 400 for known non-brother sample pairs), which are used to obtain the performance data.
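The repeated random selection can be sketched as below. This is an illustrative Python example only; the function name, the fixed seed and the representation of sample pairs as list items are assumptions. It mirrors the set-up for the areas ratio approach, where non-brother pairs enter both the training and the test set.

```python
import random


def repeated_splits(brother_pairs, non_brother_pairs, n_train, repeats=10, seed=0):
    """Yield (training, test) sets of brother and non-brother pairs."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    n_total = len(brother_pairs)
    for _ in range(repeats):
        brothers = rng.sample(brother_pairs, n_total)  # shuffled copy
        # Draw as many non-brother pairs as there are brother pairs.
        non_brothers = rng.sample(non_brother_pairs, n_total)
        train = (brothers[:n_train], non_brothers[:n_train])
        test = (brothers[n_train:], non_brothers[n_train:])
        yield train, test


# Cassiterite numbers from the text: 268 brother pairs, 201 of them for training.
splits = list(repeated_splits(list(range(268)), list(range(1000, 3000)), 201))
n_cases = sum(len(test_b) + len(test_n) for _, (test_b, test_n) in splits)
print(n_cases)  # 10 repeats x (67 + 67) test pairs
```

For the KS-D approach, only the test part of the non-brother pairs would be needed.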

3.2.1. KS-D Approach

In order to evaluate the performance of the KS-D approach on cassiterite, KS-D values are calculated (see Section 2.2.1) for the brother sample pairs of the training set and the brother and non-brother sample pairs of the test set. Following Equation (1), with X as the mean and σ as the standard deviation of the obtained 201 KS-D values for the brother sample pairs of the training set, a single DC is derived. As a variation of Equation (1), the term 3σ can be replaced by 2σ, which results in a less rigorous evaluation in terms of avoiding the erroneous rejection of true brother sample pairs.

The KS-D values of the test set sample pairs are evaluated according to Equation (2):

KS-D_n = DC − KS-D    (2)

where KS-D_n indicates whether a sample pair is regarded as a brother or non-brother sample pair. KS-D_n is supposed to be positive for brother and negative for non-brother sample comparisons.

Hence, all negative KS-D_n values of brother sample pairs of the test set represent incorrectly assigned brother sample comparisons, which correspond to false negative (FN) results in the sense of hypothesis testing (see areas ratio approach). All positive KS-D_n values of non-brother sample pairs of the test set represent incorrectly assigned non-brother sample pairs, which corresponds to false positive (FP) results in the sense of hypothesis testing (see areas ratio approach). This procedure of performance evaluation was applied on the data of both commodities.
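Equation (2) and the resulting error rates can be written down compactly. The following Python sketch uses a hypothetical function name and toy numbers; it only illustrates the sign convention described above.

```python
def error_rates(dc, brother_ksd, non_brother_ksd):
    """Return (FN rate, FP rate) for median KS-D values of test set pairs."""
    ksd_n_brothers = [dc - v for v in brother_ksd]          # Equation (2)
    ksd_n_non_brothers = [dc - v for v in non_brother_ksd]
    # Brother pairs with negative KS-D_n are false negatives.
    fn = sum(v < 0 for v in ksd_n_brothers) / len(ksd_n_brothers)
    # Non-brother pairs with positive KS-D_n are false positives.
    fp = sum(v > 0 for v in ksd_n_non_brothers) / len(ksd_n_non_brothers)
    return fn, fp


# Toy values: one of four brother pairs and one of four non-brother pairs
# fall on the wrong side of DC = 0.35.
print(error_rates(0.35, [0.20, 0.25, 0.30, 0.40], [0.50, 0.45, 0.30, 0.60]))
```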

The obtained distributions of KS-D_n for brother and non-brother comparisons are given in Figure 1 and Figure 2 as density functions.

For both commodities, brother sample comparisons are overwhelmingly correctly classified. They display positive KS-D_n values with FN rates of about 1.2% (3.4% for the 2σ variation) for cassiterite and 0% for coltan (7.3% for 2σ). A correct classification is also achieved for the majority of non-brother samples, which show negative KS-D_n values with FP rates of about 24.6% (14% for 2σ) for cassiterite and 45% for coltan (29% for 2σ).

These numbers clearly demonstrate that although the KS-D approach reaches a significant separation between the two distributions, misclassifications are not equally distributed: the chance of classifying non-brothers incorrectly is higher by a factor of ten than the chance of classifying brother samples incorrectly. Within the framework of AFP, low FN rates at the expense of higher FP rates are desired to avoid correctly declared shipments being spuriously questioned.

3.2.2. Areas Ratio Approach

From the brother and non-brother sample pairs of the training sets, typical ΔKSD distributions for (i) comparisons of two samples with a common origin (brother sample pairs) and (ii) comparisons of two samples with different origins (non-brother sample pairs) are calculated. These distributions are used in an areas ratio approach to evaluate the brother and non-brother sample pairs of the test sets as single cases. For each test sample pair, a ratio of common areas (see Section 2.2.2) is obtained, which indicates whether H1 (log AR > 0) or H2 (log AR < 0) is supported.

The obtained distributions of the log AR for brother and non-brother sample pairs are illustrated in Figure 3. For brother sample pairs, only a small part of the obtained log AR is below 0, while this is the case for the vast majority of the log AR computed for non-brother pairs. This demonstrates that the applied approaches work and result in low numbers of false results.

Non-brother sample pairs that give log AR > 0 support the wrong hypothesis and are called false positive (FP) results. Brother sample pairs that give log AR < 0 support the wrong hypothesis as well and are called false negative (FN) results.

The performance test gives 5.4% FP results and 2.1% FN results for cassiterite and 10% FP results and 5.8% FN results for coltan. The higher rate of FP compared to FN results is acceptable for AFP because in an FP case, a wrongly declared origin is erroneously accepted, which means that a suspect declaring the wrong origin is just a “lucky guy”. A lower rate of FN results is desirable for AFP because in an FN case, a correctly declared sample would erroneously be identified as being not credible. Looking at the value of the AR in a real case provides the opportunity to estimate how strongly H1 or H2 is supported. If the AR is around one (or log AR around zero), the result is only weakly supported, but if the AR differs considerably from one, the result is more strongly supported. For example, in the performance test for coltan, the AR values of the obtained FN results are always larger than 0.84, which indicates that the evidence for all FN results is not very strong. This means that in a real case, in addition to the evaluation of whether the AR is above or below one, the value of the AR (how much it differs from one) always has to be taken into account, as it indicates how strongly H1 or H2 is supported.

The results of a performance test, as carried out in this study, depend on a careful sampling of brother samples on a mine site. This is especially important if cassiterite appears in pegmatites and quartz veins on the same mine site. The geochemical composition of cassiterite from quartz veins and pegmatites differs significantly due to the completely different geochemical processes causing cassiterite formation. If cassiterite-bearing pegmatites and quartz veins are mined on the same mine site, it is advantageous for AFP to separate them into different reference samples as strictly as possible. Six out of 14 FN results from this performance test originate from a single mine site where pegmatites and quartz veins are mined. It cannot be excluded that the reference samples from this mine site contain at least some cassiterite portions from both types of cassiterite host material. FN results may appear if mixed samples are compared to pure quartz vein or pure pegmatite samples.

3.2.3. Comparison of the Performance of Both Data Evaluation Approaches

Both data evaluation approaches give comparably low rates of FN results. This is highly desirable for AFP, because blaming a correctly declared sample as being not credible must be avoided. FP results, which mean that a wrongly declared origin is erroneously accepted as credible, are not desirable but do not have the negative impact on the owner of a shipment that an FN result would have. The higher rates of FP results obtained by the KS-D approach compared to the areas ratio approach may be due to two facts: (i) in the areas ratio approach, typical sample properties are extracted more rigorously from the analytical data using robust and multivariate PCA, while this is not done in the univariate KS-D approach, where all analyzed elements are considered equally, and (ii) for decision-making, the areas ratio approach studies the properties of typical brother and non-brother comparisons, while the KS-D approach only works out the properties of brother sample comparisons.

3.3. Case Studies

Two case studies are presented in more detail to illustrate the data evaluation approaches. The case studies reflect mine sites that are mined for coltan or cassiterite.

3.3.1. Case Study Coltan

The mine site of the first case study is a single, elongated pegmatite crosscut by quartz veins with coltan as the predominant commodity. Cassiterite is only present in a minor amount. Four reference samples have been taken from this mine site with sampling distances between each other varying between 10 m and 110 m. Two sampling campaigns at two different dates have been carried out by different accredited sampling teams to obtain the reference samples.

KS-D Approach

The number of four available reference samples is not enough to obtain a reasonable deposit-specific DC. Therefore, the standard DC approach is applied and only the brother sample comparisons, not related to any reference sample of the mine site of this case study, are used. This leaves 256 comparisons, which are used to calculate a DC for both sigma variations as described in Section 2.2.1.

The four available reference samples imply a number of 12 available brother comparisons for the mine site of this case study, which are checked against the previously obtained DC (DC = 0.35 for 3σ and 0.31 for 2σ). All comparisons between the four samples produce a lower median KS-D (0.20–0.27, see Table 3) and thus are accepted as being brother comparisons (FN = 0%).

To obtain the rate of false positive results, all available samples not originating from the mine site of this case study are compared to samples CS1_1 to CS1_4 by checking the respective KS-D against the above-mentioned DC of each sigma variation. The result is a misclassification of non-brothers (FP) of 65% for 3σ and 43% for 2σ. The relatively high rates of FP results can be attributed to the fact that for this mine site the calculation of a deposit-specific DC is not possible. If a shipment with a declared origin from this mine site is to be evaluated, it is recommended to collect additional reference samples so that a site-specific DC can be applied, which would reduce the portion of FP results (see cassiterite case study below).

Areas Ratio Approach

The areas ratio approach for the coltan case study mine site is applied by calculating log AR for all possible two-sample comparisons of reference samples from this mine site (samples CS1_1–CS1_4). Additionally, three samples from the three mines closest to the case study mine site (CS1_5–CS1_7) are compared to the reference samples of the mine site. The distances between the three additional samples and the case study mine site are 10 km, 27 km, and 50 km, respectively. The log AR of the two-sample comparisons are given in Table 4.

All possible two-sample comparisons of reference samples from the coltan case study mine site give positive log AR between 0.26 and 0.33 (Table 4), which strongly support H1 (both samples are from the same origin). The log AR for the comparisons of samples from the mines closest to the case study mine site (CS1_5–CS1_7) with the reference samples of the case study mine site are negative or slightly positive (range: −0.26 to 0.13). The negative values support H2 (both samples are from different origins). The small positive values only slightly support H1. This behavior is clearly different from that of two-sample comparisons of reference samples from the case study mine site and demonstrates that the application of the areas ratio approach works for this case study.

3.3.2. Case Study Cassiterite

The mine site of this case study hosts a cassiterite and coltan bearing pegmatite of elongated shape striking in a northwest-southeast direction. The lateral dimensions of the deposit are roughly 1000 m in length with a width of 100–200 m. Seven reference samples distributed all over the mine site are available. Two sampling campaigns at two different dates have been carried out by different accredited sampling teams to obtain the reference samples. The distances between the sampling points of the reference samples are between 50 m and 800 m.

The number of available reference samples is sufficient to obtain a reasonable mean and standard deviation required for the calculation of a deposit-specific DC for the KS-D approach. Therefore, for this case, it is possible to conduct data evaluation based on the application of (i) a deposit-specific DC and (ii) a standard DC calculated from all available deposits where at least two reference samples are available.

KS-D Approach with Deposit-Specific DC

The application of a deposit-specific DC is evaluated using a leave-one-out approach. One of the seven reference samples is selected and treated as a sample in question. The six remaining reference samples are tested against each other as described in Section 2.2.1, resulting in 30 two-sample comparisons. For the thus obtained 30 KS-D values, mean and standard deviation are determined and a deposit-specific DC is calculated. The sample left out in the first step is then tested against this DC by individual comparison with the six samples that were used for the calculation of the DC. This procedure is repeated for all reference samples of this mine site. Both variations for the calculation of the DC (3σ and 2σ) are applied. As there are seven samples available, the total number of brother comparisons to test the deposit-specific DC in a leave-one-out approach is 42 per sigma variation. The rates of FN results from this test are given in Table 5.
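The leave-one-out procedure can be sketched as follows. The sketch uses the rounded KS-D values listed in Table 6 and rests on assumptions that are not stated in this section: the DC is taken as mean + k·σ (consistent with the mean, sd, and DC rows of Table 5), the population standard deviation is used, and a brother comparison counts as FN when its KS-D exceeds the DC. The names `KS_D` and `leave_one_out` are hypothetical.

```python
from statistics import mean, pstdev

SAMPLES = ["CS2_1", "CS2_2", "CS2_3", "CS2_4", "CS2_5", "CS2_6", "CS2_7"]

# KS-D values transcribed from Table 6. KS_D[row][col]: the column sample is
# the sample in question; None marks the empty diagonal.
KS_D = [
    [None, 0.18, 0.19, 0.24, 0.23, 0.29, 0.29],
    [0.18, None, 0.15, 0.17, 0.22, 0.25, 0.24],
    [0.19, 0.15, None, 0.19, 0.22, 0.28, 0.27],
    [0.24, 0.17, 0.19, None, 0.26, 0.30, 0.23],
    [0.19, 0.21, 0.20, 0.24, None, 0.23, 0.25],
    [0.24, 0.23, 0.26, 0.28, 0.23, None, 0.30],
    [0.25, 0.22, 0.25, 0.22, 0.26, 0.30, None],
]


def leave_one_out(left_out, k):
    """Return (DC, FN rate) for one left-out sample, with DC = mean + k * sd."""
    rest = [i for i in range(len(SAMPLES)) if i != left_out]
    # Six remaining samples tested against each other: 6 * 5 = 30 KS-D values.
    training = [KS_D[r][c] for r in rest for c in rest if r != c]
    # Assumption: population standard deviation (estimator not stated here).
    dc = mean(training) + k * pstdev(training)
    # The left-out sample is the sample in question in six comparisons.
    questioned = [KS_D[r][left_out] for r in rest]
    # Assumption: a brother pair is rejected (FN) when its KS-D exceeds the DC.
    fn_rate = sum(value > dc for value in questioned) / len(questioned)
    return dc, fn_rate


for idx, name in enumerate(SAMPLES):
    dc3, fn3 = leave_one_out(idx, k=3)
    dc2, fn2 = leave_one_out(idx, k=2)
    print(f"{name}: DC_3s={dc3:.3f} FN_3s={fn3:.1%}  DC_2s={dc2:.3f} FN_2s={fn2:.1%}")
```

Because the input values are rounded to two decimals, the computed means and DCs can differ from the published ones in the third decimal place.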

The evaluation of the non-brothers is carried out in a similar way. This time, all seven reference samples are compared to each other and a deposit-specific DC that includes information from all reference samples is deduced (DC = 0.35 for 3σ and 0.31 for 2σ). Subsequently, every sample not originating from the mine site of this case study is compared against this DC. The number of possible non-brother comparisons is 2828 and is thus considerably larger than the number of brother comparisons. The rates of FP results are 5.3% for the 3σ variation and 3.2% for the 2σ variation.

The performance data for the 3σ variation show that all brother samples are classified correctly (FN = 0%). The 2σ variation performs better on non-brothers, with an FP rate of only 3.2%, but it does not always recognize brother comparisons: the FN rate is very high for sample CS2_6 (50%), and one comparison is misclassified for sample CS2_7 (FN = 16.7%; Table 5). Because AFP must avoid rejecting correctly declared sample origins (FN results), the DC should be calculated using 3σ.

KS-D Approach with Standard DC

This approach also follows the explanation in Section 2.2.1, but the database is modified to suit this case study. Of the initially available 268 brother sample pairs, only the 226 pairs that are not from the mine site of this case study are used to deduce a DC. Again, two variations for the calculation of the DC (3σ, 2σ) are applied.

The 42 possible brother sample comparisons for this mine site are checked against the respective DC (0.49 for the 3σ variation and 0.42 for the 2σ variation). No brother sample pairs are incorrectly assigned (FN = 0%) for either σ variation (Table 6).

For the non-brothers, all available samples not originating from this case study’s mine site are compared with the reference samples of this mine site using the KS-D approach, resulting in 2828 possible non-brother sample comparisons. The rate of incorrectly recognized non-brother samples (FP results) is 21.3% for the 3σ variation and about 11.1% for the 2σ variation.

Comparably low rates of FN results (at least for the 3σ variation of the DC calculation) combined with lower rates of FP results (5.3% versus 21.3% for 3σ) make the deposit-specific DC approach more advantageous than the standard DC approach. If enough (e.g., five or more) reference samples from one mine site are available, the deposit-specific approach should be applied.
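To make the comparison more tangible, the reported FP rates can be converted into approximate numbers of comparisons. The sketch below uses only figures stated in the text; the counts are approximate because the published percentages are rounded, and the variable names are illustrative.

```python
N_REFERENCE = 7        # reference samples CS2_1..CS2_7
N_NON_BROTHER = 2828   # non-brother comparisons stated in the text

# Ordered pairs among the reference samples (each sample once as the sample
# in question): 7 * 6 = 42 with all samples, 6 * 5 = 30 with one left out.
brother_pairs = N_REFERENCE * (N_REFERENCE - 1)
leave_one_out_pairs = (N_REFERENCE - 1) * (N_REFERENCE - 2)

# Reported FP rates for the cassiterite case study.
fp_rates = {
    ("deposit-specific DC", "3 sigma"): 0.053,
    ("deposit-specific DC", "2 sigma"): 0.032,
    ("standard DC", "3 sigma"): 0.213,
    ("standard DC", "2 sigma"): 0.111,
}

# Approximate number of non-brother comparisons accepted as brothers.
fp_counts = {key: round(rate * N_NON_BROTHER) for key, rate in fp_rates.items()}

print("brother pairs:", brother_pairs, "| leave-one-out pairs:", leave_one_out_pairs)
for (approach, sigma), count in fp_counts.items():
    print(f"{approach:20s} {sigma}: about {count} of {N_NON_BROTHER} comparisons")
```

Under the 3σ variation, the deposit-specific DC wrongly accepts roughly a quarter as many non-brother comparisons as the standard DC.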

Areas Ratio Approach

The areas ratio approach for the case study mine site is applied by calculating log AR for all possible two-sample comparisons of reference samples from this mine site (samples CS2_1–CS2_7). Additionally, three samples from mines nearby (CS2_8–CS2_10) are compared to the reference samples of the mine site. Sample CS2_8 originates from a site where a quartz vein is mined for cassiterite, about 650 m away from the center of the case study mine site. Samples CS2_9 and CS2_10 originate from pegmatite mine sites 3 and 12 km away from the case study mine site, respectively. The log AR results are given in Table 7.

All two-sample comparisons of reference samples result in log AR values between 0.31 and 0.65 and as such clearly support H1 (both samples come from the same origin). All two-sample comparisons of a sample from one of the other three mine sites with a sample from the case study mine site result in log AR < 0 and support H2 (both samples are from different origins). Sample CS2_8 gives the most negative log AR values (−0.93 to −0.80). This indicates a large difference between the geochemistry of this sample and that of the samples from the case study mine site, although this sample was taken quite close to the case study mine site. The reason is that sample CS2_8 originates from a quartz vein mine, whereas the samples from the case study mine site originate from a pegmatite. Cassiterite forms under quite different geochemical conditions in quartz veins and in pegmatites, which results in different cassiterite compositions with respect to the incorporated trace elements. For cassiterite, this has to be taken into account when reference samples are taken and used for AFP.
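The observation that geographic proximity does not predict geochemical similarity can be checked directly against Table 7. The following sketch is illustrative: it uses the log AR columns of the three neighbouring samples and the distances given in the text, and the dictionary layout is an assumption made for this example.

```python
from statistics import mean

# Log AR of each neighbouring sample against reference samples CS2_1..CS2_7
# (columns CS2_8..CS2_10 of Table 7) and the distance to the case study mine
# site as stated in the text.
neighbours = {
    "CS2_8": {"distance_km": 0.65, "deposit": "quartz vein",
              "log_ar": [-0.80, -0.93, -0.89, -0.90, -0.88, -0.87, -0.81]},
    "CS2_9": {"distance_km": 3.0, "deposit": "pegmatite",
              "log_ar": [-0.08, -0.07, -0.08, -0.11, -0.04, -0.07, -0.17]},
    "CS2_10": {"distance_km": 12.0, "deposit": "pegmatite",
               "log_ar": [-0.37, -0.49, -0.48, -0.45, -0.39, -0.52, -0.43]},
}

for name, info in neighbours.items():
    values = info["log_ar"]
    print(f"{name} ({info['deposit']}, {info['distance_km']} km): "
          f"range {min(values)} to {max(values)}, mean {mean(values):.2f}")

# Sample with the most negative mean log AR versus the closest sample.
most_negative = min(neighbours, key=lambda n: mean(neighbours[n]["log_ar"]))
closest = min(neighbours, key=lambda n: neighbours[n]["distance_km"])
print("most negative mean log AR:", most_negative, "| closest site:", closest)
```

In this data set the closest site is also the one with the most negative log AR, which matches the explanation that the deposit type, not the distance, drives the difference.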

The above presented data show what the results of the areas ratio approach look like and how they can be used to check whether the information on the provenance of a cassiterite shipment is credible or not.

4. Conclusions

Both data evaluation approaches can be used for both commodities. Criteria for AFP decision-making can be derived from empirical data for both data evaluation approaches.

Performance tests for both data evaluation approaches resulted in very low rates of false negative results (0–7.3% for KS-D and 2.1–5.8% for the areas ratio approach), which is desired for AFP. The areas ratio approach based on hypothesis testing and a more sophisticated evaluation of the multivariate data structure has some advantages in terms of producing lower rates of false positive results (5.4–10%) compared to the KS-D approach (25–45%).

The two presented case studies reveal that for the KS-D approach the application of a deposit-specific decision criterion (DC) should be preferred to reduce the rate of false positive results. This means that at least four or five reference samples should be collected from a single mine site to enable the calculation of a deposit-specific decision criterion.

Author Contributions

Conceptualization, methodology and formal analysis, H.-E.G., W.S. and T.G.; writing (original draft preparation), H.-E.G. and W.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

The authors are grateful to Alexis Kagaba, Thomas Munyambonera, Justin Uwema, Frank Melcher, Torsten Graupner, Philip Schütte, Rudolf Mauer, and Maren Liedtke for sampling the reference samples. Peter Rendschmidt, Donald Henry, and Thomas Munyambonera are thanked for the careful preparation of polished sections. Maria Sitnikova is thanked for providing numerous MLA maps of coltan and cassiterite concentrates. Two anonymous reviewers are thanked for their valuable comments on an earlier version of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Distributions of brother and non-brother comparison test sets of KS-D_n of cassiterite and coltan for the respective decision criterion (DC) calculated with 3σ.

Figure 2. Distributions of brother and non-brother comparison test sets of KS-D_n of cassiterite and coltan for the respective DC calculated with 2σ.

Figure 3. Distributions of log area ratios (log AR) for brother and non-brother sample pairs.

Table 1. Summary statistics of element concentrations in cassiterite grains.

Element	5th Percentile (mg·kg⁻¹)	50th Percentile (mg·kg⁻¹)	95th Percentile (mg·kg⁻¹)
Al	5.7	89.1	2440.8
As	<0.22	0.22	8.83
Ba	<0.18	<0.18	3.26
Bi	<0.03	<0.03	0.96
Ca	<10	<10	44
Cr	<1.2	<1.2	8.2
Fe	98	1115	8126
Ga	0.08	1.39	22.35
Ge	<0.15	0.15	1.42
Hf	0.07	36.40	430.51
In	<1.9	<1.9	25.9
Mg	<0.1	<0.1	25.7
Mn	<1	14	1441
Nb	2	1168	16,386
Pb	0.02	0.47	9.25
Sb	0.20	0.62	18.94
Sc	<0.12	4.00	85.98
Si	<121	436	2419
Sr	<0.03	0.03	1.73
Ta	0.04	1738	41,480
Th	<0.01	0.01	1.23
Ti	26	1071	5699
U	0.13	1.50	19.07
V	<0.24	3.67	85.77
W	0.8	11.2	1357.6
Y	<0.03	0.03	2.15
Zn	<0.9	0.9	57.3
Zr	2	389	1779

Table 2. Summary statistics of element concentrations in coltan grains. Please note that the concentrations of the main element oxides are given in (%).

Element	5th Percentile (mg·kg⁻¹)	50th Percentile (mg·kg⁻¹)	95th Percentile (mg·kg⁻¹)
Ta₂O₅ (%)	8.9	36.2	70.1
Nb₂O₅ (%)	12.6	44.2	68.3
MnO (%)	1.9	8.8	17.6
FeO (%)	0.56	9.35	17.18
SnO₂ (%)	0.01	0.13	0.76
TiO₂ (%)	0.02	0.28	1.57
WO₃ (%)	0.01	0.12	0.93
Al	9	57	1360
As	<0.4	0.4	14.2
Ba	<0.3	0.3	52.2
Be	<0.9	<0.9	1.8
Bi	<0.11	0.11	30.86
Ca	<63	<63	460
Ce	<0.03	0.13	39.86
Dy	<0.08	0.78	94.21
Er	<0.06	0.39	39.56
Eu	<0.05	<0.05	1.01
Gd	<0.15	0.23	35.78
Hf	18	95	624
Ho	<0.04	0.12	14.13
La	<0.04	0.03	8.99
Li	<7	7	102
Lu	<0.03	0.13	12.31
Mg	<4	35	1273
Mo	<1.3	1.9	7.6
Nd	<0.08	0.08	17.60
Pb	2.7	34.2	656.5
Pr	<0.01	0.01	3.13
Rb	<0.4	<0.4	8.3
Sb	<0.04	<0.04	3.98
Sc	0.5	5.9	306.0
Si	<281	281	1842
Sm	<0.10	0.10	16.23
Sr	<0.2	0.2	29.8
Tb	<0.05	0.11	15.29
Th	0.18	3.00	73.22
Tl	<0.03	0.03	7.99
Tm	<0.04	0.08	8.95
U	19	204	1743
Y	0.05	3.94	491.92
Yb	<0.10	0.83	80.75
Zr	126	737	4028

Table 3. Kolmogorov–Smirnov distance (KS-D) values of sample comparisons from case study 1; the sample in question is given in the header row.

Sample	CS1_1	CS1_2	CS1_3	CS1_4
CS1_1		0.20	0.20	0.27
CS1_2	0.20		0.18	0.25
CS1_3	0.20	0.21		0.23
CS1_4	0.23	0.24	0.23

Table 4. Log AR for two-sample comparisons of reference samples (CS1_1–CS1_4) from the coltan case study mine site and two-sample comparisons of samples from other mine sites nearby (CS1_5–CS1_7) and samples from the case study mine site (for details, see text).

Sample	CS1_1	CS1_2	CS1_3	CS1_4	CS1_5	CS1_6	CS1_7
CS1_1		0.27	0.33	0.26	−0.26	−0.06	−0.24
CS1_2	0.30		0.33	0.28	−0.08	0.13	−0.19
CS1_3	0.32	0.32		0.33	−0.03	0.09	−0.13
CS1_4	0.26	0.29	0.30		0.05	0.07	−0.13

Table 5. Performance data of the leave-one-out approach for reference samples of case study 2.

Sample Left Out	CS2_1	CS2_2	CS2_3	CS2_4	CS2_5	CS2_6	CS2_7
mean	0.232	0.243	0.238	0.232	0.232	0.217	0.220
sd	0.041	0.035	0.037	0.040	0.045	0.035	0.039
DC_3sigma	0.355	0.349	0.347	0.353	0.368	0.322	0.337
DC_2sigma	0.314	0.313	0.311	0.313	0.322	0.287	0.298
FN_3sigma	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%
FN_2sigma	0.0%	0.0%	0.0%	0.0%	0.0%	50.0%	16.7%

Table 6. KS-D values of sample comparisons of case study 2 using the standard KS-D approach; the sample in question is given in the header row.

Sample	CS2_1	CS2_2	CS2_3	CS2_4	CS2_5	CS2_6	CS2_7
CS2_1		0.18	0.19	0.24	0.23	0.29	0.29
CS2_2	0.18		0.15	0.17	0.22	0.25	0.24
CS2_3	0.19	0.15		0.19	0.22	0.28	0.27
CS2_4	0.24	0.17	0.19		0.26	0.30	0.23
CS2_5	0.19	0.21	0.20	0.24		0.23	0.25
CS2_6	0.24	0.23	0.26	0.28	0.23		0.30
CS2_7	0.25	0.22	0.25	0.22	0.26	0.30

Table 7. Log AR for two-sample comparisons of reference samples from the case study mine site (CS2_1–CS2_7) and two-sample comparisons of samples from other mine sites (CS2_8–CS2_10) and samples from the case study mine site (for details, see text).

Sample	CS2_1	CS2_2	CS2_3	CS2_4	CS2_5	CS2_6	CS2_7	CS2_8	CS2_9	CS2_10
CS2_1		0.63	0.58	0.56	0.49	0.38	0.52	−0.80	−0.08	−0.37
CS2_2	0.65		0.60	0.49	0.48	0.41	0.48	−0.93	−0.07	−0.49
CS2_3	0.56	0.52		0.51	0.42	0.31	0.38	−0.89	−0.08	−0.48
CS2_4	0.62	0.62	0.62		0.43	0.43	0.50	−0.90	−0.11	−0.45
CS2_5	0.54	0.45	0.41	0.41		0.44	0.49	−0.88	−0.04	−0.39
CS2_6	0.42	0.44	0.47	0.34	0.56		0.50	−0.87	−0.07	−0.52
CS2_7	0.57	0.54	0.45	0.50	0.59	0.46		−0.81	−0.17	−0.43

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
