#### *3.1. Sampling Procedure*

The sampling strategies, data storage, screening, and validation of the ItacGMBP were fully supported by a computer-based framework integrated with a geographic information system (GIS) environment. First, a sampling protocol was drawn up to guide the field teams, who recorded all field data on tablets in a geochemical database. Further information can be accessed elsewhere [43].

The soil survey was designed on a nominal scale of 1:250,000, with a sampling density of 1 sample/25 km², using grids of 5 km × 5 km distributed homogeneously across the whole IRW (Figure 1b). The surface soil samples were collected at a constant depth of 10 cm. At each site, the collected surface soil was bulked to provide one composite sample (approximately 5 kg).

The stream sediment survey was designed on a nominal scale of 1:1,000,000, with a sampling density of 1 sample/100 km², using 2nd–3rd order drainage microcatchments along the entire IRW (Figure 1b). The stream sediment samples (approximately 3 kg) were collected in active sedimentation areas at the catchment outlet, preferably in the middle of the channel to minimize contamination from the margins.

For quality control, a duplicate sample was collected for every 20 routine samples. Samples were collected as close as possible to the locations defined in the planning field map; most unsampled sites are located in remote areas with severe access limitations. The detailed sampling procedure and description of the different sampling media used in the ItacGMBP can be found elsewhere [24,43–46].

In the PB, 364 surface soil samples and 189 stream sediment samples were collected. The sampling sites are plotted over the land use and land cover (Figure 1c) and geological layers (Figure 1d) for further inspection.

#### *3.2. Sample Preparation and Chemical Analyses*

The ItacGMBP's analytical program was extensive, but only the analyses relevant to the present study are described; for more information, refer to previous studies [24,43–46]. Both soil and stream sediment samples were submitted to the same preparation and analytical protocols. The samples were oven-dried at 70 °C, disaggregated, homogenized, and sieved through a <0.177 mm (80 mesh) sieve. Approximately 50 g of each sample was ground, sieved through a <75 μm (200 mesh) sieve, and submitted to aqua regia (1:3 M HNO3:HCl) digestion. The aqua regia soluble concentrations of 51 elements (the most relevant for the present study being Fe, Al, Ag, B, As, Ba, Bi, Cd, Co, Cr, Cu, Hg, Mn, Mo, Ni, Pb, Sb, Se, Sn, U, V, and Zn) were determined via Inductively Coupled Plasma—Atomic Emission Spectrometry (ICP-AES) and Inductively Coupled Plasma—Mass Spectrometry (ICP-MS). All samples were prepared and analyzed in the laboratories of ALS Brasil Ltda (Belo Horizonte, Brazil), a certified/accredited laboratory.

#### *3.3. Data Processing and Statistical Analysis*

First, the censored data (values below the lower limit of detection—LLD), commonly observed in geochemical data sets, were replaced by half the LLD (LLD/2) indicated for each element, according to the analytical method. This procedure is widely used in the literature [22,23,45,46]. It is important to highlight that two separate data sets were used in this study, one for the NPB (soils = 223 samples; stream sediments = 122 samples) and another for the SPB (soils = 141 samples; stream sediments = 67 samples) (cf. Figure 1). Standard statistical procedures [22,23,47] were undertaken in this research, which involved calculating several descriptive statistics parameters (amount of data below the LLD, arithmetic mean, standard deviation—SD, minimum—Min, median—Med, and maximum—Max), constructing graphs based on exploratory data analysis (e.g., box plots, scatter plots, and quantile-quantile plots), and conducting hypothesis tests (e.g., the Mann–Whitney–Wilcoxon test). Data analysis was performed using the R programming language in RStudio.
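The censored-data replacement and descriptive statistics described above can be sketched as follows. The study itself used R; this is a minimal Python illustration, and the element, LLD value, and concentrations shown are hypothetical, not the study's data.

```python
# Sketch of LLD/2 imputation and the descriptive parameters used in the study.
import statistics

def impute_censored(values, lld):
    """Replace censored readings (None, or values below the LLD) with LLD/2."""
    return [v if v is not None and v >= lld else lld / 2 for v in values]

def describe(values):
    """Descriptive parameters reported in the study: mean, SD, Min, Med, Max."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "med": statistics.median(values),
        "max": max(values),
    }

# Hypothetical Cu concentrations (mg/kg); None marks a value below the LLD.
cu_raw = [12.0, None, 34.5, 8.2, None, 55.1, 19.7]
cu = impute_censored(cu_raw, lld=1.0)  # censored values become 0.5 mg/kg
summary = describe(cu)
```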

#### *3.4. The Determination of Geochemical Background Values*

A variety of statistical methods was used to calculate the geochemical threshold values, a practice highly recommended by experts in the field. These methods are described elsewhere [19,20,22,23,48], and the general steps are highlighted herein.

Median + 2 ∗ Median Absolute Deviation (MMAD): The MMAD is considered one of the most well-regarded methods for deriving geochemical threshold values [19]. A requirement for using the MMAD method is that the raw data should first be transformed to a common logarithm (log10) scale to approach a normal distribution, which is rarely seen in raw geochemical data sets [23]. In this approach, the median absolute deviation (MAD; Equation (1)) leads to a robust estimation of the variability of univariate geochemical data [23].

$$MAD(y) = 1.48 \ast \mathrm{median}\_i \mid y\_i - \mathrm{median}\_j(y\_j) \mid \tag{1}$$

Afterward, the *MAD* value should be multiplied by two and added to the median. At this stage, the result should be back-transformed from the logarithmic scale to the original scale to derive the *MMAD* threshold value (Equation (2)):

$$MMAD(y) = 10^{\left(\mathrm{median}(y) + 2 \ast MAD(y)\right)} \tag{2}$$
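Equations (1) and (2) can be sketched as below. The study used R (where `mad()` applies the equivalent constant 1.4826); this Python version uses 1.48 as in Equation (1), assumes strictly positive concentrations, and the data values are illustrative only.

```python
# Minimal sketch of the MMAD threshold (Equations (1) and (2)).
import math
import statistics

def mmad_threshold(values):
    """Median + 2*MAD on log10-transformed data, back-transformed (Eq. (2))."""
    logs = [math.log10(v) for v in values]  # log10 transform first
    med = statistics.median(logs)
    # Equation (1): MAD = 1.48 * median_i |y_i - median_j(y_j)|
    mad = 1.48 * statistics.median(abs(y - med) for y in logs)
    return 10 ** (med + 2 * mad)            # back-transform to the original scale

# Hypothetical Zn concentrations (mg/kg)
zn = [20.0, 25.0, 30.0, 35.0, 40.0, 120.0]
threshold = mmad_threshold(zn)
```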

Tukey's Inner Fence (TIF): The TIF method depends solely on the data distribution. In addition, threshold values can be calculated even if no outliers are present in the data set (in which case the threshold exceeds the maximum value). Moreover, the threshold value obtained from this approach is most informative if the true number of outliers is below 10% of the data [19,23]. As before, the log10 transformation is required prior to using this approach [19]. The TIF method consists of determining the upper (Q3 = 75th percentile) and lower (Q1 = 25th percentile) quartiles of the boxplot. The inner fence is then defined as Q3 extended by 1.5 times the interquartile range (IQR = Q3 − Q1). The threshold value based on the TIF method is calculated following Equation (3):

$$TIF(y) = 10^{\left(Q3(y) + 1.5 \ast IQR(y)\right)} \tag{3}$$
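A sketch of Equation (3), again in Python rather than the study's R. Note that quartiles are computed here with the "exclusive" convention of the Python standard library, which may differ slightly from other software's quartile estimators.

```python
# Hedged sketch of the TIF threshold (Equation (3)) on log10-transformed data.
import math
import statistics

def tif_threshold(values):
    """Tukey's inner fence: 10^(Q3 + 1.5*IQR) on log10-transformed data."""
    logs = sorted(math.log10(v) for v in values)
    q1, _, q3 = statistics.quantiles(logs, n=4)  # 25th, 50th, 75th percentiles
    iqr = q3 - q1                                # interquartile range
    return 10 ** (q3 + 1.5 * iqr)
```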

Percentile-based approaches: These are considered the most straightforward methods for deriving threshold values [48,49] and are widely used by environmental agencies in different countries (e.g., Brazil [50], Australia [51], and Finland [52]). The 98th percentile (P98) has become widespread because a 2% outlier rate (1 in 50) is deemed acceptable, and it distinguishes the method from the 97.5th percentile (2.5%, a 1-in-40 rate), which delivers results similar to the outdated "Mean + 2 ∗ Standard Deviation" method when the element in question follows a normal distribution [19]. The 95th percentile (P95) corresponds to a more restrictive value, as it considers 5% of all samples as outliers. Other percentile values are also used (e.g., 90th, 85th, and 75th); however, they are not considered as threshold values in the present study.
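The P95 and P98 thresholds can be illustrated as follows. The interpolation convention here (linear between order statistics) is an assumption, and the data are hypothetical; different software packages use slightly different percentile estimators.

```python
# Sketch of percentile-based thresholds (P95, P98) described above.
def percentile(values, p):
    """Linearly interpolated p-th percentile (0 < p <= 100)."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100.0   # fractional rank
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

data = [float(i) for i in range(1, 101)]  # hypothetical 100 concentrations
p95 = percentile(data, 95)  # 5% of samples treated as outliers
p98 = percentile(data, 98)  # 2% of samples treated as outliers
```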

#### *3.5. Geoprocessing and Spatial Representation of Soil and Stream Sediment Geochemical Data*

Spatial analyses were processed using the software ArcGIS 10.4.1, under the World Geodetic System 1984 (WGS84) datum. The construction of integrated stream sediment and soil geochemical maps was based on two techniques: (i) catchment-based representation (polygons) for stream sediment data [53–55], which simply involves attributing the element concentrations to the catchment of the sample collected at its outlet; (ii) geochemical dot representation (points) for surface soil data, attributing the uni-element concentration to the sampling site [55,56]. The class intervals were defined according to the quantile method, usually applied to the right-skewed distributions commonly seen in geochemical data [57]. For the correct construction of integrated geochemical maps, the same class intervals were used to represent the concentration (usually in mg kg−1, except for Fe and Al in wt.%) of a given element in both soil and stream sediment. Finally, spatial analyses using geoprocessing techniques (e.g., buffer, clip, intersect, and merge) were applied as needed for further interpretation.
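The quantile class intervals shared between the two sampling media can be sketched as below. This is an illustration only: the study derived its class breaks in ArcGIS, the number of classes is an assumption here, and the concentrations are hypothetical.

```python
# Sketch of quantile-based class intervals for an integrated geochemical map.
import statistics

def quantile_breaks(values, n_classes=5):
    """Break values so each class holds roughly the same number of samples."""
    return statistics.quantiles(values, n=n_classes)

# Hypothetical Cu concentrations (mg/kg) pooled from soil and stream sediment,
# so that both media share one legend, as described above.
cu = [5.0, 8.0, 9.0, 12.0, 15.0, 18.0, 22.0, 30.0, 45.0, 80.0]
breaks = quantile_breaks(cu, n_classes=5)  # n_classes - 1 internal breaks
```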
