MDS configuration using the KS distance:
DZ.X <- MDS(DZ)
plot(DZ.X)
```
In this case, the overloaded plot function produces not one but two graphics windows. The first of these shows the MDS configuration, whereas the second shows the *Shepard plot* [40,41]. This is a scatterplot that sets out the Euclidean distances between the samples measured on the MDS configuration against the *disparities*, which are defined as:

$$
\delta[i, j] = f(KS[i, j])\tag{20}
$$

where *KS*[*i*, *j*] is the KS distance between the *i*th and *j*th sample and *f* is a monotonic transformation, which is shown as a step function. The Shepard plot allows the user to visually assess the goodness-of-fit of the MDS configuration. This can be further quantified using the 'stress' parameter, in which *d*[*i*, *j*] is the Euclidean distance between the *i*th and *j*th sample on the MDS configuration:

$$
S = \sum_{i} \sum_{j} (d[i, j] - \delta[i, j])^2 \Big/ \sum_{i} \sum_{j} (d[i, j])^2 \tag{21}
$$

The lower the stress, the better the fit. For moderately sized datasets, stress values should be less than 10% [40]. For larger datasets, a higher-dimensional solution may be necessary, which can be requested with the optional parameter k of provenance's MDS function [50].
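The relationship between Equations (20) and (21) can be illustrated with a minimal base R sketch. It rests on several assumptions that are not part of the tutorial: the five samples are synthetic, the helper ks_dist is a hypothetical name, classical MDS (cmdscale) stands in for the configuration, and the monotonic transformation *f* is fitted by isotonic regression (isoreg). It is not meant to reproduce the internals of provenance's MDS function.

```
set.seed(1)

# Synthetic "age" samples (hypothetical data, not the Namibian dataset)
samples <- list(
  A = rnorm(60, mean = 500, sd = 50),
  B = rnorm(60, mean = 520, sd = 50),
  C = rnorm(60, mean = 1000, sd = 100),
  D = rnorm(60, mean = 1050, sd = 100),
  E = c(rnorm(30, mean = 500, sd = 50), rnorm(30, mean = 1000, sd = 100))
)

# KS distance: largest vertical gap between two empirical CDFs
# (ks_dist is an illustrative helper, not a provenance function)
ks_dist <- function(x, y) {
  grid <- sort(c(x, y))
  max(abs(ecdf(x)(grid) - ecdf(y)(grid)))
}

# Pairwise KS distance matrix KS[i, j]
n <- length(samples)
KS <- matrix(0, n, n, dimnames = list(names(samples), names(samples)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    KS[i, j] <- ks_dist(samples[[i]], samples[[j]])
  }
}

# Two-dimensional configuration (assumption: classical MDS as a stand-in)
conf <- cmdscale(as.dist(KS), k = 2)

# Pairwise values in matching order (lower triangle, column by column)
ks <- as.vector(as.dist(KS))  # KS[i, j]
d <- as.vector(dist(conf))    # d[i, j], Euclidean distances on the configuration

# Disparities delta[i, j] = f(KS[i, j]), Equation (20):
# f is a non-decreasing step function fitted by isotonic regression
o <- order(ks, d)
delta <- numeric(length(d))
delta[o] <- isoreg(d[o])$yf

# Stress as written in Equation (21)
S <- sum((d - delta)^2) / sum(d^2)
print(S)
```

Plotting d against ks and overlaying delta as a step function, for instance with plot(ks, d) followed by lines(ks[o], delta[o], type = "s"), gives a Shepard-style diagram for this toy configuration.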

#### **11. 'Big' Data**

*Summary: The tutorial jointly analyses 16 Namibian samples using five different provenance proxies, including all three data classes introduced in Sections 3–5. It introduces Procrustes Analysis and 3-way MDS as two alternative ways to extract geologically meaningful information from these multivariate 'big' datasets.*

It is increasingly common for provenance studies to combine compositional, point-counting and distributional datasets [4,13]. Linking bulk sediment data, heavy mineral data and single mineral data requires not only a sensible statistical approach, but also a full appraisal of the impact of mineral fertility and heavy mineral concentration in eroded bedrock and derived clastic sediment [51–53]. Assuming that such an appraisal has been made, this Section introduces some exploratory data analysis tools that can reveal meaningful structure in complex datasets.

The five provenance proxies of the Namibian case study are stored in the following files:

- (a) Major element concentrations (Major.csv, compositional data)
- (b) Trace element concentrations (Trace.csv, compositional data)
- (c) Bulk petrography (PT.csv, point-counting data)
- (d) Heavy mineral compositions (HM.csv, point-counting data)
- (e) Detrital zircon U-Pb data (DZ.csv, distributional data)

All these datasets can be visualised together in a single summary plot:

*Minerals* **2019**, *9*, 193

```
library(provenance)