# create the biplot:
plot(pc.comp)
```
where the read.compositional function reads the .csv file into an object of class compositional, thus ensuring that logratio statistics are used in all provenance functions (such as PCA) that accept compositional data as input. Also note that the provenance package *overloads* the plot function to generate a compositional biplot when applied to the output of the PCA function.

**Figure 5.** (**i**) the compositional dataset of Equation (15) shown on a ternary diagram; (**ii**) subjecting the same dataset to an additive logratio transformation (alr) produces a configuration of points that is identical to Figure 4i; (**iii**) as a consequence, the PCA biplot of the logratio transformed data looks identical to Figure 4iii; (**iv**) using a centred logratio transformation (clr) yields the same configuration as panel iii but with more easily interpretable vector loadings.

#### **9. Correspondence Analysis**

*Summary: Point-counting data can be analysed by MDS using the chi-square distance. Correspondence Analysis (CA) yields identical results whilst producing biplots akin to those obtained by PCA. This tutorial first uses a simple three-sample, three-variable toy example that is (almost) identical to those used in Sections 6–8, before applying CA to a real dataset of heavy mineral counts from Namibia.*

Consider the following three sets of trivariate point-counting data:

$$X = \begin{array}{c|ccc} & a & b & c \\ \hline 1 & 0 & 100 & 0 \\ 2 & 38 & 13 & 1 \\ 3 & 108 & 38 & 0 \end{array} \tag{17}$$

This dataset intentionally looks similar on a ternary diagram to the compositional dataset of Section 3. The only difference is the presence of zeros, which precludes the use of logratio statistics because the logarithm of zero is undefined. This problem can be solved by replacing the zero values with small numbers, but this only works when there are few zeros [26,27]. Correspondence Analysis (CA) is an alternative approach that does not require such 'imputation'.

CA is a dimension reduction technique that is similar in many ways to PCA [25,49]. CA, like PCA, is a special case of MDS. Whereas ordinary PCA uses the Euclidean distance, and compositional data can be compared using the Aitchison distance, point-counting data can be compared by means of a chi-square distance:

$$d_{ij} = \sqrt{\sum_{k=1}^{K} \frac{X_{\cdot\cdot}}{X_{\cdot k}} \left(\frac{X_{ik}}{X_{i\cdot}} - \frac{X_{jk}}{X_{j\cdot}}\right)^2} \tag{18}$$

where $X_{\cdot k} = \sum_{i=1}^{m} X_{ik}$, $X_{i\cdot} = \sum_{k=1}^{K} X_{ik}$ and $X_{\cdot\cdot} = \sum_{i=1}^{m} \sum_{k=1}^{K} X_{ik}$. Applying this formula to the data of Equation (17) produces the following dissimilarity matrix:

$$
\begin{array}{c|ccc}
 & 1 & 2 & 3 \\ \hline
1 & 0 & 1.5 & 1.5 \\
2 & 1.5 & 0 & 0.33 \\
3 & 1.5 & 0.33 & 0
\end{array}
\tag{19}
$$
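The dissimilarity matrix of Equation (19) can be checked with a few lines of base R. The following is a minimal sketch that writes out Equation (18) literally. The function name `chisq.dist` is a hypothetical helper introduced here for illustration and is not part of the provenance package.

```r
# Point counts of Equation (17): samples 1-3 in rows, components a-c in columns
X <- matrix(c(  0, 100, 0,
               38,  13, 1,
              108,  38, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("1", "2", "3"), c("a", "b", "c")))

# Chi-square distance of Equation (18), written out term by term
# (chisq.dist is an illustrative helper, not a provenance function)
chisq.dist <- function(X) {
  total    <- sum(X)           # X.. : grand total
  colsum   <- colSums(X)       # X.k : column totals
  profiles <- X / rowSums(X)   # X_ik / X_i. : row profiles
  m <- nrow(X)
  d <- matrix(0, m, m, dimnames = list(rownames(X), rownames(X)))
  for (i in 1:m) {
    for (j in 1:m) {
      d[i, j] <- sqrt(sum((total / colsum) *
                          (profiles[i, ] - profiles[j, ])^2))
    }
  }
  d
}

d <- chisq.dist(X)
print(round(d, 2))
```

The off-diagonal values come out at approximately 1.52, 1.48 and 0.33, which agrees with Equation (19) after rounding. Nearly all of the distance between samples 2 and 3 comes from the single count of component c, because the weight $X_{\cdot\cdot}/X_{\cdot k}$ is largest for the rarest component.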

Note that, although these values differ from those in Equation (11), the ratios between them are (approximately) the same. Specifically, $d_{1,2}/d_{1,3}$ = 1.5/1.5 = 1 for Equation (19) and $d_{1,2}/d_{1,3}$ = 6.4/6.4 = 1 for Equation (11); likewise, $d_{1,2}/d_{2,3}$ = 1.5/0.33 ≈ 4.5 for Equation (19) and $d_{1,2}/d_{2,3}$ = 6.4/1.4 ≈ 4.6 for Equation (11). Therefore, when we subject our point-counting data to an MDS analysis using the chi-square distance, the resulting configuration appears nearly identical to the example of Section 7.
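The equivalence between chi-square MDS and CA can be illustrated on the toy data using only base R. The sketch below assumes the standard textbook formulation of CA as a singular value decomposition of the standardised residuals. It does not use the provenance package, and all variable names (`conf.mds`, `conf.ca`, and so on) are illustrative.

```r
# Point counts of Equation (17): samples 1-3 in rows, components a-c in columns
X <- matrix(c(  0, 100, 0,
               38,  13, 1,
              108,  38, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("1", "2", "3"), c("a", "b", "c")))

P  <- X / sum(X)     # correspondence matrix
r  <- rowSums(P)     # row masses,    X_i. / X..
cm <- colSums(P)     # column masses, X.k / X..

# Chi-square distances (Equation 18): Euclidean distances between the
# row profiles after dividing column k by sqrt(X.k / X..)
profiles <- P / r
d.chisq  <- dist(sweep(profiles, 2, sqrt(cm), "/"))

# Route 1: classical MDS of the chi-square distances
conf.mds <- cmdscale(d.chisq, k = 2)

# Route 2: CA as an SVD of the standardised residuals
S   <- (P - outer(r, cm)) / sqrt(outer(r, cm))
dec <- svd(S)
conf.ca <- (dec$u / sqrt(r)) %*% diag(dec$d)   # row principal coordinates

# Both configurations should reproduce the same inter-sample distances
print(round(as.matrix(d.chisq), 2))
print(all.equal(as.vector(dist(conf.mds)), as.vector(d.chisq)))
print(all.equal(as.vector(dist(conf.ca)),  as.vector(d.chisq)))
```

The two sets of coordinates need not match element by element: classical MDS centres the configuration on the unweighted mean of the samples, whereas CA centres it on the mass-weighted mean, so the configurations may differ by a translation, rotation or reflection. The inter-sample distances, which are what the map displays, should be the same.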

The following script applies CA to the heavy mineral composition of Namib desert sand. It loads a table called HM.csv that contains point counts for 16 samples and 15 minerals. To reduce the dominance of the least abundant components, the code extracts the most abundant minerals (epidote, garnet, amphibole and clinopyroxene) from the dataset and amalgamates the ultra-stable minerals (zircon, tourmaline and rutile), which have similar petrological significance.

```
library(provenance)