**2. Materials and Methods**

#### *2.1. Data*

The 2017 SAFCDB [20] (available at http://safoods.mrc.ac.za/products.html, accessed on 9 September 2021) contained nutrient information on 1667 food items and 169 food components. It included both uncooked and cooked food items as well as composite dishes, and fortified food items were identified as such. Table 1 gives the number of food items in each food group. For ease of reference, we use the term 'nutrients' to cover the macronutrients, minerals and vitamins included in the analysis. All nutrient values were expressed per 100 g. We selected the most commonly reported nutrients with the fewest missing values (*n* = 28; Table 2): nine macronutrients, nine minerals and ten vitamins. We also ensured that the selected nutrients were not collinear. For example, because total carbohydrate is the sum of available carbohydrate and dietary fibre, we included available carbohydrate and dietary fibre instead of total carbohydrate. Because standard principal component analysis (PCA) requires complete data for all variables, only the food items with complete information for all 28 selected nutrients were included in the PCA (*n* = 971).
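The complete-case selection described above (listwise deletion) can be sketched as follows. This is a minimal illustration, not the authors' code: the dictionary layout, the nutrient keys and the `complete_cases` function are hypothetical, and `None` is assumed to mark a missing value in the food composition table.

```python
from typing import Dict, List, Optional

# Hypothetical layout: each food maps nutrient abbreviations to a value per 100 g,
# with None (or an absent key) marking a missing value.
FoodRecord = Dict[str, Optional[float]]


def complete_cases(foods: Dict[str, FoodRecord], nutrients: List[str]) -> Dict[str, List[float]]:
    """Keep only foods that have a non-missing value for every selected nutrient.

    Returns a mapping from food name to its nutrient vector, in the order given
    by `nutrients`, ready to be stacked into the data matrix used for PCA.
    """
    kept: Dict[str, List[float]] = {}
    for name, record in foods.items():
        values = [record.get(n) for n in nutrients]
        # Listwise deletion: a single missing nutrient excludes the whole food.
        if all(v is not None for v in values):
            kept[name] = [float(v) for v in values]
    return kept


if __name__ == "__main__":
    # Toy values for illustration only; they are not SAFCDB data.
    foods = {
        "food A": {"protein": 20.0, "zinc": 1.1, "vitC": 0.0},
        "food B": {"protein": 3.2, "zinc": None, "vitC": 12.0},
        "food C": {"protein": 8.5, "zinc": 0.9},
    }
    print(complete_cases(foods, ["protein", "zinc", "vitC"]))  # {'food A': [20.0, 1.1, 0.0]}
```

Only "food A" survives, because "food B" has a missing zinc value and "food C" has no vitamin C entry; in the study, the same rule reduced 1667 foods to 971.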


**Table 1.** Number of foods per food group.

**Table 2.** Nutrients included in the analysis with their unit of measurement and corresponding abbreviations used in figures.


Abbreviations: g = grams; mg = milligrams; μg = micrograms; RE = retinol equivalents.

#### *2.2. Methods*

Evaluating nutrient patterns amongst food items requires statistical methods that account for the presence of multiple, correlated nutrients within each food item. Principal component analysis is one of the oldest and simplest dimension-reduction techniques [21] and is suited to correlated variables. Applied to food composition data, PCA allows multiple nutrients to be analysed simultaneously. PCA aims to describe as much of the variation in the dataset as possible using as few principal components (PCs) as possible. The PCs are uncorrelated linear combinations of the original variables, ordered so that the first few components capture most of the variation. PCA therefore aids data reduction, by explaining the covariation amongst the variables with a few linear combinations, and data interpretation, by identifying the features that underlie this covariation. The contribution of each variable to a component is called its loading, and variables with high absolute loadings are important for that component. Rotation methods can be applied to enhance interpretability by producing loadings that are as close to zero or to ±1 as possible. For each PC, every observation has a score that combines its values on all of the variables; the score indicates how strongly the observation is related to that PC [22]. Factor analysis is another common multivariate dimension-reduction technique, but it differs from PCA: while PCA describes the relationships among the observed variables in a simpler way, factor analysis identifies latent factors that influence the observed variables. Factor analysis is therefore better suited to consumption data, where the latent factors it generates can be interpreted as dietary patterns that predict food choices [23]. Figure 1 presents the methodology and rationale.
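The PCA steps described above (uncorrelated components, loadings, scores and the proportion of variance explained) can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions rather than the analysis actually used: it assumes the nutrients are standardised first, so that PCA operates on the correlation matrix (the nutrients are measured in different units), and the `pca` function and its return values are illustrative. Rotation is not included.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int):
    """PCA of a foods-by-nutrients matrix via eigendecomposition.

    Assumption: each nutrient is standardised (z-score), i.e. PCA is run on the
    correlation matrix. Returns (loadings, scores, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    # Standardise each nutrient column to mean 0 and standard deviation 1
    # (a constant column would give a zero standard deviation and must be removed first).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the nutrients (p x p).
    R = np.corrcoef(Z, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Each column holds the weights defining one PC as a linear combination of nutrients.
    weights = eigvecs[:, :n_components]
    # Loadings: weights scaled by sqrt(eigenvalue), i.e. nutrient-PC correlations.
    loadings = weights * np.sqrt(eigvals[:n_components])
    # Scores: each food's value on each PC.
    scores = Z @ weights
    # Share of total variance carried by each PC (the eigenvalues sum to p).
    explained = eigvals / eigvals.sum()
    return loadings, scores, explained[:n_components]
```

For the 971 × 28 complete-case matrix, `loadings` would have shape 28 × k and `scores` 971 × k. Eigendecomposition of the correlation matrix is one standard route; a singular value decomposition of the standardised matrix yields the same components.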

**Figure 1.** Flow chart of methodology and rationale.

#### 2.2.1. Correlation Analysis

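For each nutrient, some foods contained exceptionally high values; for example, oysters were especially high in zinc and amaranth leaves were especially high in magnesium. Because of these outliers, we used Spearman correlations, which are based on ranks and are therefore less sensitive to extreme values than Pearson correlations. The correlations were calculated pairwise-complete (each pair of nutrients used all foods with values for both) on the complete dataset (*n* = 1667) to determine nutrient co-occurrence patterns.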
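A pairwise-complete Spearman correlation for one pair of nutrients can be sketched in plain Python as follows: foods missing either value are dropped for that pair only, both variables are converted to ranks (tied values receive their average rank), and the Pearson correlation of the ranks is returned. The function names and the minimum of three complete pairs are illustrative assumptions, not details reported in the study.

```python
import math
from typing import List, Optional, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def spearman_pairwise(x: Sequence[Optional[float]], y: Sequence[Optional[float]],
                      min_pairs: int = 3) -> float:
    """Spearman correlation using only foods where both nutrients are present.

    min_pairs is an arbitrary illustrative threshold, not taken from the study.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < min_pairs:
        return math.nan
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Toy values for illustration only (not SAFCDB data); None marks a missing value.
    zinc = [1.0, None, 3.0, 4.0, 100.0, 2.0]
    iron = [2.0, 5.0, None, 8.0, 10.0, 4.0]
    print(spearman_pairwise(zinc, iron))  # 1.0: the extreme zinc value does not change the ranks
```

In practice, pandas' `DataFrame.corr(method="spearman")` computes the full correlation matrix and excludes missing values pairwise by default.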
