**4. Discussion**

Public health practitioners and policy makers rely on FCDBs to assess nutrient availability and to link dietary data with nutrient intake for nutritional epidemiology. They also use FCDBs to develop nutrition interventions and to inform consumer education. Policies influence food product composition in order to address dietary shortfalls, but the full potential of food composition data is often not recognized [25]. In South Africa, studies have been limited to determining consumption habits among populations [16–19], whereas our study examined the nutrient patterns present within the food items consumed by the population. More specifically, we aimed to examine the nutrient patterns among food items listed in the SAFCDB [20] using correlation and PCA. FCDBs are often country-specific because environmental, genetic, and processing factors influence the nutrient content of food. National FCDBs also include country-specific foods and recipes, reflecting the unique consumption patterns of the country [26]. Therefore, analysing the foods contained in the SAFCDB provides information on the nutrient levels of foods consumed by the South African population.

Significant correlations between the nutrients were identified. Nutrients obtained primarily from plant-based foods, such as total fibre and available carbohydrates, exhibited a strong positive correlation with plant protein. Nutrients obtained primarily from animal products, such as cholesterol and vitamin B12, were strongly associated with animal protein. The plant-derived nutrients were negatively correlated with the animal-derived nutrients, confirming what is known about nutrient co-occurrence. Our results are also consistent with the correlations found elsewhere among raw foods [13] and raw plant foods [9], suggesting that similar nutrient patterns are evident among cooked and composite dishes, which were included in our analysis. The underlying correlation structure contributes to the features that distinguish nutrient-based food groupings, and it must be accounted for in any statistical analysis that uses multivariate methods. In addition, the high correlations suggest that prediction models could perform well in estimating missing nutrient values, a problem common to FCDBs. While the 2017 SAFCDB contained nutrition data for 1667 food items, only 971 food items could be analysed due to missing data. Missingness also excluded biotin and folate from the analysis, both of which are vital B-vitamins sourced from food [27]. Methods to impute missing values in food composition data have been investigated [28–30], and further research in this area could improve the completeness of FCDBs.
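The complete-case filtering and correlation step discussed above can be sketched as follows. This is a minimal illustration only: the food rows, nutrient names, and values are made-up assumptions rather than SAFCDB data, and only the final line uses figures reported in this study.

```python
import numpy as np

# Hypothetical toy table: rows are foods, columns are nutrients per 100 g.
# All values are illustrative and are NOT taken from the SAFCDB.
nutrients = ["fibre", "carbohydrate", "plant_protein", "cholesterol", "vitamin_b12"]
table = np.array([
    [7.0, 60.0, 9.0, 0.0, 0.0],       # wholegrain cereal
    [5.0, 45.0, 8.0, 0.0, 0.0],       # legume dish
    [2.0, 20.0, 2.0, 0.0, 0.0],       # fruit
    [0.0, 0.0, 0.0, 80.0, 2.0],       # meat
    [0.0, 1.0, 0.0, 370.0, 1.5],      # egg
    [0.0, 0.0, 0.0, 60.0, np.nan],    # fish, one missing value
    [3.0, np.nan, 3.0, 0.0, 0.0],     # vegetable, one missing value
])

# Complete-case analysis: keep only foods with no missing nutrient value.
complete = ~np.isnan(table).any(axis=1)
complete_table = table[complete]
share_retained = complete.mean()

# Pearson correlation matrix between nutrients (columns are variables).
corr = np.corrcoef(complete_table, rowvar=False)


def r(a, b):
    """Look up the correlation between two named nutrients."""
    return corr[nutrients.index(a), nutrients.index(b)]


print(f"foods retained: {complete.sum()} of {len(table)} ({share_retained:.0%})")
print(f"fibre vs plant protein: {r('fibre', 'plant_protein'):+.2f}")
print(f"fibre vs cholesterol:   {r('fibre', 'cholesterol'):+.2f}")

# Share of SAFCDB items retained in this study (971 of 1667 food items).
safcdb_share = 971 / 1667
```

In this toy table the plant-derived columns correlate positively with each other and negatively with cholesterol, mirroring the direction of the associations reported here; the magnitudes carry no meaning.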

Our study affirmed that some food items are more compositionally alike than others by identifying eight nutrient patterns that were consistent with existing knowledge. All analysed nutrients, except iron and pantothenic acid, featured in a pattern. Although iron and pantothenic acid did not meet our threshold for a high loading, both stood out in nutrient patterns that contained their expected sources. Vitamin A featured in two nutrient patterns because it is available in foods of both plant and animal origin. A study of Finnish foods [14] identified four nutrient content patterns using factor analysis and was able to group wheat products with legumes, and mushrooms with offal foods, a finding shared with our study. Although that study included 106 nutrients, its patterns were comparable to those found in our study, suggesting that only a few key nutrients are needed to determine nutrient patterns successfully.
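The idea of reading nutrient patterns off high loadings can be illustrated with a small sketch. It assumes synthetic data, hypothetical nutrient names, a loading threshold of 0.5, and two retained components, and it applies no rotation; none of these choices should be read as the settings used in this study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_foods = 200

# Two hypothetical latent "sources" that drive nutrient content (synthetic).
plant = rng.normal(size=n_foods)
animal = rng.normal(size=n_foods)

names = ["fibre", "carbohydrate", "plant_protein", "magnesium",
         "cholesterol", "vitamin_b12"]
sources = [plant, plant, plant, plant, animal, animal]
# Each nutrient is its source plus independent noise.
X = np.column_stack([s + 0.3 * rng.normal(size=n_foods) for s in sources])

# Standardise the nutrients so that PCA works on the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
corr = np.corrcoef(Z, rowvar=False)

# Eigen-decomposition; eigh returns ascending eigenvalues, so reorder.
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()      # proportion of variance per component
loadings = eigvecs * np.sqrt(eigvals)    # correlation of nutrient with component

THRESHOLD = 0.5   # assumed cut-off for a "high" loading
N_KEEP = 2        # assumed number of retained components

patterns = {
    f"pattern_{k + 1}": [names[j] for j in range(len(names))
                         if abs(loadings[j, k]) >= THRESHOLD]
    for k in range(N_KEEP)
}

# Component scores for each food, usable for grouping similar foods.
scores = Z @ eigvecs[:, :N_KEEP]

for label, members in patterns.items():
    print(label, members)
print("variance explained by retained components:", explained[:N_KEEP].sum())
```

A nutrient that falls just below the threshold on every component would appear in no pattern in this sketch, which is the situation described above for iron and pantothenic acid.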

We also validated our results by applying the dimension reduction technique to the food items themselves. The results of both analyses were similar, and a large amount of the nutritional variation could be explained by a few patterns. The patterns included food items from across different food groups, suggesting compositional similarity despite conceptual dissimilarity. Hence, applying clustering techniques within each conceptual FCDB group may reveal more intricate groupings, although this approach may suffer from high dimensionality combined with small sample sizes. Two studies applied clustering techniques within FCDB food groups. The first study [15] found six subgroups within the 'Cereals' category of the West Africa Food Composition Table. These subgroups separated grains by type and preparation method. For example, pearl millet separated from other grains, and maize was split across three clusters depending on whether it was raw, boiled, or prepared as a porridge. Likewise, our analysis differentiated between white and brown rice, and between soft maize meal and stiff or crumbly maize meal. The second study [9] applied clustering techniques within five food categories (fruits, vegetables, nuts and seeds, legumes, and cereal grains) of the U.S. Department of Agriculture (USDA) National Nutrient Database for Standard Reference (SR) Legacy (2018). The study found that similar foods were not necessarily from the same category. For example, wheat germ was found to cluster with legumes, a finding repeated in our analysis. Another similar finding was that almonds and coconuts, macadamias, pecans, and walnuts separated from other nuts in the database. Chestnuts were also isolated from other nuts. Our results suggest that statistical methods can be used to create a natural food exchange list to accommodate different dietary preferences.

Dark leafy greens such as spinach and other leaves (amaranth, blackjack, cowpea, etc.) were differentiated from other vegetables in the database. PCA applied to food items discriminated more finely than PCA applied to nutrients. Under the food pattern analysis, dark leafy greens were further divided into spinach and amaranth leaves and other leafy greens. Similarly, orange-coloured fruit and vegetables grouped together, which was not seen under the nutrient pattern analysis. This type of clustering was also identified in Pennington et al. [10]. The daily consumption of dark-green leafy vegetables and orange-coloured fruit and vegetables is recommended by the South African FBDGs [4] and the Dietary Guidelines for Americans [31], and is important for a healthy diet because these foods are rich sources of vitamins and minerals [4]. Classifications based on nutritional similarity are useful to nutritionists, researchers, and consumers for the development of dietary guidance materials, the development of food frequency questionnaires, the reporting of consumption studies, and adherence to dietary guidelines [32].

The PCA method was also able to separate canned vegetables and vegetables fried in oil from the other vegetables. This is helpful in determining food preparation characteristics from nutrient information. Both analyses were able to identify foods made with egg, such as choux pastry and custard, and group these items together with eggs. However, the nutrient analysis additionally included milk and savoury tarts, which are traditionally made with egg. Similarly, both analyses were able to identify foods made with milk and cheese, such as malted milk beverages, puddings, yoghurt, and cheese sauces. Principal component analysis may therefore also be helpful in identifying the ingredients of composite dishes in an FCDB.

Our results provide data-driven evidence to support the existing knowledge of food and nutrient patterns, as well as South African food-based dietary guidelines and nutrition policies. Each of the nutrient patterns identified corresponded to a guideline and supports the consumption of a variety of foods and the moderation of others. High sodium levels in food items led to the promulgation of the current salt regulation and a reduction in the salt content of food items in the country [6]. Food items belonging to the high sodium food pattern closely mirrored the categories identified in the regulation. Under the food item analysis, canned vegetables grouped together with other processed food items on a high sodium pattern. Canned vegetables, processed meat, processed cheese, bread, and sauces are suggested to have similar levels of sodium, which is consistent with research showing that these categories have the highest median sodium levels among packaged foods in South Africa [33]. This analysis supports the regulation and can be used in a similar fashion to identify foods with a high sugar content. FBDGs are developed in response to a public health problem [34] and require the identification of rich sources of nutrients that are of public health importance [35]. The patterns identified in our results each describe foods that are rich sources of specific nutrients. Foods providing these nutrients are recommended to be either limited or increased, as appropriate, and implementation of the FBDGs should then be accompanied by monitoring and evaluation of the effects. Food systems [36] are dynamic and are influenced by key drivers such as regulatory frameworks, consumer influence, technological innovations, concerns for food safety, and growing attention paid to diet and health [37,38]. Thus, continuous updates of an FCDB are essential to reflect changes not only in the types of food provided but also in their composition [39]. The evaluation of the effects can be based on changes in food composition [34], and some studies have applied statistical methods to different versions of FCDBs to determine changes over time in the composition of fruits and vegetables [40–42]. Therefore, repeating our analysis on past versions and future updates of food composition data could assess whether the implementation of FBDGs and regulations has affected the reformulation of products. The SAFCDB is updated every three years because updates are resource-intensive and can be challenging to implement regularly: they must be applied to all database-related tools and products, such as publications, software programs, and applications.

Research into innovative statistical methods tailored to food composition data has the potential to provide improved evidence for dietary guidelines and policy. It can also support the dietary patterns found in consumption studies. Makura-Kankwende et al. [17] showed that the animal-driven dietary pattern, characterised by animal protein and saturated fat, was associated with an increased body mass index amongst black South African women. From our results, foods high in animal protein and saturated fat correspond to meat and meat products, processed meat, and fried foods. These foods are generally present in the Western diet, and the animal-driven pattern found may be suggestive of a shift towards this diet [17]. Another study, by Visser et al. [19], found that a dietary pattern featuring vitamin A and vitamin B12 was associated with lower odds of anaemia in 5–12-year-old South African children. This dietary pattern is reflected in our results, which identified foods containing this combination of nutrients, mainly organ meat such as kidney and liver [19].

Some limitations need to be considered. Missing nutrient values excluded essential nutrients such as folate and biotin from the analysis and restricted our sample to 58% of the foods available in the SAFCDB. With respect to the PCA method, subjective decisions on the data matrix, rotation method, number of retained components, and loading threshold need to be made [16,18]. However, our results are consistent with existing knowledge, and the study has strengths in presenting nutrient and food patterns among South African foods that support food-based dietary guidelines, nutrition policies, and consumption studies. We are currently developing K-Means and Gaussian Mixture (GM) clustering models to identify food items that are similar to each other. Because several food items in the database have missing nutrient values, we will incorporate multiple imputation techniques to account for missing data. We believe that the development and application of these models to food composition databases will contribute to an understanding of nutritional uptake in the population and to monitoring adherence to prevailing national nutrition regulations and guidelines.
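As a rough indication of what such a pipeline might look like, the sketch below imputes missing values and then fits K-Means and a Gaussian mixture using scikit-learn. It is an assumption-laden illustration, not the models under development: the data are synthetic, the number of clusters is fixed at two, and a single imputed data set is used. A multiple imputation variant could repeat the imputation step with `sample_posterior=True` and different seeds.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic "foods": two well-separated groups described by three nutrients.
group_a = rng.normal(loc=0.0, scale=1.0, size=(30, 3))
group_b = rng.normal(loc=10.0, scale=1.0, size=(30, 3))
X = np.vstack([group_a, group_b])

# Knock out one nutrient value in each of eight randomly chosen foods.
X_missing = X.copy()
rows = rng.choice(len(X), size=8, replace=False)
cols = rng.integers(0, X.shape[1], size=8)
X_missing[rows, cols] = np.nan

# Single model-based imputation: each nutrient is predicted from the others.
imputer = IterativeImputer(random_state=0)
X_imputed = imputer.fit_transform(X_missing)

# Standardise so that no nutrient dominates the distance calculations.
Z = StandardScaler().fit_transform(X_imputed)

# Hard cluster assignments from K-Means.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)

# Soft assignments (membership probabilities) from a Gaussian mixture.
gmm = GaussianMixture(n_components=2, random_state=0).fit(Z)
gmm_probs = gmm.predict_proba(Z)

print("K-Means cluster sizes:", np.bincount(kmeans_labels))
print("first food, mixture membership probabilities:", gmm_probs[0])
```

The Gaussian mixture returns membership probabilities rather than a single label, which may be useful for foods that sit between conceptual groups.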
