## *2.4. Recipe Calculation and Additional Factors*

Recipes or prepared dishes will be introduced as part of another database. Recipes will be linked to the S4H FCDB in order to obtain all the necessary information. For the calculations, the edible portion, cooking method and those factors that can generate changes in the nutrient content (such as retention factors (RF) and yield factors (YF)) will be taken into account. In addition, allergen data and preparation methods will be implemented.

For the harmonized calculation of recipes, a mixed model was used, since it is the most widely used and accepted method [3,76]. This method was proposed as a standard by EuroFIR and consists of applying the YF at the recipe level and the RF to each individual ingredient [48,77]. This procedure requires the prior incorporation of standardized YF and RF based on the food group classification system [25,78]. YF and RF values were obtained from different sources in order to cover the largest number of foods and cooking methods [26,50,76–80]. For the RF of polyphenols, in addition to those given in Phenol-Explorer 3.6 [75], the values retrieved from the EPIC study [61] were also used. The calculation involved the following steps. First, the weights of the raw ingredients were collected. Second, nutrient and compound levels were corrected for edible portions, if applicable. Next, yield factors were used to adjust the raw weights for the effects of cooking, and retention factors were applied to account for nutrient losses or gains during cooking. Finally, the ingredient values were summed to obtain the recipe values, which were expressed per 100 g of recipe and per total recipe weight. The estimates were performed automatically and entered as recipes in the database.
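The mixed model described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the S4H implementation: the `Ingredient` class, the `calculate_recipe` function, the nutrient keys and all numeric values are hypothetical, and a missing retention factor is assumed to be 1.0 (no loss or gain).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ingredient:
    """Hypothetical ingredient record; field names are illustrative only."""
    name: str
    raw_weight_g: float                    # weight as purchased
    edible_portion: float                  # edible fraction, 0 to 1
    nutrients_per_100g: dict[str, float]   # content per 100 g of raw edible food
    retention: dict[str, float] = field(default_factory=dict)  # RF per nutrient


def calculate_recipe(ingredients: list[Ingredient], yield_factor: float) -> dict:
    """Mixed model: RF applied per ingredient, YF applied once at recipe level."""
    totals: dict[str, float] = {}
    raw_edible_weight_g = 0.0
    for ing in ingredients:
        # Correct the raw weight for the edible portion.
        edible_g = ing.raw_weight_g * ing.edible_portion
        raw_edible_weight_g += edible_g
        for nutrient, per_100g in ing.nutrients_per_100g.items():
            # Assumption: a missing retention factor means full retention.
            rf = ing.retention.get(nutrient, 1.0)
            amount = per_100g * edible_g / 100.0 * rf
            totals[nutrient] = totals.get(nutrient, 0.0) + amount
    # The yield factor converts the summed raw weight into the cooked weight.
    cooked_weight_g = raw_edible_weight_g * yield_factor
    per_100g = {n: v * 100.0 / cooked_weight_g for n, v in totals.items()}
    return {
        "cooked_weight_g": cooked_weight_g,
        "per_recipe": totals,
        "per_100g": per_100g,
    }


# Illustrative values only; they are not taken from the S4H FCDB.
potato = Ingredient(
    name="potato",
    raw_weight_g=250.0,
    edible_portion=0.8,
    nutrients_per_100g={"protein_g": 2.0, "vitc_mg": 20.0},
    retention={"vitc_mg": 0.75},
)
oil = Ingredient(
    name="oil",
    raw_weight_g=50.0,
    edible_portion=1.0,
    nutrients_per_100g={"fat_g": 100.0},
)
result = calculate_recipe([potato, oil], yield_factor=0.8)
print(result["cooked_weight_g"], result["per_100g"])
```

Keeping the yield factor outside the ingredient loop reflects the mixed model: weight change is treated as a property of the whole dish, whereas retention depends on each ingredient and nutrient.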

## *2.5. Information Management and Data Quality*

The tables and the FCDB were implemented in MySQL, an open-source, cross-platform relational database management system. A total of eight tables were implemented and interrelated. The tables were disaggregated to provide more versatility and security. All values were constrained to a permitted range of variation. Organizations such as INFOODS or EuroFIR propose different methodologies to ensure and validate data quality [25,33,50]. However, in this case, the coordinators decided to follow a system of hazard analysis and critical control points (HACCP) [50]. For each data input, an original document and a working document identified with the same code were stored. For each step identified as HACCP, a series of validation tests was performed. These tests were based on different recommendations [3,25,33,50,57]. The validation procedure was followed by corrections, if necessary. The corrections of the conflicting foods were checked by tracing the data back to the original FCDB. The verifications performed are shown in Table 2. These processes were applied at each stage of quality control in order to minimize systematic and random errors. All tests were performed manually or semi-automatically by the coordinators, except for the recipes, which were automated.

**Table 2.** Steps identified as HACCP and validation testing.


#### **3. Results**

Around 26,200 foods were collected from different FCDBs. Branded foods, recipes or ready-to-eat products, among others, were excluded, leaving a total of 6410 foods. The Dutch, Italian and United Kingdom FCDBs contributed the largest number of foods to the unification process. A large number of foods were excluded from the FAO FCDB due to incomplete information. After unification, filtering and quality validation, 2648 foods were obtained for the S4H FCDB; 47% of them had an equivalent food in another FCDB and therefore received unified values. The foods were grouped by food group and are shown in Supplementary Material S1 (Excel sheet).

Regarding nutrients, bioactive compounds and other information, 880 items were collected. About 95% of the items corresponded to nutrients or other food compounds. Only 5% corresponded to other items such as the food group, its code or some additional factors. During harmonization and standardization, 78.7% of the tagnames were kept with the recommended INFOODS standard units [33,60], without taking into account the polyphenol tagnames. However, the majority of the polyphenols did not have standard tagnames and represented 55.7% of the total number of items. Only 5.3% of the other compounds did not have standard tagnames. The standard units of 8.4% of the total number of compounds were modified to more functional units.

Germany contributed the highest percentage (15%) of total nutrients, Spain 9% and Greece 2%. It should be noted that 65.5% of the nutrients included in the database were polyphenols from Phenol-Explorer 3.6. If these polyphenols are not taken into account, the percentages are tripled, as shown in Figure 2. For example, Spain and Germany had around 88% of the 40 most used nutrients in epidemiology, while Greece had only 40%. After Phenol-Explorer, the FAO FCDB is the one with the highest percentage of compounds, around 28.2%. However, the English and Italian FCDBs were the ones with the highest percentage of nutrient values used in epidemiology, with more than 95%.
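The tripling can be checked with simple arithmetic. As a rough sketch, assume that the polyphenol items (65.5% of the total) are removed from the denominator and that a national FCDB contributes none of them. A share $p$ of all items then rescales to

$$
p' = \frac{p}{1 - 0.655} = \frac{p}{0.345} \approx 2.9\,p ,
$$

so that, under these assumptions, Germany's 15% would correspond to about 43% of the non-polyphenol items.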

**Figure 2.** Absence or presence of different compounds and nutrients in the FCDB. The FAO/INFOODS tagnames are expressed with the S4H IDs, listed in the Supplementary Material Table S1. Not all tagnames are shown; the Phenol-Explorer Database is not included and the complete figure is depicted as Supplementary Material Table S3.

Figure 3 shows an example of the values of the unification process for the item A00MH Spinaches, raw. Raw spinach was selected because it was included in most FCT/FCDB. The total protein values were quite similar, which supports a correct classification of the food. However, micronutrient values were more heterogeneous among the different FCT/FCDB. With the unification, the S4H FCDB obtained intermediate values that reflect the possible variability and, in the case of selenium, values similar to those of the national FCT/FCDB.

**Figure 3.** Protein, zinc, selenium and manganese content of the food categorized as spinach in different FCDBs and the unified values corresponding to the S4H FCDB.

Regarding recipes, tables and interrelations for energy, protein, carbohydrate, fat, sodium, calcium, riboflavin, vitamin C, the flavonols group and (-)-epicatechin were checked for correctness. A set of recipes was selected from the database to perform manual and automatic calculations; the results were identical in 80% of the cases, and when they were not, the mismatches came from the compilers' failure to choose the yield or retention factors. This problem disappears when the calculation is automated.

After data validation, no errors were detected in the transformation of units, since no systematic deviations were detected in any specific nutrient or compound. In 1.8% of the foods, some nutrients showed extreme standard deviations, most likely coming from the original FCDB. In addition, 7.5% of the foods had high deviations in some nutrients; all of these values, coming from the harmonization and coding phase, were reviewed and corrected. No differences were detected when using either mean or median values, except in some specific cases, such as unified foods with more than six FCDBs. Nevertheless, the median value gave estimates closer to the overall computation of the data. In addition, 4.9% of the foods had macronutrients that did not meet the established quality limits; the same happened with the sum of total fats, where 2.8% presented mismatches. Therefore, 17% of the food products had some type of error. Of this percentage, about 88% could be resolved by excluding 54 food items, resulting in a total of 2648 foods. The data were transferred properly and all MySQL interrelations were checked.
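The kind of checks summarized above can be illustrated with a short Python sketch. It is hedged throughout: the function names, the nutrient keys, the sample values and the acceptance limits (a proximate sum between 95 and 105 g per 100 g, and fatty acid fractions not exceeding total fat by more than 5%) are assumptions made for illustration, not the limits established for the S4H FCDB.

```python
from statistics import mean, median


def unify(values):
    """Summarize one nutrient of one food across several source FCDBs.

    Missing values (None) are ignored. Both mean and median are returned so
    that they can be compared, as the median is less sensitive to outliers.
    """
    present = [v for v in values if v is not None]
    return {"n": len(present), "mean": mean(present), "median": median(present)}


def proximate_sum_ok(food, low=95.0, high=105.0):
    """Check that the proximates add up to roughly 100 g per 100 g of food.

    The 95 to 105 g window is an assumed limit, used here only as an example.
    """
    keys = ("water_g", "protein_g", "fat_g", "carbohydrate_g", "fibre_g", "ash_g")
    total = sum(food[k] for k in keys)
    return low <= total <= high


def fatty_acids_ok(food, tolerance=0.05):
    """Check that SFA + MUFA + PUFA do not exceed total fat (assumed 5% margin)."""
    fractions = food["sfa_g"] + food["mufa_g"] + food["pufa_g"]
    return fractions <= food["fat_g"] * (1.0 + tolerance)


# Hypothetical zinc values (mg/100 g) from five sources, one of them missing
# and one clearly higher than the rest.
zinc = unify([0.5, 0.6, None, 0.9, 2.5])
print(zinc)

# Hypothetical unified food record (g/100 g).
food = {
    "water_g": 91.4, "protein_g": 2.9, "fat_g": 0.4,
    "carbohydrate_g": 1.6, "fibre_g": 2.2, "ash_g": 1.5,
    "sfa_g": 0.06, "mufa_g": 0.01, "pufa_g": 0.17,
}
flagged = dict(food, protein_g=12.9)  # deliberately inconsistent record

print(proximate_sum_ok(food), proximate_sum_ok(flagged), fatty_acids_ok(food))
```

In this toy case, the single high zinc value pulls the mean well above the median, which shows why the two statistics can diverge for foods unified from many sources.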
