#### *4.2. Data Quality and Recipe Calculation*

High-quality data are essential for nutritional studies [48]. The use of the HACCP system [50] allowed us to evaluate data quality quickly and sensibly at different stages. In addition, the FAO guidelines served as a reference for detecting critical points at any stage of the process [33]. Initial training was essential to complete all the tasks successfully while guaranteeing the highest possible accuracy and quality.

For the S4H FCDB, food names, food descriptions and translations were verified and corrected thanks to the collaboration of researchers whose first language was, in most cases, the language of the FCDB concerned.

An FCDB should be updated frequently. For example, in the TEDDY study, the FCDB was updated at least once a year [67]. Incorporating the original food IDs to guarantee the traceability of each food was a critical control point. The original food IDs allowed us to identify and correct errors, and even to retrieve or update the information. Failures in the classification and verification of food grouping and compound labeling were detected either as outliers, identified using standard deviations, or during manual coding. Three different checking approaches were used: (i) checking, semi-automatically in the spreadsheets, that the sum of macronutrients was within the expected range and that no implausible values were present; (ii) checking the data transfer from the spreadsheets to MySQL by direct verification between versions and of table relationships; and (iii) checking the model recipe through manual verification by compilers and automatic verification by interconnecting the different databases. These verifications made it possible to ensure the comparability and reliability of the data.
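As an illustration of check (i) and of the standard-deviation screening mentioned above, the following minimal Python sketch shows how such verifications might be expressed. The tolerance of 95 to 105 g per 100 g, the field names and the threshold of two standard deviations are assumptions made for this example, not the values used for the S4H FCDB.

```python
from statistics import mean, stdev

# Assumed tolerance for the macronutrient sum, in g per 100 g of food.
SUM_RANGE = (95.0, 105.0)

# Hypothetical field names for the components that should add up to about 100 g.
MACRO_KEYS = ("water", "protein", "fat", "carbohydrate", "fibre", "ash", "alcohol")


def macro_sum_ok(food, low=SUM_RANGE[0], high=SUM_RANGE[1]):
    """Return True if the sum of the macro components lies within [low, high]."""
    total = sum(food.get(key, 0.0) for key in MACRO_KEYS)
    return low <= total <= high


def flag_outliers(values, k=2.0):
    """Return the indices of values more than k sample standard deviations from the mean."""
    if len(values) < 3:
        return []  # too few values to judge
    m, s = mean(values), stdev(values)
    if s == 0:
        return []  # all values identical
    return [i for i, v in enumerate(values) if abs(v - m) > k * s]
```

Flagged entries would still need to be reviewed by a compiler, since a value far from the group mean can be a genuine property of the food rather than a coding error.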

Performing chemical analyses for all recipes and complex food matrices is not achievable. Instead, values are calculated indirectly from the nutritional information of each ingredient [10,91]. To calculate a recipe properly, different parameters must be taken into account, such as RF or YF. A further requirement is that values should not be missing, since missing values may lead to a biased underestimation of nutrient intake [43]. During unification and the inputting of values, this problem was solved to a large extent. The EuroFIR recipe calculation procedure was selected as a reference because it is one of the most commonly used [76]. Several studies use an app or software to make estimations or to perform interventions in nutrition and health [61,92,93]. Accordingly, the S4H FCDB will be interlinked with a recipe database. This will make it possible to calculate recipe values automatically, taking into account all necessary parameters, such as edible portion, retention factors and yield factors, or even allergens. Thus, the recipes will be as adequate and representative as possible to cover the needs of the population.
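The role of these parameters can be sketched in Python as follows. This is a simplified illustration, not the EuroFIR procedure itself. It assumes a single yield factor applied to the whole recipe, retention factors applied per ingredient and nutrient, and the hypothetical names `Ingredient` and `recipe_per_100g`. A missing ingredient value is propagated as `None` rather than treated as zero, so that it cannot silently lower the result.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Ingredient:
    name: str
    weight_g: float            # weight as purchased, in g
    edible_portion: float      # edible fraction, between 0 and 1
    per_100g: Dict[str, Optional[float]]  # nutrient value per 100 g edible portion
    retention: Dict[str, float] = field(default_factory=dict)  # nutrient -> retention factor


def recipe_per_100g(ingredients: List[Ingredient], yield_factor: float,
                    nutrient: str) -> Optional[float]:
    """Nutrient content per 100 g of prepared dish, or None if any value is missing."""
    total_nutrient = 0.0
    total_edible_g = 0.0
    for ing in ingredients:
        edible_g = ing.weight_g * ing.edible_portion
        total_edible_g += edible_g
        value = ing.per_100g.get(nutrient)
        if value is None:
            return None  # do not count a missing value as zero
        rf = ing.retention.get(nutrient, 1.0)  # assume no loss when no factor is given
        total_nutrient += edible_g / 100.0 * value * rf
    cooked_g = total_edible_g * yield_factor  # weight change during preparation
    if cooked_g <= 0:
        raise ValueError("cooked weight must be positive")
    return total_nutrient / cooked_g * 100.0


# Illustrative figures only, not taken from any FCDB.
potato = Ingredient("potato", 500.0, 0.8, {"vitc_mg": 20.0}, {"vitc_mg": 0.5})
oil = Ingredient("oil", 100.0, 1.0, {"vitc_mg": 0.0})
vitc = recipe_per_100g([potato, oil], yield_factor=0.8, nutrient="vitc_mg")
```

In this made-up example, 400 g of edible potato contributes 40 mg of vitamin C after retention, the dish weighs 400 g after applying the yield factor, and `vitc` is therefore about 10 mg per 100 g.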

#### *4.3. Strengths and Limitations of the S4H FCDB*

With the continuous expansion of food trade worldwide [10], climate change and innovation in agriculture [13], international FCDBs are essential. For this reason, the S4H FCDB aims to be a reference in the creation of a unified FCDB. Much effort has been made to overcome the drawbacks commonly associated with FCDB construction. The variability in food composition (when using different FCDBs) is one of the most frequently detected limitations [7,20]. The S4H FCDB attempts to address this limitation by using the median value as the reference estimation. Additionally, there is no guarantee that national FCDB data are free of errors [2]. However, all national FCDBs are used in their own country. The unification gave us a global view of possibly wrong values, allowing them to be corrected. Another limitation was the absence of some foods and nutrients from the national FCDBs [47]. The S4H FCDB inputs those missing foods and compounds, extending coverage and completing the values missing from the national FCDBs. Discrepancies may exist between the tagnames proposed by FAO/INFOODS and their units [15]. However, the decisions to change units were made by consensus and were intended to improve their functionality. Moreover, values inputted from other datasets, especially for dishes and recipes, did not guarantee directly related values [10,65]; for this reason, recipes and ready-to-eat products were removed. Recipes will instead be calculated thanks to the interconnection between the S4H FCDB and a recipe database.
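Using the median as the reference estimation can be illustrated with a short Python sketch. The function name `unified_value`, the source labels and the numbers are hypothetical. The sketch only assumes that missing values are skipped rather than counted as zero.

```python
from statistics import median


def unified_value(values_by_fcdb):
    """Median of the non-missing values reported by different FCDBs for one food and compound."""
    present = [v for v in values_by_fcdb.values() if v is not None]
    return median(present) if present else None


# Made-up values for one compound in one food, as reported by four sources.
example = {"FCDB_A": 10.0, "FCDB_B": 12.0, "FCDB_C": 30.0, "FCDB_D": None}
print(unified_value(example))  # 12.0
```

With reported values of 10, 12 and 30, the median (12) is less affected by the divergent value than the mean (about 17.3), which is the usual argument for preferring it when sources disagree.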

The work was complex, and although the compilers were experts in nutrition, mistakes may have been made when choosing codes for harmonization [15,23]. However, the use of guidelines and data validation throughout the whole process allowed possible mistakes to be verified and corrected. The preparation of this material required a long time, and similar results could perhaps have been obtained with automated methods followed by an exhaustive check [88]. There may have been failures in the translation of some foods [47], especially regional foods, although foods were discarded if no reliable translation was found. Even so, our results are encouraging. Misspellings and translation mistakes were detected while manually identifying and classifying foods. Thus, one of the limitations may actually have been a strength.

In most nutritional epidemiological studies, results are interpreted similarly regardless of how estimations are made or which FCDB is used. This generates an unrealistic relationship between nutrient intakes and their impact on health [94]. An increasingly large number of epidemiological studies attempt to make their data comparable [51,67,95,96]. One of the strengths of the S4H FCDB is that, with unified values, data from different countries could be compared, as it would take into account biodiversity and the different parameters affecting the same kind of food. Another option is to use national FCDB data and only fill in the missing nutrients and compounds to avoid underestimations [6,18]. Organizations such as EuroFIR have the potential to create a standardized FCDB, which should be free to use [48]. EFSA already has a tool as a first step towards the unification of nutrients [97]. The S4H FCDB is one of the most comprehensive FCDBs regarding the number of foods and nutrients, being able to collect more than 800 compounds for each foodstuff. To date, it is surpassed only by the FooDB project (https://foodb.ca, accessed on 27 October 2021), supported by the Canadian Institutes of Health Research and by The Metabolomics Innovation Centre. This database includes not only nutritional information but also a large number of bioactive compounds [24,55,98]. However, it must be noted that the S4H FCDB draws on different FCTs/FCDBs, giving much more homogeneous and comparable nutritional values.
