### **2. Methodology**

The use of data as strategic knowledge in the agrifood sector is a resource for multiple stakeholders. The traceability search engine concept was designed using three specific food supply chains as models, so that the concept remains generally valid despite the differences, sometimes substantial, among production chains. The olive oil, milk and fishery products food chains were selected as a model based on the following criteria:


For olive oil, the processes that lead to edible virgin olive oils (extra virgin olive oil and virgin olive oil), as defined in Reg. No 1308/2013 (cons. 2020) [29], were included. For milk, cow milk was specifically considered, without including dairy products. For fishery products, all seawater or freshwater animals (except for live bivalve molluscs, live echinoderms, live tunicates and live marine gastropods, and all mammals, reptiles and frogs), whether wild or farmed, were considered, including all edible forms, parts and products of such animals, as defined in Reg. No 853/2004 (cons. 2021) [30]. To cover different types of fishery products, three specific supply chains were examined as examples of three food product categories:


3. Anchovies supply chain—as representative of a marine wild-caught, small size, medium fat fish.

First of all, each step of the food chain, with all possible routes, was mapped from primary production to human intake.

The mapping was reported in a flowchart for several purposes: to enable an easier comprehension, to support data analysis step by step, and to support visualisation.

Then, a table was created in which each step was described in detail and the inputs and outputs of the steps were identified. The input and the output of a step describe the status of the incoming and outgoing food so that they can be identified uniquely.

When applicable, terms were reported with their official definitions (e.g., from FAO, WHO, European Regulations, or EFSA documents).

Finally, every step was examined to understand its effect on nutritional quality, safety, authenticity and transparency, and to obtain a catalogue of significant parameters.

For nutritional quality, intrinsic attributes of food such as chemical composition, physical structure, biochemical changes, nutritional value and nutraceutical properties (i.e., the capacity, due to the chemical components, to benefit human health) were considered, as well as shelf-life and the way packaging interacts with the food. For food safety, microbial and chemical contamination were considered (hazards from pathogens, microbial spoilage, presence of mycotoxins, heavy metals, pesticides, etc.). The conditions and practices that influence the intrinsic attributes that determine nutritional quality, and those that influence food safety (leading to contamination and foodborne illness), were examined. Authenticity and transparency were considered in terms of chemical or genetic markers and profiles that allow for the demonstration of geographical, botanical, or zoological origin, or for the identification of fraud.

For each criterion (nutritional quality, safety, authenticity and transparency) and each step, two kinds of parameters were identified: parameters of interest and parameters of influence.


The parameters of interest (data) are considered as analytes (e.g., a chemical, physical or microbiological substance or component) or as nominal properties/characteristics (e.g., profile or taste) that may be subject to change depending on the conditions in the step under investigation. The parameters of interest were reported for each step, or in some cases for multiple pooled steps, with the specification of the matrix (e.g., semi-finished products). Some of them must be provided by the manufacturers, while others are measured only voluntarily.

The parameters of influence (metadata) are considered as conditions that can affect or modify the levels of the parameters of interest. They can refer to the matrix of the corresponding stage, or to other aspects (e.g., the environment, processes, conditions). For example, the physical–chemical characteristics of soil can influence the bioavailability of toxic and potentially toxic elements and therefore their content in the olives; pedoclimatic conditions such as temperature, rainfall and distance from the sea can affect the isotope ratios and nitrogen level; and sun exposure can affect the content of polyphenols.

Finally, for each parameter (or class of parameters) and each step, monitoring systems (e.g., indicators/measurement devices) of the parameters of interest and influence were mapped by differentiating between those for offline detection, such as analytic laboratory methodologies, and those permitting in situ and in-line monitoring, such as IoT sensors.
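As an illustration of how one entry of such a catalogue could be recorded, the following is a minimal sketch in Python. The class names, field names, the step name "olive growing" and the monitoring assignments are assumptions made for this example; only the link between sun exposure and polyphenol content is taken from the text above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Monitoring(Enum):
    OFFLINE = "offline"   # e.g. analytic laboratory methodologies
    IN_LINE = "in-line"   # e.g. in situ IoT sensors


@dataclass
class Parameter:
    name: str
    kind: str        # "interest" (data) or "influence" (metadata)
    criterion: str   # nutritional quality, safety, authenticity and transparency
    monitoring: list[Monitoring] = field(default_factory=list)


@dataclass
class Step:
    name: str
    inputs: list[str]
    outputs: list[str]
    parameters: list[Parameter] = field(default_factory=list)

    def by_kind(self, kind: str) -> list[Parameter]:
        """Return the parameters of one kind recorded for this step."""
        return [p for p in self.parameters if p.kind == kind]


# Hypothetical entry: the step name and monitoring choices are illustrative only.
step = Step(
    name="olive growing",
    inputs=["olive trees"],
    outputs=["olives"],
    parameters=[
        Parameter("polyphenol content", "interest", "nutritional quality",
                  [Monitoring.OFFLINE]),
        Parameter("sun exposure", "influence", "nutritional quality",
                  [Monitoring.IN_LINE]),
    ],
)
print([p.name for p in step.by_kind("interest")])
```

Keeping the kind, the criterion and the monitoring systems on the same record mirrors the three mapping passes described above (steps, parameters, monitoring systems).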

Based on this examination, a concept for a traceability search engine was created that allows searches across the different aspects described above. This concept is presented in the Results section.

The methodology described above was applied to all three supply chains. As a practical example, the elaboration for the olive oil supply chain is presented in Figure 3 (flowchart), Table 1 (inputs and outputs of the supply chain's steps) and Table 2 (catalogue of parameters of interest and parameters of influence).

**Figure 3.** Flowchart of the edible virgin olive oils supply chain.



**Table 2.** Extract of the catalogue of parameters of interest and parameters of influence for nutritional quality, safety and authenticity/transparency concerning the virgin olive oil supply chain. The phases in italics are optional and at the choice of the food companies, and "X" indicates that the process phase does not affect authenticity/traceability.


### **3. Results: Traceability Search Engine Concept**

The purpose of the traceability search engine concept is to support users in searching and finding available food data and information, and to provide knowledge and guidance to its users. The concept can be summarised in one sentence: the traceability search engine concept uses tags with semantic meaning, groups them into dimensions, assigns these tags to each data or information resource, allows either a simple search or a visual space search, and offers informed guidance. In the following, some terms are defined, the idea of dimensions is explained, and finally the visual search of data and information resources is presented.

The first term that the search engine concept uses is the data or information resource, which can be, for example, a dataset or a scientific publication. Such data or information resources can have any form or format, and they can be structured, semi-structured or unstructured. Structured data or information, such as databases or Excel files, is organised in a certain order such as rows and columns, whereas unstructured data or information, such as free text, has no ordered structure, so data and information must be extracted from it. Semi-structured data and information lies between structured and unstructured and normally combines both. An example is a Wiki, where each term is explained on a separate page, which gives a list structure, but the content of a page is free text and therefore unstructured. This structural classification is a high-level classification; more concrete types of data or information resources in common use are datasets, databases, Wikis, scientific papers, project reports or websites.

The concept does not require that data and information resources are directly included in the software that implements this concept. It is only required that a resource is described with enough information so that it can be used in the rest of the concept.

The type of a resource is a first characterisation of a data and information resource, and it is considered as a dimension in the concept. A dimension has different tags, and a tag is a term or a keyword that can be assigned to a resource, and it describes a certain aspect of the resource, see Figure 4. The dimension is therefore the group, while the tags are concrete values such as dataset, scientific publication or Wiki.

**Figure 4.** A dimension is a group of tags which are possible concrete terms or keywords of the dimension.
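The relationship between dimensions, tags and resources can be sketched as a small data model. This is a minimal sketch, not the authors' implementation: the class names `Tag` and `Resource`, their fields and the resource title are assumptions, while the dimension "resource type" and its three tags come from the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tag:
    dimension: str   # name of the group the tag belongs to
    name: str        # concrete term or keyword
    description: str = ""


@dataclass
class Resource:
    title: str
    tags: set[Tag] = field(default_factory=set)

    def tags_in(self, dimension: str) -> set[str]:
        """Names of the tags of one dimension assigned to this resource."""
        return {t.name for t in self.tags if t.dimension == dimension}


# The dimension "resource type" with three tags mentioned in the text.
RESOURCE_TYPE = "resource type"
dataset = Tag(RESOURCE_TYPE, "dataset")
publication = Tag(RESOURCE_TYPE, "scientific publication")
wiki = Tag(RESOURCE_TYPE, "Wiki")

# Hypothetical resource: the title is a placeholder.
resource = Resource("Example resource", {dataset})
print(resource.tags_in(RESOURCE_TYPE))
print(resource.tags_in("food supply chain"))  # no tag of this dimension assigned
```

Only a description of the resource is stored here, not the resource itself, in line with the requirement that resources need not be included in the software.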

For each resource, none, one or multiple tags of a dimension can be assigned. The tags and the dimensions have a name and a description explaining their meaning and usage. Additional fields can be added, such as the input and output described in the previous section. Tags within a dimension can also have a hierarchical structure, allowing for the use of a tree structure with multiple parents. In the food traceability search engine concept, the following dimensions are defined:


The food supply chain is another example of a dimension, and each step in the chain is considered a tag. Two or more dimensions can depend on each other. An example is the dimension food group or matrix and the dimension food supply chain: depending on the food group, the food supply chain can be different, as presented in the previous section. The concept therefore allows dependency rules to be defined for tags and dimensions. A dependency rule for dimensions allows a different food supply chain to be defined for each food group. Another example is that the parameter of interest is a super dimension where all other dimensions are child dimensions; they were separated because they represent a specific aspect. A dependency rule between tags allows for defining that tags of one dimension can only occur in combination with tags of other dimensions; e.g., certain parameters of interest only occur in combination with a certain phase in the food supply chain (see Table 2).
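A dependency rule between tags can be sketched as a check on the tags assigned to a resource. This is a minimal sketch under stated assumptions: the `TagRule` class, the `violations` function, the step names "olive growing" and "bottling", and the rule itself are hypothetical and serve only to illustrate the mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagRule:
    dimension: str             # dimension of the constrained tag
    tag: str                   # the constrained tag
    other_dimension: str       # dimension it depends on
    allowed: frozenset[str]    # tags of the other dimension it may occur with


def violations(tags: dict[str, set[str]], rules: list[TagRule]) -> list[TagRule]:
    """Return the rules broken by a tag assignment ({dimension: tags})."""
    broken = []
    for rule in rules:
        if rule.tag not in tags.get(rule.dimension, set()):
            continue  # the constrained tag is not assigned, rule does not apply
        others = tags.get(rule.other_dimension, set())
        if others - rule.allowed:
            broken.append(rule)  # combined with a tag outside the allowed set
    return broken


# Hypothetical rule: this parameter of interest only occurs at one step.
rule = TagRule("parameter of interest", "polyphenol content",
               "food supply chain", frozenset({"olive growing"}))

ok = {"parameter of interest": {"polyphenol content"},
      "food supply chain": {"olive growing"}}
bad = {"parameter of interest": {"polyphenol content"},
       "food supply chain": {"bottling"}}

print(violations(ok, [rule]))
print(violations(bad, [rule]))
```

Such a check could be run when tags are assigned to a resource, so that invalid combinations are reported before the resource becomes searchable.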

The descriptions of dimensions and of dependency rules are designed to inform users and to serve as a knowledge and documentation base in the domain of the traceability search engine. The documentation should allow not only simple descriptions but also more advanced forms of documentation, as shown in Figure 5. The documentation contains all the information that was collected in the previous section, and it represents the current knowledge. As this knowledge develops over time, the documentation needs to be adjusted and extended.

Possible users of such a search engine range from laypersons to experts; they are presented in Figure 2. Depending on the type of user, an appropriate scope of information can be made available. For more advanced users, more information can be presented, while for less advanced users summaries and simplifications are enough. Therefore, users should be able to define their level of expertise.

The documentation supports consumers by providing information about the production chain of a food item and by allowing them to identify processes that can influence quality, safety, authenticity and transparency. For example, users can check the quality of milk and what affects it. Researchers, on the other hand, can find information about which production step influences authenticity and can compare their data with other datasets. Food producers can also investigate the influences on the quality of their food production and use that knowledge for improvements.

Once tags have been assigned to the resources, they can be used to browse and search data and information resources. A simple search therefore allows the user to select one or multiple tags from one or multiple dimensions and to retrieve all resources that have these tags assigned. The result is normally presented as a list of resources. If more than one tag is used, it should be defined whether the AND operator, the OR operator, or both are used. The AND operator defines that resources must have all selected tags assigned, while the OR operator defines that at least one of the selected tags must be assigned.
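The simple search can be sketched as set operations on the tags of each resource. This is a minimal sketch: the function name, the representation of a tag as a `(dimension, name)` pair, the resource titles and the step name "storage" are assumptions, and only a single operator per query is handled (mixed AND/OR expressions are not covered).

```python
Tag = tuple[str, str]  # (dimension, tag name)


def simple_search(resources: dict[str, set[Tag]],
                  selected: set[Tag],
                  operator: str = "AND") -> list[str]:
    """Return the titles of the resources that match the selected tags."""
    if operator not in ("AND", "OR"):
        raise ValueError(f"unknown operator: {operator!r}")
    result = []
    for title, tags in resources.items():
        if operator == "AND":
            hit = selected <= tags        # all selected tags are assigned
        else:
            hit = bool(selected & tags)   # at least one selected tag is assigned
        if hit:
            result.append(title)
    return result


# Hypothetical resources and tags for illustration.
resources = {
    "Dataset A": {("aspect", "safety"), ("food supply chain", "storage")},
    "Paper B": {("aspect", "quality"), ("food supply chain", "storage")},
}
selected = {("aspect", "safety"), ("food supply chain", "storage")}

print(simple_search(resources, selected, "AND"))
print(simple_search(resources, selected, "OR"))
```

With AND, only the resource carrying both tags is returned; with OR, every resource carrying at least one of them is returned.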

The space search is more interesting, because a result list of resources has some limitations. The list, for instance, does not show where no resources are available, and how convenient it is to compare tags depends on the implementation. A result list must be considered a keyhole view in which only a part is visible, while most of the room remains hidden. The space search solves this issue by using dimensions to span a result space. If two dimensions are selected, the tags of one dimension are placed next to each other on one axis and the tags of the other dimension on the other axis. This results in a table in which each data and information resource is listed in the corresponding cell. Table 3 shows a schematic example where the food supply chain was used in combination with three main aspects in food science: safety, quality, and authenticity and transparency.

**Table 3.** Example of result presentation of a 2-dimensional space search.


Table 3 demonstrates how the keyhole view is removed by showing all possible combinations of two dimensions, providing an overview of which data and information resources are available and where no resources are available. The example also shows that not all tags need to be mapped on an axis, which reduces the number of columns and rows.
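A 2-dimensional space search can be sketched as building every cell of the table first and then placing resources into the cells they match. This is a minimal sketch: the function name, the data layout, the resource titles and the step names "harvesting" and "storage" are assumptions made for illustration; the three aspects are those named in the text.

```python
Resources = dict[str, dict[str, set[str]]]  # title -> {dimension -> tags}
Axis = tuple[str, list[str]]                # (dimension, tags shown on the axis)


def space_search(resources: Resources, rows: Axis, cols: Axis):
    """Span a 2-D result space and list each resource in its matching cells."""
    row_dim, row_tags = rows
    col_dim, col_tags = cols
    # Create every cell up front so combinations without resources stay visible.
    grid = {(r, c): [] for r in row_tags for c in col_tags}
    for title, tags in resources.items():
        for r, c in grid:
            if r in tags.get(row_dim, set()) and c in tags.get(col_dim, set()):
                grid[(r, c)].append(title)
    return grid


# Hypothetical resources for illustration.
resources = {
    "Dataset A": {"food supply chain": {"harvesting"}, "aspect": {"safety"}},
    "Paper B": {"food supply chain": {"harvesting", "storage"},
                "aspect": {"quality"}},
}
grid = space_search(
    resources,
    rows=("food supply chain", ["harvesting", "storage"]),
    cols=("aspect", ["safety", "quality", "authenticity and transparency"]),
)
for (step, aspect), titles in grid.items():
    print(f"{step:<12} | {aspect:<30} | {', '.join(titles) or '-'}")
```

Empty cells are kept in the result on purpose: they are what distinguishes the result space from a plain result list. Passing only a subset of a dimension's tags in `rows` or `cols` reduces the number of rows and columns.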

Taking into account different users and their needs, a graphic presentation of the entire supply chain is beneficial, showing its individual steps and the entire food flow process from primary production to human intake. Thanks to this, users can see the entire complexity of the process as well as obtain detailed information about the phase of the process that interests them.

The resulting cells are clickable and, when selected, another view with all resources is presented, showing more information than in the multi-dimensional result space. The idea is that the list items or the result space items represent a short summary, while more information can be found on a separate page when clicking on an item. The list or search space result presentation is called the result view, while the detailed information page is called the detail view. How the detail view is structured and what information it contains depends on the data domain.

The space search is not limited to two dimensions; it can combine three or more. Presenting the result then becomes a challenge, as multi-dimensional spaces are hard to display, and a decomposition into multiple 2-dimensional tables may be needed.
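One possible decomposition, sketched below, produces one 2-dimensional table per tag of the third dimension. This is an assumption about how the decomposition could be done, not a method stated in the text; the function name, the data layout, the resource titles and the step name "storage" are hypothetical.

```python
Resources = dict[str, dict[str, set[str]]]  # title -> {dimension -> tags}
Axis = tuple[str, list[str]]                # (dimension, tags shown)


def decompose(resources: Resources, rows: Axis, cols: Axis, slices: Axis):
    """Return one 2-D table per tag of the third (slice) dimension."""
    (row_dim, row_tags), (col_dim, col_tags) = rows, cols
    slice_dim, slice_tags = slices
    tables = {}
    for s in slice_tags:
        table = {(r, c): [] for r in row_tags for c in col_tags}
        for title, tags in resources.items():
            if s not in tags.get(slice_dim, set()):
                continue  # resource does not belong to this slice
            for r, c in table:
                if r in tags.get(row_dim, set()) and c in tags.get(col_dim, set()):
                    table[(r, c)].append(title)
        tables[s] = table
    return tables


# Hypothetical resources for illustration.
resources = {
    "Dataset A": {"food group": {"olive oil"},
                  "food supply chain": {"storage"}, "aspect": {"safety"}},
    "Paper B": {"food group": {"milk"},
                "food supply chain": {"storage"}, "aspect": {"safety"}},
}
tables = decompose(
    resources,
    rows=("food supply chain", ["storage"]),
    cols=("aspect", ["safety", "quality"]),
    slices=("food group", ["olive oil", "milk"]),
)
print(tables["olive oil"])
print(tables["milk"])
```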

The concept also allows for the combining of two or more dimensions on a single axis to increase the space that is spanned and to enlarge the overview of available resources. A limiting factor is the space of the computer screen, in particular if tablets and mobiles are used. In such cases, the reduction of tags mapped on axes is advisable.
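Combining dimensions on a single axis can be sketched as taking the Cartesian product of their tags, so that each axis label is a combination of one tag per dimension. This is a minimal sketch with hypothetical function names; the dimension names follow the text, and the tags shown are examples.

```python
from itertools import product

Axis = tuple[str, list[str]]          # (dimension, tags shown)
Label = tuple[tuple[str, str], ...]   # one (dimension, tag) pair per dimension


def combined_axis(*axes: Axis) -> list[Label]:
    """Combine several dimensions on one axis as the product of their tags."""
    dims = [dim for dim, _ in axes]
    tag_lists = [tags for _, tags in axes]
    return [tuple(zip(dims, combo)) for combo in product(*tag_lists)]


def on_label(resource_tags: dict[str, set[str]], label: Label) -> bool:
    """True if the resource carries every tag that makes up the axis label."""
    return all(tag in resource_tags.get(dim, set()) for dim, tag in label)


axis = combined_axis(("food group", ["olive oil", "milk"]),
                     ("aspect", ["safety", "quality"]))
print(len(axis))
for label in axis:
    print(" / ".join(tag for _, tag in label))
```

The number of labels is the product of the tag counts of the combined dimensions, which is why reducing the tags mapped on an axis matters on small screens.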

Finally, the simple search and the space search can be combined. While the space search presents the results in multi-dimensional space, the dimensions that were not used to span the space can be used to further filter resources. In this way, more advanced search operations are possible and more specific results can be presented.
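The combination of both searches can be sketched as a filter applied before the result space is spanned. This is a minimal sketch: the function name and data layout are assumptions, the filter is assumed to use AND semantics, and the resource titles and the step name "storage" are hypothetical.

```python
Resources = dict[str, dict[str, set[str]]]  # title -> {dimension -> tags}
Axis = tuple[str, list[str]]                # (dimension, tags shown)


def filtered_space_search(resources: Resources, rows: Axis, cols: Axis,
                          filters: dict[str, set[str]]):
    """Filter on non-axis dimensions (AND semantics assumed), then span the space."""
    (row_dim, row_tags), (col_dim, col_tags) = rows, cols
    grid = {(r, c): [] for r in row_tags for c in col_tags}
    for title, tags in resources.items():
        # Simple search part: every required tag of every filter dimension
        # must be assigned to the resource.
        if not all(required <= tags.get(dim, set())
                   for dim, required in filters.items()):
            continue
        # Space search part: place the resource in each matching cell.
        for r, c in grid:
            if r in tags.get(row_dim, set()) and c in tags.get(col_dim, set()):
                grid[(r, c)].append(title)
    return grid


# Hypothetical resources for illustration.
resources = {
    "Dataset A": {"resource type": {"dataset"},
                  "food supply chain": {"storage"}, "aspect": {"safety"}},
    "Paper B": {"resource type": {"scientific publication"},
                "food supply chain": {"storage"}, "aspect": {"safety"}},
}
grid = filtered_space_search(
    resources,
    rows=("food supply chain", ["storage"]),
    cols=("aspect", ["safety", "quality"]),
    filters={"resource type": {"dataset"}},
)
print(grid)
```

Here the dimension "resource type" is not mapped on an axis and is used only to narrow the resources shown in the table.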
