3.2.4. Sample·Panelist Interaction

If the panel members were in total consensus when assessing the descriptors in all samples, the sample·panelist interaction should not be significant. However, in this work, numerous significant cases were found (Table 1). The interaction is usually evaluated through the coefficients of the ANOVA, defined as the difference between the expected mean score by all panelists and that given by a specific one. Reproducing them for all descriptors would be tedious, so only the cases of skin red and flesh red are shown as examples (Figure 1). The effect may be significant for two reasons: (i) the panelists do not rank the samples in the same order and (ii) they do not use the scale in the same way. Both situations were found in this work. Different rankings were observed for several descriptors; for skin red, for example, panelist 1 gave the highest score to HA1, whereas panelist 2 ranked it second from the bottom. A similar behaviour occurred for flesh red between panelist 5 and panelist 6 (Figure 1).
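As an illustration of how such coefficients can be obtained, the following minimal sketch computes sample·panelist interaction coefficients for a balanced two-way layout as cell mean minus sample mean minus panelist mean plus grand mean (the usual sum-to-zero parameterization). Whether this matches the exact ANOVA model used in this work is an assumption; the sample labels (`S1`...), panelist labels (`P1`...), and all scores are invented for the example.

```python
from statistics import mean

# Hypothetical data: scores[sample][panelist] = mean score over sessions.
scores = {
    "S1": {"P1": 8.0, "P2": 3.0, "P3": 6.0},
    "S2": {"P1": 5.0, "P2": 6.0, "P3": 5.0},
    "S3": {"P1": 2.0, "P2": 9.0, "P3": 4.0},
}


def interaction_coefficients(scores):
    """Return {(sample, panelist): coefficient} for a balanced layout."""
    samples = list(scores)
    panelists = list(next(iter(scores.values())))
    grand = mean(scores[s][p] for s in samples for p in panelists)
    sample_mean = {s: mean(scores[s][p] for p in panelists) for s in samples}
    panelist_mean = {p: mean(scores[s][p] for s in samples) for p in panelists}
    # Interaction = what remains after removing both main effects.
    return {
        (s, p): scores[s][p] - sample_mean[s] - panelist_mean[p] + grand
        for s in samples
        for p in panelists
    }


coef = interaction_coefficients(scores)
for (s, p), value in sorted(coef.items()):
    print(f"{s} x {p}: {value:+.2f}")
```

Coefficients close to zero for every sample indicate a panelist who follows the panel; large positive or negative values point to the sample and panelist combinations that drive the interaction.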

**Figure 1.** Panel performance. Sample·panelist interaction coefficients for selected descriptors (skin red and flesh red).

On the other hand, for skin red, panelist 1 used a narrower scale than panelist 6; the same trend can be observed for flesh red between panelist 1 and panelist 12 (Figure 1). Therefore, to improve panel performance, further training will be required in the scoring of some attributes and in the amplitude of the scales used.
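The two sources of interaction described above (different ranking of the samples and different use of the scale) can be separated with a simple check. The sketch below is only illustrative: the helper names `rank_order` and `scale_range`, the labels, and the scores are hypothetical and are not taken from this work.

```python
# Hypothetical data: by_panelist[panelist][sample] = mean score for one descriptor.
by_panelist = {
    "P1": {"S1": 6.0, "S2": 5.5, "S3": 5.0},  # narrow use of the scale
    "P2": {"S1": 3.0, "S2": 9.0, "S3": 1.0},  # wide scale, different ranking
    "P3": {"S1": 9.0, "S2": 5.0, "S3": 1.0},  # wide scale, same ranking as P1
}


def rank_order(panelist_scores):
    """Samples sorted from the highest to the lowest score."""
    return sorted(panelist_scores, key=panelist_scores.get, reverse=True)


def scale_range(panelist_scores):
    """Amplitude of the scale actually used by a panelist."""
    values = panelist_scores.values()
    return max(values) - min(values)


for panelist, panelist_scores in by_panelist.items():
    print(panelist, rank_order(panelist_scores), scale_range(panelist_scores))
```

In this invented example, P1 and P3 agree on the ranking but differ in scale amplitude, whereas P2 and P3 use a similar amplitude but rank the samples differently; both patterns would contribute to a significant interaction.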

The coefficients of each panelist in the ANOVA model were then examined to identify the panelists who contributed most to the interaction [19]. For this purpose, the difference between the expected score and that given by a specific panelist, over all sessions and samples, represents how far that panelist's scores deviate from the product mean of the whole panel. No significant differences were usually observed (panelists had, in general, good reproducibility), but some peculiarities were noticed. For example, panelist A12 scored skin green (Figure 2A) appreciably higher than any other panelist and was, consequently, decisive for the significance of this interaction. Additionally, panelist A3 tended to score skin red, skin sheen, and flesh red above the panel average (Figure 2A).
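A panelist's average deviation from the product mean, taken over all sessions and samples, can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of reference [19]: the record layout `(panelist, sample, session, score)`, the function name `panelist_deviation`, and all values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (panelist, sample, session, score) for one descriptor.
records = [
    ("P1", "S1", 1, 5.0), ("P1", "S1", 2, 5.0),
    ("P1", "S2", 1, 3.0), ("P1", "S2", 2, 3.0),
    ("P2", "S1", 1, 5.0), ("P2", "S1", 2, 5.0),
    ("P2", "S2", 1, 3.0), ("P2", "S2", 2, 3.0),
    ("P3", "S1", 1, 8.0), ("P3", "S1", 2, 8.0),
    ("P3", "S2", 1, 6.0), ("P3", "S2", 2, 6.0),
]


def panelist_deviation(records):
    """Mean of (score - product mean of the whole panel) for each panelist."""
    by_sample = defaultdict(list)
    for _, sample, _, score in records:
        by_sample[sample].append(score)
    product_mean = {s: mean(v) for s, v in by_sample.items()}

    deviations = defaultdict(list)
    for panelist, sample, _, score in records:
        deviations[panelist].append(score - product_mean[sample])
    return {p: mean(v) for p, v in deviations.items()}


deviation = panelist_deviation(records)
for panelist, value in deviation.items():
    print(f"{panelist}: {value:+.2f}")
```

Here the invented panelist P3 scores two points above the panel on average, the kind of systematic overscoring that would single out a panelist in a plot such as Figure 2A.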

**Figure 2.** Panel performance. Sample·panelist interaction as assessed by (**A**) the panelists' contributions (coefficients) for selected descriptors (skin red, skin green, skin sheen, and flesh red), and (**B**) means of panelists over the whole panel according to samples.

Another way of observing the sample·panelist interaction and measuring the panelists' reproducibility is by plotting the mean per panelist over the mean of the whole panel according to samples. In agreement with the previous comments, some panelists gave high scores to several descriptors; in this line, panelist A12 overscored skin green in samples HL2, HA2, MAL2, and ML2 (Figure 2B). These high scores were due to a tendency of this panelist to evaluate several descriptors (flesh yellow and briny, data not shown) higher than other panel members. Similarly, outstanding scores were observed for panelist A5 in vinegary, alcohol, and sourness, and for panelist A8 in mouth coating, chewiness, stringency, and residual (data not shown). However, most of the panelists scored only one descriptor differently, such as A4 in grassy smell, A10 in cheesy smell, A3 in buttery, or A6 in rancid, to mention a few cases. Therefore, no panelist systematically contributed to the interaction, but the above-mentioned results could indicate that the panel performance would be improved by further training of some panel members (A12, A5, and A8 on several descriptors, or A4, A10, A3, and A6 only on specific ones). Kermit and Lengard Almli [19] also found several assessors who showed poor performance in some attributes, such as mealiness or fruity flavor.
