**Sensopeptidomic Kinetic Approach Combined with Decision Trees and Random Forests to Study the Bitterness during Enzymatic Hydrolysis Kinetics of Micellar Caseins**

**Dahlia Daher 1,2, Barbara Deracinois <sup>1</sup> , Philippe Courcoux 3,4, Alain Baniel <sup>2</sup> , Sylvie Chollet <sup>1</sup> , Rénato Froidevaux <sup>1</sup> and Christophe Flahaut 1,\***


**Citation:** Daher, D.; Deracinois, B.; Courcoux, P.; Baniel, A.; Chollet, S.; Froidevaux, R.; Flahaut, C. Sensopeptidomic Kinetic Approach Combined with Decision Trees and Random Forests to Study the Bitterness during Enzymatic Hydrolysis Kinetics of Micellar Caseins. *Foods* **2021**, *10*, 1312. https://doi.org/10.3390/foods10061312

Academic Editor: Mónica Carrera

Received: 29 April 2021 Accepted: 31 May 2021 Published: 7 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**Abstract:** Protein hydrolysates are, in general, mixtures of amino acids and small peptides able to supply the body with the constituent elements of proteins in a directly assimilable form. They are therefore characterised as products with high nutritional value. However, hydrolysed proteins display an unpleasant bitter taste and possible off-flavours which limit the field of their nutrition applications. The successful identification and characterisation of bitter protein hydrolysates and, more precisely, the peptides responsible for this unpleasant taste are essential for nutritional research. Due to the large number of peptides generated during hydrolysis, there is an urgent need to develop methods in order to rapidly characterise the bitterness of protein hydrolysates. In this article, two enzymatic hydrolysis kinetics of micellar milk caseins were performed for 9 h. For both kinetics, the optimal time to obtain a hydrolysate with appreciable organoleptic qualities is 5 h. Then, the influence of the presence or absence of peptides and their intensity over time compared to the different sensory characteristics of hydrolysates was studied using heat maps, random forests and regression trees. A total of 22 peptides formed during the enzymatic proteolysis of micellar caseins and influencing the bitterness the most were identified. These methods represent simple and efficient tools to identify the peptides susceptibly responsible for bitterness intensity and predict the main sensory feature of micellar casein enzymatic hydrolysates.

**Keywords:** bitterness; enzymatic hydrolysis; micellar caseins; off-flavours; peptidomics; random forests; regression trees; sensory analysis

#### **1. Introduction**

The enzymatic hydrolysis of milk proteins displays a generally unpleasant bitter taste. The perception of bitter taste plays a crucial role in their use in various application fields. Indeed, the bitter flavour of extensively hydrolysed proteins has been and continues to be a major hindrance for their use. In addition, bitterness is sometimes combined with off-flavours that also appear during hydrolysis. However, these milk protein hydrolysates have significant advantages such as in sport nutrition where the use of these hydrolysates induces a very rapid release of amino acids in the blood, which may maximise muscle protein anabolism and facilitate recovery [1]. Moreover, these hydrolysates make it possible to boost muscle synthesis in sensitive subjects such as the elderly [2]. They are also used in clinical and infant nutrition where milk protein hydrolysates are recommended for a rapid supply of amino acids while ensuring low protein allergenicity. Indeed, the allergenicity of a protein is reduced or eliminated when the protein is hydrolysed into a low molecular

weight peptide composition. Moreover, milk protein hydrolysates cater to the nutritional requirements of infants and toddlers, improving milk protein digestibility and reducing frequent spit-up.

For many years, scientists have performed important studies on explaining the appearance of bitterness in hydrolysates. For example, Murray and Baker were the first authors interested in the taste of protein enzymatic hydrolysates [3]. They found a bitter taste in enzymatic hydrolysates from caseins and lactalbumin, obtained with commercial proteinases, and a neutral taste in hydrolysates obtained from gelatine. Ichikawa et al. hydrolysed caseins, soy protein, ovalbumin and gluten with a proteinase from *Bacillus subtilis* and reported the development of a pronounced bitter taste with casein hydrolysates [4]. Various factors can influence the appearance of undesirable flavours in protein enzymatic hydrolysates, such as the nature of protein substrate(s) and enzyme(s), the hydrolysis duration, the selected pH and the temperature conditions. Concerning the casein proteins, β-, αS1- and κ-caseins produce the most bitter hydrolysates [5]. The causes for the bitterness were identified as early as 1970 by Fujimaki et al. and Matoba et al. [6,7] as being the presence of specific peptides rather than free amino acids in the protein hydrolysate. For example, the free forms of L-leucine and L-phenylalanine residues are bitter, with thresholds of 15–20 mM, but Leu-Leu or Ile-Leu and Leu-Phe are more than 10 times more bitter. Kim and Li-Chan (2006) and Iwaniak et al. (2018) confirmed that the bitter taste of peptides is determined by the presence of amino acids with high hydrophobicity [8,9]. According to Iwaniak's data, the bitterness of peptides results from the presence of residues with bulky and branched side chains such as Leu, Ile, Val, Tyr, Phe and Trp. The bitterness of peptides also increases as the number of amino acids increases. Moreover, some structural characteristics, such as the diastereoisomer of the L series, the presence of a proline residue at the geometric centre and/or close to a basic amino acid, hydrophobic amino acids at Nand C-terminal positions in the peptide, and two and three residues of Leu, Tyr, Phe at the C-terminal of the peptide, influence the bitterness. In addition, it has been claimed that there are no bitter peptides for lengths greater than 25 residues [7].

In a previous study [10], the comparison between the sensory characteristics and the principal components of the principal component analysis (PCA) of mass spectrometry data reveals that peptidomics constitutes a convenient, valuable, fast and economic intermediate method to evaluate the bitterness of enzymatic hydrolysates as a trained sensory panel can conduct it. Nevertheless, to go further in the understanding of the peptide-related bitterness appearance/disappearance during the hydrolysis time, an enzymatic hydrolysis kinetic study gathering a sensory evaluation and a peptidomics approach combined with machine learning algorithms were carried out. Herein, we have studied the enzymatic hydrolysis kinetics of micellar caseins subjected to hydrolyses using commercially available and food-grade proteases, allowing the production of more or less bitter hydrolysates. Organoleptic characteristics, and more particularly the bitterness, were quantified for each sample collected during the kinetics using a trained sensory panel. Then, peptides generated during hydrolysis were characterised by a peptidomics approach combining the peptide chromatographic separation by reversed-phase high-pressure liquid chromatography (RP-HPLC), the detection and fragmentation of peptides by tandem mass spectrometry (MS/MS) and the mass data management. Finally, we studied the nature of the generated peptides and their influence in the appearance of bitterness during the hydrolysis process by using a method based on differential expression analysis, heat maps, regression trees and random forests.

#### **2. Material and Methods**

#### *2.1. Enzymatic Hydrolysis Kinetics*

Micellar caseins (ratio micellar caseins/whey proteins (92:8)) were prepared by the Ingredia S.A. manufacturer (St-Pol-Sur-Ternoise, France) using industrial processes. These proteins were hydrolysed with the food grade enzymes (Table 1) Flavourzyme and Protamex that were obtained from Novozymes (Bagsvaerd, Denmark) and allowed the preparation

of a kinetics named 109, and Promod 523MDP ™ and FlavorPro 937 ™ were obtained from Biocatalysts and allowed the preparation of a kinetics named 125 (Wales, UK).


**Table 1.** Characteristics of Novozymes proteases.

\* Leucine amino peptidase units per gram (LAPU/g); Anson unit per gram (AU-N/g); Gelatin digestion units per gram (GDU/g).

The enzymatic hydrolyses were performed for nine hours using a confidential recipe. Overall, the protein solution of micellar caseins (92%) was diluted with distilled water to a concentration of 10% of total nitrogenous matter and brought to the desired pH by adding NaOH (4N). The necessary enzyme quantity was then added directly if it was in liquid form or solubilised in distilled water if it was in powder form. The hydrolysis monitoring was carried out by collecting data from pH, temperature and osmometry. Then, the degree of hydrolysis (DH) was determined using Nielsen et al.'s method based on the reaction of primary amino groups with ortho-phthaldialdehyde (OPA) [11], and the DH was calculated as previously described [10].

Samples were taken every hour and the enzymes were inactivated by heating at 98 ◦C for 3 min. About 1.5 L of hydrolysates was dried by atomisation using the Mini Spray Dryer B-290 from BUCHI (Rungis, France). The drying process was performed following the same procedure described previously [10]. Each hour, an aliquot of each hydrolysis was frozen at −20 ◦C before further analyses.

#### *2.2. Analyses of Samples*

#### 2.2.1. Sensory Analysis

Panel Composition and Training

The sensory analysis was carried out with a total of 19 healthy adults (12 females and 7 males, aged from 45 to 65 years old). They were enrolled in a training program, for 20 months, designed to identify and quantify the different descriptors chosen to characterise the hydrolysates. The descriptors were: (i) five odours (milk, fermented milk, rancid, soymilk and smelly), (ii) eleven flavours (bitter, sour, milk, sweet, mild, cheese, vanilla, salty, rancid, barn and whey) and (iii) five persistence flavours (bitter, sour, milk, sweet and cheese). The quantification of each descriptor was performed using a scale from 1 (low) to 7 (high). In this study, only bitterness data will be processed. Before starting this experiment, the performance of the assessors in terms of discrimination, repeatability and agreement was validated.

Tasting Conditions

The assessors evaluated the nine samples in a duplicate manner during four sessions (two sessions with four samples and two with five), for both kinetics. Those sessions were performed under standard sensory conditions (ISO 13299, 2003). Samples were presented in a sequential monadic way and their presentation order was based on a Williams' Latinsquare arrangement. Samples were dissolved in mineral water at a concentration of 10% of dry matter and 20 mL was presented in white plastic tumblers to each assessor and served at room temperature. The tests were performed in individual booths under white lighting and at 20 ± 2 ◦C. No time restriction was imposed on the assessors to perform this test.

Sensory Data Analyses

For both hydrolysis kinetics (109 and 125), sensory data were first assessed by a twoway ANOVA considering the samples and the consumers as factors and the bitterness scores as the dependent variable. A Duncan's multiple range test (*p* ≤ 0.05) was performed to compare the samples two by two. These statistical analyses were computed using XLStat (XLStat 2020 1.1, Paris, France).

#### 2.2.2. Mass Spectrometry: Sample Preparation and Peptide Characterisation Using HPLC-ESI-Qtof-MS/MS and Bioinformatics Treatment

The samples were prepared and analyzed in duplicate with the same method used in a previous study [10]. Briefly, peptides were purified and concentrated using a C18 solid phase extraction and 10 µL was separated using a reversed-phase high-performance liquid chromatography and an apolar gradient of 60 min: 1% ACN/0.1% of formic acid (FA) (*v/v*) for 3 min, then 1 to 30% ACN/0.1% FA (*v/v*) for 42 min, 30 to 95% ACN/0.1% FA (*v/v*) for 10 min and finally 95 to 99% ACN/0.1% FA (*v/v*) for 5 min. The analysis of eluted peptides was performed with a Synapt-G2-Si (Waters) mass spectrometer in sensitivity, positive and data-dependent analysis (DDA) modes (HPLC-MS/MS). Several quality control (QC) samples corresponding to (i) the mixture in equivalent volume of all C18-purified samples of both kinetics, (ii) the mixture of samples of kinetics 109 and (iii) those of kinetics 125 were also analysed at the beginning, middle and end of the HPLC-MS/MS analysis session.

Raw data from all HPLC-MS/MS runs were imported in Progenesis QI for proteomics software (Version 4.1, Nonlinear Dynamics, Newcastle upon Tyne, UK). First, data filtering was conducted before peak picking where a maximum charge of +4, a retention time defined between 5 and 50 min and a minimum intensity of 1000 were applied. Then, data alignment was automatically managed by Progenesis software using one of all QC runs as reference. Subsequently, manual alignment was performed if necessary, to optimise run alignment, and data normalisation was automatically performed for principal component analysis (PCA). The filtering criteria used for the statistical comparison of mass signals of HPLC-MS/MS runs were set as follows: (i) a maximum coefficient of ANOVA less or equal to 10−<sup>10</sup> and (ii) only the identified peptides. Concomitantly, Progenesis software reported the quantitative evolution of peptides in terms of normalised abundance in the different hydrolysates. The variables used are derived from the comparison of peptide maps, i.e., the position of the isotopic massifs and their intensity. The reprocessing of mass spectrometry data and database searches to identify the peptides were performed via Peaks Studio version 10+ (Bioinformatics Solutions Inc., Waterloo, ON, Canada) using the UniProt database (10 September 2018) restricted to the complete proteome of *Bos taurus* organism. The parameters of mass tolerance thresholds, number of missing cleavage sites tolerated, choice of enzyme, and false discovery rate (FDR) were the same as previously [10].

#### *2.3. Relationship between Sensory and Mass Data*

The link between identified peptides and sensory perception of the samples was investigated using various methods: a heat map with differential expression analysis, and regression trees and random forest methodologies.

#### 2.3.1. Heat Map

A heat map was drawn from the MS-data corresponding to identified peptides from micellar caseins and their quantification in the 18 samples of both kinetics 109 and 125; the peptides corresponding to the features and the hydrolysis samples to the individuals. The heat map reflects the matrix data so that the values (normalised peptide abundance) are replaced by colour intensities ranging from yellow (low abundance) to red (high abundance). Cluster analysis was also carried out based on the heat map and the results were drawn as tree maps in the heat map [12].

Differential expression was also used to identify peptides that significantly influence the bitterness of hydrolysates. This latter was performed by merging kinetics 109 and 125. For the differential expression test, two groups were established: the group named "1" will be considered as more bitter and the "2" as less bitter. The split between these two groups was determined visually as follows: the samples were ranked in descending order of bitterness and the split was defined at the point where the greatest difference between two successive values was observed.

#### 2.3.2. Regression Trees and Random Forest

Regression trees (RTs) optimally subdivide the samples by a set of decision rules. These rules are constructed by iteratively separating the dataset with binary splits based on the choice of one predictor variable and an associated threshold value. The random forest (RF) algorithm generates multiple trees without pruning, improving the stability of the model. This is achieved by a double process of randomisation: (i) a random selection of the predictors at each node of each tree and (ii) each tree is grown on a different random data subset, selected by bootstrapping, i.e., sampling from the initial samples with replacement. The data portion used for the training phase is known as the "in-bag" data, whereas the rest is called the "out-of-bag" data. The latter will provide estimates of predicting errors [13]: the root mean square error of this predicted value is computed on the out-of-bag samples (RMSEOOB).

In our case, RFs consist of modelling a sensory variable (bitterness descriptor) as a function of a number of predictors (presence or absence of peptides as well as their normalised abundance). The variables correspond to the 116 identified peptides. The aim here is to find out which peptides have a strong importance in understanding the intensity of the bitter descriptor. For this purpose, 50 forests of 5000 trees have been built. RFs will allow us to obtain the importance of each peptide and a confidence interval is computed around the importance of the peptides. All the peptides whose lower bounds of the confidence interval are greater than 0 are selected and are thus involved to predict the intensity of the studied attribute. − *≤* 

RTs and RFs have been carried out using language R 3.5.1 [14] and the R packages rpart [15] and random Forest [16].

#### **3. Results**

#### *3.1. Influence of Hydrolysis Kinetics on the Sensory Characteristics of Hydrolysates*

Figure 1 shows the evolution of bitterness intensity and DH during kinetics 109 (a) and 125 (b). ANOVA shows that samples are significantly discriminated (*p* ≤ 0.05) for both kinetics.

■ ▲ ■ ▲ − ≤ 0.05 **Figure 1.** Evolution of bitterness and DH during hydrolysis kinetics 109 (**a**) and 125 (**b**). The black kinetics 109 () and kinetics 125 (N), respectively. The dotted orange lines represent the evolution of DH for kinetics 109 () and kinetics 125 (N), respectively. The bitterness intensity values from 1 (low) to 7 (high) are means +/− standard deviation (*n* = 2): the different letters (a, b, c) indicate means that significantly differ among the nine samples at *p* ≤ 0.05 according to Duncan's multiple range test.

Globally, the bitterness intensity (black line, Figure 1a) of the samples of the hydrolysis 109 decreases over the time. During the first three hours, the intensity is at its highest

75 peptides from β casein, 19 from α

casein, 10 from α from β

V224. The α and α

κ and β

−

Peptides identified from β

tides from α

level and stagnates at the value of 5.20. Then, a decrease begins from 5.20 to 3.00 in 2 h of hydrolysis (between 3 and 5 h of hydrolysis) and remains stable at the bitterness level of 3 for the kinetics' remaining time. After nine hours of hydrolysis, the DH (dotted orange line, Figure 1a) reaches a value of 50.8%. A consequent increase in the DH is observed between the 4th (13.9%) and 5th hour (44.1%), which is concomitant with the decrease in the bitterness intensity. A Pearson correlation coefficient (*r =* −0.933; *p* ≤ 0.001) between DH and bitterness score suggests that the higher the DH, the less bitterness in the samples.

Concerning kinetics 125 (black line, Figure 1b), the bitterness appears to be stable over time. The minimum bitterness value is observed after two hours of hydrolysis with an intensity of 2.18 ± 0.37. A progressive increase in bitterness is observed after the 5th hour of hydrolysis and until the end of the hydrolysis, as indicated by the bitterness values which increase from 2.23 ± 0.46 to 3.59 ± 0.50. The DH (dotted orange line, Figure 1b) increases over time to reach the maximum value of 28.9% after nine hours of hydrolysis. Here, again a high increase, ranging from 10.9% to 22.6%, is observed between the 3rd and 5th hour of hydrolysis. However, contrary to kinetics 109, the bitterness intensity increase follows the DH increase, especially from the fifth hour of kinetics.

#### *3.2. Peptide Characterisation and Peptide Abundance Evolution during the Hydrolysis*

The HPLC-MS/MS raw data obtained for the 36 withdrawn samples (nine collected samples × two kinetics × two replicates) and the nine QCs (QC 109 × three replicates, QC 125 × three replicates and QC109–125 × three replicates) were imported in Progenesis QI for proteomics software. Among the 2635 peak picked mass signals, 479 mass signals have an ANOVA < 10−10, and among them, 116 mass signals were identified as milk protein peptides (Supplemental Table S1). These latter represent the global diversity, all hydrolysates combined, of identified peptides. Overall, 75 peptides from β-casein, 19 from α-S1 casein, 10 from α-S2 casein, 9 from kappa-casein, and 3 from β-lactoglobulin and no peptides from α-lactalbumin were identified. Between 114 and 116 peptides were identified per sample collected during the kinetics. The size features of identified peptides are: (i) a length comprising between 6 and 23 amino acids with a length mean of 11 amino acids and (ii) a molecular mass mean of 1303.37 ± 335.46 Da.

Peptides identified from β-casein corresponded mainly to three protein regions: Y75- G109, A116-F134 and T142-V224. The α-S1 casein- and α-S2 casein-peptides corresponded to three protein regions (G25-G48, L114-M138 and P192-P212) and two protein regions (L111-N130 and R185-A204), respectively. The κ-casein- and β-lactoglobulin-peptides corresponded to two protein regions (F39-G60 and F76-L95) and one protein region (V57- L73), respectively.

PCA was performed using the 116 identified peptides (shown as light grey numbers in Figure 2) whose amino acid sequences are gathered in Supplemental Data S1. Figure 2 shows the first two principal components and illustrates the correlations between the 36 withdrawn samples. These principal components #1 and #2 explain 82.16% of the variance, and in such PCA, the more distant the groups, the more different in terms of peptide population. In the biplot presented in Figure 2, the technical replicates (same colour points) of each sample (including QCs) are close to each other, as can be seen with the examples shown with a red arrow on the PCA, indicating good technical repeatability. Moreover, the QCs of kinetics 109 (in yellow) are found at almost equal distance between the two groups formed by kinetics 109, and it is the same for those of kinetics 125 (in dark to light blue) and the QC of kinetics 109–125, which are found between the three groups represented on the PCA.

**Figure 2.** Principal component analysis corresponding to the 116 identified peptides of kinetics 109 and 125. In yellow the quality controls, corresponding to the equimolar mixture of the samples of kinetics 109 (QC 109), in blue "water green" those of kinetics 125 (QC 125) and in black those of all the samples combined (QC 109–125). Each QC appears as three replicates corresponding to an injection at the beginning, middle and end of the LC-MS/MS analysis. Each sample of both kinetics was analysed in replicates (samples with the same colour on the PCA as shown for the sample 109-5 (red arrows)), corresponding to a total of 36 samples: nine samples of hydrolysis kinetics 125 ranging from dark blue (125-9) to light blue (125-1) and nine samples of hydrolysis kinetics 109 ranging from brown (109-9) to very light red (109-1).

(αS1 β αS β An agglomerative hierarchical clustering (AHC) allowed us to display three groups on the PCA: (i) a group circled in brown (top right) gathering samples 109-1, -2, -3, (ii) a group circled in burgundy red (top left) gathering samples 109-4, -5, -6, -7, -8, -9, and (iii) a central group circled in blue gathering all samples of kinetics 125. Notably, the evolution, according to the hydrolysis time, of peptide heterogeneity is clearly evidenced on the PCA of T1 to T9 of kinetics 125, which moves from right to left (from dark blue to light blue). As for the sensory analysis, a difference is observed between samples 109-1, -2, -3, which are significantly more bitter than the other samples of the kinetics. The samples of kinetics 125 are positioned between the two groups of kinetics 109 and thus appear to have peptide sequences common to both groups and with intermediate normalised abundances.

The Progenesis QI software uses the peptide identities and their MS-based abundance data to generate an explicit picture of the evolution of normalised abundance of the peptide during hydrolysis kinetics (Figure 3a,b). As illustrated in Figure 3a,b, the peptides FVAPFPE (αS1-CN (39–45)) and LYQEPVLGPVRGPFPI (β-CN (207–222)) are more abundant during the first three hours of kinetics 109 (left part of curves), and conversely have negligible normalised abundance in the other samples collected during kinetics 109 and 125. On the other hand, Figure 3b shows normalised abundance curves according to hydrolysis times of peptides LQYLYQGPIVL (αS2-CN (111–121)) and YPFPGPIPNSLPQN (β-CN (75–88)), more abundant during kinetics 125. The latter are not or only very weakly present in samples 109-1, -2, -3, considered as the most bitter, suggesting that they do not bring significant bitterness to the samples. They would therefore not be responsible for the difference in bitterness between samples 109-1, -2, -3 and the others.

#### *3.3. Relationship between Generated Peptides and Bitterness during Hydrolysis* 3.3.1. Heat Map

The heat map presented Figure 4 shows the differences in terms of normalised abundances between the samples of both kinetics in a more visual way than a table. On the heat map, the peptides are grouped in rows and the samples withdrawn during the hydrolysis kinetics (109 and 125) in columns. The peptides are divided into two groups: A' and B' (left dendrogram) and the samples are divided into two groups: A and B (top dendrogram).

**Figure 3.** Examples of two opposite evolutions of the normalised abundance of peptides (**a**) whose abundance is highest at the beginning of kinetics 109 and (**b**) whose abundance is highest at the end of kinetics 125. The red and blue colour represent kinetics 109 and kinetics 125, respectively. (HM: highest mean; LM: lowest mean).

In order to identify the peptides responsible for the difference in bitterness, a differential expression analysis was performed by merging kinetics 109 and 125. Among the 18 samples, two different groups were formed according to the bitterness scores obtained by sensory analysis (group #1 and group #2). Group #1 includes the samples 109-1, 109-2, and 109-3, which all had a bitterness intensity greater than 3.91, and group #2 includes the remaining 15 samples of both kinetics with an intensity equal to or lower than 3.91. Among the 116 identified peptides, only 54 are significant (*p* ≤ 0.05—noted with an asterisk (\*) in Figure 4) which means that they contribute to the difference in terms of bitterness between groups #1 and #2.

≤ 0.05 The peptide group B' corresponds to the 54 significant peptides discriminated by the differential analysis, and explains the difference in bitterness between the samples in groups A and B. The peptide group A', corresponding to the non-significant peptides of the differential analysis, explains the difference between the samples of kinetics 125 and 109 (without the samples 109-1, 2 and 3). The sample group A corresponds to the three samples 109-1, 109-2 and 109-3 with the highest bitterness intensities according to the sensory data obtained by the panel. These three samples are well differentiated on the heat map: the red colour (corresponding to the peptide group B') and the yellow colour (corresponding to the peptide group A') rectangles on the left show that for the first samples of hydrolysis 109, we have a relatively high normalised abundance of group B' peptides compared to group A'. According to their normalised abundance, these peptides as well as their quantity tend to explain the high bitterness of the first samples at the beginning of the hydrolysis. The second group of samples (B) includes the subgroup of the samples of kinetics 125 in the following order: 125-2, 125-1, 125-3, 125-4, 125-5, 125-8, 125-9, 125-6, and 125-7, and the subgroup of the remaining samples of kinetics 109: 109-4, 109-8, 109-9, 109-5, 109-6 and 109-7. The samples of the kinetics 109 subgroup with the exception of 109-1, 2 and 3 are characterised by high normalised abundances of the A' peptide group. The kinetics 125 subgroup samples are characterised by high normalised abundances of the first eight peptides of group A' and intermediate normalised abundances of the remaining peptides.

≤ 0.05 **Figure 4.** Heat map related to mass spectrometry data. The peptides marked with \* correspond to the peptides that explain the significant difference between the two groups of hydrolysates obtained with the differential analysis (*p* ≤ 0.05). Peptides marked with an "x" are the peptides identified through random forests as the most influential in explaining the bitterness of hydrolysates.

#### 3.3.2. Regression Trees and Random Forests

RT and RF were also performed by merging the samples of kinetics 109 and 125. The importance of peptides as predictors of the main taste characteristic of enzymatic hydrolysates, namely bitterness, is presented in Figure 5. The measure of this importance quantifies the contribution of each peptide to the prediction of the sensory profile. Based on the confidence interval, we noticed that more than three quarters (94) of the peptides influenced bitterness. In order to build an accurate and parsimonious model and to identify the best subset of predictors from the 94 pre-selected predictor variables, we reduced the number of predictors by adding the peptides sequentially, from the most important to the least important one: all possible subsets from 5 to k variables are considered in turn. For each subset, an RF is constructed with the same parameters as before. The quality of the prediction was computed for each model generated and evaluated using the RMSEOOB index. The lower it is, the better the quality of the prediction will be. The optimal RF was obtained for 22 peptides with a RMSEOOB of 0.3405. The latter therefore corresponds to the first 22 peptides presented in Figure 5.

**Figure 5.** Random forests on the descriptor bitter: importance of the 116 peptides. The confidence intervals (95%) were obtained with 50 random forests of 5000 trees.

The model obtained with these selected 22 peptides was applied to both kinetics for predicting hydrolysate bitterness profiles (Figure 6). Figure 6 represents the evolution of the bitterness intensity of samples during kinetics 109 and 125 over time, with the observed values (full lines associated with full circles and triangles) and the predicted values (dotted lines associated with red empty circles and black triangles). The prediction is quite good, with a mean error of 0.34 for the predicted perception of bitterness and a correlation of 0.93 between observed and predicted values.

Thanks to the 22 peptides, an optimal RT has been built (Figure 7). The first peptide which splits the initial 18 samples into 2 groups was FVAPFPE at a normalised abundance threshold value of 2,431,772. The three samples (node 7, *n* = 3) with a value higher than this threshold were considered as the most bitter. These latter are the three first samples of kinetics 109 (109-1, 109-2 and 109-3). On the other hand, fifteen hydrolysates with a FVAPFPE value below this threshold were grouped together (nodes 3, 5 and 6 with, respectively, 6, 8 and 1 sample(s)). Then, when the normalised abundance of NIPPLTQTPVVVPPF was

lower than 56,200.67, six samples were separated from the others and revealed the least bitterness perception. A last split was performed another time with the peptide FVAPFPE and beyond the abundance of 656,107.8. The hydrolysate was again considered more bitter when the peptide abundance was higher than this value.

**Figure 6.** Prediction of the bitterness of out-of-bag (OOB) samples. The red colour represents the evolution of the intensity of samples during kinetics 109 and the black colour represents the samples of kinetics 125. The full lines with full-circles or -triangles represent the observed values and the open-circles and -triangles represent the predicted values.

**Figure 7.** Optimal regression tree built to predict the hydrolysate bitterness from the 22 identified peptides. The boxplots represented below the tree show the intensity and the gradual evolution of bitterness. *n* = number of hydrolysates for each group defined by a different level of bitterness with sample reference number. Node 3 (*n* = 6) corresponds to hydrolysates 125-1, -2, -3, -4, -5 and -6; Node 5 (*n* = 8) corresponds to hydrolysates 125-7, -8, -9/109-5, -6, -7, -8 and -9; Node 6 (*n* = 1) corresponds to hydrolysate 109-4; Node 7 (*n* = 3) corresponds to hydrolysates 109-1, -2 and -3.

called "ki-

netics 109" and "kinetics 125".

#### **4. Discussion**

The enzymatic hydrolysis of caseins is especially known for the appearance of bitterness, which is a hindrance to their use in the agri-food industry [17]. The first step of this study was to analyse the bitterness profile of two different hydrolysis kinetics called "kinetics 109" and "kinetics 125". For these given hydrolysis conditions, the sensory evaluation, driven with a trained sensory panel, reveals that significantly different bitterness levels are obtained depending on the hydrolysis time. This latter is more visible for kinetics 109, for which we obtained a significant drop in bitterness after about three hours of hydrolysis, reaching an intensity of 3. It is well known that the development of a specific sensory profile in protein hydrolysates depends on the protein source, enzyme specificity and the conditions of the hydrolysis [18,19]. Based on all these rules, we selected and combined some specific enzymes known in the literature to aid in the development of low bitterness hydrolysates for applications in nutrition and the agri-food market. These enzymes are Flavorpro 937MDP ™, a mixture of endo- and exoproteases, which according to the manufacturer Biocatalysts, has been developed to remove the excessive bitter-tasting peptides produced when using animal and bacterial proteases; Promod 523MDP ™, an endoprotease with a bromelain activity, which is effective in the production of highly digestible proteins [20]; Protamex, which also substantially reduces bitterness when hydrolysing caseins, as indicated by the company Novozymes [21]; and finally Flavourzyme, which contains both endo- and exoprotease activities and has shown its efficiency in obtaining less bitter milk protein hydrolysates compared to other enzyme preparations [22]. This efficiency has been linked to the presence of high exopeptidase activity within this preparation [23,24]. Indeed, the exopeptidases cleave peptides from their C- or N-terminal extremities, allowing a reduction in the bitterness due to terminal hydrophobic amino acid residues, for example [25].

The sensory study of kinetics 109 and 125 allowed us to monitor on one hand the evolution of certain sensory features, and on the other hand the physico-chemical parameters, such as the DH. Concerning the latter, it emerged that it was not always correlated with the bitterness of the hydrolysates, as already confirmed by a lot of studies [26–29].

The second step was to identify the peptides generated during the time course of hydrolyses and concomitantly quantify their normalised abundance. These mass spectrometry data were then analysed by performing a heat map combined with a differential expression analysis of both kinetics combined. These statistical tests are very often used in the presence of big data such as OMICS-type data [30]. This heat map brings an overview of the peptide abundance evolution during hydrolysis kinetics and an image of the proximity of the samples. For kinetics 109, the lower peptide cluster is very abundant at the beginning of hydrolysis, then decreases over time while remaining stable during the last hours of the hydrolysis. Combined with the differential expression, 54 peptides have been identified as responsible for the bitterness difference. These latter are the most abundant peptides in the first three bitterest samples 109-1, 109-2 and 109-3, except LKKYKVPQLE, VYQHQKA and LSQSKVLPVPQgKA. The evolution of peptide abundance during kinetics 125 seems to be almost identical for all the samples constituting it, except for samples 125-1, -2, -3, which would explain the stability of its bitterness over time.

One of the main strengths of this study concerns the use of RT and RF methodologies to predict the bitterness of samples. It is important to note that single trees are easy to interpret and to establish the relationships inside the dataset. However, they are unstable, and small perturbations in the dataset can completely change their structure. Therefore, for the prediction of the sample taste, the use of the whole RF is more convenient. In fact, we obtained a very satisfactory quality index with a value of 0.34, meaning that the bitterness intensity can be predicted with a confidence interval of 0.34. Such a value is a good estimation in sensory evaluation. Moreover, the results showed that the RF highlighted the importance of peptides in explaining the bitterness of the 18 samples. As shown in Figure 5, the 22 most influential peptides selected to construct the RF are among the 54 peptides most involved in the differentiation of the two sample categories #1 and #2

derived from differential expression. The simultaneous presence of this set of peptides and their abundance are the cause of the difference in bitterness existing between the samples. An additional verification was made with the BIOPEP database. Available online: http://www.uwm.edu.pl/biochemia/index.php/en/biopep (accessed on 10 February 2021) [31] and the literature to determine if some of those 22 peptides were already reported as being bitter. In this database, all data about the taste of the peptides were obtained from sensory studies described in the literature. The peptides YQEPVLGPVRGPFP, YQEPVL-GPVRGPFPIIV, APKHKEMPFPKYPVEPF, MAPKHKEMPFPKYPVEPF, AMAPKHKEMPF-PKYPVEP and PVLGPVRGPFP had already been identified [17,32–34]. The studies of Karametsi et al. highlighted the following bitter peptides: GPVRGPFPIIV and YQEPVL-GPVRGPFPI [33], and those of Toelstede et al. the sequence GPVRGPFP [35]. These results show that their primary structure is similar to peptides LLYQEPVLGPVRGPFPIIV, LYQEPVLGPVRGPFPI, LYQEPVLGPVRGPFP, LYQEPVLGPVRGPFPIIV and LLYQEPVL-GPVRGPFP that we have identified in this study, suggesting that this protein region of the β-casein is conducive to the release of bitter peptides. Besides that, the FVAPFPE peptide is already known to display angiotensin converting enzyme inhibitory activity [36], but no information on its taste has been given. However, a peptide of close structure "FFVAPF-PEVFGK", referenced in the BIOPEP database, has been identified as bitter. Moreover, a correlation between the bitter taste of a peptide and bioactivity has been demonstrated [19]. The VEELKPTPEGDLEIL peptide was also characterised as bitter by Spellman et al. (2015). No information was found on the other peptide sequences. Thus, a large majority of peptides highlighted by RFs have already been identified in the literature. However, it is worth to note that it is the combination of the presence and/or the absence and the association of different peptides which is responsible for hydrolysate's bitterness.

However, this prediction model has a major limitation since it can only be used with the same enzymes used in this study. The use of RFs makes it possible to take into account the entire complex mixture represented by a peptide hydrolysate. Indeed, the synergistic and antagonistic effects that may exist between peptides and their impact on bitterness are considered.

#### **5. Conclusions**

The impact of bitterness on food rejection has been studied extensively. Therefore, the development of protein hydrolysates with low levels of bitterness is an essential challenge for their incorporation in various foods. The hydrolysates formulated here may be used in the development of future food formulations such as peptide-fortified ready-to-drink infant formulae and low pH beverages such as fruit juices, for instance.

The data generated may be employed to inform the selection of a certain type of enzyme preparation and target the degree of hydrolysis values to generate hydrolysates with adequate sensory properties. A peptide hydrolysate is a complex mixture where interactions between peptides can complicate their study. However, random forests appear to be a useful tool for their analysis. Moreover, the use of RT and RF methodologies allows us, on one side, to highlight peptides involved in the explanation of the bitterness of samples, and on the other side to predict the bitterness profile of a micellar casein hydrolysate.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10 .3390/foods10061312/s1, Table S1: list of peptides identified by the HPLC-MS/MS analyses.

**Author Contributions:** D.D.: investigation, formal analysis, visualization, data curation, writing original draft; B.D.: investigation, formal analysis, validation, data curation, supervision, writing original draft; P.C.: investigation, formal analysis, validation, data curation, writing—original draft; A.B.: methodology, supervision, writing—reviewing and editing; S.C.: methodology, formal analysis, data curation, supervision, funding acquisition, writing—reviewing and editing; R.F.: methodology, supervision, funding acquisition, writing—reviewing and editing; C.F.: conceptualization, methodology, visualization, validation, data curation, supervision, funding acquisition, writing—reviewing and editing. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work has been carried out in the framework of the joint laboratory between the Charles Viollette institute and the Ingredia project (Allinpep) and in the framework of the Alibiotech research program which is financed by the European Union, French State and the French Region of Hautsde-France. The HPLC-MS/MS experiments were performed on the REALCAT platform funded by a French governmental subsidy managed by the French National Research Agency (ANR) within the frame of the "Future Investments'program (ANR-11- EQPX-0037)". The Hauts-de-France region and the FEDER, the Ecole Centrale de Lille and the Centrale Initiatives Foundation are also warmly acknowledged for their financial contributions to the acquisition of REALCAT platform equipment.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** Data is contained within the article or supplementary material.

**Conflicts of Interest:** The project was funded by Ingredia S.A. Dahlia Daher and Alain Baniel are employed by Ingredia S.A.

#### **References**


## *Article* **Proteomic Characterization of Bacteriophage Peptides from the Mastitis Producer** *Staphylococcus aureus* **by LC-ESI-MS/MS and the Bacteriophage Phylogenomic Analysis**

**Ana G. Abril <sup>1</sup> , Mónica Carrera 2,\* , Karola Böhme <sup>3</sup> , Jorge Barros-Velázquez <sup>4</sup> , Benito Cañas <sup>5</sup> , José-Luis R. Rama <sup>1</sup> , Tomás G. Villa <sup>1</sup> and Pilar Calo-Mata 4,\***


**Abstract:** The present work describes LC-ESI-MS/MS MS (liquid chromatography-electrospray ionization-tandem mass spectrometry) analyses of tryptic digestion peptides from phages that infect mastitis-causing *Staphylococcus aureus* isolated from dairy products. A total of 1933 nonredundant peptides belonging to 1282 proteins were identified and analyzed. Among them, 79 staphylococcal peptides from phages were confirmed. These peptides belong to proteins such as phage repressors, structural phage proteins, uncharacterized phage proteins and complement inhibitors. Moreover, eighteen of the phage origin peptides found were specific to *S. aureus* strains. These diagnostic peptides could be useful for the identification and characterization of *S. aureus* strains that cause mastitis. Furthermore, a study of bacteriophage phylogeny and the relationship among the identified phage peptides and the bacteria they infect was also performed. The results show the specific peptides that are present in closely related phages and the existing links between bacteriophage phylogeny and the respective *Staphylococcus* spp. infected.

**Keywords:** pathogen detection; LC-ESI-MS/MS; proteomics; mass spectrometry; phage peptide biomarker

#### **1. Introduction**

The vast majority of mastitis cases are due to an intramammary infection caused by a microorganism belonging to either the *Staphylococcus* or *Streptococcus* genus [1,2]. *Staphylococcus aureus* is considered one of the major foodborne pathogens that can cause serious food intoxication in humans due to the production of endotoxins; this pathogen remains a major issue in the dairy industry due to its persistence in cows, its pathogenicity, its contagiousness and its ease of colonization of the skin and mucosal epithelia [3–5].

It is well-known that *S. aureus* bacteriophages encode genes for staphylococcal virulence factors, such as Panton-Valentine leucocidin, staphylokinase, enterotoxins, chemotaxisinhibitory proteins or exfoliative toxins [6]. These phages are usually integrated into bacterial chromosomes as prophages, wherein they encode new properties in the host, or vice versa, as transcriptions may hardly be affected by gene disruptions [7]. Phage-encoded recombinases, rather than the host recombinase, RecA, are involved in bacterial genome excisions and integrations [8,9]. These integrations may occur at specific bacterial genome sites that are identical to those present in the DNA of the phage, or, as in the case of phage

**Citation:** Abril, A.G.; Carrera, M.; Böhme, K.; Barros-Velázquez, J.; Cañas, B.; Rama, J.-L.R.; Villa, T.G.; Calo-Mata, P. Proteomic Characterization of Bacteriophage Peptides from the Mastitis Producer *Staphylococcus aureus* by LC-ESI-MS/MS and the Bacteriophage Phylogenomic Analysis. *Foods* **2021**, *10*, 799. https://doi.org/10.3390/foods10040799

Received: 8 March 2021 Accepted: 6 April 2021 Published: 8 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Mu (as long as the given gene is not expressed), some phages can integrate randomly within the bacterial genome. In addition, bacteriophage and staphylococcal species interactions may substantially alter the variability of the bacterial population [10,11].

All known *S. aureus* phages are composed of an icosahedral capsid filled with doublestranded DNA and a thin, filamentous tail, and they belong to the order *Caudovirales* (tailed phages) [12,13]. Some *Podoviridae* family phages, such as the *Staphylococcus* viruses S13′ and S24-1, have been reported, characterized and used in phage therapy against *S. aureus* infections [14]. There are some well-known *Siphoviridae* phages of *S. aureus*, such as the prophage ϕSaBov, which is integrated into a bovine mastitis-causing *S. aureus* strain [15].

The interaction between bacteria and bacteriophages leads to an exchange of genetic information, which enables bacteria to rapidly adapt to challenging environmental conditions and to be highly dynamic [11,16]. As closely related phages normally occupy the same genome location in different bacteria, a specific site in different bacterial strains can be occupied by completely different phages or can be empty.

Conventional culture-based methods have been used for the detection of pathogenic bacteria [17,18] and their phages [19,20]; however, at this point, these procedures are timeconsuming and laborious. For this reason, new, rapid molecular microbial diagnostic methods based on genomics and proteomics tools have been developed to achieve faster and more efficient bacterial and bacteriophage identification [1,21–24]. Specifically, phage typing is a classic technique for such purposes [25]. Moreover, biosensors based on phage nucleic acids, receptor-binding proteins (RBPs), antibodies and phage display peptides (PDPs) have been used for pathogen detection [26–30].

Mass spectrometry techniques, such as MALDI-TOF MS (matrix-assisted laser desorption/ionization time-of-flight mass spectrometry) and LC-ESI-MS/MS (liquid chromatographyelectrospray ionization-tandem mass spectrometry), have been used for the analysis and detection of specific diagnostic peptides in pathogenic bacterial strains [31,32]. In addition, LC-ESI-MS/MS methods have been employed for the identification and detection of bacteriophages [19]. In the case of bacteriophage detection and identification by a mass spectrometry analysis, the required production of viruses may be time-consuming. The detection of prophages based on protein biomarkers can be an alternative to genomic detection, and in this sense, proteomic techniques can be cheaper and faster and can ascertain different bacteriophage species by using a single analysis [33]. Based on the specificity of many bacteriophages with their hosts, bacteriophages are considered signal amplifiers; therefore, the detection of peptides from phages is suitable for pathogen identification. For example, Serafim et al. 2017 [33] identified bacteriophage lambda by a LC-ESI-MS/MS analysis. Moreover, the identification of peptides by means of LC-ESI-MS/MS from bacteriophage-infected *Streptococcus* has been performed, which revealed new information on phage phylogenomics and their interactions with the bacteria they infect [19]. However, no study has been published on *S. aureus* phage detection and identification by LC-ESI-MS/MS or on *S. aureus* phage characterization without a previous phage purification step. Viral genomic detection and phage display are time-consuming methods. Here, we describe an easy, fast and accurate method for the detection of bacteriophages without the need for the pretreatment of bacterial lysis for bacteriophage replication. This method led to the identification of putative temperate and virulent phages present in the analyzed strains.

A previously published work performed by our laboratory [3] studied the global proteome of several strains of *S. aureus* by shotgun proteomics. Important virulence protein factors and functional pathways were characterized by a protein network analysis. In this work, and for the first time, we aimed to use proteomics to characterize phage contents in different *S. aureus* strains to identify the relevant phage-specific peptides of several *S. aureus* strains and to identify both phages and bacterial strains by LC-ESI-MS/MS.

#### **2. Materials and Methods**

#### *2.1. Bacteria*

In this study, a total of 20 different *S. aureus* strains obtained from different sources were analyzed (Table S1 in Supplemental Data 2). These strains were previously characterized by MALDI-TOF mass spectrometry [1] after being obtained from the Institute of Science of Food Production of the National Research Council of Italy (Italy) and from the Spanish Type Culture Collection (Spain). The majority of the strains are from food origins, except for strain U17, which is a human clinical strain. Strains ATCC (American Type Culture Collection) 9144 and ATCC 29213 are classified as *S. aureus* subsp. *aureus*, while strain ATCC 35845 is categorized as *S. aureus* subsp. *anaerobius*. In previous works, the species identification of *S. aureus* and the presence of enterotoxins were evaluated by multiplex polymerase chain reactions (multiplex PCRs) [3,34,35]. The strains were reactivated in a brain–heart infusion medium (BHI, Oxoid Ltd., Hampshire, UK) and incubated at 31 ◦C for 24 h. Bacterial cultures were then grown on plate count agar (PCA, Oxoid) at 31 ◦C for 24 h [1,3,36]. Tubes of broth were inoculated under aerobic conditions.

#### *2.2. Protein Extraction and Peptide Sample Preparation*

Protein extraction was prepared as described previously [37]. All analyses were performed in triplicate. Protein extracts were subjected to in-solution tryptic digestion [38].

#### *2.3. Shotgun LC-MS/MS Analysis*

Peptide digests were acidified with formic acid (FA), cleaned on a C18 MicroSpin™ column (The Nest Group, South-borough, MA, USA) and analyzed by LC-ESI-MS/MS using a Proxeon EASY-nLC II Nanoflow system (Thermo Fisher Scientific, San Jose, CA, USA) coupled to an LTQ-Orbitrap XL mass spectrometer (Thermo Fisher Scientific, San Jose, CA, USA) [3]. Peptide separation (2 µg) was performed on a reverse-phase (RP) column (EASY-Spray column, 50 cm × 75 µm ID, PepMap C18, 2-µm particles, 100-Å pore size, Thermo Fisher Scientific, San Jose, CA, USA) with a 10-mm precolumn (Accucore XL C18, Thermo Fisher Scientific, San Jose, CA, USA) using a linear 120-min gradient from 5% to 35% solvent B (solvent A: 98% water, 2% ACN (Acetonitrile) and 0.1% FA and solvent B: 98% ACN, 2% water and 0.1% FA) at a flow rate of 300 nL/min. For ionization, a spray voltage of 1.95 kV and a capillary temperature of 230 ◦C were used. Peptides were analyzed in the positive mode from 400 to 1600 amu (1 µscan), which was followed by 10 data-dependent collision-induced dissociation (CID) MS/MS scans (1 µscan) using an isolation width of 3 amu and a normalized collision energy of 35%. Fragmented masses were set in dynamic exclusion for 30 s after the second fragmentation event, and unassigned charged ions were excluded from the MS/MS analysis.

#### *2.4. LC-MS/MS Mass Spectrometry Data Processing*

LC-ESI-MS/MS spectra were searched using SEQUEST-HT (Proteome Discoverer 2.4, Thermo Fisher Scientific, San Jose, CA, USA) against the *S. aureus* UniProt/TrEMBL database (208,158 protein sequence entries in July 2020). The following parameters were used: semi-tryptic cleavage with up to two missed cleavage sites and tolerance windows set at 10 ppm for the precursor ions and 0.06 Da for the MS/MS fragment ions. These additional identified semi-tryptic peptides increased the sequence coverage and confidence in protein assignments. The variable modifications that were allowed were as follows: (M\*) methionine oxidation (+15.99 Da), (C\*) carbamidomethylation of Cys (+57.02 Da) and acetylation of the N-terminus of the protein (+42.0106 Da). To validate the peptide assignments, the results were subjected to a statistical analysis with the Percolator algorithm [39]. The false discovery rate (FDR) was kept below 1%. The mass spectrometric data were deposited into the public database PRIDE (Proteomics Identification Database), with the dataset identifier PXD023530.

#### *2.5. Selection of Potential Peptide Biomarkers*

For each peptide identified by LC-ESI-MS/MS, we used the BLASTp program to determine the homologies and exclusiveness with protein sequences registered in the NCBI (National Center for Biotechnology Information) database [40]. For the BLASTp search, the *Staphylococcus* taxon was included and excluded with the aim of finding the peptides that belonged to the *Staphylococcus* phages, *Staphylococcus* spp. and only to *S. aureus*.

#### *2.6. Phage Genome Comparison and Relatedness*

Genomes of all studied *Staphylococcus* spp. phages were downloaded from the Gen-Bank database, analyzed and compared using the Web server VICTOR (Virus Classification and Tree Building Online Resource, http://ggdc.dsmz.de/victor.php, accessed on 27 November 2020) for the calculation of the intergenomic distances and the construction of the phylogenomic tree [41].

#### **3. Results**

#### *3.1. S. aureus Proteome Repository*

Protein mixtures from each of the 20 different *S. aureus* strains (Table S1 in Supplemental Data 2) were digested with trypsin and analyzed by LC-ESI-MS/MS.

A total of 1933 nonredundant peptides corresponding to 1282 nonredundant annotated proteins were identified for all *S. aureus* strains (see the Excel dataset in Supplemental Data 1). Among them, 79 phage peptides were identified. These peptides belong to proteins such as phage repressors, structural phage proteins, uncharacterized phage proteins and complement inhibitors. Figure 1 shows a comparative representation of the different types of phage proteins identified in this study. These phage peptides were selected and analyzed using the BLASTp algorithm. For the BLASTp search, *Staphylococcus* was included and excluded with the aim of finding peptides belonging to *Staphylococcus* bacteriophages.

**Figure 1.** Comparative representation of different types of phage proteins identified in this study for the different strains (represented by different colors). The number of each type of protein is shown in parentheses.

The obtained staphylococcal phage-specific peptides shared homology with the *Staphylococcus* phages and *Staphylococcus* spp. in the NCBI database. Among them, all shared homology with *S. aureus*; however, eighteen peptides were specific to *S. aureus* (IRLPYYDVK, LYVGVFNPEATK, SIINGKLDSQWTVPNEHK, M\*NDSNQGLQANPQYTIHYLSQEITR, PCPALM\*NKRNSIATIHR, SQDSNLTPELSTKAPK, ESINANTYINQNLEK, VAVLSTPLVTS-FESK, KDGEILFDAIDIYLRNK, MPVYKDGNTGKWYFSI, KTTSEALKEVLSDT, EPKPV-DATGADDPLKPDDRM\*ITNFHANLVDQKVSY, MSHNALTTGIGIGAGAG, VQHPGK-LVNKVM\*SGLNINFGGGANATAK, QM\*MEGLSGVMDLAAVSGEDLGAVSDIVTDGLTA FGLKAKDSG, KSNVEAFSNAVK, GMVASMQMQVVQVNVLTM\*ELAQQNAMLTQQLTELK and DIITVYC\*PENGTATDEY). Figure S1 shows the MS/MS spectra for these *S. aureus*specific peptide biomarkers. Table 1 summarizes the list of 79 specific staphylococcal bacteriophage peptides, bacterial peptides with putative phage origins and bacteria and phages with 100% homology with respect to the NCBI protein database.

All staphylococcal phage peptides with 100% homology were found to belong to the *Siphoviridae* family: 52 staphylococcal phages belong to the *Phietavirus* genus, 37 belong to the *Biseptimavirus* genus, 30 are *Triavirus*, two are phieta-like viruses and one is a SPbeta-like virus, and the others are nonclassified *Siphoviridae* viruses (Table S2 in Supplemental Data 2). *Siphoviridae* genomes are usually organized into functional modules, such as lysogeny, DNA replication, packaging, morphogenesis and lysis modules [6,42].



25





*Staphylococcus* phage IME1361\_01



#### *3.2. Phage Peptides Determined from the Analyzed S. aureus Strains*

For strains S2 and S3, six and three phage peptides were determined, respectively. For strain S4, seventeen phage peptides were determined, and three phage peptides were determined for strain S5. For strains S6 and S7, three and one phage peptides were determined, respectively. Moreover, for strains S8 and S9, two phage peptides and seven phage peptides were determined. For strains S10 and S11, five and three phage peptides were determined, respectively. For strains S12 and S13, five phage peptides and six phage peptides were determined, respectively. For strains S14 and S15, four and two phage peptides were determined, respectively. For strain S16, three phage peptides were determined, and one phage peptide was determined for strain S17. For strains S18 and S19, one phage peptide each was determined. Finally, for strain S20, seven phage peptides were determined.

A large number of phage peptides from structural proteins were identified (Table 1). Peptides from proteins such as the major capsid protein, major tail protein, minor structural protein, phage head morphogenesis protein, tail tape measure protein and phage tail fiber protein were determined. Moreover, different phage peptides from the major capsid protein and tail protein were determined (Table 1). Identifying these phage peptides is reasonable, as the major capsid protein and major tail protein are the most abundant proteins in mature virions [6].

There are a large number of uncharacterized protein sequences in databases, and more than 20% of all protein domains are annotated as "domains of unknown function" (DUFs). Several uncharacterized phage proteins and DUFs from *Staphylococcus* bacteriophages were identified for the analyzed strains (Table 1) [43,44].

Different peptides from repressor-type Cro/CI were determined. For strains S11 and S20 (both potential enterotoxin C producers), the same phage peptides of repressor-type Cro/CI were identified (Table 1). CI and Cro are encoded in the lysogeny module of lambdoid bacteriophages, particularly λ bacteriophages. Together, CII and CIII (that are formed through the anti-terminator role of protein N) act as an inducer that favors the first expression of the *cI* gene from the appropriate promoter; if the CI repressor predominates, the phage remains in the lysogenic state, but if the Cro predominates, the phage transitions into the lytic cycle, helped by the late Q regulator. The xenobiotic XRE regulator is extended in bacteria and has similarity to the Croλ repressor, exhibiting a helix-turn-helix (HTH) conformation [45]. Peptides of the CI/Cro-repressor types are usually named XRE family proteins in the NCBI database for bacteria.

Three phage peptides of the complement inhibitor were identified (Table 1). Staphylococcal complement inhibitors are involved in the evasion of human phagocytosis by blocking C3 convertases, and a study reported that complement inhibitor genes were also found in *staphylococcal* phages [46]. Another autolysin was determined in the present results, an N-acetylmuramoyl-L-alanine amidase that plays a role in bacterial adherence to eukaryotic cells [19]. The phage protein NrdI, which is a type of ribonucleotide reductase (RNR), was also identified. Several peptides of transposases, integrases and terminases were identified along with a DNA primase phage associated protein and a DNA phage binding protein. Moreover, peptides of other proteins, such as GNAT family N-acetyltransferase, holin, peptidase, methylase, anti-repressor protein (Ant), phage-resistant protein, phage-encoded lipoprotein, phage infection protein, phage portal protein, toxin phage proteins associated with pathogenicity islands and a protein involved in fibrinogen-binding proteins, were identified. A PBSX family phage terminase peptide was determined, and this protein is involved in double-stranded DNA binding, DNA packaging and endonuclease and ATPase activities [47].

As shown in Table 1, the vast majority of phage-specific peptides are not specific to *S. aureus* and can be found in other species of *Staphylococcus*. As an exception, the same peptides, such as peptide LLHALPTGNDSGGDKLLPK from a major capsid protein, were also found in *Streptococcus pneumoniae*, and peptide AYINITGLGFAK from a major tail protein was also found in *Pararheinheimera mesophila*; whether these examples represent direct recombinations between bacteria belonging to different families or whether phagemediated recombination occurs remains to be elucidated. Furthermore, as mentioned before, eighteen identified peptides were very specific for *S. aureus* based on the NCBI database (see Figure S1).

#### *3.3. Staphylococcus spp. Phage Genome Comparisons and Their Relatedness*

A phylogenomic tree of *Staphylococcus* spp. phages from the NCBI database (accession numbers in Table S2 in Supplemental Data 2) with 100% similarity to those found in this study was built (Figure 2). The phages identified in this study were classified in the order *Caudovirales* and the family *Siphoviridae*. Many of these bacteriophages were classified into the genera *Phietavirus*, *Biseptimavirus, Triavirus* phieta-like virus, SPbeta-like virus and unclassified genera. Genomes of well-known phages of the families *Siphoviridae*, *Myoviridae* and *Podoviridae*, such as phage Lambda, T4 and T7, respectively, were added for comparison purposes. The genome analysis showed three well-defined clusters that mainly divided the phylogenomic tree into different phage genera (*Phietavirus*, *Biseptimavirus* and *Triavirus*). Two principal branches separated Clusters A, B and C from D. Cluster A was formed by *Staphylococcus Phietavirus,* two phieta-like viruses and two unclassified *Staphylococcus* phages. Cluster B was formed by *Staphylococcus* phages classified as *Biseptimavirus* and by one unclassified *Staphylococcus* phage. Cluster C was formed by enterobacterial bacteriophages and one SPbeta-like virus. Finally, cluster D was formed by *Triavirus Staphylococcus* phages and two unclassified *Staphylococcus* phages. To the best of our knowledge, this is the first time that phages from mastitis-causing staphylococci were grouped in a phylogenomic tree.

Specific peptides were found in related *Staphylococcus* spp. phages (Table 2) located closely in the phylogenomic tree (Figure 2). Peptides HAGYVRC\*KLF and MPVYKDGNTGKWYFSI were found in phages of cluster A. Furthermore, peptides IYDRNSDTLDGLPVVNLK, QKNVLNYANEQLDEQNKV, EVPNEPDYIVIDVC\*EDYSASK, KSNVEAFSNAVK and KLYIIEEYVKQGM were found in *Staphylococcus* phages of the A.1 subbranch in cluster A. Additionally, peptide AVAELLKEINR was found in phages of the A.2 branch. The peptide AYINITGLGFAK was found in phages of cluster B.1, and TSIELITGFTK was found in phages of cluster B.2. Peptides VSYTLDDDDFITDVETAK and LLHALPTGNDSGGD-KLLPK, which belong to the phage major capsid protein, were found in the same 14 *Staphylococcus* phages of cluster D. Peptides ELAEAIGVSQPTVSNWIQQTK and IQQLA-DYFNVPK, which belong to the phage-repressor Cro/CI family of proteins, were found in the same bacteriophages of cluster D. Moreover, peptides LYVGVFNPEATK, RVSYTLD-DDDFITDVETAKELKL LYVGVFNPEATK, VLEMIFLGEDPK, KAMIKASPK, EFRNKL-NELGADK and GMPTGTNVYAVKGGIADK were also found in phages of cluster D. Peptides IHDKELDDPSEEESKLTQEEENSI, IIINHDEIDLL, KDRYSSVSY and AEEAGVTVKQL are specific to *Staphylococcus* phage SPbeta-like.

**Table 2.** Phage biomarker peptides that belong to bacteriophages and phylogenomic tree clusters. Relationships between specific phage biomarker peptides and phylogenomic tree clusters.


**Protein Peptide Phages Cluster Located** Major capsid protein LLHALPTGNDSGGDKLLPK *Staphylococcus* phage phiSa2wa\_st72 *Staphylococcus* phage phiSa2wa\_st121mssa *Staphylococcus* phage vB\_SauS\_phi2 *Staphylococcus* phage StauST398-2 *Staphylococcus* phage LH1 *Staphylococcus* phage phiSa2wa\_st30 *Staphylococcus* virus phi12 *Staphylococcus* virus 3ª *Staphylococcus* virus phiSLT *Staphylococcus* phage tp310-2 *Staphylococcus* phage vB\_SauS\_JS02 *Staphylococcus* phage R4 *Staphylococcus* phage vB\_SauS\_fPfSau02 *Staphylococcus* phage SA137ruMSSAST121PVL Cluster D Major capsid protein RVSYTLDDDDFITDVETAKELKL *Staphylococcus* phage LH1 *Staphylococcus* phage StauST398-2 *Staphylococcus* phage vB\_SauS\_phi2 *Staphylococcus* phage R4 Cluster D Major tail protein LYVGVFNPEATK *Staphylococcus* phage vB\_SauS\_ phi2 *Staphylococcus* virus phi12 *Staphylococcus* virus phiSLT *Staphylococcus* phage R4 *Staphylococcus* phage vB\_SauS\_JS02 *Staphylococcus* phage SH-St 15644 *Staphylococcus* virus 3a *Staphylococcus* phage P240 Cluster D Phage repressor, Cro/CI family ELAEAIGVSQPTVSNWIQQTK *Staphylococcus* virus IPLA35 *Staphylococcus* phage SMSAP5 *Staphylococcus* phage vB\_SauS\_phi2 Cluster D Phage repressor, Cro/CI family IQQLADYFNVPK *Staphylococcus* virus IPLA35 *Staphylococcus* phage SMSAP5 *Staphylococcus* phage vB\_SauS\_phi2 Cluster D Major tail protein AYINITGLGFAK *Staphylococcus* phage phiNM3 *Staphylococcus* phage StauST398-4 *Staphylococcus* phage P282 *Staphylococcus* phage phiN315 *Staphylococcus* phage phi7247PVL *Staphylococcus* phage phiSa2wa\_st22 *Staphylococcus* virus 77 *Staphylococcus* phage P954 Cluster B.1 Major capsid protein IYDRNSDTLDGLPVVNLK *Staphylococcus* virus 85 *Staphylococcus* phage SP5 *Staphylococcus* virus phiETA2 *Staphylococcus* phage phiNM *Staphylococcus* virus SAP26 *Staphylococcus* phage SA12 *Staphylococcus* virus Baq Sau1 Cluster A.1 Uncharacterized phage protein AVAELLKEINR *Staphylococcus* virus 71 *Staphylococcus* virus 55 *Staphylococcus* virus 88 Cluster A.2 DUF2479, Phage tail fiber, BppU family phage baseplate upper protein HAGYVRCKLF *Staphylococcus* phage SA97 *Staphylococcus* virus 55 uncultured Caudovirales phage *Staphylococcus* virus 85 *Staphylococcus* virus 80 *Staphylococcus* virus phiETA3 *Staphylococcus* virus phiETA2 *Staphylococcus* phage 55-2 *Staphylococcus* phage B166 *Staphylococcus* phage B236 *Staphylococcus* virus SAP26 *Staphylococcus* virus 88 *Staphylococcus* virus phiETA *Staphylococcus* virus 11 *Staphylococcus* phage SP5 *Staphylococcus* virus 69 *Staphylococcus* phage ROSA *Staphylococcus* phage TEM123 *Staphylococcus* virus 92 *Staphylococcus* phage StauST398-1 *Staphylococcus* virus phiNM2 *Staphylococcus* virus phiNM1 *Staphylococcus* virus 29 *Staphylococcus* phage vB\_SauS-SAP27 *Staphylococcus* virus 80alpha *Staphylococcus* phage HSA84 *Staphylococcus* virus phiMR11 *Staphylococcus* phage SAP33 *Staphylococcus* phage 3MRA Cluster A Phage terminase KLYIIEEYVKQGM *Staphylococcus* virus Baq\_Sau1 *Staphylococcus* virus phiETA2 *Staphylococcus* virus 69 *Staphylococcus* virus 11 *Staphylococcus* virus 80alpha Cluster A.1 Phage-related cell wall hydrolase; Peptidase C51; CHAP domain-EVPNEPDYIVIDVC\*EDYSASK *Staphylococcus* virus IPLA88 *Staphylococcus* virus phiNM2 *Staphylococcus* phage SAP40 *Staphylococcus* phage phi 53 *Staphylococcus* virus phiNM4 *Staphylococcus* phage SA12 *Staphylococcus* virus 69 *Staphylococcus* phage SA97 *Staphylococcus* phage TEM123 *Staphylococcus* virus 11 *Staphylococcus* virus Cluster A.1

**Table 2.** *Cont.*

phiMR25 *Staphylococcus* virus 53 *Staphylococcus* phage SAP33


In addition, a correlation relating bacterial species for each cluster with all peptides found in the bacteriophages with 100% similarity was found. The results showed that clustered phages were related to specific species of *Staphylococcus*. All studied phages were found to be related to *S. aureus*; however, most of them were also found to be related to additional *Staphylococcus* species. *S. argenteus* was found to be related in all clusters of the phylogenomic tree. Cluster A phage peptides were found to be mainly related to *S. simiae*. However, different *Staphylococcus* species (*S. xylosus*, *S. muscae*, *S. haemolyticus*, *S. simiae*, *S. sciuri*, *S. pseudintermedius, S. devriesei*, *S. warneri* and *S. capitis*) were found to be related to phages of cluster D.

#### *3.4. Identification of Peptides of Virulence Factors*

In this work, 405 peptides from *S. aureus* were determined to be related to virulence factors (Excel dataset Supplemental Data). Among these peptides, proteins such as staphopain, beta-lactamase, elastin-binding protein peptides and a multidrug ATP-binding cassette (ABC) transporter were identified.

**Figure 2.** Phylogenomic tree generated by the Virus Classification and Tree Building Online Resource (VICTOR) using the complete genomic sequences of the determined *Staphylococcus* spp. phages. The access numbers of the determined phage genomes are shown in Table S2 in Supplemental Data 2. Genomes of the *lambda* (NC\_001416.1), *T4* (NC\_000866.4) and *T7* (NC\_001604.1) phages were added for comparison purposes. The VICTOR phylogenetic tree construction was based on an intergenic distance analysis with the GBDP tool (Genome BLAST Distance Phylogeny). The significance of each branch is indicated by a pseudo-bootstrap value calculated as a percentage for 1000 subsets. Bar, 20 nt (nucleotides) substitutions per 100 nt. Clusters are represented by different colors: light blue, cluster A, red, cluster A.1, purple, cluster A.2, light green, cluster B, yellow, cluster B.1, pink, cluster B.2, black, cluster C and orange, cluster D. Specific cluster peptides are represented by different color forms: r forms: , yell , yellow-filled diamond IQQLADYFNVPK (cluster A-specific), , brown- ), , brown-filled diamond HAGYVRC\*KLF (cluster A-specific), ow-filled diam c), ,, black-outlined diamond IYDRNSDTLDGLPVVNLK (cluster A.1-specific), diamond HAG ), , red=outlined diamond AVAELLKEINR (cluster A.2-specific), ond IYDRNSDTLDG , , pink-filled diamond KSNVEAFSNAVK (cluster A.1), tlined diamond 1), ,, gray-filled diamond QKN-VLNYANEQLDEQNKV (cluster A.1), pink-filled ster A.1), , brown , brown-outlined diamond MPVYKDGNTGKWYFSI (cluster A-specific), ), , , dark gray-filled diamond KLYIIEEYVKQGM (cluster A.1-specific), ), , purple-outlined diamond EVPNEPDYIVIDVC\*EDYSASK (cluster A.1-specific), c), , orange-fille , orange-filled diamond AYINITGLGFAK (cluster B.1-specific), ), , ye , yellow-outlined diamond TSIELIT-GFTK (cluster B.2-specific), ), ,, red-filled diamond VSYTLDDDDFITDVETAK (cluster D-specific), lined diamond ic), , , green-filled diamond LLHALPTGNDSGGDKLLPK (cluster D-specific), ond VSYTLDDDDFITDV LPTGNDSGG- ic), , blac , black-filled diamond RVSYTLDDDDFITDVETAKELKL (cluster D-specific), (cluster D-specific), , pur , purple-filled diamond LYVGVFNPEATK (cluster D-specific, k-filled diamond RVSYTLDDDDFITDVETAKEL ific, ,, blue-filled diamond ELAEAIGVSQPTVSNWIQQTK (cluster D-specific); ic); , l , light green-filled diamond VLEMIFLGEDPK (cluster D-specific), ), , orange-ou , orange-outlined diamond KAMIKASPK (cluster D-specific) and and , gray-ou , gray-outlined diamond GMPTGTNVYAVKGGIADK (cluster D-specific).

#### **4. Discussion**

LC-MS/MS-based methods for bacteriophage identification offer several advantages compared with other approaches, since bacteriophages can be directly identified with this method without using genomic tools, which provides a new strategy for drawing the appropriate conclusions. In addition, the method proposed here may be applied for further analyses without the requirement of growing bacteria, since the samples can be collected directly from foodstuffs. The study of noninduced prophages provides a fast analysis and can detect specific temperate phage proteins produced by *S. aureus* while integrated in the bacterial genome or by phages that are infecting the bacteria. Both cases provide the identification of specific *S. aureus* species or strains—in this case, an *S. aureus* mastitis producer. In the proteomic repository of the 20 different *S. aureus* strains analyzed, 79 peptides from staphylococcal bacteriophages were identified. Among them, eighteen of these phage peptides were *S. aureus*-specific. As bacteriophages are host-specific, these putative diagnostic peptides could be good diagnostic biomarkers for the detection and characterization of *S. aureus* and *S. aureus* phages.

The results show that a given specific peptide is present in closely related phages (Table 2). These bacteriophage peptides can be used as specific markers to establish *S. aureus* bacteriophage relationships (Figure 2). Additionally, phages that show the same peptides and are specific to *Staphylococcus* spp. are located close to one another in the phylogenomic tree, suggesting that a link does exist between phage phylogeny and bacteriophages that can infect the same bacterial species.

The study shown here exemplifies how phylogenomic trees based on the genome analysis provide useful information, and the study corroborates previous investigations, which suggested that viral genomic or subgenomic region analyses provide the best tool for reconstructing viral evolutionary histories [48]. Nevertheless, the lack of knowledge of the phage genomic content [49] makes a phage analysis more difficult. The first priority must be the contribution of new large amounts of data for phages infecting bacteria [12].

In addition, there is an urgent need for novel therapies to treat and prevent mastitis [50]. Bacteriophage therapy is an alternative to the antibiotic treatment of bovine mastitis [51], with a high specificity and a low probability for bacterial resistance development [52]. Many studies have demonstrated the effectiveness of bacteriophages in a variety of animal models to fight several mastitis-causing pathogenic bacteria. Some studies have shown how virulent phages such as SPW and SA phages are active against bovine mastitis-associated *S. aureus.* Moreover, SAJK-IND and MSP phages have specific lytic activity against several strains of *S. aureus* isolated from mastitis milk samples [53]. Indeed, mouse-induced mastitis models decreased their bacterial counts after treatment with a vBSM-A1 and vBSP-A2 phage cocktail [54]. Finally, several temperate phage mixtures have been shown to be more effective than using a single temperate phage for inhibiting *S. aureus.* According to the data obtained for the different models of mastitis, phage therapy using bacteriophages in this study can be considered an innovative alternative to antibiotics for the treatment of mastitis caused by *S. aureus*.

Finally, the proteomic analysis by LC-ESI-MS/MS performed in this study provides relevant insights into the search for potential phage origin diagnostic peptide biomarkers for mastitis-causing *S. aureus*. In addition, this method may be useful for searching peptide biomarkers for the identification and characterization of mastitis-causing species and for finding new *S. aureus* phages useful as possible therapies for mastitis.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/ 10.3390/foods10040799/s1: Figure S1: MS/MS spectrums for *S. aureus*-specific peptide biomarkers. The corresponding peptides were tested for specificity using the BLASTp algorithm. Excel Dataset Supplemental Data 1: Complete nonredundant peptide dataset. Supplemental Data 2: Table S1: *Staphylococcus aureus* (SA) strains used in this study. Table S2: Linage, authors and accession number of studied bacteriophages [55–88].

**Author Contributions:** A.G.A. wrote the manuscript; A.G.A., K.B., T.G.V., P.C.-M., B.C., J.B.-V., J.-L.R.R. and M.C. conceptualized, revised and corrected the paper. P.C.-M. and M.C. co-supervised the work. M.C. and P.C.-M. got the funding. All authors listed have made a substantial, direct and intellectual contribution to the work and approved the work for publication.

**Funding:** This work received financial support from the Xunta de Galicia and the European Union (European Social Fund-ESF), from the Spanish Ministry of Economy and Competitivity Project AGL 2.013-48.244-R and from the European Regional Development Fund (ERDF) (2007–2013). This work was also supported by the GAIN-Xunta de Galicia Project (IN607D 2017/01) and the Spanish AEI/EU-FEDER PID2019-103845RB-C21 project. Mónica Carrera was supported by the Ramón y Cajal contract (Ministry of Science and Innovation of Spain).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All relevant data are included in the article. The mass spectrometric data were deposited into the public database PRIDE (Proteomics Identification Database), with the dataset identifier PXD023530.

**Acknowledgments:** The mass spectrometry proteomics data were deposited into the ProteomeXchange Consortium via the PRIDE [89] partner repository with the dataset identifier PXD023530.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


*Article*
