**Soybean (***Glycine max***) Protein Hydrolysates as Sources of Peptide Bitter-Tasting Indicators: An Analysis Based on Hybrid and Fragmentomic Approaches**

### **Anna Iwaniak \*,**†**, Monika Hrynkiewicz** †**, Piotr Minkiewicz, Justyna Bucholska and Małgorzata Darewicz \***

Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Chair of Food Biochemistry, Pl. Cieszyński 1, 10-719 Olsztyn-Kortowo, Poland; monika.protasiewicz@uwm.edu.pl (M.H.); minkiew@uwm.edu.pl (P.M.); justyna.bucholska@uwm.edu.pl (J.B.)

**\*** Correspondence: ami@uwm.edu.pl (A.I.); darewicz@uwm.edu.pl (M.D.); Tel.: +48-89-523-3722 (A.I.)

† These authors equally contributed to this work.

Received: 2 March 2020; Accepted: 3 April 2020; Published: 6 April 2020

**Abstract:** The aim of this study was to analyze soybean proteins as sources of peptides likely to be bitter using fragmentomic and hybrid approaches involving in silico and in vitro studies. The bitterness of peptides (called parent peptides) was theoretically estimated based on the presence of bitter-tasting motifs, particularly those defined as bitter-tasting indicators. They were selected based on previously published multilinear stepwise regression results. Bioinformatic-assisted analyses covered the hydrolysis of five major soybean-originating protein sequences using bromelain, ficin, papain, and proteinase K. Verification of the results in experimental conditions included soy protein concentrate (SPC) hydrolysis, RP-HPLC (for monitoring the proteolysis), and identification of peptides using RP-HPLC-MS/MS. Discrepancies between in silico and in vitro results were observed when identifying parent peptides in SPC hydrolysate samples. However, both analyses revealed that conglycinins were the most abundant sources of parent peptides likely to taste bitter. The compatibility percentage of the in silico and in vitro results was 3%. Nine parent peptides with the following sequences were identified in SPC hydrolysates: LS**VI**S**PK**, D**VLVI**P**LG**, LI**VI**LNG, NP**FL**FG, ISSTIV, PQMIIV, PFPSIL, DD**FFL**, and **FFEI**TPEK (indicators are in bold). The fragmentomic idea of research might provide a supportive method for predicting the bitterness of hydrolysates. However, this statement needs to be confirmed experimentally.

**Keywords:** bioinformatics; BIOPEP-UWM database; bitter-tasting peptides; hydrolysates; soybean proteins

### **1. Introduction**

Soybean has been known as a food for thousands of years [1] and has recently become increasingly popular among consumers for ecological, ethical, and health-related reasons [2]. Its health-beneficial properties are due to the presence of biologically active components like isoflavones, saponins, protease inhibitors, and peptides. Briefly, their activity is related to their preventive potential against cardiovascular disease, diabetes, menopausal symptoms, osteoporosis, and prostate and breast cancers. According to the literature, peptides derived from soybean proteins are responsible for a variety of activities that regulate body functions, e.g., reduction of blood pressure, carbohydrate, and cholesterol levels, as well as anti-inflammatory, antioxidative, and anticancer effects. The first three "reducing" functions of peptides are related to the inhibition of the following enzymes: ACE (angiotensin converting enzyme, EC 3.4.15.1), DPP IV (dipeptidyl peptidase IV; EC 3.4.14.5), and HMGR (3-hydroxy-3-methylglutaryl-coenzyme A reductase; EC 1.1.1.34), respectively [1].

Quite often, the biological function of a peptide is associated with bitter taste; thus, both peptides and protein hydrolysates may not be acceptable to consumers due to their undesired taste profile [3]. Because peptides are derived from food proteins, they are considered "natural". Thus, their sensory acceptance poses a challenge for food scientists and technologists who aim at producing functional foods [3], especially when some of the bioactive peptides negatively affect the taste of a protein hydrolysate [4].

Currently, three approaches are used to study peptides from foods [5]. The first one is called a classical approach; briefly, it is an experimental protocol consisting of the main methodological steps like hydrolysis of protein, identification of peptides, and determination of the bioactivity of the hydrolysate(s), as well as the peptide(s). The other two approaches benefit from the progress in the development and application of bioinformatics and chemometrics in food science. This results from the increasing popularity of databases of biomolecules as sources of information about, e.g., peptides, as well as of programs for peptide analyses. Thus, the method for studying peptides involving the application of bioinformatic and/or cheminformatic predictions is called an in silico approach, whereas the combination of the classical and in silico protocols is defined as a hybrid approach [5].

Some in silico studies were undertaken to analyze the impact of the amino acid composition of peptides on their bitterness. For example, the bitterness of peptides was found to result from the presence of residues with bulky and branched side chains, like Leu, Ile, Tyr, Phe, and Val. According to the literature, the first three of these amino acids were found to be extremely bitter with an unpleasant flavor and odor [6].

The fact that some regularities can be observed between the presence of the specific amino acid(s) and the function of a whole peptide sequence has prompted some scientists to find the foundations for introducing the fragmentomic idea of research. According to this idea, shorter motifs (fragments) with a known property encrypted in a molecule of interest may affect its function [7]. According to Liu et al. [8], motifs are generally understood as reproducible patterns in a protein or peptide sequence that are ascribed to a specific biological function (in this case, taste). This applies also to peptides identified in a food protein hydrolysate. For example, a sequence identified (i.e., precursor/parent peptide) that contains peptide motifs with confirmed bitterness (i.e., bitter-tasting motifs) may suggest the bitter-tasting potential of a parent sequence. Such an idea of research was applied by Iwaniak et al. [9] for the hybrid analysis of bovine milk protein hydrolysates as sources of bitter-tasting motifs, especially those with an "indicator" status.

To sum up, soybean protein hydrolysates are known sources of bioactive peptides. According to the literature, soybean hydrolysates taste bitter, which may be due to the presence of peptide sequences [10]. Taking into account these two facts, the aim of this study was to apply the hybrid protocol for an analysis of soybean protein hydrolysates as sources of peptides likely to be bitter due to the presence of bitter-tasting motifs, particularly those defined as bitter-tasting indicators. According to the definition introduced by Iwaniak et al. [9], bitter-tasting indicators should be literally understood as "shorter motifs with known bitterness, which found in the sequences of peptides, may potentially determine their taste".

### **2. Materials and Methods**

### *2.1. Soybean Protein Sequences and Computer Simulation of Their Hydrolysis*

Five sequences of soybean (*Glycine max*) proteins, namely: 7S globulin (403 aa, P13917), glycinin (492 aa, P04347), β-chain of β-conglycinin (414 aa, P25974), α-chain of β-conglycinin (543 aa, P13916), and profilin (130 aa, O65809), were acquired from the UniProt database [11] (accessed December 2018). The number of amino acids in a protein chain, as well as the accession number of proteins in UniProt are provided in brackets.

The BIOPEP-UWM database [12] tool called "Enzyme(s) action" was applied for computer simulation of hydrolysis. Each soybean protein sequence was theoretically hydrolyzed by bromelain, ficin, papain, and proteinase K, respectively. The following steps were required to hydrolyze a protein after opening the BIOPEP-UWM tool: Sensory peptides and amino acids → Analysis → Enzyme(s) action → For your sequence → Paste the protein sequence (e.g., 7S globulin) → Select the enzyme (e.g., bromelain) → View the report with the results. Motifs that were theoretically released from a protein, excluding single amino acids, were copied, and each of them was searched for the presence of bitter-tasting peptides, according to the following protocol: BIOPEP-UWM sensory peptides and amino acids → Analysis → Profiles of proteins potential sensory activity → For your sequence → Paste the released peptide (e.g., VFDG) → Report. Bitter-tasting indicators (see below) were searched manually after the above-mentioned analysis. Moreover, each potentially released peptide was screened for additional bioactivity (if any) using an analogous procedure, by selecting "Bioactive peptides" instead of the "Sensory peptides and amino acids" tab after opening the BIOPEP-UWM database.

### *2.2. Bitter-Tasting Peptide Indicators*

Peptides possessing the status of bitter-tasting indicators were selected from the BIOPEP-UWM database collection of 102 bitter di- and tri-peptides (51 sequences of di- and tri-peptides each) that were analyzed using multivariate linear stepwise regression (MLR) [13]. Based on the MLR results, those peptides whose experimental measures of bitterness were close to the theoretical ones were defined as bitter-tasting peptidic indicators. They were as follows: PK(0.17;0.08), AD(0.17;0.10), VD(0.08;0.10), VE(0.17;0.15), EI(0.25;0.18), YG(0.33;0.27), VL(0.17;0.21), VI(0.17;0.21), LG(0.05;0.16), GV(0.22;0.25), GP(0.17;0.25), RG(0.13;0.21), IG(0.22;0.19), LE(0.33;0.32), KP(0.33;0.33), VF(0.33;0.37), VY(0.33;0.37), LL(0.4;0.37), FI(0.67;0.56), IF(0.67;0.56), FL(0.67;0.60), FF(0.83;0.72), PGR(0.04;0.26), GGP(0.11;0.12), GGV(0.03;0.10), GLG(0.1;0.17), PPG(0.11;0.03), LGL(0.20;0.07), FGG(0.22;0.02), VVV(0.22;0.05), GGL(0.1;0.12), GVV(0.22;0.17), PGP(0.11;0.25), KPK(0.33;0.37), YGG(0.43;0.52), PGI(0.43;0.48), PPP(0.50;0.66), GLL(0.67;0.69), LLL(0.83;0.64), GGF(0.67;0.76), GGY(0.67;0.74), PIP(0.70;0.82), GYY(2.50;2.32). Finally, 22 dipeptides and 21 tripeptides achieved the status of a bitter taste indicator. Their experimental and theoretical bitterness are provided in the brackets, respectively. The experimental bitterness of a peptide was expressed as the Rcaf. value, which relates the threshold concentration of the peptide to that of a 1 mM caffeine solution used as a standard (the higher the Rcaf. value, the more bitter the peptide) [14].
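The manual search for indicators inside a parent peptide can be illustrated with a short script. This is only a minimal sketch, not the BIOPEP-UWM tool: the function name `find_indicators` is hypothetical, and the motif sets simply copy the indicator sequences listed above (without their bitterness values).

```python
# Indicator sequences copied from the list in Section 2.2 (values omitted).
DIPEPTIDE_INDICATORS = {
    "PK", "AD", "VD", "VE", "EI", "YG", "VL", "VI", "LG", "GV", "GP",
    "RG", "IG", "LE", "KP", "VF", "VY", "LL", "FI", "IF", "FL", "FF",
}
TRIPEPTIDE_INDICATORS = {
    "PGR", "GGP", "GGV", "GLG", "PPG", "LGL", "FGG", "VVV", "GGL", "GVV",
    "PGP", "KPK", "YGG", "PGI", "PPP", "GLL", "LLL", "GGF", "GGY", "PIP",
    "GYY",
}
INDICATORS = DIPEPTIDE_INDICATORS | TRIPEPTIDE_INDICATORS


def find_indicators(peptide: str) -> list[tuple[int, str]]:
    """Return (1-based start position, motif) for each indicator occurrence.

    Overlapping occurrences are all reported, so "LLL" yields LL twice
    and LLL once.
    """
    hits = []
    for start in range(len(peptide)):
        for size in (2, 3):
            fragment = peptide[start:start + size]
            if len(fragment) == size and fragment in INDICATORS:
                hits.append((start + 1, fragment))
    return hits


print(find_indicators("FFEITPEK"))  # [(1, 'FF'), (3, 'EI')]
print(find_indicators("DVLVIPLG"))  # [(2, 'VL'), (4, 'VI'), (7, 'LG')]
```

The sketch reports only motifs with indicator status; other bitter-tasting motifs stored in the database (e.g., LV in DVLVIPLG) would require the full BIOPEP-UWM sequence collection.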

### *2.3. Materials and Reagents*

Soy protein concentrate (SPC) called Isomil® (containing 68% protein, according to the product specification) was produced by Libra Poland Ltd. (Warsaw, Poland). Enzymes: bromelain (EC 3.4.22.32; 5–15 units/mg protein; Cat. No. B5144), ficin (EC 3.4.22.3, ≥1 unit/mg protein; Cat. No. F4125), papain (EC 3.4.22.2, 10 units/mg protein; Cat. No. P4762), and proteinase K from *Tritirachium album* (EC 3.4.21.64, ≥30 units/mg protein; Cat. No. P2308), and trifluoroacetic acid (TFA), acetonitrile (ACN), 2,2-bis(hydroxymethyl)-2,2′,2″-nitrilotriethanol (Bis-Tris), 2-mercaptoethanol, and urea were purchased from Sigma-Aldrich Sp. z o.o. (Poznań, Poland). All chemicals were of analytical grade. Water used to formulate solutions and buffers was prepared using a Milli-Q PLUS system (Millipore Corp., New York, NY, USA).

### *2.4. Hydrolysis of SPC*

SPC hydrolysis was carried out according to the protocol provided by Peña-Ramos and Xiong [15] with slight modifications. First, five separate water solutions of SPC containing 3% protein (w/v) each were prepared. All of them had a non-adjusted pH of 7.0 ± 0.1. SPC solutions were continuously and gently stirred, as well as pre-heated for 5 min using a Heidolph Unimax Modular Incubator 1010 (Heidolph Instruments GmbH & CO. KG, Schwabach, Germany). The preincubation temperatures were typical of the activity of the enzymes used, according to the specifications provided by the manufacturer, and were as follows: bromelain and ficin (50 °C), papain (65 °C), and proteinase K (37 °C). Then, a 3-hour hydrolysis of four SPC samples was conducted under continuous stirring at an enzyme-to-substrate (protein) ratio of 1:100 (w/w) [15,16]. The pH values of SPC solutions were optimal for enzyme activity as provided by the manufacturer, i.e., 7.0 for bromelain, papain, and proteinase K, and 6.5 for ficin. In the case of the ficin sample, the pH was adjusted using 0.1 M HCl. Afterwards, all hydrolysates were heated at 90 °C for 15 min to inactivate the enzymes and then freeze-dried. Finally, four SPC hydrolysates were produced, namely B-SPC, F-SPC, P-SPC, and PK-SPC, the abbreviations referring to bromelain-, ficin-, papain-, and proteinase-K-SPC hydrolysates, respectively, according to the convention used in the previous paper [9]. The fifth sample, 0-SPC, was a reference sample being the non-hydrolyzed SPC solution. All hydrolysates were prepared in duplicate.
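The amounts implied by these conditions can be worked out with simple arithmetic. The sketch below is illustrative only: the batch volume of 100 mL is a hypothetical value (the volume is not stated in the text), the function name `hydrolysis_batch` is invented for this example, and the enzyme-to-substrate ratio is assumed to refer to the mass of the enzyme preparation per mass of protein.

```python
def hydrolysis_batch(volume_ml: float,
                     protein_pct_wv: float = 3.0,
                     spc_protein_fraction: float = 0.68,
                     enzyme_to_substrate: float = 0.01) -> dict:
    """Masses (g) needed for one hydrolysis batch.

    protein_pct_wv: protein concentration of the solution (% w/v).
    spc_protein_fraction: protein content of the SPC (68% per the product spec).
    enzyme_to_substrate: enzyme-to-protein ratio (1:100 w/w).
    """
    protein_g = volume_ml * protein_pct_wv / 100.0
    spc_g = protein_g / spc_protein_fraction      # SPC powder to weigh out
    enzyme_g = protein_g * enzyme_to_substrate    # enzyme preparation
    return {"protein_g": protein_g, "spc_g": spc_g, "enzyme_g": enzyme_g}


# Hypothetical 100 mL batch: 3 g protein, about 4.41 g SPC, 30 mg enzyme.
print(hydrolysis_batch(100.0))
```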

### *2.5. RP-HPLC for Monitoring the Process of SPC Hydrolysis and RP-HPLC-MS*/*MS for the Identification of SPC-Originating Peptides*

RP-HPLC (reversed-phase high performance liquid chromatography) was applied to observe the progress in the hydrolysis of SPC using the above-mentioned enzymes. The preparation of SPC/SPC hydrolysate samples taken for RP-HPLC and RP-HPLC-MS/MS (reversed-phase high performance liquid chromatography online with tandem mass spectrometry), as well as the parameters of devices for chromatography were exactly the same as in the protocol by Iwaniak et al. [9].

First, 2 mg of the freeze-dried SPC/SPC hydrolysate sample was dissolved in 300 μL of a buffer containing 0.1 M Bis-Tris and 4 M urea. Then, 2 μL of 2-mercaptoethanol was added, and the mixture was vortexed and incubated at room temperature (1 h). Next, 680 μL of 6 M urea solution in a mixture of ACN and H2O (100:900, v/v; pH adjusted to 2.2 by the addition of TFA) was added to the sample and stirred. Finally, the samples were centrifuged (10 min; 10,000× *g*) [9].

The RP-HPLC analysis was carried out on Shimadzu® devices comprised of two LC-20AD pumps, an SIL-20AC HT autosampler, a CBM-20A controller, a CTO-10AS VP thermostat, an SPD-M20A photodiode detector, a DGU-20A5 degasser, and an FRC-10A fraction collector. The Jupiter Proteo Phenomenex® column (Torrance, CA, USA; 250 × 2 mm; particle diameter: 4 μm; pore diameter: 90 Å) was applied for RP-HPLC analysis of the samples. According to the protocol by Iwaniak et al. [9], Solvent A was 0.01% TFA water solution (v/v), whereas Solvent B was 0.01% (v/v) TFA dissolved in acetonitrile (ACN). The gradient of Solvent B increased from 0 to 40% during 60 min. The column was washed with Solvent B (40–100%, 60–65 min; 100%, 65–70 min; 100–0%, 70–71 min; 0%, 71–80 min). Data registration was between the 0th and 80th minute of the RP-HPLC analysis. The injection volume was 30 μL; the flow rate was 0.2 mL × min<sup>−1</sup>; and the column temperature was 30 °C. All chromatograms were acquired at 220 nm [9]. Finally, the chromatographic profiles of the samples taken before the 60th min of the RP-HPLC analysis were the subject of discussion.
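The gradient and washing program described above can be written as a piecewise function of time. This is a minimal sketch that assumes linear changes of Solvent B between the stated time points (linearity is not stated explicitly in the text); the names `GRADIENT_PROGRAM` and `percent_b` are hypothetical.

```python
# (time in min, % Solvent B) break points taken from the description above.
GRADIENT_PROGRAM = [
    (0.0, 0.0),     # start of the run
    (60.0, 40.0),   # end of the analytical gradient
    (65.0, 100.0),  # ramp to 100% B (column wash)
    (70.0, 100.0),  # hold at 100% B
    (71.0, 0.0),    # return to 0% B
    (80.0, 0.0),    # re-equilibration, end of data registration
]


def percent_b(t_min: float) -> float:
    """Percentage of Solvent B at time t_min, assuming linear segments."""
    if not GRADIENT_PROGRAM[0][0] <= t_min <= GRADIENT_PROGRAM[-1][0]:
        raise ValueError("time outside the 0-80 min program")
    for (t0, b0), (t1, b1) in zip(GRADIENT_PROGRAM, GRADIENT_PROGRAM[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: program covers the whole range")


print(percent_b(30.0))  # 20.0
print(percent_b(62.5))  # 70.0
```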

Similarly, exactly the same procedure as that described by Iwaniak et al. [9] was applied for the identification of peptides released from SPC hydrolysates using RP-HPLC-MS/MS. Briefly, the identification of peptides was carried out using the VARIAN® 500-MS (Agilent Technologies, Santa Clara, CA, USA) ion trap mass spectrometer with an electrospray ion source and an RP-HPLC assembly comprised of two 212-LC pumps, a ProStar 410 autosampler, a Degassit degasser (MetaChem Technologies®, Torrance, CA, USA), and a nitrogen generator (Parker Domnick Hunter Scientific®, Gateshead, U.K.). Data were registered between 5 and 60 min. The other parameters for mass spectrometry included: needle and shield voltages: 5000 and 600 V, respectively; spraying and drying gas (nitrogen) pressure: 55 and 30 psi, respectively; drying gas temperature: 390 °C; flow rate of damping gas (helium): 0.8 mL × min<sup>−1</sup>; positive polarity with current ionization: 600 V; capillary voltage: 100 V; retardation factor loading: 100%; isolation window: 3.0; excitation storage level m/z = 100–2000 Da; flow rate: 0.2 mL × min<sup>−1</sup>; injection volume: 15 μL; frequency of data recording: 0.05–0.07 Hz single scan averaged from five microscans. The gradient and column type including its parameters were identical to those presented above [9]. Peptides were identified by comparing the experimental mass-to-charge ratios with the ratios calculated using the Fragment Ion Calculator program (http://db.systemsbiology.net:8080/proteomicsToolkit/FragIonServlet.html) as described in our previous article [9]. All analyses were performed in duplicate.

We introduced an additional parameter to compare the susceptibility of proteins to particular proteolytic enzymes:

$$C = (S_{\text{shorter}} - S_{0\text{shorter}}) / S_{\text{longer}} \tag{1}$$

where:

Sshorter: relative area of peaks between 14.00 and 39.99 min in the chromatogram of the hydrolysate;

S0shorter: relative area of peaks between 14.00 and 39.99 min in the chromatogram of the non-hydrolyzed sample;

Slonger: relative area of peaks between 40.00 and 60.00 min in the chromatogram of the hydrolysate.
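Equation (1) can be evaluated directly from the relative peak areas. The sketch below uses hypothetical peak areas chosen purely for illustration (they are not values from Table 2), and the function name `susceptibility_c` is an assumption of this example.

```python
def susceptibility_c(s_shorter: float, s0_shorter: float, s_longer: float) -> float:
    """Parameter C from Equation (1).

    s_shorter:  relative area (14.00-39.99 min) in the hydrolysate chromatogram
    s0_shorter: relative area (14.00-39.99 min) in the non-hydrolyzed sample
    s_longer:   relative area (40.00-60.00 min) in the hydrolysate chromatogram
    """
    if s_longer <= 0:
        raise ValueError("s_longer must be positive")
    return (s_shorter - s0_shorter) / s_longer


# Hypothetical areas (%): hydrolysate 80/20, non-hydrolyzed sample 50 in the
# shorter-retention segment, giving C = (80 - 50) / 20 = 1.5.
print(susceptibility_c(80.0, 50.0, 20.0))  # 1.5
```

A larger C indicates that more material moved from the longer-retention (protein) segment to the shorter-retention (peptide) segment relative to the protein remaining after hydrolysis.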

Retention time prediction is a helpful strategy supporting peptide identification by mass spectrometry [17]. To facilitate peptide identification, the theoretical retention times of peptides (tR predicted) were calculated and then compared with the experimental ones (tR experimental). The tR predicted was obtained from the output of the Sequence Specific Retention Calculator (SSRCalc) (http://hs2.proteome.ca/SSRCalc/SSRCalc.html; accessed: December 2019) [18] according to the mathematical formula introduced by Darewicz et al. [19]:

$$t_{\text{R predicted}} = 0.0002 \times (t_{\text{R SSRCalc}})^3 - 0.0085 \times (t_{\text{R SSRCalc}})^2 + 1.0415 \times t_{\text{R SSRCalc}} + 8.6434 \tag{2}$$

where:

tR SSRCalc: retention time (min) calculated with Sequence Specific Retention Calculator (SSRCalc).

The specific parameters implemented in the software to obtain tR SSRCalc, like the retention time of the substance not adsorbed on the column (parameter "a"), the parameter dependent on the acetonitrile gradient (parameter "b"), pore diameter, column, and TFA concentration, were as described by Iwaniak et al. [9].
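Equation (2) is a cubic polynomial in the SSRCalc output and can be evaluated as follows. This is a minimal sketch; the function name `predict_retention_time` is hypothetical, and the input value used in the example is arbitrary rather than taken from the study.

```python
def predict_retention_time(t_ssrcalc: float) -> float:
    """tR predicted (min) from the SSRCalc retention time (min), Equation (2)."""
    return (0.0002 * t_ssrcalc ** 3
            - 0.0085 * t_ssrcalc ** 2
            + 1.0415 * t_ssrcalc
            + 8.6434)


# Arbitrary illustrative input: an SSRCalc value of 10 min.
# 0.2 - 0.85 + 10.415 + 8.6434 = 18.4084 min
print(round(predict_retention_time(10.0), 4))
```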

### **3. Results**

### *3.1. In Silico Analysis*

Many data concerning molecules can be found in databases [5]. The BIOPEP-UWM database of sensory peptides and amino acids [20] was used to analyze the in silico fragments released from soybean proteins. These peptides were searched for the presence of shorter bitter-tasting fragments, including indicators. The results are shown in Tables 1 and A1 (in the Appendix A). They include only released fragments that (a) contained bitter-tasting motifs and (b) consisted of four amino acid residues at a minimum. Shorter fragments were not discussed. They were di- and tri-peptides that could already match the sequences with a known taste sensation (i.e., bitterness).
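The two selection criteria, (a) presence of a bitter-tasting motif and (b) a minimum length of four residues, can be sketched as a simple filter. The function name `select_parent_peptides` is hypothetical, the motif set is a small subset of motifs mentioned in this article (the complete collection resides in BIOPEP-UWM), and the fragment SSSS is an invented negative example.

```python
def select_parent_peptides(fragments, bitter_motifs, min_length=4):
    """Keep fragments with at least min_length residues and >= 1 bitter motif."""
    selected = []
    for fragment in fragments:
        if len(fragment) < min_length:
            continue  # di- and tri-peptides are not treated as parent peptides
        if any(motif in fragment for motif in bitter_motifs):
            selected.append(fragment)
    return selected


# Small illustrative subset of bitter-tasting motifs named in the article.
motifs = {"VF", "LG", "EEN", "VI", "RG"}
# VFDG, EENL and VIRG occur in the article; SSSS is a made-up control.
released = ["VFDG", "LG", "EENL", "VIRG", "SSSS"]
print(select_parent_peptides(released, motifs))  # ['VFDG', 'EENL', 'VIRG']
```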

**Table 1.** Number of parent peptides, including those containing bitter-tasting indicators, found in soybean protein sequences.



<sup>1</sup> B, bromelain; F, ficin; P, papain; PK, proteinase K. The complete list of peptide sequences is presented in the Appendix A (Table A1).

Regardless of the substrate and enzyme applied, all released fragments were potential sources of motifs matching the sequences collected in the BIOPEP-UWM database and known in the literature as bitter-tasting. This was the premise for defining them as parent peptides. Taking into consideration all enzymes applied in the study, the highest number of parent peptides was potentially released from β-conglycinin (α-chain), while profilin was the protein yielding the smallest number of peptides. Generally, the best in silico potential to produce parent peptides from all soybean protein sequences was represented by bromelain and papain. The other enzymes, i.e., ficin and proteinase K, produced fewer parent peptides. However, in most cases, proteinase K was predicted to be slightly better than ficin considering the number of peptides potentially released from soybean proteins. The most abundant potential sources of parent peptides containing at least one bitter-tasting indicator were: profilin (7 sequences, enzyme used: bromelain), 7S globulin (16 sequences each, enzymes used: bromelain and papain), glycinin (17 sequences, enzyme used: bromelain), β-conglycinin (β-chain) (18 and 17, enzymes used: bromelain and papain, respectively), and β-conglycinin (α-chain) (23 sequences, enzyme used: bromelain). The highest total number of parent peptides with encrypted bitter-tasting indicators, regardless of the enzyme used, was determined in the β- and α-chain of β-conglycinin (43 and 53 out of 78 and 93 generally released, respectively). Considering the enzymes that produced such parent peptides (i.e., with encrypted indicators), their ranking was as follows: bromelain > papain > ficin > proteinase K. The criterion in this ranking was the total number of parent peptides containing at least one indicator of bitter taste (i.e., the higher the number, the higher the place in the ranking).

Dipeptides rather than tripeptides with or without the status of a bitter-tasting indicator were the great majority of motifs found in the parent peptides. Both two and three amino acid fragments were observed in the following peptides: ERPG-RP, RPG (source: profilin hydrolyzed with ficin); RQLEENLVVFDLA-**VF**, **LE**, LV, DL, EEN (source: 7S globulin hydrolyzed with proteinase K); SRPG-RP, RPG (source: 7S globulin hydrolyzed both with bromelain and ficin); EENL-EEN (source: 7S globulin hydrolyzed with proteinase K); VEENICTMK-**VE**, EEN (source: glycinin hydrolyzed with bromelain); ESEGGL-EGG, GL, **GGL**, EG-EGGSV-GR, EGG-EG; EGGL-EGG, EG, **GGL**, GL (source: glycinin hydrolyzed with proteinase K); LLLPH-**LL, LL, LLL**; PFPSILG-FP, PF, PFP, **LG**, IL (source: β-chain of β-conglycinin hydrolyzed with papain); EENL-EEN (source: β-chain of β-conglycinin hydrolyzed with proteinase K); EIPRPRPRPQHPEREPQQPG-RP, RP, RP, PR, PR, PR, **EI**; EEDEDEQPRPIPFPRPQPRQEEEHEQREEQEWPRK-RP, RP, PR, PR, PR, PR, FP, PF, PFP, **PIP**; SEEEDEDEDEEQDERQFPFPRPPHQK-PP, RP, PR, FP, FP, FPF, PF, PFP (source: α-chain of β-conglycinin hydrolyzed with both bromelain and ficin); LLLPHFNSK-**LL, LL, LLL**; PVVVNA-**VVV**, VV, VV (source: α-chain of β-conglycinin hydrolyzed with bromelain); PIPFPR-PR, FP, PF, PFP, **PIP** (source: α-chain of β-conglycinin hydrolyzed with papain); QFPFPR-PR, FP, FP, FPF, PF, PFP; PNTLLLPNH-**LL, LL, LLL**; LLLPH-**LL, LL, LLL**; and PVVVNA-**VVV**, VV, VV (source: α-chain of β-conglycinin hydrolyzed with proteinase K). Motifs in bold were the fragments with the status of an indicator.

Some of the parent peptides released in silico from proteins contained bitter-tasting motifs, with or without the status of indicators, that fully or almost fully overlapped their whole sequences. They were, for example, encrypted in: VIRG-**VI, RG** (profilin, enzyme used: bromelain); EITLG-**LG, EI** (7S globulin, enzyme used: bromelain); DVLVIPLG-**VL, VI, LG**, LV (glycinin, enzymes used: bromelain and papain); VVLY-VV, **VL** (glycinin, enzyme used: bromelain); VVFK-**VF**, VV (glycinin, enzyme used: papain); EGGL-EGG, EG (glycinin, enzyme used: proteinase K); DIFL-**FL, IF** (α- and β-chain of β-conglycinin, enzyme used: ficin); FVDA-FV, **VD**, DA (α- and β-chain of β-conglycinin, enzyme used: papain); VLFG-**VL**, FG, LF; VIVE-**VI, VE**, IV; PFPSILG-FP, PF, PFP, **LG**, IL (β-chain of β-conglycinin, enzyme used: papain); and NILE-**LE**, IL (α-chain of β-conglycinin, enzyme used: papain).

### *3.2. Monitoring the Process of SPC Hydrolysis*

Taking into account the ranking of enzymes suitable for soybean protein hydrolysis, as well as the number of peptide bitter-tasting indicators produced, the next step was to produce soybean hydrolysates to compare in silico and in vitro results. Five samples representing 0-SPC, B-SPC, F-SPC, P-SPC, and PK-SPC were subjected to RP-HPLC separation to observe the progress in SPC hydrolysis (see Figure 1 and Table 2). Three major time segments could be distinguished in all chromatograms, namely: 0.00–13.99, 14.00–39.99, and 40.00–60.00 min. The first time segment contained the highest peak eluting for about 10 min (not shown in Figure 1). It was an injection peak that could contain non-retained substances like the buffers used for protein/hydrolysate solutions and low molecular weight compounds present in protein concentrates [9].

**Figure 1.** RP-HPLC (Reversed-Phase High Performance Liquid Chromatography) chromatograms of 0-soy protein concentrate (SPC) (**a**), B-SPC (**b**), F-SPC (**c**), P-SPC (**d**), and PK-SPC (**e**). Retention time range displayed in the Figure is between 14th and 60th minute. Chromatograms were acquired at λ = 220 nm as recommended by Visser et al. [21]. Abbreviations: 0-SPC-soy protein concentrate, B-SPC, F-SPC, P-SPC and PK-SPC: bromelain-, ficin-, papain-, and proteinase K-soybean protein hydrolysate, respectively.

In our previous experiment [9], the shape and area of an injection peak were approximately the same in all chromatograms. Therefore, peaks observed between 0.00 and 13.99 min were not considered for further interpretation. Peaks eluting between 40.00 and 60.00 min corresponded to high molecular weight compounds, e.g., proteins [9,22]. On the other hand, the chromatogram of the unhydrolyzed SPC sample contained much material with a short retention time (Table 2) as compared with the unhydrolyzed milk protein concentrate analyzed in our previous experiment [9].


**Table 2.** Results of soybean protein hydrolysis as revealed by RP-HPLC (Reversed-Phase High Performance Liquid Chromatography).

<sup>1</sup> The area of all peaks between 14 and 60 min is 100% (See Figure 1); <sup>2</sup> 0-SPC, B-SPC, F-SPC, P-SPC, PK-SPC: non-hydrolyzed soy protein concentrate, soy protein concentrate hydrolyzed by bromelain, ficin, papain, or proteinase K, respectively (see the Materials and Methods); <sup>3</sup> Calculated according to Equation (1).

The area of peaks within the range of 14.00–39.99 min and 40.00–60.00 min was almost equal in this experiment (see Table 2), whereas for milk protein concentrate, the ratio of total peak area within a shorter tR range to a longer tR range was only ca. 0.05 [9]. Taking the above into account, we introduced the parameter C (Equation (1)) to compare results obtained during experiments performed with soybean proteins (this experiment) and milk proteins [9].

All chromatograms differed when looking at the time interval between 14.00 and 39.99 min. This was also reflected in the differences between the percentages of total peak areas observed in these time segments (Table 2). The order of the C parameter value was as follows: ficin > papain > proteinase K > bromelain, which suggested that ficin was the most efficient among the proteolytic enzymes used for hydrolysis of the soybean protein concentrate. The order of susceptibility of milk protein concentrate (MPC) [9] was reported to be as follows: proteinase K (C = 19.6) > ficin (C = 18.4) > papain (C = 7.8) > bromelain (C = 1.6). Ficin seemed to be efficient in the hydrolysis of both MPC and SPC. Proteinase K hydrolyzed MPC more extensively than SPC. There was a relatively high amount of unhydrolyzed proteins after the hydrolysis of both protein preparations by bromelain.

### *3.3. Identification of Peptides Likely to Be Bitter Derived from SPC Hydrolysates*

The results of the identification of parent peptides in SPC hydrolysate samples are presented in Table 3. The highest number of such peptides was identified in ficin and bromelain hydrolysates of SPC (five and four, respectively). Peptides derived from F-SPC matched the sequences originally encrypted in: 7S globulin (1 peptide), glycinin (1 peptide), and β-chain of β-conglycinin (3 peptides). In turn, in the case of B-SPC, parent peptides matched glycinin and the β-chain of β-conglycinin (two peptides each). One parent peptide was reported in papain SPC hydrolysate (source: α-chain of β-conglycinin), whereas no peptides were found in proteinase K soybean protein hydrolysate. To summarize, the ranking of enzymes by the number of parent peptides produced in vitro was: ficin > bromelain > papain. One parent sequence, i.e., LI**VI**LNG, was identified in both B-SPC and P-SPC and matched both chains of β-conglycinin analyzed. All parent peptides identified in SPC hydrolysates contained motifs both with and without the status of a bitter-tasting indicator. Taking into consideration 339 parent peptides as the total sum of the sequences predicted after in silico hydrolysis of soybean proteins and 10 parent peptides identified in vitro in SPC hydrolysates, the compatibility of the in silico and in vitro results of peptide identification was ca. 2.95%.
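The compatibility percentage follows directly from the two counts given above. A minimal sketch (the function name `compatibility_percent` is hypothetical):

```python
def compatibility_percent(n_in_vitro: int, n_in_silico: int) -> float:
    """Share (%) of in silico predicted parent peptides confirmed in vitro."""
    if n_in_silico <= 0:
        raise ValueError("n_in_silico must be positive")
    return 100.0 * n_in_vitro / n_in_silico


# 10 parent peptides identified in vitro vs. 339 predicted in silico.
print(round(compatibility_percent(10, 339), 2))  # 2.95
```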


**Table 3.** Parent peptides identified in SPC hydrolysates using LC-MS/MS (Reversed-Phase High Performance Liquid Chromatography online with tandem Mass Spectrometry).

<sup>1</sup> Bitter-tasting indicators are given in bold; <sup>2</sup> B-SPC, F-SPC, and P-SPC: bromelain-, ficin-, and papain-soybean protein hydrolysate, respectively (see the Materials and Methods); <sup>3</sup> peptides with no indicator status are given in normal font.

Parent peptides were considered as identified in SPC hydrolysate if (a) the fragment ions were observed in the particular retention time (tR) and (b) tR predicted and tR experimental differed within the range of ±10% [19,22]. Peptide identification was understood as detection of a group of fragment (daughter) ions eluted in the same retention time [19,22,23]. Some of them could be formed via the non-sequential charge-directed pathway, leading to their formation by loss of water or the ammonia neutral molecule [24].

An example of a parent peptide that was identified in SPC hydrolysate and fulfilled the above-mentioned criteria was the **FFEI**TPEK sequence (Figure 2) containing two bitter-tasting indicators: **FF** and **EI**. The **FFEI**TPEK peptide was identified in an F-SPC sample and matched the β-chain of β-conglycinin. The m/z of the precursor (M+H)<sup>+</sup> ion was 1010.500 Da. An intensive peak was observed at tR experimental = 36.200 min (tR predicted = 35.950 min). Moreover, the detection of this peptide in an F-SPC sample was confirmed by the presence of eight fragment ions that eluted within the above-mentioned time. They were as follows: X4<sup>+</sup>, C4<sup>+</sup>, B8<sup>+</sup>, B7<sup>+</sup>, B6<sup>+</sup>, A8<sup>+</sup>, Z7<sup>+</sup>, and Y5<sup>+</sup>, and their intensity ranged from several hundred (Y5<sup>+</sup> ion) to thousands of counts (kCounts; all other fragment ions).
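The precursor m/z reported above can be cross-checked by summing monoisotopic residue masses. This sketch is not the Fragment Ion Calculator used in the study; it relies on standard monoisotopic residue masses (values not given in the article) and a hypothetical helper named `mh_plus`, and it only covers singly protonated, unmodified peptides.

```python
# Standard monoisotopic residue masses (Da); not taken from the article.
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the free N- and C-termini
PROTON = 1.007276   # charge carrier for the (M+H)+ ion


def mh_plus(peptide: str) -> float:
    """Monoisotopic m/z of the singly protonated (M+H)+ ion."""
    residues = sum(MONOISOTOPIC_RESIDUE_MASS[aa] for aa in peptide)
    return residues + WATER + PROTON


print(round(mh_plus("FFEITPEK"), 2))  # 1010.52
```

The computed value agrees with the reported 1010.500 Da within the mass accuracy typical of an ion trap instrument.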

**Figure 2.** LC-MS/MS (Reversed-Phase High Performance Liquid Chromatography online with tandem Mass Spectrometry) chromatograms of total ion current (upper chromatogram: 1010.5 > 272:1021 Da) and daughter ions of the **FFEI**TPEK peptide matching the β-conglycinin (β-chain) fragment, identified among products of the soybean protein concentrate hydrolyzed by ficin. Nomenclature of daughter ions according to Roepstorff and Fohlman [25].

One peptide, i.e., DD**FFL**, with two encrypted indicators, **FF** and **FL**, was identified in F-SPC. It matched the β-chain of β-conglycinin. The most intense peak eluted at about 35 min (see Figure 3); however, the theoretical retention time calculated for this peptide was 41.170 min. The difference between tR predicted and tR experimental thus exceeded the threshold value of 10% for defining the peptide as identified [21]. The m/z of the (M+H)<sup>+</sup> precursor ion of the DD**FFL** peptide was 656.300 Da, and nine intense peaks were observed at about 35 min. They were the following fragment ions: X2<sup>+</sup>, C4<sup>+</sup>, B4<sup>+</sup>, B3<sup>+</sup>, B2<sup>+</sup>, A3<sup>+</sup>, A5<sup>+</sup>, Z2<sup>+</sup>, and Z4<sup>+</sup>. All of them occurred in thousands of counts. Thus, despite the retention time difference, the DD**FFL** peptide was considered as identified in the F-SPC sample. Some differences between predicted and experimental RT (retention time) were also observed during the microLC-ToF-MS (i.e., micro liquid chromatography time-of-flight mass spectrometry) identification of bioactive peptides derived from yoghurt. According to Kunda et al. [26], such an unequivocal identification based on retention time differences is called Type 1 identity conflict.
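The retention time criterion applied to the two peptides above can be illustrated with a minimal sketch. It assumes that the ±10% tolerance is taken relative to the predicted retention time, which the text does not state explicitly; the function name and the `tolerance` parameter are hypothetical, and the experimental value for DDFFL is rounded to the "about 35 min" quoted above.

```python
def within_rt_tolerance(t_predicted, t_experimental, tolerance=0.10):
    """Return True if the experimental retention time deviates from the
    predicted one by no more than `tolerance` (fraction of the predicted value).

    Assumption: the deviation is expressed relative to the predicted value.
    """
    relative_deviation = abs(t_experimental - t_predicted) / t_predicted
    return relative_deviation <= tolerance


# Retention times (min) quoted in the text: (predicted, experimental)
peptides = {
    "FFEITPEK": (35.950, 36.200),
    "DDFFL": (41.170, 35.0),  # experimental value given only as "about 35 min"
}

for name, (t_pred, t_exp) in peptides.items():
    deviation_percent = abs(t_exp - t_pred) / t_pred * 100
    verdict = within_rt_tolerance(t_pred, t_exp)
    print(f"{name}: deviation {deviation_percent:.1f}%, within tolerance: {verdict}")
```

Under this assumption, FFEITPEK falls well inside the tolerance window, whereas DDFFL falls outside it, in line with the description of both cases in the text.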

**Figure 3.** LC-MS/MS chromatograms of total ion current (upper chromatogram–656.3 > 181: 666 Da) and daughter ions of the DD**FFL** peptide matching the β-conglycinin (β-chain) fragment identified among products of hydrolysis of soybean protein concentrate by ficin. Nomenclature of daughter ions according to Roepstorff and Fohlman [25].

It is well documented in the literature that some biopeptides impart bitter taste, which is a limitation when thinking about their implementation in foods supporting the prophylaxis of diet-related diseases [27]. This especially concerns ACE-inhibiting peptides [28]. Some of the bitter-tasting motifs encrypted in all parent peptides identified in SPC hydrolysates showed an additional biological effect as judged by BIOPEP-UWM database screening (see Table 4). A more comprehensive list of the bioactivities of bitter peptides may be found in the Supplement to our previous review [29]. It was evidenced that the application of different proteases or the fermentation process of soybean proteins leads to unique peptide profiles exhibiting, e.g., antihypertensive [30,31], hypocholesterolemic [32], antioxidative [33], and antidiabetic [1] effects. The majority of bitter fragments occurring in parent peptides that were identified in all SPC hydrolysates were associated with ACE-, DPP-IV-, and DPP-III-inhibiting or glucose uptake-stimulating activity. ACE inhibition plays an important role in the reduction of blood pressure [34], DPP-IV inhibitors are responsible for the antidiabetic effect [35], and DPP-III inhibitors are involved in prolonging the action of endogenously released or exogenously applied enkephalins (i.e., a promising agent in chronic pain management) [36].


**Table 4.** Additional bioactivity of bitter motifs found in parent peptides identified in SPC (soybean protein concentrate) hydrolysate samples.

<sup>1</sup> Bitter-tasting indicators are given in bold, and peptides with no indicator status are given in normal font. Abbreviations: ACE, Angiotensin-converting enzyme (EC 3.4.15.1); DPP III, dipeptidyl peptidase III (EC 3.4.14.4); DPP IV, dipeptidyl peptidase IV (EC 3.4.14.5).

### **4. Discussion**

The study above was divided into two steps: bioinformatic and experimental analyses. According to Tu et al. [37], bioinformatic analyses help in better understanding biological data and hence their application in food and nutrition increases. Moreover, the fact of the presence of some unique fragments within the primary structures of peptides underlies their different biofunctions. Bioinformatic studies enable minimizing the number of experiments involved to determine the impact of the structure of a peptide on its biological function [37]. Moreover, in silico experiments are relatively easy and less costly to carry out, and they do not require reagent and sample preparation [38]. Combining the bioinformatics with an experimental verification of the results is, in turn, a kind of methodology called an integrated (i.e., hybrid) approach. This term was introduced by Udenigwe [38], and ever since, many studies on bioactive peptides, including the present one, have implemented procedures to solve a problem using databases of biological information and then involving laboratory analyses [5].

To the best of our knowledge, the hybrid and fragmentomic approaches have not been applied so far to analyze the potential of soybean proteins as the reservoir of bitter-tasting motifs. Moreover, our protocol of research was partially based on positive selection [39], i.e., finding in the protein the sequential motifs with already known biological function (i.e., provided in a database). Fu et al. [39] employed this strategy to evaluate the potential of patatin (*Solanum tuberosum*; potato protein) as a source of biopeptides. The biological dataset (peptide sequences) was acquired from the BIOPEP-UWM database. The novel aspect of our research was the use of the positive selection to define bitter-tasting motifs (especially peptide indicators) in parent sequences potentially released from soybean. According to Agyei et al. [40], the prediction of the bitter taste of foods may support the development of the procedures for masking this taste. Although such a strategy of research has some limitations, it may help discover new peptides [40]. This is a universal approach that may be applicable to in silico assessment of any protein as the reservoir of peptides with any biological function. Although the term "fragmentomics" is not widely used in publications, it represents one of the common contemporary strategies of research on bioactive peptides. Recent applications of fragmentomics concerning the bioactivity of peptides derived from various food sources have been described by Sutopo et al. [41], Alcaide-Hidalgo et al. [42], and Kinariwala et al. [43].

As was said above, the substrates for bioinformatic analyses were the sequences of five soybean proteins, which were subjected to theoretical hydrolysis using four enzymes: bromelain, ficin, papain, and proteinase K. According to the literature, soybean is widely used in human nutrition and animal feeding due to the high biological value of its proteins. Moreover, it is the least expensive source of protein, which may be helpful in solving feeding and agronomical problems [44]. However, hydrolysis of proteins, including those from soybean, contributes to the production of peptides that taste bitter and are not accepted by Western consumers, even if they offer a health-beneficial value [45,46]. Our initial bioinformatic predictions also confirmed the potential of soybean proteins as the richest sources of bitter-tasting sequences when comparing them to the other sequences of proteins derived from grains, oil, and leguminous plants (21 sequences in total; data not shown). This comparison was made based on one of the criteria serving to evaluate proteins as sources of peptides with a particular bioactivity (i.e., bitterness in this case) defined as the frequency of the occurrence of bitter peptides in a protein chain, a parameter called "A" available in the BIOPEP-UWM database (data not shown). According to the scientific reports, the higher the A, the better the potential of a protein to release peptides with a specified biofunction [47]. A similar approach was successfully followed by Kęska and Stadnik [48], who analyzed 16 sequences of porcine proteins as sources of tastant peptides. The potency of myofibrillar and sarcoplasmic proteins was estimated based on the values of a parameter called "abundance of taste-active peptides/amino acids in a protein sequence" (AR), being mathematically a twin version of the A value provided in the BIOPEP-UWM database.
AR calculations enabled a theoretical indication about which of the porcine proteins may have the strongest impact on the sensory properties of meat products [48].

Most of the endopeptidases that are applied to produce soybean protein hydrolysates are derived from microorganisms and/or plants. Among them, bromelain, papain, and ficin are reported to be more readily available on a large scale than proteases originating from animal sources [49]. Although the analysis of the scientific reports did not show any data about the possible application of the fourth enzyme, i.e., proteinase K, to produce peptides from soybean proteins, this enzyme was included in the in silico part of the study. This decision resulted from the comparison of the effectiveness of almost all enzymes available in BIOPEP-UWM (expressed as the number of peptides released) to produce potentially bioactive peptides, especially tastant peptides (data not shown). The comparison excluded pepsin, trypsin, and chymotrypsin because they are digestive tract enzymes that naturally digest proteins when ingested [46].

The results of the enzymatic release of a fragment containing bitter-tasting motifs with and without the status of an indicator are presented in Table 1. This way of data analysis fits the research trend of fragmentomics relying on ascribing the "unknown" biological function of a molecule to the known function of motifs occurring in it and is based on positive selection (see above) [7,39]. In our study, the molecule with no predicted function was a compound called the parent peptide, whereas the motif deciding its possible function was the bitter-tasting peptide. Thus, to be consistent with the fragmentomic idea, data analysis was limited to the released peptides composed of at least four amino acids. Data about the bitterness of such peptides were implemented from the BIOPEP-UWM database of sensory peptides and amino acids [20]. Based on the MLR stepwise regression analysis, some of these peptides were selected as bitter-tasting peptidic indicators, i.e., peptides with approximate predicted and experimental Rcaf. values [13]. MLR is one of the chemometric techniques that enables analyzing the relationship between the structure (i.e., sequence) of a peptide and its biological activity, which is called QSAR (i.e., quantitative structure-activity relationship) [50]. According to Chanput et al. [51], the QSAR methodology is one of the important methods for the analysis of the functionality of peptides. The results of MLR showed that the bitter taste of a peptide is attributable to the presence of some specific residues, like Leu, Ile, Val, Phe, and Tyr [13]. This observation was consistent with the statements of other authors studying bitter peptides using QSAR techniques [52].
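The motif-matching step behind the fragmentomic approach can be sketched as follows. This is an illustrative example only, not the BIOPEP-UWM implementation: the function name `find_motifs`, the constant `MIN_PARENT_LENGTH`, and the short fragment `FFL` are hypothetical, while the peptides DDFFL and FFEITPEK and the motifs FF, FL, and EI are taken from the Results.

```python
def find_motifs(peptide, motifs):
    """Return (motif, 1-based start position) for every occurrence of each
    motif in the peptide, overlapping occurrences included, sorted by position."""
    hits = []
    for motif in motifs:
        start = peptide.find(motif)
        while start != -1:
            hits.append((motif, start + 1))
            # Advance by one residue so overlapping matches are not missed.
            start = peptide.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


MIN_PARENT_LENGTH = 4  # parent peptides were limited to at least four residues
indicators = ["FF", "FL", "EI"]  # small subset of motifs, for illustration

for peptide in ["DDFFL", "FFEITPEK", "FFL"]:  # "FFL" is a hypothetical short fragment
    if len(peptide) < MIN_PARENT_LENGTH:
        print(f"{peptide}: skipped (shorter than {MIN_PARENT_LENGTH} residues)")
        continue
    print(f"{peptide}: {find_motifs(peptide, indicators)}")
```

The known function of the matched motifs (here, bitterness) is then used to suggest the possible function of the whole parent peptide, which is the core of positive selection as described above.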

Ample in silico studies have been undertaken to evaluate proteins as sources of peptides with various bioactivities using the BIOPEP-UWM database [12]. The analysis of data concerning theoretical release of peptides can be considered from several perspectives, namely: how the length of motifs affects their match to the parent peptides; how the length of the parent peptide contributes to the presence of a bioactive motif in it; and finally, how the length of protein affects the number of parent peptides to be produced. As could be seen, parent peptides that were theoretically released from soybean proteins were the sources of bitter motifs with and/or without the status of an indicator. The great majority of such motifs were dipeptides. According to Iwaniak and Dziuba [53], the match of the motif to the query sequence depends on its chain length (i.e., the shorter, the better). The potential of parent peptides as sources of peptides with a specified function can be quantitatively evaluated using the A parameter (see above). So far, the discriminant A had been successfully applied to compare proteins according to the rule: the higher the A value, the better source of biopeptides the protein of interest is [47]. As regards the parent sequences, no regularities were observed between their chain length and the increasing tendency in the occurrence of bitter-tasting motifs. An example in this case is the D**IFL**/**FL, IF** peptide (from β-conglycinin β-chain hydrolyzed with ficin). Its A value was 0.500. The ETSFHS**EF**E**EI**N peptide was derived from the same source as D**IFL** and contained two bitter-tasting motifs: **EI**, **EF**. Hence, its A value was 0.166. In the case of parent peptides, also their particular amino acid composition is crucial to predict their biological function including the likelihood of them to taste bitter. 
To exemplify it, these two criteria were fulfilled for parent peptide FSHNI**LE**TSFHSEFE**EI**NR**VL**FG/FG, LF, **VL**, **LE**, IL, **EI**, EF theoretically released from β-conglycinin β-chain hydrolyzed with bromelain. The A value for this peptide was 0.304. Finally, those protein sequences that had the longest amino acid chains represented the best potential to produce several tens of peptides. This rule also suggested the higher probability of peptide release in laboratory conditions [54]. In the case of the in silico results obtained, the highest number of parent peptides likely to be bitter due to the presence of bitter-tasting motifs was produced from α- and β-chains of soybean β-conglycinins. According to literature reports, glycinin and β-conglycinin represent about 65-85 % of the total soybean proteins, and they are known as good reservoirs of bioactive peptides [46].
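The A values quoted above can be reproduced with a short sketch, assuming that A is the number of motif occurrences in a sequence divided by the number of its amino acid residues. The function names are hypothetical, and the motif lists are those given in the text for each peptide.

```python
def count_occurrences(sequence, motif):
    """Count occurrences of a motif in a sequence, overlapping ones included."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


def a_value(sequence, motifs):
    """Frequency of motif occurrence: A = a / N, where a is the number of
    motif occurrences and N is the number of residues in the sequence."""
    a = sum(count_occurrences(sequence, motif) for motif in motifs)
    return a / len(sequence)


# Parent peptides and their bitter motifs as listed in the text
examples = {
    "DIFL": ["FL", "IF"],
    "ETSFHSEFEEIN": ["EI", "EF"],
    "FSHNILETSFHSEFEEINRVLFG": ["FG", "LF", "VL", "LE", "IL", "EI", "EF"],
}

for sequence, motifs in examples.items():
    print(f"{sequence}: N = {len(sequence)}, A = {a_value(sequence, motifs):.4f}")
```

With these inputs the ratios are 2/4, 2/12, and 7/23, which correspond to the reported values of 0.500, 0.166 (2/12 truncated to three decimals), and 0.304.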

All in silico results encouraged us to continue studies using laboratory analyses, starting from the hydrolysis of SPC. Hydrolysis was conducted for 3% (w/v) water solutions of SPC. According to Arboleda et al. [55], commercial soybean preparations available in the form of powder (as used in our study) are highly soluble in water if the protein solution does not exceed 20%. The enzyme-to-substrate ratio for all four samples to be hydrolyzed was 1:100 (protein; w/w). Such proportions of enzymes and substrate were applied by Peña-Ramos and Xiong [15] for the hydrolysis of soy protein isolate using crude enzymes, like papain, pepsin, and chymotrypsin, as well as commercial proteases to measure the antioxidant activity of hydrolysates.

RP-HPLC comparison of the percentages of the peak areas of all SPC hydrolysate samples allowed monitoring the process of their hydrolysis. The analysis of RP-HPLC chromatograms was successfully applied to observe the progress of hydrolysis of carp, herring, and milk proteins [9,22]. The changes in peak areas observed in different time intervals enabled concluding that SPC hydrolysis took place and allowed identifying parent peptides in the SPC hydrolysate samples using RP-HPLC-MS/MS.

Discrepancies were observed between the number of parent peptides that were identified in silico and in vitro, which is a common fact in the literature [56]. One of the possible reasons behind differences between the in silico and in vitro results was the fact that the sequences produced using the in silico method had to match those already present in the database [9]. The results of this work showed the successful identification of nine parent peptides in total (see Table 3) in all SPC hydrolysates, whereas the in silico simulation of hydrolysis of soybean sequences allowed obtaining 339 parent peptides. Bucholska and Minkiewicz [22] defined the likely reasons that may affect the unsuccessful identification of peptides in a hydrolysate sample. They included no detectable amount of peptide in the sample (e.g., if some peptide bonds likely to be cleaved are actually resistant to hydrolysis) and no detectable fragmentation in an ion trap mass spectrometer. Additionally, peptides identified in SPC hydrolysates are defined as proteotypic peptides, and their identification is dependent on the mass spectrometer used. Finally, there is no method that would serve for the identification of all possible peptides in a protein hydrolysate [22]. The last reason for the discrepancy might be the fact that in silico predictions assume that all peptide bonds are hydrolyzed in a protein chain.

Ficin was the most effective enzyme for producing peptides in vitro, whereas bromelain was the most effective in the theoretical production of parent peptides likely to be bitter. Enzyme rankings were as follows: bromelain > papain > ficin > proteinase K (in silico results) and ficin > bromelain > papain (in vitro results). There was no correlation between the results of RP-HPLC (C parameter from Table 2) and the number of identified peptides with bitterness indicators. Despite the differences between these rankings, the richest source of parent peptides produced by the applied enzymes turned out to be conglycinins. None of the parent peptides identified in the SPC hydrolysates matched sequences known to be bitter themselves. Moreover, none of these peptides were found to be bioactive themselves (see Table 4). Iwaniak et al. [9] identified some parent peptides likely to be bitter in milk protein hydrolysates. For example, the PFPIIV peptide matched the bitter-tasting sequence in the BIOPEP-UWM sensory peptide database (peptide ID in the database: 195). PFPIIV was known as bitter itself, contained PFP, PF, IV, **FP, II** bitter motifs, and was identified in bromelain, ficin, and papain hydrolysates of milk protein concentrate. Moreover, two milk protein-derived peptides, TTMPLW (source: papain hydrolysate) and VLPVPQK (source: bromelain hydrolysate), served additional biofunctions. Based on information found in the BIOPEP-UWM bioactive peptide database, the first sequence exhibited opioid, immunomodulating, and ACE inhibitory effects (peptide IDs in the database: 3127, 8172, and 3520, respectively). The second one was reported as an antioxidative agent (peptide ID: 7877) [9]. As was said above, bitterness associated with bioactive peptides may be an unwanted property when thinking about the production of food that is health-beneficial and sensorially attractive [57].
Hence, our concept relying on the fragmentomic idea of research was useful to: (1) find out if the parent peptides were bitter/bioactive themselves or (2) motifs like, e.g., peptidic bitterness indicators that are encrypted in a sequence exhibiting other activities, may "suggest" the bioactivity/taste of the parent sequence. This concept shows also the potential of enzymes to produce motifs supposed to be bioactive/bitter. The enzymes' potential to produce unwanted peptides may initially be also an indicative criterion of their "negative" selection (i.e., rejection) when considering the production of non- or less bitter hydrolysates. However, according to Fu et al. [39], it needs to be noted that although the theoretical predictions concerning the potential of proteins as sources of peptides and enzymes involved in their production can support further research, such predictions are never 100% consistent with experimental results. This is also true concerning our concept.

The discrepancies between in silico and in vitro results may be associated with the specificity of bioinformatic and experimental analyses. Bioinformatic issues are related to the tools used for predictions, especially databases of biological information. Many data on biological peptides are published every year. Our quick search in Scopus (accessed: January 2020) showed 188 records reflecting the number of articles that were published in 2018 (about 15.7 per month) and contained the following exact words as queries "bioactive peptides AND food proteins". The same query words were used to search for the articles published in 2019. This simple search showed 241 papers published in 2019 (i.e., 20.08 per month). Both searches were limited to articles, and they excluded conference abstracts. Thus, it may appear that there are some biological activities represented by both parent peptides and their bitter-tasting motifs that were published after this paper was released, and thus, they were not included when we ran these studies. A similar search for the latest publications concerning sensory peptides showed no new sequences with confirmed bitterness that could be downloaded to the BIOPEP-UWM database and used for further analyses. Thus, our results concerning the presence of bitter-tasting motifs in parent sequences seemed to be relatively steady. Nevertheless, it is recommended to carry out the regular update of databases [38], as well as to use curated databases, like, e.g., BIOPEP-UWM [12].

Experimental issues that possibly affect the differences between in silico and in vitro results may be related to the hydrolysis of protein. Theoretical hydrolysis via bioinformatic tools, including BIOPEP-UWM, is based on the specificity of an enzyme, and hypothetically, all peptide bonds are accessible to enzymatic cleavage [58]. The cleavage of peptide bonds is one of the descriptors characterizing the nature of the enzyme [59]. However, the effectiveness of enzymatic hydrolysis depends also on the temperature, the pH, the enzyme-to-substrate ratio [46], and the types of the molecules affecting the inhibition of the enzyme [60], which are difficult parameters to be included as the "elements" of programs serving for theoretical proteolysis. Moreover, such predictive hydrolysis is relatively simple, especially when the protein is not chemically modified. For example, glycosylated amino acids present in a protein may block the breakdown of a peptide bond in experimental conditions [61]. Additionally, the computer simulation of hydrolysis does not consider the complexity of the protein structure that might hinder its interactions with proteolytic enzyme [58]. Rawlings [62] added to this list some other factors likely to affect the results of proteolysis carried out in experimental conditions, i.e.: location of the enzyme and substrate in different extra- and intra-cellular regions and the involvement of inhibitors. They are not considered by programs for theoretical hydrolysis. In turn, Ashaolu [46] highlighted that proteolysis of soybean carried out in experimental conditions enabled the reduction of waste, establishing the mild conditions of reaction, avoiding unwanted reactions; however, the obstacle is the cost of enzymes, as well as the recovery and/or removal of bitter taste [46].

Despite the discrepancies between the in silico and in vitro results of the hydrolysis of proteins aiming to produce biopeptides, scientists use the hybrid approach more and more often [63]. Iwaniak et al. [9] applied this protocol to hydrolyze milk protein concentrate using the same enzymes as in the study above. They showed that the results of experimental and theoretical analyses differed. Twenty-eight peptides with the potential to affect the bitter taste of hydrolysates were produced in laboratory conditions, which was consistent by ca. 12% with computer simulations (226 parent peptides identified in silico). Darewicz et al. [19] used the in silico/in vitro/ex vivo analyses to study the potential of salmon (*Salmo salar*) proteins as the source of ACE inhibitory peptides. To verify the in silico results in laboratory conditions, protein hydrolysis was carried out in vitro (with commercial enzymes) and ex vivo (using digestive juices from volunteers). It was confirmed that some ACE inhibitors were identified in both hydrolysates; however, some of them were observed either in the ex vivo or in the in vitro salmon protein hydrolysate [19].
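The compatibility percentages quoted for the two studies can be checked with a brief calculation. The sketch assumes that compatibility is the number of parent peptides identified in vitro expressed as a percentage of those predicted in silico; this definition is inferred from the reported numbers rather than stated in the text, and the function name is hypothetical.

```python
def compatibility_percentage(n_in_vitro, n_in_silico):
    """Share of in silico predicted parent peptides confirmed in vitro (%)."""
    return 100.0 * n_in_vitro / n_in_silico


# (identified in vitro, predicted in silico), as reported in the text
studies = {
    "soybean protein concentrate (this study)": (9, 339),
    "milk protein concentrate [9]": (28, 226),
}

for label, (n_vitro, n_silico) in studies.items():
    print(f"{label}: {compatibility_percentage(n_vitro, n_silico):.2f}%")
```

The two ratios, 9/339 and 28/226, are consistent with the values of less than 3% and ca. 12% given for soybean and milk proteins, respectively.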

To recapitulate, the in silico and in vitro results also differed in our experiment concerning soybean proteins as the material. Nevertheless, they still can be supportive when predicting the bitter-tasting properties of parent peptides based on the fragmentomic approach. Citing the words of Li et al. [64], "the BIOPEP-UWM database is a useful data source that supports the analysis of connections between peptide molecular structures and their sensory properties, and additionally enables users to build a potential sensory profile of peptides". The BIOPEP-UWM database was applied to predict the umami taste of peptides derived from clam *M. meretrix Linnaeus* [64]. Garcia-Vaquero et al. [45] applied the hybrid approach to assess the bitter taste properties of novel peptide ACE inhibitors derived from *U. lactuca* (green algae). To this end, green algae proteins were in vitro (enzyme used: papain) and in silico (tool used: PeptideCutter) hydrolyzed. The bitterness of peptides that were identified in the hydrolysates of *U. lactuca* was predicted based on the calculation of the Q-value introduced by Ney [65]. The Q-value is empirically associated with the hydrophobicity of each amino acid in a peptide sequence. Such a correlation is defined as the Q-rule, according to which the peptide likely to be bitter should have Q over 1400 cal/mole [66]. Based on the Q-rule, it was found that the following green algae-originating sequences: SA**GVL**PWK (**GV**, **VL**), GAAPTPPS**PPP**AT**KP**STP**PKP**PT (**PK**, **KP**, **PPP**), IECC**LL**FALV (**LL**), PVGCLPK (**PK**), DAVEIWRVK (**VE**, **EI**), DEVIPGAL (**VI**), **PKP**PALCN (**PK**, **KP**), and PPNPPNPPN, were characterized with Q-values ranging from 1440 to 1743.33 cal/mole, which suggested they could taste bitter [45].
Defining these sequences as "parent peptides" and implementing the fragmentomic idea, we found that seven out of eight such peptides contained motifs (in bold) described as bitter-tasting indicators (provided in brackets). Working the opposite way, we calculated Q-values (cal/mole) for all parent peptides that were experimentally identified in SPC hydrolysates. They were as follows: 1611.43 (LS**VI**S**PK**), 1793.75 (D**VLVI**P**LG**), 1780.00 (LI**VI**LNG), 1721.66 (NP**FL**FG), 1358.33 (ISSTIV), 1908.33 (PQMIIV), 2220.00 (PFPSIL), 1760.00 (DD**FFL**), and 1741.25 (**FFEI**TPEK). Only one peptide (ISSTIV) had the Q-value below 1400 cal/mole. Based on the Q-value, the other peptides identified in SPC hydrolysates containing bitter-tasting motifs with and without the status of an indicator could be classified as bitter. However, some studies on the bitterness of peptides indicated that some of them with the Q-value lower than 1400 cal/mole also tasted bitter. An example of such a peptide is the E**VL**N sequence (Q-value = 1162.5 cal/mole) identified in cheddar cheese and matching αs1-casein as a protein precursor [67]. Moreover, taking into account our point of view, this peptide contains the **VL** bitter-tasting motif, which may be crucial in defining the taste of the whole sequence. Another aspect related to parent peptides likely to be bitter is their size [68]. According to the literature, peptide fractions of soybean protein hydrolysates with molecular weight ranging from 1.9 to 3.3 kDa were characterized by the highest bitterness, whereas the bitterness of other fractions with lower or higher molecular weight than indicated above was milder [68]. Our parent peptides differed in the molecular weight. Thus, the strategy for the elimination of the unwanted taste of a hydrolysate should also be based on the knowledge concerning the amino acid composition of peptides [49]. This point of view fits the fragmentomic idea of the research presented in this article.
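The Q-values listed above can be recomputed with a minimal sketch, assuming that Q is the mean side-chain hydrophobicity of the residues in a peptide. The hydrophobicity values (cal/mole) in the table below are not given in this article; they are the commonly cited values used with Ney's Q-rule, restricted to the residues occurring in the peptides discussed here, and should be treated as an assumption. The names `DELTA_F`, `Q_THRESHOLD`, and `q_value` are hypothetical.

```python
# Side-chain hydrophobicity (cal/mole), one-letter residue codes.
# Assumption: commonly cited values used with Ney's Q-rule; only the
# residues needed for the peptides below are included.
DELTA_F = {
    "G": 0, "S": 40, "T": 440, "D": 540, "E": 550, "M": 1300, "K": 1500,
    "V": 1690, "L": 2420, "P": 2620, "F": 2650, "I": 2970, "N": -10, "Q": -100,
}

Q_THRESHOLD = 1400  # cal/mole; peptides above this value are expected to be bitter


def q_value(peptide):
    """Average hydrophobicity of a peptide: sum of residue values / chain length."""
    return sum(DELTA_F[residue] for residue in peptide) / len(peptide)


# Parent peptides identified in SPC hydrolysates, plus EVLN from cheddar cheese
peptides = [
    "LSVISPK", "DVLVIPLG", "LIVILNG", "NPFLFG", "ISSTIV",
    "PQMIIV", "PFPSIL", "DDFFL", "FFEITPEK", "EVLN",
]

for peptide in peptides:
    q = q_value(peptide)
    label = "above" if q > Q_THRESHOLD else "below"
    print(f"{peptide}: Q = {q:.2f} cal/mole ({label} the threshold)")
```

With this table, only ISSTIV and EVLN fall below 1400 cal/mole, which agrees with the values reported in the text (the Q-value of NPFLFG is 1721.67 when rounded rather than truncated).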

The example of positive selection [39] (see above) can be also applicable to enzymes used for the hydrolysis of protein to release peptides. Briefly, the enzyme that is in silico the most effective at producing bioactive peptides may be selected for the hydrolysis of proteins in experimental conditions. This allows identifying the most effective enzymes releasing peptides likely to be bitter (i.e., possessing an unwanted property) from soybean proteins. When thinking about bitterness, such a strategy of research may be the starting point for negative selection, which is the elimination of such enzymes to avoid the production of peptides with the undesired taste. However, the elimination of the enzyme might be problematic if it produces many bitter sequences exhibiting an additional, health-beneficial effect. In such a case, some further procedures should be considered to reduce the bitterness of bioactive hydrolysates. An example of such a procedure is the treatment with exopeptidases [69]. Cheung et al. [69] hydrolyzed whey protein isolates with different commercial endopeptidases and obtained a bitter-tasting ACE-inhibitory hydrolysate. To reduce its bitterness, the hydrolysis was extended with exopeptidases treatment. The taste of this hydrolysate was then more acceptable because of the removal of terminal residues as a result of exopeptidase action [69].

To summarize, our idea of research showed that the potential property of peptides released from proteins might be predicted based on the presence of sequential motifs with already known function (so-called positive selection). This helps understanding some relationships between the structure of a molecule and its activity (e.g., bitter taste). It needs to be noted that although it is possible to prognose the potential of released peptides to taste bitter, such a statement should be confirmed in laboratory conditions [70].

Our methodology might be an effective and universal way to analyze the biological functions of peptides found in protein hydrolysates. Although each of the methodologies contributing to the discovery of bioactive peptides has some limitations and specific character, the strategies involving the wide range of in silico approaches combined with conventional studies can play an important role in the generation and discovery of new peptides [71]. Thus, the employment of several methods for the theoretical prediction of the bitterness of a molecule is inscribed into this trend and is highly recommended.

### **5. Conclusions**

Results concerning the production of peptides likely to be bitter derived from soybean proteins showed the discrepancies between in silico and in vitro results. These discrepancies concerned the number of parent peptides released when applying theoretical and experimental hydrolysis, as well as the ranking of enzymes effective in the production of peptides possessing di- and tri-peptide motifs defined as indicators. The compatibility percentage of in silico and in vitro results was less than 3%. However, all parent peptides that were identified in SPC hydrolysates contained motifs having or not the status of bitter-tasting indicators. Their Q-values suggested they could taste bitter. Thus, it could be concluded that the fragmentomic idea of research analyzing the composition of a parent sequence might be a supportive tool for the prediction of the taste of foods. Moreover, it broadens the knowledge about the structural nature of peptides as bioactive/tastant molecules. However, it is postulated that such statements concerning the peptides likely to be bitter should be evaluated experimentally.

**Author Contributions:** Conceptualization, A.I.; methodology, A.I., M.H., and P.M.; investigation, A.I., M.H., J.B., P.M., and M.D.; resources, M.D. and P.M.; writing, A.I. and M.H.; writing, review and editing, A.I. and M.D.; funding acquisition, M.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** Project financially supported by the Minister of Science and Higher Education in the range of the program entitled "Regional Initiative of Excellence" for the years 2019-2022, Project No. 010/RID/2018/19, amount of funding 12,000,000 PLN, as well as the funds of the University of Warmia and Mazury in Olsztyn (Project No. 17.610.014-110).

**Conflicts of Interest:** The authors declare no conflict of interest.

### **Appendix A**

**Table A1.** Parent peptides released in silico from soybean proteins containing bitter-tasting peptides with and without the indicator status (bold and normal font, respectively).





<sup>1</sup> B, bromelain; F, ficin; P, papain; PK, proteinase K.

### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Agricultural Entrepreneurship in the European Union: Contributions for a Sustainable Development**

### **Vítor João Pereira Domingues Martinho 1,2**


Received: 24 February 2020; Accepted: 17 March 2020; Published: 19 March 2020

**Abstract:** Entrepreneurship is sometimes seen as a glimmer of hope which may bring about some contribution towards improving economic dynamics and performance, specifically in the creation of employment by young people, in general, with further educational training, greater flexibility and who are better prepared for working with new technologies. However, entrepreneurship in the agricultural sector is, in certain circumstances, viewed as being something incompatible or, at least, difficult to implement. More scientific studies in these fields could provide interesting contributions on the road to highlighting new ideas inside the farming sector. In this framework, the objective of this study is to explore the entrepreneurship dimensions within the European Union agriculture towards a more sustainable sector. In fact, without an economic dimension in farm management, its sustainability in the medium and long run may be compromised, increasing the abandonment of farming, namely in more disadvantaged regions. For this, the literature which is available on the platform Web of Science relating to the following three topics was initially analysed: entrepreneurship, agriculture, and the European Union. This literature was clustered through the VOSviewer software, an interesting tool for performing bibliometric analysis. Secondly, statistical information related to European Union agricultural entrepreneurship considering empirical approaches was also explored. The analysis carried out shows that the realities across European Union countries are, in fact, different, where the instruments from the common agricultural policies, for example, may play a crucial role in promoting more farming entrepreneurship in a more sustainable way.

**Keywords:** VOSviewer software; bibliometric analysis; statistical analysis; agricultural innovation

### **1. Introduction**

Bringing about new ideas is a fundamental approach in every sector. This is particularly important in agriculture, considering its specificities and lower capacity to sometimes create innovation, and in less favoured contexts of the European Union, such as in rural areas, frequently suffering from a lack in dynamics. In some European Union countries, such as Portugal, many things have already been done in this manner, namely with European financial funds, but there is still much to do. This is primarily due to the fact that for many years, the European agricultural policies within the framework of the Common Agricultural Policy (CAP) are or were socially unjust (favouring larger farms) and economically inefficient (conditioning farmers to opt for the most subsidised productions) in some member-states. On the other hand, the European agricultural strategies could be more directed towards promoting agricultural entrepreneurship and leadership. This is a typical problem which stems from having common policy instruments for a set of countries and regions with great differences amongst them. In any case, these frameworks have limited the potential for the development of farms which are located in certain regions [1].

The consequences of this are the vast differences in the levels of development across farms from different European countries and sometimes across farms from diverse regions within the same member-state. Another question, in addition or in parallel to the economic performance, refers to the discussion about the social and environmental contributions of the farms, namely in less affluent regions, where the agricultural sector, specifically family farming, provides a decisive contribution towards balanced development. However, it would be interesting if the model of this family farming inside the European Union were to be rethought, as some of this agriculture is practiced by older farmers or by farmers who, with their current levels of income, will probably decide in the near future to abandon this sector and the regions in which they live [1].

In this way, it is fundamental to bring about new approaches and sometimes to look at things from a different perspective in order to renew/refresh the agricultural sector, specifically in regions with a greater risk of abandonment. Innovation and entrepreneurship should play an important part here, not only in the agro-food sector, but namely in the production sector (agrarian sectors). It is specifically important to attract younger generations as well as the most qualified professionals.

Considering this context, this study aims to highlight the main insights available in the scientific literature related to agricultural entrepreneurship in the European Union. To explore these insights in more depth, the literature review was complemented with a bibliometric analysis. Data and empirical analyses were also performed to better explore the actual realities in these domains. These approaches made it possible to find a set of proposals to improve the sustainability of farms in the European Union regions.

### *Further Explanation of the Research Approach*

This subsection aims to clarify the following aspects: What is the main contribution of the paper? How does the existing literature miss the role of entrepreneurship? Does the EU sufficiently support this problem in its policies? How exactly is entrepreneurship defined and measured?

This research intends to bring more insights for the understanding of agricultural entrepreneurship, specifically, for the context of the European Union. There are interesting contributions about these topics, as highlighted in the literature review, but there is still enormous potential to be explored, because agricultural entrepreneurship is a topic that does not attract as much attention from the several stakeholders as in some other sectors. In fact, it is important to further explore the scientific literature available on the Web of Science platform. It is also important to analyse the statistical information available for some fields considered by the literature as relevant to agricultural entrepreneurship, such as, for example, those related to women and young people. In turn, it is relevant to show how these variables influence agricultural performance in the European Union, namely, for instance, to eventually propose policy adjustments.

Following these motivations, the bibliometric analysis was considered, namely, to highlight the main insights of the scientific literature and to support the organization of the literature review. The information obtained with the bibliometric analysis and literature review was considered to identify the main variables related to these domains to be explored through data analysis and econometric approaches.

The concept of agricultural entrepreneurship was considered in all its dimensions. In fact, agricultural entrepreneurship is important for more competitive farms, to strengthen their position in the market, but also for more family-oriented farms, to improve their socioeconomic and environmental contributions. In practice, entrepreneurship presupposes innovation and new ideas at every stage, from production to final consumption.

### **2. Material and Methods**

This study aimed to identify the main gaps in the scientific literature on the topics under analysis and the main factors that influence agricultural entrepreneurship within the European Union. With these objectives, the intention was to provide further insight into the design or redesign of strategic plans that promote entrepreneurship in the agricultural sector whilst taking advantage of the available resources, specifically from agroforestry land. To this end, the scientific literature on the subjects analysed was explored through bibliometric analysis and a literature survey, so as to highlight how entrepreneurship may be further developed inside the agricultural framework. The bibliometric analysis is an interesting approach, specifically, to support the organization of this research. Subsequently, statistical information was examined through descriptive (data analysis) and empirical (regressions based on the Cobb–Douglas model) analyses, in order to stress the impacts of entrepreneurship variables on social and economic dimensions. The Cobb–Douglas model (production function) makes it possible to analyse relationships between several production factors and the output, which explains its relevance within this study. This approach was followed so as to interconnect the literature survey on agricultural entrepreneurship in the European Union with the empirical reality verified in the several member-states, as mirrored by variables related to these topics and available in the main statistical databases (namely Eurostat). It was considered important to present these interrelationships, and the selection of variables took into account the insights from the literature analysis (where, for example, the role of women and younger people in farm management was stressed, as well as, for instance, the sustainability of farms).

### **3. Bibliometric Analysis of the Literature Available on the Web of Science**

In this section, the literature was first analysed through the VOSviewer software (Nees Jan van Eck and Ludo Waltman, Leiden, The Netherlands) [2], considering the scientific studies available on the Web of Science platform [3], which was accessed through the University of Burgos (Spain), where we stayed for a week on an Erasmus+ mission. On this scientific platform, 89 studies were found (including 76 articles, 20 meetings and 3 books) in a search performed at the end of May 2018 that included the topics: entrepreneurship; agriculture; European Union. After this initial analysis, the scientific studies are explored further through a literature review in a subsequent subsection. It is worth stressing that this kind of analysis for the agricultural sector follows studies such as that developed by Martinho [4], where bibliometric analysis proved to be an interesting tool with relevant outcomes.

### *3.1. Literature Analysis through the VOSviewer*

Considering a minimum of 5 occurrences of a term across all documents, the VOSviewer software selected the 70 terms presented in Table 1, with the respective number of occurrences and relevance score. This minimum was chosen because, after several simulations, it was the value that gave the greatest relevance to the main terms. The relevance score indicates which terms were most representative of the topics analysed [2]. It is worth stressing that, despite the importance of agricultural policies for the questions related to farming entrepreneurship, as stressed before, the literature seems to have given them little relevance, as shown in the bibliometric analysis performed through the VOSviewer software (in Table 1 the term "policy" appears with a low relevance of 0.51). These aspects related to agricultural policy will be explored at the end of this study, considering the findings obtained from the bibliometric and statistical analyses.

To improve the interpretation of the map, terms with a relevance below 1.00 were excluded, with the exception of terms related to countries (Spain and Greece) and the European Union. Terms such as rural development and sustainability were also maintained. The map with all the terms is presented in Figure 1, where it is possible to identify 4 groups.
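The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not the VOSviewer implementation: the `Term` class, the `select_terms` function and the sample occurrence counts are hypothetical, and only the thresholds (5 occurrences, relevance 1.00) and the quoted relevance scores for "women" (1.51) and "policy" (0.51) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    name: str
    occurrences: int
    relevance: float


# Terms kept regardless of their relevance score, following the exceptions in the text.
ALWAYS_KEEP = {"spain", "greece", "european union", "rural development", "sustainability"}


def select_terms(terms, min_occurrences=5, min_relevance=1.00):
    """Keep terms that occur often enough and are either relevant or explicitly exempted."""
    kept = []
    for term in terms:
        if term.occurrences < min_occurrences:
            continue  # below the minimum number of occurrences
        if term.relevance >= min_relevance or term.name.lower() in ALWAYS_KEEP:
            kept.append(term)
    return kept


# Hypothetical sample; only the relevance of "women" and "policy" is quoted from the text.
terms = [
    Term("women", 6, 1.51),
    Term("policy", 9, 0.51),
    Term("European Union", 11, 0.80),
    Term("Spain", 5, 0.70),
    Term("innovation", 4, 1.20),
]
selected = [t.name for t in select_terms(terms)]
print(selected)
```

In this made-up sample, "policy" is dropped for low relevance and "innovation" for too few occurrences, while the country and European Union terms survive through the exception list.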

The terms considered by the software for each group are presented in Table 2. By analysing Tables 1 and 2 and Figure 1, it is possible to observe that in group 1 the terms for the European Union are those with a greater number of occurrences (11) and women is the term with the most relevance (1.51). For group 2, the term combination presents a higher occurrence and the term mean has greater relevance. In group 3, the term the Netherlands has greater occurrence and the term multifunctional agriculture shows higher relevance. Finally, for group 4, the terms sustainability and difference are those with greater occurrence and relevance, respectively.

On the other hand, it is important to stress the proximity (relatedness) of terms such as: European Union, rural development and, for example, Greece in group 1; combination, place, attitude, industry and, for example, Spain for group 2; the Netherlands, transition, multifunctional agriculture and, for example, management for group 3.


**Table 1.** Number of occurrences and respective relevance of each term.

**Figure 1.** Map containing all the terms.


**Table 2.** Terms included in each group.

### *3.2. Literature Review*

In this subsection for the literature analysis, several scientific documents related to agricultural entrepreneurship were grouped considering the terms previously identified for the four clusters presented in the previous subsection, namely in Table 2.

### 3.2.1. The European Union and Agricultural Entrepreneurship

The agricultural context in the European Union is, indeed, complex, considering the diversity of realities between countries and regions; however, these frameworks are sometimes considered as benchmarks for other countries [5] because there are relevant examples. In turn, some contexts of agricultural development in Europe were also influenced in some parts of history by other global realities, specifically the American one [6] in globalized trade. These scenarios have their implications in the dynamics of entrepreneurship in the agricultural sector, characterized by their specific particularities within several economic activities [7]. Specifically regarding entrepreneurship amongst women, it is necessary to highlight its importance as a specific field with many potentialities, namely in female empowerment and in the promotion of local resources, activities and endogenous productions [8–10].

In a new paradigm of rural development in the European Union, the various forms of European support for the creation of small businesses provide interesting contributions towards entrepreneurship in rural regions [11]; however, some barriers, namely administrative ones, continue to complicate the process [12]. The perception of the numerous stakeholders concerning entrepreneurship can also condition its implementation [13]. Due to the European Union's support, specifically for multifunctional agriculture and market globalization, there has been a rise in entrepreneurial attitudes amongst European farmers [14], or this has at least had an influence on farm organization and farmers' perspectives [15]. In addition, investments in research and education (specifically educational training) from the several agricultural stakeholders have helped to promote innovation and entrepreneurship in the farming sector and this increases the performance in agriculture [16–19].

### 3.2.2. Some Concepts Associated with Farming Entrepreneurship

The attitudes and perspectives of the several stakeholders (entrepreneurship is sometimes seen as something distant, meant for others, that can disturb the status quo) are decisive for effective farming entrepreneurship, and benchmarking may play an interesting role here [20], because it allows farmers to see other realities where entrepreneurial practices are implemented with success. In any case, the economic impact of innovation and entrepreneurship initiatives is not yet totally clear in some sectors and regions [21]. Nevertheless, professional skills and technological/entrepreneurial/developmental competences are fundamental for the promotion of entrepreneurship and innovation [22,23], namely in rural areas where job availability is limited [24]. Self-confidence and good planning for businesses and investments are crucial for success in entrepreneurship [25]. Social capital (social networks, participation in agricultural institutions and access to information) also has its importance [26]. Information, communication and technology (ICT) may be a useful way to promote and increase farming and rural entrepreneurship [27]; however, there is still some work to be done in these fields, namely to overcome several constraints that complicate their utilization by farmers and other agricultural stakeholders [28]. The same happens with other new technologies, such as nanotechnology [29,30].

The local cultural and historical contexts condition the decisions of farmers and this has an influence on the way the several activities are developed [31]. On the other hand, a social perspective of farmers and constructive personal characteristics can positively influence agricultural entrepreneurship [32]. In any case, the organization of employment and working conditions has its influence on the business and entrepreneurship dynamic [33]. Gender is another factor with an influence on entrepreneurship characteristics and motivation [34–36], as well as the age of the farmers in question [37]. The social construct concerning the relationships between the rural and urban areas [38] sometimes influences the dynamics developed within the several frameworks.

In the context of crisis, agricultural entrepreneurship is, in general, an alternative way to reduce unemployment through self-employment [39,40].

### 3.2.3. Multifunctional Farming and Agricultural Entrepreneurship

There are several activities complementary to agricultural production that can be developed in rural areas [41], some even within the farms, such as agro-tourism [42], organic farming [43,44] and direct marketing [45], where, for example, multifunctional agriculture may be an option, from the perspective of farmers, producers and entrepreneurs [46]. Nonetheless, multifunctional agriculture and innovative activities on farms are not free from criticism in some European contexts [47]. Aquaculture in some specific contexts may bring contributions to this multifunctionality; however, some constraints should be carefully analysed and solved [48]. In this multifunctional role of farms, bioenergy production may be a good example, in favourable contexts [49–51], as well as heating entrepreneurship in rural Finland [52]. Another question is the multiple businesses of the farm owners [53,54] that may promote the adoption of innovative and entrepreneurial options. In turn, the agricultural sector is fundamental for industrial performance [55], namely for the industries closely related to agriculture.

Agricultural entrepreneurship is often related to the diversification of activities in farms, where networking is crucial to promote changes within businesses [56–58] and to promote exchanges in experience [59]. However, this networking between the several agricultural and rural stakeholders is not always symmetric and does not provide benefits for everyone [60]. Trust, engagement and reciprocity amongst the several agents are important for success [61] and for creating environmental entrepreneurship [62], as well as the concerns for ethics [63].

In any case, the multifunctional land organization needs interdisciplinary approaches involving the several stakeholders [64] with the same objectives [65]. However, sometimes the transition from family farming to entrepreneurial management is associated with more stress for farmers, where agricultural policies are one of the causes of stress [66–68].

### 3.2.4. Agricultural Sustainability and Entrepreneurship

The relationships between farming sustainability and agricultural entrepreneurship sometimes depend on the form in which the sector is organized [69]. The sustainability of farms is a concern in several countries, namely in those with more environmental problems [70]. A balanced and sustainable relationship between farms and their surrounding context is the main goal for several agricultural stakeholders [71]. Entrepreneurship may bring about interesting contributions to a balanced relationship among the economic, social and environmental dimensions [72,73]. For entrepreneurial and sustainable farms, institutions play a crucial role, namely the cooperatives [74] and the universities for technological transfer [75], as well as the rural policies [76,77]. Social entrepreneurship in rural regions [78] and social farming [79–81] are interesting perspectives for farming and rural sustainability. Solving social and environmental problems is the main goal for several farmers [82], or, at least, it should be [83], namely for those who practice agriculture in disadvantaged regions and receive subsidies to stay there. Another example of agricultural contributions towards sustainability is urban agriculture, as a form of food production, occupation for unemployed persons and creation of skills in a process of lifelong learning [84], where there are economic, social and environmental concerns [85].

The farmers who remain in the less affluent regions and smaller farms, some with low profitability, have determinant importance for regional sustainability [86]. The current world contexts call for virtuous circles in sustainable landscape management [87] and for new forms of dealing with these new realities [88], where agroforestry has its place [89]. Innovation in farms brings about important insights for sustainability and animal welfare [90,91]. The aversion to change and to implementing new approaches is one important barrier against improving overall sustainability [92,93].

### **4. Data Analysis for Agricultural Entrepreneurship in the European Union**

This section aims to complement the preceding bibliometric analysis. The data available in Eurostat [94] will be analysed, considering the data most closely related to agricultural entrepreneurship in European Union regions (NUTS 2) for 2016 (one observation per region), namely (Figure 2): the number of farms; the utilized agricultural area (hectare); the standard output (euro); the directly employed labour (annual working unit); and the number of farms whose household consumes more than 50% of the final production. These variables are important to understand the current and potential context around European agricultural entrepreneurship. On the other hand, these variables are important to perform the regressions with the Cobb–Douglas model (where the output is regressed as a function of the labour and capital inputs). The standard output was considered as a dependent variable and the utilized agricultural area and the number of farms (as proxies for the capital) and labour were used as independent variables. Having said that, the database used does not present data for Italian regions for all the variables considered, nor for some regions regarding the number of farms whose household consumes more than 50% of the final production.

Considering the relevance outlined by the literature towards the influence of aspects related to gender and age in agricultural entrepreneurship, the number of farms managed by males and females and by different age groups will also be analysed (Figure 3). In fact, as referred to before, gender is an important factor with an influence upon entrepreneurship characteristics and motivation [34–36], as well as the age of farmers [37].

To better understand the distribution of the values from the different variables across the European Union regions, shapefiles were considered obtained from the Eurostat [95] and worked upon with the QGIS [96] and with the GeoDa [97]. Several maps were created considering the GeoDa percentile methodologies. In these maps, the dark blue is for the percentile with lower values and the dark red is for the percentile with higher values. To improve the presentation of the figures, the overseas regions (Guadeloupe, Guyane, La Réunion, Mayotte and Martinique) were removed from the maps.
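The percentile classification behind such maps can be illustrated with a short sketch. The class breaks below (1%, 10%, 50%, 90%, 99%) are an assumption about how a six-class percentile map is typically binned, not a setting reported in the text, and the regional values are made up rather than taken from Eurostat.

```python
# Class breaks in percent; assumed here to mirror a six-class percentile map.
BREAKS = [1, 10, 50, 90, 99]
LABELS = ["<1%", "1-10%", "10-50%", "50-90%", "90-99%", ">99%"]


def percentile_classes(values):
    """Assign each region a class based on the share of regions with a lower value."""
    data = list(values.values())
    n = len(data)
    classes = {}
    for region, x in values.items():
        rank = 100.0 * sum(1 for v in data if v < x) / n  # percent of regions below x
        idx = sum(1 for b in BREAKS if rank >= b)          # number of breaks reached
        classes[region] = LABELS[idx]
    return classes


# Hypothetical values for 200 made-up regions (not Eurostat data).
farms = {f"R{i}": float(i) for i in range(1, 201)}
classes = percentile_classes(farms)
print(classes["R1"], classes["R100"], classes["R200"])
```

In a map built this way, the lowest class would take the dark blue and the highest class the dark red described above.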

Figure 2 shows that the regions of Sud-Vest Oltenia, Sud-Muntenia and Nord-Est (all from Romania) are those with greater numbers of farms. It is also worth stressing that regions from Portugal, Spain and others from the nearby countries of Romania (Greece, Croatia, Hungary, Poland and Lithuania) have relatively high values for the number of farms. This context reflects, in some cases, the small size of the farms. It is on these smaller farms that innovation and entrepreneurship may play a relevant role.

On the other hand, the regions with a greater utilized agricultural area are located in Spain (Andalucía, Castilla-la Mancha and Castilla y León). Other regions, for example, from Spain, Portugal, France, Ireland, the United Kingdom and Romania have a relatively high agricultural area. In some of these regions, the large number of hectares is a consequence of the high number of farms, although with a low average area.

Regarding the standard output, Andalucía (Spain), Bretagne and Pays de la Loire (France) are the regions with the best performance. However, when we look at area productivity (standard output per hectare), the highest values go to the Dutch regions. The Dutch farming sector is always a specific context, considering its land particularities that allow other kinds of agricultural organization. Concerning labour productivity, the highest values appear in regions from the United Kingdom and Denmark. This structure emphasizes farm performance in northern European regions.

**Figure 2.** Some further variables associated with agricultural entrepreneurship in the EU.

**Figure 3.** Number of farms disaggregated by gender and age of the managers.

The values for the labour force employed directly by the agricultural sector have a distribution similar to that described for the number of farms, showing that a high number of farms is, in some circumstances, synonymous with small size and little mechanization. The Romanian regions with a higher number of farms and agricultural workers are also those with a greater number of farms whose household consumes more than 50% of the final production. Malta, Madeira (Portugal) and Merseyside (the United Kingdom) seem to be the regions with the highest amount of labour per hectare. Other regions from Portugal (Norte, Centro and Algarve) also present relatively high values for labour per area. These high labour values per hectare are good from a social point of view, but they also show that there is work to be done to make the social and economic dimensions more compatible. In fact, some of these farms are located in mountainous or disadvantaged areas and managed from a family perspective, but, even here, measures can be taken for better-adjusted management, which calls for agricultural innovation and entrepreneurship. In any case, the social and environmental contributions of these farms are unquestionable and clearly justify the financial support available in the European Union for these contexts. However, these subsidies could be more closely linked to innovative and entrepreneurial management, while maintaining the social and environmental role of this agriculture. Without this innovative approach for farms located in disadvantaged regions, the consequence, in the medium and long term, will be abandonment.

Regions from the countries of the southern European Union (Portugal, Spain, France and Greece) and from the countries of central and Eastern Europe seem to be those with more area, number of farms, labour and, in some cases, standard output. However, when the area and labour productivities are analysed, the greater performance is verified in regions from the northern countries (the Netherlands and Denmark).

Disaggregating the number of farms by gender and age of the managers, Figure 3 shows that Dutch farms are mostly managed by men and the Finnish agricultural units are managed by women. Women also play a relevant role in farm management in some regions of Germany, Poland, Austria, Romania, Latvia, Lithuania and in the north of Portugal and Spain. Considering the importance of women for more entrepreneurial management, their role should be rethought in the European Union, including from a policy perspective.

Younger managers (less than 25 years old) appear in farms of regions from Slovakia, as well as from France, Austria, Bulgaria, Poland, Finland and Ireland. Germany, Austria and France are the countries with regions where there are more managers aged 25–34 years old. It is also worth stressing that regions from Poland have a relevant number of farms managed by people between the ages of 25 and 34 years old. Regions from Austria and Poland are those with more farms managed by farmers in the 35–39 age group. The farms with managers between 40 and 44 years old appear more in Finnish regions, and those with managers between 45 and 64 years old in regions from the central European countries (around France, Germany and the United Kingdom, for example). The greatest number of farms with the oldest managers (more than 65 years old) appears in the Portuguese regions, as well as in regions from Romania, Bulgaria, Greece and the United Kingdom. Also, taking into account the role of young people in the agricultural sector, the several CAP instruments should be redesigned to be more effective in bringing young people to the farms, namely in countries where this context is more problematic.

### **5. Results for Cross-Section Regressions**

Considering the data available in the Eurostat database (all variables in logarithms), the standard output (euros) was regressed, through cross-section regression for 2016 and across the European Union regions (NUTS 2), as a function of the labour directly employed (AWU), the utilized agricultural area (hectare) and the number of farms (Equation (1)), taking the Cobb and Douglas [98] model as a base. The utilized agricultural area and the number of farms were considered as proxies for the capital. The number of farms has the advantage of being disaggregated in the database by gender and age, two important questions highlighted by the literature.

$$
\text{so}_j = \alpha_0 + \alpha_1 \text{labour}_j + \alpha_2 \text{uaa}_j + \alpha_3 \text{nf}_j + \varepsilon \tag{1}
$$

where *so* is the logarithm of standard output, *labour* is the logarithm of labour directly employed, *uaa* is the logarithm of utilized agricultural area, *nf* is the logarithm of number of farms and *j* indexes the European Union regions.
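As a rough illustration of how a log-linear specification of this kind can be estimated, the sketch below fits ordinary least squares through the normal equations using only the Python standard library. It is not the authors' Stata procedure. The regional records and the coefficients in `TRUE` are hypothetical and serve only to generate a noise-free synthetic output that the estimator should recover.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical regions: (labour in AWU, utilized agricultural area in ha, number of farms).
regions = [
    (12000, 300000, 9000),
    (45000, 250000, 60000),
    (8000, 900000, 5000),
    (30000, 1200000, 20000),
    (15000, 150000, 30000),
    (60000, 700000, 80000),
]

# Assumed coefficients (alpha_0..alpha_3), used only to build a synthetic log output.
TRUE = (3.0, 0.6, 0.5, -0.2)

# Design matrix: constant, log labour, log area, log number of farms.
X = [[1.0, math.log(lab), math.log(uaa), math.log(nf)] for lab, uaa, nf in regions]
so = [sum(c * v for c, v in zip(TRUE, row)) for row in X]

alpha = ols(X, so)
print([round(c, 4) for c in alpha])
```

Because all variables enter in logarithms, each estimated slope can be read as an elasticity of the standard output with respect to the corresponding input.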

### *5.1. Stressing the Cobb–Douglas Model Adequacy for Agriculture*

It is important to highlight that the selection of variables took into account the original Cobb–Douglas model, the literature review carried out before and the need to avoid problems of multicollinearity. The Cobb–Douglas model, with proper adjustments, has been considered in analyses of the agricultural sector by several authors, namely for efficiency surveys with data envelopment analysis [99] or through stochastic frontier analysis [100].

Specifically for agriculture in the European Union and with the most diverse approaches (including efficiency analysis), several studies have considered the Cobb–Douglas model from the theory of production, taking into account different databases with micro- or macroeconomic statistical information. For example, Aggelopoulos et al. [101] explored data related to production factors and output from 80 Greek pig farms through the Cobb–Douglas developments. These authors highlighted the relevance of the results obtained with the Cobb–Douglas production function and the adequacy of this approach for the agricultural sector. The relevance of the model in terms of economic and agronomic dimensions was also stressed by Gornott and Wechsung [102]. Bille et al. [103] considered microeconomic data from the Italian Farm Accountancy Data Network and used variables such as area and labour as inputs. Galdeano-Gomez et al. [104] used financial data from 56 Spanish farming-marketing cooperatives to analyse the externalities from sustainability on agricultural productivity, taking the Cobb–Douglas model as a base. Martinho [1] considered the Cobb–Douglas developments to analyse the common agricultural policy impacts on the dynamics of the Portuguese agricultural sector. Utnik-Banas et al. [105] analysed the technical efficiency of some Polish broiler production farms, taking the Cobb–Douglas model as a base and considering as inputs, for example, several costs, labour and fixed capital. In fact, in these models, namely when regression approaches are considered, it is important to limit the number of variables to avoid statistical problems, specifically multicollinearity. Typically, the inputs taken into account are labour and capital (or proxies for it), with other variables added in extended versions.
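For reference, the link between a Cobb–Douglas production function and a regression that is linear in logarithms can be written out as follows. This is a generic textbook sketch; the symbols $Y_j$ (output), $L_j$ (labour), $K_j$ (capital or a proxy for it), $A$, $\beta_1$ and $\beta_2$ are placeholders introduced here for illustration and are not the notation of any of the cited studies.

$$
Y_j = A \, L_j^{\beta_1} K_j^{\beta_2} e^{\varepsilon_j}
\quad \Longrightarrow \quad
\ln Y_j = \ln A + \beta_1 \ln L_j + \beta_2 \ln K_j + \varepsilon_j
$$

Under this form, each coefficient is an output elasticity, $\beta_1 = \partial \ln Y_j / \partial \ln L_j$, and the sum $\beta_1 + \beta_2$ indicates the returns to scale (constant when the sum equals 1).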

### *5.2. Regressions and Results Analysis*

Several cross-sectional regressions were estimated using Stata [106] procedures, some with the number of farms disaggregated by the gender and age group of the manager. The results are presented in Table 3. Labour and utilized agricultural area are control variables from the Cobb–Douglas model, whereas the variables for the number of farms disaggregated by gender and age group of the manager are decision variables, chosen in light of the previous literature analysis to capture age and gender effects on agricultural output.

Table 3 reveals that the agricultural output is, indeed, explained positively by the number of agricultural workers and the area, and negatively by the number of farms, showing that, in general, regions with more farms have smaller scale economies and less output. To check for possible multicollinearity among the independent variables, the results were compared across models, for example models 1 and 2; this comparison seems to suggest an absence of this statistical problem. On the other hand, according to the Breusch–Pagan test for heteroscedasticity and the Ramsey RESET test, the more statistically consistent models are those with the number of farms disaggregated by age groups, specifically for younger managers.
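The kind of log-linear, Cobb–Douglas-based regression described above can be sketched as follows. This is only an illustration under stated assumptions: the regional figures, the elasticities in `TRUE_COEFS` and the helper names `solve_linear_system` and `ols` are all hypothetical, and the article's own estimates were obtained with Stata, not with this code. The output is generated without noise from the assumed elasticities, so the fit should simply return them.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical regional data: (labour, utilized area, number of farms).
regions = [
    (120.0, 900.0, 40.0),
    (300.0, 1500.0, 210.0),
    (80.0, 2500.0, 15.0),
    (450.0, 700.0, 320.0),
    (200.0, 4000.0, 60.0),
    (60.0, 1200.0, 90.0),
]

# Assumed coefficients: intercept, labour, area, number of farms.
TRUE_COEFS = (2.0, 0.6, 0.5, -0.3)

# Taking logarithms turns the multiplicative model into a linear one.
log_inputs = [tuple(math.log(v) for v in r) for r in regions]
log_output = [
    TRUE_COEFS[0] + sum(c * v for c, v in zip(TRUE_COEFS[1:], li))
    for li in log_inputs
]

coefs = ols(log_inputs, log_output)
print([round(c, 3) for c in coefs])
```

In this sketch a positive coefficient on labour and area and a negative one on the number of farms mirror the signs reported in the text; the magnitudes are invented.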


Model 11 (Corrected): this model was corrected by replacing the number of farms managed by persons aged over 65 with the cube of that variable.

In any case, the Variance Inflation Factor (a test for multicollinearity) was computed and, in fact, the results suggest the absence of collinearity (all the values are below 10). On the other hand, to solve the problems related to heteroscedasticity and omitted variables, several alternatives were tried, namely the translog model; however, in the simulations performed, the results were not statistically significant. The variable related to the number of farms was therefore transformed in the models that failed the Breusch–Pagan and Ramsey RESET tests. Initially, for each model with heteroscedasticity and omitted variable problems, the variables from the translog model were simulated, and some of the best statistical results were those where the respective number of farms variable was squared. However, this transformation was not enough for the models with statistical problems related to farms managed by older farmers. In these cases, the respective number of farms variable was cubed. With these transformations, it was possible to confirm the impact of the several variables on the standard output and to compare results between models.
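As a rough illustration of the collinearity check mentioned above, the Variance Inflation Factor of regressor $j$ is $1/(1 - R_j^2)$, where $R_j^2$ comes from regressing that regressor on all the others. The sketch below computes it in plain Python; the three data columns and the function names are hypothetical, and the threshold of 10 is the rule of thumb quoted in the text, not a value derived here.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(rows, y):
    """R^2 of an OLS regression of y on rows (intercept included)."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, xi)) for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def variance_inflation_factors(columns):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns."""
    vifs = []
    for j, target in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        rows = list(zip(*others))
        vifs.append(1.0 / (1.0 - r_squared(rows, target)))
    return vifs


# Hypothetical logged regressors for eight regions.
log_labour = [4.79, 5.70, 4.38, 6.11, 5.30, 4.09, 5.01, 5.86]
log_area = [6.80, 7.31, 7.82, 6.55, 8.29, 7.09, 7.60, 6.95]
log_farms = [3.69, 5.35, 2.71, 5.77, 4.09, 4.50, 3.90, 5.10]

vifs = variance_inflation_factors([log_labour, log_area, log_farms])
for name, vif in zip(("labour", "area", "farms"), vifs):
    # Values below 10 are conventionally read as no serious collinearity.
    print(name, round(vif, 2), "ok" if vif < 10 else "check")
```

A squared or cubed regressor, as used for the corrected models, could be passed to the same function as an additional or replacement column.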

In Model 6, for farms with managers from 25 to 34 years of age, the number of farms seems to influence the standard output less negatively (−0.270) than for the other younger age groups. These findings suggest that the problems related to the number of farms (and probably related to the scale economies of the farms and their performance) may be better solved when the farmers are younger people, aged from 25 to 34 years. For balanced rural development and for farm sustainability, where the economic dimension may not be the only determinant aspect in farm management, it will be important to attract younger, more qualified and more innovative people with greater entrepreneurial capacity. In fact, innovation and entrepreneurial skills allow for a deeper exploitation of the multifunctionality of farms, promoting both social and environmental dimensions, with interesting returns for the farmer, as noted earlier in the literature review.

In turn, farms managed by women influence the agricultural output more negatively than those managed by men. This shows that there is still a long way to go across the whole of the European Union agricultural sector in improving the role of women on farms, namely by promoting greater empowerment to overcome historical and sociological barriers.

Finally, amongst the older farmers (more than 45 years old), the farms managed by people between 55 and 64 were those with the most negative impact on agricultural performance. The experience accumulated by older farmers is important for agricultural dynamics; however, age is sometimes an impediment to innovation and entrepreneurship.

### **6. Discussions**

This study was designed and planned with the aim of analysing entrepreneurship in the European Union agricultural sector, exploring the literature available on the Web of Science platform and related to the three topics: entrepreneurship, agriculture, and the European Union. This literature was further explored with bibliometric approaches through the VOSviewer software. These topics were also explored statistically, considering empirical methodologies.

The literature review shows that some terms such as gender, age, multifunctional agriculture, sustainability and rural development, for example, are important expressions when they are analysed as topics related with agricultural entrepreneurship in the European Union. These are relevant insights because the role of women and young people may, indeed, make a difference on European Union farms. These aspects may, in certain circumstances, be obvious findings, but have not been yet fully addressed by the several stakeholders, namely policymakers. In fact, women have an increasing role in society and, consequently, in farm management. Younger people have more training and inclination to use new technologies, namely those related to information and communications technology. If we want to do things differently and with greater return, then multifunctional agriculture and innovation are interesting approaches. In all these contexts, we cannot forget sustainability and a balanced development, where we unite economics, the environment and social aspects. On the other hand, countries such as The Netherlands are, also, important terms in this kind of analysis. In these frameworks related to agricultural entrepreneurship, the environments across the European Union countries are really quite diverse, where the common agricultural policy may have a more effective influence in reducing asymmetries. Indeed, the bibliometric analysis, namely the information presented in Table 1, reveals that the questions related to agricultural policies are frequently referred to in the literature relative to agricultural entrepreneurship in the European Union (the term "policy" has an occurrence of 13, a number which resides amongst the higher values), but with a relatively low relevance (0.51). Considering these findings, it will be important that the several stakeholders, namely policymakers, design policy instruments which have a greater impact on agricultural entrepreneurship in the different European member-states, so as to increase the relevance of the interrelationships between agricultural policies and entrepreneurship.

The data analysis shows that European Union countries from the southern and eastern regions have a greater number of farms, more utilized agricultural area and more labour and, in consequence, in some contexts, have smaller farms with fewer technological resources. These contexts require more adjusted strategies that promote more entrepreneurial management, because the social and environmental dimensions of these farms are important, but should be balanced with the economic dimension. However, it is the farms from the north which have higher labour and area productivities. In some cases, these farms could perhaps be considered benchmarks for the remaining European sector. In turn, regions from Finland, for example, have more women and younger people as managers. The regions from Portugal are those where the managers are older (more than 65 years old). Indeed, these questions should be addressed in the design of agricultural policies, namely by further promoting the importance of women and younger people in farm management and by improving farming productivities in certain member-states. Low productivities may be an obstacle to creating more added value and, consequently, to attracting more qualified, innovative and entrepreneurial people.

The results obtained with the cross-section regressions, considering a model based on a Cobb–Douglas production function, reveal that the standard output is positively influenced by the agricultural workers and the area, whilst it is negatively influenced by the number of farms. Again, these findings may seem obvious, but they continue to deserve special attention from policymakers, because of the productivity weaknesses. In turn, improvements are needed in the output of small farms, and entrepreneurial approaches may bring relevant contributions here. In addition, the results confirm the importance of the age groups for the farms' performance. Another question to be taken into account by policymakers is the need to distinguish clearly between agricultural economic entrepreneurship, agricultural social entrepreneurship and agricultural environmental entrepreneurship. It will also be important to define the sectors and regions where each one of these forms of agricultural entrepreneurship is more likely. In fact, the literature shows that, for agricultural entrepreneurship, economic aspects are determinant, but so too are the sustainability and multifunctionality dimensions for integrated rural development. Empowering female farmers and bringing together farmers' experience and innovation will be another big challenge.

### **7. Conclusions**

As a final remark, it is worth stressing that entrepreneurship dynamics in the agricultural sector are influenced by its particularities and often follow a pattern different from those observed in other economic sectors, such as industry. For example, a great number of firms in industry is often seen as a sign of good dynamics, whereas in the agricultural sector a high number of farms negatively influences the regional standard output (because a greater number of farms is, frequently, synonymous with lower scale economies and lower dynamics).

Regarding agricultural and rural policies, this study has brought to light some interesting contributions to the discussion about these topics, namely by highlighting the importance of more effective strategies that promote the several dimensions of agricultural entrepreneurship (economic, social and environmental) within the European Union. This is, in fact, a gap in the CAP instruments, which could be redesigned to address these dimensions more deeply. In the present version of the CAP strategies, the environmental dimensions, for instance, are clearly addressed, but innovation and entrepreneurship could be addressed more specifically. For example, something like the "Greening" instrument of the first pillar could be created for farming entrepreneurship. There are already incentives to attract young farmers (in the first and second pillars of the CAP), but they could be redesigned to be more effective, namely, to keep these farmers in the sector for longer periods. Another aspect in which the agricultural policies could be rethought is the role of women in the sector and their empowerment.

On the other hand, the importance of younger generations and women for agricultural entrepreneurship has also been stressed. However, it will be important to compare these results, in future studies, with those that may be obtained with other topics and other variables. For example, analysing the relationships between agricultural entrepreneurship and the activities of R&D or further analysing the interrelationship between agricultural entrepreneurship and agricultural, agro-environmental, and rural policies in the European Union, may provide interesting future contributions. Agricultural entrepreneurship has great potential to be explored. The consideration of other variables, some without information in the main statistical databases, but which may be obtained through the implementation, for example, of surveys in representative farms, will allow for a deeper examination of the technical orientation and business model for farms. Considering analysis by European Union country and other approaches in order to build the variables (ordinal variables, for example) could be another interesting suggestion for future research.

In any case, to address these and other approaches in future studies related to agricultural entrepreneurship, some recent reviews are suggested, such as those performed by Fitz-Koch et al. [107], Wuepper and Lybbert [108] and Dias et al. [109]. Other studies may also bring further interesting insights, such as Morris et al. [110] and Dias and Franco [111].

**Funding:** This work is financed by national funds through FCT – Fundação para a Ciência e Tecnologia, I.P., under the project UID/Multi/04016/2019. Furthermore, we would like to thank the Instituto Politécnico de Viseu and CI&DETS for their support. This work is supported by national funds, through the FCT – Portuguese Foundation for Science and Technology under the project UID/SOC/04011/2019.

**Acknowledgments:** I would like to thank the Erasmus+ Programme of the European Union, the Polytechnic Institute of Viseu, Portugal, and the University of Burgos, Spain.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

### **References**


© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article*
