## *4.1. QSAR*

One of the main challenges when producing a hydrolysate from a mix of proteins is to elucidate which of the multiple structures present in the hydrolysate is responsible for the biological effects observed during in vitro or in vivo tests, and to establish their mechanisms of action. When studying antioxidant peptides and considering only dipeptides, there are potentially 400 different structures, accounting for all the possible pairings of the 20 amino acids. However, when studying oligopeptides (2–20 amino acids in length) [99], this variability can reach over 1.07 × 10<sup>39</sup> possible structural combinations; thus, bioinformatic methods such as QSAR can support the identification of bioactive peptides [100].
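The growth of sequence space with peptide length can be illustrated with a short sketch. It assumes linear peptides built only from the 20 proteinogenic amino acids; the function name `count_sequences` is illustrative and does not come from the cited works.

```python
# Number of distinct linear sequences that can be built from the
# 20 proteinogenic amino acids for a given peptide length.
# Assumption: no modified residues, no cyclic peptides.
N_AMINO_ACIDS = 20


def count_sequences(length: int) -> int:
    """Return 20**length, the number of possible sequences of that length."""
    if length < 1:
        raise ValueError("length must be a positive integer")
    return N_AMINO_ACIDS ** length


# Print the size of the sequence space for a few illustrative lengths.
for n in (2, 3, 5, 10):
    print(f"{n:>2} residues: {count_sequences(n):.3e} possible sequences")
```

For dipeptides this gives the 400 combinations mentioned above, and every additional residue multiplies the count by 20.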

QSAR is an in silico method that takes peptides and their biological activities from pre-existing databases, such as BIOPEP, and aims to understand the link between these structures and their activity towards different biological targets [101]. The process flow of QSAR is represented in Figure 2.

**Figure 2.** Process flow of QSAR applied to bioactive peptides. Content of the image adapted from Nongonierma and FitzGerald [101].

When sourcing bioactive peptides for a QSAR model, it is important to note that peptides can inhibit their targets through different mechanisms, namely competitive, non-competitive, and uncompetitive inhibition. If a peptide library does not record the type of inhibition exerted by its peptides, the ability of the QSAR model to estimate IC50 is severely skewed [101]. The sourced peptides are used to build a model that identifies the key commonalities in their structure and composition and links them to a bioactivity of interest. A portion of the peptides in the data set is randomly selected and left out of the training set; these form the test set, which is used at a later stage of the process. The training peptides are used to identify the structural features responsible for the interactions with the target, to single out the peptides with the most advantageous structural features, and to establish prediction scores for these structures [101]. These known peptides teach the software what to look for when unknown peptides are submitted to the model, and the peptides used to create the QSAR model should be of a similar size to the peptides being analyzed [101].
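The random hold-out of a test set described above can be sketched as follows. This is a minimal illustration, not the procedure of any specific QSAR package: the function name, the 20% test fraction, and the example records (placeholder sequences with made-up activity values) are all assumptions.

```python
import random


def split_peptides(records, test_fraction=0.2, seed=0):
    """Randomly hold out a fraction of (sequence, activity) records.

    Returns (training_set, test_set). The seed makes the split repeatable.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data: sequences and activity values are invented for
# illustration only and do not come from BIOPEP or any other database.
records = [("AG", 12.0), ("GV", 30.5), ("VL", 8.1), ("LP", 55.0), ("PA", 19.3)]
training_set, test_set = split_peptides(records)
print(len(training_set), "training peptides,", len(test_set), "test peptides")
```

Keeping the test peptides out of model building is what allows them to serve as an independent check of the predictions later in the workflow.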

Kumar et al. [102] studied novel ACE inhibitory peptides and the large variation in results caused by the variable length of peptides with ACE inhibitory activity. The authors proposed that, within a library, peptides sharing the same mechanism of action for a particular bioactivity should be classed by size, and that a separate QSAR model should be produced for each class to increase the accuracy of the results. They classified ACE inhibitory peptides by length: <3 amino acids, small peptides (4–6 amino acids), medium peptides (7–12 amino acids) and large peptides (>12 amino acids) [102].

Different scales and descriptors can be used to define the features that make a certain peptide bioactive. The choice of descriptors is important, as an excessive number of descriptors introduces background noise, leading to overfitting and a loss of predictive accuracy [103]. These descriptors are usually physicochemical characteristics, as in the scale described by Hellberg et al. [104], which uses 29 physicochemical descriptors to characterize the amino acids. The authors condensed these descriptors into three principal components, known as the 3 Z approach, which describe hydrophilicity (Z1), steric properties (Z2) and electronic properties (Z3) [104]. This approach was extended by Sandberg et al. [105], who characterized 87 amino acids and added two further components, Z4 and Z5, to describe other properties of the amino acids, such as heat of formation, electronegativity and electrophilicity.
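One common way of turning such per-residue scales into model input is to concatenate the descriptor values of each residue in sequence order. The sketch below illustrates that idea under loud assumptions: the table `PLACEHOLDER_Z` contains invented numbers, not the published z-scale values, and the function name is hypothetical.

```python
# Placeholder three-component descriptors (Z1, Z2, Z3) per residue.
# WARNING: these numbers are invented for illustration and are NOT the
# published z-scale values; substitute the real scale before any real use.
PLACEHOLDER_Z = {
    "A": (0.0, -1.0, 0.0),
    "G": (2.0, -2.0, 1.0),
    "V": (-2.0, -1.0, 0.5),
    "L": (-3.0, 0.0, -0.5),
    "P": (-1.0, 0.5, 2.0),
}


def encode_peptide(sequence):
    """Concatenate per-residue descriptors into one flat feature vector."""
    features = []
    for residue in sequence:
        if residue not in PLACEHOLDER_Z:
            raise ValueError(f"no descriptors available for residue {residue!r}")
        features.extend(PLACEHOLDER_Z[residue])
    return features


# A dipeptide yields 2 x 3 = 6 features, a tripeptide 3 x 3 = 9.
print(encode_peptide("VL"))
print(len(encode_peptide("GPA")))
```

With this encoding the length of the feature vector depends on the length of the peptide, so peptides of different sizes do not share the same set of columns.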

From these rankings, multiple regression models can be built to evaluate the bioactive potency of peptides on the basis of the interaction between the peptide and its target. These evaluations are normally performed against a positive control, such as glutathione for antioxidant peptides; compounds that score similarly to or higher than the control may be better candidates for in vitro testing [101]. After the model has been run and the IC50 predicted, the predictions must be validated using the test set of peptides that was set aside at the beginning. Moreover, the model has to be confirmed by testing the highest-ranking peptides in a laboratory-based assay and verifying that the IC50 predicted by the model matches the experimental data [101]. If the model predicts the activity accurately, the peptides of interest are ranked by potency, synthesized, and experimentally tested to compare the results with those of the QSAR model [101].
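Agreement between predicted and measured activities on the test set can be quantified in several ways. The sketch below uses the coefficient of determination and the root-mean-square error; the choice of these two metrics and the example values are assumptions made for illustration and are not prescribed by the cited works.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted activities."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are all identical")
    return 1 - ss_res / ss_tot


# Invented activity values for a held-out test set (illustration only).
observed = [1.2, 2.5, 3.1, 4.8]
predicted = [1.0, 2.9, 3.0, 4.5]
print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"R^2  = {r_squared(observed, predicted):.3f}")
```

The same functions can be applied again once laboratory IC50 values become available for the highest-ranking peptides.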

The third step in building a QSAR model is the selection of a mathematical model to relate the physicochemical characteristics and the position of the amino acids at the C- and N-termini of the tested peptides with those of peptides with known bioactivity. The models usually chosen are partial least squares regression (PLSR), iterative double least squares (IDLS), artificial neural networks (ANN) and multiple linear regression (MLR). It is important that the chosen model accounts for whether the peptides being screened are of the same length as those in the training set or whether peptides of a variety of sizes must be accommodated [100,101].
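Of the models listed, multiple linear regression is the simplest to write out in full. The following is a minimal sketch of an ordinary least-squares fit through the normal equations; it is not the implementation used in the cited studies, the function names are hypothetical, and the training data are synthetic values generated from a known linear relationship. PLSR, IDLS and ANN models would normally rely on dedicated libraries.

```python
def fit_mlr(X, y):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X'X) b = X'y.

    X is a list of descriptor rows, y a list of activities.
    Returns the coefficients [b0, b1, ...], intercept first.
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]  # add intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear or data are insufficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for k in range(p):
            if k != col:
                factor = aug[k][col]
                aug[k] = [vk - factor * vc for vk, vc in zip(aug[k], aug[col])]
    return [aug[i][p] for i in range(p)]


def predict(coefficients, row):
    """Predict the activity of one peptide from its descriptor row."""
    return coefficients[0] + sum(c * x for c, x in zip(coefficients[1:], row))


# Synthetic training data generated from y = 1 + 2*x1 + 3*x2 (illustration only).
X_train = [[0, 0], [1, 0], [0, 1], [1, 1]]
y_train = [1, 3, 4, 6]
coefficients = fit_mlr(X_train, y_train)
print(coefficients)
print(predict(coefficients, [2, 2]))
```

Because the fit needs more training peptides than descriptors and fails when descriptors are collinear, it also illustrates why an excessive number of descriptors is a problem for regression-based QSAR.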

## *4.2. Molecular Docking*

Molecular docking is a further stage used to characterize the interactions between the target and the ligand, which in this case is the peptide. It complements QSAR modeling by providing a three-dimensional view of the interactions between the peptide and the target to which it binds, helping to explain its inhibitory effects [106].

Angiotensin-converting enzyme (ACE) has been a focus of molecular docking studies in relation to cardiovascular diseases, with the aim of better understanding the action of peptides within its domains. Mirzaei et al. [107] used the crystal structure of human ACE complexed with the inhibitor lisinopril as a template for docking studies using the software HADDOCK (see Figure 3). The authors removed all water molecules and the inhibitor from the structure while retaining the zinc and chloride atoms in the active site before proceeding with the docking [107]. As previously stated, a disadvantage of QSAR is that it depends on the amount of information available in the database. For example, it cannot identify whether a peptide binds competitively to the active site or acts on the enzyme in some other way. Applying molecular docking to the highest-ranking peptides from QSAR shows their overall binding affinity to the active site, which should help to mitigate problems caused by missing information in these databases before the highest-ranked peptides are synthesized and subjected to final confirmatory laboratory testing.
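The structure preparation described above (removing water molecules and the bound inhibitor while retaining zinc and chloride) can be sketched as a simple filter on a PDB-format file. This is an illustration only and not the workflow of Mirzaei et al. or part of HADDOCK: the function name is hypothetical, and it assumes that the retained ions carry the residue names `ZN` and `CL` and that every other `HETATM` record (water, inhibitor) should be discarded.

```python
def strip_for_docking(pdb_text, keep_hetero=("ZN", "CL")):
    """Drop HETATM records except those whose residue name is in keep_hetero.

    Relies on the fixed-column PDB layout: record name in columns 1-6,
    residue name in columns 18-20. All non-HETATM lines are kept unchanged.
    """
    kept_lines = []
    for line in pdb_text.splitlines():
        if line[:6].strip() == "HETATM":
            residue_name = line[17:20].strip()
            if residue_name not in keep_hetero:
                continue  # water, inhibitor or any other hetero group
        kept_lines.append(line)
    return "\n".join(kept_lines)


def prepare_file(input_path, output_path):
    """Read a PDB file, filter it, and write the cleaned structure."""
    with open(input_path) as handle:
        cleaned = strip_for_docking(handle.read())
    with open(output_path, "w") as handle:
        handle.write(cleaned + "\n")
```

Passing a different `keep_hetero` tuple allows other cofactors to be retained when the target requires them.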

**Figure 3.** Representation of the molecular docking results (3D and 2D) of the ACE-inhibitory peptides VL-9 (**A**) and LL-9 (**B**). Color codes are as follows: blue (van der Waals bonds), orange (salt bridge) and green (conventional hydrogen bond). Image originally published by Mirzaei, Mirdamadi, Ehsani and Aminlari [107] in Elsevier.

#### **5. Opportunities and Challenges**

There are huge market opportunities for algae as a source of protein owing to the environmental benefits [108] associated with their production, as well as their untapped potential as a source of food and food ingredients for the world's growing population. However, challenges remain, mainly in creating optimal, reproducible, and sustainable protein extraction processes, which are limited by the variable composition of the biomass and by the presence of rigid cell walls of a variable chemical nature. Moreover, all pre-treatments of the biomass, as well as the new emerging technological treatments, will have to demonstrate their economic viability in order to be adopted by industry, enabling production to be scaled up and the use of these approaches to be expanded.

In addition, further studies evaluating the activity and the chemical structure of peptides will be necessary to build upon current peptide libraries. The choice of peptide library is extremely important for the validity of QSAR models used to test unknown peptides and of molecular docking studies. There are considerable opportunities in the search for new peptide alternatives to be used as nutraceuticals with fewer adverse side effects than conventional treatments for multiple diseases. However, further studies are needed and clear mechanisms of action have to be elucidated for these applications to achieve their potential.

**Author Contributions:** Conceptualization, M.G.-V.; writing—original draft preparation, J.O.; writing— review and editing, M.G.-V., S.M. and B.K.T.; supervision, M.G.-V., S.M. and B.K.T.; funding acquisition, B.K.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Teagasc (grant number 0018), BiOrbic SFI Bioeconomy Research Centre funded by Ireland's European Structural and Investment Programmes, Science Foundation Ireland (16/RC/3889) and the Department of Agriculture Food and the Marine (DAFM) under the umbrella of the European Joint Programming Initiative "A Healthy Diet for a Healthy Life" (JPI-HDHL) and of the ERA-NET Cofund ERA HDHL (GA No 696295 of the EU Horizon 2020 Research and Innovation Programme).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.
