AMPScanner\_v1 only considers peptides ≥ 10 AA for the predictions.

Our models largely outperformed AMP-Scanner vr.1, particularly in terms of precision when detecting the specific Gram-staining types (Gram+ and Gram−). For broad-spectrum peptides, both methodologies delivered the same precision; however, our models achieved a much higher sensitivity than AMP-Scanner vr.1 and thus made more accurate predictions overall. Notably, our multi-classifier showed the best performance for the three Gram-staining classes, thus providing a valuable complement to the identification of antibacterial peptides.

The comparison with the state-of-the-art tools showed that, together with ABP-Finder, the top-ranked methods in our tests were *i*AMPred, AMP-Scanner vr.2, and AMPDiscover. These approaches were thus confirmed as suitable tools for ABP identification. Nonetheless, ABP-Finder outperformed these predictors, particularly in terms of precision. Importantly, as a distinctive feature, we complement our outcome with an estimation of the Gram-staining type of the putative targets, which can be further narrowed down to specific bacterial species, given that our models were trained with data from nine representative targets (see Dataset section). Furthermore, unlike previously published tools [24–30], we provide an estimation of our applicability domain, which lends reliability to the predicted outcome.

#### *3.5. ABP-Finder Web Server*

Our emphasis on applying regulatory principles to the development of ML-based predictors stems from our commitment to offer a freely accessible and well-maintained tool for reliably screening peptide libraries. To this end, we implemented our models in a user-friendly web server named ABP-Finder (https://protdcal.zmb.uni-due.de/ABP-Finder/ (accessed on 16 November 2022)). This tool allows thousands of peptides to be screened in a single submission job. For each entry, the ABP-Finder server delivers a prediction of the antibacterial function and reports whether the peptide lies within the AD of our models. ABP predictions are also accompanied by a Gram-staining-based estimation of the putative targets of the antibacterial peptides. Furthermore, the web server can screen regions within a long amino acid sequence to identify promising antibacterial fragments. We recently leveraged this application of ABP-Finder's models to identify antibacterial motifs within β2-microglobulin [60].
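The scanning of regions within a long sequence can be illustrated with a sliding window. The following is a minimal sketch, not the server's actual implementation: the function name `sliding_fragments` and the window limits `min_len` and `max_len` are assumptions introduced here for illustration, and the fragments it yields would still have to be scored by the predictor.

```python
from typing import Iterator, Tuple


def sliding_fragments(
    sequence: str, min_len: int = 10, max_len: int = 30
) -> Iterator[Tuple[int, str]]:
    """Yield (start_index, fragment) for every window of min_len..max_len residues.

    The window limits are illustrative assumptions, not ABP-Finder settings.
    """
    seq = sequence.strip().upper()
    for length in range(min_len, max_len + 1):
        # No window of this length fits if the sequence is shorter than it.
        for start in range(0, len(seq) - length + 1):
            yield start, seq[start:start + length]


if __name__ == "__main__":
    # Toy sequence, used only to show the enumeration order.
    for start, fragment in sliding_fragments("ACDEFGHIKLMN", min_len=10, max_len=11):
        print(start, fragment)
```

Each fragment keeps its start index, so a high-scoring window can be mapped back to its position in the parent sequence.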

#### *3.6. Virtual Screening of the Human Urine Peptidome*

In this section, we describe the successful application of ABP-Finder to screen a peptide library obtained from the human urine peptidome. The library contains 4696 endogenous peptide fragments, detected in the Core Facility Functional Peptidomics at the University Hospital in Ulm, Germany. The peptide library was screened for antibacterial activity following the workflow depicted in Figure 5.

**Figure 5.** Schematic representation of the virtual screening process carried out on a library of peptides from the human urine peptidome.

ABP-Finder was used to score the original 4696 peptides of the library, yielding 43 candidates with a probability score larger than 0.6 that were also within the applicability domain of the model. Subsequently, Blastp [61] was used to align these peptides against the known ABPs of our training samples. We excluded two hits that showed 100% identity and coverage in the alignment with previously reported ABPs and therefore had no value as newly identified peptides. Afterward, we clustered the peptide sequences using CD-Hit [62] with an identity cut-off of 90% and a minimum coverage of 90% of the shortest sequence in the alignment. This analysis produced eleven clusters, from each of which we extracted the shortest sequence as representative. Three polyproline peptides, containing no residue or only one residue other than proline, were finally discarded because we considered them unsuitable as candidates for possible lead compounds, owing to their synthetic unfeasibility and the highly homogeneous character of their sequences. The final eight candidates (Table 7) were experimentally evaluated using an agar diffusion assay, leading to one active hit, Urine-3462, against *Pseudomonas aeruginosa.*
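The filtering steps described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the pipeline used in this work: the `Prediction` record and the function names are hypothetical, the exact-match test against known ABPs is a simplified stand-in for the Blastp comparison, and the cluster labels are assumed to come from an external CD-Hit run.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Prediction:
    name: str
    sequence: str
    probability: float  # ABP probability score reported by the predictor
    in_domain: bool     # True if the peptide lies within the applicability domain
    cluster: int        # cluster label, assumed to be precomputed (e.g., by CD-Hit)


def is_polyproline(sequence: str, max_other: int = 1) -> bool:
    """Return True if at most `max_other` residues differ from proline."""
    return sum(1 for aa in sequence if aa != "P") <= max_other


def select_candidates(
    predictions: Iterable[Prediction],
    known_abps: Set[str],
    threshold: float = 0.6,
) -> List[Prediction]:
    # Step 1: keep confident predictions inside the applicability domain.
    kept = [p for p in predictions if p.probability > threshold and p.in_domain]
    # Step 2: drop sequences identical to known ABPs
    # (simplified stand-in for a 100% identity and coverage Blastp hit).
    kept = [p for p in kept if p.sequence not in known_abps]
    # Step 3: keep the shortest sequence of each cluster as its representative.
    shortest: Dict[int, Prediction] = {}
    for p in kept:
        best = shortest.get(p.cluster)
        if best is None or len(p.sequence) < len(best.sequence):
            shortest[p.cluster] = p
    # Step 4: discard polyproline representatives.
    return [p for p in shortest.values() if not is_polyproline(p.sequence)]
```

The order of the steps mirrors the text: the score and applicability-domain filters come first, so that the alignment and clustering only operate on the reduced set of candidates.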

**Table 7.** The resulting eight ABP candidates from the virtual screening of the human urine peptidome and some of their global sequence descriptors. Global peptide descriptors were calculated using the Peptide Design and Analysis Under Galaxy (PDAUG) package [63].


*#* Total molecular charge at pH = 7. \* Eisenberg scale. **&** GRAVY (Grand Average of Hydropathy) is calculated as the sum of the hydropathy values of all the amino acids, divided by the number of residues in the sequence [64]. Positive GRAVY values indicate a hydrophobic peptide; negative values indicate a hydrophilic one.
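The GRAVY definition given in the footnote can be written as a short function. This is a minimal sketch that assumes the Kyte-Doolittle hydropathy scale, on which GRAVY is conventionally based; the function name `gravy` is introduced here for illustration and this is not the PDAUG implementation used to fill Table 7.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Sum of residue hydropathy values divided by the sequence length."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)
```

Under this scale, a sequence rich in lysine and arginine yields a negative value (hydrophilic), whereas one rich in isoleucine, leucine and valine yields a positive value (hydrophobic).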

#### *3.7. Experimental Evaluation of the Reduced Set of Peptides from the Human Urine Peptidome*

To test the antimicrobial potential of the eight candidate peptides identified with ABP-Finder, a radial diffusion assay was carried out, allowing the sensitive detection of antibacterial activity. Activity was determined against various Gram-positive and Gram-negative bacterial species, including *Bacillus subtilis*, *Streptococcus agalactiae*, *Staphylococcus aureus* (MRSA), *Escherichia coli*, *Pseudomonas aeruginosa*, and *Klebsiella pneumoniae* (ESBL). While the peptide Urine-3462 was active against *Pseudomonas aeruginosa*, no relevant antibacterial activity could be detected for the other peptides at concentrations of 100 μg/mL and 1 mg/mL. Urine-3462 exhibited a dose-dependent growth inhibition of *Pseudomonas aeruginosa*, comparable to the inhibitory activity observed for the well-described antimicrobial peptide LL37 [54,65], which served as a positive control (Figure 6).

#### **4. Conclusions**

Antibacterial peptides are promising candidates for a new generation of antibiotics designed to address the challenging problem of drug resistance in bacteria. With ABP-Finder, we provide a tool that delivers top-ranked predictions, as established by several comparisons with prominent state-of-the-art ABP predictors. Remarkably, ABP-Finder produces the most precise predictions in validation tests with known data. Furthermore, unlike the other state-of-the-art tools used for comparison in this work, we present a successful application of the method in a real-life scenario: the massive screening of unlabeled peptides from the human urine peptidome.

We implemented this RF-based predictor in the user-friendly and freely accessible web server ABP-Finder, which was also leveraged in the identification of the new ABP hit from a large library of peptides derived from the human peptidome.

In this way, the combination of in silico screening and experiments confirmed the applicability of ABP-Finder as a screening tool for the early steps of the design of peptide-based antibiotics. To the best of our knowledge, no other publicly available ABP predictor has delivered a similar study leading to the successful identification of an active hit from tens of thousands of unlabeled peptides. Further developments of our predictor will include its combination with target-specific models. This will improve the design of broad-spectrum candidates and help guide the selection of targets in massive screenings of bioactive peptides.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/antibiotics11121708/s1. The Supporting Information is available free of charge and includes the project files containing the setup used to compute all the descriptors employed in this work, and the AD of the datasets.

**Author Contributions:** Y.B.R.-B. worked on the conceptualization, data curation, formal analysis, methodology, project administration, validation, visualization and writing of the manuscript. G.A.-C. and A.A. worked mainly on the conceptualization, formal analysis, funding acquisition, supervision, validation, writing and reviewing the manuscript. S.R.-M. carried out the software development. L.-R.O., B.S. and J.M. were responsible for experimental investigation, resources, supervision, validation and revision of the manuscript. E.S.-G. was responsible for the conceptualization, funding acquisition, resources, supervision, writing and reviewing the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was funded by the German Research Foundation (DFG) through the CRC 1279-Project number 316249678 to E.S.-G., J.M. and B.S. E.S.-G. was also supported by the DFG under Germany's Excellence Strategy—EXC 2033—390677874—RESOLV, by the DFG—Project-ID: 436586093 and by the CRC 1430—Project-ID: 424228829. G.A.-C. and A.A. were supported by the Strategic Funding UIDB/04423/2020 and UIDP/04423/2020 through national funds provided by the Portuguese Foundation for Science and Technology (Fundação para a Ciência e a Tecnologia—FCT).

**Data Availability Statement:** The web server presented in this manuscript, which evaluates the described models for ABP prediction and Gram-staining type classification, is freely accessible at: https://protdcal.zmb.uni-due.de/ABP-Finder/index.php (accessed on 16 November 2022). The StarPep database, which was the source for all the in silico data used to train and validate our models, is accessible at: http://mobiosd-hub.com/starpep (accessed on 4 February 2020). WEKA was the machine-learning framework used for the feature selection, hyperparameter optimization, model training and validation steps. This program can be downloaded and installed following the guidelines at: https://waikato.github.io/weka-wiki/downloading\_weka/ (accessed on 16 November 2022). ProtDCal descriptors, used to encode the peptide sequences into numeric vectors, can be computed directly from the web server: https://protdcal.zmb.uni-due.de/pages/form.php (accessed on 16 November 2022), using the project files given as Supplementary Material of this manuscript. The project files contain the full parameter configuration used to obtain the initial set of descriptors screened in this work. The training, development, validation, and test datasets, as well as the boundaries of the applicability domains for the training and production models, are included as part of the Supplementary Material of this work. The assessment of the applicability domain is also implemented in our web server (ABP-Finder); it is therefore performed and reported automatically by the server for any evaluated peptide.

**Acknowledgments:** Y.B.R.-B. acknowledges the CRC 1279 of the University Hospital Ulm for a Creative Young Researcher Award for the development of algorithms to identify bioactive peptides. L.-R.O. is part of and would like to acknowledge the International Graduate School in Molecular Medicine Ulm (IGradU).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

