# **Proteases From Basic Structure to Function to Drug Design as Targeted Therapy**

Edited by Hang Fai Kwok, Christopher Shaw and Brian Walker

Printed Edition of the Special Issue Published in *Biology*

www.mdpi.com/journal/biology

## **Proteases—From Basic Structure to Function to Drug Design as Targeted Therapy**

Editors

**Hang Fai Kwok, Christopher Shaw, Brian Walker**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors*

Hang Fai Kwok, Department of Biomedical Sciences, Faculty of Health Sciences, University of Macau, Taipa, Macau

Christopher Shaw, School of Pharmacy, Queen's University Belfast, Belfast, United Kingdom

Brian Walker, School of Pharmacy, Queen's University Belfast, Belfast, United Kingdom

*Editorial Office*

MDPI, St. Alban-Anlage 66, 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Biology* (ISSN 2079-7737) (available at: www.mdpi.com/journal/biology/special_issues/proteases).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-2575-4 (Hbk)**

**ISBN 978-3-0365-2574-7 (PDF)**

© 2021 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **Preface to "Proteases—From Basic Structure to Function to Drug Design as Targeted Therapy"**

In the last two decades, proteases have constituted one of the primary targets in drug discovery. The U.S. FDA has approved more than 12 protease therapies in the last 10 years, and a number of next-generation or completely new proteases are under clinical development. Protease inhibition strategies are one of the fastest-expanding areas in drug development and show considerable promise. This Special Issue focuses on recent advances in the discovery and development of protease inhibitors, covering the synthesis of protease inhibitors, the design of new chemical entities acting as inhibitors of particular types of proteases, and their modes of action (Frolova et al. 2020; Slapak et al. 2020; Künnapuu et al. 2021). In addition, the new applications of these interesting compounds/biomolecules and their limitations are discussed and described (Wang et al. 2020; Bartošová-Sojková et al. 2021).

The editors are grateful to all the authors who contributed to this Special Issue "*Proteases—From Basic Structure to Function to Drug Design as Targeted Therapy*". They are also mindful that without the rigorous and selfless evaluation of the submitted manuscripts by external peer reviewers/expertise, this Special Issue could not have happened. Moreover, the editor (H. F. KWOK) gives thanks for the support from the Science and Technology Development Fund of Macau SAR (FDCT) (file no. 0055/2019/A1 and 0010/2021/AFJ) and the Faculty of Health Sciences (FHS) University of Macau. Finally, the valuable contributions, organization, and editorial support of the MDPI management team and staff are greatly appreciated.

> **Hang Fai Kwok, Christopher Shaw, Brian Walker** *Editors*

## *Review* **Unravelling the Network of Nuclear Matrix Metalloproteinases for Targeted Drug Design**

**Anastasia S. Frolova <sup>1</sup>, Anastasiia I. Petushkova <sup>1</sup>, Vladimir A. Makarov <sup>1</sup>, Surinder M. Soond <sup>1</sup> and Andrey A. Zamyatnin Jr. <sup>1,2,3,</sup>\***


Received: 7 November 2020; Accepted: 17 December 2020; Published: 19 December 2020

**Simple Summary:** Nuclear matrix metalloproteinases are emerging as having distinct functions in a number of pathological conditions and physiological processes. In this article, we review the progress that has been made in this area of research and discuss their potential as targets for future therapeutic design.

**Abstract:** Matrix metalloproteinases (MMPs) are zinc-dependent endopeptidases that are responsible for the degradation of a wide range of extracellular matrix proteins, which are involved in many cellular processes to ensure the normal development of tissues and organs. Overexpression of MMPs has been observed to facilitate cellular growth, migration, and metastasis of tumor cells during cancer progression. A growing number of these proteins are being found to exist in the nuclei of both healthy and tumor cells, thus highlighting their localization as having a genuine purpose in cellular homeostasis. The mechanism underlying nuclear transport and the effects of MMP nuclear translocation have not yet been fully elucidated. To date, nuclear MMPs appear to have a unique impact on cellular apoptosis and gene regulation, which can have effects on immune response and tumor progression, and thus present themselves as potential therapeutic targets in certain types of cancer or disease. Herein, we highlight and evaluate what progress has been made in this area of research, which clearly has some value as a specific and unique way of targeting the activity of nuclear matrix metalloproteinases within various cell types.

**Keywords:** matrix metalloproteinase; extracellular matrix; nuclei; cancer; apoptosis; immune response

#### **1. Introduction**

Matrix metalloproteinases (MMPs) are involved in the degradation of extracellular matrix (ECM) proteins and regulate many fundamental cellular processes during normal bodily development and function [1]. As the ECM is important in maintaining the mechanical and biochemical properties of tissues, its normal turnover and regulation by MMPs is necessary to permit multiple functions, such as the cleavage and activation of signaling molecules, cellular differentiation, and wound healing [2–7]. However, dysregulation of MMP activity can contribute to a variety of pathological conditions. For example, some MMPs have been seen to modulate matrix erosion in osteoarthritis and rheumatoid arthritis, whereas expression of others is associated with the formation of atherosclerotic lesions, platelet aggregation, and the regulation of factors associated with cardiovascular disease [8,9]. The roles of MMPs in malignant tumor initiation, metastasis, and angiogenesis have received the greatest attention, which has highlighted them as good potential therapeutic targets for the treatment of certain types of cancer [9].

To date, 26 human MMP proteins have been identified, which belong to the M10 family of metallo-endopeptidases [10]. Based on substrate specificity, MMPs can be further categorized into collagenases (MMP-1, MMP-8, MMP-13, and MMP-18), gelatinases (MMP-2 and MMP-9), stromelysins (MMP-3, MMP-10, MMP-11, and MMP-17), matrilysins (MMP-7 and MMP-26), membrane-type MMPs (MMP-14, MMP-15, MMP-16, MMP-17, MMP-24, and MMP-25), and others (MMP-12, MMP-19, MMP-20, MMP-21, MMP-22, MMP-23, MMP-28, and MMP-29) [1]. Generally speaking, they are expressed by a broad range of cell types, such as epithelial cells, fibroblasts, osteoblasts, endothelial cells, vascular smooth muscle, macrophages, neutrophils, lymphocytes, and cytotrophoblasts [1].

Structurally, MMPs share a common protein domain structure (Figure 1). For most MMPs, the main components are a signal peptide (that directs synthesized protein into the secretory pathway), a highly conserved amino-terminal pro-domain, a catalytic domain that contains a zinc ion binding site, a linker domain, and a carboxyl-terminal hemopexin-like domain (HEX), that determines substrate specificity and localization and contributes to the enzymatic activity of MMPs [11].

**Figure 1.** Domain structures of human matrix metalloproteinases (MMPs). Here the domain structures of nuclear MMPs (nMMPs) and MMPs, which have not been found in the nuclei but possess nuclear localization signals (NLSs) (placed in brackets), are presented. NLSs are indicated by the red asterisks (if their nuclear-trafficking properties have been proven experimentally) or by the orange asterisks (if the NLS has been identified by bioinformatics alone). Horizontal lines indicate the isoforms of MMPs, which have been found in nuclei. SP, signal peptide; Pro, pro-domain; FC, furin cleavage site; FD, fibronectin domain; HEX, hemopexin-like domain; TM, trans-membrane domain; CT, cytoplasmic tail; GPI, glycosylphosphatidylinositol.

These proteases are synthesized in the form of pre-pro-MMPs, with their enzymatic activation occurring through the process of maturation as the proteins progress through the secretory pathway [1]. The first step of maturation is removal of the secretory signal peptide following the course of protein translation, giving rise to an inactive pro-MMP in which the catalytic site is inhibited by its resident Zn<sup>2+</sup> ion binding a cysteine residue within the "cysteine switch" motif (PRCGXPD) present in the pro-domain [12]. Activation of the pro-MMP may occur in a variety of different ways, giving rise to a number of MMP forms: those containing the full-length pro-domain, those containing a processed form of the pro-domain, and those lacking the pro-domain. In the first two MMP derivatives, conformational changes caused by mechanical or chaotropic agents can lead to the disruption of the Zn<sup>2+</sup>-Cys interaction, resulting in pro-MMP activation in the absence of pro-domain cleavage [13]. Moreover, processing and removal of the pro-domain by plasmin or trypsin can mediate a conformational change of the protease, resulting in full activation of the MMP intermediate [13]. Normally, full cleavage of the pro-domain is mediated by the furin pro-protein convertase in the trans-Golgi network, occurs auto-catalytically, or is carried out by other MMPs at the cell's surface, within the ECM, or in the nucleus [12,14,15]. The activity of MMPs can also be regulated by post-translational modifications, such as glycosylation and phosphorylation, and by glycosaminoglycans (GAGs). For example, glycosylation can stabilize a complex between MMP-14, TIMP2, and pro-MMP-2 as a step necessary for the cell-surface activation of MMP-2 [16]. Alternatively, glycosylation can promote MMP-9 secretion and activation, while also stabilizing the formation of MMP-17 dimers [17]. Beyond the classical mode of MMP activation, a number of recent studies have also reported that some MMPs are responsive to redox-mediated activation [18].
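As a minimal illustration of how the "cysteine switch" motif (PRCGXPD) could be located in a protein sequence, the Python sketch below searches for the pattern with a regular expression. It assumes that X stands for any single residue. The function name `find_cysteine_switch` and the toy sequence are hypothetical and are not taken from any real MMP database entry.

```python
import re
from typing import List, Tuple

# "PRCGXPD" from the text, with X read as "any single residue"
# (an assumption about the notation, not a statement from the source).
CYSTEINE_SWITCH = re.compile(r"PRCG[A-Z]PD")


def find_cysteine_switch(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched motif) for every hit."""
    return [
        (match.start() + 1, match.group())
        for match in CYSTEINE_SWITCH.finditer(sequence.upper())
    ]


# Toy sequence invented for illustration; not a real MMP pro-domain.
toy_pro_domain = "MKSLPAAQKPRCGVPDVAEYSLF"
hits = find_cysteine_switch(toy_pro_domain)
print(hits)
```

A real analysis would read curated sequences from a database and would typically also check the position of the hit relative to the annotated pro-domain boundaries.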

The tissue inhibitors of metalloproteinases (TIMPs) have also gained significant importance over the years based on their developmental role in normal tissue homeostasis and disease progression and their ability to modulate MMP protease activity [19]. Four TIMPs (TIMP-1 to TIMP-4) have been identified, and their mechanisms of MMP inhibition have been established through a number of structural studies. Residues 1–4 of the TIMP-1 amino-terminal domain interact with the primed side of the MMP binding pocket, where Cys-1 can coordinate the catalytic site Zn<sup>2+</sup> ion. Simultaneously, five residues (spanning amino acids 66–70) from TIMP-1 can occupy the non-primed site [20]. These potential modes of binding were also shown to be highly conserved among TIMP-2, TIMP-3, and TIMP-4 [20,21]. Biologically, elevated TIMP expression levels have been shown to contribute to enhanced ECM accumulation and deposition, while reduced TIMP expression leads to enhanced matrix proteolysis, thus highlighting their importance in modulating ECM dynamics and plasticity [1,22]. TIMPs can also form non-inhibitory pro-MMP/TIMP/MT-MMP complexes, as in the instance of TIMP-2 complexing with MMP-14, which can activate pro-MMP-2 in human fibrosarcoma, breast, and melanoma cell lines [16]. While TIMPs are generally found within the ECM, a number of studies have demonstrated that they may also reside in the nucleus of cells, as seen for TIMP-1 [23–25].

Over the years, matrix metalloproteinases have been pursued as good targets for therapeutic development [9]. They have the potential to be targeted at several levels of their synthesis and maturation; the proposed stages include inhibition at the transcriptional level, during zymogen activation, and at the level of substrate catalysis by the active enzyme [9]. At the moment, MMP-directed targeted strategies are coming to fruition for the treatment of inflammation, heart disease, lung diseases, and ischemic stroke [26–30]. Simultaneously, the search for more specific and better MMP inhibitors is still ongoing, driven by limited options for targeting specific MMPs within a clinical setting [31]. Consequently, novel strategies embodying greater specificity and efficacy have taken on a greater priority in targeting MMPs.

Over the last ten years, the nuclear localization of MMPs (nMMPs) has been increasingly reported. It has been observed in high-grade tumors, correlated with tumor volume, and in some instances associated with poor prognosis in a number of disease types (Table 1) [32–36]. Collectively, such findings suggest an important functional role for nuclear MMPs and that such localization has biological and clinical significance. In support of this, it is interesting to note that nuclear localization has been reported for other ECM proteases as well. For example, nuclear cathepsins L and D have been reported to exhibit biological effects which can contribute to tumor progression [37–39]. Collectively, the localization of such proteases has the potential to activate or deactivate transcription factors, regulate chromatin remodeling and apoptosis, alter the structural elements of the nuclear matrix, and participate in molecular events that lead to cell proliferation and carcinogenesis [40–43].

The signaling cues that cause MMPs to be directed to the nucleus remain largely unknown, with a number of mechanisms being proposed, ranging from environmental factors to cellular metabolism [44]. Nevertheless, for several MMPs, researchers have been able to propose some molecular mechanisms responsible for nuclear MMP translocation [45–52].

In this review article, we highlight the increasing emergence of nMMPs, while outlining their biological significance to highlight how these distinct sub-sets of proteases may have good targeting potential in diseases such as cancer.

#### **2. Mechanistic Regulation of Nuclear MMPs**

One of the commonest ways to deliver proteins from the cytoplasm to the nucleus is receptor-mediated nuclear shuttling, which depends (in large part) on proteins possessing a nuclear localization sequence (NLS) [53]. Here, importins α and β recognize and bind the NLS to form an importin-cargo complex, which can bind the nuclear pore complex and facilitate the translocation of protein cargo from the cytoplasm into the nucleus [54]. Depending on its sequence and structure, the NLS can be sub-divided into two groups: the classical NLS and the proline-tyrosine (PY) NLS. Specific importin proteins are also involved in recognizing different types of NLS, which allows them to confer protein selectivity and regulate this mechanism with greater specificity [53].

The classical NLS was originally thought to be involved in the nuclear translocation of MMP-2 when this protease was detected in the nucleus of rat cardiac myocyte cells for the first time [55]. This was confirmed upon scrutinizing the rat MMP-2 protein sequence, which revealed two small stretches of basic amino acids close to the C-terminus separated by a variable spacer [55]. Such sequences were also identified in the catalytic domain of MMP-3 [56]. The putative NLS (PKW**RK**TH) was identified using the bioinformatics software Protein Subcellular Localization Prediction Tool (PSORT, https://psort.hgc.jp/) [57], and validated by the deletion of two positively charged amino acids from this putative NLS, which led to a large decrease in the nuclear localization of the mutated proteins [56]. The same outcomes were observed after the substitution of these amino acids with uncharged amino acids, as in the amino acid substitutions **R**110N and **K**111Q. For the first time in this field of research, such findings demonstrated a potential molecular mechanism for the nuclear translocation of MMP-3 [56]. Subsequently, Eguchi et al. (2008) identified five additional putative NLSs in MMP-3, consisting of lysine- and arginine-rich sequences that were found to be dispersed throughout all of the MMP-3 protein domains (Figure 2) [58]. Moreover, all of these NLSs were exclusively able to transport the MMP-3 protein into the nucleus. While such a study highlighted the existence of multiple NLSs and their dispensability, it also suggested that the post-translational modification of MMPs may "hide" primary NLSs in addition to exposing alternative NLSs, which may offer potential mechanisms that confer selectivity for the nuclear shuttling of some proteins. Functionally, nMMP-3 has also been shown to participate in the transcriptional regulation of *CTGF*/*CCN2* and *HSP* gene expression, where the presence (or absence) of each of the NLSs may contribute to regulating TG2, ERK, and IL-33 specific signaling pathways and responses [58–60].
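To make the substitution notation concrete, the following Python sketch applies R110N and K111Q to the putative NLS fragment PKWRKTH and counts the positively charged residues before and after. It is only an illustration under stated assumptions: the fragment is assumed to start at position 107, so that the arginine and lysine fall at positions 110 and 111, and only K and R are counted as basic. The names `apply_substitution` and `count_basic` are hypothetical helpers, not part of any published tool.

```python
import re

NLS_WT = "PKWRKTH"  # putative NLS quoted in the text
NLS_START = 107     # assumed position of the first residue (places R at 110, K at 111)


def apply_substitution(fragment: str, start: int, mutation: str) -> str:
    """Apply a substitution such as 'R110N' to a fragment whose first residue is at `start`."""
    parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if parsed is None:
        raise ValueError(f"Unrecognized substitution: {mutation}")
    wild_type, position, replacement = parsed.group(1), int(parsed.group(2)), parsed.group(3)
    index = position - start
    if not 0 <= index < len(fragment):
        raise ValueError(f"Position {position} lies outside the fragment")
    if fragment[index] != wild_type:
        raise ValueError(f"Expected {wild_type} at {position}, found {fragment[index]}")
    return fragment[:index] + replacement + fragment[index + 1:]


def count_basic(fragment: str) -> int:
    """Count lysine and arginine residues (histidine is deliberately not counted)."""
    return sum(fragment.count(residue) for residue in "KR")


mutant = NLS_WT
for mutation in ("R110N", "K111Q"):
    mutant = apply_substitution(mutant, NLS_START, mutation)

print(NLS_WT, count_basic(NLS_WT))
print(mutant, count_basic(mutant))
```

Under these assumptions the double mutant keeps only one of the three basic residues of the wild-type fragment, which is consistent with the text's description of replacing the two charged residues with uncharged ones.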

The use of bioinformatic analyses has also helped to develop this area of research by identifying additional putative NLS sequences in other human MMP protein sequences. Here, Abdukhakimova et al. (2016) identified a putative NLS within the catalytic domain of 14 MMPs, including the above-mentioned MMP-2 and MMP-3 proteins [61]. The sequences of MMPs were compared with the experimentally validated NLS from the catalytic domain of MMP-3 (sequence PKWRKTH) [58], and most of the recovered NLSs contained two consensus residues, namely lysine and tryptophan (KW). The whole sequence was identified only in MMP-3 and MMP-10, and the authors also revealed the importance of the NLS in MMP-7 through it being evolutionarily conserved throughout different species (Figure 2) [50,61].
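A comparison of this kind can be sketched as a simple sliding-window scan against the reference NLS. The Python example below is a simplified stand-in and not the method used in the cited study: it scores each window only by the number of identical positions and flags whether the KW pair is present. The function name `best_window` and the toy sequence are hypothetical.

```python
from typing import Optional, Tuple

REFERENCE_NLS = "PKWRKTH"  # validated NLS from the MMP-3 catalytic domain, as quoted in the text


def best_window(sequence: str, reference: str = REFERENCE_NLS) -> Optional[Tuple[int, str, int]]:
    """Return (1-based start, window, identical positions) for the best-scoring window.

    Ties are resolved in favour of the first window found.
    """
    width = len(reference)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        identical = sum(a == b for a, b in zip(window, reference))
        if best is None or identical > best[2]:
            best = (start + 1, window, identical)
    return best


# Toy sequence invented for illustration; not a real MMP catalytic domain.
toy_catalytic_domain = "GDAHFDEDERWTKNFRPKWEKTNLTYAA"
best = best_window(toy_catalytic_domain)
has_kw = best is not None and "KW" in best[1]
print(best, has_kw)
```

Identity counting is the simplest possible score. A substitution matrix or a profile built from several validated NLSs would be a more realistic choice for real sequences.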

Mechanistically, it has been proposed that endocytosis may also be responsible for the nuclear localization of MMPs. For example, in hepatocellular carcinoma cells, the amount of nuclear MMP-14 protein was increased in comparison to healthy liver cells, an event which enhanced the metastatic capacity of tumor cells [33]. Here, MMP-14 was jointly localized within the cytoplasm and perinuclear space and could interact with caveolin-1, thus implicating a specialized form of endocytic protein trafficking that is fundamentally different from the use of nuclear transport receptors [54,62]. In support of this, caveolin-1 has been reported to drive and enrich the transport of proteins to the nucleus in human endothelial cells, as seen with caveolae regulating the intracellular protein trafficking of MMP-14 [63,64]. Such observations reinforce the proposition that caveolin-1 participates in the nuclear translocation of MMP-14.


**Figure 2.** Nuclear localization sequences found in human MMP proteins. The consensus sequence for the classical NLS is indicated at the top of the figure. Three NLSs from the MMP-3 pro-domain (pro1-3) and two NLSs from the hemopexin-like domain (hex1-2) are highlighted. The NLS from the catalytic domain of MMP-3 is shown as a reference sequence, for comparison purposes with putative NLSs from the catalytic domains of other MMPs. Similar or identical sequences within the aligned NLSs are indicated with blue.

nTIMPs were first reported in human gingival fibroblasts in 1995 and in human breast carcinoma cell lines in 1999, prior to the discovery of nMMPs [23,24]. Subsequently, Gasche et al. (2001) reported gelatinolytic activity in the nuclei of mouse brain cells after ischemia-reperfusion, for which nMMPs were suggested to be responsible [65]. Two years later, Si-Tayeb et al. reported the detection of nMMP-3 in a human hepatocellular carcinoma cell line (HepG2) and the identification of a nuclear localization signal (NLS) within the structure of the protease [66]. Since then, the number of reported nMMPs has grown, with some MMP members being localized to the nuclei in a variety of different cell types, originating from normal tissues, cancers, infected cells, and cells during disease progression (Table 1) [67–71]. For example, nMMP-2 was found in normal skin cells in the lower one-third of the epidermis, whereas in the tumor and pre-cancerous samples, it was predominantly in the upper layers of the skin, suggesting that the protein may be expressed at the early stages of squamous cell carcinogenesis [72]. MMP-7 and MMP-16 were also found in the nuclei of basal and supra-basal cells of normal squamous epithelium and condyloma [73]. In addition, MMP-12 was detected in the nuclei of virus-transfected cells and MMP-14 was reported as being present in the nuclei of macrophages, supporting the possible involvement of MMPs in the immune response [74].

**Table 1.** Functions and localization of nuclear MMPs (nMMP) and nuclear TIMP (nTIMP). The table represents nMMPs and nTIMP1 and their functions in different cells and tissues. Malignant cells and tissues are indicated in red. Other pathological conditions are indicated in purple; a-deoxyribonucleic acid.


The only nTIMP identified so far, nTIMP1, was found co-localized with nMMP-2 in endothelial cells and neurons, but no direct protein interactions or mechanism(s) for TIMP translocation have been defined [25]. nTIMP1 inhibits nMMP-9, which was identified in neuronal cells that exhibit insignificant or low levels of nMMP-9-derived gelatinase activity [80,81,87]. nTIMP1 was detected in gingival fibroblasts (in which nMMP-1 and nMMP-9 were later identified), and also in breast carcinoma cells, where nMMP-1 was also subsequently identified [23,24,76,77]. The mechanistic relationship between nMMPs and nTIMPs that reside simultaneously in the nucleus has not been fully investigated in model systems, but immunofluorescence analysis has indicated that nMMP/nTIMP co-localization within the nucleus of neural stem cells in Huntington's disease (HD) can contribute to enhanced neurotoxicity. Here, TGF-β treatment enhanced nTIMP1 protein levels, which conferred neuroprotection in HD against toxicity associated with the aggregation of neurotoxic mutant huntingtin proteins [71]. From Table 1, it is interesting to note that co-localization of nTIMP1 with every nMMP reported so far was detected (although not exclusively) in malignant tissues and a variety of other cell types. Additionally, while extracellular MMPs are seen to promote epithelial-mesenchymal transition (EMT), tumor invasion, and metastasis, nMMPs are reported to be present in both epithelial and the resulting mesenchymal cells, which suggests that they fulfil potentially unique and distinct intracellular functions during EMT [92,93].

In summary, it is becoming firmly established that MMPs and TIMPs have the capacity to translocate to the cell's nucleus. While some of these depend on the presence of a classical NLS sequence (or a sequence derived from this), others are capable of this event through caveolin-dependent endocytosis. Unveiling the underlying molecular mechanisms for nMMP and nTIMP transport may lay down solid foundations for such mechanisms to be potentially targeted.

#### **3. Nuclear MMPs as Regulators of Gene Expression**

Following their entry into the nucleus, MMPs have been shown to participate in a number of different processes, such as cell migration, proliferation, signaling pathways, tumor growth, and the immune response (Figure 3) [60,68,75,94–96]. As this area of research has seen significant growth in recent years, we outline a number of key publications that highlight the diverse biological effects that are modulated when MMPs are resident within the nucleus.

Unlike the extracellular MMPs, nMMPs have access to genomic DNA and may therefore modulate gene expression events related to disease progression. For example, in the human bone osteosarcoma epithelial cells, MMP-2 was visualized by immunofluorescence methods in the nucleolus where it could interact with DNA associated with different regions of the ribosomal RNA genes, suggestive of its potential to regulate rRNA transcriptional initiation [97]. Here, the inhibition of MMP-2 activity by siRNA interference led to a slower cell proliferation rate in comparison to control cells.

In human breast carcinoma MCF7 cells, overexpression of MMP-14 significantly increased the transcriptional expression of vascular endothelial growth factor A (VEGF-A) [95]. Mechanistically, MMP-14-regulated VEGF-A expression could be suppressed through the treatment of cells with the Src-tyrosine kinase inhibitor PP2 [95], although whether this regulatory effect is direct or not is still to be revealed. Such findings also have great significance for the role of nMMP-14 in the promotion of tumor growth or invasiveness [98,99]. Here, nMMP-14 stimulated the expression of SMAD1 via TGF-β signaling [98]. Additionally, nMMP-14 suppressed the expression of Dickkopf-3 (DKK3) in human urothelial cell carcinoma tissue, which led to increased invasiveness of cells [99]. In support of this, the nuclear staining of MMP-14 has also been observed and reported independently by immunohistochemical methods in other studies, although the localization of nMMP-14 was not the object of those investigations [33,74].

Immunocytochemistry methods have also identified nMMP-3 [58]. It was shown that the HEX domain of nMMP-3 can interact with transcription enhancer dominant in chondrocytes (TRENDIC) within the connective tissue growth factor gene (*CTGF*/*CCN2*) promoter region and activate its transcription [58,60]. The proteins regulated by this promoter play an important role in proliferation, the formation of the extracellular matrix, angiogenesis, and cell migration. In human dental pulp, nuclear MMP-3 could also regulate the expression of CTGF/CCN2 proteins and the cellular migration capacity of cells through this pathway [60].

**Figure 3.** Role of nuclear MMPs within cells. Currently, only two mechanisms for the transport of MMPs into the cell's nucleus are known, and which utilize the nuclear pore complex or endocytosis. Within the nucleus, MMPs can cleave nuclear proteins or regulate the transcription of various genes. Through these mechanisms, nMMPs can modulate a number of key biological processes within the cell. Participation of nMMPs in cancer cell progression is indicated in blue. NPC—nuclear pore complex; MMP—matrix metalloproteinase; NLS—nuclear localization signal.

Although nuclear MMPs are known to participate in malignant tumor progression, they also have additional functions of importance that are related to normal cellular homeostasis. Here, the use of transcriptomic analyses revealed that overexpression of MMP-3 stimulated mRNA expression of heat shock proteins (HSPs), HSP70B, HSP72, HSP40, and HSP20. Several transcription factors that potentially interact with nMMP-3 were predicted and one of them, heat shock factor 1 (HSF1) was validated to co-activate the *HSP70B* gene promoter together with the nMMP-3 protein [59]. Of note, the HEX domain alone was sufficient to induce HSP70B expression. Other transcriptional factors that nMMP-3 may associate with include FOXO3, VDR, Ets-1, CULT1, TBP, and SP1. Since MMP-3-green fluorescent protein (GFP) was found in cellular chromatin fractions and soluble nuclear fractions of COS7 cells, in which nuclear markers chromobox protein CBX5/HP1α and histone-H3 were also detected, it was suggested that MMP-3 can also enter the cell's nucleus to possibly modulate gene expression events [59].

Another protease, MMP-9, was found to contribute to osteoporosis, which is characterized by increased osteoclastogenesis and a decreased number of active osteoblasts (for bone formation). During osteoclastogenesis, nMMP-9 affected the expression of more than 67% of genes [86] normally expressed in primary osteoclast precursor cells, which included genes that regulate RANKL, AMPK, and VEGF signaling pathways. On a morphological level, inhibition of nMMP-9 enzyme activity led to reduced maturation of osteoclasts, compared with control cells. Using an alternative approach by incorporating ChiPac-seq technology, nMMP-9 was seen as being required for histone-H3 protein cleavage near the transcription start sites of the osteoclastogenic genes *Nfatc1*, *Lif*, *Xpr1*, and for their concurrent activation during osteoclastogenesis. Nuclear accumulation of MMP-9 was also confirmed by immunofluorescence microscopy [86].

Tetracyclines have antimicrobial activity, block bone deterioration, and work as MMP-9 inhibitors [99]. The tetracycline analogs minocycline and tigecycline suppressed osteoclast formation by blocking nMMP-9-mediated proteolysis of the amino-terminus of histone-H3 protein [100]. Antibiotic treatments significantly reduced the differentiation of osteoclasts but did not affect the proliferation of pre-osteoblasts and osteoclast precursor cells. At the transcriptional level, both tetracycline analogs repressed RANKL-induced mRNA expression of the MMP-9-targeted genes *Nfatc1*, *Lif*, and *Xpr1*. When zebrafish larvae harboring an osteoporosis phenotype were treated with tigecycline and minocycline, the antibiotics reduced prednisolone-induced osteoporosis in a dose-dependent manner. Such antibiotics may therefore be effective as a treatment for osteoporosis through modulating nMMP-9 enzymatic activity towards the histone-H3 protein and its gene regulatory effects [100].

In summary, a number of novel functions for nuclear MMPs have emerged over the years, spanning the mechanistic entry of MMPs to the nucleus and their input into the regulation of gene expression in a cell- and disease-context-dependent manner. In particular, some nMMPs affect proliferation, migration, and invasion, which contribute to tumor progression. These data also support the view that nMMPs can influence cellular processes at the level of gene regulation, thus highlighting their additional potential as targets in the treatment of cancer.

#### **4. Nuclear MMPs as Regulators of Malignancy**

Nuclear MMPs also mediate a malignant cell phenotype while contributing to cellular mobility and tumor progression. For example, nMMP-7 together with alternative reading frame (ARF) protein expression contributed to enhanced migration and metastasis of prostate cancer cells [68]. Knockdown of ARF expression in cancer cells decreased MMP-7 expression, but when ARF was over-expressed, MMP-7 accumulated in the nucleus where it could bind to the ARF protein. The molecular mechanisms responsible for this effect remain unclear, but the concurrent increase in these two proteins within the nucleus is correlated with malignancy of cancer cells and the combined targeting of ARF and MMP-7 may therefore have therapeutic value in the treatment of advanced prostate cancer.

The proteases MMP-3 and MMP-9 also contribute to tumor progression [35]. Significant expression of both non-proteolytically-active and proteolytically-active isoforms of these MMPs was found in metastatic cells derived from colon adenocarcinoma cell lines. After colon adenocarcinoma cells were injected into the abdominal walls of mice, primary tumors and metastatic tumors in lung tissues contained active nMMP-9 that had become localized within the nuclei of cells detected within the tumor-stromal area. Knockdown of MMP-3 expression by siRNA during the latter experiments suppressed cancer cell migration, suggesting a significant contribution from nMMP-9 and MMP-3 during tumor invasiveness [35].

In summary, it is becoming increasingly apparent that intracellular MMPs can play multiple (yet significant) roles in tumor progression.

#### **5. Nuclear MMPs and Oxidative Damage to DNA**

The MMP proteases can process multiple DNA-interacting nuclear proteins during oxidative stress. For example, the accumulation and activation of MMPs were observed in the nuclei of ischemic cells after reperfusion [80,101]. Here, MMP-14 promoted the activation of the zymogens pro-MMP-2 and pro-MMP-9 within the nuclei of ischemic cells after reperfusion [80]. Catalytically active nMMP-2 and nMMP-9 have been shown to cleave the PARP-1 and XRCC1 proteins, which play an important role in DNA repair and caspase-independent cellular apoptosis [44,55]. It was reported that adenosine diphosphate could enhance the cleavage of PARP-1 by nMMP-2 and nMMP-9, through the PI3K/Akt/NF-κB and ERK1/2 signal transduction pathways [34]. Cleavage of PARP-1 and XRCC1 was also observed, which led to the accumulation of damaged DNA within the nuclei of ischemic brain cells after reperfusion. When rats were treated with the broad-spectrum MMP inhibitor BB1101, the cleavage of PARP-1 was significantly reduced and the amounts of XRCC1 protein were reported to increase. The use of such inhibitors following cerebral ischemia-reperfusion injury is a good example of how nMMPs are being targeted for therapeutic purposes [81].

Other studies have also confirmed the accumulation of MMP-2 and MMP-9 in the nuclei upon ischemia treatment of cells. In the nuclei of mouse neurons deficient in superoxide dismutase (SOD1) and treated with ischemia-reperfusion, pro-MMP-2 and pro-MMP-9 were induced and activated. Active MMP-2 and MMP-9 are involved in the early-stage destruction of the blood-brain barrier, caused by oxidative stress during cerebral ischemia-reperfusion [65]. After an ischemic stroke, MMP-9 protein was reported to be localized in the nuclei of neurons and glial cells. The cells containing nMMP-9 also expressed activated caspase 3, which confirmed the link between the nuclear localization of MMP-9 and neuronal apoptosis in ischemic cells [102]. Similarly, the activated form of MMP-13 was also found in the nuclei of neurons as an early event following cerebral ischemia [15]. By subjecting the primary neural culture of rats to oxygen and glucose deprivation, Cuadrado et al. (2009) were able to demonstrate the nuclear translocation of MMP-13 in vitro.

Collectively, such findings suggest that nMMPs may also fulfill a role in modulating the cell's response to oxidative stress (in addition to disease progression) and that certain members of this family may also be directly involved in the DNA damage response and caspase-dependent cell death.

#### **6. Nuclear MMP and Apoptosis**

Regulated cell death can occur in different ways in the form of apoptosis, necrosis, pyroptosis, and autophagy [103]. Whether the cell chooses the path of apoptosis or the path of survival depends on the ratio of pro- and anti-apoptotic factors, and in this context, MMPs have been found to modulate both pro-apoptotic [104] and pro-survival effects [105].

One of the environmental factors that can induce oxidative stress and apoptosis is cigarette smoke, which reportedly changes the expression levels of MMP-2, MMP-9, and TIMP-2 and their subcellular localization in pulmonary artery endothelial cells [79]. Here, the level of annexin V-positive/propidium iodide-negative cells significantly increased compared to untreated control cells, indicative of enhanced apoptosis. Cells exposed to cigarette smoke contained PARP-1 protein fragments usually detected in apoptotic cells, along with a high level of gelatinase activity. Since MMP-2 and MMP-9 were also observed to cleave PARP-1, these data suggest that cigarette smoke may induce apoptosis via MMP activation.

Alternatively, nMMP-1 may have a pro-survival role. For example, MMP-1 is co-localized with mitochondria and the nucleus in normal glial Müller cells [75], but during staurosporine-induced apoptosis, MMP-1 expression changes and localizes to perinuclear mitochondrial clusters and around fragmented nuclei. Inhibition of MMP-1 activity led to lamin degradation, caspase activation, and apoptosis.

Nuclear MMP-3 expression in HepG2 and liver myofibroblast cells could also affect their rate of apoptosis [56]. When Chinese hamster ovary cells were transfected with a plasmid encoding an EGFP/active MMP-3 fusion protein, it principally localized in the nuclei. Using an antibody against activated caspase 3, it was determined that cells transfected with EGFP/active MMP-3 had higher apoptotic levels compared with untransfected cells. This effect was enhanced in cells where MMP-3 was present within the nucleus. Moreover, expression of a catalytically inactive form of MMP-3, or inhibition of wild-type MMP-3 with the broad-spectrum MMP inhibitor GM6001, led to a reduction in apoptotic cells. Such findings suggest another important biological effect for active nMMP-3 in apoptosis regulation.

Collectively, nuclear MMPs can have the effects of enhancing or decreasing apoptosis of cells. As described above, there is a clear relationship between nuclear MMP-2, MMP-3, MMP-9, and activation of apoptosis. Conversely, nMMP-1 can block the pathway of apoptosis. Such findings have been defined for a limited number of the MMP family members and clearly further developments in this area of research are warranted based on the importance of nMMPs and their contribution to disease progression [106].

#### **7. Nuclear MMPs in Immune and Anti-Viral Responses**

During inflammation, the expression profiles and activity of a wide range of proteases are increased. These include serine proteinases, such as granzymes, neutrophilic elastases, cathepsin G, and proteinase 3 [107]. Some of these proteases can modulate inflammation and the immune response via regulation of cytokines and chemokines [108]. For example, innate immunity is regulated by MMP-25, which is preferentially produced by leukocyte cells. While MMP-25-deficient mice were viable, they had defects in their innate immune system, showing high sensitivity to bacterial lipopolysaccharide, hypergammaglobulinemia, and decreased secretion of the pro-inflammatory molecule COX2 [109]. In macrophages, nMMP-14 was observed to participate in the regulation of inflammation, the immune response, and anti-viral and innate immunity. Mechanistically, nMMP-14 can trigger the expression and activation of the phosphoinositide 3-kinase δ/Akt/GSK3β signaling cascade and modulate the Mi-2/NuRD nucleosome remodeling complex [74].

The immune system is also modulated by MMPs regulating the transcription of genes involved in anti-viral immunity. For example, macrophages can release MMP-12 during a viral infection, which can also enter infected cells and be translocated to the nucleus, presumably through endocytosis and lipid-dependent trafficking [95]. Through its catalytic domain, nMMP-12 can bind the polyA-rich regions within the promoter region of the *I*κ*B*α encoding gene, and induce its transcriptional expression [70,89], which upregulates the secretion of interferon-alpha (IFN-α) [70]. However, in the absence of MMP-12 expression (in a knockout mouse model), in mice infected with Coxsackie B type B2 virus, unsecreted IFN-α protein remained within pancreatic, heart, and hepatocyte cells and the mice succumbed to the lethal effects of viral infection. Such effects on IFN-α expression could be reversed upon the artificial expression of MMP-12 in MMP-12<sup>−/−</sup> fibroblast cells in vitro [70]. Moreover, nMMP-12 was reported to reside in the nucleus of human cardiomyocyte cells [70], and exogenously added recombinant MMP-12 protein, or its catalytic domain, was observed to traffic to the nucleus when used to treat MMP-12-silenced HeLa cells. Additionally, extracellular MMP-12 could cleave and inactivate systemic IFN-α, thereby attenuating the anti-viral inflammatory response as part of a negative feedback loop, an effect that could be reversed upon treating virally infected mice with the MMP-12 inhibitor RXP470. Here, morbidity in mice was observed to be reduced, as was viral replication. Collectively, inhibition of extracellular MMP-12 or increasing its nuclear localization (or activity) highlights a potential basis on which the development of a therapeutic strategy against viral infection could be implemented.

The cellular anti-viral response against Dengue virus is augmented by MMP-3 [69]. Zuo et al. observed that the presence of nMMP-3 within infected cells was increased. The silencing of MMP-3 led to increased titers of the virus, decreased levels of cytokines and chemokines, and the reduced activity of NF-κB. Since nMMP-3 was found to be co-localized with intracellular NF-κB, it was suggested that the protease up-regulated the activity of NF-κB via a direct protein-protein interaction, which could subsequently promote the transcription of anti-viral and pro-inflammatory genes [69]. Collectively, such findings suggest that nMMP-3 plays a significant role in the anti-viral defense of the body against Dengue virus.

The requirement for MMPs in the immune response and in the protection of the organism against various infections has been suggested previously [110]. Predominantly, extracellular MMPs regulate the migration of immune cells, proteolysis of the basement membrane and the remodeling of the extracellular matrix. In the nucleus, MMPs are emerging as participants in the regulation of gene expression and as regulators of immunity against viruses and bacteria [69,70,74,111], but their additional abilities to cleave gene products that are central to negatively regulating the immune response cannot be completely excluded at this juncture.

#### **8. Future Directions**

MMPs were first identified in 1962 and since then have been characterized as extracellular proteases [112] which are firmly established as playing critical roles in oncogenesis and other pathological processes [113]. Over the years, MMPs have become some of the most pursued targets for drug development [9]. Their large family size, redundant roles, and substrate specificity explain why side effects arising from targeting them with novel therapeutics have been a major obstacle to good therapeutics reaching the clinic. However, the detection of MMPs within the nucleus, when taken with the biological effects they regulate, raises renewed optimism for developing therapeutics that are specific for these MMP derivatives. Mechanistically, the translocation of MMPs to the nucleus has only been thoroughly investigated for MMP-3 [58]. While bioinformatics approaches enable the identification of putative NLSs in other MMPs, their ability to translocate the proteases into the nucleus and the mechanisms they utilize to do this still remain to be unveiled [61]. Such investigations would help to define, with greater clarity, the plausibility of targeting nMMPs while minimizing unwanted side effects. The activity of MMPs can also be regulated by TIMPs [19]. So far, only TIMP1 has been reported within nuclei and no mechanism of its translocation has been described [25]. Since TIMPs also inhibit members of the a disintegrin and metalloproteinase (ADAM) family, such as ADAM-10, therapeutically targeting nTIMPs (so that nMMPs can take greater effect) may have limitations [114].

A number of approaches have been adopted with a view to targeting MMPs for therapeutic purposes. For example, small-molecule inhibitors based on hydroxamic acid, carboxylic acid, 5,5-disubstituted barbiturates, benzosulfonamide, and phosphonate have all shown efficacy in reducing oncogenesis, but unwanted side effects have presented a number of challenges [115]. Alternatively, the more specific approach of targeting metalloproteases using single chain antibody fragments (scFv) has shown some encouraging outcomes in vitro [116–118]. Similarly, scFv fragments developed against extracellular MMP-14 have also shown good efficacy against cancer cell invasiveness in cell line models validated in a mouse orthotopic xenograft model [118]. Moreover, monoclonal antibodies directed at MMP-14 also successfully prevented the activation of pro-MMP-2, while antibodies to MMP-9 interfered with the catabolism of gelatin [119]. Collectively, while such approaches do encouragingly highlight the feasibility of targeting extracellular MMPs, their ability to target nMMPs remains to be explored.

For effective nMMP-specific inhibitor design, it may be necessary to elucidate the biological functions of distinct nMMPs and the mechanisms they utilize for nuclear translocation. Since some of the proteases play important physiological and biological roles, as in the immune and anti-viral response, while others even suppress tumor growth via apoptosis, inhibiting nMMPs may require careful consideration so that otherwise favorable biological effects are left intact [56,66,69,70]. Nevertheless, one serious challenge in targeting nMMPs is the specific delivery of the inhibitor to the nucleus. Over the last ten years, several nano-carriers have been developed to help overcome this potential obstacle [120]. Such carriers have proven their efficiency in drug-targeting approaches for human cervical cancer, human oral squamous carcinoma cell lines, and multidrug-resistant breast cancer cell lines and in vivo, using MCF-7-derived breast tumor-bearing mice [121–124]. Alternatively, the use of such carrier systems with MMP inhibitors in combination with other conventional therapeutic reagents may help to combat tumor development, migration and metastatic potential [125–128].

#### **9. Conclusions**

Extracellular MMPs regulate a variety of functions, such as the development of tissues, inflammation, apoptosis, migration, angiogenesis, vasculogenesis, and other processes. However, it is emerging that nuclear matrix metalloproteinases are functionally distinct, in that they perform unique and mutually exclusive functions within the nucleus. Although almost 25 years have passed since the first reports of nuclear localization of MMPs appeared, much remains to be explored. One of the key open questions is how the matrix metalloproteinases, whether secreted or anchored in the cell membrane, are transported to the cell's nucleus. While some MMPs encode a classical NLS, others are transported through endocytosis. The activity of nMMPs is diverse in that they can promote tumor metastasis and other pathological processes. While on the one hand, nMMPs can contribute to apoptosis resulting in tumor cell death, on the other hand, nMMPs can positively regulate the immune response towards viral and bacterial infections. Surprisingly, the MMP inhibitor TIMP1 has also been detected within the nucleus; its full repertoire of inhibitory functions remains to be elucidated, as does the question of whether other TIMPs can also reside in the nucleus.

Depending on the favorable or unfavorable effects of nMMP proteins, there appears to be some flexibility presented in how nMMPs may be targeted based on the manner in which they mechanistically translocate to the nucleus. For example, one can try to respectively elevate or interrupt nuclear transport through targeting nTIMP-chaperone effects or directly targeting the nMMPs in a "compartment-specific" manner as the proteases traffic to the nucleus. To create specific targeting approaches for nMMPs activities, it is necessary to understand the biochemical network of these proteases in detail and gain a greater understanding of what other key biological effects these proteases may be regulating during disease progression, as a fundamental prerequisite.

**Author Contributions:** Conceptualization, A.S.F., S.M.S. and A.A.Z.J.; writing—original draft preparation, A.S.F., A.I.P. and S.M.S.; writing—review and editing, A.S.F., A.I.P., V.A.M. and S.M.S.; visualization, A.S.F. and A.I.P.; supervision, S.M.S. and A.A.Z.J.; funding acquisition, A.A.Z.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Russian Science Foundation, grant number 16-15-10410.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Review* **Matrix Metalloproteases in Pancreatic Ductal Adenocarcinoma: Key Drivers of Disease Progression?**

**Etienne J. Slapak 1,2,3, JanWillem Duitman 1,2, Cansu Tekin 1,2,3, Maarten F. Bijlsma 2,3 and C. Arnold Spek 1,2,\***


Received: 26 March 2020; Accepted: 15 April 2020; Published: 18 April 2020

**Abstract:** Pancreatic cancer is a dismal disorder that is histologically characterized by a dense fibrotic stroma around the tumor cells. As the extracellular matrix comprises the bulk of the stroma, matrix degrading proteases may play an important role in pancreatic cancer. It has been suggested that matrix metalloproteases are key drivers of both tumor growth and metastasis during pancreatic cancer progression. Based upon this notion, changes in matrix metalloprotease expression levels are often considered surrogate markers for pancreatic cancer progression and/or treatment response. Indeed, reduced matrix metalloprotease levels upon treatment (either pharmacological or due to genetic ablation) are considered as proof of the anti-tumorigenic potential of the mediator under study. In the current review, we aim to establish whether matrix metalloproteases indeed drive pancreatic cancer progression and whether decreased matrix metalloprotease levels in experimental settings are therefore indicative of treatment response. After a systematic review of the studies focusing on matrix metalloproteases in pancreatic cancer, we conclude that the available literature is not as convincing as expected and that, although individual matrix metalloproteases may contribute to pancreatic cancer growth and metastasis, this does not support the generalized notion that matrix metalloproteases drive pancreatic ductal adenocarcinoma progression.

**Keywords:** MMP; MMP2; MMP9; MMP7; MMP14; matrix metalloproteases; PDAC; pancreatic cancer

#### **1. Introduction**

Pancreatic ductal adenocarcinoma (PDAC) is a devastating disease with the worst survival outcome of any cancer [1]. Its incidence, which is around 10 per 100,000 individuals, is rising in developed countries [2,3], with 458 thousand new cases and 432 thousand deaths in 2018 worldwide [4]. The 5-year survival rate is around 9%, and the 10-year mortality is approaching 99% [5]. Progress towards improving survival has been slow, and current treatment options are inadequate. The only significant progress that has been made is in the form of lower mortality rates for patients eligible for resections, and a slight prolongation and improved quality of life in patients with inoperable disease with the use of chemotherapeutic agents. Single-agent gemcitabine treatment has been the standard of care for inoperable PDAC for many years, although the observed benefits are small in daily practice [6–9] and seem restricted to patients with a good performance status [10]. More recently, nanoparticle albumin-bound paclitaxel was shown to exert superior antitumor activity compared to gemcitabine monotherapy, thereby establishing nab-paclitaxel and gemcitabine combination therapy as a first-line chemotherapy regimen in PDAC [11]. In patients with a good performance status, combination therapy with folinic acid, fluorouracil, irinotecan and oxaliplatin (FOLFIRINOX) is superior to other treatments [12] and FOLFIRINOX is consequently emerging as the new standard of care for relatively fit patients [13]. Importantly, however, even in the specific group of patients eligible for FOLFIRINOX treatment, the survival benefit is limited [14].

#### *1.1. Tumor Microenvironment of PDAC*

PDAC is characterized by a strong desmoplastic reaction, which results in an archetypal tumor microenvironment, consisting of a dense stroma surrounding the tumor cells [15,16]. The stroma forms the bulk of the tumor, taking up to 90% of the total tumor mass and consists of many cellular and acellular components like (myo)fibroblasts, macrophages, blood vessels and extracellular matrix components such as, among others, collagen I, collagen IV, laminin and fibronectin. In the stroma, the extracellular matrix has traditionally been considered to be a stable structure that mainly plays a supportive role in maintaining tissue morphology. Nowadays, however, it is evident that the extracellular matrix forms a dynamic and versatile milieu that affects the fundamental processes of the surrounding cells [17,18]. Accordingly, the loss of extracellular matrix homeostasis and integrity is considered one of the hallmarks of cancer and typically defines transitional events, resulting in cancer progression and metastasis [19]. Moreover, the loss of extracellular matrix homeostasis due to stromal depletion aggravates pancreatic cancer progression in preclinical animal models [20–22].

#### *1.2. Matrix Metalloproteases in the Tumor Microenvironment*

The desmoplastic PDAC stroma contains many different proteases that play a key role in the crosstalk between tumor and stromal cells. An intriguing group of proteases in the tumor microenvironment consists of matrix metalloproteases (MMPs), which are primarily known for their ability to degrade extracellular matrix components. Altered expression and/or activity of MMPs in the tumor microenvironment is likely to lead to the loss of homeostasis of the extracellular matrix, thereby driving PDAC progression. Based upon this notion, MMPs are considered important contributors to PDAC progression and experimental PDAC studies frequently use MMPs as surrogate markers for treatment responses. Decreased MMP levels are, nowadays, considered as important signs of the anti-tumorigenic potential of the gene/compound/miRNA under study. In the current review, we address whether the literature supports the concept that MMPs drive PDAC progression and whether decreased MMP levels under experimental settings are indicative of the treatment response. To this end, we performed a systematic review of patient and experimental animal studies, focusing on MMPs in PDAC.

#### *1.3. Overview of Matrix Metalloproteases*

MMPs are calcium-dependent zinc-containing endopeptidases of the metzincin protease superfamily. They typically contain an N-terminal propeptide of approximately 80–90 amino acids, with a conserved PRCGXPD motif that is responsible for maintaining latency via the binding of the cysteine residue to the zinc atom in the active site [23]. After the proteolytic removal of the propeptide, the active form of MMP contains a calcium-dependent catalytic domain of around 200 amino acids, which contains a hydrophobic S1′ pocket that determines substrate specificity, followed by a linker region of variable length, and the C-terminal hemopexin-like domain, which spans approximately 200 amino acids. The hemopexin-like domain, which is absent in some MMP family members, plays a functional role in substrate binding and/or in interactions with tissue inhibitors of metalloproteases (TIMPs), a family of specific MMP protein inhibitors [24].
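To make the PRCGXPD notation concrete, the following minimal sketch scans a protein sequence for the motif, treating X as any single residue. The function name and the toy sequence are hypothetical and serve only as an illustration; the toy sequence is not a real MMP propeptide.

```python
import re

# Conserved propeptide motif PRCGXPD, where X stands for any single residue.
CYSTEINE_SWITCH = re.compile(r"PRCG[A-Z]PD")


def find_cysteine_switch(sequence):
    """Return (motif start, matched motif, cysteine position) for each hit.

    Positions are 0-based indices into the sequence. The cysteine is the
    third residue of the motif, hence the offset of 2.
    """
    return [
        (m.start(), m.group(), m.start() + 2)
        for m in CYSTEINE_SWITCH.finditer(sequence.upper())
    ]


# Hypothetical toy sequence used only to exercise the function.
toy_sequence = "MKALLTAVQKPRCGVPDVAEYSLF"
hits = find_cysteine_switch(toy_sequence)
```

The reported cysteine position corresponds to the residue that, according to the text, binds the active-site zinc atom to maintain latency.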

Since the identification of a diffusible collagenolytic factor in living amphibian tissue that is capable of degrading undenatured calf skin collagen [25], a total of 24 MMPs have been identified in humans [26]. According to their substrate specificity, MMPs are classified into subfamilies: (1) collagenases, (2) gelatinases, (3) stromelysins, (4) matrilysins, (5) membrane-type MMPs and (6) others. Despite the general acceptance of the classification system based on extracellular matrix substrates, MMPs are rather promiscuous in substrate recognition and also proteolytically cleave substrates beyond extracellular matrix proteins.

#### **2. Methods**

To provide a comprehensive overview of the role of MMPs in PDAC, a systematic PubMed search without restrictions was performed. A combination of the search terms "pancreatic cancer" and every individual MMP (both using the official gene name and the common name; see Supplementary Materials Table S1) was used to retrieve papers published up to 1 March 2020. All papers were independently screened by their title and abstract, followed by full text assessment to include papers that contained MMP expression analysis in PDAC patients and papers that contained animal experiments that targeted (either genetically or pharmacologically) MMPs in pancreatic cancer models. The excluded papers were those that contained in vitro data only, papers that assayed MMP levels in experimental animal models without interventions or genetic modifications, or papers that did not focus on PDAC.
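The search strategy described above can be sketched programmatically. The following is a minimal illustration that assumes a simple Boolean AND/OR query syntax. The name pairs shown are a small illustrative subset rather than the contents of Table S1, and all function names are hypothetical. No database is actually queried here.

```python
# Illustrative subset of (official gene name -> common name) pairs.
# Assumption: the complete list is the one given in Table S1.
MMP_TERMS = {
    "MMP1": "interstitial collagenase",
    "MMP2": "gelatinase A",
    "MMP7": "matrilysin",
    "MMP9": "gelatinase B",
}


def build_query(gene_name, common_name, disease="pancreatic cancer"):
    """Combine the disease term with both names of a single MMP."""
    return f'"{disease}" AND ("{gene_name}" OR "{common_name}")'


def build_all_queries(terms):
    """Build one query string per MMP, keyed by the official gene name."""
    return {gene: build_query(gene, common) for gene, common in terms.items()}


def deduplicate(hits_per_query):
    """Merge per-query lists of paper identifiers into one unique set."""
    unique = set()
    for paper_ids in hits_per_query.values():
        unique.update(paper_ids)
    return unique


queries = build_all_queries(MMP_TERMS)
```

Running one query per MMP and merging the results afterwards mirrors the description above, in which papers retrieved for several MMPs are counted once after duplicate removal.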

#### **3. Results**

We retrieved 64 papers focusing on collagenases, 642 papers focusing on gelatinases, 51 papers focusing on stromelysins, 93 papers focusing on matrilysins, 66 papers focusing on membrane-type MMPs and 21 papers focusing on other MMPs (Figure 1). After the removal of duplicates, 816 eligible studies were identified and were rigorously screened to obtain those that contained patient data and/or animal experiments in which MMPs were targeted. This resulted in the inclusion of 14 papers focusing on collagenases, 60 on gelatinases, 11 on stromelysins, 21 on matrilysins, 12 on membrane-type MMPs and five on the so-called "other" MMPs. As several of the eligible papers contained data on multiple MMPs, the total number of papers including patient/experimental animal data selected for the review was 91.
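The arithmetic behind these counts can be checked with a short tally. This is only a sketch that uses the numbers stated in the paragraph above; the variable names are hypothetical, and the interpretation of the surplus as multi-subfamily papers follows the explanation given in the text.

```python
# Papers retrieved per subfamily, as stated in the text.
retrieved = {
    "collagenases": 64,
    "gelatinases": 642,
    "stromelysins": 51,
    "matrilysins": 93,
    "membrane-type MMPs": 66,
    "other MMPs": 21,
}

# Papers included per subfamily after screening, as stated in the text.
included = {
    "collagenases": 14,
    "gelatinases": 60,
    "stromelysins": 11,
    "matrilysins": 21,
    "membrane-type MMPs": 12,
    "other MMPs": 5,
}

unique_after_deduplication = 816  # stated in the text
unique_included = 91              # stated in the text

total_retrieved = sum(retrieved.values())
duplicates_removed = total_retrieved - unique_after_deduplication

total_subfamily_assignments = sum(included.values())
# Surplus assignments arise because some papers report on several subfamilies.
surplus_assignments = total_subfamily_assignments - unique_included
```

Under these figures, the per-subfamily inclusion counts sum to more than the 91 unique papers, which is consistent with several papers contributing data on more than one subfamily.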

**Figure 1.** Flowchart of paper inclusion. Using the search criteria indicated in Supplementary Materials Table S1, we obtained 814 eligible papers that we screened for the presence of patient data and/or matrix metalloprotease (MMP) intervention in animal models. After the exclusion of duplicate papers, we ended up with 91 papers that were included in the review.

#### *3.1. Collagenases in PDAC*

Despite the general notion that collagenases (MMP1, MMP8 and MMP13) are key players in cancer biology [27–29], relatively little is known about collagenases in PDAC. Although MMP1 is consistently shown to be overexpressed in PDAC patients compared to healthy controls [30–36], its effects on cancer progression are inconsistent (Table 1). For example, MMP1 overexpression has been reported as being associated with both a poor prognosis [30] and prolonged survival [37], although no correlations with tumor size, differentiation status and lymph node involvement have been observed [30,36,38]. Despite an elegant recent study showing that MMP1-dependent protease-activated receptor (PAR)-1 drives PDAC cell migration and perineural invasion [33], an important role of MMP1 in PDAC is not supported by the experimental data. Besides MMP1, MMP8 [36,39] and MMP13 [34,40] are also overexpressed in PDAC patients compared to healthy controls. The relevance of increased MMP expression is not well documented and only a single study showed that MMP13 expression is associated with lymph node metastasis and the tumor's pathological stage [41]. Interestingly, however, MMP13 overexpression significantly promoted the invasion of the PDAC cells in vitro, whereas MMP13 inhibition blocked leptin-mediated PDAC cell invasion [41], while CD40 agonist-dependent resolution of fibrosis and enhanced chemotherapy efficacy were diminished by MMP13 inhibition [42].

#### *3.2. Gelatinases in PDAC*

The most studied MMPs in PDAC are, without a doubt, the gelatinases (MMP2 and MMP9; see Figure 1). The vast majority of studies show that both MMP2 [34,35,43–62] and MMP9 [34,36,39,48,49,53,54,59,62–67] are upregulated in PDAC patients (Table 1), while a minority of studies fail to show a difference in expression between PDAC and the controls [36,38,52,60,61,68–71]. The potential clinical relevance is less pronounced, as just half of the studies reported associations of increased MMP2 or MMP9 levels with clinical characteristics such as survival, metastasis or tumor stage [43,46–48,50,51,53,56–58,61,63,65,67,68,72,73], whereas in the other half of the studies no such correlations were observed (Table 2). Despite the rather diverse observations in patients, initial preclinical animal experiments showed promising results (Table 3). Batimastat treatment of mice harboring orthotopic pancreatic cancers reduced cancer growth, metastasis and death compared to control-treated mice, while also potentiating gemcitabine sensitivity [74–77]. Batimastat was also shown to reduce metastasis and death when PDAC cells were directly injected into the spleen of recipient mice, in order to mimic liver metastasis in PDAC [78]. Although batimastat is not specific to MMP2 and MMP9 and also inhibits MMP1, MMP3, MMP7, MMP8 and several ADAM family members, based on the gelatin zymography of tumor samples before and after treatment, it was hypothesized that the tumor-inhibiting effect of batimastat was dependent on MMP2 and, to a lesser extent, MMP9. The potential importance of MMP2 and MMP9 in PDAC progression is further supported by studies using more specific inhibitors like MMI-166, RO28-2653 and OPB-3206. Indeed, the selective MMP2, MMP9 and MMP14 inhibitor MMI-166 inhibited PDAC growth in both mice and Syrian hamsters [79,80], whereas RO28-2653 and OPB-3206 (both also selective MMP2, MMP9 and MMP14 inhibitors) reduced chemically induced pancreatic carcinogenesis in Syrian hamsters [81,82]. Finally, treatment with the selective MMP2 and MMP9 inhibitor SB-3CT reduced the lung metastasis of subcutaneously implanted PDAC cells [83].

The most conclusive evidence of the role of MMP2 in PDAC progression comes from subcutaneous models, in which the injection of shMMP2-silenced PANC1 cells resulted in smaller tumors compared to the injection of control shRNA transduced cells [84], whereas treatment with MMP2-blocking peptides limited tumor growth and angiogenesis [85].

In a similar way to the inconclusive association studies in patients (see above and Table 2), animal experiments specifically targeting MMP9 show inconsistent results (Table 3). Orthotopic injections of MMP9-overexpressing Panc02 cells led to bigger tumors than injections of their control counterparts, but the absence/presence of MMP9 did not affect metastasis [86]. Treatment with an MMP9-blocking antibody did not affect the tumor growth of subcutaneously implanted PDAC cells, but did enhance gemcitabine and nab-paclitaxel sensitivity when PDAC cells were injected into the peritoneal cavity [87]. Doxycycline treatment, suggested to specifically target MMP9, reduced the growth of subcutaneously injected Capan-1 cells [88]. Finally, subcutaneous or orthotopic implantation of PDAC cells in MMP9-deficient mice diminished tumor take, tumor growth, angiogenesis and metastasis [83,89] but tumor progression and metastasis increased in MMP9-deficient mice on the *Kras(G12D)*/*Tp53* background [90].

#### *3.3. Stromelysins in PDAC*

Clinical studies do not support a general role of stromelysins (MMP3, MMP10 and MMP11) in PDAC (Table 1). Although MMP11 is consistently upregulated and associated with clinical characteristics in PDAC patients [35,36,91–93], the data for MMP3 are more controversial. Only half of the studies focusing on MMP3 suggest its expression is increased in PDAC patients compared to control tissue [34,35,94,95], and only a single study suggests that MMP3 is associated with patient survival [95]. Besides clinical studies, preclinical animal models also do not support an important role for stromelysins in PDAC progression. Apart from a study which suggests, but does not prove, that MMP10 drives the invasion and metastasis of PDAC [96], it has only been shown that MMP3 overexpression on the *Kras(G12D)* background increases neoplastic alterations in pancreatic acinar cells [94]. These premalignant morphological changes were accompanied by the recruitment of infiltrating immune cells and the expression of smooth muscle actin and collagen, indicating that MMP3 is not only a co-conspirator of Kras in inducing tumorigenic changes in epithelial cells, but also that it promotes the establishment of a tumorigenic microenvironment. Though it has been suggested that MMP3 may play a role in PDAC initiation, the actual importance of endogenous MMP3 (as opposed to overexpressed MMP3) in PDAC progression and its potential clinical relevance remain elusive.

#### *3.4. Matrilysins in PDAC*

MMP7 and MMP26 are the only two members of the matrilysin subfamily. A large number of studies have compared MMP7 expression in PDAC patients with pancreatitis patients and/or healthy controls and have consistently shown that MMP7 levels are elevated in PDAC patients (Table 1) [34–36,54,69,91,97–104]. More importantly, MMP7 levels correlate with metastasis and/or survival in most, but not all, studies. Based upon these reports, it is suggested that MMP7 is an important regulator of tumor formation. In line with this notion, preclinical experimental animal models show that MMP7 expression is intimately linked with acinar-to-ductal metaplasia and that pancreatic duct ligation-dependent acinar cell loss, caspase-3 activation, and subsequent metaplasia are significantly reduced in MMP7-deficient mice (Table 3) [98]. The effect of MMP7 on acinar-to-ductal metaplasia seems model-specific, however, as MMP7 deficiency did not affect pancreatitis-driven PanIN development in Ptf1a-Cre Kras(G12D) mice [105]. In addition to PDAC initiation, MMP7 also seems to drive PDAC progression. Using several genetic Kras-driven PDAC models, it was shown that both tumor size and metastasis were significantly reduced by MMP7 deficiency. The percentage of mice with lymph node metastasis fell from around 60% in MMP7-proficient mice to 0% in MMP7-deficient mice, whereas the percentage of mice with liver metastasis dropped from 67% to 13% due to MMP7 deficiency [105]. In line with these findings, the metastasis of MMP7-silenced PANC1 cells was largely reduced compared to control PANC1 cells, and pharmacological MMP7 inhibition with sulfur-2-(4-chlorine-3-trifluoromethyl phenyl)-sulfonamido-4-phenylbutyric acid (SCTPSPA) also significantly reduced the metastasis of PANC1 cells [101]. MMP26 expression was also induced in PDAC patients compared to the controls and, intriguingly, MMP26 was expressed significantly more often in tumors with lymph node involvement. Although this is suggestive of a general role of matrilysins in PDAC progression, experimental data confirming the pro-tumorigenic role of MMP26 in PDAC are lacking and it remains to be established whether MMP26 is indeed a driver of disease progression or merely acts as a marker of PDAC metastasis [106].

#### *3.5. Membrane-Type MMPs in PDAC*

Seven membrane-bound MMPs have been described so far: the transmembrane members MMP14, MMP15, MMP16, MMP23 and MMP24, and the GPI-anchored members MMP17 and MMP25. Of the membrane-bound MMPs, MMP14 seems most relevant in the setting of PDAC (Tables 1–3). Indeed, the overexpression of MMP14 in mice expressing an activating Kras(G12D) mutation led to a greater number of large, dysplastic, mucin-containing papillary lesions compared to the control Kras(G12D) mice (Table 3) [107]. In subcutaneous models, MMP14 overexpression in cancer cells seems to reduce the cytotoxic effect of gemcitabine [108], whereas MMP14 inhibition in pancreatic stellate cells limits tumor growth [84]. Moreover, the cancer cell-specific overexpression of membrane-type 1 matrix metalloproteinase cytoplasmic tail binding protein-1 (MTCBP-1; an MMP14-binding protein that inhibits its activity) restricts metastasis in orthotopic PDAC models, further suggesting that MMP14 may enhance tumor progression [109]. However, clinical data do not support an important role of MMP14 in PDAC progression (Tables 1 and 2). Although MMP14 may be overexpressed in PDAC [44,110], MMP14 does not correlate with clinical characteristics such as tumor differentiation, tumor size, lymph node status, or patient survival [31,37,111].

#### *3.6. Other MMPs in PDAC*

The so-called other MMPs (i.e., MMP12, MMP19, MMP20, MMP21, MMP27 and MMP28) are not very well characterized in PDAC. Although some members seem to be overexpressed in PDAC [106,111,112] and may be associated with tumor stage and patient survival (Table 1) [111–113], no preclinical studies have addressed the role of these MMPs in PDAC (Table 2). Therefore, their actual importance remains to be established.

#### *3.7. Clinical Trials with MMP Inhibitors in PDAC*

Only two phase 3 trials focusing on MMP inhibition in PDAC have been published [114,115]. One double-blind, placebo-controlled, randomized trial showed that the addition of marimastat (a broad-spectrum MMP inhibitor targeting MMP1, MMP2, MMP7, MMP9 and MMP14) to gemcitabine was well tolerated but did not provide clinical benefit in PDAC patients [114]. The overall response rates (11% and 16% with and without the addition of marimastat, respectively), progression-free survival and time to treatment failure were similar in both treatment arms. Another phase 3 trial showed that BAY 12-9566 (tanomastat; MMP2, MMP3 and MMP9 inhibitor) treatment was also well tolerated by PDAC patients but was inferior to gemcitabine, with median survival times of 3.74 and 6.59 months for the BAY 12-9566 and gemcitabine arms, respectively [115]. Median progression-free survival and quality-of-life analyses also favored gemcitabine, arguing against MMP inhibition in the setting of PDAC.

The fact that there are no clinical benefits obtained through MMP inhibition does not imply that MMPs do not contribute to PDAC progression. As elegantly discussed [116,117], the disappointing clinical trial results may be due to several reasons, of which the inclusion of advanced stage disease seems most relevant. Broad spectrum MMP inhibitors may also lack efficacy as they could block the potential tumor inhibitory activities of specific MMPs. As indicated above, MMP9 deficiency on the *Kras(G12D)* background enhanced tumor progression and invasive growth [90], supporting this notion and providing an alternative explanation for the negative marimastat and BAY 12-9566 results in PDAC patients. Finally, the poor clinical efficacy of MMP inhibitors could also be explained by the overestimation of the role of MMPs in PDAC progression based on preclinical models that do not fully capture the complexity of human disease.


**Table 1.** MMP expression levels in pancreatic ductal adenocarcinoma (PDAC) patients and controls. Red indicates increased MMP levels, blue indicates no difference and green indicates decreased MMP levels in PDAC patients.



Pancreatic cancer (PC); pancreatitis (CP); healthy control (CO); benign tumor (BT); immunohistochemistry (IHC); Western blot (WB); zymography (DG).


**Table 2.** Association between MMP expression and clinical characteristics of PDAC. Red indicates that MMP levels are associated with poor outcome, blue indicates no association and green indicates that MMP levels are associated with improved survival.



Pancreatic cancer (PC); pancreatitis (CP); healthy control (CO); benign tumor (BT); immunohistochemistry (IHC); Western blot (WB); zymography (DG); overall survival (OS); disease-free survival (DF); lymph node metastasis (LM); perineural invasion (PNI); venous invasion (VI); distant metastasis (DM); differentiation (DIF).


**Table 3.** Experimental animal models that target MMPs.

Note: All experiments were performed using mice unless indicated otherwise.

#### **4. Conclusions**

The potential clinical relevance of MMPs in PDAC has largely been addressed using patient-derived tumor material. These studies show a rather consistent picture with respect to MMP overexpression in tumors compared to control sections, although almost 25% of the studies do not show significant differences between patients and controls. However, the association of MMP overexpression with clinical characteristics is not as convincing as suggested in the literature. Half of the studies show that high MMP levels are associated with (lymph node) metastasis and reduced survival, whereas the other half of the studies do not show any correlation with clinical characteristics. Patient-derived data therefore do not seem to allow the firm conclusion that MMP expression levels (in general) are associated with PDAC progression and poor prognosis, especially when considering that publication bias may have resulted in negative studies not being published.

Initial preclinical experimental animal models using broad-spectrum MMP inhibitors are more in line with a general role of MMPs in PDAC progression, as different inhibitors limit tumor growth and metastasis in subcutaneous, orthotopic and spontaneous PDAC models. The contribution of individual MMPs to PDAC progression is, however, not very well established. Only MMP2, MMP7 and MMP14 are shown to potentiate tumor growth and/or metastasis in multiple independent papers. For others, the literature is conflicting or missing and no clear conclusions can be drawn. Importantly, however, conflicting results do not indicate that the individual MMPs have no effect in PDAC. The biology of PDAC and MMPs is complex and MMPs may act in a context-dependent manner, with both tumor-promoting and tumor-inhibiting effects. The conflicting role of MMP9 serves as an excellent example of this notion. The data rather convincingly show that tumor MMP9 expression drives PDAC progression, but systemic MMP9 ablation triggers invasive growth and metastasis by blocking MMP9-dependent tumor-inhibiting effects in the bone marrow.

Despite the availability of a large range of MMP-deficient animals and the relative ease of generating MMP-deficient cells with CRISPR technology, the majority of MMPs have not been studied in preclinical PDAC animal models. To fully appreciate the importance of individual MMPs in PDAC progression and to assess their potential clinical relevance, we have to await studies that combine (pharmacological inhibition in) genetic Kras-driven spontaneous models with subcutaneous and/or orthotopic models, in which MMPs are specifically depleted in stromal or tumor cells. In particular, experiments that address pharmacological treatment with specific MMP inhibitors after tumors have been established could turn out to be invaluable for establishing the context-dependent role of individual MMPs in PDAC. Until such studies have been performed, we should be careful not to generalize the available literature.

Although broad-spectrum MMP inhibitors limit PDAC progression in preclinical animal models [73–82], they seem to lack efficacy in a clinical setting [115,116]. This disparity between preclinical data and clinical trials can be attributed to several factors, for instance, differences in pharmacokinetics, pharmacodynamics and metabolism and the failure to accurately model the tumor microenvironment [128]. In particular, xenograft models, which lack a functional immune system, show reduced complexity and cellular diversity compared to human disease. Moreover, the degree of aneuploidy in human tumors results in great inter-tumoral variety in gene modifications, which differs from the situation in mice [129,130]. All of these species-related differences limit the capacity of preclinical mouse models to accurately predict the response to MMP inhibitors in PDAC patients.

In conclusion, our systematic review on the role of matrix metalloproteases in PDAC shows that the available literature is not as consistent as envisioned. Although individual matrix metalloproteases seem to contribute to PDAC growth and metastasis, our review does not support the generalized notion that matrix metalloproteases drive PDAC progression.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2079-7737/9/4/80/s1, Table S1: Search terms used and number of papers retrieved.

**Funding:** This research was funded by grants from the Dutch Cancer Foundation (UVA 2017-11174 and UVA 2014-6782) and the Netherlands Organization for Scientific Research (VENI grant 016.186.046).

**Conflicts of Interest:** The authors declare no conflict of interest. M.F.B. has acted as a consultant to Servier, and received research funding from Celgene.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Review* **Proteolytic Cleavages in the VEGF Family: Generating Diversity among Angiogenic VEGFs, Essential for the Activation of Lymphangiogenic VEGFs**

**Jaana Künnapuu <sup>1</sup>, Honey Bokharaie <sup>1</sup> and Michael Jeltsch <sup>1,2,3,</sup>\***


**Simple Summary:** Vascular endothelial growth factors (VEGFs) regulate the growth of blood and lymphatic vessels. Some of them induce the growth of blood vessels, and others the growth of lymphatic vessels. Blocking VEGF-A is used today to treat several types of cancer ("antiangiogenic therapy"). However, in other diseases, we would like to increase the activity of VEGFs. For example, VEGF-A could generate new blood vessels to protect from heart disease, and VEGF-C could generate new lymphatics to counteract lymphedema. Clinical trials are testing the latter concept at the moment. Because VEGF-C and VEGF-D are produced as inactive precursors, we propose that novel drugs could also target the enzymatic activation of VEGF-C and VEGF-D. However, because of the delicate balance between too much and too little vascular growth, a detailed understanding of the activation of the VEGFs is needed before such concepts can be converted into safe and efficacious therapies.

**Abstract:** Specific proteolytic cleavages turn on, modify, or turn off the activity of vascular endothelial growth factors (VEGFs). Proteolysis is most prominent among the lymphangiogenic VEGF-C and VEGF-D, which are synthesized as precursors that need to undergo enzymatic removal of their C- and N-terminal propeptides before they can activate their receptors. At least five different proteases mediate the activating cleavage of VEGF-C: plasmin, ADAMTS3, prostate-specific antigen, cathepsin D, and thrombin. All of these proteases except for ADAMTS3 can also activate VEGF-D. Processing by different proteases results in distinct forms of the "mature" growth factors, which differ in affinity and receptor activation potential. The "default" VEGF-C-activating enzyme ADAMTS3 does not activate VEGF-D, and therefore, VEGF-C and VEGF-D do function in different contexts. VEGF-C itself is also regulated in different contexts by distinct proteases. During embryonic development, ADAMTS3 activates VEGF-C. The other activating proteases are likely important for non-developmental lymphangiogenesis during, e.g., tissue regeneration, inflammation, immune response, and pathological tumor-associated lymphangiogenesis. The better we understand these events at the molecular level, the greater our chances of developing successful therapies targeting VEGF-C and VEGF-D for diseases involving the lymphatics such as lymphedema or cancer.

**Keywords:** vascular endothelial growth factors (VEGFs); VEGF-A; PlGF; VEGF-B; VEGF-C; VEGF-D; angiogenesis; lymphangiogenesis; CCBE1; proteases; ADAMTS3; plasmin; cathepsin D; KLK3; prostate-specific antigen (PSA); thrombin; wound healing; metastasis; proteolytic activation; vascular biology; lymphedema

**Citation:** Künnapuu, J.; Bokharaie, H.; Jeltsch, M. Proteolytic Cleavages in the VEGF Family: Generating Diversity among Angiogenic VEGFs, Essential for the Activation of Lymphangiogenic VEGFs. *Biology* **2021**, *10*, 167. https://doi.org/10.3390/biology10020167

Academic Editor: Hang Fai Kwok

Received: 15 January 2021 Accepted: 18 February 2021 Published: 23 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

In vertebrates, the family of vascular endothelial growth factors (VEGFs) typically comprises five genes: VEGF-A (in older literature often referred to simply as "VEGF"), placenta growth factor (PlGF), VEGF-B, VEGF-C, and VEGF-D. In addition to these orthodox VEGFs, several genes coding for VEGF-like molecules have been discovered in some members of the poxvirus and iridovirus families (collectively named VEGF-E) [1–4] and in venomous reptiles (collectively named VEGF-F) [5]. In vertebrates, the VEGF growth factors are central to the development and maintenance of the cardiovascular system and the lymphatic system. Non-vertebrates also feature VEGF-like molecules, but their functions are less well defined.

The subdivision of the vertebrate vascular system into the cardiovascular and the lymphatic system is reflected at the molecular level by a subdivision of the VEGF family into VEGFs acting primarily on blood vessels (VEGF-A, PlGF, and VEGF-B) and VEGFs acting mostly on lymphatic vessels (VEGF-C and VEGF-D). This specificity results from the expression pattern of the three VEGF receptors (VEGFRs). VEGFR-1 and VEGFR-2 are expressed on blood vascular endothelial cells (BECs), while lymphatic endothelial cells (LECs) express VEGFR-2 and VEGFR-3 (Figure 1).

**Figure 1.** Vascular endothelial growth factors (VEGFs) act on blood vessels and/or lymphatic vessels depending on their affinities towards VEGF receptors -1, -2, and -3. VEGFR-2 is expressed on both blood and lymphatic endothelium. In principle, growth factors that do activate VEGFR-2 can promote both the growth of blood vessels (angiogenesis) and lymphatic vessels (lymphangiogenesis). VEGF-E and VEGF-F are not of human origin: VEGF-E genes are found in viral genomes, and VEGF-F is a snake venom component. All receptor-growth factor interactions require the extracellular domain 2 of the VEGF receptors (shown in yellow) [6–9]. Domain 3 of VEGFR-2 is important for the interaction of VEGFR-2 with both VEGF-A [7] and VEGF-C [8], and domain 1 of VEGFR-3 is important for the interaction of VEGF-C with VEGFR-3 [9].
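The receptor-expression logic described here can be sketched as a small lookup. This is a minimal illustration, not a model from the original review: the endothelial receptor sets follow the text, whereas the ligand-to-receptor assignments are a simplified assumption based on commonly reported binding specificities of the mature human growth factors. The names `CELL_RECEPTORS`, `LIGAND_RECEPTORS` and `target_vessels` are hypothetical.

```python
# Receptors expressed per endothelial cell type (as stated in the text).
CELL_RECEPTORS = {
    "blood": {"VEGFR-1", "VEGFR-2"},      # blood vascular endothelial cells (BECs)
    "lymphatic": {"VEGFR-2", "VEGFR-3"},  # lymphatic endothelial cells (LECs)
}

# ASSUMPTION: simplified binding specificities of the mature human growth
# factors; species differences and pro-forms are deliberately ignored.
LIGAND_RECEPTORS = {
    "VEGF-A": {"VEGFR-1", "VEGFR-2"},
    "PlGF": {"VEGFR-1"},
    "VEGF-B": {"VEGFR-1"},
    "VEGF-C": {"VEGFR-2", "VEGFR-3"},
    "VEGF-D": {"VEGFR-2", "VEGFR-3"},
}


def target_vessels(ligand):
    """Return the vessel types whose endothelium expresses at least one
    receptor that the given ligand can bind."""
    receptors = LIGAND_RECEPTORS[ligand]
    return {
        cell
        for cell, expressed in CELL_RECEPTORS.items()
        if receptors & expressed
    }
```

The lookup only captures which vessel types a growth factor could act on in principle. It says nothing about affinities or about which action dominates in vivo, which is why VEGFR-2 ligands appear under both vessel types.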

The biology of the VEGFs and their signaling pathways has been extensively discussed elsewhere [10,11]. Of all VEGF family members, only VEGF-A and VEGF-C are essential in the sense that constitutive ablation of their genes in mice results in embryonic lethality [12–14]. VEGF-A levels are so crucial that even heterozygous mice are not viable. In fact, *VEGFA* was the first gene where the deletion of a single allele was shown to be embryonically lethal [12,13]. While the primary function and importance of the cardiovascular system are obvious even to the layperson, the tasks of the lymphatic system escape even some life science professionals. Its three major tasks are:

1. Tissue drainage for fluid balance and waste disposal


Considerable effort has been devoted to the mechanisms and effects of receptor binding and downstream signaling of the VEGFs. Less is known about the processes upstream of receptor binding such as secretion, release, and proteolytic processing. In this review, we give a brief overview of what is known about the proteolytic processing of VEGFs, with a focus on the lymphangiogenic VEGFs.

Evolutionarily, the importance of proteases has been remarkable. Proteolytic processing often regulates protein activity and creates variation in a protein's function. This has been suggested by phylogenetic and functional studies in all kingdoms of life, including viruses [15], plants [16], and animals [17]. Not surprisingly, proteases are used to regulate function and create functional variety in the VEGF family and can be regarded as signaling molecules [18].

#### **2. Proteolytic Processing of the Hemangiogenic VEGFs**

Among the hemangiogenic VEGFs, protein diversification within a single VEGF family member relies more on differential mRNA splicing than on proteolytic processing (Figure 2; reviewed in [19]). mRNA splicing generates several isoforms of VEGF-A, which differ by the extent of the C-terminal, predominantly basic heparin-binding domain (HBD) [20–22]. The HBD mediates the interaction of VEGF-A with the extracellular matrix (ECM), cell surface heparan sulfate proteoglycans (HSPGs), and neuropilin-1. The interaction with HSPGs involves both a sequence-specific binding epitope and electrostatic effects of a predominantly basic amino acid sequence. Only a few isoforms are entirely devoid of heparin-binding properties under physiological conditions and therefore fully soluble. Mice expressing only the major soluble isoform (VEGF-A<sup>121</sup>) are born but show severe cardiovascular defects and die from cardiac failure [23].

The matrix-binding properties of the larger VEGF-A isoforms are essential for generating growth factor gradients, which are assumed to be essential for efficient organ vascularization [24,25]. VEGF-A<sup>189</sup> and VEGF-A<sup>206</sup> are sequestered in the extracellular matrix (or on cell surface HSPGs), and at least VEGF-A<sup>189</sup> has been shown not to participate in receptor activation [26]. Proteases such as plasmin, urokinase-type plasminogen activator (uPA), and factor VII-activating protease (FSAP) can release and thus activate the ECM-bound, longer VEGF-A isoforms [21,27–29]. The cleavage of the primary isoform VEGF-A<sup>165</sup> can also be mediated by various matrix metalloproteinases (MMPs), especially MMP-3, resulting in smaller, non-heparin-binding products [30]. While such cleavages do liberate VEGF-A and are necessary for the mitogenic activity of VEGF-A<sup>189</sup> [26], they were reported to reduce the mitogenicity of VEGF-A<sup>165</sup> [31]. Unfortunately, there is little insight into the nature of the molecular handover of HSPG- and ECM-bound VEGF-A to VEGFR-2 or the VEGFR-2/neuropilin signaling complex [11]. The isoform composition and the location where the cleavage happens are likely important determinants of the net effect. The release of cell surface HSPG-bound VEGF-A is perhaps more likely to result in productive signaling than the release of ECM-bound VEGF-A. Complementary to the ECM release by proteolytic cleavage of VEGF-A, enzymatic degradation of the ECM-binding sites, e.g., of HSPGs by heparinases, or binding site competition by heparin or heparan sulfate achieves the same release, but without loss of the HBD [27].

**Figure 2.** Most diversity among the hemangiogenic VEGFs is achieved by alternative splicing. Nevertheless, proteolytic processing of VEGF-A (**A**) [19] and placenta growth factor (PlGF) [32] (**C**) can convert the longer, heparin-binding isoforms into more soluble shorter species. (**B**) VEGF-B is a special case. Alternative splicing results in two isoforms that translate the same nucleotide sequence in two different frames, resulting in a heparin-binding and a soluble isoform [33,34]. Due to the near-perfect cleavage context [35], thrombin has been suspected to be the protease responsible for VEGF-B<sup>186</sup> cleavage [36]. Prothrombin is indeed expressed by 293T cells [37], in which the cleavage has been demonstrated [33]. Plasmin cleaves VEGF-B<sup>186</sup> at no fewer than four different sites, of which the two most likely predicted sites are indicated. Importantly, the predicted plasmin cleavage between Arg137 and Ala138 removes the interaction epitope for neuropilin-1 binding [33]. Semi-transparent, blurry arrows indicate cleavages for which only the approximate position is known. The figure shows only the most sensitive site from the plasmin cleavages of VEGF-A since prolonged incubation results in progressive degradation [30]. VEGF-B<sup>186</sup> appears to be progressively degraded by plasmin as well [33]. For VEGF-A and PlGF, the numbering is according to the longest shown isoform. VEGF-A is cleaved not only by MMP3 but also in a similar fashion by MMP7, MMP9, MMP19, and, less efficiently, by MMP1 and MMP16 [30].

Of the four human PlGF isoforms, PlGF-2 and -4 also contain a C-terminal heparin-binding domain. At least the PlGF-2 HBD can be removed by plasmin [32]. VEGF-B<sup>167</sup> also contains a heparin-binding domain homologous to the one in VEGF-A<sup>165</sup>, but it is unknown whether this domain is subject to proteolytic removal. A yet unknown protease unmasks the neuropilin-1 binding site of the longer VEGF-B<sup>186</sup> isoform, but its target site is absent in VEGF-B<sup>167</sup> [33]. The cleavage context suggests that thrombin can unmask the neuropilin-1 binding epitope (see Figure 2) [35]. Plasmin cleavage at the same site is likely but does not result in neuropilin-1 binding due to additional cleavages that remove important sequences for neuropilin-1 binding [33].

#### **3. The Lymphangiogenic Growth Factors VEGF-C and VEGF-D**

The hemangiogenic VEGFs are rendered inactive either through ECM association or, as in the case of VEGF-A<sup>189</sup>, by their C-terminal auxiliary domain. Preventing receptor activation using inhibitory domains is also characteristic of the lymphangiogenic VEGFs. Upon secretion, VEGF-C and VEGF-D are kept inactive by their N- and C-terminal propeptides. Hence, the secreted forms are referred to as pro-VEGF-C and pro-VEGF-D. The removal of the propeptides requires two concerted proteolytic cleavages and happens in a very similar fashion for both VEGF-C and VEGF-D (see Figure 3):


**Figure 3.** Two proteolytic cleavages are needed to activate VEGF-C and VEGF-D. The first cleavage, by proprotein convertases, is constitutive and intracellular. The second is highly regulated and happens after secretion of the pro-forms. Many different enzymes have been shown to catalyze the second cleavage, but the primary activating protease of VEGF-C in mammalian developmental lymphangiogenesis is A Disintegrin and Metalloprotease With Thrombospondin Motifs-3 (ADAMTS3). The immunoprecipitation (IP) of transfected 293T cells with a VEGFR-3(EC)/IgGFc fusion protein pulls down the 58 kDa full-length VEGF-C, the pro-VEGF-C peptides of 31 kDa and 29 kDa, and the mature VEGF-C. Proteins were resolved under reducing conditions by SDS-PAGE.

Interestingly, pro-VEGF-C can competitively block the receptor activation of active, mature VEGF-C. Its propeptides allow VEGF receptor binding but interfere with receptor activation. Apart from VEGFR-3, pro-VEGF-C also binds the co-receptor neuropilin-2. C-terminal propeptide processing exposes two terminal arginines (R226,227), which contribute to the conserved binding site for neuropilins [47]. Because it is not entirely clear whether pro-VEGF-C is completely incapable of receptor activation or whether it has some residual activity, pro-VEGF-C is either a partial agonist or an antagonist of mature VEGF-C [43].

#### **4. Plasmin and Thrombin**

The serine protease plasmin was the first protease that was shown to activate both VEGF-C and VEGF-D. Plasmin can remove both the N- and the C-terminal propeptides of VEGF-D to create a mature form containing only the VEGF homology domain [44]. One of plasmin's main functions is to degrade fibrin, the main component of blood clots. Thrombin is the newest addition to the group of VEGF-C/D-activating enzymes. In addition to its classical role in converting soluble fibrinogen into insoluble fibrin fibrils during blood clotting, it plays a crucial role in early wound healing [46] by activating VEGF-C, which is released from *α*-granules upon platelet aggregation [48]. Hence both thrombin and plasmin act concertedly to maintain a supply of active VEGF-C over the entire wound healing period.

However, in vitro, where no feedback loop exists to limit plasmin activity, prolonged exposure of VEGF-C to plasmin results in VEGF-C inactivation [43]. In any case, without tissue damage, inactive prothrombin is not converted into thrombin, nor inactive plasminogen into plasmin. Therefore, in vivo, VEGF-C activation by thrombin or plasmin is likely restricted to situations with tissue damage. Using a similar rationale, platelet-rich plasma has been proposed for the treatment of lymphedema [49] and to promote wound healing [50]. Some proteomics studies have missed VEGF-C (as well as VEGF-A) when examining the platelet proteome, which might result from the relative resistance of the VEGF cystine knot to digestion with trypsin or similar proteases (unpublished data by the author). Nevertheless, other analyses and pharmacokinetic studies on anti-VEGF-C antibodies confirm the early findings of VEGF-C release during blood coagulation [51,52].

**Figure 4.** Human VEGF-C and -D are processed in a very similar fashion. The major difference between VEGF-C and -D is that ADAMTS3 activates VEGF-C, but not VEGF-D. This is one of the reasons why ADAMTS3 and VEGF-C are essential for lymphatic development and embryonic survival [14,42], whereas VEGF-D deletion in mice is well tolerated [53]. While the figure shows the exon structure of VEGF-C and -D, mRNA splice isoforms have only been reported for murine Vegfc [54]. The detected splice variants do not contain the full VEGF homology domain and are therefore not shown here. \*Cleavage site is only predicted based on the amino acid context.

Plasmin activation of VEGF-C, which has been shown independently by two different groups [43,44], was not detected in a recent study [46]. Possibly, cleavage products might not have been recognized by the antibody due to low sensitivity or an absent epitope. Alternatively, the internal FLAG-tag preceding the cleavage site, which was used to prevent detection failure due to isoform-specific VEGF-C antibodies, might have interfered with the activation.

#### **5. ADAMTS3 and the Cofactor CCBE1**

ADAMTS3 was identified in the search for the endogenous protease that activates VEGF-C. Although plasmin had been identified as a VEGF-C-activating protease [44], it was never seriously considered as a physiological activator of VEGF-C due to its function in fibrin clot degradation. Moreover, lymphatic phenotypes have never been reported for the plasminogen knock-out mice [55] or human homozygous functional ablations [56].

In 2009, Alders et al. used homozygosity mapping to identify mutations in the human *CCBE1* gene as a cause of Hennekam Syndrome (HS) [57], which is characterized by generalized lymphatic dysplasia [58]. When the *Ccbe1* gene was ablated in zebrafish or mice [59,60], the phenotype closely phenocopied the *Vegfc* knock-out [14]. Because it lacks any protease signature, CCBE1 was assumed to be somehow essential for the VEGF-C/VEGFR-3 signaling pathway, but not to be the VEGF-C-activating protease itself.

Co-transfection of CCBE1 with VEGF-C demonstrated that CCBE1 enhances the proteolytic processing of VEGF-C in 293T cells, and ADAMTS3 was identified as the responsible protease by mass spectrometric analysis of a partially purified CCBE1 from a CCBE1-overexpressing 293T cell line [43]. Based on in vitro data and its high homology to ADAMTS2, ADAMTS3 had been thought to function in the proteolytic maturation of procollagens [61]. In mice, *Adamts3* deletion does not lead to collagen fibril assembly deficiencies but instead aborts lymphatic development [42]. In humans, mutations in *ADAMTS3* have similarly been shown to result in a lymphatic phenotype, while symptoms associated with procollagen cleavage defects are absent [56]. Although these and other publications have confirmed that both ADAMTS3 and CCBE1 are required for successful pro-VEGF-C activation, a direct interaction between VEGF-C and CCBE1 has never been demonstrated [41,62,63].

Both domains of CCBE1 accelerate the activation of VEGF-C independently [43,54,58], using different mechanisms. While the N-terminal domain of CCBE1 appears to facilitate pro-VEGF-C encounters with ADAMTS3, the C-terminal domain acts like a coenzyme [64]. Of all VEGF-C-activating enzymes, only ADAMTS3 and PSA/KLK3 have been shown to be influenced by CCBE1.
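The protease specificities summarized in the abstract and in this section can be collected into a few sets, which makes the overlap between the VEGF-C and VEGF-D activators explicit. This is only an illustrative sketch of the relationships stated in the text; the set and function names are hypothetical, and the sets reflect the "at least five" activators named in this review rather than a complete catalog.

```python
# Proteases reported in this review to mediate the activating cleavage of VEGF-C.
VEGFC_ACTIVATORS = {"plasmin", "ADAMTS3", "PSA/KLK3", "cathepsin D", "thrombin"}

# Per the review, all of the above except ADAMTS3 can also activate VEGF-D.
VEGFD_ACTIVATORS = VEGFC_ACTIVATORS - {"ADAMTS3"}

# VEGF-C-activating enzymes shown to be influenced by the cofactor CCBE1.
CCBE1_INFLUENCED = {"ADAMTS3", "PSA/KLK3"}


def shared_activators():
    """Proteases that activate both VEGF-C and VEGF-D."""
    return VEGFC_ACTIVATORS & VEGFD_ACTIVATORS


def vegfc_only_activators():
    """Proteases that activate VEGF-C but not VEGF-D."""
    return VEGFC_ACTIVATORS - VEGFD_ACTIVATORS
```

In this representation, ADAMTS3 is the only VEGF-C-specific activator, consistent with the different developmental requirements for VEGF-C and VEGF-D described in the text.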

#### **6. Species-Specific Differences**

Based on sequence similarity, in vitro substrate, and domain organization, ADAMTS2, -3, and -14 form the aminoprocollagen peptidase subgroup within the ADAMTS protein family. Species-specific differences in the function of these proteases are seen in vertebrates. In zebrafish, Adamts3 and Adamts14 compensate for each other, and only the double *Adamts3/Adamts14* knock-out shows a lymphatic phenotype comparable to the *Vegfc* knockout [65]. Such compensation does not happen in Adamts3-deficient mice, which are completely devoid of functional lymphatics [42]. Whether the observation that human ADAMTS14 can activate VEGF-C in vitro [65] reflects species differences among mammals or whether it is an observation without a physiological equivalent is still unknown.

Important species differences have also been reported for the growth factors. In mice, VEGF-D is dispensable for the development of the lymphatic system [53], while this is not the case in zebrafish, where it is, e.g., required to form the medial and lateral facial lymphatics [66,67]. However, even murine VEGF-D reportedly differs from human VEGF-D in its inability to interact with mouse VEGFR-2 [68]. Exactly the opposite seems to be the case in zebrafish, where the VEGF-D-VEGFR-3 interaction was reported to be absent [69], implying that lymphangiogenesis might happen in zebrafish independently of VEGFR-3. While direct demonstrations of the substrate specificities of the zebrafish Adamts proteases are still missing, it appears clear that zebrafish data are not easily extrapolated to mammals. Unfortunately, the same might be true for the extrapolation of mouse data to humans.

#### **7. Which Cell Types Provide ADAMTS3 and CCBE1?**

Since VEGF-C, ADAMTS3, and CCBE1 are all secreted proteins, immunohistochemistry cannot reveal their cellular origin. In the establishment of the early zebrafish lymphatics, *Pdgfra*-positive fibroblast populations appeared to be the source for *Vegfc*, *Adamts3*, *Adamts14,* and *Ccbe1*, as identified by single-cell RNA sequencing [65]. While in vitro data support the notion that fibroblasts are perhaps the dominating source for CCBE1 also in mammals [63], smooth muscle cells appear to make a significant contribution [14,70]. In some contexts, blood vascular endothelial cells appear to also be an important source of VEGF-C [71,72] and CCBE1 [73,74]. However, these are crude approximations of the actual cellular heterogeneity, and in non-homeostatic situations such as inflammation or cancer, other cell types, e.g., immune cells such as macrophages, are likely significant producers of both VEGF-C and VEGF-C-activating proteases [75–77].

#### **8. Enigmatic Propeptides**

The evolutionary origins of both propeptides of VEGF-C and VEGF-D are unclear. Unless assuming horizontal gene transfer, they have been conserved for hundreds of millions of years and can be found in virtually all invertebrate VEGF homologs [78–80]. Apart from the VEGFs, the only homologous sequences were found within larval silk proteins of the mosquito genus Chironomus [38,81], resulting in the nickname "silk homology domain" for the C-terminal propeptide.

Because disulfide bonds link the C- and the N-terminal propeptides of VEGF-C and VEGF-D, the first, constitutive cleavage by the protein convertase furin (or PC5 or PC7) does not remove any of the propeptides from VEGF-C or VEGF-D. Both propeptides are released simultaneously with the activating cleavage between the N-terminal propeptide and the VEGF homology domain (see Figures 3 and 4). With 80 and 192 amino acid residues, respectively, the N- and C-terminal propeptides of VEGF-C are significantly longer than typical propeptides. They also fold independently and are therefore also often referred to as N- and C-terminal *domains*. According to the current understanding, the propeptides serve multiple functions.


Analogous to VEGF-A, the heparin-binding properties are likely necessary for the correct spatio-temporal distribution of the growth factor and its activity. When the VEGF-C propeptides were grafted onto VEGF-A, the resulting blood vasculature was denser than VEGF-A-induced vasculature [83]. Vice versa, when the C-terminal domain of VEGF-C was replaced by the heparin-binding domain of VEGF-A, fewer but larger lymphatic vessels were generated, which localized preferentially to HSPG-rich structures such as basement membranes [84]. The heparin binding of VEGF-C is somewhat weaker than that of VEGF-A. Although most heparin-binding affinity resides in the C-terminal propeptide, mature VEGF-C is also a heparin-binding growth factor. VEGF-A<sup>165</sup> binds tightest to heparin, requiring 0.8 M NaCl for elution, while pro-VEGF-C and mature VEGF-C require elution concentrations of 0.435 and 0.265 M, respectively [82]. This might explain why both mature and pro-VEGF-C have a local effect and do not diffuse far [65]. That the C-terminal domain mediates the association or embedding of VEGF-C in the extracellular matrix was speculated shortly after its discovery [38] and was recently directly demonstrated in vitro [63]. Thus, pro-VEGF-C might be similar in this respect to latent TGF-β [85].

#### **9. Changing Receptor Preferences with KLK3 and Cathepsin D**

Based on N-terminal sequencing, two different mature, active forms were identified for VEGF-C and VEGF-D [38,86]. In the supernatant of 293 cells, the shorter mature form of VEGF-C was the dominant mature ("major") form, while, for VEGF-D, the longer mature form was dominant. While this indicated early on that different proteases are involved in VEGF-C and VEGF-D activation, it remained unknown which proteases were involved. In 2011, Leppänen et al. found that the shorter ("minor") form of active VEGF-D was not able to activate VEGFR-3 [87]. This finding was surprising as the activation of VEGFR-3 is considered a prerequisite for being lymphangiogenic. It also indicated for the first time that a lymphangiogenic growth factor can be converted into an angiogenic growth factor by proteolysis. At the same time, it explained why VEGF-D had been identified in some experimental settings as a powerful angiogenic growth factor [88]. While other research confirmed the disparity between VEGF-C and VEGF-D in terms of protease utilization for activation [41], the exact nature of the VEGF-D-activating proteases remained unknown until 2019 when Jha et al. tested whether their newly discovered VEGF-C-activating proteases PSA and Cathepsin D (CatD) could also activate VEGF-D [45]. In fact, CatD was able to generate the VEGFR-2-specific mature form of VEGF-D, which Leppänen et al. had described in 2011 [87]. Despite this, it remains to be shown which protease activates VEGF-D in vivo and whether there is a "physiological protease" equivalent to the VEGF-C-activating ADAMTS3. Perhaps VEGF-D is solely activated in non-homeostatic situations such as tissue damage. Nevertheless, even without any pathological challenge, VEGF-D knock-out mice display subtle alterations in some lymphatic networks [53,89]. These minor phenotypes could result from a lack of activated VEGF-D, but equally well from a lack of pro-VEGF-D (assuming it has some low level of activity) or possibly VEGF-C/VEGF-D heterodimers.

When comparing the effects of different VEGF-C- and VEGF-D-activating proteases [45], two trends are visible, which are summarized in Figure 5:


**Figure 5.** Biochemically, the group of lymphangiogenic activator enzymes is diverse. It includes a metalloproteinase (ADAMTS3), serine proteases (prostate-specific antigen (PSA)/KLK3, thrombin, and plasmin), and an aspartic protease (cathepsin D). VEGF-C and VEGF-D share all activating enzymes except for the most important one: ADAMTS3. ADAMTS3 is exclusive for VEGF-C and required for the physiologic activation of VEGF-C during developmental lymphangiogenesis [41–43]. VEGF-C and VEGF-D are differently affected by proteolytic processing. With progressing processing, VEGF-C largely maintains its lymphangiogenic properties but loses its angiogenic properties quickly. VEGF-D behaves precisely the opposite way: processing with Cathepsin D almost completely abolishes its lymphangiogenic properties and fully unmasks its angiogenic properties [45]. Extensive exposure of both VEGF-C and VEGF-D to plasmin abolishes all VEGFR-2 and VEGFR-3 binding properties.
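The enzyme classes and substrate sharing described in the Figure 5 caption can also be written down as a small lookup table. The following Python sketch is only an illustration of that summary: the names `Activator`, `ACTIVATORS`, and `activators_of` are hypothetical, and the table encodes nothing beyond what the caption states (it does not capture receptor preferences or in vivo relevance).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activator:
    """One activating protease as listed in the Figure 5 caption."""
    name: str
    protease_class: str
    substrates: frozenset


# Per the caption, every activator except ADAMTS3 acts on both growth factors.
BOTH = frozenset({"VEGF-C", "VEGF-D"})

ACTIVATORS = [
    Activator("ADAMTS3", "metalloproteinase", frozenset({"VEGF-C"})),
    Activator("PSA/KLK3", "serine protease", BOTH),
    Activator("thrombin", "serine protease", BOTH),
    Activator("plasmin", "serine protease", BOTH),
    Activator("cathepsin D", "aspartic protease", BOTH),
]


def activators_of(growth_factor: str) -> list:
    """Names of all listed proteases that activate the given growth factor."""
    return [a.name for a in ACTIVATORS if growth_factor in a.substrates]


# Proteases listed for VEGF-C but not for VEGF-D.
vegfc_only = set(activators_of("VEGF-C")) - set(activators_of("VEGF-D"))
print(vegfc_only)
```

Keeping the substrates as a set per protease makes the one asymmetry in the caption (ADAMTS3 being exclusive to VEGF-C) fall out of a simple set difference.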

#### **10. Secondary Processing and Inactivation**

At least in vitro, the longer forms of activated VEGF-C and VEGF-D can undergo additional cleavages, further shortening the N-terminus and modifying the receptor binding capabilities, e.g., CatD can remove the lymphangiogenic potential from plasmin-activated VEGF-D and the angiogenic potential from ADAMTS3-activated VEGF-C [45]. Finally, a cleavage by plasmin can inactivate VEGF-C and VEGF-D. However, such secondary (or tertiary) processing has not yet been demonstrated in vivo.

#### **11. Other Cleavages**

The activating, N-terminal cleavage of VEGF-D has also been proposed to be mediated by the protein convertases furin or PC5 [39]. While this is certainly a possibility, it seems unlikely that this represents a significant VEGF-D activation mechanism in vivo. The mature VEGF-D produced in furin-deficient Lovo cells upon transfection with furin could well be due to any other endogenous protease in Lovo cells. The requirement for furin in this system might occur if C-terminal furin processing was a prerequisite for the activating N-terminal cleavage. However, such a prerequisite seems not to exist for VEGF-C [41]. Vice versa, plasmin [44] or PSA [45] have not only been shown to perform the N-terminal processing of VEGF-D, but also the C-terminal processing. Nonetheless, this is also likely irrelevant since the protein convertases cleave constitutively inside the cell before VEGF-D ever has the chance to encounter plasmin or PSA.

#### **12. Possible Involvement in Reproduction and Wound Healing**

That VEGF-C is a possible substrate for kallikrein-like peptidases had been proposed before [90], but the identification of KLK3 (also known as prostate-specific antigen, PSA) was, nevertheless, surprising, especially because KLK3 is largely confined to sperm plasma. The presence of VEGF-C, CCBE1, and a VEGF-C-activating protease in sperm plasma [45] makes it tempting to speculate about a possible function of VEGF-C in reproductive biology, but such a function has not yet been confirmed. Mutations in KLK3 affect male fertility [91], but this is unsurprising since the main biological function of KLK3 is the degradation of gel-like semenogelins, which releases the sperm cells [92]. VEGF-A, which is also present in sperm plasma [93,94], had a modest effect on sperm motility [95]. Therefore, similar experiments were attempted with VEGF-C, yielding very variable results (unpublished data by the author), perhaps due to the logistically challenging experimental setup.

Like KLK3, cathepsin D was also identified after an exhaustive analysis of bodily fluids for possible VEGF-C-cleaving activities [45]. VEGF-C is deposited by virtue of its C-terminal domain into the extracellular matrix and released by proteases [63]. Because VEGF-C accelerates wound healing [96,97], it appears possible that Cathepsin D provided by wound licking might activate latent ECM-embedded pro-VEGF-C. Thus, an instant angiogenic, lymphangiogenic, and immunologic stimulus would be provided. In contrast to the single gene in humans, *KLK1* was duplicated several times in rodents, leading to at least 23 *KLK1* orthologs (some of which are pseudogenes), and some researchers believe that the evolutionary pressure to heal bite wounds rapidly and efficiently was driving this expansion [98].

#### **13. Activating VEGF-C and VEGF-D in Cell Culture**

While there are several cell lines that endogenously express VEGF-C or VEGF-D (most notably PC-3, from which VEGF-C was originally identified [99]), almost all experiments that require the expression of these growth factors have been performed by cDNA transfection. When the full-length wildtype cDNAs are used, the inactive pro-forms dominate in the cell culture supernatant of most cell lines (see Figure 3). Cells that express both CCBE1 and ADAMTS3 (such as cell lines derived from 293 cells) will process at least some of the pro-VEGF-C into mature, active VEGF-C. This endogenous background activation is sufficient to detect mature VEGF-C even in the absence of added proteases (see Figure 3). If these background activation bands are missing from a Western blot, the detection is likely not very sensitive, or something interferes with the physiological activation of VEGF-C by ADAMTS3. The degree of processing is relatively difficult to predict and appears to depend on cell density, stress level, cell culture medium, and—most importantly—VEGF-C expression levels. In any case, the processing is inefficient, and the 293T cell line that was used to generate the gel image in Figure 3 is among the cell lines that most efficiently activate VEGF-C endogenously.

#### **14. Truncated cDNAs Are Used to Recombinantly Express Pre-Activated VEGF-C and VEGF-D**

Therefore, when larger amounts of active VEGF-C are required, the solution has been to express a mutant VEGF-C cDNA, from which the sequences coding for the propeptides have been deleted ("∆N∆C-VEGF-C"). All recombinant, commercially available VEGF-C and VEGF-D proteins are produced in this fashion. However, because the signal peptide's cleavage context is disturbed, the N-terminus of the resulting protein can differ from the endogenously activated VEGF-C. Only N-terminal sequencing can reveal which form of VEGF-C is present. With few exceptions (R&D Systems), vendors do not provide this information. The same is true for many scientific publications that use truncated cDNAs to express VEGF-C or VEGF-D. While it is possible to predict the signal peptidase's likely cleavage position, only N-terminal sequencing can give a definite answer. Many of the early experiments involving recombinant VEGF-D have used a truncated cDNA that results in a VEGF-D form that is an intermediate between the VEGFR-2-monospecific and the VEGFR-2-/VEGFR-3-bispecific endogenous VEGF-D forms, making it difficult to interpret the data [68]. After the recent identification of the cleaving proteases, it became possible to generate specific mature forms by co-transfection of the protease with the full-length wildtype growth factor cDNA [43,45]. However, when using pre-activated forms of VEGF-C or VEGF-D, one should remember that it is unclear whether these exist as independent species in vivo. Pro-VEGF-C efficiently binds VEGFR-3 in the context of neuropilin-2, and the "in-situ" activation of pro-VEGF-C (while being bound to VEGFR-3) might be the standard mode of activation [43,63]. Interestingly, a transgenic mouse expressing pre-activated VEGF-C (∆N∆C-VEGF-C) under the control of the keratin-14 promoter did not show the characteristic lymphatic phenotype in the skin that is seen in mice expressing VEGF-C from a full-length cDNA under the same promoter (unpublished data by the author) [100].

#### **15. Modulation of Proteolytic Processing**

Protease inhibitors have a veritable track record as drugs, targeting, e.g., viral proteases in AIDS and other viral infections [101], neutrophil elastase in lung diseases [102], and angiotensin-converting enzyme in cardiovascular diseases [103]. The opposite approach—promoting proteolysis—has also resulted in life-saving treatments, e.g., the use of tissue plasminogen activator (tPA) to dissolve blood clots in the immediate treatment of ischemic stroke [104].

Given the importance of the lymphatic system in many diseases [105], both VEGF-C and VEGF-D are likely worthwhile drug targets. Lymphedema, the swelling of organs or tissues due to an absent, hypoplastic, dysfunctional, or overloaded lymphatic network, represents a major clinical challenge because no causal, only symptomatic, treatments are available. The concept of pro-lymphangiogenic therapy to treat lymphedema has progressed to clinical trials using adenoviral VEGF-C gene therapy [106]. In these trials, pro-VEGF-C is produced from a full-length cDNA, relying on endogenous proteases for its activation. Using an adenovirus that produces pre-activated VEGF-C from a truncated cDNA would remove the requirement for endogenous proteases, which might or might not be a bottleneck. Alternatively, co-delivering CCBE1 and/or ADAMTS3 using an adenovirus cocktail could also boost the amount of active VEGF-C. However, the full-length cDNA of VEGF-C has been preferred over the truncated cDNA in most preclinical studies. Removing the N-terminal propeptide from VEGF-C results in an unpaired cysteine residue in VEGF-C's receptor-binding domain. This extra cysteine residue is conserved among all VEGF-C and VEGF-D orthologs but is absent from all other VEGF family members [99,107,108]. Therefore, when active VEGF-C or VEGF-D are expressed directly from a truncated cDNA, the monomeric growth factor can be the predominant species [103]. Monomeric growth factor exposes the dimerization interface to the environment and is predicted to interfere with receptor dimerization and activation. For crystallization studies, the extra cysteine residues can be mutated to promote dimerization [8,82], but the use of mutated proteins as biological drugs requires solid justification.

A continuous low-level supply of VEGF-C appears necessary to maintain the structure and functionality of heavily engaged lymphatic networks [70,109,110]. As an alternative to VEGF-C itself, a highly specific VEGF-C-activating protease might be equally suitable if it can activate endogenous ECM-embedded VEGF-C. Such VEGF-C activation might act both by stimulating lymphatic pumping [111] and by inducing a compensatory expansion of the lymphatic network [112].

In inflammatory and infectious diseases, the lymphatic network must manage the fluid balance during inflammatory swelling. Perhaps more importantly, there is increasing evidence that both innate and adaptive immunological responses are crucially dependent on the lymphatics during all stages of an immune response [113,114]. Thus, the activation of VEGF-C could be used as a generic means to boost any immune response like an adjuvant.

#### **16. Proteolytic Activation of VEGF-C and VEGF-D in Cancer**

The crucial role of tumor-associated lymphatics for metastasis was recognized early on [115–118], and VEGF-C/VEGF-D inhibition has been proposed to therapeutically block metastasis. Since the tumor-promoting effects of VEGF-C and VEGF-D likely require proteolytic processing, inhibition could not only target the growth factors or receptors, but also the activating proteases. In vitro, VEGF-C-expressing MCF-7 and MDA-MB-435 cells, which have been used for xenograft tumor models, are inefficient in activating the growth factor [115,117]. Therefore, it is assumed that the activating proteases are supplied in these xenograft models by the stromal tumor compartment, perhaps by fibroblasts, inflammatory or endothelial cells [119,120]. Harris et al. generated a mutated form of VEGF-D, which is resistant to proteolytic activation. They showed that this mutant could not promote tumor growth and lymph node metastasis in a mouse tumor model [121].

Tumor cells can migrate and form distant metastases via two distinct pathways: via blood vessels (hematogenic spread) and via lymphatics (lymphogenic spread). VEGF-C and VEGF-D can stimulate both pathways. By stimulating lymphangiogenesis into the tumor periphery (and occasionally also into the tumor), these growth factors maximize the access of tumor cells to the lymphatic vasculature. VEGF-C further appears to actively prepare the downstream lymph nodes for arriving cancer cells [122]. If the tumor happens to express suitable proteases, it is likely that VEGF-C (and even more so VEGF-D) is activated into forms that mimic VEGF-A but which are not inhibited by current anti-angiogenic treatments [123]. Such angiogenic redundancy might be one of the reasons why anti-VEGF-A treatment is much less universally effective than initially anticipated, and why, in amenable cancers, initial treatment success is usually followed by the development of resistance [124].

However, anti-lymphangiogenic therapy is a double-edged sword [125] because tumor-associated lymphatics are crucially important for the immune response against the tumor. When VEGF-C action was blocked in a mouse tumor model treated with immunotherapy, the mice receiving the anti-VEGF-C treatment died earlier than those that did not receive the treatment [126]. Similarly, in a mouse glioblastoma model, VEGF-C could amplify the CD8+ T cell response against the tumor [127]. In order to be able to successfully target VEGF-C in cancer, a thorough understanding of the underlying molecular mechanisms is needed. This understanding might ultimately allow us to separate the metastasis-enhancing function of VEGF-C from the immune-response-enhancing function of VEGF-C. Such separation might involve activating VEGF-C into a largely VEGFR-3-specific form, eliminating some (but not all) of VEGF-C's angiogenic features. However, since trafficking via the lymphatic neovasculature is integral to both tumor cell dissemination and immune response, separating these two functions appears unlikely to be feasible at the level of the VEGF-C/VEGFR-3 signaling axis. That said, our understanding of the underlying molecular events is highly incomplete, as we do not even know which specific proteases are activating VEGF-C in human cancers. A high-probability guess is that different proteases are involved depending on the cancer type.

#### **17. Blocking VEGF-C and VEGF-D Activation**

The proteolytic processing of VEGF-C and VEGF-D has been so far experimentally blocked only by mutagenesis of the cleavage sites. Furin and related protein convertases cleave VEGF-C after the double arginines (R226,227). Joukov et al. reported that mutating these arginines into serines (R226,227S) mostly blocked VEGF-C processing [38]. This is surprising since the N-terminal cleavage site should have been still subject to proteolytic attack because the first, constitutive C-terminal cleavage and the second, N-terminal cleavage were subsequently shown to occur independently of each other [41]. To generate an even more activation-resistant form of VEGF-D, Harris et al. mutated, in addition to the C-terminal cleavage site, the major N-terminal cleavage site and reported, similar to Joukov et al., almost complete abrogation of VEGF-D activation [121]. It is unclear why the unmutated minor N-terminal cleavage site in this protein did not result at least in partial activation. After all, in the same 293 EBNA cell line, the minor N-terminal cleavage site had been shown to account for approximately 20% of the activated protein [86]. However, a therapeutic effect might not require full inhibition of cleavage because pro-VEGF-C acts as an antagonist of mature VEGF-C, and therefore, a low level of cleavage might be acceptable [43].
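The mutation shorthand used above (e.g., R226,227S) can be made concrete with a short script. The following Python sketch parses such a label and applies it to a sequence; it is a minimal illustration under stated assumptions. The function names are hypothetical, and the sequence `toy` is a placeholder made of glycine filler with arginines placed at positions 226 and 227. It is not the VEGF-C sequence and says nothing about whether a given mutant is actually cleaved.

```python
import re


def parse_mutation(label: str):
    """Split a label such as 'R226,227S' into (wild type, positions, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+(?:,\d+)*)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    wild_type, positions, replacement = match.groups()
    return wild_type, [int(p) for p in positions.split(",")], replacement


def apply_mutation(sequence: str, label: str) -> str:
    """Return a copy of sequence with the substitution applied (1-based positions)."""
    wild_type, positions, replacement = parse_mutation(label)
    residues = list(sequence)
    for pos in positions:
        # Refuse to mutate if the sequence does not carry the expected residue.
        if residues[pos - 1] != wild_type:
            raise ValueError(f"position {pos} is {residues[pos - 1]}, expected {wild_type}")
        residues[pos - 1] = replacement
    return "".join(residues)


def has_double_arginine(sequence: str, first_pos: int) -> bool:
    """True if arginines occupy first_pos and first_pos + 1 (1-based)."""
    return sequence[first_pos - 1:first_pos + 1] == "RR"


# Placeholder only: glycine filler with arginines at positions 226 and 227.
toy = "G" * 225 + "RR" + "GGG"
mutant = apply_mutation(toy, "R226,227S")
print(has_double_arginine(toy, 226), has_double_arginine(mutant, 226))
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which are easy to make when positions are counted with or without the signal peptide.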

#### **18. Lymphedema and Genetic Lesions Affecting the Activation of VEGF-C**

Lymphedema is traditionally categorized into either primary or secondary lymphedema. Most lymphedema cases fall into the secondary category, resulting from various external insults to the lymphatic system, surgery and infection being most common [128]. On the other hand, primary lymphedema results from genetic lesions, which can be inherited or acquired during development. The latter makes them more challenging to identify since the genetic lesion might be present only in a subset of cells or organs [129]. For roughly 40% of primary lymphedema cases, the underlying genetic lesion can be identified, and a clinical guide for this process has been established [130]. However, secondary lymphedema also has a genetic component [131–133], and thus, primary and secondary lymphedema should be considered the endpoints of a continuous spectrum.

All mutations that disrupt the VEGF-C/VEGFR-3 signaling pathway result in hereditary lymphedema (Figure 6). While the most common type of hereditary lymphedema is caused by a mutation in the VEGF-C receptor [134,135], any signaling pathway components can be affected, including the proteolytic activation of VEGF-C. Thus, in human lymphedema patients, disease-causing mutations have been found in VEGF-C itself [136,137], in its activating protease ADAMTS3 [62,63], and the cofactor CCBE1 [57,138]. Genetic lesions in *FAT4* result in a phenotype closely resembling the phenotypes caused by mutations in *CCBE1* or *ADAMTS3*. Only recently, Hennekam Syndrome was split into three different subtypes depending on the underlying genetic lesion (Hennekam Syndrome Type 1, 2, and 3).

Interestingly, the primary function of FAT4 is likely unrelated to VEGF-C processing or VEGFR-3 signaling. Instead, it appears necessary for the flow-dependent establishment of lymphatic endothelial cell polarity [139]. However, in vitro analysis of FAT4 has been hampered because it is a very large protein with a highly repetitive structure [140]. Besides VEGF-C/VEGFR-3 signaling pathway genes known to be compromised in hereditary lymphedema or Hennekam Syndrome, Table 1 also lists selected other genes known to be causative for hereditary diseases featuring lymphedema as a cardinal symptom. Furthermore, Table 1 contains, in addition to ADAMTS3, the genes of all other proteases reportedly able to activate VEGF-C and VEGF-D. For all of these, genetic lesions have been described, but lymphedema has not been associated with any of them, arguing that they are not required for the development of the lymphatic system.





**Figure 6.** Genes of the VEGF-C/VEGFR-3 signaling pathway known to be involved in hereditary lymphedema. Although different components of the signaling pathway can be affected, the largest fraction of cases are caused by mutations in the *FLT4* gene, which codes for the VEGF-C receptor VEGFR-3.

#### **19. Outlook: Molecular Nudging**

With the first successes in CRISPR-Cas clinical trials, genetic deficiencies within the VEGF-C/VEGFR-3 signaling pathways appear at least theoretically amenable for repair. However, even cutting-edge trials limit themselves at the moment to cells that can be easily modified ex vivo (blood diseases such as sickle cell disease and β-thalassemia) [152] or to very localized targets [153]. We are still far from a systemic repair of solid tissues, which would be needed since the lymphatic system penetrates almost all our bodies' organs. Since at least a fraction of the VEGF-C appears to originate from blood vascular endothelial cells, a vascular-targeted repair appears possible [154]. If sufficiently specific, the systemic delivery of regulatory factors such as CCBE1 or ADAMTS3 might alternatively result in a widespread low-level activation ("molecular nudging") of endogenous VEGF-C and a therapeutic effect. While such interventions do not reverse developmental routes already taken, they still might significantly improve life quality.

For cancer, being the prototype of a moving drug target, molecular nudging is not likely to have any impact. While a multitargeted anti-VEGF-A/-C/-D therapy might result in improved survival, any progress in this area will likely be incremental since using alternative tumor angiogenesis factors is only one of many escape mechanisms that tumors can deploy [124].

**Author Contributions:** Writing—original draft preparation, M.J.; writing—review and editing, J.K., H.B., M.J.; visualization, H.B., M.J.; supervision, project administration, and funding acquisition, M.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Päivikki and Sakari Sohlberg Foundation, the Novo Nordisk Foundation (#21036), and the Academy of Finland (#337430, #337120). M.J. was supported by the Paulo Foundation and the Einar and Karin Stroem Foundation for Medical Research; H.B. was supported by a Fellowship from the Finnish National Agency for Education (TM-20-11320).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Evolutionary Analysis of Cystatins of Early-Emerging Metazoans Reveals a Novel Subtype in Parasitic Cnidarians**

**Pavla Bartošová-Sojková 1,\* , Jiří Kyslík 1,2, Gema Alama-Bermejo <sup>1</sup> , Ashlie Hartigan <sup>3</sup> , Stephen D. Atkinson <sup>4</sup> , Jerri L. Bartholomew <sup>4</sup> , Amparo Picard-Sánchez 1,5, Oswaldo Palenzuela <sup>5</sup> , Marc Nicolas Faber <sup>6</sup> , Jason W. Holland <sup>6</sup> and Astrid S. Holzer <sup>1</sup>**


**Simple Summary:** Cysteine protease inhibitors (cystatins) are molecules that play key protective roles in protein degradation and are involved in the immunomodulation of host responses to parasites. Little is known about the cystatin gene repertoire, evolution, and lineage-specific adaptations of early-emerging metazoans. Using bioinformatics searches, we identified orthologues of cystatins in basal animal lineages including free-living and parasite taxa. We aimed to explore whether their cystatin gene repertoire and evolution follow similar patterns recognized for derived metazoans and whether the modifications are linked to the organism's life history. We revealed that cysteine protease inhibitors from early-emerging animal groups are highly diverse, with modifications in gene organization and protein architecture. A new subtype of cystatins was discovered in the parasitic cnidarians, the Myxozoa, which has so far been only reported for a group of derived animals: trematode flukes. We set out hypotheses to describe the driving forces for the origins of this unique cystatin subtype and propose evolutionary scenarios elucidating the current existence of cystatins in the Metazoa, especially in their early-emerging lineages. Our research identified molecules for which future functional studies may help to identify their roles in host–parasite interactions and for the parasite itself.

**Abstract:** The evolutionary aspects of cystatins are greatly underexplored in early-emerging metazoans. Thus, we surveyed the gene organization, protein architecture, and phylogeny of cystatin homologues mined from 110 genomes and the transcriptomes of 58 basal metazoan species, encompassing free-living and parasite taxa of Porifera, Placozoa, Cnidaria (including Myxozoa), and Ctenophora. We found that the cystatin gene repertoire significantly differs among phyla, with stefins present in most of the investigated lineages but with type 2 cystatins missing in several basal metazoan groups. Similar to liver and intestinal flukes, myxozoan parasites possess atypical stefins with a chimeric structure that combines motifs of classical stefins and type 2 cystatins. Other early metazoan taxa, regardless of lifestyle, have only the classical representation of cystatins and lack multi-domain ones. Our comprehensive phylogenetic analyses revealed that stefins and type 2 cystatins clustered into taxonomically defined clades with multiple independent paralogous groups, which probably arose due to gene duplications. The stefin clade split between the subclades of classical stefins and the atypical stefins of myxozoans and flukes. Atypical stefins represent key evolutionary innovations of the two parasite groups, and their origin might have been linked with ancestral gene chimerization, obligate parasitism, life cycle complexity, genome reduction, and host immunity.

**Citation:** Bartošová-Sojková, P.; Kyslík, J.; Alama-Bermejo, G.; Hartigan, A.; Atkinson, S.D.; Bartholomew, J.L.; Picard-Sánchez, A.; Palenzuela, O.; Faber, M.N.; Holland, J.W.; et al. Evolutionary Analysis of Cystatins of Early-Emerging Metazoans Reveals a Novel Subtype in Parasitic Cnidarians. *Biology* **2021**, *10*, 110. https://doi.org/10.3390/biology10020110

Academic Editor: Hang Fai Kwok Received: 30 December 2020 Accepted: 31 January 2021 Published: 3 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Keywords:** cysteine protease inhibitor; stefin; signal peptide; parasite; phylogenetic analysis; diversification; protein structure

#### **1. Introduction**

Members of the cystatin superfamily (I25), also known as cystatins, are major regulators of the activity of papain-like cysteine proteases and legumain-type proteases. These endogenous protease inhibitors are widely distributed in all domains of life. Based on sequence homology, the cystatin superfamily comprises three main sub-families: types 1–3. Type 1 cystatins (stefins, I25A) are low-molecular-weight intracellular single-domain proteins generally lacking disulfide bonds and signal peptides. Within the stefins, an additional subtype, unusually featuring a signal peptide, is recognized exclusively in liver and intestinal flukes (Trematoda: Digenea) [1–4]. Type 2 cystatins (I25B) are mainly extracellular single-domain proteins containing two disulfide bonds and a signal peptide, while type 3 cystatins (kininogens, I25C) are large multi-domain proteins [5–7].

Cystatins are involved in the regulation of key physiological processes, such as the protection of cells from proteolysis [5–7]. In parasites, cystatins are involved in immunomodulation of host cells and represent important immunogenicity and pathogenicity factors involved in host–parasite interactions [8–13]. They facilitate parasite survival within the host [14,15] by inhibiting host proteases and by interfering with antigen processing and presentation [16–19]. The increased capacity of parasite cystatins to induce anti-inflammatory cytokine IL-10 [16,17,20] was suggested to be an adaptation to the parasitic lifestyle [21,22]. These molecules have been used as effective markers in diagnosis and vaccine development and as chemotherapeutic targets against various parasitic infections [23–27]. Recently, cystatins have also been suggested as candidates for immunotherapies for acute and chronic inflammatory diseases [28] and cancer [29].

The diversity and evolution of the cystatin superfamily is well-documented in prokaryotes and certain eukaryotic lineages [30–33]. Within the Metazoa, phylogenetic affinities and representation of cystatins have been investigated predominantly in the derived lineages (Bilateria) including both free-living [34] and parasite taxa, e.g., various helminth [2,10,35–38] and arthropod groups [39–42]. According to the current knowledge of cystatin evolution, the stefin and type 2 cystatin lineages were already present in the eukaryotic ancestor. Stefins have remained structurally similar to the ancestral form and are present across most metazoan groups. In contrast, type 2 cystatins greatly diversified, primarily in derived metazoans (Vertebrata), and have been lost entirely in some early (Placozoa) and derived (Acoela, Tardigrada, Hemichordata, etc.) metazoan groups. Multi-domain cystatins have been reported exclusively from several bilaterian groups [30]. Nevertheless, the picture of cystatin evolution in the Metazoa is incomplete, as early-emerging animal lineages (Porifera, Ctenophora, Placozoa, and Cnidaria) are greatly underexplored in this respect and completely underrepresented with regard to parasitic lineages. To date, only a few early-emerging animal taxa were included in the studies of cystatin evolution [30,34], from which none were parasites. Thus, further data are needed from these ancient animal lineages, including those with parasitic lifestyles, to improve our evolutionary understanding and to explore potential linkage to the organism's life history.

Basal metazoans mostly comprise free-living species, but at least eight independent parasite lineages of early-emerging Metazoa are known, with all but one in Cnidaria [43]. Recently, next-generation sequencing (NGS) data have become available for three of these parasite lineages, i.e., two *Edwardsiella* spp. [44,45], *Polypodium hydriforme* [46,47], and several species of the large group Myxozoa [47–54]. While *Edwardsiella* and *Polypodium* parasitize their hosts exclusively during the larval stage of their development [45,55], the microscopic myxozoans are an entirely parasitic group with complex life cycles [56]. Members of the speciose class Myxosporea are primarily known from fish (intermediate hosts) and annelids (definitive hosts). The more primitive Malacosporea use fish and freshwater bryozoans as hosts [57]. Myxozoans are an evolutionarily old and fast-evolving group, as evidenced by their highly derived morphology and by genome phylogenies [46,47,51,53,58]. Their genomes are the smallest amongst all animals due to evolutionary transition to parasitism and the associated extreme reduction of body complexity from free-living cnidarians [47].

To explore the gene repertoire, gene organization, and structural features of cystatins in phyla at the base of the metazoan tree, we mined 110 genomes and transcriptomes of 58 early-emerging animal species. We conducted phylogenetic analyses using newly obtained and published cystatin sequences of basal and derived metazoans to provide insights into the origin, evolution, and structural diversification of this group of inhibitors at the time when this superfamily emerged and expanded. We investigated the amino acid sequence structure and phylogenies of cystatins from free-living and parasite taxa to identify structural modifications, which potentially represent lineage-specific adaptations related to parasitic lifestyles. Our investigation thus aimed to reveal whether gene repertoire and evolution of cystatins in early-emerging animals follow similar patterns recognized for derived metazoans and whether the modifications are linked to the organism's life history.

#### **2. Materials and Methods**

#### *2.1. Data Mining*

Representatives of basal metazoan lineages (Placozoa, Porifera, Ctenophora, and Cnidaria) were selected for mining of cystatin homologues from the NGS data available from GenBank, other public databases, or produced in our laboratories (110 datasets of 58 species; Table S1). The samples included free-living and parasite taxa, particularly Cnidaria, i.e., *Polypodium hydriforme*, *Edwardsiella lineata*, *E. carnea*, and all available myxozoans (13 species). We used the published cystatin homologues from basal animal species [30,53] as queries for the cystatin homology search. If no homologues were found in a species in the first BLAST search, the search was repeated using the homologues found in related species in the first iteration as additional queries.

We used a combined search strategy using both motif- and sequence similarity-based methods to identify target molecules. The search was performed using the tBLASTn algorithm with the E-value cutoff set to 10<sup>−5</sup>. This relatively low threshold was selected based on low sequence similarity among our queries and subjects (approximately 30–40%). The mined sequences were examined for the presence of typical conserved motifs of the cystatin domain (G; Q-x-V-x-G; LP/PW; YF [30,59]), by searches against different databases (GenBank, MEROPS peptidase database [60]) including the reciprocal BLAST analysis, and by their positioning in our preliminary phylogenetic analyses. Detailed homology searches were performed using the hmmscan algorithm of HmmerWeb version 2.32.0 software [61] against protein family databases: Pfam, CATH-Gene3D, PIRSF, Superfamily, TIGRFAM, and TreeFam. For parasites, the potential contaminating sequences of host or other origin were identified and filtered out by BLAST matches with reference NGS data and by phylogenetic clustering with contaminant species homologues.
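The motif-based part of this screen can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the regular expressions, and the toy peptide are assumptions introduced only to show how an E-value cutoff and a scan for the Q-x-V-x-G and LP/PW motifs might be combined.

```python
import re

# Conserved cystatin-domain motifs named in the text; "x" stands for any residue.
# The regular expressions are an illustrative encoding, not the study's own.
MOTIFS = {
    "QxVxG": re.compile(r"Q.V.G"),
    "LP_or_PW": re.compile(r"LP|PW"),
}


def passes_evalue(evalue: float, cutoff: float = 1e-5) -> bool:
    """Keep a hit only if its E-value does not exceed the cutoff."""
    return evalue <= cutoff


def motif_hits(protein: str) -> dict:
    """Return the 0-based start of the first match of each motif, or None."""
    sequence = protein.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        match = pattern.search(sequence)
        hits[name] = match.start() if match else None
    return hits


# Hypothetical toy peptide, not a real cystatin sequence.
toy = "MSGAKQVVAGTNYLPHENK"
print(passes_evalue(3e-7), motif_hits(toy))
```

A sequence passing both checks would still need the database and phylogenetic verification described above, since short motifs such as LP occur by chance in many unrelated proteins.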

#### *2.2. Verification of Selected Cystatin Sequences*

Full gene sequences of the cystatin superfamily homologues mined from unpublished genomes and transcriptomes (Table S1) were verified by PCR and by sequencing from DNA and/or cDNA templates using species-specific primers designed in this study, if possible, in the 5′ and 3′ UTR regions (details in Methods S1 and Table S2). PCR was used to obtain the stefin gene sequence of *Buddenbrockia plumatellae* (Methods S1 and Table S2) as mining of the published *Buddenbrockia* EST database (Table S1) did not return any cystatin hits. PCR-verified sequences were deposited in GenBank under the accession Nos. MT127416–MT127426 and MW498387–MW498390 (Table S2). The positions of the introns were assessed by comparison of the sequences of DNA vs. cDNA origin and, if possible, by comparing sequences mined from the genomic vs. transcriptomic data.

#### *2.3. Phylogenetic Analyses*

We compiled and analyzed a sequence dataset (128 taxa) of both newly identified and published cystatins. The dataset comprised Choanoflagellates (11 taxa), early-emerging animal lineages (58 taxa), more derived metazoans (58 taxa), and *Giardia* cystatin (outgroup) resembling the most ancestral eukaryotic cystatin [30] (Table S3). Ingroup taxa were represented by free-living species (72 taxa) and parasite species from basal metazoans (*Polypodium hydriforme*, *Edwardsiella* spp., and myxozoans; 16 taxa) and bilaterians (trematodes, cestodes, nematodes, monogeneans, ticks, and mites; 39 taxa) (Table S3). Multi-domain cystatins were omitted from the phylogenetic analyses due to difficulties with their alignment.

We aligned 329 amino acid cystatin sequences in MAFFT v7.017 implemented in Geneious v8.1.3 [62] using the iterative refinement by the G-INS-i strategy that resulted in an alignment of stefins and type 2 cystatins according to topological equivalence in tertiary structure (see Dataset S1 of [59]). Nonhomologous regions at the beginning (e.g., signal peptides) and end of the alignment were trimmed so the final alignment comprised 213 positions, principally corresponding to the conserved cystatin domain (Pfam: PF00031) (Dataset S2). The phylogenetic trees were reconstructed using the maximum likelihood (ML) and Bayesian inference (BI) methods. ML analysis was performed in RAxML v7.0.3 [63], using the WAG + G model selected by ModelFinder implemented in IQ-TREE webserver [64]. Indels were treated as unknown characters. Bootstraps were based on 1000 replicates. BI analysis was performed in MrBayes v3.2.7av1 [65] using the WAG model of evolution, with 24 million generations sampled at intervals of 1000 trees until the standard deviation from split frequencies was below 0.05. The burn-in period represented the default initial 25% of all generations. The trees were visualized in Geneious v8.1.3 and graphically modified in Adobe Illustrator CS5.
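As a worked illustration of the Bayesian sampling settings, the short Python sketch below computes how many trees would be retained per run. It assumes that one tree was saved every 1000 generations and ignores the initial state at generation 0; the function name is hypothetical.

```python
def mcmc_sample_counts(generations: int, sample_every: int, burnin_fraction: float):
    """Return (total, discarded, retained) sample counts for one MCMC run."""
    total = generations // sample_every
    discarded = int(total * burnin_fraction)
    return total, discarded, total - discarded


# 24 million generations, one sample per 1000 generations, 25% burn-in.
total, discarded, retained = mcmc_sample_counts(24_000_000, 1000, 0.25)
print(total, discarded, retained)  # 24000 6000 18000
```

Under these assumptions, 6000 of the 24,000 sampled trees are discarded as burn-in and 18,000 contribute to the consensus.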

#### *2.4. Comparison of the Protein Architecture*

Cystatin diversity on the amino acid sequence level was analyzed in terms of critical motifs, amino acid conservation, presence/absence of a signal peptide and disulfide bridges, and modifications that might potentially affect the cystatin inhibitory activity. Disulfide bonds were predicted using the DISULFIND webserver [66]. Signal peptides were predicted using the SignalP v5.0 webserver [67]. Animal group-/cystatin type-specific sequence motifs were identified by comparison of primary structures of cystatin amino acid sequences in the WebLogo v3 software [68] using extractions of the alignment used for the phylogenetic analyses.

#### **3. Results**

In this study, we retrieved 184 homologues of cystatins (104 stefins and 80 type 2 cystatins) from the NGS data of early-emerging metazoans (Table S1). Of these, we verified with PCR the identity of 15 homologues mined from unpublished datasets (Table S2).

#### *3.1. Cystatin Gene Repertoire and Diversity in Early-Emerging Metazoans*

We observed a large variation in the representation of cystatin subclasses in the investigated animal groups (Figure 1). Type 2 cystatins were missing in several groups (Placozoa, Porifera, Cnidaria: Staurozoa, and Myxozoa). Stefins were present in all but one (Placozoa) basal metazoan lineage. We found a separate group of stefins in myxozoan parasites, designated here as "atypical stefins", which differed in sequence structure to classical metazoan stefins (see Section 3.3 and Figure 2). No multi-domain cystatins could be identified in basal metazoans (Figure 1).


**Figure 1.** Repertoire of cystatin superfamily genes in early-emerging metazoan lineages.

**Figure 2.** Schematic comparison of the gene architecture and amino acid structure of metazoan cystatins: for myxozoans, information is shown separately for Malacosporea and Myxosporea.

We found that basal metazoans had a diverse cystatin gene repertoire: most investigated species had multiple cystatin homologues with significant sequence differences, even on the intraspecies level (i.e., several divergent sequences of type 2 cystatin and/or stefin homologues in the genome of a single species; Table S1). Exceptionally, certain species (e.g., cnidarians *Ceratonova shasta* and *Physalia physalis*) possessed only a single type 2 cystatin and/or stefin homologue.

All mined cystatin homologues are listed either as new GenBank entries (for PCR-verified sequences mined from data unpublished at the time of the homology searches) or as GenBank contig/scaffold identifiers for sequences mined from published data (Table S2 and Dataset S1).

#### *3.2. Phylogeny of Metazoan Cystatins Did Not Mirror Animal Phylogeny and Was Highly Diverse within Individual Groups*

The clustering of cystatins in both ML and BI analyses did not follow the generally accepted trends of organismal phylogeny, as early-emerging animal lineages did not occupy basal positions in our trees (Figure 3 and Figure S1), as is commonly seen for other phylogenetically informative genes, e.g., 18S rDNA. Based on differences in the primary amino acid sequence structure, cystatins clustered into a single weakly supported stefin clade (ML/BI = 48/0.75) while the remainder of sequences clustered within the type 2 cystatin clade (ML/BI = 59/0.92). The stefin clade included a mixture of classical and atypical stefin subclades with weakly supported interrelationships. Classical stefin subclades comprised all Metazoa except the myxozoans, while atypical stefin subclades were formed exclusively of certain early-emerging (myxozoans) and more derived (trematodes) parasite animal lineages. Unlike other metazoan stefins, those of myxozoans were extremely divergent in their sequences, thus creating long branches (Figure 3 and Figure S1).

**Figure 3.** Maximum likelihood phylogenetic tree of the cystatin superfamily represented in the Metazoa and Choanoflagellata. *Giardia* cystatin was used as the outgroup. The basal metazoan groups are depicted in dark grey shaded bubbles, while other groups are shaded light grey. Cystatin homologues from parasite taxa are labelled by asterisks at the tip of the nodes. The detailed ML and BI phylogenetic trees featuring taxa names and all nodal supports are provided in Figure S1.

Both stefins and type 2 cystatins clustered into subclades that paralleled classic taxonomy. However, interclade relationships of these taxonomy-defined subclades were weakly supported/not resolved due to polytomies (Figure 3 and Figure S1) and did not always reflect the organismal phylogeny. Indeed, some representatives formed monophyletic taxonomic groups, e.g., Ctenophora, Porifera, Cubozoa, and Myxozoa, while some cnidarian groups (Polypodiozoa, Staurozoa, Scyphozoa, Anthozoa, and Hydrozoa) split into multiple subclades not related by classic taxonomy. We attributed this splitting to (i) sequence differences among species of the same taxonomic group (e.g., hydrozoans *Hydra*, *Podocoryna*, *Physalia*, *Velella*, and *Porpita* spp. clustering into independent type 2 cystatin clades) or (ii) the occurrence of paralogues (e.g., stefin out-paralogues of the scyphozoan *Aurelia aurita* present in two distinct groups). Some of the out-paralogous clades additionally contained multiple sequences from single species (e.g., stefin in-paralogues of *A. aurita* ranging up to 37.7% amino acid sequence divergence) (Figure 3 and Figure S1).
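For readers unfamiliar with how a percentage of amino acid sequence divergence can be obtained, the following Python sketch computes an uncorrected pairwise distance from two aligned sequences. The text does not specify how the reported divergence values were calculated, so this is only one plausible approach; the function name and the toy sequences are assumptions.

```python
def percent_divergence(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Uncorrected divergence (%) between two aligned sequences.

    Columns with a gap in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * differing / compared


# Hypothetical aligned fragments: 5 ungapped columns, 1 mismatch.
print(percent_divergence("MKLP-Q", "MRLPAQ"))
```

In this toy case one of five comparable positions differs, giving 20% divergence.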

#### *3.3. Structural Modifications of Cystatin Homologues in Basal Metazoans*

The results of structural analyses aimed at identifying intron number and positions, conserved regions, and the presence/absence of signal peptides and cysteine bonds were combined to create the comparative scheme of cystatin structural motifs (Figure 2).

We identified that introns in cystatin nucleotide sequences of early-emerging animal lineages varied in number (0–2) and location in type 2 cystatins while stefins contained two introns with conserved positions (Figure 2). All introns were delimited by typical GT/AG sequence boundaries at the exon–intron junction.
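The boundary check mentioned above is simple to express in code. The Python sketch below tests whether an intron sequence starts with GT and ends with AG; the function name and the example sequences are hypothetical and serve only as an illustration.

```python
def has_canonical_boundaries(intron: str) -> bool:
    """True if the intron starts with GT and ends with AG (case-insensitive)."""
    sequence = intron.upper()
    return len(sequence) >= 4 and sequence.startswith("GT") and sequence.endswith("AG")


# Hypothetical intron sequences, not taken from the mined data.
print(has_canonical_boundaries("gtaagtttcttttacag"))  # True
print(has_canonical_boundaries("gcaagtttcttttacag"))  # False
```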

In amino acid sequence data from our alignment of basal animals (Dataset S2), we observed four conserved regions important for interaction with proteases, as found previously with derived metazoans [30,59]: (i) a glycine residue within the amino-terminal "trunk", (ii) a Q-x-V-x-G motif within the first hairpin loop, (iii) the PW (type 2 cystatins) or LP (stefins) motifs within the second hairpin loop, and (iv) a tyrosine residue located in the relatively conserved D-x-L-x-Y-F carboxy terminus of stefins. We observed two novel conserved residues in stefins of both early-emerging and derived metazoans: (i) a histidine residue located seven amino acids before the LP pair towards the N-terminus and (ii) a lysine residue situated eight amino acids before the C-terminal conserved tyrosine residue (Figure 2 and Dataset S2).
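The positional relationship of the conserved histidine to the LP pair can be checked programmatically. The Python sketch below assumes that "seven amino acids before" is counted from the leucine of the first LP pair in an ungapped sequence; this reading, the function name, and the toy peptides are assumptions and would need to be checked against Dataset S2.

```python
def has_histidine_upstream_of_lp(protein: str, offset: int = 7) -> bool:
    """True if a histidine lies `offset` residues N-terminal of the first LP pair.

    The offset is counted from the leucine of the LP pair (an assumption).
    """
    sequence = protein.upper()
    lp_index = sequence.find("LP")
    # Covers both "no LP pair" (-1) and "LP too close to the N-terminus".
    if lp_index < offset:
        return False
    return sequence[lp_index - offset] == "H"


# Hypothetical toy peptides, not real stefin sequences.
print(has_histidine_upstream_of_lp("MAHGGGGGGLPK"))  # True
print(has_histidine_upstream_of_lp("MAMGGGGGGLPK"))  # False
```

The same pattern could be adapted to the lysine located eight residues before the C-terminal tyrosine by changing the anchor motif and offset.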

While the amino acid structure of the first two key activity regions (G, Q-x-V-x-G) was conserved among all sequences, several modifications were seen in other regions. All sponge (Porifera) stefins showed a characteristic amino acid replacement of the LP pair to LD (or LS in one case). Substitution of the conserved C-terminal tyrosine residue by nonaromatic valine was observed in two homologues. The conserved histidine residue located near the LP pair was commonly replaced by methionine or, exceptionally, by histidine or tryptophan (both observed once) (Dataset S2).

Stefins and type 2 cystatins of comb jellies (Ctenophora) had the typical sequence motifs defined for these molecules in other metazoans. In a few taxa, we observed substitutions of the conserved cystatin PW pair by AW and stefin C-terminal tyrosine residue by proline or histidine (Dataset S2).

Cnidarian cysteine protease inhibitors showed a wide variety of modifications (Figure 2 and Dataset S2). Cubozoans were the only group for which conserved stefin regions had the classical organization, and their type 2 cystatins differed in a few cases by replacement of the PW motif with SW or KF. Stefins of hydrozoans and scyphozoans showed single substitutions of LP to FA or LK and single alterations of C-terminal tyrosine to aromatic phenylalanine, and many of the type 2 cystatins had substitutions of PW with AW, RW, or SW and exceptionally with PF or PL. Staurozoan stefins had C-terminus modifications, where the tyrosine residue was substituted with histidine or was missing. In anthozoans, type 2 cystatins had alterations of PW to SW or GW, while a few stefins had substitutions of LP with FD or FS and replacement of the C-terminal tyrosine residue to proline or histidine.

The most notable modifications we found were in two parasitic cnidarian groups, Polypodiozoa and Myxozoa. While the majority of type 2 cystatins of *P. hydriforme* had the typical conserved motifs, its stefin was highly modified, with the LP pair substituted with LS and the C-terminal conserved D-x-L-x-Y-F motif replaced with a completely different pattern lacking a tyrosine residue. While this stefin retained the histidine found near the LP pair, the conserved lysine located near the C-terminus was replaced by leucine. Myxozoans, as representatives of early-emerging metazoans, had atypical stefins, which completely lacked the D-x-L-x-Y-F region at the C-terminus and had the normally conserved histidine and lysine residues replaced by other residues. Stefins of an evolutionarily older myxozoan subgroup, the Malacosporea, retained the original LP pair, but some stefins of the speciose and more derived myxozoan group, the Myxosporea, had replaced this motif with LS, LY, LR, FK, or SA (Figure 2). We could not find classical stefins or type 2 cystatins in any myxozoan (Figure 1).

In general, signal peptides in cystatins of basal metazoan lineages followed established patterns: they were present in type 2 cystatins and absent in classical stefins. The only exceptions were some myxosporean stefins that carried a signal peptide (Figure 2). Interestingly, the pattern of signal peptide absence (in type 1 atypical stefins) or presence (in type 2 atypical stefins) was observed even at the intraspecies level. For example, some myxosporeans coded exclusively for stefins without a signal peptide (i.e., *C. shasta* and *Sphaeromyxa zaharoni*) while others presented multiple stefin genes that all carried the signal peptide (i.e., *Myxobolus pendula* and *M. cerebralis*) or had genes that represented both variations (i.e., *Enteromyxum leei*, *E. scophthalmi*, *Kudoa iwatai*, *Thelohanellus kitauei, Myxidium lieberkuehni* and *Sphaerospora molnari*) (Figure 2 and Dataset S1).

As typical for type 2 cystatins, two conserved disulfide bridges were identified in all basal metazoans with the exception of homologues found in ctenophorans, which were predicted to contain only one disulfide bridge. Classical stefins lacked disulfide bridges, as is typical for this cystatin subfamily type, and this pattern was also observed in atypical stefins (Figure 2).

#### *3.4. Structural and Evolutionary Comparison of Cystatins of Parasitic vs. Free-Living Groups of Basal Metazoans: The Special Case of Myxozoa*

The Myxozoa, the only obligate parasite group of early-emerging metazoans, possessed exclusively atypical stefins and lacked type 2 cystatins. Myxozoan stefins had characteristic differences in sequence to those of the other basal Metazoa (see Section 3.3) and clustered into a single clade with long branches (see Section 3.2). The two cnidarian lineages studied that are parasitic only in their larval stage (*E. lineata*, *E. carnea*, and *P. hydriforme*) possessed both classical stefins and type 2 cystatins, and no atypical stefins were found in these animals. Their stefins formed a distinct lineage that afterwards clustered with other cnidarians (*P. hydriforme*) or grouped with their respective anthozoan members (*E. lineata* and *E. carnea*). Similarly, clustering of "partially parasitic" cnidarians into distinct monotypic lineages (*P. hydriforme*) or within anthozoan lineages (*E. lineata* and *E. carnea*) was observed for type 2 cystatins (Figure 3 and Figure S1). No structurally unique molecules were identified in the entirely free-living basal metazoans.

#### **4. Discussion**

#### *4.1. A Diverse Repertoire of Cystatins in Early-Emerging Metazoans*

In this study, we identified the cystatin gene repertoire of early-emerging animals by mining homologues from 110 genomes and transcriptomes of basal Metazoa. We demonstrate that cystatins are widely distributed and highly diversified in current representatives of these ancient lineages and that the repertoire of cystatin superfamily genes differs widely among phyla. Each taxonomic group evolved multiple stefin and/or type 2 cystatin homologues, forming individual taxonomy-defined clades in the phylogeny. These paralogues probably arose by gene duplication [30] rather than by whole genome multiplication, as the only basal metazoans for which genome duplication has been suggested are members of the cnidarian genus *Acropora* [69]. Multiple out-paralogues found in cnidarians likely originated before cnidarian radiation, while in-paralogues probably emerged after the speciation event. Bursts of sequence diversification likely gave rise to gene orthologues within each taxonomy-defined clade. These evolutionary processes resulted in the origin of diverse molecules with varying inhibitory and immunomodulatory functions in metazoans.

Some taxonomic groups had a wide diversity of cystatins, while others lacked some or all cystatin subfamilies. For example, genes coding for cystatins were not identified in the placozoan genomes: two different strains of *Trichoplax* sp. [70] and *Hoilungia hongkongensis* [71], which was in concordance with findings of Kordiš and Turk [30] from the related species *Trichoplax adhaerens* [70,72]. Evidently, cystatins are not essential for the survival of these simple organisms despite the presence of cysteine proteases in them [71,72]. We suggest that, in placozoans, the functions of cystatins have been replaced by other types of inhibitors, such as structurally unrelated equistatin or serpins [70–72]. These proteins have been recognized in other organisms as potent inhibitors of cysteine proteases [73] or cross-class inhibitors of them [74].

#### *4.2. Atypical Stefins: A Unique Type of Cystatins in Myxozoans and Some Trematodes*

Interestingly, we found a wide diversity of both classical and atypical stefins in basal metazoans and we suspect that these highly divergent homologues may have different or novel functions, as suggested for the highly divergent myxozoan serpins (serine protease inhibitors [75]). Stefins structurally similar to those of myxozoans have been identified only from more derived bilaterians: intestinal and liver flukes (Trematoda: Digenea) [1–4]. While myxozoans possess only atypical stefins, these trematodes have cystatins from several superfamily subclasses (classical and atypical stefins, multi-domain cystatins [1–4,22,76]). The atypical stefins presumably arose independently in the two groups, possibly as a result of pressures from a parasitic life history; however, other parasites do not possess these molecules. For example, parasitic metazoans (monogeneans, nematodes, cestodes, ticks, and mites) [9,36,37,39,42] including blood trematodes [35] have only classical stefins and type 2 cystatins. Similarly, cnidarians *Polypodium* and *Edwardsiella* spp. with parasitic larval and free-living adult stages also have only classical stefins and type 2 cystatins. Thus, atypical stefins represent key evolutionary innovations of myxozoans and intestinal and liver trematodes. The likely function of atypical stefins in myxozoans and trematodes is to control unwanted proteolysis by cysteine proteases expressed by different parasite life stages and by the host [1].

Structurally, atypical stefins contain the characteristic stefin LP pair, which is modified in certain myxozoan stefins. Importantly, all atypical stefins lack the conserved D-x-L-x-Y-F carboxy-terminal part that is generally found in classical stefins but is absent from type 2 cystatins (Figure 2). As the tyrosine residue of the D-x-L-x-Y-F region is important for stabilization of the stefin-protease complex [59,77], the absence of this motif along with the modifications in the conserved LP pair may have functional implications in atypical stefins, e.g., their capability to inhibit a broader range of proteases, including those of host origin. The function of this C-terminal modification warrants future experimental studies as demonstrated previously for stefin mutants [77,78].

Additionally, we identified that the atypical stefins of some myxozoans and all trematodes have a signal peptide typical for type 2 cystatins (Figure 2). In some taxa, this feature correlates with extracorporeal protein secretion [3,79–81], but this is not consistent across the Metazoa, as not all proteins that have a signal peptide are exported out of the organism [1] and a large proportion of excreted/secreted products do not have a signal peptide [1,4,37,79,81–83]. Secreted proteins, including cystatins, are crucial for parasite protection from degradation by host cysteine proteases and for modulation of host immunity [9–11,16,79]. We postulate that some myxozoan stefins are secretory proteins with immunomodulatory roles, as has been suggested for the atypical stefin of the myxosporean *Thelohanellus kitauei* [53,84]. These molecules are likely responsible for the increased expression of the anti-inflammatory IL-10 gene in fish hosts infected with different myxosporean species [85–90] as an increased production of IL-10 in hosts infected by other parasites is elicited by cystatin-induced immunomodulation [16,20]. Similar to other parasites [23–27], we suggest that these proteins should be further explored in myxozoans as vaccine targets due to the lack of efficient strategies against these parasites in fish destined for human consumption [91,92].

We propose two mechanisms for the origin of atypical stefins: (i) structural modification of ancestral classical stefins by the acquisition of a signal peptide in some homologues and loss of the C-terminal and its tyrosine residue, or (ii) combination of the features of classical stefins (LP pair) and type 2 cystatins (signal peptides, missing C-terminal part) in chimeric genes. Chimeric genes are important in the evolution of genetic novelty and can allow organisms to adapt to novel environments (or hosts) by acquisition of novel functions [93,94]. Both myxozoans and trematodes are obligate parasites with complex life cycles that involve vertebrate and invertebrate hosts [8,95]. The independent origins of their atypical stefins might have been associated with the transition of a presumably free-living ancestor to obligate parasitism and the associated challenges of surviving within one or more hosts. Extending a recent hypothesis that invertebrates were the initial hosts for myxozoans and fish were incorporated into life cycles later [96], we postulate that myxozoans reduced their type 2 cystatins in the invertebrate host in the course of genome reduction due to parasitism [47] since invertebrates have limited immune capacity [97]. Ancestral superfamily cystatin types were lost in myxozoans as their roles were replaced by atypical stefins, which in trematodes have been shown to inhibit both parasite and host proteases [2,3]. Then, under novel pressures in adopted fish hosts, myxozoans remodeled classical stefins into the atypical ones to interfere with specific immune responses of fish and to enable novel behaviors such as migration within host tissues.

#### *4.3. Obstacles to Reconstruction of Cystatin Phylogenies*

In our analyses, the reconstruction of evolutionary relationships in the cystatin superfamily had good support for the more recent clades but not for deeper evolutionary nodes, as reported previously for this group of genes [30,35]. Unstable topology, polytomies (in BI), and low nodal supports in our phylogeny were due to low informative content from short protein length and high sequence divergence in the chosen genes [30,35]. However, despite generally poor support, a clear separation of ingroup taxa into distinct lineages defined by taxonomy and protein architecture revealed remarkable trends in the evolution of cystatins in early-emerging metazoans, with structurally unique molecules of potentially novel functions. Different branches appear to have evolved at different rates [98], especially in fast-evolving myxozoans [46,47,51,53,58]. Fast rates of evolution of the myxozoan genes likely caused the extraordinary sequence divergence of myxozoan stefins, which may represent functional diversification of these molecules. Phylogenetic clustering of atypical stefins within the myxozoan clade seems to some extent to follow myxozoan speciation and differentiation into the well-known myxozoan lineages (oligochaete-infecting, polychaete-infecting, *Sphaerospora sensu stricto*, and malacosporeans [96]). Resolution of these established myxozoan clades is low due to the undersampling of taxa and is probably compounded by diversification including a wide functional range of atypical stefins, with or without signal peptides, and with each particular lineage displaying specific sequence motifs and potential host organ-specific protease expression.

#### *4.4. Hypothetical Evolution of Metazoan Cystatins*

From their comprehensive analysis of prokaryotic and eukaryotic cystatins, Kordiš and Turk [30] concluded that the ancestor of the cystatin superfamily was intracellular and lacked a signal peptide and disulfide bridges (similarly to extant stefins). Initial gene duplication produced two ancestral eukaryotic lineages, stefins and type 2 cystatins, with the latter evolving from the stefin-like ancestor by acquiring cysteine residues, disulfide bridges, and a signal peptide. Later gene and domain duplications combined with deletions and insertions of genetic material resulted in diverse single- and multi-domain proteins with or without disulfide bonds and glycosylations [30].

Based on our novel data and published records [30], we propose the following alternative scenarios for the evolution of single-domain cystatins in the Metazoa, with a focus on its basal lineages (Figure 4). After the split of the cystatin superfamily into the initial stefin and type 2 cystatin lineages in the ancestor of eukaryotes, stefins were retained in the common ancestor of the Metazoa and all its descendant lineages except Placozoa. Sequence diversification and lineage-specific adaptation of stefins occurred independently in parasites (Myxozoa and Trematoda), giving rise to atypical stefins. Type 2 cystatins were either (i) retained in the metazoan ancestor and its descendant lineages except for their independent loss in Porifera, Placozoa, some Cnidaria (Myxozoa, Staurozoa) and some bilaterians (liver and intestinal flukes) (scenario A in Figure 4); (ii) initially lost in the metazoan ancestor (or even before), reappeared in the ancestor of Eumetazoa, and were again lost in some descendant lineages (Placozoa, Myxozoa, Staurozoa, and some flukes) (scenario B in Figure 4); or (iii) initially lost in the metazoan ancestor (or even before) and reappeared independently in Ctenophora and in the ancestor of the lineage, leading to Cnidaria and Bilateria with some groups (Myxozoa, some flukes) having a secondary loss (scenario C in Figure 4).

**Figure 4.** Hypothetical evolutionary scenarios of origins of stefins and type 2 cystatins in the Metazoa with a focus on early-emerging animal lineages: please note that the different colors for type 2 cystatins visualize the alternative scenarios of hypothetical evolution for the same cystatin subtype and not the sequence differences.

#### **5. Conclusions**

Stefins are considered relatively conserved genes with characteristics closer to the ancestral superfamily type, while type 2 cystatins are considered to have undergone a more complex and dynamic evolution through numerous gene and domain duplications [30]. In this study, we showed that early-emerging metazoan lineages were more prone to type 2 cystatin loss and that significant diversification of this type of molecule occurred only later in their evolution. Simultaneously, these ancient lineages retained stefin genes that were more dynamic with regard to structural modifications, with bursts of diversification creating atypical stefins in some obligate parasites with complex life cycles. Further research aimed at biochemical and functional characterization of the parasite cysteine protease inhibitors, especially of atypical stefins, is necessary to decipher the roles of these proteins in multi-host–parasite interactions. This knowledge will improve the understanding of the contributions of cystatins to parasite survival in the host causing disease and may reveal targets for therapeutant development against Myxozoa.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/2079-7737/10/2/110/s1, Figure S1: Maximum likelihood and Bayesian trees with nodal supports; Table S1: Next-generation sequencing databases mined for cystatins; Table S2: List of myxozoan species with information on primers used for the PCR verification of stefin genes; Table S3: The list of groups/species used for phylogenetic analyses; Methods S1: Methodological details regarding the PCR verification of myxozoan stefins; Dataset S1: Fasta file of the complete amino acid sequences of cystatins present in the phylogenetic analyses; Dataset S2: Fasta-formatted alignment of the trimmed amino acid sequences of cystatins used for the phylogenetic analyses.

**Author Contributions:** Conceptualization, P.B.-S.; methodology, P.B.-S.; formal analysis: P.B.-S., J.K., O.P., M.N.F., and G.A.-B.; investigation: P.B.-S., S.D.A., O.P., A.P.-S., A.H., M.N.F., and G.A.-B.; resources: P.B.-S., A.S.H., O.P., J.W.H., and J.L.B.; data curation: P.B.-S. and O.P.; writing—original draft preparation: P.B.-S.; writing-review and editing: All authors; visualization: P.B.-S. and J.K.; supervision: P.B.-S.; funding acquisition: P.B.-S. and A.S.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Ministry of Education, Youth, and Sports of the Czech Republic, grant number LTAUSA17201; by the European Commission under the H2020 Programme— ParaFishControl, grant number 634429; by the Czech Science Foundation, grant number 19-28399X (to A. S. Holzer, G. Alama-Bermejo, and J. Kyslík) and 21-16565S and by the Czech Academy of Sciences and Hungarian Academy of Sciences, grant number MTA 19-07. This publication reflects the views of the authors only; the European Commission cannot be held responsible for any use which may be made of the information contained therein.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are openly available in Genbank (under the accession Nos. MT127416–MT127426 and MW498387–MW498390) and in the associated supplementary files.

**Acknowledgments:** We thank Baveesh Pudhuvai (BC CAS, Budweis, Czech Republic) for help in the PCR verification of *Buddenbrockia* stefin. We also thank Ivan Fiala (BC CAS, Budweis, Czech Republic) for providing suggestions to improve the manuscript and for sharing *M. lieberkuehni* and *N. pickii* transcriptomic data. We are grateful to Hanna Hartikainen (ETH Zurich, Switzerland) for sharing *T. bryosalmonae* genome data.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **References**


*Article*

## **Ranacyclin-NF, a Novel Bowman–Birk Type Protease Inhibitor from the Skin Secretion of the East Asian Frog,** *Pelophylax nigromaculatus*

**Tao Wang †, Yangyang Jiang †, Xiaoling Chen \*, Lei Wang, Chengbang Ma, Xinping Xi, Yingqi Zhang, Tianbao Chen, Chris Shaw and Mei Zhou**

Natural Drug Discovery Group, School of Pharmacy, Queen's University Belfast, Belfast BT9 7BL, UK; twang13@qub.ac.uk (T.W.); yjiang12@qub.ac.uk (Y.J.); l.wang@qub.ac.uk (L.W.); c.ma@qub.ac.uk (C.M.); x.xi@qub.ac.uk (X.X.); zhangyingqi08@sina.com (Y.Z.); t.chen@qub.ac.uk (T.C.); chris.shaw@qub.ac.uk (C.S.); m.zhou@qub.ac.uk (M.Z.)

**\*** Correspondence: x.chen@qub.ac.uk; Tel.: +44-28-9097-2200

† These authors contributed equally to this work.

Received: 7 May 2020; Accepted: 29 June 2020; Published: 2 July 2020

**Abstract:** Serine protease inhibitors are found in plants, animals and microorganisms, where they play important roles in many physiological and pathological processes. Inhibitor scaffolds based on natural proteins and peptides have gradually become the focus of current research as they tend to bind to their targets with greater specificity than small molecules. In this report, a novel Bowman–Birk type inhibitor, named ranacyclin-NF (RNF), identified in the skin secretion of the East Asian frog, *Pelophylax nigromaculatus*, is described. A synthetic replicate of the peptide was subjected to a series of functional assays. It displayed trypsin inhibitory activity with an inhibitory constant, Ki, of 447 nM and had negligible direct cytotoxicity. No observable direct antimicrobial activity was found but RNF improved the therapeutic potency of Gentamicin against Methicillin-resistant *Staphylococcus aureus* (MRSA). RNF shared significant sequence similarity to previously reported and related inhibitors from *Odorrana grahami* (ORB) and *Rana esculenta* (ranacyclin-T), both of which were found to be multi-functional. Two analogues of RNF, named ranacyclin-NF1 (RNF1) and ranacyclin-NF3L (RNF3L), were designed based on some features of ORB and ranacyclin-T to study structure–activity relationships. Structure–activity studies demonstrated that residues outside of the trypsin inhibitory loop (TIL) may be related to the efficacy of trypsin inhibitory activity.

**Keywords:** Bowman–Birk inhibitor; ranacyclin; trypsin inhibitor; structure–activity relationship; synergistic effect; Gentamicin

#### **1. Introduction**

Serine proteases exist in almost all living organisms [1], and, in humans, they are intimately involved in many disease processes, such as infections, coagulation disorders, cancer and inflammation [2,3]. Although most current serine protease-targeted drugs are small molecules, attention is increasingly shifting to polypeptide inhibitors, as these more complex molecules offer an achievable approach for optimising the affinity and specificity of inhibitors. Inhibitor scaffolds are normally contained in natural proteins or peptides which have been optimised and improved through a process of parallel evolution for activities against endogenous targets [4].

Bowman–Birk inhibitors (BBIs) exist in a wide range of plants, including sunflowers, soybeans and other legumes [5]. They are classical serine proteinase inhibitors that contain a highly-conserved disulphide loop structure consisting of nine residues (CTP<sub>1</sub>SXPPXC) [6–8]. BBI peptides derived from amphibians are slightly different and typically contain a disulphide-bridged hendecapeptide loop (CWTKSXPPXPC), containing 11 residues rather than nine [9,10]. For all natural trypsin-inhibition domains, the residue at the P1 position is lysine or arginine, whereas the P1 residue is typically leucine or tyrosine in chymotrypsin inhibitory domains [11]. BBIs have been extensively studied and found to have a wide range of functions, such as anti-inflammatory, anticancer and anti-bacterial activities. In addition, compared to most antimicrobial peptides (AMPs), BBIs are safe orally-active peptides whose therapeutic effects have been proven in animal models [12]. A thorough understanding of BBIs and their therapeutic potential could help in their further development as clinical agents.
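The P1 rule stated above can be illustrated with a minimal sketch. It assumes the amphibian loop consensus CWTKSXPPXPC quoted in the text, with P1 taken as the fourth loop residue; the function name, the regular expression and the Lys-to-Leu variant are illustrative assumptions, and the example peptide is the mature RNF sequence deduced in Figure 1.

```python
import re

# P1 preferences as stated in the text: Lys/Arg for trypsin inhibition,
# Leu/Tyr for chymotrypsin inhibition.
P1_TARGET = {"K": "trypsin", "R": "trypsin", "L": "chymotrypsin", "Y": "chymotrypsin"}

# Amphibian-type 11-residue loop: Cys, two residues, P1, Ser, X, Pro-Pro, X, Pro, Cys.
# The capture group is the P1 residue (hypothetical pattern built from the consensus).
LOOP = re.compile(r"C[A-Z]{2}([A-Z])S[A-Z]PP[A-Z]PC")


def predict_target(peptide):
    """Return the protease class suggested by the P1 residue, or None if no loop is found."""
    match = LOOP.search(peptide.upper())
    if match is None:
        return None
    return P1_TARGET.get(match.group(1), "unknown")


rnf = "GAPRGCWTKSYPPQPCF"          # mature RNF sequence deduced in Figure 1
variant = "GAPRGCWTLSYPPQPCF"      # hypothetical Lys-to-Leu P1 variant, for illustration only

print(predict_target(rnf))         # P1 = Lys
print(predict_target(variant))     # P1 = Leu
```

This is only a lookup on the P1 residue; it does not predict inhibitory potency, which the study shows also depends on residues outside the loop.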

The term "ranacyclins" was first used in 2003 to describe two cyclic AMPs, ranacyclin-T and ranacyclin-E, isolated from the skin secretion of *Rana temporaria* and *Rana esculenta*, respectively [13]. According to the protein database InterPro (https://www.ebi.ac.uk/interpro/), 73 ranacyclin-type peptides have currently been identified from frog skin secretions. Ranacyclins range from 13 to 30 residues in length, and most are composed of 17 amino acid residues. They share a similar peptide sequence and contain a highly conserved mutated Bowman–Birk trypsin inhibitory loop (TIL). Although ranacyclins share similar structural features, their bioactivities are extremely diverse [14]. pLR from *Lithobates pipiens* and pYR from *Rana sevosa*, for example, were reported to have immunomodulatory properties and could suppress the early development of granulocyte macrophage colonies from bone marrow stem cells [15,16]. ZDPI from *Amolops loloensis* could inhibit platelet aggregation [17]. Ranacyclin-HB1 from *Pelophylax hubeiensis* showed obvious antioxidant activity and could scavenge free radicals rapidly [18]. These diverse functions of ranacyclins might be associated with the adaptation of the species to different environments [19,20].

Owing to their excellent hydrolysis tolerance and low cytotoxicity, multi-functional ranacyclin peptides have been considered more promising candidates for a novel generation of antibiotic agents than most AMPs [21]. However, compared with common AMP families, such as dermaseptins or brevinins, reports on ranacyclins are relatively few. Therefore, further study and modification of ranacyclins would favour a better understanding of this family and help bring them into application. In this study, a novel Bowman–Birk type ranacyclin, named ranacyclin-NF (RNF), was discovered in the skin secretion of *Pelophylax nigromaculatus* and was subsequently structurally- and functionally-characterised. To study the structure–activity relationships of this ranacyclin in more depth and based on previous publications, two analogues, named ranacyclin-NF1 (RNF1) and ranacyclin-NF3L (RNF3L), were designed and subjected to bioactivity assessment in parallel with a synthetic replicate of the natural peptide.

#### **2. Materials and Methods**

#### *2.1. Acquisition of Skin Secretion*

Four East Asian frogs, *Pelophylax nigromaculatus*, were captured in China. All frogs were mature adults and their skin secretions were obtained from the dorsal skin by gentle transdermal electrical stimulation. The skin secretion was washed from the skin with de-ionised water (ddH2O), collected within one 50-mL tube, frozen in liquid nitrogen, lyophilised and stored at −20 °C prior to analysis [13]. The study was performed according to the guidelines of the UK Animal (Scientific Procedures) Act 1986, under project license PPL 2694 (M.Z.), issued by the Department of Health, Social Services and Public Safety, Northern Ireland. Procedures have been vetted by the Institutional Animal Care and Use Committee (IACUC) of Queen's University Belfast and approved on 1 March, 2011.

#### *2.2. "Shotgun" Cloning and Protein Bioinformatic Analyses*

Polyadenylated mRNA was isolated from the skin secretion of *Pelophylax nigromaculatus* using a Dynabeads mRNA Direct Kit (Thermo Fisher Scientific Inc., Waltham, MA, USA). First-strand cDNA was then obtained by reverse transcription and the cDNA library was constructed by a RACE technique using an Advantage™ 2 PCR Kit (BD Clontech, Oxford, UK). Then, the full-length sequence of the mRNA transcript encoding the RNF precursor was captured using a SMART-RACE kit (Clontech U.K.). The degenerate primer was 5′-CCCRAAKATGTTSACCTYRAAGAAA-3′, designed from a highly-conserved domain within the 5′-untranslated regions of closely-related *Rana* species. The RACE products were then cloned by use of a pGEM T-easy vector system (Promega, Southampton, UK) and sequenced by an ABI 3100 automated sequencer (Applied Biosystems, Foster City, CA, USA). The target cDNA and corresponding translated protein sequence were compared to homologous entries in the GenBank public database with BLASTn and BLASTp, respectively (National Centre for Biotechnology Information (NCBI), http://www.ncbi.nlm.nih.gov). The homologous protein sequences mainly came from different species of frog and these sequences were then aligned with the precursor of the novel peptide. According to the results of the structural bioinformatic analyses, the mature peptide sequence was predicted [22].
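To make the degenerate primer concrete, the following sketch expands it into the pool of unambiguous sequences it encodes, using the standard IUPAC nucleotide ambiguity codes. The function and variable names are illustrative assumptions; the 18-nucleotide start of the open-reading frame is copied from Figure 1.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


PRIMER = "CCCRAAKATGTTSACCTYRAAGAAA"   # degenerate primer quoted in Section 2.2
variants = expand_degenerate(PRIMER)

# First 18 nt of the RNF open-reading frame as shown in Figure 1.
ORF_START = "ATGTTCACCTTGAAGAAA"
matching = [v for v in variants if v.endswith(ORF_START)]

print(len(variants))   # size of the primer pool
print(matching)        # pool members whose 3' end matches the RNF start
```

The primer carries five two-fold degenerate positions (R, K, S, Y, R), so the pool contains 2<sup>5</sup> = 32 sequences, some of which match the start of the RNF coding sequence exactly at their 3′ end.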

#### *2.3. Solid-Phase Peptide Synthesis and Confirmation of Structure*

RNF and its analogues, RNF1 and RNF3L, were synthesised using standard Fmoc amino acids on a Tribute automated peptide synthesiser (Protein Technologies Inc., Tucson, AZ, USA). When the synthesis was completed, the peptides were cleaved from the resin and de-protected. Using a weak solution of hydrogen peroxide, the samples were oxidised to form the disulphide loop structure. After purification by RP-HPLC on an Adept CECIL4200 system (Amersham Biosciences Inc., Piscataway, NJ, USA) fitted with a Phenomenex Aeris Peptide C18 column, 250 mm × 10.0 mm (Phenomenex, Macclesfield, UK), the masses of peptide samples were confirmed by using matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) mass spectrometry (Perseptive Biosystems, Framingham, MA, USA). Alpha-cyano-4-hydroxycinnamic acid (CHCA) (Sigma Chemical Co., St. Louis, MO, USA) was used as matrix and was dissolved in TFA/H2O/acetonitrile (0.05/49.95/50, *v*/*v*/*v*).

#### *2.4. Circular Dichroism (CD) Analysis*

The secondary structures of the synthetic peptides were determined with a JASCO J-815 CD spectrometer (Jasco, Essex, UK). Peptide samples (50 µM) were dissolved in ddH2O (aqueous environment) or in 30 mM sodium dodecyl sulphate (SDS) buffer (membrane-mimetic environment) (Sigma, Dorset, UK). Measurements were made at 20 °C over the wavelength range 195–250 nm, with a 1 nm bandwidth, 0.5 nm data pitch and a scanning speed of 100 nm/min. The collected results were then analysed with the online analysis tool, K2D3 (http://cbdm-01.zdv.uni-mainz.de/~andrade/k2d3/) [23].

#### *2.5. Trypsin, Chymotrypsin and Tryptase Inhibition Assays*

Trypsin and chymotrypsin inhibition assays were performed as previously described [24]. The substrates for trypsin and chymotrypsin were Phe-Pro-Arg-AMC and Succinyl-Ala-Ala-Pro-Phe-NHMec, respectively, both obtained from Bachem, Merseyside, UK. In the tryptase inhibition assay, peptide samples were first dissolved in tryptase assay buffer, pH 7.6, containing 0.05 M Tris, 0.15 M NaCl and 0.2% (*w*/*v*) PEG 6000 (final volume 210 µL), to obtain concentrations of 0.5, 1, 2 and 4 mM and then added to the wells of a black 96-well plate containing the substrate Boc-Phe-Ser-Arg-NHMec (50 µM; Bachem, Merseyside, UK). Before measurement, tryptase (2.5 µL from 1 mg/mL stock solution, Calbiochem, Nottingham, UK) was added to each well. The rate of hydrolysis of substrate was monitored continuously at 37 °C and 460 nm with a Fluostar Optima plate reader (BMG Labtech GmbH, Offenburg, Germany).

#### *2.6. Minimum Inhibitory Concentration (MIC) and Minimal Bactericidal Concentration (MBC) Assays*

The MICs and MBCs of RNF and its analogues were evaluated using seven microorganisms as previously described [25]: the Gram-positive bacteria, *Staphylococcus aureus* (*S. aureus*) (NCTC 10788), *Enterococcus faecalis (E. faecalis)* (NCTC 12697) and Methicillin-resistant *Staphylococcus aureus (MRSA)* (NCTC 12493); the Gram-negative bacteria, *Escherichia coli (E. coli)* (NCTC 10418), *Pseudomonas aeruginosa* (*P. aeruginosa*) (ATCC 27853) and *Klebsiella pneumoniae (K. pneumoniae)* (ATCC 43816); and a single yeast, *Candida albicans (C. albicans)* (NCPF 1467).

#### *2.7. Checkerboard Assays*

Peptides at different concentrations (1, 2, 3 and 4 µM) were loaded along the rows of a 96-well plate, while the columns received different concentrations of Gentamicin (0.25, 0.5, 1 and 2 µM) (Sigma, UK). Then, the microorganism cultures (1 × 10<sup>6</sup> cfu/mL) were inoculated into the plate wells and incubated at 37 °C for 18 h. The optical densities of the cultures were measured using an ELISA plate reader (Biotech, Minneapolis, MN, USA). The lowest cumulative fractional inhibitory concentration (ΣFIC) was calculated with the following equation:

$$\Sigma\text{FIC} = \frac{\text{MIC of compound A in combination}}{\text{MIC of compound A alone}} + \frac{\text{MIC of compound B in combination}}{\text{MIC of compound B alone}} \tag{1}$$

ΣFIC ≤ 0.5 indicates synergy; 0.5 < ΣFIC ≤ 4 indicates an additive effect; and ΣFIC > 4 represents an antagonistic effect [26–28].
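The calculation in Equation (1) and the interpretation thresholds above can be sketched as follows. The function names and the example MIC values are hypothetical and serve only to show the arithmetic; they are not data from this study.

```python
def sigma_fic(mic_a_combo, mic_a_alone, mic_b_combo, mic_b_alone):
    """Cumulative fractional inhibitory concentration for two compounds (Equation (1))."""
    return mic_a_combo / mic_a_alone + mic_b_combo / mic_b_alone


def interpret(sfic):
    """Classify a ΣFIC value using the thresholds given in the text."""
    if sfic <= 0.5:
        return "synergy"
    if sfic <= 4:
        return "additive"
    return "antagonism"


# Hypothetical example: compound A has an MIC of 2 alone and 0.5 in combination,
# compound B has an MIC of 8 alone and 1 in combination (same units within each pair).
value = sigma_fic(0.5, 2.0, 1.0, 8.0)
print(value, interpret(value))
```

Because each term is a ratio of two MICs for the same compound, the units cancel, so the two compounds may be expressed in different concentration units as long as each pair is consistent.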

#### *2.8. Haemolysis and Cytotoxicity Assays*

The haemolysis assay was performed using horse erythrocytes (2% suspension) (TCS Biosciences Ltd., Buckingham, UK) as reported previously [29]. Serial concentrations of peptide were incubated with the blood cell suspension at 37 °C for 2 h. Cells treated with 1% Triton X-100 (Sigma, Dorset, UK) served as the positive control (100% haemolysis) and cells treated with phosphate-buffered saline (PBS) served as the negative control (0%). After this, the mixtures were centrifuged at 1000× *g* for 5 min and lysis of red cells was determined by measuring the optical density of the supernatants at 550 nm.
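The text does not spell out how the optical densities are converted to percentage haemolysis. A minimal sketch, assuming the usual linear normalisation between the PBS (0%) and Triton X-100 (100%) controls, is given below; the function name and the absorbance values are hypothetical.

```python
def percent_haemolysis(a_sample, a_negative, a_positive):
    """Scale a sample absorbance between the negative (0%) and positive (100%) controls."""
    span = a_positive - a_negative
    if span <= 0:
        raise ValueError("positive control must read higher than negative control")
    return 100.0 * (a_sample - a_negative) / span


# Hypothetical absorbance readings, for illustration only.
pbs_control = 0.05       # negative control (PBS)
triton_control = 1.05    # positive control (1% Triton X-100)
peptide_well = 0.10      # supernatant from a peptide-treated well

print(round(percent_haemolysis(peptide_well, pbs_control, triton_control), 1))
```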

The cytotoxic effects of RNF and its analogues on mammalian cells were examined using a human keratinocyte cell line (HaCaT) (ATCC-PCS-200-011) and a human microvascular endothelial cell line (HMEC-1) (ATCC-CRL-3243). Cells (5 × 10<sup>3</sup> cells/mL) were seeded into 96-well plates with FBS-containing fresh medium for 24 h. After that, cells were treated with corresponding serum-free culture medium for 12 h. Cells were then treated with RNF in a concentration range from 10<sup>−9</sup> to 10<sup>−4</sup> M. After 24 h of treatment, MTT reagent (5 mg/mL) was added to each sample and cells were incubated for 4–6 h at 37 °C. After exposure for 6 h, all liquid was removed and 100 µL of DMSO was added. Finally, the absorbance of samples at 570 nm was determined using an ELISA plate reader (Biotech, Minneapolis, MN, USA).

#### *2.9. Statistical Analysis*

Data were assessed for statistical significance using One-Way ANOVA with Prism (Version 6.0; GraphPad Software Inc., San Diego, CA, USA). Results are reported as the mean ± SEM with significance (\* *p* < 0.05, \*\* *p* < 0.01, \*\*\* *p* < 0.001 and \*\*\*\* *p* < 0.0001).

#### **3. Results**

#### *3.1. Molecular Cloning of the RNF Precursor-Encoding cDNA and Sequence Analysis*

Through use of "shotgun" cloning, the RNF precursor cDNA was cloned successfully from the *Pelophylax nigromaculatus* skin secretion cDNA library. As shown in Figure 1, the deduced precursor of RNF consisted of 63 amino acid residues, which included a 22-residue signal peptide, a 21-residue acidic spacer and a 17-residue mature peptide. The deduced mature peptide was preceded by a typical processing site (-K-R-) and was followed by a glycine residue, which is the classical amino donor for C-terminal amidation (Figure 1). The alignments of the nucleotide and precursor amino acid sequences showed a high degree of similarity between RNF, ranacyclin-HB1 and ranacyclin-HB2 (Figure 2). The nucleotide sequence of RNF has been deposited in the GenBank database (accession number MT584032).

| Position | Deduced amino acids | Sense strand (5′→3′) | Complementary strand (3′→5′) |
|---|---|---|---|
| 1 | M F T L K K S I L L L F F L G T I | ATGTTCACCT TGAAGAAATC CATTCTACTC CTTTTCTTTC TTGGGACCAT | TACAAGTGGA ACTTCTTTAG GTAAGATGAG GAAAAGAAAG AACCCTGGTA |
| 51 | S L S L C E Q E R D A D E D D G G | CTCCTTATCT CTCTGTGAGC AAGAGAGAGA TGCCGATGAA GACGATGGAG | GAGGAATAGA GAGACACTCG TTCTCTCTCT ACGGCTACTT CTGCTACCTC |
| 101 | E V T G E E V K R G A P R G C W | GGGAAGTTAC AGGGGAAGAA GTAAAAAGAG GTGCGCCCAG GGGTTGCTGG | CCCTTCAATG TCCCCTTCTT CATTTTTCTC CACGCGGGTC CCCAACGACC |
| 151 | T K S Y P P Q P C F G K R \* | ACCAAGAGTT ATCCACCACA GCCTTGTTTT GGAAAAAGAT AAAATATGAT | TGGTTCTCAA TAGGTGGTGT CGGAACAAAA CCTTTTTCTA TTTTATACTA |
| 201 | | TAGAAAGTAT CGGAATATTC CATTTACCGT GTAAATGCTA AATGTCTAAT | ATCTTTCATA GCCTTATAAG GTAAATGGCA CATTTACGAT TTACAGATTA |
| 251 | | AAAAAAAAAT AAGCATAAAA AAAAAAAAAA AAAAAAA | TTTTTTTTTA TTCGTATTTT TTTTTTTTTT TTTTTTT |

**Figure 1.** Nucleotide and deduced amino acid sequences of cDNA encoding RNF from *Pelophylax nigromaculatus*. There are 63 amino acid residues within the open-reading frame. The putative signal peptide sequence is double-underlined. The following acidic spacer peptide ends with a typical -K-R- cleavage site. The deduced mature peptide is shown with a single underline and the stop codon is marked with an asterisk.
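The precursor organisation described in Section 3.1 can be checked with a short sketch that translates the open-reading frame from Figure 1 using the standard genetic code and slices it into the reported domains. The domain boundaries (22, 21 and 17 residues) are taken from the text; the function and variable names are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Sense strand of the RNF precursor cDNA from Figure 1, start codon to stop codon.
ORF = (
    "ATGTTCACCTTGAAGAAATCCATTCTACTCCTTTTCTTTCTTGGGACCAT"
    "CTCCTTATCTCTCTGTGAGCAAGAGAGAGATGCCGATGAAGACGATGGAG"
    "GGGAAGTTACAGGGGAAGAAGTAAAAAGAGGTGCGCCCAGGGGTTGCTGG"
    "ACCAAGAGTTATCCACCACAGCCTTGTTTTGGAAAAAGATAA"
)


def translate(orf):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        amino_acid = CODON_TABLE[orf[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


precursor = translate(ORF)

# Domain lengths as reported in Section 3.1.
signal = precursor[:22]      # putative signal peptide
spacer = precursor[22:43]    # acidic spacer ending in the -K-R- processing site
mature = precursor[43:60]    # mature peptide
tail = precursor[60:]        # Gly amide donor followed by residues removed in processing

print(len(precursor))
print(signal, spacer, mature, tail, sep="\n")
```

The slices add up to the full precursor: 22 + 21 + 17 residues plus the three C-terminal residues (G-K-R) give the 63 residues stated in the text.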


**Figure 2.** (**a**) Alignment of precursor cDNA nucleotides of RNF, ranacyclin-HB1 and ranacyclin-HB2; and (**b**) alignment of precursor cDNA translated peptides of RNF, ranacyclin-HB1 and ranacyclin-HB2. Asterisks indicate identical nucleotides/amino acids (in (**a**,**b**)) and colons (:) indicate chemically-conserved amino acids (in (**b**)).

#### *3.2. Secondary Structure Determinations*

After the sequence was unequivocally established, RNF and its two analogues were synthesised by standard solid-phase Fmoc chemistry. Subsequent analyses of MALDI-TOF MS spectra of the respective synthesis mixtures showed the molecular masses of the major components to be consistent with theoretical masses, indicating that all three peptides were successfully synthesised (Figure 3a–c). The predicted secondary structure of RNF was acquired by use of I-TASSER and visualised with Pymol, and this indicated that this peptide may adopt a random coil structure (Figure 4a) [23,30]. The secondary structures of RNF and its analogues were then determined by CD spectroscopy in aqueous (ddH2O) and membrane-mimetic (SDS buffer) environments (Figure 4b). The proportions of the different secondary structure domains were calculated using the K2D3 webserver (Figure 4c). It appeared that RNF and both of its analogues adopted similar secondary structures in both aqueous and membrane-mimetic environments, which was generally consistent with the predicted model (Figure 4a).

**Figure 3.** MALDI-TOF MS spectra of the three purified synthetic peptides: (**a**) RNF (1892.69 [M + H]+); (**b**) RNF1 (1893.8 [M + H]+); and (**c**) RNF3L (1906.66 [M + H]+).

**Figure 4.** (**a**) Predicted secondary structure of RNF; (**b**) secondary structures of RNF (red lines), RNF1 (blue lines) and RNF3L (green lines) measured in ddH2O and 30 mM SDS buffers; and (**c**) proportion of different secondary structure domains predicted and calculated by K2D3.

#### *3.3. Protease Inhibitory Activity Assays*

RNF and its analogues were subjected to protease inhibitory assays against trypsin, chymotrypsin and tryptase. RNF exhibited obvious trypsin inhibitory activity with a Ki value of 0.447 µM. The trypsin inhibitory activity of RNF1 was lower, with a Ki of 1.3 µM. RNF3L showed the strongest trypsin inhibitory activity among the three, with a Ki of 0.201 µM. They showed similar tryptase inhibitory activity but no chymotrypsin inhibitory activity (Table 1). Although the peptides in this work share high sequence identity with other ranacyclin-type peptides, differing at merely two or three residues, their trypsin inhibitory activities were surprisingly different.
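The relative potencies implied by these Ki values can be made explicit with a short calculation. The Ki values are those reported above, converted to nM; the dictionary and variable names are illustrative.

```python
# Trypsin Ki values reported in Section 3.3, in nM.
KI_NM = {"RNF": 447.0, "RNF1": 1300.0, "RNF3L": 201.0}

reference = KI_NM["RNF"]

# A ratio above 1 means weaker inhibition than RNF (higher Ki); below 1 means stronger.
ki_ratio = {name: ki / reference for name, ki in KI_NM.items()}

for name, ratio in ki_ratio.items():
    print(f"{name}: Ki = {KI_NM[name]:.0f} nM, Ki ratio to RNF = {ratio:.2f}")
```

On these figures, removing the C-terminal amide (RNF1) raises Ki by roughly threefold, whereas the Pro-to-Leu substitution (RNF3L) lowers it by roughly twofold.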

**Table 1.** Inhibitory activity of RNF, RNF1, RNF3L and some ranacyclins against trypsin, chymotrypsin and tryptase.


N.I., there was no inhibitory activity; -, there was no mention in previous studies.

#### *3.4. Antimicrobial Activity*

The antimicrobial activity of RNF and its analogues against Gram-positive bacteria, Gram-negative bacteria and fungi was determined by using a doubling-dilution method. RNF showed a weak bacteriostatic activity toward *S. aureus* at 512 µM, but the other peptides did not exhibit obvious antimicrobial activity against any tested microorganisms even at the highest concentration employed, 512 µM (Table 2).
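How an MIC is read from a doubling-dilution series can be sketched as follows. This is a minimal illustration, assuming that growth is scored by an optical-density cut-off; the cut-off, the number of dilution steps and the readings are hypothetical and are not taken from this study.

```python
def doubling_dilutions(top_concentration, steps):
    """Two-fold dilution series starting from the highest concentration."""
    return [top_concentration / 2 ** i for i in range(steps)]


def read_mic(concentrations, optical_densities, growth_cutoff):
    """Lowest concentration with no growth, scanning down from the highest.

    Returns None when even the highest concentration allows growth,
    which would be reported as MIC > highest concentration tested.
    """
    mic = None
    for conc, od in sorted(zip(concentrations, optical_densities), reverse=True):
        if od >= growth_cutoff:
            break
        mic = conc
    return mic


series = doubling_dilutions(512.0, 10)          # 512, 256, ..., 1 (µM)

# Hypothetical optical densities, for illustration only.
inhibited = [0.05, 0.05, 0.06, 0.40, 0.45, 0.50, 0.50, 0.52, 0.51, 0.50]
not_inhibited = [0.48] * 10

print(read_mic(series, inhibited, growth_cutoff=0.1))      # growth first appears at 64
print(read_mic(series, not_inhibited, growth_cutoff=0.1))  # no inhibition up to 512
```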

**Table 2.** Antimicrobial activity of RNF and its analogues.


#### *3.5. Additive Antimicrobial Effects with Gentamicin*

Combinations of an antibiotic (Gentamicin) with either RNF or one of its analogues improved activity against MRSA (Figure 5). The MIC of Gentamicin alone against MRSA was 2 µM. In the presence of RNF or its analogues, the MIC of Gentamicin against MRSA was reduced to 1 µM. Since RNF and its analogues showed no obvious inhibitory activity against MRSA at 512 µM, the lowest cumulative fractional inhibitory concentration (ΣFIC) would be between 0.5 and 1, indicative of an additive effect between peptide and Gentamicin. The best combination of peptide with Gentamicin was 1 µM with 1 µM, respectively.
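As a worked instance of Equation (1) with the values reported above (Gentamicin MIC of 2 µM alone and 1 µM in combination; peptide at 1 µM in combination), and assuming only that the MIC of the peptide alone exceeds the highest concentration tested, 512 µM, the peptide term is bounded from above:

$$\Sigma\text{FIC} = \frac{1}{2} + \frac{1}{\text{MIC}_{\text{peptide alone}}} < \frac{1}{2} + \frac{1}{512} \approx 0.502$$

The value therefore lies just above 0.5, which places it in the additive range rather than meeting the synergy criterion of ΣFIC ≤ 0.5.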

#### *3.6. Haemolysis and Cytotoxicity*

Released haemoglobin served as a signal of the haemolytic activity of the peptides. The absorbance of red blood cell lysis was evaluated at a wavelength of 570 nm. RNF and its analogues were found to have a weak haemolytic activity and induced less than 5% haemolysis at a concentration of 512 µM (Figure 5d). Moreover, the peptides showed negligible cytotoxicity toward normal human cell lines even at the highest concentration tested, 100 µM (Figure 6).

**Figure 5.** The additive effect against the growth of MRSA between Gentamicin (Gen) and: (**a**) RNF; (**b**) RNF1; and (**c**) RNF3L. (**d**) Haemolysis of RNF, RNF1 and RNF3L at 128, 256 and 512 µM. PBS and 1% Triton X-100 were set as negative and positive controls, respectively.

**Figure 6.** The cytotoxicity of RNF, RNF1 and RNF3L on: (**a**) HMEC-1 cells; and (**b**) HaCaT cells.

#### **4. Discussion**

Extensive studies have been made to reveal the role played by each residue within the TIL in overall enzyme inhibitory activity. However, factors outside this conserved disulphide loop, which may affect trypsin inhibition activity, have rarely been studied. In 2012, Wang et al. discovered the peptide, HJTI, and designed one molecular variant to study the role that cationicity played in trypsin inhibitory potency. They believed that structural features, especially for amino acid residues lying outside the TIL, were crucial in modulating the effectiveness of trypsin inhibitory activity [31]. In addition, according to a previous report, ranacyclin-T characterised from *Rana esculenta* showed a stronger trypsin inhibitory activity than RNF, with a Ki of 116 nM [10]. To construct rational explanations for the different trypsin inhibitory activities between ranacyclin-T and RNF and to further study structure and trypsin inhibitory activity relationships, we compared some structural parameters of these two peptides (Table 3). Compared with RNF, ranacyclin-T contained a larger net positive charge and a different residue (Lys) at the P5′ position, a position thought to play an important role in the trypsin inhibitory activity of naturally-occurring BBIs [4]. Analysis of predicted secondary structures also revealed that ranacyclin-T may adopt a beta-sheet structure which was different for RNF3L (Figure 7). Taken together, the different trypsin inhibitory activities of RNF and ranacyclin-T could be generally explained by three points: (1) different secondary structures; (2) different cationicities; and (3) different residues at the P5′ position. In a previous study, amino acid scanning at the P5′ position within the BBI inhibitory loop revealed that the residue Gln (Q) was the optimal residue at the P5′ position, with a relatively higher inhibitory activity in both bovine beta-trypsin and human cationic trypsin inhibitory studies [4].
Therefore, residues at the P5′ position might not be the main reason for induction of different activities. On consideration of the effect of cationicity, which may play a minor role in trypsin inhibitory potency [31], it was assumed that secondary structure may contribute to different trypsin inhibitory activities. The different secondary structures of these two peptides were possibly induced by the different residues at the third position since Pro usually prevents the backbone from adopting alpha-helix or beta-sheet structures [33]. Therefore, it was decided here to focus on the possible role of features outside the TIL in trypsin inhibitory activity. To this end, the structure of ranacyclin-T was used as a template to generate RNF3L, which was designed by replacing the Pro at the third position with Leu, aiming to study potential structure–activity relationships. Moreover, for some known AMPs, C-terminal amidation is considered to play a significant role in structural stability and antimicrobial activity. In addition, according to Li et al.'s work in 2007, the C-terminal naked phenylalanine of ORB may help promote antimicrobial efficacy due to its strong hydrophobicity, which may facilitate the aggregation of peptide to form a channel-like structure on cell membranes [21]. Therefore, the free C-terminal analogue, RNF1, was synthesised to study the role of C-terminal amidation in trypsin inhibitory capacity and antimicrobial activity.


**Table 3.** Some parameters of RNF and some reported ranacyclins and their trypsin inhibition Ki values. Amino acid residues at P5′ are shaded.

In the trypsin inhibition assay, RNF showed clear inhibitory activity with a Ki of 447 nM. Compared with RNF, RNF1 exhibited lower inhibitory potency, with a Ki of 1.3 µM. Some researchers believe that C-terminal amidation plays a fundamental role in peptide stability since terminal amidation could generate a mimic of native proteins [1]. Therefore, the lack of C-terminal amidation might make RNF1 more flexible in structure, which might be unfavourable for the peptide to combine with reactive sites on trypsin. RNF3L showed the strongest trypsin inhibitory activity in the assay, with a Ki value of 201 nM. The increased trypsin inhibitory activity of RNF3L may be due to two reasons: (1) the residue Leu might be more suitable than Pro at this position; and (2) the formation of a beta-sheet might improve inhibitory activity. According to the results of CD spectroscopy, however, RNF and its analogues adopted similar secondary structures in both aqueous and membrane-mimetic environments. Therefore, it is unlikely that secondary structure contributes to the improved trypsin inhibitory activity shown by RNF3L. Clarifying exactly how RNF3L achieves stronger trypsin inhibitory activity will require additional experiments in the future. In addition, more peptides need to be identified or designed to clarify the exact roles of residues outside the TIL in trypsin inhibitory activity. Taken together, both the effect of C-terminal de-amidation, which reduced trypsin inhibitory activity, and the effect of substituting a specific residue in the peptide provide several lines of evidence for the hypothesis that structural features outside the disulphide loop may play a more fundamental role in dictating inhibitory potency. In the chymotrypsin inhibitory assay, none of the tested peptides showed inhibitory activity. In fact, according to the specific residues at cleavage site P1, serine proteases can be categorised into trypsin-like, chymotrypsin-like and elastase-like enzymes. For trypsin-like proteases, the S1 pocket is narrow and anionic, whereas the S1 pocket of chymotrypsin-like proteases is broad and hydrophobic [34]. Therefore, chymotrypsin inhibitors or activators generally contain specific residues, such as Trp, Phe or Leu. In this report, the residue at the P1 position of RNF and its analogues was Lys, which explains why they could not inhibit chymotrypsin.
Modification work in this study revealed the importance of residues and structural features outside the TIL for the trypsin inhibitory activity of ranacyclin peptides. A good understanding of the relationship between structure and trypsin inhibitory activity would favour the design of improved peptides, which are promising candidates for the treatment of health problems such as metabolic syndrome [35].
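The Ki values quoted in this discussion can be put on a common scale by expressing each one as a fold-difference relative to RNF. The short Python sketch below is only an illustration of that arithmetic: the function name `fold_potency_vs` and the dictionary layout are assumptions made here, and the numbers are simply the Ki values stated in the text (1.3 µM converted to 1300 nM).

```python
# Ki values (nM) as quoted in the text; a lower Ki means tighter inhibition.
KI_NM = {
    "RNF": 447.0,
    "RNF1": 1300.0,        # 1.3 µM converted to nM
    "RNF3L": 201.0,
    "ranacyclin-T": 116.0,
}


def fold_potency_vs(reference: str, ki_nm: dict) -> dict:
    """Return Ki(reference) / Ki(peptide) for each peptide.

    Values above 1 mean the peptide is a more potent inhibitor than the
    reference; values below 1 mean it is less potent.
    """
    ref = ki_nm[reference]
    return {name: ref / ki for name, ki in ki_nm.items()}


folds = fold_potency_vs("RNF", KI_NM)
# Print from most to least potent relative to RNF.
for name, fold in sorted(folds.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>13}: {fold:.2f}x relative to RNF")
```

On these figures, RNF3L and ranacyclin-T come out roughly 2.2-fold and 3.9-fold more potent than RNF, whereas RNF1 retains about one third of the potency of RNF. The ratios are descriptive only and say nothing about the statistical uncertainty of the underlying measurements.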



**Figure 7.** Predicted secondary structures of ranacyclin-T (**a**) and RNF (**b**), obtained from I-TASSER (https://zhanglab.ccmb.med.umich.edu/I-TASSER/) [30].

Tryptases have been found to play important roles in some diseases, such as atherosclerosis, mastocytosis and acute myeloid leukemia. Tryptases in mast cells have also been shown to be closely related to the symptoms of allergic asthma and inflammation [10]. Therefore, a novel tryptase inhibitor would be an asset in the treatment of some clinical diseases. In this study, RNF was found to have clear tryptase inhibitory activity, which may help to address some clinical difficulties. Some publications suggest that protease inhibitors cannot work effectively against human proteases [36]. However, RNF and its designed analogues, as well as some other reported BBI peptides such as OSTI and PPF-BBI, show clear tryptase inhibitory activity, which differs from previous reports [24,32]. It could be that larger protease inhibitors are too large to approach the reactive sites on tryptase; if this is the case, then smaller protease inhibitors, such as peptides of the BBI family, may have a great advantage in accessing these active sites.

Ranacyclins were defined as a novel antimicrobial peptide family with potent antibacterial and antifungal activities; unlike most cationic AMPs, ranacyclins tend to adopt a predominantly random coil structure and can bind and insert into the membrane primarily through interaction with the hydrophobic core of the bacterial membrane rather than through electrostatic interaction [13]. Therefore, ranacyclins first approach the membrane surface through hydrophobic interaction and then, at sites where peptide concentrations reach a threshold, insert into the hydrophobic core of the membrane bilayer and form a channel-like structure that causes the cells to leak [13,37]. However, RNF in this work did not exhibit the obvious bacteriostatic or bactericidal effects of other reported ranacyclins. Apart from the weak bacteriostatic activity of RNF against *S. aureus*, RNF and its variants showed no obvious antimicrobial activity. These "abnormal" results, nevertheless, were not sufficient to prove either that RNF is a "fake" antimicrobial peptide or that not all ranacyclins exhibit antimicrobial properties. The bacteria used in this report were different from those in previous papers; the lack of antimicrobial efficacy of RNF in this study does not mean that it cannot kill other, untested strains. The current studies are inadequate to comprehensively evaluate the potential antimicrobial efficacy of RNF or ranacyclins, and more bacterial strains need to be employed. In this study, RNF adopted a mainly random coil structure and had a low cationicity. Ranacyclin-T, reported previously, adopted a beta-sheet structure, displayed a higher net positive charge and had relatively stronger antimicrobial activity than RNF, which suggests that differences in structural proportions and cationicity may have contributed to the negligible antimicrobial activity of RNF [13].
For RNF1, although C-terminal de-amidation exposed a naked phenylalanine residue, it also came with a reduction in net charge, which may explain why RNF1 could not inhibit the growth of the test microorganisms in this study. Although RNF, with its weak bacteriostatic activity, would be hard to apply in the pharmaceutical field, its negligible cytotoxicity and obvious trypsin inhibitory activity may allow it to be applied in the food industry, where the requirements for microbiota control are not as strict as in the clinic, helping to control pressing obesity problems [35,38]. Moreover, taking advantage of its multiple functions, RNF could serve as a template for the design of improved derivatives [21,38].
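The link between C-terminal de-amidation and reduced net charge can be illustrated with a simple residue-counting estimate. The sketch below is a deliberate simplification and not the method used in this study: it assumes Lys and Arg contribute +1, Asp and Glu contribute -1, the N-terminus contributes +1, and a free C-terminal carboxylate contributes -1 near neutral pH, while His and all pKa effects are ignored. The function name `approx_net_charge` and the demonstration sequence are hypothetical; the sequence is not that of RNF.

```python
POSITIVE = {"K", "R"}  # side chains treated as +1 near neutral pH
NEGATIVE = {"D", "E"}  # side chains treated as -1 near neutral pH


def approx_net_charge(sequence: str, c_terminal_amide: bool) -> int:
    """Integer net-charge estimate near pH 7 (His and pKa shifts ignored)."""
    side_chains = sum(
        (aa in POSITIVE) - (aa in NEGATIVE) for aa in sequence.upper()
    )
    n_terminus = 1  # free alpha-amino group, assumed protonated
    # An amidated C-terminus is neutral; a free carboxylate carries -1.
    c_terminus = 0 if c_terminal_amide else -1
    return side_chains + n_terminus + c_terminus


# Hypothetical sequence chosen only for illustration; it is NOT the RNF sequence.
demo = "AKCGKTF"
amidated = approx_net_charge(demo, c_terminal_amide=True)
free_acid = approx_net_charge(demo, c_terminal_amide=False)
print(f"amidated: {amidated:+d}, free C-terminus: {free_acid:+d}")
```

Under these assumptions, removing the amide always lowers the estimated net charge by exactly one unit, whatever the sequence, which is consistent with the reduction in net charge described for RNF1.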

Gentamicin is a common antibiotic that is effective against a wide range of microorganisms, including Gram-positive and Gram-negative bacteria. Owing to its excellent antimicrobial efficacy and broad-spectrum action, Gentamicin is widely employed in clinical treatment [39]. However, as a typical aminoglycoside antibiotic, Gentamicin is considered an "obligatory nephrotoxin", and even small doses can cause nephrotoxicity in humans and animals [40]. Therefore, the nephrotoxicity of aminoglycosides has been considered an unavoidable obstacle for patients. On the one hand, in a previous study, BBIs were found to mitigate Gentamicin-induced nephrotoxicity without decreasing the therapeutic effects of Gentamicin [41]. On the other hand, some peptides in the BBI family were reported to show synergistic antimicrobial effects with some clinical antibiotics, such as rifampicin and erythromycin [13]. Therefore, in this study, Gentamicin was used to investigate whether RNF could help improve its therapeutic effect. The antimicrobial activity of Gentamicin was improved in the presence of RNF, with a lower MIC against MRSA. Moreover, RNF and its analogues showed no obvious haemolysis (inducing no more than 5% haemolysis at 512 µM) and negligible cytotoxicity toward the normal human cell lines HMEC-1 and HaCaT. Considering the possible reduction of nephrotoxicity, the low toxicity per se and the improved antimicrobial efficacy, the application of RNF is promising and deserves further critical study.

#### **5. Conclusions**

A novel BBI peptide, RNF, was characterised from the skin secretion of *Pelophylax nigromaculatus*. The different trypsin inhibitory activities of RNF1 and RNF3L suggest that residues or structures outside the trypsin inhibitory loop per se contribute fundamentally to trypsin inhibitory activity. To confirm this hypothesis, more experiments remain to be designed and completed. Although RNF and its analogues showed no obvious bactericidal activity in antimicrobial assays, they could help to improve the therapeutic efficacy of Gentamicin or other conventional antibiotics at low concentrations. Moreover, these three peptides showed negligible haemolysis and cytotoxicity toward normal cells. Therefore, the further analysis and possible development of RNF may be of value in addressing some pressing health problems.

**Author Contributions:** Conceptualisation, L.W., M.Z. and T.C.; Data curation, Y.J. and T.W.; Formal analysis, T.W. and X.C.; Investigation, T.W., Y.J. and Y.Z.; Methodology, X.X. and C.M.; Project administration, M.Z.; Resources, M.Z. and Y.Z.; Supervision, X.C., L.W., M.Z. and T.C.; and Validation, X.C. and L.W. Writing—original draft, T.W., X.C. and L.W.; Writing—review & editing, Y.Z., C.S. and M.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Biology* Editorial Office E-mail: biology@mdpi.com www.mdpi.com/journal/biology


www.mdpi.com ISBN 978-3-0365-2574-7