*Article* **The Multivalent Polyampholyte Domain of Nst1, a P-Body-Associated** *Saccharomyces cerevisiae* **Protein, Provides a Platform for Interacting with P-Body Components**

**Yoon-Jeong Choi <sup>1</sup> , Yujin Lee <sup>1</sup> , Yuxi Lin <sup>2</sup> , Yunseok Heo <sup>2</sup> , Young-Ho Lee <sup>2,3,4</sup> and Kiwon Song <sup>1,</sup>\***


**Abstract:** The condensation of nuclear promyelocytic leukemia bodies, cytoplasmic P-granules, P-bodies (PBs), and stress granules is reversible and dynamic via liquid–liquid phase separation. Although each condensate comprises hundreds of proteins with promiscuous interactions, a few key scaffold proteins are required. Essential scaffold domain sequence elements, such as poly-Q, low-complexity regions, oligomerizing domains, and RNA-binding domains, have been evaluated to understand their roles in biomolecular condensation processes. However, the underlying mechanisms remain unclear. We analyzed Nst1, a PB-associated protein that can intrinsically induce PB component condensations when overexpressed. Various Nst1 domain deletion mutants with unique sequence distributions, including intrinsically disordered regions (IDRs) and aggregation-prone regions, were constructed based on structural predictions. The overexpression of Nst1 deletion mutants lacking the aggregation-prone domain (APD) significantly inhibited self-condensation, implicating APD as an oligomerizing domain promoting self-condensation. Remarkably, cells overexpressing the Nst1 deletion mutant of the polyampholyte domain (PD) in the IDR region (Nst1∆PD) rarely accumulate endogenous enhanced green fluorescent protein (EGFP)-tagged Dcp2. However, Nst1∆PD formed self-condensates, suggesting that Nst1 requires PD to interact with Dcp2, regardless of its self-condensation. In Nst1∆PD-overexpressing cells treated with cycloheximide (CHX), Dcp2, Xrn1, Dhh1, and Edc3 had significantly diminished condensation compared to those in CHX-treated Nst1 overexpressing cells. These observations suggest that the PD of the IDR in Nst1 functions as a hub domain interacting with other PB components.

**Keywords:** P-body; liquid–liquid phase separation; Nst1; polyampholyte domain; aggregation-prone domain; *Saccharomyces cerevisiae*

## **1. Introduction**

The phenomenon of biomolecular phase separation has expanded our understanding of biomolecular condensation in cells [1]. Biomolecular condensates include many nonmembranous cellular structures, such as Cajal bodies, nuclear speckles, histone-locus bodies, promyelocytic leukemia (PML) nuclear bodies (NB) in the nucleus [2–5], P-bodies (PBs), stress granules (SGs), and germ granules in the cytoplasm [1,2,4,6–9]. These membrane-less cellular structures are not random biomolecule mixtures. Some components are shared in different condensates, but each membraneless organelle contains a specific group of proteins and RNA/DNA, differentiating it from others. Not all of the condensate components are critical for inducing condensation, but a few components, so-called scaffolds, play crucial roles [10].

**Citation:** Choi, Y.-J.; Lee, Y.; Lin, Y.; Heo, Y.; Lee, Y.-H.; Song, K. The Multivalent Polyampholyte Domain of Nst1, a P-Body-Associated *Saccharomyces cerevisiae* Protein, Provides a Platform for Interacting with P-Body Components. *Int. J. Mol. Sci.* **2022**, *23*, 7380. https://doi.org/10.3390/ijms23137380

Academic Editor: Vladimir N. Uversky

Received: 23 May 2022 Accepted: 29 June 2022 Published: 2 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Ribonucleoprotein (RNP) granules are among the most representative biomolecular condensates and are an efficient model for studying biomolecular condensation in cells. RNA, the general component of these condensates, is essential for these condensation processes [11]. RNase treatment disperses isolated messenger RNPs (mRNPs) in vitro [11]. Additionally, decreased free ribosomal mRNA influx alleviates mRNP granule condensation in cells [12]. These phenomena strongly support the notion that RNA is a critical factor for molecular condensation.

mRNP granules also contain hundreds of proteins [10,13–17]. Scaffold proteins that function as nodes for protein–protein interaction networks are typically sufficient to form condensates [10]. Scaffold proteins have the intrinsic potential to induce condensation, while client elements are concentrated within the structure often by direct interactions with scaffolds but are not required for condensate formation [18]. The scaffold proteins show a few distinctive characteristics that distinguish them from client proteins. First, numerous scaffold proteins exhibit self-oligomerizing properties. The RING finger-B box-coiled coil (RBCC) motif [19–21] contains an N-terminal RING, B1-box, B2-box, and a C-terminal coiled-coil (CC) domain and is considered essential for PML oligomerization to form PML NBs [22–24]. In the case of Ras-GTPase-activating protein (SH3 domain)-binding protein (G3BP), the dimerization domain nuclear transport factor 2 (NTF2) is insufficient but necessary for SG formation [25].

Scaffold proteins also have multivalent regions that participate in weak interactions with numerous binding partners [26,27]. Intrinsically disordered regions (IDRs) with few three-dimensional (3D) structures and little specificity [28–33] are reportedly necessary for liquid–liquid phase separation (LLPS) dynamics and multivalency. IDRs of the heterogeneous RNP family, including hnRNPA1 in SGs [31], hnRNPA2B1 [33], RNA helicase Ddx4 in nuage [28], and Laf-1 in P-granules [34,35], are sufficient to mediate phase separation. Low-complexity domains (LCDs) [33,36] such as the poly-Q/N prion-like domain (PrD) and the arginine-glycine-rich (RGG) motif [4,34,37] are also known as critical modifiers for generating LLPS. The polyampholyte or polyelectrolyte region of the IDR may function as a sticker to promote LLPS [34]. Although previous research has established the link between IDR and multivalency, it has not elucidated the syntax of molecular condensation.

PBs of the budding yeast *Saccharomyces cerevisiae* provide an excellent system for studying the elements and mechanisms to form cellular condensates. The predominant components of yeast PBs are mRNA decapping protein Dcp1 and Dcp2, which constitute the decapping enzyme, enhancer of mRNA decapping protein 3 (Edc3), Pat1, Dhh1, and the Lsm1-7 complex, all of which are mRNA-binding proteins that stimulate mRNA decapping [37–39]. Predominantly, multivalent Edc3 interactions appear to drive PB formation. Edc3 serves as a scaffold for PB assembly, primarily under glucose deprivation when PB formation is robust. In a previous study, we identified Nst1 as a novel PB component. Nst1 accumulates in PBs more densely in stationary phase cells and under glucose deprivation. Ectopically overexpressed Nst1 is self-condensed and induces the condensation of other PB components, such as Dcp2, indicating that Nst1 has the intrinsic potential to self-condensate and accumulate other PB components [40].

Here, we dissected Nst1 by overexpressing various Nst1 domain deletion mutants to understand the functions of distinctive Nst1 sequence elements in its self-condensation and recruitment of other PB components and improve our knowledge of molecular condensation in cells.

## **2. Results**

#### *2.1. The Nst1 C-Terminal Domain (CTD) Contains Polyampholyte and Aggregation-Prone Regions*

We previously reported that Nst1, similarly to Edc3, induced Dcp2 accumulation via self-condensation and physical interactions with other PB components [40]. These observations strongly suggest that Nst1 contains an oligomerizing domain similar to Edc3, with the intrinsic potential to drive self-condensation. Nst1 is a 141 kDa protein consisting of 1240 amino acids with a unique sequence distribution (Supplemental Figure S1A). To determine the properties of Nst1 in self-generating condensates and the induced condensation of other PB components, we analyzed the Nst1 sequence using multifaceted sequence prediction tools: protein structure prediction using GalaxyWEB (http://galaxy.seoklab.org/, accessed on 12 October 2018) (Supplemental Figure S1A–C), IDR prediction with IUPRED2A, PONDR, and DISOPRED3, and aggregation-prone region prediction with AGGRESCAN, Tango, and PASTA 2.0 (Figure 1A,B).


**Figure 1.** … (blue) [42], and DISOPRED3 (green) [43] algorithms were used for the prediction. Disorder scores were calculated and presented. Scores exceeding the 0.5 threshold indicate the amino acid residues in the Nst1 disordered regions. The disordered regions with scores >0.5 in all three algorithms used are identified as IDRs and highlighted in red-lined boxes. A length threshold for the disordered regions is also set to >30 residues [44]. The PD in predicted disordered regions is marked with thick red-lined boxes. PD corresponding residues are labeled in Supplemental Figure S1. (**B**) Predicting the Nst1 aggregation-prone regions. The regions with high aggregation propensities were calculated using AGGRESCAN (purple) [45], Tango (orange) [46], and PASTA 2.0 (blue) [47] algorithms. Amino acid residues with scores greater than the threshold value in all three algorithms were aggregation-prone and marked with a red box in the Nst1 sequence. Corresponding residues are labeled in Supplemental Figure S1. (**C**) A diagram of Nst1 with domain architectures predicted by (**A**,**B**), and GalaxyWEB. Each color in the schematic corresponds to a particular domain in the sequence. (**D**) Das–Pappu diagrams of the full-length Nst1 and its domain deletion mutants. The full-length Nst1 and its various domain deletion mutants are numbered in the box: 1. Nst1 (residues 1–1240), 2. Nst1NTD (N-terminal domain (NTD) Nst1 residues 1–429), 3. Nst1CTD (C-terminal domain (CTD) Nst1 residues 430–1240), 4. Nst1∆PD (residues 1–630 and 753–1240), 5. Nst1∆APD (residues 1–1015), 6. Nst1∆PD∆APD (residues 1–630 and 753–1015), 7. Nst1PD (residues 631–752), and 8. Nst1APD (residues 1016–1240). The *x*- and *y*-axes represent the fraction of positively and negatively charged residues, respectively. The four zones (R1–R4) of the diagram are colored in bright green (R1), emerald (R2), forest (R3), and red/blue (R4), respectively. The physicochemical properties of each colored zone are explained in the inset. The numbers for the full-length Nst1 and its domain deletion mutants are assigned to a corresponding region from R1–R4 with a circle.

The 980-amino acid sequence (from amino acid 131 to 1110), excluding 130 amino acids of each Nst1 N- and C-terminus, was analyzed because of the 1000 amino acid limit of GalaxyWEB (Supplemental Figure S1B). N-terminus (residues 1–429) (data not shown) and C-terminus (residues 430–1240) (Supplemental Figure S1C) structures were also predicted independently. Nst1 was expected to be low-ordered and to not form a globular 3D structure (Supplemental Figure S1B). Based on the prediction of the secondary structure by GalaxyWEB, Nst1 could be divided mainly into two domains: the N-terminal domain (NTD) (1–406) and CTD (430–1240), with a short 23-amino acid unstructured region (UR) between them (Figure 1C).

The 225 amino acids (residues 1016–1240) in the Nst1 C-terminus contain particularly high scores in aggregation propensity prediction (Figure 1B). Considering the aggregation propensity and protein secondary structure predictions, we designated this region as an aggregation-prone domain (APD) (Figure 1C and Figure S1C).

We found that amino acids 1–32 in the NTD and 491–980 in the CTD scored highly in all three IDR predictions (Figure 1A). The polyampholyte sequence, including charged amino acid clusters D, E, R, and K with sparse hydrophobic amino acid L, was embedded in the predicted IDR (Figure 1A and Figure S1A). Considering that the polyampholytic sequence was predicted as coiled-coil (CC) helices in the secondary structure prediction by GalaxyWEB, we designated this predicted region as the polyampholyte domain (PD) (Supplemental Figure S1B).
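The consensus rule behind the IDR assignment (a residue counts as disordered only when all three predictors score it above 0.5, and a region is kept only when it is longer than 30 residues) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `consensus_regions`, the input layout (one per-residue score list per predictor), and the toy score tracks are hypothetical.

```python
from typing import List, Sequence, Tuple


def consensus_regions(
    score_tracks: Sequence[Sequence[float]],
    threshold: float = 0.5,
    min_length: int = 31,  # ">30 residues" as stated in the Figure 1 legend
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs where every track exceeds threshold."""
    n = len(score_tracks[0])
    if any(len(track) != n for track in score_tracks):
        raise ValueError("all score tracks must cover the same residues")

    regions: List[Tuple[int, int]] = []
    start = None  # 0-based index where the current consensus run began
    for i in range(n + 1):  # one extra step closes a run that reaches the C-terminus
        above = i < n and all(track[i] > threshold for track in score_tracks)
        if above and start is None:
            start = i
        elif not above and start is not None:
            if i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    return regions


# Toy score tracks for a hypothetical 100-residue protein (not Nst1 data).
track_a = [0.9] * 40 + [0.1] * 10 + [0.9] * 50
track_b = list(track_a)
track_c = [0.9] * 40 + [0.1] * 10 + [0.9] * 20 + [0.4] * 5 + [0.9] * 25

print(consensus_regions([track_a, track_b, track_c]))
```

In the toy example only the first 40-residue stretch survives, because the later stretches either fall below 0.5 in one track or are shorter than the length cutoff.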

Polyampholyte sequences are commonly present in many IDRs [48]. Charged amino acids, such as D, E, K, and R, function as inter- and intra-molecular stickers to generate LLPS [34,49]. The molecular conformation of proteins can be deduced based on the fraction of the charged amino acids in the Das–Pappu diagram [50]. Fundamentally, the fraction of charged residues (FCR) and the net charge per residue (NCPR) determine the four regions, R1, R2, R3, and R4, in the Das–Pappu diagram. The proteins showing an FCR value smaller than 0.35 were assigned to R1 or R2. The protein sequences presented in R1 and R2 were expected to have a globular conformation of weak polyampholytes and an alternative globular conformation of a context-dependent polyampholyte, respectively.

Proteins with an FCR value greater than 0.35 were classified as R3 or R4. These protein sequences were strong polyampholytes or polyelectrolytes that were expected to be coiled. We projected the sequence of Nst1 and each Nst1 deletion mutant of the predicted domain onto the Das–Pappu diagram (Figure 1D). Full-length Nst1 (1) was projected in R2, where the fraction of negatively or positively charged residues was between 0.25 and 0.35. The zone of the context-dependent polyampholyte indicates that Nst1 may not have a stable globular protein structure, and its composition may be altered in a context-dependent manner. Additionally, the Nst1 N-terminal (Nst1NTD) (2) and Nst1 C-terminal (Nst1CTD) (3) projections were close to the full-length Nst1 in the same R2 region of the diagram. This indicated that the ratio of the charged Nst1NTD and Nst1CTD residues was similar to that of full-length Nst1. However, as expected due to its IDR predictions, Nst1∆PD (4), the mutant lacking the polyampholyte region, was projected on the border of R2 and R1, in which the fraction of negatively or positively charged residues was below 0.25. This Nst1∆PD prediction indicated that deleting the Nst1 polyampholyte region could severely alter the full-length Nst1 FCR. In contrast, Nst1∆APD (5) was projected onto the R3 zone, demonstrating that deleting the aggregation-prone domain (APD) increased the Nst1 FCR. The Nst1∆PD∆APD projection (6) showed the offset effect of PD and APD deletions. Collectively, these predictions suggest that the unique sequence distribution of Nst1, especially the PD and APD, may enhance Nst1 self-condensation and the condensation of other PB components.
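The region assignment described above can be sketched in a few lines of Python. This is an illustration only, not the tool used for Figure 1D. It assumes that K and R count as positive and D and E as negative (histidine is treated as neutral), uses the 0.25 and 0.35 FCR boundaries quoted in the text, and assumes that R3 and R4 are separated at |NCPR| = 0.35, a boundary the text does not state. The function names and the toy sequence are hypothetical.

```python
from typing import Tuple


def charge_fractions(seq: str) -> Tuple[float, float]:
    """Return (fraction of positive residues, fraction of negative residues)."""
    seq = seq.upper()
    f_pos = sum(seq.count(aa) for aa in "KR") / len(seq)
    f_neg = sum(seq.count(aa) for aa in "DE") / len(seq)
    return f_pos, f_neg


def das_pappu_region(seq: str) -> str:
    """Assign a sequence to R1-R4 from its FCR and NCPR."""
    f_pos, f_neg = charge_fractions(seq)
    fcr = f_pos + f_neg         # fraction of charged residues
    ncpr = abs(f_pos - f_neg)   # magnitude of the net charge per residue
    if fcr < 0.25:
        return "R1"             # weak polyampholyte
    if fcr <= 0.35:
        return "R2"             # context-dependent polyampholyte
    if ncpr > 0.35:
        return "R4"             # assumed boundary: strong polyelectrolyte
    return "R3"                 # strong polyampholyte


def delete_region(seq: str, start: int, end: int) -> str:
    """Remove residues start..end (1-based, inclusive), as in a domain deletion."""
    return seq[: start - 1] + seq[end:]


# Toy sequence (not Nst1): a neutral stretch followed by a short charged block.
toy = "G" * 14 + "EKEKEK"
print(das_pappu_region(toy))                         # FCR = 0.30
print(das_pappu_region(delete_region(toy, 15, 20)))  # charged block removed
```

Removing the charged block moves the toy sequence from R2 to R1, which mirrors the direction of the shift reported for Nst1∆PD.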

#### *2.2. The Nst1 CTD Is Sufficient for Nst1 Self-Condensation*

To identify the specific regions of Nst1 responsible for the self-condensation and condensation of other PB components, we designed various Nst1 domain deletion mutants with different domain combinations based on the predictions (Figure 1). Each green fluorescent protein (GFP)-tagged Nst1 domain deletion mutant was overexpressed under the galactose-inducible (*GAL*) promoter of pMW20 in the wild-type cells, and its expression was confirmed by Western blot analysis (Supplemental Figure S2A). As reported in a previous study [40], overexpressed GFP-tagged Nst1 formed bright puncta (Figure 2A). The Nst1 mutant, Nst1NTD, was completely dispersed throughout the cytoplasm when overexpressed (Figure 2A). In contrast, Nst1CTD formed clear puncta upon overexpression (Figure 2A). These observations demonstrate that the Nst1 CTD was sufficient to form self-condensates upon overexpression (Figure 2A).

The puncta formed by Nst1 overexpression were closely correlated with the physical LLPS properties obtained via 1,6-hexanediol treatment [27]. 1,6-hexanediol is reported to eradicate the nuclear pore permeability barrier by interfering with hydrophobic interactions in the pores and is generally used to interfere with the integrity of reversible condensates with liquid-like properties [51,52]. In budding yeast cells, treatment with 5–10% 1,6-hexanediol for 30 min can impede PB integrity but cannot disperse irreversible amyloids [27]. When cells were treated with 1,6-hexanediol, condensates of overexpressed GFP-tagged Nst1CTD dispersed, as did those of full-length Nst1, demonstrating the liquid-like property of both condensates (Figure 2A).


**Figure 2.** The Nst1 CTD is sufficient for Nst1 self-condensation. (**A**) Fluorescence microscopy of the cells overexpressing full-length enhanced green fluorescent protein (EGFP)-tagged Nst1 and the NTD (Nst1NTD, 1–429) and CTD (Nst1CTD, 430–1240) of Nst1. Overexpression of each EGFP-tagged Nst1 deletion mutant was induced in wild-type cells, then observed before and after 1,6-hexanediol treatment. Scale bar: 5 µm. Schematic diagrams of the designed Nst1 domain deletion mutants are shown on the left. (**B**) The van Steensel's cross-correlation coefficients (CCFs) between each overexpressed Nst1 deletion mutant used in (**A**) and the endogenous mRNA decapping protein 2 (Dcp2)-mKate2 signals were analyzed and presented. Overexpression of each EGFP-tagged Nst1 domain deletion mutant was induced in wild-type cells whose chromosomal *DCP2* was tagged with mKate2. Each Nst1 domain deletion mutant (n = total observed cell number): *PGAL*-*GFP*-*NST1* (n = 257), *PGAL*-*GFP*-*NST1NTD* (n = 158), and *PGAL*-*GFP*-*NST1CTD* (n = 161). All images were analyzed by FIJI (https://imagej.net/Fiji, accessed on 9 August 2020). (**C**,**D**) Each Nst1 deletion mutant was overexpressed in the wild-type cells with EGFP-tagged *DCP2* (YSK3485). (**C**) Fluorescence microscopy of cells expressing endogenous EGFP-Dcp2 that overexpress the NTD (1–429), CTD (430–1240), and full-length of Nst1. Scale bar: 10 µm. (**D**) Quantification of the endogenous EGFP-Dcp2 puncta analysis of (**C**). The pixels of the top 0.1% EGFP-Dcp2 signal intensities were segmented for puncta analysis. The maximal intensity of each segmented punctum was plotted. '+' in the boxplot indicates the mean value of maximal intensities of foci. Each Nst1 domain deletion mutant (n = total observed cell number): *PGAL* vector-only control (n = 218), *PGAL*-*NST1* (n = 307), *PGAL*-*NST1NTD* (n = 333), and *PGAL*-*NST1CTD* (n = 300). All measurements and analyses were performed by FIJI (https://imagej.net/Fiji, accessed on 31 March 2022). Statistical significance was determined by a Mann–Whitney test (\*\*\*\* *p* < 0.0001).

Previously, we demonstrated that the accumulated GFP-tagged Nst1 condensates co-localized with endogenous Dcp2-mKate2 [40]. To quantitatively investigate the correlation between Nst1 self-condensation and its association with PBs, we analyzed the co-localization of each overexpressed Nst1 domain deletion mutant and endogenous Dcp2 (a PB marker) using van Steensel's cross-correlation function (CCF) (Figure 2B). Here, van Steensel's CCF determines the degree of co-localization between two different signals (red and green) by calculating the Pearson coefficient between the two images while one image is shifted pixel by pixel relative to the other [53]; a bell-shaped curve peaking at zero shift indicates overlapping signals. Endogenous Dcp2-mKate2 was captured for analysis in wild-type cells whose chromosomal *DCP2* was tagged with mKate2 after each GFP-tagged Nst1 mutant was overexpressed. Van Steensel's CCF of overexpressed GFP-tagged Nst1NTD did not show a bell-shaped curve with Dcp2-mKate2, indicating that the red and green signals did not overlap (Figure 2B). However, van Steensel's CCF of overexpressed GFP-tagged Nst1 and Nst1CTD for Dcp2-mKate2 showed a bell-shaped curve, although GFP-tagged Nst1CTD and Dcp2-mKate2 showed a weaker correlation than the wild-type Nst1 for Dcp2-mKate2 (Figure 2B). GFP-tagged Nst1CTD and Dcp2-mKate2 correlated more closely than GFP-tagged Nst1NTD and Dcp2-mKate2 (Figure 2B). These data suggest that Nst1 self-condensation is correlated with the accumulation of the Dcp2 PB marker.
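The idea behind van Steensel's CCF can be sketched in plain Python: the green image is displaced along the x-axis relative to the red image, and the Pearson coefficient of the overlapping pixels is recorded for each displacement. This is a minimal sketch, not the FIJI analysis used for the figures. Images are assumed to be equal-sized lists of rows, and the function names and the synthetic spot images are hypothetical.

```python
from math import sqrt
from typing import Dict, List

Image = List[List[float]]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat image carries no correlation information
    return sxy / sqrt(sxx * syy)


def van_steensel_ccf(red: Image, green: Image, max_shift: int) -> Dict[int, float]:
    """Map each x-shift dx to the Pearson coefficient of red(x) versus green(x + dx)."""
    width = len(red[0])
    ccf: Dict[int, float] = {}
    for dx in range(-max_shift, max_shift + 1):
        red_vals: List[float] = []
        green_vals: List[float] = []
        for red_row, green_row in zip(red, green):
            for x in range(width):
                if 0 <= x + dx < width:  # keep only pixels that still overlap
                    red_vals.append(red_row[x])
                    green_vals.append(green_row[x + dx])
        ccf[dx] = pearson(red_vals, green_vals)
    return ccf


def spot_image(cx: int, cy: int, size: int = 9) -> Image:
    """Synthetic image with one diamond-shaped bright spot centred at (cx, cy)."""
    return [[float(max(0, 3 - abs(x - cx) - abs(y - cy))) for x in range(size)]
            for y in range(size)]


same = van_steensel_ccf(spot_image(4, 4), spot_image(4, 4), max_shift=3)
offset = van_steensel_ccf(spot_image(4, 4), spot_image(6, 4), max_shift=3)
print(max(same, key=same.get), max(offset, key=offset.get))
```

For co-localized signals the curve peaks at zero shift, whereas a peak away from zero, or no clear peak, indicates that the two signals do not overlap.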

To demonstrate the Nst1 domain responsible for EGFP-Dcp2 condensation, we monitored endogenous EGFP-Dcp2 in cells overexpressing Nst1, Nst1NTD, and Nst1CTD. As expected, Nst1NTD overexpression did not increase EGFP-Dcp2 condensation, whereas overexpression of full-length Nst1 induced EGFP-Dcp2 condensation (Figure 2C). EGFP-Dcp2 condensation induced by Nst1CTD overexpression was enhanced compared to the vector control and overexpressed Nst1NTD (Figure 2C). However, EGFP-Dcp2 condensation in cells overexpressing Nst1CTD was reduced compared with that in cells overexpressing full-length Nst1 (Figure 2C). To quantify the degree of EGFP-Dcp2 puncta generated, we segmented pixels of the top 0.05% intensity for puncta analysis, and the maximum intensities of the segmented puncta scaled from 0–255 were analyzed using a boxplot. The measuring method is described in detail in the Materials and Methods section and our previous study [40]. Consistent with Figure 2C, full-length Nst1 and Nst1CTD overexpression increased the maximum intensities of EGFP-Dcp2 condensates, while Nst1NTD overexpression did not (Figure 2D). Instead, the EGFP-Dcp2 condensates were decreased in Nst1NTD-overexpressing cells compared to the vector control cells (Figure 2C,D). The endogenous EGFP-Dcp2 expression level in each mutant overexpressing cell was monitored by Western blotting to confirm that the overexpression of each mutant did not affect EGFP-Dcp2 expression levels (Supplemental Figure S2B).
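The puncta quantification can be illustrated with a short sketch: keep the brightest fraction of pixels, group neighbouring pixels into puncta, and report the maximum intensity of each punctum. This is not the FIJI procedure itself. The segmented fraction is left as a parameter (the text quotes the top 0.05%, the figure legends the top 0.1%), and the 4-connected neighbourhood, the tie handling at the cutoff, and all function names are assumptions made for illustration.

```python
from typing import List

Image = List[List[int]]  # e.g., 8-bit intensities scaled 0-255


def top_fraction_cutoff(image: Image, fraction: float) -> int:
    """Intensity of the dimmest pixel that still belongs to the brightest fraction."""
    values = sorted((v for row in image for v in row), reverse=True)
    k = max(1, round(len(values) * fraction))
    return values[k - 1]


def puncta_max_intensities(image: Image, fraction: float) -> List[int]:
    """Segment the brightest pixels and return the maximum intensity of each punctum."""
    cutoff = top_fraction_cutoff(image, fraction)  # pixels tied with the cutoff are kept
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    maxima: List[int] = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or image[y][x] < cutoff:
                continue
            # Flood fill over 4-connected neighbours to collect one punctum.
            stack = [(y, x)]
            seen[y][x] = True
            peak = image[y][x]
            while stack:
                cy, cx = stack.pop()
                peak = max(peak, image[cy][cx])
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and image[ny][nx] >= cutoff):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            maxima.append(peak)
    return maxima


# Toy 10 x 10 image (not experimental data): dim background with two bright puncta.
toy_image = [[10] * 10 for _ in range(10)]
toy_image[2][2], toy_image[2][3] = 200, 180  # one two-pixel punctum
toy_image[7][7] = 250                        # one single-pixel punctum

print(puncta_max_intensities(toy_image, fraction=0.03))
```

The returned per-punctum maxima are the values that would then be summarized in a boxplot and compared between strains.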

These data suggest that an intrinsic sequence factor responsible for Nst1 self-condensation is present in the Nst1 CTD. In addition, the condensation of PB components in Nst1-overexpressing cells arose from Nst1 self-condensation through LLPS.

#### *2.3. The APD in the Nst1 CTD Is Insufficient but Crucial for Inducing Nst1 Self-Condensation*

Next, we investigated whether the predicted APD in the C-terminus induced Nst1 condensation. We constructed GFP-tagged Nst1∆APD and Nst1CTD∆APD mutants and compared their overexpression phenotypes with those of GFP-tagged wild-type Nst1 and Nst1CTD in BY4741 wild-type cells. Overexpression of GFP-tagged Nst1 and Nst1CTD generated self-condensation (Figure 3A). However, when Nst1∆APD was overexpressed, major GFP signals were dispersed throughout the cytoplasm. We observed the same dispersed phenotype in cells overexpressing GFP-tagged Nst1CTD∆APD, where the APD was deleted in Nst1CTD, although Nst1CTD displayed discrete puncta (Figure 3A). These observations suggest that the APD plays a critical role in Nst1 condensation. To test whether the APD is sufficient to induce condensation, only the APD was overexpressed. This did not result in the assembly of any condensates (Figure 3A). These observations demonstrate that the Nst1 APD is crucial but insufficient for Nst1 condensation. Consistently, the CCF of overexpressed GFP-tagged Nst1∆APD and Nst1CTD∆APD versus endogenous Dcp2-mKate2 did not show a bell-shaped curve (Figure 3B).

**Figure 3.** … Nst1, Nst1∆APD (residues 1–1015), Nst1CTD (C-terminal Nst1 residues 430–1240), Nst1CTD∆APD (residues 431–1015), and Nst1APD (residues 1016–1240). Schematic diagrams of the designed Nst1 domain deletion mutants are shown on the left. Overexpression of each EGFP-tagged Nst1 domain deletion mutant was induced in wild-type cells, then observed. Scale bar: 5 µm. (**B**) The van Steensel's CCFs between each overexpressed Nst1 domain deletion mutant used in (**A**) and the endogenous Dcp2-mKate2 signals were analyzed and presented. Overexpression of each EGFP-tagged Nst1 domain deletion mutant was induced in wild-type cells whose chromosomal *DCP2* was tagged with mKate2. Each Nst1 domain deletion mutant (n = total observed cell number): *PGAL-GFP-NST1* (n = 257), *PGAL-GFP-NST1*∆*APD* (n = 387), *PGAL-GFP-NST1CTD* (n = 161), *PGAL-GFP-NST1CTD*∆*APD* (n = 191), and *PGAL-GFP-NST1APD* (n = 167). All images were analyzed by FIJI (https://imagej.net/Fiji, accessed on 9 August 2020). (**C**,**D**) Each Nst1 domain deletion mutant was overexpressed in the wild-type cells with EGFP-tagged *DCP2* (YSK3485). (**C**) Fluorescence microscopy of endogenous EGFP-Dcp2-tagged cells overexpressing full-length Nst1, Nst1∆APD, Nst1CTD, Nst1CTD∆APD, and Nst1APD. Scale bar: 10 µm. (**D**) Quantification of the endogenous EGFP-Dcp2 puncta of (**C**). The pixels of the top 0.1% EGFP-Dcp2 signal intensities were segmented for puncta analysis. The maximal intensity of each segmented punctum was plotted. '+' in the boxplot indicates the mean value of maximal intensities of foci. Each Nst1 domain deletion mutant (n = total observed cell number): *PGAL* vector-only control (n = 218), *PGAL-NST1* (n = 307), *PGAL-NST1*∆*APD* (n = 337), *PGAL-NST1CTD* (n = 300), *PGAL-NST1CTD*∆*APD* (n = 260), and *PGAL-NST1APD* (n = 261). All measurements and analyses were performed by FIJI (https://imagej.net/Fiji, accessed on 31 March 2022). Statistical significance was determined by a Mann–Whitney test (\*\*\*\* *p* < 0.0001).

In monitoring EGFP-Dcp2 in cells overexpressing these domain deletion mutants, Nst1∆APD overexpression did not show EGFP-Dcp2 condensate accumulation, while overexpression of the full-length Nst1 induced EGFP-Dcp2 condensation (Figure 3C), as expected from van Steensel's CCF. In the quantitative analysis of the EGFP-Dcp2 condensates, the overexpression of full-length Nst1 and Nst1CTD increased the maximal intensities of EGFP-Dcp2 condensates compared to the vector control, while the overexpression of Nst1∆APD and Nst1CTD∆APD canceled out the effect (Figure 3D). The maximal intensities of the EGFP-Dcp2 condensates were reduced in both Nst1∆APD- and Nst1CTD∆APD-overexpressing cells compared to those of the vector control cells (Figure 3D). Endogenous EGFP-Dcp2 did not appear as puncta in cells overexpressing Nst1∆APD, suggesting that the APD of Nst1 alone was unable to induce self-aggregation. Endogenous EGFP-Dcp2 expression levels in cells overexpressing each mutant were monitored by Western blotting to confirm that the overexpression of each mutant did not affect EGFP-Dcp2 expression levels (Supplemental Figure S2C). These data demonstrate that the Nst1 APD is the critical region for inducing Nst1 self-condensation but functions in a context-dependent manner.

#### *2.4. The Nst1 PD Is Not a Critical Component in Self-Condensation but Is Responsible for Inducing Dcp2 Condensation*

The polyampholyte region of proteins is a representative IDR and is anticipated to be closely related to biomolecular condensation [48,49]. We attempted to demonstrate the function of the PD in Nst1 self-condensation because an obvious polyampholyte region is present in the Nst1 CTD. Considering previous reports on the function of the polyampholyte region in LLPS [48,49], we expected that PD deletion in various Nst1 domain mutants would negatively affect self-condensate generation. We compared the punctum formation of the GFP-tagged PD deletion mutants with that of the GFP-tagged full-length Nst1 and Nst1CTD upon overexpression. Unexpectedly, GFP-tagged Nst1∆PD generated condensates similar to wild-type Nst1 when overexpressed, demonstrating that PD does not control Nst1 self-condensation (Figure 4A). Nst1CTD∆PD overexpression also formed puncta (Figure 4A). However, the size and intensity of puncta in cells overexpressing GFP-tagged Nst1CTD∆PD were reduced, compared with those in cells overexpressing GFP-tagged Nst1CTD (Figure 4A), suggesting that the PD in Nst1 may only partially contribute to Nst1 self-condensation. Both condensates induced by GFP-tagged Nst1∆PD and Nst1CTD∆PD overexpression were dispersed in the cytoplasm in 1,6-hexanediol-treated cells, indicating their liquid-like properties (Figure 4A).

… (Figure 4B). These data confirm that the APD is responsible for Nst1 self-condensation. As expected, endogenous EGFP-Dcp2 condensates did not accumulate in cells overexpressing Nst1∆PD∆APD (Figure 4C,D).

**Figure 4.** … full-length Nst1, Nst1∆PD (residues 1–630 and 753–1240), Nst1CTD (C-terminal Nst1 residues 430–1240), Nst1CTD∆PD (residues 431–630 and 753–1240), and Nst1∆PD∆APD (residues 1–630 and 753–1015). Overexpression of each EGFP-tagged Nst1 domain deletion mutant was induced in wild-type cells, then observed before and after 1,6-hexanediol treatment. Scale bar: 5 µm. (**B**) The van Steensel's CCFs between each overexpressed Nst1 domain deletion mutant used in (**A**) and the endogenous Dcp2-mKate2 signals were analyzed and presented. Overexpression of each EGFP-tagged Nst1 domain deletion mutant was induced in wild-type cells whose chromosomal *DCP2* was tagged with mKate2. Each Nst1 domain deletion mutant (n = total observed cell number): *PGAL-GFP-NST1* (n = 257), *PGAL-GFP-NST1*∆*PD* (n = 277), *PGAL-GFP-NST1CTD* (n = 161), *PGAL-GFP-NST1CTD*∆*PD* (n = 199), and *PGAL-GFP-NST1*∆*APD*∆*PD* (n = 198). All images were analyzed by FIJI (https://imagej.net/Fiji, accessed on 9 August 2020). (**C**,**D**) Each Nst1 domain deletion mutant was overexpressed in the wild-type cells with EGFP-tagged *DCP2* (YSK3485). Schematic diagrams of the designed Nst1 domain deletion mutants are shown on the left. (**C**) Fluorescence microscopy of endogenous EGFP-Dcp2-tagged cells overexpressing full-length Nst1, Nst1∆PD (residues 1–630 and 753–1240), Nst1CTD (C-terminal Nst1 residues 430–1240), Nst1CTD∆PD (residues 431–630 and 753–1240), and Nst1∆PD∆APD (residues 1–630 and 753–1015). Scale bar: 10 µm. (**D**) Quantification of the endogenous EGFP-Dcp2 puncta of (**C**). The pixels of the top 0.1% EGFP-Dcp2 signal intensities were segmented for puncta analysis. The maximal intensity of each segmented punctum was plotted. '+' in the boxplot indicates the mean value of maximal intensities of foci. Each Nst1 domain deletion mutant (n = total observed cell number): *PGAL* vector-only control (n = 218), *PGAL-NST1* (n = 307), *PGAL-NST1*∆*PD* (n = 284), *PGAL-NST1CTD* (n = 300), *PGAL-NST1CTD*∆*PD* (n = 200), and *PGAL-NST1*∆*PD*∆*APD* (n = 260). All measurements and analyses were performed by FIJI (https://imagej.net/Fiji, accessed on 31 March 2022). Statistical significance was determined by a Mann–Whitney test (\*\*\*\* *p* < 0.0001).

We anticipated that Nst1∆PD overexpression would induce EGFP-Dcp2 condensates that colocalize with the overexpressed protein, because PD deletion did not interrupt Nst1 self-condensation upon overexpression. However, in van Steensel's CCF diagram, the localization of endogenous Dcp2-mKate2 tended to be less correlated with overexpressed GFP-tagged Nst1∆PD localization than with overexpressed GFP-tagged full-length Nst1 and Nst1CTD (Figure 4B).

To examine the effect of PD deletion on EGFP-Dcp2 condensation, we monitored endogenous EGFP-Dcp2 in cells overexpressing each Nst1 domain deletion mutant, including Nst1∆PD (Figure 4A). Unexpectedly, the intensity of EGFP-Dcp2 puncta hardly increased in cells overexpressing Nst1∆PD (Figure 4C,D), although GFP-tagged Nst1∆PD overexpression generated bright puncta via self-condensation (Figure 4A). Western blotting of endogenous EGFP-Dcp2 in Nst1 mutant-overexpressing cells indicated that Nst1∆PD overexpression did not affect EGFP-Dcp2 expression levels (Supplementary Figure S2D). The intensity of EGFP-Dcp2 puncta in cells overexpressing Nst1∆PD was similar to that in cells overexpressing Nst1∆APD, which did not generate any concentrated EGFP-Dcp2 signals (Figures 3C and 4C). These observations suggest that the PD contributes little to self-condensation and may play a specific role in recruiting other PB components. Deleting the PD in Nst1CTD also abolished Nst1CTD overexpression-induced EGFP-Dcp2 accumulation, supporting the role of the PD in EGFP-Dcp2 condensation (Figure 4C,D).

The effect of the APD on Nst1 self-condensation was confirmed by Nst1∆PD∆APD overexpression. We observed that GFP-tagged Nst1∆PD∆APD was mainly dispersed in the cytoplasm, like GFP-tagged Nst1∆APD, while overexpressed GFP-tagged Nst1∆PD was observed as clear puncta (Figure 4A). Van Steensel's CCF between GFP-tagged Nst1∆PD∆APD and Dcp2-mKate2 also indicated that the double deletion of the PD and APD reduced the intrinsic potential of overexpressed Nst1 to self-condense and co-localize with the PB marker (Figure 4B). These data confirm that the APD is responsible for Nst1 self-condensation. As expected, endogenous EGFP-Dcp2 condensates did not accumulate in cells overexpressing Nst1∆PD∆APD (Figure 4C,D).

#### *2.5. Dcp2 Condensation Induced by Nst1 PD Overexpression Is Independent of Free Ribosomal Influx*

Observations of endogenous EGFP-Dcp2 in cells overexpressing various Nst1 mutants revealed that the APD and PD are largely responsible for self-condensation and inducing Dcp2 condensation, respectively. Since RNA functions as a scaffold for protein condensation via LLPS [12], we examined whether the role of the PD in Dcp2 condensation is mediated by polysome RNA influx. We investigated EGFP-Dcp2 puncta induced by overexpression of each Nst1 domain deletion mutant after cycloheximide (CHX) treatment. PB formation induced by stress relies on an increase in non-translating mRNA concentration [12]. CHX inhibits translation elongation and thereby reduces the pool of non-translating RNA, and it completely disassembled the endogenous PBs formed during glucose deficiency [11,12]. Thus, because CHX eliminates RNA-derived PBs, any puncta that persist after CHX treatment can be attributed to protein-induced PB accumulation.

Nst1, Nst1∆PD, Nst1CTD, and Nst1CTD∆PD were overexpressed in cells with EGFP-tagged chromosomal *DCP2*, and EGFP-Dcp2 was observed after treating cells with 100 µg/mL CHX for 10 min [12]. Consistent with our previous report, Nst1 overexpression maintained EGFP-Dcp2 condensates in the presence of CHX, whereas the EGFP-Dcp2 puncta completely disappeared in the vector control (Figure 5A). We also observed that Nst1CTD overexpression maintained EGFP-Dcp2 condensates in the presence of CHX (Figure 5A). The maximal intensity of the EGFP-Dcp2 puncta shown in Figure 5A was measured for each domain deletion mutant and plotted (Figure 5B). Similar to the results shown in Figure 4, the maximal intensity of EGFP-Dcp2 puncta accumulated by Nst1∆PD overexpression was significantly decreased compared to that of full-length Nst1 overexpression and was similar to the vector-only control (Figure 5B). The maximal intensity of the EGFP-Dcp2 puncta accumulated by Nst1CTD∆PD declined, similar to Nst1∆PD (Figure 5B). The ratio of cells with EGFP-Dcp2 puncta in the PD deletion mutants (Nst1∆PD and Nst1CTD∆PD) was dramatically decreased compared to that in Nst1 and Nst1CTD (Figure 5C). These observations strongly support the conclusion that the PD is responsible for inducing the condensation of other PB components.

**Figure 5.** The Nst1 polyampholyte region interacts with the PB component Dcp2 independent of the free ribosomal RNA influx. (**A**–**C**) In the wild-type strain whose chromosomal *DCP2* was tagged with EGFP, endogenous EGFP-Dcp2 was observed after the overexpression of full-length Nst1, Nst1∆PD (residues 1–630 and 753–1240), Nst1CTD (C-terminal Nst1 residues 430–1240), and Nst1CTD∆PD (residues 431–630 and 753–1240). In the cells overexpressing each Nst1 domain deletion mutant, endogenous Dcp2 was observed before and after the 10 min 100 µg/mL cycloheximide (CHX) treatment. (**A**) Fluorescence microscopy of EGFP-Dcp2 in the cells overexpressing each mutant before and after the 10 min 100 µg/mL CHX treatment. Scale bar: 10 µm. (**B**) Quantification of EGFP-Dcp2 puncta shown in (**A**). The pixels of the top 0.1% signal intensities were segmented and analyzed. The maximal value of each punctum was plotted. '+' in the boxplot indicates the mean value of maximal intensities of foci. All measurements and analyses were performed by FIJI (https://imagej.net/Fiji, accessed on 31 March 2022). (**C**) The ratio of cells producing EGFP-Dcp2 puncta to the total cells upon overexpression of each Nst1 mutant in (**A**). Nst1 domain deletion mutant (n = total observed cell number): *PGAL* (vector only, n = 260), *PGAL-NST1* (n = 312), *PGAL-NST1*∆*PD* (n = 247), *PGAL-NST1CTD* (n = 302), and *PGAL-NST1CTD*∆*PD* (n = 241).

#### *2.6. The Nst1 PD Serves as a Binding Hub, Mediating the Condensation of other PB Components*

Edc3 is a PB scaffold protein in *S. cerevisiae* [11,54]. The ∆*edc3 lsm4*∆C mutant could not induce EGFP-Dcp2 condensates independent of RNA influx, indicating that Edc3 is a critical component in PB generation. In our previous study, EGFP-Dcp2 condensation driven by Nst1 overexpression was suppressed in ∆*edc3 lsm4*∆C mutant cells, suggesting a functional relationship between Nst1 and Edc3 in condensate formation. To determine whether the Nst1 PD is also responsible for EGFP-Edc3 condensation, the EGFP-tagged *EDC3* strain was transformed with the same Nst1 deletion mutant clones tested in Figure 5. We then treated these cells with 100 µg/mL CHX after galactose induction to examine whether Nst1 PD-mediated Edc3 condensation is independent of polysome RNA influx. Microscopic observations revealed that the puncta of EGFP-Dcp2 and EGFP-Edc3, induced by Nst1 overexpression, behaved analogously. In the presence of CHX, EGFP-Edc3 condensation was markedly decreased in cells overexpressing Nst1∆PD and Nst1CTD∆PD compared to that in cells overexpressing Nst1 and Nst1CTD (Figure 6A). Nst1∆PD overexpression did not induce EGFP-Edc3 puncta (Figure 6A), although GFP-tagged Nst1∆PD itself formed bright puncta upon overexpression (Figure 4B). The pattern of EGFP-Edc3 puncta generated by the overexpression of diverse Nst1 deletion mutants was similar to the pattern of EGFP-Dcp2, both in the maximal intensity and the ratio of puncta-generating cells (Figure 6B). The ratio of cells with EGFP-Edc3 puncta in the PD deletion mutants (Nst1∆PD and Nst1CTD∆PD) was dramatically decreased compared to that in Nst1 and Nst1CTD (Figure 6C). Western blot analysis of endogenous EGFP-Edc3 in cells overexpressing each Nst1 domain deletion mutant showed that the reduction in EGFP-Edc3 puncta in cells overexpressing PD deletion mutants was not caused by altered EGFP-Edc3 expression levels (Supplementary Figure S2F). These analyses strongly suggest that the PD is responsible for recruiting Edc3 as well as Dcp2.

To verify whether the PD recruits other PB components, the EGFP-tagged *DHH1* and *XRN1* strains were transformed with Nst1 and Nst1∆PD and treated with 100 µg/mL CHX for 10 min after galactose induction. Overexpression of PD deletion mutants (Nst1∆PD and Nst1CTD∆PD) generated fewer Dhh1 and Xrn1 puncta than overexpression of wild-type Nst1 in CHX-treated cells (Figure 6D,E). Overall, overexpression of the PD deletion mutants (Nst1∆PD and Nst1CTD∆PD) reduced the condensation of known PB components, suggesting that the Nst1 PD interacts with PB components independent of polysome RNA influx.

**Figure 6.** The Nst1 polyampholyte region functions as a binding hub for P-body (PB) components independent of the free ribosomal RNA influx. (**A**–**C**) In the wild-type strains whose chromosomal *EDC3* was tagged with EGFP, the full-length Nst1, Nst1∆PD (residues 1–630 and 753–1240), Nst1CTD (C-terminal Nst1 residues 430–1240), and Nst1CTD∆PD (residues 431–630 and 753–1240) were overexpressed. Endogenous enhancer of mRNA decapping 3 (Edc3) was observed before and after cells were treated with 100 µg/mL CHX for 10 min. (**A**) Fluorescence microscopy of EGFP-Edc3 in the cells overexpressing each mutant before and after the 10 min 100 µg/mL CHX treatment. Scale bar: 10 µm. (**B**) Quantification of the EGFP-Edc3 puncta shown in (**A**) with CHX. The pixels of the top 0.1% signal intensities were segmented as the EGFP-Edc3 puncta. The EGFP-Edc3 puncta generated were quantified, and the maximal value of each punctum was plotted. '+' in the boxplot indicates the mean value of maximal intensities of foci. All measurements and analyses were performed by FIJI (https://imagej.net/Fiji, accessed on 31 March 2022). Statistical significance was determined by a Mann–Whitney test (\*\*\*\* *p* < 0.0001). (**C**) The ratio of cells producing EGFP-Edc3 puncta to the total cells overexpressing each Nst1 mutant in the presence of CHX. Nst1 domain deletion mutant (n = total observed cell number): *PGAL* (vector only, n = 233), *PGAL-NST1* (n = 320), *PGAL-NST1*∆*PD* (n = 200), *PGAL-NST1CTD* (n = 244), and *PGAL-NST1CTD*∆*PD* (n = 275). (**D**,**E**) In the wild-type strains with EGFP-tagged chromosomal *DHH1* and *XRN1*, the overexpression of full-length Nst1 and Nst1∆PD was induced, and then cells were treated with 100 µg/mL CHX for 10 min. Fluorescence microscopy of (**D**) EGFP-Dhh1 and (**E**) EGFP-Xrn1 in CHX-treated cells overexpressing full-length Nst1 and Nst1∆PD. Scale bar: 10 µm. All images were measured and analyzed by FIJI (https://imagej.net/Fiji, accessed on 31 March 2022).

#### **3. Discussion**

Understanding the syntax of biomolecular condensation is key to understanding the molecular dynamics of cells. RNA is a powerful scaffold, and the RNA-binding moiety of scaffold proteins is expected to be crucial in biomolecular condensation. However, the protein scaffolds responsible for condensation need to be investigated further. The sequence properties of various scaffold proteins in condensates, such as the low complexity domains (LCDs) of poly-Q or RGG and the polyampholytic region of charged amino acids (lysine or arginine), may be critical for biomolecular condensation [50,55]. Further, scaffold proteins that specifically function in a particular condensation generally have oligomerizing properties and IDRs. A study on the PB component, Lsm4, in budding yeast found that *GAL*-induced Lsm4 overexpression drives self-condensation [27]. Lsm4 is a representative PB component, with a prion-like domain (PrD, poly-Q motif) in its C-terminal region. Although CHX dissipated the stress-responsive endogenous Lsm4-GFP puncta, it did not disperse the bright clear puncta generated by the *GAL*-induced GFP-Lsm4, implying that the physical properties of the puncta induced by overexpressed Lsm4 were not identical to those of the stress-derived endogenous PBs. Nevertheless, these observations indicate that Lsm4 has strong self-oligomerizing potential, even though the puncta formed upon overexpression differ in physical properties from native PBs. Similarly, overexpressed GFP-Edc3 appeared as bright clear puncta not dissipated by CHX, supporting previous reports that Edc3 harbors the Yjef-N domain, which induces Edc3 self-oligomerization [56].

We previously found that Nst1 significantly accumulated in puncta in the stationary phase. We also reported that GFP-tagged Nst1 overexpression using a *GAL*-inducible promoter yielded condensates of round puncta (Figure 2A) and drove the accumulation of other PB components. These data strongly suggest that Nst1 has the potential to self-condense and recruit other PB components to condense. CHX did not dissipate the overexpressed GFP-Nst1-generated bright clear puncta, suggesting that Nst1 has a sequence element that induces self-oligomerization similar to Lsm4 and Edc3, although Nst1 does not have a recognizable PrD. Nst1 is a large protein consisting of 1240 amino acids, including diverse sequence elements, a presumptive IDR, and aggregation-prone regions, as predicted by several programs (Figure 1). In this study, we attempted to elucidate the functional sequence elements of Nst1 for its self-condensation and accumulation of other PB components by examining GFP-tagged Nst1 domain deletion mutants upon overexpression.

#### *3.1. The Nst1 C-Terminus Is Necessary and Sufficient for Self-Condensation, While the N-Terminus Has an Auxiliary Role in Recruiting other PB Components*

Overexpression of GFP-tagged Nst1NTD (residues 1–429) did not generate any self-condensation, whereas CTD (residues 430–1240) overexpression was sufficient for self-condensation, indicating that the oligomerizing domain is present in the Nst1 C-terminus. Nst1 condensation induced EGFP-Dcp2 condensation. The Nst1 domain, serving as a platform to interact with Dcp2, is essential for EGFP-Dcp2 condensation. In the analysis of Dcp2 condensates in cells overexpressing different Nst1 domain deletion mutants, CTD overexpression induced less Dcp2 accumulation than full-length Nst1. GFP-tagged Nst1CTD formed bright clear puncta with a similar intensity to the GFP-tagged full-length Nst1 upon overexpression. Consistently, cells overexpressing the NTD did not seem to produce any EGFP-Dcp2 puncta compared to the cells overexpressing full-length Nst1 (Figure 2C,D), but instead showed reduced EGFP-Dcp2 puncta in comparison with the vector-only control. These observations can be explained by the recent LLPS mechanism suggested by the Brangwynne group, in which node capping could reduce interactor condensation [25]. By functioning as a Dcp2 node capper, the Nst1 NTD may directly or indirectly interact with Dcp2 to cover the Dcp2 node, resulting in Dcp2 condensation inhibition. These data imply that the overexpressed Nst1 NTD does not function in Nst1 self-condensation, but it may support Dcp2 recruitment to PB-associated condensates in full-length Nst1-overexpressing cells.

#### *3.2. The Aggregation-Prone Region May Be Associated with Inducing Nst1 Condensates with Liquid-like Properties*

We attempted to identify a specific region in the CTD that is responsible for self-condensation. According to Nst1 sequence-based predictions, the most aggregation-prone region consisted of hydrophobic amino acid residues in the APD. Although the precise link between aggregation propensity and LLPS remains unclear, the degree of aggregation propensity is likely correlated with many types of condensation, such as LLPS [57]. Among the several Nst1 domain deletion mutants constructed, the APD deletion mutant (Nst1∆APD) was the most powerful suppressor of Nst1 condensation driven by overexpression. Overexpressed GFP-tagged APD mutants, such as Nst1∆APD, Nst1CTD∆APD, and Nst1∆PD∆APD, had significantly decreased puncta and were dispersed in the cytoplasm (Figure 3A), suggesting that the APD in the C-terminus is critical for Nst1 self-condensation. Conversely, APD-overexpressing cells did not show any EGFP-Dcp2 condensates, although condensation of EGFP-Dcp2 in cells overexpressing Nst1∆APD was alleviated compared to cells overexpressing full-length Nst1 (Figure 3C,D). These data suggest that APD is necessary but insufficient for Nst1 or Nst1CTD self-condensation. The insufficiency of the APD for self-condensation could explain the importance of context and promiscuous interactions in protein condensation by LLPS [26,27,34,53]. These observations imply that although we could obtain clues on the sequence elements involved in Nst1 self-condensation by deleting each element, removing a domain may damage the unique sequence pattern of full-length Nst1 for condensation.

#### *3.3. The Polyampholyte Region May Be Involved in Molecular Condensation as a Platform for Multivalent Protein–Protein Interactions Independent of RNA Influx*

Polyampholytes containing a significant proportion (>35%) of positively and negatively charged residues are present in 75% of intrinsically disordered proteins [48,55]. In the analyses of EGFP-Dcp2 puncta formed in cells overexpressing Nst1 domain deletion mutants, the Nst1 PD was responsible for recruiting other PB components. The cells overexpressing the PD deletion mutants showed significantly decreased Dcp2 puncta compared to the full length and CTD of Nst1. However, the overexpressed GFP-tagged PD deletion mutants strongly induced self-condensation, similar to overexpressed GFP-tagged Nst1 and Nst1CTD (Figure 4A). The decreased EGFP-Dcp2 phenotype in the overexpressed PD deletion mutants was more clearly observed after treatment with CHX (Figure 5A,B).

Other PB markers, such as Edc3, Dhh1, and Xrn1, showed similar accumulation patterns to Dcp2 in CHX-treated cells overexpressing PD deletion mutants. These observations clarified that PD does not affect Nst1 oligomerization but significantly contributes to the recruitment of other PB components, namely Dcp2, Dhh1, Xrn1, and Edc3.

To assess the physical interaction of Nst1 and Nst1∆PD with PB components upon biomolecular condensation, we first checked the physical interactions of Nst1 with other essential PB constituents via a co-immunoprecipitation (Co-IP) assay of the 6hemagglutinin (HA)-tagged Nst1 with 9Myc-tagged Dcp2/Edc3/Dhh1/Ccr4 in the log phase cells, but could not observe any interaction (data not shown). We also tested whether the interaction between Dcp2 and the overexpressed full-length Nst1, Nst1∆PD, Nst1CTD, or Nst1CTD∆PD could be detected biochemically. Co-IP of the overexpressed Nst1 and Nst1 domain deletion mutants with 9Myc-tagged Dcp2 was performed. Unfortunately, we could not detect any direct physical interaction between them (Supplementary Figure S3A,B), although microscopic examinations provided evidence of Dcp2, Edc3, Dhh1, and Xrn1 condensation and co-localization with Nst1 when Nst1 was overexpressed (Figures 5 and 6). These results imply that the interactions between Nst1 and known PB components were not strong enough to be detected biochemically but were sufficient to induce promiscuous interactions. Considering that polyampholyte regions are among the most frequently occurring IDRs in nature, our findings provide insights into the roles of polyampholyte multivalency in interacting with other PB constituents. Further studies are needed to understand the molecular mechanism by which the Nst1 polyampholyte recruits other PB components to form condensates.

This study on Nst1 domain deletion mutants further improves our understanding of the sequence elements with high aggregation propensity and the polyampholyte region in unstructured proteins involved in the self-condensation and condensation of PB components with liquid-like properties.

#### **4. Materials and Methods**

#### *4.1. Yeast strains, Plasmids, and Cultures*

Table 1 lists the *S. cerevisiae* strains and genotypes used in this study. The strains were constructed on the BY4741 or W303a wild-type background by integrating templates from the polymerase chain reaction (PCR) toolbox at the 3′ end of each reading frame in each endogenous locus through PCR-based homologous recombination [58]. We used PCR of the integrated locus and Western blotting to verify all the constructed strains.


**Table 1.** The yeast strains used in this study.

All plasmids used in this study were constructed in *pMW20(U)-PGAL-GFP* or *pMW20(U)-PGAL*, as described in Table 2. Nst1 domain deletion mutant clones were generated using the PCR-mediated deletion method [59] and confirmed by sequencing.

Yeast strains were cultured in YPAD or synthetic complete (SC) media containing 2% glucose. Glucose deprivation was induced in the SC medium without glucose. Yeast cells were cultured at 25 °C to an optical density at 600 nm (OD<sub>600</sub>) ≤ 0.5 for logarithmic phase growth. To induce overexpression under the *GAL* promoter, cells in the logarithmic phase were first cultured in SC-U + 2% glucose medium to an OD<sub>600</sub> of 0.5 and harvested. The cells were washed three times with SC-U + 2% raffinose + 0.1% glucose medium, diluted to half their concentration, and cultured for an additional 3 h in SC-U + 2% raffinose + 0.1% glucose. Then, 20% galactose stock was added to the culture to adjust the final galactose concentration to 2%, and the cells were further incubated for 3 h for induction before collection.
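The volume of 20% galactose stock needed to reach a 2% final concentration follows from a simple mass balance. The sketch below illustrates that arithmetic only; the function name and the 45 mL culture volume are hypothetical, and it assumes the culture contains no galactose before the stock is added.

```python
def stock_volume_to_add(culture_ml, stock_pct=20.0, final_pct=2.0):
    """Return the stock volume (mL) that brings the mixture to final_pct.

    Mass balance, assuming the culture initially contains no galactose:
        stock_pct * v = final_pct * (culture_ml + v)
    """
    if culture_ml <= 0:
        raise ValueError("culture volume must be positive")
    if not 0 < final_pct < stock_pct:
        raise ValueError("final concentration must lie between 0 and the stock concentration")
    return culture_ml * final_pct / (stock_pct - final_pct)


# Hypothetical example: a 45 mL raffinose culture.
volume = stock_volume_to_add(45.0)
print(f"Add {volume:.2f} mL of 20% galactose stock")  # Add 5.00 mL of 20% galactose stock
```

In other words, one volume of 20% stock per nine volumes of culture gives 2% galactose in the final mixture.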


**Table 2.** The plasmids used in this study.

#### *4.2. 1,6-Hexanediol and CHX Treatments and Western Blots*

Yeast *GAL* promoter induction was performed as described above. For CHX treatment, *GAL*-induced cells were incubated with 100 µg/mL CHX for 10 min. For 1,6-hexanediol treatment, the *GAL*-induced cells were washed three times with a medium containing 10% 1,6-hexanediol and incubated for 30 min. Western blotting was performed as described by Choi and Song [40] with anti-EGFP antibody (600-101-215 Rockland, Limerick, PA, USA) and anti-Tub1 (T5168, Sigma, St. Louis, MO, USA) positive controls. HRP-conjugated anti-goat (705-035-003, Jackson ImmunoResearch, PA, USA) and anti-mouse (sc-2005, Santa Cruz Biotechnology, Dallas, TX, USA) antibodies were used as secondary antibodies to detect the anti-EGFP and anti-Tub1 antibodies, respectively.

#### *4.3. Nst1 Structure and Domain Predictions Based on the Sequence*

Structural prediction of Nst1 was performed using GalaxyWEB [60]. The IUPRED2A, PONDR, and DISOPRED3 [42,43,61,62] algorithms, which predict IDRs, were used to analyze the disorder properties in the Nst1 sequence. AGGRESCAN, Tango, and PASTA 2.0 were used to predict aggregation-prone regions in the Nst1 sequence [45,46,63–65]. Domain deletion mutants of Nst1 were projected in the Das–Pappu phase diagram at http://pappulab.wustl.edu/CIDER/analysis/ (accessed on 6 April 2022).
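The Das–Pappu diagram places a sequence by its fractions of positively and negatively charged residues. As a rough illustration of those quantities, and not a reimplementation of the CIDER server, the sketch below computes the fraction of charged residues (FCR) and the net charge per residue (NCPR) for a sequence. It assumes that only K/R count as positive and D/E as negative, applies the >35% charged-residue criterion quoted in Section 3.3, and uses a made-up toy peptide, not the Nst1 sequence.

```python
POSITIVE = frozenset("KR")  # assumption of this sketch: histidine is not counted
NEGATIVE = frozenset("DE")


def charge_fractions(sequence):
    """Return f_plus, f_minus, FCR, and NCPR for a one-letter amino acid sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    f_plus = sum(aa in POSITIVE for aa in seq) / len(seq)
    f_minus = sum(aa in NEGATIVE for aa in seq) / len(seq)
    return {
        "f_plus": f_plus,
        "f_minus": f_minus,
        "FCR": f_plus + f_minus,   # fraction of charged residues
        "NCPR": f_plus - f_minus,  # net charge per residue
    }


def exceeds_charged_fraction(sequence, cutoff=0.35):
    """True if more than `cutoff` of the residues are charged."""
    return charge_fractions(sequence)["FCR"] > cutoff


# Toy peptide (hypothetical): 4 E, 4 K, 1 G, 1 S.
toy = "EKEKEKEKGS"
print(charge_fractions(toy))
print(exceeds_charged_fraction(toy))  # True
```

A sequence with a high FCR but an NCPR near zero, like the toy peptide, carries many charges of both signs, which is the defining feature of a polyampholyte.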

#### *4.4. Wide-Field Fluorescence Microscopy of Yeast Cells and Image Analysis*

Fluorescence-labeled proteins were visualized using an Axioplan2 microscope (Carl Zeiss, Jena, Germany) with a 100× Plan-Neofluar oil immersion objective. Images were acquired using an Axiocam CCD camera and AxioVision software (Carl Zeiss). The same culture conditions, exposure times, and fluorescence intensities were applied to all the strains observed in this study to compare the degree of puncta intensity. Images were analyzed as described below.

For colocalization analysis to deduce van Steensel's CCF, fluorescent images obtained through green or red channels were analyzed with the plugin JACoP v4.0 analysis tool using FIJI (Image J). CCFs were calculated and presented as a bell-shaped plot.
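Van Steensel's CCF is obtained by shifting one channel pixel by pixel along the x-axis and recomputing Pearson's correlation coefficient at each shift; colocalized signals give a bell-shaped curve that peaks at zero shift. The dependency-free sketch below illustrates that idea on nested lists standing in for images. It is not the JACoP implementation: the function names, the toy images, and the choice to correlate only the overlapping columns at each shift are assumptions of this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat channel carries no correlation information
    return cov / sqrt(var_x * var_y)


def van_steensel_ccf(green, red, max_shift=3):
    """Map each x-shift dx to Pearson's r between green and red shifted by dx pixels."""
    width = len(green[0])
    ccf = {}
    for dx in range(-max_shift, max_shift + 1):
        g_vals, r_vals = [], []
        for g_row, r_row in zip(green, red):
            for x in range(width):
                if 0 <= x + dx < width:  # keep only columns present in both channels
                    g_vals.append(g_row[x])
                    r_vals.append(r_row[x + dx])
        ccf[dx] = pearson(g_vals, r_vals)
    return ccf


# Toy 5 x 9 images with one punctum at the same position in both channels.
background = [10] * 9
punctum_row = [10, 10, 10, 60, 200, 60, 10, 10, 10]
green = [background, background, punctum_row, background, background]
red = [row[:] for row in green]

ccf = van_steensel_ccf(green, red)
for dx in sorted(ccf):
    print(f"dx = {dx:+d}: r = {ccf[dx]:.3f}")
```

For these identical toy channels the curve peaks at dx = 0; a peak displaced from zero, or a flat curve, would indicate partial or no colocalization.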

All images obtained were measured and analyzed with the same optics, filters, and zoom settings throughout the study using FIJI (ImageJ) to quantify PB condensation.

The pixel intensity of each GFP signal in the cells was scaled from 1 to 255 to investigate the intensity of puncta per cell. Pixels of the top 0.05% signal intensities of each strain were segmented to perform particle analysis and determine the individual punctum strength to deduce the maximum intensity value of each punctum. The highest pixel value of each punctum was presented. The total number of cells, the number of cells with puncta, and the ratio of cells with puncta were calculated to analyze the puncta generated.
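The quantification steps described above (rescaling, segmenting the brightest pixels, and reading out the maximum of each punctum) can be sketched as follows. This is only an illustration in plain Python, not the FIJI particle analysis used in the study: the 8-connectivity rule, the handling of ties at the threshold, the function names, and the toy image are all assumptions. The toy image has only 36 pixels, so a much larger top fraction is used than the value given in the text.

```python
def rescale_1_255(image):
    """Linearly rescale pixel values to the integer range 1..255."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[1] * len(row) for row in image]
    return [[1 + round(254 * (v - lo) / (hi - lo)) for v in row] for row in image]


def top_fraction_threshold(image, fraction):
    """Intensity of the dimmest pixel among the brightest `fraction` of pixels."""
    flat = sorted((v for row in image for v in row), reverse=True)
    k = max(1, int(len(flat) * fraction))
    return flat[k - 1]


def punctum_maxima(image, threshold):
    """Group 8-connected pixels >= threshold and return the peak value of each group."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    maxima = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or image[y][x] < threshold:
                continue
            stack, peak = [(y, x)], 0
            seen[y][x] = True
            while stack:  # flood fill over one punctum
                cy, cx = stack.pop()
                peak = max(peak, image[cy][cx])
                for ny in range(max(0, cy - 1), min(height, cy + 2)):
                    for nx in range(max(0, cx - 1), min(width, cx + 2)):
                        if not seen[ny][nx] and image[ny][nx] >= threshold:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            maxima.append(peak)
    return maxima


# Toy 6 x 6 image: background 100, a two-pixel punctum and a one-pixel punctum.
raw = [[100] * 6 for _ in range(6)]
raw[1][1], raw[1][2] = 900, 700
raw[4][4] = 500

scaled = rescale_1_255(raw)
threshold = top_fraction_threshold(scaled, fraction=0.1)  # toy value for a tiny image
print(punctum_maxima(scaled, threshold))  # [255, 128]
```

The ratio of cells with puncta then follows by counting, per cell, whether the returned list is non-empty and dividing by the total number of cells.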

#### *4.5. Statistical Analysis*

Detailed statistics, including the mean values and standard deviations, are indicated in each figure legend. Statistical analyses were performed using GraphPad Prism 6 (GraphPad Software, Inc., La Jolla, CA, USA). A t-test was used to assess statistically significant differences. *p* < 0.05 (\*), *p* < 0.01 (\*\*), *p* < 0.001 (\*\*\*), and *p* < 0.0001 (\*\*\*\*) indicate statistical significance compared with the control. *p* > 0.05 indicates statistical non-significance (n.s.).
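The asterisk notation above is a simple threshold mapping, sketched here for clarity. The function name is hypothetical, and treating a value of exactly 0.05 as non-significant is an assumption, since the text only defines *p* < 0.05 and *p* > 0.05.

```python
def significance_label(p_value):
    """Map a p-value to the asterisk notation used in the figure legends."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    if p_value < 0.0001:
        return "****"
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "n.s."  # assumption: p == 0.05 is grouped with non-significant values


for p in (0.00005, 0.0005, 0.005, 0.03, 0.2):
    print(p, significance_label(p))
```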

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms23137380/s1.

**Author Contributions:** K.S. and Y.-J.C. conceived and designed the study. Y.-J.C. and Y.L. (Yujin Lee) performed the experiments. K.S., Y.-J.C. and Y.-H.L. contributed to the reagents/materials/analytical tools. K.S. and Y.-J.C. drafted the manuscript. Y.-H.L., Y.L. (Yuxi Lin), Y.H. and Y.-J.C. predicted the Nst1 domains. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Research Foundation of KOREA (NRF) (No. NRF-2017R1A2B4009785) via the Korean Government (MSIT). Y. Choi was partially supported by an NRF grant from the Korean government (NRF-2020R1A2C1102153). Y.H. Lee was supported by NRF grants funded by the Korean government (NRF-2019R1A2C1004954 and NRF-2022R1A2C1011793) and Korea Basic Science Institute grants (C220000, C230130, and C280320).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors deeply appreciate the kind gift of yeast strains from Parker (University of Colorado at Boulder).

**Conflicts of Interest:** The authors declare that there are no conflicts of interest.

#### **References**


## *Article* **BIAPSS: A Comprehensive Physicochemical Analyzer of Proteins Undergoing Liquid–Liquid Phase Separation**

**Aleksandra E. Badaczewska-Dawid <sup>1</sup> , Vladimir N. Uversky 2,\* and Davit A. Potoyan 1,3,\***

	- <sup>3</sup> Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
	- **\*** Correspondence: vuversky@usf.edu (V.N.U.); potoyan@iastate.edu (D.A.P.)

**Abstract:** The liquid–liquid phase separation (LLPS) of biomolecules is now recognized as the driving force for the biogenesis of numerous functional membraneless organelles and cellular bodies. The interplay between the protein primary sequence and phase separation remains poorly understood, despite intensive research. To uncover the sequence-encoded signals of proteins capable of undergoing LLPS, we developed a novel web platform named BIAPSS (Bioinformatics Analysis of LLPS Sequences). This web server provides on-the-fly analysis, visualization, and interpretation of the physicochemical and structural features for the superset of curated LLPS proteins.

**Keywords:** liquid–liquid phase separation; membraneless organelles; intrinsically disordered proteins; proteins with low complexity

#### **1. Introduction**

The spatiotemporal organization of biomolecules and biomolecular interactions is essential for the efficient regulation of cellular biochemistry. The underlying biophysical mechanism for membraneless compartmentalization is liquid–liquid phase separation (LLPS). In the past few years, the LLPS of biomolecules has become a unifying physical mechanism for understanding the principles of intracellular compartmentalization, the formation of membraneless organelles (MLOs), and gene regulation [1–14]. In the LLPS process, the relatively well-mixed solution of biomolecules separates into liquid droplets. The ability of proteins to phase separate appears to be encoded primarily in the peculiarities of their primary sequences, which often contain low-complexity regions and intrinsically disordered regions (IDRs) that are enriched in charged and multivalent interaction centers [6–8,10,11,13–19]. While some general sequence trends have emerged, the quantitative aspects of how amino acid sequences encode and decode phase separation still remain largely unknown [20–22]. This is because many different combinations of relevant interactions seem to be contributing to phase separation, without any one being universally necessary [23]. As a consequence (with a few exceptions [24–30]), mostly case-by-case studies of different sequences are performed, with the broader context of many findings, including their statistical significance, remaining unknown.

Following the statistical trends in PubMed, biological LLPS has been gaining widespread attention in the last two decades. The rapidly growing amount of data from both in vitro and in vivo experiments have systematically narrowed the range of the LLPS-promoting conditions [31]. From these studies, we know that the regulatory mechanisms of phase separation appear strongly context-dependent [31]. The key factors include: the physicochemical state of the protein (e.g., posttranslational modifications), the environmental conditions (e.g., temperature, pH), and the concentration of binding partners (e.g., proteins, nucleic acids, carbohydrates, lipids). Many among the recent hypotheses suggest the prevalence of: (i) electrostatics and π-stacking; or (ii) specific sequence decoration in charge or hydrophobicity; and (iii) the role of short sequential (e.g., GARs (glycine–arginine-rich) [32]) or structural (e.g., LARKS (low-complexity amyloid-like reversible kinked segments) [33]) motifs [15]. However, deciphering the interplay between sequence composition and phase separation turns out to be challenging.

**Citation:** Badaczewska-Dawid, A.E.; Uversky, V.N.; Potoyan, D.A. BIAPSS: A Comprehensive Physicochemical Analyzer of Proteins Undergoing Liquid–Liquid Phase Separation. *Int. J. Mol. Sci.* **2022**, *23*, 6204. https://doi.org/10.3390/ijms23116204

Academic Editor: Vitaly V. Kushnirov

Received: 1 May 2022 Accepted: 27 May 2022 Published: 31 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In recent years, several databases have emerged that collect LLPS-related protein sequence data and metadata, with prominent examples being PhaSepDB [25], PhaSePro [26], LLPSDB [27,28], and DrLLPS [29]. These databases collect and annotate partially overlapping sets of phase-separating protein sequences, including data on the experimental conditions and significant annotations. In particular, PhaSePro, LLPSDB, and a subset of PhaSepDB contain manually curated proteins, which are recognized for driving the formation of subcellular compartments.

The accumulation of high-quality datasets is certainly a necessary condition for making progress towards uncovering the driving forces of protein phase separation. However, one needs a biophysically motivated computational infrastructure to be able to harness the data from carefully and manually curated sets of phase-separating proteins for revealing the molecular features that determine protein phase separation. We argue that providing the concise but informative patterns of various features, all together horizontally stacked along the protein sequence, could improve the identification of the significant yet nontrivial correlations that contribute to the multivalent interactions. On the basis of these premises, we have developed a novel web platform named BIAPSS: Bioinformatics Analysis of Liquid–Liquid Phase-Separating Protein Sequences (available at https://biapss.chem.iastate.edu/ and last accessed on 31 May 2022). BIAPSS combines a high-throughput interactive deep sequence analysis with a comprehensive pre-parsed bioinformatics database containing a wide array of physicochemical and evolutionary features that are relevant for low-complexity, disordered, and ordered proteins. This platform provides scientists working in the field of biomolecular condensates with a versatile tool for the rapid and on-the-fly deep statistical analysis of LLPS-driver protein sequences.

#### **2. Results and Discussion**

#### *2.1. Introduction of BIAPSS*

Figure 1 represents the features included in the comprehensive BIAPSS analyzer. These features combine sequence composition and biophysical properties. The composition component is represented by: (i) the amino acid content, including frequencies and patterning (i.e., distribution and enriched regions); and (ii) the sequence complexity, which comprises the detection of low-complexity regions, repeats, short motifs, and the on-the-fly calculation of Shannon entropy. The biophysical component covers the physicochemical and structural properties. Specifically, we provide a set of residue-resolution patterns: polarity, hydrophobicity, aromaticity, charge induced interactions and hydrogen bonding. These properties, correlated with the experimentally confirmed LLPS regions, facilitate the identification of the nature and driving forces of interactions. The structural properties aid in filtering out the interactions involved primarily in stabilizing the structure or in identifying regions prone to disorder-to-order transitions.
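The Shannon entropy mentioned above as a measure of sequence complexity can be illustrated with a minimal sketch. The sliding-window scheme, the window length of 12 residues, and the function names are assumptions made for illustration; the text does not state the parameters used by BIAPSS.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the amino acid distribution in a segment."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_profile(sequence: str, window: int = 12) -> List[float]:
    """Entropy of every full-length window, indexed by the window start.

    The window length is an illustrative assumption, not a BIAPSS setting.
    """
    if window <= 0 or window > len(sequence):
        raise ValueError("window must be between 1 and len(sequence)")
    return [
        shannon_entropy(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]
```

Low values along the profile mark windows dominated by one or a few residue types, which is the kind of signal that a low-complexity region would be expected to produce.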

Previously, such structural switchers were recognized in low-complexity and internally disordered sequences that function via phase separation [34]. Thus, the collected molecular properties incorporate robust sequence-based predictions for the secondary structure, solvent accessibility, intrinsic disorder, and intramolecular contacts. Finally, the evolutionary context derived from the joint outcome of the HMMER-based analysis [35] and Pfam database search [36] highlights the location of functional domains, including those specialized in nucleic acid recognition. The other highly conserved short motifs or individual positions detected through the analysis of multiple sequence alignments may confirm that evolution deliberately preserves phase separation.

**Figure 1.** The comprehensive BIAPSS analyzer incorporates the compositional, evolutionary, physicochemical, and structural properties of LLPS proteins. All characteristics can be easily compared and correlated on the horizontally stacked multirow graphs. The interactive exploration helps to filter out sequence signals relevant for phase separation.

The comprehensive approach adopted in BIAPSS (Bioinformatics Analysis of LLPS Sequences) consists of integrating multiple third-party tools and high-performance computing, followed by in-house biostatistical analysis and the extraction of meaningful results (see the Materials and Methods section). The protocol was successfully applied to 501 proteins with experimental evidence of phase separation. Moreover, to the best of our knowledge, the resulting platform represents the broadest database of physicochemically characterized LLPS proteins (see Figure 2). In particular, the pool of entries completely covers the contents of several primary databases of curated LLPS deposits (PhaSePro, PhaSepDB.v1, LLPSDB), which collect annotations and experimental conditions. High interest in the phase-separation phenomenon has already spurred the growth of experimental data repositories. However, the deficiency of the computational infrastructure that targets the integrated biophysical and statistical analysis of phase-separating systems still hampers progress in the field. Therefore, in addition to the open access to the raw yet standardized and well-documented results of our extensive work, we have developed a web-based BIAPSS platform for interactive customized exploration and easy interpretation.

**Figure 2.** The BIAPSS repository collects the largest dataset of known LLPS proteins that have been identified from carefully curated primary LLPS databases. The computational framework starts from protein sequences downloaded from the UniProt database. The approach builds on three complementary components: (**a**) integration of metadata, annotation, and cross-links from the external databases; (**b**) comprehensive sequence-based bioinformatic analysis of evolutionary and biomolecular properties using state-of-the-art third-party software; and (**c**) meticulous physicochemical and compositional analysis and robust data integration using the in-house algorithms. The BIAPSS interactive web applications enable exploration through the distilled essence of the crafted characteristics.

As a user-friendly web server, BIAPSS (https://biapss.chem.iastate.edu/, last accessed on 31 May 2022) is intended as a central resource for the systematic and standardized statistical analysis of the biophysical characteristics of the known LLPS sequences.

The web service provides users with: (i) a database of the superset of experimentally evidenced LLPS-driver protein sequences; (ii) a repository of precomputed bioinformatics and statistics data; and (iii) two sets of web applications supporting the interactive analysis and visualization of the physicochemical and biomolecular characteristics of LLPS proteins.

The applications integrate the results from our comprehensive computational approach. The SingleSEQ module includes a residue-resolution biophysical analyzer for interrogating individual protein sequences. The complementary analyses are organized in nine web applications that toggle between a generalized summary view and details specific to a given characteristic. The latter allows users to correlate regions prone to phase separation with an array of physicochemical attributes, structural properties, detected domains, and various sequential or structural motifs.

Many characteristics provided by applications in the SingleSEQ pipeline are qualitative and show a profile or pattern of the feature along the amino acid sequence. Examples include distributions of physicochemical characteristics, such as polarity, hydrophobicity, charge, residues forming hydrogen bonds, and π-stacking. In these cases, the assignment is binary, and the numerical value is the percentage of residues in the sequence that meet the criterion. The second group of characteristics includes structural features predicted from the amino acid sequences with top-ranked tools. Here, examples include the secondary structure, the solvent accessibility, the tendency to disorder, and low-complexity regions. The visual representation assigns each position along the amino acid sequence a discrete consensus value (e.g., helical, extended, or coil for the secondary structure). The numerical value is the percentage of residues that meet the given criterion (e.g., % of helical). Figure 1 is a conceptual image; in the interactive graphs, each value is labeled with what it refers to. Furthermore, for those interested in in-depth analyses, the individual applications offer an on-the-fly exploration of the results from the original tools, which typically provide the fractional probabilities for each variant of a feature (e.g., p(helical), p(extended), p(coil)) for each position along the protein sequence.
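The relationship between per-position probabilities, a discrete per-position state, and the reported percentage can be sketched as follows. This is only an illustration under stated assumptions: picking the most probable state is a stand-in for whatever consensus rule BIAPSS applies across tools, and the function names and toy probabilities are hypothetical.

```python
from typing import Dict, List, Sequence


def consensus_states(probabilities: Sequence[Dict[str, float]]) -> List[str]:
    """Pick the most probable state at each position (an assumed rule)."""
    return [max(p, key=p.get) for p in probabilities]


def percent_in_state(states: Sequence[str], state: str) -> float:
    """Percentage of positions assigned to the given state."""
    if not states:
        return 0.0
    return 100.0 * sum(s == state for s in states) / len(states)


# Hypothetical per-residue probabilities for a four-residue toy sequence.
toy = [
    {"helical": 0.7, "extended": 0.1, "coil": 0.2},
    {"helical": 0.6, "extended": 0.1, "coil": 0.3},
    {"helical": 0.2, "extended": 0.1, "coil": 0.7},
    {"helical": 0.1, "extended": 0.3, "coil": 0.6},
]
states = consensus_states(toy)
helical_percent = percent_in_state(states, "helical")
```

The same `percent_in_state` helper covers the binary physicochemical features, where each residue either meets a criterion or does not.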

BIAPSS also includes the MultiSEQ module. One of its aims is to obtain insight into the overall characteristics of the sufficient nonredundant set of LLPS-driver protein sequences. The comparison to the benchmarks of various protein groups enables a statistical inference of specific phase-separating affinities. Finally, BIAPSS incorporates an extensive cross-reference section that links all entries to primary LLPS databases and other external resources, thereby serving as a central navigation hub for the phase-separation community. All the data used by BIAPSS are freely available for download as well-formatted files with detailed descriptions, facilitating rapid implementation in user-defined computational protocols. The long-term plan for BIAPSS is for it to serve as a unifying hub for the experimental and computational community. Thus, it provides a comprehensive set of analytic tools, biophysically featured data, and standardized protocols that facilitate the identification of the sequence signals that drive the LLPS, which altogether can support applications for designing new sequences of biomedical interest.

#### *2.2. Case Study and Tutorial: Fused in Sarcoma (FUS)*

To illustrate the practical utility of BIAPSS, we carefully interpreted the results for fused in sarcoma (FUS) (UniProt ID: P35637), which is a widely used model system to study biological phase separation [37]. We provide below the details of the BIAPSS-based analysis, combined with a handy tutorial on the BIAPSS functionalities.

Fused in sarcoma (FUS) is one of the earliest discovered biological systems that undergo self-organization by liquid–liquid phase separation (LLPS) [37]. Since then, the protein has been the subject of extensive experimental and computational research to understand the molecular mechanisms and interactions that drive this phenomenon. FUS can be found in the SingleSEQ module of the BIAPSS service by the UniProt identifier (P35637), the gene (FUS), or by using the "RNA-binding" search key (last accessed on 30 May 2022). The summary page contains a high-quality image of the experimentally confirmed cellular location (left panel in Figure 3). Due to its multifunctionality in RNA processing, FUS is mostly observed in the nucleus [38]. Under physiological conditions, low levels of the protein are distributed in the cytoplasm [39], where FUS transports and manages RNA through dynamic liquid-like subcellular compartments, such as ribonucleoprotein or stress granules [40]. However, the cytoplasmic concentration of FUS significantly increases when noxious mutations lead to aggregation [41].

This progressively aberrant process is manifested by neurodegenerative diseases in humans [41]. Although plenty of accumulated evidence points to the influence of distinct factors on the cellular behavior of FUS, its primary sequence still holds many cues. To frame the physicochemical properties of full-length FUS, we used the analytical approach offered by the SingleSEQ module of BIAPSS.

The average metrics, available in the *Summary* of the SingleSEQ module, indicate that the 526-residue-long sequence of FUS contains over 80% disorder and only 8% order. The solvent-accessibility predictions show the same aspect ratio between exposure and burial. The contents of aromatic, hydrophobic, polar, and charged residues are 10%, 42%, 40%, and 17%, respectively, with a slight excess of positive charge. Such a rough overview described by a set of averages gives some general insight into the protein properties, but it conceals some local distributions that are important for the identification of the preferential interactions.


**Figure 3.** The **left** panel shows the cellular location of FUS (image source: BIAPSS). The protein is predominantly located in the nucleus. The physiological low levels of FUS found in the cytoplasm typically self-organize into membraneless compartments, such as stress granules or ribonucleoprotein granules. The aberrant disease-related aggregates are mostly localized in the cytoplasm. The methylation (purple stars) of C-terminal arginines (green tail) in the wild-type FUS strongly promotes phase separation and gelation. The phosphorylation (magenta stars) of serine and threonine in the N-terminus (blue tail) dissolves liquid–liquid droplets. The tyrosine-to-phenylalanine mutants (yellow stars) in the N-terminus and hypomethylation of arginines in the C-terminus increase aggregation. The **right** panel shows the classification of intrinsically disordered ensemble regions (CIDER) [42] for the FUS sequence split into functional segments.

Therefore, we conduct a detailed analysis of the composition and complexity of the FUS sequence, and we present the resulting patterns in Figure 4. Compared to any reference set of proteins (use *Composition and Complexity* app), this one is extremely enriched in glycine, which makes up nearly 1/3 of the full sequence. Another 20% of the amino acid content consists primarily of serine and glutamine. Although the dominant content of these three amino acids suggests the generally low complexity of the sequence, their distribution along the sequence is strongly heterogeneous. Indeed, the calculated low information content of the sequence is mainly localized around the protein termini and clearly corresponds to three fragments with high glycine concentrations (LCR2: residues 164–267; LCR3: residues 370–420; LCR4: residues 454–507). These regions also accumulate the arginines almost exclusively, which, together with glycine, form a series of repeating RGG motifs that are known to bind RNA specifically [32]. Both serine and glutamine are mostly localized at the N-terminus, being more clearly clustered within LCR1 (1–163). LCR1 additionally gathers 24/35 available tyrosines, and, thus, it has a visibly distinct enrichment (SQYG) that is known to occur in prion-like domains (PLD) [43]. By using the *Domains, Motifs, Repeats* application, we also found that the remaining compositionally more complex regions of the C-terminus (I287-L365 and R422-D453) match the PF00076 and PF00641 Pfam domains (i.e., the RNA recognition motif (RRM) and RNA-binding zinc finger (ZnF), respectively). The robust predictions (for details, see Methods) unanimously show that RRM is a well-folded FUS domain, while the other fragments remain disordered.
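The two observations above (residue enrichment and the location of repeating RGG motifs) correspond to simple sequence operations, sketched here under stated assumptions. The function names and the toy sequence are hypothetical; the real FUS sequence (UniProt P35637) is not reproduced, and this is not the BIAPSS implementation.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def composition(sequence: str) -> Dict[str, float]:
    """Fraction of each amino acid type present in the sequence."""
    counts = Counter(sequence)
    return {aa: n / len(sequence) for aa, n in counts.items()}


def find_motif(sequence: str, motif: str = "RGG") -> List[Tuple[int, int]]:
    """1-based (start, end) positions of every motif hit, overlaps included."""
    # A zero-width lookahead lets consecutive hits share residues.
    pattern = re.compile(f"(?=({re.escape(motif)}))")
    return [
        (m.start() + 1, m.start() + len(motif))
        for m in pattern.finditer(sequence)
    ]


# Hypothetical 12-residue toy sequence, used only to exercise the helpers.
toy_sequence = "SYGQRGGRGGSY"
glycine_fraction = composition(toy_sequence)["G"]
rgg_hits = find_motif(toy_sequence)
```

Comparing such fractions against a reference proteome, as the *Composition and Complexity* app does, would additionally require the reference frequencies, which are not given in the text.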

**Figure 4.** Sequence composition and complexity of FUS. The upper panel (**a**) shows the amino acid (AA) content of the query sequence compared to the Eukaryota dataset (black indicates higher and red the lower content than the reference). The bottom panel shows the following information for a specific query sequence: (**b**) the patterning of enriched amino acids (S (magenta), Q (orange), Y (yellow), R (purple), G (cyan)); (**c**) low-complexity measures (a color scale corresponding to each amino acid) provided as regions of particular AA enrichments (LCR row), and the sequence information content (H(S) row, Shannon entropy); (**d**) consensus of predicted disorder regions (gray) and secondary-structure assignment (helix in green, strand in magenta, coil in light blue); (**e**) detected Pfam domains; and (**f**) evolutionary conservation derived from the multiple sequence alignment against UniRef50 (blue shades). The amino acid patterning section contains points corresponding to the locations of the most relevant serine phosphorylation sites (residues 30, 42, 54, 61, 84, 87) [37], arginine methylations (residues 216, 259, 407, 472, 473, 476) [44], and tyrosine mutations (residues 113, 122, 130, 136, 143, 149, 161) [44].

The seed MSAs prepared for FUS within the *Sequence Conservation* application further confirm that both domains are evolutionarily conserved members of Pfam families: RRM\_1 and zf-RanBP, respectively (see bottom rows in Figure 4). The visual inspection of the amino acid content and the distribution of FUS allows us to identify and isolate specific regions in the protein (Figure 5). Furthermore, we have performed a physicochemical featurization of these segments, using the *Chemical Properties Patterns* app, which reveals the preferred interactions when coupled with biomolecular conditions that are known from experiments.

The recent experimental reports show that the isolated prion-like domain (PLD) (residues 1–214, or even residues 1–163) can undergo self-organization, forming liquid droplets when kept at high protein levels or high salt concentrations [44,45]. This N-terminal fragment is enriched in amino acids whose side chains are multivalent, as shown in Figure 5. Thus, the dense pattern of polarity comes from the enrichment in S, Q, Y, where Y, Q, and G also provide π-electron centers for π–π stacking. Most of them can also act as both donors and acceptors of side-chain protons for hydrogen bonding (HB). In line with this, the intermolecular-interaction profiles derived from simulations of the 120–163 region indicate that the most frequent contacts follow the order QQ > QY > YY > SY, followed by other pairs of enriched amino acids [46]. All of these observations suggest that the homotypic phase separation of wild-type PLD monomers is driven by balanced contributions from hydrogen bonding and π-stacking. Indeed, several mutagenesis studies show that Y→A substitution disrupts phase separation by the removal of both components of the interaction, while Y→F mutants are significantly more aggregation-prone, due to the strengthening of the binding via tighter hydrophobic F-π-stacking at the cost of losing the HB contributions of polar tyrosine [44,46].

It is also worth noting that the PLD region is completely devoid of positive charge, with a minor net charge per residue of −0.01 (M1-S165: −0.012; and M1-G212: −0.024), which places it within the weak polyelectrolyte region on the CIDER diagram (right panel in Figure 3) [42]. However, an excess of serine and threonine in this region provides an ability to introduce a strongly negative charge through multiple phosphorylations. After phosphorylation, the dominant force becomes electrostatic repulsion, which is known to disrupt both phase separation and aggregation [37].

The central region of the PLD (residues 39–95) was proposed as the core of aberrant fibrils, which, in the solid state, forms structured cross-*β*-sheets [37]. The same structural properties have not been unambiguously confirmed in the condensed phase formed by liquid–liquid phase separation. However, our algorithms did detect, along this region, structural motifs known as low-complexity amyloid-like reversible kinked segments (LARKS) [33]. In our analysis, the most effective predictors of structural properties showed, for these motifs, some tendency towards an extended secondary structure and a slightly increased probability of burial (bottom panel in Figure 5; use *Secondary Structure*, *Solvent Accessibility* and *Structural Disorder* applications of the BIAPSS SingleSEQ module). Interestingly, the prediction in eight-letter notation detected a turn or bend within each of the structural motifs, which explains their flexible nature. These findings, together with the ambiguous experimental results, may suggest some variations in the structural state in the PLD core, and specifically the disorder-to-order transition driven by biomolecular conditions.
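The net charge per residue quoted for the PLD, and the effect of phosphorylation on it, can be illustrated with a minimal sketch. Several choices here are assumptions rather than statements from the text: K and R are counted as +1, D and E as −1, histidine is treated as neutral, and each phosphate group is given a charge of −2. The function name and the toy sequence are hypothetical.

```python
def net_charge_per_residue(
    sequence: str,
    n_phospho: int = 0,
    charge_per_phosphate: float = -2.0,
) -> float:
    """Net charge per residue, optionally with added phosphate groups.

    Assumptions: K/R = +1, D/E = -1, H neutral; the phosphate charge is an
    approximate, user-adjustable value.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    positive = sum(sequence.count(aa) for aa in "KR")
    negative = sum(sequence.count(aa) for aa in "DE")
    net = positive - negative + n_phospho * charge_per_phosphate
    return net / len(sequence)


# Hypothetical 10-residue polar fragment with a single acidic residue.
toy_fragment = "SYGQSYDSGQ"
ncpr_unmodified = net_charge_per_residue(toy_fragment)
ncpr_phospho = net_charge_per_residue(toy_fragment, n_phospho=2)
```

In this toy case, two phosphorylations shift the value from weakly to strongly negative, which mirrors the qualitative argument that phosphorylation makes electrostatic repulsion the dominant force.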

The remaining part of the FUS sequence, referred to as the C-terminus, contains two well-known domains (RRM and ZnF) and three glycine–arginine-rich regions (GARs), which are detected using the *Domains, Motifs, Repeats* application. All components are significant players in binding RNA. The zinc finger supports only the recognition of the specific GGU motif, while the RRM domain and RGG repeats are universal towards a variety of RNAs [47]. Both folded domains of FUS are much less polar than the PLD, as seen from the BIAPSS-based physicochemical features in Figure 5. They also have a lower content of side chains that are able to engage in π-stacking or hydrogen bonding. However, charged residues are fairly abundant in the composition of RRM and ZnF, which explains the functional role of electrostatic interactions in binding nucleic acids or stabilizing folds via salt bridges [48,49].

All three GARs are the least polar regions of the protein (see Figure 5; use *Chemical Properties Patterns* app of BIAPSS SingleSEQ module). The dense patterning of hydrophobicity arises from glycine excess. The rich π-electron-containing systems, other than aromatic side chains, originate mainly from the abundance of arginine's guanidino group. Arginine is also a source of excess positive charge at the C-terminus. The experimental studies consistently confirm that the isolated C-terminus does not undergo phase separation [44]. However, liquid–liquid droplets rapidly form when it is mixed with N-terminal monomers [44]. Moreover, the LLPS of full-length wild-type FUS is more robust than the heterotypic mixing of the N- and C-termini and the homotypic self-assembly of N-terminal monomers [44]. This suggests the higher priority of cation-π (R-Y) stacking over π-π (Y-Y) stacking, while both are reinforced by hydrogen bonds. Another experimental study showed that R→K mutants, which no longer have the ability of π-π-stacking but retain charge, can still undergo phase separation. In turn, R→A substitutions prevent phase separation because they lose the π-system, cation, and ability of side-chain hydrogen bonding. Interestingly, the recent report indicates that stacking interactions, including cation-π (e.g., RY, KF) and especially π-π (e.g., YY and RY, and even RQ), are most robust over a wide range of salt concentrations [45]. The hydrophobic contribution from π-electron-containing systems becomes the main force that strengthens the contact in high salt. In these conditions, the screening of usually dominant electrostatic contributions is significant. Surprisingly, changing the partitioning of the different forces makes the interaction of the two positively charged arginines attractive under these conditions [45]. The set of diverse chemical groups in arginine is a unique feature among the amino acid side chains. This high reactivity calls for precise regulation, and arginine can be tuned to a preferred state by posttranslational methylation.

*Int. J. Mol. Sci.* **2022**, *23*, 6204 9 of 19

**Figure 5.** Physicochemical and structural properties of FUS. The various characteristics are shown along the protein-sequence split on the N-terminus (blue) and C-terminus (green). The full-length N-terminus corresponds to a highly polar prion-like domain (PLD in gray). The PLD contains a core region (residues 39–95 in orange), in which multiple LARKS motifs were detected (orange bars in the MOTIFS row) and evidenced to form fibrils in a solid state. The C-terminus contains two well-folded domains detected by Pfam search (RRM in yellow, ZnF in gray), and three glycine–arginine-rich (GAR in cyan) regions. All components of the C-terminus are known to have a functional role in RNA binding. The following rows show the physicochemical patterning along the full-length FUS sequence, including charge (positive (magenta), negative (blue)), polarity, donor (blue)/acceptor (magenta)/both (purple) of side-chain proton for hydrogen bonding, π-electron-containing systems (blue), with separation of aromatic ones (dark blue), hydrophobicity (blue), predicted solvent accessibility (SA) in 3-letter notation (exposed (red), buried (green), medium (blue)). The zoom of the PLD core is shown at the bottom panel, where the SS and SA rows contain the predicted probabilities of secondary structure (helix (green), strand (red), coil (blue)) and solvent accessibility. The green arrows indicate the side chains buried in the fibril core, while the black frames highlight segments that form strands of a cross-β motif [37].

Thus, under physiological conditions, FUS is highly methylated [44]. This limits self-assembly via interactions with tyrosine and promotes a functional role of intermolecular interactions with other proteins and nucleic acids. Therefore, phase separation and the gelation of FUS can increase by the hypomethylation of arginines within RGG-rich regions or the insertion of additional ones into the C-terminus [44]. All of these findings come together to demonstrate the significant role of the arginine side chain in phase separation. Tyrosine and glutamine are similarly relevant.

#### *2.3. FUS, LLPS Regulated in the Context-Dependent Tuning of Preferred Forces*

Note that the original results generated for FUS (identifier: P35637) using the BIAPSS web platform are shown in Supplementary Materials. The findings described in the previous section are briefly summarized below. FUS is a predominantly disordered protein (80%), composed of two functional regions: the prion-like domain (residues 1–212) and the RNA-binding C-terminal fragment (residues 213–526). Surprisingly, although the sequence is more than 65% composed of only five amino acids (G >> S > Q > R~Y), their distribution is highly variable. In particular, low sequence complexity occurs mainly in three structurally flexible glycine–arginine-rich regions. They provide hydrophobicity, π-electron-containing centers, and a positive charge of the C-terminus. In contrast, the enrichment of the N-terminus in serine, glutamine, and tyrosine makes it strongly polar but negatively charged (due to a few aspartic acids), with numerous aromatic centers and side chains capable of hydrogen bonding. These very different physicochemical properties of the protein terminals are sensitive to environmental changes and allow for the context-dependent regulation of the protein's cellular behavior. In particular, arginine can be tuned to a preferred state by posttranslational methylation [44], while serine/threonine phosphorylation introduces a highly negative charge [37]. Both modifications prevent the formation of aberrant self-assembly. In the first case, the methyl groups hinder cation-π stacking between arginine and tyrosine, limiting the phase separation driven by contacts between the N- and C-terminals. In the second case, phosphorylation introduces strong electrostatic repulsion between the N-termini. It inhibits homotypic phase separation, driven by the π-stacking of aromatic tyrosines. However, both mechanisms have no effect under high salinity due to the significant screening of electrostatics. In such conditions, even interactions of arginines become attractive due to hydrophobic contributions and π-π stacking [45]. Therefore, these observations demonstrate that the peculiar physicochemical properties of amino acid residues play a significant role in phase separation. The multifunctional chemical groups of amino acids make them reactive and multivalent. These features aid in the context-dependent tuning between preferred modes of interactions. They can work synergistically or alternatively, and their regulation depends on the environmental conditions, the state of posttranslational modifications, and the presence of binding partners.

#### **3. Materials and Methods**

#### *3.1. Sequence Complexity and Physicochemical Decoration*

#### 3.1.1. Sequence Complexity

Low-complexity regions (LCRs) in proteins are compositionally biased fragments of sequences that often have low amino acid diversity and repeats of short motifs of the sequential or structural kinds. Many reports point to their functional or regulatory roles, frequently also associated with subcellular phase separation [19]. The LCRs of LLPS proteins have been detected by using several state-of-the-art tools, such as SIMPLE [50], CAST [51], fLPS [52], and SEG [53]. The original hits were parsed by in-house algorithms to merge overlapping regions enriched in different amino acids, and only the integrated and unified results have been kept.
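The merging step can be illustrated with a short sketch. The in-house algorithm is not described in detail here, so the code below is only one plausible reading: it assumes that each tool reports hits as 1-based inclusive `(start, end)` residue ranges and that ranges which overlap or directly touch are fused into one region. The function name `merge_regions` and the example hits are hypothetical.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end), 1-based inclusive residue positions


def merge_regions(regions: List[Region]) -> List[Region]:
    """Merge overlapping or directly adjacent regions into unified spans."""
    merged: List[Region] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps (or touches) the previous span: extend that span.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical hits reported by different LCR detectors for one sequence.
hits = [(10, 40), (35, 60), (61, 70), (120, 150)]
print(merge_regions(hits))
```

Whether adjacent (non-overlapping) hits should be fused is a design choice; dropping the `+ 1` in the comparison would merge only strictly overlapping ranges.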

Shannon Entropy describes the information content held in data and it is a frequently used measure of protein-sequence complexity. We implemented a module for the on-the-fly calculation of it within BIAPSS services. The typical window length for compositional effects is between 5 and 20. The results can be displayed in:

• Residue-resolution mode (residue option; smoother output):

$$\mathcal{S}_{(i)} = \frac{1}{N} \sum_{j=1}^{N} \mathcal{S}_{(j,N)}$$

where the Shannon entropy (*S*(*i*) ) at sequence position *i* is a sum of entropies at all windows containing this position, normalized by the window length (*N*);

• Window-resolution mode (block option):

$$\mathcal{S}_{(j,N)} = -\sum_{aa=1}^{20} f_{aa} \log_2(f_{aa})$$

where the Shannon entropy (*S*(*j*,*N*) ) at the *j*-th sequence window of the length (*N*) is summed over the fractions (*faa*) of 20 biogenic amino acids. The value is assigned to the center position within the window. The *S*(*j*,*N*) ranges from 0 (where only one residue is present within the sequence window) to log2(*N*) (all positions are different). Therefore, the lower the Shannon entropy, the less complex the sequence is.
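Both display modes can be sketched in a few lines. This is an illustration of the two formulas above, not the BIAPSS implementation: the function names are hypothetical, and the handling of sequence ends (averaging over however many windows cover a terminal position, rather than dividing by *N*) is an assumption, since the text does not specify it.

```python
import math
from collections import Counter
from typing import List


def window_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of one window."""
    n = len(window)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(window).values())


def block_entropies(seq: str, n: int) -> List[float]:
    """Window-resolution mode: one entropy value per window of length n."""
    if not 0 < n <= len(seq):
        raise ValueError("window length must be between 1 and len(seq)")
    return [window_entropy(seq[j:j + n]) for j in range(len(seq) - n + 1)]


def residue_entropies(seq: str, n: int) -> List[float]:
    """Residue-resolution mode: mean entropy of all windows covering each position."""
    blocks = block_entropies(seq, n)
    result = []
    for i in range(len(seq)):
        # A window starting at j covers position i when j <= i <= j + n - 1.
        lo = max(0, i - n + 1)
        hi = min(i, len(blocks) - 1)
        covering = blocks[lo:hi + 1]
        # Assumption: near the termini fewer than n windows exist, so we
        # average over the windows that are actually available.
        result.append(sum(covering) / len(covering))
    return result


# Toy sequence: a glycine run (low complexity) followed by five distinct residues.
print(block_entropies("GGGGGACDEF", 5))
```

For the toy sequence, the first window (`GGGGG`) has the minimum entropy of 0, and the last window (`ACDEF`) reaches the maximum for this window length, log2(5).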

#### 3.1.2. Physicochemical Decoration

To examine the physicochemical properties of LLPS-driving proteins, we identified, along each sequence, the patterns of polarity (Ser, Thr, Tyr, Gln, Asn, Cys, Met) and hydrophobicity (Gly, Ala, Val, Ile, Leu, Pro, Phe), and detected π-stacking centers (Arg, Asn, Asp, Gln, Glu, Gly), including those within aromatic rings (Phe, Tyr, Trp, His). Note that, due to the lack of a side chain, glycine can stack via π-electrons from a peptide bond and form hydrogen bonds via the backbone carbonyl or amide. We also provided the charge distribution split between positively (His, Lys, Arg) and negatively (Glu, Asp) charged residues. For each feature, both the arrangement along the sequence and the fraction of residues are provided.
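As a minimal sketch of how such a decoration can be computed, the snippet below encodes the residue groups listed above as one-letter codes and reports, for each feature, the fraction of residues and the arrangement along the sequence. The dictionary keys, function names, and the `+`/`-` pattern encoding are illustrative choices, not part of BIAPSS.

```python
from typing import Dict

# Residue groups as listed in the text, written in one-letter code.
CLASSES: Dict[str, str] = {
    "polar": "STYQNCM",
    "hydrophobic": "GAVILPF",
    "pi_other": "RNDQEG",   # pi-systems outside aromatic rings
    "pi_aromatic": "FYWH",
    "positive": "HKR",
    "negative": "ED",
}


def class_fractions(seq: str) -> Dict[str, float]:
    """Fraction of residues in the sequence that belong to each class."""
    return {
        name: sum(aa in members for aa in seq) / len(seq)
        for name, members in CLASSES.items()
    }


def class_pattern(seq: str, name: str) -> str:
    """Arrangement along the sequence: '+' where the residue is in the class."""
    members = CLASSES[name]
    return "".join("+" if aa in members else "-" for aa in seq)


# Short made-up fragment, used only to show the output format.
fragment = "GYSQRGGD"
print(class_fractions(fragment))
print(class_pattern(fragment, "pi_aromatic"))
```

Because the groups overlap (for example, glycine is counted both as hydrophobic and as a π-system), the fractions are not expected to sum to one.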

#### 3.1.3. Electrostatics

It is well established that electrostatic interactions often affect the solubility and stabilize the binding interface in the liquid–liquid demixing of biomolecules. The recently proposed charge-decoration parameters emerged as a measure of charge distribution along the protein sequence. In addition to the overall charge content, these descriptors are seen as important factors that shape the protein conformations, especially within low-complexity regions [54]. Following these discoveries, we calculated and compared the charge-decoration parameters; namely:


#### *3.2. HMMER-Based Sequence Conservation and Functional-Domain Detection*

The multiple sequence alignment (MSA) and consensus profile were prepared using the efficient HMMER method (*phmmer* + *hmmalign* and *hmmbuild*, respectively), which employs a probabilistic hidden Markov model (HMM) [35] and is significantly more accurate than BLAST-based searches. Because some of the LLPS sequences are highly unique (the detection of remote homologs is needed), and because the MSA is reliable only if at least several dozen homologous sequences are available, we used sequences selected from various UniProt subsets. Specifically, SwissProt, UniRef50, and UniRef90 differ in size and in the increasing sequence identity of their entries [57–59]. To identify sequence regions with significant evolutionary conservation, we derived three additional MSA-based parameters: strength, diversity, and character. The MSA strength of the sequence conservation indicates how strongly a specific position is preserved by evolution. This measure normalizes results from the *hmmlogo* tool to a discrete range from 0 (poorly conserved) to 5 (highly conserved). The *hmmlogo* tool computes letter heights along the sequence, depending on the information content of the position. The MSA diversity defines the number of different amino acids detected at a given position in the MSA, and is provided on a discrete scale from 0 (highly conserved) to 5 (poorly conserved) (0—one, 1—two, 2—three, 3—four, 4—five or six, 5—seven or more amino acids at the aligned position). The MSA character describes the chemical nature of the most common amino acids at a given position in the multiple sequence alignment. We distinguished the following attributes: polar, charged, aromatic, other π-system, hydrophobic, and other (G or P).
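The MSA diversity scale lends itself to a compact illustration. The sketch below maps the number of distinct amino acids in an alignment column to the 0 to 5 score defined above. It assumes that the alignment is given as equal-length strings, that gap characters (`-` and `.`) are ignored, and that an all-gap column receives 0; these conventions and the function names are assumptions made for the example.

```python
from typing import List


def diversity_score(column: str) -> int:
    """Map the number of distinct residues in a column to the 0-5 scale."""
    distinct = len(set(column.upper()) - set("-."))  # gaps are ignored (assumption)
    if distinct <= 4:
        # 1 -> 0, 2 -> 1, 3 -> 2, 4 -> 3; an all-gap column is scored 0 here.
        return max(distinct - 1, 0)
    return 4 if distinct <= 6 else 5


def msa_diversity(alignment: List[str]) -> List[int]:
    """Diversity score for every column of an alignment of equal-length rows."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and of equal length")
    return [diversity_score("".join(col)) for col in zip(*alignment)]


# Toy alignment with four rows and four columns.
toy_msa = ["MSGY", "MTGY", "MAG-", "MSGF"]
print(msa_diversity(toy_msa))
```

In the toy alignment, the first and third columns are invariant (score 0), the second contains three different residues (score 2), and the fourth contains two residues plus a gap (score 1).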

Some LLPS proteins are composed of one or more well-known domains. The identification of these functional regions alongside regions of low complexity or disorder can provide additional insights into the regulatory role of phase separation. Therefore, we have performed a Pfam search for all LLPS proteins, reporting the detected domains and incorporating the original Pfam seed-MSAs for corresponding regions of LLPS sequence (instead of full-length ones) to derive more reliable evolutionary conservation descriptors.

#### *3.3. Short Sequential and Structural Motifs Specific for LLPS Sequences*

Short linear motifs (SLiMs) are short fragments along the sequence, often situated in the intrinsically disordered regions, generally showing high structural flexibility and evolutionary conservation. We systematically detected various short sequential and structural motifs. The implemented algorithms used the list of grouped motifs' instances, defined by regular expressions, as the keys to search protein sequences prone to phase separation. Among motifs known from the literature as relevant for phase behavior, our analysis includes short structural stretches of protein sequence, such as LARKS [33] and steric zippers [60]; glycine–arginine-rich regions (GARs) [32]; and new sequential repetitive n-mers.
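A regular-expression search of this kind can be sketched as follows. The two patterns are deliberately simple placeholders (an RGG tripeptide and a tyrosine flanked by glycine or serine) and are not the motif definitions used in BIAPSS; the dictionary `MOTIFS`, the function `find_motifs`, and the test sequence are hypothetical. A lookahead group is used so that overlapping instances of a repetitive motif are all reported.

```python
import re
from typing import Dict, List, Tuple

# Placeholder motif definitions, for illustration only.
MOTIFS: Dict[str, str] = {
    "RGG": r"RGG",
    "GYG_like": r"[GS]Y[GS]",
}


def find_motifs(seq: str, motifs: Dict[str, str] = MOTIFS) -> Dict[str, List[Tuple[int, str]]]:
    """Return, per motif, the 1-based start position and matched text of each hit."""
    hits: Dict[str, List[Tuple[int, str]]] = {}
    for name, pattern in motifs.items():
        # Zero-width lookahead with a capture group finds overlapping matches.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in regex.finditer(seq)]
    return hits


# Made-up sequence containing both placeholder motifs.
print(find_motifs("GRGGRGGSYGSYS"))
```

Grouping the patterns in a dictionary keeps the motif catalogue separate from the search logic, so new motif families can be added without touching the function.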

#### *3.4. Structural Properties Derived from Sequence-Based Predictions*

Bearing in mind the predictive nature of sequence-based methods and, hence, their limited accuracy, comparing several of them and choosing the final consensus has proven to be successful in many approaches. In our study, we compiled predictions from at least three to six widely used tools for each biomolecular characteristic. While almost every method is available as a web server, due to the size and complexity of our analyses, we employed standalone versions. The raw data derived from these standalone tools during the high-performance computing were initially parsed, filtered, and simplified to a uniform CSV format, and deposited in our online repository at https://biapss.chem.iastate.edu/download.html, accessed on 1 April 2022.

#### 3.4.1. Secondary Structure

Protein secondary structure is a regular three-dimensional organization of local fragments along a polypeptide chain. The two most common secondary structural elements are alpha helices and beta sheets. Most of the predictors provide the secondary-structure assignment in 3-letter notation (ss3): H—helix, E—strand, C—coil, while the advanced ones (RaptorX and PORTER-5) also deliver more detailed 8-letter notation (ss8): H—α-helix; G—3₁₀-helix; I—π-helix; E—β-strand; B—β-bridge; T—HB-turn; S—bend; C—loop. In our benchmark study, we employed five well-established secondary-structure predictors: PSIPRED [61], RaptorX-SS8 [62], PORTER-5 [63], SPIDER-3-Single [64], and FELLS [65].
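One simple way to derive a consensus from several ss3 predictions is a per-residue majority vote, sketched below. The rule used by BIAPSS is not specified in this section, so both the voting scheme and the tie-break (falling back to coil when no single state wins) are assumptions, and the three prediction strings are invented.

```python
from collections import Counter
from typing import List


def consensus_ss3(predictions: List[str]) -> str:
    """Per-residue majority vote over ss3 strings (H, E, C) of equal length."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("predictions must be non-empty and of equal length")
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        top = max(counts.values())
        winners = [state for state in "HEC" if counts.get(state, 0) == top]
        # Assumption: without a unique winner, the position is reported as coil.
        consensus.append(winners[0] if len(winners) == 1 else "C")
    return "".join(consensus)


# Invented outputs of three predictors for a five-residue stretch.
predicted = ["HHHCC", "HHECC", "CHEEC"]
print(consensus_ss3(predicted))
```

With an odd number of predictors and three states, ties remain possible (for example, one vote each for H, E, and C), which is why an explicit tie-break is needed.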

#### 3.4.2. Solvent Accessibility

Solvent accessibility gives some insight into protein structural flexibility, indicating the exposed patches on the protein surface available for interactions with the solvent molecules. Some surface sites have high evolutionary conservation, which is suggestive of functional or structural importance. Since not many structures of phase-separating IDPs are known, the robust prediction of solvent accessibility can help to identify flexible regions prone to conformational changes upon binding. The assignment of solvent accessibility is usually provided in the 3-letter code: B—buried, E—exposed, M—medium. In our benchmark study, we employed three well-established solvent-accessibility predictors: RaptorX-Property [66], PaleAle 5.0 [67], SPOT-1D [68].

#### 3.4.3. Structural Disorder

The sequence-based predictions indicate regions of increased structural flexibility, usually estimating the disorder probability at a given position in the sequence. Detecting highly flexible regions may support the identification of short sequence stretches of multivalent interactions that can be relevant to phase separation. In our benchmark study, we employed seven well-established predictors of structural disorder: RaptorX-Property [66], IUPred2A [69], SPOT-Disorder [70], DISOPRED (v2 and v3) [71], and PONDR® (FIT, VLXT, VSL2) [72]. Most of these methods return the probability of disorder for each position in the sequence. Usually, the residue is considered as ordered when the score is below 0.5. The protein-binding regions in disordered fragments were estimated using the ANCHOR method [73].
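The 0.5 cutoff can be turned into disordered segments as in the sketch below. Averaging the per-residue probabilities of several predictors before thresholding is an assumption made for illustration (the text states only the cutoff), and the function names and the two probability profiles are hypothetical.

```python
from typing import List, Sequence, Tuple


def mean_disorder(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Per-residue mean of the disorder probabilities from several predictors."""
    if not profiles or len({len(p) for p in profiles}) != 1:
        raise ValueError("profiles must be non-empty and of equal length")
    return [sum(col) / len(col) for col in zip(*profiles)]


def disordered_segments(scores: Sequence[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """1-based inclusive (start, end) ranges where the score is >= threshold."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = i  # a new disordered stretch begins here
        elif start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(scores)))
    return segments


# Invented per-residue probabilities from two predictors.
profiles = [[0.9, 0.8, 0.4, 0.2, 0.7],
            [0.7, 0.6, 0.2, 0.4, 0.9]]
print(disordered_segments(mean_disorder(profiles)))
```

A residue scoring exactly 0.5 is treated as disordered here, consistent with the statement that residues are considered ordered only when the score is below 0.5.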

#### 3.4.4. Contact Map

Contact-map application provides a more reduced representation of a protein structure using a binary two-dimensional matrix of distances between all possible amino acid-residue pairs. The commonly used definition assumes the threshold 6–10 Å as the distance between the pair of two Cα or Cβ atoms being in contact. The contact number of protein residues limits the number of possible protein conformations and helps encode a three-dimensional structure. In our benchmark study, we employed three state-of-the-art predictors of intramolecular contacts: RaptorX-Contact [74], ResPRE [75], SPOT-Contact [68,76].
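For a structure with known coordinates, the binary contact map and the per-residue contact number follow directly from this definition, as sketched below. The 8 Å cutoff is one value from the 6–10 Å range quoted above, the coordinates are invented points standing in for Cα atoms, and the function names are hypothetical; the predictors listed above estimate such maps from sequence rather than from coordinates.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(coords: Sequence[Coord], threshold: float = 8.0) -> List[List[int]]:
    """Binary matrix: 1 where two atoms lie within `threshold` angstroms."""
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) <= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


def contact_numbers(cmap: Sequence[Sequence[int]]) -> List[int]:
    """Number of contacts per residue, excluding the trivial self-contact."""
    return [sum(row) - row[i] for i, row in enumerate(cmap)]


# Invented C-alpha positions: three residues close together, one far away.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(ca)
print(cmap)
print(contact_numbers(cmap))
```

Many analyses additionally ignore pairs that are close in sequence (for example, |i − j| < 3), since such residues are always near each other; that filter is omitted here for brevity.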

#### *3.5. Data Availability*

The UniProt IDs of LLPS sequences were collected as a joint superset of deposits from primary LLPS databases (i.e., PhaSePro (https://phasepro.elte.hu/, accessed on 1 April 2022), PhaSepDB.v1 (http://db.phasep.pro/, accessed on 1 April 2022), LLPSDB (http://bio-comp.org.cn/llpsdb/, accessed on 1 April 2022)). Then, protein sequences were taken from the UniProt database, available at https://www.uniprot.org/, accessed on 1 April 2022. The cellular location of the protein was derived via web scraping of primary LLPS databases, UniProt and COMPARTMENTS (https://compartments.jensenlab.org/, accessed on 1 April 2022). The following resources were reviewed for the corresponding entries of experimental or predicted three-dimensional structures: PDBe (https://pdbekb.org/, accessed on 1 April 2022), Swiss-Model Repository (https://swissmodel.expasy.org/repository/, accessed on 1 April 2022), ModBase (http://salilab.org/modbase-cgi/, accessed on 1 April 2022), and AlphaFold DB (https://alphafold.ebi.ac.uk/, accessed on 1 April 2022).

The results of our comprehensive analysis, performed on 501 proteins, are available at https://biapss.chem.iastate.edu/download.html, accessed on 1 April 2022. For each file, the details of its content and methods used are comprehensively described. These files are used directly as the input for the web applications of the SingleSEQ and MultiSEQ modules in the BIAPSS platform. The results of the analysis can be explored interactively online, saved as high-quality PNG images, and used directly as figures in the publication.

#### *3.6. Code Availability*

3.6.1. Phase-Separation Predictors:

1. PSPredictor, web server version available at http://www.pkumdl.cn:8000/PSPredictor, accessed on 1 April 2022.

3.6.2. Low-Complexity-Region (LCR) Predictors (Sequence-Based):

1. SEG, standalone (1999), available at https://ftp.ncbi.nih.gov/pub/seg/seg/, accessed on 1 April 2022;

2. fLPS, standalone Sep 2017, available at https://github.com/pmharrison/flps, accessed on 1 April 2022;

3. SIMPLE, standalone V6-6.1, available at https://github.com/john-hancock/SIMPLE-V6, accessed on 1 April 2022;

4. CAST2, web server version available at http://structure.biol.ucy.ac.cy/CAST2/index.html, accessed on 1 April 2022.

3.6.3. Multiple Sequence Alignment/Build a Profile/Conservation Logo:

1. HMMER (phmmer, hmmalign, hmmbuild, hmmlogo), standalone 3.3, available at http://hmmer.org/download.html, accessed on 1 April 2022;

2. Pfam, the database search was used to detect functional domains, available at http://pfam.xfam.org/, accessed on 1 April 2022.

3.6.4. Secondary-Structure Prediction (Sequence-Based):

1. PSIPRED, standalone 4.02, available at https://github.com/psipred/psipred, accessed on 1 April 2022;

2. RAPTOR-X, standalone Version ID: Rev: 37223, available upon request at http://raptorx.uchicago.edu/download/, accessed on 1 April 2022;

3. PORTER, standalone v5, available at https://github.com/mircare/Porter5/, accessed on 1 April 2022;

4. SPIDER, standalone v3, available upon request at https://sparks-lab.org/downloads/, accessed on 1 April 2022;

5. FELLS, standalone 2.0 (November 2016), available upon request at http://old.protein.bio.unipd.it/download/, accessed on 1 April 2022.

3.6.5. Solvent-Accessibility Prediction (Sequence-Based):

1. RAPTOR-X property, standalone v1.01 (October 2018), available upon request at http://raptorx.uchicago.edu/StructurePropertyPred/predict/, accessed on 1 April 2022;

2. PaleAle, standalone 5.0 (December 2019), available at https://github.com/mircare/Brewery, accessed on 1 April 2022;

3. SPOT-1D, standalone (July 2019), available upon request at https://servers.sparks-lab.org/downloads/, accessed on 1 April 2022.

3.6.6. Structural-Disorder Prediction (Sequence-Based):

1. RAPTOR-X property, standalone v1.01 (October 2018), available upon request at http://raptorx.uchicago.edu/StructurePropertyPred/predict/, accessed on 1 April 2022;

2. IUPred2A, standalone (November 2019), available upon request at https://iupred2a.elte.hu/download_new, accessed on 1 April 2022;

3. DISOPRED, standalone v2 and v3.1, available at https://github.com/psipred/disopred, accessed on 1 April 2022;

4. SPOT-Disorder2, standalone (February 2019), available upon request at https://sparks-lab.org/downloads/, accessed on 1 April 2022;

5. VSL2, standalone (November 2019), downloaded from http://www.dabi.temple.edu/disprot/download/VSL2.tar.gz (not available now), accessed on 1 April 2022;

6. PONDR-FIT, web server available at http://original.disprot.org/pondr-fit.php, accessed on 1 April 2022;

7. PONDR-VLXT, web server available at http://www.pondr.com/, accessed on 1 April 2022.

3.6.7. Contact-Map Prediction (Sequence-Based):

1. RAPTOR-X Contact, web server available at http://raptorx.uchicago.edu/ContactMap/, accessed on 1 April 2022;

2. ResPRE, standalone (November 2019), available at https://zhanglab.ccmb.med.umich.edu/ResPRE/download/ResPRE.zip, accessed on 1 April 2022;

3. SPOT-Contact, standalone v3 (June 2007), available upon request at https://sparks-lab.org/downloads/, accessed on 1 April 2022.

The raw data collected from the third-party software were parsed utilizing custom python/bash algorithms to provide the unified format and derive the consensus properties. The output files are available at https://biapss.chem.iastate.edu/download.html, accessed on 1 April 2022.

#### **4. Conclusions**

In conclusion, many proteins undergo liquid–liquid phase separation (LLPS), which drives the biogenesis of various membraneless organelles. The interplay between the protein sequence and the LLPS potential is poorly understood. The BIAPSS web platform, which provides the means for the analysis, visualization, and interpretation of data for LLPS proteins, is designed to uncover the sequence-encoded signals of LLPS proteins. This BIAPSS platform stands out as an efficient and user-friendly visualization framework that facilitates the integration and comparison of the physicochemical and structural features of the vast majority of known phase-separating proteins. With the rapid growth of experimental data on a single-case basis, we expect the increased need for computational infrastructure that consolidates some generalized insights. Hence, we have also developed a feature-rich module for analyzing multiple protein sequences. The interactive interface, with content-rich labels and tooltips, makes data exploration and interpretation easy. Both the web applications and raw datasets are broadly accessible on multiple operating systems and popular browsers. The presented case study of FUS shows that the BIAPSS-inferred biophysical regularities accurately identify regions prone to phase separation and facilitate the design of precise sequence modifications for various applications.

While the current version of BIAPSS enables the convenient and insightful analysis of large nonredundant and high-quality LLPS protein supersets, there are several directions through which our platform could expand to discover unknown LLPS proteins. These include analyzing the physicochemical and structural properties for customized protein sequences and introducing several LLPS indicators trained with machine learning to reveal coupled effects. The new functionality will find applications for flexible sequence redesign to introduce or modulate phase separation. Among many beneficial uses, it could tailor the properties of modern biomaterials or open up new directions in the development of medical therapies.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms23116204/s1.

**Author Contributions:** Conceptualization, A.E.B.-D.; methodology, A.E.B.-D.; software, A.E.B.-D.; validation, A.E.B.-D., D.A.P. and V.N.U.; formal analysis, A.E.B.-D., D.A.P., V.N.U.; investigation, A.E.B.-D., D.A.P. and V.N.U.; writing—original draft preparation, A.E.B.-D., D.A.P. and V.N.U.; writing—review and editing, A.E.B.-D., D.A.P. and V.N.U.; visualization, A.E.B.-D.; funding acquisition, A.E.B.-D. and D.A.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Roy J. Carver Charitable Trust through the Iowa State University Bioscience Innovation Postdoctoral Fellowship (to A.E.B.-D.) and the National Institute of General Medical Sciences of the National Institutes of Health [R35GM138243 to D.A.P.].

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available in this article and in Supplementary Materials.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

## **References**


## *Review* **Phase-Separated Subcellular Compartmentation and Related Human Diseases**

**Lin Zhang 1,†, Shubo Wang 1,†, Wenmeng Wang <sup>1</sup> , Jinming Shi <sup>1</sup> , Daniel B. Stovall <sup>2</sup> , Dangdang Li 1,\* and Guangchao Sui 1,\***


**Abstract:** In live cells, proteins and nucleic acids can associate together through multivalent interactions, and form relatively isolated phases that undertake designated biological functions and activities. In the past decade, liquid–liquid phase separation (LLPS) has gradually been recognized as a general mechanism for the intracellular organization of biomolecules. LLPS regulates the assembly and composition of dozens of membraneless organelles and condensates in cells. Due to the altered physiological conditions or genetic mutations, phase-separated condensates may undergo aberrant formation, maturation or gelation that contributes to the onset and progression of various diseases, including neurodegenerative disorders and cancers. In this review, we summarize the properties of different membraneless organelles and condensates, and discuss multiple phase separation-regulated biological processes. Based on the dysregulation and mutations of several key regulatory proteins and signaling pathways, we also exemplify how aberrantly regulated LLPS may contribute to human diseases.

**Keywords:** liquid–liquid phase separation (LLPS); membraneless organelles; phase-separated condensates; human diseases

#### **1. Introduction**

To organize complex biochemical reactions in a cellular environment, cells create compartments, or organelles. A compartment needs a boundary to separate it from the surroundings, and the components within it are mostly able to freely diffuse, so that biological processes can take place inside [1]. Many compartments, such as the endoplasmic reticulum and Golgi apparatus, are organelles surrounded by lipid bilayer membranes. However, many other cellular compartments are not restricted by any membrane, such as nucleoli, Cajal bodies, PML nuclear bodies, stress granules and germ granules [2–6]. In a cell, these compartments harbor a variety of biomolecules with specific functions in a spatiotemporally controlled manner to ensure undisturbed biological processes and fulfill designated cellular functions [7]. In the past decade, accumulating studies have suggested that a physical process, known as phase separation, can drive the assembly of these membraneless compartments. The concept that liquid–liquid phase separation (LLPS) may be generally involved in many cellular processes has been gradually uncovered and increasingly appreciated.

Phase separation is a common phenomenon in physics and chemistry: two liquids do not compatibly dissolve in a homogeneous liquid phase, resulting in a distinct phase–phase separation state. In other words, a uniformly mixed and supersaturated solution without further dispersion will spontaneously separate into a dense phase and a dilute phase that can stably coexist. The droplets or condensates produced by LLPS are different from ordinary droplets. For example, droplets composed of proteins and RNAs are not completely uniform, such as nucleoli with three layers regulating different stages of ribosomal biogenesis, but show the characteristics of liquid flow [8]. LLPS has quickly been accepted as a key and general mechanism underlying the creation of biomolecular condensates that can promote the formation of membraneless organelles to regulate various cellular functions and activities [9]. However, phase separation is highly sensitive to altered physical and chemical conditions. For example, many protein condensates are regulated by environmental factors that determine the strength and valency of intermolecular interactions, including temperature, pH, salt concentration, component concentration and composition [10]. A molecule may need to reach a threshold concentration to initiate LLPS, and even a small difference in temperature and protein, nucleic acid or salt concentration can lead to distinct outcomes [11]. Moreover, the presence of crowding molecules, such as polyethylene glycol (PEG), dextran and ficoll, can greatly enhance the process of LLPS [12]. In compositional studies of different membraneless organelles, proteins and nucleic acids may utilize multivalent interactions to form phase-separated condensates with designated physical and chemical properties different from the originally uniform cellular environment. Many key regulatory proteins have been reported to undergo phase separation, of which the dysregulation has been etiologically associated with the onset and progression of many diseases, such as amyotrophic lateral sclerosis (ALS), Alzheimer's disease, Huntington's disease and different cancers [13,14]. In the current review, we summarize recent studies of phase-separation-mediated compartmentation, and discuss how aberrantly regulated LLPS causes human diseases, especially neurodegenerative disorders and cancers.

**Citation:** Zhang, L.; Wang, S.; Wang, W.; Shi, J.; Stovall, D.B.; Li, D.; Sui, G. Phase-Separated Subcellular Compartmentation and Related Human Diseases. *Int. J. Mol. Sci.* **2022**, *23*, 5491. https://doi.org/10.3390/ijms23105491

Academic Editor: Vladimir N. Uversky

Received: 22 April 2022 Accepted: 13 May 2022 Published: 14 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **2. Biomolecular Condensates**

Biomolecular condensates are commonly present in live cells, and they have puzzled scientists for many decades as they attempted to elucidate their formation and functions. Phase separation provides a mechanism for the formation of these condensates that separate or isolate different molecules with related activities in defined compartments. It has also been proposed that the ability to undergo LLPS may be a universal property of proteins and nucleic acids under specific circumstances [15–17].

#### *2.1. The Molecular Features of Biomolecular Aggregates*

Many studies indicate that phase separation requires the establishment of an interactive network through multivalent protein molecules that are composed of multiple modular interactive domains and/or contain disordered regions [18]. The interactions include charge–charge, cation–π, π–π stacking and hydrogen bonds, involving both side chains and backbones of the proteins. For example, Nephrin, Nck and Neural Wiskott–Aldrich syndrome protein (N-WASP) can be assembled into a highly ordered and multivalent protein complex through the interactions between phosphorylated tyrosines of Nephrin and SH2 domains of Nck, and between SH3 domains of Nck and proline-rich motifs of N-WASP [19–21].

The phase separation phenomenon has unique physical characteristics, including fluidity, fusion and fluorescence recovery after photobleaching when fused with a fluorescent protein. Meanwhile, the formation of droplets is generally both concentration- and valence-dependent. Intrinsically disordered regions (IDRs) are characteristic features of many proteins with LLPS capability, and are often both necessary and sufficient for the formation of phase-separated condensates. IDRs usually have low complexity and contain homo-polymeric repeats of specific amino acids, such as glycine, serine, proline and glutamine, with strong self-sustaining aggregation potentials [22,23]. Recently, we reported that histidine clusters could decide the phase separation of several proteins, including YY1, HOXA1, FOXG1B, ZIC3 and HNF6 [24]. Several algorithms have been developed to help researchers predict IDRs in a protein [25,26]. However, not all sequences scored highly by the prediction software necessarily form phase-separated condensates [27]. Meanwhile, IDR mutations are causally related to various human diseases, such as cardiovascular disorders, cancers and neurodegenerative diseases [27,28]. Vacic et al. investigated about 100,000 annotated missense disease mutations and discovered that 21.7% of them were located in the IDRs [29]. Among these mutations, 20% led to disorder-to-order transitions, such as increased α-helical propensity, a proportion significantly higher than those of annotated polymorphisms and neutral evolutionary substitutions [29].
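The proportions reported by Vacic et al. can be turned into approximate absolute counts. The short sketch below does only that arithmetic; it assumes a round total of 100,000 mutations (the text says "about 100,000"), so the resulting numbers are estimates rather than figures taken from the original study.

```python
# Worked arithmetic for the proportions quoted above.
# Assumption: exactly 100,000 mutations (the source says "about 100,000").
TOTAL_MUTATIONS = 100_000
FRACTION_IN_IDRS = 0.217            # 21.7% of mutations map to IDRs
FRACTION_DISORDER_TO_ORDER = 0.20   # 20% of the IDR mutations


def idr_mutation_counts(total, frac_idr, frac_d2o):
    """Return (mutations in IDRs, disorder-to-order mutations) as integers."""
    in_idrs = round(total * frac_idr)
    disorder_to_order = round(in_idrs * frac_d2o)
    return in_idrs, disorder_to_order


in_idrs, d2o = idr_mutation_counts(
    TOTAL_MUTATIONS, FRACTION_IN_IDRS, FRACTION_DISORDER_TO_ORDER
)
print(f"Mutations in IDRs: ~{in_idrs}")
print(f"Disorder-to-order transitions: ~{d2o}")
print(f"Share of all mutations: {d2o / TOTAL_MUTATIONS:.1%}")
```

Under this assumption, roughly 21,700 mutations fall in IDRs and about 4340 of them (around 4.3% of all annotated mutations) would be predicted to cause disorder-to-order transitions.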

A classic example is the correlation between fused in sarcoma (FUS) mutations and neurodegenerative diseases, including ALS, essential tremor and rare forms of frontotemporal lobar degeneration [30]. FUS protein contains a prion-like domain that is intrinsically disordered and can form liquid compartments in both the nucleus and cytoplasm [31]. Multiple FUS mutants exhibit significantly reduced mobility and eventually cause prion-like propagation of proteinaceous aggregates in neurons and glial support cells, characteristic of ALS [32]. Another example is the MutL Homolog 1 (MLH1) protein that is essential in DNA mismatch repair. The residue V384 located in the disordered segment of MLH1 is the most common site of mutations. The mutant MLH1 (V384D) is associated with increased susceptibility to colorectal cancer and is prevalent in HER2-positive luminal B breast cancer [33,34]. Phase separation is also involved in the antiviral immune response against the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The nucleocapsid protein of SARS-CoV-2 may undergo LLPS with RNA and subsequently reduce Lys63-linked polyubiquitination and aggregation of mitochondrial antiviral-signaling protein (MAVS), which suppresses the innate antiviral immune response [35].

#### *2.2. Materials Properties of Phase-Separated Condensates*

LLPS contributes to the assembly of different membraneless organelles with different functional commitments in cells [36]. Whether a macromolecule can undergo phase separation depends on its concentration and property, as well as environmental conditions, such as pH, temperature, salt type and concentration. Meanwhile, phase-separated condensates formed under a particular physiological circumstance are accessible to various, but also selective, molecules in cells. The condensation process through the LLPS mechanism is generally reversible with a mobile liquid-like dense phase, and constant exchanges between the dense and light phases. However, the phase-separated condensates are subject to further transitions, such as gelation to form a hydrogel that is virtually irreversible under physiological conditions. Whether LLPS condensates remain in a liquid and mobile state or become gelatinous and even solidified is physiologically or pathologically relevant [1,22,37,38]. We have illustrated previously reported membraneless organelles with their subcellular localization and functions in Figure 1. Meanwhile, we also summarized their sizes, components, functions and related diseases in Table 1. Here, we discuss the formation, compositions and other properties of several membraneless organelles and key regulatory protein-mediated condensates in the context of human diseases.




**Figure 1.** Schematic diagram of membraneless organelles and their functions in a eukaryotic cell.

**Table 1.** Membraneless organelles and condensates assembled through the LLPS mechanism.

| Localization | Name | Alias | Size (nm) | Components | Functions | Diseases | References |
|---|---|---|---|---|---|---|---|
| Cytoplasm | P-body | GW-body, RNA processing body, decapping body | 100–300 | K63, TRAF6, Tob1, TUT4, NoBody, LSM1, GW182, DDX3, DDX6, XRN1, etc. | mRNA degradation, posttranscriptional gene silencing, response to stress, storage of translationally repressed mRNAs | viral infection, neurodegenerative diseases, autoimmune diseases | [39,40] |
| Cytoplasm | Stress granule | — | 1000–2000 | RBPs, non-RBPs, TDRD3, TDP43, G3BP1, eIF3, eIF4G, PABPC1, etc. | translational regulation, response to stresses, antiviral defense, store mRNA and proteins | amyotrophic lateral sclerosis, frontotemporal lobar degeneration, cancer, viral infection, inflammatory diseases | [5,41] |
| Cytoplasm | Germ granule | P-granule, chromatoid body, polar granule | 250–4000 | MEG-3, PGL, RNA, etc. | regulation of germ cell development and function, post-transcriptional regulation, response to stresses, cell division | Germ cell development | [42] |

PSG: Proteasome storage granule; HLB: Histone locus body; PML: Promyelocytic Leukemia; PcG: Polycomb group; CNB: SUMO-1 nuclear body; PNC: Perinucleolar compartment; OPT: OCT1/PTF/transcription.

#### 2.2.1. Stress Granules

Both stress granules and processing bodies (P bodies) are composed of RNA and protein molecules that drive the phase separation of these membraneless organelles. Stress granule formation is exclusively induced by stress signals imposed on the cells, while P bodies can be constitutively visible in many cell types, but their size and number may increase in response to stress [40]. Stress granules contain translation-initiation molecules, and P bodies harbor factors regulating mRNA degradation, but they share many common proteins related to RNA metabolism. Mechanistically, in response to certain stresses, translation initiation can be stalled and ribosomes will disassociate from mRNA, which is the so-called ribosome run-off phenomenon. The released mRNA binds to RNA-binding proteins (RBPs) that promote stress granule formation. The mechanism of constitutive presence of P bodies remains unclear, but the stress-induced retardation of translation preinitiation directly contributes to their increased size and number [40].

Dysregulation of stress granules and P bodies is causally related to different diseases. Stress granules are considered an adaptive response of cells to acute stress, and their formation, composition and life span are associated with cancers, heart diseases, neurodegenerative disorders, inflammatory diseases and viral infections [66]. The oncogenic process involves hypoxia, ER stress and osmotic alterations, all of which constitute signals that induce stress granule formation. Meanwhile, chemotherapeutic challenges can also induce the assembly of stress granules, which contributes to the development of chemoresistance and metastasis of cancer cells [66]. Thus, drugs, such as 15d-PGJ2 targeting the eukaryotic initiation factor 4A-I (eIF4A1) in the stress granules, can inhibit proliferation and induce apoptosis of leukemic and colorectal cancer cells [67]. Several neurodegenerative diseases are caused by dysregulated stress granules that generally exhibit increased formation or reduced dissociation compared with those in cognate normal cells. In particular, genetic mutations of certain RNA-binding proteins may impair stress granule assembly and composition, leading to neurodegenerative diseases. For example, an FMRP mutant with defective stress granule assembly represents an etiologic cause of the Fragile X syndrome with mild-to-moderate intellectual disability [68]. Mutations in other stress granule-associated RNA-binding proteins have also been discovered in Alzheimer's disease patients [69]. In addition, the neurons of Alzheimer's disease patients exhibited pathological aggregates formed by the nucleation of the proteins in stress granules, such as TIA1/R and G3BP1 [69]. During viral infection, many viruses can use a special viral protease to cleave essential stress granule proteins, which can circumvent the cellular defense against viral infection [70,71].

#### 2.2.2. P Bodies

As another type of cytoplasmic ribonucleoprotein granules, P bodies are relatively understudied for their relevance to human diseases, although current evidence strongly suggests their involvement in neurodegenerative disorders, viral infection and autoimmune diseases. Mutations of DDX6 disrupt P body assembly, which is causally linked to intellectual developmental disorders with impaired language and dysmorphic facies [72]. In response to infection by RNA viruses, the number and stability of P bodies may change, and their components may be recruited to viral replication centers, although the underlying mechanisms remain unclear [73]. In addition, autoantibodies against P body components have been reported to contribute to autoimmune diseases [74,75].

#### 2.2.3. Nucleolus

The nucleolus is an important membraneless organelle that consists of ribonucleoproteins and RNAs and is assembled in multilayers through the LLPS mechanism [55]. In the past century, the roles of the nucleolus in hosting RNA polymerase I-mediated transcription, ribosomal RNA (rRNA) modification and processing, and rRNA complex assembly have been gradually recognized. A nucleolus of a mammalian cell may contain several functional modules, each of which comprises three subcompartments or layers. From the interior to the periphery, the three layers include the fibrillar center, the dense fibrillar component and the granular component, responsible for different steps of ribosomal biogenesis [55]. The nucleolus is separated from other compartments of the nucleus; however, due to the membraneless status, the nucleolus harbors various contents that dynamically exchange with the remaining nuclear components. Therefore, nucleoli are important organelles for transient sequestration of crucial factors involved in various biological functions, including the responses to genotoxic and oxidative stress, heat shock, starvation, oncogenic insults and viral infection [55,76]. These stresses may affect the shape, size and number of nucleoli, and the diseased states can markedly alter nucleolar morphology. Interestingly, despite the relatively isolated compartment and spatially distinct layers of each nucleolus, spontaneous coalescence may occur when two nucleoli have intimate contact, resembling droplet fusion during LLPS. Meanwhile, many nucleolar proteins contain IDRs, which are especially enriched in positively charged arginine and lysine residues [77,78].

Dysregulation of the nucleolus may aberrantly change nucleolar morphology, size and number per nucleus, and is tightly linked to various diseases. Excessive production of ribosomes by nucleoli may drive oncogenic transformation. On the other hand, defective activity of ribosome biogenesis may cause a shortage of properly formed ribosomes, and even cause aberrant nucleolar hardening, leading to reduced rRNA and ribonucleoprotein processing. These kinds of ribosomopathies may eventually cause different diseases, such as muscle atrophy and the X-linked subtype of dyskeratosis congenita [79,80]. A hexanucleotide repeat GGGGCC (or G4C2) is present in an intron of the *C9ORF72* gene on chromosome 9, and its expansion can reach up to thousands of copies in ALS patients. Mechanistically, the expanded G4C2 sequence can generate arginine-containing toxic dipeptide repeats that promiscuously interact with the IDRs of RNA-binding proteins to form protein aggregates, and thus impair the dynamics of membraneless organelles, such as nucleoli, leading to the diseases [81]. In addition, the material state of the nucleolus is relevant to aging or longevity. Studies using *C. elegans* as a model revealed that both reduced rRNA production and knockdown of fibrillarin were associated with smaller nucleolar size and extended life span of the worm [82].

#### 2.2.4. Examples of Regulatory Proteins with LLPS Potential

Besides the reported membraneless organelles, many intrinsically disordered proteins, especially those with nucleic acid binding affinity, can form isolated compartments through the LLPS mechanism. When dysregulated, these condensates may undergo liquid-to-solid transitions, leading to various diseases [19,83].

The prion-like domains (PrLDs) have relatively low complexity, and are enriched in glycine and uncharged polar amino acids [84]. The PrLDs have been identified in about 240 human proteins, especially many RNA-binding proteins, such as FUS, EWSR1, TDP-43 and TAF15, that are etiologically related to several neurodegenerative diseases, including frontotemporal dementia and ALS.

The RNA-binding protein FUS has 526 amino acids and belongs to the FET (FUS, EWSR1 and TAF15) family. *FUS* was originally discovered to fuse with the *CHOP* gene, and the fusion oncoprotein promotes the development of round cell liposarcoma and myeloid leukemia [85]. In addition to an RNA-binding motif, FUS contains a highly conserved C-terminal nuclear localization signal (NLS) that may harbor various mutations discovered in patients [86]. The EWSR1 protein has a transcriptional activation domain at the N-terminus, and regulates gene expression, cell signaling, RNA processing and RNA transport. The chromosomal translocation between the *EWSR1* and *FLI* genes can produce an oncogenic fusion gene that accounts for about 90% of Ewing sarcomas [87].

Since the N-terminus of FUS contains the IDR, the FUS-CHOP fusion created more intensified nuclear puncta than FUS and CHOP alone, with incorporation of BRD4, a bona fide marker of super-enhancers. Similarly, LLPS is considered as a driving force for the *EWSR1-FLI* fusion gene to regulate transcription and initiate cell transformation [88].

#### *2.3. Regulation of Condensate Assembly*

The assembly and biophysical properties of LLPS condensates are precisely regulated by chaperone proteins, enzymes for post-translational modifications (PTMs) and other cellular factors [89].

#### 2.3.1. Effects of PTMs on Protein Phase Separation

Different PTMs, such as phosphorylation, acetylation, arginine methylation and SUMOylation that regulate protein–protein or protein–nucleic acid interaction strengths, are well-recognized key regulatory factors of phase separation. Furthermore, PTMs are engaged in the assembly and disassembly of condensates, as well as the regulation of their material properties. As a rapid and reversible process, phosphorylation is one of the most well-characterized PTMs modulating biomolecular phase transitions [90,91]. For example, in Alzheimer's disease, phosphorylation of Tau, a microtubule-associated protein, alters the charge distribution to promote its electrostatic interactions, leading to the formation of Tau aggregates [92]. Additionally, phosphorylation hinders tubulin assembly within Tau condensates. Previous studies indicated that neuronal loss and memory impairment were causally related to the presence of highly phosphorylated soluble Tau protein [93].

Phosphorylation of α-synuclein (α-syn) at Tyr39 (pY39) is enriched in patients with Parkinson's disease, and plays an important role in regulating the liquid–solid phase transition of α-syn [94]. pY39 can accelerate α-syn aggregation and inhibit its degradation through autophagy and proteasome pathways in cortical neurons. In general, α-syn phosphorylation may alter its fibril structure and exacerbate pathogenesis of Parkinson's disease [94,95]. As discussed above, FUS is a protein tightly related to neuronal degeneration diseases. FUS phosphorylation at its IDR could disrupt its phase separation and cytoplasmic aggregation, which reduces FUS-associated cytotoxicity [96], suggesting that FUS is a potential therapeutic target in the treatment of neurodegenerative diseases. In addition, the interactions between tyrosines in the IDR and arginines in the C-terminal regions of the FUS protein are crucial to its phase separation. The methylation of these arginines disrupts these interactions, leading to reduced FUS phase separation; however, hypomethylation of these arginines strongly promotes FUS phase separation and gelation, leading to the formation of immobile hydrogels stabilized by intermolecular β-sheets. The loss of FUS mobility causes impairment of neuron terminals and leads to the disease manifestation of frontotemporal lobar degeneration [97].

Polycomb repressive complexes (PRCs) are important regulators for gene repression during embryonic development and oncogenic progression [98]. In *C. elegans*, a polycomb protein SOP-2 functions as the counterpart of the human PRC1 complex to regulate *HOX* gene expression [99]. Qu et al. reported that SOP-2 contained an IDR and could form phase-separated droplets. Importantly, SUMOylation of SOP-2 at K453 and K594 could allow it to produce droplets with increased size and abundance, and slightly improved internal mobility compared to the droplets formed by the unmodified protein [100]. SUMO conjugation is likely essential for both phase separation and transcriptional regulation of SOP-2, because its SUMOylation is required for both its localization into nuclear bodies and physiological repression of the *HOX* genes [101].

Phase separation-mediated formation of membraneless organelles is cell-cycle-dependent. Most membraneless organelles are dissolved when the nuclear envelope breaks down during mitosis, but are reformed as mitosis is completed. The kinase activity of DYRK3 plays an important role in dissolving several types of membraneless organelles during mitosis [102]. In fact, DYRK3 has been demonstrated to cause the dissolution of stress granules upon stress relief [103], and this activity is dependent on DYRK3's association with HSP90. In the absence of the heat-shock protein, the inactive DYRK3 either stays in stress granules or undergoes degradation [104].

#### 2.3.2. Effects of Chaperones on Protein Phase Separation

Molecular chaperones play a key role in the assembly of phase-separated condensates. The historically recognized functions of chaperones are their abilities to promote correct protein folding and subsequently prevent protein aggregation into nonfunctional structures. A number of recent studies have revealed the activity of molecular chaperones, including several heat shock proteins, to regulate phase separation [105].

Chaperones regulate protein–protein interplay and assist in protein folding through directly interacting with their client proteins in an energy-consuming manner [97,106]. Molecular chaperones, including many heat shock proteins, are extensively involved in the maintenance of intracellular protein homeostasis. Previous studies indicate the presence of different heat shock proteins, such as HSP40, HSP70 and HSP90, in a variety of membraneless organelles. Gu et al. reported that classes I and II of the HSP40 proteins could undergo phase separation due to their contents of flexible regions enriched with glycine and tyrosine [107]. DNAJB1, a member of the class II HSP40 proteins, could form condensates in nuclear bodies. In response to stress, DNAJB1 can translocate into stress granules. Interestingly, when co-phase-separated with FUS, DNAJB1 can prevent FUS from forming amyloid fibrils in vitro and reduce aberrant FUS aggregation in cells [107]. As discussed above, hypomethylation of arginines in the C-terminus of FUS facilitates its phase separation and gelation. However, transportin 1 can serve as a chaperone protein of FUS to reduce its granule formation without affecting its methylation status, and eventually rescue attenuated protein synthesis caused by FUS aggregation in axon terminals [97].

As a canonical small chaperone, HSP27 localizes in stress granules. Due to the interaction with the IDR of FUS, HSP27 can reduce its LLPS. In addition, stress can induce HSP27 phosphorylation that subsequently promotes its co-phase separation with FUS. The presence of HSP27 can prevent FUS from forming amyloid fibrillar aggregates, and thus preserve its liquid phase [106]. Consistently, when mice of an Alzheimer's disease model were crossed with human *HSP27* transgenic mice, overexpressed HSP27 could rescue multiple neurodegenerative defects of the disease, including impaired spatial learning, increased neuronal excitability, reduced long-term potentiation, and widespread amyloid deposition in the brains [108].

As a histone chaperone, CAF-1 has LLPS properties and can form nuclear bodies through recruiting histone modifiers and other chaperones, which contributes to the establishment and maintenance of HIV-1 latency. Therefore, disruption of phase-separated nuclear bodies of CAF-1 can potentially reactivate latent HIV-1 to eradicate the viral reservoir caused by its latency [109].

#### *2.4. Functions of Phase-Separation Condensates*

LLPS has been reported to be involved in various biological processes and regulations. We summarize the LLPS-associated functions into the following four categories.

#### 2.4.1. Regulation of Biological Reactions

In cells, the coordinated processes of biochemical reactions benefit from both membrane-restricted and membraneless organelles. The membraneless particles or condensates formed by LLPS are rich in selective proteins and nucleic acids, increasing their local concentrations and subsequently accelerating biochemical reactions [38].

Strulson et al. mimicked the intracellular compartmentalization by partitioning RNA in an aqueous two-phase system established by PEG and dextran. The RNA molecules could show up to 3000-fold enrichment in the dextran-rich phase, and compartmentalization could enhance the rate of ribozyme cleavage by 70-fold [110]. The histone locus body (HLB) is an evolutionarily conserved nuclear body with enriched protein and RNA factors required for histone gene transcription and pre-mRNA processing [111]. In this liquid-like compartment, many factors, such as FLASH and U7 snRNP, essential and constitutive components in HLB, exhibit greatly increased concentrations over the levels in the exterior cellular environment [112].
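How strongly a solute is concentrated inside a condensate depends on both its partition coefficient and the fraction of the total volume occupied by the dense phase. The following is a minimal sketch of that mass balance, not the model used by Strulson et al.: the function name, the dense-phase volume fraction of 1% and the arbitrary concentration units are illustrative assumptions, and the partition coefficient of 3000 simply reuses the upper enrichment value quoted above. It does not attempt to reproduce the reported 70-fold rate enhancement.

```python
def phase_concentrations(c_total, partition_coefficient, dense_volume_fraction):
    """Split a solute between a dense and a dilute phase by mass balance.

    Uses c_total = phi * c_dense + (1 - phi) * c_dilute together with
    c_dense = K * c_dilute, where K is the partition coefficient and phi
    is the fraction of the total volume occupied by the dense phase.
    """
    if not 0.0 <= dense_volume_fraction <= 1.0:
        raise ValueError("dense_volume_fraction must be between 0 and 1")
    if partition_coefficient <= 0:
        raise ValueError("partition_coefficient must be positive")
    k = partition_coefficient
    phi = dense_volume_fraction
    c_dilute = c_total / (phi * k + (1.0 - phi))
    c_dense = k * c_dilute
    return c_dense, c_dilute


# Hypothetical inputs: K = 3000, dense phase occupying 1% of the volume,
# total concentration of 1.0 in arbitrary units.
c_dense, c_dilute = phase_concentrations(1.0, 3000.0, 0.01)
print(f"dense phase:  {c_dense:.1f} (arbitrary units)")
print(f"dilute phase: {c_dilute:.4f} (arbitrary units)")
```

With these assumed inputs, the local concentration in the dense phase is far higher than the bulk average, which is the qualitative point of the compartmentalization experiments; the actual gain in reaction rate also depends on factors this sketch ignores.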

MicroRNAs (miRNAs) can promote mRNA degradation and/or block translation through targeting the 3′-UTRs. In this regulation, the formation of a miRNA-induced silencing complex (miRISC) consisting of multiple proteins is crucial to the miRNA-mediated gene repression. AGO2 and TNRC6B are the core components of the miRISC. The glycine/tryptophan (GW)-rich domain of TNRC6B is an intrinsically disordered region that promotes phase separation through multivalent interactions with three tryptophan-binding pockets in the PIWI domain of AGO2 [113]. The phase-separation process can enrich both AGO2 and TNRC6B in the condensates, and sequester RNAs to be degraded, which accelerates AGO2-mediated deadenylation of target RNAs.

In addition to the compartmentalizing phenomena discussed above, many other LLPS-mediated membraneless organelles, such as Cajal bodies, nucleoli and PML bodies, can concentrate proteins and nucleic acids involved in different designated biological processes in a confined space, which can enhance both reaction rates and efficiency [114].

LLPS may also provide a platform that allows nascent proteins to quickly associate with their functional partners, which may determine their activities and destinies. Ma et al. reported the membraneless TIS granules formed by an RNA-binding protein TIS11B, which could partially cover the cytoplasmic side of the rough endoplasmic reticulum (ER) [53]. The integration of these TIS granules and the ER can generate subcellular compartments, termed TIS granule-ER, or TIGER, that construct an environment biophysically and biochemically distinct from the cytoplasm. The TIS granules can promote the association between the SET protein and membrane proteins to be translated, such as CD47 and PD-L1, through a mechanism in which the 3′-UTRs of the mRNAs of the membrane proteins facilitate the interaction between SET and CD47 or PD-L1. As a result of the SET binding, the cell surface expression of CD47 or PD-L1 can be significantly enhanced, which determines the cell identity. This discovery revealed an exciting notion that protein functions can be regulated by the lengths of the 3′-UTRs. In other words, proteins with the same amino acid sequence but encoded by mRNA isoforms with alternative 3′-UTR lengths may have different functions or subcellular localizations [115]. Therefore, 3′-UTRs may act as a medium or scaffold to nurture nascent proteins, and qualitatively change their properties and fates. Notably, it has been reported that over 50% of protein-coding genes can generate mRNA isoforms with alternative 3′-UTRs [116]. Whether the nurturing niche provided by the TIS granules or TIGER compartments can be generalized to the regulation of the functions, localizations or fates of other proteins, in addition to CD47 and PD-L1, is a very intriguing question and deserves future exploration.

#### 2.4.2. Regulation of Gene Expression

RNA polymerase II (Pol II) is responsible for the transcription of mRNAs and many noncoding RNAs, such as lncRNA and microRNAs. RNA Pol II has a highly conserved C-terminal domain (CTD) that contains 52 repeats of the YSPTSPS heptapeptide essential to polymerase activity [117]. The hyperphosphorylation of the CTD mediated by CDK9 can stimulate target gene transcription. As the kinase component of the positive transcription elongation factor b (P-TEFb), CDK9 can release the paused Pol II at a promoter periphery and facilitate its entry to the gene body, to achieve transcriptional elongation. CDK9 also regulates transcription termination through phosphorylating a Pol II-associated protein, SPT5, and promoting its interaction with the poly(A) site [118,119].
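To give a sense of how many phosphorylation targets the CTD offers, the sketch below builds an idealized CTD from 52 perfect copies of the YSPTSPS heptapeptide and counts the serine, threonine and tyrosine residues. The assumption that every repeat matches the consensus exactly is a simplification made for illustration, since individual repeats in a real CTD may deviate from it; the function names are hypothetical.

```python
HEPTAD = "YSPTSPS"   # consensus heptapeptide named in the text
N_REPEATS = 52       # number of repeats named in the text


def idealized_ctd(heptad=HEPTAD, n_repeats=N_REPEATS):
    """Build an idealized CTD in which every repeat matches the consensus."""
    return heptad * n_repeats


def count_phospho_acceptors(sequence, acceptors="STY"):
    """Count residues with hydroxyl side chains that kinases could target."""
    return {residue: sequence.count(residue) for residue in acceptors}


ctd = idealized_ctd()
counts = count_phospho_acceptors(ctd)
print(f"CTD length: {len(ctd)} residues")            # 52 x 7 = 364
print(f"Potential acceptor residues: {counts}")      # S: 156, T: 52, Y: 52
print(f"Total acceptor sites: {sum(counts.values())}")
```

Even in this simplified picture, the CTD presents 260 hydroxyl-bearing residues within 364 amino acids, which illustrates why it can be described as hyperphosphorylated.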

Lu et al. reported that a phase-separation mechanism is also critical for CTD hyperphosphorylation that activates RNA Pol II [117]. Despite the inclusion of a low-complexity region, the isolated CTD of RNA Pol II does not undergo phase separation by itself. However, the CTD can be trapped by the phase-separated condensates formed by the IDR of cyclin T1 that interacts with CDK9. Through this interaction, cyclin T1 compartmentalizes CDK9 and the CTD in restricted condensates to facilitate the hyperphosphorylation reaction of RNA Pol II. Additionally, the CTD can also bind to the low-complexity domains of transactivating proteins FUS, TAF15 and hnRNPA2 to form nuclear granules that promote transcription [120,121].

In the past few years, a rapid surge of studies has demonstrated that many transcription factors and coactivators are able to undergo phase separation that can help them create dynamic hubs, clusters or condensates to regulate target gene expression (Figure 2). Some of these condensates can be assembled into super-enhancers with many tandemly adjacent enhancers, each of which is typically 50 to 1500 base pairs in length [122]. The transcription factors, such as OCT4 and GCN4, harbor IDRs in their transactivation domains that can undergo phase separation to form clustered enhancers or super-enhancers and activate gene expression [122]. Meanwhile, many coactivators, such as BRD4, MED1 and p300, act as key components of the enhancer complexes that drive the expression of the master genes to determine cell identity or promote oncogenesis [11]. As we recently reported, the transcription factor YY1 has an IDR featuring an 11-histidine cluster. Deleting the histidine cluster or replacing it with 11 alanines abolishes YY1's ability to form nuclear puncta and even eliminates its predominantly nuclear localization. Through the phase-separation mechanism, YY1 compartmentalizes many coactivators, including p300, BRD4, MED1 and CDK9, to assemble clustered enhancers that activate *FOXM1* gene expression and contribute to mammary tumor formation in a mouse model [24].

*Int. J. Mol. Sci.* **2022**, *23*, x FOR PEER REVIEW 12 of 24

**Figure 2.** Regulation of gene expression by the phase-separation mechanism.
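The two YY1 mutants described above, one with the histidine cluster deleted and one with it replaced by alanines, can be mimicked in silico on any protein sequence. The sketch below is a toy illustration only: the sequence is invented and is not the real YY1 sequence, and the function names and the minimum run length of five residues are arbitrary assumptions.

```python
import re


def find_homopolymer_runs(sequence, residue="H", min_length=5):
    """Return (start, end) pairs (0-based, end exclusive) for runs of one residue."""
    pattern = re.compile(f"{re.escape(residue)}{{{min_length},}}")
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


def delete_runs(sequence, runs):
    """Remove each run, working from the C-terminal end so indices stay valid."""
    for start, end in sorted(runs, reverse=True):
        sequence = sequence[:start] + sequence[end:]
    return sequence


def substitute_runs(sequence, runs, replacement="A"):
    """Replace each run with an equally long stretch of another residue."""
    for start, end in sorted(runs, reverse=True):
        sequence = sequence[:start] + replacement * (end - start) + sequence[end:]
    return sequence


# Hypothetical toy sequence containing an 11-histidine cluster (not real YY1).
toy_protein = "MASGDT" + "H" * 11 + "GSEEK"
runs = find_homopolymer_runs(toy_protein)
print("histidine runs:", runs)
print("deletion mutant:    ", delete_runs(toy_protein, runs))
print("alanine substitution:", substitute_runs(toy_protein, runs))
```

The substitution mutant keeps the protein length unchanged, which helps separate the effect of losing the histidine side chains from the effect of shortening the disordered region.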

Another example is the coactivator YAP that can cause chromatin reorganization to activate its target genes. In this regulation, YAP forms phase-separated condensates to compartmentalize the transcription factor TEAD1 and other coactivators, such as TAZ. The YAP condensates in the nucleus consist of super-enhancers with an accessible chromatin structure [123].

Signal transducer and activator of transcription 3 (STAT3) is a transcription factor regulating the expression of a variety of genes involved in different biological processes. As a key regulator in the anti-cancer immune response, STAT3 can be activated by various cytokines. Aberrant activation of STAT3 has been observed in many cancers, making it a bona fide target in cancer therapies [124]. Early studies indicated that tyrosine phosphorylation of STAT3 stimulated by interleukin 6 could cause its translocation into the nucleus, where STAT3 was activated, bound to the enhancer elements of target genes, and formed nuclear bodies. Thus, it was proposed that the STAT3 nuclear bodies could either be directly involved in activated gene transcription or serve as reservoirs of activated STAT3 [125]. Recent studies revealed that the biomolecular condensates formed by activated STAT3 exhibited LLPS properties, suggesting that the phase-separation mechanism contributes to STAT3-mediated gene activation [51,126].

#### 2.4.3. Regulation of Viral Infection

Many studies have demonstrated the regulatory roles of LLPS in both the viral life cycle and virus–host interactions [17,127]. Viral proteins with IDRs can promote the formation of membraneless compartments used for the replication of viruses. These compartments are enriched with specific proteins and nucleic acids, and serve as "viral factories" for the replication, assembly and trafficking of viruses. The LLPS condensates are selective for the inclusion or exclusion of components to allow optimal viral production, and may also avoid the defense of the host immune system. For example, cells infected by negative-strand RNA viruses, such as rabies virus (RABV), rotavirus, vesicular stomatitis virus (VSV), Ebola virus, measles virus, influenza A virus and respiratory syncytial virus (RSV), may form cytoplasmic LLPS condensates that allow all the ribonucleoparticle (RNP) components and viral RNAs to be synthesized inside and assembled into viral particles [128–134]. A report by Fouquet et al. revealed that the phosphoprotein P, essential for viral transcription and replication of RABV, could shuttle between the cytosol and the Negri bodies formed by the virus, leading to the recruitment of focal adhesion kinase (FAK) and HSP70, two cellular proteins with proviral activities [135].

Viral protein-mediated LLPS can interfere with the functions of host cells through two mechanisms, either regulating the expression of cellular genes or modulating the activities of cellular proteins. The oncogenic effects of Epstein–Barr virus (EBV) can be used as an example of the first mechanism. EBV is a human virus with potent activities to induce malignant transformation of infected cells through the activation of both viral oncogenes and cellular proto-oncogenes [136,137]. EBNA2 and EBNALP are two EBV-encoded transcription factors that form nuclear puncta using their IDRs, leading to the formation of super-enhancers on the promoters of the oncogenes MYC and RUNX3 to promote their transcription and subsequent oncogenesis [138]. In contrast, the functional interplays between viral and cellular proteins in the context of LLPS have been relatively understudied [17,127]. The formation of fibrillar aggregates by viral proteins may exert various effects on host cells, including inhibition of key cellular processes, such as necroptosis, and sequestration of cellular transcription factors to block host cell RNA synthesis [17,139].

LLPS-related mechanisms not only mediate the impairments of infected cells caused by viruses, but also contribute to the defense system of host cells against viral infection. Human myxovirus-resistance protein A (MxA) is a cytoplasmic dynamin-family large GTPase with a molecular weight of about 70 kDa, and can be induced 50- to 100-fold when cells are treated with type I and III interferons [140]. MxA associates with the endoplasmic reticulum and Golgi apparatus, and exhibits antiviral activity against several RNA and DNA viruses. A study by Davis et al. demonstrated that MxA formed metastable membraneless cytoplasmic spherical or irregular bodies, filaments, or reticula with variable sizes. Importantly, in VSV-infected cells, the nucleocapsid protein of the virus could blend with the MxA condensates in cells showing a concomitant antiviral phenotype [141]. Similarly, Mx1, the murine ortholog of human MxA, could also form nuclear condensates when transfected into human cells. Interestingly, 20–30% of transfected cells also formed cytoplasmic giantin-based filaments, and these cells, but not the ones with only nuclear bodies, showed antiviral activity against VSV [142]. The mechanism underlying the antiviral effects of the cytoplasmic filaments formed by Mx1 remains unclear.

#### 2.4.4. Sequestration and Storage of Molecules

Cellular condensates work as compartments to selectively sequester biomolecules and stock them, which serves as an approach of resource conservation. For example, each proteasome consists of a catalytic core particle (CP) and a regulatory particle (RP). With yeast as a model, the proteasome holo-enzyme constituted by the CP and RP mostly stays in the nucleus in proliferative cells; however, in the quiescent state, they are transported into the cytoplasm and sequestered as protein condensates called proteasome storage granules (PSGs) [143]. The functions of PSGs include protecting yeast cells against stress and maintaining their fitness during aging [144]. When the cells exit quiescence, the PSGs will be disassembled and the proteasome will reenter the nucleus [145]. Furthermore, P bodies and stress granules are also able to sequester highly expressed mRNAs. Whether the stored mRNAs undergo translation or decay by individual cells in future can generate different phenotypes and improve their ability to withstand stress [146]. Meanwhile, P bodies and stress granules can also serve as protein quality control compartments that help cells to sequester misfolded proteins from the other cellular milieu [147].

Cellular condensates can also confiscate proteins to temporally curb their functions. For example, the death domain-associated protein (DAXX) is a chaperone of the histone H3.3 variant, and recruits HDACs to repress basal transcription [148]. Due to the interaction with PML, DAXX can be sequestered into the PML bodies to block its activity in repressing transcription, and sumoylation of PML is prerequisite for this process [148,149].

The nucleolus is a reputed storage apparatus in the nucleus and can sequester many regulatory proteins in response to different signals [150]. Many proteins involved in cell cycle progression, apoptosis and oncogenesis can be sequestered in nucleoli through different mechanisms. As a ubiquitination E3 ligase, MDM2 can be confined in nucleoli through its interaction with p14ARF or ATP molecules, which leads to p53 activation [151, 152]. Another E3 ligase, VHL, can also be sequestered in the nucleolus in response to reduced extracellular pH. This can prevent the ubiquitination and degradation of its substrate HIF in the presence of oxygen, and allow it to activate its target genes [153]. Other important regulatory proteins with reported nucleolar sequestration include MYC, hTERT and CDC14 [154–156].

#### **3. The Phase Separation of Proteins in Diseases**

Accumulating evidence suggests that aberrant assembly of condensates is associated with cancers [157]. Below, we employ several examples to discuss how dysregulated phase separation of key regulatory proteins may contribute to neurodegenerative diseases and cancers.

#### *3.1. LLPS and Neurodegenerative Diseases*

#### 3.1.1. FUS

As a multifunctional DNA- and RNA-binding protein, FUS has been reportedly involved in transcription regulation, RNA splicing, RNA transport and DNA damage repair [158]. The FUS protein has an N-terminal PrLD that is intrinsically disordered and critical to its phase-separated condensation [120,159]. The RNA-recognition motif (RRM) of FUS can bind to RNA molecules that promote FUS phase separation. Two domains are involved in FUS nuclear localization. First, the three RGG (arginine-glycine-glycine) repeats, designated as the RGG3 domain, can transport FUS from the cytoplasm to the nucleus. Second, the C-terminal proline-tyrosine (PY) domain is a PY-NLS that can also promote the nuclear transport of FUS, but it needs the assistance of the nuclear import receptor transportin, also known as karyopherin β2, to cross the nuclear pore complex [160,161]. Most FUS mutants showed impaired binding to the receptor transportin, leading to their increased cytoplasmic retention. The defective nuclear import of the FUS mutants causes their cytoplasmic aggregation in neuronal and sometimes glial cells, linked to disease pathogenesis, such as ALS [162].

ALS patient-derived mutations of G156E and R244C, located in or adjacent to the prion-like domain of the FUS protein, could convert its droplets to fibrous structures, which eventually form amyloid-like fibrillar aggregates and subsequently contribute to the protein misfolding diseases [31,163]. While the fusion of two adjacent wild-type FUS droplets could occur in seconds, the event would take many hours for the FUS(G156E) mutant [31]. Interestingly, the fibrillar aggregates of FUS(G156E) could act as seeds to efficiently induce the aggregation of wt FUS [163]. Both wt FUS and the G156E mutant could produce similar condensates in cells; however, in a rat model, FUS(G156E) mutant could create intranuclear inclusions in hippocampal neurons with cytotoxicity, likely due to the defects in regulating translation and RNA splicing [163].

It has been reported that the methylation of the arginine in front of the PY-NLS reduced FUS binding to the receptor transportin, and thus caused its cytoplasmic accumulation [164]. Interestingly, the arginine methylation of FUS also decreases its phase separation and stress granule association. Therefore, the NLS mutations of FUS in ALS patients not only weaken transportin-mediated nuclear import, but also abolish its arginine methylation, which promotes phase separation and stress granule formation of FUS [165]. Besides methylation, the PrLD of FUS can be phosphorylated by DNA-PK. The phosphorylated FUS protein exhibits reduced FUS phase separation and subsequently decreased aggregation tendency, which can ameliorate FUS-associated cytotoxicity [96].

#### 3.1.2. Tau

In 1975, Weingarten et al. isolated Tau as a protein essential for microtubule assembly [166], and subsequent studies identified this microtubule-associated protein as a regulator of axonal outgrowth and transport in neurons. Tau aggregation leads to the formation of intracellular fibrillary deposits that have been recognized as a hallmark of various neurodegenerative diseases, including Alzheimer's disease, frontotemporal dementia and Parkinson's disease, collectively known as tauopathies [167,168]. The intrinsically disordered property and phase-separation potential of Tau can be attributed to its high content of proline and glycine, and many polar and charged amino acids. The LLPS propensity of Tau is primarily controlled by the proline-rich domain in its middle region, which also contains many phosphorylation sites. Tau is a protein that harbors different posttranslational modifications, including phosphorylation, acetylation, glycosylation, glycation and ubiquitination [169]. Some of these modifications have been demonstrated to impact the LLPS of Tau through altering its net charge, conformation and interactions with other molecules. Hyperphosphorylation of Tau can promote the maturation of its condensates into insoluble amyloid-like fibrils contributing to the diseases [170]. Lysine residues are crucial for the LLPS of Tau, and thus their acetylation mediated by p300 and CBP can reduce its interaction with RNA and reverse its condensation [171]. Despite the repressive effects of acetylation on LLPS-mediated aggregation, acetylated Tau is associated with neurotoxicity because it shows dampened interaction with tubulin and impaired ability to promote the growth of microtubule filaments [172].

#### 3.1.3. TDP-43

TDP-43 was initially identified as a protein binding to a regulatory element in the long terminal repeat of HIV-1 and blocking the assembly of its transcription complex [173]. Other studies also revealed TDP-43 as an essential DNA/RNA-binding protein regulating RNA splicing [174]. Among ALS patients, 90–95% are sporadic, with mutations in the genes *C9ORF72*, *SOD1*, *FUS*, etc. Strikingly, about 97% of these ALS patients and 45% of FTLD patients exhibited TDP-43 aggregation, implicating its pathogenic role in causing the motor neuron diseases [175]. TDP-43 is one of the PrLD-containing proteins that are prone to aggregation. Either pre-mRNA alternative splicing or aberrant proteolytic cleavage of the full-length TDP-43 can generate the PrLD fragment, suggesting its high potential in forming aggregates [176,177].

Posttranslational modifications play a regulatory role in TDP-43 condensation. Although TDP-43 is predominantly nuclear, its phosphorylation is associated with cytoplasmic translocation, which can drive early pathology of the diseases [178]. Hyperphosphorylated TDP-43 tends to aggregate and generate inclusion bodies in the brains and spinal cords of the patients. Indeed, phosphorylation of S409 and S410 has been considered a signature for ALS pathological analysis [179]. TDP-43 acetylation reduces its RNA-binding affinity and promotes accumulation of insoluble, hyper-phosphorylated TDP-43, which resembles the pathological inclusions observed in ALS and FTLD [180]. Additionally, ubiquitination of TDP-43 by its E3 ligase Parkin does not show clear degradation-oriented effects, but instead causes its cytoplasmic accumulation to form insoluble aggregates [181]. In addition, TDP-43 aggregation is associated with its C-terminal domain consisting of a prion-like glutamine/asparagine-rich domain and glycine-rich region that drives LLPS [175,182].

#### *3.2. LLPS and Cancers*

#### 3.2.1. SHP2

Src homology region 2 domain-containing phosphatase-2 (SHP2) is a non-receptor protein tyrosine phosphatase (PTP), encoded by the *PTPN11* gene. SHP2 contains two SH2 domains, a central PTP catalytic domain and a C-terminal tail. The two SH2 domains, C-SH2 and N-SH2, serve as phospho-tyrosine-binding regions to interact with the substrates [183]. As a ubiquitously expressed protein, SHP2 regulates many signaling pathways involved in mitogenic activation, metabolic control, and transcription regulation [184]. Germline mutations of SHP2 account for 50% of Noonan syndrome cases and 90% of LEOPARD syndrome (i.e., Noonan syndrome with multiple lentigines) cases [185,186]. Somatic SHP2 mutations are significantly associated with different human malignancies [187].

The intramolecular interaction between the N-SH2 and PTP domains serves as a "molecular switch" to block the phosphatase activity of SHP2. This switch can be turned on by the N-SH2 domain binding to specific phospho-tyrosine sequences of upstream growth factor receptors and/or scaffold proteins, leading to SHP2 activation. Mutations of SHP2 may either abolish the autoinhibitory switch or impair its PTP activity, which cause either Noonan syndrome or LEOPARD syndrome, respectively [188].

E76 is the most frequently mutated site of SHP2 in human cancers and the mutations disrupt the inhibition of PTP domain by the N-SH2, while R498 mutations in SHP2's PTP domain are also commonly observed and associated with LEOPARD syndrome [189]. Interestingly, a recent study demonstrated that two disease-associated mutant proteins, SHP2(E76K) and SHP2(R498L), showed significantly increased tendency of droplet formation compared to the wild-type SHP2. Consistently, the two mutants also formed nuclear puncta in cells, but wild-type SHP2 did not [190]. However, unlike most previously reported proteins with LLPS capability, the SHP2 protein does not contain any IDR or repetitive multivalent modular domain. Interestingly, the catalytic PTP domain is also responsible for the phase separation of the SHP2 mutants. The mutations of N-SH2 enhance the PTP activity and subsequently promote ERK1/2 activation [190].

#### 3.2.2. YAP and TAZ

As downstream effectors of the Hippo signaling pathway, YAP (Yes-associated protein) and TAZ regulate many biological processes including cell proliferation, apoptosis and differentiation [191]. As transcription coactivators, unphosphorylated YAP/TAZ complexes can be translocated to the nucleus, and bind to the TEAD transcription factors that regulate the expression of several genes involved in cell proliferation and survival, such as MYC and BIRC5 [192]. In recent years, both YAP and TAZ have been demonstrated to undergo LLPS that plays an essential role in activating the expression of their target genes, subsequently promoting oncogenesis. The phase-separated condensates can help YAP and TAZ to compartmentalize transcription machinery, including BRD4, MED1, CDK9 and TEAD [193]. Noticeably, in the Hippo signaling pathway, LATS1/2 phosphorylate YAP at S172 and TAZ at S89 to increase their cytoplasmic retention, which can both inhibit the LLPS of YAP and TAZ and reduce their activity as coactivators [10,123,194]. It has been demonstrated that the Hippo pathway can be frequently inactivated through nonmutational mechanisms during oncogenesis [195], which may explain the consistent hyperactivation of the YAP and TAZ in various cancers [194].

#### **4. Conclusions and Future Perspectives**

In a cell, many different types of membraneless organelles or condensates exist and provide relatively defined but still dynamic compartments for various biological reactions or material sequestration to occur in an undisturbed fashion. Biomolecules, biochemical reactions and various biological regulations are not present or happen in a chaotic or random manner in the complex cellular milieu. The concept of phase separation that regulates the formation of these compartments is likely a general mechanism to restrict biomolecules into particular compartments for designated biological activities. The questions concerning how large biomolecules, especially proteins and RNAs, are self-organized and undergo LLPS, and how their phase-separation capability can be linked and contribute to their specific activities, have intrigued many researchers and attracted increasing interest. Through the research endeavors over the past two decades, we have gained extensive knowledge regarding the molecular features, assembly requirements and material properties of these membraneless organelles or condensates. We have also obtained many insights into phase-separation-regulated biological processes, including biological reactions, resource storage or sequestration, and gene expression. Importantly, dysregulated LLPS of different proteins due to mutations or aberrant posttranslational modifications is a cause of various human diseases, such as many neurodegenerative disorders and cancer. Despite the knowledge obtained from the reported studies related to LLPS in normal and diseased cellular conditions, many questions remain to be answered and fertile areas need to be explored. First, although IDRs are likely prerequisite elements for protein phase separation, there are still reported exceptions. Thus, how amino acid sequence and/or composition can precisely determine the LLPS properties of a protein needs to be further defined. Second, the sequences, secondary structures or other properties of nucleic acids involved in phase separation are still largely unexplored. Third, with neurodegenerative diseases as an example, the reasons accounting for the occurrence of phase-separated insoluble aggregates only observed in specific cell types deserve special investigation. Finally, we have only just begun to explore therapeutic applications that take advantage of the phase-separation mechanism for disease treatment.

**Author Contributions:** L.Z., S.W., D.L. and G.S. wrote the majority of the manuscript and prepared the figures and table. W.W. participated in writing a few sections of the manuscript, and extensively modified the figures. J.S. and D.B.S. contributed to preparing the scheme of the manuscript and extensively revised and worked on finalizing the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Fundamental Research Funds for the Central Universities (2572021BD03) to D.L., and the National Natural Science Foundation of China (81872293) to G.S.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Conflicts of Interest:** The authors declare no conflict of interest.


## *Article* **Free Cholesterol Accelerates Aβ Self-Assembly on Membranes at Physiological Concentration**

**Mohtadin Hashemi 1,† , Siddhartha Banerjee 1,2,† and Yuri L. Lyubchenko 1,\***


**Abstract:** The effects of membranes on the early-stage aggregation of amyloid β (Aβ) have come to light as potential mechanisms by which neurotoxic species are formed in Alzheimer's disease. We have shown that direct Aβ-membrane interactions dramatically enhance Aβ aggregation, allowing for oligomer assembly at physiologically low concentrations of the monomer. Membrane composition is also a crucial factor in this process. Our results showed that, apart from phospholipid composition, cholesterol in membranes significantly enhances the aggregation kinetics. It has been reported that free cholesterol is present in plaques. Here we report that free cholesterol, along with its presence inside the membrane, further accelerates the aggregation process by producing aggregates more rapidly and of significantly larger sizes. These aggregates, which are formed on the lipid bilayer, are able to dissociate from the surface and accumulate in the bulk solution; the presence of free cholesterol accelerates this dissociation as well. All-atom molecular dynamics simulations show that cholesterol binds Aβ monomers and significantly changes the conformational sampling of the Aβ monomer, more than doubling the fraction of low-energy conformations compared to those in the absence of cholesterol, which can contribute to the aggregation process. The results indicate that Aβ-lipid interaction is an important factor in the disease-prone amyloid assembly process.

**Keywords:** Alzheimer's disease; amyloid aggregation; lipid bilayer; cholesterol; time-lapse AFM imaging; molecular dynamics

## **1. Introduction**

The self-assembly of amyloid β (Aβ) is a process that results in the production of neurotoxic oligomer and fibrillar aggregates in Alzheimer's disease [1,2]. Understanding the mechanism by which these aggregates are formed has been the major focus of research in Alzheimer's disease and other fatal neurodegenerative diseases [3,4]. However, in the majority of in vitro studies, the Aβ concentrations used are several orders of magnitude higher than the physiologically relevant concentrations [5,6]; no aggregation is observed at the physiological low nanomolar concentration of Aβ. This suggests that the aggregation of Aβ in vivo utilizes pathways different from those probed by in vitro experiments.

Recently, an alternative aggregation mechanism has been discovered, allowing for the aggregation to occur at the physiologically relevant concentrations of Aβ [7,8]. This is the on-surface aggregation pathway, in which interactions with a surface act as a catalyst for the aggregation process. The model for the on-surface aggregation process suggests that the self-assembly of Aβ oligomers is initiated by the interaction of amyloid proteins with the cellular membrane. The membrane catalyzes amyloid aggregation by stabilizing an aggregation-prone conformation.

**Citation:** Hashemi, M.; Banerjee, S.; Lyubchenko, Y.L. Free Cholesterol Accelerates Aβ Self-Assembly on Membranes at Physiological Concentration. *Int. J. Mol. Sci.* **2022**, *23*, 2803. https://doi.org/10.3390/ijms23052803

Academic Editor: Andrea Cavalli

Received: 11 February 2022 Accepted: 1 March 2022 Published: 3 March 2022

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Cell membranes consist of a large variety of lipids, suggesting that numerous factors may contribute to the on-membrane aggregation of amyloids. Indeed, recent publications revealed the role of such lipids as cholesterol (Chol), sphingomyelins, and gangliosides on the formation of Aβ fibrils on membrane surfaces [9–11]. A very recent publication [12] demonstrated that Chol in the lipid bilayer significantly enhances the aggregation of Aβ(1-42) at nanomolar monomer concentration. Importantly, computer modeling showed that Aβ(1-42) has an elevated affinity to Chol-containing membranes, adopting a set of aggregation-prone conformations. These studies led to an aggregation model with membranes playing a critical role in triggering the aggregation process and hence, the disease state. Within this model, the membrane composition is a factor controlling the aggregation process, so a change in membrane composition can shift the ratio between monomeric and aggregated states of Aβ. This hypothesis is further strengthened by the data regarding the contribution of Chol, sphingomyelins, and gangliosides to the neurotoxicity of Aβ aggregates [13–15], which also highlights these lipids as prime candidates for possible disease defining parameters.

While phospholipids are the major constituent of the cellular lipid bilayer, Chol is the second most abundant lipid and provides stability to the cellular membrane. Importantly, recent findings show a higher level of plasma Chol in Alzheimer's disease patients compared to healthy controls [16]. Furthermore, Chol has been identified to be present in plaques in a 1:1 ratio with Aβ [17,18]. Other studies revealed that feeding a Chol-enriched diet to rats resulted in the enhancement of APP, Aβ, and p-tau in the cortex region, which was associated with cognitive problems [19]. In a different study, it was observed that a Chol-rich diet increased the brain Chol level and resulted in motor function impairment [20]. Furthermore, neuronal Chol content has been linked with age, with higher Chol concentration being found in mature neurons compared to younger ones [21]. Together these results clearly connect Chol with disease development; however, the molecular mechanism of how Chol affects disease development remains unknown.

Aggregates extracted from patient brains have revealed the existence of oligomer-lipid ensembles, pointing to possible direct interaction of free lipids with Aβ [22,23]. Additionally, recent studies [24] have reported assemblies of Aβ(1-42) monomers with Chol. These reports lead us to posit that free lipids affect the aggregation of amyloid proteins. Here we tested this hypothesis for free Chol and the aggregation of Aβ at the physiologically relevant nanomolar concentration. Time-lapse Atomic Force Microscopy (AFM) was applied to monitor the *in-situ* formation of Aβ(1-42) aggregates on supported lipid bilayers in the presence of free Chol. These studies revealed that Aβ(1-42) aggregates are formed more rapidly on the lipid bilayer in the presence of free Chol. Furthermore, the aggregation kinetics of Aβ in the presence of free Chol is greatest on bilayers containing Chol. Moreover, in the presence of free Chol, aggregates accumulate more rapidly in the bulk above the membrane bilayer. Altogether, these studies revealed a critical role of free Chol in the disease-prone aggregation of Aβ(1-42), suggesting that Chol can be a trigger of the aggregation process.

#### **2. Results**

#### *2.1. Rapid Appearance of Aggregates in Presence of Free Cholesterol*

The role of free Chol in the aggregation of Aβ(1-42) was investigated on a supported lipid bilayer surface. Briefly, a mixed lipid bilayer (PC-PS), containing 1-palmitoyl-2-oleoyl-glycero-3-phosphocholine (PC) and 1-palmitoyl-2-oleoyl-sn-glycero-3-phospho-L-serine (PS), was prepared as described earlier [12]. Then, 10 nM Aβ(1-42) monomer solution with and without 100 nM Chol was deposited on the bilayer and time-lapse AFM imaging was performed to visualize the on-surface aggregation process.

Figure 1a shows the lipid bilayer surface before the addition of Aβ solution. The surface is smooth and homogeneous, with no aggregate-like features or trapped vesicles, which is critical for monitoring the on-membrane aggregation events [25–27]. Aggregates were detected 1 h after the addition of the Aβ solution and continued growing in numbers in the subsequent time-points of 3 h and 5 h (Figure 1b,c). To quantify the aggregation process, the volume of the aggregates, at each timepoint, was measured (Figure 1d). The plot shows that the mean aggregate volume increases as a function of incubation time on the PC-PS lipid bilayer.

**Figure 1.** Aggregation of 10 nM Aβ(1-42), in the presence of 100 nM Chol, on PC-PS lipid bilayer. (**a**) AFM image of the bilayer surface before addition of Aβ(1-42)-Chol solution. (**b**,**c**) AFM images of the same area of the lipid bilayer 3 h and 5 h after addition of Aβ(1-42)-Chol solution. (**d**) Evolution of Aβ(1-42) aggregate volume with time. (**e**) Comparison of Aβ(1-42) aggregate volumes after 5 h incubation in the presence of PC-PS bilayer and PC-PS bilayer with Chol in solution. The volume of aggregates is significantly larger (*p* < 0.0001, *t*-test) in presence of free Chol.

As a control, we performed aggregation experiments by incubating 10 nM Aβ(1-42) on the PC-PS bilayer without Chol in solution. Comparison of the volume of aggregates formed after 5 h incubation, with and without Chol present in the solution, is shown in Figure 1e. It is evident that aggregates are significantly larger when free Chol is present in the solution during aggregation, compared to only the Aβ(1-42) in solution.

#### *2.2. Acceleration of Aβ(1-42) Aggregation by Cholesterol inside Membrane*

To understand if the bilayer composition is important during aggregation with free Chol in solution, we assembled a mixed bilayer with Chol, PC-PS-Chol bilayer, and followed the aggregation of Aβ in the presence of free Chol on this bilayer. Representative time-lapse AFM imaging data are shown in Figure 2 and Figure S1. Initially, the bilayer surface is smooth, Figure S1a. Aggregates appear within 30 min of Aβ-Chol solution addition; a few are highlighted with white arrows in Figure S1b. After 2 h of incubation, the lipid bilayer surface shows a significant number of large aggregates (Figure 2a). Quantitative volume measurements for the two time-points show the change in aggregate size (Figure S1c,d). The aggregate size increased approximately 4 times, from ~65 nm<sup>3</sup> to ~272 nm<sup>3</sup>, between 30 min and 2 h.

**Figure 2.** Aggregation of 10 nM Aβ(1-42) on PC-PS-Chol bilayer. (**a**) AFM image of the PC-PS-Chol lipid bilayer after 2 h incubation with 10 nM Aβ42 and 100 nM Chol in the solution. (**b**) AFM image of similar aggregation experiment as (**a**), except the absence of 100 nM Chol in the solution. (**c**) Comparison of the on-bilayer aggregate volumes in the two aggregation experiments. Data is the mean value of aggregate volumes, obtained through Gaussian fits. Presence of free Chol significantly increases (*p* = 0.001, *t*-test) oligomer volume. (**d**) Comparison of the number of aggregates formed on the lipid bilayers in the presence and absence of Chol in solution; presence of free Chol leads to significantly more oligomers (*p* = 0.003, *t*-test). For (**c**) and (**d**) the error bars represent the standard error of the mean.

We then performed aggregation experiments with only Aβ(1-42) in solution in the presence of a PC-PS-Chol bilayer, Figure 2b. Visually it is evident that a greater number of aggregates is present when free Chol is in the solution. Quantitative analysis of the two experiments shows that the volume as well as the total number of aggregates are significantly greater when Aβ(1-42) aggregates in the presence of free Chol in solution, Figure 2c,d.
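A two-sample comparison of this kind can be illustrated with a minimal sketch. The source reports a *t*-test on aggregate volumes but states neither the variant nor the raw data, so the example assumes Welch's unequal-variance form and uses made-up volumes; the function name `welch_t` and both sample lists are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the approximate degrees of freedom."""
    # Squared standard errors of the two sample means
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical aggregate volumes in nm^3 (illustrative only, not measured data)
with_free_chol = [250.0, 270.0, 290.0, 310.0, 260.0]
without_free_chol = [120.0, 150.0, 140.0, 130.0, 160.0]

t_stat, dof = welch_t(with_free_chol, without_free_chol)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A positive statistic here means the first group has the larger mean volume; converting it to a *p*-value requires the Student *t* distribution, which is left out to keep the sketch within the standard library.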

To validate the observations and to test whether Chol itself can form aggregate-like features on the bilayer surface, we performed time-lapse experiments on the PC-PS-Chol bilayer in the presence of Chol only. Figure S2a shows a large area of the bilayer surface prior to the addition of Chol solution. Figure S2b shows a zoom of the same area after 2 h incubation with Chol solution. Figure S2b,c shows another area on the bilayer surface after 2 h incubation with Chol solution; there are no aggregates or aggregate-like features on the surface of the bilayer. These observations demonstrate that the aggregates observed on the bilayer surface were indeed self-assembled Aβ(1-42) oligomers and that Chol inside the membrane works in synergy with free Chol, catalyzing the self-assembly of amyloid oligomers.

#### *2.3. Dynamics of Aβ(1-42) Aggregation in Presence of Free Cholesterol*

After 2 h of aggregation of the Aβ-Chol solution on the PC-PS-Chol bilayer, the surface is practically covered with aggregates (Figure 2a). However, at 3 h significantly fewer aggregates are observed (Figure S3a). While the number of aggregates decreases with increased aggregation time, their volumes increase (Figure S3b–d). Volume measurements after 1 h incubation show that the aggregate volumes are centered around 74 nm<sup>3</sup> (Figure S3b). As the aggregates become larger at 3 h, the distribution changes, and a peak around 293 nm<sup>3</sup> becomes prominent; larger aggregates also appear (Figure S3c). At the 4 h incubation point the aggregates are significantly larger, with a peak around 397 nm<sup>3</sup> (Figure S3d).

Previous studies [12] have shown that aggregates are capable of dissociating from the bilayer surface. Aggregates in the presence of free Chol show similar behavior, and the findings suggest that the presence of Chol in the solution accelerates the dissociation of aggregates. This phenomenon was tested by using AFM to characterize the accumulation of aggregates in the bulk solution above the bilayer. In these experiments, a solution of 10 nM Aβ(1-42) with 100 nM Chol was incubated on top of the PC-PS-Chol bilayer surface. At certain time intervals, an aliquot was taken from the bulk solution above the bilayer, deposited onto APS-functionalized mica, and characterized using AFM imaging. The data are assembled in Figure 3. Aggregates accumulated in the bulk solution above the bilayer were detected after 3 h (Figure 3a) and became more prominent after 6 h (Figure 3b). In contrast, control experiments conducted with Aβ(1-42) and Chol without the bilayer present show a negligible number of aggregates (Figure 3c). Volumes of the aggregates were also analyzed and show that the average size of the aggregates increases over time (Figure 3d). These results show that the aggregates that dissociate from the surface accumulate in the bulk solution, increasing the level of soluble aggregates. The data also show that, compared with the control experiments, in which 10 nM Aβ(1-42) and 100 nM Chol were incubated without the bilayer, the presence of the bilayer leads to a statistically significant increase in the accumulation of aggregates in the bulk solution.

**Figure 3.** Aβ(1-42) aggregate desorption from PC-PS-Chol lipid bilayer in the presence of free Chol. (**a**,**b**) AFM images of aggregates from aliquots taken from the solution above the PC-PS-Chol bilayer while 10 nM Aβ(1-42) and 100 nM Chol were incubating. Samples were taken 3 h and 6 h after addition of the Aβ(1-42)-Chol solution. (**c**) Comparison of aggregates after 3 h and 6 h incubation of Aβ(1-42)-Chol in the absence and presence of the PC-PS-Chol bilayer. Presence of free Chol significantly increases the number of desorbed oligomers; furthermore, the increase from the 3 h to the 6 h time point is also significant (*p* = 0.009, *t*-test). (**d**) Comparison of aggregate volumes formed in the presence of free Chol, depicted in (**c**).

#### *2.4. Computer Simulation of Interactions of Aβ(1-42) with Free Cholesterol*

We used all-atom molecular dynamics simulations to elucidate the interaction of free Chol with Aβ(1-42) monomers. Briefly, monomeric Aβ(1-42) was placed in an explicit water box, and NaCl ions were used to neutralize the system charge and keep the ionic strength at a physiologically relevant concentration, 150 mM. Aβ(1-42) was placed at 4 nm from a single Chol molecule. Dynamics of Aβ(1-42) without Chol was simulated as a control. Five replicas of each simulation system were run for 10 µs, yielding a cumulative simulation time of 50 µs for each system.

The Aβ(1-42) monomer shows a rough free energy landscape (FEL), calculated using dihedral principal component analysis of the concatenated dataset, when in the presence of a single free Chol molecule (Figure 4a). The FEL contains well-separated energy minima in three distinct areas: two small areas to the upper and lower left, and a single, large, rough area to the right. The 10 lowest energy minima are highlighted in Figure 4a, and the representative structure for each cluster of said minima is also presented, showing the Chol molecule. These 10 clusters represent ~45.6% of the conformations sampled during the simulation. The number of protein residues in contact with Chol, plotted versus the simulation time, is given for each individual simulation run in Figure S4a–e. It is evident that the Chol molecule does not simultaneously interact with many residues of Aβ(1-42) at any given time. In fact, the majority of interactions occur through contacts with single residues. Quantitative analysis of these data shows that specific regions of Aβ(1-42) are more likely to interact with the Chol molecule (Figure 4b). The contact probability for each residue, based on the combined 50 µs dataset, shows that residues 10 through 14 are most likely to interact with Chol, followed by residues 1–8 of the N-terminal region. Residues in the central hydrophobic region (CHC, residues 17–21) are also likely interaction partners, albeit with lower probability than the aforementioned regions.
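A per-residue contact probability of the kind described above is the fraction of trajectory frames in which any atom of a residue lies within a cutoff distance of any Chol atom. The sketch below illustrates this definition on plain coordinate lists. The 0.45 nm cutoff, the function name `contact_probability`, and the toy coordinates are assumptions made for illustration; the contact criterion used in the study is not stated here.

```python
import math


def contact_probability(protein_frames, chol_frames, cutoff_nm=0.45):
    """Fraction of frames in which each residue is within cutoff_nm of Chol.

    protein_frames: list of frames; each frame is a list of residues;
                    each residue is a list of (x, y, z) atom positions in nm.
    chol_frames:    list of frames; each frame is a list of (x, y, z)
                    positions of the Chol atoms in nm.
    cutoff_nm:      hypothetical contact cutoff (an assumption, not from the paper).
    """
    n_frames = len(protein_frames)
    counts = [0] * len(protein_frames[0])
    for residues, chol in zip(protein_frames, chol_frames):
        for i, atoms in enumerate(residues):
            # A residue is "in contact" if any atom pair is within the cutoff
            if any(math.dist(a, c) <= cutoff_nm for a in atoms for c in chol):
                counts[i] += 1
    return [count / n_frames for count in counts]


# Toy trajectory: 4 frames, 2 single-atom residues, Chol fixed at the origin
chol_frames = [[(0.0, 0.0, 0.0)]] * 4
protein_frames = [
    [[(0.3, 0.0, 0.0)], [(1.0, 0.0, 0.0)]],
    [[(0.3, 0.0, 0.0)], [(1.0, 0.0, 0.0)]],
    [[(0.3, 0.0, 0.0)], [(1.0, 0.0, 0.0)]],
    [[(2.0, 0.0, 0.0)], [(0.4, 0.0, 0.0)]],
]
probabilities = contact_probability(protein_frames, chol_frames)
print(probabilities)  # [0.75, 0.25]
```

In practice the coordinates would come from a trajectory analysis tool rather than hand-written lists; only the counting logic is shown.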

**Figure 4.** MD simulation of Aβ(1-42) interacting with Chol. (**a**) Free energy landscape based on dihedral principal component analysis of cumulative 50 µs simulation of Aβ(1-42) interacting with Chol. The 10 lowest energy minima are highlighted and the representative conformation of the Aβ(1-42) is shown. Percentages indicate the fraction of conformations relative to total number sampled during the simulations. Blue sphere denotes the N-terminus. (**b**) Average contact probability between residues of Aβ(1-42) and the Chol molecule.

The Aβ(1-42) monomer, in the absence of Chol, shows a dramatically different FEL (Figure 5), in which the deepest energy minimum is isolated and dominates by number of conformations (~11.7%), while the rest of the minima are scattered around a very rough area. Furthermore, the 10 lowest energy clusters represent only ~19.9% of the conformations sampled during the simulations. Comparing the evolution of secondary structure for the different simulations (Figures S5 and S6) shows that in both systems the Aβ(1-42) monomer is dominated by turn/bend conformations, with gradual increases in β-strand structure for each system. However, interactions with Chol seem to hinder the formation of long-lived β-strands, as in 3/5 of the simulations β-strands appear and disappear more rapidly than in the control simulations without Chol (Figure S5 compared to Figure S6).
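Free energy landscapes such as those in Figures 4a and 5 are commonly obtained by Boltzmann inversion of a two-dimensional histogram of the trajectory projected onto two principal components, ΔG = −k<sub>B</sub>T ln(P/P<sub>max</sub>). The sketch below shows that conversion only, under the assumption that the projections are already available as two lists; the bin count, function name, and toy data are hypothetical, and the temperature of 300 K is taken from the simulation settings.

```python
import math

K_B = 0.0019872041  # Boltzmann (gas) constant in kcal/(mol*K)


def free_energy_landscape(pc1, pc2, bins=50, temperature=300.0):
    """Return a bins x bins grid of free energies in kcal/mol.

    Each grid value is -kT * ln(count / max_count); bins that were never
    visited are returned as None because their free energy is undefined.
    """
    lo1, lo2 = min(pc1), min(pc2)
    width1 = (max(pc1) - lo1) / bins or 1.0  # guard against zero range
    width2 = (max(pc2) - lo2) / bins or 1.0
    counts = [[0] * bins for _ in range(bins)]
    for x, y in zip(pc1, pc2):
        i = min(int((x - lo1) / width1), bins - 1)  # clamp the upper edge
        j = min(int((y - lo2) / width2), bins - 1)
        counts[i][j] += 1
    max_count = max(max(row) for row in counts)
    kt = K_B * temperature
    return [[-kt * math.log(c / max_count) if c else None for c in row]
            for row in counts]


# Toy projections: one basin visited 4 times, another visited 2 times
pc1 = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
pc2 = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
fel = free_energy_landscape(pc1, pc2, bins=2)
print(fel[1][1])  # kT * ln(2) above the global minimum
```

With this convention the most populated bin defines the zero of free energy, so a basin visited half as often lies kT ln 2 (about 0.41 kcal/mol at 300 K) higher.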

**Figure 5.** MD simulation of Aβ(1-42) monomer. Free energy landscape based on dihedral principal component analysis of cumulative 50 µs simulation of Aβ(1-42) monomer. The 10 lowest energy minima are highlighted and the representative conformation of the Aβ(1-42) is shown; colors indicate degree of fluctuation in structure, with red being highly conserved regions and blue being highly dynamic regions. Percentages indicate the fraction of conformations relative to total number sampled during the simulations. Blue sphere denotes the N-terminus.

#### **3. Discussion**

In our previous study, we showed that the presence of Chol in the lipid bilayer facilitates aggregation of Aβ(1-42), leading to rapid formation of aggregates [12]. The number of aggregates formed in the presence of Chol-containing bilayers was 6 times greater than on bilayers devoid of Chol. These results revealed the critical role of Chol in the aggregation process. Here, we have shown that free Chol, in addition to Chol inside the lipid bilayer (PC-PS-Chol), has an accelerating effect on Aβ(1-42) aggregation.

The results unambiguously show that free Chol can further accelerate Aβ(1-42) aggregation, as the size and number of aggregates formed in the presence of free Chol are greater than in the experiments where it is absent (Figures 1 and 2). This enhanced effect of free Chol indicates the possibility of direct interaction between Chol and Aβ(1-42). Several studies, among them [28], have shown this type of direct binding. NMR studies have revealed Chol-binding regions of C99, which is the source of Aβ peptide generation due to the action of γ-secretase. The region encompassing residues 18–40 of Aβ(1-42) is observed to interact with Chol [29]. Furthermore, insertion studies of peptide fragments of various lengths, such as Aβ(17-40), Aβ(22-35), and Aβ(25-35), have shown that fragments containing residues 25–35 successfully penetrated the Chol-containing monolayer [30].

The findings on direct binding of free Chol to Aβ monomers are in line with our all-atom simulations (Figures 4 and 5). Moreover, the energy landscapes qualitatively support the observation of increased dynamics in the Aβ molecule in the presence of Chol (Figure 4). The presence of Chol dramatically increases the sampling of the free energy landscape but, more importantly, also increases the number of sampled low-energy conformations. The 10 lowest energy minima sampled by the Aβ(1-42) monomer in the presence of Chol make up almost 46% of the total conformations sampled during the 50 µs cumulative simulations. In contrast, in the absence of Chol, the 10 lowest minima make up almost 20% of the sampled conformations. This acceleration of the conformational search may be the key to how Chol affects the aggregation. Indeed, comparing interactions with membranes with and without Chol showed that the Aβ(1-42) monomer experiences a similar increase in sampling when Chol is present in the membrane [12]. Additionally, the affinity of the monomer for the membrane is also changed by Chol [12,31]. Furthermore, the simulations show that dimer formation on membranes with Chol inside occurs almost 2× faster than on a similar membrane without Chol [12]. The effect of Chol on the free energy and conformational sampling has also been reported for Aβ dimers and trimers [32]. In addition to significant changes to the FEL, the authors report that the presence of Chol induces greater β-structure content in the dimers and trimers of Aβ(1-42); they also report that the dimer-to-trimer change in β-structure is significant when Chol is present, going from 26% to 41% [32]. The discrepancy in the fraction of β-structure between monomer and oligomers can be explained by data obtained by Ono et al., in which different pure oligomers of defined sizes were compared [33]. They reported that oligomer size has a significant effect on the structure and that there is a significant alteration of the Aβ structure going from monomer to dimer.

Our results, demonstrating the accelerating effect of free Chol on Aβ(1-42) aggregation, directly suggest that interference with or blocking of the Chol-Aβ interaction may suppress spontaneous self-assembly of the protein and thereby reduce the early-stage toxic oligomers. Studies following this line of thought have shown promising results. Bexarotene, which binds to the Chol-binding domain of Aβ, competes with Chol for Aβ [34,35]. Treatment with nanomolar concentrations of bexarotene prevented Aβ oligomer-induced Ca<sup>2+</sup> flux. These data indicate that the prevention of direct interaction of Chol with Aβ can significantly reduce the toxicity caused by the oligomers [34].

One of the important findings in the present study is the increased aggregate dynamics caused by the presence of free Chol (Figure 3 and Figure S3). The data show that, although aggregates are rapidly formed on the surface, they are not firmly attached to the bilayer and can easily leave the surface spontaneously. This hypothesis is supported by a gradual accumulation of aggregates in the bulk solution above the membrane surface (Figure 3). These data show that the bilayer surface, along with the presence of free Chol, can act as a highly efficient platform for producing oligomers, which can then either participate in further aggregation or act as toxic agents. Most notably, this efficient oligomer-producing process occurs at physiologically low nanomolar concentrations of Aβ(1-42).

Another aspect of the oligomers formed in the presence of free Chol is their greater size compared to those formed in the absence of free Chol. Yasumoto et al. reported that low- (LMW) and high-molecular-weight (HMW) oligomers use different pathways to damage neurons, with HMW being more neurotoxic and causing more direct damage to the membranes [36]. In particular, HMW oligomers caused significantly more membrane depolarization and impaired long-term potentiation. In the context of the current study, large oligomers that are produced through interactions with free Chol and dissociate from the membrane surface may show a mechanism of action similar to that of the HMW oligomers tested in the aforementioned study.

Overall, the present study shows that the presence of free Chol, along with in-membrane Chol, significantly accelerates Aβ(1-42) aggregation. This process occurs at physiologically relevant conditions, including the low nanomolar protein concentration. These findings suggest that specific lipid-Aβ interactions are critical factors for the spontaneous formation of neurotoxic oligomers, and they further extend our model on the critical role of membrane composition in the assembly of disease-prone amyloid aggregates [12]. Our new data suggest that free Chol facilitates the aggregation process of Aβ monomers. Importantly, there is a strong synergy between the in-membrane and free Chol in this membrane-mediated catalysis of Aβ aggregation at physiologically relevant conditions. Notably, a recent publication [37] found accumulation of free Chol in the brain in the neurovisceral Niemann-Pick type C (NPC) disease. These findings suggest that the effects of free Chol and other lipids may also extend to other diseases. Further neurotoxic studies of nanoaggregates assembled on the membranes, in parallel with structural characterization of such aggregates, will pave the way for the development of novel diagnostic and therapeutic strategies for AD and can be extended to other neurodegenerative diseases associated with the formation of protein deposits.

#### **4. Materials and Methods**

#### *4.1. Materials*

Lipids were purchased from Avanti Polar Lipids, Inc. (Alabaster, AL, USA). Aβ(1-42) was bought from AnaSpec (Fremont, CA, USA). Chloroform was procured from Sigma-Aldrich Inc. (St. Louis, MO, USA). The buffer solution used in this study was 20 mM HEPES, 150 mM NaCl, 10 mM CaCl<sub>2</sub>, pH 7.4. All other chemicals, unless otherwise specified, were procured from Sigma at analytical chemistry grade or better.

#### *4.2. Preparation of Supported Lipid Bilayer*

The PC-PS-Chol lipid bilayer was prepared on a mica substrate as described in our previous publication [12]. Briefly, POPC, POPS, and Chol vesicles were prepared by sonicating the mixture for 45 min until the mixture became clear and then deposited onto a freshly cleaved mica surface attached to a glass slide. The slide was then incubated at 60 °C for 1 h. After the incubation, the sample was allowed to reach room temperature and then gently rinsed with a buffer containing 20 mM HEPES, 150 mM NaCl, pH 7.4. The bilayer was then imaged immediately by AFM in liquid.

#### *4.3. Preparation of Aβ42 Protein Solution*

The Aβ42 stock solution was prepared as in our previous publication [12]. Briefly, lyophilized Aβ(1-42) was dissolved in 100 µL of 1,1,1,3,3,3-hexafluoroisopropanol (HFIP) at room temperature with sonication. The HFIP was then evaporated completely in a vacufuge. Anhydrous DMSO was then added to prepare the stock solution, which was kept at −20 °C. The stock solution was diluted in the buffer solution to prepare working solutions at the necessary concentrations. Working solutions were used immediately, and any leftover solution was discarded.

#### *4.4. Time-Lapse AFM Imaging*

Time-lapse data were obtained using an MFP-3D instrument (Asylum Research, Santa Barbara, CA, USA). AFM imaging in buffer was carried out in tapping mode using cantilever "E" of MSNL probes (Bruker, Santa Barbara, CA, USA). The typical resonance frequency of the cantilever in buffer was 7–9 kHz, with typical spring constants of ~0.1 N/m. The scan speed was typically between 1 and 2 Hz.

At the start of each time-lapse experiment the lipid bilayer was imaged to ensure a homogenous and smooth surface, devoid of any unruptured vesicles. Aβ solution was then added, and time-lapse imaging commenced in the same area of the bilayer. The cantilever was parked after recording each frame to ensure that no damage to the lipid bilayer surface occurred due to scanning.

#### *4.5. AFM Data Analysis*

The presented AFM images have undergone minimal processing. Flattening was applied to the images (fitted with a 1st-order polynomial) with FemtoScan software (Advanced Technologies Center, Moscow, Russia). The grain analysis tool in the software was applied to measure the volume of the oligomers. The volume data were plotted as histograms using Origin Pro software (OriginLab, Northampton, MA, USA) and fitted with a Gaussian distribution. The mean value of the oligomer volume for each time point was determined using the peak value of the distribution, and the error bars represent the standard deviation, unless otherwise mentioned.
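The two processing steps named above, first-order flattening and grain volume measurement, can be illustrated with a short sketch. It is not the FemtoScan implementation: it assumes that flattening subtracts a least-squares line from each scan line and that a grain is a 4-connected region of pixels above a height threshold. The function names, threshold, pixel area, and toy height maps are all hypothetical.

```python
def flatten_rows(image):
    """Subtract a least-squares line (1st-order polynomial) from each scan line."""
    flat = []
    for row in image:
        n = len(row)
        x_mean = (n - 1) / 2
        z_mean = sum(row) / n
        sxx = sum((x - x_mean) ** 2 for x in range(n))
        sxz = sum((x - x_mean) * (z - z_mean) for x, z in enumerate(row))
        slope = sxz / sxx if sxx else 0.0
        flat.append([z - (z_mean + slope * (x - x_mean))
                     for x, z in enumerate(row)])
    return flat


def grain_volumes(image, pixel_area_nm2, threshold_nm):
    """Volumes (nm^3) of 4-connected regions with heights above threshold_nm."""
    n_rows, n_cols = len(image), len(image[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    volumes = []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if seen[r0][c0] or image[r0][c0] <= threshold_nm:
                continue
            # Flood-fill one grain, summing height * pixel area
            stack, volume = [(r0, c0)], 0.0
            seen[r0][c0] = True
            while stack:
                r, c = stack.pop()
                volume += image[r][c] * pixel_area_nm2
                for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= rr < n_rows and 0 <= cc < n_cols
                            and not seen[rr][cc]
                            and image[rr][cc] > threshold_nm):
                        seen[rr][cc] = True
                        stack.append((rr, cc))
            volumes.append(volume)
    return volumes


# Toy data: a tilted but featureless surface, and a flat surface with two grains
tilted = [[0.1 * x + 5.0 for x in range(5)] for _ in range(3)]
flattened = flatten_rows(tilted)

heights_nm = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
]
volumes = grain_volumes(heights_nm, pixel_area_nm2=4.0, threshold_nm=1.0)
print(volumes)  # [16.0, 12.0]
```

A line fit over a row that contains tall features is biased by them, so practical flattening routines usually exclude such pixels from the fit; that refinement is omitted here for brevity.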

#### *4.6. Molecular Dynamics Simulations*

To investigate the interaction of the Aβ(1-42) monomer with Chol, we placed an Aβ(1-42) monomer (conformation taken from [38]) at a center-of-mass (CoM) distance of 4 nm from a single Chol molecule, solvated the system in TIP3P water, neutralized it with NaCl counter ions, and maintained a final NaCl concentration of 150 mM. The protein was described using the Amber ff99SB-ILDN force field [39], while Chol was described using the lipid17 force field (an extension and refinement of lipid14 [40]). A control system with only the Aβ(1-42) monomer was created in a similar manner. The systems were then energy minimized, heated to 300 K, and run for 500 ps in the NVT ensemble. Production simulations were run in the NPT ensemble for 10 µs; simulations for each system were repeated five times for a total of 50 µs for each system. Simulations were performed using a 2 fs integration time step. The simulations employed periodic boundary conditions with isotropic pressure coupling at 1 bar, a constant temperature of 300 K, non-bonded interactions truncated at 10 Å, and electrostatic interactions treated using particle-mesh Ewald [41]. Simulations were performed using the Amber18 package [42].
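The simulation settings above imply some simple bookkeeping, sketched here as a rough check rather than as part of the actual setup protocol. The step count and cumulative time follow directly from the stated 10 µs runs, 2 fs time step, and five replicas. The ion-pair estimate additionally needs a box volume, which is not given in the text, so the 7 nm cubic box used below is purely a hypothetical value, as are the function names.

```python
AVOGADRO = 6.02214076e23  # particles per mole


def integration_steps(sim_time_us, timestep_fs):
    """Number of MD integration steps for a run (1 us = 1e9 fs)."""
    return round(sim_time_us * 1e9 / timestep_fs)


def salt_ion_pairs(conc_molar, box_volume_nm3):
    """Approximate number of NaCl pairs giving conc_molar in the box volume."""
    litres = box_volume_nm3 * 1e-24  # 1 nm^3 = 1e-24 L
    return round(conc_molar * AVOGADRO * litres)


# Values stated in the text: 10 us per replica, 2 fs time step, 5 replicas
steps_per_replica = integration_steps(10, 2)
cumulative_us = 5 * 10

# ASSUMPTION: a cubic box with 7 nm edges; the real box size is not stated
ion_pairs = salt_ion_pairs(0.150, 7.0 ** 3)

print(steps_per_replica)  # 5000000000
print(cumulative_us)      # 50
print(ion_pairs)
```

The estimate ignores the volume excluded by the solute and the extra counter ions required for neutralization, both of which system-building tools normally handle.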

#### *4.7. Analysis of MD Trajectories*

The AmberTools20 suite of programs [43], Carma [44], and VMD [45] were used to analyze the obtained simulation trajectories. Graphs and mathematical analyses were produced using MATLAB (MathWorks, Natick, MA, USA).

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms23052803/s1.

**Author Contributions:** Y.L.L., S.B. and M.H. designed the project. S.B. performed the AFM experiments. M.H. performed and analyzed the molecular dynamics simulations. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by National Institutes of Health, grants GM096039 and GM118006 to Y.L.L.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data that support the findings of this study are available from the corresponding author upon reasonable request.

**Acknowledgments:** Anton 2 computer time was provided by the Pittsburgh Supercomputing Center (PSC) through Grant R01GM116961 from the National Institutes of Health. The Anton 2 machine at PSC was generously made available by D.E. Shaw Research. This work was completed utilizing the Holland Computing Center of the University of Nebraska, which receives support from the Nebraska Research Initiative. The authors thank Thomas D. Stormberg for proofreading.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland www.mdpi.com

*International Journal of Molecular Sciences* Editorial Office E-mail: ijms@mdpi.com www.mdpi.com/journal/ijms


mdpi.com ISBN 978-3-0365-8930-5