#### **2. Results**

To determine whether there might be a difference in the frequency of IDRs (defined as predicted IDRs ≥ 30 amino acid residues in length) in proteins encoded by adhesion-regulated genes (adsu, *n* = 1009) compared to other proteins (nadsu, *n* = 17,612), we calculated the percentage of IDRs in adhesion-related proteins for each IDR predictor (Figure 1, blue line) and compared it to the proportion of genes in the adhesion-regulated gene set (5.4%, Figure 1, red line). For all predictors, the proportion of predicted IDRs associated with the adhesion gene set exceeded the frequency expected based on the proportion of proteins in the set. For many predictors, Figure 1 also shows a tendency towards a larger number of longer IDRs in proteins encoded by the adhesion gene set at the expense of shorter IDRs.
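The comparison behind Figure 1 comes down to two percentages per predictor. The sketch below takes simple per-set IDR totals as input. It computes the expected share from the set sizes given in the text (1009 adsu and 17,612 nadsu proteins), and the observed share and an enrichment ratio from IDR counts. The function names and the IDR counts in the usage lines are hypothetical placeholders, not data from this study.

```python
# Sketch: compare the observed share of IDRs in the adsu set (blue line)
# with the share expected from set sizes alone (red line).

def expected_share(n_adsu: int, n_nadsu: int) -> float:
    """Percentage of all analysed proteins that belong to the adsu set."""
    return 100.0 * n_adsu / (n_adsu + n_nadsu)

def observed_share(idr_adsu: int, idr_nadsu: int) -> float:
    """Percentage of all IDRs (>= 30 residues) that fall in adsu proteins."""
    return 100.0 * idr_adsu / (idr_adsu + idr_nadsu)

def enrichment_ratio(idr_adsu: int, idr_nadsu: int,
                     n_adsu: int, n_nadsu: int) -> float:
    """Values > 1 mean IDRs are over-represented in the adsu set."""
    return observed_share(idr_adsu, idr_nadsu) / expected_share(n_adsu, n_nadsu)

if __name__ == "__main__":
    # Set sizes are from the text; the IDR counts are hypothetical.
    print(f"expected: {expected_share(1009, 17_612):.1f}%")  # expected: 5.4%
    print(f"ratio: {enrichment_ratio(700, 9_300, 1009, 17_612):.2f}")
```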

**Figure 1.** Enrichment of intrinsically disordered regions (IDRs) in proteins encoded by genes that are differentially expressed in lymphoma cells upon adhering to stromal cells. The number (*n*) of IDRs (≥30 residues) detected by each predictor is shown, together with the number of IDRs in each length category. The blue line shows the percentage of all IDRs, encoded by adhesion-related (adsu) and non-adhesion-related (nadsu) genes together, that are associated with the adsu set. The red line shows the percentage expected if IDRs were equally distributed between the adsu and nadsu sets.

To determine whether the higher frequency of IDRs in proteins encoded by adhesion-regulated genes was significant, we used a resampling approach. We tested whether the IDR frequency associated with the 1009 adhesion-regulated genes lay outside the distribution of frequencies obtained by drawing 1009 genes from the control gene set (*n* = 17,612) 1000 times. A *z*-score and associated *p*-value were calculated for the data from each predictor. As shown in Table 1 (adsu vs. nadsu), the enrichment of IDRs in proteins encoded by adhesion-regulated genes (adsu) was significant for all predictors.
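Below is a minimal sketch of the resampling test. It rests on several assumptions, because the text does not specify these details. The statistic is taken to be the total number of IDRs (≥30 residues) in a set of proteins. Control proteins are drawn without replacement. The *p*-value is two-sided and taken from the normal distribution of *z*. `resampling_z` is a hypothetical helper, and the demo data are synthetic.

```python
import math
import random
from statistics import mean, stdev

def resampling_z(observed_total: int, control_idr_counts: list[int], n: int,
                 n_resamples: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Compare an observed IDR total for n proteins with a resampled null.

    control_idr_counts holds one IDR count (IDRs >= 30 residues) per control
    protein; each resample draws n proteins without replacement.
    """
    rng = random.Random(seed)
    null_totals = [sum(rng.sample(control_idr_counts, n))
                   for _ in range(n_resamples)]
    mu, sd = mean(null_totals), stdev(null_totals)
    z = (observed_total - mu) / sd
    # Two-sided p-value from the standard normal distribution (an assumption).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

if __name__ == "__main__":
    gen = random.Random(1)
    # Hypothetical per-protein IDR counts for a nadsu-like control set.
    control = [gen.choice([0, 0, 1, 1, 2]) for _ in range(17_612)]
    z, p = resampling_z(observed_total=1500, control_idr_counts=control, n=1009)
    print(f"z = {z:.2f}, p = {p:.3g}")
```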


**Table 1.** Intrinsically disordered regions are enriched in proteins encoded by down-regulated genes in lymphoma cells upon adherence to stromal cells.

\* Mean of 1000 resamples of *n* proteins encoded by genes in nadsu or adsu\_up, where *n* = the number of genes in adsu or adsu\_down, respectively. Abbreviations: adsu (adhesion-regulated genes); nadsu (non-adhesion-regulated genes); adsu\_down (down-regulated adsu); adsu\_up (up-regulated adsu). # Predictors of intrinsic disorder that appear in the D2P2 database.

Next, we tested whether the enrichment of IDRs associated with the adhesion-regulated gene set could be ascribed to particular subsets of the adhesion-regulated genes. We first compared proteins encoded by more strongly regulated genes (fold change ≥ 1.3) with those encoded by the remaining regulated genes. The more strongly regulated genes showed fewer IDRs for all predictors, and the differences were less significant than those between regulated and non-regulated genes (data not shown). Thus, IDRs are enriched in proteins encoded by adhesion-regulated genes, but the enrichment is not related to the extent of regulation. In contrast, we compared the up-regulated subset (adsu\_up, fold change > 1) with the down-regulated subset (adsu\_down, fold change < 1). Proteins encoded by the adsu\_down subset showed a stronger enrichment of IDRs than that seen for the whole adsu set in Figure 1, with high levels of significance (Table 1, adsu\_down vs. adsu\_up). Thus, the enrichment of IDRs in proteins encoded by adhesion-regulated genes is mainly associated with proteins encoded by down-regulated genes.
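The split into adsu\_up and adsu\_down follows directly from the fold-change values (> 1 or < 1). The sketch below uses hypothetical gene names and counts, and the helper names are illustrative. For the subset comparison in Table 1, the footnote indicates that *n* = |adsu\_down| proteins are then drawn repeatedly from adsu\_up rather than from nadsu.

```python
def split_by_direction(fold_changes: dict[str, float]) -> tuple[set[str], set[str]]:
    """Split adhesion-regulated genes into up (fold change > 1) and down (< 1)."""
    up = {gene for gene, fc in fold_changes.items() if fc > 1}
    down = {gene for gene, fc in fold_changes.items() if fc < 1}
    return up, down

def idr_total(genes: set[str], idr_counts: dict[str, int]) -> int:
    """Total IDRs (>= 30 residues) in proteins encoded by the given genes."""
    return sum(idr_counts.get(gene, 0) for gene in genes)

if __name__ == "__main__":
    # Hypothetical gene names, fold changes and IDR counts.
    fold_changes = {"GENE_A": 2.1, "GENE_B": 0.4, "GENE_C": 0.7, "GENE_D": 1.5}
    idr_counts = {"GENE_A": 0, "GENE_B": 5, "GENE_C": 2, "GENE_D": 1}
    up, down = split_by_direction(fold_changes)
    print(sorted(down), idr_total(down, idr_counts))  # ['GENE_B', 'GENE_C'] 7
```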

We next investigated whether IDRs in proteins encoded by adsu genes tend to be longer than those in other proteins (nadsu). As expected, IDR length is not normally distributed. This is indicated by the consistently higher mean than median (Table 2), which reflects a right-skewed distribution, and by tests of normality (data not shown). A Mann–Whitney test was therefore used to test the significance of differences in IDR length between groups. Table 2 shows that some predictors (notably PV2, PrDOS and VSL2b) predict longer IDRs in proteins encoded by adsu genes than in other proteins (nadsu). For other predictors, the difference is less significant or not statistically supported. IDRs encoded by adsu\_down genes were significantly longer than IDRs encoded by adsu\_up genes for all predictors. Thus, IDRs in proteins encoded by genes that are down-regulated in adherent cells tend to be both more frequent and longer than IDRs in other proteins.
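The sketch below implements the Mann–Whitney *U* test so the length comparison can be reproduced. It uses the large-sample normal approximation, with average ranks for ties, a tie-corrected variance and no continuity correction. These settings are assumptions, since the text does not state which variant was used. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be preferred. The IDR lengths in the usage lines are hypothetical.

```python
import math
from collections import Counter

def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Return (U for x, z, two-sided p) via the normal approximation with a
    tie correction and no continuity correction (assumed settings)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))
    return u1, z, p

if __name__ == "__main__":
    from statistics import mean, median
    # Hypothetical IDR lengths (residues) with a long right tail.
    down = [35, 48, 60, 90, 150, 420, 1200]
    up = [31, 33, 40, 44, 52, 70, 95]
    print(mean(down) > median(down))  # True: mean above median, right skew
    print(mann_whitney(down, up))
```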


**Table 2.** Intrinsically disordered regions tend to be longer in proteins encoded by down-regulated genes in lymphoma cells upon adherence to stromal cells.

\* Mann–Whitney test; bold text = *p* < 0.05. Abbreviations: adsu (adhesion-regulated genes); nadsu (non-adhesion-regulated genes); adsu\_down (down-regulated adsu); adsu\_up (up-regulated adsu).

We next addressed how IDRs are distributed among the proteins encoded by the adsu\_down gene set in relation to proteins associated with the adsu\_up and nadsu gene sets (Table 3).

**Table 3.** Distribution of IDRs predicted by VSL2b in proteins encoded by genes that are differentially regulated in lymphoma cells upon interaction with stromal cells.


adsu\_down (down-regulated adhesion-regulated genes); adsu\_up (up-regulated adhesion-regulated genes); nadsu (non-adhesion-regulated genes).

The proportion of completely disordered proteins was higher for the adsu\_down set than for the other gene sets, as was the proportion of proteins containing at least one IDR. The median proportion of each protein sequence predicted as IDR was also higher for the adsu\_down group. This held whether all proteins or only IDR-containing proteins were considered. Table 3 shows data for the VSL2b predictor, but the other predictors generally produced similar results, especially PrDOS and PV2. Figure 2A compares graphically the frequencies of IDR-containing proteins with different IDR proportions in the different gene sets.
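The Table 3 quantities can be derived from per-protein IDR coordinates. The sketch below assumes 1-based inclusive (start, end) coordinates and counts overlapping IDRs once. It treats a protein as completely disordered only when every residue lies in an IDR. These conventions and the `summarise` helper are illustrative assumptions.

```python
from statistics import median

def idr_fraction(length: int, idrs: list[tuple[int, int]]) -> float:
    """Fraction of residues covered by IDRs (1-based, inclusive coordinates);
    overlapping IDRs are counted once."""
    covered: set[int] = set()
    for start, end in idrs:
        covered.update(range(start, end + 1))
    return len(covered) / length

def summarise(proteins: list[tuple[int, list[tuple[int, int]]]]) -> dict[str, float]:
    """Table 3-style summary for one gene set; each item is (length, IDRs)."""
    fractions = [idr_fraction(length, idrs) for length, idrs in proteins]
    with_idr = [f for f, (_, idrs) in zip(fractions, proteins) if idrs]
    return {
        # "Completely disordered" is assumed here to mean 100% IDR coverage.
        "completely_disordered": sum(f == 1.0 for f in fractions) / len(fractions),
        "with_idr": len(with_idr) / len(fractions),
        "median_fraction_all": median(fractions),
        "median_fraction_idr_proteins": median(with_idr) if with_idr else 0.0,
    }
```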

**Figure 2.** Relative frequency distributions of proportion of IDR per protein and length-normalized number of IDRs per protein for proteins encoded by adsu\_down genes in relation to adsu\_up and nadsu genes. IDR predictions were made using VSL2b. (**A**) Relative frequency distributions (Density) of IDR-containing proteins with different percent IDR content. The median position and value are shown in blue. (**B**) Relative frequency distributions (Density) of numbers of IDRs per IDR-containing protein, normalized for differences in protein length (IDR number per 1000 amino acid residues). The median position and value are shown in blue.

For proteins encoded by nadsu and adsu\_up genes, the relative frequency declines progressively as the proportion of IDR per protein increases. In contrast, adsu\_down proteins show a more even distribution of relative frequencies, with relatively fewer low-IDR-content proteins and more high-IDR-content proteins. Interestingly, the protein length-normalized number of IDRs per protein is somewhat lower for proteins encoded by adsu\_down genes than for those encoded by adsu\_up and nadsu genes (Figure 2B). Thus, when only IDR-containing proteins are analyzed, the greater IDR content of adsu\_down encoded proteins tends to be associated with fewer, longer IDRs.
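Length normalization in Figure 2B amounts to expressing IDR counts per 1000 residues. A long protein with one very long IDR therefore scores lower than a short protein with several short IDRs. Below is a minimal sketch with hypothetical helper names, restricted to IDR-containing proteins as in the figure.

```python
from statistics import median

def idrs_per_1000(n_idrs: int, length: int) -> float:
    """Number of IDRs per 1000 residues, removing the effect of protein length."""
    return 1000 * n_idrs / length

def median_normalised_count(proteins: list[tuple[int, int]]) -> float:
    """Median IDRs per 1000 residues over IDR-containing proteins only;
    each item is (number of IDRs, protein length)."""
    values = [idrs_per_1000(n, length) for n, length in proteins if n > 0]
    return median(values)

if __name__ == "__main__":
    # Hypothetical proteins: (number of IDRs, length in residues).
    print(median_normalised_count([(2, 1000), (1, 2000), (4, 800), (0, 500)]))  # 2.0
```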

To further investigate differences in IDR lengths between groups, we plotted the length of the longest IDR in each protein as a function of protein length to compare adsu\_down and adsu\_up encoded proteins (Figure 3).

**Figure 3.** Comparison of proteins encoded by down- or up-regulated adhesion-regulated genes with regard to longest IDR length per protein and protein length. IDR-containing proteins encoded by (**A**) down-regulated adhesion-regulated genes (adsu\_down) and (**B**) up-regulated adhesion-regulated genes (adsu\_up) are shown. Of the 14 proteins in (**A**) for which the maximum IDR length is greater than 1000 residues (above dotted line), 6 proteins (red text) were also found in the sets of 14 proteins with the longest IDRs predicted by the PV2 and PrDOS predictors.

Proteins encoded by adsu\_down genes are characterized by both greater protein length and greater length of the longest IDR (VSL2b). There are 14 adsu\_down encoded proteins with IDRs longer than 1000 residues, and these are also among the proteins with the longest IDRs for most other predictors (notably PrDOS and PV2). Figure 4 shows the IDR score profiles of the 6 proteins that are reproducibly found among the 14 proteins with the longest IDRs by the VSL2b, PV2 and PrDOS predictors (red text in Figure 3A).
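Finding the proteins that are reproducibly among the 14 with the longest IDRs is a set intersection over per-predictor top-*k* lists. In this sketch, ties in IDR length are broken alphabetically, which is an assumption. The helper names are hypothetical, and the protein identifiers and lengths are made up.

```python
def top_k(longest_idr: dict[str, int], k: int = 14) -> set[str]:
    """The k proteins with the longest single IDR (ties broken by name)."""
    ranked = sorted(longest_idr, key=lambda prot: (-longest_idr[prot], prot))
    return set(ranked[:k])

def reproducible_top_k(per_predictor: dict[str, dict[str, int]],
                       k: int = 14) -> set[str]:
    """Proteins present in the top-k list of every predictor."""
    return set.intersection(*(top_k(lengths, k) for lengths in per_predictor.values()))

if __name__ == "__main__":
    # Hypothetical longest-IDR lengths per predictor.
    preds = {
        "VSL2b": {"P1": 1500, "P2": 1200, "P3": 900},
        "PV2": {"P1": 1400, "P2": 800, "P3": 1000},
    }
    print(reproducible_top_k(preds, k=2))  # {'P1'}
```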

Consistent with Figure 3A, most of the proteins are predicted to be disordered throughout most of their length. Some contain extended regions with close-to-maximal intrinsic disorder scores (e.g., ZC3H13), while others are characterized by fluctuating levels of intrinsic disorder (e.g., MKI67). Some proteins contain both patterns in different regions (e.g., BOD1L1). Many of the proteins have short regions that are predicted to be ordered and that could correspond to folded protein domains. These different types of predicted conformation could provide clues to the mechanisms by which proteins encoded by down-regulated genes function, relative to proteins encoded by up-regulated genes (see Discussion).


**Figure 4.** Examples of proteins with long IDRs. Proteins that are reproducibly found by the VSL2b, PV2 and PrDOS predictors in the set of 14 proteins with the longest predicted IDRs (red text in Figure 3A) are shown. The residue-by-residue intrinsic disorder score (VSL2b) is plotted as a function of residue number throughout the length of the respective proteins. The horizontal gridline at a score of 0.5 distinguishes regions predicted to be ordered (<0.5) or intrinsically disordered (>0.5).
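The 0.5 cut-off in Figure 4 and the 30-residue minimum used throughout the Results suggest how residue scores map to IDRs. The sketch below illustrates that mapping and nothing more. It assumes that scores strictly above 0.5 are disordered and applies no smoothing. The IDR calls used in this study were taken directly from D2P2, where each predictor's own processing applies. `call_idrs` is a hypothetical helper.

```python
def call_idrs(scores: list[float], threshold: float = 0.5,
              min_len: int = 30) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs of residues scoring above
    threshold that are at least min_len residues long."""
    regions = []
    start = None  # 0-based index where the current disordered run began
    for i, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                regions.append((start + 1, i))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start >= min_len:
        regions.append((start + 1, len(scores)))
    return regions
```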

#### **3. Discussion**

The main finding of this work is that certain genes are down-regulated in lymphoma cells upon adhering to stromal cells, which are typically found in microenvironments that increase cancer-cell survival. The proteins encoded by these genes tend to have more frequent and longer regions of predicted intrinsically disordered conformation than proteins encoded by up-regulated genes or by other expressed genes in the same cells. Our previous work has shown that many proteins encoded by down-regulated genes in adherent cells are involved in early stages of mitosis [6]. The present results complement this observation by suggesting that proteins encoded by the down-regulated gene set tend to function by mechanisms that are associated with intrinsically disordered regions. A secondary finding is that many of the proteins encoded by down-regulated genes are larger than proteins encoded by up-regulated genes.

Intrinsically disordered protein regions can be broadly divided into two groups: regions that are always disordered, and disordered regions that form one or more ordered conformations in particular molecular environments, such as during coupled binding and folding interactions with partner proteins [8]. Some IDRs have been shown to bind partners in the disordered state via multivalent interactions, mediated by short linear motifs that are distributed along the length of the IDR [5,9–11]. However, IDRs have other functions in addition to interaction with partners. One such function is mediation of phase transitions in cells that allow for compartmentalization of cellular regions in so-called "membrane-less organelles", which include nucleoli, nuclear speckles, P-bodies and chromatin [12–16]. These kinds of functional mechanisms might be associated with the IDRs that have consistently close-to-maximal prediction scores over extended regions of proteins encoded by down-regulated genes, as exemplified by some of the proteins in Figures 3 and 4.

The clearest example of a protein that is predicted to be maximally disordered throughout most of its sequence is ZC3H13. Interestingly, ZC3H13 is part of the WTAP complex, which is involved in RNA splicing and processing and is localized in nuclear speckles [17]. Such speckles are likely to result from phase transition processes. The disordered region of ZC3H13 may therefore be important for speckle formation or for localization of ZC3H13 to the speckles. In fact, many documented types of so-called proteinaceous membrane-less organelles are located in the nucleus and include chromatin in addition to nuclear speckles, nucleoli and many other bodies [14]. The MDC1 protein (Figures 3 and 4) contains a central region predicted to be completely disordered. This region is flanked by less disordered/structured regions that are known to mediate binding to several partner proteins at chromatin regions containing DNA double-strand breaks [18]. MDC1 has therefore been regarded as a "scaffold" protein responsible for spreading DNA-repair factors over the damaged chromatin region, and it is tempting to speculate that its central disordered region could play a role in phase transitions. Other proteins in Figures 3 and 4 that have extensive regions predicted to be completely disordered and that work in a chromatin environment are YLPM1, which is involved in regulating telomerase activity, and BOD1L1, a protein that protects stalled DNA replication forks.

MKI67 is predicted to be disordered (with varying scores) throughout almost its entire length (see Figure 4). Interestingly, MKI67 orchestrates formation of the perichromosomal layer, which coats the condensed chromosomes during mitosis and prevents chromosome aggregation [19]. In mitotic mammalian cells, the nuclear membrane and nucleolus break down. Nucleolar proteins are then also found in the mitotic perichromosomal layer, including nucleophosmin and fibrillarin, known phase-transition proteins that drive nucleolus formation in interphase cells [20]. This, together with the RNA-binding activity associated with MKI67, suggests that the perichromosomal layer may be formed by phase transition phenomena. Notably, higher expression of MKI67 is a negative prognostic marker for MCL patients [21].

In the IDR class that conditionally adopts ordered conformations in some molecular contexts, the ordered conformations are characterized by varying degrees of "fuzziness", defined as the existence of a heterogeneous range of ordered conformations in the context of, for example, interaction with a single partner [22]. Many proteins that conditionally adopt ordered conformations contain pre-structure motifs (PreSMos), defined as short protein regions within IDRs that have a weak propensity for secondary structure formation leading to formation of unstable secondary structure elements in a minority sub-population of IDR-containing proteins [23]. PreSMos become stabilized during coupled binding and folding, and form part of the folded protein conformation that is seen in complexes with partner proteins. Protein regions encoded by down-regulated genes that show alternating sub-regions of higher and lower intrinsic disorder scores might correspond to these kinds of IDR since the short regions with lower intrinsic disorder scores may represent PreSMos. The CENPE and CENPF proteins are characterized by disordered regions interspersed with regions with lower disorder scores that could represent regions containing PreSMos. This would be consistent with the multiple interactions made by these proteins within the kinetochore structure that binds to the centromeric chromatin of chromosomes during mitosis. TNRC6A is a member of the GW182 family of scaffold proteins that are important for organization of proteins needed for RNA-mediated gene silencing and are found in P-bodies that are formed by a phase transition process [24].

Although somewhat speculative, the preceding sections suggest mechanisms by which some of the large proteins with large amounts of intrinsic disorder might contribute to the propagation of lymphoma cells in suspension. They also suggest how down-regulation of these proteins could lead to reduced proliferation of lymphoma cells adhered to stromal cells. Reduced proliferation is known to increase the survival of cancer cells during chemotherapy, which primarily targets proliferating cells [7,25]. Further, the cell cycle arrest that occurs in adherent MCL cells [26] would be expected to reduce the need for apoptotic responses, and we previously showed that adherence to stromal cells is associated with up-regulation of anti-apoptotic genes [6].

We have shown that predicted intrinsic disorder can be used to interrogate the proteins encoded by genes identified in transcriptome data. Identifying gene sets that encode proteins with characteristic predicted disorder properties can provide information relevant to understanding the mechanisms underlying the functionality of groups of proteins. This approach complements the commonly used gene ontology analysis approach, which primarily gives information about the cellular components or processes characteristic of the function of protein sets. Both approaches provide information that can be used for hypothesis building and the design of further experiments.

In this work, we have only analyzed predicted protein disorder as a conformational characteristic. Other predictors could be used to expand the approach in the future, and new predictors are continuously being developed as more is learned about how protein functionality is coupled to the conformational flexibility of proteins. Examples include the s2D predictor [27], which predicts secondary structure elements in relation to random coil regions, and DynaMine [28], which predicts the rigidity of the peptide backbone throughout protein sequences. The ANCHOR [29] and MoRFpred [30] predictors, which predict protein interaction sites, are further examples. More recently developed predictors include those for protein regions involved in phase transitions [31], decomposed residue-by-residue solvation free energy [32] and residue-by-residue compactness/secondary structure [33]. Thus, a battery of predictors could be used to reveal many different conformational aspects of the protein sets encoded by groups of differentially regulated genes identified in transcriptome data. Databases like the Database of Disordered Protein Prediction (D2P2) [34] or the more recently developed MobiDB [35], which contain collections of prediction data from different sources, will be useful tools for this purpose.

#### **4. Materials and Methods**

#### *4.1. Data*

Human protein regions predicted to be disordered, and related data, were downloaded from the publicly available D2P2 database (available online: http://d2p2.pro/search/build) on 11 September 2017. Default options were used for the download, except that "Genome" was set to "Homo sapiens 63\_37" and the "Limit to" option was set to "all". The downloaded data contained all predicted IDRs detected in a total of 917,132 features for each of 9 different intrinsic disorder predictors (Espritz\_Disprot, Espritz\_NMR, Espritz\_Xray, IUPred\_long, IUPred\_short, PV2, PrDOS, VL-XT, and VSL2b). See the D2P2 website (available online: http://d2p2.pro) or [34] for details. Mean fold-change transcriptome values for 1050 genes that show significantly altered transcript levels when Jeko-1 mantle cell lymphoma cells adhere to MS-5 stromal cells were taken from a recently published study from our group [6].
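The counting step can be sketched as follows. The sketch assumes that the downloaded predictions have been flattened into one row per IDR, with the hypothetical columns `gene`, `predictor`, `start` and `end` (1-based, inclusive). The actual D2P2 export layout may differ. The adsu and nadsu gene sets would be built from the fold-change table in [6], and `count_idrs` is an illustrative helper.

```python
import csv
import io
from collections import Counter

MIN_IDR_LEN = 30  # IDR definition used in the Results

def count_idrs(rows, adsu_genes: set[str], nadsu_genes: set[str]) -> Counter:
    """Tally IDRs >= MIN_IDR_LEN per (predictor, gene set).

    Each row is a dict with the hypothetical keys gene, predictor, start
    and end (1-based, inclusive); genes in neither set are skipped.
    """
    counts: Counter = Counter()
    for row in rows:
        length = int(row["end"]) - int(row["start"]) + 1
        if length < MIN_IDR_LEN:
            continue
        if row["gene"] in adsu_genes:
            group = "adsu"
        elif row["gene"] in nadsu_genes:
            group = "nadsu"
        else:
            continue
        counts[(row["predictor"], group)] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical tab-separated layout, not the real D2P2 format.
    demo = ("gene\tpredictor\tstart\tend\n"
            "GENE_A\tVSL2b\t1\t120\n"
            "GENE_A\tPV2\t5\t20\n"
            "GENE_B\tVSL2b\t10\t60\n")
    rows = csv.DictReader(io.StringIO(demo), delimiter="\t")
    print(count_idrs(rows, adsu_genes={"GENE_A"}, nadsu_genes={"GENE_B"}))
```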
