#### *2.2. The Amino Acid Composition of Heat-Induced Proteins Is Not due to Covariation of Amino Acid Composition with GC Content, Gene Expression Levels, or Subcellular Location*

We considered whether our results could be affected by confounding factors. First, GC content is known to affect amino acid composition [24], and *R* significantly correlates with GC content (*ρ* = 0.088, *p* = 9.76 × 10<sup>−37</sup>). Combined, these correlations alone could explain the observed trends. To rule out this possibility, we computed partial correlations between *R* and the frequency of each amino acid while controlling for GC content, and obtained very similar results. The correlation continued to be significantly positive for charged amino acids and significantly negative for polar and hydrophobic ones (Table 1). More specifically, the correlation was significantly positive for Arg, Asp, Gln, Glu, and Lys and significantly negative for Asn, Gly, Ile, Phe, Pro, Ser, Thr, Trp, Tyr, and Val. Both the negative correlation between *R* and Ala frequency and the positive correlation between *R* and Met frequency, which were initially not significant, became significant after controlling for GC content (Table 1).
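As an illustration of the kind of analysis described above, the following minimal sketch computes a first-order partial Spearman correlation between two variables while controlling for a third. It assumes the standard first-order partial correlation formula applied to rank correlations; the text does not specify the exact procedure or software used, so the function names and the toy data (`r_values`, `aa_freq`, `gc`) are purely hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def partial_spearman(x, y, z):
    """Correlation between x and y after controlling for z (assumed formula)."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical toy values, not data from the study.
r_values = [1, 2, 3, 4, 5]   # response to heat (R)
aa_freq = [2, 1, 4, 3, 5]    # frequency of one amino acid
gc = [1, 3, 2, 5, 4]         # GC content of the gene
rho_partial = partial_spearman(r_values, aa_freq, gc)
```

A partial coefficient that stays close to the raw coefficient, as reported in Table 1 for most amino acids, suggests that the controlled variable explains little of the association.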

Second, highly expressed proteins resemble proteins from thermophiles in their amino acid composition [25], and expression levels correlate with *R* (expression level at 22 °C: *ρ* = −0.156, *p* = 4.88 × 10<sup>−112</sup>; expression level at 37 °C: *ρ* = 0.241, *p* = 1.18 × 10<sup>−268</sup>). To rule out the potential confounding effects of expression levels, we computed partial correlations between *R* and the frequency of each amino acid while controlling for expression levels, again with very similar results. When controlling for expression levels at 22 °C, *R* correlated positively with the frequencies of Ala, Arg, Asp, Gln, Glu, and Lys and negatively with the frequencies of Asn, Cys, Gly, His, Ile, Leu, Phe, Pro, Ser, Thr, Trp, and Tyr. When controlling for expression levels at 37 °C, *R* correlated positively with the frequencies of Arg, Asp, Cys, Gln, Glu, Leu, Lys, and Met and negatively with the frequencies of Ala, Gly, Ile, Phe, Pro, Thr, Trp, Tyr, and Val. In both cases, the positive correlation between *R* and the frequency of charged amino acids and the negative correlations between *R* and the frequencies of polar and hydrophobic amino acids remained significant (Table 1).

Proteins locating to different parts of the cell differ in their amino acid compositions and in their response to heat stress ([26,27]; Table 4). To rule out subcellular location as a confounding factor, we analyzed the correlation between *R* and amino acid composition separately for proteins locating to 10 different subcellular compartments (Table 5). The correlation between *R* and the fraction of charged amino acids was positive in nine of the compartments, which represents a significant departure from the 50% expected at random (one-tailed binomial test, *p* = 0.011). The correlation was significantly positive for the cytosol, the plastid (the two compartments with the highest numbers of known/inferred proteins), and the mitochondrion. The correlation between *R* and the fraction of hydrophobic amino acids was negative in eight of the compartments (one-tailed binomial test, *p* = 0.055), significantly negative in the plastid and the mitochondrion, and significantly positive in the nucleus. The correlation between *R* and the fraction of polar amino acids was negative in half of the compartments, and significantly negative in the cytosol and the nucleus. These results suggest that the enrichment of heat-induced proteins in charged amino acids and their depletion in hydrophobic amino acids are not a byproduct of covariation of both *R* and amino acid composition with subcellular location. The lack of significance in most of the individual correlations is probably due to the low number of proteins for which location information is available, ranging from 720 for the plastid to 63 for the peroxisome (Table 4), which is expected to greatly reduce the statistical power of our compartment-specific analyses. However, we note an exception: among nuclear proteins, *R* exhibits a significantly positive correlation with the percentage of hydrophobic residues (Table 5).
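The one-tailed binomial tests above can be checked with a short calculation. The sketch below sums the upper tail of a binomial distribution under the null hypothesis that each compartment is equally likely to show a positive or a negative correlation; the function name is hypothetical.

```python
from math import comb

def binomial_upper_tail(successes, trials, prob=0.5):
    """P(X >= successes) for X ~ Binomial(trials, prob)."""
    return sum(
        comb(trials, k) * prob ** k * (1 - prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Charged amino acids: positive correlation in 9 of 10 compartments.
p_charged = binomial_upper_tail(9, 10)       # 11/1024, about 0.011

# Hydrophobic amino acids: negative correlation in 8 of 10 compartments.
p_hydrophobic = binomial_upper_tail(8, 10)   # 56/1024, about 0.055
```

Both values match the *p*-values reported in the text, which shows why eight concordant compartments out of ten fall just short of significance at *α* = 0.05.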

#### *2.3. Proteins That Are Overexpressed at High Temperatures Are Highly Disordered*

For each *Arabidopsis* protein, we computed the percentage of amino acids that belong to IDRs using IUPred [28]. This percentage correlates positively with *R* (*ρ* = 0.059, *p* = 4.93 × 10<sup>−17</sup>; Figure 3). Genes that are overexpressed at 37 °C (*R* > 0) encode proteins that are more disordered than those that are repressed (*R* < 0), with median disorder percentages of 19.19% and 16.51% for induced and repressed genes, respectively (Mann-Whitney *U* test, *p* = 2.01 × 10<sup>−35</sup>). The differences are more pronounced when comparing genes that are strongly overexpressed at 37 °C (*R* > 2) with those that are strongly repressed (*R* < −2), with median disorder percentages of 21.54% and 11.51% for induced and repressed genes, respectively (Mann-Whitney *U* test, *p* = 2.03 × 10<sup>−23</sup>).
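The per-protein disorder percentage and the comparison of medians can be sketched as follows. This is only an illustration: it assumes that a residue counts as disordered when its per-residue score exceeds 0.5 (the cutoff conventionally used with IUPred, although the text does not state the threshold applied), and the scores and *R* values below are hypothetical toy data, not IUPred output.

```python
from statistics import median

def percent_disordered(scores, threshold=0.5):
    """Percentage of residues whose disorder score exceeds the threshold.

    The 0.5 default is an assumed cutoff, not one stated in the text.
    """
    if not scores:
        raise ValueError("protein has no residues")
    return 100.0 * sum(s > threshold for s in scores) / len(scores)

# Hypothetical toy data: (R, per-residue disorder scores).
proteins = [
    (1.8, [0.9, 0.8, 0.7, 0.2]),   # 75% disordered
    (0.4, [0.6, 0.1, 0.2, 0.3]),   # 25% disordered
    (-0.7, [0.1, 0.2, 0.3, 0.4]),  # 0% disordered
    (-2.5, [0.7, 0.2, 0.1, 0.6]),  # 50% disordered
]

# Split proteins by the sign of R and compare median disorder.
induced = [percent_disordered(s) for r, s in proteins if r > 0]
repressed = [percent_disordered(s) for r, s in proteins if r < 0]
median_induced = median(induced)
median_repressed = median(repressed)
```

With real data, the two lists would then be compared using a Mann-Whitney *U* test, as in the text.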

In agreement with previous work [29,30], we found a positive correlation between GC content and the percentage of disordered residues (*ρ* = 0.044, *p* = 2.84 × 10<sup>−10</sup>). In addition, GC content positively correlates with *R* (*ρ* = 0.088, *p* = 9.76 × 10<sup>−37</sup>), making it possible that the positive correlation between *R* and disorder is due to the covariation of both parameters with GC content. The correlation between *R* and disorder, however, remains significant after controlling for GC content (*ρ* = 0.055, *p* = 3.44 × 10<sup>−15</sup>).

Likewise, intrinsic disorder positively correlates with expression levels (at 22 °C: *ρ* = 0.040, *p* = 1.03 × 10<sup>−8</sup>; and at 37 °C: *ρ* = 0.072, *p* = 7.75 × 10<sup>−25</sup>), in agreement with previous results in *Escherichia coli* [31], but in contrast with observations in yeasts [32,33]. Disorder, however, still correlates significantly with *R* after controlling for expression levels (at 22 °C: *ρ* = 0.066, *p* = 4.64 × 10<sup>−21</sup>; and at 37 °C: *ρ* = 0.043, *p* = 1.03 × 10<sup>−9</sup>).

Both intrinsic disorder and *R* vary substantially among proteins locating to different subcellular compartments (Table 4), raising the possibility that covariation of both factors with subcellular location may account for the observed enrichment of stress-induced proteins in IDRs. We analyzed the correlation between intrinsic disorder and *R* separately for proteins locating to 10 different subcellular compartments. The correlation was positive for eight of the compartments (significantly positive for the cytosol, the endoplasmic reticulum, and the vacuole) and significantly negative for the nucleus and the plasma membrane (Table 5). These results indicate that the positive correlation between disorder and *R*, while widespread, does not apply to proteins locating to all compartments.



**Table 5.** Correlations between amino acid frequencies and response to high temperature among proteins of different subcellular locations.


*p*-values shown in bold face represent significant tests at *α* = 0.05.

### **3. Discussion**

We show that *Arabidopsis* proteins whose expression levels increase at high temperatures (heat-induced proteins) are enriched in charged amino acids, and depleted in polar and hydrophobic amino acids, compared to heat-repressed proteins. The enrichment of heat-induced proteins in charged amino acids and the depletion in polar amino acids are trends that mirror those observed in the proteins of thermophilic prokaryotes. The observed enrichment of heat-induced proteins in electrostatically charged amino acids was expected, as such amino acids can engage in salt bridges, which usually increase protein thermostability [1–3]. It should be noted, nonetheless, that not all charged amino acids participate in salt bridges, and that not all salt bridges increase thermostability [34]. However, the depletion of heat-induced proteins in hydrophobic amino acids was not expected, as the proteins of thermophilic prokaryotes are usually enriched in such amino acids (e.g., ref. [35]).

Despite the overall observed trends (heat-induced proteins being enriched in charged amino acids and depleted in polar and hydrophobic amino acids), not all amino acids follow these rules. In particular, the frequencies of Cys (polar), His (polar), Ala (hydrophobic), Leu (hydrophobic), and Met (hydrophobic) do not correlate significantly with *R*, and Gln (a polar amino acid) is more frequent in heat-induced proteins than in heat-repressed ones (Table 1). The enrichment of heat-induced proteins in Gln is surprising, given its tendency to undergo deamidation at high temperatures [36].

We show that the observed overall trends are not due to heat-induced genes/proteins being different in terms of expression levels, GC content, or subcellular location. When controlling for these factors, however, the direction of the correlation changes for certain amino acids (Table 1). Thus, the observed trends in amino acid composition are likely the result of adaptation of heat-induced and heat-repressed *Arabidopsis* proteins to high and low temperatures, respectively.

Burra et al. [13] predicted that the proteins of thermophilic prokaryotes should be enriched in IDRs, as intrinsically disordered proteins are often resistant to high temperatures [16–18]. However, contradicting their predictions, they observed that thermophiles are often depleted in IDRs, which may compensate for the disorder induced by temperature. Similar observations were made in both another proteome-level analysis [15] and an analysis of FlgM proteins from bacteria adapted to different temperatures [14]. In agreement with Burra et al.'s prediction, we observed that *Arabidopsis* heat-induced proteins are enriched in IDRs. Our results suggest that there are different ways in which ordered/disordered regions can promote thermostability.

The correlations described in the current work are modest, albeit statistically significant. Several scenarios may account for the weakness of the correlations. First, amino acid composition and protein intrinsic disorder may be affected by factors other than temperature. Second, the difference between the temperatures used in this study (22 vs. 37 °C) is small compared to the differences between the optimal temperatures of psychrophiles, mesophiles, and thermophiles. Third, certain plant genes may have changed their patterns of response to heat stress during the recent evolutionary history of *Arabidopsis*; i.e., certain genes that are currently heat-induced may have been heat-repressed in the past, and certain genes that are currently heat-repressed may have been heat-induced in the past. As the adjustment of amino acid composition and disorder to temperature is expected to take a relatively long time, such switches in expression profiles may have limited the adaptation of proteomes to temperature. Fourth, the adaptability of plant proteomes to temperature may be more limited than that of prokaryotic proteomes, e.g., due to the higher complexity of protein-protein interaction networks and the smaller effective population size of plants [37].

In summary, the amino acid composition of heat-induced proteins in *Arabidopsis* mirrors to some extent, but not completely, that of the proteomes of thermophilic prokaryotes. This indicates that protein adaptation to high temperatures takes place partly through similar molecular mechanisms in prokaryotes and eukaryotes. Our observations also indicate that adaptation of proteins at the level of amino acid composition and protein intrinsic disorder can be detected not only when comparing the proteomes of species adapted to very different temperatures, but also among the proteins of the same species with different temperature response profiles. These observations expand our view of how eukaryotic proteomes adapt to different temperatures.

### **4. Materials and Methods**

#### *4.1. Plant Material, Growth Conditions, and Experimental Design*

*Arabidopsis thaliana* Columbia ecotype seeds were sterilized with 70% ethanol for 20 min, then with 2.5% sodium hypochlorite (commercial bleach) containing 0.05% Triton X-100 for 10 min, and finally washed four times with sterile dH<sub>2</sub>O. Seeds were placed onto Whatman paper in Murashige and Skoog (MS) medium plates (Duchefa, Haarlem, The Netherlands). Plates were kept in the dark at 4 °C for 96 h for stratification, and incubated for 8 h in light at 22 °C to promote germination. Plates were then transferred to darkness at 22 °C for 72 h. At this point, plates were either kept at 22 °C or transferred to 37 °C. Seedlings were harvested at 0 and 24 h, with four biological replicates. Samples were frozen in liquid nitrogen and stored at −80 °C.
