*2.1. Proteins That Are Overexpressed at High Temperatures Are Enriched in Electrostatically Charged Amino Acids and Depleted in Polar and Hydrophobic Amino Acids*

We grew *Arabidopsis* plants at 22 and 37 °C for 24 h and performed microarray analyses to measure gene expression levels at the beginning of the experiment (*E*<sub>0,22</sub>, expression at time 0 and 22 °C) and at the end of the experiment (*E*<sub>24,22</sub> and *E*<sub>24,37</sub>). *E*<sub>0,22</sub> correlated strongly with *E*<sub>24,22</sub> (Spearman's rank correlation coefficient, *ρ* = 0.991, *p* < 10<sup>−200</sup>; Figure 1), supporting the robustness of our gene expression measures. The small differences between the two time points could be due to changes in gene expression during development and to measurement error. The correlation between *E*<sub>24,22</sub> and *E*<sub>24,37</sub> was weaker (*ρ* = 0.897, *p* = 10<sup>−200</sup>; Figure 2), highlighting the effect of heat stress on the expression of many genes.

**Figure 1.** Correlation between gene expression levels at 22 °C at time 0 and at time 24 h.

**Figure 2.** Correlation between gene expression levels at 22 °C at time 24 h and at 37 °C at time 24 h.

For each gene with available probes (*n* = 20,491), we computed a response to heat stress (*R*) as the base-2 logarithm of the ratio of expression levels at 37 and 22 °C (Formula 1). Genes with *R* > 0 are overexpressed at high temperatures, and genes with *R* < 0 are repressed. Genes with *R* > 1 (strongly overexpressed) are enriched in the Gene Ontology biological processes "protein refolding", "protein folding", "chaperone cofactor-dependent protein refolding", "chaperone-mediated protein folding", "de novo posttranslational protein folding", "de novo protein folding", "cellular response to heat", "response to heat", "response to temperature stimulus", and "heat acclimation". They are also enriched in the molecular functions "misfolded protein binding", "heat shock protein binding", "protein binding involved in protein folding", and "unfolded protein binding" (Tables S1–S3).
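As an illustration of how *R* separates overexpressed from repressed genes, the following minimal sketch computes the base-2 log ratio for a few genes. The gene names and expression values are hypothetical, and the use of the 24 h measurements at both temperatures is our assumption about how the ratio is formed; the function names `heat_response` and `classify` are illustrative only.

```python
import math


def heat_response(e37: float, e22: float) -> float:
    """Return R = log2(E37 / E22) for one gene (expression must be positive)."""
    if e37 <= 0 or e22 <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(e37 / e22)


def classify(r: float, threshold: float = 0.0) -> str:
    """Label a gene by its response; threshold=0 gives the R > 0 / R < 0 split."""
    if r > threshold:
        return "overexpressed"
    if r < -threshold:
        return "repressed"
    return "unchanged"


# Hypothetical expression values (arbitrary units), NOT data from the study.
# Each entry is (expression at 22 °C, expression at 37 °C).
expression = {
    "geneA": (100.0, 400.0),
    "geneB": (200.0, 50.0),
    "geneC": (80.0, 80.0),
}

responses = {g: heat_response(e37, e22) for g, (e22, e37) in expression.items()}
labels = {g: classify(r) for g, r in responses.items()}

for gene in expression:
    print(gene, responses[gene], labels[gene])
```

Because *R* is a log ratio, a fourfold increase and a fourfold decrease give values of equal magnitude and opposite sign. Passing a larger `threshold` reproduces stricter splits around zero.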

We observed a positive correlation between *R* and the fraction of charged amino acids (*ρ* = 0.146, *p* = 2.47 × 10<sup>−98</sup>), and negative correlations between *R* and the fractions of both polar (*ρ* = −0.076, *p* = 1.72 × 10<sup>−27</sup>) and hydrophobic (*ρ* = −0.084, *p* = 4.08 × 10<sup>−33</sup>) amino acids (Figure 3). We next computed the correlation between *R* and the frequency of each amino acid separately. The correlation was significantly positive for all four charged amino acids (Arg, Asp, Glu, and Lys). It was negative for all hydrophobic amino acids (significantly so for Gly, Ile, Phe, Pro, and Val) except Met, for which it was non-significantly positive. It was also negative for all polar amino acids (significantly so for Asn, Ser, Thr, Trp, and Tyr) except Gln, for which it was significantly positive (Table 1). All these correlations remained significant after controlling for multiple testing (Table 1).
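The correlation analysis above can be sketched as follows: compute the fraction of charged residues in each protein, then the Spearman rank correlation between that fraction and *R*. This is a minimal standard-library illustration, not the pipeline used in the study. The sequences and *R* values are hypothetical, and the helper names (`fraction`, `average_ranks`, `spearman_rho`) are ours; in practice a library routine such as `scipy.stats.spearmanr` would also return a *p*-value.

```python
from statistics import mean

# Charged residues as listed in the text: Arg, Asp, Glu, Lys (one-letter codes).
CHARGED = set("RDEK")


def fraction(seq: str, residues: set) -> float:
    """Fraction of positions in seq occupied by any residue in the given set."""
    return sum(aa in residues for aa in seq) / len(seq)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy proteins and responses, NOT data from the study.
sequences = ["KDEA", "AKLA", "GGLA", "EEKR"]
r_values = [1.0, -0.5, -2.0, 3.0]

charged_fraction = [fraction(s, CHARGED) for s in sequences]
rho = spearman_rho(r_values, charged_fraction)
print(charged_fraction, rho)
```

The same `fraction` helper can be called with a single-residue set (for example `{"K"}`) to obtain the per-amino-acid frequencies used in the second part of the analysis.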

**Figure 3.** Correlations between response to high temperature (*R*) and the fraction of charged, polar, hydrophobic and disordered amino acids. Lines represent regression lines.


**Table 1.** Correlations between amino acid frequencies and response to high temperature.

*Int. J. Mol. Sci.* **2018**, *19*, 2276

Next, we compared the amino acid composition of proteins encoded by genes that are overexpressed (*R* > 0, *n* = 10,728) vs. proteins encoded by genes that are repressed (*R* < 0, *n* = 9763) at 37 °C. Overexpressed proteins were enriched in charged amino acids (median percent in overexpressed proteins: 24.32%; median percent in repressed proteins: 23.20%; Mann-Whitney *U* test, *p* = 1.90 × 10<sup>−66</sup>) and depleted in both polar (median percent in overexpressed proteins: 29.54%; median percent in repressed proteins: 30.04%; *p* = 2.53 × 10<sup>−20</sup>) and hydrophobic (median percent in overexpressed proteins: 45.77%; median percent in repressed proteins: 46.43%; *p* = 6.56 × 10<sup>−21</sup>) amino acids. In almost perfect agreement with our correlation analyses, proteins encoded by overexpressed genes were significantly enriched in Arg, Asp, Gln, Glu, and Lys, and significantly depleted in Asn, Gly, Ile, Phe, Pro, Ser, Thr, and Trp (Table 2).
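The group comparison above can be sketched with a hand-written two-sided Mann-Whitney *U* test. This minimal version uses the normal approximation with a tie correction and no continuity correction, which is an assumption on our part; the text does not state which variant was used, and a library routine such as `scipy.stats.mannwhitneyu` would normally be preferred. The percentages below are hypothetical, not values from the study.

```python
import math
from statistics import median


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    order = sorted(range(len(combined)), key=lambda i: combined[i])
    ranks = [0.0] * len(combined)
    tie_term = 0.0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and combined[order[j + 1]] == combined[order[i]]:
            j += 1
        t = j - i + 1                # size of this group of tied values
        tie_term += t ** 3 - t       # contributes 0 when there is no tie
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1

    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    n = n1 + n2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:                   # all values identical: no evidence of a shift
        return u_x, 1.0
    z = (u_x - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u_x, p


# Hypothetical percentages of charged residues per protein, NOT study data.
overexpressed = [24.0, 25.0, 26.0, 27.0]
repressed = [20.0, 21.0, 22.0, 23.0]

u_stat, p_value = mann_whitney_u(overexpressed, repressed)
print(median(overexpressed), median(repressed), u_stat, p_value)
```

The normal approximation is only reasonable for samples far larger than this toy example; with thousands of proteins per group, as in the comparison above, it is the usual choice.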


**Table 2.** Amino acid frequencies in overexpressed (*R* > 0) and repressed (*R* < 0) proteins at high temperatures.

*p*-values correspond to the Mann-Whitney *U* test. *p*-values and *q*-values shown in bold face represent significant tests at *α* = 0.05 or *q* = 0.05.
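For readers unfamiliar with *q*-values, the sketch below converts a list of *p*-values into Benjamini-Hochberg adjusted values. The choice of the Benjamini-Hochberg procedure is an assumption made for illustration; this section does not state which multiple-testing method produced the *q*-values in the tables. The *p*-values in the example are hypothetical, and `bh_qvalues` is an illustrative name.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q is the minimum of
    # p * m / rank over this rank and all larger ranks (enforces monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical p-values for four tests, NOT values from the tables.
p_example = [0.01, 0.02, 0.03, 0.5]
q_example = bh_qvalues(p_example)
significant = [q <= 0.05 for q in q_example]
print(q_example, significant)
```

A test is then called significant at *q* = 0.05 when its adjusted value does not exceed 0.05.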

Similar results were obtained when using a more stringent threshold to classify genes as overexpressed (*R* > 2, *n* = 826) or repressed (*R* < −2, *n* = 1214) at 37 °C. Overexpressed proteins were enriched in charged amino acids (median percent in overexpressed proteins: 25.30%; median percent in repressed proteins: 22.54%; *p* = 1.50 × 10<sup>−26</sup>) and depleted in both polar (median percent in overexpressed proteins: 29.74%; median percent in repressed proteins: 30.17%; *p* = 3.20 × 10<sup>−8</sup>) and hydrophobic (median percent in overexpressed proteins: 45.20%; median percent in repressed proteins: 47.24%; *p* = 6.04 × 10<sup>−11</sup>) amino acids. More specifically, overexpressed proteins were significantly enriched in Arg, Asp, Gln, Glu, and Lys, and significantly depleted in Asn, Cys, Gly, His, Ile, Phe, Pro, Thr, Trp, and Tyr (Table 3).


**Table 3.** Amino acid frequencies in highly overexpressed (*R* > 2) and highly repressed (*R* < −2) proteins at high temperatures.

*p*-values correspond to the Mann-Whitney *U* test. *p*-values and *q*-values shown in bold face represent significant tests at *α* = 0.05 or *q* = 0.05.
