2.2.2. CD

The far-UV (ultraviolet) CD spectrum of LrtA at pH 8.0 had minima at 222 nm and 210 nm (Figure 2B inset), suggesting the presence of helix- or turn-like conformations. Decomposition of the far-UV CD spectrum at pH 8.0 with the k2d algorithm, available online at the DICHROWEB site [22,23], yielded 30% helical structure, 17% β-sheet, and 52% random coil. However, as LrtA has 8 Tyr, 4 Phe, and 14 His residues (eight naturally occurring residues and six in the purification tail, see Section 4), we cannot rule out a contribution from the absorbance of aromatic residues at these wavelengths [24,25].

The molar ellipticity, [Θ], at 222 nm showed a bell-shaped pH dependence, with a maximum value at pH 6.0 (Figure 2B). These results suggest that there were changes in the secondary structure (or, alternatively, in the environment around aromatic residues [24,25]) above and below pH 6.0; interestingly, the changes at low pH mirrored those observed by intrinsic and ANS fluorescence (Figure 2A). Since the fluorescence results indicated that the environment around Tyr residues remained essentially unaltered until pH 9.0 (Figure 2A, filled circles), and taking into account the ANS fluorescence (Figure 2A, blank circles), we can conclude that between pH 6.0 and 9.0, although the protein had a folded conformation (Figure S2A,B), either the secondary structure of the protein changed or, alternatively, the environment around some of the 4 Phe and 14 His residues in LrtA changed. We could not determine the p*K*<sub>a</sub> values corresponding to the titrations at the two sides of the curve, due to the absence of acidic and basic baselines, respectively.
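As a reading aid for the ellipticity values discussed above, the following is a minimal sketch of the standard conversion from a raw CD signal (in mdeg) to a concentration-normalized ellipticity. It is not taken from this work: the function name, the choice to normalize per residue, and the numbers in the usage note are assumptions for illustration only.

```python
def normalized_ellipticity(theta_mdeg, conc_molar, path_cm, n_units=1):
    """Convert a raw CD signal in mdeg to deg cm^2 dmol^-1.

    Uses the standard relation [theta] = theta_mdeg / (10 * c * l * n), where
    c is the molar protein concentration, l the cell path length in cm, and
    n a normalization count: n = 1 gives the molar ellipticity, whereas
    n = number of residues (or peptide bonds) gives the mean residue value.
    Which normalization applies to a given data set is an assumption here.
    """
    if conc_molar <= 0 or path_cm <= 0 or n_units < 1:
        raise ValueError("concentration, path length and n_units must be positive")
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_units)
```

For example, with hypothetical values of -10 mdeg, 10 μM protein, a 0.1 cm cell, and 197 residues, `normalized_ellipticity(-10.0, 1e-5, 0.1, 197)` gives about -5076 deg cm<sup>2</sup> dmol<sup>-1</sup>.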

As happened with the thermal denaturations followed by fluorescence, the transitions followed by the ellipticity at 222 nm did not show any sigmoidal behavior below pH 6.0 (Figure S7B), but above that pH there was a broad, irreversible transition (Figure 2C, right axis, filled circles).

To sum up, the spectroscopic probes (intrinsic and ANS fluorescence, CD, and NMR) indicate that LrtA acquired a native-like conformation, with well-folded regions (Figure S2), from pH 6.0 to 9.0.

#### *2.3. LrtA Showed an Irreversible Complex Unfolding Equilibrium*

As the thermal denaturations were irreversible (whether followed by fluorescence or CD, Figure 2C and Figure S7), we tried to determine the conformational stability of LrtA by using GdmCl denaturations followed by fluorescence and CD (urea denaturations followed by fluorescence did not show any sigmoidal behavior, Figure S8A, inset). The tendency in both refolding and unfolding curves, followed by fluorescence and CD, was the same (Figure S8); however, the refolding CD results indicate that the final ellipticity, acquired by the native state, was not the same as that in the unfolding experiments. Moreover, the fluorescence refolding curves indicate that the value of <1/λ> was different from that in the unfolding ones, even though we used the same protein concentration. We therefore conclude that there was hysteresis, and that chemical denaturations were also irreversible, as might be expected for a protein composed of several domains (see below, Section 2.4.).
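The quantity <1/λ> used to follow the fluorescence curves can be illustrated with a minimal sketch. It assumes the usual intensity-weighted definition, <1/λ> = Σ(I<sub>i</sub>/λ<sub>i</sub>)/ΣI<sub>i</sub>, which is not stated in this section; the function name and the choice of μm<sup>-1</sup> as output unit are likewise assumptions.

```python
def average_inverse_wavelength(wavelengths_nm, intensities):
    """Intensity-weighted average of 1/lambda over an emission spectrum.

    wavelengths_nm: emission wavelengths in nm.
    intensities: fluorescence intensities at those wavelengths.
    Returns <1/lambda> in um^-1 (1/nm multiplied by 1000).
    """
    if len(wavelengths_nm) != len(intensities) or len(wavelengths_nm) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(intensities)
    if total == 0:
        raise ValueError("total intensity is zero")
    weighted = sum(i / w for w, i in zip(wavelengths_nm, intensities))
    return 1000.0 * weighted / total
```

Because the result depends on the whole spectrum rather than on a single wavelength, comparing its value at the same denaturant concentration in unfolding and refolding curves is one simple way to expose the kind of hysteresis described above.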

In addition, comparison of the unfolding CD and fluorescence results suggests that the unfolding of LrtA was not a simple two-state process, as the denaturation curves obtained with the two techniques were different. Whereas the fluorescence curves showed two transitions, CD reported a single one, whose apparent midpoint did not overlap with that of the fluorescence (Figure S8). Fluorescence denaturation curves at several protein concentrations (in the range from 1.9 to 19 μM, in protomer units) indicate that the first transition monitored corresponded to a protein-concentration-dependent process (Figure S8B inset), as at high protein concentrations (19 μM) two transitions were observed, with apparent midpoints around 1.0 and 2.0 M GdmCl. This result further confirms that LrtA was an oligomeric protein.

#### *2.4. Sequence Properties and Molecular Modeling of LrtA*

The primary structure of LrtA possesses a relatively large fraction of charged residues, both acidic and basic. Due to their high hydrophilicity, these residues tend to hamper the hydrophobic collapse and increase disorder in the protein backbone. Predictors of local disorder [26–29] based on a variety of physical properties (Figure S9) were used to estimate the propensity of the protein sequence to fold. There is a consensus indicating the region around residues 100–130 is highly disordered, together with a few other residues at both protein termini. Since the experimentally determined radius of monomeric LrtA has a value close to that of a compact protein (see Section 2.1., DOSY-NMR results), this may indicate that the region 100–130 was either a long, disordered loop within a single domain protein or a coil region separating two distinct but spatially close domains.

The overall propensity of LrtA to fold into a well-structured protein was also explored by mapping its properties in terms of charge and hydropathy (Figure 3). In particular, Figure 3A shows the location of the LrtA sequence within a Uversky diagram [30], which indicates whether a protein may belong to the IDP (intrinsically disordered protein) class by identifying a boundary hydropathy that separates folded and unfolded polypeptides. When the overall primary structure was considered, LrtA fell into the region of the diagram that is mostly populated by well-folded proteins [31], although also accessible to a few IDPs. However, when the sequence of LrtA was divided into two separate portions, they had distinct features. The first half of the LrtA sequence (residues 1–100) more distinctly belonged to the region occupied by ordered polypeptides. In contrast, the second half (residues 100–191) fell in the region of the diagram that is populated by IDPs, although also accessible to some well-folded proteins. On the other hand, a Das–Pappu diagram [32] showed that LrtA should be considered a so-called 'Janus sequence', in between weak and strong polyampholytes (Figure 3B), independently of whether the whole protein or just the two halves of its sequence are considered. This observation strongly suggests that the structure of LrtA is context-dependent, and may easily become more expanded/collapsed or structured/unstructured according to the environment (such as solution conditions or the presence of biomolecular partners). From our experimental results, LrtA had folded regions at pH 8.0 in aqueous solution (Figure S2 and Figure 2B, inset), although with a small stability, as suggested by thermal denaturations (Figure 2C and Figure S7A).

**Figure 3.** Location of LrtA in the diagram of state for charged polypeptides: Symbols "1", "2", and "3" indicate, respectively, the first and second halves of the LrtA sequence (residues 1–100 and 101–191) and the whole protein. (**A**) Uversky plot based on the absolute mean net charge as a function of the mean scaled hydropathy, as obtained with PONDR [31]; well-folded (blue squares) and disordered proteins (red circles) are shown. (**B**) Das–Pappu plot based on the fractions f(+) and f(−) of positively and negatively charged residues, respectively [32].
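The two coordinates plotted in Figure 3 can be computed directly from a sequence, as in the minimal sketch below. It is not the PONDR implementation used for the figure. Several points are assumptions made for illustration: the Kyte–Doolittle scale rescaled to 0–1, the commonly quoted boundary line <R> = 2.785<H> − 1.151, the commonly quoted Das–Pappu thresholds (0.25 and 0.35), counting only Lys/Arg and Asp/Glu as charged (His treated as neutral), and all function names.

```python
# Kyte-Doolittle hydropathy values, later rescaled to the 0-1 range.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def _validated(seq):
    """Upper-case the sequence and reject empty or non-standard input."""
    seq = seq.upper()
    if not seq or any(aa not in KD for aa in seq):
        raise ValueError("sequence must contain only the 20 standard residues")
    return seq


def mean_scaled_hydropathy(seq):
    """Mean hydropathy <H>, with each residue value mapped onto 0-1."""
    seq = _validated(seq)
    return sum((KD[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def charge_fractions(seq):
    """Return (f_plus, f_minus); His is treated as neutral (assumption)."""
    seq = _validated(seq)
    f_plus = sum(seq.count(aa) for aa in "KR") / len(seq)
    f_minus = sum(seq.count(aa) for aa in "DE") / len(seq)
    return f_plus, f_minus


def uversky_side(seq):
    """Side of the charge-hydropathy boundary on which the sequence falls."""
    f_plus, f_minus = charge_fractions(seq)
    mean_net_charge = abs(f_plus - f_minus)
    boundary_h = (mean_net_charge + 1.151) / 2.785
    if mean_scaled_hydropathy(seq) > boundary_h:
        return "folded-like"
    return "disordered-like"


def das_pappu_region(seq):
    """Classify by fraction of charged residues (FCR) and net charge (NCPR)."""
    f_plus, f_minus = charge_fractions(seq)
    fcr, ncpr = f_plus + f_minus, abs(f_plus - f_minus)
    if ncpr > 0.35:
        return "strong polyelectrolyte"
    if fcr > 0.35:
        return "strong polyampholyte"
    if fcr >= 0.25:
        return "Janus sequence"
    return "weak polyampholyte"
```

Applying the two classifiers separately to the whole sequence and to each half is the kind of comparison summarized by symbols "1", "2", and "3" in Figure 3. The toy input `"KEKEKE" + "A" * 14` (30% charged residues, zero net charge) is classified as a Janus sequence by this sketch.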

With the aim of building a model for the secondary and tertiary structures of LrtA, the protein sequence was submitted to full-chain protein structure prediction servers [33–36]. A particularly interesting result was obtained by using I-TASSER [33], which is one of the most popular and accurate software packages for generating high-quality predictions of three-dimensional protein structures. The best models predicted by I-TASSER (Figure S10) all included a well-structured domain spanning the first 100 residues, followed by a collapsed and poorly structured region. The well-structured domain consisted of two parallel α-helices and a β-sheet formed by four anti-parallel β-strands. These models were remarkable because they predicted a degree of order in the structure of LrtA that is in reasonable agreement with our expectations based on the CD experimental results (see above, Section 2.2.2), whereas in most cases the algorithms tend to overestimate the amount of secondary structure when applied to intrinsically unfolded polypeptides. Furthermore, the absence of a defined folding topology for the second half of the LrtA sequence is consistent with the theoretical predictions discussed above. It is worth mentioning that the C-terminal region of HPF of *S. aureus*, another member of the long sub-family of HPF, was folded in EM (electron microscopy) preparations [5], in contrast with our model; thus, it seems that in LrtA from *Synechocystis* sp. PCC 6803, the C-terminal region has specific features, which might be related to protein function. Finally, the conformations predicted for the first half of the LrtA sequence were shared with those obtained with the other structural modeling algorithms that we used (i.e., FALCON [34], SWISS-MODEL [35], and Robetta [36]), although details of the geometry and orientation of the α-helices and β-strands differed in some models. This was particularly intriguing, especially because a four-strand motif is typical of many RNA-binding proteins ([5] and references therein).

Our theoretical predictions are not difficult to reconcile with the findings shown by NMR at physiological pH, where the spectrum of LrtA was that of a folded molecule (Figure S2). In fact, the signals of the proton nuclei in the unfolded and folded halves of the protein behaved differently. The amide protons of the unfolded half of the protein would appear between 8.0 and 8.5 ppm [37], where, although they should be sharper than the rest of the signals, they would probably be obscured by many of the amide resonances of the folded half (those of the residues connecting the α-helices and the β-strands); it is interesting to note, however, the presence of a higher intensity at 8.2–8.3 ppm (Figure S2B), which could be due to the sharper resonances of the unfolded region of LrtA. Furthermore, the majority, if not all, of the amide protons of the unfolded half will be broadened and exchanged with the solvent at pH 8.0 [37], as has been observed in other intrinsically disordered regions when the pH is raised, and even when the temperature is decreased at the highest explored pH [38]. However, when we acquired a 1D <sup>1</sup>H NMR spectrum at pH 6.9 (in the presence of 0.5 M NaCl) and 15 °C (Figure S11), some amide signals appearing between 7.8 and 8.3 ppm became sharper, as expected for a disordered region with fast molecular tumbling. The methyl region, under these solvent conditions, was similar to that acquired at higher pH and temperature (Figure S2). In addition, all the methyl peaks corresponding to the side chains of Val, Ile, and Leu residues of the disordered half of the protein, under any of the conditions explored (pH 8.0, 20 °C or pH 6.9, 15 °C), would appear at basically the same chemical shifts as those of the corresponding folded region, i.e., around 0.8 ppm [37].

The models predicted by I-TASSER provided a static picture of LrtA that does not take into account its dynamics, which could be expected to be significant in determining the properties of such a chameleonic protein. Furthermore, the main difference between the predicted models and our experimental findings is the presence of a larger amount of β-structure in the former. Thus, we suspected that the predicted structure corresponded to the most stable structure that LrtA can assume, e.g., under ideal conditions in solution or when bound to a partner molecule, although the experimental evidence (Figure 2C, Figures S7 and S8) suggests that this structure was not very stable. For those reasons, we used MD simulations to study the behavior of the protein structure at both room and high temperatures. The latter case corresponds to the simplest and most direct way to investigate the dynamics of a protein under non-native conditions [39], speeding up the sampling by overcoming the energetic barriers that restrain the structure in a given conformation.

The MD results showed that the region including residues 1–100 is stable and maintains its folding topology and structure when simulated at room temperature (Figure S12). In particular, as shown in Figure 4, Tyr19 and Tyr77 interact to fix the two α-helices, whereas the other two Tyr residues are on the opposite face of the protein (Figure 4B). In contrast, when the temperature was raised, Tyr19 and Tyr77 lost their coordination and the first N-terminal β-strand immediately started losing its anchoring with the rest of the protein and the β-sheet scaffold (Figure 4B), increasing the amount of coil and helical structure. The anchoring was not recovered in annealing runs performed by lowering the temperature again, unless they were started at the earliest step of the local unfolding process. This finding suggests that the folding of the N-terminal region of LrtA was possibly assisted by interactions with other biomolecules, which may include other monomers of LrtA or binding partners, such as RNA. In contrast, the rest of the protein was very stable, and did not lose its folding topology even under the most extreme simulation conditions (Figure 4C).

**Figure 4.** Dynamic behavior of the predicted folded domain of LrtA: The region comprising residues 1–100 of LrtA is shown in cartoon representation (colored from red (N terminus) to silver-white (mid-sequence regions) up to blue (C terminus of the domain)). (**A**) Structure at room temperature, with Tyr residues indicated. (**B**) Simulation under unfolding conditions: in the N-terminal region (indicated in red), the first β-strand loses its structure and coordination with the rest of the β-sheet. (**C**) Structure under extreme conditions: the folding topology is maintained. VMD [40] was used for the protein displays.

#### **3. Discussion**

LrtA seemed to acquire a native-like conformation from pH 6.0 to 9.0. Changes in secondary (far-UV CD) and tertiary (intrinsic fluorescence) structures, and in the burial of hydrophobic residues (ANS fluorescence), occurred concomitantly at low pH. Under acidic conditions (pH 3.0), species with a higher amount of secondary structure (as indicated by a larger ellipticity, in absolute value, Figure 2B) appeared to be populated, although they had hindered solvent accessibility towards I<sup>−</sup> and acrylamide quenchers (Table 1). Therefore, at low pH, LrtA had non-native conformations with non-stable secondary and tertiary structures (as judged by the absence of a sigmoidal shape in thermal denaturation curves). These results are further supported by the NMR spectra acquired at low pH, where there was no dispersion of amide or methyl signals (Figure S3), suggesting the presence of conformations with an unfolded structure. Moreover, the broadening observed in the methyl and amide regions suggested the presence of aggregation, which was further confirmed by the far-UV CD spectra at pH 4.5 at different protein concentrations. Thus, the protein at acidic pH had a larger tendency to associate than at physiological pH. The increase of ANS fluorescence at low pH, indicating a large solvent-accessible hydrophobic surface area, may appear difficult to reconcile with the results of I<sup>−</sup> or acrylamide quenching, which suggest a smaller solvent accessibility towards Tyr (Table 1). However, the larger amount of hydrophobic surface area (monitored by ANS) that became solvent-exposed at acidic pH values could involve several of the Val, Leu, and Ile residues, which are highly abundant in LrtA (46 out of 197 amino acids).

LrtA had well-folded regions in the pH range from 6.0 to 9.0, as indicated by: (i) the sigmoidal curves in the thermal and chemical denaturations (fluorescence and far-UV CD) (Figure 2C, Figures S7 and S8); and (ii) the 1D <sup>1</sup>H-NMR spectra at pH 7.0 and 8.0 (Figure S2). Moreover, the protein was capable of binding homogenous yeast RNA (from Sigma) with an affinity of ~1 μM (Figure S13), and therefore the purification protocol did not affect the conformational features of the protein. However, this structure was not highly rigid, as judged from the apparent thermal midpoint obtained from the irreversible denaturation curves (~40 °C, Figure 2C); these findings agree with the MD results in this work. As the secondary and tertiary structures of LrtA are only stable in a narrow thermal range, an increase in the environmental temperature reduces the availability of well-folded and active protein, and therefore a larger amount of protein is needed to carry out the cyanobacterial functions. The LrtA structure under native conditions, as suggested by the deconvolution of CD data, had a smaller percentage of α-helix structure than other HPF members (30% vs. 45%), as well as a lower percentage of β-sheet (17% vs. 27%). These experimental percentages were confirmed by the results of our MD simulations.

LrtA was an oligomeric protein at physiological pH. We showed that some of the Tyr residues seemed to be involved in the self-associating interface, as judged by the changes in the fluorescence lifetimes (Table 1) or the protein-concentration dependence of the denaturation-curve midpoints (Figure S8B inset). In addition, our MD simulations at room temperature suggest that Tyr19 and Tyr77 in the folded domain of the protein were key in anchoring amino acids of the β-sheet, and the loss of such anchoring during the high-temperature simulations caused a partial disruption of the protein β-sheet. Thus, Tyr residues were important for quaternary and secondary scaffolding in LrtA. Recently, it has been observed that, in EM preparations, the HPF of *S. aureus* (another member of the long HPF subfamily) forms domain-swapped dimeric species [5]. Furthermore, crystals of the short HPF from *Vibrio cholerae* show the presence of dimers mediated by Co(II) anchoring residues of the β-sheets of two monomers [11], further pointing to the crucial importance of residues in the β-sheet for a possible quaternary arrangement of any HPF member. However, the importance of oligomerization for the function of all those proteins (including LrtA) remains to be elucidated, as it could be an adaptive mechanism of regulation to interact with other proteins or even with RNA.

#### **4. Materials and Methods**

#### *4.1. Materials*

Deuterium oxide and IPTG were obtained from Apollo Scientific (Stockport, UK). Sodium trimethylsilyl [2,2,3,3-<sup>2</sup>H<sub>4</sub>] propionate (TSP), imidazole, DNase, Trizma base and acid, yeast RNA, glutaraldehyde (25% *w*/*v* solution), ANS, deuterated acetic acid, its sodium salt, and His-Select HF nickel resin were from Sigma-Aldrich (Madrid, Spain). The β-mercaptoethanol (β-ME) was from BioRad (Madrid, Spain). Triton X-100 and the protein marker PAGEmark-tricolor (G Biosciences) were from VWR (Barcelona, Spain). Dialysis tubing, with a molecular weight cut-off of 3500 Da, was from Spectrapor (Spectrum Laboratories, Breda, The Netherlands). Amicon centrifugal devices with a molecular weight cut-off of 3000 Da were from Millipore (Barcelona, Spain). Standard suppliers were used for all other chemicals. Water was deionized and purified on a Millipore system.
