#### *4.1. In Silico Analyses of Selected bHLH–PAS Proteins*

To estimate the occurrence of putative IDRs in bHLH–PAS proteins, we performed in silico analyses of the composition, hydropathy, and sequence complexity of the amino acid sequences of selected proteins. We used the previously described human SIM1 and SIM2, as well as their *D. melanogaster* ortholog, SIM (Figure 5A), as representatives of class I of the family. To cover a wider spectrum, we also analyzed other human class I members, AHR, HIF1-α, and CLOCK (Figure 5B), which are engaged in different signal transduction pathways. As mentioned previously, class I proteins dimerize with class II proteins to form a functional complex and are crucial for heterodimer specificity. Because each class II bHLH–PAS transcription factor can interact with different class I members, we also performed an in silico analysis of the structure of class II members. We chose human ARNT and human BMAL1 (Figure 5C) and, additionally, *D. melanogaster* MET (Figure 5C), a unique protein with no known mammalian homolog. MET can be classified as a class II bHLH–PAS family member because it forms not only heterodimers with its paralog GCE, but also homodimers [56].

**Figure 5.** Prediction of intrinsically disordered regions. The top panel presents the domain structure of the analyzed bHLH–PAS proteins. Pink indicates the bHLH domain, whereas blue represents PAS domains. Protein lengths are indicated. The bottom panel presents the prediction of intrinsically disordered regions based on the amino acid sequences of the proteins. All calculations were performed using PONDR-VSL2 software [57]. A score over 0.5 indicates disorder. (**A**) The class I proteins: *D. melanogaster* SIM (red line) and its *H. sapiens* orthologs SIM1 (dashed red line) and SIM2 (dashed grey line). (**B**) The class I proteins: *H. sapiens* AHR (violet line), HIF1-α (green line), and CLOCK (violet dashed line). (**C**) The class II proteins: *H. sapiens* ARNT (blue line), BMAL1 (blue dashed line), and *D. melanogaster* MET (black line).

We performed the in silico analysis using four predictors of intrinsically disordered regions: PONDR-VSL2 [57], PONDR-FIT [58], IUPred [59], and IsUnstruct [60]. Since the results of all the employed predictors were consistent, for simplicity we show only one representative result (PONDR-VSL2) for each protein (Figure 5). All the results substantiate our hypothesis and indicate the intrinsically disordered character of the long C-termini. It is worth noting that short ordered fragments are visible in the C-termini of the class I proteins (Figure 5A,B). Such fragments may act as TAD/RPD or as so-called molecular recognition elements (MoREs) [61,62]. The presence of MoREs makes the interactions between partner proteins highly specific and reversible [52]. The results also revealed subtle differences in the regions that comprise the conserved domains. The structure predicted for class I proteins (Figure 5A,B) is more ordered, whereas class II proteins show a marked structure relaxation in their middle part (Figure 5C), in the region located C-terminally to the PAS-A domain, which is responsible for the specificity of gene activation by bHLH–PAS proteins [63]. Such a difference may explain the ability of class II proteins to serve as interaction partners for different proteins [22]. The ability of IDRs (and IDPs) to interact with several partners is a clear advantage in molecular recognition processes [64]. Importantly, the resulting induced folding may differ depending on the binding partner. For example, a disordered region of the p53 protein, a known cell cycle regulator and tumor suppressor [65], folds into an α-helix or a β-strand, depending on the partner protein [66].
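The 0.5 cutoff used to read the disorder profiles can also be applied programmatically. The following is a minimal sketch, assuming that per-residue scores have already been exported from a predictor as a plain list of floats. The function names `disordered_segments` and `disordered_fraction` and the toy scores are hypothetical illustrations and are not part of any predictor's interface.

```python
from typing import List, Tuple


def disordered_segments(
    scores: List[float], threshold: float = 0.5, min_len: int = 1
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges of consecutive residues
    whose score is above `threshold` and that span at least `min_len` residues."""
    segments: List[Tuple[int, int]] = []
    start = None  # first residue of the segment currently being extended
    for position, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = position
        elif start is not None:
            if position - start >= min_len:
                segments.append((start, position - 1))
            start = None
    # Close a segment that runs to the end of the sequence.
    if start is not None and len(scores) - start + 1 >= min_len:
        segments.append((start, len(scores)))
    return segments


def disordered_fraction(scores: List[float], threshold: float = 0.5) -> float:
    """Fraction of residues whose score is above `threshold`."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(score > threshold for score in scores) / len(scores)


# Toy profile (hypothetical values, not taken from Figure 5).
toy_scores = [0.2, 0.3, 0.6, 0.7, 0.8, 0.4, 0.9, 0.95]
print(disordered_segments(toy_scores))  # [(3, 5), (7, 8)]
print(disordered_fraction(toy_scores))  # 0.625
```

Raising `min_len` filters out very short stretches above the cutoff, which can help when only long disordered regions, such as the C-termini discussed here, are of interest.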

#### *4.2. The Impact of Disordered Regions on Protein Function*

The flexibility and disorder detected in individual C-termini can be related to the ability of individual bHLH–PAS proteins to perform diverse functions. The differences between the SIM1 and SIM2 C-termini, which are related to their opposite functions (gene activation/repression), have been described previously. The C-terminal regions of two other studied proteins, AHR (class I member) and ARNT (class II member), are characterized by the presence of TADs [67], whose functions are mediated by the CBP/p300 and RIP140 coactivators. The C-terminal region of ARNT was additionally proposed to be a crucial activator of the estrogen receptor (ER) [68]. Interestingly, the suppression of AHR activity is also connected with the C-terminus and is mediated by the binding of a small peptide inhibitor [69]. Another repressor of the AHR signaling pathway, AHRR, is distinguished from AHR by the presence of three SUMOylation sites in its C-terminus. SUMOylation was shown to be crucial for the full suppressive activity of AHRR [70].

Moreover, the C-terminus of another studied protein, HIF1-α, is characterized by the presence of TADs and also interacts with the CBP/p300 coactivator. This C-terminus is additionally responsible for protein stability/degradation and contains sequence motifs that influence subcellular localization: a nuclear localization signal (NLS) and a nuclear export signal (NES) [71,72].

Another remarkable class I bHLH–PAS protein is CLOCK, which comprises a domain with histone acetyltransferase (HAT) activity in its C-terminus. This domain is responsible for histone acetylation, which affects the transcriptional stimulation of clock-controlled genes. CLOCK additionally acetylates the K537 residue of its partner protein, BMAL1. The K537 residue is located in the C-terminal part of BMAL1, and its modification facilitates the cryptochrome (CRY1)-mediated repression of specific gene transcription [73]. Importantly, CRY1 competes with the CBP/p300 coactivator for binding to the BMAL1 TAD and is not able to bind the C-terminus of the paralog protein, BMAL2. Therefore, the C-termini distinguish the circadian functions of these two BMAL paralogs [74].

#### *4.3. Structural Analysis of bHLH–PAS C-Terminal Fragments*

To date, the only structurally characterized C-terminal fragment of a bHLH–PAS protein is the *D. melanogaster* MET C-terminus (MET/C) [75]. A series of in vitro analyses showed that MET/C is highly disordered and exists in solution in an extended, flexible form that is predisposed to conformational changes. Interestingly, some short secondary structure motifs have been predicted in MET/C. Such short ordered fragments can be important for partner recognition and interaction. It was hypothesized that the intrinsic disorder of the C-terminal fragment is indispensable for the functionality of MET because it modulates the protein's action in a context-specific way and enables cross-talk between JH signaling and other signaling pathways during *D. melanogaster* development. Previously, MET was shown to interact with FTZ-F1 through its C-terminus [49], thereby modulating stage-specific responses to the hormones during *D. melanogaster* metamorphosis [76].

As all the results of the in vitro analyses of MET/C [75] were consistent with the in silico studies presented above (Figure 5C), we hypothesize that the disordered character of the C-terminal fragments may be a common characteristic of proteins of the bHLH–PAS subfamily and may be very important for their functionality. The importance of the disordered character of the regions flanking the bHLH domain of bHLH transcription factors has been shown previously [77–79].

#### *4.4. Structural Analysis of IDPs*

Although the C-terminal regions of bHLH–PAS family members are considered to be IDRs, they can be challenging to detect and characterize. The reason is that IDPs and IDRs do not adopt a single stable structure, and their energetically most favorable conformations can differ considerably [80,81]. Even small conformational changes can promote the aggregation of IDPs/IDRs [82]. Additionally, IDPs/IDRs were shown to be highly sensitive to proteolysis [83]. Studies focused on the characterization of IDRs and IDPs are developing rapidly, and techniques for studying proteins in solution continue to improve.

A number of bioinformatics tools allow the initial recognition of disordered proteins. Since IDPs are characterized by a specific amino acid composition (a low content of hydrophobic and a high content of charged residues [84]), the Composition Profiler [85] is commonly used to compare the amino acid distribution of the studied protein with that of IDPs (DisProt 3.4 database) or globular proteins (PDB S25 database). Additionally, the Uversky diagram, which plots mean net charge versus mean hydrophobicity, is useful for distinguishing IDPs from globular proteins [86]. Disorder predictors (such as PONDR-VSL2 [57], PONDR-FIT [58], IUPred [59], and IsUnstruct [60], used in this work) estimate the probability of IDR occurrence using algorithms, such as neural networks, trained on selected sets of ordered and disordered sequences. Another predictor, DynaMine, provides information about protein backbone flexibility [87,88]. IDPs, once purified, can be identified by various experimental methods. First, anomalously low electrophoretic mobility in SDS-PAGE, which leads to an overestimated apparent molecular mass, can indicate an extended and elongated shape of the protein [75]. Hydrodynamic analyses comprising size-exclusion chromatography (SEC) [89] and analytical ultracentrifugation (AUC) are commonly used to determine hydrodynamic properties such as the Stokes radius (RS), the sedimentation coefficient (s), and the frictional ratio (f/f0) [90]. Circular dichroism (CD) spectroscopy is useful for estimating secondary structure content [91]. All the listed techniques provide preliminary insight into the structural properties of a protein.
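The charge-hydropathy classification behind the Uversky diagram can be illustrated with a short calculation. The sketch below rests on several assumptions that are not stated in the text: hydropathy is taken from the Kyte-Doolittle scale rescaled to the 0–1 range, the net charge counts only K/R as positive and D/E as negative (histidine and the termini are ignored), and the order/disorder boundary is the commonly cited line ⟨R⟩ = 2.785⟨H⟩ − 1.151. The function names and the toy sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy rescaled to 0 (Arg) .. 1 (Ile).
    Non-standard residue letters raise KeyError."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(sequence: str) -> float:
    """Absolute mean net charge, assuming K/R = +1 and D/E = -1 (simplification)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return abs(positive - negative) / len(seq)


def is_predicted_disordered(sequence: str) -> bool:
    """True if the sequence lies above the assumed boundary line
    <R> = 2.785 * <H> - 1.151 in the charge-hydropathy plane."""
    boundary = 2.785 * mean_hydropathy(sequence) - 1.151
    return mean_net_charge(sequence) > boundary


# Toy sequences (hypothetical, not fragments of the proteins discussed here).
print(is_predicted_disordered("EEEEEKKKSS"))  # True: charged, low hydropathy
print(is_predicted_disordered("IIIILLLLVV"))  # False: hydrophobic, uncharged
```

This two-parameter view classifies whole sequences or long fragments rather than individual residues, so it complements the per-residue disorder predictors instead of replacing them.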

One technique commonly used to study the overall shape and structural transitions of biological macromolecules in solution is small-angle X-ray scattering (SAXS) [92]. However, SAXS provides only low-resolution information about the overall shape of the molecule, so it is important to combine it with complementary high-resolution methods such as NMR, which reports on local structure [93]. NMR offers unique opportunities based on analyzing deviations from an idealized random coil devoid of any structural propensity [94]. The random coil exhibits characteristic chemical shifts, which are averages over all the conformations that amino acids can adopt in solution. Therefore, deviations of NMR chemical shifts from random coil values can be used to evaluate the local transient secondary structure of IDPs [80]. The main problems in assigning the spectra of IDPs are signal overlap (low chemical shift dispersion) and significant proton exchange with bulk water, which reduces 1HN signal intensities and in turn leads to low signal-to-noise ratios [94]. The exchange with water can be reduced by conducting measurements at low temperature or low pH [95]. Poorly resolved spectra call for dedicated NMR techniques. Recently, IDP-dedicated methods such as 13C direct-detected experiments, paramagnetic relaxation enhancements (PREs), or residual dipolar couplings (RDCs) have been described [96].
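The use of chemical shift deviations described above can be sketched as a per-residue subtraction, Δδ = δ(observed) − δ(random coil). The example below is a minimal illustration only: the random coil Cα values in `PLACEHOLDER_RC_CA` are rounded placeholders rather than a published reference set, the 0.5 ppm cutoff is an arbitrary assumption, and the function names are hypothetical. It relies on the general convention that positive Cα deviations point to helical propensity and negative ones to extended conformations.

```python
from typing import Dict, List

# Placeholder random-coil C-alpha shifts in ppm (illustrative values only;
# a real analysis would use a published, sequence-corrected reference set).
PLACEHOLDER_RC_CA: Dict[str, float] = {
    "A": 52.5, "G": 45.0, "L": 55.0, "E": 56.5, "S": 58.5,
}


def secondary_shifts(
    sequence: str, observed: List[float], random_coil: Dict[str, float]
) -> List[float]:
    """Per-residue secondary chemical shift: observed minus random-coil value."""
    if len(sequence) != len(observed):
        raise ValueError("sequence and observed shifts must have equal length")
    return [obs - random_coil[aa] for aa, obs in zip(sequence.upper(), observed)]


def label_propensity(delta: float, cutoff: float = 0.5) -> str:
    """Coarse label for a C-alpha secondary shift (cutoff is an assumption)."""
    if delta > cutoff:
        return "helix-like"
    if delta < -cutoff:
        return "extended-like"
    return "coil-like"


# Toy tripeptide with made-up observed C-alpha shifts (ppm).
deltas = secondary_shifts("AGL", [54.0, 45.0, 54.0], PLACEHOLDER_RC_CA)
print(deltas)                                  # [1.5, 0.0, -1.0]
print([label_propensity(d) for d in deltas])   # ['helix-like', 'coil-like', 'extended-like']
```

For IDPs, transient structure usually appears as a run of consecutive residues with small deviations of the same sign, so in practice the per-residue values are inspected along the sequence rather than one residue at a time.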
