#### **3. Structure of bHLH–PAS Proteins**

To date, knowledge of the tertiary structure of bHLH–PAS proteins remains limited. All determined structures comprise either single isolated domains (PAS-A or PAS-B) or adjacent domains connected by flexible amino acid linkers. The C-termini, although they make up a large part of these proteins, have not yet been structurally characterized. These regions are not homologous to any described domains and appear to be highly disordered. Consequently, determining their structure and relating it to specific protein functions remains a major challenge. All bHLH–PAS structures deposited in the Protein Data Bank (PDB) are listed in Table 2 (nuclear magnetic resonance (NMR) structures) and Table 3 (X-ray structures). Most of the listed assemblies correspond to heterodimers.


**Table 2.** bHLH–PAS protein structures deposited in the PDB obtained with NMR.


**Table 3.** bHLH–PAS protein structures deposited in the PDB obtained with X-ray diffraction.



The first step in determining the structure of bHLH–PAS proteins was the isolation and characterization of the PAS-B domains of HIF2-α (Figure 2A) [41] and ARNT (Figure 2B) [42]. Both structures were solved by NMR and showed the fold characteristic of PAS domains: a five-stranded antiparallel β-sheet flanked by several α-helices [42]. The next step was the crystallization of the isolated PAS-A domain of AHR (Figure 2C) and of the PAS-B domains of ARNT (not shown) and HIF1-α (not shown). Interestingly, the tertiary architecture of all structurally characterized PAS domains is highly conserved (Figure 2), even though their primary sequences are highly divergent (sequence identity below 20%) [43].

**Figure 2.** Representation of the PAS fold: a five-stranded antiparallel β-sheet is flanked by several α-helices. (**A**) HIF2-α PAS-B obtained with NMR (PDB 1P97), (**B**) ARNT PAS-B obtained with NMR (PDB 1X0O), (**C**) AHR PAS-A domain obtained with X-ray (PDB 4M4X).
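The sequence identity figure quoted above can be made concrete with a minimal sketch. It assumes that the two sequences have already been aligned with an external tool and counts identical residues over the columns where both sequences have a residue; other denominator conventions exist and give somewhat different values. The function name `percent_identity` and the two aligned fragments are made up for illustration and are not real PAS domain sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0   # columns with a residue in both sequences
    identical = 0  # compared columns with the same residue
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy, made-up aligned fragments (assumption: NOT real PAS sequences).
toy_a = "MKT-LLVAGE"
toy_b = "MRSQLIV-GD"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity")
```

Applied to a real pairwise alignment of two PAS domains, a value below 20 would correspond to the divergence described in the text.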

Further experiments led to the cocrystallization of the PAS-B domains of the HIF2-α/ARNT heterodimer, which revealed that the two domains interact through their β-sheets in an antiparallel arrangement (Figure 3A) [42]. Another structure, of the BMAL1/CLOCK bHLH domains bound to DNA, defined the domain architecture and the interactions responsible for DNA binding (Figure 3B) [44]. A typical bHLH domain comprises two long α-helices connected by a short loop. The first helix includes the basic region and interacts with the major groove of the DNA [45]. Together, these structures provided insight into the organization of bHLH–PAS proteins; however, the structure of a multidomain bHLH–PAS protein was still missing.

**Figure 3.** (**A**) HIF2-α PAS-B (green) and ARNT PAS-B (blue) heterodimer (PDB 3F1P, [46]). Amino acids forming salt bridges are marked (HIF2-α E247, ARNT R362, ARNT R379). (**B**) BMAL1 bHLH (magenta) and CLOCK bHLH (grey) domains with E-box DNA (blue) (PDB 4H10, [44]).

A turning point came in 2012, when the first heterodimer comprising the bHLH, PAS-A, and PAS-B domains (CLOCK-BMAL1) was crystallized and its structure solved [47] (Figure 4A). In 2015, the architectures of two other heterodimers, HIF1-α–ARNT (not shown) and HIF2-α–ARNT (Figure 4B), were determined [48]. These structures show how the individual domains are positioned relative to each other in the functional heterodimers. In general, the individual PAS domains do not contribute equally to the interactions, and the resulting structures are highly asymmetric. Importantly, the two groups of heterodimers (with BMAL1 or ARNT as the dimerization partner) display distinct types of quaternary architecture. All domains in the BMAL1 group are spatially close to each other (Figure 4A), whereas the ARNT domains do not form intramolecular interactions and can wrap around the partner protein (Figure 4B) [22,48].

**Figure 4.** Representatives of the two groups of bHLH–PAS heterodimers. (**A**) Overall structure of the CLOCK-BMAL1 heterodimer (PDB 4F3L, [47]), (**B**) overall structure of the HIF2-α–ARNT heterodimer (PDB 4ZP4, [48]).

To date, all available structural information concerns mammalian bHLH–PAS proteins. Almost nothing is known about the structure of proteins from other organisms, including the invertebrate *D. melanogaster*. It would be interesting to test whether the overall bHLH–PAS fold is evolutionarily conserved across organisms. Most reported protein structures cover defined N-terminal domains, whereas structural information about the C-terminal regions remains limited to short peptides bound to interacting proteins [22]. One example is a short motif with the conserved sequence LIXXL, found in *D. melanogaster* MET and GCE, which represents a novel nuclear receptor (NR) box. Docking models of MET/GCE NR box peptides bound to the ligand-binding domain (LBD) of the orphan nuclear receptor FTZ-F1 revealed an α-helical structure of the peptides, which is necessary for the hydrophobic interaction [49].
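As an illustration of how an LIXXL motif (with X standing for any residue) could be located in a protein sequence, the following minimal sketch scans a one-letter amino acid string with a regular expression. The function name `find_lixxl` and the input sequence are hypothetical and serve only to demonstrate the pattern; the sequence is not a fragment of MET or GCE. A sequence match alone says nothing about whether the motif is functional.

```python
import re
from typing import List, Tuple

# "L", "I", any two residues, "L". The lookahead lets overlapping hits be reported.
NR_BOX = re.compile(r"(?=(LI..L))")


def find_lixxl(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched pentapeptide) for every LIXXL hit."""
    return [(m.start() + 1, m.group(1)) for m in NR_BOX.finditer(sequence.upper())]


# Made-up toy sequence (assumption: not taken from any real protein).
toy_sequence = "MSTLIKELAAPLIQSLGG"
hits = find_lixxl(toy_sequence)
for position, motif in hits:
    print(position, motif)
```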

#### **4. Unique Properties of the C-Terminal Domains of bHLH–PAS Proteins as IDRs**

While the N-terminal part of bHLH–PAS proteins is responsible for DNA binding, ligand/cofactor binding, and heterodimerization, the C-termini usually regulate the activity of the protein and of the complexes it forms [50]. The sequence variability of the C-terminal fragments, their transactivation role, and their lack of homology to any described domain prompted us to ask what the structural character of these regions is and how it relates to their function. For a long time, it was believed that spontaneous folding into a well-defined, stable tertiary structure is required for protein function [51]. However, it is now known that more than 20–30% of eukaryotic proteins lack a stable tertiary structure under physiological conditions yet still perform important biological functions. Such proteins are referred to as intrinsically disordered proteins (IDPs). In addition, over 70% of proteins involved in signal transduction cascades contain long intrinsically disordered regions (IDRs). Importantly, the lack of a defined structure is critical for the functionality of IDPs and IDRs [52]. Moreover, their conformational plasticity and extended shape make them frequent targets of post-translational modifications (phosphorylation, acetylation, methylation, and others) that regulate protein activity [53]. IDPs have been identified as components of cellular signaling and control mechanisms and of protein interaction networks [54]. IDPs have also been shown to take part in disease-related signal transduction; for example, intrinsically disordered amyloid β-peptides are involved in Alzheimer's disease [55]. IDPs are therefore considered potential targets for drug design strategies.
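A simple way to illustrate the sequence features typically associated with disorder is the charge-hydropathy classification, which is not part of the analysis described in this review and is used here only as an example. The minimal sketch below assumes the Kyte-Doolittle hydropathy scale rescaled to the range 0 to 1, treats histidine as neutral, and uses the commonly cited boundary ⟨H⟩ = (⟨R⟩ + 1.151)/2.785, where ⟨H⟩ is the mean hydropathy and ⟨R⟩ the absolute mean net charge. The function names and both test sequences are made up; dedicated disorder predictors should be preferred for real proteins.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy rescaled from [-4.5, 4.5] to [0, 1]."""
    seq = sequence.upper()
    if not seq or any(res not in KYTE_DOOLITTLE for res in seq):
        raise ValueError("sequence must contain only the 20 standard residues")
    return sum((KYTE_DOOLITTLE[res] + 4.5) / 9.0 for res in seq) / len(seq)


def mean_net_charge(sequence: str) -> float:
    """Absolute mean net charge; K/R count +1, D/E count -1 (His assumed neutral)."""
    seq = sequence.upper()
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return abs(positive - negative) / len(seq)


def predicted_disordered(sequence: str) -> bool:
    """True if the sequence lies on the disordered side of the charge-hydropathy boundary."""
    boundary = (mean_net_charge(sequence) + 1.151) / 2.785
    return mean_hydropathy(sequence) < boundary


# Made-up toy sequences (assumption: not fragments of any bHLH-PAS protein).
charged_toy = "EEKKSEDRPESKEEDK"      # polar and charged residues dominate
hydrophobic_toy = "LIVAVFLGAILVAMLV"  # hydrophobic residues only
print(predicted_disordered(charged_toy), predicted_disordered(hydrophobic_toy))
```

The classification uses composition only, so it cannot locate a disordered region within a longer, partly folded protein; for that purpose the calculation would have to be applied to sequence windows or replaced with a per-residue predictor.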
