**2. The Role of the bHLH Proteins in Transcription**

The regulation of gene expression by multiple transcription factors, cofactors and chromatin regulators establishes and maintains a specific state of a cell. Inaccurate regulation of transmitted signals can result in diseases and severe disorders [46]. Transcription therefore requires a balanced orchestration of adjustable protein complexes. A key regulator of transcription is Mediator, a multi-subunit complex that interacts with RNA polymerase II (Pol II) and coordinates the action of numerous co-activators and co-repressors [47–50]. The function of Mediator is conserved in all eukaryotes, although the individual subunits have diverged considerably in some organisms [51,52].

To date, interactions with Mediator subunits and/or chromatin-remodeling histone acetyltransferases/deacetylases have been reported for some representatives of the bHLH family. In plants, the Mediator complex is a core element of transcription regulation that is important for immunity [53]. In *Arabidopsis thaliana*, jasmonate signaling and resistance to the fungus *Botrytis cinerea* were shown to depend on the interaction between the MED25 subunit of Mediator and MYC2 [54–56], and on the interaction of the MED8 subunit with FAMA, which also belongs to the bHLH family [57]. Sterol regulatory element binding proteins (SREBPs), class II bHLH TFs (Table 1), are transcription activators critical for the regulation of cholesterol and fatty acid homeostasis in animals. Human SREBPs were shown to bind the CBP/p300 acetyltransferase [58] and the MED15 subunit of Mediator to activate target genes [59]. Yeast Ino2 was also shown to bind the MED15 subunit of the Mediator tail [60].

TAL1 (class II, Table 1) is required for the specification of the blood lineage and the maturation of several hematopoietic cell types. TAL1/SCL is considered a master TF that delineates the cell fate and identity of progenitor and normal hematopoietic stem cells (HSCs). It regulates other hematopoietic TFs and thus has potential for cell reprogramming [22]. TAL1 also binds the CBP/p300 acetyltransferase [61,62]. Similarly, MyoD, a myogenic regulatory factor that controls skeletal muscle development, binds CBP and recruits histone acetyltransferase to activate the myogenic program [63]. Cao et al. showed that MyoD modifies chromatin structure and accessibility in myoblasts [64]. ASCL1 (class II, Table 1) was shown to be a pioneer factor that promotes chromatin accessibility and enables chromatin binding by other TFs [65]. Recently, AHR (bHLH-PAS, Table 1) was also suggested to be a pioneer factor that regulates DNA methylation during embryonic development by an as yet unknown mechanism [66]. In clear cell renal cell carcinoma (ccRCC), the most frequent mutation inactivates the von Hippel-Lindau (VHL) tumor suppressor, leading to genome-wide enhancer and super-enhancer remodeling. This process is mediated by the interaction of HIF2α and HIF1β (bHLH-PAS, Table 1) with the histone acetyltransferase p300 [67]. CLOCK, another member of the bHLH-PAS subfamily (Table 1), was shown to mediate histone acetylation in a circadian time-specific manner [68].

Interestingly, HEY proteins, members of the bHLH-O family (class VI, Table 1), can function as transcription repressors as well as transcription activators. They were shown to bind DNA directly and to interact with histone deacetylases and other TFs [28,69]. Gene activation by HEY, on the other hand, is regulated indirectly. The presence of multiple HEY binding sites located downstream of and close to the transcriptional start site led to the hypothesis that HEY influences the pausing/elongation switch of Pol II [70]. Although most TFs stimulate transcription initiation, MYC (class III, Table 1) was shown to stimulate transcription elongation by recruiting an elongation factor [71]. The presented studies indicate that the crucial role of bHLH proteins in maintaining the transcriptional regulation of important developmental (e.g., cell differentiation) and oncogenic pathways depends on multiple interactions with the basal transcriptional machinery.

### **3. The bHLH Transcription Factors as IDPs**

Intrinsically disordered proteins (IDPs), discovered in the 1990s, overturned the paradigm derived from Anfinsen's work, which states that functional proteins must possess a well-defined, ordered, three-dimensional structure [72]. It is now known that a large number of proteins are perfectly functional, or even multifunctional, in a disordered state in which the polypeptide chain undergoes rapid conformational fluctuations [73–76]. Intrinsic disorder can be spread throughout the whole polypeptide chain, or it can be limited to intrinsically disordered regions (IDRs) of various lengths that are accompanied by well-folded domains [77]. The unique properties of disordered proteins originate from their unusual amino acid composition [78]. IDPs/IDRs are depleted in order-promoting amino acid residues (those with hydrophobic, aromatic or aliphatic side chains). In contrast, they possess an unusually high content of charged and hydrophilic amino acid residues [79–81]. As a consequence, disordered polypeptide chains typically have a high net charge and low hydrophobicity [82]. IDPs are pliable and highly dynamic molecules with interconvertible conformations. They may completely or almost completely lack regular secondary structure. However, the content of secondary structure may also be quite significant, and the molecules can exist in a molten globule state [83–85]. Various in silico analyses indicated that the proportion of disordered proteins is drastically higher in eukaryotes than in prokaryotes [86]. This difference reflects the complexity of the signaling pathways in which IDPs/IDRs play a crucial role [87]. Owing to their flexible and dynamic nature, IDPs/IDRs can form fuzzy complexes, adopting various conformations [88]. Consequently, one IDP can form multiple interactions with various partners. Because particular residues in a disordered chain are highly accessible, the interaction pattern can be easily modified by posttranslational modifications [89]. For that reason, IDPs/IDRs often serve as molecular hubs, modulators and sensors of cellular signals [85].
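The two sequence properties mentioned above, net charge and hydrophobicity, can be estimated directly from an amino acid sequence. The following is a minimal sketch, not the method used in the cited studies. It assumes the Kyte-Doolittle hydropathy scale rescaled to the range 0–1, and a simple charge model in which K and R count as +1, D and E as −1, and histidine is ignored. Both example sequences are hypothetical and serve only as an illustration.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the text does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

POSITIVE = set("KR")  # assumed to carry +1 at neutral pH
NEGATIVE = set("DE")  # assumed to carry -1 at neutral pH


def mean_net_charge(sequence):
    """Absolute net charge divided by the sequence length."""
    seq = sequence.upper()
    charge = sum((aa in POSITIVE) - (aa in NEGATIVE) for aa in seq)
    return abs(charge) / len(seq)


def mean_hydropathy(sequence):
    """Mean Kyte-Doolittle hydropathy, rescaled so that each residue lies in 0-1."""
    seq = sequence.upper()
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from any bHLH protein.
    examples = {
        "charged, polar": "SEEKKRSPEDEKSGRE",
        "hydrophobic": "MLVIAFLGVAWLIVCA",
    }
    for label, seq in examples.items():
        print(label, round(mean_net_charge(seq), 3), round(mean_hydropathy(seq), 3))
```

A sequence enriched in charged and polar residues gives a lower mean hydropathy than a sequence enriched in aliphatic and aromatic residues. Dedicated predictors should be preferred for any real analysis.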

bHLH TFs control developmental processes such as retinal development, proliferation of progenitors, neurogenesis and gliogenesis. Importantly, this control relies on direct interactions among bHLH TFs and on interactions of bHLH TFs with homeodomain factors, which form complexes that bind to specific promoters [90,91]. Transcription of muscle-specific genes during skeletal muscle development also depends on interactions of the specific bHLH TFs MyoD, Myogenin, Myf5 and MRF4 with ubiquitously expressed bHLH E-proteins (E12, E47, TCF4, HEB). Interestingly, MyoD was shown to interact with two isoforms of HEB, HEBα and HEBβ, which differentially regulate the transcriptional activity of MyoD not only on different promoters but also on the same promoter [92]. Also of interest is the ability of ID4 to recruit multiple ID proteins to assemble higher-order complexes. ID4 restores DNA binding by the E47 protein even in the presence of the repressors ID1 and ID2. Additionally, the ID proteins can interact with non-bHLH partners, expanding the regulatory network of ID4 [42]. As a consequence, the ID proteins have been proposed as a 'hub' for the coordination of multiple cancer events [27]. These examples illustrate the ability of bHLH TFs to interact with many partners in distinct ways. We suggest that this is related to the disordered character of the bHLH proteins. This hypothesis is substantiated by some experimental studies. The neurogenic bHLH transcription factor Neurogenin 2 (Ngn2) was shown to possess a long IDR whose phosphorylation regulates the activity of the protein [93]. Interestingly, although the bHLH domain was considered to be a stable, well-ordered structure, a partially disordered character of this domain was reported for NeuroD [94], MYC and MAX [95]. 
We performed in silico analyses to predict the presence of intrinsic disorder and to gain insight into the degree of flexibility of bHLH proteins representing all established classes (see Table 1): hHEB (class I), hMYOD (class II), hMYC and atMYC2 (class III) (Figure 1); hMAD1 and hMAX (class IV), hID4 (class V), hHES1 (class VI) (Figure 2); hAHR, hHIF-1α, hCLOCK and hARNT (class VII) (Figure 3). We used PONDR-VLXT [96,97] (http://www.pondr.com/) for the disorder prediction and DynaMine [98,99] (http://dynamine.ibsquare.be/submission/) for the prediction of protein backbone flexibility.
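The interpretation of the two predictors follows the thresholds given in the figure legends: a PONDR score above 0.5 indicates disorder, while a DynaMine S<sup>2</sup> value above 0.8 indicates a rigid conformation, 0.69–0.8 is context dependent and a value below 0.69 indicates a flexible conformation. The sketch below applies these thresholds to per-residue scores. It assumes the scores have already been obtained from the respective servers and stored as plain Python lists. The function names, the `min_length` parameter and the example values are hypothetical and are not part of either tool.

```python
def classify_dynamine(s2):
    """Assign a DynaMine S2 value to one of the three zones used in the figures."""
    if s2 > 0.8:
        return "rigid"
    if s2 < 0.69:
        return "flexible"
    return "context-dependent"


def disordered_regions(scores, threshold=0.5, min_length=1):
    """Return 1-based, inclusive (start, end) ranges where the score exceeds the threshold.

    `min_length` is an illustrative filter for discarding very short stretches.
    """
    regions = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = position
        elif start is not None:
            if position - start >= min_length:
                regions.append((start, position - 1))
            start = None
    # Close a region that runs to the end of the sequence.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


if __name__ == "__main__":
    # Hypothetical per-residue values, for illustration only.
    pondr_scores = [0.2, 0.6, 0.7, 0.3, 0.9]
    s2_values = [0.85, 0.75, 0.60]
    print(disordered_regions(pondr_scores))
    print([classify_dynamine(value) for value in s2_values])
```

Reporting contiguous ranges rather than single residues makes it easier to compare predicted IDRs with domain boundaries such as the bHLH domain.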

Human HEB, a representative of class I, shows a high content of sequences predicted to be disordered and flexible. The only highly ordered/rigid region is found at residues 577–630, which comprise the bHLH domain (Figure 1A). Based on the prediction results, we assume that HEB is an IDP. hMYOD, a class II TF, also presents a high content of flexible IDRs, especially in the C-terminal part of the protein (Figure 1B). As representatives of class III we chose hMYC (Figure 1C), for which partial disorder of the bHLH domain was experimentally documented [95], and *Arabidopsis thaliana* MYC2 (Figure 1D). The presence of flexible IDRs was predicted for both proteins, although their locations differed.

**Figure 1.** Prediction of intrinsically disordered regions. The top panel presents the domain structure of the analyzed bHLH proteins. The dark grey rectangle indicates the position of the bHLH domain; light grey indicates the leucine zipper. The bottom panel presents a prediction of intrinsically disordered and flexible regions based on the amino acid sequence of the proteins. Predictions were performed using PONDR-VLXT (left Y axis) and DynaMine (right Y axis) software. For the PONDR prediction, a score above 0.5 indicates disorder. For DynaMine, an S<sup>2</sup> value above 0.8 (blue zone) indicates a rigid conformation, 0.69–0.8 (grey zone) is context dependent and a value below 0.69 (green zone) indicates a flexible conformation. (**A**) class I human HEB [Q99081], (**B**) class II human MYOD [P15172], (**C**) class III human MYC [P01106-2] and (**D**) *Arabidopsis thaliana* MYC2 [Q39204].

**Figure 2.** Prediction of intrinsically disordered regions. The top panel presents the domain structure of the analyzed bHLH proteins. The dark grey rectangle indicates the bHLH domain; light grey indicates the leucine zipper or Orange domain. The bottom panel presents a prediction of intrinsically disordered and flexible regions based on the amino acid sequence of the proteins. Predictions were performed using PONDR-VLXT (left Y axis) and DynaMine (right Y axis) software. For the PONDR prediction, a score above 0.5 indicates disorder. For DynaMine, an S<sup>2</sup> value above 0.8 (blue zone) indicates a rigid conformation, 0.69–0.8 (grey zone) is context dependent and a value below 0.69 (green zone) indicates a flexible conformation. (**A**) class IV human MAD [Q9Y6D9] and (**B**) human MAX [P61244], (**C**) class V human ID4 [P47928], (**D**) class VI human HES1 [Q14469].

Human MAD1, a representative of class IV, also shows a high content of sequences predicted to be disordered and flexible (Figure 2A). Interestingly, the IDRs of hMAX, which belongs to the same class, are located at the N- and C-termini of the protein, while the middle part is predicted to have a more rigid structure (Figure 2B). ID4, which belongs to class V (transcriptional inhibitors), presents a flexible IDR in the C-terminal part of the protein and a shorter one in the N-terminal part (Figure 2C). For the class VI member human HES1, the analysis shows high flexibility/disorder in the central part of the protein in addition to similarly located N- and C-terminal IDRs (Figure 2D).

The class VII proteins comprise the bHLH-PAS subfamily, whose members possess, in addition to the bHLH domain, a PAS domain responsible for binding ligands and cofactors. Importantly, their C-termini are usually responsible for regulating the activity of the protein and of the complexes it forms [100]. Human AHR, HIF-1α and CLOCK belong to subclass I of specialized factors, while human ARNT (subclass II) is one of the general partners that dimerize with the subclass I proteins and is important for their activity. For hAHR, relatively short IDRs were predicted within the middle, N-terminal and C-terminal parts of the protein (Figure 3A). In contrast, the other bHLH-PAS members contain longer IDRs that comprise most of the C-terminal half of the protein and are predicted to be highly flexible (hHIF-1α, Figure 3B; hCLOCK, Figure 3C; hARNT, Figure 3D).

**Figure 3.** Prediction of intrinsically disordered regions of the class VII bHLH-PAS proteins. The top panel presents the domain structure of the analyzed bHLH-PAS proteins. The dark grey rectangle indicates the bHLH domain; light grey indicates PAS/PAC domains. The bottom panel presents a prediction of intrinsically disordered and flexible regions based on the amino acid sequence of the proteins. Predictions were performed using PONDR-VLXT (left Y axis) and DynaMine (right Y axis) software. For the PONDR prediction, a score above 0.5 indicates disorder. For DynaMine, an S<sup>2</sup> value above 0.8 (blue zone) indicates a rigid conformation, 0.69–0.8 (grey zone) is context dependent and a value below 0.69 (green zone) indicates a flexible conformation. (**A**) human AHR [P35869], (**B**) human HIF-1α [Q16665], (**C**) human CLOCK [O08785], (**D**) human ARNT [P27540].

To date, the only report concerning the structure of a full-length bHLH protein is the above-mentioned study showing that Neurogenin 2 is an IDP [93]. Based on the presented predictions and our own experience with the expression of selected bHLH proteins (unpublished), we assume that this scarcity of structural data is due to the relatively high content of IDRs, which makes overexpression and purification extremely difficult because of a propensity to aggregate and a high sensitivity to proteases.

#### **4. The Role of IDPs in Maintaining/Creation of LLPS**

Over the last decade, since Hyman and co-workers published their pioneering work on the physical nature of P-bodies [101], many molecular biologists and biophysicists have focused on the significance of spontaneous, thermodynamically driven liquid-liquid phase separation (LLPS) in biological systems. LLPS leads to the formation of dense liquid condensates that stably coexist with the dilute phase [101,102]. At the molecular level, LLPS was shown to be driven by multiple weak and transient interactions that engage IDPs/IDRs [101,103–106]. Highly charged regions of opposite charge distributed repetitively within IDRs, short motifs such as YG/S-, FG-, RG-, GY-, KSPEA- and SY-, and Q/N-rich regions form multivalent interactions between condensate components [107]. A model for condensate formation and composition proposes that some proteins act as scaffolds, while others act as clients. The scaffolds are modular proteins containing repeated motifs that enable heterotypic scaffold-scaffold interactions. Because they undergo spontaneous LLPS, they are essential for the structural integrity of a condensate [108,109]. Directly interacting sequences, called stickers, are usually multivalent, whereas the intervening sequences that separate stickers, called spacers, are responsible for the properties of a condensate [110]. Highly charged and flexible IDRs are in fact frequently identified as scaffolds [108,111]. The clients are recruited into condensates by binding to free, unoccupied scaffold sites [108]. A growing body of evidence indicates that LLPS constitutes a fundamental mechanism for compartmentalizing the intracellular space. Condensates formed by LLPS serve as functional centres for biochemical reactions in the cytoplasm and in membrane-bound organelles, including the nucleus.
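The short motifs listed above can be located in a sequence with simple pattern matching. The sketch below counts occurrences of each motif and the fraction of Q and N residues. It is only an illustration under stated assumptions: "YG/S" is read as Y followed by G or S, overlapping occurrences are counted separately, and raw motif counts are treated as a crude proxy that does not by itself predict phase separation. The function names and example sequences are hypothetical.

```python
import re

# Motifs named in the text; "YG/S" is assumed to mean Y followed by G or S.
MOTIF_PATTERNS = {
    "YG/S": r"Y[GS]",
    "FG": r"FG",
    "RG": r"RG",
    "GY": r"GY",
    "SY": r"SY",
    "KSPEA": r"KSPEA",
}


def count_motifs(sequence):
    """Count motif occurrences, including overlapping ones, via zero-width lookahead."""
    seq = sequence.upper()
    return {
        name: len(re.findall(f"(?={pattern})", seq))
        for name, pattern in MOTIF_PATTERNS.items()
    }


def qn_fraction(sequence):
    """Fraction of residues that are glutamine (Q) or asparagine (N)."""
    seq = sequence.upper()
    return (seq.count("Q") + seq.count("N")) / len(seq)


if __name__ == "__main__":
    # Hypothetical low-complexity sequence, for illustration only.
    toy_sequence = "GYSGRGFGQNQSYGKSPEAQN"
    print(count_motifs(toy_sequence))
    print(round(qn_fraction(toy_sequence), 3))
```

The lookahead form `(?=...)` is used so that adjacent motifs sharing a residue, such as SY followed by YG, are both counted.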

The structural and functional organisation of the nuclear interior was long believed to rely solely on a rigid, insoluble nuclear matrix [112]. A/T-rich DNA sequences known as scaffold/matrix associated regions (S/MARs) attach to the nuclear matrix and organise chromatin into higher-order structures that comprise distinct loops and functional units attached to the matrix [113]. That concept is now giving way to a new one, in which dynamic, spontaneously formed condensates such as the nucleolus, splicing speckles, Cajal bodies and PML bodies are the key structural and functional components of the nuclear interior. The barrier-free character of liquid condensates allows rapid exchange of their components with the surroundings, so they form an ideal environment for biochemical reactions. On the other hand, nuclear condensates have a stable, inert, well-defined structure and can be purified by biochemical methods [114]. The concentration of nucleolar components was shown to be close to saturation [115]. This means that small changes in nuclear conditions can drive spontaneous LLPS. In fact, association/dissociation events of nuclear condensates regulate many processes related to gene expression [116], including chromatin structure organisation [117], RNA processing [118] and ribosome biogenesis [119]. Importantly, LLPS was shown to be involved in the formation of some functional condensates that regulate gene transcription [76,120–122].
