### **1. Introduction to bHLH–PAS Proteins**

The basic helix–loop–helix/Per-ARNT-SIM (bHLH–PAS) proteins are a class of transcriptional regulators found widely across living organisms. They play an important role in regulating a variety of developmental and physiological events [1]. For example, cellular and systemic oxygen homeostasis is maintained by hypoxia-inducible factor 1α (HIF1-α) [2]. Under hypoxic conditions, HIF1-α is translocated to the nucleus [3], where it regulates the transcription of genes involved in angiogenesis, cell proliferation/survival, glucose metabolism, and iron metabolism. Dysregulation of these processes underlies many diseases, including cancer, stroke, and heart disease [2]. Some bHLH–PAS family members act as receptors for ligands of high and low molecular weight [1]. The only known bHLH–PAS protein activated by small ligands, the aryl hydrocarbon receptor (AHR), is involved in toxin metabolism and binds highly toxic ligands such as TCDD [4]. Ligand-bound AHR migrates to the nucleus and mediates a wide range of biological responses to toxicants, including a wasting syndrome, hepatotoxicity, teratogenesis, and tumor promotion [4]. Overexpression and constitutive activation of the AHR have been observed in various types of tumors [5]. Importantly, the AHR has been described as a critical modulator of host–environment interactions, especially in immune and inflammatory responses [6].

Another interesting example of a bHLH–PAS family member is the single-minded protein (SIM), which plays a significant role in the development of the central nerve cord [7] and the genital imaginal disc [8]. SIM gene mutations have been shown to contribute to certain dysmorphic features of brain development and to the mental retardation seen in Down syndrome [9]. Interestingly, SIM overexpression is also associated with breast and prostate cancer [10], which points to connections between these apparently unrelated signaling pathways.

Members of the bHLH–PAS family have been shown to be potential targets for disease therapy. AHR, which is highly expressed in multiple organs and tissues, may influence tumorigenesis both through a direct effect on cancer cells and through modulation of the immune system. For this reason, the development of selective AHR modulators active against multiple tumors is a desirable direction of research [11]. Targeting the HIF1-α pathway as a novel cancer therapy is also under investigation [12]. Because AHR was shown to modulate the immune response in the respiratory tract, it could also serve as a therapeutic target for the treatment of various inflammatory lung diseases [13,14]. Another member of the family, neuronal PAS domain-containing protein 4 (NPAS4), which is expressed mainly in the brain, has been proposed as a novel therapeutic target for depression and neurodegenerative diseases [15] and as a component of new stroke therapies [16]. Additionally, NPAS4, whose expression was also detected in the pancreas, has been proposed as a therapeutic target for diabetes [17] and as a treatment during pancreas transplantation [18].

Despite performing a wide diversity of functions, members of the bHLH–PAS protein family share a relatively well-conserved domain structure in the N-terminal part of their sequences (Figure 1). The bHLH region contains approximately 60 amino acid (aa) residues and can be divided into two functionally distinct parts: the basic region responsible for DNA binding (approximately 15 aa), and the neighboring C-terminal HLH region, which takes part in protein dimerization [19]. The PAS domain is located in the central part of the protein and usually comprises about 300 aa residues [1]. It is divided into two structurally conserved regions named PAS-A and PAS-B, which are often connected to a single PAS-associated C-terminal (PAC) motif [20]. The PAS-A and PAS-B regions are separated by a poorly conserved linker [1]. The PAS-A region is critical for selecting a dimerization partner and ensuring the specificity of target gene activation [21]. The PAS-B region is usually responsible for sensing diverse exogenous and endogenous signals, a process accompanied by energetic and conformational changes that regulate protein activity [21]. In contrast to the conserved N-terminal domains, the C-termini of bHLH–PAS proteins show significant variability [21] and contain variable transcription activation/repression domains (TAD/RPD) (Figure 1) [22,23]. An example is mammalian SIM, which exists in two isoforms: SIM1 and SIM2. Both isoforms show high amino acid identity in their N-termini (90% identity in the bHLH and PAS regions) and extreme diversity in their C-termini [24]. While SIM1 activates the expression of target genes, SIM2 acts as an inhibitor. Interestingly, this opposing transcriptional effect disappears after the deletion of both the SIM1 and SIM2 C-termini, resulting in proteins with similar activity [25,26]. Moffet and Pelletier [26] demonstrated that the distinct SIM2 C-terminal sequence comprises two repression domains with a high proline/serine and proline/alanine content, respectively. Such composition is a feature of "repressor motifs", which can also be found in a large number of other transcriptional repressors [25,26]. Because of their highly variable amino acid sequence and lack of predefined domains, the C-termini are believed to be responsible for the specific modulation of bHLH–PAS protein function and for the recognition of partner proteins necessary for their unique action [21].

**Figure 1.** Schematic representation of the bHLH–PAS protein domain structure. The N-terminal part of bHLH–PAS proteins is characterized by the presence of defined domains: bHLH (blue), PAS-A and PAS-B (green). The C-terminal part presents significant diversity and contains variable transactivation/repression domains (TAD/RPD). The C-termini of selected proteins (HIF1-α, AHR/ARNT, SIM1, and SIM2) are presented. Yellow boxes indicate TADs while the red box indicates RPD. Based on [26–29].

Generally, bHLH–PAS proteins can be divided into two classes. While the expression of class I proteins is specifically regulated by diverse physiological states and/or environmental signals [30], class II proteins are expressed continuously and serve as heterodimerization partners for class I members. Only a dimer of two bHLH–PAS proteins acts as a functional transcription factor complex, regulating the expression of the genes under its control [22]. Mammalian bHLH–PAS transcription factors are listed in Table 1.


**Table 1.** Mammalian class I and class II bHLH–PAS proteins [1,21,30–32].

### **2. bHLH–PAS Protein Conservation between Organisms**

bHLH–PAS proteins are highly conserved among different organisms, including vertebrates and invertebrates [33]. Most mammalian representatives have orthologs in insect species. An example is the *Drosophila melanogaster* TANGO (TGO) protein, which is a homologue of the mammalian class II protein ARNT [34]. TGO is known as the general dimerization partner for Similar (SIMA), Trachealess (TRH), Single-minded (SIM), Spineless (SS), and Dysfusion (DYS), performing functions equivalent to those of their mammalian counterparts.

In 2017, the Nobel Prize in Physiology or Medicine was awarded to J. C. Hall, M. Rosbash, and M. W. Young for their discoveries of molecular mechanisms controlling the circadian rhythm in *D. melanogaster*. The two bHLH–PAS transcription factors CLOCK and CYCLE have been shown to play a key role as transcriptional activators of the *period* (*per*) and *timeless* (*tim*) genes [35,36]. Thanks to the conservation of circadian bHLH–PAS proteins between *D. melanogaster* and mammals [35], the explanation of the fly daily rhythm enabled the understanding of a similar, though much more complicated, process in mammals, which is controlled by two heterodimers orthologous to CLOCK/CYCLE: CLOCK/BMAL1 and NPAS2/BMAL1 [22].

Despite these significant similarities, some differences between vertebrates and invertebrates can be noted. The bHLH–PAS transcription factor Methoprene-tolerant protein (MET) occurs exclusively in insects and to date has no known ortholog in non-arthropod organisms. MET has recently been confirmed as the juvenile hormone (JH) receptor, playing a significant role during insect development and maturation [37]. Interestingly, a few insect species, such as *D. melanogaster* and *Bombyx mori*, have MET paralogs, named germ-cell expressed (GCE) and MET2, respectively [38]. MET and GCE participate in modulating JH signaling during *D. melanogaster* development, but their functions are not fully redundant and the proteins exhibit tissue-specific distribution [39]. In turn, the function of the MET2 protein in *B. mori* is not yet defined [40].
