*Article* **A Genomic Approach to Delineating the Occurrence of Scoliosis in Arthrogryposis Multiplex Congenita**

**Xenia Latypova <sup>1</sup> , Stefan Giovanni Creadore <sup>2</sup> , Noémi Dahan-Oliel 3,4, Anxhela Gjyshi Gustafson <sup>2</sup> , Steven Wei-Hung Hwang <sup>5</sup> , Tanya Bedard <sup>6</sup> , Kamran Shazand <sup>2</sup> , Harold J. P. van Bosse <sup>5</sup> , Philip F. Giampietro 7,\* and Klaus Dieterich 8,\***


**Abstract:** Arthrogryposis multiplex congenita (AMC) describes a group of conditions characterized by the presence of non-progressive congenital contractures in multiple body areas. Scoliosis, defined as a coronal plane spine curvature of ≥10 degrees as measured radiographically, has been reported to occur in approximately 20% of children with AMC. To identify genes that are associated with both scoliosis as a clinical outcome and AMC, we first queried the DECIPHER database for copy number variations (CNVs). Upon query, we identified only two patients with both AMC and scoliosis (AMC-SC). The first patient contained CNVs in three genes (*FBN2*, *MGF10*, and *PITX1)*, while the second case had a CNV in *ZC4H2*. Looking into small variants, using a combination of Human Phenotype Ontogeny and literature searching, 908 genes linked with scoliosis and 444 genes linked with AMC were identified. From these lists, 227 genes were associated with AMC-SC. Ingenuity Pathway Analysis (IPA) was performed on the final gene list to gain insight into the functional interactions of genes and various categories. To summarize, this group of genes encompasses a diverse group of cellular functions including transcription regulation, transmembrane receptor, growth factor, and ion channels. These results provide a focal point for further research using genomics and animal models to facilitate the identification of prognostic factors and therapeutic targets for AMC.

**Keywords:** Amyoplasia; scoliosis; DECIPHER (DatabasE of genomiC variation and Phenotype in Humans using Ensemble Resources); CNV (copy number variant); DA (distal arthrogryposis); IPA (ingenuity pathway analysis); HPO (human phenotype ontology); akinesia; MYOD; IGF2

### **1. Introduction**

Arthrogryposis multiplex congenita (AMC or interchangeably arthrogryposis) describes a group of conditions characterized by the presence of non-progressive congenital contractures in multiple body areas [1]. The congenital contractures are a result of decreased fetal movement (fetal akinesia), leading to joint fibrosis and dysplasia/lack of elasticity of the soft tissues surrounding the joint. The longer the duration and the earlier the onset of fetal akinesia, the more severe the contractures. The direct causes of fetal akinesia are

**Citation:** Latypova, X.; Creadore, S.G.; Dahan-Oliel, N.; Gustafson, A.G.; Wei-Hung Hwang, S.; Bedard, T.; Shazand, K.; van Bosse, H.J.P.; Giampietro, P.F.; Dieterich, K. A Genomic Approach to Delineating the Occurrence of Scoliosis in Arthrogryposis Multiplex Congenita. *Genes* **2021**, *12*, 1052. https://doi.org/ 10.3390/genes12071052

8

Academic Editor: Stephen Robertson

Received: 18 May 2021 Accepted: 29 June 2021 Published: 8 July 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

varied, including abnormalities of the gross or microscopic neurologic system (from brain architecture to anterior horn cell formation), abnormalities of muscle function, restrictive connective tissue conditions, as well as intrauterine crowding and maternal disease. There are more than 400 underlying conditions identified that can lead to fetal akinesia and subsequently to a baby born with AMC, and most of these conditions have known genetic causes [2,3]. Due to the heterogeneity of the condition, several classification systems for AMC exist [1,4,5]. The classification system by Bamshad et al. separates the AMC types by their cardinal features, creating three categories of a roughly equal number of cases. The first category consists of only one diagnosis, Amyoplasia, which represents approximately one-third of cases of arthrogryposis. This "classic arthrogryposis" is a distinct clinical entity presenting with hypoplasia or atrophy of specific muscle groups, and multiple joint contractures. Features at birth are very recognizable and include internal rotation and adduction of shoulders, extended elbows, flexed wrists, and equinovarus foot deformities [1]. Dimples over affected joints are evident. Other recognizable features include a lack of flexion creases on hands and nevus flammeus over the forehead. No underlying genetic abnormality or family history has been associated in cases with Amyoplasia. Therefore, Amyoplasia is postulated to have nongenetic causes, with an intra-uterine vascular interruption as the leading hypothesis [6]. The second category consists of the distal arthrogryposes (DAs), defined by the presence of congenital contractures of primarily the distal joints, primarily wrist and hand contractures and foot deformities (clubfoot or congenital vertical talus), but also to a lesser extent elbows, knees, shoulders, and hips. Underlying genetic causes have been described in most DAs. The current classification system for DA includes 11 subtypes, but as many as 19 different DAs have been suggested. The third category, Bamshad's syndromic category, is used to denote cases of arthrogryposis which may be associated with bone or central nervous system involvement and other birth defects or malformations. This category will probably undergo substantial reorganization in the coming years, as the similarities between different conditions become better understood and the underlying molecular causes unveiled.

The prevalence of scoliosis in children with AMC has been variably reported between 20% and 66%, although more recent studies place the prevalence in the range of 20 to 25% [7–10]. Scoliosis is defined as a coronal plane spine curvature of 10 degrees or greater as measured radiographically and can be separated into idiopathic, congenital, and syndromic or neuromuscular scoliosis. Idiopathic scoliosis represents a curvature of the spine for which no definitive underlying cause is yet known, although a number of candidate genes have been identified [11,12]. Congenital scoliosis is caused by vertebral malformations such as failure of formation (hemivertebrae) and/or failure of segmentation (congenital fusion of two or more vertebral levels). Very few arthrogrypotic conditions will have associated congenital vertebral malformations, therefore most cases of AMC-associated scoliosis are syndromic or neuromuscular. While most published series of children with AMC and scoliosis (AMC-SC) are relatively small, ranging from 14 [7] to 117 patients [9], the relatively high rate of spinal involvement is notable when assessing and treating children with AMC. Some types of AMC have a high association with scoliosis, while others rarely develop spinal deformities. Since most underlying conditions of AMC have known genetic causes, understanding these conditions could shed a light on pathways leading to scoliosis related to arthrogryposis [2,3].

Our primary goal for this paper was to characterize the genetics of the AMC types that have a strong association with scoliosis. We undertook a systematic review of all known genes associated with AMC, focusing on those with an association with scoliosis. We also analyzed copy number variants (CNVs) which represent structural variations in chromosome regions associated with duplication and deletion of genomic material, for their possible role in arthrogryposis and scoliosis. By delineating genes associated with both conditions, common pathways and potential mechanisms were identified to improve our understanding of the natural history of some forms of arthrogryposis, provide prognostic

information for health care providers and families caring for children with arthrogryposis, and possibly lead to targeted therapies for affected patients.

### **2. Materials and Methods**

To delineate the genes associated with both AMC and scoliosis (AMC-SC), their common pathways, and potential mechanisms, we first identified the genes associated with AMC as well as the genes associated with scoliosis. We then identified the common genes to both sets of conditions and conducted Gene Interaction Pathway Analysis, followed by an identification of the CNVs for the identified genes. Each of these steps are detailed below.

### *2.1. Identification of Genes Associated with AMC*

Two previous gene ontology articles published in 2016 and 2019 established a group of 402 genes associated with AMC, which were used as the initial source to identify the genes associated with AMC [2,3]. In addition, we consulted the literature in PubMed from 2019 until 31 December 2020, to identify additional genes since 2019 that are associated with AMC. We identified 30 additional genes (see Table S1) resulting in a total of 444 AMC-associated genes (Figure 1). *Genes* **2021**, *12*, x FOR PEER REVIEW 4 of 21

**Figure 1.** Flow chart diagram used in the current study to identify relevant genes associated with both scoliosis and arthrogryposis multiplex congenita (AMC). **Figure 1.** Flow chart diagram used in the current study to identify relevant genes associated with both scoliosis and arthrogryposis multiplex congenita (AMC).

*2.3. Identification of Genes Associated with Both AMC and Scoliosis and Gene Interaction Pathway Analysis*  The list of genes identified for AMC and for scoliosis were compared to identify the Simultaneously, we extracted 112 genes associated with AMC using the Human Phenotype Ontology (HPO) project (identifier HP:0002804; accessed on Tuesday April 13th 2021, version hpo-web@1.7.9-hpo-obo@2021-02-08). Of these 112 genes, 94 were already identified by the literature search, and the remaining 18 genes were manually curated

genes that are associated with both. Ingenuity Pathway Analysis (IPA), which represents a functional analysis of a set of identified genes, was then conducted using the IPA Inge-

In order to identify CNVs associated with both AMC and scoliosis, we queried the DECIPHER (DatabasE of genomiC variation and Phenotype in Humans using Ensemble Resources) (https://decipher.sanger.ac.uk/) database to identify reported cases with scoliosis or vertebral malformation(s) with AMC (accessed on 9 February 2021). To do so, the 444 genes associated with AMC were queried through the implementation of an in-house Selenium-based automation software package written in the Python 3.8 programming language. The data points extracted into DECIPHER included gene name, number of associated genes, DECIPHER patient number, phenotype(s)/conditions, chromosome location, start position, end position, mode of inheritance, and genotype. The resulting DECI-PHER patient IDs with their associated data were then sub-sampled into identifiable cohort groups representing the phenotype(s) of interest including Arthrogryposis-like hand anomaly, Arthrogryposis Multiplex Congenita, Distal Arthrogryposis, and Scoliosis. We

*2.4. Identification of Copy Number Variants (CNV) Associated with AMC and Scoliosis* 

through a literature review for association with AMC, only 12 of which were found to be associated with AMC. These 12 genes were: *C12orf65*, *DSE, NEK9, PHGDH, PPP3CA*, *PSAT1*, *TBCD*, *VAMP1*, *CACN1E*, *CEP55, RFT1*and *SHPK.*

### *2.2. Identification of Genes Associated with Scoliosis*

The same method used for AMC was applied to identify the genes associated with scoliosis using the Human Phenotype Ontology (HPO) project (identifier HP:0002650; accessed on Tuesday April 13th 2021, version hpo-web@1.7.9-hpo-obo@2021-02-08). A total of 895 genes associated with scoliosis extracted using HPO were reviewed. An additional 16 genes reported in the literature based on Perez-Machado and colleagues' 2020 paper since then were also reviewed, of which three were duplicates among the 895 genes already identified, resulting in a total of 908 genes (see Figure 1) [12].

### *2.3. Identification of Genes Associated with Both AMC and Scoliosis and Gene Interaction Pathway Analysis*

The list of genes identified for AMC and for scoliosis were compared to identify the genes that are associated with both. Ingenuity Pathway Analysis (IPA), which represents a functional analysis of a set of identified genes, was then conducted using the IPA Ingenuity Systems QIAGEN, content version 60467501 software. A core analysis type and subsequent variant effect analysis were used to generate the outputs in each case.

### *2.4. Identification of Copy Number Variants (CNV) Associated with AMC and Scoliosis*

In order to identify CNVs associated with both AMC and scoliosis, we queried the DECIPHER (DatabasE of genomiC variation and Phenotype in Humans using Ensemble Resources) (https://decipher.sanger.ac.uk/) database to identify reported cases with scoliosis or vertebral malformation(s) with AMC (accessed on 9 February 2021). To do so, the 444 genes associated with AMC were queried through the implementation of an in-house Selenium-based automation software package written in the Python 3.8 programming language. The data points extracted into DECIPHER included gene name, number of associated genes, DECIPHER patient number, phenotype(s)/conditions, chromosome location, start position, end position, mode of inheritance, and genotype. The resulting DECIPHER patient IDs with their associated data were then sub-sampled into identifiable cohort groups representing the phenotype(s) of interest including Arthrogryposis-like hand anomaly, Arthrogryposis Multiplex Congenita, Distal Arthrogryposis, and Scoliosis. We then filtered for DECIPHER Patient IDs containing both arthrogryposis and scoliosis. The arthrogryposis and scoliosis DECIPHER Patient IDs containing single nucleotide variants (SNVs) were removed by subtracting the start position from the end position to identify the allelic depth and kept only copy number variation (CNV). Next, we removed any duplicates within our dataset resulting in an accurate representation of the copy number variant genes associated with arthrogryposis and scoliosis for the DECIPHER Patient IDs extracted.

### **3. Results**

Combining the initial literature search results with the HPO identified genes for AMC-SC independently yielded a total of 444 genes associated with AMC and 908 genes associated with scoliosis. When comparing these two sets of genes, 227 genes were found in common (Figure 2).

then filtered for DECIPHER Patient IDs containing both arthrogryposis and scoliosis. The arthrogryposis and scoliosis DECIPHER Patient IDs containing single nucleotide variants (SNVs) were removed by subtracting the start position from the end position to identify the allelic depth and kept only copy number variation (CNV). Next, we removed any duplicates within our dataset resulting in an accurate representation of the copy number variant genes associated with arthrogryposis and scoliosis for the DECIPHER Patient IDs

Combining the initial literature search results with the HPO identified genes for AMC-SC independently yielded a total of 444 genes associated with AMC and 908 genes associated with scoliosis. When comparing these two sets of genes, 227 genes were found

extracted.

**3. Results** 

in common (Figure 2).

**Figure 2.** Scoliosis-associated genes are indicated in the green box, AMC genes are indicated in the red box, and scoliosis and AMC genes are in the orange region.

> This set of 227 genes was then analyzed using IPA. Overall, this group of 227 genes encompasses a diverse group of cellular functions including transcription regulation, transmembrane receptors, growth factor-related genes, and ion channels. (Table 1).

**Table 1.** The 227 Genes Associated with AMC-SC Stratified by Function.

**Homeostasis:** mechanisms include genes associated with early-onset nuclear DNA excision/repair disorders (*ERCC1*, *ERCC2,* and *ERCC6*) [13]. Monoallelic mutations in ERCC1 are associated with cerebro-oculo-facio-skeletal syndrome. *ERLIN1* encodes for a lipid raft-associated protein localized to the mitochondrion and nuclear envelope, and is a component of the *ERLIN1/ERLIN2* complex. The complex mediates the endoplasmic reticulum-associated degradation of inositol 1,4,5-trisphosphate receptors (ITPRs) which are important in calcium homeostasis [14].

*BAG3, BIN1, ERCC1, ERCC2, ERCC6, ERLIN2, SELENON*

### **Table 1.** *Cont.*

**Cytoskeleton:** matrix proteins involved in the sarcomere such as nebulin, a giant protein of thick and thin filaments of striated muscle, encoded by *NEB*. Mutations in NEB are responsible for the majority of cases of nemaline myopathy [15] which can be diagnosed by Gomori trichrome staining on a muscle biopsy or by electron microscopic preparation. *ACTA1*, a member of the cytoskeletal grouping, encodes the principal skeletal muscle isoform of adult skeletal muscle, alpha-actin. Residing in the core of the thin filament of the sarcomere, it assists in the generation of muscle contraction [16]

*ACTA1, ACTB, ACTG1, AP1S2, COL12A1, COL13A1, COL1A1, COL1A2, COL2A1, COL3A1, COL6A1, COL6A2, COL6A3, COLEC11, DCX, DES, DYNC1H1, EMD, FBN1, FBN2, FLNA, FLNB, HSPG2, LMNA, NEB, SPTBN4, SYNE1, TBCD*

**Extra Cellular Matrix:** Extracellular matrix (ECM) protein-associated genes include *ADAMTS10* and *DCHS1. ADAMTS10* is a zinc-dependent protease composed of one cysteine-rich domain, and five thrombospondin type 1 (THBS1) repeats and plays an important role in the formation of the extracellular matrix [17]. *DHCS1* is a member of the protocadherin superfamily and encodes a transmembrane cell adhesion molecule responsible for apical anchoring in the brain [18].

### *ADAMTS10, CDON, DCHS1, MMP2, RAPSN*

**Signal Transduction:** Promotes signaling within a cell via enzyme network cascades to generate precise and appropriate physiologic responses, particularly in skeletal development. *FGFR3* codes for an important tyrosine kinase signal transducer in chondrocytes, functioning to attenuate cartilage growth. FGFR 1–4 transmit at least 18 different fibroblast growth factor (FGF) ligands, therefore, exhibiting a variety of physiological functions [19]. *GDF5* fulfills important functions with respect to bone and muscle [20]. Through its high affinity for BMPR1B, GDF5 positively regulates chondrogenesis, leading to SMAD signal transduction [21]. Through NOG mediated interaction, GDF5 paradoxically also negatively regulates chondrogenesis.

*ADGRG6, CAVIN1, CCDC22, CD96, CFL2, CRLF1, CRTAP, DOK7, EBP, FGFR1, FGFR2, FGFR3, GDF5, IFIH1, KIAA0586, MAGEL2, NF1, PEX5, PEX7, PMP22, RAB3GAP1, RAB3GAP2, STAC3, WNT5A, KBTBD13*

**Proto-oncogenes:** Proto-oncogenes act to facilitate dysregulated cell growth and differentiation. Mutations in *HRAS* are associated with Costello syndrome, characterized by distinct facial features, papilloma of the face, cardiac anomalies, growth restriction, developmental delays, and tumor predisposition. An *HRAS* mutation was identified in an infant with features of Costello syndrome and distal arthrogryposis [22].

### *AKT1, CBL, HRAS, RAB18, RET, SKI*

**Enzyme:** Account for the largest category of genes identified through IPA analysis. 7-dehydrocholesterol reductase (DHCR7) encodes the penultimate step in the cholesterol biosynthetic pathway. Smith-Lemli-Opitz Syndrome is an autosomal recessive disorder caused by an inherited deficiency of DHCR7 which is associated with a variety of birth defects, joint contractures, and intellectual disability [23]. *UBE3* which encodes E3 ubiquitin-protein ligase, a maternally expressed imprinted E3 ubiquitin-protein ligase expressed mainly in the brain, is an integral part of the ubiquitin protein degradation system. Angelman syndrome, characterized by severe cognitive impairment, seizures, an ataxic puppet-like gait, and paroxysms of laughter, is caused by an absence of expression of maternal *UBE3A* [24].

*ALG2, ASAH1, B3GAT3, CANT1, CHAT, CHST14, CHST3, DHCR7, DPAGT1, DSE, ECEL1, EXTL3, EZH2, FBXL4, FKRP, FUCA1, GAD1, GBA, GFPT1, GUSB, HSD17B4, INPP5K, LARGE1, MASP1, MTM1, NAA10, NEU1, OCRL, P3H1, PAFAH1B1, PHGDH, PLOD1, PLOD2, PLOD3, PMM2, POLR3A, POMT1, POMT2, POR, PPIB, PPP3CA, PSAT1, PTDSS1, RNASEH2A, RNASEH2B, RNASEH2C, SAMHD1, TOR1A, TREX1, UBA1, UBE3A, ZMPSTE24*

**Transcription Factor/regulation:** Transcription factors have a pivotal role in the regulation of genes associated with limb and muscle development. T-Box Transcription Factor 5 (*TBX5*) mutations are associated with Holt Oram syndrome characterized by upper limb defects and cardiac malformations [25,26]. *TRIP4* encodes ASC-1, a transcription co-activator. Infants with *TRIP4* mutations present with a congenital muscular dystrophy and respiratory failure. Muscle biopsy shows decreased mitochondria and sarcomere disorganization [27].

### *ARX, ASXL3, ATN1, ATRX, AUTS2, EGR2, FGD1, GZF1, IGHMBP2, IRF6, LMX1B, MED12, NSD1, PAX3, PLEKHG5, PQBP1, RBM10, SETBP1, SETX, SHOX, SOX9, TBX5, TRIP4, ZC4H2, ZEB2, ZIC2*

**Mitochondria:** Mitochondria are depended upon highly by the brain and skeletal muscle tissues for energy. Ganglioside Differentiation Associated Protein 1 (*GDAP1*) encodes a mitochondrial protein postulated to play a role in signal transduction in the brain. Mutations in *GDAP1* are associated with various subtypes of the hereditary and sensory-motor neuropathy disease Charcot Marie Tooth (CMT), including an autosomal recessive intermediate type [28–32]. *RMRP* codes for non-coding RNA involved in mitochondrial DNA replication through the encoding of a mitochondrial RNA processing endonuclease which cleaves mitochondrial RNA at a priming site necessary for mitochondrial DNA replication. Mutations in *RMRP* are associated with cartilage-hair hypoplasia [33]. *RMRP* is essential for early murine development [34].

*ATAD3A, C12orf65, GDAP1, GFM2, MFN2, RMRP, SPAR*

### **Table 1.** *Cont.*

**Membrane Receptor/Ion Channel:** Membrane receptor and ion channels is the second largest group of affected genes leading to AMC-SC. *CHRNA1* (cholinergic receptor nicotinic receptor alpha 1 subunit 1) is one of 5 subunits of the acetylcholine receptor (*AChR*). This gene encodes an alpha subunit and functions as part of acetylcholine binding and channel. Mutations in *CHRNA1* are associated with lethal multiple pterygium syndrome, characterized by the presence of multiple pterygia, intrauterine growth retardation, and flexion contractures resulting in severe arthrogryposis and fetal akinesia [35]. *PIEZO2* is postulated to function as an integral part of mechanically activated cation channel in somatosensory neurons through establishing connections between mechanical forces and biological signals. Mutations in *PIEZO2* are associated with distal arthrogryposis type 5, Gordon syndrome, and Marden–Walker syndrome [36].

*ATP7A, CHRNA1, CHRNB1, CHRND, CHRNE, CHRNG, GPC3, GRIN1, KCNA1, KCNH1, MEGF10, NALCN, NRXN1, NUP88, PIEZO2, PIGS, PIGT, ROR2, RYR1, SCN4A, SGCG, SLC12A6, SLC18A3, SLC26A2, SLC2A10, SLC35A3, SLC39A13, SLC5A7, SNAP25, SYT2, TGFBR1, TGFBR2, TRPV4, VAMP1, WASHC5*

**Kinase:** Kinases phosphorylate target molecules for activation or inactivation. ATR encodes a serine/threonine kinase and halts cell cycling entry upon DNA stress to enable DNA repair [37]. Compound heterozygous mutations in ATR are associated with Seckel syndrome characterized by dwarfism, microcephaly, and cognitive impairment [38]. MAP3K7 mediates cellular transduction in response to environmental changes through association with interleukin receptor (ILR1). Through the cytokine IL-1 mediated interaction with the hypothalamic IL-1 receptor, the hypothalamo-pituitary-adrenocortical axis and sympathetic nervous system pathways suppressing bone formation are activated [39]. Fronto-metaphyseal dysplasia, a progressive sclerosing skeletal dysplasia characterized by small bone undermodeling, supraorbital hyperostosis, large and small joint contractures as well as developmental abnormalities, of the cardiorespiratory system and the genitourinary tract is associated with *MAP3K7* mutations [40].

### *ATR, CASK, MAP3K7, MUSK, NEK9, PRKAR1A*

**Intracellular transport:** Intracellular transport proteins are structural proteins that facilitate the movement of vesicles and substances within a cell. *BICD2* codes for a structural protein functioning as an intracellular adaptor for the dynein motor complex, linking it to various cargos. Through the stabilization of the interaction between dynein and dynactin, the movement of dynein is facilitated along the microtubule [41]. Mono-allelic mutations in *BICD2* cause congenital spinal muscular atrophy [42]. *GLE1* is postulated to act as a terminal step in the transport of mature messenger RNA messages from the nucleus to the cytoplasm. Bi-allelic mutations in *GLE1* are associated with a lethal congenital contracture syndrome characterized by fetal hydrops, degeneration of anterior horn cells, and congenital contractures [43].

### *BICD2, DYM, FKBP10, GLE1, KIF1A, VPS53*

**Structural:** Structural proteins provide the framework for a cell or complex of cells. The *LAMA2* gene encodes laminin-2 or merosin, a major component of the extrasynaptic membrane of muscle cell basement membrane. Laminin-211 binds to the glycosylated residues of alpha-dystroglycan (*DAG1*) in skeletal muscle fibers [44]. Bi-allelic mutations in *LAMA2* are associated merosin-deficient congenital muscular dystrophy. Affected patients have hypotonia, joint contractures and may develop scoliosis. Myosin, the major contractile protein in muscle, is composed of two heavy chains and two light chains. *MYH3* encodes the embryonic myosin heavy chain 3. *MYH3* mutations appear to reside near a groove that is part of the myosin head and are associated with distal arthrogryposis type 1 in which contractures are limited to distal joints, Freeman –Sheldon, Sheldon -Hall syndromes [45]. Affected patients with Freeman Sheldon and Sheldon Hall syndromes have distal joint contractures, characteristic facial features and may develop scoliosis. *MYH3* mutations are postulated to cause structural changes in myosin that potentially alter myosin domain-domain interactions during ATP catalysis or affect nucleotide-binding site conformation.

*FHL1, FKTN, KLHL41, LAMA2, LMOD3, MYBPC1, MYH2, MYH3, MYMK, MYO18B, MYO9A, MYOD1, MYPN, PRX, TNNI2, TNNT3, TPM2, TPM3, TTN, VMA21*

> Figure 3A shows examples of the 227 genes common to scoliosis and AMC analyzed in IPA. On the canonical pathway panel (A, left) describing the actin cytoskeletal pathway, major disruption points representing inactivating variants in some key genes such as F-actin, Myosin, and Filamin A (FLNA) that crosslinks actin filaments to membrane glycoproteins can be seen.

> As of the right panel (B), the complex intricate interaction network clearly shows the close functional relationship and involvement of key genes such as AKT Serine/Threonine Kinase 1 (Akt), cholinergic receptor family (CHRN), and Neuregulin gene family (NRG) involved in neuromuscular junctions.

**Figure 3.** Two examples representative of pathway analysis of the final gene set by Ingenuity Pathway Analysis (QI-AGEN). (**A**) Actin cytoskeleton signaling canonical pathway. Genes that are part of the AMC-SC list are shown in purple. (**B**) Highest score pathway predicted by IPA connecting the largest number of gene list members. Solid lines indicate direct interaction between genes while dotted lines symbolize indirect connection. Molecule shapes indicate gene functions, legends can be found here: https://qiagen.secure.force.com/KnowledgeBase/KnowledgeIPAPage?id=kA41i000000L5rTCAS (accessed on Wednesday 13 January 2021). The DECIPHER database search (outlined in Figure 4) identified only two patients harboring CNVs associated with scoliosis and arthrogryposis, for which details are sum-**Figure 3.** Two examples representative of pathway analysis of the final gene set by Ingenuity Pathway Analysis (QIAGEN). (**A**) Actin cytoskeleton signaling canonical pathway. Genes that are part of the AMC-SC list are shown in purple. (**B**) Highest score pathway predicted by IPA connecting the largest number of gene list members. Solid lines indicate direct interaction between genes while dotted lines symbolize indirect connection. Molecule shapes indicate gene functions, legends can be found here: https://qiagen.secure.force.com/KnowledgeBase/KnowledgeIPAPage?id=kA41i000000L5rTCAS (accessed on Wednesday 13 January 2021).

marized in Table 2.

The DECIPHER database search (outlined in Figure 4) identified only two patients harboring CNVs associated with scoliosis and arthrogryposis, for which details are summarized in Table 2. *Genes* **2021**, *12*, x FOR PEER REVIEW 11 of 21

**Figure 4.** DECIPHER Gene Extraction Workflow.

**Figure 4.** DECIPHER Gene Extraction Workflow.



thoracolumbar scoliosis loss, het) \* N/A N/A Patient 1 (DECIPHER #260667): https://decipher.sanger.ac.uk/patient/260667/overview/general (accessed on Monday 28 June 2021). Deletion chr 5: Start position 125286403, length: 10815843, contains 133 genes. Patient 2 (DECI-PHER #262492): https://decipher.sanger.ac.uk/patient/262492/overview/general (accessed on Monday 28 June 2021). Patient 1 (DECIPHER #260667): https://decipher.sanger.ac.uk/patient/260667/overview/general (accessed on Monday 28 June 2021). Deletion chr 5: Start position 125286403, length: 10815843, contains 133 genes. Patient 2 (DECIPHER #262492): https://decipher.sanger. ac.uk/patient/262492/overview/general (accessed on Monday 28 June 2021). Deletion chr X: Start position: 64954439, length: 233145, contains 1 gene. \* For each gene the following information is provided between parentheses: chromosome number (chr), CNV inheritance (de novo, heterozygous) and category (gain/loss).

Deletion chr X: Start position: 64954439, length: 233145, contains 1 gene. \* For each gene the following infor-

mation is provided between parentheses: chromosome number (chr), CNV inheritance (de novo, heterozygous) and category (gain/loss). The first patient (#260667) had a chromosome 5 deletion encompassing 133 genes. Three relevant genes contained within the CNV which potentially impacted the phenotype i.e., *FBN2*, *MEGF10,* and *PITX1* [46]. The CNV is a 10.82 Mb heterozygous deletion The first patient (#260667) had a chromosome 5 deletion encompassing 133 genes. Three relevant genes contained within the CNV which potentially impacted the phenotype i.e., *FBN2*, *MEGF10,* and *PITX1* [46]. The CNV is a 10.82 Mb heterozygous deletion containing 133 genes resulting in a contiguous gene deletion syndrome. This deletion has been documented with a haploinsufficiency score of 50.51, i.e., a high likelihood of causing a loss of function [47].

> containing 133 genes resulting in a contiguous gene deletion syndrome. This deletion has been documented with a haploinsufficiency score of 50.51, i.e., a high likelihood of causing

a loss of function [47].

Mono-allelic *FBN2* mutations are associated with Beals congenital contractural arachnodactyly [48]. Bi-allelic mutations in *MGF10* are associated with myopathy, areflexia, respiratory distress, and dysphagia. Mono-allelic mutations in *PITX1* are associated with congenital clubfoot, with or without deficiency of long bones and/or mirror-image polydactyly in addition to Liebenberg syndrome, defined by the presence of carpal synostosis, dysplastic elbow joints, and brachydactyly [49].

Regarding patient #2, the de novo heterozygous CNV is a fragment of 233.15 Kb located on the X chromosome, containing only the 2C4H2 gene and reported as "likely pathogenic" according to the ClinVar classification.

The second patient (#262492) had a heterozygous or hemizygous (on the X chromosome) deletion encompassing the *ZC4H2* part of the CNV. *ZC4H2* is associated with Wieacker–Wolff syndrome, characterized by the presence of foot contractures, muscle atrophy, and oculomotor apraxia [50].

### **4. Discussion**

Despite the significant prevalence of scoliosis in the AMC patient population, there is little information regarding genetic contributions to scoliosis development in AMC in the literature. In one study of 46 patients with AMC, 32 patients (65.6%) developed scoliosis between the ages of 5–16 years [10]. Five of 32 patients (15.7%) presented with scoliosis at birth, reflecting the position of the immobile fetus in the uterus, and therefore referred to as "prenatal scoliosis". Several patterns of scoliosis have been noted to occur in AMC and include a "paralytic curve" which is more common in the hypotonic types of AMC and tends to progress; it is typically observed before the age of 2 years but can arise at any age. These curves tend to be thoracolumbar in local, often with pelvic obliquity and severe hip contractures. The second curve pattern is the less prevalent "idiopathic-like", with more balanced double curves, localized to the thoracic and thoracolumbar regions, and often manifesting in later childhood or adolescence.

The aims of this review were to identify genes that are associated with both AMC and scoliosis, and describe the functional pathways and CNVs associated with both conditions. While additional analysis of comparison between pathways associated with arthrogryposis without scoliosis, scoliosis without arthrogryposis and pathways that are associated with both arthrogryposis and scoliosis may be potentially complementary this was not the ultimate focus of our investigation. To our knowledge, this is the first study that has utilized a genomic approach to identify genes that are associated with both AMC and scoliosis. We identified a list of 227 genes that were associated with AMC-SC. The collection of genes encompasses a diverse group of cellular functions, which, once impaired, contribute to AMC-SC: cytoskeletal elements, neurotransmitter enzyme function, molecular chaperone function, ion channel regulation, extracellular matrix, DNA repair, growth factor, transmembrane receptor, transcription factor/regulator, messenger RNA regulation and cellular transport (see Table 1).

IPA analysis of the 227 AMC-SC genes suggest some common causal mechanisms and pathways such as critical "housekeeping" functions (cellular ion balance, DNA excision/repair, mRNA trafficking and post-translational modification), embryologic development, and structural families of genes expressed in bones and/or muscles (Figure 3A,B). The affected genes can heavily impair the naturally occurring or canonical pathways at crucial points, degrading the normal progression of embryologic development and/or after birth differentiation. This process depends on the chronological expression of involved genes and their transcriptional factors (TFs). Several guiding principles were demonstrated by the intricate relationships of the studied genes, as visualized by the pathway analysis: (1) It is likely that the majority of these genes are related to the pathological processes involved with the development of arthrogryposis and scoliosis, (2) IPA analysis facilitates a birds-eye view of potentially impaired key processes, (3) In the central nervous system, dysfunction can be related to mutations of genes surrounding the AKT1/2 kinases or of the growth factors regulating their activities, neurotransmitter receptors, or intracellular ion balance

impairing the transmission of electrical impulses, (4) Mutations in structural genes such as actin, myosin, titin, and dynein in bone and muscle related pathways may cause impaired cytoskeletal function and/or decreased contractile ability, (5) Mutations in regulatory genes such as *TBX5*, *TRIP4* or *NFkB* act at the level of transcription to regulate activity of these genes, (6) There are inflammatory mediated effects on cellular differentiation in these organs. As an example of how the IPA analysis can reflect actual findings, the analysis attests an interaction between MYH3 and actin (Figure 3A). Mutations in *MYH3* are associated with Freeman-Sheldon syndrome (FSS), a form of DA characterized by a small mouth and joint contractures. Drosophila modeling of FSS provided molecular evidence for MYH3 and actin interaction as *MYH3* mutations are associated with myofibrillary disarray and result in decreased catalytic efficiency of actin-activated ATP hydrolysis [51]. IPA analysis did identify other potential disorders and conditions that may be attributed to alterations in genes which are members of pathways in which the 227 genes identified with AMC-SC. These include skeletal, muscular, limb defects and cognitive disability. Further validation would require a more in depth analysis, which is not the focus of this paper.

Figure 3B highlights among other interactions, interplay between *MYOD1* and *IGF2*. Literature review supports this interaction. Recently, two siblings presenting with a lethal form of fetal akinesia deformation sequence (FADS) including deficient pectoralis and proximal limb musculature, distal joint contractures and neonatal respiratory have been described. Watson et al. [52] found a homozygous probably pathogenic loss of function variant, c.188C>A/ p.Ser63\*, in *MYOD1*. *MYOD1* encodes MyoD. MyoD is a key player in cell proliferation and differentiation of myoblasts and its expression fine tunes the balance between myoblast proliferation and differentiation [53]. MyoD directly activates the expression of a long non coding mRNA, called LncMYOD, encoded next to the *MYOD1* gene [54]. LncMyoD then interacts directly with IMP2 (insulin-like growth factor 2 mRNA-binging protein 2). LncMyoD downregulates IMP2-mediated mRNA translation of genes involved in cell proliferation, such as *N-RAS* and c-myc and *IGF2*. Interestingly *IGF2* is part of the imprinting control region 1 (ICR1) at chromosome 11p15.5. *IGF2*, as well as the *H19* gene, when hypomethylated at the ICR1 locus, are associated with Silver-Russell syndrome [55]. Patients with Silver-Russell syndrome have major clinical features consistent with pre- and postnatal growth restriction, frontal bossing with relative macrocephaly, feeding difficulties and low body mass index. In some individuals musculoskeletal features have also been mentioned with muscle hypoplasia and congenital joint contractures [56,57].

Querying the DECIPHER database yielded only two patients who had CNV associated with AMC-SC (Figure 4 and Table 2). Patient 1 had a much larger region of CNV, containing three relevant genes: *FBN2*, *MEGF10* and *PITX1* [46]. *FBN2* (Fibrillin 2) codes for cytoskeletal element, and mutations are associated with Beals congenital contractural arachnodactyly [48]. *FBN2* intragenic deletions or splice site mutations have been published on some occasions associated with contractural congenital arachnodactyly [58]. Rare nonsense mutations are present in the ClinVar database and reported as pathogenic. These latter are not known to be associated with an AMC phenotype. Therefore we cannot either exclude or confirm a link between *FBN2* haploinsufficiency and AMC *MEGF10* (multiple epidermal growth factor-like domains protein 1) codes for a membrane receptor involved in the phagocytosis of apoptotic cells by macrophages and astrocytes, and biallelic mutations are associated with myopathy, areflexia, respiratory distress and dysphagia. *PITX1* (Paired Like Homeodomain 1) was the only gene of the CNV analysis that had not also been identified as a gene associated with AMC-SC, and in the literature, it has not yet been associated with AMC. It is associated with congenital clubfoot, occasionally with bony malformations of the foot, and Liebenberg syndrome, defined by the presence of carpal synostosis, dysplastic elbow joints and brachydactyly [49,59]. Other nonsense variants (2) have only been mentioned in ClinVar.

It is unclear if the monoallelic *MEGF10* was responsible for any part of the patient's phenotype of either AMC or scoliosis. There is a possibility *FBN2*, a gene known to

cause Beals syndrome (a form of distal arthrogryposis with scoliosis), was the only gene responsible for the AMC-SC phenotype; and there is a possibility of some type of additive effect between monoallelic *FBN2*, *MEGF10*, and *PITX1* resulting in an arthgrogrypotic phenotype. We suspect this patient has a contiguous gene syndrome. Additional literature reports describing patients with similar phenotypic features and deletions would provide support for this hypothesis.

One microdeletion involving only *PITX1* has been associated in one family with clubfoot over three generations (Alvarado et al., 2011 [59]). Other nonsense variants (2) have only been mentioned in ClinVar.

Patient 2 had a CNV for *ZC4H2* (Zinc Finger C4H2-Type Containing), a gene causing an X-linked arthrogryposis, usually only in females, known as Wieacker-Wolff syndrome or *ZC4H2*-Associated Rare Disease (ZARD). This condition is characterized by hypotonia, moderate to severe developmental delay, and early and progressive onset of scoliosis. Loss of function mutations have been described to be pathogenic on several occasions in *ZC4H2* [50,60–62].

It is of interest to notice the small number of mapped CNVs in patients as opposed to coding region variants in the literature. This can be at least partially explained by a historic bias of research toward the exome. With a current increase in whole genome sequencing clinical projects, we should see an increase in the non-coding variants for all disease-related literature in the next few years. Furthermore, geneticists do not necessarily need to submit CNVs to DECIPHER that can be easily linked to clinical symptoms.

Future directions related to this research could be exploration of possible treatments, either to ameliorate the effects of AMC and possibly avoid the scoliosis, or to prevent the fetal akinesia altogether. IPA analysis has in some instances the ability to suggest possible drug targets for specific gene pathways. For example, many of the genes implicated in fetal akinesia also are associated with later life cancers. A number of medications have been used or are being developed to treat these cancers. Viewing these drugs as therapeutic for AMC must be done with extreme caution. Mechanistically, many of the genetic AMCs are due to lack of function or altered function of the gene product, whereas many cancers are often an uncontrolled overexpression of the proliferative genes (also shut off of differentiation genes) IPA analysis is limited with respect to drugs' interactions and genes or gene products. While IPA may suggest a certain drug for its interaction with a particular gene or gene product, the nature of the interaction could be unclear. For instance, curarizing agents are listed with *CHRNG* which codes for a subunit of the acetylcholine receptor (AChR), and a mutation of which is associated with multiple pterygium syndrome (MPS). But mutations of *CHRNG* that cause MPS are loss of function mutations which result in a failure of export of the subunit to the cell surface or no protein expression [63]. The mutation already has a disconnecting effect on the neuromuscular junction, which curarizing agent would only exacerbate. Additionally, the *CHRNG* encoded protein is only expressed up to the 33rd week of pregnancy, and is replaced on the AChR with an "adult" subunit, which is presumably functional. In fact, it should be assumed that any chemical treatment for underlying causes of AMC-SC would need to be given early in pregnancy in order to prohibit development of deformities related to fetal akinesia.

Research using animal models such as zebrafish has shown some promise in the identification of pathologic mechanisms which may be amenable to targeted therapies. *MYBPC1* appears to be a novel gene responsible for DA1, though the mechanism of disease may differ from how some cardiac *MYBPC3* mutations cause hypertrophic cardiomyopathy [64]. *MYBPC1* is necessary in slow skeletal muscle development and can be used in established zebrafish models as a tractable model of human distal arthrogryposis [65]. Mutations in *MYH3*, which encodes embryonic heavy chain (MyHC) expressed initially during slow skeletal muscle development are also associated with multiple pterygium syndrome (MPS) and spondylocarpotarsal synostosis syndrome. The latter condition is characterized by joint contractures in addition to vertebral, carpal and tarsal fusions, and could present a mechanistic link between vertebral fusions and joint contractures, with

hypercontraction of the surrounding muscle leading to excessive notochord tension [66,67]. Zebrafish homozygous for the smyhc (slow myosin heavy chain) are analogous to the most common distal arthrogryposis caused by MYH3 mutations. The zebrafish develop notochord kinks characterized by vertebral fusions, progression to scoliosis in addition to motor deficits accompanied the disorganized and shortened slow-twitch skeletal muscle myofibers. Slow twitch muscle fibers rely on aerobic metabolism and are recruited for smaller range of activities as compared to fast twitch fibers which rely on anaerobic metabolism and are utilized for larger bursts of activity. Treatment of the zebrafish embryos with the myosin ATPase inhibitor, para-aminoblebbistatin, which decreases actin-myosin affinity, normalized the vertebral fusions and notochord phenotype [68]. These findings hold tremendous promise for the treatment of AMC-SC.

*TNNI2* is also associated with distal arthrogryposis, types DA1 and DA2B, encoding a subunit of the troponin complex. Tnni2K175del transgenic mice with a heterozygous gain of function mutation in *TNNI2*, encoding a subunit of the troponin complex have small body size and joint contractures. Hypoxia-inducible factor3a (Hif3a) was found to be increased with decreased Vegf levels in bone in these mice resulting in decreased angiogenesis, delays in endochondral ossification, decreased chondrocyte differentiation and osteoblast proliferation [69]. Interestingly, both *HIF3A* silencing using Hif3A/Hif-3α siRNA and HIF-prolyl hydroxylase inhibition effectively increased the cell viability during anoxia/reoxygenation injury in cardiomyocytes and led to changes in mRNA expression of HIF1-target genes, and reduced the loss of mitochondrial membrane potential (∆ψm) [70]. These results show promise towards applications for AMC bone related targeted treatments.

We noted a number of methodological barriers in our research. Although similar database and literature searches were implemented to identify scoliosis- and AMCassociated genes (see Figure 1), it is noteworthy that HPO scoliosis-associated terminology accounted for 98.5% of scoliosis-associated genes, with literature review adding only another 1.5%, whereas using HPO terminology for AMC identified only 21.7% of AMCassociated genes, lagging significantly compared to the literature search. There are multiple reasons to use the HPO terminology to identify genes associated with AMC-SC. HPO is freely available, provides standardized vocabulary for phenotypic manifestations of genetic disorders and can aid in specific diagnoses. HPO also provides linkages with different disease coding systems. The lack of AMC-associated terminology in HPO stems from the scarcity of codes associated with rare AMC-associated disorders, particularly those conditions categorized under Bamshad's syndromic category [71]. Since its founding in 2008, HPO continues to expand its coverage of disease-associated phenotypes [72]. However, some disorders such as epilepsy have very deep phenotypic characterization, whereas other disorders such as respiratory diseases are less well represented in HPO. In searching the HPO database with AMC-associated terms, 18 genes were singled out as not having been identified in the literature search (see Figure 1); subsequently 6 were discarded as on further review as they did not have an associated AMC phenotype.

We strived to be comprehensive in this study, with the identification of genes associated with AMC-SC using PubMed, HPO, and DECIPHER. Despite those efforts, genes may have been missed or their phenotypic spectrum not completely realized, particularly those in research or other databases that are not public/accessible and have not been published. For instance, *MYH3* is the only gene associated with Freeman-Sheldon syndrome (and the only gene associated with DA8–autosomal dominant MPS). We would suggest further investigation into the specific gene mutations of *MYH3* that leads to the occurrence of scoliosis. *MYH3* mutations can also lead to DA1, the "classic" distal arthrogryposis, as well as DA2B, Sheldon-Hall syndrome, both of which rarely have associated scoliosis. Understanding the differences in the specific mutations of *MYH3* between these three conditions may shed light on the origins of scoliosis.

Ascertainment bias or failure to include patients with AMC or scoliosis could lead to a misrepresentation of the number of the genes associated with AMC-SC. We used a systematic HPO and literature searching approach to identify genes associated with AMC, scoliosis so we could subsequently identify genes associated with both conditions.

Syndromes/genes associated with AMC-SC are relatively rare, which highlights the need to share findings and contribute to more easily accessible platforms such as HPO. An ongoing registry project for children with AMC, funded by the Shriners Hospitals for Children and implemented in seven regions in North America, will make future contributions to genotype/phenotype associations in AMC. This should lead to a better understanding of mechanisms that lead to AMC, and possibly to better care and outcomes. International collaborations to expand this registry have started and will necessitate the identification of common data elements and terms. There will also be opportunities for the findings from this registry to contribute to such platforms as HPO.

### **5. Conclusions**

Using a combination of HPO analysis and literature review, we identified 908 genes associated with scoliosis and 444 genes associated with AMC resulting in 227 genes associated with AMC-SC. These genes act through a variety of cellular mechanisms including transcription regulation, transmembrane receptor, growth factor, and ion channels. Through query of the DECIPHER database, we identified two patients each with one CNV associated with AMC-SC. The first case had a CNV involving three genes (*FBN2*, *MEGF10,* and *PITX1)*, while the second case had a CNV involving *ZC4H2*. As we continue to learn more about genetic mechanisms responsible for AMC we anticipate the ability to better provide prognostic information and targeted therapies for affected patients.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10 .3390/genes12071052/s1, Table S1: Additional genes (*n* = 30) to the list of 402 list from Kiefer & Hall, 2019 [6,73–98].

**Author Contributions:** Writing-Original Draft: X.L. Data Curation: S.G.C. Formal Analysis: N.D.-O. Investigation: A.G.G. Methodology: S.W.-H.H. Project administration: T.B. Resources: K.S. Software: H.J.P.v.B. Conceptualization: P.F.G., K.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** Dahan-Oliel holds a clinical research scholar award from the Fonds de la Recherche en Santé du Québec. Research reported in this publication was also supported in part by the Eunice Kennedy Shriver National Institute of Child Health & Human Development of the National Institutes of Health under Award Number R03HD099516 to P.F.G.

**Acknowledgments:** This study makes use of data generated by the DECIPHER community. A full list of centers that contributed to the generation of the data is available from https://deciphergenomics. org/about/stats and via email from contact@deciphergenomics.org. Funding for the DECIPHER project was provided by Wellcome. "Those who carried out the original analysis and collection of the data bear no responsibility for the further analysis or interpretation of the data." We gratefully acknowledge the support of the Malika Ray, Asok K. Ray, FRCS/(Edin) Initiative for Child Health. We gratefully acknowledge the technical assistance of Lourdes Richardson RN, MSN, FNP-BC.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


**Wenjing Lai 1,†, Xin Feng 1,†, Ming Yue <sup>1</sup> , Prudence W. H. Cheung <sup>2</sup> , Vanessa N. T. Choi <sup>1</sup> , You-Qiang Song <sup>1</sup> , Keith D. K. Luk <sup>2</sup> , Jason Pui Yin Cheung 2,\* and Bo Gao 1,\***

> <sup>1</sup> School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China; floralai@connect.hku.hk (W.L.); FengxinCassie@connect.hku.hk (X.F.); nervym7@connect.hku.hk (M.Y.); vntchoi@hku.hk (V.N.T.C.); songy@hku.hk (Y.-Q.S.)

<sup>2</sup> Department of Orthopaedics and Traumatology, The University of Hong Kong, Hong Kong, China; gnuehcp6@hotmail.com (P.W.H.C.); hrmoldk@hku.hk (K.D.K.L.)


**Abstract:** Congenital scoliosis (CS) is a lateral curvature of the spine resulting from congenital vertebral malformations (CVMs) and affects 0.5–1/1000 live births. The copy number variant (CNV) at chromosome 16p11.2 has been implicated in CVMs and recent studies identified a compound heterozygosity of 16p11.2 microdeletion and *TBX6* variant/haplotype causing CS in multiple cohorts, which explains about 5–10% of the affected cases. Here, we studied the genetic etiology of CS by analyzing CNVs in a cohort of 67 patients with congenital hemivertebrae and 125 family controls. We employed both candidate gene and family-based approaches to filter CNVs called from whole exome sequencing data. This identified 12 CNVs in four scoliosis-associated genes (*TBX6*, *NOTCH2*, *DSCAM*, and *SNTG1*) as well as eight recessive and 64 novel rare CNVs in 15 additional genes. Some candidates, such as *DHX40*, *NBPF20*, *RASA2*, and *MYSM1*, have been found to be associated with syndromes with scoliosis or implicated in bone/spine development. In particular, the *MYSM1* mutant mouse showed spinal deformities. Our findings suggest that, in addition to the 16p11.2 microdeletion, other CNVs are potentially important in predisposing to CS.

**Keywords:** congenital scoliosis; congenital vertebral malformation; copy number variant; CNV

### **1. Introduction**

Among all musculoskeletal disorders, scoliosis is one of the most common diseases, affecting around 3% of the world population, which can occur as an isolated defect or as a concomitant symptom in other diseases or syndromes [1]. Scoliosis is categorized into several main groups, including congenital scoliosis (CS), idiopathic scoliosis (IS), neuromuscular scoliosis, and degenerative scoliosis. CS, which usually has first onset at birth or shortly after birth, affects approximately 0.5–1 in 1000 live births [2–5]. Compared with IS, CS is generally more severe due to the high risk of progressive deformity and associated problems such as pulmonary compromise [6]. One of the most significant differences between CS and IS is that IS does not have an association with congenital vertebral malformation (CVM), whereas CVM is the major cause leading to CS. CVM can be classified into several subclasses, including failure of vertebral formation (e.g., hemivertebrae, wedged vertebrae), failure of vertebral segmentation (e.g., unilateral bar, block vertebrae), and mixed type. Of all CVMs, congenital hemivertebrae is the most common anomaly that causes CS [4,5].

During vertebral development, the paraxial mesoderm forms bilaterally paired blocks, named somites, along the anterior–posterior axis. The vertebral bodies are derived from somites formed in the presomitic mesoderm. This fundamental process is called somitogenesis. Once somitogenesis is disturbed, the resulting CVM may lead to spinal deformities.

**Citation:** Lai, W.; Feng, X.; Yue, M.; Cheung, P.W.H.; Choi, V.N.T.; Song, Y.-Q.; Luk, K.D.K.; Cheung, J.P.Y.; Gao, B. Identification of Copy Number Variants in a Southern Chinese Cohort of Patients with Congenital Scoliosis. *Genes* **2021**, *12*, 1213. https://doi.org/10.3390/ genes12081213

Academic Editor: Luisa Politano

Received: 31 May 2021 Accepted: 3 August 2021 Published: 5 August 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

The most commonly accepted mechanism governing somitogenesis is the clock and wavefront model, which is controlled and coordinated by several key signaling pathways, such as Notch, Wnt, Fgf and retinoic acid signaling pathways [7,8]. Genetic studies of human patients with CVM have identified a variety of mutations in components of Notch signaling pathway (e.g., *NOTCH2*, *DLL3*, *MESP2*, *LFNG*, *HES7*, and *RIPPLY2*) and also in several key transcription factors essential for somitogenesis (e.g., *TBX6*, *TBXT*, and *SOX9*). Nevertheless, the genetic basis for majority of patients with CS still remains unclear [1,9].

Copy number variation (CNV) is a type of structural variation of genome. With the advancement of genome-wide analysis tools, it has been revealed that CNVs are widespread in the human genome and account for a large fraction of human genetic diversity [10]. CNVs have been, so far, implicated in many disease states including scoliosis. Although a number of CNVs were found to be associated with adolescent idiopathic scoliosis (AIS) [11,12], there have not been many reports about CS-associated CNVs. The 16p11.2 microdeletion was found to be associated with CS [13], and recent studies demonstrated that a compound inheritance of a *TBX6*-containing 16p11.2 microdeletion and a *TBX6* mutation or hypomorphic haplotype accounted for 5–10% of patients with CS in different populations [14–17]. Additional CNVs, including 10q24.31, 17p11.2, 20p11, 22q11.2, and a few other regions, were respectively reported in individual patients with CVMs [18,19]. Besides 16p11.2 microdeletion, it is unknown whether other CNVs are prevalent in CS.

Here, we analyzed CNVs in a Southern Chinese cohort of patients with congenital hemivertebrae. CNVs were called from whole-exome sequencing (WES) data of 67 cases and 125 family members (controls). We identified 12 rare CNVs in 4 known scoliosisassociated genes and eight recessive CNVs in three genes. We also found 64 novel, rare CNVs in 14 genes that occurred in multiple patients but are very rare in our control group and the general population, suggesting a potential role for genetic susceptibility in the development of CS.

### **2. Materials and Methods**

### *2.1. Patient Recruitment*

The patients studied in this project were recruited from the Duchess of Kent Children's Hospital (DKCH), a tertiary scoliosis referral center in Hong Kong. The patients with CS were diagnosed by imaging such as plain standing whole-spine radiographs and computed tomography. A total of 67 patients with hemivertebrae were chosen for this study, of which 31 had single congenital hemivertebrae while 36 had multiple congenital hemivertebrae. Patients' personal data and medical records were collected under ethical privacy guidelines and approval. Ethics was approved by the Institutional Review Board of the University of Hong Kong/Hospital Authority Hong Kong West Cluster (HKU/HA HKW IRB Ref # UW 15-216), and written informed consent was obtained from all participants and/or parents/siblings.

### *2.2. Control Cohort*

The control cohort studied in this project consisted of 125 participating family members of the recruited patients with CS. Only unaffected parents and siblings (without CS) were included. Accordingly, 58 out of 67 patients had family member(s) participating in this study, including 2 quintets, 14 quartets, 33 trios, and 9 duos.

### *2.3. Genomic DNA Extraction*

Genomic DNAs were extracted from peripheral blood samples of 67 patients and 125 of their family members using InvitrogenTM ChargeSwitch gDNA Serum Kit. The purified genomic DNA was quantified by NanoDrop.

### *2.4. Whole-Exome Sequencing (WES) and Copy Number Variations (CNVs) Calling*

WES was performed for all recruited patients with congenital hemivertebrae and participating family members by Novogene Co, Ltd. (Hong Kong, China), using the Agilent SureSelect Human All Exon Kit on the Illumina sequencing platform. The WES data were processed as described previously [20]. The raw sequence data were first analyzed by fastp for quality control and filtering [21]. After filtering, the Q20 base of most samples was greater than 95%, and the Q30 base was greater than 90%. The sequence reads were mapped to the reference genome (GRCh37/hg19) by Burrows-Wheller Aligner v0.7.17 (BWA-MEM) [22] and further processed using SAMtools v1.10 to sort and index aligned reads [23]. The sam format files generated by BWA was converted to bam format files by SAMtools. CNVs were called from bam files with ExomeDepth v1.1.15, which is an R package based on a read depth algorithm [24]. ExomeDepth uses a robust statistical model to build an optimized reference set in maximizing the CNVs detection power. In this study, four healthy control family members (CS59A, CS71A, CS71B, and CS81A) were selected to generate the reference set.

### *2.5. CNVs Filtering*

Several criteria were used to filter CNVs: (i) Bayes factor (BF) values were calculated for each variant. BF equals to the log10 likelihood ratio of the alternative hypothesis (i.e., there is a CNV) over the null hypothesis (i.e., there is no CNV). BF = log10 (alternative hypothesis/null hypothesis). BF value greater than 1 was regarded as a strong supporting evidence of CNV. CNVs with BF values smaller than 1 were excluded. (ii) As ExomeDepth cannot detect small size CNVs accurately, CNVs with size smaller than 100 bp were excluded. (iii) Because CNVs with high allele frequency in the general population are likely benign and less susceptible, the CNVs with the allele frequency greater than 0.01 were excluded (a minimum sample size of 100 is required). Database of Genomic Variants (DGV) and Genome Aggregation Database (gnomAD) were used. If available, the allele frequency in East Asian population was also checked. As different CNVs often overlap and have no clear boundaries, this filtration was conducted in a gene-based manner. If there were multiple CNVs covering the same gene, the maximum allele frequency was used for filtering. (iv) In a gene-based manner, the number of CNV recurrence was counted in patients and controls.

### *2.6. Real-Time Quantitative PCR (qPCR)*

Real-time qPCR was performed to validate some of the detected CNVs. Briefly, ROX Reference Dye (0.4 µL, 50X), forward and reverse primers (0.4 µL each, 10 µM), TB Green Premix Ex Taq (10 µL, 2X, Tli RNaseH Plus, Takara), patients' genomic DNA (0.5 µL, 10 ng/µL), and sterile ddH2O (8.3 µL) were mixed for qPCR, which was performed using Applied BiosystemTM StepOnePlusTM Real-Time PCR System. A locus outside of the detected CNV region of *NOTCH2*, *DSCAM* and *SNTG1* was used as reference locus (P1). P1 is near the region of chromosome 16p11.2 and previously used as a reference site to detect 16p11.2/TBX6 deletion [14,17]. Each sample was analyzed in triplicate. Quantities of the copy numbers of specific locus were determined by the delta Ct method. The 2−∆∆CT method was used to analyze the relative changes. The qPCR primer sequences: *NOTCH2*-F: 50 - AGGAGGCGACCGAGAAGATG-30 ; *NOTCH2*-R: 5 0 -CGATACTCACCATGCGCG-GG-30 ; *DSCAM*-F: 50 -AGCGAACGTTCCTATCGCTT-30 ; *DSCAM*-R: 50 -TTTCACTTATGCGCCCTGGG-30 ; *SNTG1*-F: 50 -GTCTACATGGGCTGGTG-TGA-30 ; *SNTG1*-R: 50 -CTGGAGGTGCCAGAAACTTG-30 ; P1-F: 50 -GGGGAAGGAACTTA-CATGAC-30 ; P1-R: 50 -TCGTGTTTCCCTGTTGTACC-30 .

### **3. Results**

### *3.1. CS Cohort and WES*

In our cohort, we recruited a total of 92 patients with CS, in which vertebral malformations, such as hemivertebrae, unilateral bar, or block vertebrae, were identified. This operational definition thus excluded other types of scoliosis such as AIS. Because hemivertebrae is the most common type of vertebral malformation in CS and has the greatest potential for rapid progression (5–10 degrees/year) [25], 67 patients with congenital

hemivertebrae were first selected. Further, 125 healthy family members of 58 patients were enrolled for this study, including parents and siblings from two quintets, 14 quartets, 33 trios, and nine duos. WES was performed for all 67 patients and 125 participating family members (controls). The contaminating sequencing adaptors and low-quality reads were first removed and the filtered reads were then aligned to the reference human genome (GRCh37/hg19), sorted and indexed.

### *3.2. CNV Calling*

CNVs were called from the sequence reads with the read-depth analysis tool ExomeDepth, which has high sensitivity and specificity at the exon level [24,26]. Four healthy parents who were not carriers of 16p11.2 microdeletion but whose children have been previously diagnosed with TBX6 compound heterozygosity [17] were selected to generate the reference set for ExomeDepth analysis. After CNV calling of the 67 patients with CS, a total of 15,671 CNVs were detected. On average, each patient carries around 234 CNVs. By counting repeatedly occurring CNVs among different cases, there were 6084 distinct CNVs. This strategy successfully identified *TBX6*-containing 16p11.2 microdeletion in four patients as previously reported [17]. For the control group, a total of 27,116 CNVs were detected from 125 family control members. On average, each control carried around 217 CNVs. By counting repeatedly occurring CNVs among different controls, there were 7171 distinct CNVs. Although more CNVs were detected in a few individuals (six patients and four controls), there was no significant difference between the patient group and the control group (Supplementary Figure S1). The average CNV numbers in patients and controls were similar to the previous report [24]. Afterwards, we analyzed all CNVs by employing both a candidate gene approach and family-based filtering and prioritization strategies. A workflow is shown in Figure 1.

### *3.3. CNVs in Candidate Genes*

To identify CNVs associated with CS, we firstly used the candidate gene approach, and focused on CNVs that contained genes known to be involved in scoliosis or somitogenesis. After checking allele frequencies of CNVs in the Database of Genomic Variants (DGV) and the Genome Aggregation Database (gnomAD), a total of 12 rare CNVs that influence four candidate genes were found in 12 patients, including known *TBX6*-containing 16p11.2 heterozygous deletion in four cases [17]. We also identified two rare CNVs that contained *NOTCH2*, a key component in the Notch signaling pathway, in two patients, and six rare CNVs in AIS-associated genes, *DSCAM* [27] and *SNTG1* [28,29], in six patients (Table 1). We then checked these CNVs in their available family members and found that they are either *novel* mutations (*NOTCH2* in CS043, *DSCAM* in CS018 and CS036, and *SNTG1* in CS048) or paternally inherited (*DSCAM* in CS050) (Table 1). We were unable to determine the inheritance patterns of other patients (*NOTCH2* in CS033, *DSCAM* in CS053 and CS064) due to the lack of family members.

Among the identified rare CNVs, the TBX6-containing chromosome 16p11.2 microdeletion had been previously validated [17]. Here, we further examined CNVs that contained *NOTCH2*, *DSCAM* or *SNTG1* genes. Indeed, qPCR analysis detected heterozygous deletions within the *NOTCH2*, *DSCAM* and *SNTG1* loci (Supplementary Figure S2), indicating the reliability of CNVs called from WES data by ExomeDepth.

**Figure 1.** The workflow of CNV analysis. This strategy detected CNVs in several candidate genes and identified recessive and novel rare CNVs enriched in patients with CS.

### *3.4. Recessive CNVs in Patients with CS*

We then searched for homozygous CNVs (observed/expected reads ratio < 0.1) in 67 patients and 125 controls. After excluding the homozygous CNVs that existed in both patients and controls, we identified unique homozygous CNVs in eight patients with CS. The heterozygous deletions of these loci are rare in DGV or gnomAD database (Table 2). Considering that homozygous CNVs might be inherited from parents, we further checked their inheritance pattern and found that they were either *novel* mutations or unknown due to lack of parents' data. These recessive CNVs contained three genes, *NBPF20* (Neuroblastoma Breakpoint Family Member 20), *FAM138C* (Family with Sequence Similarity 138 Member C), and *DHX40* (DEAH-Box Helicase 40). Interestingly, the *DHX40* containing homozygous CNVs were detected in six patients but was not reported in DGV or gnomAD. *DHX40*-containing heterozygous CNVs are also very rare (Table 2). *FMA138C* is an RNA gene and *NBPF20* is a member of NBPF family characterized by tandemly repeats of DUF1220 domain, but their functions are unclear. *DHX40* encodes a member of the DExD/H-box RNA helicase superfamily that catalyzes the unwinding of double-stranded RNA and has an essential role in RNA metabolism [30].




88

*Genes* **2021**, *12*, 1213


**Table 2.** Recessive CNVs unique in patients with CS (N.D., not determined; N.A., not applied).

### *3.5. Novel CNVs in Patients with CS*

We also sought to identify CS-associated novel CNVs and first analyzed the data from 49 complete families (two quintets, 14 quartets or 33 trios). The detected novel CNVs were then checked in the other 18 patients (nine singlets and nine duos). Eventually, we identified 64 CNVs in 14 genes that occurred in more than three patients but did not exist or was very rare (<1%) in family control group. Those with high CNV allele frequency (>1%) in the general population were also filtered out. This strategy successfully identified the known *TBX6*-containing CNVs in four patients [17] and *DHX40*-containing homozygous CNVs in six patients. Interestingly, we also found there are four additional heterozygous *DHX40* CNVs (Table 3 and Supplementary Table S1). Most of the identified novel CNVs were heterozygous loss, and one was gain of one copy. Our CNV shortlist includes genes involved in ubiquitination (*NAE1*, *MYSM1*), enzymatic activities (*MME*, *PHKB*), ion/small molecule transportation (*SCN7A*, *ABCA6*), meiosis (*MNS1*, *SPO11*), spermatogenesis (*GMCL1*), GTPase activity (*RASA2*), TNF signaling (*NSMAF*), or with unknown function (*LRRC40*).

**Table 3.** Novel CNVs enriched in patients with CS (N.A., not applied).


<sup>a</sup> DHX40 has 6 homozygous (listed in Table 2) and 4 heterozygous CNVs. <sup>b</sup> No CNV with sample size more than 100 is found within the *NAE1* locus. Note: Detailed information of these novel CNVs is shown in Table S1.

### **4. Discussion**

CS is a genetically heterogeneous disorder with evidence for multiple causative genes. However, the genetic causes of the majority of patients still remain unknown. As most cases of CS are of sporadic etiology, CNVs may have greater influence than single nucleotide variations (SNVs) [31]. This was well exemplified by the *TBX6*-containing 16p11.2 microdeletion in previous CS studies [14–17]. Here, we systematically investigated CNVs in a cohort of patients with congenital hemivertebrae and their family controls. We identified the well-known CNVs at chromosome 16p11.2, as well as a number of new CNVs that are potentially associated with CS. Haploinsufficiency of Notch signaling pathway has been demonstrated to cause CS [32] and mutations in *NOTCH2* caused Alagille syndrome and Hajdu–Cheney syndrome, both of which showed abnormal curvature of the spine [33,34]. In our study, we found one short and one long CNVs at *NOTCH2* locus in two patients, spanning one and four exons of *NOTCH2*, respectively. As no significant coding SNV could be detected in *NOTCH2* of these two patients, it is unclear whether heterozygous loss of *NOTCH2* is sufficient to cause CS or other non-coding *NOTCH2* SNVs or environmental factors [32] may contribute. Interestingly, CNVs in two AIS-associated genes, *DSCAM* [27] and *SNTG1* [28,29], were found in six patients, suggesting CS and AIS may be genetically related to each other.

An intriguing finding in our analysis is the identification of CNVs spanning various exons of *DHX40* in ten patients, including six homozygous and four heterozygous CNVs. Most of them are novel mutations. *DHX40* belongs to the conserved DExD/H-box RNA

helicase family, which facilitates the ATP-dependent unwinding of RNA secondary structures [30]. However, the biological functions of each member remained poorly understood. Interestingly, the *DHX40* mutant mice were described to exhibit abnormal bone structure and bone mineralization (Mouse Genome Informatics, MGI: 1914737), indicating a role of *DHX40* in bone development. The mutant of its family member *DHX35* was described to have abnormal vertebrae morphology and scoliosis in mice (MGI: 1918965). Patients carrying *DHX37* mutations showed developmental delay and intellectual disability as well as vertebral anomalies [30]. These observations in the mouse and human might suggest a potential link between DHX family members and CVM.

Although the function of *NBPF20* is unknown, it locates at chromosome 1q21.1, which microdeletion is associated with a variety of phenotypes including skeletal malformations such as scoliosis [35,36]. This region also contains other NBPF family members, such as *NBPF10*, whose genetic variants were implicated in Mayer-Rokitansky-Küster-Hauser (MRKH) syndrome (OMIM # 277000) [37], a disease associated with CS [38].

Among the candidate genes of the identified novel CNVs, *RASA2* (RAS P21 Protein Activator 2) and *MYSM1* (Myb-Like, SWIRM, and MPN domains 1) are of particular interest. *RASA2* encodes a GAP (GTPase-activating protein) protein and functions as a suppressor of RAS by promoting its intrinsic GTPase activity. Rare variants in *RASA2* have been found associated with Noonan syndrome [39]. As scoliosis occurs frequently in Noonan syndrome [40], *RASA2* is a potential candidate gene for CS. It would be interesting to investigate the *RASA2* mutant mouse phenotype. MYSM1 is a deubiquitinase reported to be essential for bone formation [41] and its mutant mice have truncated and kinky tails [42–44], which are often associated with vertebral malformations [45]. Indeed, an X-ray from the International Mouse Phenotyping Consortium (MGI: 2444584) exhibited a spinal deformity in the *MYSM1* mutant mouse (Supplementary Figure S3), indicating a potential role of *MYSM1* in spinal development and predisposition to CS. Further detailed phenotypic analysis of mutant animals is needed to validate its pathogenicity in CS.

Although there are a few reports of heterozygous point mutations in CS patients [17,46–48], the dominant negative effect was only demonstrated by a novel *TBXT* mutation [17]. Considering the low familial recurrence rate in CS, recessive or compound heterozygous mutations are more likely to be the major cause of CS. In this regard, heterozygous CNVs are not sufficient to induce CS. Their pathogenicity may be explained by the following genetic models. First, in our cases, the patients carrying heterozygous CNVs may have additional risk variant or haplotype on the other allele. This possibility has been well exemplified by the 16p11.2/TBX6 mutations and haplotype. However, further analysis of risk variant/haplotype in our study is severely limited by our dataset from WES as they may reside in non-coding regions that regulate gene transcription. We did not detect significant deleterious mutations in the coding regions of these genes. Second, additional mutations in other relevant genes may increase the risk of CS (polygenic model). Other possibilities include environmental contributions and novel mutations in somatic tissues. Environmental factors, such as short-term gestational hypoxia, have been found to cause CS in combination with haploinsufficiency of Notch signaling pathway genes [32]. On the other hand, somatic mutations may serve as the "second hit" in addition to the heterozygous germline CNV mutations (first hit). This genetic model has been well demonstrated in other diseases as well as dystrophic scoliosis caused by *NF1* [49–52]. Testing the above models will require whole genome sequencing, more comprehensive data analysis, and the isolation of malformed vertebral tissues in future studies.

### **5. Conclusions**

In this study, we investigated the genetic basis of CS by analyzing CNVs in a cohort of CS families. Based on the candidate gene approach and family-based filtering of CNVs, we identified both known CS-associated genes and a set of new susceptibility genes, some of which (e.g., *DHX40*, *RASA2*, and *MYSM1*) warrant further investigations in larger cohorts as well as functional characterization. Given the well-defined example of the *TBX6*

compound inheritance and the complex genetic nature of CS, future studies examining the combined effects of SNVs and CNVs and somatic tissues may help better decipher the genetic etiology and heterogeneity of CS.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/ 10.3390/genes12081213/s1, Figure S1: Total number CNVs in each patient and family control, Figure S2: Quantitative PCR analysis for validating heterozygous deletion, Figure S3: The X-ray of *MYSM1* mutant mouse from IMPC. Table S1: Novel CNVs enriched in CS patients.

**Author Contributions:** Conceptualization, B.G., J.P.Y.C. and K.D.K.L.; methodology, W.L., X.F., M.Y., V.N.T.C. and P.W.H.C.; software, W.L., M.Y. and X.F.; validation, W.L., X.F. and B.G.; resources, B.G., J.P.Y.C., Y.-Q.S. and K.D.K.L.; data curation, J.P.Y.C. and B.G.; manuscript preparation, W.L., X.F., J.P.Y.C. and B.G.; supervision, B.G., J.P.Y.C. and Y.-Q.S.; project administration, B.G. and J.P.Y.C.; funding acquisition, B.G., J.P.Y.C. and K.D.K.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the Hong Kong Health and Medical Research Fund (06171406) to B.G., and the Tam Sai Kit Endowment Fund to K.D.K.L.

**Institutional Review Board Statement:** The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of the University of Hong Kong/Hospital Authority Hong Kong West Cluster (HKU/HAHKW, IRB Reference Number UW 15-216).

**Informed Consent Statement:** Informed consents was obtained from all subjects involved in the study.

**Data Availability Statement:** Data available on request due to restrictions.

**Acknowledgments:** Nil.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**

