**Identification and Characterization of Genetic Components in Autism Spectrum Disorders 2020**

Editor **Merlin G. Butler**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editor* Merlin G. Butler University of Kansas Medical Center USA

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *International Journal of Molecular Sciences* (ISSN 1422-0067) (available at: https://www.mdpi.com/journal/ijms/special_issues/ASD_2020).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-3611-8 (Hbk) ISBN 978-3-0365-3612-5 (PDF)**

© 2022 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **Contents**


Reprinted from: *Int. J. Mol. Sci.* **2020**, *21*, 9029, doi:10.3390/ijms21239029

## **Isaac Baldwin, Robin L. Shafer, Waheeda A. Hossain, Sumedha Gunewardena, Olivia J. Veatch, Matthew W. Mosconi and Merlin G. Butler**

Genomic, Clinical, and Behavioral Characterization of 15q11.2 BP1-BP2 Deletion (Burnside-Butler) Syndrome in Five Families Reprinted from: *Int. J. Mol. Sci.* **2021**, *22*, 1660, doi:10.3390/ijms22041660

## **Anastasia K. Neklyudova, Galina V. Portnova, Anna B. Rebreikina, Victoria Yu Voinova, Svetlana G. Vorsanova, Ivan Y. Iourov and Olga V. Sysoeva**

40-Hz Auditory Steady-State Response (ASSR) as a Biomarker of Genetic Defects in the *SHANK3* Gene: A Case Report of 15-Year-Old Girl with a Rare Partial *SHANK3* Duplication Reprinted from: *Int. J. Mol. Sci.* **2021**, *22*, 1898, doi:10.3390/ijms22041898

## **About the Editor**

**Merlin G. Butler** MD, PhD is Professor of Psychiatry & Behavioral Sciences and Pediatrics at the University of Kansas Medical Center, Kansas City, Director of the Division of Research and Genetics and Director of the University of Kansas Health System Genetics Clinic. He received his MD degree from the University of Nebraska College of Medicine in 1978 and his PhD in Medical Genetics from Indiana University School of Medicine in Indianapolis, where he also trained and completed his postgraduate training and fellowship in medical genetics accredited by the American Board of Medical Genetics (ABMG). He received ABMG board certification in both Clinical Genetics and Clinical Cytogenetics in 1984. He is also a Founding Fellow of the American College of Medical Genetics and Genomics. He previously held faculty and academic positions at Indiana University, the University of Notre Dame, Vanderbilt University, and the University of Missouri–Kansas City prior to his arrival at the University of Kansas Medical Center in 2008.

Dr. Butler is an appointed member of local academic and state programs for genetic screening services as well as national committees engaged in the care and treatment of those with rare genetic disorders. He is also a member of federal and parent-based organizations or foundations participating in grant review study sections. He serves as a member of several editorial boards for peer-reviewed journals and is an Associate Editor of *Frontiers in Genetics* and *Frontiers in Pediatrics*. He is a recipient of local academic and national honors in recognition of his research and clinical service in genetic disorders. He is a member of advisory board organizations for rare disorders including Mowat–Wilson syndrome and is Chairperson of the Scientific Advisory Board of the Prader–Willi Syndrome Association (USA). His research interests include the genetics of developmental disorders, congenital anomalies, connective tissue disorders, autism, and mechanisms of genomic imprinting and impact on Prader–Willi, Angelman, Burnside–Butler, and fragile X syndromes. His research is focused on genotype–phenotype correlations and delineation with natural history of rare disorders as well as the use of advanced genetic technology, including high-resolution microarrays, next-generation sequencing, and pharmacogenetics testing, in clinical practice. He has published over 500 peer-reviewed articles, over 50 book chapters and authored or edited 20 books on the principles of medical genetics and clinical description, management, and care of patients with common and rare genetic disorders, specifically Prader–Willi, fragile X and Burnside–Butler syndromes, the genetics of autism and syndromic obesity, congenital anomalies, intellectual disabilities, and clinical applications of advanced genetic testing.

## **Preface to "Identification and Characterization of Genetic Components in Autism Spectrum Disorders 2020"**

This textbook, *Identification and Characterization of Genetic Components in Autism Spectrum Disorders 2020*, includes themes associated with autism spectrum disorders (ASD) and related conditions, divided into three sections (clinical, genetics, other) covering topics discussed in 2020. These sections include: information on clinical description and phenotypic subtyping, causes, diagnosis, treatment, and characterization of ASD, and biomarker development related to neurodevelopmental disorders; an overview of genetic, epigenetic, and environmental factors involved in ASD; characterization of findings in autism based on advanced genomic laboratory testing and genetics with bioinformatics and translational research, and characterization of an emerging 15q11.2 BP1-BP2 deletion (Burnside–Butler) syndrome as a cause of autism and neurodevelopmental defects; and other factors contributing to our understanding of causation of ASD, including proteomics and metabolomics with approaches towards functional insights into autism.

This textbook includes nine chapters written by experts in the field of genetics, medical care with treatment approaches and diagnosis, autism research and discovery with characterization, and analysis of genetic and environmental factors. Of these, five chapters are dedicated to clinical description, treatment, and characterization with relevant reviews of ASD and related conditions or contributing factors; three chapters are dedicated to basic laboratory or translational research with genetic data analysis and reviews regarding their contribution to ASD, as 90% of individuals with autism may have a genetic component contributing to their clinical findings; and one chapter describes proteomics and metabolic approaches contributing to ASD and biomarker discovery in order to allow readers to acquire a better understanding and awareness of ASD.

This textbook should be a useful resource for scientists and clinical researchers, medical geneticists, and physicians and clinicians in caring for and managing patients with the goal to translate this information directly to the clinical setting for diagnosis, care, and treatment of patients with ASD. Other healthcare providers and paraprofessionals should be interested, particularly those engaged in teaching, research, care, and treatment, including students at all levels of training and families regarding this important neurodevelopmental disorder which is on the rise in our society and worldwide.

Ultimately, the team of healthcare professionals required to diagnose, treat and care for the growing list of problems recognized or understudied in ASD may include psychiatrists, psychologists and laboratory geneticists, clinical geneticists and genetic counselors, neurologists, special educators and paraprofessionals, child life experts, developmental specialists and pediatricians, social workers, nurses and nurse practitioners, occupational and physical therapists, as well as speech therapists and pathologists; furthermore, public health experts and community activists will find this resource helpful in recognizing features seen in autism and understanding and identifying causes. Lastly, this book can serve as a resource for students, parents, and other family members for better awareness about the risks, features, and causes of autism as well as agencies providing care, information, resources, and services for those families with autism and/or related neurodevelopmental disorders.

**Merlin G. Butler**

*Editor*

## *Review* **Clinical Assessment, Genetics, and Treatment Approaches in Autism Spectrum Disorder (ASD)**

## **Ann Genovese and Merlin G. Butler \***

Department of Psychiatry & Behavioral Sciences, University of Kansas Medical Center, Kansas City, KS 66160, USA; agenovese@kumc.edu

**\*** Correspondence: mbutler4@kumc.edu; Tel.: +1-913-588-1800; Fax: +1-913-588-1305

Received: 1 May 2020; Accepted: 27 June 2020; Published: 2 July 2020

**Abstract:** Autism spectrum disorder (ASD) consists of a genetically heterogeneous group of neurobehavioral disorders characterized by impairment in three behavioral domains including communication, social interaction, and stereotypic repetitive behaviors. ASD affects more than 1% of children in Western societies, with diagnoses on the rise due to improved recognition, screening, clinical assessment, and diagnostic testing. We reviewed the role of genetic and metabolic factors which contribute to the causation of ASD with the use of new genetic technology. Up to 40 percent of individuals with ASD are now diagnosed with genetic syndromes or have chromosomal abnormalities including small DNA deletions or duplications, single gene conditions, or gene variants and metabolic disturbances with mitochondrial dysfunction. Although the heritability estimate for ASD is between 70 and 90%, there is a lower molecular diagnostic yield than anticipated. A likely explanation may relate to multifactorial causation with etiological heterogeneity and hundreds of genes involved with a complex interplay between inheritance and environmental factors influenced by epigenetics and capabilities to identify causative genes and their variants for ASD. Behavioral and psychiatric correlates, diagnosis and genetic evaluation with testing are discussed along with psychiatric treatment approaches and pharmacogenetics for selection of medication to treat challenging behaviors or comorbidities commonly seen in ASD. We emphasize prioritizing treatment based on targeted symptoms for individuals with ASD, as treatment will vary from patient to patient based on diagnosis, comorbidities, causation, and symptom severity.

**Keywords:** autism; ASD; genetics; heterogeneity; syndromes; assessment; medications; treatment; causes

## **1. Introduction**

Leo Kanner in 1943 [1] first introduced the term autism as a diagnostic label to define a specific syndrome observed in young children manifested by early onset, characteristic symptomatology, and disrupted social and emotional relationships. Autism is now recognized as Autism Spectrum Disorder (ASD), which is classified as a developmental disorder as defined in DSM-5 (Diagnostic and Statistical Manual of Mental Disorders, 5th Edition) by the American Psychiatric Association [2] and in the ICD-10 (International Classification of Diseases, 10th Revision) by the World Health Organization [3]. Autism is characterized by significant impairment in social communication and atypical repetitive and/or restrictive behaviors or interests, with an onset in the early developmental period, prior to age 3 years. The American Academy of Pediatrics [4] recommends screening all infants and toddlers to identify early signs of autism at 18 months and again at 24 months of age. Rating or assessment scales that have been validated for both clinical and research purposes are helpful in establishing the diagnosis of autism. These scales include the Autism Diagnostic Interview-Revised (ADI-R) and the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2) and should be administered by trained specialists in conjunction with an evaluation of the child with consideration of history and clinical presentation [5,6].

ASD affects between 1 and 2% of children in the United States, with a growing role recognized for genetic factors with etiological heterogeneity. ASD can be conceptualized as a behavioral syndrome rather than a specific categorical mental disorder [7]. The concept of "syndromic autism" (ASD associated with morphological signs or symptoms helpful in the identification of specific genetic disorders) stands in contrast to "non-syndromic autism" (idiopathic ASD with no associated signs or symptoms). Multiplex autism refers to those with a positive family history of other similarly affected individuals, which highlights the heterogeneity of ASD [8,9].

Clinical and other health concerns that may be associated with ASD include intellectual disability (ID), electroencephalogram (EEG) abnormalities with or without epilepsy, dysmorphic features, and abnormal MRI findings [10,11]. About 10% of children with autism are reported to have microcephaly [12,13], which may be associated with additional findings and a poor prognosis. On the other hand, a large-appearing head size is common in children with autism along with increased brain volumes, particularly in the frontal lobes, but with smaller occipital lobes [14–19]. Mutations of the phosphatase and tensin homolog (PTEN) tumor suppressor gene were reported by Butler et al. [14] in children with autism and extreme macrocephaly. Recent studies have shown that about 20% of genes implicated in autism are also known cancer genes, thereby stimulating an interest in not only risks for cancer development in individuals with ASD but whether chemotherapeutic agents could play a role in treatment of autism [20].

Tordjman et al. [7] provided a comprehensive review of diverse genetic disorders associated with autism and considered possible common underlying mechanisms leading to a similar cognitive-behavioral phenotype of autism, while examining relevant genetics, syndromes, epigenetics, and environmental factors. Despite the recognition of nearly 800 susceptibility, clinically relevant, or known genes for autism spectrum disorder collated by Butler et al. [21] and characterized by numerous etiological studies including relevant animal models [22], it appears that no cohesive model of causation, biomarker [23], or specific mode of transmission for the development of autism has been firmly identified [24].

The cause of ASD is heterogeneous, involving genetics with multiple different gene variants and environmental influences triggering physiological changes in genetically sensitive individuals, along with in utero and metabolic factors including mitochondrial dysfunction reported in 10 to 20% of patients with ASD. Familial and heritability studies have shown that genetic factors contribute, with heritability estimates as high as 90%; tuberous sclerosis, fragile X, and Rett syndromes are examples of single gene conditions found in ASD, but they account for less than 10% of all ASD cases [25–27]. A list of genetic syndromes and chromosomal disorders associated with ASD is illustrated as Box 1 below.

Behavioral and psychiatric comorbidities are common in individuals on the autism spectrum, and can have a substantial impact on overall health, quality of life, and long-term prognosis. Approximately 30% of individuals with ASD require psychological and psychiatric treatments including medication for behavioral problems including hyperactivity, impulsivity, inattention, aggression, property destruction, self-injury, mood disorders, and psychotic or tic disorders [28,29], a major focus of our report.



## **2. Diagnosis and Genetics of ASD**

ASD affects about 1 individual in 50–100 live births [31,32] and is on the increase with a higher prevalence than reported for congenital brain malformations or Down syndrome. The recurrence rate may be as high as 25–30% if a second child is also diagnosed with ASD in a family (i.e., multiplex) compared with a sporadic pattern (simplex) form of ASD. High heritability estimates have been reported in ASD, e.g., 70 to 90% concordance rate in monozygotic twins [33,34], indicating the potential importance of genetics, but studies have not identified the anticipated number of pathogenic variants to date. Those without a family history may be at a greater risk of carrying copy number variants (CNVs), i.e., deletions or duplications at the chromosome level, as detected by chromosomal microarray analysis with DNA probes for CNVs and by comparative genomic hybridization [35]. Furthermore, 10% of the individuals with autism from simplex families had CNVs, while only 3% of individuals with autism from multiplex families with more than one family member affected showed CNVs, compared with 1% seen in normally developing children studied as controls. The majority of CNVs were of the deletion type. Single gene conditions are found in about 20% of subjects with ASD, while epigenetics impacted by environmental factors such as nutrition, infections, or toxins could alter the gene status through methylation, controlling function without changing the DNA sequence [27,36,37]. Genome-wide linkage studies and genome-wide association studies (GWAS) have identified hundreds of ASD risk gene loci across all human chromosomes.

Autism is considered the most heritable neurodevelopmental disorder based on a large difference in concordance rates or heritability estimates between monozygotic and dizygotic twins with monozygotic twins having rates that are nearly three times higher than rates found in dizygotic twins [38]. Furthermore, a meta-analysis of twin studies on the heritability of ASD in more than 6000 twin pairs was reported by Tick et al. [39]. They found that correlations for monozygotic twins were very close to perfect at a score of 0.98 while the score for dizygotic twins was 0.53, indicating a role of shared environmental effects. Hallmayer et al. [38] concluded that susceptibility to ASD showed moderate genetic heritability and substantially shared twin environmental components, indicating a challenge to find genetic causation for autism.
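As a rough illustration of how twin correlations such as those quoted above translate into variance components, the sketch below applies Falconer's classical formulas: additive genetic share A = 2(r_MZ − r_DZ), shared environment C = 2·r_DZ − r_MZ, and non-shared environment E = 1 − r_MZ. This is a textbook approximation assumed here for illustration only; it is not presented as the model-fitting procedure used in the cited meta-analysis, and the function name `falconer_ace` is hypothetical.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Approximate A/C/E variance shares from twin correlations (Falconer's formulas)."""
    a2 = 2 * (r_mz - r_dz)   # additive genetic share (heritability estimate)
    c2 = 2 * r_dz - r_mz     # shared (common) environment share
    e2 = 1 - r_mz            # non-shared environment share (plus measurement error)
    return {"A": a2, "C": c2, "E": e2}


# Twin correlations quoted in the text from the meta-analysis by Tick et al.
estimates = falconer_ace(r_mz=0.98, r_dz=0.53)

for component, share in estimates.items():
    print(f"{component}: {share:.2f}")
```

With correlations of 0.98 and 0.53, this approximation gives roughly A ≈ 0.90, C ≈ 0.08, and E ≈ 0.02. Most of the variance is attributed to genetic factors, with a small share left for shared environment.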

Genetic investigations have identified the role of hundreds of gene variants, but risk effects are highly variable and relate to other conditions besides autism, making it difficult to find ASD-specific gene variants [40–42]. Many gene variants impact common biological pathways or interactions and may play a potential causative role in autism, but more research is needed to address the current challenges in translating autism genetics into clinical practice, as the genetic etiology and pathogenesis of ASD remain largely unclear [43–45].

Further advances in genetic technology and testing, with improved DNA sequencing and the development of bioinformatics with searchable computer genetic variant databases, have led to discoveries and characterization of genetic defects in the potential causation of ASD. Improvements in chromosomal microarray technology, combining probes for both copy number variants and single nucleotide polymorphisms (SNPs), have led not only to enhanced testing capabilities in identifying segmental deletions and duplications in the genome, but also to the identification of pathogenic or disease-causing genes and their positions within chromosomal regions.

#### *2.1. Genetic Factors Contributing to Autism*

Advances in genetic testing and evaluation for syndromic causation of patients with ASD have identified an etiology in up to 40%, using a three-tier clinical genetic approach described by Schaefer and others in 2008 [25] and later in 2013 [34] to identify causes in children diagnosed with ASD. These include fragile X, Rett, and other genetic syndromes, such as tuberous sclerosis (10–20%), PTEN gene mutations (3%), and structural chromosomal deletions or duplications using early versions of chromosomal microarrays (3%), and an additional 10% or higher when using high-resolution microarray technology. Metabolic disorders such as mitochondrial dysfunction are seen in 10 to 20% of patients with ASD [32,34,46]. Microdeletions or duplications reported in children with ASD involve chromosome regions 1q24.2, 2q37.3, 3p26.2, 4q34.2, 6q24.3, 7q35, 13q13.2-q22, 15q11-q13, 15q22, 16p11.2, 17p11.2, 22q11, 2q13, and Xp22 [13], and additional cytogenetic disorders associated with ASD are found with new ultra-high-resolution microarray technology (e.g., 15q11.2 BP1-BP2 deletions) [47]. Recent GWAS findings in ASD and broad autism phenotype in 28 extended pedigrees from Canada and the United States showed additional chromosome regions including 1p36.22, 2p13.1, 6q27, 8q24.22, 9p21.3, 9q31.2, 12p13.31, 16p13.2, and 18q21.1 [48].

These newer chromosomal SNP microarrays can identify abnormalities 100 times smaller than those seen with high-resolution chromosome methods, including abnormalities involving ASD candidate genes. A report by Shen et al. [26] on 933 patients with ASD using standard karyotype analysis, fragile X DNA testing, and chromosomal microarrays found abnormal karyotypes in 2.2%, abnormal fragile X testing in 0.5%, and microdeletions or microduplications in 18.2% of subjects. These included recurrent deletions or duplications of chromosome 16p11.2 [49] and of chromosome 15q13.2-q13.3, while newer studies found changes involving chromosome 7q11, chromosome 15q11.2 BP1-BP2, and chromosome 22q11.2 [50].

Whole-exome sequencing (WES) studies have reported diagnostic yields of up to 30% [51], but other studies show lower yields (e.g., 9.3%) in individuals with ASD [52]. The vast majority of gene variants are of uncertain clinical significance due in part to the rarity found in genomic normative datasets and limitations of bioinformatics, evolutionary conservation, computational predictions, and relevance in relationship to the normal population. Likely explanations for the lack of consistency among molecular diagnostic testing results may relate to multifactorial causation of ASD influenced by a complex interplay between inheritance and environmental effects along with contributions by epigenetics on gene expression. Despite considerable interest in identifying autism-specific genes, deleterious variants have been implicated across multiple neurodevelopmental and psychiatric disorders, and findings to date have been insufficient to identify those genes that, when mutated, confer a largely ASD-specific risk [42].

An early genome-wide association study (GWAS) reported by Wang et al. [53] on 4300 children with ASD and 6500 controls of European ancestry found a strong association with six single nucleotide polymorphisms (SNPs) located between the cadherin 10 (CDH10) and cadherin 9 (CDH9) genes on chromosome 5, which encode neuronal cell-adhesion molecules. Since then, over 100 genetic loci have been reported to be associated with ASD [54,55], comprising genes converging on chromatin remodeling, synaptic function in neuronal signaling, and neurodevelopment [56,57]. Furthermore, Butler et al. [21] collated about 800 genes from the literature that were implicated as clinically relevant, susceptible, or known in ASD. These multiple genes include several members of the neuroligin, neurexin, GABA receptor, cadherin, and SHANK gene families. Other genes were found to code for neurotransmitters and their receptors, transporters, oncogenes, brain-derived hormones, epigenetics, and signaling and ubiquitin pathway proteins, along with neuronal cell-adhesion molecules [21,58–61].

#### *2.2. Metabolic Factors Contributing to Autism*

Metabolic factors, including mitochondrial function, are now recognized as contributing to autism. Next-generation DNA sequencing now allows for accurate detection of mutations or gene variants at the nuclear and mitochondrial DNA (mtDNA) level and is potentially more informative than chromosomal microarray analysis, which detects structural DNA changes. This technology is now available in the clinical setting for individuals presenting with biochemical and mitochondrial disturbances and autism [21,62]. Three functional pathways to ASD are potentially involved, which include genes and pathways for chromatin remodeling (e.g., CHD7, MECP2, DNMT3A, and PHF2), Wnt (e.g., CHD8, PAX5, and ATRX), and other signaling super-pathways (e.g., GPCR, ERK, RET, and AKT) [50,51], and mitochondrial dysfunction in ASD (e.g., [62]). High lactate levels are also reported in about one in five children with ASD, further supporting the role of the mitochondria in energy metabolism and brain development [32,46,62].

The mitochondria are intracellular organelles found in the cytoplasm which play a crucial role in adenosine 5'-triphosphate (ATP) production through oxidative phosphorylation [62–66], the latter process carried out by the electron transport chain made up of Complexes I, II, III, and IV situated in the inner membrane of the mitochondria containing about 100 proteins. Genes that encode the proteins are located in both nuclear and mitochondrial DNA [65–67] and are required for cellular energy that can impact or influence brain development and activity. There are hundreds of nuclear genes involved in mitochondrial function, while only 13 mitochondrial genes code for protein. Mitochondrial disturbances include a depletion type, or reduced number of mitochondria per cell, with a decreased quantity of mtDNA, or mtDNA mutations producing defects in biochemical reactions within the mitochondria and individual cells [32,46,65,68].

A subset of individuals with ASD can have small mitochondrial DNA deletions/duplications detectable with mitochondrial genome microarrays. Human mitochondrial DNA (mtDNA) is a circular, double-stranded DNA molecule contained within the mitochondrion and inherited solely from the mother. Each mitochondrion contains 2–10 mtDNA copies. In humans, 100–10,000 separate copies of mtDNA are usually present per cell [63–65]. Inborn errors of metabolism may contribute significantly to the causation of ASD with enzyme deficiencies leading to an accumulation of substances that can cause toxic effects on the developing brain. A common example is phenylketonuria, leading to excessive phenylalanine levels, intellectual disability, and ASD, if not diet controlled.

## **3. Clinical Assessment and Testing**

#### *3.1. Initial Clinical Evaluation*

A healthcare professional interviews the parent or caregiver regarding presenting problems and reviews a three-generation family history, developmental milestones and abnormal behaviors of the child, medical and surgical history, and any past or current treatments. The diagnostic evaluation is typically performed by a developmental pediatrician or a child and adolescent psychiatrist. Physical and mental status examinations are performed, and additional testing is ordered, as appropriate. If a positive family history of autism or dysmorphic (syndromic) features are found, a referral is made for clinical genetics evaluation. Laboratory evaluations may include genetic testing, lead levels, thyroid function, lactate, pyruvate and cholesterol levels, and urine for organic acids. Referrals are made for neurological evaluations and brain imaging when clinically indicated.

#### *3.2. High-Resolution Microarrays and ASD*

Genetic testing often begins with chromosomal microarray analysis (CMA) to identify copy number variants (CNVs) to search for a cause of autism spectrum disorder and other related conditions. Microarrays employ a variety of designs and range of coverage of genomic regions, which increases the diagnostic yield as arrays have evolved over time to include better coverage and accuracy. Often the CNVs identified are unclassified or poorly understood in their role in causation of ASD.

Neurodevelopmental disorders can cumulatively affect up to 15% of children [69]. While the etiology of ASD is complex, it involves genetic factors, with about 800 genes implicated in ASD, accounting for roughly 4% of all human genes [21]. Single gene changes, large genomic structural changes (i.e., deletions or duplications), or smaller CNVs and other polygenic conditions can be influenced by the environment and epigenetic factors [70,71]. Genetic testing to pinpoint the underlying cause of ASD is critical for clinical management and counseling. Further, chromosomal microarray analysis has demonstrated the highest diagnostic yield for individuals with ASD as compared to other genetic tests, as well as in individuals with ID and/or behavioral problems, including in developing countries.
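The proportion quoted above can be checked with simple arithmetic. The short sketch below works out the total gene count implied by 800 genes representing 4% of all human genes, and compares it with a round figure of about 20,000 protein-coding genes. That round figure is an assumption introduced here as a commonly cited estimate; it is not taken from the source.

```python
# Figures quoted in the text: about 800 ASD-implicated genes, about 4% of all human genes.
asd_genes = 800
share_percent = 4

# Total gene count implied by those two figures.
implied_total_genes = asd_genes * 100 / share_percent

# Assumption (not from the source): a commonly cited round figure for
# human protein-coding genes, used only as a sanity check.
assumed_protein_coding_genes = 20_000

share_with_assumed_total = 100 * asd_genes / assumed_protein_coding_genes

print(f"Implied total genes: {implied_total_genes:.0f}")
print(f"Share using assumed total: {share_with_assumed_total:.1f}%")
```

The implied denominator of about 20,000 genes suggests the 4% figure refers to protein-coding genes rather than to all annotated genomic elements.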

High-resolution microarrays now utilize millions of single nucleotide polymorphisms (SNPs) as probes to test the DNA from patients presenting with neurodevelopmental disorders, intellectual disabilities, and ASD. These SNP microarrays are used to identify microdeletions (or duplications), with recognition of a growing list of dozens of deletion or duplication syndromes not previously detected. For example, Ho et al. [47] reported in 2016 a study using custom-made, ultra-high-resolution microarrays optimized for the detection of neurodevelopmental disorders (Lineagen, Salt Lake City, Utah) in 10,351 patients presenting for genetic services for neurodevelopmental disorders, ASD, ID, or behavioral problems, with or without multiple congenital anomalies (MCA), over a period of four years. Their testing sample had a male:female ratio of 2.5:1 with a mean age of 7 years. Fifty-five percent of cases represented patients with a diagnosis of ASD with or without other features. The overall CNV detection rate was 28.1% across the 10,351 consecutive patients, 24.4% in those with ASD, and 33% in those with intellectual disabilities and/or MCA without autism. The rate of pathogenic findings was significantly lower (4.4%) when the diagnostic indication was ASD only compared to diagnostic indications of developmental delay (DD), ID, and/or MCA without a reported diagnosis of ASD (i.e., non-ASD cohort) (12.5%).

In the ASD cohort, the overall pathogenic rate was slightly higher for individuals with ASD+ (ASD with other features) as compared to the overall pathogenic rate for individuals with ASD only. The pathogenic rate in the ASD+ cohort started at 4.1% in the youngest group and rose to 8.5% in the 5.5–10 years range. The pathogenic rate in the ASD only cohort rose gradually with age, from 3.4% in the youngest cohort (0–3.4 years) to a peak at 7.0% in adolescence. In 5694 patients classified as ASD and 4657 patients classified as non-ASD, the most common findings were 15q11.2 BP1-BP2 deletions followed by proximal 16p11.2 deletions or duplications, 15q13.3 deletions, and 16p13.1 duplications (see Figure 1). The most common finding in the non-ASD cohort was the 22q11.2 deletion causing velo-cardio-facial or DiGeorge syndrome. This study illustrates the value of CMA testing, and its impact on medical management is now recognized in consensus medical guidelines for the evaluation of children with ASD.
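The cohort figures quoted from Ho et al. [47] can be cross-checked against each other. The sketch below uses only numbers stated in the text; the variable names are illustrative. It confirms that the ASD and non-ASD cohorts add up to the reported total, recovers the ASD share of the sample, and expresses the two pathogenic rates as a ratio. No patient-level counts are reconstructed, since the text gives rates rather than counts for the ASD only subgroup.

```python
# Cohort sizes quoted in the text.
total_patients = 10_351
asd_patients = 5_694
non_asd_patients = 4_657

# Consistency check: the two cohorts should account for every patient.
cohorts_sum_to_total = (asd_patients + non_asd_patients) == total_patients

# Share of the sample with an ASD diagnosis (the text states 55%).
asd_share = asd_patients / total_patients

# Pathogenic finding rates quoted in the text, by diagnostic indication.
pathogenic_rate = {"ASD only": 0.044, "non-ASD": 0.125}
rate_ratio = pathogenic_rate["non-ASD"] / pathogenic_rate["ASD only"]

print(f"Cohorts sum to total: {cohorts_sum_to_total}")
print(f"ASD share of sample: {asd_share:.1%}")
print(f"Non-ASD vs ASD only pathogenic rate ratio: {rate_ratio:.2f}")
```

The two cohorts add up to the 10,351 total, and the ASD share comes to about 55%, matching the stated proportion. The pathogenic rate in the non-ASD cohort is a little under three times that in the ASD only group.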

#### *3.3. Next-Generation Sequencing (NGS)*

Advances in genomics technology using next-generation sequencing (NGS) have led to discovery of many disease-causing genes using candidate gene approaches, disease-specific gene panels, or by whole-exome sequencing of patients presenting with neurodevelopmental disorders, intellectual disabilities, or ASD. Applying genomics to the study of neurodevelopment and function has identified over 5000 implicated genes using clinical exome sequencing approaches and informatics in affected individuals. In addition, disease-specific NGS gene testing panels have been developed and used in the commercial laboratory setting for testing patients presenting for genetic services, including approximately 600 genes for intellectual disabilities and over 100 genes available for testing for ASD (e.g., Fulgent Diagnostics, Irvine, California).

**Figure 1.** Pie chart showing the top 10 out of 85 genetic findings from data summarized by Ho et al. [47] using ultra-high-resolution chromosomal microarrays from over 10,000 consecutive patients presenting for genetic testing with neurodevelopmental disorders affecting brain function and/or structure of unknown cause with developmental/intellectual disabilities and/or ASD.

These types of analyses have identified pathogenic gene variants or mutations which are known to be disease-causing, such as missense or nonsense changes, but more often variants of unknown clinical significance are found. More testing and information with better interpretations of the genomic change and impact at the protein level are needed to help determine the role, if any, of the unknown gene variants in causing the disease under study, including ASD. Hundreds of new causative genes relating to human diseases and syndromes have been identified with the use of NGS technology over the past few years, with expectations of continued success given improvements in genetic technology, bioinformatics, and expanded genomic databases to search for gene variants and to further characterize identified genes.

Next-generation DNA sequencing of the exons (referred to as exome sequencing) or whole-genome sequencing will continue to yield discoveries of disease-causing SNPs, gene regulatory sequences, or mutations of protein-coding genes for both structural and regulatory proteins. Identifying molecular signatures of novel or disturbed gene or exon expression, disease-specific profiles and patterns (i.e., expression heat maps), and recognition of interconnected gene pathways in autism and other behavioral disorders in the future by using readily available blood elements (e.g., lymphoblasts) should hold promise for treatments with pharmacological agents by regulating (either increasing or decreasing) activity of normal (or abnormal) gene function. The study of non-coding RNAs, which control the amount or quantity of gene expression coding for protein production through micro-RNAs and the quality of protein production by specific isoform development by sno-RNAs, will lead to new areas of research and medical therapies for human diseases. Therefore, this technology should be considered in the diagnostic evaluation of ASD, either sporadic or with a positive family history of others similarly affected.

Butler et al. [21] searched the literature and found approximately 800 genes reported as clinically significant, relevant, or known to contribute to the risk of ASD. Recent research revealed that ASD and cancer genes may share common genetic architecture and pathways, with the first evidence of the PTEN tumor-suppressor gene playing a role in autism reported in 2005 [14]. Hence, approximately 800 ASD-related genes and 3500 genes in cancer were examined using the GeneAnalytics pathways and profiling software programs; the analysis found shared cell-signaling pathways, metabolic disturbances, and molecular functions in 138, or 17%, of ASD genes that overlap with cancer genes [20]. Shared mechanisms may lead to identification of common pathology and a better molecular understanding of causation as well as potential treatment options.
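The overlap described above amounts to intersecting two gene lists and expressing the shared genes as a fraction of the ASD list. The following minimal sketch illustrates that arithmetic only; it does not reproduce the GeneAnalytics analysis. Apart from PTEN, which is named in the text, the gene symbols are hypothetical placeholders, and the reported counts are treated as approximate because the text gives the ASD list size as "approximately 800".

```python
def overlap_summary(asd_genes: set[str], cancer_genes: set[str]) -> tuple[set[str], float]:
    """Return the shared genes and their percentage of the ASD gene list."""
    shared = asd_genes & cancer_genes
    percent_of_asd = 100.0 * len(shared) / len(asd_genes)
    return shared, percent_of_asd


# Toy gene lists: PTEN is named in the text; all other symbols are
# hypothetical placeholders used only to make the example runnable.
asd_toy = {"PTEN", "ASD_GENE_A", "ASD_GENE_B", "ASD_GENE_C"}
cancer_toy = {"PTEN", "CANCER_GENE_A", "CANCER_GENE_B"}

shared, pct = overlap_summary(asd_toy, cancer_toy)
print(sorted(shared), pct)

# The same percentage calculation applied to the counts quoted in the text:
# 138 overlapping genes out of approximately 800 ASD genes.
reported_pct = 100.0 * 138 / 800
print(round(reported_pct))
```

With the quoted counts, the calculation gives a value that rounds to the 17% stated in the text.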

#### **4. Treatment Approaches**

## *4.1. Behavioral Interventions in ASD*

#### 4.1.1. For Children and Adolescents with ASD

Weitlauf et al. [72] reviewed 65 studies, comprising 48 randomized trials and 17 nonrandomized comparative studies, that analyzed the benefit of behavioral interventions. High-intensity applied behavior analysis (ABA) was associated with improvement in cognitive functioning and language skills relative to community controls in young children [73]. Early intensive behavioral intervention (EIBI) is a well-established treatment for young children with ASD and is based on the principles of applied behavior analysis. Delivered over a period of several years at an average of 20 to 40 hours per week, it can provide substantial benefit for core ASD symptoms, particularly in terms of communication skills [74]. Social skills interventions, including group-administered training, showed positive effects on social behaviors for older children [75].

#### 4.1.2. For Adults with ASD

The National Institute for Health and Care Excellence (NICE) recommended guidelines for management and support of children and young people with autism using group or individual social learning programs to improve social interaction deficits by applying behavioral therapy techniques within a social learning framework. These include using video modeling, peer feedback, imitation, and reinforcement to teach conventions of appropriate social interpersonal interaction [76]. There is evidence from observational studies in adults with ASD that social skills groups may be effective at improving social interaction [77]. Cognitive behavioral therapy (CBT) can help adults with ASD across a range of domains, particularly in the context of treating anxiety and obsessive-compulsive disorder (OCD), and in supporting adults who have difficulties related to a history of victimization [78].

## *4.2. Medication Treatments in ASD*

Psychopharmacological treatment of ASD is challenging due to considerable variability in the presentation of ASD and commonly occurring comorbidities. Individuals with ASD are typically more vulnerable to side effects of psychopharmacological agents than their age-matched, neuro-typically developing peers [79]. In addition, ASD impacts individuals over the course of their lifespan, yet most of the literature on psychotropic medications in ASD involves pediatric populations.

A psychopharmacological approach may be beneficial in the treatment of identified target symptoms in individuals with ASD. When considering the use of medications, potential benefits and risks must be weighed on a case-by-case basis. It has been reported that close to half of insured children with ASD are receiving psychopharmacological interventions, most commonly with stimulants, alpha-2 agonists, antipsychotics, anticonvulsants, and antidepressants [80].

Currently, there are no medications approved for treatment of the core symptoms of ASD including social communication deficits or repetitive behaviors. Common target symptoms for which there are effective, evidence-based medication treatments include hyperactivity, inattention, impulsivity, irritability, aggression, self-injurious behavior, repetitive behaviors (including stereotypies), and insomnia [81]. For the treatment of irritability associated with ASD, the antipsychotics risperidone and aripiprazole are licensed and approved by the US Food and Drug Administration [82].

## 4.2.1. For the Treatment of ADHD Symptoms in ASD

Stimulant medications are considered first-line agents for attention deficit hyperactivity disorder (ADHD) in individuals with ASD, given that overall they are most often effective and generally well tolerated compared to other ADHD medications. The RUPP research team [83] and later Reichow et al. [84] demonstrated a clear superiority of methylphenidate over placebo in children with pervasive developmental disorder. However, compared to youth with ADHD alone, those with ASD had lower treatment response rates and a greater risk of side effects with methylphenidate, including decreased appetite, insomnia, depressive symptoms, irritability, and higher levels of social withdrawal. It should be noted that stimulant medications for ADHD in the amphetamine class are often used in children with ASD but have not been as rigorously studied.

Regarding non-stimulant medications for ADHD in ASD, both atomoxetine and alpha-2 agonists have shown benefit. Harfterkamp et al. [85], in a double-blind treatment trial of patients aged 6 to 17 years with ADHD and ASD using atomoxetine 1.2 mg/kg/day or placebo for eight weeks, found that atomoxetine moderately improved ADHD symptoms, but with frequent adverse events including nausea, decreased appetite, fatigue, and early morning awakening. The alpha-2 agonist guanfacine has been shown to be effective for ADHD in children with ASD, as demonstrated by a double-blind, placebo-controlled trial of guanfacine extended release in which 50% of youth on active treatment improved on the Clinical Global Impression–Improvement (CGI-I) scale [86], compared to 9.4% on placebo [87], with sedation and transient lowering of blood pressure as the most common adverse effects.

#### 4.2.2. For the Treatment of Irritability, Aggression, and Self-Injurious Behavior in ASD

Compared to other medications, atypical antipsychotics have to date demonstrated the best evidence for the treatment of irritability in ASD. In three randomized, placebo-controlled trials, risperidone in youth aged 5 to 16 years with ASD [88] showed an over 50% reduction in the score on the irritability subscale of the Aberrant Behavior Checklist (ABC-I) [89], and the magnitude of the response was greater when irritability was rated as moderate to severe [90].

Aripiprazole, the second antipsychotic approved by the FDA for the treatment of irritability associated with autism (in children between the ages of 6 and 17 years), demonstrated significantly lower severity scores on the ABC-I and the CGI-I scales for subjects on active medication compared to placebo in two large-scale, randomized, placebo-controlled studies. Unfortunately, weight gain is a common side effect of antipsychotics, and increases in body mass index have been shown to be similar for aripiprazole and risperidone in children with ASD [91].

Anticonvulsant medications divalproex and topiramate have shown some promise for treating irritability in ASD. Divalproex was beneficial in reducing irritability in a small, randomized, placebo-controlled trial of children with ASD [92]. However, an earlier trial failed to show separation from placebo on the ABC-I [93]. Topiramate as monotherapy has no demonstrated benefit in the treatment of irritability in youth with ASD [94]; however, it reduced the ABC-I score when co-administered at an average daily dose of 200 mg with risperidone [95]. It is hypothesized that, as EEG abnormalities are common in children with ASD, symptom reduction with anticonvulsants may result from treatment of abnormal brain discharges [96].

#### 4.2.3. For the Treatment of Repetitive Behaviors Including Stereotypies in ASD

Fluoxetine has been shown to improve repetitive behaviors in adults with ASD [97]; however, benefit has not been reliably demonstrated in pediatric populations. In fact, the Cochrane Collaboration published a systematic review [98], which concluded that for repetitive behaviors in children with ASD there is not only a lack of available evidence of benefit from treatment with selective serotonin reuptake inhibitors (SSRIs) including fluoxetine, fluvoxamine, and citalopram, but some evidence for risk of harm, given a greater incidence of adverse effects, most notably symptoms of behavioral activation.

## 4.2.4. For the Treatment of Persistent Insomnia in ASD

Exogenous melatonin (available as an over-the-counter supplement), in both immediate-release and extended-release formulations, has been shown to be safe and effective in improving sleep patterns in children with ASD [99]. Some evidence suggests that children with ASD have abnormal melatonin secretion and circadian rhythm abnormalities compared to non-ASD children [100]. Clonidine (an alpha-2 agonist) has shown promise in reducing latency of sleep initiation and decreasing nighttime awakening in ASD [101].

## *4.3. Pharmacogenetics and Role in Medication Selection and Management*

Personalized or precision medicine is emerging in clinical practice based on individual genetic patterns contributing to pharmacogenetics, particularly in the field of psychiatry and in treating individuals with ASD [102–105] who have behavior issues including ADHD, irritability, aggression and self-injury, repetitive behaviors, and persistent insomnia, as addressed above. Pharmacogenetics is the study of structural DNA variation that impacts drug metabolism [106,107] and is most often based on the cytochrome P450 enzyme system, which is primarily active in the liver and coded by genes. Cytochrome P450 enzymes metabolize or break down drugs in the liver, with most prescription drugs metabolized by this enzyme system, and thus play a significant role in the treatment of diseases [107,108]. Variation in drug response among individuals due to metabolism differences is now recognized as a major clinical problem, especially given that the use of several medications per patient is common practice. Relevant cytochrome P450 gene polymorphisms and their different racial distributions can identify sources of variability in drug response arising from modulation of metabolism by the cytochrome P450 enzymes, which impacts treatment in ASD.

There are over 50 cytochrome P450 hepatic enzymes that are primarily found in the mitochondria [102–105]. These enzymes metabolize endogenous and xenobiotic substrates including environmental pollutants and agricultural and plant-based chemicals and are involved in biosynthesis and metabolism of steroids, vitamins, hormones, lipids, and prostaglandins. About 90% of all drugs are metabolized by seven different cytochrome enzymes including CYP1A2, CYP3A4, CYP3A5, CYP2C19, CYP2D6, CYP2C9 and CYP2B6 [106,109–111]. The most commonly prescribed medications used in treating patients with psychiatric problems and ASD are broken down by CYP2D6 [102–105]. It should also be noted that many drugs are metabolized by more than one cytochrome P450 enzyme and, in addition, some drugs (e.g., risperidone) require breakdown to generate an active metabolite or functional agent for treatment.

There is growing evidence that the activity of cytochrome P450 enzymes may be altered by environmental inhibitors or inducers, which can also affect drug–drug interactions. Known inhibitors or inducers may include common sources such as caffeine, grapefruit, broccoli, cabbage, or cauliflower, which act on the activity of individual enzymes. For example, if an individual has a reduced-activity form of a cytochrome P450 enzyme, then an inducer may increase the enzyme response in breaking down the drug to help that person in metabolizing a specific medication and, thus, impact response to treatment.

Drug–drug interactions, drug concentrations and half-lives, and the response to inhibitors and/or inducers can all impact medication levels and treatment in the patient. It should also be noted that individuals who are either fast or slow metabolizers based on their microsomal P450 enzyme system genotype patterns may respond differently to specific medications, which puts them at risk for failure of drug therapy and/or adverse side effects. Similarly, a better understanding of the metabolic differences that occur with age will further inform drug dosage and selection of specific therapeutic agents. Therefore, personalized medicine requires the development of resources for clinicians including pharmacogenetic dosing guidelines for medications, as 25 to 50% of individuals do not respond normally to drug dosage or treatment, and this scenario also applies to those with ASD [102].

Applying this knowledge from pharmacogenomics and identifying genes and polymorphisms involved in drug metabolism will benefit patients treated for psychiatric and behavioral problems. The discovery of new classes of drugs and research on existing drugs for new purposes to treat behavioral problems in patients with ASD are under investigation including clinical trials (e.g., in fragile X syndrome), holding promise for improved therapy. In addition, the discoveries made in brain imaging such as functional MRI or PET scans in identifying regions of the brain that are affected in ASD should allow for new treatment discoveries and applications specific for the altered regions identified.

#### **5. Future Directions**

Advances in and application of genomic testing technology, bioinformatic approaches, and computational predictions will strengthen genetic testing results and interpretations as more experience is gained in testing patients presenting for clinical services and diagnosis [112–114]. An increased number of next-generation sequencing (NGS) or whole-exome sequencing (WES) studies in ASD of individuals from different ethnic backgrounds will be required to gain ASD-specific genomic information from datasets of both sexes when compared to the normal population. These contributions should allow a better understanding of the role of genetics, genomics, epigenetics, and specific candidate genes and their variants, along with environmental factors playing a role in ASD in relationship to multifactorial influences in family studies [38,112,114,115]. Confounding effects of clinical heterogeneity and diagnostic uncertainty are other complicating issues needing further characterization and evaluation to gain more experience in clinical assessment, genetics, and treatment approaches in autism spectrum disorder.

In addition, research is needed using brain and tissue harvested and stored for structural DNA and RNA expression studies from individuals with ASD for whom data from cognitive, behavioral, and ASD assessment tools and neuroimaging results were obtained while living. Coding and non-coding expression patterns and epigenetic (methylation) signals supplemented with WES and CNV data would be beneficial for a better understanding of the role of genetics in ASD, particularly with larger cohorts of individuals having similar genetic backgrounds, patterns, and ethnicity to identify large-effect pathogenic variants, facilitate genotype–phenotype correlations, and allow comparisons. The biological processes, molecular functions with gene interactions, and pathways that are more autism-specific may be identified through these analytical genetic studies. Currently, there are no molecular pathways known to be uniquely associated with ASD when disturbed. Some gene variants are more related to neurodevelopmental disorders and not specific for autism. Classification of gene variants that specifically cause ASD alone and are not attributable to other neurodevelopmental or psychiatric disorders is under investigation, as rare, large-effect mutations seen in ASD also influence cognition in a high proportion of individuals, complicating the degree of impact on the ASD phenotype vs. impact on cognitive function [42]. Certain neurodevelopmental gene variants may also impact gene function differently, including in neural circuits, depending on differences in an individual's genetic background.

Particular classes of variants such as missense or nonsense changes may confer different effects at the protein level. Individual gene variants that alter specific amino acids may affect protein regions or domains in which changes at certain positions confer only mild consequences, while other amino acid positions may be more important for protein function. These areas of gene variant–protein relationships will require more studies in ASD in the future using improved genetic technology, data collection, and analysis with genotype–phenotype correlations.

Brain tissue regions most often affected in ASD (e.g., hippocampus, cerebellum, etc.) may yield useful information if studied in those persons with documented autism, particularly with stored clinical and imaging data with CNVs and gene variants combined with expression patterns and methylation status in relationship to control subjects who are similarly studied. Mosaicism/tissue-specific gene expression should be considered and further studied in view of more males than females being affected with ASD, particularly for X-linked genes. Additionally, hormone-mediated gender influences or differential expression in the brain should be examined for dysregulation in ASD, including methylation status of brain-expressed genes on the X chromosome and interaction with autosomal genes (e.g., the X-linked FMR1 gene causing fragile X syndrome [116] and the CYFIP1 gene at 15q11.2, which is involved with coding a transporter for the FMR1 protein [117]). These investigations will require more specialized methods with increased sensitivity such as droplet digital PCR [118].

## **6. Summary**

On behalf of individuals living with ASD and their families and for the benefit of society as a whole, increased awareness and knowledge regarding autism spectrum disorder and commonly related neurobehavioral conditions with contribution of genetic differences are imperative for healthcare professionals who provide evaluation and treatment services for ASD. Early recognition, diagnosis, and treatment should increase the likelihood that affected individuals will achieve optimal long-term outcomes and improved quality of life. Genetic and epigenetic discoveries underlying causes, as well as factors impacting treatment, such as pharmacogenetic variability, have the potential to improve the overall health of individuals with ASD. Additional clinical research to improve the evidence base for various treatment interventions for ASD with related behavioral and psychiatric challenges is desperately needed.

**Author Contributions:** A.G. and M.G.B. designed and contributed to the study equally by reviewing the literature, writing and revising the manuscript and agreeing to publish the article. M.G.B. generated the figure. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** We thank Grace Graham for expert preparation of the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## **Proteomics and Metabolomics Approaches towards a Functional Insight onto Autism Spectrum Disorders: Phenotype Stratification and Biomarker Discovery**

**Maria Vittoria Ristori 1, Stefano Levi Mortera 2, Valeria Marzano 2, Silvia Guerrera 3, Pamela Vernocchi 2, Gianluca Ianiro 4, Simone Gardini 5, Giuliano Torre 6, Giovanni Valeri 3, Stefano Vicari 3,7, Antonio Gasbarrini 8,9 and Lorenza Putignani 1,\***


Received: 14 August 2020; Accepted: 27 August 2020; Published: 30 August 2020

**Abstract:** Autism spectrum disorders (ASDs) are neurodevelopmental disorders characterized by behavioral alterations and currently affect about 1% of children. Significant genetic factors and mechanisms underlie the causation of ASD. Indeed, many affected individuals are diagnosed with chromosomal abnormalities, submicroscopic deletions or duplications, single-gene disorders or variants. However, a range of metabolic abnormalities has been highlighted in many patients, by identifying biofluid metabolome and proteome profiles potentially usable as ASD biomarkers. Indeed, next-generation sequencing and other omics platforms, including proteomics and metabolomics, have uncovered early age disease biomarkers which may lead to novel diagnostic tools and treatment targets that may vary from patient to patient depending on the specific genomic and other omics findings. The progressive identification of new proteins and metabolites acting as biomarker candidates, combined with patient genetic and clinical data and environmental factors, including microbiota, would bring us towards advanced clinical decision support systems (CDSSs) assisted by machine learning models for advanced ASD-personalized medicine. Herein, we will discuss novel computational solutions to evaluate new proteome and metabolome ASD biomarker candidates, in terms of their recurrence in the reviewed literature and laboratory medicine feasibility. Moreover, the way to exploit CDSS, performed by artificial intelligence, is presented as an effective tool to integrate omics data with electronic health/medical records (EHR/EMR), hopefully acting as added value in the near future for the clinical management of ASD.

**Keywords:** autism spectrum disorders (ASDs); proteomics; metabolomics; interactomics; disease biomarkers; clinical decision support systems (CDSSs)

#### **1. Introduction**

Autism spectrum disorders (ASDs) are a complex set of neurodevelopmental diseases, behaviourally diagnosed, that affect several spheres of mental development. They represent a panel of conditions that begin during the developmental period and result in impairments of personal, social, academic or occupational functioning (Figure 1) [1–3].

**Figure 1.** Neuropsychiatric features of autism spectrum disorder (ASD). ASD is characterized by impairments in social interaction, difficulty in adapting behaviour in various social contexts, or lack of interest in peers; communication problems, such as difficulty making eye contact, facial expressions, body postures, and difficulty understanding or using the gestures that regulate interaction with others; and restricted or repetitive behaviours, such as rituals that are conducted with a rigid manner or movements.

Comorbidity with intellectual disabilities, impaired motor coordination and gastrointestinal (GI) disorders are also often present [3,4]. An incidence of ASD of 1 in every 60 subjects is estimated in the United States, with a fourfold frequency for males with respect to females [5–7]. Clinical manifestations typically occur in the 2nd–3rd year of life, usually persist in adulthood, and influence various aspects of mental development [8]. The exact etiopathogenesis of ASDs is not yet well known but many studies have investigated both genetic and environmental factors [9,10]. In particular, gene inheritance has been found in around 60% [11,12] or 80% [13] of cases with significant heterogeneity concerning the genetic factors actually involved in ASD onset [14,15]. In a review of the literature, Higdon et al. found that many ASD patients have recurrent de novo disruptive mutations particularly affecting specific protein targets such as chromodomain helicase DNA binding protein 8 (CHD8), activity-dependent neuroprotector homeobox (ADNP), dual-specificity tyrosine phosphorylation-regulated kinase 1A (DYRK1A), and phosphatase and tensin homolog (PTEN). In particular, subjects with PTEN mutations showed abnormal brain white matter volumes in addition to autism symptoms; subjects with CHD8 mutations carried chronic GI complications, distinct facial dysmorphology and macrocephaly; those with DYRK1A mutations had a higher probability of microencephaly and early growth difficulties compared to healthy subjects; and patients carrying an ADNP-disruptive mutation were characterized by intellectual disability and dysmorphic features [16]. Interestingly, a recent study directed by Risch and his group found that ASD siblings showed a risk increment of 10.1% to develop ASD, compared to control siblings, hence confirming a genetic predisposition in the familial recurrence of ASD [10]. Additionally, oxidative stress, inflammation, mitochondrial dysfunction, and immune dysregulation, associated with changes in protein and metabolite pathways, have been reported in the literature in many ASD studies [17–19].

However, it is also well established that several factors involving the pre-, perinatal, or postnatal environment are associated with amplified risk of ASD [20] (Figure 2). Amongst these, maternal infections, stress, and diabetes, together with exposure to pesticides, air pollution, dietary habits, infections, inflammatory conditions, and consumption of antibiotics during pregnancy, make up the so-called exposome and combine to increase the ASD risk [21,22]. Indeed, environmental factors can influence genetic vulnerability [23] by modifying the development of neuronal circuits [24]. These alterations are heterogeneous and this can contribute to determining the complexity of the disorder in terms of neurobiology, symptomatology, and etiology [25]. In particular, GI comorbidity associated with an alteration in the microbiota composition is frequently reported in ASD groups [21,26]. Indeed, many studies have investigated the association between autism severity and GI symptoms, showing positive correlations [26–30]. The increased presence of irritability [31], anxiety and affective disorders [3,32], dysregulation and externalizing problems [33], rigid/compulsive behaviors [34,35], increased sensory sensitivity [36], and sleep problems [33,37] have been reported in ASD subjects with concurrent GI symptoms compared to ASD subjects without. Significant GI symptoms are associated with chronic abdominal pain [3,4,26,33,38,39], chronic constipation [3,4,26,39,40], chronic diarrhea [3,39,41], and gastroesophageal reflux [26,38,42]. The existence of a complex bidirectional interaction between the central nervous system and the GI tract is currently more than a hypothesis and primarily involves the impact of ecology and function of the microbiota [43].

Targeted-metagenomics studies on the ecology of the microbiota, performed using next-generation sequencing (NGS), have revealed that specific signatures, such as *Prevotella*, *Enterococcus*, *Lactobacillus*, *Ruminococcus*, *Faecalibacterium prausnitzii*, *Sutterella*, and *Bifidobacterium*, are overrepresented in children with ASD compared to healthy controls [28,44–46]. Furthermore, metabolomics studies evidenced an involvement of the microbiota in the production of molecules, such as tryptophan, inflammatory cytokines, or cortisol, which play a role in ASD and GI-related symptoms, as recently reported by our group [47]. Therefore, an approach to the investigation of ASD based on omics or meta-omics disciplines is becoming crucial for both disease phenotype stratification and biomarker discovery. Given the large amount of data (i.e., big data) produced by a single omics or meta-omics discipline, an untargeted evaluation of data may provide substantial new models to properly stratify a multifactorial disease such as ASD, considering both host and microbiome profiling and producing integrated models to co-represent genomics/metagenomics, metabolomics/metametabolomics, and proteomics/metaproteomics profiling of the disease.

In particular, transcriptomics and proteomics may improve the existing gene models by profiling molecular phenotypes at transcriptionally active regions of the genome (the transcriptome) and protein abundance (the proteome) levels. Besides a proteomics profiling, post-translational modifications and protein–protein interactions can contribute to providing further specific information on pathways and molecular networks. Furthermore, metabolomics may provide information on the modulation of the host and microbiota metabolism. As previously reported, the diagnosis of ASD depends on clinical observation and procedures to evaluate behavioral, historical, and parent-report information [48]. These tools may involve a significant degree of host variability; hence, detecting metabolomics biomarkers may contribute to the advance of the diagnostics and clinical management of ASD, and may provide new biomarkers that could be used to improve the outcome of individualized interventions as personalized medicine tools.

**Figure 2.** Risk factors associated with ASD: ASD is a multifactorial condition characterized by genetic and environmental factors, including prenatal and postnatal factors that increase the risk of disease. Among the main factors, genetic predisposition, parents' age, and exposures during pregnancy to air pollutants have been associated with poor cognitive outcomes in the perinatal age. Moreover, delivery complications or postpartum haemorrhage might also increase the risk of ASD. All these factors, globally constituting the exposome, may contribute to ASD, hence hampering the search for single biomarkers of the disease.

Given the growing interest in identifying new functional traits of the disease and novel biomarkers, big data obtained by metabolomics and proteomics approaches from blood, urine or saliva specimens should be collected and stored in open source digital biobanks available to omics scientists and clinicians for deep phenotyping. This systems biology-based approach will allow multidimensional data to be integrated at cellular, tissue and organ organization levels, providing computational chemometric models that contribute to the understanding of the pathophysiological mechanisms of diseases (i.e., onset and progression) [49]. In this paper, we introduce a new analysis of the data (i.e., ontology enrichment of proteins and metabolites), based on the category "Biological Process", to highlight new possible biomarkers and metabolic pathways that could open new avenues to the study of ASD biomarkers based on proteomics and metabolomics. Moreover, novel biomarkers are discussed in terms of their added value in clinical decision support systems (CDSS), approached by artificial intelligence (AI), that represent a new way to integrate omics, multi-omics data and health/medical records (EHR/EMR). This may represent a novel tool to assist clinicians in the near future to approach ASD phenotype stratification and hopefully to disentangle the complexity of this disease due to multifactorial components.

## **2. Methods**

## *2.1. Search Strategy*

We conducted a review of the literature to evaluate the role of the proteomics and metabolomics disciplines, and data for disease phenotype stratification and biomarker discovery, in ASD patients. The research was conducted on PubMed, including papers from 2004 to 2019 and using the following terms: "autism" or "autism spectrum disorder" and "proteomics" or "interactome" or "metabolomics" or "protein" or "metabolites" and "omics". All articles providing enough information about the relationship between multi-omics data and ASD were included.
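The search terms above can be assembled into a single boolean query string. The following is only a minimal sketch: the grouping of the terms into OR-blocks joined by AND, the helper names `or_group` and `build_query`, and the `[dp]` publication-date range tag are assumptions made for illustration, since the text does not give the exact query that was submitted.

```python
# Hypothetical reconstruction of the boolean search; the grouping of terms
# and the date-range syntax are assumptions, not the authors' exact query.

def or_group(terms):
    """Join quoted terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(condition_terms, method_terms, extra_term, start_year, end_year):
    """Combine the term groups with AND and append a publication-date range."""
    parts = [or_group(condition_terms), or_group(method_terms), f'"{extra_term}"']
    date_range = f"{start_year}:{end_year}[dp]"
    return " AND ".join(parts) + " AND " + date_range


query = build_query(
    ["autism", "autism spectrum disorder"],
    ["proteomics", "interactome", "metabolomics", "protein", "metabolites"],
    "omics",
    2004,
    2019,
)
print(query)
```

Writing the query programmatically makes the term grouping explicit, which is the part most easily lost when a search is described in running text.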

## *2.2. Selection Criteria*

The inclusion criteria for study selection were the following: (1) observational prospective and retrospective studies, case–control studies, cohort studies, or systematic reviews; (2) studies involving proteomics- and metabolomics-based biomarker research in ASD; and (3) studies written in English. All studies that did not meet the above criteria were excluded from the review process.

## *2.3. Analysis of Protein and Metabolites Highlighted in Tables 1 and 2*

New "ontology enrichment" analyses, based on class, subclass, and biological processes were performed for both metabolites and proteins reported in Tables 1 and 2. The aim was to identify new potential biomarkers found in the literature that could be interpreted in a new way for appropriate evaluation in decision support systems (DSSs) through the framework discussed herein.
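The text does not state how enrichment of an ontology term was scored. One common choice, shown here purely as an illustration and not as the method used in this review, is a hypergeometric over-representation test: given a background of `N` annotated molecules of which `K` carry a term, it gives the probability of seeing at least `k` term-carrying molecules in a list of `n`. The function name and all numbers below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Return P(X >= k) when n items are drawn without replacement from N,
    of which K carry the annotation of interest."""
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Illustrative numbers only (not taken from the review):
# 3 of 5 listed proteins carry a term that annotates 10 of 50 background proteins.
p_value = hypergeom_enrichment_p(k=3, n=5, K=10, N=50)
print(f"enrichment p-value: {p_value:.4f}")
```

In practice such p-values would also need correction for testing many terms at once, which this sketch omits.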


**Table 1.** Proteomics-based targets of ASD.

#### *Int. J. Mol. Sci.* **2020**, *21*, 6274




**Table 2.** Metabolomics-based targets of ASD.


To be precise, we analyzed 140 proteins, as shown in Table 1:


Some proteins were not included in the analysis because we did not find the associated GOTERM-BP TAS.

We analyzed 119 metabolites, as shown in Table 2:


Some metabolites were not included in the analysis because we did not find the associated ontology terms in HMDB.

## **3. Stratification of Complex Disease Phenotypes, Early Interventions and Omics-Generated Biomarkers**

There is high variability in ASD symptoms and severity, principally centred on a general deficit in interpersonal interactions, due to reduced or absent verbal communication, or difficulty in communicating with other people [83]. This heterogeneity could result from a combination of many molecular mechanisms and environmental issues, and an integrative approach could be a new strategy for ASD deep profiling, possibly allowing behavioural intervention as early as possible. Indeed, some studies showed a significant improvement when therapies started before 24 months of age, at which time the neural system is still extremely mouldable [84]. Early interventions would probably reduce the cognitive abnormalities of subjects and help in preventing the onset of an autistic condition [85]. Thus, early diagnosis and prediction are essential for ASD and could be achieved through the study of molecules involved in the etiopathogenesis of the ASD phenotype. Subgroups can be identified in ASD patient cohorts, including functioning levels based on different parameters such as IQ values, language, and/or reading impairment, and this could help in the identification of gene/protein candidates or molecular mechanisms that can be associated with one or more of these subpopulations. This type of approach could be useful for the diagnosis and prognosis of the disease and/or for the choice of the therapy to be applied [86–90].

Omics and meta-omics disciplines currently allow deep investigation of ASD starting from a wide panel of samples, ranging from biological fluids to tissues. Omics sciences originate from a holistic vision of the system under study, thus overcoming the classical genetic/biochemical studies based on single or few target molecules. The strength of omics approaches resides in their ability to provide complete profiles of biological "features" (genes/transcripts/proteins/metabolites) to obtain a broad description of a biological system by the so-called systems biology. A non-targeted and high-throughput search of the genetic scaffold (DNA) and functional reservoir (RNA, proteins, metabolites) is key to decoding the pathophysiology of systems as complex as biological systems without any "a priori" statement on the descriptive variable of the systems, hence providing the most appropriate stratification of the disease or identification of novel biomarkers.

These new approaches are possible due to high-throughput technological platforms that have been developed with unprecedented specificity and sensitivity, fused to modern data processing based on the most recent bioinformatics and computational tools, and capable of processing the big data produced. Indeed, the current challenges of the omics technologies rely on the harmonization and integration of the big data generated by the different technologies [91].

A multi-omics approach that combines and integrates the results of more than a single discipline is becoming crucial to understand the pathophysiology of ASD and to identify new diagnostic and prognostic biomarkers from blood, saliva, urine, faeces, or other body fluids [19]. Different matrices can be treated for the direct extraction of DNA, proteins, peptides, or metabolites, or preliminarily processed for fractionation and isolation of cells or bacteria (Figure 3). Genomics or metagenomics follow a single path, starting with DNA extraction from any matrix and finishing with genome sequencing and the subsequent bioinformatics pipeline. Proteins can also be extracted from a wide range of samples and analysed by liquid chromatography and mass spectrometry (LC-MS). Gel electrophoresis alone does not provide the same performance in biomarker discovery and is often used as a further preliminary fractionation step before LC-MS. In this sense, interactomics is a powerful tool for focusing on a potential target and its relationships with other proteins in promoting or inhibiting important functions related to ASD. Faeces and saliva are the most frequently used matrices for bacterial protein extraction, which is a crucial step in microbiota metaproteomics. The subsequent analytical procedure does not differ from that of a study on a single proteome, but everything concerning data analysis or taxonomic assessment must rely on huge computational efforts for the identification of a microbial biomarker or a functional signature of the microbiota or of the host co-metabolism. Metabolites can be easily extracted from faeces, as well as from all biological fluids, and analysed with different techniques on the basis of the chemical–physical features of the molecules. The volatile fraction of metabolites can be detected by gas chromatography-mass spectrometry (GC-MS) experiments for both untargeted profiling and targeted quantitative analysis. Other small molecules can be analysed by NMR, LC-MS, or even direct infusion in MS.

**Figure 3.** Scheme of the paths from matrices to analytical techniques in a multi-omics approach: An integrative approach could be a new strategy for ASD deep profiling, which combines data from genome sequencing (next-generation sequencing, NGS) with those from proteomics and metabolomics by one or two-dimensional gel electrophoresis (1/2-DE), liquid-chromatography and gas-chromatography mass spectrometry (LC-MS or GC-MS), or even from metabolomics data obtained by nuclear magnetic resonance (NMR) experiments.

A crucial point regards "sample size" computation (i.e., statistical power), which for big data cannot be predicted from "basic" statistics but instead relies on multivariate and univariate analyses for multi-step reduction of multidimensional data. Indeed, big data are usually produced from tens or hundreds of samples, but the number of samples can be established for each chemometric model produced, using different multivariate statistical methods (e.g., Hotelling's T², number of misclassifications (NMC), area under the receiver operating characteristics curves (AUROC), and discriminant Q²; in the case of the latter, in contrast to NMC and AUROC, PLS-DA models with low complexity compared to PLS models are preferred). Therefore, each integrated model based on omics-derived data needs to be validated for each sample set under analysis [92–95].
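Two of the validation metrics named above, AUROC and NMC, can be illustrated with a minimal sketch. It assumes a binary classifier that outputs one score per sample; the function names, the 0.5 decision threshold, and the example scores are hypothetical and are not taken from any of the reviewed studies.

```python
def auroc(scores, labels):
    """Probability that a random positive sample scores above a random
    negative one, with ties counted as 0.5 (rank-based AUROC)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def number_of_misclassifications(scores, labels, threshold=0.5):
    """Count samples whose thresholded prediction disagrees with the label."""
    return sum(int(s >= threshold) != y for s, y in zip(scores, labels))


# Hypothetical classifier scores (1 = patient, 0 = control)
scores = [0.9, 0.8, 0.4, 0.35, 0.2]
labels = [1, 1, 0, 1, 0]
print(auroc(scores, labels), number_of_misclassifications(scores, labels))
```

AUROC summarises ranking quality over all thresholds, whereas NMC depends on one chosen cut-off, which is why the two are usually reported together.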

Integrated multi-omics approaches usually involve metabolomics in conjunction with genomics or metagenomics. However, the coupling of metabolomics and proteomics, in which both yield information about the functional aspects of ASD, appears to be even more promising. Thus, this review mainly focuses on these disciplines.

#### *Computational Models Used to Generate Proteomics and Metabolomics Single and Fused Data*

Regarding computational pipelines ("dry laboratories") able to process big data generated from omics platforms ("wet laboratories"), the analysis of single and fused multidimensional data requires several steps of reiterated reduction supported by multivariate and univariate models. The variables under study may be the number of subjects (i.e., healthy or patients) and their omics "features" (i.e., proteins, metabolites), represented by worksheets that, overall, represent the system under study; such a system can be represented by a different number of features expressible by different percentages (e.g., 80–50–20%). Hence, the representativeness of the system (e.g., subjects and omics variables) can be evaluated by considering the most appropriate percentages to depict the entire system. Therefore, the starting raw data matrix (big data) becomes reduced to a new data matrix (smart data), the latter ready to be harmonized and integrated into a fused omics data model with data coming from different omics platforms. Then, the reduced and integrated data can be analyzed by chemometric models based on different multivariate statistical methods, as previously described [96]. In this way, the integrated generated model can be validated for each type of sample set (e.g., raw data from GC-MS, LC-MS, ¹H-NMR, etc.) and for each patient and/or reference subject dataset (e.g., phenomics data) [92–95].
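The reduction from a raw matrix to a fused matrix can be sketched as follows. This is one possible reading of the percentages mentioned above, namely keeping features detected in at least a given fraction of samples; that interpretation, the autoscaling step, the imputation of missing values at the column mean, and all function names and numbers are assumptions made for illustration.

```python
from statistics import mean, pstdev


def filter_features(matrix, min_fraction):
    """Keep columns observed (not None) in at least min_fraction of samples."""
    n_samples = len(matrix)
    keep = [
        j for j in range(len(matrix[0]))
        if sum(row[j] is not None for row in matrix) / n_samples >= min_fraction
    ]
    return [[row[j] for j in keep] for row in matrix], keep


def autoscale(matrix):
    """Mean-centre and unit-scale each column; missing values become 0.0,
    i.e. they are imputed at the column mean (an assumption of this sketch)."""
    scaled_columns = []
    for col in zip(*matrix):
        observed = [v for v in col if v is not None]
        mu = mean(observed)
        sd = pstdev(observed) or 1.0  # constant features are left at zero
        scaled_columns.append([0.0 if v is None else (v - mu) / sd for v in col])
    return [list(row) for row in zip(*scaled_columns)]


def fuse(*blocks):
    """Concatenate row-aligned blocks (same samples, same order)."""
    return [sum((list(b[i]) for b in blocks), []) for i in range(len(blocks[0]))]


# Hypothetical intensities: 4 samples, 3 proteins and 2 metabolites
proteins = [[1.0, None, 3.0], [2.0, None, 5.0], [3.0, 4.0, None], [4.0, None, 7.0]]
metabolites = [[10.0, 1.0], [20.0, 1.0], [30.0, 1.0], [40.0, 1.0]]

reduced_proteins, kept = filter_features(proteins, min_fraction=0.5)
fused = fuse(autoscale(reduced_proteins), autoscale(metabolites))
print(kept, len(fused), len(fused[0]))
```

Scaling each block before concatenation keeps a platform with large raw intensities from dominating the fused model.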

## **4. Translational and Clinical Proteomics**

The rapid expansion of proteomics in recent decades has provided powerful tools to undertake investigations of biological systems, aimed at identifying biomarkers for clinical diagnosis, monitoring the stage of diseases, studying the pathogenetic molecular mechanism, and choosing appropriate treatments.

Translational proteomics is a crucial component of a picture in which other omics and meta-omics disciplines (genomics, metabolomics/lipidomics, transcriptomics, and microbiomics), contribute to complex workflows producing both qualitative and quantitative relevant outcomes.

Under the approach of omics research, molecular profiles, combined with clinical profiles of patients, can be managed to drive clinical decisions, and, hence, advanced treatments. However, a key challenge is to overcome technical bottlenecks and to bridge the gap between early-stage discovery (translational research) and the subsequent stage, which is represented by routine quantitative searching and the determination of biomarker candidates in clinical research settings. Moreover, to accelerate the discovery of clinically actionable biomarkers, the focus must be switched from identification (ID) to quantitation. That is, precision proteomics must converge with precision medicine, in addition to other omics sciences.

For this purpose, the major issue of quantitation and calibration in mass spectrometry (MS) must be correctly addressed. During the past decade, numerous experimental strategies have been developed and refined, together with tools and analytical kits, to provide researchers with a number of solutions for different proteomic approaches, on the basis of sample complexity and matrix variability. These approaches are difficult because a series of factors must be considered for both relative and absolute proteomics quantification [97]. Given the ascertained performance of MS in the absolute quantification of proteins, as a strong alternative to immunoassays, the principal needs currently appear to be focused on the calibration methods, not only to improve the measurement precision but also to make data interchangeable between laboratories globally. Synthetic stable isotope labelled (SIL) proteins and peptides, used as internal standards (IS), present both advantages and drawbacks, due to availability, cost, reproducibility of biological features, and accuracy [98]. An open question is how recombinant SIL proteins, synthetic labelled tryptic peptides (tSIL), or even the so-called "flanked" or "winged" peptides, could be used with reasonable confidence to provide the best accuracy as an internal calibration (IC) in bottom-up experiments [99]. Furthermore, new strategies for external calibration (EC), by means of external reference (molecules), are needed for selected or multiple reaction monitoring experiments (SRM/MRM), to correct bias coming from instrumental variance and sample preparation workflows [100]. Concerning protein ID, due to the continuous growth of public repositories of MS datasets, spectral libraries are increasingly mined in preference to the classical matching of peptide sequences against protein databases (DB). Data-independent acquisition (DIA) approaches appear particularly promising for future applications in biomarker discovery in proteomics. SWATH (Sequential Window Acquisition of all Theoretical Mass Spectra) analysis is currently employed for large scale ID and quantitation in proteomics, with the aim of exploiting the increasing number of spectral libraries available, considerably reducing computational time and space [101]. Applications in metaproteomics remain limited, but the desire to identify alternative paths for protein detection and inference, overcoming the bias of conventional sequence databases, makes this an intriguing analytical strategy for faster identification/quantitation [102,103].
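The principle behind internal calibration with a SIL standard can be shown with a short worked example. This is a simplified single-point sketch: it assumes that the endogenous ("light") and labelled ("heavy") peptides have the same MS response and recovery, and the function name, peak areas, and spiked amount are hypothetical.

```python
def quantify_with_sil(light_area, heavy_area, heavy_spiked_amount):
    """Single-point internal calibration: the analyte amount is the
    light/heavy peak-area ratio times the known spiked amount of the
    SIL standard. Assumes equal response for both forms."""
    if heavy_area <= 0:
        raise ValueError("heavy (SIL) peak area must be positive")
    return light_area / heavy_area * heavy_spiked_amount


# Hypothetical peak areas; 50 fmol of SIL peptide spiked into the sample
amount_fmol = quantify_with_sil(
    light_area=2.4e6, heavy_area=1.2e6, heavy_spiked_amount=50.0
)
print(amount_fmol)
```

Because both forms are measured in the same run, the ratio cancels much of the run-to-run instrumental variation, although it cannot correct for losses that occur before the standard is added.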

The extension to metabolomics continues to suffer from a lack of public libraries, but also appears to be a powerful approach. A new tool has recently been developed to match MS and MS/MS spectra searching for identical or analogous features in public repositories of metabolomics data [104].

Fewer proteomics studies on ASD have been published than those based on other disciplines, such as genomics or transcriptomics. Many of them rely on post-mortem brain tissues [50–52], serum [53–56], plasma [57–59], urine [63], saliva [60–62] direct samples or lymphoblastoid cell lines [64] (Table 1).

The first study on brain tissues was conducted by Junaid et al. [50]. Post-mortem brain samples from 8 ASD patients and 10 controls were analysed using two-dimensional gel electrophoresis (2-DE) followed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis; the glyoxalase 1 (Glo1) protein showed a shift in net charge, allowing a polymorphic form of the protein in the brain of ASD patients to be characterized. Furthermore, they found that this isoform has reduced enzyme activity and that the variant was increased in ASD patients with respect to control cases. The authors suggested that the homozygosity of a Glo1 variant could be one of the susceptibility factors in the etiology of autism. In another study of post-mortem brain tissue, the authors studied the prefrontal cortex of 10 ASD individuals and the cerebellum of 16 ASD patients versus control subjects. The results showed that the proteins found in the brain tissue of ASD patients are involved in processes of synaptic vesicle regulation, myelination, and energy metabolism [51]. Recently, Wingo et al. recruited 104 participants who were followed for up to 14 years, and 27% of the cohort were finally diagnosed as ASD. They studied proteins to identify new processes underlying variation in cognitive modification by label-free LC-MS/MS experiments, and found 579 proteins associated with cognitive trajectory after meta-analysis. Moreover, they found 38 proteins related to cognitive trajectory independently of β-amyloid plaques and neurofibrillary tangles [52]. Studies conducted on non-neurological tissues highlighted changes in quantities of proteins associated with inflammation or regulation of the immune system, including some interleukins [50,51,105–108]. Interestingly, most proteomic studies investigating ASD identified proteins involved in lipid metabolism and differentially expressed in ASD [51,55,57,105,106,109,110]. As an example, Corbett et al. identified differentially expressed peptides using LC-MS/MS, which allowed for the recognition of complement factor H-related protein (FHR1), apolipoprotein (APO) B-100, fibronectin 1 (FN1), and complement C1q as dysregulated proteins in children with ASD compared to control subjects [53]. In another study, apolipoproteins involved in cholesterol metabolism were found to be increased in an ASD cohort with respect to the control subjects [54]. In addition, two studies reported an alteration in proteins implicated in lipid metabolism, in the inflammation process and cell growth [55,56] (Table 1).

The study by Cortelazzo identified a total of 12 dysregulated proteins, associated primarily with acute inflammatory response [57]; despite the irregularities of lipid metabolism shown in ASD tissues, the authors could not establish whether these occur as a primary or secondary event in ASD pathophysiology. In the study by Pichitpunpong et al., ASD patients were subdivided into subgroups using the Autism Diagnostic Interview-Revised (ADI-R) method, and subsequently, dysregulated genes were identified by studying transcriptome profiles of individuals with ASD and control subjects. They profiled proteins from lymphoblastoid cell lines using 2-DE followed by LC-MS/MS. Subsequently, proteins were compared to the dysregulated transcripts. Selected proteins were also analysed by Western blotting [64]. The authors reported 82 proteins linked to subjects with ASD and with severe language impairment and, among these, 14 were correlated with inflammation and neurological functions. Moreover, the diazepam-binding inhibitor (DBI) protein was notably reduced in the subgroup with severe language impairment. Furthermore, its expression levels were matched with the ADI-R items [64]. Recently, Shen et al. used an iTRAQ-based proteomics approach to compare plasma protein profiles of ASD patients with those of healthy subjects. They identified 24 differentially expressed proteins belonging to different pathways associated with ASD [59] (Table 1). This evidence supports the thesis that synapse growth, the complement system, cytoskeleton-related activities and cell adhesion are all involved in ASD. Moreover, using ELISA (enzyme-linked immunosorbent assay) and ROC (receiver operating characteristic) analysis, the authors found five proteins as possible biomarkers to discriminate ASD from controls [59]. Interestingly, they identified focal adhesion, cell adhesion molecules, and leukocyte trans-endothelial migration pathways that were correlated with ASD in a Chinese cohort [111].

Saliva, urine, and blood samples have also been taken into consideration. In particular, saliva-based studies showed an alteration in the processes involved in antimicrobial peptides, immune response, and inflammation at the mucosal level [60–62].

Overall, a large number of studies highlighted abnormalities in synapse biology in ASD patients, showing a prevalence of proteins related to neural tissue, and evidencing a limitation in the application of non-neural tissues in proteomics studies of ASD.

Indeed, due to the blood–brain barrier (BBB), it can be difficult to find proteins associated with ASD in peripheral blood, as possible disease-specific protein markers, despite the ease of use, cost-effectiveness, and low invasiveness of blood collection. Moreover, a study highlighted high intestinal and BBB permeability in ASD patients; indeed, ASD patients show a reduced expression of occludin, tricellulin, and claudin-1, and increased expression of pore-forming claudins, which are part of the gut barrier-forming "tight junction" (TJ) components [108]. Furthermore, it was found that a protein (zonulin) involved in intestinal permeability increased in ASD patients compared with controls, particularly with respect to the Childhood Autism Rating Scale score [109]. Thus, possible blood biomarkers could be proteins associated with intestinal and BBB permeability.

We conducted an in-depth analysis of the protein data found in the reviewed studies, as reported in Table 1. For the 140 proteins (Table S1), we first undertook an analysis based on the matrix (e.g., blood, brain, saliva, urine, and lymphoblastoid cell lines) and on the number of times (frequency) that the protein was reported in all of the analyzed articles. The analysis showed a major presence of the apolipoproteins (APOE and APOA1 proteins) and fibronectin (FN1). Then, we analyzed all of the biological pathways associated with the proteins and clustered all of the proteins according to the biological process (BP). Interestingly, the results showed that the two major biological processes associated with ASD were platelet degranulation and lipid metabolism (Figure 4).
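The two steps described above, counting how often each protein is reported per matrix and then grouping proteins by biological process, can be sketched as follows. The records, protein identifiers (`P1` to `P3`), process labels (`BP_A`, `BP_B`), and study names are placeholders invented for illustration; they are not the content of Table 1 or Table S1.

```python
from collections import Counter, defaultdict

# Placeholder records: (protein, matrix, study); not data from the review
records = [
    ("P1", "blood", "study_a"),
    ("P1", "blood", "study_b"),
    ("P2", "blood", "study_a"),
    ("P3", "blood", "study_c"),
    ("P3", "urine", "study_d"),
]

# Placeholder protein -> biological process (BP) annotations
bp_annotation = {
    "P1": ["BP_A", "BP_B"],
    "P2": ["BP_A"],
    "P3": ["BP_A"],
}

# Step 1: frequency of each protein within each matrix (the bubble size)
frequency = Counter((protein, matrix) for protein, matrix, _ in records)

# Step 2: cluster proteins by BP and rank processes by number of members
proteins_by_bp = defaultdict(set)
for protein, processes in bp_annotation.items():
    for bp in processes:
        proteins_by_bp[bp].add(protein)

ranked = sorted(proteins_by_bp.items(), key=lambda kv: (-len(kv[1]), kv[0]))

print(frequency.most_common(1))
print([(bp, sorted(members)) for bp, members in ranked])
```

The same two structures would hold for the metabolite analysis, with metabolite identifiers in place of proteins.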

Indeed, platelets play an established role in the pathophysiology of thrombogenesis and atherogenesis, and this connection may also be due to an increase in Body Mass Index in ASD children [112]. In addition, serotonin is also present in the platelet, and stimulates aggregation by exerting a vasoconstrictor and thrombogenic effect in response to lesions of the basal endothelium. In fact, it is now known that many ASD patients have hyperserotonemia, that is, elevated levels of serotonin (5-hydroxytryptamine, 5-HT) in whole blood. Despite decades of study, the mechanisms behind this well-replicated biomarker and the contributions of the serotonergic system to ASD remain unclear and further studies are necessary [113]. Almost all whole blood 5-HT is found in the platelet, and the serotonin 5-HT2A receptor enhances platelet functions induced by adenosine diphosphate (ADP) signaling, with exposure of phosphatidylserine (PS) and fibrinogen receptor activation.

Overall, the 5-HT2A receptor enhances platelet aggregation, which suggests some level of mutual regulation. Remarkably, this link with the serotonin 5-HT2 receptor has been extensively studied in ASD [114]. In addition to the now recognized importance of fats in the correct development and maintenance of cells of the nervous system, lipids may also play a role in regulating inflammation.

Overall, proteomics studies of ASD remain limited and vary between different technological approaches, focusing on various matrices. However, blood may be promising for diagnostic purposes, also because the data produced from its analysis are more accessible. Finally, faeces represent an intriguing matrix, mostly because of the strong relationship between ASD and GI symptoms; in this case, the gut bacterial proteome/metaproteome [102] could be involved in the investigation even though, to the best of our knowledge, no metaproteomics studies have been published to date.


**Figure 4.** Protein analysis as a tool for the decision support system (DSS). In box (**i**), we grouped the proteins highlighted in Table 1 and Table S1 according to the matrix in which the proteins were studied, such as blood (**orange**), blood and urine (**lawn green**), brain (**red**), dried blood (**brown**), urine (**pink**), and brain biopsies, urine and blood (**lilac**). The size of the bubbles indicates the number of times the protein was found in that matrix in the different studies taken into consideration. In box (**ii**), we analyzed the data for the biological process and clustered the protein. Legend code: (A) platelet degranulation (GO:0002576); (B) cellular protein metabolic process (GO:0044267); (C) neutrophil degranulation (GO:0043312); (D) regulation of complement activation (GO:0030449); (E) receptor-mediated endocytosis (GO:0006898); (F) extracellular matrix organization (GO:0030198); (G) antimicrobial humoral response (GO:0019730); (H) cytokine-mediated signaling pathway (GO:0019221); (I) retinoid metabolic process (GO:0001523); (L) immune response (GO:0006955); (M) blood coagulation (GO:0007596); (N) membrane organization (GO:0061024); (O) pyruvate metabolic process (GO:0006090); (P) signal transduction (GO:0007165); (Q) chemical synaptic transmission (GO:0007268); (R) regulation of lipid metabolic process (GO:0019216); (S) transmembrane transport (GO:0055085); (T) glutamate secretion (GO:0014047).

## **5. Metabolomics**

Metabolomics is an approach to understand the metabolic pathways and metabolic network regulation of a biological system. There are therefore many fields of application of this discipline, including medicine and biology [115].

Metabolomics has been used to identify possible biomarkers in the serum or urine samples of patients with obesity [116], diabetes [117], and coronary heart diseases [118]. Recently, it has been shown that perturbations of metabolic pathways could affect the pathogenesis of central nervous system disorders [119].

Thus, metabolomics provides a powerful tool to map these perturbations and their relationship to disease and response to therapy. In fact, some studies have been carried out to investigate metabolomics biomarkers in the brain [65,66], plasma [67,68,70], dried blood [79] and urine [72–78] samples of ASD patients (Table 2). Yap and coworkers used a proton nuclear magnetic resonance (¹H-NMR) spectroscopy method that showed increased levels of taurine and low content of glutamate in urine samples of ASD patients [72]. Ming et al. used a combination of liquid chromatography (LC-) and gas chromatography (GC-)-based MS, and revealed abnormal amino acid metabolism and increased oxidative stress in urinary specimens of ASD patients [73].

Mavel et al. identified more than 150 metabolites in urine, comparing 30 ASD children with 28 neurotypical subjects. Increased levels of succinate, taurine, β-alanine, and glycine were found in ASD subjects [74]. In another study, a statistically significant increase of homovanillic acid, tryptophan, glycolic acid, and 3,4-dihydroxybutyric acid was detected [76]. The authors supposed that the increase in glycolate was linked with primary oxaluria type I, and that the phenomenon was also related to yeast overgrowth.

Tryptophan and homovanillic acid have been indicated as metabolites of neurotransmission probably involved in neurodevelopmental disorders. Although 3,4-dihydroxybutyric acid may be a typical component of human urine, its increase has been highlighted in cases of succinic semialdehyde dehydrogenase deficiency, which represents a disorder in patients that could also show ASD features [120]. Recently, Dieme et al. detected increased levels of indoxyl, indoxyl sulfate and *N*-acetylarginine, and decreased levels of methylguanidine and other compounds, such as desaminotyrosine and dihydrouracil [77]. Bitar et al., in a group of 40 ASD children and 40 age-matched controls, highlighted perturbations in various compounds [78], including 2-hydroxybutyrate, glutamate, creatine, and tyrosine. In addition, the authors also identified metabolites, including cysteic acid, guanine and trigonelline. These results showed abnormalities in carbohydrate and amino acid metabolisms, in addition to differences in oxidative stress pathways [78].

Furthermore, by using a GC-MS approach in urine sample analysis, it has been possible to build a multivariate statistical model that captures global biochemical signatures of autistic individuals, thus enabling patients to be distinguished from healthy children [75]. Metabolites potentially related to ASD can also be studied in plasma samples. The analysis of these metabolites broadly supported the results noted in urine, although specific discrepancies exist due to differences in renal clearance for some compounds.

Plasma fatty acids were recognized as diagnostic markers for ASD, specifically as an increase in the most saturated fatty acids and a decrease in polyunsaturated fatty acids, such as valeric, hexanoic and stearidonic acids [121]. Kuwabara et al. identified altered levels of plasma metabolites connected with mitochondrial dysfunction and oxidative stress in ASD patients [67]. In a study by Wang et al., a cohort of ASD patients and participants without autism were analyzed using ultra-performance LC quadrupole time-of-flight tandem MS (UPLC/Q-TOF MS/MS) to identify metabolic variations in serum. In particular, the authors identified 17 metabolites, two of which were associated with ASD and could be significant predictors of autism: sphingosine 1-phosphate (S1P) and docosahexaenoic acid (DHA) [69].

West et al. used LC-MS/GC-MS to detect significant changes in the concentrations of aspartate, dehydroepiandrosterone sulfate (DHEA-S), glutaric acid, serine, and succinic acid. Decreased levels of citrate, creatinine, glutamate, hydroxyphenyllactate and isoleucine were detected, thus indicating the roles of altered mitochondrial energy production (succinic acid, DHEA-S, citrate, aspartate, glutamate) and branched-chain amino acid metabolism (isoleucine, hydroxyphenyllactate) [68].

Rangel-Huerta et al. used untargeted metabolomics (HPLC–MS/MS) of plasma samples from 30 ASD children and 30 age-matched controls. In this study, ASD patients were subdivided into two groups: with (AR) and without (ANR) neurologic regression. Metabolic intermediates detected in the aspartate, beta-alanine, glucose–alanine cycle, malate–aspartate shuttle, urea cycle, and tryptophan breakdown pathways showed statistically significant differences between controls and ASD patients. In addition, between the two subgroups, statistically significant differences were also highlighted in the levels of the fatty acids decanoylcarnitine, laurate, arachidate, myristate, octanoylcarnitine, quinate, and 7-methylurate [70]. Kelly et al. highlighted differences in tyrosine metabolism, tryptophan biosynthesis and endocannabinoid metabolism in a cohort of children, 13% of whom had ASD. In addition, they hypothesized that metabolomic biomarkers could help to identify children with poor communication skills [71].

A recent study investigated the metabolic profile by extracting molecules from dried blood spots (DBSs) and performing a targeted analysis on a panel of 45 metabolites [79]. The levels of nine of these molecules (20%), including citrulline and acyl-carnitines C2, were significantly higher in ASD patients than in controls. The results suggested that the mitochondrial fatty acid β-oxidation pathway was less active, revealing hidden molecular mechanisms related to ASD. Thus, this non-invasive methodology was shown to be suitable for screening newborn subjects to highlight modifications in metabolic profiles during development [79].

Moreover, it is possible to analyze the metabolomes derived from brain tissue, for example by untargeted LC-MS metabolomics. As reported by Graham et al., in brain tissue from 11 deceased subjects with ASD, compared to 11 controls, a group of compounds showing statistically significant differences, such as 3-methoxytyramine, 5,6-dihydrouridine and *N*-carboxyethyl-γ-aminobutyric acid, was detected [65].

Post-mortem prefrontal cortex samples were analyzed by Kurochkin et al. in a cohort of 32 ASD subjects and 40 controls. The study identified increased levels of glutathione disulfide and 5-oxoproline and, on the contrary, decreased levels of glutathione, L-γ-glutamyl-cysteine and L-cysteinyl-glycine. All of these metabolites were involved in their respective metabolic pathways and in others such as pyruvate metabolism, starch and sucrose metabolism, arginine and proline metabolism and TCA cycle [66].

As with the proteins, we performed a bubble analysis of the metabolites (Figure 5). For the 119 metabolites (Table 2 and Table S1), the first analyses based on the matrix showed a major presence of glutamate, decanoyl carnitine, and tryptophan, clustered in the major BP lipid metabolism pathway, confirming the data from the proteins. In addition, another implicated BP was tryptophan metabolism, which is implicated in the gut–brain axis because increased 5-HT leads to tryptophan depletion and contributes to hyperserotonemia, which is associated with GI symptoms and neurodevelopmental disorders [47].

**Figure 5.** Metabolites analysis as a tool for the decision support system (DSS). In box (**i**), we grouped the metabolites highlighted in Table 2 and Table S1 according to the matrix in which the metabolites were studied, such as blood (orange), blood and urine (lawn green), brain (red), dried blood (brown), urine (pink), and brain, urine and blood (lilac). The size of the bubbles indicates the number of times the metabolites were found in that matrix in the different studies taken into consideration. We analyzed the data for the biological process (box (**ii**)) and clustered the metabolites. Legend code: A: Lipid Metabolism Pathway; B: Glycine and Serine Metabolism; C: Tryptophan Metabolism; D: Transcription/Translation; E: Histidine Metabolism; F: Glutamate Metabolism; G: Thioguanine Action Pathway; H: Tyrosine Metabolism; I: Glutathione Metabolism; L: Nicotinate and Nicotinamide Metabolism; M: Galactose metabolism; N: Glutaminolysis and Cancer.
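The bubble sizes described in the caption amount to a tally of how often each metabolite is reported in each matrix. A minimal sketch of that counting step follows; the records are hypothetical placeholders rather than entries from Table 2 or Table S1, and the function name `bubble_sizes` is an assumption introduced here for illustration.

```python
from collections import Counter

# Hypothetical (metabolite, matrix) records, one per study report.
# These are placeholders, not values taken from Table 2 or Table S1.
reports = [
    ("glutamate", "blood"),
    ("glutamate", "blood"),
    ("glutamate", "urine"),
    ("tryptophan", "urine"),
    ("tryptophan", "urine"),
    ("decanoylcarnitine", "blood"),
]


def bubble_sizes(records):
    """Count how many times each metabolite was reported in each matrix."""
    return Counter(records)


sizes = bubble_sizes(reports)
for (metabolite, matrix), count in sorted(sizes.items()):
    print(f"{metabolite:18s} {matrix:6s} {count}")
```

Each count would then set the radius of one bubble, with the matrix determining its color.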

The study of the metabolites in the ASD subjects provides an understanding of the complexity of the system in terms of the main metabolic interactions between metabolites, with particular reference to their relationship with ASD.

#### **6. Interactome in ASD**

In the context of a multi-omics functional approach, the identification of proteins related to ASD, together with their functional annotation, correlations, and association to biological pathways, can be related to the metabolomics outcome, to depict a comprehensive picture of the functions involved in the ASD pathogenesis or in the identification of ASD-related biomarkers.

In this sense, interactomics, namely the study of the ensemble of all of the molecular interactions that could take place in a group of proteins involved in a biological function, is also key for data interpretation. Currently, more than 650,000 protein–protein interactions (PPIs) have been reported and found to constitute the human interactome [122,123]. Furthermore, PPIs have been found to be involved in most diseases, such as neurodegenerative disorders, leukemia, cervical cancer, and bacterial infection [122]. In fact, many human diseases involve the loss of an essential interaction or formation of a protein complex at an inappropriate time or location [122]. A possible role of the interactome in psychiatric diseases, such as ASD, has been hypothesized, focusing on patients who show alterations in some biological pathways, including calcium homeostasis, oxidative stress, energy metabolism, synaptic transmission, cytoskeleton, and immune system development [124–126].

Studies have highlighted that single-nucleotide polymorphisms in the Disrupted in Schizophrenia 1 (DISC1) gene are related to some psychiatric disorders, including ASD [127]. Protein complexes of DISC1 are involved in intracellular transport, cell cycle/division and cytoskeletal stability and organization [128]. Sakai et al. [129] found new interactions among protein products encoded by ASD-associated genes, including tuberous sclerosis 1 (TSC1) and tuberous sclerosis 2 (TSC2) proteins, which are implicated in tuberous sclerosis complex (TSC), a rare disease associated with ASD [130].

Alfieri et al. studied the synaptic interactome associated with the p140Cap protein, which is involved in synaptogenesis, plasticity, and synaptic transmission [131]. Another study investigated the interactome of the MET receptor tyrosine kinase as potentially relevant to ASD. The MET receptor tyrosine kinase is involved in spine morphogenesis and synaptic structure, including dendritic complexity, and controls neuronal growth, functional maturation, and glutamatergic synapse maturation in the hippocampus [132].

Duplications, deletions, and various point mutations of the SHANK3 gene have been observed in ASD patients; the SHANK3 protein organizes the postsynaptic density by assembling complexes with signaling molecules, postsynaptic receptors, and cytoskeletal proteins [14,133–135]. ASD symptoms, such as motor hyperactivity, a tendency toward acoustic startling, reduced prepulse inhibition and abnormal circadian rhythms, were found in some SHANK3 overexpressing transgenic mice [133]. In another work, Lee et al. investigated how the rapamycin interactome could target SHANK3 in a mammalian model [136].

All of these studies provide an idea of the possible biological interactions in psychiatric disorders, such as ASD. Currently, however, information from interactomics is not sufficient to delineate a proper picture of the system. A comprehensive map could clearly help to provide crucial information for more accurate diagnosis and targeted treatments. Future investigations should consider the many factors that are often neglected, from family history and lifestyle to age, ethnicity, and dietary habits.

#### **7. Clinical Decision Support Systems to Improve Medical Diagnosis on ASD**

Clinical decision support systems (CDSSs) are described as "active knowledge systems that use two or more items of patient data to generate case-specific advice" [137]. Medicine is now oriented toward personalized and precise treatment. The application of electronic medical and health record systems (EHR/EMR) and integration with data from translational research will improve the timing of diagnosis and, therefore, reduce costs. Thus, CDSS is a tool that incorporates clinical patient information into big data from omics platforms to improve patient care.

The CDSS was designed to assist the clinician in the relationship with the patient from initial consultation to diagnosis and follow-up. In fact, CDSS has been shown to positively influence clinical outcomes, improving patient safety [138–141], reducing bias in medical decisions [142,143], yielding more reliable data [144,145], and promoting prevention and specific treatments [146–148]. For example, the Child Health Improvement through Computer Automation system (CHICA) is a computer-based CDSS developed to automate the management and prevention of chronic diseases. It also works as an electronic health record (EHR), supporting and easing the workflow of pediatricians [149,150].

In a clinical trial, Downs et al. used CHICA integrated with the EHR on an electronic tablet or a sheet of scanned paper that families could complete in the waiting room. CHICA analyzes the answers to the questions and selects the six most important alerts or reminders for the clinician. These results are accumulated into a visit agenda and the clinician can respond to the alerts and reminders on the agenda. Thus, this method demonstrates that automating surveillance for ASD and automating the administration of a screening test can result in high rates of screening [151].
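The agenda-building step described above, in which the six most important alerts are kept, can be pictured as a top-k selection over scored items. The sketch below is illustrative only: the alert names, the priority scores, and the function `select_alerts` are hypothetical and do not reflect CHICA's actual rules or scoring.

```python
import heapq

# Hypothetical (alert, priority) pairs. The names and scores are
# illustrative assumptions, not CHICA's real alerts or weights.
candidate_alerts = [
    ("autism screening due", 0.95),
    ("developmental surveillance", 0.90),
    ("lead exposure risk", 0.70),
    ("iron deficiency risk", 0.65),
    ("sleep concerns", 0.60),
    ("tobacco smoke exposure", 0.55),
    ("dental visit reminder", 0.30),
    ("car seat reminder", 0.20),
]


def select_alerts(scored_alerts, limit=6):
    """Return the `limit` highest-priority alerts, highest priority first."""
    return heapq.nlargest(limit, scored_alerts, key=lambda alert: alert[1])


agenda = select_alerts(candidate_alerts)
for name, priority in agenda:
    print(f"{priority:.2f}  {name}")
```

Capping the agenda at a fixed size keeps the clinician's list short regardless of how many questionnaire answers trigger a rule.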

To better understand ASD and be more able to stratify the population of ASD patients into subgroups, it is necessary to integrate all of the data from the omics with the data collected by the clinician and analyze them through machine learning models. This allows advanced models to be generated and also used in clinical practice. The wide range of generated omics data, particularly those obtained from proteomics and metabolomics analyses, requires an advanced computational analysis that allows identification of possible biomarkers associated with the different ASD phenotypes, such as high or low functioning, and presence or absence of repetitive behaviors. In turn, this enables a deeper understanding of the molecular mechanisms associated with the different phenotypes and the implementation of personalized treatments.

The increase in omics data is contributing to the use of the molecular subtype in complex diseases; in particular, this approach is already used in the study of cancer, such as for the Cancer Genome Atlas (TCGA) [152].

In addition to this approach, efforts are being made to implement data from omics, such as the multi-omics profiling expression database (MOPED), to process these data, and to standardize data from genomics and proteomics [153–156]. Additionally, in the field of autism, databases such as the National Database for Autism Research [157] have been created to host multidisciplinary omic data, including exome sequencing, brain imaging, and clinical diagnostic data.

These types of automation, such as the CHICA system with EHR and raw data from omics approaches, based on the computation of machine learning algorithms, are the types of approaches currently under development. Furthermore, these approaches can considerably increase the speed at which children can be screened for ASD, thus lowering the age at diagnosis and helping to better define the ASD subgroup. Data from the multi-omics approach (such as that from proteomics and metabolomics, in particular) will be processed together with the information taken from clinician reports (such as neuropsychiatric or GI symptoms). Artificial intelligence (AI) will integrate this information to study the characteristics of ASD groups, and stratify patients into sub-groups according to the phenotype characteristics.

Moreover, the CDSS can be defined as an "expert system", according to the classification of Wright et al. [158]. Expert systems assist clinicians by offering complex decision support by combining patient data with electronically available data. In our case, we crossed clinical data (EHR), laboratory data (multi-omics data), and lifestyle data (a survey) of patients. The CDSS can be made available through the Internet (e.g., https://cds.asd.com). The system can be trained with data from a cluster of patients of known diagnosis (Figure 6A). Then, the clinician will be able to access the web interface and upload all of the required information. The software can be connected directly to the hospital database through the laboratory information system (LIS) and can allow the collection of all of the required inputs, such as EHR information (Figure 6B). The system, consisting of bioinformatics algorithms and machine learning models, will analyze the data and provide, via the web interface, a clinical report. The results in the form of a clinical report can be easily interrogated to support other clinicians in a possible diagnosis or treatment of autism phenotypes, after deep stratification of different types. The accuracy of the CDSS is expected to increase as the machine continues to learn from new data. Thus, the expectation is that the CDSS, properly guided, will make a beneficial contribution to patient care at all levels.

**Figure 6.** Clinical decision support system (CDSS) as a new approach to screening children for ASD and lowering the age at diagnosis: Box (**A**) (training) shows the flowchart including all data used (survey, such as Child Health Improvement through Computer Automation system (CHICA), omics data and electronic health records (EHR)) by the machine learning model to classify patients. Once the model has been constructed with good accuracy, the clinician (box (**B**)) (prediction) will upload the patient data. An A.I. model will predict the class with an actionable result summarized in the clinical report. The result will generate subgroups based on the patient's features, for example, high functioning (H), low functioning (L), and control group (C). This system might be used by clinicians to improve early diagnosis because it provides significant information about the features of the ASD patient.
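The two-stage flow in Figure 6 (training on patients of known diagnosis, then prediction for a new patient) can be sketched with a deliberately simple classifier. The text does not specify which machine learning model is used, so the nearest-centroid rule below is an assumption chosen for brevity; the feature values are invented, and only the class labels H, L, and C come from the caption.

```python
from math import dist
from statistics import fmean

# Hypothetical training set: (feature vector, known class).
# Features stand in for combined survey, omics, and EHR values.
training_data = [
    ([1.0, 0.2], "H"), ([1.2, 0.1], "H"),
    ([3.0, 2.0], "L"), ([3.2, 2.2], "L"),
    ([0.1, 0.1], "C"), ([0.0, 0.3], "C"),
]


def train(samples):
    """Box A (training): compute one centroid per class label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [fmean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Box B (prediction): return the label of the closest centroid."""
    return min(model, key=lambda label: dist(model[label], features))


model = train(training_data)
print(predict(model, [3.1, 2.0]))
```

A production system would need validation on held-out patients and a far richer model; the sketch only shows where training data, uploaded patient data, and the predicted class fit in the workflow.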

#### **8. Discussion and Future Prospects**

It is clear that the etiology of ASD involves a combination of genetic and environmental factors, and the only therapy for ASD that has been demonstrated to be effective is behavior analysis. Furthermore, genetic and environmental factors are implicated in changes in the brain and metabolism, such as mitochondrial dysfunction, neurotransmitter alteration, abnormal neuron development, neuroinflammation, immune dysregulation and oxidative stress. In addition, the diagnosis of ASD currently depends on clinical observation and procedures to evaluate behavioral, historical, and parent-report information [48].

There are also important ASD-related considerations to be taken into account, namely, ASD's etiological heterogeneity, varied comorbidities, and the complexity of the brain as the centrally affected organ. In addition, there have been relatively few omics-based studies of ASD thus far. These aspects may involve significant variability; therefore, detecting proteomics and metabolomics biomarkers that look at the functional aspect may contribute to the advance of the clinical diagnosis of ASD and provide new tools that could be used to estimate the outcome of individualized interventions. Results produced from proteomics approaches show that differential expression of proteins involved in mitochondrial bioenergetics (NDUV1 protein), inflammation and immune function (MBP protein), lipid metabolism (APOB-100) and synaptic biology (SYT1 protein) is crucial in the pathogenesis and progression of ASD. In fact, synaptic degeneration and mitochondrial dysfunction are examples of the cellular events that could be correlated with ASD and cognitive impairment.

The altered metabolites are mainly associated with fatty acid metabolism (decanoyl-L-carnitine), oxidative stress (e.g., glutathione), mitochondrial dysfunction (e.g., arginine), amino acid metabolism, cholesterol metabolism, energy metabolism (e.g., succinic acid), intestinal microbiota (e.g., tryptophan) and neurotransmitters (e.g., γ-aminobutyric acid).

Therefore, the results highlighted by metabolomics confirm that mitochondrial dysfunction may be a risk factor for autism; it could also be a target to find a possible biomarker to identify ASD patients and to show different expressions in subgroups of patients. All of these results may influence social and cognitive deficits in autism. In addition to elucidating the ASD pathobiology, multi-omics approaches could lead to the identification of novel biomarkers for improved diagnosis and therapeutic monitoring; hence the ultimate aim is to find biomarkers that are simple, inexpensive, and noninvasive, in accessible tissues and body fluids such as the blood or, preferably, urine (Figure 7).

**Figure 7.** Multi-omics approach to ASD. ASD is a multifactor disease that includes genetic and environmental factors. The phenotype of ASD determines abnormal neurodevelopment with alteration in neurotransmitters. Furthermore, ASD is characterized by mitochondrial dysfunction, oxidative stress, inflammation and abnormal immune regulation. All of these dysfunctions produce possible biomarkers that could be identified by a new multi-omics approach to studying ASD.

Currently, the main limitations stem from the lack of uniform methods for collecting clinical phenotypes; moreover, few studies have addressed the degree of correlation between different biomarkers or evaluated multiple biomarkers or endophenotypes in parallel. Furthermore, studies are limited in their evaluation of biomarkers by comparisons of patients with ASD and healthy controls, without considering the family and specific characteristics of the pathology. Often, the sample cohort is also highly limited. From the point of view of omics data, the biggest limitation is that not all of the omics data are considered and the data are not integrated with collected clinical data.

Thus, this big data approach will be essential for the early detection of ASD and for the possibility of developing more precise medicine to design and optimize the pathway for diagnosis, therapeutic intervention, and prognosis using omics data that are able to highlight individual variability between ASD patients.

In this context, proteomics and metabolomics offer the best possibility of viewing changes in the functional pathways associated with the disease, such as increases or decreases in the expression of protein or metabolite markers. These markers can be differently expressed in different patient cohorts, thus helping to improve diagnostic accuracy, ASD population stratification, and the development of effective personalized treatments.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1422-0067/21/17/6274/s1, Table S1: Table of all proteins with UniProt code with associated matrix.

**Author Contributions:** Conceptualization, M.V.R. and L.P.; methodology, M.V.R. and L.P.; investigation, M.V.R. and L.P.; writing—original draft preparation, M.V.R. and L.P.; writing—review and editing, M.V.R., S.L.M., V.M., S.G. (Silvia Guerrera), P.V., G.I., S.G. (Simone Gardini), G.T., G.V., S.V., A.G. and L.P.; supervision, A.G. and L.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Fondazione Bambino Gesù, Grant number 201903\_FBG, and Ministry of Italian Health, Ricerca Corrente grant number 201905\_genetica, to L.P.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**


*Int. J. Mol. Sci.* **2020**, *21*, 6274


## **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## **Phenotypic Subtyping and Re-Analysis of Existing Methylation Data from Autistic Probands in Simplex Families Reveal ASD Subtype-Associated Differentially Methylated Genes and Biological Functions**

## **Elizabeth C. Lee** † **and Valerie W. Hu \***

Department of Biochemistry and Molecular Medicine, The George Washington University, School of Medicine and Health Sciences, Washington, DC 20037, USA; elee171@jhu.edu

**\*** Correspondence: valhu@gwu.edu; Tel.: +1-202-994-8431

† Current address: W. Harry Feinstone Department of Molecular Microbiology and Immunology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.

Received: 25 August 2020; Accepted: 17 September 2020; Published: 19 September 2020

**Abstract:** Autism spectrum disorder (ASD) describes a group of neurodevelopmental disorders with core deficits in social communication and manifestation of restricted, repetitive, and stereotyped behaviors. Despite the core symptomatology, ASD is extremely heterogeneous with respect to the severity of symptoms and behaviors. This heterogeneity presents an inherent challenge to all large-scale genome-wide omics analyses. In the present study, we address this heterogeneity by stratifying ASD probands from simplex families according to the severity of behavioral scores on the Autism Diagnostic Interview-Revised diagnostic instrument, followed by re-analysis of existing DNA methylation data from individuals in three ASD subphenotypes in comparison to that of their respective unaffected siblings. We demonstrate that subphenotyping of cases enables the identification of over 1.6 times the number of statistically significant differentially methylated regions (DMR) and DMR-associated genes (DAGs) between cases and controls, compared to that identified when all cases are combined. Our analyses also reveal ASD-related neurological functions and comorbidities that are enriched among DAGs in each phenotypic subgroup but not in the combined case group. Moreover, relational gene networks constructed with the DAGs reveal signaling pathways associated with specific functions and comorbidities. In addition, a network comprised of DAGs shared among all ASD subgroups and the combined case group is enriched in genes involved in inflammatory responses, suggesting that neuroinflammation may be a common theme underlying core features of ASD. These findings demonstrate the value of phenotype definition in methylomic analyses of ASD and may aid in the development of subtype-directed diagnostics and therapeutics.

**Keywords:** phenotypic subgroups stratified by ASD severity; simplex families; DNA methylation; subgroup-associated genes and biological functions

## **1. Introduction**

Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder characterized by impaired social communication and repetitive behaviors [1]. Tremendous phenotypic and symptomatic heterogeneity exists within the ASD population, thereby presenting a challenge to diagnosis and treatment. The wide range of clinical presentation in ASD is attributed to different underlying etiologies, which include both genetic and environmental influences. One area that bridges the genetics–environment gap is epigenetic variation, which has been proposed to play a role in ASD [2–6]. It has been shown that DNA methylation is dysregulated in ASD in multiple studies involving both peripheral and brain tissues, principally from individuals with ASD from the multiplex population [7–15]. However, published DNA methylation studies of ASD have produced inconsistent findings, including variable reporting of differentially methylated sites. This inconsistency may be explained not only by the different tissues used but also in part by the wide phenotypic heterogeneity intrinsic to ASD.

Previous findings from our laboratory showed that reduction of ASD clinical heterogeneity by classifying patients into subphenotypes based on cluster analyses of severity scores from the Autism Diagnostic Interview-Revised (ADI-R) diagnostic instrument [16] results in increased ability to detect statistically significant subphenotype-specific transcriptomic as well as genetic differences, which were otherwise undetectable in an aggregate analysis of all individuals with ASD [17–20]. Based on these previous studies that demonstrate the value of subphenotyping in genome-wide omics analyses and the growing body of evidence implicating a link between ASD and epigenetic modification, we hypothesized that the stratification of individuals with ASD by phenotypic severity will result in the identification of subphenotype-dependent DNA methylation differences between cases and controls that achieve statistical significance.

The present study involves re-analyzing existing Illumina HumanMethylation27K BeadChip data from lymphoblastoid cell lines (LCLs) derived from blood lymphocytes of 292 male ASD probands from the Simons Simplex Collection (SSC) after stratification into three distinct subgroups based on ADI-R symptom severity profiles (mild, intermediate, and severely language-impaired). The main goals of this research are to: (1) identify statistically significant differences in DNA methylation between cases and typically developing sibling controls for each of the three ASD subphenotypes, (2) examine the impact of decreasing phenotypic heterogeneity on the ability to detect statistically significant differentially methylated regions and associated genes (DAGs) by comparing results with and without subphenotyping, and (3) identify biological functions, signaling pathways, and disorders associated with DAGs from each subgroup analysis.

#### **2. Results and Discussion**

#### *2.1. DAGs Associated with ASD Subphenotypes and the Combined Case Group*

Hierarchical clustering (HCL) and principal components analysis (PCA) using scores on the ADI-R diagnostic scoresheets from each of the probands were performed as previously described [16]. These cluster analyses confirmed that the 292 cases in this methylation study could be separated into three phenotypic subgroups based on their severity scores from the ADI-R. A heatmap depicting clinical severity across 123 scores on 63 ADI-R items for individuals in each subgroup is shown in Figure 1, together with PCA plots from the data reduction analysis confirming the separation of cases into three distinguishable subgroups according to integrated severity profiles. Notably, the first three principal components (represented by the x, y, and z-axes of the 3-d PCA plot) account for 85.72% of the variability among all probands based on the 123 ADI-R scores.
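As an illustration of the hierarchical clustering step only (the PCA is not reproduced here), the sketch below groups toy severity profiles into three clusters. The linkage method and distance metric of the original analysis are not stated in this section, so average linkage with Euclidean distance is an assumption, and the six four-item profiles are invented stand-ins for the 123 ADI-R scores per proband.

```python
from itertools import combinations
from math import dist

# Toy severity profiles (scores 1-3 on four items) for six probands.
# Invented for illustration; the study used 123 ADI-R scores each.
profiles = [
    [1, 1, 1, 1], [1, 1, 2, 1],   # low scores
    [2, 2, 2, 2], [2, 3, 2, 2],   # middle scores
    [3, 3, 3, 3], [3, 3, 3, 2],   # high scores
]


def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between two clusters of indices."""
    total = sum(dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], points
            ),
        )
        clusters[x] = clusters[x] + clusters[y]  # x < y, so y is safe to drop
        del clusters[y]
    return [sorted(cluster) for cluster in clusters]


subgroups = agglomerate(profiles, 3)
print(subgroups)
```

Stopping the merges at three clusters corresponds to cutting the dendrogram at the level that yields the mild, intermediate, and severely language-impaired subgroups.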

**Figure 1.** Separation of probands into three phenotypic subgroups based on cluster analyses of 123 scores on 63 Autism Diagnostic Interview-Revised (ADI-R) items for each individual. (**A**) Hierarchical clustering (HCL) analysis, (**B**–**D**) Principal components analysis (PCA) in 3-d (**B**) and 2-d projections showing PC-1 and PC-2 (C) and PC-1 and PC-3 (D). For the heatmap in (**A**), each row represents an individual, while each column represents a score from the ADI-R diagnostic. The range of severity scores (1–3) for each ADI-R item is represented in the color bar above the heatmap, with light blue indicating a score of 1, green-yellow indicating a score of 2, and red indicating a score of 3, which represents the most severe autism spectrum disorder (ASD) manifestation. The three ASD subgroups are identified by the vertical colored bars along the right side of the heatmap, with red indicating the severely language-impaired subgroup, yellow indicating the intermediate subgroup, and turquoise indicating the mild subgroup. This latter set of colors also applies to the subgroups shown in the PCA plots in which each point represents an individual. Note: In the heatmap (**A**), the large block of red columns associated with the severely language-impaired subgroup primarily corresponds to items involving spoken language on the ADI-R diagnostic.

Using GenomeStudio Methylation Module software, CpG sites across the genome were identified that exhibited statistically significant differential methylation with False Discovery Rate (FDR)-adjusted *p*-values < 0.05. For the severely language-impaired subgroup (*n* = 22 cases and 22 controls), 266 unique DAGs were mapped to the CpGs (Table S1). The intermediate subgroup (*n* = 121 cases and 121 controls) exhibited 360 unique DAGs (Table S2), and the mild subgroup (*n* = 149 cases and 149 controls) exhibited 4073 unique DAGs (Table S3). Among the three ASD subgroups, a total of 4155 unique DAGs with FDR-adjusted *p*-values < 0.05 were identified, with some DAGs shared among the subgroups. The volcano plots for each subgroup illustrate distinct differences in the number, distribution, and methylation profiles of significant DAGs in each subgroup (Figure 2). For example, the majority of the DAGs in the severely language-impaired subgroup show reduced methylation (negative delta β values), while the majority of the DAGs in the intermediate subgroup show increased methylation (positive delta β values) (Figure 2A). Although the majority of DAGs in the mild subgroup show decreased methylation as observed in the severely language-impaired subgroup, the mild subgroup has a much greater number of significant DAGs (Figure 2B). In addition, while all the DAGs in the mild subgroup exhibit absolute delta β values below 0.05, a fraction of the DAGs associated with the severely language-impaired subgroup exceeds this absolute delta β value, indicating larger methylation differences between cases and controls, which are also reflected by larger fold-change values (see Figure S1). These data suggest that the three ASD subgroups can be distinguished from each other by their differential DNA methylation profiles.
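For readers unfamiliar with FDR adjustment, the sketch below shows one common procedure, Benjamini-Hochberg. This section does not state which FDR method the software applies, so treating it as Benjamini-Hochberg is an assumption, and the *p*-values are invented for illustration.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values for five CpG sites.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = bh_adjust(raw)
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
print(adjusted, significant)
```

In this toy case, two sites with raw *p*-values just under 0.05 no longer pass the threshold once the correction for multiple testing is applied.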

**Figure 2.** Volcano plots of significant differentially methylated regions and associated genes (DAGs) among subgroups and combined cases. |DiffScore| versus Delta β plots for significant DAGs in: (**A**) severely language-impaired (blue) and intermediate (red) subgroups; (**B**) mild subgroup (blue) and combined cases (red). Note: A |DiffScore| of 13 is roughly equivalent to a *p*-value of 0.05.

To examine the impact of decreasing phenotypic heterogeneity on the ability to detect statistically significant DAGs, differential methylation analysis of the 27K BeadChip data using Illumina's GenomeStudio Methylation Module was also performed without stratification into phenotypic subtypes, i.e., combined case group (*n* = 292 cases and 292 controls). Without subgrouping, a total of 2570 unique DAGs with FDR-adjusted *p*-values < 0.05 were identified in the combined case group (Table S4). The volcano plot of DAGs for the combined case group is shown in Figure 2B in comparison to that of the mild subgroup. It is notable that there are fewer significant DAGs in the combined case group compared to that of the mild subgroup (2570 vs. 4073), despite the larger number of individuals in the combined case group (292 vs. 149 case-control pairs). Figure 3 summarizes the location of the differentially methylated CpG sites relative to the transcription start site (TSS) for each case group and also the proportion of hypermethylated or hypomethylated sites in each group. In brief, more than 90% of CpG sites in all case groups were found within 1000 bp of the TSS, with the remainder less than 1500 bp away, suggesting that the majority of these sites are likely to be involved in the regulation of transcription. There are also noticeable quantitative differences in the methylation profiles among the case groups. For example, the severely language-impaired subgroup exhibits the greatest proportion of hypomethylated genes (86.8%) and the greatest proportion of CpGs (72.4%) that are closest (≤500 bp) to the TSS. By contrast, the intermediate subgroup exhibits the greatest proportion of hypermethylated genes (91.6%), while the location of the CpGs relative to the TSS is very similar to that of the mild and combined subgroups. Table S5 lists the map positions of the differentially methylated CpGs (and associated genes) in each subtype and the combined case group.

**Figure 3.** Summary of the proportions of differentially methylated CpG sites at different distances relative to the transcription start site (TSS) of the closest gene and the proportion of hypermethylated and hypomethylated sites in each ASD case group.
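The kind of summary shown in Figure 3 can be sketched as a binning of CpG-to-TSS distances followed by a proportion per bin. The bin edges below (500 bp and 1000 bp) follow the thresholds mentioned in the text, but the exact bins used in the figure are an assumption, and the distances are invented.

```python
def tss_bin(distance_bp):
    """Assign a signed CpG-to-TSS distance to a distance bin."""
    d = abs(distance_bp)
    if d <= 500:
        return "<=500"
    if d <= 1000:
        return "501-1000"
    return ">1000"


def bin_proportions(distances):
    """Return the fraction of CpG sites falling in each distance bin."""
    counts = {}
    for distance in distances:
        label = tss_bin(distance)
        counts[label] = counts.get(label, 0) + 1
    return {label: n / len(distances) for label, n in counts.items()}


# Invented signed distances (bp); negative means upstream of the TSS.
example_distances = [-120, 30, 450, -480, 700, -950, 1200, 300]
proportions = bin_proportions(example_distances)
print(proportions)
```

Running the same function separately on the hypermethylated and hypomethylated sites of each case group would give the per-group breakdown the figure reports.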

The Venn diagram in Figure 4A shows that there are 67 significant DAGs shared among the three subgroups and the combined case group, while Figure 4B shows volcano plots representing the relative distribution of these 67 DAGs in each group's differential methylation profile. The differences in the distribution of these overlapping DAGs in each of the four groups reflect the differences that were revealed in Figure 2, with the majority of DAGs in the intermediate subgroup showing increased methylation, while these same DAGs in the severely language-impaired, mild, and combined groups show decreased methylation, as shown quantitatively in Figure 3.

**Figure 4.** Stratification of ASD patients (*n* = 292) into distinct subphenotypes results in increased discovery of significant DAGs. (**A**) The Venn diagram shows unique significant DAGs that were identified using GenomeStudio Methylation Module v1.8 software with subphenotyping into three groups (mild, intermediate, and severely language-impaired) or without subphenotyping (combined case group) (FDR-adjusted *p*-values < 0.05). (**B**) Volcano plots for the 67 overlapping DAGs from each group, identified by color in the accompanying legend.

Not surprisingly, the |DiffScore| values (inversely related to *p*-values) for these DAGs are much greater in the mild subgroup than those for the severely language-impaired subgroup, which is likely the result of the larger number of samples in the mild subgroup (149 vs. 22 case-control pairs). On the other hand, despite having the largest number of cases and controls, the combined case group has smaller |DiffScore| and delta β values in comparison to the mild subgroup. This finding may reflect the increased heterogeneity underlying the combined case group in which the conglomeration of disparate cases dampens the average methylation differences (i.e., delta β) between the cases and controls. Hence, the present study demonstrates that phenotypic subtyping by clinical severity of ADI-R scores is a productive path for discovering a greater number of statistically significant DAGs between ASD cases and controls as well as differences in DNA methylation profiles among the subgroups.
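The dampening effect can be illustrated with simple arithmetic: when subgroups with opposite directions of change are pooled, the pooled mean delta β is the size-weighted average of the subgroup means and can be much smaller in magnitude than any of them. In the sketch below, the subgroup sizes are those of this study, but the delta β values for a single CpG are hypothetical and chosen only to mirror the directions described above (hypermethylation in the intermediate subgroup, hypomethylation in the other two).

```python
# Subgroup sizes (case-control pairs) from this study
sizes = {"mild": 149, "intermediate": 121, "severe": 22}

# Hypothetical mean delta beta (beta_case - beta_control) at one CpG site
delta_beta = {"mild": -0.04, "intermediate": 0.05, "severe": -0.10}

# The mean over all pairs equals the size-weighted mean of the subgroup means
total_pairs = sum(sizes.values())
pooled_delta = sum(sizes[g] * delta_beta[g] for g in sizes) / total_pairs

# Compare the pooled effect with the smallest subgroup effect
smallest_subgroup_effect = min(abs(v) for v in delta_beta.values())
is_dampened = abs(pooled_delta) < smallest_subgroup_effect
```

With these illustrative values, the pooled difference is smaller in magnitude than the difference in every individual subgroup, even though the pooled group contains all 292 pairs.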

### *2.2. Network Prediction Analyses of Subgroup-Associated DAGs*

Ingenuity Pathway Analysis (IPA) was used to conduct functional analysis of the DAGs from each of the ASD subgroups as well as from the combined case group. Neurological functions enriched among DAGs in each subgroup and the combined case group are shown in Table 1. The specific DAGs associated with each function are included in Table S6. As shown, the severely language-impaired subgroup exhibits more functions known to be associated with ASD, such as neuritogenesis, size and branching of neurites, and maturation of synapse and dendritic spines [21–25]. Figure S2 shows that axon guidance signaling and CXCR4 signaling are canonical pathways represented in the top network of genes involved in neuritogenesis. The intermediate subgroup is notably enriched in DAGs associated with the activation of neuroglia and astrocytes, suggesting inflammatory processes known to be involved in ASD [26,27]. Figure S3 shows that the neuroinflammation signaling pathway as well as the glucocorticoid signaling pathway are implicated by the genes involved in the abnormal morphology of neurons in the intermediate subgroup.


**Table 1.** Significantly over-represented neurological functions among DAGs from three phenotypic subgroups of ASD and a combined case group.

\* Fisher exact *p*-value representing the probability that the indicated function is not over-represented among the DAGs for each group, using all genes in IPA's Knowledgebase as the reference gene set.

The mild subgroup is enriched in DAGs involved in sensory system development. Figure S4 shows that the top network of genes enriched for sensory system development is associated with axon guidance, transforming growth factor-β (TGFβ), and bone morphogenetic protein (BMP) signaling pathways. Interestingly, many individuals with ASD exhibit abnormal sensory responses, such as hypersensitivity to certain sounds, visual stimuli, taste, and textures [28–30]. Thus, it is not surprising that many genes related to the sensory system are affected. The nervous system functions associated with DAGs in the combined case group reflect those identified for the intermediate and mild subgroups but not for the severely language-impaired subgroup, which comprises just 7.5% of the total number of cases.

With respect to neurological disorders (see Table 2, Table S7), DAGs in the severely language-impaired subgroup are enriched for genes contributing to comorbidities in ASD, such as cognitive impairment [31–34] and motor dysfunction [35,36]. While axon guidance and synaptogenesis signaling are implicated by the top network of genes associated with cognitive impairment (Figure S5), calcium signaling and dendritic cell maturation are indicated by the top network of genes involved in motor dysfunction (Figure S6). On the other hand, DAGs in the intermediate and mild subgroups as well as the combined case group are over-represented with respect to schizophrenia genes. Figure S7 shows that the neuroinflammation signaling pathway as well as the cAMP and G-protein coupled receptor signaling pathways are involved in schizophrenia in the intermediate subgroup, while synaptogenesis, GABA receptor, and CREB signaling in neurons are involved in the top network of genes in the mild subgroup (Figure S8). Genes associated with motor dysfunction and movement disorders are also over-represented among DAGs in the mild subgroup. Interestingly, only the severely language-impaired subgroup exhibits DAGs explicitly enriched for ASD or intellectual disability (ID), a comorbidity that presents more frequently in individuals with deficits in spoken language.


**Table 2.** Significantly over-represented neurological and developmental disorders among DAGs from the three phenotypic subgroups of ASD and the combined case group.

\* Fisher exact *p*-value representing the probability that the indicated disorder is not over-represented among the DAGs for each group, using all genes in IPA's Knowledgebase as the reference gene set.

Figure 5 shows that one of the two networks of genes that are associated with ASD/ID includes *FMR1*, the gene responsible for fragile X syndrome, a genetic condition that is frequently associated with both intellectual disability and ASD. In a hierarchical layout of the network (Figure S9), *FMR1* is placed at the top of the network, highlighting its influence on the downstream genes, which include *UBE2A*, an E2 ubiquitin conjugating enzyme involved in cognitive disability, and *SLC1A7*, a glutamate transporter that is involved in pervasive developmental disorder (also used to describe ASD), social anxiety, and fragile X.

**Figure 5.** Gene network associated with DAGs enriched for ASD or intellectual disability. All genes involved in developmental disorder are outlined in purple. Genes colored red are hypermethylated while those colored green are hypomethylated. The turquoise colored lines indicate relationships between *FMR1* and other genes in the network. Solid lines denote direct interactions; dashed lines denote indirect interactions.

## *2.3. Proximity of Hypermethylated and Hypomethylated CpGs to the TSS of the DAGs*

Aside from identifying DAGs in each subtype, we also separately investigated the genes associated with hypermethylated and hypomethylated CpGs that were less than 500 bp from the TSS. Interestingly, two of the top genes associated with ASD or intellectual disability in the severely language-impaired subtype, *FMR1* and *UBE2A*, were among the hypomethylated genes closest to the TSS in this subgroup (Table 3). Other ASD-relevant genes within 500 bp of the TSS are *PAX8* and *SHANK1* (both hypomethylated), and *CADM1* and *PAX6* (both hypermethylated). PAX6 and PAX8 are members of the paired box (PAX) family of transcription factors. While PAX6 is involved in modulating the fate of neural progenitor cells [37], genetic variants in *PAX8* are associated with sleep disturbance [38,39], a frequent comorbidity of ASD. SHANK1 is a scaffolding protein at the postsynaptic density that has been found to be involved in ASD [40]. Mutations in *CADM1*, which codes for a synaptic adhesion molecule, are also associated with ASD [41].


**Table 3.** Neurological functions and diseases enriched among DAGs implicated by CpGs within 500 bp of the TSS in the severely language-impaired subgroup.

\* Fisher exact *p*-value indicating the probability that the function or disorder is not enriched among the DAGs for this subgroup.


**Table 4.** Neurological functions and diseases enriched among DAGs implicated by CpGs within 500 bp of the TSS in the intermediate subgroup.

\* Fisher exact *p*-value indicating the probability that the function or disorder is not enriched among the DAGs for this subgroup.

Table 4 shows the neurological functions and disorders enriched among the hyper- and hypomethylated genes with CpGs closest (<500 bp) to the TSS in the intermediate subgroup. Notable among this set of genes are the hypermethylated genes, *CORT* (cortistatin) and *NTF3* (neurotrophin 3), which are both involved in the loss of neurites. CORT is a neuropeptide that is involved in the depression of neuronal activity and the induction of slow-wave sleep, while NTF3 is a protein that controls the survival and differentiation of mammalian neurons. Among the ASD-relevant hypomethylated genes in this subgroup are *CTNNB1* and *GABRA3*. CTNNB1 (catenin beta 1) plays a role in seizure susceptibility and cortical malformation as demonstrated in a *Ctnnb1* knock-out mouse model [42], and GABRA3, a GABA receptor that mediates fast inhibitory effects of GABA in the brain, is reduced in the cerebellum of individuals with ASD [43].

Among the top hypermethylated genes that are enriched in neurological functions and disorders in the mild subgroup are *ARX* and *CNTNAP2* (Table 5; Table 6). *ARX* (Aristaless related homeobox gene) is involved in a number of neurological diseases, including mental retardation, autism, epilepsy, and dystonia [44]. Its function as a homeobox gene is responsible for the wide range of neurological disease phenotypes that result from its mutation or dysregulation. *CNTNAP2*, a member of the neurexin family of proteins that serves as a cell adhesion molecule, is one of the most well-studied ASD risk genes [45–49]. Like *ARX*, mutations in *CNTNAP2* can lead to multiple neurological disease phenotypes, including autism, epilepsy, intellectual disability, obsessive compulsive disorder, and schizophrenia [50]. Notable among the hypomethylated genes in this subgroup are a number of chemokine genes, including *CCL1*, *CCL11*, *CCL2*, *CCL22*, *CCL5*, and *CCL7*. Not surprisingly, these DAGs are enriched among genes that are involved in the activation of neuroglia and neuroinflammation that have been associated with ASD [51,52]. Interestingly, hypomethylated genes associated with schizophrenia include a number of neurotransmitter receptors (e.g., cholinergic, cannabinoid, dopamine, GABA, glutamate, and serotonin) as well as ion channels and ion transporter proteins. These schizophrenia-associated DAGs in the mild ASD subgroup are significantly enriched for GABA receptor signaling (Fisher's exact *p*-value = 1.98 × 10<sup>−6</sup>), neuroinflammation pathway signaling (*p* = 4.21 × 10<sup>−5</sup>), serotonin receptor signaling (1.75 × 10<sup>−5</sup>), calcium signaling (1.29 × 10<sup>−3</sup>), G-protein coupled receptor signaling (2.56 × 10<sup>−3</sup>), and glutamate receptor signaling (7.61 × 10<sup>−3</sup>) pathways.
Overall, the proximity of the differentially methylated CpGs to the TSS of the DAGs enriched for neurological functions and diseases (as shown in Tables 3–6) suggests that these sites may play a role in the transcriptional regulation of the associated genes.


**Table 5.** Neurological functions enriched among DAGs implicated by CpGs within 500 bp from the TSS in the mild subgroup.


\* Fisher exact *p*-value indicating the probability that the function is not enriched among the DAGs for this subgroup.

**Table 6.** Neurological diseases enriched among DAGs implicated by CpGs within 500 bp from the TSS in the mild subgroup.



\* Fisher exact *p*-value indicating the probability that the neurological disease is not enriched among the DAGs for this subgroup.

## *2.4. Shared DAGs among Case Groups Converge on Inflammatory Responses*

We also used IPA to analyze the 67 DAGs (from Figure 4) shared by all three subgroups and the combined case group. Figure 6 shows the top network resulting from the network prediction analysis of the shared DAGs. This network is enriched in genes associated with inflammatory responses, suggesting that neuroinflammation may be a common theme underlying core features of ASD that are manifested in all subtypes. These results collectively demonstrate the value of reducing heterogeneity by subphenotyping individuals with ASD to maximize the ability to identify not only ASD-related DAGs but also ASD-associated functions, pathways, and disorders over-represented within each subgroup. Specifically, 1.62 times as many unique DAGs (4155) are identified among the three subphenotypes in comparison to that of the combined case group (2570).

**Figure 6.** Hierarchical layout of top network of DAGs shared among 3 subphenotypes of ASD and the combined case group. Genes outlined in purple are involved in inflammatory responses.

## *2.5. Overlap of DAGs and Differentially Expressed Genes (DEGs) from Analogous Phenotypic Subgroups from the Simplex Population*

The subgroup-associated DAGs from the present methylation study were compared with DEGs from a separate study investigating transcriptomic data on individuals from the SSC cohort who were divided into subphenotypes using cluster analyses of ADI-R scores (Hu, V.W. and Bi, C., unpublished data). The overlapping genes for each subgroup and the combined case group included 12 DEGs from the severely language-impaired subgroup (hypergeometric cumulative *q*-value = 0.30), 8 DEGs from the intermediate subgroup (*q* = 0.35), 76 DEGs from the mild subgroup (*q* = 7.14 × 10<sup>−4</sup>), and 68 DEGs from the combined case group (*q* = 2.31 × 10<sup>−4</sup>) (Table 7). Thus, there is a significant overlap between DEGs and DAGs from the mild subgroup and combined case group but not from the severely language-impaired subgroup or intermediate subgroup, suggesting at least partial functional validation with regard to regulation of expression for the overlapping genes. It is not expected that all of the DEGs would be regulated by methylation differences between cases and controls. It should also be noted that there were fewer individuals in the transcriptomic investigation, with 7 case-control sibling pairs in the severely language-impaired subgroup, 26 pairs in the intermediate subgroup, 41 pairs in the mild subgroup, and 74 pairs in the combined case group. Furthermore, although the cases from the transcriptomic analysis were phenotypically representative of those from the three subgroups in this methylation study, the samples were not the same as those included in the present study.


**Table 7.** Overlapping genes among DAGs and Differentially Expressed Genes (DEGs) from the three phenotypic subgroups of ASD and the combined case group.

*Int. J. Mol. Sci.* **2020**, *21*, 6877

#### *2.6. Comparison with Other DNA Methylation Studies of ASD*

DNA methylation has long been implicated as a contributor to the etiology of ASD-related disorders, such as Rett syndrome and Fragile X [53–56]. With respect to idiopathic ASD, we were the first to demonstrate that DNA methylation differences across multiple genes could be correlated to dysregulated expression of those genes in LCLs from discordantly diagnosed monozygotic twins and sibling pairs [7,57]. Since then, a number of other studies using blood-derived cells, buccal cells, and brain tissues have confirmed aberrant DNA methylation as a likely contributory factor to ASD [7–15]. However, there has been relatively low consistency with respect to the specific DAGs identified among the various studies, which is possibly due to the heterogeneity both within and among the cohorts used for the different analyses.

To our knowledge, this is the first study to undertake methylation analysis of ASD probands and unaffected siblings from the simplex population after dividing the cases into phenotypic subgroups to lessen the clinical heterogeneity inherent to ASD. A recent study that involved a meta-analysis of blood DNA methylation from two cohorts of individuals with ASD and controls did not find any DAGs that were significant after Bonferroni correction for multiple testing despite having 796 cases and 858 controls [58]. One of the cohorts included 343 probands and their respective sibling controls from the SSC. Among the 7 genes that were suggestively associated with ASD with *p*-value < 1 × 10<sup>−5</sup>, only one, *DIO3*, was found among DAGs from the mild subgroup tested here. Similarly, another recent methylation study using neonatal bloodspots from 1263 infants (of whom 50% were later diagnosed with ASD) concluded that ASD is not associated with robust differential methylation between the diagnosed children and the unaffected ones [59]. Another large methylome-wide association study (MWAS), which used cord blood from 701 8-year-olds and their respective scores on the Social and Communication Disorders Checklist as a measure of autistic traits, found no significant CpGs associated with the social traits [14]. Moreover, Massrali et al. [14] reported that a meta-analysis of the blood and blood spot data from the previous two MWAS studies [58,59] did not reveal any significant concordance in effect direction with their cord blood study; they therefore concluded that none of the MWAS studies identified any significant DAGs.
It should be noted that all of these studies used methylation data collected on Illumina Infinium HumanMethylation450K BeadChip arrays, which offer greater potential for identifying differentially methylated CpG sites in comparison to the HumanMethylation27K array from which we derived the methylation data for our study. We therefore suggest that our ability to identify a large number of significant DAGs—some of which are replicated in different subgroups (or cohorts)—is due to the reduction in phenotypic or clinical heterogeneity among the cases in each subgroup. This interpretation is borne out by the smaller delta β values for DAGs when all the cases are combined into one group for methylation analysis. While we have used cluster analyses of ADI-R scores for phenotypic subgrouping in this study, heterogeneity reduction by genetic subgrouping (e.g., by *CHD8* mutations or by 16p11.2 deletions) has also resulted in enhanced ability to detect significant DAGs between ASD cases and controls [60]. In the same study, Siu et al. also reported no significant DAGs with a heterogeneous group of cases with idiopathic ASD, thereby reaffirming the value of heterogeneity reduction in genome-wide DNA methylation studies of ASD. Aside from the subgrouping methods discussed above, heterogeneity reduction in ASD can also be accomplished by subtyping individuals by associated comorbidities, such as intellectual disability, immune dysfunction, or gastrointestinal disease.

#### *2.7. Advantages and Limitations of Study Design and Future Considerations*

This study examines the impact of applying clinical subtyping to methylation analyses of males with ASD from the simplex population. While the inclusion of only males eliminates sex as a confounding factor, future studies should also include females to investigate the potential for sex-related differences in DNA methylation. The main limitation here is the relatively small number of cases studied, particularly in the severely language-impaired subgroup, which represents the smallest phenotypic subgroup identified by ADI-R cluster analyses of cases from simplex families. Despite this limitation, the severely language-impaired subgroup exhibited the largest differences in β values (and fold-change) between cases and controls relative to that of the other groups, perhaps reflecting the heightened clinical severity of this subgroup. Furthermore, network prediction analyses show that this subgroup was most enriched in neurological functions and comorbidities associated with ASD and was also the sole subgroup enriched for genes directly involved in autism and intellectual disability.

Another limitation is that we were not able to confirm the DAGs identified in each subgroup inasmuch as this bioinformatics study focused on a re-analysis of existing methylation data, and we did not have access to the samples for pyrosequencing analyses. Future studies should therefore address the confirmation of these results not only with regard to methylation analyses of specific DAGs, but also with respect to application of this subtyping method to methylation analyses of an additional cohort of individuals with ASD from the simplex population, preferably with more CpG sites interrogated. Nevertheless, the overlap of DAGs between and among the three phenotypic subgroups represents replication of at least these specific DAGs in different cohorts, as there is no overlap of individuals among the subgroups. In addition, the overlap of DAGs from this study and DEGs from a separate transcriptomic study involving analogous subgroups from the simplex population offers functional support for a fraction of the DAGs identified here. Finally, the Infinium HumanMethylation27K BeadChip array, which was used to generate the methylation data analyzed in this study, is also a limitation in terms of the relatively low number of CpGs interrogated. More recent BeadChip arrays currently probe over 800K CpG sites, and whole genome bisulfite sequencing can assess even more. Thus, in light of our study demonstrating increased discovery of significant DAGs as well as ASD-associated neurological functions and disorders as a result of phenotypic subgrouping, it will be of interest for future studies to analyze more comprehensive methylation data in the context of ASD subtypes to confirm the main findings reported here.

## **3. Materials and Methods**

## *3.1. Acquisition of Methylation Data for Individuals with ASD from the Simplex Population*

DNA methylation data from a study of individuals with ASD and their unaffected siblings who were included in the Simons Simplex Collection (SSC) (New York, NY, USA) were downloaded from the National Database for Autism Research (NDAR). NDAR is a repository of clinical, omics, and brain imaging data from autism studies that is maintained by the NIMH Data Archives (NDA) (Rockville, MD, USA). The original data were deposited by Dr. Stephen Warren (Emory University, Atlanta, GA, USA) for a methylation pilot study entitled "Epigenetic marks as peripheral biomarkers for autism" (Study ID: #287). For the pilot study, DNA methylation for over 300 simplex cases and their respective sibling controls was analyzed on Illumina Infinium HumanMethylation27K BeadChips covering 27,578 CpG dinucleotides, with the raw data deposited into NDAR.

## *3.2. Phenotypic Subtyping for ASD Individuals from Simplex Families*

The Autism Diagnostic Interview-Revised (ADI-R), which is considered a gold-standard diagnostic tool for autism, is based on a series of questions posed to parents or primary caregivers that interrogate a subject's performance on a wide range of behaviors impacted by ASD [61]. These behaviors are scored for severity by a trained neuropsychologist according to the parent/caregiver's response. Raw ADI-R scoresheets for 1900 individuals with ASD (i.e., probands) were obtained from the SSC. As described previously for multiplex families [16], 123 severity scores on 63 ADI-R items (see Table S8) for each individual were subjected to K-means cluster (KMC) analyses, which showed that K = 3 (representing separation into 3 subgroups) resulted in an optimum separation of cases with distinguishable severity profiles. Based on these profiles, the three subgroups were described as mild, intermediate, and severely language impaired, which are similar to three of the four subgroups previously identified in multiplex families [16]. The fourth subgroup in the multiplex population, which exhibited a noticeably higher frequency of savant skills, was not clearly discernible within the simplex population.
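As an illustration of the clustering step, the sketch below applies K-means with K = 3 to a toy set of severity score vectors. It is not the implementation used for the KMC analyses in this study: the two-item score vectors are hypothetical (the actual analysis used 123 scores per individual), and the deterministic farthest-first initialization is an assumption made here only so that the example is reproducible.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first_init(points, k):
    """Deterministic seeding (an assumption for this sketch): start from the
    first point, then repeatedly add the point farthest from chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each individual joins the nearest centroid
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical severity scores on two items for nine individuals
scores = [(0, 1), (1, 0), (1, 1),        # low severity
          (5, 5), (5, 6), (6, 5),        # intermediate severity
          (10, 10), (10, 11), (11, 10)]  # high severity
labels, centroids = kmeans(scores, k=3)
```

In this toy example, the three severity profiles are well separated, so each group of three individuals receives its own cluster label. Real ADI-R data are far less cleanly separated, which is why the resulting profiles were also verified by HCL and PCA.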

The sample identification numbers (IDs) of the probands from the Warren pilot methylation study were then cross-referenced against the sample IDs associated with 1900 ADI-R scoresheets of individuals with ASD from the SSC. Based on the severity profiles that resulted from the ADI-R cluster analyses, 292 cases (all males) with available methylation data were stratified into three subphenotypes as follows: mild (*n* = 149), intermediate (*n* = 121), and severely language impaired (*n* = 22). Differences in the severity profiles of the 292 individuals selected for our study were verified by hierarchical clustering (HCL) and principal components analyses (PCA) using open-access Multiexperiment Viewer software [62]. The demographic information on individuals included in the current study is presented in Table S9.

### *3.3. Identification of Differentially Methylated Regions (DMR) and DMR-Associated Genes (DAG)*

Raw signal intensities were extracted from idat files derived from the Illumina Infinium HumanMethylation27K BeadChip analyses using Illumina's GenomeStudio Methylation Module v1.8 (Illumina, San Diego, CA, USA). The DNA methylation level of each interrogated CpG site is reported in the GenomeStudio software as an average β value, ranging from 0 (completely unmethylated) to 1 (100% methylated). The β value is defined as the ratio of signal from the methylated probe (M) to the sum of the signals from the methylated probe (M) and unmethylated probe (U) plus 100, or β = M/(M + U + 100). Analysis of differential methylation between ASD cases and sibling controls was performed in GenomeStudio using the Illumina Custom Model. This error model developed by Illumina assumes a normal distribution of the β values among biological replicates and results in a differential methylation score (DiffScore) for each interrogated CpG site. Using the absolute value of DiffScores reported by the GenomeStudio software, *p*-values were calculated with the following formula: *p* = 10<sup>−(|DiffScore|/10)</sup>. |DiffScore| > 13.0103 is equivalent to a *p*-value < 0.05. Correction for multiple comparisons was accomplished by computing the false discovery rate (FDR), which is integrated into the GenomeStudio software. Interrogated CpG sites were annotated in GenomeStudio with respect to their corresponding genes. DAGs with FDR-adjusted *p* < 0.05 were classified as significant. The three ASD subgroups were analyzed separately as well as in combination (i.e., combined case group) for DNA methylation differences when compared to their respective sibling controls. Volcano plots showing the overall distribution of significant DAGs for each subgroup or the combined case group were generated by plotting |DiffScore| against delta β, i.e., (β<sub>case</sub> − β<sub>control</sub>).
The distance of the differentially methylated CpGs to the TSS of the nearest gene was obtained from Illumina's content file for the HumanMethylation27K BeadChip array, which provides the mapping information for each CpG as well as its location with respect to CpG islands. All of the CpG sites were within 1500 bp of a TSS, suggesting their potential involvement in the regulation of gene expression.
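The two formulas given above can be checked with a short calculation. The sketch below is not part of GenomeStudio. The function names and the probe intensities are hypothetical, and only the β definition and the DiffScore-to-*p* conversion are taken from the text.

```python
import math

def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Average beta value: M / (M + U + 100)."""
    return methylated / (methylated + unmethylated + offset)

def diffscore_to_p(diffscore: float) -> float:
    """p = 10 ** (-|DiffScore| / 10)."""
    return 10 ** (-abs(diffscore) / 10)

def p_to_abs_diffscore(p: float) -> float:
    """Inverse of the conversion above: |DiffScore| = -10 * log10(p)."""
    return -10 * math.log10(p)

# Hypothetical probe signal intensities for one CpG site
beta_case = beta_value(methylated=1200.0, unmethylated=2700.0)     # 1200 / 4000
beta_control = beta_value(methylated=1900.0, unmethylated=2000.0)  # 1900 / 4000
delta_beta = beta_case - beta_control  # negative: hypomethylated in the case

# |DiffScore| corresponding to p = 0.05 (about 13.0103, as stated in the text)
threshold = p_to_abs_diffscore(0.05)

# The sign of the DiffScore carries direction only; the p-value uses its magnitude
p_from_score = diffscore_to_p(-20.0)
```

Because the conversion is logarithmic, each additional 10 units of |DiffScore| corresponds to a tenfold smaller *p*-value.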

#### *3.4. Network Prediction Analyses of DAGs*

Ingenuity Pathway Analysis (IPA) network prediction software (Qiagen, Germantown, MD, USA) was used to identify enriched functions, pathways, and disorders associated with DAGs from the methylation analyses based on Fisher exact *p*-values of *p* ≤ 0.05, using genes in IPA's Knowledgebase as the reference set of genes to determine enrichment in pathways or functions among the DAGs.

## *3.5. Hypergeometric Distribution Analyses*

Hypergeometric distribution analyses were employed to identify the significance of overlap between DAGs and differentially expressed genes (DEGs) from analogous phenotypic subgroups of ASD from a separate study (Hu, V.W. and Bi, C., unpublished data), which involved re-analysis of transcriptomic data from a previously published study that used LCLs from the SSC [63]. The overlapping genes were identified using an open-access Venn diagram software program called Venny 2.1.0 (https://bioinfogp.cng.csic.es/tools/venny/) [64]. Significant overlap between the DAGs and DEGs was determined by hypergeometric distribution analyses using the open-access CASIO Keisan Online Calculator (http://keisan.casio.com/exec/system/1180573201), with significance determined by an upper cumulative *q*-value of ≤ 0.05. Venny 2.1.0 was also used to identify overlap among subgroup-associated DAGs and those from the combined case group.
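For readers who wish to reproduce this type of calculation without the online tools, the following sketch computes a gene-list overlap and its upper cumulative hypergeometric probability. It is offered as an illustration under stated assumptions and does not reproduce the values in Table 7. The gene identifiers, list sizes, and background size are hypothetical, and the function name `hypergeom_upper_tail` is introduced here for illustration.

```python
from math import comb

def hypergeom_upper_tail(overlap: int, n_a: int, n_b: int, population: int) -> float:
    """P(X >= overlap) when n_b genes are drawn without replacement from a
    background of `population` genes, n_a of which belong to the first list."""
    total = comb(population, n_b)
    tail = sum(comb(n_a, k) * comb(population - n_a, n_b - k)
               for k in range(overlap, min(n_a, n_b) + 1))
    return tail / total

# Hypothetical gene lists drawn from a hypothetical background of 10 genes
background_size = 10
dags = {"G1", "G2", "G3", "G4"}
degs = {"G2", "G3", "G9"}

# Set intersection plays the role of the Venn diagram overlap
shared = dags & degs

# Upper cumulative probability of observing at least this many shared genes
q_upper = hypergeom_upper_tail(len(shared), len(dags), len(degs), background_size)
# (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120 = 1/3
```

The result depends strongly on the background size, so the reference gene set should match the set of genes that could have been detected on both platforms.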

### **4. Conclusions**

This is the first study to investigate differential methylation in individuals with ASD from the simplex population who were divided into distinct phenotypic subgroups by cluster analyses of ADI-R scores. This study is important because it demonstrates a link between DNA methylation and the etiology of ASD in this population. We suggest that subphenotyping enables more efficient identification of statistically significant DAGs which, in turn, reveal subphenotype-dependent functions and comorbidities that are associated with each ASD subgroup. Such discrimination of the biological differences between ASD subphenotypes is essential to our understanding of the complex pathobiology of ASD as well as our ability to develop targeted ASD subtype-directed therapeutics.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1422-0067/21/18/6877/s1, Figure S1. |DiffScore| vs. log2 Fold-change values for all groups; Figure S2. Gene network associated with DAGs involved in neuritogenesis in the severely language-impaired subgroup; Figure S3. Gene network associated with DAGs involved in abnormal morphology of neurons in the intermediate subgroup; Figure S4. Gene network associated with DAGs involved in sensory system development in the mild subgroup; Figure S5. Gene network associated with DAGs involved in cognitive impairment in the severely language-impaired subgroup; Figure S6. Gene network associated with DAGs involved in motor dysfunction in the severely language-impaired subgroup; Figure S7. Gene network associated with DAGs involved in schizophrenia in the intermediate subgroup; Figure S8. Gene network associated with DAGs involved in schizophrenia in the mild subgroup; Figure S9. Hierarchical layout of a top gene network of DAGs associated with ASD and ID in the severely language-impaired subgroup; Table S1. Significant DAGs associated with the severely language-impaired subgroup; Table S2. Significant DAGs associated with the intermediate subgroup; Table S3. Significant DAGs associated with the mild subgroup; Table S4. Significant DAGs associated with the combined case group; Table S5. Map positions (distance from TSS) of the differentially methylated CpG sites in each subtype and the combined case group; Table S6. Over-represented neurological functions among DAGs in all subgroups and the combined case group; Table S7. Over-represented neurological and developmental disorders among DAGs in all subgroups and the combined case group; Table S8. List of ADI-R items with score adjustments used for cluster analyses, as described in [16]; Table S9. Demographic information for probands and siblings used in this study.

**Author Contributions:** Conceptualization, V.W.H.; Formal analysis, E.C.L. and V.W.H.; Supervision, V.W.H.; Visualization, V.W.H.; Writing—original draft, E.C.L. and V.W.H.; Writing—review & editing, E.C.L. and V.W.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** The methylation data used in the preparation of this manuscript were obtained from the National Institute of Mental Health (NIMH) Data Archive (NDA). NDA is a collaborative informatics system created by the National Institutes of Health to provide a national resource to support and accelerate research in mental health. Dataset identifier: [NIMH Data Archive Collection ID #287; DOI: 10.15154/1165651]. This manuscript reflects the views of the authors and may not reflect the opinions or views of the NIH or of the Submitters submitting original data to NDA. We also acknowledge Stephen Warren (Emory University, Atlanta, GA) as the submitter of the pilot methylation data deposited into the NDA. We thank the SSC for providing raw ADI-R scoresheets for probands in the SSC (to VWH) and the families of individuals with ASD for making these data as well as biological specimens available for research.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article*

## **Tracing Autism Traits in Large Multiplex Families to Identify Endophenotypes of the Broader Autism Phenotype**

**Krysta J. Trevis 1,2,†, Natasha J. Brown 3,4,†, Cherie C. Green 1,2,5, Paul J. Lockhart 6,7, Tarishi Desai 1,2, Tanya Vick 4, Vicki Anderson 2,8,9, Emmanuel P. K. Pua 1, Melanie Bahlo 10,11, Martin B. Delatycki 3,6,7, Ingrid E. Scheffer 1,3,9,12,‡ and Sarah J. Wilson 1,2,12,\*,‡**


Received: 21 September 2020; Accepted: 21 October 2020; Published: 27 October 2020

**Abstract:** Families comprising many individuals with Autism Spectrum Disorders (ASD) may carry a dominant predisposing mutation. We implemented rigorous phenotyping of the "Broader Autism Phenotype" (BAP) in large multiplex ASD families using a novel endophenotype approach for the identification and characterisation of distinct BAP endophenotypes. We evaluated ASD/BAP features using standardised tests and a semi-structured interview to assess social, intellectual, executive and adaptive functioning in 110 individuals, including two large multiplex families (Family A: 30; Family B: 35) and an independent sample of small families (*n* = 45). Our protocol identified four distinct psychological endophenotypes of the BAP that were evident across these independent samples, and showed high sensitivity (97%) and specificity (82%) for individuals classified with the BAP. Patterns of inheritance of identified endophenotypes varied between the two large multiplex families, supporting their utility for identifying genes in ASD.

**Keywords:** Broader Autism Phenotype; genetic; autism spectrum disorder; multiplex family

#### **1. Introduction**

Autism Spectrum Disorders (ASD) are a group of neurodevelopmental conditions characterised by deficits in social communication and social interaction, and restricted and repetitive patterns of behaviours, interests, or activities [1]. Recent estimates from the Centers for Disease Control and Prevention indicate a prevalence of 1 in 54 children aged 8 years for ASD [2]. There is strong evidence for the contribution of genetic factors to the aetiology of ASD [3]. Despite recent molecular advances, the cause remains unidentified in the majority of cases [4].

The recognition of milder phenotypes has facilitated an increased understanding and interest in the wide spectrum of clinical presentations in ASD. Family members of individuals with ASD have been observed to express milder forms, known as the Broader Autism Phenotype (BAP), that are not sufficient to meet diagnostic criteria for ASD [5,6]. The BAP includes a range of subtle behavioural and cognitive features that have qualitatively similar presentations to the core domains of ASD. While clinically significant impairment in these areas of functioning is seen in ASD [1,7], BAP traits are continuously distributed in the general population [8]. Several studies have reported the expression of at least one BAP trait in 20% of relatives of children with ASD, as well as a higher rate of BAP traits compared to controls across domains of pragmatic language [9,10], personality [11], social cognition [10,12,13] and executive function [14,15]. Monozygotic twins demonstrate 65–90% concordance for ASD [16,17] with a higher estimate when the BAP is considered [18,19], while dizygotic twin concordance is ~20% [20]. Together these findings suggest a strong genetic basis for ASD.

An estimated 25% of ASD cases can be identified clinically or molecularly with a predominant monogenic cause [21–23]. Although the aetiology for the majority of cases is still unknown, increasing evidence suggests a polygenic basis due to the interaction of multiple genetic risk variants following complex inheritance, with or without environmental factors [24–27]. Complementary techniques will be necessary to investigate the genetic aetiology of ASD in unsolved cases. Typically, family studies combine many small families (2–3 affected individuals); however, these are likely to be confounded by genetic heterogeneity. Very large multiplex families (>8 affected) where ASD traits appear dominantly inherited are rare, but more genetically homogeneous. In other complex disorders, such as epilepsy, phenotypic characterisation of such families has proved powerful in gene discovery [28]; however, this approach has received limited attention in ASD. In multiplex ASD families, BAP traits, or endophenotypes, identified in family members may serve as markers of risk variants for ASD [29,30]. In turn, this may facilitate gene identification [31].

Endophenotypes are measurable features within a disorder that are proposed to reduce its complexity into more quantifiable elements [32]. These components can be represented at any level of analysis, including but not limited to biochemical, neuroanatomical, neurophysiological, cognitive or neuropsychological measurements. They have been hypothesised to reflect more aetiologically homogeneous subgroups within genetically heterogeneous conditions. Endophenotypes are also presumed to be located closer to aetiological mechanisms in the pathway between genotype and disease, compared to more overt phenotypes that are used to define clinical syndromes [33,34]. Clinically, the use of endophenotypes offers increased statistical power to localise and identify genes associated with disease [35]. Importantly, an endophenotype must indicate genetic susceptibility to disease independent of disease status, and by definition may serve as a marker of genetic liability in individuals without the disorder [34]. In individuals with the BAP, the mild expression of ASD-related traits is hypothesised to be due to an increased genetic liability for ASD. There are several BAP traits that may be considered "endophenotypes" from within the domains of language, executive function, and social cognition [31]. In the context of a single large family where numerous individuals demonstrate ASD or the BAP, recognition of BAP endophenotypes should allow granular identification of autism genes of dominant effect. This study is the first known to the authors to apply this approach in autism.

The aim of our study was to analyse ASD traits within large multiplex families to examine inheritance of ASD by identifying endophenotypes of the BAP. We achieved this aim through an iterative process of characterising, refining, and assessing BAP endophenotypes. This involved (1) rigorous phenotyping of large multiplex families to identify BAP traits and potential endophenotypes, (2) validation of putative endophenotypes and their cut-off scores in an independent aggregated sample of 20 small families (each with at least one member with ASD), and (3) classification of BAP endophenotypes in large multiplex families (from step 1) based on optimal cut-off scores following validation analysis (from step 2) to assess their utility for examining inheritance patterns. We hypothesised that (1) multiple individuals in large families would demonstrate the BAP, (2) distinct BAP endophenotypes would be identified, and (3) BAP endophenotypes would vary in presentation between large multiplex families.

## **2. Results**

## *2.1. Hypothesis 1: Multiple Individuals in Large Families Demonstrate the BAP*

Based on our rigorous protocol for phenotyping the BAP in large multiplex families, we identified 32 members with the BAP across Family A and B. Of the 23 members in Family A who did not have an ASD diagnosis, we detected the BAP in 17 (74%) individuals, with 6 individuals unaffected. In Family B, we detected 15 (63%) individuals with the BAP, with 9 individuals unaffected (Table 1).


**Table 1.** Intellectual functioning in Family A and B by diagnostic classification.

FSIQ = Full Scale Intelligence Quotient, VIQ = Verbal Intelligence Quotient, PIQ = Performance Intelligence Quotient. Note. Average FSIQ = 80–119; Superior FSIQ = ≥120. <sup>a</sup> One individual only completed VIQ and select executive functioning subtests.

Intellectual function was assessed in 54/63 (86%) individuals. Overall, participants were of average or greater intelligence. Average full scale intelligence quotient (FSIQ) was observed in 32/54 (59%) individuals, while 20/54 (37%) demonstrated superior or very superior FSIQ (Table 1). We performed group-level comparisons of cognitive, social and adaptive functions between family members with and without the BAP using non-parametric and parametric tests (Mann–Whitney U and *t*-tests, respectively), with the more conservative parametric tests reported here as there were no differences between these approaches. On average, individuals with the BAP demonstrated poorer pragmatic language, with significantly higher mean Pragmatic Rating Scale (PRS) scores (*M* = 8.75, *SD* = 7.08) compared to unaffected individuals (*M* = 2.00, *SD* = 3.05), *t* (40.99) = −4.42, *p* < 0.001. No significant differences were observed for general intellect, the Faux Pas Task (FPT), executive (D-KEFS) or adaptive function measures (ABAS-II, BRIEF; See Materials and Methods).
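The fractional degrees of freedom reported for the PRS comparison are what an unequal-variance (Welch) *t*-test produces. The following is a minimal sketch of that calculation from summary statistics, not the authors' analysis code. The means and standard deviations are taken from the text, but the group sizes for this comparison are not reported, so the values of `n_a` and `n_b` used below are placeholders and the result should not be expected to match the published statistic exactly.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    # Squared standard error contributed by each group
    var_a = sd_a ** 2 / n_a
    var_b = sd_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation; usually non-integer
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df


# PRS means and SDs come from the text (unaffected vs. BAP).
# ASSUMPTION: group sizes of 15 and 32 are placeholders, not reported values.
t, df = welch_t(2.00, 3.05, 15, 8.75, 7.08, 32)
print(f"t({df:.2f}) = {t:.2f}")
```

With equal variances and equal group sizes the degrees of freedom reduce to the familiar n<sub>a</sub> + n<sub>b</sub> − 2, which is a quick sanity check on the formula.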

#### *2.2. Hypothesis 2: Specific BAP Endophenotypes Exist Across BAP Domains*

Based on our iterative characterisation process, four distinct endophenotypes of the BAP were reliably identified in the small families sample. From the natural grouping of traits, these reflected "socially unaware", "pedantic", "aloof", and "obsessive" endophenotypes (Table 2). At the highest level of the dendrogram of the 33 BAP traits there was a clear split, whereby traits of the socially unaware and pedantic endophenotypes were more similar to each other and more dissimilar to the combination of traits of the aloof and obsessive endophenotypes. There was a significant difference between the mean proportional scores of the unaffected and BAP groups, with the BAP group demonstrating significantly higher scores on all four endophenotypes (all *p* < 0.015).


**Table 2.** Four endophenotypes of the BAP in the small families sample.



\* *p* < 0.05, \*\* *p* < 0.01, \*\*\* *p* < 0.001.

Analysis of receiver operating characteristic (ROC) curves indicated relatively good discrimination within the small families for the socially unaware, aloof and obsessive endophenotypes (all AUC > 0.73, all *p* < 0.025), and acceptable discrimination for the pedantic endophenotype (AUC = 0.68, *p* = 0.077). Although Box's M was violated in the discriminant function analysis (likely due to variation in the sample sizes), the four endophenotypes combined captured 93% of cases (Wilks' λ = 0.47, χ<sup>2</sup> = 27.83, *p* < 0.001). In particular, the endophenotypes showed high sensitivity for the BAP group (97%) characterised by higher proportional scores, and good specificity for the unaffected group (82%) with lower proportional scores (Table 2).
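As a worked check on these figures, the sketch below shows how sensitivity, specificity and overall classification rate follow from a 2 × 2 classification table, and how an AUC can be computed from two sets of scores. The group sizes (30 BAP, 11 unaffected) are those of the small families sample described in Section 4.4. The individual cell counts are an assumption: they are simply the split that is consistent with the reported percentages, and the confusion matrix itself is not given in the text. The `auc` helper is a generic rank-based definition and is not the authors' software.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity, overall proportion correctly classified)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    overall = (tp + tn) / (tp + fn + tn + fp)
    return sens, spec, overall


def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as one half."""
    wins = sum(
        (p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))


# ASSUMPTION: 29 of 30 BAP and 9 of 11 unaffected individuals correctly
# classified; chosen because it matches the reported percentages.
sens, spec, overall = sensitivity_specificity(tp=29, fn=1, tn=9, fp=2)
print(f"sensitivity={sens:.0%} specificity={spec:.0%} overall={overall:.0%}")
# prints: sensitivity=97% specificity=82% overall=93%
```

Under this assumed split, 38 of 41 individuals are correctly classified, which agrees with the 93% of cases reported for the combined endophenotypes.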

## *2.3. Hypothesis 3: BAP Endophenotypes Vary in Large Multiplex Families*

Applying the endophenotype thresholds to the proportional scores of the 33 BAP traits for members of Family A and B led to the identification of all individuals classified as having the BAP. Two additional BAP cases were identified in Family B based on the presence of above threshold endophenotype scores, indicating good utility of this approach (Figure 1). One individual was excluded from this analysis due to incomplete data. Across both families, the aloof endophenotype was most common (62%), followed by obsessive (60%), pedantic (55%) and socially unaware (48%). Approximately one quarter of family members met criteria for only one endophenotype, 15% met criteria for two, and the remainder met criteria for 3–4 (62%) (Figure 1). The dominant endophenotype across both families, as determined by the highest score, was aloof (47%), followed by obsessive (26%), socially unaware (18%) and pedantic (9%).
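The thresholding step described above can be sketched as follows. This is an illustration only: the cut-off values and the example scores are hypothetical (the ROC-derived thresholds are not listed in this section), and treating a score equal to the threshold as "present" is also an assumption. The dominant endophenotype is taken as the above-threshold endophenotype with the highest proportional score, following the description in the text.

```python
ENDOPHENOTYPES = ("socially unaware", "pedantic", "aloof", "obsessive")

# HYPOTHETICAL cut-offs on the proportional scores, for illustration only.
THRESHOLDS = {
    "socially unaware": 0.20,
    "pedantic": 0.25,
    "aloof": 0.15,
    "obsessive": 0.20,
}


def classify(scores, thresholds=THRESHOLDS):
    """Return (endophenotypes at or above threshold, dominant endophenotype)."""
    present = [e for e in ENDOPHENOTYPES if scores[e] >= thresholds[e]]
    # Dominant = highest-scoring endophenotype among those present
    dominant = max(present, key=lambda e: scores[e]) if present else None
    return present, dominant


# Made-up proportional scores for one individual
example = {"socially unaware": 0.10, "pedantic": 0.30, "aloof": 0.45, "obsessive": 0.05}
present, dominant = classify(example)
print(present, dominant)  # ['pedantic', 'aloof'] aloof
```

The number of endophenotypes present (`len(present)`) then gives the severity index used when comparing the two families.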

Family A appeared to have two endophenotype profiles, with one characterised by the presence of a single endophenotype (35%) seen in individuals who were mostly married-in (67%), contrasting with the second profile (41%) of all four endophenotypes, most evident in core family members (72%) (Figure 1). Overall, the obsessive endophenotype occurred most frequently (77%), followed equally by pedantic (65%) and aloof (65%), and then socially unaware (53%). The co-occurrence of the obsessive and pedantic endophenotypes was relatively common, seen in 29% of married-ins and core family members. Overall, there was a range of dominant endophenotypes across individuals, with aloof the most frequent (35%) particularly in core family members (83%).

Contrasting with Family A, Family B had more individuals (70%) with multiple endophenotypes, in both married-in and core family members (Figures 1 and 2). All four endophenotypes were again most frequently observed in core family members, indicative of a more severe BAP presentation. Unlike Family A, however, the aloof endophenotype occurred most frequently in Family B (88%), followed by obsessive (71%), pedantic (65%), and socially unaware (65%). The aloof endophenotype was also identified as dominant (59%), evident in 70% of core family members.

**Figure 1.** Number of BAP endophenotypes present in Family A and B. Individuals married-in to Family A tend to have a single endophenotype indicating a milder BAP presentation, in contrast with core family members who have multiple endophenotypes (obsessive most frequent). In Family B, married-in and core family members tend to have more than one endophenotype, with the aloof endophenotype the most frequent.


**Figure 2.** Scrambled pedigrees for Family A (panel **a**) and Family B (panel **b**) showing phenotypes and endophenotypes. All individuals with ≥1 endophenotype had the BAP, with the exception of two individuals from Family B (III-3 and IV-9) marked with an asterisk. These two individuals were clinically determined as unaffected but had above threshold endophenotype scores based on ROC curves. Some family members who were not phenotyped are not shown to preserve the anonymity of these families. The arrow indicates the proband shown in this pedigree.

#### *2.4. Correlates of the BAP Endophenotypes*

Across both families, no sex or age differences were observed for any of the endophenotypes (all *p* > 0.200). Overall, a more severe BAP presentation (indicated by a greater number of endophenotypes) was associated with reduced social adaptive functioning on both self-report and objective measures of social communication (Table 3), demonstrating good convergent validity. In particular, a more severe BAP presentation showed a strong correlation with more severe pragmatic language difficulties, with scores for each endophenotype also significantly correlated. A similar relationship was evident for the ability to detect a faux pas in social discourse and self-reported social functioning, particularly for family members with the socially unaware endophenotype (Table 3).

For the cognitive measures, a more severe BAP presentation was associated with reduced executive functioning, particularly for nonverbal measures of cognitive flexibility (switching and fluency; Table 3). A pattern of weaker correlations was also evident for specific endophenotypes, including lower IQ in the socially unaware and aloof endophenotypes (Table 3).


**Table 3.** Correlations between endophenotypes and quantitative measures in Families A and B.

\* *p* < 0.05; \*\* *p* < 0.01. PRS = Pragmatic Rating Scale, FPT = Faux Pas Test, FSIQ = Full Scale Intelligence Quotient, VIQ = Verbal Intelligence Quotient, PIQ = Performance Intelligence Quotient. <sup>a</sup> Nonverbal executive function subtests.

#### **3. Discussion**

We investigated the BAP to capture phenotypic variation within and between high-risk ASD families, with the aim of improving characterisation and identification of individuals for accurate molecular genetic analysis. We identified multiple individuals with the BAP in large multiplex families using rigorous phenotyping and a novel reliable endophenotyping approach, validated in an independent sample of small ASD families and with objective measures of cognitive, social and adaptive functioning. This deep phenotyping showed that specific BAP endophenotypes exist beyond the conventional BAP domains of social relationships, communication, and circumscribed interests and behaviour, allowing for more granular detection of subtle features of the BAP. Distinct patterns of inheritance were identified by applying the endophenotype approach in two large multiplex families, highlighting the utility of such a framework to identify putative ASD genes of dominant effect.

The research model employed here to phenotype rare large multiplex families reveals a pattern consistent with autosomal dominant inheritance of ASD/BAP traits that would not have been captured without deep phenotyping. Fifteen individuals (23%) met criteria for ASD and 33 (51%) the BAP, including some married-in individuals. Our promising endophenotype analysis provides further insight into specific profiles of the BAP and its varied presentation. Traditionally, ASD family studies include 2–3 affected individuals [36,37]. For example, four candidate ASD genes were identified in seven ASD/BAP pedigrees with ≥ 3 affected individuals [38]. Larger multiplex families remain scarce in the literature [30,39]. Here, we identified more subtle indicators of carrier status in two large families, using a robust endophenotyping method with good sensitivity and specificity to detect the BAP in two independent samples.

The BAP is strongly associated with ASD and may be considered a marker of genes that contribute to ASD risk [31,40]. Here we delineated the BAP into distinct endophenotypes that fulfil Gottesman and Gould's criteria for a true endophenotype (Supplementary Table S1) [41]. This includes recently proposed revisions to account for the strong overlap in the complex aetiology and genetic liability underlying the spectrum of trait expression across many neurodevelopmental and psychiatric conditions, with the expectation that putative endophenotypes may not be strictly disorder-specific [34]. Each endophenotype cluster was characterised by a combination of communication, personality and behavioural indicators showing how specific traits across the traditional BAP domains may group together to form distinct endophenotypes or 'profiles' that capture phenotypic variation within and between families. As summarised in Table 4, these profiles capture identifiable 'personas' that have core characteristics with high face validity. These profiles also vary with functional correlates in distinct ways, supporting their construct validity. For example, the aloof endophenotype was characterised by a lack of innate social motivation or ability to meaningfully connect and empathise with others, associated with decreased theory of mind, and lower executive and intellectual functioning. One individual dominant for the aloof endophenotype described social interactions as "a means to an end". In contrast, the pedantic endophenotype was primarily characterised by detail-oriented traits, showing no associations with intellectual, executive or adaptive functions. Unsurprisingly, given the importance of social communication deficits in ASD and the BAP, all endophenotypes were associated with poor social communication, with the socially unaware endophenotype most broadly affected across social, intellectual, executive and adaptive functions (Table 4).


**Table 4.** Summary of the BAP endophenotypes and their functional correlates.

In contrast to conventional approaches that commonly rely on broad and non-specific classification of the BAP, the endophenotype framework implemented in the present study reliably identifies individuals that meet threshold criteria for a specific endophenotype profile potentially linked to susceptibility genes that confer risk for ASD. Clustering traits across conventional BAP domains to achieve granular characterisation of endophenotype profiles may thus improve detection of the BAP to facilitate gene discovery. We propose that such a framework is necessary to increase the efficacy of assessment protocols. It may also allow for a more sophisticated mapping of psychological and neural correlates to delineate the neurobiology of ASD, which has been characterised by inconsistencies in the literature to date [42–46].

Phenotypic heterogeneity was evident in both families at the endophenotype level, suggesting that a single familial mutation may produce a phenotypic spectrum, with other genetic, epigenetic and environmental factors potentially influencing expression. With the advancement of high-throughput next generation sequencing technologies, meticulous phenotypic characterisation of both affected and apparently unaffected individuals remains essential for accurate data interpretation. In other words, identification of subtle endophenotypes, such as the four identified here, is crucial for advancing gene discovery programs. Importantly, these proposed BAP endophenotypes should be replicated and validated in subsequent research and further endophenotypes sought through a deep phenotyping approach.

Although large multiplex families with ASD are expected to be more genetically homogeneous, our phenotyping analysis suggests possible bi-lineal inheritance of the BAP in both families. Therefore, multiple risk alleles may contribute to ASD/BAP in later generations, consistent with recent genetic and phenotyping evidence [47,48]. The importance of unique de novo genetic changes in both sporadic (or 'simplex') ASD [25], and small multiplex ASD families [39] has become increasingly apparent. However, with at least seven individuals with ASD and many more with the BAP in our families, there is less likelihood of de novo changes contributing to each phenotype. It is much more likely that there is a single genetic variant of major phenotypic effect in each family, with the possibility that there are additional de novo genetic changes in some individuals that contribute to phenotypic severity.

The intensive nature of the study meant that clinicians were not blinded to family relationships, potentially leading to investigator bias. However, our diagnostic method of using the independent assessments of experienced clinicians and subsequent consensus agreement aligns with current best practice for ASD/BAP diagnosis. This method was further strengthened by objective cognitive and behavioural testing using quantitative measures. We selected a relatively low threshold for BAP classification, leading to the identification of many affected individuals. However, this approach is justified in a family with a clear genetic liability for ASD and was validated by the finding of consistent data-driven endophenotypes in the small families. Successful gene identification in future work requires capture of all individuals who may carry the putative variant, with the approach outlined here designed to enable more robust gene identification work. Future genetic investigations are required to test the reproducibility of the four identified endophenotypes over time and in additional independent samples, as well as determine their rate of occurrence in affected families compared to the general population (see Supplementary Materials for materials and guidelines on assessment and scoring of the four endophenotypes).

#### **4. Materials and Methods**

#### *4.1. Large Multiplex Families*

Large multiplex families were primarily ascertained from the Barwon Autism Database as part of a broader Collaborative Autism Study [49]. For inclusion as a multiplex family, >8 individuals with a diagnosis or suspected diagnosis of ASD or the BAP were required. The two fully characterised large multiplex families used to examine inheritance patterns using BAP endophenotypes are referred to as 'Family A', ascertained from the Barwon Autism Database, and 'Family B', who were self-referred. All available relatives were recruited, including those with and without reported BAP traits. The study was approved by the appropriate institutional review boards including the Human Research Ethics Committees of Barwon Health (HREC 02/34 and 04/57; reviewed 9 August 2017) and The Royal Children's Hospital (HREC 25043Y; reviewed 26 September 2017), Australia. Informed consent was obtained from all participants or a parent/guardian, and all study methods were carried out in accordance with relevant guidelines and regulations.

#### *4.2. Protocol for Diagnosing ASD in Large Multiplex Families*

ASD diagnoses were confirmed using the Autism Diagnostic Observation Schedule—Generic [50] (ADOS-G), or the Autism Diagnostic Interview—Revised [51] (ADI-R), based on Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR) criteria [52]. For adults, the structured Family History Interview [5] (FHI) was administered by NJB, while for adolescents, a detailed developmental and medical history was obtained. Quantitative measures of intellect, executive functioning, adaptive behaviour and social functioning were also completed (Table 5). Testing was undertaken over a number of days to minimise fatigue effects. A physical examination was conducted for dysmorphic and neurocutaneous features and growth parameters. Standard genetic testing (karyotype, fragile X testing) and metabolic investigations were performed on probands.


**Table 5.** Protocol for diagnosing ASD and phenotyping the Broader Autism Phenotype (BAP) in large multiplex families.

ASD = Autism Spectrum Disorder, ADI-R = Autism Diagnostic Interview-Revised, ADOS-G = Autism Diagnostic Observation Schedule-Generic, DSM-IV = Diagnostic and Statistical Manual of Mental Disorders (4th edition). + all individuals completed this assessment, ± only some individuals completed this assessment, − no individuals completed this assessment. <sup>a</sup> Wechsler Abbreviated Scale of Intelligence and subtests of the Delis–Kaplan Executive Function System. <sup>b</sup> The Adaptive Behavioural Assessment System (2nd edition) and The Behavioural Rating Inventory of Executive Function.

Across Family A and B, 65 individuals were recruited: 16 children (2–12 years), 9 adolescents (13–17 years) and 40 adults (18–79 years) spanning 4 generations. Of these, 16/65 met criteria for a diagnosis of ASD. Family B also reported a deceased family member who had a diagnosis of ASD (not shown on the pedigree to preserve anonymity), an additional family member with ASD who was not recruited, and a child who had been diagnosed with ASD but was not assessed. Scrambled pedigrees of affected status are presented to preserve participant anonymity (Figure 3). In each family, individuals directly related to a matriarch are classified as 'core family'; others are referred to as 'married-in'. Family A comprised 30 individuals, including 7 diagnosed with ASD (6/9 children, 1/6 adults; Figure 3a). Nineteen were core family; 11 were married-in. In Family B, the matriarch (II-2) was elderly and too unwell to be behaviourally assessed and subsequently passed away during the study. Three children and three adolescents participated in a limited range of phenotyping activities and as such, these individuals were excluded from final analyses. Nine participants had ASD (5/7 children, 1/5 adolescents, 3/22 adults; Figure 3b); 31 were core family, and four were married-in.

**Figure 3.** Scrambled pedigrees for Family A (Panel **a**) and Family B (Panel **b**) at recruitment. Individuals with a diagnosis of ASD are marked in black, and individuals recruited from the broader families are marked in yellow. White diamonds are individuals who were not assessed but are represented here to preserve the pedigree lines. One individual in Family B with a diagnosis of ASD was deceased (not shown to preserve anonymity) and one was too young to be assessed. The arrow indicates the proband shown in this pedigree.

## *4.3. Protocol for Phenotyping the BAP in Large Multiplex Families*

We employed a mixed methods approach to rigorously assess the BAP, including an evaluation of general intellect, executive functions, adaptive behaviour, social cognition and language pragmatics (Table 5). A purpose-developed semi-structured interview, the Broader Autism Phenotype Interview (BAPI), was also administered by three clinicians with expertise in neurobehavioural disorders (NJB, SJW, IES) to all individuals ≥ 13 years to characterise the presence, nature and extent of BAP traits. Interview responses were independently rated by the clinicians on all traits, with consensus agreement used to determine the presence of the BAP. Questions focused on the participant's life story, personal qualities, relationships, social functioning, and developmental, medical, psychiatric and vocational history. During the interview we included one or two "intentional errors" to elicit pragmatic elements of the BAP, such as terse speech [10]. Quantitative measures of social cognition and language pragmatics were also included within the interview. Social cognition was assessed using an adapted Faux Pas Task [53,54] (FPT), involving the standardised administration of four faux pas stories and four control stories [55] (maximum score = 40, *M* = 37, *SD* = 4). Pragmatic language was assessed with the Goldman-Eisler Cartoon task [56], which has previously been used to assess overly detailed speech and longer pauses between words in the BAP [10]. This task measures discourse production by eliciting a description of an eight frame captionless cartoon, "The Cowboy Story", over three successive trials [57]. Control individuals show increased verbal fluency with successive trials compared with decreased fluency in individuals with communication deficits [56]. Following the interview, the Pragmatic Rating Scale (PRS) [10] was independently completed by the three clinicians and consensus ratings reached. A score ≥ 4 defined pragmatic impairment [11] (Table 5).

Other cognitive domains and adaptive behaviour were assessed by a member of the team trained in psychometric assessment (TV) on a separate testing occasion. This included estimating full scale (FSIQ), verbal (VIQ) and performance (PIQ) intelligence quotients, derived with the four subtest Wechsler Abbreviated Scale of Intelligence [58] (WASI; *M* = 100, *SD* = 15). Executive functions were measured with seven subtests of the Delis–Kaplan Executive Function System [59] (D-KEFS; *M* = 10, *SD* = 3). The second edition of the Adaptive Behavioural Assessment System [60] (ABAS-II) and the Behavioural Rating Inventory of Executive Function [61] (BRIEF) were used to assess adaptive functioning. The quantitative assessment provided a measure of convergent validity for the BAPI. At the completion of testing, final review of all qualitative and quantitative data by the three clinicians was used to confirm BAP status based on consensus agreement.

#### *4.4. Small Families*

We recruited an independent sample of 45 individuals from 20 small families with at least one member diagnosed with ASD, through advertisements and from the Barwon Autism Database. All participants provided written informed consent, as described above. Inclusion criteria were: (i) no diagnosis of ASD (based on DSM-IV or DSM-V criteria), (ii) ≥1 family member with ASD (based on DSM-IV or DSM-V criteria), and (iii) >12 years of age. Individuals were classified as having the BAP if they met ≥2 criteria for a BAP diagnosis on the Broader Autism Phenotype Rating Scale [5] (BAPRS) administered by an independent ASD expert (CG). Individuals were classified as unaffected if they did not meet criteria for any BAP traits or a diagnosis of ASD. This identified 30 individuals with the BAP (4 adolescents, 26 adults) in the 20 families, ranging in age from 14–71 years, and 11 unaffected adult family members ranging in age from 18–53 years. Four adult individuals showed only one BAP trait on the BAPRS and were thus excluded from analyses based on the above criteria. All 45 individuals were also administered the BAPI by the independent ASD expert (CG) to characterise and rate their BAP traits. To ensure inter-rater agreement, video recordings of the interviews of a subset of these individuals were independently rated by two of the three clinicians (SJW, IES), with ratings confirmed by consensus agreement.

In 31/45 individuals, the WASI-II [62] was administered to measure Verbal Comprehension (VCI) and Perceptual Reasoning (PRI) and to estimate FSIQ. All individuals were within the normal range based on FSIQ (Table 6), with no significant differences between unaffected and BAP individuals for age or intellect (all *p* > 0.250). For 27 individuals, average total scores were available for the Broader Autism Phenotype Questionnaire (BAPQ) collected as part of a separate study. Consistent with expectations, there was a trend for higher scores on the BAPQ in the BAP group, with a medium effect size (*t* (24.56) = −1.96, *p* = 0.062, *d* = 0.70).
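The fractional degrees of freedom reported for the BAPQ comparison (24.56) are what an unequal-variance (Welch) *t*-test produces, although the text does not name the test. As a minimal sketch under that assumption, the statistic, its Welch–Satterthwaite degrees of freedom and a pooled-SD Cohen's *d* could be computed as follows. The function names and the scores are hypothetical and are not study data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t, df) for an unequal-variance (Welch) t-test."""
    # Squared standard error contributed by each group.
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; usually not an integer.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df


def cohens_d(group_a, group_b):
    """Signed Cohen's d using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_sd = sqrt(
        ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b))
        / (n_a + n_b - 2)
    )
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical questionnaire scores for illustration only.
unaffected = [2.1, 2.4, 2.0, 2.6, 2.3]
bap = [2.5, 3.1, 2.2, 3.4, 2.9, 2.7]
t_stat, dof = welch_t(unaffected, bap)
effect = cohens_d(unaffected, bap)
```

A negative *t* here simply reflects the order of subtraction (unaffected minus BAP), which matches the sign convention of the reported statistic.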


**Table 6.** Demographics of the small families sample.

BAPQ = Broader Autism Phenotype Questionnaire, FSIQ = Full Scale Intelligence Quotient, VCI = Verbal Comprehension Index, PRI = Perceptual Reasoning Index. <sup>a</sup> Data available for unaffected (*n* = 9) and BAP (*n* = 18). <sup>b</sup> Data available for unaffected (*n* = 10) and BAP (*n* = 21).

#### *4.5. Endophenotyping Procedure*

We used an iterative process to characterise, refine and assess endophenotypes of the BAP in the two independent samples, as summarised in Figure 4.

**Figure 4.** Iterative process used to identify and assess BAP endophenotypes. Step 1: An exhaustive list of BAP traits was generated through detailed clinical assessment of a number of large multiplex families, resulting in the identification of a final set of 33 BAP traits. Step 2: Validation analyses in an independent and aggregated sample of small families resulted in a 4-cluster solution representing distinct BAP endophenotypes. Step 3: Cut-off scores modelled in Step 2 were independently applied to large multiplex Family A and B for endophenotype classification and assessment of inheritance patterns.

#### 4.5.1. Step 1: Identification of Potential BAP Endophenotypes in Large Multiplex Families

Using a grounded theory approach, BAP traits were initially identified from a detailed literature review targeting the theoretical domains described in the seminal work of Bolton (1994), on which the conceptualisation of the BAP is largely based. The domains included speech, literacy, pragmatics, relationships, and circumscribed interests, which were explored in-depth using our BAP phenotyping protocol (described above) in members of a number of unrelated large multiplex families primarily ascertained through the Collaborative Autism Study [55]. This in-depth characterisation was phenomenologically based [63], whereby the number of traits within each domain was fully expanded through administration of the semi-structured interview (BAPI) with separate family members until no further traits were identified (saturation) to capture the entire range of BAP traits (Supplementary Table S2).

This deep phenotyping produced an exhaustive list of 36 BAP traits. Ordinal ratings of these traits were then assigned to capture subtle variations in their presentation, with severity rated on a scale of 0 = absent, 1 = mild, 2 = moderate, and 3 = severe. The presence of traits through each individual's developmental history was also evaluated where available. Exploratory hierarchical cluster analysis was then performed to identify potential BAP endophenotypes. We used Ward's method with Euclidean squared distances based on z-scores to progressively group traits by minimising the variability within clusters and maximising the variance between clusters [64]. Interpretation of cluster groupings was informed by the relative similarity and dissimilarity in the linkage output combined with clinical judgement, leading to the initial identification of five endophenotypes. Inspection of these endophenotypes revealed a consistent rating of 0 for two of the 36 traits across all interviews, leading to their removal. One further trait reflecting inflexibility to intentional errors was removed due to challenges in reliably assessing the trait across interviewers, resulting in a final set of 33 BAP traits (Supplementary Figure S1).
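The clustering step described above can be illustrated with a small sketch. It assumes a ratings matrix with one row per individual and one column per trait (ordinal values 0–3), standardises each trait to z-scores, and then repeatedly merges the two trait clusters whose union gives the smallest increase in within-cluster sum of squares, which is Ward's criterion on squared Euclidean distances. The software used by the authors is not stated, so the function names, the naive merge loop and the toy ratings below are illustrative assumptions only.

```python
from statistics import mean, pstdev


def zscore_traits(ratings):
    """Return one z-scored vector per trait (column) of the ratings matrix."""
    z_vectors = []
    for column in zip(*ratings):
        mu, sd = mean(column), pstdev(column)
        # A trait rated identically for everyone has no variance: use zeros.
        z_vectors.append([(x - mu) / sd if sd > 0 else 0.0 for x in column])
    return z_vectors


def centroid(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [mean(dim) for dim in zip(*vectors)]


def ward_cost(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares if the two clusters merge."""
    ca, cb = centroid(cluster_a), centroid(cluster_b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    n_a, n_b = len(cluster_a), len(cluster_b)
    return n_a * n_b / (n_a + n_b) * sq_dist


def ward_cluster_traits(ratings, n_clusters):
    """Group trait indices into n_clusters using Ward's criterion."""
    vectors = zscore_traits(ratings)
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([vectors[t] for t in clusters[i]],
                                 [vectors[t] for t in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the cheapest pair
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy data: 6 individuals x 4 traits; traits 0-1 and traits 2-3 co-vary.
toy_ratings = [
    [0, 0, 3, 2],
    [1, 0, 3, 3],
    [3, 3, 0, 1],
    [2, 3, 1, 0],
    [0, 1, 2, 3],
    [3, 2, 0, 0],
]
trait_groups = ward_cluster_traits(toy_ratings, n_clusters=2)
```

In practice the number of clusters would not be fixed in advance as it is here; as the text notes, it was chosen by inspecting the linkage output together with clinical judgement.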

#### 4.5.2. Step 2: Validation of BAP Endophenotypes in Small Families

In the small families sample, an independent expert in ASD assessment (CG) interviewed and rated 45 participants on the 33 BAP traits based on all qualitative and quantitative data, with a subset (9%) rated via consensus between CG, IES and SJW to ensure consistency in ratings across both samples and to clarify borderline cases. As above, Ward's hierarchical cluster analysis was used to examine natural trait groupings. This led to the identification of four endophenotypes that showed a high degree of similarity to the initial five cluster solution (Spearman's *r* range = 0.710–0.976, *p* < 0.001).

To account for a varying number of traits in each cluster we computed proportional scores, whereby scores on each trait (range 0–3) were summed and divided by the maximum total score for that cluster, to produce four cluster scores for each individual. An ROC curve was plotted for each cluster in the small families sample to identify optimum cut-off scores for determining endophenotypic status using Youden's Index to allow mildly affected individuals to be included [65,66]. The highest score was used to represent the most prominent endophenotype for each individual, calculated as the difference between the observed endophenotype (i.e., cluster) score and the threshold score for the endophenotype (i.e., cut-off score).
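As a hedged illustration of the scoring and cut-off selection just described, the sketch below computes a proportional cluster score and then picks the cut-off that maximises Youden's index, J = sensitivity + specificity − 1. It assumes that a score at or above the cut-off counts as positive and that BAP status is the reference label; the function names and example values are hypothetical and do not reproduce the study's data or software.

```python
def proportional_score(trait_ratings, max_rating=3):
    """Sum of ratings in a cluster divided by the maximum possible total."""
    return sum(trait_ratings) / (max_rating * len(trait_ratings))


def youden_cutoff(scores, has_bap):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    A score >= cutoff is treated as positive for the endophenotype.
    """
    n_pos = sum(has_bap)
    n_neg = len(has_bap) - n_pos
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, has_bap) if y and s >= cutoff)
        true_neg = sum(1 for s, y in zip(scores, has_bap) if not y and s < cutoff)
        j = true_pos / n_pos + true_neg / n_neg - 1
        if j > best_j:  # on ties, the lowest cut-off is kept
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical cluster scores and reference BAP status for seven individuals.
example_scores = [0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
example_status = [False, False, False, True, False, True, True]
cutoff, j_index = youden_cutoff(example_scores, example_status)
```

Keeping the lowest cut-off on ties is a design choice made here so that mildly affected individuals are more likely to be included, in the spirit of the text; the original analysis may have resolved ties differently.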

#### 4.5.3. Step 3: Assessment of BAP Endophenotypes in Family A and B

A team member who had not been involved in the phenotyping of Family A and B (Step 1) performed the endophenotype analysis (KT). Proportional scores for the four endophenotypes were calculated, and family members were classified as having the endophenotype if their proportional score was greater than or equal to the cut-off scores identified in the small families analysis (Step 2). As above, the highest score (observed endophenotype score − threshold endophenotype score) for any endophenotype was used to represent an individual's most prominent endophenotype. A discriminant function analysis was then used to determine the sensitivity and specificity of the endophenotype approach to identifying the presence of the BAP in these families. In addition, endophenotype results were correlated with measures of intellect, executive, social and adaptive functions using conservative non-parametric Spearman's correlations (*rs*). Materials used for assessment and scoring of endophenotypes are available in Supplementary Materials.
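The classification rule in this step can be sketched as follows. The endophenotype labels, scores and cut-offs are placeholders, and returning no "most prominent" endophenotype when no cut-off is met is an assumption of this sketch rather than something stated in the text.

```python
def classify_endophenotypes(scores, cutoffs):
    """Apply per-endophenotype cut-offs to one individual's proportional scores.

    scores, cutoffs: dicts keyed by endophenotype label.
    Returns (present, most_prominent, margins).
    """
    # Margin = observed score minus threshold score for each endophenotype.
    margins = {name: scores[name] - cutoffs[name] for name in cutoffs}
    # An endophenotype is present when the score meets or exceeds its cut-off.
    present = [name for name in cutoffs if scores[name] >= cutoffs[name]]
    # Assumption: only report a most prominent endophenotype if one is present.
    most_prominent = max(present, key=margins.get) if present else None
    return present, most_prominent, margins


# Placeholder labels and values for illustration only.
person_scores = {"cluster_1": 0.50, "cluster_2": 0.20,
                 "cluster_3": 0.75, "cluster_4": 0.10}
step2_cutoffs = {"cluster_1": 0.25, "cluster_2": 0.25,
                 "cluster_3": 0.25, "cluster_4": 0.25}
present, most_prominent, margins = classify_endophenotypes(
    person_scores, step2_cutoffs
)
```

Using the margin rather than the raw proportional score means that endophenotypes with different cut-offs are compared on how far the individual exceeds each threshold.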

## **5. Conclusions**

Despite significant advances towards unravelling the genetic heterogeneity of ASD, the underlying genetic aetiology remains unsolved for the majority of cases, in part due to significant challenges in identifying endophenotypes and potential carriers. We used a rigorous phenotyping approach to characterise the BAP in two large multiplex families with dominant inheritance of ASD and the BAP. Deep phenotyping identified four endophenotypes, showing differentiation of BAP features beyond traditional domain approaches. The proposed endophenotype approach advances current understanding and characterisation of the phenotypic spectrum for improved detection of the BAP that may facilitate gene discovery.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1422-0067/21/21/7965/s1.

**Author Contributions:** K.J.T. contributed to data analysis and draft manuscript; N.J.B. contributed phenotyping, data analysis and the draft manuscript; C.C.G. contributed to phenotyping and data analysis; P.J.L. and M.B. contributed to project conception; T.D., T.V. and V.A. contributed to phenotyping; E.P.K.P. contributed to data interpretation, data curation, visualisation, the draft manuscript; M.B.D. contributed to project coordination and data analysis; I.E.S. contributed to project conception and co-ordination, phenotyping and the draft manuscript; S.J.W. contributed to project co-ordination, phenotyping, data analysis and the draft manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This project received financial support from the National Health and Medical Research Council of Australia (Grant Numbers: 490037, 566759, 1044175, 1098255), the Australian Research Council (Grant Number: FT100100764), the Jack Brockhoff Foundation, Pfizer Australia, the Percy Baxter Charitable Trust, Perpetual Trustees, the Murdoch Children's Research Institute, and The University of Melbourne. PJL is supported by an NHMRC Career Development Fellowship (GNT1032364). This work was made possible through Victorian State Government Operational Infrastructure Support and Australian Government NHMRC IRIISS. The funding bodies played no role in the study design, in data collection, analysis or interpretation, or in the writing of this manuscript.

**Acknowledgments:** We wish to acknowledge the late Peter Hewson, who was the founding father of the project.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Data Availability:** All data is available within the manuscript and its Supporting Information, although scrambled pedigrees have been used to preserve participant anonymity.

## **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## **An Overview of the Main Genetic, Epigenetic and Environmental Factors Involved in Autism Spectrum Disorder Focusing on Synaptic Activity**

**Elena Masini <sup>1,†</sup>, Eleonora Loi <sup>1,†</sup>, Ana Florencia Vega-Benedetti <sup>1</sup>, Marinella Carta <sup>2</sup>, Giuseppe Doneddu <sup>3</sup>, Roberta Fadda <sup>4</sup> and Patrizia Zavattari <sup>1,</sup>\***


Received: 30 September 2020; Accepted: 30 October 2020; Published: 5 November 2020

**Abstract:** Autism spectrum disorder (ASD) is a neurodevelopmental disorder that affects social interaction and communication, with restricted interests, activity and behaviors. ASD is highly familial, indicating that genetic background strongly contributes to the development of this condition. However, only a fraction of the total number of genes thought to be associated with the condition have been discovered. Moreover, other factors may play an important role in ASD onset. In fact, it has been shown that parental conditions and in utero and perinatal factors may contribute to ASD etiology. More recently, epigenetic changes, including DNA methylation and microRNA alterations, have been associated with ASD and proposed as potential biomarkers. This review aims to provide a summary of the literature regarding ASD candidate genes, mainly focusing on synapse formation and functionality and relevant epigenetic and environmental aspects acting in concert to determine ASD onset.

**Keywords:** autism spectrum disorder; ASD; genetic factors; epigenetic factors; environmental factors; pervasive developmental disorder; post-synaptic density; CNV; SNP; gene fusion

## **1. Introduction**

Autism is a complex syndrome characterized by a range of conditions and symptoms, including relevant physiological and biochemical ones, that frame it as a spectrum of disorders (autism spectrum disorder, ASD) whose core symptoms include social deficits and restrictive/repetitive behaviors. In the 1960s, thanks to the studies of Bernard Rimland, it was understood that ASD is a psychiatric disorder which might be grounded in a combination of genetic and environmental factors. Coping with ASD requires multidisciplinary biomedical and behavioral therapies to achieve positive results. Early diagnosis and intensive therapeutic interventions greatly improve the disease outcomes. In this context, the discovery of new diagnostic methods to detect ASD-related genetic alterations and biomarkers becomes fundamental for making an early diagnosis of the disorder.

In this review, we provide an overview of the genetic, epigenetic and environmental factors contributing to ASD pathogenesis.

## **2. Autism**

#### *2.1. Clinical Characteristics of ASD*

ASD can be considered as a group of early-onset neurodevelopmental disorders which seem to be at the basis of alterations in brain connectivity, with cascading effects on many neuropsychological functions [1,2]. The Diagnostic and Statistical Manual of Mental Disorders (DSM-5) (The American Psychiatric Association, APA, Philadelphia, PA, USA, 2013) gives the condition of autism the attribute of "spectrum" and uses criteria derived from diagnostic research assessment tools.

As indicated in the DSM-5 (APA, Philadelphia, PA, USA, 2013), individuals with ASD are characterized by persistent deficits in social communication and social interaction across multiple contexts and by restricted, repetitive patterns of behaviour, interests or activities. Deficits in social communication and social interaction might appear as deficits in social-emotional reciprocity, in nonverbal communicative behaviours used for social interaction, and in developing, maintaining and understanding age-appropriate relationships. As reported in the DSM-5, symptoms could be masked during early development and fully manifest only when social demands exceed limited capacities, or may be hidden by learned strategies in later life. The symptoms should cause clinically significant impairment in social, occupational or other important areas of current functioning.

ASD symptoms should not be better explained by a diagnosis of intellectual disability (ID) or global developmental delay. Intellectual disability and ASD might co-occur.

The DSM-5 proposes differentiations based on comorbidity with intellectual impairment, language impairment, another neurodevelopmental, mental, or behavioral disorder, genetic or medical condition or environmental factors. Furthermore, it is possible to differentiate between different levels of severity according to the level of support required to function in daily contexts. According to this description, it is clear that different clinical variants of ASD exist and should be taken into account for diagnosis and intervention (APA, 2013). A distinction is made between a congenital form of ASD, representing a small percentage of cases in which the symptoms occur shortly after birth and in which the genetic fingerprint is prevalent, and a regressive or acquired form, in which the disorder appears after a period of typical development and is not characterised by typical and constant genetic abnormalities, although several single nucleotide polymorphisms (SNPs) have been associated with the disease [3,4]. SNPs are variations of a single nucleotide at a specific position in the DNA sequence. SNPs associated with ASD have been identified in genes encoding proteins involved in different processes, including cellular detoxification, some neuronal receptors and the metabolism of several neurotransmitters and metabolites, in particular those of the metabolic circuits of methylation and transsulfuration [5,6].

#### *2.2. Epidemiology*

Autism was considered relatively rare for many years, with a prevalence of less than 1 in 1000 children, while today the estimated rate is 1 in 160, and it seems likely to increase in the coming years (World Health Organization, Geneva, Switzerland, 2019). In the last decade, the study of ASD genetics has proved to be crucial not only to interpret and explain its phenotypic heterogeneity but also to discover new diagnostic procedures and therapies. It is estimated to date that hundreds of genes are involved in ASD, resulting in a unified spectrum of different phenotypes, including different language and social deficits with various associated sub-phenotypes [7].

ASD shows an unequal distribution based on gender: males have a four times higher risk of developing the disorder than females. A number of hypotheses have been proposed to explain this unequal prevalence. In females, a higher dose of genetic "defect" is required than in males, consistent with the hypothesis that protective genetic factors contribute in females: males have only one X chromosome, so when alterations appear, they cannot be compensated by a normal second X chromosome. An association between testosterone levels and ASD risk has also been described: males have more frequent inflammatory reactions and use the brain in a more focal way and therefore suffer more from alterations in neuronal development that affect the connection systems between the different areas [8].

Genetic, environmental and developmental factors play a key role in the onset of autism spectrum disorders, as highlighted by many epidemiological studies [9,10]. It is unlikely that a single condition or event plays a major role in the causality of ASD; rather, based on research to date, none of the risk factors identified is a necessary and sufficient condition for ASD. Even for syndromic or secondary autism, which refers to autism with a single defined cause, such as fragile X syndrome and tuberous sclerosis, none of these etiologies is specific to autism because each of them encompasses a variable proportion of individuals with and without autism [11]. At present, ASD appears to have a multifactorial etiology to which developmental (in utero and early childhood), environmental and genetic aspects contribute, in as-yet unknown and different ways. Emerging methodologies in genomics and epigenomics research could be the key to elucidating the mysteries underlying the epidemiology of autism spectrum disorder.

## **3. Genetic and Epigenetic Factors**

The involvement of genetic etiology in ASD was first suggested by a study on twins reported in the 1970s. The genetic heritability of a trait can be estimated by comparing the phenotypic concordance between monozygotic twins (MZ), which have 100% genetic similarity, and dizygotic twins (DZ), who have approximately 50% genetic similarity. The greater the difference between the concordance of MZ twins and DZ twins, the higher the genetic heritability and the contribution of genetics to that trait. The genetic fingerprint was confirmed by the high concordance of autism in monozygotic twins (60–90%) compared to dizygotic twins (5–40%) [12,13].
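The logic of comparing MZ and DZ twins is often summarised with Falconer's formula, which doubles the difference between the MZ and DZ twin correlations to approximate heritability. The sketch below is a simplification offered for illustration only: it takes twin correlations (for example, correlations in liability) rather than raw concordance rates as input, the cited studies may have relied on more elaborate model fitting, and the function name and example values are hypothetical.

```python
def falconer_ace(r_mz, r_dz):
    """Approximate variance components from twin correlations.

    a2: additive genetic share (heritability), 2 * (r_mz - r_dz)
    c2: shared-environment share, 2 * r_dz - r_mz
    e2: non-shared environment share, 1 - r_mz
    """
    a2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    e2 = 1 - r_mz
    return a2, c2, e2


# Hypothetical twin correlations, not values from the cited studies.
a2, c2, e2 = falconer_ace(r_mz=0.8, r_dz=0.5)
```

The three components sum to 1 by construction, and a larger gap between the MZ and DZ correlations yields a larger heritability estimate, as stated in the paragraph above.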

One of the largest studies to date, involving more than two million children born between 1982 and 2006 in Sweden, concluded that ASD has a heritability of 45–56% [14]. A study of all twins born in the UK between 1994 and 1996 estimated a heritability of more than 56% using concordance in MZ (0.77–0.99) compared to DZ (0.22–0.65) twins [13].

Considering all lines of evidence, genetic factors are estimated to play a fundamental role in ASD onset, together with environmental and epigenetic factors. However, although most studies aimed at understanding the etiological basis of ASD have focused on the genetic component, it has been possible to associate genetic variants with only a relatively small fraction of ASD patients. This problem, known as the "missing heritability issue", is common to most complex genetic diseases. Several hypotheses have been put forward to explain the missing heritability, such as the existence of poorly characterised variants, genotype/genotype interactions, incomplete penetrance, epigenetic factors and genotype/environment interactions [15–17].

In the 1990s, most of the research consisted of candidate gene studies, focusing on a particular gene that might be involved in ASD. From 2005, technologies such as whole-exome sequencing (WES), but also microarrays, have allowed genome-wide studies. These led to the identification in autistic patients of different copy number variations (CNVs), which are DNA segments larger than 1 kilobase present in a variable copy number compared to a reference genome, and of single nucleotide variations (SNVs), suggesting a highly heterogeneous genetic architecture. CNVs are thought to account for about 15%, and SNVs for about 7%, of the causes of ASD. Only a few of these genetic alterations have such complete penetrance that they are associated with ASD in almost every person who carries that variant [18]. On the contrary, genetic alterations with incomplete penetrance, variable expressivity or both are more frequently observed. However, the cause in most ASDs (>75%) remains elusive [18]. Genome-wide association studies (GWAS) have identified several SNPs associated with ASD. The first GWAS allowed the identification of six polymorphisms, including some localized in *CDH10* and *CDH9*, genes encoding cadherins, proteins that are important in cell adhesion, as common genetic variants in ASD [19]. However, the fact that several GWASs failed to identify some relevant loci despite the use of genetic data from more than 1000 families affected by autism suggests that the effect of individual common variants is relatively small [20].

On the other hand, WES analysis of affected and unaffected individuals has proven to be a powerful approach that offers new opportunities for studying sporadic cases and can detect mutations and de novo variants with incomplete penetrance [21]. WES has already led to the identification of over 150 new candidate genes for ASD.

In addition to SNPs, evidence is accumulating that CNVs play an important role in human neuropsychiatric diseases. ASD patients have three to five times more de novo CNVs than other family members and unaffected controls [22,23].

CNVs can influence gene expression, thus contributing to the pathogenesis of the disease through various mechanisms, including gene dosage, gene interruption, position effects, gene fusion and unmasking of recessive alleles or polymorphisms [24].

Screening for CNVs has proven to be a method of choice for identifying genes associated with ASD susceptibility [25]. Although CNVs associated with the disease are usually unique and show a low frequency in the population, they are identified in 8–21% of individuals with ASD and are most likely related to a severe clinical picture [26,27]. In addition, previous studies have indicated that individuals with syndromic ASD and intellectual disability have more pathogenic CNVs than individuals with non-syndromic ASD or ID [26,28].

CNVs may also lead to the generation of chimeric genes. Several studies have investigated whether fusion transcripts may lead to an increased ASD susceptibility. Holt and colleagues identified a fusion transcript involving *MAPKAPK5* and *ACAD10* genes in two ASD probands. However, the fusion transcript was detected at similar rates in both ASD patients and controls and had a premature stop codon, suggesting that it may be degraded by nonsense-mediated decay [29]. Similarly, Pagnamenta et al. identified a *DOCK4*-*IMMP2L* fusion transcript, likely to be subjected to nonsense-mediated decay, in ASD individuals and their unaffected family members [30]. A study conducted on a multiplex family identified a *BST1*-*CD38* fusion transcript in one ASD proband with asthma, suggesting that it may be related to the more severe phenotype of this patient compared to the other ASD sibling [31]. qRT-PCR analysis showed that the fusion transcript was expressed at lower levels than the wild-type *BST1* transcript in the lymphoblastoid cell line derived from the proband, while the aberrant protein was not detected in a preliminary Western blot analysis [31]. Recently, our research group identified a microdeletion leading to the formation of an *ELMOD3*-*SH2D6* chimeric transcript in a multiplex ASD family [32]. *SH2D6* is expressed at extremely low levels in blood cells. On the other hand, the fusion transcript was highly expressed in PBMCs from the two ASD siblings and their unaffected mother carrying the deletion, suggesting that it was not subjected to nonsense-mediated decay. Bioinformatic analysis has shown that the fusion transcript would encode a chimeric protein with an interrupted domain of ELMOD3 and would not contain the canonical SH2D6 sequence, suggesting an impaired function of the protein [32]. These results suggest a possible contribution of fusion transcripts to the complex ASD phenotype. Therefore, in cases of copy number loss, the possible fusion transcript and chimeric protein product should be investigated in depth.

Recent studies have shown that epigenetic factors, including DNA methylation, histone modifications and microRNAs (miRNAs), could play an important role in predisposition to autism.

We herein provide an overview of the main candidate genes (extensively reviewed in [33]) and epigenetic mechanisms involved in ASD etiology. Figure 1 summarizes some ASD candidate genes and epigenetic factors, belonging to the pathways mainly associated with ASD described in this review.

**Figure 1.** Candidate genes and epigenetic factors representative of the main processes involved in autism spectrum disorder (ASD) development. The illustration shows a synapse between neurons (presynaptic cell in violet and postsynaptic cell in green). On the bottom-left, a cell body of a neuron including different nuclear and cytoplasmic mechanisms involved in ASD. In the nucleus, several processes implicated in gene expression regulation are shown: (**1**) chromatin packaging and factors involved in chromatin remodeling; (**2**) gene transcription regulated by transcription factors; (**3**) DNA methylation at promoter region associated with transcription inhibition of target genes; (**4**) alternative splicing and mRNA export to the cytoplasm. In the cytoplasm, the following mechanisms are shown: (**5**) regulation of protein translation by the CYFIP1-EIF4E-FMR1 complex; (**6**) post-transcriptional regulation by miRNA; (**7**) protein ubiquitination and degradation by proteasome. On the right, the synapse architecture and functionality mechanisms associated with ASD. In the presynaptic cell, (**8**) TSC proteins and co-chaperones. (**9**) The neurexin/neuroligin transsynaptic complex and (**10**) the voltage-gated ion channels are represented. In the postsynaptic cell, (**11**) actin filaments, capping proteins and scaffold proteins; (**12**) some members of PI3K/AKT pathway, RAS signal transduction pathway and MET receptor tyrosine kinase pathway. Chromatin remodelers are indicated in beige, transcription factors in pink, proteins involved in RNA binding and export in light blue, protein ubiquitination in purple, scaffold proteins in red, cell growth and proliferation proteins in green and their related pathway members in grey. A more comprehensive list of ASD candidate genes can be found in Table 1 and throughout the text. Figure created using BioRender.com images.

#### *3.1. Relevant Candidate Genes*

Case-control population studies and animal models have pointed to more than 800 genes associated with autism. The most affected genes in ASD encode proteins involved in chromatin remodeling and transcriptional regulation, cell proliferation and, above all, synaptic architecture and functionality. In this review, we will focus on this last category since our recent studies have also pointed out alterations in these genes, which are fundamental for proper synaptic function. Table 1 provides a summary of several genes clearly implicated in ASD, included in the SFARI Gene database as high confidence ASD genes (release 26 October 2020, gene.sfari.org) belonging to the other categories. Most of these genes were indicated in the largest exome sequencing study of ASD to date [34], as well as in the list narrowing down the number of amygdala-expressed genes associated with the social pathophysiology of ASD [35].


**Table 1.** Several relevant ASD candidate genes.

Prepared by the authors with data from gene.sfari.org (release 26 October 2020).

### Synaptic Architecture and Functionality

It is not surprising that many candidate ASD genes are involved in synaptic architecture and function, which allows the transmission of information between neurons and between neurons and other cells, such as muscle, sensory and other cells. Many ASD candidate genes are involved in dendritic spine formation. Dendritic spines are small actin-rich protrusions that form the postsynaptic part of most excitatory synapses. Remodeling of the actin cytoskeleton is responsible for changes in the shape and size of dendritic spines and, consequently, in synaptic function [36]. Actin regulation mechanisms govern the formation, maturation and plasticity of dendritic spines and of neuronal processes, such as learning and memory [36]. Abnormalities in the number and shape of dendritic spines have been observed in several neurological disorders, including autism, and contribute to brain dysfunction [37].

Post-synaptic density proteins (PSD), including cell adhesion molecules, scaffold proteins, receptors and cytoskeleton proteins, are fundamental for synaptic transmission and plasticity. Alterations of these proteins have been associated with many neurological disorders, including ASD [38].

• Cell adhesion molecules

Neurexins (NRXN) and neuroligins (NLGN) are transmembrane synaptic proteins that form the neurexin/neuroligin transsynaptic complex, crucial for synaptic function [39]. NLGNs bind to SHANK3 through PSD-95 and other synaptic proteins.

Loss-of-function *NRXN1* variants in ASD individuals have been identified in multiple studies [23,40,41]. Studies conducted in knockout (KO) animal models for *NLGN* and *NRXN* family members showed that mice develop ASD-like symptoms and confirmed the role of these genes in synaptic function [42–44].

*CNTNAP2*, also known as *CASPR2*, encodes for a member of the NRXN family that serves as an adhesion protein, primarily between neuronal and glial cells. CNVs encompassing *CNTNAP2* and resulting in its decreased expression have been described in subjects with ASD [45–47]. The suppression of CNTNAP2 in murine models causes autistic behaviors, such as repetitive behaviors and reduced socialization and communication [48].

• Scaffold proteins

The SHANK gene family, including *SHANK1*, *SHANK2* and *SHANK3*, has been suggested as a strong candidate for ASD. SHANK proteins are multi-domain post-synaptic density scaffold proteins that connect neurotransmitter receptors, ion channels and other membrane proteins to the actin cytoskeleton and signaling proteins. These proteins are important for synapse formation and dendritic spine maturation [49]. Rare deletions of *SHANK2* and de novo variants causing loss of protein function have been identified in individuals with ASD [50]. A microdeletion encompassing *SHANK3* causes Phelan–McDermid syndrome, characterized by intellectual disability, ASD, severe speech disorders and epilepsy [51]. A meta-analysis of SHANK mutations found a low frequency of *SHANK1* and *SHANK2* deleterious mutations in contrast to the high frequency of loss-of-function *SHANK3* mutations in cases with ASD [52].

Our studies led to the identification of another promising ASD candidate gene, *CAPG*. This gene encodes for a member of the gelsolin family of actin-regulatory proteins, important for the remodeling of actin architecture. A microdeletion encompassing the entire *CAPG* gene has been recently described in three completely independent families in the heterozygous [53,54] and homozygous state [55]. Importantly, a reduced CAPG expression, both at transcriptional and protein levels, has been detected in the Sardinian family members carrying the deletion, and reduced *CAPG* mRNA levels have been also observed in an independent cohort of 13 non-Sardinian ASD cases compared to age-matched healthy controls [54].

Several studies have demonstrated the importance of CAPG for the formation of functional synapses. Experiments conducted on cultured hippocampal neurons have demonstrated that capping proteins are present at the branched actin filament network of dendritic spine heads and that they are fundamental for dendritic spine development [56,57]. In particular, *CAPG* knock-down led to a decline in spine density and to an increased number of filopodia-like protrusions [56].

• Voltage-gated ion channels

The role of genetic defects of different ion channels in the pathogenesis of ASD is well established. In fact, GWAS, WES and WGS have identified several polymorphisms and rare variants in calcium, sodium and potassium channels in ASD subjects (reviewed in [58]).

Point mutations in the *CACNA1C* gene, which encodes the L-type voltage-gated Ca<sup>2+</sup> channel Cav1.2, lead to Timothy syndrome (TS), a disorder affecting multiple organs and characterized by an autistic phenotype [59,60]. L-type channels are mainly expressed in neuronal dendrites and cell bodies and are crucial for the activation of Ca<sup>2+</sup>-signaling pathways and for neuronal excitability [61]. Defects in *CACNA1C* prevent the inactivation of the channel and lead to its prolonged opening and a consequent increase in Ca<sup>2+</sup> flux [60,62].

Mutations in other genes encoding L-type and T-type Ca<sup>2+</sup> channels, such as *CACNA1D*, *CACNA1E*, *CACNA1F* and *CACNA1H*, have been described in ASD [63–67]. Moreover, mutations of *CACNB2*, the gene encoding the regulatory β2 subunit of CACNA1C, have been found in ASD families [68].

Genetic defects in genes encoding sodium channels, such as *SCN1A*, *SCN2A*, *SCN3A*, *SCN7A* and *SCN8A*, have also been associated with ASD [69–73]. Voltage-dependent sodium channels are mainly expressed in neurons and glial cells and are fundamental for the initiation and propagation of action potentials. Mutations of *SCN1A* cause Dravet syndrome, which is characterized by seizures and frequently also presents with autistic symptoms. A study has shown that *Scn1a*+/− heterozygous KO mice display stereotypical and anxious behaviors in addition to seizures [74].

Several studies have also shown that mutations in genes encoding for K<sup>+</sup> channels, including *KCNMA1*, *KCND2*, *KCNJ10*, *KCNQ3* and *KCNQ5*, may play a central role in ASD etiology [75–78]. It has been shown that KO of *Fmr1* in mice results in K<sup>+</sup> channel dysregulation and consequent dysregulation of synaptic transmission [79].

Some evidence has shown that voltage-dependent anion channel (VDAC) genes, a class of postsynaptic density genes highly expressed in several brain regions, could be implicated in ASD. In fact, autoantibodies against VDAC proteins have been detected in autistic individuals, suggesting a possible causal role in ASD pathogenesis [80]. Moreover, the beneficial effects observed in ASD patients treated with coenzyme Q, or other agents influencing the transport of electrons, have been attributed to the control of such molecules on the porin channels [81]. Recently, a study conducted by our research group identified a 2-bp frameshift deletion of *VDAC3* in an ASD family [54].

#### *3.2. Epigenetic Factors*

There is increasing evidence supporting the possible role of epigenetic aberrations, including DNA methylation alterations and microRNAs, in ASD etiology.

• DNA methylation

Several studies have conducted global methylation analyses in peripheral tissues as well as post-mortem brain tissues from ASD subjects and controls.

Studies conducted on lymphoblastoid cell lines and whole-blood DNA from monozygotic twins discordant for ASD diagnosis and controls identified several differentially methylated regions (DMRs) between discordant MZ twins and between ASD patients and control samples [82,83].

Zhu et al. identified 400 DMRs, enriched at promoters of genes involved in neuronal development, between placentas from children later diagnosed with ASD and those from typically developing controls. Methylation levels of two DMRs, mapping on *CYP2E1* and *IRS2*, were respectively associated with genotype within the DMR and prenatal vitamin use [84].

Conversely, a large epigenome-wide association study, performed on blood-DNA from 796 ASD cases and 858 controls, did not detect any differentially methylated CpG site after correction for multiple testing [85].

Similarly, Siu et al. did not detect any DNA methylation patterns clearly distinguishing heterogeneous ASD cases from controls. However, they identified unique DNA methylation signatures for ASD individuals with 16p11.2 deletions or pathogenic variants of *CHD8* [86].
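The negative result of the epigenome-wide association study [85] depends on how the per-site p-values are corrected for multiple testing. The source does not state which correction was applied, so the following is only a minimal sketch of two widely used procedures, Bonferroni and Benjamini-Hochberg. The function names and the p-values in the usage note are hypothetical and are not taken from any cited study.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag p-values that stay significant after Bonferroni correction.

    A site is flagged when p < alpha / m, where m is the number of tests.
    """
    m = len(p_values)
    return [p < alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag discoveries under the Benjamini-Hochberg step-up procedure.

    Sort the p-values, find the largest rank k with p_(k) <= (k / m) * alpha,
    and reject every hypothesis whose rank is <= k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for ten CpG sites (illustration only).
    toy = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(bonferroni(toy))
    print(benjamini_hochberg(toy))
```

With these toy values, several sites have an uncorrected p < 0.05, yet only one passes Bonferroni and two pass Benjamini-Hochberg. In a real epigenome-wide scan the number of tested sites is far larger, so the corrected thresholds become correspondingly stricter.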

On the other hand, Kimura et al. identified a potential biomarker for adult ASD. The identified CpG site was hypermethylated in whole-blood DNA from ASD patients compared to controls and mapped to the *PPP2R2C* gene, which was found to be down-regulated in ASD subjects [87].

Alterations in Alu methylation patterns have been observed in ASD cases sub-grouped based on Autism Diagnostic Interview-Revised scores compared with matched controls [88].

More recently, a genome-wide methylation study was performed on post-mortem tissue samples from different brain regions dissected from ASD subjects and controls. Wide-spread methylation differences, with more pronounced effects in cortical regions compared to cerebellum, were detected between idiopathic ASD cases and controls and in individuals carrying 15q11–13 duplication [89].

Another global methylation study has recently reported that differentially methylated CpG sites identified in ASD cases compared to controls are enriched in pathways converging on mitochondrial metabolism and protein ubiquitination, suggesting a possible role of DNA methylation and mitochondrial dysfunction in ASD [90].

Other studies measuring methylation levels of candidate genes using targeted approaches detected hypermethylation of genes including *APOE* [91] and *HTR2A* [92] and hypomethylation of genes such as *HTR4* [93] in ASD subjects compared to controls.

The functional impact of locus-specific *Mecp2* methylation on ASD onset has been recently demonstrated in vitro and in vivo. *MECP2* is known to be hypermethylated and down-regulated in ASD subjects. The authors of this study employed the CRISPR-dCas9 methylation editing system to induce methylation of the *Mecp2* transcription start site in Neuro-2a cells and in mouse models, resulting in *Mecp2* down-regulation and the acquisition of behavioral changes attributable to an ASD phenotype in mice [94].

• miRNA

Several studies have found that miRNA expression profiles are dysregulated in different matrices, including saliva, blood and brain tissues, from ASD individuals [95].

A study conducted on saliva samples from individuals with and without ASD has identified a panel of four microRNAs differentially expressed between ASD patients and controls [96]. A panel of five salivary miRNAs has shown an accuracy of about 90% in the detection of developmental disorders, including ASD [97].

Down-regulation of miR-6126 has been detected in peripheral blood samples from adult individuals with ASD. The predicted targets of this miRNA belong to neuronal and oxytocin pathways [98]. Similarly, Ozkul et al. identified a consistent decrease and a slight reduction in six microRNAs (miR-19a-3p, miR-361-5p, miR-3613-3p, miR-150-5p, miR-126-3p and miR-499a-5p) in serum samples from ASD children and their unaffected family members compared to healthy controls. This result was replicated in the blood, hypothalamus and sperm of two ASD mouse models [99]. Another study conducted on serum samples identified a panel of three miRNAs (miR-130-3p, miR-181b-5p and miR-320a) showing an area under the curve >0.85 in distinguishing ASD subjects from controls [100].
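The area under the ROC curve reported for the three-miRNA serum panel [100] can be read as the probability that a randomly chosen ASD case receives a higher panel score than a randomly chosen control. The sketch below computes the AUC in exactly that pairwise way. The function name and the scores in the usage note are hypothetical, and the cited study may have derived its AUC with different software or a different model.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, control) pairs in which the case has the
    higher score; ties contribute one half.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical classifier scores (illustration only).
    cases = [0.9, 0.8, 0.4]
    controls = [0.5, 0.3, 0.2]
    print(roc_auc(cases, controls))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so a value above 0.85 means that in more than 85% of case-control pairs the case is ranked higher.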

Studies conducted on knockout (KO) mouse models for some ASD candidate genes, including *Fmr1*, *Mecp2* and *Ube3A*, have observed dysregulation of different miRNAs and evaluated their regulatory role in neuronal context [101–103].

#### **4. Environmental Factors**

As mentioned above, several studies investigated the possible role of environmental factors in the etiology of ASD. According to recent studies, up to 40–50% of variance in ASD liability could be determined by environmental factors, such as drugs, toxic exposures, parental age, nutrition, fetal environment and many others [104–106]. However, while there is strong evidence for some potential risk factors, supported by association studies as well as by in vitro and in vivo studies, only weak associations have been described for many others.

Below is an overview of the most-studied environmental factors that have been found to potentially contribute to ASD (recently reviewed in [107]) (Figure 2).

**Figure 2.** Environmental factors associated with ASD. The illustration indicates the putative impact of environmental factors on embryonic and fetal development with a particular focus on neuronal development and synaptic function. Figure created using BioRender.com images.

Parental age is one of the most established environmental ASD risk factors. In fact, much evidence has correlated advanced paternal age (APA) with the development of bipolar disorder, schizophrenia, ADHD and ASD [108]. A meta-analysis of 27 studies on the association between advanced parental age and ASD showed that a 10-year increase in maternal and paternal age is associated with a 20% higher risk of ASD in children [109]. A study has shown that age-related methylation changes observed in sperm could be related to an increased ASD risk in the offspring [110]. APA has been associated with reduced cortical thickness of the right posterior ventral cingulate cortex in ASD offspring [111]. Experiments conducted in mouse models have confirmed that APA is associated with the development of autism-like symptoms in the offspring [112,113] and with altered cortical morphology in male APA mice [112]. Moreover, behaviors related to ASD have also been observed in the second generation of mice with older grandfathers, suggesting that genetic and epigenetic alterations associated with APA are heritable [113].

Perinatal risk factors are also among the most-studied ASD risk factors and among the most difficult to determine and predict in advance. Two comprehensive meta-analyses examined 60 obstetric factors and found statistically significant associations between ASD risk and umbilical cord complications, injury or trauma at birth, multiple births, maternal hemorrhage, low birth weight, neonatal anemia, genital malformation, ABO or Rh blood group incompatibility and hyperbilirubinemia [114,115]. Another study has described an association between increased risk of autism and different factors, including caesarean section delivery, induced labor, gestational age less than 36 weeks and fetal distress [116].

Fetal exposure to sex steroids represents a potential risk factor for ASD. In fact, the fetal testosterone theory has been proposed to explain the higher ASD prevalence in males [8]. However, this hypothesis is controversial. In fact, while higher testosterone levels have been reported in ASD women, ASD males display testosterone levels similar to controls [117]. Baron-Cohen and colleagues have supposed that testosterone has effects on brain development during the prenatal masculinization window. The authors detected higher levels of sex steroids and cortisol in the amniotic fluid samples from male autistic patients compared with matched typically developing controls [118]. Recently, the same authors reported an association between fetal estrogen levels, important in synaptogenesis and corticogenesis, and autism risk [119]. Similarly, a study conducted on post-mortem brains detected reduced levels of estrogen receptor beta, aromatase and estrogen coactivators in the frontal gyrus of subjects with ASD compared to controls [120]. Moreover, several SNPs in genes encoding for proteins involved in sex steroid synthesis/transport have been associated with autistic traits [121].

The health condition of the mother has a major impact on the risk of ASD. It appears that maternal nutrition in pregnancy is of fundamental importance as it determines the nutrients available to support fetal growth [122]. Therefore, diets lacking specific nutrients can have adverse effects on fetal development. It has been shown that even short intervals between pregnancies can be harmful, since the body needs time, up to one year after childbirth, to recover acceptable levels of several essential substances [123].

Deficiencies of micronutrients, including vitamins and trace elements, have been associated with an increased risk of ASD. For instance, a Swedish study found that maternal vitamin D deficiency is associated with the risk of ASD with ID in the offspring [124]. Unbalanced levels of vitamins have also been detected in ASD children. Moreover, beneficial effects of vitamin supplementation have been observed in ASD patients [125]. Similarly, altered hair and/or blood concentrations of several trace elements, including chromium, magnesium and zinc, have been found in ASD patients compared with controls [126]. Association between maternal deficiency of microelements and ASD risk has also been reported. For example, iron deficiency, common in pregnant women, was associated with a five-fold increased risk of ASD, especially in the presence of other risk factors [127].

The involvement of altered trace element concentrations in the ASD phenotype is also supported by in vitro and animal studies. The effects of unbalanced metal levels on synapse formation and functionality have been evaluated in cultured hippocampal cells from rat brains, finding that the metal profile of autistic children led to down-regulation of crucial synaptic components, including Shank proteins and NMDA receptor subunits, and to a reduction of synaptic density. Interestingly, it was observed that zinc supplementation was able to revert the observed alterations [128]. The importance of zinc in ASD is supported by multiple pieces of evidence, including a strong association between low levels of this metal and ASD risk, a causative role of zinc deficiency in neuronal defects and in the development of ASD-related symptoms, and "therapeutic" effects of zinc supplementation. In fact, low hair and serum levels of zinc and/or an altered Zn/Cu ratio have been detected in children with ASD [129–131]. The effects of zinc deficiency on ASD have been observed in animal models. It has been shown that prenatal zinc deficiency alters social behavior in mice [132]. In vitro and in vivo studies have shown that zinc deficiency at the synaptic level leads to a decrease in ProSAP/Shank family members, a reduction in synaptic density and the development of ASD-related symptoms in mice [133]. In fact, other studies have shown that Zn<sup>2+</sup> ions, highly abundant at the PSD level, influence the recruitment of ProSAP1/Shank2 or ProSAP2/Shank3 for the correct formation and maturation of synapses [134]. A mechanism regulating the abundance of Zn at the PSD level has been hypothesized: Zn is released from pre-synaptic terminals and can translocate into post-synaptic neurons through Zn-permeable channels, including NMDA, and voltage-gated Ca<sup>2+</sup> and AMPA channels [135]. Notably, a recent study has shown that maternal zinc supplementation can prevent ASD-associated deficits in *Shank3* KO mouse models [136]. Recently, Shih et al. have elegantly demonstrated the impact of the crosstalk between genetic and environmental factors in the development of ASD. The authors have shown that KO of *Cttnbp2*, an actin cytoskeleton regulator, leads to a reduction in Zn concentration and in the expression levels of different synaptic proteins, consequently affecting dendritic spine formation and leading to the development of autism-like behaviors that are rescued by zinc supplementation [137]. The beneficial effects of zinc supplementation have also been observed in pregnant women, where an increase in zinc intake has been shown to reduce the risk of neural tube defects in the offspring [138].

The association between maternal obesity and risk of ASD in offspring is controversial. A Swedish study described a relationship between maternal BMI and ASD at population level; however, sibling analysis did not reveal any association between elevated maternal BMI and ASD risk [139]. On the other hand, a meta-analysis has shown a 28% and 36% increased risk of ASD in offspring born from overweight and obese mothers, respectively [140], even though there were relatively small numbers of ASD cases within the category of maternal underweight. However, it has also been shown that children born from underweight mothers are also at high risk of ASD [141]. Another study has shown that the combination of maternal obesity and maternal diabetes was associated with an increased risk of ASD and ID [142].

Maternal consumption of substances such as smoke, alcohol and medicines during pregnancy might be a potential risk factor as well. However, the association between smoking and alcohol-use with ASD is rather weak; in fact, two meta-analyses showed no evidence that smoking is a risk factor for ASD [143,144]. Moreover, cohort studies or case-control studies have examined the risk of ASD due to maternal alcohol consumption, indicating that mild to moderate consumption does not pose any risk [145–148].

The safety of medicines during pregnancy is very difficult to establish. In the ASD literature, antidepressant and anticonvulsant drugs have emerged as drugs of potential interest. An example is valproic acid, a drug that has been used to treat epilepsy or as a mood stabilizer in bipolar disorder. Its use during pregnancy leads to congenital malformations, developmental delay and cognitive malfunction [149,150]. Maternal use of selective serotonin reuptake inhibitors has been associated with a 50% increase in ASD risk, although maternal psychiatric condition is a confounding factor [151].

An association between maternal diseases and ASD risk has been shown. A meta-analysis limited to case-control studies identified a 62% increased ASD risk among diabetic mothers compared to non-diabetic mothers [152], while a second study found a 74% increase in ASD risk for pregestational diabetes and 43% for gestational diabetes [153].

It has also been shown that maternal viral and bacterial infections are associated with ASD risk [154–156]. Two meta-analyses found that maternal autoimmune diseases are associated with an increased risk of ASD in offspring [157,158]. However, it is not the presence of viruses and bacteria per se that is associated with ASD development, but the immune response they cause, a conclusion supported by research identifying elevated inflammatory markers and antibodies in pregnant women with autistic children [159,160]. This hypothesis is supported by animal studies where maternal immune activation, induced by different immunogens, has been shown to induce post-natal brain dysfunction observable in a phenotype characteristic of ASD and other neurological disorders [161].

It is interesting to note that, although from different perspectives, much evidence points to the importance of the condition of the maternal immune system during fetal development. Several studies have also reported an association between family history of autoimmune diseases and ASD [158,162,163]. It could be speculated that this link is supported by maternal levels of zinc. In fact, our recent meta-analysis has shown that Zn concentrations in both serum and plasma of patients with autoimmunity are significantly lower than in controls [164]. It is known that Zn plays an important role in the regulation of the immune system and, according to the studies cited above, it also plays an important role in neural tube formation.

Finally, exposure to toxic xenobiotics could represent another potential environmental risk factor. Substances such as brominated flame retardants could cause mitochondrial toxicity through a variety of mechanisms, leading to an altered energy balance in the brain [165], an important association with autism since mitochondrial dysfunction has been documented in patients with ASD [166]. Heavy metals can have a negative impact on many body functions by inducing neurological and behavioral damage. A meta-analysis of three case-control studies found a 60% increase in the risk of ASD due to exposure to high levels of inorganic mercury [167]. A recent case-control study found that exposure to organophosphates during pregnancy is associated with a 60% increase in the risk of ASD [168]. This category includes non-persistent organic pollutants, including phthalates and bisphenol A, and persistent organic pollutants, including DDT, PCB and PBDE. A research group found that three out of five studies on phthalate exposure showed a significant association between phthalate exposure and ASD risk [169]. Exposure to PCBs and PBDEs appears to alter calcium-related signal pathways, leading to alterations in dendritic growth and consequent abnormalities in neuronal connectivity, a key feature of ASD [170,171].

Table 2 summarizes several environmental factors associated with ASD described above.


## **5. Conclusions**

Given the complexity of the etiology of autism and the increasing prevalence of new confirmed cases of ASD worldwide, there is an urgent need to find effective diagnostic methods and to study as many risk factors as possible, not only genetic but also epigenetic and environmental ones, without neglecting genetic/environmental interactions, where risk factors influence each other.

**Author Contributions:** Conceptualization, E.L. and P.Z.; bibliographic search, E.M. and E.L.; writing—original draft preparation, E.M., E.L., R.F. and P.Z.; writing—review and editing, all authors; visualization, A.F.V.-B.; supervision, E.L. and P.Z.; funding acquisition, P.Z., M.C., G.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Fondazione di Sardegna, grant number BSPROG21/2017.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Case Report*

## **New Cav1.2 Channelopathy with High-Functioning Autism, Affective Disorder, Severe Dental Enamel Defects, a Short QT Interval, and a Novel** *CACNA1C* **Loss-of-Function Mutation**

**Dominique Endres 1,2,\*,**†**, Niels Decher 3,**†**, Isabell Röhr 3, Kirsty Vowinkel 3, Katharina Domschke 2,4, Katalin Komlosi 5, Andreas Tzschach 5, Birgitta Gläser 5, Miriam A. Schiele 2, Kimon Runge 1,2, Patrick Süß 6, Florian Schuchardt 7, Kathrin Nickel 1,2, Birgit Stallmeyer 8, Susanne Rinné 3, Eric Schulze-Bahr 8,**‡ **and Ludger Tebartz van Elst 1,2,**‡


Received: 20 September 2020; Accepted: 7 November 2020; Published: 15 November 2020

**Abstract:** Complex neuropsychiatric-cardiac syndromes can be genetically determined. For the first time, the authors present a syndromal form of short QT syndrome in a 34-year-old German male patient with extracardiac features with predominant psychiatric manifestation, namely a severe form of secondary high-functioning autism spectrum disorder (ASD), along with affective and psychotic exacerbations, and severe dental enamel defects (with rapid wearing of his teeth) due to a heterozygous loss-of-function mutation in the *CACNA1C* gene (NM\_000719.6: c.2399A > C; p.Lys800Thr). This mutation was found only once in control databases; the mutated lysine is located in the Cav1.2 calcium channel, is highly conserved during evolution, and is predicted to affect protein function by most pathogenicity prediction algorithms. L-type Cav1.2 calcium channels are widely expressed in the brain and heart. In the case presented, electrophysiological studies revealed a prominent reduction in the current amplitude without changes in the gating behavior of the Cav1.2 channel, most likely due to a trafficking defect. Due to the demonstrated loss of function, the p.Lys800Thr variant was finally classified as pathogenic (ACMG class 4 variant) and is likely to cause a newly described Cav1.2 channelopathy.

**Keywords:** CACNA1C; CaV1.2; autism; short QT syndrome; dental enamel defect

### **1. Introduction**

Autism spectrum disorders (ASD) are frequent neurodevelopmental disorders characterized by social interaction difficulties, stereotypical procedures, routines and rituals, or special interests. Most cases are idiopathic or of unknown cause ("primary forms"); in a subset, a specific cause can be identified ("secondary forms"), and some of these secondary cases are part of genetically determined syndromes, e.g., fragile X syndrome [1]. Among these, Timothy syndrome is an extremely rare syndrome characterized by long QT syndrome (LQTS) in the surface electrocardiography (ECG), skeletal abnormalities (e.g., cutaneous syndactyly), and neuropsychiatric features, such as autism [2–4], and is caused by gain-of-function mutations in the *CACNA1C* gene. Of note, some *CACNA1C* mutations may have an isolated cardiac, non-syndromal phenotype (only with QTc prolongation [5]). In contrast, patients with a short QT syndrome (SQTS) have not been described with extra-cardiac features so far.

## **2. Case Presentation**

The authors present for the first time a syndromal form of SQTS in a 34-year-old German male patient that is characterized by extracardiac features, namely a severe form of secondary ASD together with affective and psychotic exacerbations, wearing of his teeth, and a short QT interval in the surface ECG due to a heterozygous loss-of-function mutation in the *CACNA1C* gene. The functional consequences of the genetic mutation were further analyzed using electrophysiological investigations of the mutant Cav1.2 calcium channel expressed in Xenopus oocytes [2]. The patient has given his signed written informed consent for this case report, including the presented images, the genetic information, and all other data, to be published.

### *2.1. Clinical Case Description*

Since the first decade, the patient has suffered from the whole spectrum of autistic symptoms. The detailed diagnosis was performed according to the scheme established by the Freiburg Center for the Diagnosis and Treatment of Autism (http://www.uniklinik-freiburg.de/psych/live/patientenversorgung/schwerpunkte/schwerpunkt-asperger.html, accessed on 14 November 2020 [6–8]). His medical history showed alterations in (1) gaze control and holistic visual recognition (he had to learn to look into the eyes of others, uses analytical facial expression recognition, recognizes other people by certain characteristics); (2) problems with social communication (he lacks the tools to build relationships, is burdened by social contacts, small talk and telephoning cause him problems, plans social situations in detail in advance); (3) reduced social integration (has only a few close friends, needs his time alone); (4) interactional imagination abnormalities and sense of justice (he hated "doing things as if he were playing games", his sense of justice is 10/10, he is very honest); (5) linguistic pragmatics (as a child, he read dictionaries to understand proverbs, is often interpreting things literally); (6) routines and rituals (plans his everyday life in detail, is stressed by changes of plan, usually eats and drinks the same things); (7) motor clumsiness (he always had difficulties in sports, he had two "left hands" and two "left feet"); (8) sensory hypersensitivity (for loud noises and light, tormented in former times already by stroking by the mother); (9) strong perception of details (e.g., finds comma errors in texts, has difficulties with the "overall picture"); (10) special memory capacity (always knows exactly where what stood when he had read something); and (11) special interests for computer science [6].
The Autism Diagnostic Observation Schedule, Second Edition (ADOS-2; administered when the patient was 34 years old) showed clear autistic communication and interaction behavior, confirming a syndrome diagnosis (14 points; cut off > 10 [9]). The testing of the recognition of emotions ("gnosis facialis") revealed indications of relative deficits in recognizing fear (http://www.gnosisfacialis.de/infoERT.html, accessed on 14 November 2020). In the Movie for the Assessment of Social Cognition (MASC), the patient scored borderline [10]. Difficulties in communication and in social intuition were partially compensated by the patient in an analytical–cognitive way. In summary, two expert raters confirmed the presence of a high-functioning ASD syndrome. His intelligence level was above average (he reached high school graduation "Abitur" in Germany with average grade 1.3; range: 1–6, optimal: 1, worst: 6) and later studied computer science.

At the age of 15 years, the patient developed his first depressive episode and, at the age of 17 years, his first hypomanic episode. Until publication of this case report, the patient suffered from recurrent depressive episodes, but hypomanic phases did not occur anymore after the age of 21 years. His first psychiatric presentation took place at the age of 18 years following mutistic states (i.e., he was unable to speak); there were no indications of schizophrenia or causation by illegal drugs. Video telemetry revealed no evidence of epileptic seizures. At the age of 18 years, monomorphic ventricular extrasystoles (VES) were noticed for the first time during a physical examination for military service suitability. At the age of 29 years, the teeth suddenly discolored and then receded (Figure 1A). During this time, the patient suffered from significant polydipsia (requiring around 10 L of drinks/day) and was treated with lithium, methylphenidate, pregabalin, and lorazepam. Polydipsia was mainly caused by lithium treatment with increased serum levels at that time (1.6–1.8 mmol/L; reference 0.4–1.2 mmol/L). Calcium serum levels were within the normal range, and, therefore, phosphate and parathyroid levels were not determined. In the further course, 13 teeth had to be extracted and were supplemented by dental prostheses; other teeth were crowned. At the age of 29 years, he also developed a paranoid hallucinatory episode with hearing voices and ideas of persecution under the treatment with methylphenidate, nortriptyline, quetiapine, pregabalin, and lorazepam. Recurrently, the patient received benzodiazepines, but, after this was stopped (by antagonization with flumazenil), a generalized tonic-clonic seizure occurred at the age of 29 years.

**Figure 1.** (**A**) The dental status at 29 years of age and the corresponding dental radiography showed a reduced density of several coronas and mild hypoplasia of the maxilla and mandible. (**B**) The cerebral magnetic resonance imaging of the patient revealed slight hippocampal damage on the left in the area of the corpus and the cauda, together with otherwise inconspicuous cerebral anatomic findings. (**C**) The baseline electrocardiography displayed a short QTc interval (heart rate 57/min, QT 330 ms, QTc: 324 ms; reference: > 350 ms; 3a) and normalization during treatment with quinidine (3 × 200 mg/d; QTc: 425 ms; reference: < 350 ms; 3b). (**D**) The DNA sequence electropherogram revealed a heterozygous state at position 2399 (c.2399A > C) in the *CACNA1C* gene predicted to result in p.Lys800Thr.

Electroencephalography at age 29, 33, and 34 years was inconspicuous. Analysis of the cerebrospinal fluid (CSF) revealed a borderline blood-brain barrier dysfunction (with CSF protein levels of 579 mg/L; reference: < 450 mg/L). The immunological screening showed an antinuclear antibody (ANA) titer of 1:100 (reference: < 1:50) and an IgA deficiency (0.26 g/L; reference: 0.7−4 g/L).

A highly symptomatic occurrence of the monomorphic VES was noted again during the subsequent in-patient hospital treatment at the age of 29 years. Further cardiologic evaluation showed ~12,000 VES/day during Holter ECG, and subsequently, due to the severe overall clinical course, the patient underwent ablation therapy in the area of the middle cardiac vein or coronary sinus twice, which reduced VES burden (remaining: 3600 VES/day). There was no evidence of a structural heart disease as assessed by transthoracic echocardiography. The family history was positive with his father suffering from depression, alcohol dependency, tachycardia (treated with β-blockers), and unclear dental damage (the patient had no contact with the father; therefore, further information is missing). The age of the mother at birth of the patient was 41 years, and the age of the father was 35 years.

Over the years, due to the severe course of the psychiatric symptoms, multiple psychopharmacological treatment trials have been carried out using varying and sometimes high doses (sertraline, fluoxetine, escitalopram, paroxetine, venlafaxine, vortioxetine, bupropion, nortriptyline, amitriptyline, clomipramine, lithium, methylphenidate, haloperidol, aripiprazole, perazine, promethazine, quetiapine, valproate, lamotrigine, carbamazepine, oxcarbazepine, pregabalin, levetiracetam, zonisamide, lacosamide, lorazepam, clonazepam, oxazepam, clobazam).

After complete discontinuation of psychotropic medication at the age of 31 years, a cardiac re-evaluation was performed, and a short QTc interval of 324 ms (Figure 1C; reference: > 350 ms) was noted for the first time. Overall, this represented a (paradoxical) shortening of the QTc interval at a heart rate (HR) of 57 bpm; at a HR of 111 bpm, the QTc interval normalized (408 ms). There was no evidence of T-wave alternans. Due to suspected SQTS, quinidine treatment (3 × 200 mg/day) was initiated to normalize repolarization, but also to reduce the symptomatic VES burden and thereby indirectly to improve mental strength and personal stability. Of note, during treatment QTc intervals completely recovered (at 57/min: QRS 114 ms, QTc around 425 ms; T-wave: broad-based, biphasic; no early repolarization signs or Brugada sign; Figure 1C), and the burden of symptomatic VES was nearly absent (now: 16 VES/day).
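
The baseline values in Figure 1C (QT 330 ms at a heart rate of 57/min) can be checked against the two most common heart-rate correction formulas. The report does not state which formula was used, so the following is only a sketch under that assumption; the function names are illustrative. Bazett's correction divides QT by the square root of the RR interval (in seconds), and Fridericia's correction divides it by the cube root.

```python
import math

def rr_interval_s(heart_rate_bpm):
    """RR interval in seconds for a given heart rate in beats per minute."""
    return 60.0 / heart_rate_bpm

def qtc_bazett(qt_ms, heart_rate_bpm):
    """Bazett correction: QTc = QT / sqrt(RR)."""
    return qt_ms / math.sqrt(rr_interval_s(heart_rate_bpm))

def qtc_fridericia(qt_ms, heart_rate_bpm):
    """Fridericia correction: QTc = QT / RR^(1/3)."""
    return qt_ms / rr_interval_s(heart_rate_bpm) ** (1.0 / 3.0)

# Baseline values taken from Figure 1C of the case report
qt_ms, heart_rate = 330.0, 57.0
print(f"Bazett:     {qtc_bazett(qt_ms, heart_rate):.1f} ms")
print(f"Fridericia: {qtc_fridericia(qt_ms, heart_rate):.1f} ms")
```

With these inputs, Fridericia's correction gives roughly 324 ms, which is consistent with the reported QTc, and Bazett's gives a slightly lower value; both lie below the 350 ms reference.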

#### *2.2. Genetic Analyses*

Genetic testing of genes related to SQTS (*KCNQ1*, *KCNH2*, *KCNJ2*, *CACNA1C*) revealed a heterozygous nucleotide variant in *CACNA1C* (NM\_000719.6: c.2399A > C; p.Lys800Thr), which was very rare in control databases (MAF gnomAD: 0.00043%) and predicted to affect protein function by 16 of the 23 pathogenicity prediction algorithms (VarCards). The mutated lysine is located in the cytoplasmic loop between repeat II and repeat III of the Cav1.2 calcium channel and is highly conserved. To exclude pathogenic copy number variations (CNV) as a possible additional cause of ASD and the psychiatric symptoms, microarray analysis was performed (CytoSure™ Constitutional v3 Array 180k, Oxford Gene Technology) in the patient. Molecular karyotyping did not show any pathogenic or relevant CNVs.

#### *2.3. Electrophysiological Analyses*

Heterologous expression studies of the mutant Cav1.2 calcium channel in *Xenopus* oocytes showed an isolated reduction in the current amplitude of the L-type calcium channels (Figure 2B,C) without a change in kinetic properties (Figure 2D−I) when compared with native (wild-type) channels. Here, the normalized bell-shaped current-voltage relationship and the voltage-dependence of activation were unaltered (Figure 2D−E). In addition, the voltage-dependence of inactivation (Figure 2F−G), the extent of inactivation (Figure 2H), and the kinetics of inactivation (Figure 2I) were similar to those of wild-type channels. Thus, the genetic variant is responsible for an isolated reduction of the macroscopic current amplitudes of approximately 40% (Figure 2C). The lack of changes in the channel kinetics and the maintained voltage-dependence of channel gating behavior suggest that this reduced current amplitude is most likely caused by an intracellular trafficking defect of the mutant Cav1.2 calcium channel.

**Figure 2.** Our experimental analyses of the variant found in the presented patient showed an isolated reduction in the current amplitude of L-type calcium channels without a change in kinetic properties. Two-electrode voltage-clamp measurements were performed in *Xenopus* oocytes, as previously described [2]. Ba<sup>2+</sup> was used as a charge carrier, and for all experiments, Cav1.2 was co-expressed with the α2δ and β2b subunits [2]. In detail, the figure shows the following: (**A**) High evolutionary conservation of the K800 residue between different orthologues. (**B**) Representative current traces of wild-type (WT) Cav1.2 and mutant Cav1.2K800T recorded with the indicated voltage protocol, in order to analyze the current amplitudes and the voltage-dependence of activation properties. (**C**) Reduced peak current amplitudes, analyzed at +20 mV. (**D**) Normalized bell-shaped current-voltage relationship and (**E**) conductance-voltage (GV) relationship. The voltage of half-maximal activation (V1/2 act.) is indicated in the upper corner. (**F**) Representative current traces of wild-type Cav1.2 and Cav1.2K800T recorded with the indicated voltage protocol, in order to analyze the inactivation properties. (**G**) Analyses of the voltage of half-maximal inactivation (V1/2 inact.). The V1/2 inact. values are provided in the upper corner. (**H**) Percentage of voltage-dependent inactivation of wild-type and Cav1.2K800T analyzed at different voltage potentials. (**I**) Analyses of the time constant of voltage-dependent inactivation (τ inact.) of wild-type and Cav1.2K800T analyzed at different voltage potentials. Numbers of experiments are indicated in the panels. Data are presented as mean ± s.e.m. \* indicates *p* < 0.05.

With respect to these functional data, the p.Lys800Thr variant was classified as likely pathogenic (class 4 variant) according to the ACMG guidelines [11].

### **3. Discussion**

The authors present a patient with a previously unpublished loss-of-function mutation of the *CACNA1C* gene with a novel, syndromal phenotype characterized by ASD, affective and psychotic exacerbations, dental enamel defect, and a short QT interval in the surface ECG.

The mutated *CACNA1C* gene (Chr. 12p13.3; 2,138 amino acids) encodes the α-subunit of the L-type calcium channel CaV1.2. These calcium channels are highly expressed in the brain and heart [3]. Genetic changes in *CACNA1C* are associated with Timothy syndrome or early repolarization disturbances/Brugada-like electrocardiography. However, patients with Timothy syndrome typically have gain-of-function mutations in the *CACNA1C* gene (e.g., p.Gly406Arg [2]), leading to maintained depolarizing L-type calcium current with long QT syndrome in the surface ECG and early sudden cardiac death (at an average age of 2.5 years [3]). Additionally, other cardiac features (AV conduction block with bradycardia, tachyarrhythmia, or congenital heart defects), neuropsychiatric involvement (developmental delays, seizures, ASD), hand/foot and facial findings (e.g., cutaneous syndactyly, low-set ears), hypoglycemia, and infections can be found in patients with Timothy syndrome [2]. Poor dental enamel was also reported earlier [3,12].

The presented case also suffered from high-functioning ASD, but he additionally presented with depressive and hypomanic episodes, as well as one paranoid hallucinatory episode. Such complex psychiatric syndromes are often found in patients with secondary, organic forms [1]. The genetic variant, which was ultimately classified as likely pathogenic, could explain the poor response to different psychotropic drugs in the patient's history. However, pharmacogenetic studies, e.g., on drug metabolism, have not been carried out. There were no signs of dysmorphia on the hands/feet or face in the presented patient. In addition, he suffered from poor dental enamel with severe caries, initially misinterpreted as "meth mouth" because of high-dose methylphenidate medication. Surprisingly, a short QT interval was first noted when the patient was off psychotropic drugs, which typically prolong the QT interval and, in this particular case, had led to a normalization. He also had a high burden of VES. Only about 250 cases of short QT syndrome are currently known; these have either isolated cardiac phenotypes or acquired, concomitant conditions (e.g., electrolyte disturbances or carnitine deficiency), and to date only about 30 genetic variants in eight potential disease genes have been identified (summarized in [13]). Most of these cases are due to gain-of-function mutations in the potassium channel genes (*KCNH2*, *KCNQ1*, and *KCNJ2*), whereas loss-of-function mutations in *CACNA1C* leading to SQTS are extremely rare, and, in these cases, the shortened QT interval is accompanied by a Brugada ECG phenotype (not seen in the present case). Due to controversial or absent functional data and contradictory in silico predictions, some of the variants were re-classified as variants of uncertain significance following ACMG criteria (class 3 variant) in a recent publication [14]. Very recently, a *CACNA1C* loss-of-function variant has been linked with SQTS in combination with early repolarization patterns in the surface ECG [15].

The heterozygous c.2399A > C variant in the presented patient has not been reported previously; it leads to a non-synonymous amino acid exchange in the α-subunit of the L-type calcium channel CaV1.2 (p.(Lys800Thr)). A causal role of the mutation is possible because it is located in a conserved gene region and is very rare in control databases. Most pathogenicity prediction programs that evaluate amino acid exchange in silico suggest a biochemical and thus pathogenic effect (VarCards: 16/23), as also shown by the in vitro data from heterologous expression experiments. The variant could ultimately be classified as a class 4 variant ("likely pathogenic") according to the ACMG/AMP guideline [11]. Our functional studies revealed an isolated reduction of the current amplitudes (Figure 2C) without changes in the gating behavior of the channel. Therefore, a weak trafficking defect is the most likely mechanism of action for this atypical Cav1.2 channelopathy with brain, cardiac and dental enamel involvement.

## **4. Conclusions**

In summary, this is a paradigmatic case of a secondary genetic variant of a complex, partially atypical mental disorder with heart and dental involvement. Many such secondary genetic variants probably occur, each of which is very rare, but, taken together, these rare variants might cause a relevant subgroup of mental disorders. In the presented patient, a pathogenic variant can be assumed, which led to an atypical form of SQTS with a clinical spectrum that partially overlaps with that of Timothy syndrome. Associations between CaV1.2 channel (*CACNA1C*) polymorphisms and different psychiatric disorders, including autism, depression, bipolar disorder, and schizophrenia, were recently reported [16–19]. Further research will need to show whether similar cases exist and to elucidate the therapeutic consequences that can be inferred from the mutant Cav1.2 channel.

**Author Contributions:** D.E., P.S., F.S., E.S.-B., and L.T.v.E. treated the patient. D.E. performed the data research and wrote the paper. F.S. supported the literature search. N.D., I.R., K.V., and S.R. performed the experimental tests of calcium channels. E.S.-B. and B.S. performed *CACNA1C* testing and cardiac investigations and interpretation. E.S.-B. strongly revised the initial version of the paper. K.K. and A.T. examined the patient for signs of dysmorphia. B.G. performed and interpreted the microarray-analysis. N.D., M.A.S., K.R., K.N., and K.D. supported the overall interpretation and critically revised the manuscript. All authors were critically involved in the theoretical discussion and composition of the manuscript. All authors read and approved the final version of the manuscript.

**Funding:** The article processing charge was funded by the Baden-Wuerttemberg Ministry of Science, Research and Art and the University of Freiburg in the funding program Open Access Publishing.

**Acknowledgments:** Dominique Endres was funded by the Berta-Ottenstein-Programme for Advanced Clinician Scientists, Faculty of Medicine, University of Freiburg. Patrick Süß is a member of the research training group GRK2162 funded by the DFG (270949263/GRK2162) and is supported by the University Hospital Erlangen (ELAN project P059, IZKF clinician scientist program). Niels Decher was funded by the DFG (DE 1482/9-1). Florian Schuchardt received financial support from Forschungskommission, Medical Faculty, Albert-Ludwigs-University Freiburg.

**Conflicts of Interest:** Dominique Endres: None. Niels Decher: None. Isabell Röhr: None. Kirsty Vowinkel: None. Katharina Domschke: Steering Committee Neurosciences, Janssen. Katalin Komlosi: None. Andreas Tzschach: None. Birgitta Gläser: None. Miriam A. Schiele: None. Kimon Runge: None. Patrick Süß: None. Florian Schuchardt: None. Kathrin Nickel: None. Birgit Stallmeyer: None. Susanne Rinné: None. Eric Schulze-Bahr: None. Ludger Tebartz van Elst: Advisory boards, lectures, or travel grants within the last three years: Roche, Eli Lilly, Janssen-Cilag, Novartis, Shire, UCB, GSK, Servier, Janssen and Cyberonics.

## **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## **An Automated Functional Annotation Pipeline That Rapidly Prioritizes Clinically Relevant Genes for Autism Spectrum Disorder**

**Olivia J. Veatch <sup>1,</sup>\*, Merlin G. Butler <sup>1</sup>, Sarah H. Elsea <sup>2</sup>, Beth A. Malow <sup>3</sup>, James S. Sutcliffe <sup>4</sup> and Jason H. Moore <sup>5</sup>**


Received: 17 November 2020; Accepted: 25 November 2020; Published: 27 November 2020

**Abstract:** Human genetic studies have implicated more than a hundred genes in Autism Spectrum Disorder (ASD). Understanding how variation in implicated genes influences expression of co-occurring conditions and drug response can inform more effective, personalized approaches for treatment of individuals with ASD. Rapidly translating this information into the clinic requires efficient algorithms to sort through the myriad of genes implicated by rare gene-damaging single nucleotide and copy number variants, and common variation detected in genome-wide association studies (GWAS). To pinpoint genes that are more likely to have clinically relevant variants, we developed a functional annotation pipeline. We defined clinical relevance in this project as any ASD associated gene with evidence indicating a patient may have a complex, co-occurring condition that requires direct intervention (e.g., sleep and gastrointestinal disturbances, attention deficit hyperactivity, anxiety, seizures, depression), or is relevant to drug development and/or approaches to maximizing efficacy and minimizing adverse events (i.e., pharmacogenomics). Starting with a list of all candidate genes implicated in all manifestations of ASD (i.e., idiopathic and syndromic), this pipeline uses databases that represent multiple lines of evidence to identify genes: (1) expressed in the human brain, (2) involved in ASD-relevant biological processes and resulting in analogous phenotypes in mice, (3) whose products are targeted by approved pharmaceutical compounds or possessing pharmacogenetic variation and (4) whose products directly interact with those of genes with variants recommended to be tested for by the American College of Medical Genetics (ACMG). Compared with 1000 gene sets, each with a random selection of human protein coding genes, more genes in the ASD set were annotated for each category evaluated (*p* ≤ 1.99 × 10<sup>−2</sup>). Of the 956 ASD-implicated genes in the full set, 18 were flagged based on evidence in all categories. Fewer genes from randomly drawn sets were annotated in all categories (x̄ = 8.02, sd = 2.56, *p* = 7.75 × 10<sup>−4</sup>). Notably, none of the prioritized genes are represented among the 59 genes compiled by the ACMG, and 78% had a pathogenic or likely pathogenic variant in ClinVar. Results from this work should rapidly prioritize potentially actionable results from genetic studies and, in turn, inform future work toward clinical decision support for personalized care based on genetic testing.

**Keywords:** bioinformatics; human genetics; pharmacogenomics; autism

#### **1. Introduction**

Autism spectrum disorder (ASD) is a heterogeneous neurodevelopmental condition characterized by impairments in social interactions, delays in language development and patterns of restricted interests and/or repetitive behaviors [1]. ASD manifests along a wide distribution of core symptom severity and numerous health conditions present with ASD [2,3]. It is well-established that ASD has a predominantly genetic etiology, and genetic factors influence many of the medical conditions diagnosed with ASD [4,5]. Advancements in genomic technology are continually generating large amounts of data implicating various biological processes dysregulated in ASD. Notably, ASD is one of the most complex and prevalent neurodevelopmental conditions observed in humans, with a worldwide prevalence estimated at one in 160 children [6]. Furthermore, insufficient evidence exists substantiating efficacy of pharmaceutical treatment of specific symptoms and co-occurring conditions in ASD with many reported adverse events [7–9]. Informing more effective, personalized approaches for treatment of these heterogeneous conditions is a necessary area of research. An important next step is to better understand how current data can inform diagnosis and treatment of co-occurring conditions in ASD [10].

Fortunately, excellent tools are available to help circumvent challenges related to interpreting results from human genetic studies and offer immediate insight into mechanisms that are most likely to translate to the clinical setting [11–19]. While the definition of what constitutes 'clinically actionable' genetic information can vary and is dynamic [10], it is defined in this project as any functionally relevant, disease-implicated gene with evidence indicating that a patient may have a co-occurring condition that requires direct intervention, or that variation in the gene (or genetic mechanism) may influence how a patient will respond to a drug.

To initially identify all of the potential genetic factors that contribute to expression of a complex condition like ASD, it is necessary to compile results from many sources. These include statistical associations (e.g., genome-wide association study (GWAS) hits, rare variant burden test results, transmission disequilibrium test results) and functional evidence in humans and model organisms. Centralized repositories, like DisGeNET (https://www.disgenet.org/), integrate and uniformly annotate data and allow easy access to comprehensive knowledge of the genetic underpinnings of disease [17].

One way to facilitate interpretation of genetic results is to focus on implicated genes expressed in tissues relevant to the disease etiology [13,20–23]. As most evidence indicates that ASD relates to neurodevelopment [24], identifying ASD-associated genes that are expressed at appreciable levels in human brain tissue can rapidly prioritize genes that are more likely to be functionally relevant. Integrated databases, such as the Expression Atlas (https://www.ebi.ac.uk/gxa/home), offer results from large microarray and RNA-sequencing studies in humans (e.g., the Genotype Tissue Expression (GTEx) Project) and allow for efficient mining of data related to tissue-specific gene expression [13,22].

To further annotate those genes that are more likely to be functionally related to ASD etiology, it is useful to focus on genetic mechanisms evidenced to underlie ASD symptom expression. Notably, the biological functions of different genes implicated in ASD point to convergent mechanisms [25–27]. By identifying biological processes enriched for genes implicated in ASD and then focusing specifically on genes that influence these processes, genes of particular interest for functional follow-up and drug target repurposing or discovery can be readily classified [28–30]. Efforts to consistently describe gene products across databases, like the Gene Ontology (GO) Consortium [16], allow for identification of these larger, multi-gene processes. Furthermore, transgenic techniques in model organisms are incredibly useful for identifying genes with functional consequences relevant to human disease [31–33] and drug discovery [34]. On-going efforts by groups like the International Mouse Phenotyping Consortium (IMPC) offer web portals which allow for rapid mining of mouse phenotype data from knockout mutant strains for eventually every protein-coding gene in mice [35]. Determining the traits that are associated with knocking out specific genes may also offer insight into which genes are more likely to influence expression of co-occurring conditions in ASD.

The ultimate goal of precision medicine is to prevent disease; however, realization of this vision is likely many years away [36]. A key opportunity in precision medicine is to incorporate drug ontology and pharmacogenomics data to inform care and tailor treatment (i.e., maximize efficacy, minimize adverse events) [37,38]. An approach toward this goal is to determine if any of the proteins encoded by ASD candidate genes are targets for pharmaceutical compounds that are currently approved to treat symptoms and co-occurring conditions in ASD, or are potential novel targets based on evidence that they are chemically similar or function in the same mechanisms as approved drug targets [39,40]; integrated knowledge bases, like Pharos (https://pharos.nih.gov/), exist to identify these proteins [14]. It is also of interest to know whether a patient with a given variant will respond differently to medications. Resources such as the Pharmacogenomics Knowledge Base (PharmGKB; https://www.pharmgkb.org/) coalesce these data into a common portal [41].

To prioritize genes that may be clinically relevant, specifically with regard to co-occurring conditions, but have yet to be confirmed as having a causal relationship with human disease (i.e., pathogenic), it is advantageous to incorporate evidence related to genes with known pathogenic variants. Resources that define the clinical relevance of variants, like the American College of Medical Genetics (ACMG) [42] and ClinVar [18], can be used to identify these 'clinically actionable' genes. Furthermore, protein-protein interaction databases, like STRINGdb (https://string-db.org/), can be used to identify different genes with disparate variants that impact a common network [43]. By extension, protein interaction knowledge may identify candidate genes more likely to house pathogenic variants via revealing direct connections between their protein products and the products of genes with known pathogenic variants.

A major goal of our study is to develop an automated bioinformatics pipeline that uses the databases referenced above to identify, among genes pulled from all published studies of ASD, those in which variation may indicate that a patient has a co-occurring condition that is important to treat, or those that are more likely to be useful for drug development and/or pharmacogenomics approaches to treatment of symptoms and comorbidities in patients with ASD. Each step of the pipeline is evaluated by comparing results for genes cited in connection with ASD to results for 1000 sets of randomly selected human protein coding genes, each equal in size to the ASD gene set (*n* = 956). Prioritized ASD genes are then queried against a gold standard, defined as expert curated evidence in ClinVar indicating that the gene has a pathogenic variant.
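
The random comparison sets described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the gene universe, the placeholder gene symbols, the annotated subset, and the function names are all hypothetical, and only the set size (956) and the number of sets (1000) come from the text.

```python
import random

def draw_random_gene_sets(universe, set_size, n_sets, seed=0):
    """Draw n_sets gene sets, each sampled without replacement from universe."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [rng.sample(universe, set_size) for _ in range(n_sets)]

def mean_annotated_fraction(gene_sets, annotated):
    """Average fraction of genes per set that carry a given annotation."""
    fractions = [sum(g in annotated for g in s) / len(s) for s in gene_sets]
    return sum(fractions) / len(fractions)

# Hypothetical universe of placeholder symbols standing in for protein coding genes
universe = [f"GENE{i}" for i in range(20000)]
# Hypothetical annotation: pretend the first 5000 genes are "annotated"
annotated = set(universe[:5000])

random_sets = draw_random_gene_sets(universe, set_size=956, n_sets=1000)
print(len(random_sets), len(random_sets[0]))
print(round(mean_annotated_fraction(random_sets, annotated), 3))
```

The average annotated fraction across the random sets then serves as the reference proportion against which the ASD gene set is compared at each pipeline step.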

#### **2. Results**

At the time of these analyses, there were 956 unique protein coding genes with evidence for associations with ASD in DisGeNET, which reflects expansive evidence from candidate gene studies, genome-wide genotyping, whole-exome/genome sequencing, and functional studies in human cell lines and model organisms (Supplementary Table S1). Notably, this initial list of ASD candidate genes included genes implicated via evidence from genetic studies of idiopathic ASD cases (e.g., *NLGN1*, *NLGN2*, *NLGN3*, *NLGN4X* and *NLGN4Y*), as well as genes pulled from studies evaluating syndromic cases of ASD (e.g., *FMR1*, *MECP2*, and *SHANK3*). All of the functional attributes used in our approach for prioritizing candidate genes had more ASD-related genes when compared to 1000 random sets, each with a random selection of 956 genes (Table 1). Ultimately, 18 ASD candidate genes were prioritized based on annotation in each step of the pipeline, which was more than the average number of genes from random gene sets (X<sup>2</sup> = 11.30, *p* = 7.75 × 10<sup>−4</sup>; Table 1).
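
The relation between the reported chi-square statistic and its *p*-value can be checked with a short calculation. The sketch below assumes a test with one degree of freedom, which the text does not state explicitly; under that assumption the upper-tail probability has the closed form erfc(√(X<sup>2</sup>/2)). The function name is illustrative.

```python
import math

def chi2_upper_tail_1df(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

# Statistic reported for the 18 prioritized genes versus the random sets
p_value = chi2_upper_tail_1df(11.30)
print(f"{p_value:.2e}")
```

The result is close to the reported 7.75 × 10<sup>−4</sup>; a small difference is expected because the statistic is rounded to two decimals.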


**Table 1.** Comparisons of Functional Annotations for Autism Spectrum Disorder Candidate Genes to Random Genes.

Shown are the number of genes cited in connection with ASD in DisGeNET with pipeline attributes compared to the average number of genes across 1000 random sets. Included for each attribute are results of chi-square tests comparing the proportion of genes annotated in the ASD list (*n* = 956) to the average proportion of genes across all random lists. Uncorrected and Benjamini-Hochberg corrected *p*-values are included. Tclin = FDA-approved compound targets, Tchem = molecules with known properties similar to approved drug targets, Tbio = proteins with known biological or molecular functions but no known drug target properties, Tdark = proteins with relatively unknown function, sd = standard deviation, CI = confidence interval.

#### *2.1. ASD Candidate Gene Expression in Human Brain*

There were 861 (90.1%) ASD genes which were expressed at TPM ≥ 0.5 in at least one of the brain regions evaluated using available RNA-sequencing data from typical human tissue. This was increased when compared to random gene sets (84.6%, X<sup>2</sup> = 21.36, *p*<sub>corrected</sub> = 5.07 × 10<sup>−6</sup>; Table 1). Notably, ASD-related genes were expressed at different levels in human brain tissue available in GTEx when compared to expression patterns of the random gene sets (*F*(13, 13117) = 2.19, Pillai = 0.002, *p* = 7.83 × 10<sup>−3</sup>). These differences related almost entirely to increased levels of ASD gene expression in the pituitary gland (*F*(1, 15317) = 14.10, *p*<sub>corrected</sub> = 2.24 × 10<sup>−3</sup>; Figure 1). No other evaluated brain regions showed evidence of differential expression of ASD genes compared to genes in the random sets (Supplementary Table S2).
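
The expression criterion (TPM ≥ 0.5 in at least one evaluated brain region) amounts to a simple filter. The sketch below is illustrative only: the gene names, region names, and TPM values are made up and are not GTEx data, and the function name is hypothetical.

```python
TPM_THRESHOLD = 0.5  # threshold stated in the text

def expressed_in_brain(region_tpm, threshold=TPM_THRESHOLD):
    """True if the gene reaches the threshold in at least one brain region."""
    return any(tpm >= threshold for tpm in region_tpm.values())

# Hypothetical per-region TPM values for three placeholder genes
expression = {
    "GENE_A": {"cortex": 12.3, "pituitary": 40.1},
    "GENE_B": {"cortex": 0.1, "pituitary": 0.4},
    "GENE_C": {"cortex": 0.5, "pituitary": 0.0},
}

brain_expressed = [gene for gene, tpm in expression.items() if expressed_in_brain(tpm)]
print(brain_expressed)  # GENE_B is excluded: below threshold in every region
```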

**Figure 1.** Brain Expression Profiles of ASD Candidate Gene Set and Random Gene Sets. Shown are average expression levels for ASD and random gene sets, based on transcripts per million, for each brain region obtained from typical human tissue in GTEx. Average expression levels of each gene set in the respective brain regions are clustered based on Euclidean measures. The ASD risk gene cluster has been amplified for easier visualization and is highlighted in red on the y-axis.

#### *2.2. ASD Candidate Genes Associated with Mammalian Phenotypes*

There were 225 biological processes defined in humans and overrepresented in ASD candidate genes. There were also 258 processes defined in mice that were overrepresented among ASD gene mouse orthologs; 200 of these overlapped with biological processes overrepresented for ASD genes in humans (Supplementary Table S3). Products from 931 genes (96.5% of all ASD genes) were involved in these processes. Seven terms reflecting Mammalian Phenotypes (MP) defined in the IMPC were mapped to a GO-defined biological process overrepresented among the ASD genes. Phenotypes included 'abnormal postnatal growth/weight/body size' (MP:0002089), 'decreased body size' (MP:0001265), 'increased body size' (MP:0001264), 'abnormal nervous system morphology' (MP:0003632), 'abnormal brain development' (MP:0000913), and 'abnormal embryo development' (MP:0001672). These traits were represented by three top level terms (i.e., 'growth/size/body region phenotype' [MP:0005378], 'nervous system phenotype' [MP:0003631], 'embryo phenotype' [MP:0005380]). More ASD genes (9.2%) were associated with at least one trait under these top-level terms in mouse knockouts when compared to genes in the random sets (Table 1, X<sup>2</sup> = 5.42, *p*<sub>corrected</sub> = 1.99 × 10<sup>−2</sup>). Specifically, more ASD genes were associated with abnormal postnatal growth/size/body region phenotypes when knocked out in mouse models (Table 2, X<sup>2</sup> = 8.60, *p*<sub>corrected</sub> = 1.01 × 10<sup>−2</sup>). The most prevalent of these types of traits was 'decreased lean body mass' (MP:0002089), followed by 'decreased body length' (MP:0002089; Figure 2).

**Table 2.** Proportions of Autism Spectrum Disorder Genes Associated with Specific Top Level Mammalian Phenotypes.


Shown are results comparing the number of ASD candidate genes that were associated with a mammalian phenotype reflecting overrepresented Gene Ontology Biological Processes defined in humans, compared to the average number of genes across 1000 random sets. Chi-square test results are based on the proportion of genes in the ASD list (*n* = 956) compared to the average proportion of genes across all random lists. Uncorrected and Benjamini-Hochberg corrected *p*-values are included. sd = standard deviation.
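
For readers unfamiliar with the Benjamini-Hochberg correction mentioned in the table notes, the step-up procedure can be sketched as follows. The input *p*-values are hypothetical, since the set of tests that were corrected together is not listed here, and the function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical uncorrected p-values, for illustration only
raw_p = [0.01, 0.04, 0.03, 0.005]
print([round(p, 4) for p in benjamini_hochberg(raw_p)])
```

Note that the largest uncorrected *p*-value is left unchanged by this procedure, because it is multiplied by m/m.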

#### *2.3. ASD Candidate Genes Influencing Drug Response*

Compared to random genes, ASD candidate genes encoded more potential drug targets or contained variants that may inform pharmaceutical treatment (Table 1). Specifically, an increased proportion of genes cited in connection with ASD (11.8%) encoded FDA-approved drug targets (X<sup>2</sup> = 219.18, *p*<sub>corrected</sub> = 6.98 × 10<sup>−49</sup>), or had molecular properties similar to approved targets (15.5%, X<sup>2</sup> = 60.25, *p*<sub>corrected</sub> = 1.34 × 10<sup>−14</sup>). Furthermore, more ASD genes (13.0%) contained variants with evidence for significant pharmacogenetic effects (X<sup>2</sup> = 103.25, *p*<sub>corrected</sub> = 5.92 × 10<sup>−24</sup>).

#### *2.4. ASD-ACMG Protein Interactions*

There were nine ASD risk genes with evidenced pathogenic variants for which the ACMG recommends clinical testing (Supplementary Table S1). In addition, an increased proportion of proteins encoded by candidate genes (49.7%) had predicted direct interactions with proteins encoded by ACMG-recommended genes compared to proteins encoded by genes in the 1000 random sets (Table 1, X<sup>2</sup> = 122.98, *p*<sub>corrected</sub> = 3.76 × 10<sup>−28</sup>). On average, ASD-related proteins formed six connections (x̄ = 6.32, sd = 14.20) with ACMG proteins; however, there was no evidence that candidate proteins formed more connections when compared to proteins encoded by genes in any of the random sets (X<sup>2</sup> = 824.34, *p* = 9.99 × 10<sup>−1</sup>).
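
The two summaries reported above (the proportion of candidate proteins with at least one direct ACMG partner, and the mean number of connections) can be derived from an undirected edge list. The sketch below is illustrative only: the protein names and edges are hypothetical and do not come from STRINGdb, and the function name is an assumption.

```python
from statistics import mean

def acmg_partner_counts(edges, asd_proteins, acmg_proteins):
    """Count distinct ACMG partners directly linked to each ASD protein."""
    partners = {protein: set() for protein in asd_proteins}
    for a, b in edges:
        # Edges are undirected, so check both orientations
        if a in partners and b in acmg_proteins:
            partners[a].add(b)
        if b in partners and a in acmg_proteins:
            partners[b].add(a)
    return {protein: len(found) for protein, found in partners.items()}

# Hypothetical interaction edges (the duplicate edge is counted once)
edges = [("ASD1", "ACMG1"), ("ACMG2", "ASD1"), ("ASD2", "ACMG1"),
         ("ASD2", "OTHER"), ("ASD1", "ACMG1")]
asd = {"ASD1", "ASD2", "ASD3"}
acmg = {"ACMG1", "ACMG2"}

counts = acmg_partner_counts(edges, asd, acmg)
fraction_connected = sum(c > 0 for c in counts.values()) / len(counts)
print(counts, round(fraction_connected, 3), mean(counts.values()))
```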

#### *2.5. Evaluation of Pathogenic Variants in Prioritized ASD Candidate Genes*

Of the 27,622 total genes harboring variants as curated in the September 2020 update of ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/), 900 were genes cited in connection with ASD and 487 of these had a variant reported to be pathogenic or likely pathogenic. This represents approximately half (50.9%) of the full list of 956 ASD-related genes included in the DisGeNET. Of the 18 ASD-related genes that were prioritized via our pipeline (Figure 3), 78% (*n* = 14) had a variant reported to be pathogenic or likely pathogenic (Table 3, Supplementary Table S1).
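The two percentages quoted above follow directly from the reported counts. The short check below is only an arithmetic illustration; the variable and function names are ours and are not part of the published pipeline.

```python
# Counts quoted in the paragraph above.
asd_genes_total = 956          # ASD-related genes in the DisGeNET-derived list
asd_genes_pathogenic = 487     # of these, genes with a pathogenic/likely pathogenic variant
prioritized_total = 18         # genes annotated in every pipeline category
prioritized_pathogenic = 14    # prioritized genes with a pathogenic/likely pathogenic variant


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


full_list_pct = percent(asd_genes_pathogenic, asd_genes_total)
prioritized_pct = percent(prioritized_pathogenic, prioritized_total)
print(f"{full_list_pct:.1f}% of the full list, {prioritized_pct:.0f}% of the prioritized set")
```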

**Figure 3.** Prioritized ASD Proteins and ACMG Proteins Interaction Network. Shown are direct interactions predicted between proteins encoded by ASD candidate genes annotated in all pipeline categories and proteins encoded by American College of Medical Genetics (ACMG) recommended actionable genes. Top ASD-related proteins are highlighted in yellow, ACMG-recommended proteins in blue. Notably, none of the prioritized ASD genes were represented among the 59 genes compiled by the ACMG.


**Table 3.** Prioritized Genes with Pathogenic/Likely Pathogenic Variants.

Details are shown for the 14 Autism Spectrum Disorder (ASD) genes annotated in all functional categories included in the pipeline that had pathogenic variants defined in ClinVar. Included are human gene symbols, the brain region of highest expression in Genotype Tissue Expression (GTEx), traits associated in mouse knockouts, the corresponding Gene Ontology (GO) Biological Process (BP) defined in humans, the drug development level, and the PharmGKB ID for pharmacogenetic variant guidelines. TPM = transcripts per million, Tclin = FDA-approved compound targets, Tchem = molecules with known properties similar to approved targets, CNS = central nervous system, NA = not available.

## **3. Discussion**

The central goal of this study was to build a bioinformatics pipeline to rapidly detect specific candidate genes that are more likely to be clinically relevant based on functional evidence and drug target properties. Compared to random gene sets, more ASD-related genes were expressed in control post-mortem brain tissue. In addition, more ASD genes were associated with postnatal growth phenotypes when knocked out in mice. Furthermore, ASD genes encoded more targets for FDA-approved drugs or bioactive molecules with drug-like properties, and more had a pharmacogenetic variant. Proteins encoded by ASD genes also showed more evidence for direct interactions with proteins encoded by the ACMG recommended clinically actionable genes. These results suggest that all of the functional categories incorporated in our pipeline were useful in identifying the candidate genes most likely to be functionally and clinically relevant. While there are numerous challenges limiting the clinical utility of information from genetic studies of complex conditions, the pipeline we developed addresses a key issue related to knowledge gaps among physicians regarding the benefits of genetic testing [44]. As incorporating genomic technology into patient care is largely dependent on clinicians' perspectives of its utility, it is important to identify ways to better inform clinicians about relevant genetic findings beneficial to optimizing treatment. Developing tools that summarize the extensive amounts of data available and present them in a manner that offers more immediate insights into the benefits for patient care is a necessary step toward more efficient integration of this knowledge into the clinic. Ultimately, we expect evidence from our pipeline will be useful in supporting clinical decisions regarding the benefits of genetic testing for optimizing treatment. 
Results from our work may also inform future research focused on drug repurposing, characterizing pleiotropic genetic effects connecting clinically distinct conditions or prioritizing genes for in vitro or in vivo studies aimed at elucidating molecular mechanisms underlying expression of symptoms in ASD.

#### *3.1. ASD-Related Gene Expression in the Pituitary Is Increased Compared to Random Sets*

Although more ASD genes overall were expressed in the brain, evidence of differential expression was limited to the pituitary. Genes cited in connection with ASD were expressed at higher levels here compared to genes included in the random sets. During embryogenesis, normal genetic regulation of pituitary gland development results in typical production of numerous hormones and the appropriate release of neuropeptides, like oxytocin and arginine vasopressin [45]. Many of the hormones and neuropeptides produced and regulated by the pituitary have been suspected to be disrupted in individuals with ASD, as well as other neuropsychiatric disorders [46]. Notably, brain region specific gene expression data were obtained from individuals without evidence of disease [13]. It is possible that in individuals with ASD, dysregulation of these pituitary-expressed genes results in the development of a dysfunctional pituitary and subsequent issues with neurotransmission of molecules important to expression of social and repetitive behaviors, which are core symptoms of ASD [47]. It is unclear why other brain regions did not show evidence of differential expression of ASD candidate genes when compared to random sets of genes. We chose to use the GTEx resource because these data are more readily accessible (and likely more generalizable to other conditions) than resources containing data generated from ASD brain tissue; however, this choice may limit the ability to understand genetic differences that may be unique to the brains of individuals with ASD. Specifically, it is hypothesized that brain regions involved in social behavior drive ASD symptomatology; these include the amygdala, orbitofrontal cortex, temporoparietal cortex and insula [48]. With the exception of the amygdala, gene expression in the precise brain regions that are proposed biomarkers for ASD is not quantified in the GTEx resource, potentially explaining why our results were limited to the pituitary. 
As opposed to prioritizing genes solely based on brain expression, focusing future work on prioritizing genes that are differentially expressed in the brains of patients with ASD may be more useful to pinpointing clinically relevant genetic data for ASD specifically.

#### *3.2. ASD-Related Genes Associated with Abnormal Postnatal Growth in Mouse Knockouts*

ASD candidates were also more likely to be associated with growth/size/body region phenotypes when knocked out in mouse models included in the Knockout Mouse Phenotyping (KOMP) project. This result supports a risk model for neurodevelopmental disorders that reflects deviations from typical body size during development [49]. However, it should be noted that while the ambitious goal of the KOMP project is to eventually test for associations between the extensive phenotypes included in the pipeline and every protein coding gene in the mouse, a number of genes have yet to be evaluated. For example, the *Phosphatase and Tensin Homolog* (*PTEN*) gene was excluded during this annotation step. *PTEN* is well-known to contain pathogenic clinically-actionable variants and has not yet been tested via the KOMP pipeline (https://www.mousephenotype.org/data/genes/MGI:109583). This is likely because this gene is essential and is homozygous lethal as a null, resulting in a limited embryo phenotype [50]. Notably, *PTEN* has been well studied in disease-specific mouse KO projects and these data are available via the Mouse Genome Informatics (MGI) resource (http://www.informatics.jax.org/marker/key/31908). Unfortunately, MGI data are not readily accessible via an application program interface (API), like the IMPC data, making it more difficult to incorporate them into an automated pipeline. Furthermore, there may be bias introduced when incorporating evidence obtained from data generated in disease-specific studies, as many of these focus on confirming reports in humans and may only test for specific traits reflecting the implicated human disease [51,52]. It is expected that as the IMPC database expands to include more genes, the automated gene prioritization pipeline we developed will also improve. Future work may focus on determining how much mouse model data contribute to the prioritization accuracy of our pipeline. In addition, we may specifically incorporate information regarding embryonic lethal phenotypes as part of the gene annotations.

#### *3.3. More ASD Genes Are Implicated in Drug Response*

Also notable is that more ASD genes encode molecules that are useful to drug development, supporting the ability to maximize drug efficacy while minimizing adverse events. As mentioned above, numerous treatment regimens using pharmaceutical compounds have been developed to address specific symptoms and co-occurring conditions in ASD; however, there is insufficient evidence supporting efficacy for many compounds, and many adverse events have been reported [7–9]. Functional characterization of implicated genes that encode targets for drugs not currently used to treat symptoms in ASD may offer opportunity for repurposing [53]. Additionally, identifying genes that encode similar molecules to current targets may help pinpoint genes and pathways that should be fast-tracked for functional study as novel targets. Furthermore, as many individuals carry pharmacogenetic variants that influence drug response [54], identifying these variants in ASD genes that encode targets for drugs used to treat ASD symptoms is an important avenue of research that may provide results helpful toward more effective personalized treatment of symptoms in ASD.

#### *3.4. More ASD-Related Proteins Interact with ACMG Gene Encoded Proteins*

Genes that are recommended for testing by the ACMG are almost entirely those containing variants having a strong relationship with increased cancer risk [42,55]. Given the mounting evidence that much of the genetic architecture underlying expression of cancer is shared with ASD [56], it is interesting that more ASD proteins were predicted to directly interact with ACMG proteins. It is possible that these results support a connection between ASD and cancer etiology. Regardless, identifying specific ASD-related proteins that interact with cancer susceptibility proteins may be important for recognizing genes with disease-causing variants that should be evaluated at the bench and in the clinic.

#### *3.5. Prioritized ASD Candidate Genes More Likely to Have Pathogenic Variants*

Ultimately, the pipeline we developed identified a subset of ASD-related genes with a higher proportion of pathogenic or likely pathogenic variants compared to the initial list of genes cited in connection with ASD. This suggests the approach is useful for deciphering clinically relevant genetic results from all of the currently available evidence. Notably, the pipeline is also useful in a broader sense for functional annotation of gene lists to select genes that are worthy of further study. By automating this process as much as possible, clinically useful results for hundreds of genes implicated in ASD can be delivered rapidly.

#### *3.6. Limitations and Future Directions*

Limitations for each step in the pipeline and future work aimed at overcoming these limitations have been noted throughout. An additional limitation of the approach is the inability to automate the pipeline in its entirety, which is due to how many of the resources utilized are made available. As detailed in the Materials and Methods, resources that were considered important for ensuring accuracy of the pipeline (e.g., DRSC Integrative Ortholog Prediction Tool, Ontology Mapping repository) did not have an API, thereby requiring direct downloads of these data. A future goal to circumvent this automation issue is to establish a database with cached data to allow for complete automation. In addition, the genetic landscape is dynamic, meaning results obtained at one specific time point represent only a snapshot of the available information, and this may be potentially misleading. Future work will focus on regularly incorporating updates into pipeline automation to ensure the most current evidence is incorporated.

## **4. Materials and Methods**

A schematic of the entire functional annotation pipeline is provided in Figure 4. An initial list of ASD candidate genes was identified by querying data available in the DisGeNET 6.0 database of human gene-disease associations [17], v1.1.0 update May 2019 (http://www.disgenet.org). A benefit of this database is that its gene-disease associations are identified via text-mining of multiple sources and include evidence from studies of idiopathic ASD cases (including cases from multiplex families and sporadic cases from simplex families), as well as studies conducted in individuals with underlying familial and non-familial syndromes. All Unified Medical Language System Concept Unique Identifiers (CUI) that relate to ASD were selected from the disease mappings file provided by DisGeNET, and a data frame of genes with evidence for a relationship with CUIs relating to ASD was created. CUIs queried in DisGeNET were as follows: Autism Spectrum Disorders [C1510586], Autistic behavior [C0856975], Autistic Disorder [C0004352], Autistic features [C1846135], and Autistic spectrum disorder with isolated skills [C1298684].
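The CUI-based selection step can be sketched as a simple filter over a gene-disease association table. The example below is a minimal illustration only: the five CUIs are those listed above, whereas the column names (`geneSymbol`, `diseaseId`), the placeholder gene names, and the non-ASD identifier `C9999999` are hypothetical and do not reproduce the actual DisGeNET file layout.

```python
import csv
import io

# The five ASD-related CUIs listed in the text.
ASD_CUIS = {"C1510586", "C0856975", "C0004352", "C1846135", "C1298684"}

# Toy stand-in for a gene-disease association file (tab-separated).
# Column names, gene names, and the CUI C9999999 are hypothetical placeholders.
SAMPLE_TSV = "\n".join([
    "geneSymbol\tdiseaseId",
    "GENE_A\tC1510586",
    "GENE_B\tC0004352",
    "GENE_A\tC0004352",   # same gene linked to a second ASD CUI
    "GENE_C\tC9999999",   # association with a non-ASD concept
])


def asd_candidate_genes(tsv_text, cuis=ASD_CUIS):
    """Return the sorted, de-duplicated genes linked to any of the given CUIs."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return sorted({row["geneSymbol"] for row in reader if row["diseaseId"] in cuis})


candidates = asd_candidate_genes(SAMPLE_TSV)
print(candidates)
```

A gene linked to several ASD-related CUIs is counted once, which matches the idea of a single candidate gene list.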

Next, the list of 'ASD-related' genes was annotated to identify those expressed in the human brain. We used RNA-sequencing data from non-diseased brain tissue made available via the Genotype-Tissue Expression (GTEx) project, an ongoing collaborative effort to build a comprehensive public resource to study tissue-specific gene expression and regulation [13,57]. Genes expressed at a default cut-off of ≥ 0.5 transcripts per million (TPM) across all available brain regions in GTEx were downloaded from the Expression Atlas (https://www.ebi.ac.uk/gxa/home) and queried for ASD-related genes. Available brain regions included the amygdala (*n* = 129 individuals), anterior cingulate cortex (*n* = 147), caudate (*n* = 194), cerebellar hemisphere (*n* = 175), cerebellum (*n* = 209), frontal cortex (*n* = 175), hippocampus (*n* = 165), hypothalamus (*n* = 170), nucleus accumbens (*n* = 202), putamen (*n* = 170) and substantia nigra (*n* = 114).
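The expression cut-off can be illustrated with a small filter. This is a sketch under stated assumptions: the TPM values and gene names are invented placeholders, and we assume that a gene counts as brain-expressed when it reaches the cut-off in at least one region, a reading that the text does not state explicitly.

```python
TPM_CUTOFF = 0.5  # default Expression Atlas cut-off cited in the text

# Hypothetical TPM values for placeholder genes in two of the listed regions.
expression_tpm = {
    "GENE_A": {"amygdala": 12.3, "hippocampus": 8.1},
    "GENE_B": {"amygdala": 0.2, "hippocampus": 0.7},
    "GENE_C": {"amygdala": 0.1, "hippocampus": 0.0},
}


def regions_expressed(tpm_by_region, cutoff=TPM_CUTOFF):
    """List the regions in which a gene meets or exceeds the TPM cut-off."""
    return sorted(region for region, tpm in tpm_by_region.items() if tpm >= cutoff)


def brain_expressed_genes(table, cutoff=TPM_CUTOFF):
    """Keep genes expressed in at least one region (assumed criterion)."""
    result = {}
    for gene, tpm_by_region in table.items():
        regions = regions_expressed(tpm_by_region, cutoff)
        if regions:
            result[gene] = regions
    return result


brain_expressed = brain_expressed_genes(expression_tpm)
print(brain_expressed)
```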

To further identify genes encoding proteins that function in biological processes relevant to ASD etiology, conditional gene set overrepresentation analyses were conducted comparing candidate genes to all human protein coding genes included in Ensembl release 99 [58]. We used the parent-child algorithm from the topGO R package version 2.38.1 (The R Foundation for Statistical Computing, Vienna, Austria) for overrepresentation analyses, which incorporates knowledge about hierarchical relationships between GO terms into the calculation of statistical significance [59,60]. Significance was determined using Fisher's exact test and set at a Bonferroni-adjusted level of α = 0.05/*n*, where *n* is the number of nodes tested. As a goal was to find genes with a phenotypic consequence when knocked out in mice, we then mapped human genes to the most likely mouse orthologs using the best match from all prediction tools available via the DRSC Integrative Ortholog Prediction Tool (Version 8.0 August 2019; https://www.flyrnai.org/diopt). Only orthologs that had a high rank, indicating that the number of tools that support the orthologous gene-pair relationship was ≥ 2, and that this ortholog had the best score in both forward and reverse mapping, were included in subsequent analyses [61]. Overrepresentation analyses were then conducted comparing mouse orthologs for ASD genes to all mouse protein coding genes as described above for human genes. Biological processes overrepresented for human ASD genes and mouse ASD gene orthologs were cross-referenced to identify overlap. We then mapped significantly overrepresented GO terms defined in both humans and mice to mammalian phenotype terms (MP) using direct mapping (i.e., distance = 1) from the Ontology Mapping repository, OxO (https://www.ebi.ac.uk/spot/oxo, updated 11 September 2020), which is hosted by the Ontology Lookup Service [62]. The genotype-phenotype representational state transfer API from the IMPC (http://www.mousephenotype.org/) [63] was subsequently queried using the jsonlite R package v1.6.1 (The R Foundation for Statistical Computing, Vienna, Austria) [64] to prioritize genes that have a phenotypic consequence (associated with *p* ≤ 0.05 in KOs of either sex) representing top level MP terms that reflect the ASD gene overrepresented GO-defined processes.
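Two of the thresholds described above, the Bonferroni-adjusted significance level and the high-rank ortholog criterion, can be written out explicitly. The sketch below is illustrative only: the `OrthologCall` record, its field names, and the example gene pairs are hypothetical and do not mirror the DIOPT output format.

```python
from dataclasses import dataclass


def bonferroni_alpha(n_tests, family_alpha=0.05):
    """Per-test significance level: alpha = 0.05 / n for n tested GO nodes."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


@dataclass(frozen=True)
class OrthologCall:
    """Hypothetical summary of one human-to-mouse ortholog prediction."""
    human_gene: str
    mouse_gene: str
    supporting_tools: int   # number of prediction tools supporting the pair
    best_forward: bool      # best score when mapping human -> mouse
    best_reverse: bool      # best score when mapping mouse -> human


def keep_ortholog(call, min_tools=2):
    """High-rank criterion: >= 2 supporting tools and best score in both directions."""
    return call.supporting_tools >= min_tools and call.best_forward and call.best_reverse


calls = [
    OrthologCall("GENE_A", "Gene_a", 5, True, True),
    OrthologCall("GENE_B", "Gene_b", 1, True, True),    # too few supporting tools
    OrthologCall("GENE_C", "Gene_c", 4, True, False),   # not best in reverse mapping
]
kept = [call.human_gene for call in calls if keep_ortholog(call)]
print(kept, bonferroni_alpha(100))
```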

**Figure 4.** Candidate Gene Identification and Annotation Pipeline. All human protein coding genes defined in Ensembl, with evidence for influencing risk for concept unique identifiers reflecting Autism Spectrum Disorder (ASD) were identified from the DisGeNET resource which compiles evidence from all sources included in the corresponding table on the left side of the schematic. The initial list of 956 ASD candidate genes was then annotated using the publicly available resources included on the right side of the schematic, based on the criteria indicated above each arrow for each annotation step. TPM = transcripts per million, Tclin = FDA-approved compound targets, Tchem = molecules with properties similar to approved targets.

To identify genes encoding currently approved drug targets or those that may be novel drug targets, we used jsonlite to query the Pharos database (https://pharos.nih.gov/idg/api). Available drug development levels included: FDA-approved compound targets (Tclin), molecules with known properties similar to approved drug targets (Tchem), proteins with known biological or molecular functions but no known drug target properties (Tbio), and proteins with relatively unknown function (Tdark). To identify genes with pharmacogenetic variants evidenced to influence individual responses to drugs, we directly downloaded variant and clinical annotations data from PharmGKB (https://www.pharmgkb.org/downloads), update 20 March 2020. Only those associations that were reported in the literature as significant were used in annotations.

To decipher genes that are potentially actionable, protein-protein interactions between ASD candidate proteins and proteins currently recommended as clinically actionable by the ACMG were predicted using STRINGdb R package v10 (The R Foundation for Statistical Computing, Vienna, Austria) [65]. The ACMG 2016 update SF v2.0 [42] was used for network prediction (https://www.ncbi.nlm.nih.gov/clinvar/docs/acmg/). We focused on ASD-related proteins with evidence for direct interactions with proteins encoded by ACMG-recommended genes. We also calculated the quantity of direct interactions between proteins in this network. ACMG-ASD protein interaction networks were visualized using Cytoscape v3.5.1 (The Cytoscape Consortium, San Diego, CA, USA) [66].
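Counting direct interactions amounts to tallying, for each ASD-related protein, its distinct ACMG neighbors in an undirected edge list. The following sketch uses an invented toy network with placeholder protein names; it does not call STRINGdb and only illustrates the bookkeeping.

```python
from collections import defaultdict


def direct_acmg_partners(edges, asd_proteins, acmg_proteins):
    """Map each ASD protein to its distinct direct ACMG interaction partners.

    Edges are treated as undirected, and duplicate edges are counted once.
    """
    partners = defaultdict(set)
    for a, b in edges:
        if a in asd_proteins and b in acmg_proteins:
            partners[a].add(b)
        if b in asd_proteins and a in acmg_proteins:
            partners[b].add(a)
    return {p: sorted(partners.get(p, ())) for p in sorted(asd_proteins)}


# Toy network with hypothetical protein identifiers.
edges = [
    ("ASD_1", "ACMG_1"),
    ("ACMG_2", "ASD_1"),
    ("ASD_2", "ASD_1"),    # ASD-ASD edge, not counted
    ("ASD_1", "ACMG_1"),   # duplicate edge, counted once
]
asd = {"ASD_1", "ASD_2", "ASD_3"}
acmg = {"ACMG_1", "ACMG_2"}

partner_map = direct_acmg_partners(edges, asd, acmg)
share_interacting = sum(1 for v in partner_map.values() if v) / len(partner_map)
mean_connections = sum(len(v) for v in partner_map.values()) / len(partner_map)
print(partner_map, share_interacting, mean_connections)
```

The two summaries computed at the end correspond to the two quantities reported in the Results: the proportion of proteins with at least one direct ACMG interaction and the average number of connections per protein.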

To help show that the pipeline was useful for prioritizing clinically relevant candidate genes, we selected 1000 random sets of protein coding genes in humans (Ensembl release 99), while allowing for replacement, of equal number to the ASD gene list using the biomaRt R package v 2.42.1 (The R Foundation for Statistical Computing, Vienna, Austria) [67] and annotated them as described above. A one-proportion test was used to determine if the proportion of genes in the ASD set that were annotated was increased compared to the average proportion of genes across all random sets (i.e., the expected proportion). Significance was based on a Benjamini-Hochberg false discovery rate corrected at *p* ≤ 0.05. To determine if specific brain regions were enriched for ASD candidate gene expression compared to random genes, we used the limma package in R v 3.38.3 (The R Foundation for Statistical Computing, Vienna, Austria) [68]. The average expression of each gene set in the respective brain regions was clustered based on Euclidean measures. Tests for differences in the average gene expression of ASD genes and the average expression across genes in the random sets among brain regions were done using multivariate analysis of variance. Significance was based on a Benjamini-Hochberg false discovery rate corrected at *p* < 0.05. The Kruskal-Wallis test was used to determine if the number of direct connections between ACMG-recommended gene encoded proteins were different across the ASD and random sets. Evidence included in ClinVar expanding upon current ACMG recommendations was also used as a gold standard reference to determine how often ASD genes with confirmed pathogenic or likely pathogenic variants were prioritized.
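The random-set comparison described above can be sketched in three steps: estimate the expected proportion from random gene sets, test the observed proportion against it, and correct the resulting *p*-values. The code below is a simplified illustration with invented toy data. It assumes a chi-square form of the one-proportion test without continuity correction, which may differ from the defaults of the software used by the authors, and it assumes that genes are drawn with replacement within each random set.

```python
import math
import random


def one_proportion_chi2(successes, n, expected_p):
    """One-proportion chi-square test (1 df, no continuity correction)."""
    if not 0.0 < expected_p < 1.0:
        raise ValueError("expected_p must lie strictly between 0 and 1")
    stat = (successes - n * expected_p) ** 2 / (n * expected_p * (1.0 - expected_p))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p_value


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):             # walk from largest to smallest p-value
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def expected_proportion(universe, annotated, set_size, n_sets=1000, seed=0):
    """Average annotated proportion across random gene sets drawn with replacement."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sets):
        sample = rng.choices(universe, k=set_size)
        total += sum(gene in annotated for gene in sample) / set_size
    return total / n_sets


# Toy data: 1000 placeholder genes, 10% of which carry a hypothetical annotation.
universe = [f"GENE_{i}" for i in range(1000)]
annotated = {gene for i, gene in enumerate(universe) if i % 10 == 0}
p0 = expected_proportion(universe, annotated, set_size=50)

# Hypothetical candidate list: 15 of 50 genes annotated.
stat, p = one_proportion_chi2(successes=15, n=50, expected_p=p0)
adjusted = benjamini_hochberg([p, 0.04, 0.20])  # the other p-values are made up
print(round(p0, 3), round(stat, 2), adjusted)
```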

**Supplementary Materials:** Supplementary Materials can be found at http://www.mdpi.com/1422-0067/21/23/9029/s1.

**Author Contributions:** Conceptualization, O.J.V., M.G.B., S.H.E., B.A.M., J.S.S., J.H.M.; methodology, validation, investigation, formal analysis, data curation, and writing—original draft preparation, O.J.V.; writing—review and editing, M.G.B., S.H.E., B.A.M., J.S.S., J.H.M.; supervision, J.H.M.; funding acquisition, O.J.V. and J.H.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Library of Medicine, K01LM012870 (OJV) and R01LM010098 (JHM). The funders had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Genomic, Clinical, and Behavioral Characterization of 15q11.2 BP1-BP2 Deletion (Burnside-Butler) Syndrome in Five Families**

**Isaac Baldwin 1,2,†, Robin L. Shafer 3,†, Waheeda A. Hossain 1,2, Sumedha Gunewardena 4, Olivia J. Veatch 1,4, Matthew W. Mosconi 3,5 and Merlin G. Butler 1,2,\***


**Abstract:** The 15q11.2 BP1-BP2 deletion (Burnside-Butler) syndrome is emerging as the most common cytogenetic finding in patients with neurodevelopmental or autism spectrum disorders (ASD) presenting for microarray genetic testing. Clinical findings in Burnside-Butler syndrome include developmental and motor delays, congenital abnormalities, learning and behavioral problems, and abnormal brain findings. To better define symptom presentation, we performed comprehensive cognitive and behavioral testing, collected medical and family histories, and conducted clinical genetic evaluations. The 15q11.2 BP1-BP2 region includes the *TUBGCP5*, *CYFIP1*, *NIPA1*, and *NIPA2* genes. To determine if additional genomic variation outside of the 15q11.2 region influences expression of symptoms in Burnside-Butler syndrome, whole-exome sequencing was performed on the parents and affected children for the first time in five families with at least one parent and child with the 15q11.2 BP1-BP2 deletion. In total, there were 453 genes with possibly damaging variants identified across all of the affected children. Of these, 99 genes had exclusively de novo variants and 107 had variants inherited exclusively from the parent without the deletion. There were three genes (*APBB1*, *GOLGA2*, and *MEOX1*) with de novo variants that encode proteins evidenced to interact with CYFIP1. In addition, one other gene of interest (*FAT3*) had variants inherited from the parent without the deletion and encoded a protein interacting with CYFIP1. The affected individuals commonly displayed a neurodevelopmental phenotype including ASD, speech delay, abnormal reflexes, and coordination issues along with craniofacial findings and orthopedic-related connective tissue problems. Of the 453 genes with variants, 35 were associated with ASD. On average, each affected child had variants in 6 distinct ASD-associated genes (*x̄* = 6.33, sd = 3.01). 
In addition, 32 genes with variants were included on clinical testing panels from Clinical Laboratory Improvement Amendments (CLIA) approved and accredited commercial laboratories reflecting other observed phenotypes. Notably, the dataset analyzed in this study was small and reported results will require validation in larger samples as well as functional follow-up. Regardless, we anticipate that results from our study will inform future research into the genetic factors influencing diverse symptoms in patients with Burnside-Butler syndrome, an emerging disorder with a neurodevelopmental behavioral phenotype.

**Keywords:** 15q11.2 BP1-BP2 deletion; Burnside-Butler syndrome; clinical findings; cognition; neuropsychiatric behavior development; genomic characterization; exome sequencing; protein–protein interaction

**Citation:** Baldwin, I.; Shafer, R.L.; Hossain, W.A.; Gunewardena, S.; Veatch, O.J.; Mosconi, M.W.; Butler, M.G. Genomic, Clinical, and Behavioral Characterization of 15q11.2 BP1-BP2 Deletion (Burnside-Butler) Syndrome in Five Families. *Int. J. Mol. Sci.* **2021**, *22*, 1660. https://doi.org/10.3390/ijms22041660

Academic Editor: Paola Bonsi Received: 15 January 2021 Accepted: 2 February 2021 Published: 7 February 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## **1. Introduction**

Chromosome 15 abnormalities have been reported for a number of years in the medical literature, specifically for Prader-Willi (PWS) and Angelman (AS) syndromes, the first examples of genomic imprinting in humans [1–4]. These disorders are generally due to a chromosome 15q11-q13 deletion depending on the parent-of-origin (i.e., PWS—paternal, AS—maternal). The typical 15q11-q13 deletions are classified as either Type I with deletions involving the proximal 15q breakpoint (BP1) and a distal 15q breakpoint (BP3), or Type II relating to a smaller 15q11-q13 deletion involving a second proximal breakpoint (BP2) and distal BP3. The larger Type I deletion is approximately 6 Mb and includes the *TUBGCP5*, *CYFIP1*, *NIPA1*, and *NIPA2* genes, while the smaller Type II deletion is approximately 5.5 Mb with all four genes intact [5].

Clinical and behavioral differences have been reported for the past 15 years involving specific deletion classes in both PWS and AS. For example, individuals with PWS or AS having the Type I deletion generally have more learning and behavioral problems compared to those with the Type II deletion [6,7]. Specifically, patients with PWS and larger deletions have more compulsions and maladaptive behaviors, as well as lower cognition, reading and math skills when compared to PWS patients with smaller deletions [3,5,6,8]. In AS, more impaired speech and seizure activity are noted in individuals with the larger deletion [4].

The emerging 15q11.2 BP1-BP2 microdeletion (Burnside-Butler) syndrome (BBS) encompasses the region between the PWS/AS chromosome 15q deletion breakpoints and includes the *TUBGCP5*, *CYFIP1*, *NIPA1*, and *NIPA2* genes. This microdeletion was consistently reported in early studies of patients presenting with unexplained behavioral, cognitive, and/or psychiatric problems [9–11]. Ho et al. [12] later summarized the results of ultra-high microarray single nucleotide polymorphism (SNP) analysis and found this microdeletion to be the most common cytogenetic finding observed in over 10,000 consecutive patients studied and presenting for genetic services with features of ASD or other neurodevelopmental disorders. Furthermore, a systematic literature review by Cox and Butler [10] found over 200 individuals reported with this microdeletion and grouped the clinical findings into five categories: (1) developmental, speech, and motor delays (73%, 67%, and 42% of cases, respectively); (2) dysmorphic ears and palatal anomalies (46%); (3) writing and reading impairment, memory problems, and verbal IQ scores ≤ 75 (50–60%); (4) general behavior problems, unspecified (55%); and (5) abnormal brain imaging, including a smaller brain surface with a thicker cortex (43%).

Notably, the four genes encoded in the 15q11.2 BP1-BP2 region are syntenic, biallelically conserved, and functionally predicted to interact with each other along with seven other genes (i.e., *IGFBP2*, *CFHR1*, *CFHR3*, *MNS1*, *SPG20*, *BMPR2*, and *SPAST*) recently reported and analyzed via in silico studies and STRING functional interactions network [13]. Rafi and Butler [13] also found that the encompassed four protein-coding genes showed 11 nodes and 34 edges. Network nodes represent proteins with splice isoforms or post-translational modifications collapsed into each node for all proteins produced by a single protein-coding gene. Edges represent protein–protein associations that jointly contribute to a shared function. These genes are at the center of our focus on genomics and clinical findings in individuals with the 15q11.2 BP1-BP2 deletion.

A major focus of our report is to identify variants if present in the non-deleted alleles of the four genes from the affected children in five unrelated families with the 15q11.2 BP1-BP2 microdeletion and characterize potential influences of genome-wide variation on symptom expression using whole-exome sequencing in trios. We assessed clinical, behavioral, and cognitive phenotypes as well as physical and motor development in relationship to genetic findings in each subject in separate families, the first study of its kind in this emerging syndrome.

## **2. Results**

Five families were studied. Each family included one parent and at least one child with the 15q11.2 BP1-BP2 deletion, as confirmed by Methylation Specific-Multiplex Ligation Probe Amplification (MS-MLPA) assays. Each family was initially ascertained via the child having neurodevelopmental problems and/or ASD; chromosomal studies confirmed the 15q11.2 BP1-BP2 microdeletion. The parents were then cytogenetically analyzed to identify the deletion and five families with six affected children (three males and three females) were recruited for study.

#### *2.1. Cognitive and Behavioral Features*

Results of cognitive and behavioral testing obtained from members of five families with the 15q11.2 BP1-BP2 deletion are shown in Tables 1 and 2. In terms of general cognitive functioning, 9/11 individuals with 15q11.2 BP1-BP2 microdeletions performed in the average to superior range, whereas one affected child (Subject 3) scored just below the normal range for full-scale IQ, and one affected child (Subject 11) was unable to complete cognitive testing due to limited receptive and expressive language understanding and severe developmental delay. Academic abilities (spelling, reading comprehension, and math) were broadly commensurate with individuals' cognitive abilities, though three individuals showed spelling abilities at least 1 standard deviation (SD) below their verbal abilities (Subjects 2, 3, and 9); one showed suppressed reading relative to verbal abilities (Subject 3), and one showed suppressed math relative to their nonverbal abilities (Subject 7). Verbal memory abilities appeared to be relatively intact among individuals. The majority of participants performed within 1 SD of the mean across subscales of the California Verbal Learning Test [14] (CVLT). Two participants (Subjects 1 and 3) scored below average on most subscales, however, and Subject 10 performed 1.5 SD below the mean on long delay recall and 3 SD below the mean on the recognition subscale. All individuals tested performed within the normal range or above on the Peabody Picture Vocabulary Test, Fourth Edition [15] (PPVT-4), indicating intact receptive language ability. Results from the Trail Making tests [16] indicated performance in the average to superior range across participants on Part A, demonstrating that psychomotor speed and planning were intact. In contrast, 4/10 participants showed performance on Part B that was more than 1 SD below their performance on Part A, suggesting that the ability to rapidly and flexibly shift between response sets was selectively disrupted.

Results from Autism Diagnostic Observation Schedule, Second Edition [17,18] (ADOS-2) testing suggested 3/6 affected children met testing criteria for a classification of ASD (Subjects 2, 9, and 11; Subject 3 was 1 point below threshold). The Vineland Adaptive Behavior Scale, Third Edition [19] (VABS-III) indicated that all of the children in our cohort experienced deficits in a broad range of adaptive skills, with four of the six participants scoring two standard deviations below the population average on at least one subscale. For 5/6 affected children, rates and severity of repetitive behavior were comparable to children with ASD of similar ages [20]. The Broad Autism Phenotype Questionnaire [21] (BAP-Q) indicated autism-associated traits in 2/5 affected parents (Subjects 1 and 6) based on score thresholds defined by Sasson et al. [22]. Three of the five parents displayed repetitive behavior severity that was comparable to individuals with ASD of similar ages [20].

*Int. J. Mol. Sci.* **2021**, *22*, 1660


**Table 1.** Cognitive testing and results for families with 15q11.2 BP1-BP2 deletion.

Age years (yr) subject family 15q11.2 non-compliance. reported (mean std = ±1), scores for all other measures are normed scores with mean = 100, std = ±15.



**Table 2.** Behavioral testing and results for families with 15q11.2 BP1-BP2 deletion syndrome. Age in years (yr) and subject number for family members with the 15q11.2 BP1-BP2 deletion are indicated. RRB: Restricted and repetitive behaviors, NT: not tested; measure was only administered to either parent or child cohort. \* These results are from ADOS-2 Module 1 rather than Module 3.

#### *2.2. Sensorimotor Ability*

Group means, standard deviations, and effect sizes for postural control are reported in Table 3. Six children with the 15q11.2 BP1-BP2 deletion, six age- and sex-matched control children, five affected parents, and four age- and sex-matched control adults completed sensorimotor testing. Children with the 15q11.2 BP1-BP2 deletion showed medium to large elevations in center of pressure (COP) length (d = −0.64) and variability in medial-lateral (ML) sway relative to control children (d = −0.70), and small increases in variability of anterior-posterior (AP) sway (d = −0.38). Affected parents showed large increases in COP length relative to control adults (d = −0.94), and these differences were medium for variability of AP sway (d = −0.61).

**Table 3.** Group comparisons of postural ability in children and adults with 15q11.2 BP1-BP2 deletion syndrome and matched controls.


Group means and standard deviations (SD) for children and parents with the 15q11.2 BP1-BP2 deletion (BBS) and their matched controls as well as effect sizes (Cohen's *d*) for group comparisons. Negative effect sizes indicate greater variability for BBS children or adults relative to controls. COP: center of pressure, ML SD: medial-lateral standard deviation, AP SD: anterior-posterior standard deviation.
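As a minimal sketch of how an effect size with this sign convention could be computed, the example below uses the pooled-SD form of Cohen's *d* and subtracts the affected-group mean from the control mean. Both choices are assumptions made to match the note above (the text does not state which variant of *d* was used), and the input values are hypothetical, not study data.

```python
import math
import statistics


def cohens_d(control, affected):
    """Cohen's d with a pooled SD, computed as control mean minus affected mean.

    With this (assumed) sign convention, a negative d means the affected
    group has the larger value, e.g., greater sway variability.
    """
    n1, n2 = len(control), len(affected)
    var1 = statistics.variance(control)   # sample variance (n - 1 denominator)
    var2 = statistics.variance(affected)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(control) - statistics.mean(affected)) / pooled_sd


# Hypothetical COP lengths in arbitrary units (not study data).
control_cop = [10.0, 12.0, 14.0]
affected_cop = [12.0, 14.0, 16.0]
d = cohens_d(control_cop, affected_cop)
print(round(d, 2))
```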

#### *2.3. Whole-Exome Sequencing*

The per-sequence Phred quality score was above 35 for all samples, and 99.5% of sequenced reads mapped to the genome, resulting in ~37.1 million mapped reads per sample. There were 67,994 variants identified across the dataset. Of these, 526 were identified as high confidence, potentially damaging variants (PDVs) located in 453 distinct genes in affected children (see Figure 1). Each affected child had an average of 100 PDVs (*x̄* = 100.33, sd = 15.97) affecting 88 different genes (*x̄* = 88.00, sd = 12.88). The most prevalent type of PDV was missense variants (36.009%), followed by frameshifts (15.83%), inframe insertions and deletions (InDels; 15.67%), splice site variants (9.17%), losses or gains of stop codons (10.84%), protein altering variants (0.67%), and losses of start codons (0.33%; Figure 2). Almost all types of variants had a proportion that were inherited, de novo, or of unknown inheritance (Figure 2). In total, 132 of the PDVs identified in affected children were predicted to be de novo. Some variants that were de novo in one child were observed to be inherited in others. Notably, there were 99 distinct genes with PDVs that were exclusively de novo. In addition, there were 125 PDVs in 107 genes that were inherited exclusively from the parent without the deletion.

Three genes with exclusively de novo PDVs were evidenced to encode proteins that interact with the protein encoded by *CYFIP1*, which is located in the region deleted in patients with BBS (Figure 3). These included an inframe deletion in *APBB1* in Subject 3, a missense mutation in *GOLGA2* in Subject 2, and a missense mutation in *MEOX1* in Subject 7. In particular, *APBB1* is associated with ASD (Supplemental Table S1). As noted above, Subject 3 was near the threshold for an ASD diagnosis based on the ADOS-2, and had a history of global developmental delay (Table 4). Notably, the deletion identified in *APBB1* affects a region of the protein that contains simple sequence repeats (i.e., a low-complexity region). As such, it is unclear whether this type of variation would influence phenotype expression.

**Figure 1.** Quality control and variant selection. Overview of the criteria used to select high confidence, potentially damaging variants from whole-exome sequence data. Maximum allele frequencies (Max AF) were based on the maximum observed frequency across all reference populations available in the 1000 Genomes Project, the European Standard Population, and the Genome Aggregation Database. Abbreviations: GQ = genotype quality, SNVs = single nucleotide variants, InDels = insertion/deletion variants, DP = depth, VEP = Variant Effect Predictor.
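The kind of filter summarized in Figure 1 can be sketched as follows. This is an illustration only: the numeric thresholds (`MIN_GQ`, `MIN_DP`, `MAX_AF_CUTOFF`), the `Variant` record, and the gene names are hypothetical placeholders, since the actual cutoffs appear only in the figure. The use of the VEP impact levels HIGH and MODERATE is an assumption based on the description of variants that were highly or moderately likely to damage the protein product.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration; the study's cutoffs are in Figure 1.
MIN_GQ = 20
MIN_DP = 10
MAX_AF_CUTOFF = 0.01
DAMAGING_IMPACTS = {"HIGH", "MODERATE"}  # VEP impact levels (assumed criterion)


@dataclass
class Variant:
    gene: str
    gq: int        # genotype quality
    dp: int        # read depth
    impact: str    # VEP impact level
    pop_afs: dict  # allele frequency per reference population


def max_af(variant):
    """Maximum allele frequency across reference populations (0 if unobserved)."""
    return max(variant.pop_afs.values(), default=0.0)


def is_candidate_pdv(variant):
    """Apply quality, frequency, and predicted-impact filters in turn."""
    return (
        variant.gq >= MIN_GQ
        and variant.dp >= MIN_DP
        and max_af(variant) <= MAX_AF_CUTOFF
        and variant.impact in DAMAGING_IMPACTS
    )


# Hypothetical records (not study data).
rare_high = Variant("GENE_A", 99, 40, "HIGH", {"pop1": 0.0001, "pop2": 0.002})
common = Variant("GENE_B", 99, 40, "MODERATE", {"pop1": 0.0001, "pop2": 0.08})
low_quality = Variant("GENE_C", 10, 40, "HIGH", {})
print([is_candidate_pdv(v) for v in (rare_high, common, low_quality)])
```

Taking the maximum across populations is the conservative choice: a variant that is common in any one reference population is excluded even if it is rare in the others.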

**Figure 2.** Predicted protein consequences of potentially damaging variants in affected children (Families A–D). Shown are the frequencies of Variant Effect Predictor consequences for variants identified in five affected children that were highly or moderately likely to damage the protein product. Data were not included for the affected child from Family E as the parents' data were unavailable. Colors indicate variant consequences that were inherited (blue), de novo (crimson), or unknown (grey = uncharacterized Mendelian violations, or improbable homozygous de novo).
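The three inheritance categories in Figure 2 can be illustrated with a simple trio rule set. The sketch below assumes autosomal sites and genotypes coded as alternate-allele counts (0, 1, or 2); the function name and the exact rules are assumptions for illustration and may differ from the pipeline the authors used.

```python
def classify_inheritance(child, mother, father):
    """Classify a child's variant from alternate-allele counts in a trio."""
    if child == 0:
        return "absent"
    if mother == 0 and father == 0:
        # Heterozygous with no carrier parent: de novo.
        # Homozygous with no carrier parent: improbable, left as unknown.
        return "de novo" if child == 1 else "unknown"
    if child == 2 and (mother == 0 or father == 0):
        # A homozygous child needs one copy from each parent.
        return "unknown"
    if child == 1 and mother == 2 and father == 2:
        # Two homozygous parents cannot have a heterozygous child.
        return "unknown"
    return "inherited"


for trio in [(1, 0, 0), (1, 1, 0), (2, 0, 0), (2, 1, 0), (2, 1, 1)]:
    print(trio, classify_inheritance(*trio))
```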

**Figure 3.** Genes with de novo variants with evidence for downstream relationships with *CYFIP1*. Shown are three genes prioritized following Ingenuity Pathway Analysis of all genes with de novo variants in affected children that were evidenced to be involved in the etiology of diseases relevant to symptoms observed in affected children. Included are protein–protein interactions (PP) and diseases and functions where variation in the gene is causal (C).



**Table 4.** Medical history of families with 15q11.2 BP1-BP2 deletion.

compulsive disorder, ODD: oppositional defiant disorder, GAD: generalized anxiety disorder.

There were also genes with PDVs inherited exclusively from the parent without the 15q11.2 BP1-BP2 deletion that encoded proteins interacting with the products of two genes located in the deleted region, *CYFIP1* and *TUBGCP5* (Figure 4). Specifically, missense mutations were identified in two genes. These included *FAT3*, where a PDV was observed in Subject 2; Subject 3 had the same variant in this gene and a variant in *GOLGA2*. Neither of these genes is associated with ASD or included on any of the evaluated clinical testing panels.

**Figure 4.** Genes with variants transmitted from non-deleted parents to affected children with evidence for downstream relationships with *CYFIP1*. Shown are genes prioritized following Ingenuity Pathway Analysis of all genes with variants in affected children that were inherited from parents without the 15q11.2 BP1-BP2 microdeletion that were also evidenced to be involved in the etiology of diseases relevant to symptoms observed in affected children. Included are protein–protein interactions (PP) and diseases and functions where variation in the gene is causal (C) or correlated (CO).

Other genes of interest with PDVs were identified based on associations with clinical and physical findings in affected subjects as described in Tables 1, 2, 4 and 5. Notably, an average of 17 genes (*x̄* = 17.3, sd = 5.3) with PDVs in each patient were associated with ASD. As seen in Figure 5, Subject 2 had the most ASD-associated genes with PDVs (*n* = 25). Compared to other affected children, Subject 2 also had the most severe repetitive behaviors measured via the Repetitive Behavior Scale – Revised [23] (RBS-R) and the most severe ADOS-2 score (Table 2). In addition, all affected children had PDVs in multiple genes that were included on clinical testing panels for intellectual disability (*x̄* = 8.7, sd = 4.7), ataxia (*x̄* = 6.8, sd = 2.9), epilepsy (*x̄* = 6.2, sd = 2.8), comprehensive cardiovascular defects (*x̄* = 2.8, sd = 1.1), and neuronal migration disorders (*x̄* = 2.0, sd = 1.1). There were four subjects with PDVs in genes included on the cerebral cortical malformations panel, almost all of whom had some evidence of neurological issues (Tables 4 and 5). Additionally, both Subjects 7 and 11 had PDVs in genes included on the testing panel for cleft palate (*GLI2* and *FOXE1*, respectively) and had craniofacial malformations involving the mouth (Table 5). In addition, Subject 5 inherited a stop-loss variant in a connective tissue disorder gene, *FLCN*, from the parent with the deletion (Subject 4), both of whom had a history of musculoskeletal findings (Table 4). More details for all genes with PDVs meeting our inclusion criteria are provided in Supplemental Table S1.



**Table 5.** Specific clinical evaluation and physical exam findings of families with 15q11.2 BP1-BP2


**Figure 5.** Number of genes with possibly damaging variants (PDVs) in each affected child associated with disorders of interest. Shown are the total number of genes with PDVs on the y-axis that are also evidenced to be involved in the conditions on the x-axis. For autism spectrum disorder (ASD), evidence was based on the Simons Foundation Autism Research Initiative gene list. For other conditions, evidence was based on inclusion on clinical testing panels from Clinical Laboratory Improvement Amendments (CLIA) approved and accredited commercial laboratories.

#### **3. Discussion**

This study is the first of its kind to characterize phenotypic, behavioral, and cognitive measures combined with exome sequencing in families with the 15q11.2 BP1-BP2 deletion. An initial goal was to determine if the sequences of one or more of the four genes in the 15q11.2 BP1-BP2 region showed a variant inherited from the parent with the intact (non-deleted) chromosome. When disturbed, the four genes in the 15q11.2 region are associated with cognitive impairment, speech and/or motor delay, dyslexia, and psychiatric/behavioral problems (e.g., attention deficit hyperactivity disorder (ADHD), autism, schizophrenia, or psychosis). The cardinal disease associations for the four contiguous genes in the 15q11.2 BP1-BP2 region are: *NIPA1*—Spastic Paraplegia 6; *NIPA2*—Angelman syndrome and Prader-Willi syndrome; *CYFIP1*—fragile X syndrome and autism; and *TUBGCP5*—Prader-Willi syndrome. The four genes are individually associated with Prader-Willi syndrome, ASD, schizophrenia, epilepsy, and Down syndrome. Collectively, all four genes have been associated with up to 75% of patients with ten distinctive neurodevelopmental disorders [13].

The addition of newly reported findings including ataxia, poor coordination, seizures, and congenital anomalies including palatal, heart, and ear defects along with structural brain disturbances [11], which are also associated with the four genes in the 15q11.2 BP1-BP2 region, raises the question of whether these genes interact with other genes, their biological processes or molecular functions. These related genes may play a role in the clinical presentation causing core features of Prader-Willi and Angelman syndromes as additional clinical structural differences are seen in those with the four genes deleted in the typical 15q11-q13 Type I deletion seen in these syndromes. For example, dysfunctional variation in the *NIPA1* and *NIPA2* genes could impair the function of magnesium transport as both genes encode magnesium transporters [24,25]. Their biological processes and molecular functions could regulate axonogenesis and axon extension via relationships with bone morphogenetic protein (BMP) and signaling pathways, regulation of cellular and developmental growth, and interaction with the *FMR1* gene causing fragile X syndrome [13]; all are relevant to the variable clinical phenotypes reported in this microdeletion syndrome. We used whole-exome sequence data to identify other genes outside of the deleted region with possibly damaging variants to help detect genetic effects underlying expression of symptoms in the affected child bringing the family to medical attention. Detailed physical examinations and family pedigrees were performed, for the first time, by an experienced clinical geneticist trained as a dysmorphologist to characterize the phenotype and review of systems on each subject. In addition, cognitive and behavior testing, including motor assessments for ataxia or balance disturbances of each family member, were performed using various validated techniques and tests by experts in the field. These studies were the major outcome measures for comparison with the genomic data and analysis for similarities among our families with the 15q11.2 BP1-BP2 deletion.

## *3.1. Clinical and Neuropsychiatric Behavior Developmental Findings*

We did not identify consistent patterns of cognitive impairments across individuals with 15q11.2 BP1-BP2 microdeletion syndrome. General cognitive, academic, and receptive vocabulary abilities were relatively intact, with only one participant, Subject 11, unable to complete standardized IQ testing and showing indications of intellectual/developmental delay. Similarly, verbal memory abilities appeared to be unaffected across the majority of participants, though 3/10 participants showed mild deficits, suggesting that more selective issues in verbal memory and learning may impact a subset of individuals with the 15q11.2 BP1-BP2 microdeletion. In addition, multiple participants (4/10) showed executive deficits characterized by a reduced ability to flexibly shift response sets. These participants were largely nonoverlapping with those showing verbal memory issues, indicating that these cognitive effects may be relatively distinct across individuals with 15q11.2 BP1-BP2 deletion syndrome. Several children had a history of learning problems reported by parents, school records, or neuropsychological evaluations.

Our results suggest that individuals with 15q11.2 BP1-BP2 deletions have increased risk for ASD. The prevalence of ASD in the general population is estimated to be 1 in 54 (1.8%) [26]; however, in our sample, we found that 3/6 (50%) of affected children met testing standards for a diagnosis of ASD and 2/5 (40%) of affected parents demonstrated elevated autistic traits (one additional child, Subject 3, and one additional parent, Subject 10, scored just below the cutoff for the ADOS-2 and BAP-Q, respectively). Consistent with high rates of ASD and ASD-related traits in our sample, we observed high rates of repetitive behavior in a majority of our participants with 5/6 affected children and 3/5 adults demonstrating severity of repetitive behavior that is comparable to similarly aged persons with ASD [20]. Of note, our sample of 15q11.2 BP1-BP2 deletion carriers (parents) may actually underrepresent the prevalence of ASD in the population of these carriers since persons who are parents and those who volunteer to participate in research and travel significant distances are likely to represent a cohort with relatively mild symptoms. Notably, all of the affected children were observed to have variants in multiple genes that were associated with ASD or found on the intellectual disability testing panel.

One limitation to classification of ASD in this sample is the use of only one observational diagnostic tool (the ADOS-2), rather than using a combination of clinical observation, parent interviews and strict DSM-5 criteria to confirm diagnosis; however, scores on the ADOS-2 combined with scores on the Vineland and RBS-R strongly indicate elevated rates of ASD and ASD-related traits in our sample. Similar to findings from studies of individuals with idiopathic ASD, the majority of our participants (9/11) showed adaptive functioning abilities below the mean for their age. These findings suggest that the 15q11.2 BP1-BP2 deletion confers increased risk for ASD and functional impairments independent of selective impacts on cognitive abilities.

Most of the individuals in this sample presented with at least one dysmorphic feature, some of which were present across multiple families. Five had abnormal ear findings such as broad, soft, fleshy, or overfolded ears. The child of Family C also had a smooth upper lip and philtrum. In the case of Family E, both the father and child had a broad, round face as well as broad hands. Flat feet were present in three individuals. Both the mother and child of Family B had small upper incisors. Six of the participants had eye findings demonstrating ptosis including both affected individuals from Family C and the father of Family E. The connective tissue finding of mild scoliosis was observed in two unrelated individuals whereas kyphosis was found in one other participant. Hyperextensibility or instability of various joints was a common feature, with unrelated individuals demonstrating a positive Beighton hyperflexibility score of at least six out of nine showing hypermobile joints. The 6-year-old child of Family A had a history of ankle instability and wore leg braces. Leg asymmetry was also found in two unrelated individuals. Pectus carinatum was seen in one individual while four participants had loose, soft skin, and two unrelated individuals had birth marks. Two unrelated affected children had a reported history of delayed wound healing.

Neurological problems were also present in several children. Two were diagnosed with epilepsy, one had non-essential tremor, and the child of Family D had gross hypotonia. The mother and child of Family D were both found to have decreased deep tendon reflexes, whereas the child of Family B had increased deep tendon reflexes. Ataxia was seen in the child of Family E. Motor delay was also relatively common, with four affected children showing delayed motor milestones. Motor deficits in affected individuals were also evident in our tests of postural control. While this test does not have normed scores or clinical cutoffs, group averages and effect sizes indicate that both children and adults with the 15q11.2 BP1-BP2 microdeletion show increased variability of postural sway relative to nonaffected controls. This is consistent with findings of increased variability of motor behavior in neurodevelopmental disorders including Prader-Willi syndrome [27], ASD [28,29], and fragile-X associated disorders [30], and involvement of ataxia-related genes in 15q11.2 BP1-BP2. Of note, all affected children had PDVs in genes evidenced to cause ataxia, epilepsy, comprehensive cardiovascular defects, and neuronal migration disorders. In addition, many children had PDVs in genes involved in connective tissue disorders, cerebral cortical malformations, micro/macrocephaly, and congenital malformations with craniofacial defects that are included on the clinical cleft palate DNA testing panel.

#### *3.2. Protein–Protein Interactions and Functions Related to NIPA1, NIPA2, CYFIP1 and TUBGCP5 Genes in the 15q11.2 BP1-BP2 Region*

Of particular interest are the genes that had either a de novo variant, or a variant inherited from the parent without the deletion that encode proteins that interact with products of the four genes in the 15q11.2 BP1-BP2 region. As reported by Rafi and Butler [13] when examining the protein–protein interactions of the four genes in the 15q11.2 BP1-BP2 region, the predicted biological processes can be summarized as follows: regulation of cell growth, magnesium ion transmembrane transport, regulation of axonogenesis, regulation of plasma membrane bounded cell projection organization, positive regulation of axon extension, regulation of cellular response to growth factor stimulus, regulation of developmental growth, positive regulation of cell projection organization, mitotic spindle organization, regulation of BMP signaling pathway, and positive regulation of plasma membrane bounded cell projection assembly.

Notably, NIPA1 protein was observed in our previous study to interact with 11 other proteins. Five (45%) of the 11 proteins were members of the BMP superfamily, three (27%) were BMP receptors and TGFB1 (9%) protein, indicating that three-fourths of the NIPA1-interacting proteins are important for developmental bone morphogenesis or multifunctional proteins that control proliferation, differentiation, and other functions in many cell types. The NIPA2 protein interacted with 19 other proteins with three (16%) involved with the BMP protein superfamily, three (16%) proteins interact with BMP receptors, ACVR1, TGFBR1, and six members of the SMAD superfamily of proteins (42%); all playing a role as intracellular signal transducers and transcriptional modulators activated by TGFB, thereby impacting bone morphogenesis and its related functions. Specifically, the Spastin protein, encoded by *SPAST*, was observed to interact with both NIPA1 and NIPA2. A variant in *SPAST* was found in Subject 9. This child had hypotonia and history of fine and gross motor delay as well as autism. Spastin severs polyglutamylated microtubules and likely has a role in axon growth and branching [31,32]. Mutations in both *SPAST* and *NIPA1* have been identified as causes of hereditary spastic paraplegia, a condition which causes progressive weakness and spasticity of the legs [33,34]. De novo variants in *SPAST* are evidenced to be associated with ASD with comorbid spastic paraplegia [35].

The CYFIP1 protein is also reported to interact with other proteins having a wide range of activity with functions related to cytoskeleton organization and actin filament binding with cell-matrix adhesion, MAP kinase signal transduction of cell growth, survival and differentiation, stimulation of glucose uptake, intracellular protein breakdown and tissue remodeling with mediation of translational repression [36]. We observed that five additional genes with either a de novo or non-deleted parent inherited variant in the affected child encode proteins that interact with CYFIP1. *GOLGA2* encodes a protein that acts as a membrane skeleton that maintains the structure of the Golgi apparatus. Mouse models of this gene indicate its involvement in brain morphology and the development and quantity of neurons [37,38]. Loss of this gene also resulted in ataxia in mice [37]. Subject 2 had a PDV in this gene as well as a disturbed motor phenotype, ASD, and neuropsychiatric behavior developmental phenotypes. *MEOX1* encodes a mesodermal transcription factor that plays a key role in somitogenesis, specifically sclerotome development. *MEOX1* is involved in overall organism development in humans [39] and mutations in mice result in evidence of congenital neurological disorders [40]. Subject 7 had a de novo PDV in *MEOX1* along with connective tissue defects, congenital malformations, epilepsy, and neuropsychiatric behavior developmental phenotypes. Finally, *FAT3* encodes an atypical cadherin protein and may play a role in the interactions between neurites derived from specific subsets of neurons during development. While *FAT3* was not included in any of the disease association categories we directly evaluated, missense mutations in this gene are associated with the neurodevelopmental disorder, Hirschsprung disease (https://www.ncbi.nlm.nih.gov/clinvar/RCV000201304.1/) (accessed on 6 February 2021). Both Subjects 2 and 3 had a PDV in this gene and evidence of a neurodevelopmental phenotype and should be monitored for gastrointestinal issues.

#### *3.3. Identified Gene Variants with Potential Clinical Significance*

As the first effort to identify variants in the four 15q11.2 BP1-BP2 genes in this microdeletion syndrome, we analyzed the non-deleted alleles in affected patients and assessed family members as well, using whole-exome sequencing in order to compare the genomic data of related genes with clinical, cognitive, and behavioral data. Some of these variants are identified by the gene panels as potentially contributing to multiple phenotypes in our subjects, as in the case of *MLC1*, which was a candidate for macrocephaly and motor delay in Subject 9 and for contribution to epilepsy in Subject 7. All variants passing inclusion criteria for possibly damaging the gene in which they are located, and the corresponding evidence for clinical significance of having a variant in the gene, can be found in Supplemental Table S1.

Other particular genes of interest with PDVs include numerous genes of the collagen or COL group which code for proteins that make up various subtypes of collagen. Disturbances in these genes are known to cause several connective tissue disorders [41]. For example, in addition to the stop-loss variant in the connective tissue disorder gene *FLCN*, Subject 5, who had significant joint hyperextensibility, inherited a frameshift in *COL5A3* from the parent with the deletion. Additional missense variants in collagen-encoding genes, *COL21A1* and *COL6A2*, were inherited from the non-deleted parent in Subject 7, who had joint hyperflexibility. Specifically, variation in *COL6A2* is associated with Ullrich congenital muscular dystrophy 1 (https://omim.org/entry/254090) (accessed on 6 February 2021). Subject 9 had an inframe deletion in *COL4A3* and a missense variant in *COL6A6* that was inherited from the deleted parent. While no joint hyperflexibility was noted, this patient had flat feet which may reflect collapse of connective tissues of the midfoot. Dysfunction in collagen proteins may also manifest as other symptoms. *COL4A3*, for instance, is implicated in disorders resulting in renal failure (https://omim.org/entry/120070) (accessed on 6 February 2021) and Subject 9 was also noted to have a history of constipation. Furthermore, variation in *COL6A6* has been implicated in skin disorders [42]. Subject 11, also without joint hyperflexibility, had the same missense variant identified in *COL6A6* and had birthmarks noted on the thigh and forehead.

Finally, the *CDK19* gene is another candidate for explaining clinical findings in Subject 11, who had a de novo missense variant in this gene. A disturbance in this gene has been associated with bilateral congenital retinal folds, microcephaly, and intellectual disability [43]. Of these findings, intellectual disability was present in Subject 11.

#### **4. Materials and Methods**

#### *4.1. Families with 15q11.2 BP1-BP2 Deletion or Burnside-Butler Syndrome (BBS)*

A total of five families with an affected child diagnosed with the 15q11.2 BP1-BP2 microdeletion were recruited and extensively evaluated using a series of cognitive, behavioral and motor assessments, family and medical histories, physical examination, and exome sequencing analyses. All participants or their legal guardians signed informed consent forms approved by the Institutional Review Board at the University of Kansas Medical Center (KUMC) before entry into the study. The five families included six affected children (3 males and 3 females) with one family having two affected children (Family A, Subjects 2 and 3) with BBS. Four mothers and one father had a confirmed 15q11.2 BP1-BP2 deletion. Methylation-specific multiplex ligation-dependent probe amplification (MS-MLPA) was performed on two families for confirmation of the parents' cytogenetic diagnoses using existing methodology [36].

#### *4.2. Cognitive and Behavioral Measures*

Cognitive and behavioral assessments were administered to all members of our study cohort (except where otherwise noted). All interviews and observational measures were conducted by trained members of our study team. The following cognitive and behavioral measures were administered:

The Wechsler Abbreviated Scales of Intelligence [44] (WASI-II) was used to assess general cognitive abilities including verbal, perceptual (nonverbal), and full-scale IQ.

The Wide Range Achievement Test-4 [45] (WRAT-4) was used to assess basic academic skills implicated previously in BBS, including sentence comprehension, word reading, spelling, and math computation.

The California Verbal Learning Test [14] (CVLT) is a comprehensive assessment of verbal learning and memory that specifically measures short delay free recall, short delay cued recall, long delay free recall, long delay cued recall, and long delay recognition. The CVLT-II was used in adolescents and adults (aged 16 to 89 years; N = 6), while the California Verbal Learning Test for Children (CVLT-C) was administered to children (aged 5 to 16 years; N = 4).

The Peabody Picture Vocabulary Test, Fourth Edition [15] (PPVT-4) was used to assess receptive vocabulary. During the PPVT-4, participants are instructed to identify the picture that best matches a target word. The number of correctly identified pictures was examined.

The Trail Making Test [16] is a commonly used assessment of visual search, processing speed, and cognitive flexibility consisting of two parts: Part A primarily measures visual search and processing speed as participants draw lines connecting numbers in sequential order as fast as possible. Part B assesses cognitive flexibility as participants must alternate between connecting letters and numbers as fast as possible (A-1-B-2-C-3, etc.). Reaction time and the number of errors made were examined for Parts A and B.

The Autism Diagnostic Observation Schedule, Second Edition [17,18] (ADOS-2) is a semi-structured play- and conversation-based assessment of the core social, communication, and repetitive behaviors in ASD. Module 3 (for children/adolescents with fluent speech) was used for all children in our study cohort except one non-verbal child, who, as noted in Table 2, was administered Module 1 (for children 31 months or older who are preverbal or only use single words). The social-affective and total algorithm scores were examined. Higher scores reflect more severe symptoms of ASD.

Broad Autism Phenotype Questionnaire [21] (BAP-Q) measures traits relating to the broad autism phenotype—the presence of core diagnostic symptoms of ASD (e.g., social/communication impairments and restricted, repetitive behaviors) that occur below the threshold for a clinical diagnosis. This questionnaire consists of three subscales: social abnormalities, pragmatic language difficulties, and rigid personality and a desire for sameness. The BAP-Q was administered only to the parents in our study cohort. We used cutoff scores defined by Sasson et al. [22] (Males ≥ 3.55, Females ≥ 3.17), where higher scores indicate greater presence of ASD traits.
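Applying the sex-specific cutoffs amounts to a simple threshold check, sketched below. The sketch assumes the cutoffs are applied to the overall mean BAP-Q score; the function name and the example scores are hypothetical.

```python
# Cutoffs quoted in the text from Sasson et al. [22]; a score at or above
# the cutoff is treated as indicating broad autism phenotype traits.
BAPQ_CUTOFFS = {"male": 3.55, "female": 3.17}


def meets_bap_cutoff(total_score, sex):
    """Return True when the score reaches the sex-specific cutoff."""
    return total_score >= BAPQ_CUTOFFS[sex]


# Hypothetical scores for illustration only (not study data).
print(meets_bap_cutoff(3.20, "male"))
print(meets_bap_cutoff(3.20, "female"))
```

The same score can fall on different sides of the threshold depending on sex, which is why the cutoff is looked up rather than fixed.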

Vineland Adaptive Behavior Scales-Third Edition [19] (VABS-III) is a series of semistructured caretaker interviews assessing current adaptive and daily living skills across four domains: communication, daily living skills, socialization, and motor skills.

The Repetitive Behavior Scale—Revised [23] (RBS-R) is a rating scale that assesses five categories of repetitive behavior (motor stereotypy, repetitive self-injury, compulsions, routines/sameness, restricted interests) commonly observed in individuals with developmental disabilities. Adult participants provided self-reported responses to the RBS-R, and a parent or caregiver provided responses on behalf of the child participants. Higher scores indicate greater severity of repetitive behavior. This measure does not have defined clinical thresholds, so we compared scores to age-dependent averages from a large cohort study of individuals with ASD [20].

## *4.3. Clinical Evaluation and Physical Examinations*

A complete physical examination was performed by a clinical geneticist (MGB) with anthropometric measures (e.g., head circumference, ear length, inner and outer canthal distances, hand and mid-finger lengths) and data recorded including a three-generation family pedigree. Past medical histories and review of systems were included for cognition, behavior, seizures, pulmonary, cardiac, gastrointestinal, genitourinary, renal, musculoskeletal, cutaneous, immune, and hematology. Clinical photographs were obtained following written consent.

#### *4.4. Postural Control Testing*

Participants completed a postural control task to assess gross sensorimotor ability. Data for children and adult participants with the 15q11.2 BP1-BP2 deletion were compared to age- and sex-matched control participants who were unrelated to the families affected by the BBS deletion. Control participants completed postural control testing as part of larger studies taking place at two separate research sites. Where noted, some of the task parameters differed between these two sites.

Postural control was assessed using an AMTI (Advanced Mechanical Technology, Inc., Watertown, MA, USA) AccuGait strain gauge force platform. Participants were tested barefoot in a well-lit room. They were instructed to stand as still as possible on the platform with their feet shoulder-width apart and their arms resting at their sides. Participants completed three trials, each lasting 30–45 s.

To standardize the duration of data analyzed, the first five seconds of each trial were removed and only the subsequent 20 s were included for analysis. Trials during which the participant lost balance or did not remain still for the duration of the trial (e.g., took a step, sneezed, spoke, walked away, etc.) were excluded from analysis. Participants who had fewer than two useable trials were excluded from analyses.
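The trimming and exclusion rules described here can be expressed as a short sketch. The function names and arguments are hypothetical, and the sampling rate is passed in rather than assumed.

```python
def analysis_window(samples, sampling_rate_hz, skip_s=5.0, keep_s=20.0):
    """Drop the first `skip_s` seconds and keep the following `keep_s` seconds."""
    start = int(round(skip_s * sampling_rate_hz))
    stop = start + int(round(keep_s * sampling_rate_hz))
    if len(samples) < stop:
        raise ValueError("trial is shorter than the skip + analysis window")
    return samples[start:stop]


def has_enough_trials(trial_is_usable, min_trials=2):
    """A participant is retained only with at least `min_trials` usable trials."""
    return sum(trial_is_usable) >= min_trials
```

With a 30 s trial sampled at 100 Hz, the window covers samples 500 to 2499, that is, seconds 5 through 25 of the recording.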

The center of pressure (COP) time series were derived from the force and moment data during standing posture. The time series of each trial was down-sampled to 100 Hz (for five control participants, data were down-sampled to 200 Hz) and low-pass filtered using a fourth-order double pass Butterworth filter with a cutoff frequency of 6 Hz.
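The low-pass filtering step can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: it applies a second-order Butterworth section forward and then backward, which cancels phase lag and doubles the effective order to four. The text does not say whether the authors' filter was constructed this way, and the function names and the 100 Hz default are illustrative.

```python
import math


def butter2_lowpass_coeffs(cutoff_hz, fs_hz):
    """Second-order Butterworth low-pass coefficients via the bilinear transform."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = (b0, 2.0 * b0, b0)
    a = (2.0 * (k * k - 1.0) * norm, (1.0 - math.sqrt(2.0) * k + k * k) * norm)
    return b, a


def _single_pass(x, b, a):
    """Run the biquad once over x, starting from a zero state."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def zero_phase_lowpass(x, cutoff_hz=6.0, fs_hz=100.0):
    """Forward pass followed by a pass over the reversed signal (double pass)."""
    b, a = butter2_lowpass_coeffs(cutoff_hz, fs_hz)
    forward = _single_pass(x, b, a)
    backward = _single_pass(forward[::-1], b, a)
    return backward[::-1]
```

Because each pass starts from a zero state, the first and last samples carry edge transients; practical code would typically pad the signal or remove its mean before filtering. Note also that two passes attenuate the signal by about 6 dB at the nominal cutoff unless the cutoff is adjusted.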

The variability of each participant's postural sway was quantified using the standard deviation (SD) of the COP in both the medial-lateral (ML) and anterior-posterior (AP) directions, as has been done in previous studies [30]. SD values were log transformed to correct for skewed distributions.
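As a small illustration of this sway measure, the sketch below computes the log-transformed SD for each direction. It assumes the sample SD (n − 1 denominator) and the natural logarithm, neither of which is specified in the text; the function names are hypothetical.

```python
import math
import statistics


def log_sway_sd(cop_positions):
    """Natural log of the sample standard deviation of a COP time series."""
    return math.log(statistics.stdev(cop_positions))


def sway_variability(cop_ml, cop_ap):
    """Log-transformed sway SD in the medial-lateral and anterior-posterior directions."""
    return {"ML": log_sway_sd(cop_ml), "AP": log_sway_sd(cop_ap)}
```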

Data analyses were conducted separately for adults and children. Due to the small sample size of this study, we limited our analyses to the calculation of group means, standard deviations, and effect sizes (Cohen's d). Effect sizes were interpreted as small (d = 0.2), medium (d = 0.5), or large (d = 0.8) according to Cohen's standards [46]. One child with the 15q11.2 BP1-BP2 deletion (Subject 11) did not complete any postural control testing, so data from this participant and the matched control were not included in the analyses.
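For readers unfamiliar with the effect size used here, the following sketch computes Cohen's d with a pooled standard deviation. The pooled-SD form is an assumption, since the text does not state which variant was used, and treating Cohen's benchmarks as bin edges in `label_effect` is a convenience for illustration only.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def label_effect(d):
    """Map |d| onto Cohen's conventional benchmarks (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"
```

In this study the two groups would be, for example, the log-transformed sway SDs of deletion carriers and of their matched controls.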

## *4.5. DNA Extraction*

Saliva and buccal cells were collected using a Saliva DNA Collection and Preservation Device (Norgen Biotek Corporation, Thorold, ON, Canada). Genomic DNA was isolated and purified using a Saliva DNA Isolation Kit (Norgen Biotek Corporation) according to the manufacturer's protocol.

#### *4.6. Methylation Specific-Multiplex Ligation Probe Amplification (MS-MLPA)*

The MS-MLPA assay is a standard laboratory assay to examine for chromosome 15 deletions and was used to identify the presence or absence of the 15q11.2 BP1-BP2 deletion prior to enrolling individuals in this study. MS-MLPA was performed using reagents and kits obtained from MRC-Holland (Amsterdam, The Netherlands) including ME028-C1 kits containing sequence specific probes along the length of the 15q11.2-q13 region. The C1 kit contains 47 MLPA probes for copy number detection using fragment analysis following manufacturer's instructions and reported elsewhere [36].

#### *4.7. Whole-Exome Sequencing*

Genomic DNA (5 μg) samples from four families (5 affected children and 4 sets of parents) were exome sequenced at the KUMC Genomics Core facility, and one family was sequenced in a commercial laboratory (GeneDx, Gaithersburg, MD, USA). Exome sequencing was performed using the TruSeq Exome Library Prep Kit (Illumina FC-150-1001, San Diego, CA, USA). High molecular weight (HMW) genomic DNA (gDNA) was fragmented using the Covaris S2 Ultra-Sonication system. Following fragmentation, DNA was end-repaired and 3' adenylated prior to adaptor ligation using TruSeq DNA indexed adapters. The DNA libraries were equalized to 233 or 250 ng and pooled for two rounds of enrichment using the TruSeq Exome capture probes. TruSeq Exome capture probes target 45 Mb of coding sequence from >98% of RefSeq, Consensus CDS (CCDS), and Ensembl coding content. Following the final exome capture, the enriched libraries were amplified using polymerase chain reaction (PCR) to generate sufficient yield for sequencing. Enriched amplified libraries were validated using the Agilent Bioanalyzer 2100 system and quantitated using the Roche LightCycler96 RealTime PCR system. Final library concentration results were used to dilute the library pool to 2 nM for multiplexed sequencing on a NovaSeq 6000 sequencing machine (Illumina, San Diego, CA, USA). The onboard clonal clustering procedure was automated during the NovaSeq 6000 sequencing run. The 100-cycle paired-end sequencing was performed using the NovaSeq 6000 S2 Reagent Kit (200 cycles; Illumina 20012861). Following sequencing, the raw base call files (.bcl) were converted to fastq files and de-multiplexed into individual libraries using Illumina's bcl2fastq2 software.

#### *4.8. Variant Calling and Quality Control Procedures*

The initial read quality was assessed using the FastQC software [47]. The sequenced reads were then aligned to the human genome (hg38) using the Burrows–Wheeler aligner (BWA) [48]. Variant analyses of these data were performed in accordance with the Genome Analysis Toolkit (GATK) variant calling best practices pipeline [49]. Variant calling was as follows: The "MergeBamAlignment" tool was used to incorporate metadata to the aligned BAM files. The "MarkDuplicates" tool was used to locate and tag duplicate reads in the BAM files. The "BaseRecalibrator" tool was used to generate a recalibration table for base quality score recalibration (BQSR). Known polymorphic sites obtained from the GATK resource bundle were provided to this tool (Mills and 1000G gold standard insertions or deletions (InDels) hg38 vcf, Homo sapiens assembly38 dbsnp138 vcf). The "ApplyBQSR" tool was run next in the two-stage process of base quality score recalibration. SNPs and InDels were called using the "HaplotypeCaller" tool. The resulting gVCF files were combined using the "CombineGVCFs" tool to form a multi-sample gVCF file. This file was passed for joint genotyping using the "GenotypeGVCFs" tool. A recalibration model to score variant quality was built separately for InDels and SNPs using the "VariantRecalibrator" tool. The tool "ApplyVQSR" was then used with these models in the second stage of the variant quality score recalibration (VQSR) process to filter the input variants based on the recalibration table. The resulting data were further processed using the Genotype Refinement Workflow with the goal of improving the accuracy of genotype calls and discarding unreliable genotype calls. The first step of this process was to calculate the genotype posterior probabilities given family and known population genotypes using the "CalculateGenotypePosteriors" tool. The pedigree information of the families and a high confidence SNP file were provided to this tool. Variant calls were hard filtered to exclude genotype quality scores less than 20 (GQ < 20.0).

Figure 1 shows additional quality control and variant selection details from the whole-exome sequencing analysis. Specifically, single nucleotide variants (SNVs) were required to have read depth ≥10, and InDels were required to have depth ≥28X when 2–5 base pairs (bp) long and ≥42X when 5–200 bp long. To identify SNVs and InDels passing all quality control that were potentially damaging, the functional effects of variants on encoded proteins were predicted using SnpEff [50] and the Ensembl Variant Effect Predictor (VEP) [51]. Rare variants located in protein-coding gene transcripts were selected, where rare was defined as a maximum allele frequency (AF) <0.01% based on reference populations available in the 1000 Genomes Project, the European Standard Population and the Genome Aggregation Database. Rare variants predicted by VEP to have consequences that were moderately (i.e., inframe InDels, missense, or protein altering) or highly likely (i.e., splice site alterations, gains or losses of stop codons, loss of start codons, or frameshifts) to damage the protein products were then prioritized. Missense and start loss SNVs were further evaluated for potential deleterious effects using Sorting Intolerant From Tolerant (SIFT) [52], PolyPhen [53], and Grantham substitution scores [54] with inclusion criteria as follows: SIFT score < 0.05, PolyPhen score ≥ 0.70, and Grantham score ≥ 100 [54]. We then prioritized variants in affected children that were predicted to be de novo or inherited from only one parent as additional evidence for potential clinical relevance.
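The thresholds in this paragraph can be summarized as a set of predicate functions. This is a hedged sketch rather than the authors' pipeline: the `Variant` record and all function names are hypothetical, the three predictor criteria are assumed to be required jointly, a 5-bp InDel is assigned to the 2–5 bp bin because the stated ranges overlap at 5, and InDel lengths outside the stated ranges are rejected because the text does not describe them.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ALLELE_FREQUENCY = 0.0001  # 0.01% expressed as a fraction


@dataclass
class Variant:
    kind: str                       # "SNV" or "InDel" (illustrative labels)
    depth: int                      # read depth at the site
    length_bp: int = 1              # InDel length in base pairs
    max_af: float = 0.0             # highest AF across reference populations
    sift: Optional[float] = None
    polyphen: Optional[float] = None
    grantham: Optional[float] = None


def passes_depth(v: Variant) -> bool:
    """Depth thresholds: SNV >= 10, InDel >= 28 (2-5 bp) or >= 42 (longer, up to 200 bp)."""
    if v.kind == "SNV":
        return v.depth >= 10
    if 2 <= v.length_bp <= 5:       # assumption: 5 bp belongs to the shorter bin
        return v.depth >= 28
    if 5 < v.length_bp <= 200:
        return v.depth >= 42
    return False                    # assumption: lengths not covered in the text are rejected


def is_rare(v: Variant) -> bool:
    """Rare means a maximum allele frequency below 0.01%."""
    return v.max_af < MAX_ALLELE_FREQUENCY


def missense_predicted_deleterious(v: Variant) -> bool:
    """SIFT < 0.05, PolyPhen >= 0.70 and Grantham >= 100 (assumed to apply jointly)."""
    if None in (v.sift, v.polyphen, v.grantham):
        return False
    return v.sift < 0.05 and v.polyphen >= 0.70 and v.grantham >= 100
```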

## *4.9. Functional and Clinical Characterization of Genes with Variants*

The implicated diseases and biological functions of the genes with variants of interest were then identified using the Ingenuity Systems Pathway Analysis (IPA) tool [55]. Variant effects analyses were run considering molecules and/or direct relationships that were experimentally observed in mammals (i.e., humans, mice, rats) from all data sources. Two separate analyses were run on genes with de novo variants and genes with inherited variants. As it was expected that de novo variants were more likely to represent clinically relevant findings [56], variant effects analysis of genes with de novo PDVs was conducted using evidence from all possible mutation consequences, including unclassified and silent mutations. For genes with PDVs that were transmitted to affected children from the parent without the 15q11.2 BP1-BP2 microdeletion, only evidence from likely damaging mutation consequences (i.e., null, frameshift, loss of function, missense, gain-of-function, knockout, in-frame, or nonsense mutations) was considered. In addition, as the specific symptoms of affected children were diagnosed (see below for details on measures and clinical evaluations), genes overrepresented at a Benjamini–Hochberg corrected *p* ≤ 0.05 in diseases and disorders reflecting these symptoms were selected for follow-up. These genes were then evaluated in Path Designer to determine if any direct or indirect relationships were observed or predicted downstream of the four genes encoded in the 15q11.2 BP1-BP2 deletion region.
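For readers unfamiliar with the Benjamini–Hochberg correction mentioned above, the following is a minimal standalone sketch of the standard procedure. It is not IPA's implementation, which is not described in the text, and the function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A disease or function category would then be retained when its adjusted value is at most 0.05.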

To further evaluate genes with PDVs that may relate to symptoms observed in affected children, these genes were annotated based on evidence of associations with ASD using the 2020 Q2 release of the Simons Foundation for Autism Research Initiative (SFARI) gene list (https://gene.sfari.org/about-gene-scoring/) (accessed on 6 February 2021). In addition, genes were evaluated for inclusion on panels from Fulgent Genetics (https://www.fulgentgenetics.com/) (accessed on 6 February 2021) and Prevention Genetics (https://www.preventiongenetics.com/) (accessed on 6 February 2021), both Clinical Laboratory Improvement Amendments (CLIA) approved and accredited commercial laboratories. Interrogated panels from Fulgent Genetics included: epilepsy, intellectual disability, comprehensive cardiovascular defects, ataxia, microcephaly, macrocephaly, cerebral cortical malformations, neuronal migration disorders and malformations including cleft palate. Genes for connective tissue disorders reflected a combination of both the Fulgent Genetics and Prevention Genetics panels as the latter panel was more comprehensive.

#### **5. Conclusions**

Our study of the 15q11.2 BP1-BP2 deletion (Burnside-Butler) syndrome, an emerging disorder, found variants in genes beyond the four genes in the chromosome 15q11.2 BP1-BP2 region using exome sequencing with our inclusion criteria and correlations with reported gene and protein interactions with associated diseases or findings. Although we cannot claim that the gene variants found are causal for the clinical, behavioral, or cognitive findings observed (e.g., 9/11 subjects had adaptive functioning abilities below mean for age; 8/11 subjects demonstrated repetitive behavioral scores comparable to ASD subjects; 4 affected children tested had delayed motor milestones; joint instability or hyperextensibility was commonly observed) our data suggests that variants in genes encoded outside of the 15q11.2 BP1-BP2 region may be of interest to future research. Notably, four of the five probands in the five unrelated families inherited the 15q11.2 BP1-BP2 deletion from the mother while only one father (Family E) had the deletion. The affected child in Family E was also the most severely affected (ataxia, wheel-chair user, non-verbal, impaired cognition). This may indicate parent of origin effects, similar to those reported in Davis et al. [57], which included lower motor function and coordination when of paternal origin and more cognitive and behavioral disturbances and seizures when of maternal origin. It is also important to note that several families included in our study traveled long distances. The travel and testing required for participation may have selected for affected children and parents that are relatively less severe than others with this deletion. It is possible that many less severe symptoms reported in this analysis dataset relate primarily to dysfunction in the four core genes encoded in the 15q11.2 BP1-BP2 deleted region. 
Future studies focused on ascertaining individuals with more severe expression of the symptoms we report may identify more concrete evidence for damaging variants in other genes in addition to the four core genes. Importantly, replication and functional studies are needed to confirm these hypotheses. As noted in our study, a neurodevelopmental phenotype (autism, speech delay, abnormal reflexes, and coordination) was most commonly found along with craniofacial findings (ear anomalies), and orthopedic or connective tissue problems (flat feet, scoliosis, hypermobile joints). These clinical, cognitive, behavioral, and genomic characterizations with reported protein interactions of the four genes of interest and associated diseases in this chromosome 15q11.2 BP1-BP2 deletion did support a role in the neurodevelopmental-autism phenotype seen in this emerging syndrome. Interaction with genes found and their paralogs may have contributed, but our observations are preliminary and based on a small sample size, requiring replication. The authors hope that our findings will stimulate more research and directions for study, shedding light on diagnosis, treatment, and prognosis for affected individuals with this emerging syndrome having the most frequent microarray defect seen in patients presenting with neurodevelopmental disorders.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/1422-0067/22/4/1660/s1.

**Author Contributions:** Research design conceptualization, O.J.V., M.G.B.; methodology, validation, investigation, formal analysis, data curation, I.B., R.L.S., W.A.H., S.G., O.J.V.; the networks, functional analyses were generated through the use of Ingenuity Pathway Analysis by O.J.V.; writing—original draft preparation, I.B., R.L.S., W.A.H., S.G., O.J.V., M.G.B.; writing—review and editing, I.B., O.J.V., M.G.B.; supervision, M.W.M., M.G.B.; funding acquisition, O.J.V., M.W.M., M.G.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** We acknowledge the National Institute of Child Health and Human Development (NICHD) grant number HD02528, KUMC Research Institute Clinical Pilot Research Grant Program, Kansas Intellectual and Developmental Disabilities Research Center (KIDDRC) grant number U54HD090216, the National Center for Advancing Translational Sciences (NCATS) Clinical and Translational Science Award (CTSA) awarded to the Frontiers: University of Kansas Clinical and Translational Science Institute (TL1TR002368), the National Institute of Mental Health (R01MH112734) and the National Library of Medicine (K01LM012870).

**Institutional Review Board Statement:** Participants were consented under an IRB-approved protocol (IRB#141167) that allows research. The University of Kansas Medical Center (KUMC) IRB-approved forms were signed by the participant and/or guardian prior to entry into the study.

**Informed Consent Statement:** All study participants or their legal guardian provided informed written consent prior to study enrollment.

**Data Availability Statement:** Genomic data that support the findings of this study are available in the Supplemental Table S1 and additional data are available from the authors upon reasonable request.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**


## *Case Report* **40-Hz Auditory Steady-State Response (ASSR) as a Biomarker of Genetic Defects in the** *SHANK3* **Gene: A Case Report of 15-Year-Old Girl with a Rare Partial** *SHANK3* **Duplication**

**Anastasia K. Neklyudova 1, Galina V. Portnova 1, Anna B. Rebreikina 1, Victoria Yu Voinova 2,3, Svetlana G. Vorsanova 2,3, Ivan Y. Iourov 2,3 and Olga V. Sysoeva 1,\***


**Abstract:** *SHANK3* encodes a scaffold protein involved in postsynaptic receptor density in glutamatergic synapses, including those in the parvalbumin (PV)+ inhibitory neurons, the key players in the generation of sensory gamma oscillations, such as the 40-Hz auditory steady-state response (ASSR). However, the 40-Hz ASSR has not been studied in relation to SHANK3 functioning. Here, we present a 15-year-old girl (SH01) with a previously unreported duplication of the first seven exons of the *SHANK3* gene (22q13.33). SH01's electroencephalogram (EEG) recorded during binaurally presented 40-Hz click trains of 500 ms duration with inter-trial intervals of 500–800 ms was compared with those from typically developing children (*n* = 32). SH01 was diagnosed with mild mental retardation and learning disabilities (F70.88), dysgraphia, dyslexia, and a smaller vocabulary than typically developing (TD) peers. Her clinical phenotype resembled that of previously described patients with 22q13.33 microduplications (≈30 reported so far). SH01 had microcephaly and mild autistic symptoms that were below the threshold for ASD diagnosis. No seizures or MRI abnormalities were reported. While SH01 had a relatively preserved auditory event-related potential (ERP) with slightly attenuated P1, her 40-Hz ASSR was totally absent, deviating significantly from the ASSR of TD children. The absence of 40-Hz ASSR in patients with microduplication affecting the *SHANK3* gene indicates deficient temporal resolution of the auditory system, which might underlie language problems and represent a neurophysiological biomarker of *SHANK3* abnormalities.

**Keywords:** 22q13.3 duplication; auditory steady-state response; ASSR; *SHANK3*; biomarker; auditory event-related potential; ERP; autism spectrum disorders; intellectual disabilities

## **1. Introduction**

SH3 and multiple ankyrin repeat domain 3 (*SHANK3*), also known as proline-rich synapse-associated protein 2 (ProSAP2), is a gene that encodes scaffolding proteins that organize the postsynaptic density in excitatory synapses [1,2]. This gene is located on chromosome 22, in the 22q13.33 region. Deletions of this region, as well as mutations, lead to 22q13 Deletion Syndrome, also known as Phelan–McDermid Syndrome (PMS) [3–8]. In most PMS cases, the *SHANK3* gene is affected, which is believed to be the major cause of PMS.

Phelan–McDermid Syndrome (PMS) is a rare neurodevelopmental disorder with about 2000 cases identified so far [6]. However, many PMS cases can go unnoticed, as the diagnosis of PMS is often difficult due to the subtle appearance of the deletion of chromosome 22 and relatively mild physical and nonspecific clinical manifestation of the syndrome.

**Citation:** Neklyudova, A.K.; Portnova, G.V.; Rebreikina, A.B.; Voinova, V.Y.; Vorsanova, S.G.; Iourov, I.Y.; Sysoeva, O.V. 40-Hz Auditory Steady-State Response (ASSR) as a Biomarker of Genetic Defects in the *SHANK3* Gene: A Case Report of 15-Year-Old Girl with a Rare Partial *SHANK3* Duplication. *Int. J. Mol. Sci.* **2021**, *22*, 1898. https://doi.org/10.3390/ijms22041898

Academic Editor: Merlin G. Butler Received: 12 January 2021 Accepted: 9 February 2021 Published: 14 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Dysmorphic features in PMS include dysplastic nails, large or prominent ears, long eyelashes, wide nasal bridge, bulbous nose, and sacral dimple [3]. Major dysfunctions in PMS are hypotonia [3], global developmental delay, and severely delayed or absent speech [4]. Autistic traits are also present in most patients with PMS, suggesting PMS as a syndromic form of autism spectrum disorder (ASD) [9,10]. According to a recent meta-analysis, 0.7% of patients with ASD have *SHANK3* mutations, and this number is even higher (2.1%) for ASD patients with moderate to profound intellectual disability [11]. Moreover, altered methylation patterns in *SHANK3* were detected in ≈15% of postmortem autistic brain tissue [12], suggesting an even more widespread involvement of altered *SHANK3* expression in ASD development through epigenetic influence. Copy-number variations (CNVs) and point mutations of *SHANK3* have also been associated with intellectual disability and schizophrenia [11,13–19].

Few cases (*n*≈30) of duplication involving the *SHANK3* gene have been described in the literature: among patients with Asperger's syndrome, attention-deficit hyperactivity disorder (ADHD), bipolar disorder [20], schizophrenia [21], intellectual disabilities, and delayed speech and language development [20–24]. While ASD is reported in patients with *SHANK3* duplication, ASD prevalence seems to be lower than in *SHANK3* deletions or mutations (≈15% vs. >50%). Dysmorphic features of patients with duplications and mutations affecting the *SHANK3* gene included full lips, slightly upturned nose/anteverted nares, protruding ears, and arch-shaped eyebrows. Microcephaly was reported in 15% of reported cases [23]. Resemblances between the cases with *SHANK3* duplication noticed by the researchers point to a distinct 22q13.33 duplication syndrome (for a recent update, see [23,24]). At the same time, the implication of both *SHANK3* deletion and duplication in neurodevelopmental and neuropsychiatric disorders suggests that *SHANK3* gene dosage is essential for correct brain function. However, one must be aware that a microduplication does not always mean overexpression of the encoded proteins, as an insertion of genetic material within the gene can alter the nucleotide sequence and lead to an abnormal protein code. Thus, detailed molecular genetic analysis is needed to infer whether the microduplication leads to gain or loss of *SHANK3* functioning.

Several animal models of ASD with a deficient *Shank3* gene have been developed. Mice with *Shank3* mutations/deletion exhibit ASD-like symptoms including social abnormalities and motor coordination problems [12,14,16,24–32]. Transgenic mice that mildly overexpress Shank3 protein (≈50%) have also been created [20,33,34]. These mice display manic-like hyperkinetic behaviors and decreased social interaction; however, unlike *Shank3* knockout (KO) mice, *Shank3* transgenic mice did not exhibit repetitive behavior.

Shank3 determines the postsynaptic density of N-methyl-D-aspartate (NMDA) receptors. NMDAR is one of three ionotropic receptors to the main excitatory mediator in the brain: glutamate. Deviation in NMDAR function alters excitation/inhibition balance in neuronal circuitry and associates with autistic-like behavior in patients with ASD as well as in its animal models [26]. It is noteworthy that different *Shank3* mouse lines show similar NMDAR hypofunction [14,16,25–31].

Recent studies pointed to abnormalities in inhibitory signaling in *Shank3* mouse models of ASD. In particular, several studies [16,35] reported a reduced number of synaptic puncta containing parvalbumin (PV) as well as reduced PV expression in PV-expressing gamma-aminobutyric acid (GABA) interneurons, the most abundant subtype of inhibitory interneurons, which contribute to the perisomatic inhibition of glutamatergic principal cells. Supporting the involvement of reduced inhibition in Shank3 deficits, an enhancer of GABA-mediated inhibitory transmission, clonazepam, normalizes the abnormal network firing pattern in cultured cortical neurons of *Shank3* KO mice [36].

Cortical gamma oscillations (30–100 Hz) are generated in recurrent circuits of excitatory and inhibitory neurons [37–39] and reflect the excitatory state of the neural network. While baseline, spontaneous gamma oscillations are studied in humans and animals, high-frequency oscillations are most reliably induced in response to sensory stimuli [40–43]. The evoked gamma-band activity can be studied with the auditory steady-state response (ASSR) [41–43]. ASSR refers to the ability of neural populations to synchronize the timing of neural discharges with the frequency of external periodic auditory stimulation, e.g., click trains or amplitude-modulated tones. ASSR is most pronounced in response to 40 Hz stimulation [44], coinciding with an intrinsic resonance frequency of cortical PV+ fast-spiking interneurons [45,46]. This 40-Hz ASSR was recently proposed as a non-invasive biomarker of NMDA receptor function [47–50]. In mice, the pharmacological modulation of NMDAR function by NMDA antagonists such as MK-801 or ketamine suggested an inverse relationship between ASSR and NMDA occupancy [48,51]. Nakao and colleagues [47] demonstrated robust ASSR deficits in mutant mice with selective elimination of NMDARs from PV+ interneurons in the neocortex (Ppp1r2-cre/fGluN1 KO mice), suggesting a causal role of the NMDA receptors on these PV+ interneurons for neural entrainment at 40 Hz. Modeling studies supported this finding, emphasizing the link between NMDAR on PV+ interneurons and 40-Hz ASSR [52,53].

ASSR is reduced in schizophrenia (for meta-analysis see [54]), bipolar disorders [55–58], and autism spectrum disorders (ASD) [59,60], which are the disorders with implicated GABAergic dysfunctions and altered NMDA signaling. The 40-Hz ASSR deficit occurs in non-psychotic first-degree relatives of patients with schizophrenia [61] and ASD [59], which is consistent with an effect of familial or genetic risk factors. However, recent larger sample studies in children with ASD did not confirm ASSR reduction [62,63]. Such discrepancy might be related to the well-known heterogeneity of the ASD population. Even remarkably similar behavioral manifestations can be caused by different biological underpinnings, e.g., genetic etiology. Thus, examination of ASSR for the patient with known genetic abnormalities, associated with ASD, might be the Rosetta stone for the identification of subgroups of ASD patients based on common molecular–genetic and neurophysiological causes.

Gamma oscillations have been associated with perceptual organization, attention, memory, consciousness, language processing, and motor coordination [64]. The 40-Hz ASSR has been suggested as a candidate mechanism underlying the fast temporal integration and resolution of auditory inputs [41,42,65,66]. In neurotypical controls and elderly population, ASSR was correlated with gap detection threshold [66] and attenuation of speech perception under the presence of noise [65], pointing to the relevance of ASSR to language processing. In patients with schizophrenia, the 40-Hz ASSR positively correlated with the working memory performance [67], attentional functioning [68], and predicted the future global symptomatic outcome (GAF-S2) [69]. Thus, ASSR is linked to the cognitive functioning, which is altered in patients with *SHANK3* abnormalities.

The promising approach in building the causal link between genes and behavior is relating the genetic pathways converging on candidate cellular/molecular processes to the target neurophysiological phenotype. In line with this approach, here, we present the clinical and neurophysiological description of a 15-year-old girl with rare microduplication in 22q13.33, which affects the *SHANK3* gene. The study focused on examination of the 40-Hz ASSR response, which is crucially dependent on PV+ interneurons activity, one of the key targets of the *SHANK3* gene. At the behavioral level, ASSR is thought to reflect temporal integration and resolution of the auditory system and is linked to memory and speech-in-noise processing. Based on this logic, we hypothesize that this girl will have altered ASSR.

#### **2. Results**

#### *2.1. Genetic Information*

The girl, hereafter referred to as SH01, has a normal karyotype (46,XX). Molecular genetic analysis using an SNP array revealed a duplication (size: 16,389 bp) partially spanning *SHANK3*. The duplication affected the first seven exons of the gene (Figure 1).

**Figure 1.** SNP array analysis demonstrating the duplication affecting the first seven exons of the *SHANK3* gene.

#### *2.2. Phenotyping, Clinical Description*

Anamnesis. SH01 was born from a full-term pregnancy to healthy parents, who were 39 years old at the time of the girl's birth. Her weight at birth was 3040 g and length 52 cm; Apgar score was 7/8. Motor milestones were achieved within normal limits: she held her head at 2 months, sat at 6 months, stood with support at 10 months, and began to walk alone at 11 months. However, language milestones were slightly delayed: the first syllables appeared at 12 months, followed by a relatively long period without phrases. Short sentences appeared at the age of 3. Cognitive development was also delayed, with a lack of interest in books and cartoons until age 3. At about this age, SH01 developed aggressiveness toward peers (e.g., biting) and protest behavior. At kindergarten, she attributed her bad behavior to a fictional boy peer. Aggressive behavior resolved when she was about 10 years old. Currently, she may have rare periods of self-aggression (biting) when very angry and dissatisfied. SH01 started regular school together with typically developing (TD) peers, but by the end of primary school, she was referred to specialists due to difficulties with the school program (especially math). However, she managed to continue studying in the school with TD peers with the support of specialists and her parents. Menstruation was regular and started at 10 years of age.

SH01 took part in our EEG/ERP experiment at the age of 15.06 years. Her official diagnoses were mild mental retardation and other deficits of behavior due to other specified causes (F70.88), and organic emotionally labile [asthenic] disorder with unspecified cause (F06.69). Diagnoses were obtained from the recent clinical reports provided by experienced psychiatrists from the Moscow Research Institute of Psychiatry and Scientific and Practical Center for Mental Health of Children and Adolescents, which is a leading Moscow organization for the diagnosis of mental health problems. The report from a psychologist confirmed mild mental retardation by the Wechsler Intelligence Scale for Children (Russian adaptation based on original Wechsler Intelligence Scale for Children [70]): verbal IQ = 71; nonverbal/performance IQ = 64; full-scale IQ = 64. The psychologist also pointed to unstable attention, smaller memory span, a fluctuating but lower speed of performance, quicker tiredness and loss of work efficiency, as well as infantilism, protest behavior, irritability, emotional lability, problems with understanding the social context, lack of self-critique and motivation to overcome difficulties, and a preponderance of recreational entertaining interests over educational and cognitive ones. A speech therapist revealed mild forms of dysgraphia and dyslexia.

The parents' major concerns at age 15 were learning disabilities, behavioral disorders, and irritability. The girl was sociable, and her mild cognitive impairment was hardly noticeable in daily routines. Her mild speech underdevelopment manifested in occasional difficulties pronouncing long and complex words and in a smaller vocabulary than that of typically developing (TD) peers. She attended the 9th grade of a mainstream middle school, which required great effort from her parents. Although she could hardly do any homework by herself, she was considering continuing her education in high school, which points to a lack of adequate self-assessment. Among her interests was performing in a school theater. She used her right hand to write and to eat. At the EEG/ERP examination, she showed infantile behavior, demanding the attention of others and especially of her mother, which was not typical for her age (e.g., she asked her mother to stay with her in the experimental room).

Physical parameters at the age of 15: 163 cm (50–75 percentile), 50 kg (50–75 percentile), head circumference of 51.5 cm (lower than 3rd percentile). Facial phenotype included elongated face, protruding auricles. There were short 5th fingers on the hands, a sandal gap. The girl had mild scoliosis, valgus deformity of the knee joints, and planovalgus feet.

Autistic characteristics as assessed at age 15. SH01's T-score on the Social Responsiveness Scale (SRS) was 63, which corresponds to mild autistic symptoms [71], while neither the Autism Diagnostic Interview-Revised [72] (ADI-R; subscale scores: social interaction A-4, communication and language B-2, repetitive and restricted behavior C-1, early developmental problems at 1–36 months D-1) nor the psychiatric assessment supported an ASD diagnosis.

Magnetic resonance imaging (MRI) at age 15: The hemispheres of the brain were symmetrical. No focal changes in the intensity of the MR signal from the substance of the brain, cerebellum, or brain stem were found. Differentiation into cortical and medullary substances was expressed satisfactorily. The lateral ventricles were symmetrical, not dilated. The hind horns were deepened. The cerebellum was typically located. The pituitary gland was not changed with preserved structure. The adeno- and neurohypophysis was clearly differentiated. Chiasma did not change. The optic nerves were clearly visible. The median structures were not displaced. The craniovertebral junction was not changed.

Other laboratory examinations at age 15. Echocardiography revealed ectopic chords and trabeculae in the left ventricular cavity, mitral valve prolapse with 1+ regurgitation, tricuspid valve prolapse with 1.5+ regurgitation. Ultrasound examination showed bilateral nephroptosis. X-ray showed a short fifth finger metacarpal bone of the left hand. Pulmonary examination revealed moderate bronchial asthma, atopic, with polyvalent sensitization.

Medications at age 15: SH01 took phenibut, 250 mg three times a day, to control behavior and levothyroxine (L-thyroxine) 50 mL to treat her asthma (diagnosis J45.0, predominantly allergic asthma).

## *2.3. Clinical EEG*

The voltage of EEG activity was in accordance with that of healthy peers; no significant asymmetry of the background EEG was detected. EEG recordings with eyes closed demonstrated a normal background EEG with a dominant alpha rhythm (Figure 2a). In 2018, it had an amplitude of 107 μV and 87 μV (maximal and mean, respectively) and a frequency of 9.1 Hz in the left hemisphere, and an amplitude of 104 μV and 69 μV (maximal and mean, respectively) and a frequency of 9.3 Hz in the right hemisphere. In 2020, the dominant alpha rhythm in the eyes-closed condition had an amplitude of 93 μV and 67 μV (maximal and mean, respectively) and a frequency of 9.5 Hz in the left hemisphere, and an amplitude of 94 μV and 63 μV (maximal and mean, respectively) and a frequency of 9.7 Hz in the right hemisphere. The abnormalities of the background EEG (Figure 2b) could be described as intermittent theta slowing (3.5–5.5 Hz and 80–140 μV) in the right hemisphere in 2018; in 2020, they could be described as sporadic spike and polyspike discharges (100–150 μV) arising from the right centrotemporal region.

#### *2.4. ASSR/Auditory ERP*

SH01's 40-Hz ASSR and auditory ERP were compared with those of two control groups of typically developing (TD) children. The first group ("old", *n* = 13, seven females, mean age 16.04 (SD = 1.9), range 12–18) was age-matched with our patient SH01 (age = 15.06). The second group ("young") consisted of 19 participants (14 female, five male) with an average age of 7.8 (SD = 2.6), range 3–12. Comparison groups of different ages were selected to examine whether the suggested alteration in SH01's neurophysiological responses to sounds might be linked to a developmental delay in brain maturation (as auditory ERP and ASSR are known to change with age, see Section 3) or represents a more general phenomenon. Table 1 summarizes the results.


**Figure 2.** (**a**) Dominant alpha rhythm with eyes closed, (**b**) the abnormalities of the background EEG.

**Table 1.** Amplitudes of 40-Hz auditory steady-state response (ASSR) and event-related potential (ERP) components (mean ± STD) for two comparison groups of typically developing children and SH01 in Fz electrode.


The 40-Hz ASSR was clearly identified in the TD groups and was dominant at frontal sites (Figures 3 and 4). Consistent with previous reports, the 40-Hz ASSR peaked about 200 ms after train onset and persisted over the whole period of stimulus presentation in all TD participants. Its amplitude was significantly higher in the older control group than in the younger one (*t* (30) = 2.362, *p* = 0.025), as can also be seen in Figure 5, which shows the individual 40-Hz ASSR values averaged over the whole period of stimulus presentation. At the same time, the 40-Hz ASSR was totally absent in SH01 (Figure 3), being significantly smaller than in either of the TD groups (old vs. SH01: *t* (12) = 9.6602, *p* < 0.0001; young vs. SH01: *t* (18) = 5.684, *p* < 0.0001). Moreover, no TD participant in the old, age-matched group had a 40-Hz ASSR value below that of SH01 (the minimum value in the TD old group was 0.053 μV, whereas SH01's ASSR was −0.015 μV), suggesting a very robust effect (Figure 4, Figure A1 in Appendix A).

**Figure 3.** Envelope curve of 40-Hz ASSR obtained after Hilbert transform from electrode Fz. The ASSR of SH01 is shown in green, that of the young group of typically developing children (TD young) is in red, and that of the old, age-matched to SH01 group (TD old) is in blue. Opaque blue and red shading illustrate the 95% confidence interval. The time of stimulus presentation is 0.

**Figure 4.** Topographic map of 40-Hz ASSR amplitude averaged over the period of 0–500 ms. The "old" group is represented in (**a**), the "young" group is represented in (**b**), and the values of SH01 are represented in (**c**).

**Figure 5.** Individual values of 40-Hz ASSR across the groups (Fz electrode). The old TD group's values are shown in the first column, the young TD group's values are shown in the second, and SH01's values are shown in the third. (\* shows significant differences *p* < 0.05, \*\*\* shows significant differences *p* < 0.0001).

Figure 6 shows the auditory event-related potentials to the same stimuli that failed to elicit a 40-Hz ASSR in SH01. Despite such a drastic alteration in the ASSR, the auditory ERP in SH01 was much more similar to that of the TD groups (Figure 6; see Figure A2 for individual ERPs). The old TD group was characterized by prominent P1, N1, and N2 components, which were registered after the 40-Hz train onset. SH01's ERP generally resembled that of the old TD group, with only SH01's P1 component being significantly smaller than that in her age-matched group (*t* (12) = 3.484, *p* = 0.005), while N1 (*t* (12) = 1.864, *p* = 0.087) and P2 (*t* (12) = −2.099, *p* = 0.058) were unremarkable, as shown in Figures 6 and 7. For the peak values of the major ERP components, see Table 1. The ERPs of the younger TD participants were characterized by the absence of a clear N1 response, in line with the well-known developmental change in ERP structure (old TD vs. young TD: *t* (30) = −3.524, *p* = 0.0014). For all components, SH01's ERPs differed from those of the young group (P1: *t* (18) = 5.683, *p* < 0.0001; N1: *t* (18) = 5.863, *p* < 0.0001; P2: *t* (18) = 2.554, *p* = 0.02). Thus, the auditory ERP in SH01 was more similar to that of her peers than to that of the young control group, pointing to a generally preserved development of auditory ERP structure.

**Figure 6.** Auditory event-related potentials (ERPs), in Fz electrode with SH01 shown in green, the younger TD group (TD young) is shown in red, and the old, age-matched with SH01 control group (TD old) is shown in blue. Opaque blue and red shading illustrate 95% confidence interval. The time of stimulus presentation is 0. Time windows for P1 (50–80 ms), N1 (80–120 ms), and P2 (130–160 ms) are shown in rectangles.

**Figure 7.** ERP components across groups. The P1 component is shown in (**a**), N1 in (**b**), and P2 in (**c**). The old typically developing (TD) group's values are shown in the first column, the young TD group's values in the second, and SH01's value in the third (\* shows significant differences *p* < 0.05, \*\*\* shows significant differences *p* < 0.0001).

#### **3. Discussion**

Our report presents a new patient with a unique duplication of the first seven exons of the *SHANK3* gene, adding one more case to the approximately 30 patients with 22q13.3 duplications described in previous studies [23]. For the first time, we describe the neurophysiological phenotype of a patient with a 22q13.3 duplication. The major focus of our study was on the 40-Hz ASSR, a brain response to high-frequency auditory stimulation, which is thought to underlie temporal binding and speech-in-noise processing [65]. This choice was motivated by studies that reported the 40-Hz ASSR as a biomarker of NMDAR density and PV+ interneuron functioning, both of which depend on *SHANK3* gene activity [48–51]. Here, we report a striking absence of the 40-Hz ASSR in SH01, corroborating our initial hypothesis. Below, we discuss our findings in more detail.

The clinical phenotype of SH01 resembles that described for the few patients with 22q13.3 microduplication (*n* = 29, [23], Table 2), although clinical features in the 22q13.3 duplication syndromes show great variability. Common features include intellectual disabilities (*n* = 15), attention deficits (*n* = 5), and language problems (*n* = 11). Physical dysmorphic features have also been reported in these patients, including sandal gap (*n* = 1), protruding or low-set deformed ears (*n* = 3), and microcephaly (*n* = 5). One previously described patient with 22q13.3 microduplication [21] shared irritability and scoliosis, as well as mild mental retardation and attention deficits, with our patient. It is noteworthy that that girl showed normal development until 13 years of age but was later diagnosed with borderline intellectual functioning and disorganized schizophrenia. At the same time, unlike the few patients with 22q13.3 microduplication who were diagnosed with autism spectrum disorders (*n* = 5) and epilepsy (*n* = 4), our patient SH01 does not have epilepsy, only some minor epileptiform activity in the EEG, and does not have enough symptoms to receive a diagnosis of autism spectrum disorder, although her SRS score suggested some autistic features. The phenotype of SH01 was also compared with the more extensively studied 22q13.3 deletion syndrome. SH01 shares intellectual disabilities and language problems, as well as autistic features, with patients with this syndrome, although their manifestations are milder in SH01 [3,73–75]. Among the dysmorphic features reported in patients with PMS, SH01 also has an elongated skull. Thus, the clinical description of SH01 contains features both common to and distinct from those of patients with different types of abnormalities affecting *SHANK3*, while it more closely resembles that of patients with *SHANK3* microduplications, pointing to partially distinct phenotypes of 22q13.3 duplication and deletion. For convenience, Table 2 shows the prevalence of individual clinical features in patients with *SHANK3* duplication and deletion.

**Table 2.** Clinical phenotype of patients with 22q13.3 duplication and 22q13.3 deletion/mutations. In addition to individual cases of patients with 22q13.3 duplication (that include *SHANK3* gene), the last two lines show the overall occurrence (in percentage) of the symptoms in patients with 22q13.3 duplication (based on [21] and our own review of individual cases reported previously) and 22q13.3 deletion/mutations (taken from previous reviews). Our subject, SH01, is also included for comparison. As patients of different ages are described, both IQ and developmental quotient (DQ) are used to characterize mental retardation. +/− reflects the presence/absence of the symptom.




Our study indicates a general preservation of auditory ERP in SH01 with the pronounced N1-P2 response, which is typical for TD teens. At the same time, the P1 component that usually decreases with age [83,84] is not evident in SH01, with amplitude within P1 latency being significantly smaller in SH01 not only as compared to the younger cohort but also as compared to the age-matched control group. Unfortunately, we are not aware of any ERP study conducted in patients with the 22q13.3 microduplication. Thus, we compare our results with those obtained in patients with point mutations or deletion in 22q13.3 (PMS). Consistent with our finding of reduced P1 response to auditory stimuli, Reese and colleagues [85] found a reduction of P50 in response to the repeated tone in patients with PMS. It is noteworthy that the reduction was significant only for the female participants. Thus, there might be some common deficits in the early stage of auditory processing in the auditory cortex in patients with abnormalities related to the *SHANK3* gene. The decrease in the early component of visual ERP to checkerboard stimuli, which was registered within the same latency range, 50-75 ms post-stimulus, was also reported in PMS [86,87], pointing to the fact that neurophysiological abnormalities occur in PMS at the early stages of sensory processing regardless of the modality of stimulation. Whether such deficits also spread to the visual system in our patient was not studied. It is noteworthy that the attenuation of P1 in response to auditory stimuli was also reported in patients with idiopathic autism [88–90], linking these behavioral and neurophysiological abnormalities.

As for the later components, patients with PMS showed a reduction of P2 component in response to the repeated tones [85,91] as well as a decrease in the latency of N250 in response to oddball stimuli [92]. In our study, neither P2 nor N250 were affected, and P2 even tended to be larger in SH01 than in the age-matched controls. Such discrepancy might indicate different neurophysiological phenotypes for 22q13.3 duplication/deletion or just be related to a methodological difference between the studies.

The focus of our study was the 40-Hz ASSR, as we hypothesized its abnormality in our patient based on previous literature. Indeed, we found a striking absence of the 40-Hz ASSR in SH01. Considering the relatively normal auditory ERP in SH01, this finding points to a specific deficit in following high-frequency auditory signals. An absence of the 40-Hz ASSR might reflect a disruption of the temporal integration and binding mechanism in audition that is linked with PV+ interneuron functioning [41,42]. As the 40-Hz ASSR has been correlated with speech-in-noise perception [65], an absence of ASSR might be related to speech decoding. At the same time, the 40-Hz ASSR does not seem to reflect a primary mechanism for speech comprehension, as its total absence did not prevent SH01 from learning language and being fluent in everyday life. Rather, the 40-Hz ASSR may reflect a modulatory mechanism that helps to differentiate speech sounds, making it easier to learn and communicate. Abnormalities in such a modulatory mechanism could cause a reduced vocabulary and problems pronouncing some complex words, as well as mild dysgraphia and dyslexia, as observed in SH01. Further studies are needed to fully examine SH01's phonemic abilities, which are related to speech perception, to shed light on the particular process disrupted by the absence of the 40-Hz ASSR related to the *SHANK3* abnormality.

While ASSR is modulated by age [44,93,94], our 15-year-old patient's 40-Hz ASSR was significantly smaller not only than that of the age-matched control group but also than that of children aged 3–12 years. Thus, her 40-Hz ASSR deficiency is hard to explain by developmental delay; otherwise, such a delay would have to be very profound, with SH01's ASSR corresponding to that of TD children under 5–8 years old [95,96].

Although the 40-Hz ASSR has not previously been studied in patients with 22q13.3 deletion/duplication or in their animal models, Engineer and colleagues [97] observed a drastic attenuation of the neuronal firing rate in response to rapidly presented sounds in the *Shank3*-deficient rat model. These authors showed that the number of driven spikes evoked by noise bursts and speech sounds, as well as the spontaneous firing rate, were significantly lower in *Shank3*-deficient rats than in control rats. Of relevance to our results, the effect was most pronounced when the stimuli were presented with short inter-stimulus intervals below 100 ms, especially in the primary auditory cortex and anterior auditory field. Taken together, the results point to difficulty in following a rapidly presented auditory signal as a general phenomenon related to *SHANK3* abnormalities. This might indicate that the gain and loss of *SHANK3* function share a common neurophysiological phenotype. It might also point to a potential loss-of-function effect of microduplication within the *SHANK3* gene. More detailed molecular genetic analysis and modeling might help to distinguish between these alternatives.

Our study has implications for the heterogeneous population of idiopathic ASD, in which a significant percentage of cases are related to *SHANK3* abnormalities and involve language problems. As an easily assessed, non-invasive index of the altered functioning of PV+ neurons and NMDAR signaling that stems from *SHANK3* abnormalities, ASSR can help to stratify the ASD population based on neurophysiological and molecular genetic underpinnings.

Our study is not without limitations. First, it is just a case report of one patient's data. While SH01 is an incredibly unique patient, more studies are needed to confirm our observations. As there have been about 30 patients described so far, our study aims to promote the ASSR paradigm among other researchers and clinicians, inviting them to run 40-Hz ASSR in other patients with the 22q13.3 duplication identified world-wide. These studies are an important step toward the validation of this neurophysiological biomarker.

SH01 also took phenibut as a regular medication. As this drug is a GABA-receptor agonist, it might potentially influence ASSR. To address this possibility, we compared SH01 with a child without known genetic abnormalities who also took phenibut, in his case for migraine treatment. This control child exhibited a typically pronounced 40-Hz ASSR (Appendix B, Figure A3). At the same time, more research on a larger sample is needed to examine the effects of phenibut on ASSR.

One may also relate the absent 40-Hz ASSR to hearing problems, arousal level, or attention deficits, as some researchers have found that ASSR is modulated by these factors [44,98,99]. However, the normal auditory ERP in response to the same stimuli argues against these explanations, since, for example, the N1 component has also been shown to be modulated by attention and subjective stimulus intensity [100]. Moreover, the P2 component, which was reported to be attenuated in participants with moderate to severe sensorineural hearing loss [101], even tended to be larger in SH01, pointing to an increased rather than decreased auditory sensitivity in SH01.

#### **4. Materials and Methods**

#### *4.1. Participants*

Thirty-two typically developing (TD) children were recruited from the local community to take part in the study as a comparison group. According to their parents or guardians, they did not have a history of neuropsychiatric conditions and had normal or corrected-to-normal vision and hearing. Except for one participant (this case is described in Appendix B), none of the participants were reported to be taking any medication. TD participants were divided into two subgroups by age (Appendix C, Table A1). The first one ("old") was age-matched with the participant SH01 (age = 15.06). It consisted of 13 people (7 female, 6 male) with an average age of 16.04 (SD = 1.9). The second subgroup ("young") consisted of 19 participants (14 female, 5 male) with an average age of 7.8 (SD = 2.6).

Almost all participants' guardians filled in the Russian translation of the Social Responsiveness Scale, second edition (SRS-2) [71], the school age version for the "young" group and the school age or adult version for the "old" group. Threshold values for any social behavior deficiencies are 58 T-scores for males and 63 T-scores for females. None of the participants from the "old" group exceeded the threshold value (range 16–56, mean = 37, SD = 13), and only one participant from the "young" group had a greater value (range 11–62, mean = 27, SD = 14). Six participants did not have SRS values. More detailed characteristics of comparison samples are presented in Table A1.

Patient SH01 underwent a full clinical assessment at the Research Clinical Institute of Pediatrics by experienced clinicians. In addition, the Autism Diagnostic Interview-Revised [72], an investigator-based semi-structured instrument, was administered by a trained interviewer to SH01's mother. It was used to assess autistic traits in SH01.

The study was approved by the local ethics committee of the Institute of Higher Nervous Activity and Neurophysiology, Russian Academy of Sciences (protocol number 3, date of approval 10 July 2020), and was conducted following the ethical principles regarding human experimentation (Helsinki Declaration). All children provided their verbal consent to participate in the study and were informed about their right to withdraw from the study at any time during the testing. Written informed consent was also obtained from a parent/guardian of each child.

## *4.2. EEG Recording*

Electroencephalographic data were recorded using the NeuroTravel system with 32-scalp electrodes arranged according to the international 10–10 system. Ear lobe electrodes were used as reference, and the grounding electrode was placed centrally. For clinical EEG, periods of resting activity were recorded as well as a test on closing and opening the eyes.

#### *4.3. Stimuli and Paradigm*

The ASSR paradigm consisted of 40-Hz click train stimuli, which were presented binaurally through foam insert earphones for 500 ms at 80 dB sound pressure level. Inter-trial intervals varied from 500 to 800 ms. The total number of trials was 150, and the duration of the whole paradigm was around three minutes. Stimuli were presented via Presentation software (NeuroBehavioral Systems, Albany, CA, USA). During the experiment, participants were sitting in a dimmed room and watching a silent video of their choice.
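As a rough illustration of the stimulus timing described above, the following Python sketch builds one 500 ms, 40-Hz click train and estimates the session length. The audio sampling rate (48 kHz), the click width (0.1 ms), the function names, and the reading of the inter-trial interval as the silent gap between trains are assumptions made for this example; they are not stated in the text.

```python
import numpy as np


def click_train(rate_hz=40, duration_s=0.5, fs=48_000, click_s=0.0001):
    """Return a 0/1 waveform with one rectangular click per stimulation cycle.

    fs and click_s are illustrative assumptions, not values from the study.
    """
    n_samples = int(round(duration_s * fs))
    period = int(round(fs / rate_hz))          # samples between click onsets
    width = max(1, int(round(click_s * fs)))   # samples per click
    wave = np.zeros(n_samples)
    for onset in range(0, n_samples, period):
        wave[onset:onset + width] = 1.0
    return wave


def session_bounds_s(n_trials=150, train_s=0.5, iti_s=(0.5, 0.8)):
    """Shortest and longest session, assuming the ITI is the gap between trains."""
    return tuple(n_trials * (train_s + iti) for iti in iti_s)


train = click_train()
# Count rising edges (click onsets) in the waveform.
n_clicks = int(np.sum(np.diff(np.concatenate(([0.0], train))) > 0))
shortest, longest = session_bounds_s()
print(n_clicks, shortest / 60, longest / 60)
```

Under these assumptions the train contains 20 clicks, and 150 trials take between 2.5 and 3.25 min, in line with the duration of around three minutes reported above.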

## *4.4. Data Analysis*

EEG analysis was performed using MATLAB (Version—b, The MathWorks, Natick, MA, USA), Fieldtrip software (https://www.fieldtriptoolbox.org/, [102]), as well as customized scripts. Peak values were compared using two-tailed Student's t-test for independent groups.
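When one of the two independent groups is a single case, as in the comparisons of SH01 with each TD group, the pooled variance of Student's t-test is simply the variance of the comparison group, and the test has n − 1 degrees of freedom. This is in line with the *t* (12) and *t* (18) statistics reported for the groups of 13 and 19 children. The Python sketch below illustrates this arithmetic only; the function name and the example amplitudes are hypothetical and are not data from the study.

```python
import math
import statistics


def single_case_t(group, case):
    """Student's t for a comparison group versus one observation.

    With a second "group" of size 1, the pooled variance equals the sample
    variance of the comparison group and df = n - 1.
    """
    n = len(group)
    mean = statistics.mean(group)
    sd = statistics.stdev(group)               # sample SD (n - 1 denominator)
    t_value = (mean - case) / (sd * math.sqrt(1.0 + 1.0 / n))
    return t_value, n - 1


# Hypothetical amplitudes (arbitrary units), for illustration only.
example_group = [1.0, 2.0, 3.0, 4.0, 5.0]
example_case = 0.0
t_value, df = single_case_t(example_group, example_case)
print(t_value, df)
```

A two-tailed *p*-value can then be read from a t distribution with the returned degrees of freedom. The sign convention (group mean minus case) is chosen here so that a case below the group mean gives a positive *t*.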

#### 4.4.1. ASSR Analysis

First, the raw data were segmented into epochs from 200 ms before to 800 ms after stimulus onset. Then, the signal was band-pass filtered at 35–45 Hz, and trials with an amplitude within 3 STD of the mean were averaged. The mean number of selected trials was 97 ± 34 for the "old" group and 80 ± 17 for the "young" group. There were 182 good trials for SH01. To better characterize the 40-Hz ASSR, we extracted the envelope of the signal using the Hilbert transform. The absolute value of this linear integral transformation reflects the envelope of the grand-average waveform (see Figure A4). Baseline correction over the −200–0 ms interval was applied. These steps were conducted for all participants, including SH01, the patient with the microduplication affecting *SHANK3*. For further analysis, we chose the Fz electrode, since according to the topographic data (see Figure 4), the ASSR has a maximum response near Fz. This is also consistent with the literature, which reports that ASSR is most pronounced at this site [103,104]. Then, we averaged the values of the envelope curve after the Hilbert transform at the Fz electrode over the whole period of stimulus presentation (0–500 ms) and compared the results of SH01 with the average values of each comparison group.
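The processing chain described above can be sketched in Python with NumPy and SciPy, applied here to synthetic single-electrode epochs. This is not the MATLAB/Fieldtrip code used in the study: the sampling rate (500 Hz), the fourth-order zero-phase Butterworth filter, the rejection rule (per-trial peak absolute amplitude within 3 SD of the across-trial mean), and all function names are assumptions made for illustration.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

FS = 500.0    # Hz; assumed sampling rate (not given in the text)
T_PRE = 0.2   # s of pre-stimulus data in each epoch
T_POST = 0.8  # s of post-stimulus data in each epoch


def assr_envelope(epochs, fs=FS, band=(35.0, 45.0)):
    """epochs: array (n_trials, n_samples) from one electrode, in microvolts."""
    sos = butter(4, list(band), btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, epochs, axis=1)      # zero-phase band-pass
    # Assumed rejection rule: keep trials whose peak lies within 3 SD of the mean.
    peak = np.abs(filtered).max(axis=1)
    keep = np.abs(peak - peak.mean()) <= 3 * peak.std(ddof=1)
    average = filtered[keep].mean(axis=0)
    envelope = np.abs(hilbert(average))              # Hilbert envelope
    times = np.arange(epochs.shape[1]) / fs - T_PRE
    envelope = envelope - envelope[times < 0].mean()  # baseline: -200 to 0 ms
    return times, envelope, int(keep.sum())


def mean_amplitude(times, envelope, start=0.0, stop=0.5):
    """Average envelope over the stimulation period (0-500 ms by default)."""
    window = (times >= start) & (times < stop)
    return float(envelope[window].mean())


# Synthetic demonstration: a 1 microvolt 40-Hz response buried in noise.
rng = np.random.default_rng(0)
demo_times = np.arange(int((T_PRE + T_POST) * FS)) / FS - T_PRE
response = np.where((demo_times >= 0) & (demo_times < 0.5),
                    np.sin(2 * np.pi * 40 * demo_times), 0.0)
epochs = response + rng.normal(0.0, 5.0, size=(150, demo_times.size))
times, envelope, n_good = assr_envelope(epochs)
print(n_good, mean_amplitude(times, envelope))
```

The single number returned by `mean_amplitude` plays the role of the per-participant ASSR value that is compared between SH01 and each comparison group; other filter designs or rejection rules would change the exact values.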

#### 4.4.2. ERP Analysis

ERP for auditory stimuli were created with filtering band-pass 1–30 Hz using the Fieldtrip function for all participants. The averaging epoch was the same as in ASSR analysis: 200 ms before the stimulus and 800 ms after it, but for later analysis, we focused on the period of –200–400 ms. Only trials with an amplitude within 3 STD of the mean were averaged. The mean number of good trials was 99 ± 43 for the old group, 96 ± 18 for the young group, and 69 for SH01. Then, we calculated ERPs peak values for all participants. The timeframes for each component were chosen based on grand-averaged peak latencies (P1: 50–80 ms; N1: 80–120 ms; P2: 130–160 ms).
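The peak extraction within the component windows given above can be sketched as follows. The text does not specify how the peak value was defined, so this example assumes the maximum within the window for the positive components (P1, P2) and the minimum for the negative component (N1); the function name and the synthetic waveform are hypothetical.

```python
import numpy as np

# Component windows in seconds with the assumed polarity of each peak.
WINDOWS = {
    "P1": (0.050, 0.080, +1),
    "N1": (0.080, 0.120, -1),
    "P2": (0.130, 0.160, +1),
}


def erp_peaks(times, erp):
    """Return the peak amplitude of each component within its time window."""
    peaks = {}
    for name, (start, stop, polarity) in WINDOWS.items():
        segment = erp[(times >= start) & (times <= stop)]
        peaks[name] = float(segment.max() if polarity > 0 else segment.min())
    return peaks


def gaussian(times, center, width=0.010):
    """Unit-height Gaussian bump used to build a synthetic ERP."""
    return np.exp(-((times - center) ** 2) / (2 * width ** 2))


# Synthetic averaged ERP sampled at 1 kHz from -200 to 800 ms.
times = np.arange(-200, 800) / 1000.0
erp = (2.0 * gaussian(times, 0.065)
       - 3.0 * gaussian(times, 0.100)
       + 1.5 * gaussian(times, 0.145))
peaks = erp_peaks(times, erp)
print(peaks)
```

Applied to each participant's averaged waveform at Fz, such a function would yield one value per component, which can then be entered into the group comparisons.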

#### *4.5. Molecular Genetic Analysis*

Molecular genetic analysis of SH01 was performed by CytoScan HD Arrays (Affymetrix, Santa Clara, CA, USA), which consists of about 2.7 million markers (resolution: >1 kbp). The results were visualized by the Affymetrix ChAS (Chromosome Analysis Suite) CytoScan® HD Array software (reference sequence GRCh37/hg19). The procedures were earlier described in detail [105,106]. All the genomic variations uncovered by the molecular genetic analysis were analyzed using a panel of bioinformatic techniques targeting the phenotypic outcome of each variation. The procedure was described previously in a step-by-step manner [107,108].

#### **5. Conclusions**

Our study demonstrates a link between duplication of the first seven exons of the *SHANK3* gene and alteration of brain response to high-frequency auditory input, 40-Hz ASSR: a neurophysiological phenotype believed to be mediated by hypofunctional NMDA receptor signaling on the parvalbumin (PV)+ inhibitory neurons, which depends on SHANK3 abnormality. As reported in our manuscript, the absence of 40-Hz ASSR in the patient with microduplication that affected *SHANK3* gene points to a deficient temporal resolution of this patient's auditory system, which might underlie the language problems observed in our patient as well as in many patients with abnormal functioning of the *SHANK3* gene.

**Author Contributions:** Conceptualization, O.V.S.; methodology, O.V.S. and V.Y.V.; data collection, A.K.N., A.B.R., O.V.S. and V.Y.V.; data analysis, A.K.N., G.V.P. and O.V.S.; resources and data curation, O.V.S. and V.Y.V.; genetic analysis, I.Y.I. and S.G.V.; writing—original draft preparation, O.V.S. and A.K.N.; writing—review and editing—all authors; visualization, A.K.N. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Russian Science Foundation (RSF), grant #20-68-46042.

**Institutional Review Board Statement:** The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of the Institute of Higher Nervous Activity and Neurophysiology (protocol code 3, date of approval 10 July 2020).

**Informed Consent Statement:** All children provided their verbal consent to participate in the study and were informed about their right to withdraw from the study at any time during the testing. Written informed consent was also obtained from a parent/guardian of each child to publish this paper.

**Data Availability Statement:** Data available on request due to restrictions.

**Acknowledgments:** We thank all the participants of our study and their parents for their support and dedication to science. This work would not be possible without them.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.



## **Appendix A**

*Individual ASSRs and ERPs*

**Figure A1.** Individual values (μV) of 40-Hz ASSR obtained after Hilbert transform for age-matched comparison group ('old') and values of SH01 (presented in bold black). The time of stimulus presentation is 0.

**Figure A2.** Individual ERPs (μV) for age-matched comparison group ('old') and SH01 s ERP (presented in bold black).

## **Appendix B**

#### *Phenibut effect on ASSR*

As mentioned above, one of the participants from the young group (D038) was reported to be taking phenibut. He was 8.41 years old and had a normal SRS T-score (19). He had been prescribed phenibut for migraine attacks and was taking half a tablet (250 mg) twice a day. As can be seen in Figure A3, D038 has an ASSR within the range of the other TD children from the young group (*t* (17) = −0.333, *p* = 0.743) and clearly above that of SH01. Thus, we conclude that phenibut is unlikely to be the cause of the abnormally low ASSR in SH01.

**Figure A3.** Comparison of ASSR values (μV) for the participant D038 who took phenibut (violet) and the participant SH01 (green). Individual amplitudes of 'young' group are shown in grey. The time of stimulus presentation is 0.

#### **Appendix C**

**Table A1.** Description of comparison groups including age, sex and SRS T-scores.


**Figure A4.** Illustration of Hilbert transformation performed for the analysis of ASSR. Red line corresponds with the signal obtained after filtering the data (35–45 Hz) and averaging all good trials. Blue line indicates Hilbert transformation of 35–45 Hz filtered ERPs.

## **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*International Journal of Molecular Sciences* Editorial Office E-mail: ijms@mdpi.com www.mdpi.com/journal/ijms
