1. Introduction
Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder that exhibits a wide range of symptoms of varying severity and has a high prevalence in the Gulf region [
1]. Recent research suggests that gene variants, altered biological mechanisms, and the environment together influence the development of autism [
2]. The most common symptoms of autism are impaired social skills, difficulty with communication, and repetitive movements. Autism can be detected at the age of 3 years, and perhaps as early as 18 months [
3]. ASD heritability has been estimated to be 40–90%, and a number of common variants contribute to the significant variability in the condition, in addition to epigenetic, environmental, and hormonal factors [
4,
5,
6]. Moreover, the Diagnostic and Statistical Manual of Mental Disorders—Fifth Edition includes atypical sensory processing, which has been reported in more than 70% of patients with ASD as a diagnostic criterion (DSM-5; American Psychiatric Association, 2013) [
7,
8,
9].
Genes differentially expressed in autism are important for understanding the associated pathways [
10,
11,
12]. This meta-analysis looked at ten Gene Expression Omnibus datasets, which included 364 cases and 248 controls. It discovered 3105 genes that were consistently expressed differently in autism (1680 downregulated genes and 1425 upregulated genes).
PNPLA2, LYPLA2, LYPLA2P1, PLA2G4D, PLA2G6, PLA2G7, and
PLA2G5 were also discovered to be related to phospholipase A2 (PLA2). The enriched gene ontology terms for molecular functions were involved in structural constituents of ribosomes and transcription regulator activity, while the enriched gene ontology terms for biological processes were involved in translational elongation and the response to cytokine stimuli, according to the study. The ribosome route was the most important pathway in our KEGG study. The meta-analysis discovered genes that were consistently differentially expressed in autism, as well as biological pathways linked to gene expression alterations [
10]. Topological analysis of the autistic brain identified proteomic hub gene signatures:
CDK2,
BAG3,
MYC,
TP53,
HDAC1,
CDKN1A,
EZH2,
GABARAPL1,
TRAF1, and
VIM [
11]. The study also revealed the transcriptional regulating factors, such as
YY1,
FOXL1,
USF2,
FOXC1,
GATA2, NFIC,
E2F1,
NFKB1,
TFAP2A, and
HINFP [
11].
Despite rapid developments in genetic techniques, for diagnostic purposes, common variants of neuropsychiatric disorders are still lacking. An exome microarray provides a potential diagnostic method for many complex genetic diseases such as ASD. Unlike other genetic techniques, the exome microarray is effective for profiling variants in associated genes with >5% allele frequency [
13]. In addition, it provides exceptional coverage of the whole exonic region delivered from over 12,000 individual exome sequences [
14]. The present study conducted whole RNA sequencing and exome microarray analysis to identify the most prominent variants associated with autism in patients in Saudi Arabia.
2. Results
The study revealed the significant regulatory variants associated with the development of autism in people with Arab ancestries using multiomic analysis (
Figure 1).
A total of 13 SNPs were significantly (
p < 2.05 × 10
−7) associated with ASD even after Bonferroni or false discovery rate corrections (corrected α = 0.05/243,345 = 2.05 × 10
−7) among the 243,345 SNPs screened, and they obeyed the HWE.
Table 1 summarizes the most significant SNPs that were associated with autism in patients of Arab ancestry. Common variants in olfactory receptor family 12 subfamily D member 2 (
OR12D2) and olfactory receptor family 5 subfamily V member 1 (
OR5V1) showed the strongest signal in Arab patients with autism. The top 10 most significantly (
p < 2.05 × 10
−7) associated SNPs were in the region of the olfactory receptor genes
OR12D2 and
OR5V1 on chromosome 6p22.1 (
Figure 2). Additional information about the significant SNPs (
p < 0.0001) in autism patients is provided in
Table S1. The results suggest that a single gene,
OR12D2 (including up and downstream; NC_000006.11:g.29360183–g.29369519), has 10 common SNPs with large effects on risk (rs9257819, rs2022077, rs9257834, rs4987411, rs2073154, rs2073153, rs2073151, rs2073149, rs1028411, and rs2394607), including five SNPs in the coding region (rs9257834, rs4987411, rs2073154, rs2073153, and rs2073151).
All the association tests were screened using the frequency of minor alleles in controls, the
p-value of the HWE, and the type I error rate to identify the strongest genetic findings, which were input for linkage disequilibrium in HapMap SNPs of selected regions on chromosome 6 (
Figure 2). The variants in the region of the olfactory receptor genes
OR12D2 and
OR5V1 had a high degree of linkage disequilibrium (
p = 6.71 × 10
−9) (
Figure 2). The haplotype associated with the most significant risk in patients of Arab ancestry with autism, AAGTCTGATT (
p = 6.7082 × 10
−9), consisted of the top 10 SNPs (rs9257819A, rs2022077A, rs9257834G, rs4987411T, rs2073154C, rs2073153T, rs2073151G, rs2073149A, rs1028411T, and rs2394607T) (
Table 2). The protective haplotype most significantly associated with the control subjects of Arab ancestry, CTTCGGATGC (
p = 2.43 × 10
−8), also consisted of the top 10 SNPs (rs9257819C, rs2022077T, rs9257834T, rs4987411C, rs2073154G, rs2073153G, rs2073151A, rs2073149T, rs1028411G, and rs2394607C) with the alleles that were not associated with autism (
Table 2).
2.1. RNA Sequencing and Differentially Expressed Genes
Blood samples were collected in the RNA protectant vacutainer with an RNA stabilizer to obtain more accurate results. Total RNA was isolated for the transcriptome profiling study using paired-end sequencing. Raw reads were checked for a quality Phred score ≥ 30. The qualifying reads were processed for mapping on the reference genome and expression profile preparation. The raw RNA sequencing data reads were trimmed for the adaptors and mapped by using the tophat RNA Sequence mapping tool with the reference RNA sequence. All RNA sequence data were subjected to complete differential expression analysis between the autistic and the control subjects. The differential expression between the autistic and control subjects was calculated, revealing upregulated (foldchange ≥ 0.8 and
p-value ≤ 0.05) and downregulated (foldchange ≤ −0.8 and
p-value ≤ 0.05) genes. Autistic patients and healthy controls were considered two distinct groups for differential expression calculation using Cuffdiff [
15]. Genes were filtered for differential expression through all significant DEGs based on a
p-value ≤ 0.05. For upregulated genes, we used a cutoff
p-value ≤ 0.05 and foldchange ≥ 0.8, and for downregulated genes, we used a cutoff
p-value ≤ 0.05 and foldchange ≤ −0.8. All the expressed genes from the autistic and control samples with a quality Phred score ≥ 30 were subjected to differential expressed gene analysis, and the results are presented in
Figure 3 as volcano plots. The significantly upregulated and downregulated genes were considered with foldchange ≥ 0.8 and
p-value ≤ 0.05 and foldchange ≤−0.8 and
p-value ≤ 0.05, respectively.
A list of the top 20 upregulated genes with foldchange ≥ 1.98, a
p-value ≤ 0.0025 and a
q-value ≤ 0.05 is presented in
Table 3. A list of the top 20 downregulated genes with foldchange ≤ −3.5, a
p-value ≤ 0.00005, and a
q-value ≤ 0.0044 is presented in
Table 4. The results indicate that the fivefold downregulated genes (foldchange ≤ −5,
p-value ≤ 0.00005 and
q-value ≤ 0.0044) such as
MPO,
TMCC3,
MMP8,
CA1, and
ELF3 can be notable downregulation biomarkers. Similarly, a fivefold upregulated gene,
MTRNR2L8 (foldchange ≥ 1.98,
p-value ≤ 0.00005 and
q-value ≤ 0.004), is a considerable upregulation biomarker. The fivefold downregulated
MPO,
TMCC3,
MMP8,
CA1, and
ELF3 genes and the fivefold upregulated
MTRNR2L8 gene can be considered for the early identification of autistic patients among Saudis.
2.2. Pathway Enrichment Analysis
All the significantly upregulated and downregulated genes were searched for during the GO and pathway enrichment analysis. Significantly upregulated genes and their interactions (
Figure 4) revealed highly significant protein–protein interactions based on the predictive model enrichment (
p-value: <1.0 × 10
−16). Downregulated genes (
p-value < 0.001) and their interactions (
Figure 4) demonstrated significant PPI enrichment with a
p-value = 2.05 × 10
−6. Pathway enrichment by the significantly upregulated and downregulated genes is presented in
Table 5. Ribosome, oxidative phosphorylation, protein export, Parkinson’s disease, and Alzheimer’s disease-associated pathways were significantly enriched by the upregulated genes (
Table 5). The impact on the genetic information processing and translation was significantly enriched by the 17 upregulated genes through the ribosome pathway. The HIF-1 signaling pathway, pyruvate metabolism, propanoate metabolism, metabolic pathways, the rap1 signaling pathway, the AMPK signaling pathway, the glucagon signaling pathway, proteoglycans in cancer, renal cell carcinoma, fatty acid biosynthesis, the insulin signaling pathway, and EGFR tyrosine kinase inhibitor resistance were significantly enriched by the downregulated genes (
Table 5).
2.3. Integration of Transcriptome and Exome Genotyping
Genotyping and exome array analysis among the 243,345 putative functional exonic variants revealed 589 genes significantly (
p < 0.001) associated with autism, with 682 significant variants. Furthermore, the differentially expressed gene analysis based on transcriptome sequencing for the autistic patients compared to controls identified significantly upregulated (foldchange ≥0.8 and
p-value ≤ 0.05;
n = 125) and downregulated (foldchange ≤−0.8 and
p-value ≤ 0.05;
n = 117) genes. The integration of significantly up- and downregulated genes and genes of significant SNPs using Venn diagram analysis identified three regulatory variants (rs6657480, rs3130780, and rs1940475) associated with upregulated (rs6657480,
p = 0.000762, (NC_000001.10: g.63999868T > C) nearest upstream to the gene
ITGB3BP) and downregulated (rs3130780,
p = 3.12 × 10
−5 (NC_000006.11: g.30874308T > G) nearest upstream gene,
DDR1; and rs1940475,
p = 0.0001807 (NC_000011.9:g.102593248T > C; NM_002424.3:c.259A > G; NP_002415.1:p.Lys87Glu,
MMP8)) genes in autism spectrum disorder in patients of Arab ancestries (
Figure 1).
3. Discussion
This study performed a comprehensive investigation of the genetic architecture of ASD using an exome microarray analysis and RNA sequencing. We identified 155 significantly (
p < 0.0001) associated SNPs, and the most strongly (
p < 2.05 × 10
−7) associated 13 SNPs were investigated further. None of the 13 SNPs had previously been reported in Saudis with autism [
16]. The
OR12D2 olfactory receptor gene, which codes for the G-protein coupled receptor (GPCR), carries the signal to sensory neurons to trigger smell and has 10 common variants and large effects on risk, was associated with autism in Arab-ancestry subjects and showed a high degree of linkage disequilibrium (
p = 6.71 × 10
−9) on the significant risk haplotype AAGTCTGATT. The
OR12D2 gene (gene expression for
OR12D2: ENSG00000168787.4) was overexpressed in the brain (hypothalamus ×13.9, caudate ×10.7, hippocampus ×8.2, nucleus accumbens ×5.0 and cervix-endocervix ×4.6) [
17,
18].
Some studies have proposed unusual sensory processing, particularly olfactory processing, in a number of neurodevelopmental disorders, including autism [
19,
20]. Berko et al. [
21] showed that the
OR2L13 G-protein, joined by the olfactory receptor, which is involved in resetting the neuronal response to smells, is particularly prevalent in autism in terms of DNA methylation and expression, suggesting a possible rationale for olfactory dysfunction in ASD. The olfactory receptor genes
OR12D2 and
OR5V1, which are involved in the olfactory signaling pathway, G alpha (s) signaling events, GPCR downstream signaling, and GPCR signaling and signal transduction together regulate the overall olfactory growth, behavior, and impairment associated with autism (
Figure 4) [
22]. A functional analysis of the genes is presented in
Table S2. The significant genes in this study included integrin subunit alpha 11 (
ITGA11), which is associated with attention deficit hyperactivity disorder and attention deficit disorder with hyperactivity; MHC class I polypeptide-related sequence A (
MICA), associated with symptoms of inflammatory bowel disease, chronic ulcerative colitis, and Crohn’s disease; bromodomain-containing 2 (BRD2), which has been linked to Alzheimer’s disease and myoclonic epilepsy; cell division cycle 14C pseudogene (
CDC14C), which is associated with Parkinson’s disease and sleep; methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1-like (
MTHFD1L), which is associated with stroke and Alzheimer’s disease; myosin IXB (
MYO9B), associated with symptoms of celiac disease, inflammatory bowel disease, chronic ulcerative colitis, multiple sclerosis, and schizophrenia; and ring finger protein 144A (
RNF144A), which has been associated with depressive disorder, schizophrenia, and stroke (
Table S2) [
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35]. For example, SNP rs2727943 is significantly correlated with ASD and has been reported in bipolar cases as an intergenic variant that occurs between the
BIG-2 and
CNTN6 genes, both of which produce a neuronal membrane protein that forms the axonal connections in the developing nervous system [
4].
Unlike previous results, our study identified a single gene,
OR12D2 (including up and downstream; NC_000006.11:g.29360183-g.29369519), with 10 common SNPs and large effects on risk in patients with autism. Most previous results have suggested that no single gene has such large effects on risk (
p = 6.71 × 10
−9) [
4]. A comprehensive review on the genetics of ASD described the role of various genes and their clinical relevance to autism. Here, we describe an additional gene,
OR12D2, that should be considered for molecular pathway studies in the future [
4]. The enrichment analysis of the genes/nearest upstream/nearest downstream genes of the highly associated SNPs using Enrichr in GWAS Catalogue 2019 revealed significant diseases/drug terms associated with ASD or schizophrenia (
p = 0.001107; adjusted
p = 0.0321) on eight genes, namely
SFTA2,
BRD2,
HCP5,
OR5V1,
HLA-DMA,
ZKSCAN8,
OR12D2, and
MICA [
36]. Four genes (
TMEM185B,
MTHFD1L,
QPCT, and
H2AFV,
p = 0.0098) were downregulated relative to their median expression in the ventral juxtacommissural pretectal nucleus among the genes/nearest upstream/nearest downstream genes of highly associated SNPs using Enrichr [
36]. Children with systemic lupus erythematosus (SLE) have a high risk of developing autism. In the present study, seven genes (
BRD2,
OR12D2,
CR2,
OR5V1,
SFTA2,
HCP5, and
HLA-DMA) were highly significant for SLE (
p = 0.000029; Bonferroni = 0.0041) using DAVID [
37,
38].
The risk haplotype of the
OR12D2 gene in autism subjects and the overexpression of
OR12D2 in the brain, dysfunction in neurodegenerative processes, olfactory impairment, and the overuse of aroma compounds or essential oils, flowers, aromatic wood, herbs, frankincense, and perfumes among the Arab population provide clues that olfactory sensory deficits and overexpression of
OR12D2 should be studied in detail among the Arab population to reveal the underlying etiology of autism [
18,
19,
20,
21,
39,
40].
Even though the olfactory receptor genes on chromosome 6 are significant after a Bonferroni correction (α = 0.05/243345 = 2.05 × 10
−7) with high linkage disequilibrium (
p = 6.71 × 10
−9 on 6p22.1), their significance on the transcriptome sequencing was lacking. Furthermore, the most significantly upregulated gene,
RPS12, and the most significant downregulated gene,
MPO, were observed with no significant variants in the exome array. This suggests that whole-genome sequencing on these subjects may reveal the significant genes with novel variants which are not covered in the exome array. Differentially expressed genes in autism are essential for the complete understanding of the associated functional and autism development pathways [
10,
11,
12]. Earlier studies have revealed many significantly differentially expressed genes from various populations [
10,
11,
12]. However, there were no studies on the differentially expressed genes in autism in Saudi Arabians. Hence, this study can be considered as a first of its kind study on autistic patients from Saudi Arabia. Additionally, the integrated analysis of exome array and transcriptome sequencing is an initial study on the Arab ancestries with autism. The upregulated integrin subunit Beta 3 binding protein (
ITGB3BP) gene with rs6657480 (
p = 0.000762) in the present study is in line with the previous studies on the genes predicted to affect risk for ASD [
41]. The upregulated
ITGB3BP gene in the present study is targeted as an autism-risk gene in human fetal brains [
42], and also differential expression was observed in a mouse model [
43]. Downregulation of
ITGB3BP was observed in GABAergic neural development [
44]. SNPs associated with schizophrenia risk were identified in the gene
ITGB3BP [
45].
SNP rs3130780 was observed in this study, which is located close to the downregulated discoidin domain receptor 1 (
DDR1) gene.
DDR1 was reported earlier as a susceptibility gene for schizophrenia [
46,
47] and self-consciousness [
48].
DDR1-mediated signaling systems were observed in
Drosophila in the mushroom body encoding learning and memory [
49,
50].
DDR1 plays an important role in myelination and neuropsychiatric diseases [
51].
DDR1 expression increases in parallel with neural myelination throughout mouse brain development, according to previous research [
52]. The exonic variation of rs1940475 (NP_002415.1:p.Lys87Glu) in the
MMP8 gene was found to be a significant coding variant in the genotype association study, and it impacts the expression level of the
MMP8 gene. Pathway analysis and genome-wide association in major depressive disorder revealed a significant association with the
MMP8 gene [
53]. rs1940475 in the
MMP8 gene was reported for its association with osteoarthritis [
54], femoral head osteonecrosis [
55], and bladder cancer in non-smokers [
56]. On the other hand, a recent meta-analysis of the K87E (rs1940475) variant showed that this variant is not significantly associated with cancer susceptibility [
57]. Overall, the literature on the
ITGB3BP,
DDR1, and
MMP8 genes indicates that the observation regarding the integration of transcriptome and exome genotyping in autism is a notable addition for the understanding of the genetics of autism. Defects in the SH3 and multiple ankyrin repeat domains 3 (Shank3) protein cause numerous synaptopathies, including autism [
58,
59,
60]; upregulation of the
RPL36A was reported in the striatal synaptosome of Shank3-overexpressing transgenic mice [
58]. This is in line with the present observation on the upregulation of
RPL36A in autistic patients and indicates that the upregulation of RPL36A can influence the development of autism through the dysfunction of SHANK3-associated synaptopathies [
58,
61]. The small number of subjects enrolled in this study was a notable limitation. However, the autistic patients were selected from 1800 patients visiting the neurology clinic. Currently, research guiding medication choice and dosing in patients with ASD is limited. Autism symptoms are usually managed with a variety of medications such as psychotropic (antipsychotics and antidepressants) drugs to manage behaviors such as mood symptoms, resulting in polypharmacy- and drug-related adverse effects. Knowledge of the clinical and genetic predictors of drug response variability and the outcomes of personalized medicine can advance the clinical care of young patients with ASD. Pharmacogenomics is a method that can be used for medication choice that affects a person’s genetic information and clinical presentation to make informed choices, but no studies have specifically examined the outcomes of PGX for patients with ASD. We believe that we can use the result of our study to help in the diagnosis and treatment of patients with autism. Further studies are needed to provide further information on the genetic risk in relation to treatment effectiveness. Analysis of reactive oxygen species (ROS) generation and stress enzymes in the cases and controls may strengthen the observed findings.