Autism spectrum disorder (ASD), as described by the American Psychiatric Association (APA) in the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) [1], is a neurodevelopmental disorder characterized by deficits in communication, limited social skills, and visible repetitive behaviors, such as the stacking of objects by children who have ASD. ASD is called a “spectrum” disorder because no two people with ASD exhibit the same phenotype: skills, symptoms, and cognitive abilities vary from person to person [2]. The prevalence of ASD worldwide is around 1% [3], and estimates are even higher in high-income countries; for example, in the US [4], about 2.3% of children aged 8 years and about 2.2% of adults are diagnosed with ASD, which makes ASD a global economic burden. Early diagnosis of ASD is considered crucial because the timely start of therapies and interventions has been shown to improve the quality of life of people with ASD [5]. However, early diagnosis is difficult because of the lack of trained ASD specialists worldwide, demographic variations across countries, and the lack of universal ASD screening tests. Most importantly, the interventions and therapies that exist around the world to target ASD are mostly designed for the adult population [6,7]. In light of this evidence, it is of significant interest to find psychological interventions, biological drugs, and artificial intelligence-based methods to treat ASD in general, and in particular to find the neuro-markers and genetic factors related to ASD that can reduce its economic burden and improve the quality of life of people with ASD.
1.1. Motivations
Functional magnetic resonance imaging (fMRI) has been receiving increasing attention from researchers in the field of neuroscience because of its non-invasiveness, high spatial resolution, and lower cost over time. fMRI measures neuronal activity in the brain indirectly, through the magnitude of the blood-oxygenation-level-dependent (BOLD) signal [8,9]. A BOLD signal reflects the changes in blood oxygenation levels in response to neuronal activity in a brain region. The key idea of fMRI is to measure these changes, which are indicative of increased blood flow and oxygenation following neuronal activation. A range of brain disorders, including Alzheimer’s [10], attention deficit hyperactivity disorder [11], autism spectrum disorder [12], epilepsy [13], major depressive disorders [14], and Parkinson’s [15], have now been explored using fMRI. This large body of studies, together with the non-invasive nature and high spatial resolution of fMRI, makes it a promising imaging modality for studying brain disorders.
Recently, deep learning (DL)-based approaches [16] have revolutionized the field of artificial intelligence and are being used in a variety of tasks, such as classification, detection, segmentation, and natural language processing. The underlying idea of DL, which is also called representation learning, is to learn a non-linear mapping from the input data to a latent representation using a multi-layered neural network trained with the back-propagation algorithm. After training, the network is evaluated on unseen test data. DL-based approaches have shown huge potential in the fields of image classification [17], brain tumor detection [18], drug discovery [19], and natural language processing (NLP) [20], which makes DL an active area of research for many open problems and motivates researchers to apply the ideas of DL in their respective fields. In the following, we briefly review the related articles specific to our proposed approach, with which we compare our results.
Conditional Generative Adversarial Networks (cGANs) [21] are a class-conditioned generative approach used to produce synthetic data when real data are scarce, so that over-fitting of a model can be reduced by training on the synthetically augmented dataset. A cGAN consists of two neural networks called the “Generator” and the “Discriminator”. The task of the Generator is to generate realistic data from a random vector and a class-embedding vector in order to fool the Discriminator, and the task of the Discriminator is to identify whether the data it receives are real or fake, using both real and generated data together with the class-embedding information. Our motivation for using the cGAN comes from its ability to generate synthetic data, which reduces the over-fitting caused by high-dimensional connectivity features. We use the cGAN to generate connectivity features for each category of subjects, so that the dataset of connectivity features is enlarged and the over-fitting of our model is mitigated.
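The conditioning mechanism described above can be sketched as follows. This is a minimal, untrained illustration rather than the architecture used in this work: the layer sizes, the one-hot class embedding, the `tanh` output, and all names (`generator`, `discriminator`, `bce`, `FEAT_DIM`, and so on) are assumptions chosen for the example, and the "real" features are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CLASSES = 2    # e.g. ASD vs. healthy control
NOISE_DIM = 16   # length of the random input vector (assumed)
FEAT_DIM = 45    # length of one connectivity-feature vector (toy size, assumed)
HIDDEN = 32      # hidden-layer width (assumed)


def one_hot(labels, n_classes):
    """Encode integer class labels as one-hot rows (a simple class embedding)."""
    return np.eye(n_classes)[labels]


def init_layer(n_in, n_out):
    """Small random weights and zero biases for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


# Generator: (noise, class) -> synthetic feature vector
G1, G2 = init_layer(NOISE_DIM + N_CLASSES, HIDDEN), init_layer(HIDDEN, FEAT_DIM)
# Discriminator: (features, class) -> probability that the features are real
D1, D2 = init_layer(FEAT_DIM + N_CLASSES, HIDDEN), init_layer(HIDDEN, 1)


def generator(noise, labels):
    x = np.concatenate([noise, one_hot(labels, N_CLASSES)], axis=1)
    h = np.maximum(0.0, x @ G1[0] + G1[1])   # ReLU hidden layer
    return np.tanh(h @ G2[0] + G2[1])        # outputs bounded in [-1, 1]


def discriminator(features, labels):
    x = np.concatenate([features, one_hot(labels, N_CLASSES)], axis=1)
    h = np.maximum(0.0, x @ D1[0] + D1[1])
    logits = h @ D2[0] + D2[1]
    return 1.0 / (1.0 + np.exp(-logits))     # sigmoid


def bce(p, target):
    """Binary cross-entropy averaged over the batch."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))


batch = 8
labels = rng.integers(0, N_CLASSES, size=batch)
real = np.tanh(rng.normal(size=(batch, FEAT_DIM)))   # stand-in for real features
fake = generator(rng.normal(size=(batch, NOISE_DIM)), labels)

# Discriminator is rewarded for scoring real as 1 and generated as 0;
# the Generator is rewarded when its samples are scored as real.
d_loss = bce(discriminator(real, labels), 1.0) + bce(discriminator(fake, labels), 0.0)
g_loss = bce(discriminator(fake, labels), 1.0)
```

In a full training loop, the two losses would be minimized alternately with back-propagation; the sketch only shows how the class label enters both networks.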
1.2. Related State of the Art
RFEGNN [22], a “Recursive Features Extraction-based Graph Neural Network” approach, was proposed to classify ASD using spatial features selected by recursive feature elimination and then concatenated with the phenotype information of the subjects. The authors achieved promising results, as their fusion-based approach reached an overall accuracy of 80%. A potential drawback of their approach is its reliance on a feature selection step and on concatenation to fuse two sets of features. Our proposed approach neither relies on such hand-crafted features nor uses the phenotype information and still outperforms this approach, which shows the robustness of our approach.
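For readers unfamiliar with recursive feature elimination, the general idea can be sketched as below. This is a generic illustration, not the RFEGNN pipeline: it assumes a plain least-squares linear model as the ranking estimator, synthetic data, and a hypothetical helper name `recursive_feature_elimination`.

```python
import numpy as np


def recursive_feature_elimination(X, y, n_keep):
    """Repeatedly fit a linear model and drop the feature with the smallest |weight|."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        coef, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
        weakest = int(np.argmin(np.abs(coef)))   # position within `remaining`
        remaining.pop(weakest)
    return remaining


# Synthetic example: only features 1 and 4 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.01 * rng.normal(size=200)

selected = recursive_feature_elimination(X, y, n_keep=2)
```

On this toy data the procedure keeps the two informative columns. The number of features to keep (`n_keep`) and the choice of ranking model are exactly the kind of design decisions that an end-to-end approach avoids.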
MHSA [23] is a “Multi-Head Self Attention”-based approach that uses the architecture of transformer-based multi-head attention and a data augmentation module. The authors achieved promising results with their end-to-end approach, which did not require hand-crafted features, reporting 81.47% accuracy, 83.8% sensitivity, and 80.16% specificity. However, the main weakness of their approach is the augmentation module, which uses a sliding window and therefore requires a large number of parameters to be selected and tuned experimentally. Our proposed cGAN-powered data augmentation module avoids this problem because it does not rely on a sliding window to augment the dataset.
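To make the parameter dependence of sliding-window augmentation concrete, a minimal sketch is given below. It is a generic illustration, not the MHSA implementation; the function name `sliding_windows`, the window length, the stride, and the random stand-in time series are all assumptions.

```python
import numpy as np


def sliding_windows(series, window_len, stride):
    """Split a (timepoints, regions) array into overlapping fixed-length windows."""
    n_timepoints = series.shape[0]
    starts = range(0, n_timepoints - window_len + 1, stride)
    return [series[s:s + window_len] for s in starts]


rng = np.random.default_rng(0)
series = rng.normal(size=(120, 10))   # stand-in for 120 timepoints, 10 regions

windows = sliding_windows(series, window_len=50, stride=10)
```

With these illustrative settings, one scan yields 8 windows of shape (50, 10). Both `window_len` and `stride` have to be chosen by experimentation, and a scan shorter than `window_len` yields no samples at all.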
MVES [24] is a “Multi-View Ensemble”-based approach that tackles the classification of ASD by first extracting the mean time series of the fMRI data, then selecting low-/high-level functional connectivity features using PCA and an autoencoder, and finally applying an ensemble-based model for the classification. The authors reported a classification accuracy of 72%, with the highest accuracy of 92.9% on the CMU site. In MVES, the authors used a fixed-length window to augment the data, which is also the main limitation of their approach, because a fixed-length window ignores the inherent structure of the full-length fMRI time series. We believe that our proposed approach is superior in this respect, because our cGAN-based data augmentation generates fixed-length correlation features that do not depend on the varying length of a subject's time series.
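The point that correlation features have a fixed length regardless of scan duration can be illustrated with a short sketch. It assumes Pearson correlation between regional mean time series and a flattened upper triangle as the feature vector; the helper name `connectivity_features`, the number of regions, and the random stand-in scans are hypothetical.

```python
import numpy as np


def connectivity_features(series):
    """Pearson correlation between regions, flattened to the strict upper triangle."""
    corr = np.corrcoef(series, rowvar=False)            # (regions, regions)
    rows, cols = np.triu_indices(corr.shape[0], k=1)    # skip the diagonal
    return corr[rows, cols]


rng = np.random.default_rng(0)
short_scan = rng.normal(size=(80, 10))    # 80 timepoints, 10 regions
long_scan = rng.normal(size=(300, 10))    # 300 timepoints, 10 regions

f_short = connectivity_features(short_scan)
f_long = connectivity_features(long_scan)
```

Both vectors have 10 × 9 / 2 = 45 entries, so subjects with different scan lengths map to inputs of the same size without any windowing of the time series.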
NVS [25], a “Novel Features Selection”-based approach that uses a pre-trained variational autoencoder, achieved a 10-fold cross-validation accuracy of 78.12% on the overall dataset. The authors also proposed an innovative activation function and devised a normalization pipeline. However, their approach uses a hand-crafted feature selection step that is now considered obsolete. Our end-to-end approach addresses this limitation because it does not rely on a pre-selection of features. Moreover, the authors did not report sitewise metrics for the 17 sites, which limits the usefulness of their approach: it might not work well on sitewise data, because some sites have significantly fewer data points, which makes classification on individual sites a challenging task.
MSC [26] is a “Multi-Site Clustering”-based approach that tackles the classification task using nested feature selection: the authors first divide the dataset into the two categories of ASD and HC, and then select features from the two sites using singular value decomposition, resulting in an accuracy of 68.42%. The major limitations of their work are the hand-crafted features and the reporting of accuracy on only a few sites instead of all 17 sites.
DeepGCN [27], a “Deep Graph Convolution Neural Network”-based approach, applies an end-to-end Graph Convolution Network (GCN) that has achieved promising results on the ABIDE dataset, with a reported cross-validated accuracy of 73.7%. The main limitation of their approach is that, although the authors used an end-to-end approach for the classification of ABIDE, they did not report a sitewise comparison of the accuracy. This limits the usefulness of their approach, because achieving high sitewise accuracy is more challenging when individual sites have fewer data points, as previously discussed.
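The difference between overall and sitewise reporting can be sketched as follows. The labels and predictions are toy values made up for illustration (they are not results from any of the cited works), `SITE_B` is a placeholder site name, and `sitewise_accuracy` is a hypothetical helper.

```python
from collections import defaultdict


def sitewise_accuracy(sites, y_true, y_pred):
    """Accuracy computed separately for each acquisition site."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for site, truth, pred in zip(sites, y_true, y_pred):
        total[site] += 1
        correct[site] += int(truth == pred)
    return {site: correct[site] / total[site] for site in total}


# Toy data: two subjects from one site, four from another.
sites = ["CMU", "CMU", "SITE_B", "SITE_B", "SITE_B", "SITE_B"]
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1]

per_site = sitewise_accuracy(sites, y_true, y_pred)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here the pooled accuracy is 4/6, while the per-site values are 1.0 and 0.5. A single overall figure can therefore hide large differences between sites, and estimates for small sites rest on very few subjects.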