**3. Statistical Analysis of Rare Variants**

Gene-based association tests evaluate whether an enrichment of rare variants in a gene is associated with a phenotype, in both Mendelian and common diseases [79]. Region-based analysis has become the standard approach for rare variants, because single-variant tests are underpowered at low allele frequencies. Statistical methods for testing rare variants can be broadly categorized into burden approaches [80–82] and the SKAT (Sequence Kernel Association Test) approach [74,83]. Burden tests assume that all rare variants in the target region affect the phenotype in the same direction and with similar magnitude [84,85]; they lose considerable power when the region contains many non-causal variants or a mixture of protective, deleterious, and null variants [86,87]. SKAT aggregates genetic information across the region using a kernel function and tests for association with a computationally efficient variance component test. CMC (Combined Multivariate and Collapsing Method) collapses variants into subgroups according to allele frequency and combines these subgroups using Hotelling's T² test [66,88]. Compared with population-based methods, family-based methods can offer more power and can prevent bias induced by population substructure [89]. The optimal weight was first proposed by Sha and coauthors in 2012 [90] in a population-based test called TOW (Test for the effect of an Optimally Weighted combination of variants), under the assumption of independence among rare variants. FamSKAT [91], developed for quantitative traits, accounts for familial correlation through kinship coefficients in a linear mixed model and may be able to use both family and unrelated samples. Wang and coauthors in 2016 [92] proposed four weighting schemes for the family-based rare variant test (FBAT-v) [93].
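
As a rough illustration of why burden tests lose power when variant effects point in opposite directions, the following minimal Python sketch computes unscaled burden-type and SKAT-type score statistics on toy data. It assumes a quantitative trait, a null model with an intercept only, flat weights, and a linear kernel; the function names and the data are hypothetical, and no p-values are computed (a real analysis would rely on the null distributions given in the cited methods).

```python
def score_vector(genotypes, phenotype):
    """Return per-variant scores U_j = sum_i G_ij * (y_i - mean(y)).

    Assumes a quantitative trait and a null model with an intercept only.
    """
    n = len(phenotype)
    y_bar = sum(phenotype) / n
    residuals = [y - y_bar for y in phenotype]
    n_variants = len(genotypes[0])
    return [
        sum(genotypes[i][j] * residuals[i] for i in range(n))
        for j in range(n_variants)
    ]


def burden_statistic(scores, weights):
    """Burden-type statistic: square of the weighted sum of scores."""
    return sum(w * u for w, u in zip(weights, scores)) ** 2


def skat_statistic(scores, weights):
    """SKAT-type statistic (linear kernel): sum of squared weighted scores."""
    return sum((w * u) ** 2 for w, u in zip(weights, scores))


# Toy data (hypothetical): rows are individuals, columns are two rare variants.
# Carriers of variant 1 have higher trait values; carriers of variant 2 lower.
genotypes = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0], [0, 0], [0, 0]]
phenotype = [2.0, 2.0, -2.0, -2.0, 0.0, 0.0, 0.0, 0.0]
weights = [1.0, 1.0]  # flat weights, chosen only for illustration

scores = score_vector(genotypes, phenotype)
print(burden_statistic(scores, weights))  # 0.0: opposite effects cancel
print(skat_statistic(scores, weights))    # 32.0: squared scores do not cancel
```

In this toy setting the two per-variant scores are equal in size and opposite in sign, so the collapsed burden statistic is zero while the sum of squared scores is not.
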
Lee and coauthors in 2012 [94] derived the optimal test SKAT-O by estimating the correlation parameter in the kernel matrix that maximizes power; this parameter corresponds to the weight in a linear combination of the burden and SKAT test statistics. Lin and coauthors in 2016 [68] extended the CAPL (Combined Association in the Presence of Linkage) [95] test, which uses both case-control and family data, from common-variant to rare-variant association testing. A similarity-based weighted U approach has been used for the joint association analysis of sequencing variants and gene expression [69]. Sun and coauthors in 2016 [70] introduced a W-test collapsing method that evaluates rare variant associations by measuring the distributional differences between cases and controls through the combined log odds ratio within a genomic region. Wang and coauthors in 2016 [96] developed SKAT+, an estimation method that uses only control subjects; it has greater power than SKAT while maintaining control of the type I error rate. Lu and coauthors in 2014 [71] reported the development and application of the Trio-SVM (Support Vector Machine) approach, which aggregates and evaluates the transmission of rare variants. Derkach and coauthors in 2014 [72] confirmed that Fisher's method is not only robust but can also improve power over the individual pooled linear and quadratic tests, and is often better than other robust tests such as SKAT-O. Cao and coauthors in 2014 [73] developed USR (Unified Sparse Regression), which incorporates prior information and jointly adjusts for cryptic relatedness, population structure, and environmental covariates. The qMSAT (Quality-based Multivariate Score Association Test) [97] and SSU (Sum of Squared U) [98] tests are equivalent to SKAT.
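
Two of the combination ideas mentioned in this section can be sketched in a few lines of Python. The first function forms the linear combination of SKAT and burden statistics, Q_rho = (1 - rho) Q_SKAT + rho Q_burden, over a grid of the correlation parameter rho; the SKAT-O procedure additionally requires the null distribution of the minimum p-value over that grid, which is not reproduced here. The second function applies Fisher's method to a set of p-values, assuming they are independent under the null hypothesis (an assumption made for this illustration that may not hold for every pair of tests). The function names and the input values are hypothetical.

```python
import math


def q_rho(q_skat, q_burden, rho):
    """Linear combination Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return (1.0 - rho) * q_skat + rho * q_burden


def fisher_combined_p(p_values):
    """Combine k p-values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom when the p-values are independent under the null.
    Returns the statistic X and its upper-tail probability.
    """
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    # Closed-form chi-square survival function for even degrees of freedom 2k.
    tail = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return x, tail


# Hypothetical inputs: a SKAT-type and a burden-type statistic for one region.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
for rho in grid:
    print(rho, q_rho(32.0, 8.0, rho))

# Hypothetical p-values from a linear and a quadratic test of the same region.
statistic, combined_p = fisher_combined_p([0.05, 0.05])
print(statistic, combined_p)
```

Setting rho to 0 recovers the SKAT statistic and rho to 1 recovers the burden statistic, which is why a data-driven choice of rho can adapt to either scenario.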
