**2. Materials and Methods**

#### *2.1. Methods*

Classical statistical inference under constrained parametric spaces has been addressed by many studies. Among them, Bartholomew [11] presented one of the first tests for K multinomial proportions with inequality constraints. He proposed a test of H0 : p1 = p2 = ... = pK against the simple ordered H1 : p1 ≤ p2 ≤ ... ≤ pK with at least one strict inequality, where pk (k = 1, 2, ... , K) represents the proportion the kth group. Under H0, the maximum likelihood estimator of pk is the overall sample proportion πk. If the sample multinomial proportions satisfy π<sup>1</sup> ≤ π<sup>2</sup> ≤ ... ≤ πK, then the order-restricted ML estimator is pˆ <sup>k</sup> = πk. However, sometimes the sample proportions may not satisfy the ordering π<sup>1</sup> ≤ π<sup>2</sup> ≤ ... ≤ πK; in that case, calculation of the restricted maximum likelihood estimator (RMLE) is subject to arbitrary orderings of the parameters, and it requires specialized algorithms that are not easily generalizable [9].

Robertson and Wegman [12] proposed a likelihood ratio statistic for the inequality-constrained binomial problem, which compares parameters for independent samples from a single-parameter exponential family distribution. Before calculating the test statistic, they used the pool-adjacent-violators algorithm [13] to pool "out-of-order" categories for which π<sup>k</sup> > πk+<sup>1</sup> until the resulting sample proportions are monotone increasing. The order-restricted ML estimators pˆ <sup>k</sup> become the adjusted sample proportions.

The idea of applying an isotonic transformation to the unconstrained parameter estimates motivated Dunson and Neelon [9] to create a Bayesian alternative approach for this problem, which has been adapted here. They proposed to use Bayes factors for assessing ordered trends, which are calculated based on the output from Gibbs sampling. The samples from the order-constrained model are derived by transforming samples draws from an unconstrained posterior density using an isotonic regression transformation. Next, we explain our proposed Bayes factor method (BFM).

Suppose mkij is the count of methylated molecules at CpG site j of individual i in group k. We assume mkij ∼ B ckij, pkij , where ckij is the coverage, and pkij is the true methylation rate at that particular site, with k = 1, 2, ... , K, i = 1, 2, ... , nk and j = 1, 2, ... , m.

Within each moving window along the genome, a mixed-effect model is considered to allow the correlation of methylation rates between and within CpG sites. The logit link function for the methylation rate pkij is expressed by

$$\text{logit}(\mathbf{p}\_{\text{kij}}) = \mu\_{\text{k}} + \mathbf{v}\_{0\text{k}i} + \mathbf{v}\_{1\text{k}ij} \tag{1}$$

where <sup>ν</sup>0ki and <sup>ν</sup>1kij are the random effects. The random effect <sup>ν</sup>0ki <sup>∼</sup> <sup>N</sup>(0, <sup>σ</sup><sup>2</sup> <sup>ν</sup><sup>0</sup> ) is used to model the interindividual correlation of methylation rates within each CpG site, while the random effect ν1ki = (ν1ki1, ν1ki2, ... , ν1kim) <sup>T</sup> <sup>∼</sup> <sup>N</sup>(μ0, **<sup>Σ</sup>**), with <sup>μ</sup><sup>0</sup> <sup>=</sup> (0, 0 ... <sup>0</sup>) <sup>T</sup> is used to model the correlation of methylation rates between CpG sites.

Here μ<sup>k</sup> in (1) is the fixed effect for each group, representing the association between methylation rates and group responses. The strength and direction of the association is modeled by prior distribution N(μμ, σ<sup>2</sup> <sup>μ</sup>), which means the parameters of μ<sup>μ</sup> and σ<sup>2</sup> <sup>μ</sup> control the distribution of μk, and implies that all of the methylation rates are drawn from a common distribution. This brings the advantage of allowing for heterogeneity of effects across CpG sites, instead of just pooling information across CpG sites in a region. Pooling assumes that each CpG site in the region has same methylation rates, while BFM considers the methylation rates of each CpG sites to be a random quantity governed by a prior distribution.

With assigned hyperpriors μ<sup>k</sup> ∼ N 0, 10002 , σ<sup>2</sup> <sup>k</sup> <sup>∼</sup> IG(1, 100), <sup>σ</sup><sup>2</sup> <sup>ν</sup><sup>0</sup> <sup>∼</sup> IG(1, 100) and <sup>Σ</sup>−<sup>1</sup> <sup>∼</sup> Wish(Im, m) for m CpG sites in the moving window. The posterior distribution of μ<sup>k</sup> is based on the mixed-effect logistic model (1), and it is used to calculate the Bayes factor for comparing the two models, M0 : μ<sup>1</sup> = μ<sup>2</sup> = ... = μK, M1 : μ<sup>1</sup> ≤ μ<sup>2</sup> ≤ ... ≤ μ<sup>K</sup> with at least one strict inequality, in order to see whether there is an ordered constraint of methylation rates corresponding to severity of the disease.

To calculate the Bayes factor, first we drew samples μ1, μ2, ... , μ<sup>K</sup> from the posterior distribution by using Gibbs sampling. After that, an isotonic transformation is used to transform μ1, μ<sup>2</sup> ... μ<sup>K</sup> into μ1,μ2, ... ,μK, with μ<sup>1</sup> <sup>≤</sup> μ<sup>2</sup> <sup>≤</sup> ... <sup>≤</sup> μ<sup>K</sup> [8] by using the min-max formula for the isotonic transformation, given by,

$$\widetilde{\boldsymbol{\mu}}\_{\mathbf{k}} = \mathbf{g}\_{\mathbf{k}}(\boldsymbol{\mu}) = \underset{\mathbf{t} \in \mathbb{U}\_{\mathbf{k}}}{\mathrm{minmax}} \left( \frac{\mathbf{1}\_{\mathbf{t}-\mathbf{s}+1}^{\prime} \mathbf{V}\_{[\mathbf{s};\mathbf{t}]}^{-1} \boldsymbol{\mu}\_{[\mathbf{s};\mathbf{t}]}}{\mathbf{1}\_{\mathbf{t}-\mathbf{s}+1}^{\prime} \mathbf{V}\_{[\mathbf{s};\mathbf{t}]}^{-1} \mathbf{1}\_{\mathbf{t}-\mathbf{s}+1}} \right) \text{for } \mathbf{j} = 1, 2, \dots, \ \mathbf{K}, \tag{2}$$

where *V*=diag*V*1,..., *VK* denotes the posterior covariance matrix and the diagonal submatrix Vi, i = 1, ... , k, is the covariance matrix of the ith ordered group. It is estimated from the samples of the posterior

density of μ. Uk and Lk denote subsets of {1, ... , K} such that the ordering μ<sup>j</sup> ≤ μ<sup>j</sup> for all j ∈ Lk and the ordering μ<sup>j</sup> ≥ μ<sup>j</sup> for all j <sup>∈</sup> Uk. Also samples <sup>μ</sup><sup>0</sup> <sup>1</sup>, <sup>μ</sup><sup>0</sup> <sup>2</sup>, ... , <sup>μ</sup><sup>0</sup> <sup>K</sup> are drawn from the prior density and transformed into μ<sup>0</sup> 1,μ<sup>0</sup> 2, ... ,μ<sup>0</sup> <sup>K</sup>, with μ<sup>0</sup> <sup>1</sup> <sup>≤</sup> μ<sup>0</sup> <sup>2</sup> <sup>≤</sup> ... <sup>≤</sup> μ<sup>0</sup> <sup>K</sup>, by using the isotonic transformation in (2). The Bayes factor for each window (with moving windows along the genome) is given by,

$$\text{BF} = \frac{\mathbf{P}(\mathbf{M}\_1 | \text{data} \,) / \mathbf{P}(\mathbf{M}\_1)}{\mathbf{P}(\mathbf{M}\_0 | \text{data} \,) / \mathbf{P}(\mathbf{M}\_0)} = \frac{\mathbf{P}(\widetilde{\boldsymbol{\mu}}\_\mathbf{K} > \widetilde{\boldsymbol{\mu}}\_1) / \mathbf{P}(\widetilde{\boldsymbol{\mu}}\_\mathbf{K}^0 > \widetilde{\boldsymbol{\mu}}\_1^0)}{\mathbf{P}(\widetilde{\boldsymbol{\mu}}\_\mathbf{K} = \widetilde{\boldsymbol{\mu}}\_1) / \mathbf{P}(\widetilde{\boldsymbol{\mu}}\_\mathbf{K}^0 = \widetilde{\boldsymbol{\mu}}\_1^0)}$$

Please note that the isotonic transformation in (2) changes our hypotheses slightly, making the resulting Bayes Factor an approximation rather than exact [14]. The windows with highest value of the Bayes factor among all windows are used for evaluating DMRs.

Thus, the Bayes factor is the ratio of the marginal densities of the data under the two hypotheses, and it can be used to weigh evidence in favor of a hypothesis, by utilizing all the information contained in the full likelihood. Our proposed BFM can detect DMRs associated with disease severity, especially detecting DMRs with monotonically increasing or decreasing methylation rates, as the disease severity increase. It uses a mixed-effect model to not only adjust for correlation of methylation rates between CpG sites within each moving window but also correlations within CpG sites.

In addition, by adding covariates xki in the model (1), we can account for the effects of covariates that are associated with methylation rates, such as age [15] and gender [16].

To aid in the interpretation of the Bayes factor, Jeffreys [17] proposed the following rule of thumb: "When 3 < BF ≤ 10 the evidence is positive, when 10 < BF ≤ 100 the evidence is strong, and when BF > 100, the evidence is decisive". As Kass and Raftery [18] pointed out, these categories are not precise calibration, but rather a descriptive statement about the standards of evidence in scientific investigations.
