#### *3.2. Data Analysis*

We used our proposed BFM to analyze methylation data from a genome-wide association study of chronic lymphocytic leukemia (CLL), which results from the clonal expansion of malignant B cells. B-cell lymphoma, mostly prevalent among adults, is a heterogeneous disease [20,21]. It is clinically important to identify heterogeneity among patients at the molecular level, which can help in designing specific interventions for patients at different severity levels.

Over the last decade, research in CLL has produced significant advances, such as the identification of several molecular alterations with prognostic value. These include specific cytogenetic patterns [22], the mutational status of the immunoglobulin heavy chain variable gene (IgVH) [23], and the expression of CD38 [24]. Patients lacking the IgVH mutation have been found to have a poorer prognosis, whereas patients with lower levels of CD38 have slower disease progression [23,25].

Several research groups have demonstrated that DNA methylation of multiple promoter-associated CpG islands is common in CLL [15,26,27]. Detection of aberrant DNA methylation in CLL could result in the development of an epigenetic classification of the disease with prognostic and therapeutic potential.

CD19+ B cells from peripheral blood were collected from CLL samples and normal control subjects. All CLL samples were obtained from patients at the Ellis Fischel Cancer Center (EFCC), the Georgia Cancer Center of Augusta University and the North Shore-LIJ Health System in compliance with the local Institutional Review Boards [28].

Illumina sequencing reads were generated for each sample using RRBS [29]. In total, 20–30 million reads were sequenced for each sample, and 63%–75% of them were successfully mapped to either strand of the human genome (hg18) [28]. The average sequencing depth per CpG was between 32× and 43×. RRBS thus provided counts of the DNA molecules that were methylated or unmethylated at each CpG site, and the overall methylation status of approximately 1.8–2.3 million CpG sites was determined consistently for each sample in the study [28].
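To make the form of the RRBS data concrete, the following minimal Python sketch computes the read depth and the proportion of methylated reads at a single CpG site from its methylated and unmethylated counts. The function name `methylation_summary` and the example counts are hypothetical and serve only as an illustration; they are not taken from the study data or from the BFM implementation.

```python
def methylation_summary(methylated, unmethylated):
    """Return (depth, proportion methylated) for one CpG site.

    The proportion is None when no read covers the site.
    """
    depth = methylated + unmethylated
    if depth == 0:
        return 0, None
    return depth, methylated / depth


# Hypothetical (methylated, unmethylated) read counts for three CpG sites
# in one sample; these are illustrative values, not study data.
site_counts = [(30, 10), (0, 35), (12, 0)]
summaries = [methylation_summary(m, u) for m, u in site_counts]

for (m, u), (depth, proportion) in zip(site_counts, summaries):
    print(f"methylated={m} unmethylated={u} depth={depth} proportion={proportion}")
```

Sites with zero coverage are returned with a proportion of `None` rather than zero, so that an absence of reads is not mistaken for an absence of methylation.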

Tong et al. [30] pointed out that aberrant DNA methylation associated with CLL is located more frequently on chromosome 19. Hence, we analyzed the genome-wide methylation data at 17,917 CpG sites on chromosome 19 from 40 patients.

#### *3.3. Comparison of Bayesian Method with Scan Statistic Method for Two Groups*

First, we tested for differential methylation under a binary response by dividing the samples into two groups, using a CD38 level of 20 as the cut-off. This gave 23 subjects with CD38 ≤ 20 and 17 subjects with CD38 > 20. BFM and the scan statistic method (SSM) [31] were compared using moving windows of 10 CpG sites each.
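The grouping and windowing described above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the step size of one CpG site between consecutive windows is an assumption, since the text specifies only the window size of 10.

```python
def split_by_cd38(cd38_levels, cutoff=20):
    """Return sample indices for the low (<= cutoff) and high (> cutoff) groups."""
    low = [i for i, level in enumerate(cd38_levels) if level <= cutoff]
    high = [i for i, level in enumerate(cd38_levels) if level > cutoff]
    return low, high


def moving_windows(n_sites, window_size=10, step=1):
    """Yield (start, end) CpG index pairs, end exclusive, for each full window.

    The default step of 1 is an assumption made for this sketch.
    """
    for start in range(0, n_sites - window_size + 1, step):
        yield start, start + window_size


# Hypothetical CD38 levels for four samples (not study data).
low_group, high_group = split_by_cd38([5, 20, 21, 60])

# Number of full windows over the 17,917 chromosome 19 CpG sites
# under the assumed step of 1.
n_windows = sum(1 for _ in moving_windows(17917))
print(low_group, high_group, n_windows)
```

A sample with a CD38 level of exactly 20 falls in the low group, matching the CD38 ≤ 20 definition used in the text.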

To compare the two methods, we used a cut-off value of 2 for BFM and a 5% significance level for SSM. Using these criteria, SSM detected a total of 181 genes in DMRs and BFM detected 183. Among these, 41 from SSM and 42 from BFM were found in PubMed publications as being associated with leukemia (Table 4). The two methods shared 67 genes, of which 18 were found in PubMed: ACP5, ATF5, BIRC8, C3, CARD8, CEACAM8, CERS1, CKM, CRTC1, IL4I1, LAIR1, MAP1S, NFIX, PDE4C, PLEKHG2, PLVAP, RFX1, and ZNF331 [32–49].

**Table 4.** Comparison of BFM and SSM for window size of 10 (*p* < 0.05).


The C3 and LAIR1 (INK4a) genes were both detected; both have been shown to be related to acute myeloid leukemia (AML) [34,41]. Moreover, both C3 and LAIR1 are linked to the transcription factor CREB (cyclic AMP response element binding protein), which has a role in the pathogenesis of AML and other cancers [50,51].

#### *3.4. Bayesian Method for Ordinal Group Responses*

To test whether methylation rates increase as CD38 levels increase, the samples were classified into four risk groups based on CD38 level: 5 non-leukemia subjects in group 1, 23 patients with CD38 ≤ 20 in group 2, 9 patients with 20 < CD38 ≤ 50 in group 3, and 8 patients with CD38 > 50 in group 4. Although there are advantages to modeling CD38 as a continuous variable, modeling it as an ordinal variable is more robust to distributional assumptions. Moreover, it is common practice in clinical studies to place patients into discrete disease risk groups based on continuous measures. Again, moving windows of 10 CpG sites were used for the analysis.
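The four-level classification can be written as a small lookup on the two CD38 cut points. The sketch below is illustrative: the names `CD38_CUTPOINTS` and `risk_group` and the `is_control` flag are hypothetical, and only the cut points of 20 and 50 come from the text.

```python
from bisect import bisect_left

# Cut points taken from the group definitions in the text.
CD38_CUTPOINTS = [20, 50]


def risk_group(cd38, is_control=False):
    """Map a subject to ordinal risk group 1-4.

    Group 1: non-leukemia controls (regardless of CD38 level).
    Group 2: CD38 <= 20; group 3: 20 < CD38 <= 50; group 4: CD38 > 50.
    """
    if is_control:
        return 1
    # bisect_left keeps values equal to a cut point in the lower group,
    # which matches the "<=" boundaries above.
    return 2 + bisect_left(CD38_CUTPOINTS, cd38)


# Hypothetical CD38 levels for four patients (not study data).
groups = [risk_group(level) for level in [3, 20, 35, 80]]
print(groups)
```

Using `bisect_left` rather than `bisect_right` matters at the boundaries: a CD38 level of exactly 20 or 50 stays in the lower of the two adjacent groups.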

Because of the multiple testing issues associated with comparing four groups, we used a more stringent criterion of BF > 19 to evaluate the strength of evidence for differential methylation [8]. A total of 789 windows showed strong evidence of differential methylation under this criterion. The start and end positions (in base pairs) of each detected DMR were entered into the UCSC Genome Browser to find the genes in these regions, and 125 genes were identified. Among them, 35 were associated with leukemia in the PubMed literature. Some of these, namely BRD4, ELL, ERCC1, ERCC2, GDF15, JUND, POLD1, PRDX2, RANBP3, SPIB and TSPAN16, were not detected when only two groups were considered, even with a less stringent criterion [52–62].
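One way to interpret the BF > 19 criterion is through the identity posterior odds = Bayes factor × prior odds. Under the assumption of equal prior odds (an assumption made here for illustration, not a setting stated in the text), a Bayes factor of 19 corresponds to a posterior probability of 0.95 for differential methylation. The Python sketch below shows this conversion together with a simple threshold filter; the function names and the example Bayes factors are hypothetical.

```python
def posterior_probability(bayes_factor, prior_odds=1.0):
    """Posterior probability of the alternative, given a Bayes factor.

    prior_odds=1.0 (equal prior odds) is an assumption made for this sketch.
    """
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def strong_windows(bayes_factors, threshold=19.0):
    """Return indices of windows whose Bayes factor exceeds the threshold."""
    return [i for i, bf in enumerate(bayes_factors) if bf > threshold]


# Hypothetical per-window Bayes factors (not study results).
example_bfs = [1.5, 19.0, 25.0, 300.0]
selected = strong_windows(example_bfs)
print(selected, round(posterior_probability(19.0), 2))
```

The comparison is strict, so a window with a Bayes factor of exactly 19 is not selected, in line with the BF > 19 wording.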
