*A Method of Power Calculations, Estimating Ability to Detect Differentially Methylated Sites with a Given False Discovery Rate*

First, in what follows, *p* values are defined as the cumulative probabilities of the chosen test statistics under the null hypothesis, and thus range from 0 to 1.0. As noted by Storey et al. [32], when the data conform to the null, the density of *p* values computed from a central *t* distribution (or any other null distribution) is uniform. It is then assumed that the observed data to be evaluated (a large number, say R, of independent *t* statistics) represent a mix of two types of CpG (or gene) populations: a population of true null data, and a second population in which the true distribution of each *t* statistic is non-null (non-central) but not otherwise specified.

In that situation the density plot of *p* values (estimated, say, from a central *t* distribution) should separate into a central uniform portion at a density less than 1.0, say R0/R, and tails at either end that slope upward from the uniform region and represent an excess of *p* values in regions more sparsely populated under the null. The total area under the curve must, as usual, equal 1.0, by the definition of a density. One can compare this to the plot expected if all values came from a null distribution, which would be uniform at a value of 1.0. It is easy to show that R0/R represents the proportion of the *t* values in the real data that come from a null distribution. Then R − R0 is the estimated number of *t* statistics that come from non-null distributions.

To gain some extra stability, we have simply translated this same rationale to cumulative distributions of *p* values from both the observed data and hypothetical null data. In the null situation the cumulative distribution is a diagonal line from lower left to upper right. In the observed data, the central uniform density cumulates to a straight line that can be shown to have slope R0/R. Relative excess densities at small *p* values cumulate to push the left end of the curve above the diagonal, with slopes that exceed 1.0; the straight null portion that begins at the end of this elevated segment is correspondingly flatter than the diagonal. Unless there are no non-null data at the right-hand, higher-valued end, the central straight portion necessarily crosses the diagonal. At the end of the straight null portion the curve again develops slopes greater than 1.0, representing non-null high *p* values, and finally meets the diagonal again at *p* = 1.0. In these plots the slope of the straight-line portion estimates the proportion of underlying null distributions, and (R − R0)/R is again the proportion that are non-null.
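The slope reading described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' procedure: it assumes a normal approximation in place of the central *t* distribution, a hypothetical mixture (80% null, non-null shifts of ±3), and an arbitrarily chosen central window of *p* = 0.3 to 0.7 over which the straight line is fitted. The function name `ecdf_slope` is illustrative only.

```python
import random
from statistics import NormalDist

def ecdf_slope(p_values, lo=0.3, hi=0.7):
    """Least-squares slope of the empirical CDF of p over the window [lo, hi].

    The window limits are an assumption of this sketch, chosen to sit
    inside the straight central portion of the cumulative plot.
    """
    ps = sorted(p_values)
    n = len(ps)
    # Points (p_(i), i/n) of the empirical cumulative distribution in the window.
    pts = [(p, (i + 1) / n) for i, p in enumerate(ps) if lo <= p <= hi]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx

rng = random.Random(0)
std_normal = NormalDist()
R, R0 = 20000, 16000  # hypothetical totals: 80% of statistics are null

# Null statistics give uniform p values.
p_null = [rng.random() for _ in range(R0)]
# Non-null statistics (shifted by -3 or +3) pile up near p = 0 or p = 1.
p_alt = [std_normal.cdf(rng.gauss(rng.choice((-3.0, 3.0)), 1.0))
         for _ in range(R - R0)]

pi0_hat = ecdf_slope(p_null + p_alt)
print(f"true R0/R = {R0 / R:.3f}, slope of central portion = {pi0_hat:.3f}")
```

In this toy setting the fitted slope should fall close to the simulated R0/R, slightly inflated by the few non-null *p* values that land in the central window.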

An example from the DNA methylation data (average of CpG sites in gene body regions) described in this paper is shown as Figure A1. Here, the deviation in slope of the central linear region from the diagonal is relatively small but important and, in part because of the large numbers, is not due to chance (*p* < 2 × 10<sup>−16</sup>). The size of the population studied and the relatively small fold-change values (which translate to smaller non-centrality parameters for the non-null *t* distributions) account for the modest deviation from the diagonal in the figure. Increasing *n* (the number of subjects) will change this (see Table 4). An example from a metabolomics dataset, also comparing non-vegetarians to vegans, includes *t* distributions with much greater non-centrality parameters and is shown for illustrative purposes (Figure A2).

**Figure A1.** Plot of cumulative *p* values from real data (dark line) and from hypothetical data in which all values are null (fine line). Differential methylation: average of CpG sites in gene body regions, comparing non-vegetarians to vegans.

**Figure A2.** Cumulative *p* plots (real and hypothetical null data) for a metabolomics panel comparing non-vegetarians to vegans.

This situation can be used to estimate power and how, in a particular dataset, it depends on *n*. Power is here expressed as the false discovery rate (FDR) associated with sample size *n*<sub>2</sub>, where *n*<sub>1</sub> is the size of the observed data, and where critical *t* statistic values are nominated for exploratory purposes. This depends on the observation that, as *n* increases, the distribution of *t* statistics from null sites does not change. The reason is that the numerator of each *t* value is mean(non-vegetarians) − mean(vegans) for that site or gene, and its expectation under the null is, by definition, zero. The underlying random errors in these means shrink as *n* increases according to the inverse square root law, decreasing the range of the random differences between the two means (the numerator). The denominator, which measures the standard error of this difference between means, shrinks by an exactly equivalent amount, so the ratio (the *t* statistic) remains unchanged in distribution, other things being equal.

However, the situation for non-null *t* values is different: the numerators, again by definition, have a non-zero expected value (which differs from methylation site to methylation site). As *n* increases, this expected value does not change, but the denominator (the standard error of the difference) shrinks as usual, driving the non-null *t* statistics to ever more extreme values. This has the helpful consequence of drawing the non-null results ever further into the tails of the observed distribution as numbers increase. So, for any nominated critical *t* value, the tails will contain the usual number of randomly extreme *t* statistics from null sites but, as the number of subjects increases, an increasing number from non-null sites, thus improving the FDR.
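The contrast between null and non-null sites can be checked with a toy simulation. The sketch below is illustrative only and rests on assumptions not taken from the data in this paper: normally distributed measurements with unit variance, equal group sizes, a pooled-variance two-sample *t* statistic, a hypothetical true difference of 0.5 at non-null sites, and group sizes of 20 and 80.

```python
import random
from math import sqrt
from statistics import fmean, stdev

def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic for two equal-sized groups."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((v - mx) ** 2 for v in x) / (n - 1)
    vy = sum((v - my) ** 2 for v in y) / (n - 1)
    se = sqrt((vx + vy) / 2 * 2 / n)  # standard error of the difference in means
    return (mx - my) / se

def simulate_t(delta, n, sites, rng):
    """t statistics for `sites` independent sites with true mean difference delta."""
    return [two_sample_t([rng.gauss(delta, 1.0) for _ in range(n)],
                         [rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(sites)]

rng = random.Random(1)
n1, n2, sites = 20, 80, 1000  # hypothetical sizes; n2 is four times n1

# Spread of null t statistics at each sample size (expected to stay near 1).
null_sd = {n: stdev(simulate_t(0.0, n, sites, rng)) for n in (n1, n2)}
# Average non-null t statistic at each sample size (expected to grow with n).
alt_mean = {n: fmean(simulate_t(0.5, n, sites, rng)) for n in (n1, n2)}
ratio = alt_mean[n2] / alt_mean[n1]

print(f"null SD of t: n1 -> {null_sd[n1]:.2f}, n2 -> {null_sd[n2]:.2f}")
print(f"mean non-null t: n1 -> {alt_mean[n1]:.2f}, n2 -> {alt_mean[n2]:.2f}")
print(f"ratio = {ratio:.2f}, sqrt(n2/n1) = {sqrt(n2 / n1):.2f}")
```

Under these assumptions the spread of the null statistics should be nearly the same at both sample sizes, while the average non-null statistic should grow by roughly √(n2/n1).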

Specifically, it can be shown that, for the left-hand tail,

$$\text{FDR}(n_2) = \frac{\text{FDR}(n_1)\, R\, \text{cp}(\text{Tcl})}{R\, \text{cp}\left(\text{Tcl}\sqrt{n_1/n_2}\right) - R_0 \left[\text{p}\left(\text{Tcl}\sqrt{n_1/n_2}\right) - \text{p}(\text{Tcl})\right]},$$

where Tcl is the nominated critical *t* value for the left-hand tail; p(Tcl) refers to the cumulative probability under the null corresponding to this *t* value; and cp(Tcl) refers to the cumulative probability of this Tcl value in the observed data.
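The left-tail expression can be motivated by a short counting argument. What follows is a sketch rather than a derivation given in the source. It assumes that null *t* statistics keep the same distribution at both sample sizes, that each non-null *t* statistic is treated as simply rescaling by √(n2/n1), and that FDR(n1) is the expected number of null statistics below Tcl divided by the observed number below Tcl. The symbols F, A and D are introduced here for illustration only.

$$
\begin{aligned}
F &= R_0\, \text{p}(\text{Tcl}) = \text{FDR}(n_1)\, R\, \text{cp}(\text{Tcl}) && \text{null statistics below Tcl, the same at } n_1 \text{ and } n_2 \\
A &= R\, \text{cp}\left(\text{Tcl}\sqrt{n_1/n_2}\right) - R_0\, \text{p}\left(\text{Tcl}\sqrt{n_1/n_2}\right) && \text{non-null statistics that fall below Tcl at } n_2 \\
D &= A + F = R\, \text{cp}\left(\text{Tcl}\sqrt{n_1/n_2}\right) - R_0 \left[\text{p}\left(\text{Tcl}\sqrt{n_1/n_2}\right) - \text{p}(\text{Tcl})\right] && \text{all statistics below Tcl at } n_2 \\
\text{FDR}(n_2) &= F / D
\end{aligned}
$$

The second line uses the fact that a non-null statistic with value *t* at n1 falls below Tcl at n2 exactly when *t* is below Tcl√(n1/n2) at n1, so the count is read from the observed data at that rescaled cutoff, with the expected null contribution removed.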

For the right-hand tail where Tcu is the critical value for significance in the upper tail, the expression is

$$\text{FDR}(n_2) = \frac{\text{FDR}(n_1)\, R \left(1 - \text{cp}(\text{Tcu})\right)}{R \left(1 - \text{cp}\left(\text{Tcu}\sqrt{n_1/n_2}\right)\right) - R_0 \left[\text{p}(\text{Tcu}) - \text{p}\left(\text{Tcu}\sqrt{n_1/n_2}\right)\right]}.$$
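The two tail expressions can be evaluated directly from a vector of observed *t* statistics. The sketch below is one possible reading, not the authors' code. It assumes a standard normal approximation to the null distribution in place of the central *t*, estimates FDR(n1) as expected null hits divided by observed hits beyond the critical value, and uses simulated toy data. The function name `projected_fdr` and all numeric settings are hypothetical.

```python
import random
from bisect import bisect_right
from math import sqrt
from statistics import NormalDist

def projected_fdr(t_obs, r0, t_crit, n1, n2, tail="lower",
                  null_cdf=NormalDist().cdf):
    """Return (estimated FDR at n1, projected FDR at n2) for one tail.

    Assumptions of this sketch: null_cdf defaults to a standard normal
    approximation, and r0 (the number of null statistics) is supplied by
    the caller, for example from the slope of the cumulative p-value plot.
    """
    ts = sorted(t_obs)
    R = len(ts)

    def cp(t):
        # Observed cumulative proportion of statistics <= t.
        return bisect_right(ts, t) / R

    # Cutoff on the n1 scale that corresponds to t_crit at sample size n2.
    scaled = t_crit * sqrt(n1 / n2)
    if tail == "lower":
        hits_n1 = R * cp(t_crit)
        fdr_n1 = r0 * null_cdf(t_crit) / hits_n1
        hits_n2 = R * cp(scaled) - r0 * (null_cdf(scaled) - null_cdf(t_crit))
    else:
        hits_n1 = R * (1 - cp(t_crit))
        fdr_n1 = r0 * (1 - null_cdf(t_crit)) / hits_n1
        hits_n2 = (R * (1 - cp(scaled))
                   - r0 * (null_cdf(t_crit) - null_cdf(scaled)))
    # Numerator FDR(n1) * R * cp(...) is the expected count of false discoveries.
    return fdr_n1, fdr_n1 * hits_n1 / hits_n2

# Toy data: 8000 null statistics and 2000 non-null statistics shifted to -2.
rng = random.Random(2)
r0 = 8000
t_obs = ([rng.gauss(0.0, 1.0) for _ in range(r0)]
         + [rng.gauss(-2.0, 1.0) for _ in range(2000)])

fdr_n1, fdr_n2 = projected_fdr(t_obs, r0, t_crit=-3.0, n1=100, n2=400)
print(f"FDR at n1 = {fdr_n1:.4f}, projected FDR at n2 = {fdr_n2:.4f}")
```

Setting n2 equal to n1 returns the FDR estimated at n1 unchanged, which is a quick sanity check on the expressions. In practice R0 would come from the slope of the central portion of the cumulative *p*-value plot rather than being known in advance.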
