2.1.2. *L*<sup>2</sup> Penalized Boosting

In *L*<sup>2</sup> penalized boosting, we replace Ω(*f*) in the objective function of (5) with *λ*‖*β*‖<sub>2</sub><sup>2</sup>. Following the same transformation as in Section 2.1.1, the objective can again be converted to a standard ridge regression problem (see Section 1 of the Supplementary Materials)

$$\min\_{\beta} \frac{1}{N} ||\bar{\eta} + \mathcal{K}\_m \beta||\_2^2 + \lambda ||\beta||\_2^2, \tag{7}$$

which admits the closed-form solution

$$\beta = - (\mathcal{K}\_m^T \mathcal{K}\_m + N \lambda I\_N)^{-1} \mathcal{K}\_m^T \bar{\eta}.$$
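As a rough illustration of this closed-form update, the following NumPy sketch solves the ridge problem for a single kernel matrix and checks that the gradient of the objective vanishes at the solution. The function name `ridge_step`, the linear kernel, and the random inputs are assumptions made for this example only; they are not taken from the PKB implementation.

```python
import numpy as np

def ridge_step(K, eta, lam):
    """Closed-form minimizer of (1/N)||eta + K beta||^2 + lam * ||beta||^2."""
    N = K.shape[0]
    # Normal equations: (K^T K + N * lam * I) beta = -K^T eta
    A = K.T @ K + N * lam * np.eye(K.shape[1])
    # Solve the linear system instead of forming the inverse explicitly.
    return -np.linalg.solve(A, K.T @ eta)

# Toy inputs (hypothetical): a linear kernel matrix and a random eta vector.
rng = np.random.default_rng(0)
N, lam = 20, 0.1
X = rng.normal(size=(N, 3))
K = X @ X.T
eta = rng.normal(size=N)

beta = ridge_step(K, eta, lam)

# Gradient of the objective at beta; it should be numerically zero.
grad = (2.0 / N) * K.T @ (eta + K @ beta) + 2.0 * lam * beta
print(np.max(np.abs(grad)))
```

Solving the linear system with `np.linalg.solve` is generally preferable to computing the matrix inverse, although both correspond to the same formula.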

Both the *L*<sup>1</sup> and *L*<sup>2</sup> boosting algorithms require the specification of the penalty parameter *λ*, which controls the step length (the norm of the fitted *β*) in each iteration and, in the *L*<sup>1</sup> case, also controls solution sparsity. Feasible choices of *λ* differ across scenarios, depending on the input data and the choice of kernel. A *λ* that is too small leads to big leaps, whereas a *λ* that is too large leads to slow descent. Under the *L*<sup>1</sup> penalty, a poor choice of *λ* can even result in an all-zero *β*, which leaves the target function unchanged. Therefore, we also incorporate an optional automated procedure for choosing *λ* in PKB. Computational details of the procedure are provided in Section 2 of the Supplementary Materials. We recommend using the automated procedure to calculate a feasible *λ* and then trying a range of values around it (e.g., the calculated value multiplied by 1/25, 1/5, 1, 5, and 25) for improved performance.
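The effect of *λ* on the step length in the *L*<sup>2</sup> case can be sketched as follows. The code builds the recommended grid of multipliers around a base value and reports the norm of the ridge solution for each candidate. The base value `lam0` is a hypothetical placeholder for the output of the automated procedure, which is not reproduced here, and the helper names and toy data are likewise assumptions.

```python
import numpy as np

def ridge_step(K, eta, lam):
    """Closed-form ridge update: beta = -(K^T K + N*lam*I)^{-1} K^T eta."""
    N = K.shape[0]
    A = K.T @ K + N * lam * np.eye(K.shape[1])
    return -np.linalg.solve(A, K.T @ eta)

def candidate_lambdas(lam0, factors=(1/25, 1/5, 1, 5, 25)):
    """Grid of penalty values around a base value lam0."""
    return [lam0 * f for f in factors]

# Toy inputs (hypothetical): linear kernel matrix and random eta vector.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
K = X @ X.T
eta = rng.normal(size=30)

lam0 = 0.5  # placeholder for the automatically calculated value
lams = candidate_lambdas(lam0)
step_lengths = [np.linalg.norm(ridge_step(K, eta, lam)) for lam in lams]

for lam, s in zip(lams, step_lengths):
    print(f"lambda = {lam:.3g}, step length = {s:.4g}")
```

In this sketch the step length shrinks as *λ* grows, which matches the trade-off between big leaps and slow descent described above.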

Lastly, the final target function at iteration *T* can be written as

$$F\_T(\mathbf{x}) = \sum\_{m=1}^{M} \sum\_{i=1}^{N} \mathcal{K}\_m(\mathbf{x}\_i^{(m)}, \mathbf{x}^{(m)}) \beta\_i^{(m)} + C,$$

where ***β***<sup>(*m*)</sup> = (*β*<sub>1</sub><sup>(*m*)</sup>, *β*<sub>2</sub><sup>(*m*)</sup>, ..., *β*<sub>*N*</sub><sup>(*m*)</sup>) are the combination coefficients of the kernel functions from pathway *m*. We use ‖***β***<sup>(*m*)</sup>‖<sub>2</sub> as a measure of the importance (or weight) of pathway *m* in the target function. Only the pathways that are selected at least once in the boosting procedure have non-zero weights. Because *F*<sub>*T*</sub>(**x**) is an estimate of the log-odds function, sign[*F*<sub>*T*</sub>(**x**)] is used as the classification rule to assign **x** to 1 or −1.
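A minimal sketch of how the final target function, the pathway weights, and the classification rule fit together is given below. It assumes a radial basis function kernel for every pathway, reads the importance measure as the Euclidean norm of each pathway's coefficient vector, and assigns a tie at zero to class 1; all of these, along with the function names and toy data, are assumptions for illustration rather than details of the PKB software.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Pairwise RBF kernel between rows of A and rows of B (assumed kernel)."""
    d2 = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def target_function(train_by_pathway, new_by_pathway, betas, C):
    """F_T(x) = sum_m sum_i K_m(x_i^(m), x^(m)) * beta_i^(m) + C."""
    n_new = new_by_pathway[0].shape[0]
    F = np.full(n_new, C, dtype=float)
    for X_tr, X_new, beta in zip(train_by_pathway, new_by_pathway, betas):
        F += rbf_kernel(X_new, X_tr) @ beta
    return F

def pathway_weights(betas):
    """Euclidean norm of each pathway's coefficient vector."""
    return [float(np.linalg.norm(b)) for b in betas]

def classify(F):
    """sign[F_T(x)], with ties at zero assigned to class 1 (assumption)."""
    return np.where(F >= 0, 1, -1)

# Toy example (hypothetical): two pathways, five training and four new samples.
rng = np.random.default_rng(2)
train = [rng.normal(size=(5, 2)), rng.normal(size=(5, 3))]
new = [rng.normal(size=(4, 2)), rng.normal(size=(4, 3))]
betas = [rng.normal(size=5), np.zeros(5)]  # second pathway was never selected
C = 0.1

F = target_function(train, new, betas, C)
weights = pathway_weights(betas)
labels = classify(F)
print(weights, labels)
```

The second pathway has an all-zero coefficient vector, so its weight is zero and it contributes nothing to the target function.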
