**Appendix A**

For optimizing objective functions (2) and (6), a weighted normalization is first conducted as:

$$y_i^{(k)} = \sqrt{w_i^{(k)}}\left(y_i^{(k)} - \overline{y}^{(k)}\right), \qquad x_{ij}^{(k)} = \sqrt{w_i^{(k)}}\left(x_{ij}^{(k)} - \overline{x}_j^{(k)}\right),$$

where $\overline{y}^{(k)} = \sum_{i=1}^{n^{(k)}} w_i^{(k)} y_i^{(k)} \big/ \sum_{i=1}^{n^{(k)}} w_i^{(k)}$ and $\overline{x}_j^{(k)} = \sum_{i=1}^{n^{(k)}} w_i^{(k)} x_{ij}^{(k)} \big/ \sum_{i=1}^{n^{(k)}} w_i^{(k)}$.
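As a concrete illustration, the following is a minimal sketch of this normalization in Python with NumPy; the function and variable names are ours, not from the original implementation.

```python
import numpy as np

def weighted_normalize(y, X, w):
    """Weighted normalization of a response y (length n) and design matrix
    X (n x p) with observation weights w (length n), per the equations above."""
    w = np.asarray(w, dtype=float)
    y_bar = np.sum(w * y) / np.sum(w)                    # weighted mean of y
    x_bar = np.sum(w[:, None] * X, axis=0) / np.sum(w)   # weighted column means
    sw = np.sqrt(w)
    return sw * (y - y_bar), sw[:, None] * (X - x_bar)   # sqrt(w_i)-scaled centering
```

Then objective functions (2) and (6) can be rewritten as: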

$$\sum_{k=1}^{K}\left[\frac{1}{2n^{(k)}}\sum_{i=1}^{n^{(k)}}\left(y_i^{(k)} - x_{ij}^{(k)}\eta_j^{(k)}\right)^2\right] + \sum_{k=1}^{K}\rho_{\text{MCP}}\left(\eta_j^{(k)}, \lambda_1, \gamma\right) + \frac{\lambda_2}{2}\sum_{k \neq k'}\rho\left(\eta_j^{(k)}, \eta_j^{(k')}\right), \tag{A1}$$

and

$$\sum_{k=1}^{K}\left[\frac{1}{2n^{(k)}}\sum_{i=1}^{n^{(k)}}\left(y_i^{(k)} - \mathbf{X}_i^{(k)}\boldsymbol{\beta}^{(k)}\right)^2\right] + \sum_{k=1}^{K}\sum_{j=1}^{p}\rho_{\text{MCP}}\left(\beta_j^{(k)}, \lambda_3, \gamma\right) + \frac{\lambda_4}{2}\sum_{k \neq k'}\sum_{j=1}^{p}\rho\left(\beta_j^{(k)}, \beta_j^{(k')}\right). \tag{A2}$$

The coordinate descent (CD) technique is used to optimize objective functions (A1) and (A2). In the CD procedure, the objective function is optimized with respect to one parameter at a time, and the other parameters are fixed at their current values. All parameters are iteratively cycled through until convergence.

Specifically, with fixed tuning parameters, for *j* = 1, ... , *p*, the CD algorithm for penalized objective function (A1) proceeds as follows.

(1). Initialize $t = 0$ and $\left(\eta_j^{(k)}\right)^{(t)} = 0$ for $k = 1, \ldots, K$, where $\left(\eta_j^{(k)}\right)^{(t)}$ denotes the estimate of $\eta_j^{(k)}$ at iteration $t$.

(2). For $k = 1, \ldots, K$, carry out the following steps sequentially.


(2.1) If $\rho\left(\eta_j^{(k)}, \eta_j^{(k')}\right)$ is the magnitude-based shrinkage penalty (3), compute:

$$b = \frac{1}{n^{(k)}}\sum_{i=1}^{n^{(k)}} x_{ij}^{(k)2} + \lambda_2 \quad \text{and} \quad a = \frac{1}{n^{(k)}}\sum_{i=1}^{n^{(k)}} x_{ij}^{(k)} y_i^{(k)} + \lambda_2 \sum_{k' \neq k} s_j^{(kk')}\left(\eta_j^{(k')}\right)^{(t)}.$$

If $\rho\left(\eta_j^{(k)}, \eta_j^{(k')}\right)$ is the sign-based shrinkage penalty (4), compute:

$$b = \frac{1}{n^{(k)}}\sum_{i=1}^{n^{(k)}} x_{ij}^{(k)2} + \frac{\lambda_2}{\left(\left|\left(\eta_j^{(k)}\right)^{(t)}\right| + \chi\right)^2},$$

$$a = \frac{1}{n^{(k)}}\sum_{i=1}^{n^{(k)}} x_{ij}^{(k)} y_i^{(k)} + \lambda_2 \sum_{k' \neq k} \frac{\left(\eta_j^{(k')}\right)^{(t)}}{\left(\left|\left(\eta_j^{(k)}\right)^{(t)}\right| + \chi\right)\left(\left|\left(\eta_j^{(k')}\right)^{(t)}\right| + \chi\right)},$$

where $\chi$ is a small positive number, set to 0.01 in our numerical study.

(2.2) If $\left|\frac{a}{b}\right| > \gamma\lambda_1$, update $\left(\eta_j^{(k)}\right)^{(t+1)} = \frac{a}{b}$; else if $|a| > \lambda_1$, update $\left(\eta_j^{(k)}\right)^{(t+1)} = \frac{a - \operatorname{sgn}(a)\lambda_1}{b - 1/\gamma}$; else, update $\left(\eta_j^{(k)}\right)^{(t+1)} = 0$.

(3). Repeat Step (2) until convergence. In our numerical study, convergence is concluded if $\sum_{k=1}^{K}\left|\left(\eta_j^{(k)}\right)^{(t+1)} - \left(\eta_j^{(k)}\right)^{(t)}\right| < 10^{-4}$.
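For illustration, the following is a minimal sketch of Steps (1)-(3) for a single variable $j$ under the magnitude-based penalty, again in Python with NumPy. All names are ours; in particular, the sign indicator $s_j^{(kk')}$ is assumed here to be $\operatorname{sgn}\big(\eta_j^{(k)}\eta_j^{(k')}\big)$, and the multiplier $\lambda_2$ is applied exactly as written in Step (2.1).

```python
import numpy as np

def mcp_update(a, b, lam, gamma):
    """Step (2.2): coordinate-wise MCP update."""
    if abs(a / b) > gamma * lam:
        return a / b
    if abs(a) > lam:
        return (a - np.sign(a) * lam) / (b - 1.0 / gamma)
    return 0.0

def cd_marginal(xj, y, lam1, lam2, gamma, max_iter=1000, tol=1e-4):
    """CD for objective (A1), one variable j, magnitude-based penalty.
    xj, y: lists of K normalized arrays (jth covariate and response per dataset)."""
    K = len(y)
    eta = np.zeros(K)                                  # Step (1): start from 0
    for _ in range(max_iter):
        eta_old = eta.copy()
        for k in range(K):                             # Step (2): cycle over datasets
            n_k = len(y[k])
            b = np.dot(xj[k], xj[k]) / n_k + lam2
            s = np.sign(eta[k] * eta)                  # assumed form of s_j^{(kk')}
            a = (np.dot(xj[k], y[k]) / n_k
                 + lam2 * np.sum(np.delete(s * eta, k)))   # sum over k' != k
            eta[k] = mcp_update(a, b, lam1, gamma)
        if np.sum(np.abs(eta - eta_old)) < tol:        # Step (3): convergence check
            return eta
    return eta
```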

With fixed tuning parameters, the CD algorithm for penalized objective function (A2) proceeds as follows.

(1). Initialize $t = 0$ and $\left(\beta_j^{(k)}\right)^{(t)} = 0$ for $j = 1, \ldots, p$ and $k = 1, \ldots, K$.

(2). For $k = 1, \ldots, K$ and $j = 1, \ldots, p$, carry out the following steps sequentially.
(2.1) If $\rho\left(\beta_j^{(k)}, \beta_j^{(k')}\right)$ is the magnitude-based shrinkage penalty (3), compute:

$$b = \frac{1}{n^{(k)}}\sum_{i=1}^{n^{(k)}} x_{ij}^{(k)2} + \lambda_4 \quad \text{and} \quad a = \frac{1}{n^{(k)}}\sum_{i=1}^{n^{(k)}} x_{ij}^{(k)}\left(y_i^{(k)} - \sum_{j' \neq j}^{p} x_{ij'}^{(k)}\beta_{j'}^{(k)}\right) + \lambda_4 \sum_{k' \neq k} s_j^{(kk')}\left(\beta_j^{(k')}\right)^{(t)}.$$

If $\rho\left(\beta_j^{(k)}, \beta_j^{(k')}\right)$ is the sign-based shrinkage penalty (4), compute:

$$b = \frac{1}{n^{(k)}}\sum_{i=1}^{n^{(k)}} x_{ij}^{(k)2} + \frac{\lambda_4}{\left(\left|\left(\beta_j^{(k)}\right)^{(t)}\right| + \chi\right)^2},$$

$$a = \frac{1}{n^{(k)}}\sum_{i=1}^{n^{(k)}} x_{ij}^{(k)}\left(y_i^{(k)} - \sum_{j' \neq j}^{p} x_{ij'}^{(k)}\beta_{j'}^{(k)}\right) + \lambda_4 \sum_{k' \neq k} \frac{\left(\beta_j^{(k')}\right)^{(t)}}{\left(\left|\left(\beta_j^{(k)}\right)^{(t)}\right| + \chi\right)\left(\left|\left(\beta_j^{(k')}\right)^{(t)}\right| + \chi\right)},$$

where $\chi$ is a small positive number, set to 0.01 in our numerical study.

(2.2) If $\left|\frac{a}{b}\right| > \gamma\lambda_3$, update $\left(\beta_j^{(k)}\right)^{(t+1)} = \frac{a}{b}$; else if $|a| > \lambda_3$, update $\left(\beta_j^{(k)}\right)^{(t+1)} = \frac{a - \operatorname{sgn}(a)\lambda_3}{b - 1/\gamma}$; else, update $\left(\beta_j^{(k)}\right)^{(t+1)} = 0$.

(3). Repeat Step (2) until convergence. In our numerical study, convergence is concluded if $\sum_{j=1}^{p}\sum_{k=1}^{K}\left|\left(\beta_j^{(k)}\right)^{(t+1)} - \left(\beta_j^{(k)}\right)^{(t)}\right| < 10^{-4}$.
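Relative to (A1), the only new ingredient is the partial residual, which removes the fitted contribution of all covariates other than $j$. Below is a minimal sketch of one $(k, j)$ update under the magnitude-based penalty (Python with NumPy; the names and the form of $s_j^{(kk')}$ are our assumptions, as above). The outer cycling over $k$ and $j$ and the convergence check proceed exactly as in the sketch for (A1).

```python
import numpy as np

def cd_joint_update(X, y, beta, k, j, lam3, lam4, gamma):
    """One coordinate update of beta[k, j] for objective (A2).
    X, y: lists of K normalized datasets; beta: (K, p) array of current estimates."""
    n_k = len(y[k])
    xj = X[k][:, j]
    # Partial residual: remove the fit of all variables j' != j.
    r = y[k] - X[k] @ beta[k] + xj * beta[k, j]
    b = np.dot(xj, xj) / n_k + lam4
    s = np.sign(beta[k, j] * beta[:, j])               # assumed form of s_j^{(kk')}
    a = np.dot(xj, r) / n_k + lam4 * np.sum(np.delete(s * beta[:, j], k))
    # Step (2.2) with lambda_3:
    if abs(a / b) > gamma * lam3:
        return a / b
    if abs(a) > lam3:
        return (a - np.sign(a) * lam3) / (b - 1.0 / gamma)
    return 0.0
```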

These approaches involve tuning parameters, which are selected using cross-validation.
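As a schematic of this selection step (entirely our own sketch; the fold count and grid are illustrative, and `fit`/`predict` stand in for fitting one of the models above at fixed tuning parameters):

```python
import numpy as np

def cv_select(fit, predict, X, y, grid, n_folds=5, seed=0):
    """Grid-search cross-validation over pairs of tuning parameters,
    e.g., (lambda_1, lambda_2) for (A1) or (lambda_3, lambda_4) for (A2)."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds           # random fold assignment
    best, best_err = None, np.inf
    for lam_a, lam_b in grid:
        err = 0.0
        for f in range(n_folds):
            tr, te = folds != f, folds == f
            model = fit(X[tr], y[tr], lam_a, lam_b)     # fit on training folds
            err += np.mean((y[te] - predict(model, X[te])) ** 2)
        if err < best_err:                              # keep the best pair
            best, best_err = (lam_a, lam_b), err
    return best
```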
