*2.5. Multi-Task Learning-Based Two-Branch Architecture*

Multi-task learning is a learning paradigm in which several related tasks are trained together, so that each task improves its generalization performance by sharing knowledge learned from the other tasks while retaining its own task-specific features. The proposed model combines the two branches introduced above with a shared backbone. Because the foreground extraction branch, trained with transfer learning, already produces sufficiently accurate results, its output is used as a mask in the subsequent steps. In the severity segmentation branch, the background class is set as ignored, i.e., background pixels do not contribute to the loss during backpropagation, so only the three severity levels are learned. Finally, the mask from the foreground extraction branch is applied to the output of the severity segmentation branch to obtain the final result.
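The final masking step can be sketched as follows. This is a minimal illustration only, assuming a hypothetical label encoding (0 for background, 1 to 3 for the severity levels) and plain nested lists in place of the network's output tensors; the function name `apply_foreground_mask` is likewise hypothetical.

```python
# Hypothetical label encoding (not specified in the text):
# 0 = background, 1..3 = severity levels.
BACKGROUND = 0


def apply_foreground_mask(fg_mask, severity_map):
    """Combine the two branch outputs pixel by pixel.

    fg_mask:      2-D list of bools, True where the foreground extraction
                  branch predicts a foreground pixel.
    severity_map: 2-D list of ints in 1..3, the severity branch's per-pixel
                  prediction (it predicts no background class when the
                  background label is ignored during training).
    Returns a 2-D list in which pixels outside the mask are set to
    BACKGROUND and pixels inside it keep their predicted severity level.
    """
    if len(fg_mask) != len(severity_map):
        raise ValueError("mask and severity map must have the same shape")
    result = []
    for mask_row, sev_row in zip(fg_mask, severity_map):
        if len(mask_row) != len(sev_row):
            raise ValueError("mask and severity map must have the same shape")
        result.append([sev if is_fg else BACKGROUND
                       for is_fg, sev in zip(mask_row, sev_row)])
    return result


fg = [[False, True, True],
      [False, True, False]]
sev = [[2, 1, 3],
       [3, 2, 2]]
print(apply_foreground_mask(fg, sev))  # -> [[0, 1, 3], [0, 2, 0]]
```

Severity predictions that fall outside the foreground mask are simply discarded, which is why the severity branch never needs to learn a background class in this setup.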

Two training methods were adopted and compared in order to obtain better results. The overall architecture and the two training methods are illustrated in Figure 6.

Two-stage training: First, train the backbone and the foreground extraction branch using transfer learning; then fully freeze the backbone parameters and train the severity segmentation branch.

Joint training: Train the two branches and the backbone together, and use the weighted sum of the losses from the two branches for backpropagation. Let *L*<sub>1</sub> be the loss from the foreground extraction branch and *L*<sub>2</sub> the loss from the severity segmentation branch; the overall loss is then calculated as:

$$L = \lambda L_1 + (1 - \lambda) L_2 \tag{4}$$

where *λ* is the weighting coefficient that balances the two branch losses.
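To make the difference between the two schedules concrete, the sketch below replaces the network with a hypothetical toy model of three scalar parameters: a shared "backbone" weight `b`, a foreground head `f`, and a severity head `s`, with squared-error losses standing in for *L*<sub>1</sub> and *L*<sub>2</sub>. It is not the paper's implementation: transfer-learning initialization is not modeled, and the gradients are derived by hand so that the example needs no framework. Two-stage training freezes `b` simply by leaving it out of the list of trainable parameters, while joint training descends the weighted loss of Eq. (4).

```python
# Hypothetical toy model (illustration only, not the paper's network):
#   shared backbone:  h  = b * x
#   foreground head:  p1 = f * h   -> L1 = (p1 - y1) ** 2
#   severity head:    p2 = s * h   -> L2 = (p2 - y2) ** 2


def branch_grads(params, x, y1, y2):
    """Hand-derived gradients of L1 and L2 w.r.t. the parameters they touch."""
    b, f, s = params["b"], params["f"], params["s"]
    e1 = f * b * x - y1  # foreground-branch residual
    e2 = s * b * x - y2  # severity-branch residual
    return {
        "L1": {"b": 2 * e1 * f * x, "f": 2 * e1 * b * x},
        "L2": {"b": 2 * e2 * s * x, "s": 2 * e2 * b * x},
    }


def sgd_step(params, grad, trainable, lr):
    """Update only the parameters listed in `trainable`; the rest stay frozen."""
    for name in trainable:
        params[name] -= lr * grad.get(name, 0.0)


def two_stage(params, x, y1, y2, lr=0.05, steps=500):
    # Stage 1: backbone + foreground branch, trained on L1 only.
    for _ in range(steps):
        sgd_step(params, branch_grads(params, x, y1, y2)["L1"], ["b", "f"], lr)
    # Stage 2: backbone frozen (not in the trainable list); severity branch on L2.
    for _ in range(steps):
        sgd_step(params, branch_grads(params, x, y1, y2)["L2"], ["s"], lr)
    return params


def joint(params, x, y1, y2, lam=0.5, lr=0.05, steps=500):
    # Eq. (4): L = lam * L1 + (1 - lam) * L2, so each parameter's gradient
    # is the same weighted sum of the two branch gradients.
    for _ in range(steps):
        g = branch_grads(params, x, y1, y2)
        total = {
            k: lam * g["L1"].get(k, 0.0) + (1 - lam) * g["L2"].get(k, 0.0)
            for k in ("b", "f", "s")
        }
        sgd_step(params, total, ["b", "f", "s"], lr)
    return params


init = {"b": 0.5, "f": 0.5, "s": 0.5}
staged = two_stage(dict(init), x=1.0, y1=2.0, y2=3.0)
jointly = joint(dict(init), x=1.0, y1=2.0, y2=3.0, lam=0.5)
print(staged, jointly)
```

In the two-stage run the backbone keeps the value it reached in stage 1 (here √2, because stage 1 alone is symmetric in `b` and `f`), so the severity head must adapt to a fixed feature. In the joint run the backbone receives gradients from both losses, weighted by λ and 1 − λ.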

In addition, two output methods were implemented and compared. The first output method did not set the background label as ignored; the severity branch therefore also output predictions for the background, giving four output classes. In contrast, the second method set the background label as ignored, i.e., background pixels were excluded from backpropagation; the severity branch therefore output predictions for the three severity levels only, without a background class. The two methods are shown in Figure 7.
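The effect of setting the background label as ignored can be illustrated with a masked cross-entropy. The sketch below assumes a hypothetical encoding (0 = background and 1 to 3 = severity in the 4-class output) and a hypothetical ignore value of 255; pixels carrying that value add nothing to the loss and therefore receive no gradient. Deep learning frameworks provide this behavior natively, for example through the `ignore_index` argument of PyTorch's `torch.nn.CrossEntropyLoss`; the helper names here are illustrative only.

```python
import math

IGNORE = 255  # hypothetical ignore value (a common convention, assumed here)


def masked_cross_entropy(logits, labels, ignore_index=IGNORE):
    """Mean softmax cross-entropy over pixels whose label is not ignore_index.

    logits: per-pixel lists of class scores (length = number of classes).
    labels: per-pixel integer labels.
    Ignored pixels add nothing to the sum or to the pixel count, so they
    contribute no gradient. Returns 0.0 if every pixel is ignored
    (a choice made for this sketch).
    """
    total, count = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == ignore_index:
            continue
        m = max(scores)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in scores))
        total += log_sum_exp - scores[label]
        count += 1
    return total / count if count else 0.0


def to_three_class(labels_4):
    """Map 4-class labels (0 = background, 1..3 = severity) to the 3-class
    scheme: background -> IGNORE, severity k -> channel k - 1."""
    return [IGNORE if y == 0 else y - 1 for y in labels_4]


labels_4 = [0, 1, 3, 0]              # four pixels: two background, two foreground
labels_3 = to_three_class(labels_4)  # -> [255, 0, 2, 255]

logits_4 = [[2.0, 0.1, 0.3, 0.2], [0.1, 1.5, 0.2, 0.0],
            [0.0, 0.2, 0.1, 1.8], [1.9, 0.3, 0.2, 0.1]]
logits_3 = [[0.3, 0.1, 0.2], [1.5, 0.2, 0.0],
            [0.2, 0.1, 1.8], [0.1, 0.9, 0.4]]

loss_4 = masked_cross_entropy(logits_4, labels_4)  # background pixels count
loss_3 = masked_cross_entropy(logits_3, labels_3)  # background pixels skipped
```

With the 4-class scheme, the background pixels are scored against a dedicated background channel; with the 3-class scheme they are skipped entirely, whatever scores the branch assigns to them.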

**Figure 6.** Two training methods implemented in this paper; dotted lines denote backpropagation. (**a**) Two-stage training; (**b**) joint training.

**Figure 7.** Two output methods (for the severity segmentation branch) in this paper: (**a**) 4-class output; (**b**) 3-class output.
