*2.2. Multi-Task Loss Function Based on Homoscedastic Uncertainty*

The performance of hard parameter sharing depends heavily on the loss weight assigned to each task, yet training is usually carried out by simply taking a weighted linear sum of the individual task losses. Tuning these weights by hand is tedious and error-prone. We therefore adopt a method based on the homoscedastic uncertainty of a Bayesian model to balance the loss weights of the multiple tasks.

In Bayesian modeling, there are two main types of uncertainty: epistemic and aleatoric. Aleatoric uncertainty captures the randomness of the model prediction, which stems from the noise inherent in the input observations, whereas epistemic uncertainty captures what the model does not know due to a lack of training data [37]. Aleatoric uncertainty can be further divided into two subcategories: heteroscedastic and homoscedastic uncertainty. Heteroscedastic uncertainty depends on the inputs to the model, with some inputs potentially producing noisier outputs than others. Homoscedastic uncertainty is task-dependent: it stays constant across all input data but varies between tasks [36]. In this paper, we derive a multi-task loss function by maximizing a Gaussian likelihood with homoscedastic uncertainty. The derivation is as follows:

(1) Given a dataset $X = \{x_1, \ldots, x_N\}$, $Y = \{y_1, \ldots, y_N\}$, we define $f^{w}(\mathbf{x}^*)$ as the output of a neural network with weights $w$ on input $\mathbf{x}^*$. For regression tasks, we define the likelihood as a Gaussian distribution whose mean is the model output:

$$p(y^* \mid f^{w}(\mathbf{x}^*)) = \mathcal{N}\left(f^{w}(\mathbf{x}^*), \sigma^2\right) \tag{1}$$

where $\sigma$ is an observation noise scalar capturing how much noise is present in the outputs. Once $w$ and $\mathbf{x}^*$ are fixed, the epistemic and heteroscedastic uncertainties of this observation model are determined; only homoscedastic uncertainty is considered in this paper, and different tasks have different homoscedastic uncertainties.
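
For a single regression task, this likelihood translates directly into a per-sample negative log likelihood term. The sketch below is a minimal PyTorch illustration of our own (the function name and the assumption that `sigma` is a positive scalar tensor are not from the paper); it anticipates the log likelihood derived in step (2):

```python
import torch

def gaussian_nll(y, f_x, sigma):
    """Per-sample negative log likelihood of y under N(f_x, sigma^2).

    Matches Equations (1)-(2): the constant 0.5*log(2*pi) is dropped,
    since it depends on neither the weights w nor sigma.
    """
    return 0.5 * (y - f_x) ** 2 / sigma ** 2 + torch.log(sigma)
```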

(2) In maximum likelihood inference, we maximize the log likelihood of the model, i.e., we maximize the following:

$$\log p(y^* \mid f^{w}(\mathbf{x}^*)) \propto -\frac{1}{2\sigma^2}\left\|y^* - f^{w}(\mathbf{x}^*)\right\|^2 - \log \sigma \tag{2}$$
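
This proportionality follows from taking the logarithm of the Gaussian density in Equation (1):

$$\log \mathcal{N}\left(y^*; f^{w}(\mathbf{x}^*), \sigma^2\right) = -\frac{1}{2\sigma^2}\left\|y^* - f^{w}(\mathbf{x}^*)\right\|^2 - \log \sigma - \frac{1}{2}\log 2\pi$$

where the last term is a constant that depends on neither $w$ nor $\sigma$ and can therefore be dropped.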

(3) Construct the maximum likelihood function for the multi-task case. Our model has two tasks: impedance prediction and data reconstruction. We use $y_{\text{pre}}$ and $y_{\text{rec}}$ to denote the outputs of the two tasks and assume that, conditioned on the shared network output, $y_{\text{pre}}$ and $y_{\text{rec}}$ are independent and each follows a Gaussian distribution:

$$p\left(y_{\text{pre}}, y_{\text{rec}} \mid f^{w}(\mathbf{x})\right) = p\left(y_{\text{pre}} \mid f^{w}(\mathbf{x})\right) p\left(y_{\text{rec}} \mid f^{w}(\mathbf{x})\right) = \mathcal{N}\left(y_{\text{pre}}; f^{w}(\mathbf{x}), \sigma_{\text{pre}}^2\right) \mathcal{N}\left(y_{\text{rec}}; f^{w}(\mathbf{x}), \sigma_{\text{rec}}^2\right) \tag{3}$$

(4) Maximizing the log likelihood is equivalent to minimizing the negative log likelihood, which yields the multi-task loss function:

$$\mathcal{L}(w, \sigma_{\text{pre}}, \sigma_{\text{rec}}) = -\log p\left(y_{\text{pre}}, y_{\text{rec}} \mid f^{w}(\mathbf{x})\right) \propto \frac{1}{2\sigma_{\text{pre}}^2}\left\|y_{\text{pre}} - f^{w}(\mathbf{x})\right\|^2 + \frac{1}{2\sigma_{\text{rec}}^2}\left\|y_{\text{rec}} - f^{w}(\mathbf{x})\right\|^2 + \log \sigma_{\text{pre}}\sigma_{\text{rec}} = \frac{1}{2\sigma_{\text{pre}}^2}L_{\text{pre}}(w) + \frac{1}{2\sigma_{\text{rec}}^2}L_{\text{rec}}(w) + \log \sigma_{\text{pre}}\sigma_{\text{rec}} \tag{4}$$
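
In practice, Equation (4) can be minimized by treating the noise parameters as trainable variables alongside the network weights. Below is a minimal PyTorch sketch of this idea; the class and parameter names are our own, and we follow the common trick of learning $s = \log \sigma^2$ rather than $\sigma$ itself for numerical stability, so that $\frac{1}{2\sigma^2} = \frac{1}{2}e^{-s}$ and $\log \sigma = \frac{s}{2}$:

```python
import torch
import torch.nn as nn

class HomoscedasticLoss(nn.Module):
    """Multi-task loss of Equation (4) with trainable noise parameters.

    We learn s = log(sigma^2) per task instead of sigma itself, so
    1/(2*sigma^2) = 0.5*exp(-s) and log(sigma) = s/2; the loss below is
    therefore Equation (4) up to this reparameterization.
    """

    def __init__(self):
        super().__init__()
        # One log-variance per task, initialized to 0 (i.e., sigma^2 = 1).
        self.log_var_pre = nn.Parameter(torch.zeros(()))
        self.log_var_rec = nn.Parameter(torch.zeros(()))

    def forward(self, loss_pre: torch.Tensor, loss_rec: torch.Tensor) -> torch.Tensor:
        # 1/(2*sigma^2) * L for each task ...
        weighted = (0.5 * torch.exp(-self.log_var_pre) * loss_pre
                    + 0.5 * torch.exp(-self.log_var_rec) * loss_rec)
        # ... plus log(sigma_pre * sigma_rec) = 0.5 * (s_pre + s_rec).
        return weighted + 0.5 * (self.log_var_pre + self.log_var_rec)
```

The module's two parameters are optimized jointly with the network weights, so the task weighting is learned from the data rather than tuned by hand.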

The process of minimizing this loss function learns the optimal weights of $L_{\text{pre}}(w)$ and $L_{\text{rec}}(w)$ automatically from the data. As $\sigma_{\text{pre}}$ increases, the weight of the impedance-prediction task decreases; however, if the noise grows too large the data would be ignored entirely, so the last term of the objective, $\log \sigma_{\text{pre}}\sigma_{\text{rec}}$, acts as a regularizer on the noise terms. When the loss function reaches its minimum, we obtain the corresponding $\sigma_{\text{pre}}$ and $\sigma_{\text{rec}}$. Using $w_{\text{pre}}$ and $w_{\text{rec}}$ to denote the weights of the two tasks, Equation (4) gives the optimal weight ratio between the two tasks as:

$$w_{\text{pre}} : w_{\text{rec}} = \frac{1}{2\sigma_{\text{pre}}^2} : \frac{1}{2\sigma_{\text{rec}}^2} = \sigma_{\text{rec}}^2 : \sigma_{\text{pre}}^2 \tag{5}$$

Normalizing so that the two weights sum to 1 yields the final optimal weights:

$$w_{\text{pre}} : w_{\text{rec}} = \frac{\sigma_{\text{rec}}^2}{\sigma_{\text{pre}}^2 + \sigma_{\text{rec}}^2} : \frac{\sigma_{\text{pre}}^2}{\sigma_{\text{pre}}^2 + \sigma_{\text{rec}}^2} \tag{6}$$
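
As a quick numerical check of Equations (5) and (6), with illustrative variances of our own choosing (not values from the paper): if $\sigma_{\text{pre}}^2 = 0.5$ and $\sigma_{\text{rec}}^2 = 2.0$, then $w_{\text{pre}} = 2.0/2.5 = 0.8$ and $w_{\text{rec}} = 0.5/2.5 = 0.2$, i.e., the less noisy task receives the larger weight. The same computation in code:

```python
def normalized_task_weights(var_pre: float, var_rec: float) -> tuple[float, float]:
    """Normalized task weights from Equation (6); inputs are the sigma^2 values."""
    total = var_pre + var_rec
    return var_rec / total, var_pre / total

# Illustrative values only (not from the paper):
w_pre, w_rec = normalized_task_weights(0.5, 2.0)
print(w_pre, w_rec)  # 0.8 0.2
```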
