**3. Experiments**

Two synthetic datasets (the Marmousi2 and Overthrust models) and one field dataset (the Volve model) were used in this study. Each dataset was used to train both the single-task network and the hard parameter sharing multi-task network, and the trained networks were then used for impedance inversion.

### *3.1. Experiment on the Marmousi2 Model*

The impedance of the Marmousi2 model and its corresponding synthetic data are the same as in [23] and are shown in Figure 2. For this model, 101 traces were selected from the synthetic seismic data and impedance data as the training set through isometric sampling, 1350 traces were randomly selected from the remaining 13,500 traces as the validation set, and all 13,601 traces formed the test set. The network was trained with the Adam optimizer, using a weight decay of 1 × 10<sup>−7</sup>, a learning rate of 0.001, 50 epochs, and a batch size of 10. These hyperparameters and the network structure were selected through an ablation study, based on the convergence of the training and validation losses. Training was implemented in the PyTorch framework, with a GPU used to accelerate the computation.
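The trace split described above can be illustrated with a minimal sketch. It assumes that "isometric sampling" means equally spaced trace indices that include the first and last traces, which the text does not state explicitly; the function name `split_trace_indices` and the `seed` argument are hypothetical and serve illustration only.

```python
import random


def split_trace_indices(n_traces, n_train, n_val, seed=0):
    """Return (train, val, test) trace indices for one model.

    Assumption: training traces are equally spaced and span the whole
    profile, from trace 0 to trace n_traces - 1.
    """
    step = (n_traces - 1) / (n_train - 1)
    train = [round(i * step) for i in range(n_train)]
    # Validation traces are drawn at random from the traces not used for training.
    remaining = sorted(set(range(n_traces)) - set(train))
    val = sorted(random.Random(seed).sample(remaining, n_val))
    # The test set is the whole profile.
    test = list(range(n_traces))
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_trace_indices(13601, 101, 1350)
    print(len(train), len(val), len(test))
```

With 13,601 traces and 101 training traces, the spacing under this assumption is exactly 136 traces, which leaves the 13,500 remaining traces mentioned in the text for validation sampling.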

**Figure 2.** Marmousi2 model dataset: (**a**) impedance; (**b**) synthetic seismic data generated with a 30 Hz, 0° phase Ricker wavelet.

The validation mean squared error (MSE) of the two tasks under different weights is shown in Table 1. The training/validation procedure was executed five times, and the averaged results are reported for comparison. The first and last rows show the performance of the model on a single task. At the epoch with the minimum validation loss within the 50 epochs, *σ*<sub>pre</sub> = 0.890 and *σ*<sub>rec</sub> = 0.759, from which Equations (5) and (6) give an optimal weight of 0.421:0.579 for impedance prediction and seismic data reconstruction. Figure 3 shows how *σ*<sub>pre</sub>, *σ*<sub>rec</sub> and the corresponding *w*<sub>pre</sub>, *w*<sub>rec</sub> vary with epochs. Under this optimal weight, the validation MSEs of the two tasks are 0.0365 and 0.00054, respectively. Compared with the results in Table 1, the MSEs of the multi-task network for both tasks are smaller than those of the single-task network under the different weights, and the optimal weight determined by the proposed method further reduces the MSE for both tasks, which demonstrates the effectiveness of the method. The training and validation loss curves of the two tasks under the optimal weight are shown in Figure 4. Figure 5 illustrates the profiles predicted by the single-task network (Figure 5a) and the hard parameter sharing multi-task network with the optimal weight (Figure 5b) for visual comparison. The MSEs of the profiles predicted by the single-task network and the hard parameter sharing multi-task network are 0.0478 and 0.0353, respectively, which confirms the advantage of the hard parameter sharing multi-task network quantitatively.
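As a worked check of the weight calculation, the sketch below assumes that Equations (5) and (6), which are not reproduced in this section, amount to normalizing the inverse variances 1/σ². This form is an assumption, but it is consistent with the values reported above. The function name `task_weights` is hypothetical.

```python
def task_weights(sigmas):
    """Normalized inverse-variance weights.

    Assumed form: w_i = (1 / sigma_i**2) / sum_j (1 / sigma_j**2).
    """
    inverse_variances = [1.0 / (s * s) for s in sigmas]
    total = sum(inverse_variances)
    return [v / total for v in inverse_variances]


if __name__ == "__main__":
    # sigma_pre and sigma_rec reported for the Marmousi2 model
    w_pre, w_rec = task_weights([0.890, 0.759])
    print(f"{w_pre:.3f}:{w_rec:.3f}")  # prints 0.421:0.579
```

Under this assumed form, the task with the smaller σ (here, seismic data reconstruction) receives the larger weight.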


**Table 1.** The validation MSEs of the two tasks with different weights on the Marmousi2 model.

**Figure 3.** The variation of (**a**) *σ*<sub>pre</sub>, *σ*<sub>rec</sub> and (**b**) *w*<sub>pre</sub>, *w*<sub>rec</sub> with epochs.

**Figure 4.** The training and validation loss curves of the Marmousi2 model: (**a**) impedance; (**b**) data reconstruction.

To further demonstrate the effectiveness of the proposed method, Figure 6 shows the impedance predicted for the 731st (Figure 6a) and 8991st (Figure 6b) traces. The impedance predicted by the hard parameter sharing multi-task network (green) matches the true impedance (red) better than that predicted by the single-task network (blue). The blue and green dotted lines in Figure 6 represent the residuals between the true impedance and the impedance predicted by the two networks. Table 2 shows the Pearson correlation coefficient (PCC) between the predicted values and the ground truth. Under the optimal weights, the PCCs of the multi-task network for the 731st and 8991st traces (third column) are both higher than those of the single-task network. We also tested the tolerance of the two networks to noise by adding Gaussian noise to the test dataset at six signal-to-noise ratios (SNRs): 0, 5, 15, 25, 35, and 45 dB. The MSEs between the impedance predicted from the data at each SNR and the true impedance are presented in Table 3, which shows that the multi-task network is more accurate than the single-task network on all six noisy test datasets. This further demonstrates the superiority of the proposed method.
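The noise test and the two metrics used in this comparison can be sketched as follows. The sketch assumes that SNR is defined as 10·log10 of the ratio of mean signal power to mean noise power and that the noise is zero-mean and white; the text does not specify either. The toy sine trace stands in for real data, no network is involved, and all function names are hypothetical.

```python
import math
import random


def add_gaussian_noise(trace, snr_db, rng):
    """Add zero-mean Gaussian noise whose power is set from the target SNR in dB."""
    signal_power = sum(s * s for s in trace) / len(trace)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in trace]


def mse(pred, true):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def pearson(pred, true):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(true)
    mean_p, mean_t = sum(pred) / n, sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov / math.sqrt(var_p * var_t)


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [math.sin(0.05 * i) for i in range(2000)]  # toy stand-in for a trace
    for snr_db in (0, 5, 15, 25, 35, 45):
        noisy = add_gaussian_noise(clean, snr_db, rng)
        print(snr_db, round(mse(noisy, clean), 6), round(pearson(noisy, clean), 4))
```

In the experiment itself, the noisy seismic traces would be passed through each trained network, and `mse` and `pearson` would be evaluated between the predicted and true impedance rather than between the noisy and clean input.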

**Figure 5.** Profiles of the Marmousi2 model predicted by (**a**) the single-task network and (**b**) the hard parameter sharing multi-task network under the optimal weights.

**Figure 6.** Impedance trace predictions for the Marmousi2 model by the two networks: (**a**) trace 731; (**b**) trace 8991.

**Table 2.** PCC between the predicted values and the ground truth for the selected traces of the Marmousi2 model.


**Table 3.** The MSEs between the impedance predicted from data with different SNRs and the true impedance of the Marmousi2 model.


### *3.2. Experiment on the Overthrust Model*

The impedance of the Overthrust model and its corresponding synthetic data are also the same as in [23] and are shown in Figure 7. For this model, five traces were selected from the synthetic seismic data and impedance data as the training set by isometric sampling, 39 traces were randomly selected from the remaining 392 traces as the validation set, and all 401 traces formed the test set. With so few training samples, the network is prone to overfitting, which data augmentation is commonly used to mitigate. Therefore, in contrast to the Marmousi2 model test, we used cubic spline interpolation to generate 20 new impedance and seismic data traces between every two adjacent selected traces, giving 85 training traces in total (5 original and 4 × 20 interpolated). The interpolated data are shown in Figure 8. The frequency-wavenumber spectra of the original and interpolated seismic data are shown in Figure 9 for comparison. The number of epochs was set to 1000, and the other hyperparameters and the network structure were the same as those used for the Marmousi2 model.

The validation MSEs of the two tasks under different weights are shown in Table 4. The first and last rows show the performance of the model on a single task. At the epoch with the minimum validation loss within the 1000 epochs, *σ*<sub>pre</sub> = 0.192 and *σ*<sub>rec</sub> = 0.120, which gives an optimal weight of 0.28:0.72 for impedance prediction and seismic data reconstruction; the MSEs of the two tasks under this weight are 0.0079 and 0.00001, respectively. Figure 10 shows how *σ*<sub>pre</sub>, *σ*<sub>rec</sub> and the corresponding *w*<sub>pre</sub>, *w*<sub>rec</sub> vary with epochs. The MSEs obtained by the multi-task network for both tasks are smaller than those obtained by the single-task network under all of the different weights, and the optimal weight minimizes the MSEs. The training and validation loss curves of the two tasks under the optimal weight are shown in Figure 11.

Figure 12 illustrates the profiles predicted by the single-task network (Figure 12a) and the hard parameter sharing multi-task network with the optimal weight (Figure 12b). The MSEs of the predicted profiles are 0.0119 and 0.0063, respectively, which also shows that the hard parameter sharing multi-task network outperforms the single-task network. As before, we chose two traces to further demonstrate the performance of the proposed method. Figure 13 shows the impedance predicted for the 99th (Figure 13a) and 316th (Figure 13b) traces, and the corresponding PCCs between the predicted values and the ground truth are shown in Table 5. The impedance predicted by the hard parameter sharing multi-task network (green) clearly matches the true impedance (red) better than that predicted by the single-task network (blue). The blue and green dotted lines in Figure 13 represent the residuals between the true impedance and the impedance predicted by the two networks. In addition, the MSEs between the impedance predicted from data with different SNRs and the true impedance are presented in Table 6. Except at 0 dB, the multi-task network is more accurate than the single-task network on the other five noisy test datasets, which further supports the advantage of the proposed method.
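The cubic spline augmentation used to expand the five Overthrust training traces to 85 can be sketched as follows. The sketch assumes natural boundary conditions (zero second derivative at the end traces) and interpolation across the trace axis, sample by sample; the text specifies neither. The names `natural_cubic_spline` and `augment_traces` are hypothetical.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a callable natural cubic spline through the points (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; the end values stay 0 (natural spline)
    if n >= 2:
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        # Thomas algorithm: forward elimination, then back substitution.
        for k in range(1, n - 1):
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        m[n - 1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(x):
        # Locate the interval [xs[i], xs[i + 1]] that contains x.
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate


def augment_traces(positions, traces, n_new=20):
    """Insert n_new interpolated traces between every two adjacent known traces.

    positions: sorted trace indices; traces: one list of samples per position.
    """
    new_pos = []
    for left, right in zip(positions, positions[1:]):
        step = (right - left) / (n_new + 1)
        new_pos.append(float(left))
        new_pos.extend(left + k * step for k in range(1, n_new + 1))
    new_pos.append(float(positions[-1]))
    n_samples = len(traces[0])
    # One spline per time sample, fitted across the trace axis.
    splines = [natural_cubic_spline(positions, [tr[t] for tr in traces])
               for t in range(n_samples)]
    new_traces = [[s(x) for s in splines] for x in new_pos]
    return new_pos, new_traces


if __name__ == "__main__":
    positions = [0, 100, 200, 300, 400]
    toy = [[float(p % 3), 0.01 * p] for p in positions]  # two samples per toy trace
    new_pos, new_traces = augment_traces(positions, toy)
    print(len(new_traces))  # prints 85
```

The same function would be applied separately to the seismic traces and to the impedance traces. In practice, a library routine such as `scipy.interpolate.CubicSpline` (for example with `bc_type="natural"`) would normally replace the hand-written solver; it is written out here only to keep the sketch dependency-free.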

**Figure 7.** Overthrust model dataset: (**a**) impedance; (**b**) synthetic seismic data generated with a 30 Hz, 0° phase Ricker wavelet.

**Figure 8.** Interpolated data of the Overthrust model: (**a**) impedance; (**b**) synthetic seismic data.

**Figure 9.** The frequency-wavenumber spectra of (**a**) the original and (**b**) the interpolated seismic data.


**Table 4.** The validation MSEs of the two tasks with different weights of the Overthrust model.

**Figure 10.** The variation of (**a**) *σ*<sub>pre</sub>, *σ*<sub>rec</sub> and (**b**) *w*<sub>pre</sub>, *w*<sub>rec</sub> with epochs.

**Figure 11.** The training and validation loss curves of the Overthrust model: (**a**) impedance; (**b**) data reconstruction.

**Figure 12.** Profiles of the Overthrust model predicted by (**a**) the single-task network and (**b**) the hard parameter sharing multi-task network under the optimal weight.

**Figure 13.** Impedance trace predictions for the Overthrust model by the two networks: (**a**) trace 99; (**b**) trace 316.

**Table 5.** PCC between the predicted values and the ground truth for the selected traces of the Overthrust model.



**Table 6.** The MSEs between the impedance predicted from data with different SNRs and the true impedance of the Overthrust model.
