### *4.1. Comparison with State-of-the-Art Methods*

We compared the proposed method with seven state-of-the-art methods: ACSD [31], CDCP [33], DCMC [34], DF [17], MBP [21], PDNet [18], and SFP [35]. As Table 1 and Figure 2 show, the proposed HANet outperforms all of these methods. Figure 3 shows the saliency maps produced by each method in typical scenarios. In the first and second rows, the closest objects are not salient, yet they have the highest pixel intensities in the depth map. The comparison methods misjudge both images because of the depth principle error, that is, they treat the closest object as salient. In contrast, HANet correctly detects the salient objects by using the information in its hybrid attention map.
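Figure 2 reports precision–recall (PR) curves. The evaluation code is not described in this section; the sketch below assumes the protocol commonly used for salient object detection, in which each saliency map is scaled to [0, 255], binarized at every integer threshold, and compared pixel-wise with the binary ground truth, after which the per-image precision and recall are averaged over the testing set. The helper names and the F-measure weight β² = 0.3 are assumptions, not details taken from Table 1.

```python
import numpy as np


def pr_curve(saliency, gt, num_thresholds=256):
    """Precision and recall of one saliency map at thresholds 0..num_thresholds-1.

    saliency: 2-D array with values in [0, 255]; gt: 2-D binary mask.
    """
    sal = np.asarray(saliency, dtype=np.float64)
    gt = np.asarray(gt, dtype=bool)
    n_pos = gt.sum()
    precision = np.zeros(num_thresholds)
    recall = np.zeros(num_thresholds)
    for t in range(num_thresholds):
        pred = sal >= t                       # binarize at threshold t
        tp = np.logical_and(pred, gt).sum()   # true positives
        n_pred = pred.sum()
        # Precision is defined as 0 when no pixel is predicted salient.
        precision[t] = tp / n_pred if n_pred > 0 else 0.0
        recall[t] = tp / n_pos if n_pos > 0 else 0.0
    return precision, recall


def mean_pr_curve(pairs, num_thresholds=256):
    """Average per-image PR curves over (saliency, gt) pairs of a testing set."""
    curves = [pr_curve(s, g, num_thresholds) for s, g in pairs]
    precision = np.mean([p for p, _ in curves], axis=0)
    recall = np.mean([r for _, r in curves], axis=0)
    return precision, recall


def max_f_measure(precision, recall, beta2=0.3):
    """Maximum F-beta over all thresholds (beta^2 = 0.3 is a common choice)."""
    denom = beta2 * precision + recall
    f = np.zeros_like(precision)
    ok = denom > 0
    f[ok] = (1 + beta2) * precision[ok] * recall[ok] / denom[ok]
    return f.max()
```

Plotting the `recall` and `precision` arrays returned by `mean_pr_curve` gives one curve per method. Setting precision to 0 when a threshold leaves no pixel predicted as salient is only one of several conventions in use.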

To further demonstrate the effectiveness of HANet, we conducted an ablation study in which the RGB attention map was removed. The results are shown in the 12th column of Figure 3, where the misdetections caused by the depth principle error reappear. The third and fourth rows show the saliency maps obtained by HANet in scenes with multiple salient objects and with large salient objects, confirming the effectiveness of the proposed method.
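The failure mode seen in Figure 3 can be reproduced on a toy one-row scene. HANet's attention maps are learned and their exact form is not given in this section, so the sketch below makes two illustrative assumptions: the depth attention is the min-max-normalized depth map (brighter meaning closer, as in the first two rows of Figure 3), and the hybrid attention gates it element-wise with an RGB attention map. The function names and toy values are hypothetical.

```python
import numpy as np


def normalize(x):
    """Min-max scale a map to [0, 1]; a constant map becomes all zeros."""
    x = np.asarray(x, dtype=np.float64)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def depth_attention(depth):
    # Depth-prior assumption: brighter (closer) pixels receive more attention.
    return normalize(depth)


def hybrid_attention(depth, rgb_attention=None):
    """Hypothetical hybrid attention.

    With rgb_attention=None this reduces to the depth-only attention,
    mimicking the "without RGB attention map" ablation. Otherwise the
    depth attention is gated element-wise by the RGB attention map.
    """
    att = depth_attention(depth)
    if rgb_attention is None:
        return att
    return normalize(att * np.asarray(rgb_attention, dtype=np.float64))


# Toy one-row scene: [close non-salient object, salient object, far background]
depth = np.array([[1.0, 0.6, 0.1]])
rgb_attention = np.array([[0.1, 0.9, 0.2]])

print(np.argmax(hybrid_attention(depth)))                 # 0: close non-salient object
print(np.argmax(hybrid_attention(depth, rgb_attention)))  # 1: salient object
```

The depth-only map peaks on the close but non-salient object, while gating with the RGB attention moves the peak to the salient object. Element-wise multiplication is used here only because it is the simplest gate; other fusion operators are equally possible.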


**Table 1.** Saliency Detection Performance of Different Methods on the Testing Set of the NJUD and NLPR Datasets.

**Figure 2.** Precision–recall curves of different methods on the testing set of the NJUD and NLPR datasets.

**Figure 3.** Examples of salient object detection from the testing set. (**a**) Original image, (**b**) depth map, and (**c**) ground truth. Saliency maps obtained from (**d**) ACSD, (**e**) CDCP, (**f**) DCMC, (**g**) DF, (**h**) MBP, (**i**) PDNet, (**j**) SFP, (**k**) proposed HANet, and (**l**) HANet without RGB attention map (ablation study). The RGB-D images are selected from Ref. [33].

### *4.2. Ablation Study*

To analyze how the proposed hybrid attention mechanism and the RGB attention map correct the mistakes caused by the depth principle error, we removed Layer2, Layer3, and Layer4, together with their corresponding 1 × 1 convolutions, from HANet. We also removed the upsampling and convolution in the fusion stage and omitted the RGB attention map, and hence its combination with the depth attention map. As Table 2 and Figure 4 show, the saliency results deteriorate substantially; the 12th column of Figure 3 illustrates this, with the depth principle error clearly visible. These results indicate that the hybrid attention mechanism and the RGB attention map are what allow HANet to predict salient objects accurately and eliminate the interference caused by the depth principle error.
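The structural part of this ablation removes the deeper feature levels and the fusion path. The exact layer configuration of HANet is not given in this section, so the NumPy sketch below is a hypothetical reading: Layer1 to Layer4 are assumed to be backbone stages of decreasing resolution, each projected to a common width by a 1 × 1 convolution, upsampled to the Layer1 resolution, summed, and passed through a fusion convolution (modeled here as another 1 × 1 convolution). Summation as the fusion operator, nearest-neighbour upsampling, square feature maps, and all names are assumptions.

```python
import numpy as np


def conv1x1(feat, weight):
    """1x1 convolution: a per-pixel linear map over channels.

    feat: (C_in, H, W); weight: (C_out, C_in) -> (C_out, H, W).
    """
    return np.einsum("oc,chw->ohw", weight, feat)


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_levels(features, lateral_weights, fusion_weight, ablated=False):
    """Hypothetical multi-level fusion.

    features: [Layer1, Layer2, Layer3, Layer4] maps, each (C_i, H_i, W_i),
              square, with H_i shrinking by integer factors from level to level.
    lateral_weights: one (C, C_i) matrix per level (the 1x1 convolutions).
    fusion_weight: (C, C) matrix standing in for the fusion convolution.
    ablated=True keeps only the Layer1 path and skips upsampling and the
    fusion convolution.
    """
    out = conv1x1(features[0], lateral_weights[0])
    if ablated:
        return out
    height = out.shape[1]
    for feat, weight in zip(features[1:], lateral_weights[1:]):
        lateral = conv1x1(feat, weight)
        # Upsample the coarser level to the Layer1 resolution, then sum.
        out = out + upsample_nearest(lateral, height // lateral.shape[1])
    return conv1x1(out, fusion_weight)
```

Both the full and the `ablated=True` variants return maps of the same shape, so in this sketch the two configurations can feed the same downstream saliency head and be compared directly.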


**Table 2.** Performance of HANet During Ablation Study on NJUD and NLPR Datasets.

**Figure 4.** Precision–recall curves obtained from ablation study applied to images from NJUD (left) and NLPR (right) datasets.
