## *2.3. Loss Function*

We use the binary cross-entropy as the loss function for HANet:

$$L(\boldsymbol{Y}, \boldsymbol{G}) = -\sum\_{h} \sum\_{w} \left[ \boldsymbol{G}(h, w) \log \boldsymbol{Y}(h, w) + \left( 1 - \boldsymbol{G}(h, w) \right) \log \left( 1 - \boldsymbol{Y}(h, w) \right) \right],\tag{2}$$

where (*h*, *w*) denotes a pixel position in the image, *Y* is the predicted saliency map, and *G* is the ground truth. *L*(*Y*, *G*) thus measures the discrepancy between the prediction and the label map.
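To make Eq. (2) concrete, the standard pixelwise binary cross-entropy can be evaluated on a single prediction with NumPy. This is a minimal sketch, not the authors' training code: the function name `bce_loss` and the clipping constant `eps` are our own assumptions, the latter added so that the logarithms stay finite when a prediction reaches exactly 0 or 1.

```python
import numpy as np

def bce_loss(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross-entropy summed over all pixel positions (h, w).

    pred: predicted saliency map Y, shape (H, W), values in [0, 1].
    gt:   ground-truth map G, shape (H, W), values in {0, 1}.
    eps:  hypothetical clipping constant (an assumption) that keeps log() finite.
    """
    y = np.clip(pred, eps, 1.0 - eps)  # avoid log(0) at saturated predictions
    return float(-np.sum(gt * np.log(y) + (1.0 - gt) * np.log(1.0 - y)))

pred = np.array([[0.9, 0.2], [0.6, 0.1]])
gt = np.array([[1.0, 0.0], [1.0, 0.0]])
print(round(bce_loss(pred, gt), 4))  # 0.9447
```

Deep-learning frameworks usually average rather than sum over pixels; for example, PyTorch's `torch.nn.BCELoss` defaults to `reduction="mean"`, while `reduction="sum"` corresponds to the summation written above.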

# **3. Evaluation Measures and Implementation Details**

## *3.1. Evaluation Measures*

To comprehensively evaluate the detection performance of the compared saliency methods, we adopt five evaluation measures: the precision–recall (PR) curve, the maximum and mean F-measure, the mean absolute error (MAE), and the area under the PR curve [31,32].

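To compute precision and recall, the grayscale saliency map is binarized at a threshold, and the resulting binary saliency map $\hat{Y}\_b$ is compared with the ground truth $Y$; varying the threshold yields one precision–recall pair per threshold. Precision *P* and recall *R* are computed as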

$$P = \frac{\sum\_{h} \sum\_{w} \hat{Y}\_b(h, w) \, Y(h, w)}{\sum\_{h} \sum\_{w} \hat{Y}\_b(h, w)},\tag{3}$$

$$R = \frac{\sum\_{h} \sum\_{w} \hat{Y}\_b(h, w) \, Y(h, w)}{\sum\_{h} \sum\_{w} Y(h, w)}. \tag{4}$$
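Equations (3) and (4) count the true-positive pixels (salient in both the binary map and the ground truth) and divide by the number of predicted or actual salient pixels, respectively. The sketch below illustrates this and a threshold sweep for one image. It assumes saliency maps scaled to [0, 1] and uses 256 evenly spaced thresholds; the function names, the threshold count, and the small `eps` guard against empty maps are hypothetical choices, not taken from the paper.

```python
import numpy as np

def precision_recall(binary_pred, gt, eps=1e-8):
    """Precision (Eq. 3) and recall (Eq. 4) of a binary map against a binary ground truth."""
    pb = np.asarray(binary_pred, dtype=np.float64)
    g = np.asarray(gt, dtype=np.float64)
    tp = np.sum(pb * g)                  # pixels salient in both maps
    precision = tp / (np.sum(pb) + eps)  # eps guards against an empty prediction
    recall = tp / (np.sum(g) + eps)      # eps guards against an empty ground truth
    return float(precision), float(recall)

def pr_curve(sal, gt, num_thresholds=256):
    """Binarize a grayscale map in [0, 1] at evenly spaced thresholds
    (hypothetical choice: 256 levels) and return precision and recall arrays."""
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    pairs = [precision_recall(np.asarray(sal) >= t, gt) for t in thresholds]
    precisions, recalls = map(np.array, zip(*pairs))
    return precisions, recalls

binary = np.array([[1, 0], [1, 1]])
gt = np.array([[1, 0], [0, 1]])
p, r = precision_recall(binary, gt)
print(round(p, 4), round(r, 4))  # 0.6667 1.0
```

For a dataset-level curve, the per-image `precisions` and `recalls` arrays can be stacked and averaged along the image axis before plotting.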

At each threshold, precision and recall are averaged over all images in a dataset, and the averaged pairs are plotted as the PR curve. In addition, an adaptive threshold is applied to each grayscale saliency map to obtain its binary saliency map, precision and recall are computed using (3) and (4), and *F*β is then defined as

$$F\_{\beta} = \frac{(1+\beta^2)PR}{\beta^2 P + R} \tag{5}$$

where β is a positive parameter that sets the relative importance of precision and recall. For a consistent comparison with other methods, we follow common practice and set β² = 0.3, which weights precision more heavily than recall.
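Equation (5) can be sketched directly in NumPy. The functions below take β² as their parameter (`beta2`), since saliency benchmarks conventionally fix β² = 0.3. The adaptive-threshold rule is an assumption: the text does not state it, so the sketch uses twice the mean saliency value (capped at 1), a rule commonly used in saliency evaluation. All function names are hypothetical, and `precision_recall` is repeated here so the block runs on its own.

```python
import numpy as np

def precision_recall(binary_pred, gt, eps=1e-8):
    """Precision and recall of a binary map, as in Eqs. (3) and (4)."""
    pb = np.asarray(binary_pred, dtype=np.float64)
    g = np.asarray(gt, dtype=np.float64)
    tp = np.sum(pb * g)
    return float(tp / (np.sum(pb) + eps)), float(tp / (np.sum(g) + eps))

def f_measure(p, r, beta2=0.3, eps=1e-8):
    """F_beta from Eq. (5), parameterized by beta squared; accepts scalars or arrays."""
    p = np.asarray(p, dtype=np.float64)
    r = np.asarray(r, dtype=np.float64)
    return (1.0 + beta2) * p * r / (beta2 * p + r + eps)

def adaptive_f_measure(sal, gt, beta2=0.3):
    """F-measure at an adaptive threshold.
    ASSUMPTION: threshold = twice the mean saliency, capped at 1."""
    sal = np.asarray(sal, dtype=np.float64)
    threshold = min(2.0 * sal.mean(), 1.0)
    p, r = precision_recall(sal >= threshold, gt)
    return float(f_measure(p, r, beta2))

def max_f_measure(precisions, recalls, beta2=0.3):
    """Maximum F-measure over the thresholds of a PR curve
    (for example, one averaged over a dataset)."""
    return float(np.max(f_measure(precisions, recalls, beta2)))

print(round(float(f_measure(0.5, 1.0)), 4))  # 0.5652
```

With β² < 1 the denominator grows with precision more slowly than with recall, so a map with high precision and moderate recall scores better than the reverse.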

The mean absolute error (MAE) is the average absolute pixelwise difference between a predicted saliency map and its ground truth. Because it is computed on the continuous saliency map without thresholding, it complements the F-measure in evaluating HANet. It is given by

$$\mathrm{MAE} = \frac{1}{HW} \sum\_{w=1}^{W} \sum\_{h=1}^{H} \left| \hat{Y}(h, w) - Y(h, w) \right|, \tag{6}$$

where $\hat{Y}$ is the predicted saliency map, $Y$ is the ground truth, and *H* and *W* are the numbers of rows and columns in the saliency map, respectively.
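Equation (6) reduces to a single NumPy expression. This minimal sketch assumes both maps are already scaled to [0, 1] (for example, 8-bit maps divided by 255); the function name `mae` and the shape check are our own additions rather than part of the paper.

```python
import numpy as np

def mae(sal, gt):
    """Mean absolute error of Eq. (6): sum of |Y_hat - Y| over the H x W map, divided by H*W.
    ASSUMPTION: both maps are already scaled to [0, 1]."""
    sal = np.asarray(sal, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if sal.shape != gt.shape:
        raise ValueError("prediction and ground truth must have the same H x W shape")
    h, w = gt.shape
    return float(np.abs(sal - gt).sum() / (h * w))

sal = np.array([[0.9, 0.2], [0.6, 0.1]])
gt = np.array([[1.0, 0.0], [1.0, 0.0]])
print(round(mae(sal, gt), 4))  # 0.2
```

A dataset-level MAE is then the mean of the per-image values; lower is better.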
