2.2.1. Step 1

For pixel $i$ at position $(x, y)$, we can obtain $N$ sub-features. Let $\mathbf{s}_i^n \in \mathbb{R}^{L \times 1}$ denote the $n$th sub-feature. We conduct locality-sensitive hashing (LSH) on $\mathbf{s}_i^n$, i.e.,

$$\mathbf{h}_i^n = \text{sign}(\mathbf{D}\mathbf{s}_i^n),\tag{7}$$

where $\mathbf{D} \in \mathbb{R}^{L \times L}$ is a random matrix whose entries are drawn from a zero-mean normal distribution, and $\mathbf{h}_i^n$ is a binary vector. Integrating all $N$ vectors, we get $\mathbf{S}_i = [\mathbf{h}_i^1, \mathbf{h}_i^2, \cdots, \mathbf{h}_i^N] \in \mathbb{R}^{L \times N}$.
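As a rough illustration of Equation (7), the following sketch hashes a list of sub-features into a binary matrix. It rests on several assumptions that are not stated in the text: the function name `lsh_bits` is hypothetical, the entries of the random matrix have unit variance, the same matrix is shared by all sub-features, and the sign function is mapped to {0, 1} (1 for positive projections, 0 otherwise) so that the result is binary.

```python
import random


def lsh_bits(sub_features, seed=0):
    """Hash N sub-features of length L into an L x N binary matrix.

    sub_features: list of N lists, each holding L floats.
    Returns S as a list of L rows, each holding N bits.
    """
    L = len(sub_features[0])
    rng = random.Random(seed)
    # D: L x L random matrix with zero-mean Gaussian entries.
    # Unit variance is an assumption; the text only specifies zero mean.
    D = [[rng.gauss(0.0, 1.0) for _ in range(L)] for _ in range(L)]

    columns = []
    for s in sub_features:
        # Matrix-vector product D s for one sub-feature.
        projected = [sum(D[r][c] * s[c] for c in range(L)) for r in range(L)]
        # Assumption: sign(.) is mapped to {0, 1} to obtain a binary vector.
        columns.append([1 if p > 0 else 0 for p in projected])

    # Transpose so that S[j][n] is bit j of the hash of sub-feature n.
    return [[col[j] for col in columns] for j in range(L)]


# Hypothetical input: N = 2 sub-features of length L = 3.
S = lsh_bits([[0.2, -1.0, 0.5], [1.5, 0.3, -0.7]])
```

Fixing the seed keeps the random matrix identical across pixels, which is what makes hash values comparable between them.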

2.2.2. Step 2

We code $\mathbf{S}_i$ by

$$\hat{\mathbf{s}}_i(j) = \sum_{\ell=1}^{N} 2^{\ell-1} \mathbf{S}_i(j,\ell),\tag{8}$$

and $\hat{\mathbf{s}}_i \in \mathbb{R}^{L \times 1}$ is the coding output for pixel $i$. Equation (8) converts each binary row of $\mathbf{S}_i$ into a decimal value, so the binary results become a decimal vector. Equation (8) also indicates why the grouping strategy in H2F is necessary: the range of the coding results is $[0, 2^N - 1]$, which grows exponentially with $N$.
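The coding in Equation (8) can be checked with a short sketch. The function name `encode_rows` and the example matrix are hypothetical; the only assumption about the input is that its entries are bits in {0, 1}.

```python
def encode_rows(S):
    """Code each row of a binary L x N matrix as in Equation (8).

    Column index l (0-based here) carries weight 2**l, which matches
    the weight 2**(l - 1) for the 1-based index used in the equation.
    """
    return [sum(bit << l for l, bit in enumerate(row)) for row in S]


# Hypothetical binary matrix with L = 4 rows and N = 3 columns.
S_example = [
    [1, 0, 1],  # 1*1 + 0*2 + 1*4 = 5
    [0, 1, 1],  # 0*1 + 1*2 + 1*4 = 6
    [0, 0, 0],  # smallest possible code: 0
    [1, 1, 1],  # largest possible code: 2**3 - 1 = 7
]
codes = encode_rows(S_example)
print(codes)  # [5, 6, 0, 7]
```

With $N = 3$ every code lies in $[0, 7]$; each additional column doubles the number of possible codes.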
