#### *2.8. Heuristic k*<sup>2</sup>*-Raster*

In the *k*<sup>2</sup>-raster paper by Ladra et al. [33], a variant of the structure was also proposed in which the elements at the last level of the tree are stored using an entropy-based heuristic. This variant is denoted by *k*<sup>2</sup><sub>*H*</sub>-raster. For example, for *k* = 2, each set of four nodes sharing the same parent forms a codeword. Codewords may be repeated within this level of the tree, so their frequencies of occurrence can be computed. The set of codewords and their frequencies are then compressed and saved. In effect, the more often a codeword is repeated, the less storage space it takes up. An example of codeword frequencies based on the *k*<sup>2</sup>-raster discussed in Section 2.1 is shown in Table 7. According to experiments conducted by the authors of [33], this variant saves space in the final representation.

**Table 7.** Codeword frequency in Level 3 of the *Lmax* bitmap in the *k*2-raster structure in Figure 1.
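The codeword idea can be illustrated with a minimal sketch. It assumes the last-level values are available as a flat list in which each run of *k*<sup>2</sup> consecutive entries belongs to one parent. The function names and the sample values are hypothetical and are not taken from Figure 1, Table 7, or the implementation in [33]. The zero-order entropy is used here only to show why repeated codewords cost fewer bits; it is not necessarily the encoder used by the authors.

```python
from collections import Counter
from math import log2


def codeword_frequencies(values, k=2):
    """Group last-level values into codewords of k*k siblings and count them."""
    size = k * k
    if len(values) % size != 0:
        raise ValueError("last level must hold a whole number of sibling groups")
    codewords = [tuple(values[i:i + size]) for i in range(0, len(values), size)]
    return Counter(codewords)


def entropy_bits(freqs):
    """Zero-order empirical entropy, in bits per codeword."""
    total = sum(freqs.values())
    return -sum((n / total) * log2(n / total) for n in freqs.values())


# Hypothetical last-level values for k = 2 (four sibling groups of four nodes).
last_level = [0, 1, 0, 2,  0, 1, 0, 2,  3, 0, 0, 1,  0, 1, 0, 2]
freqs = codeword_frequencies(last_level, k=2)
for codeword, count in freqs.most_common():
    print(codeword, count)
print("bits per codeword (entropy estimate):", entropy_bits(freqs))
```

In this toy input, three of the four codewords are identical, so the entropy estimate falls below one bit per codeword, whereas four distinct, equally likely codewords would need two bits each.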


#### *2.9. 3D-2D Mapping*

A compact representation of time series of raster images was proposed by Cruces et al. [34]. The method is based on a 3D-to-2D mapping of a raster in which 3D tuples <*x*, *y*, *z*> are mapped into a 2D binary grid. That is, a raster of size *w* × *h* with values in the range from 0 to *v* inclusive is represented by a binary matrix of *w* × *h* columns and *v* + 1 rows. All the rasters are then concatenated into a 3D matrix and stored as a 3D-*k*<sup>2</sup>-tree.
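The mapping for a single raster can be sketched as follows. The row-major column index `y * w + x` is an assumption made for illustration, since the ordering of the *w* × *h* columns is not specified here, and the function name is hypothetical. The concatenation of the grids and the 3D-*k*<sup>2</sup>-tree itself are not shown.

```python
def raster_to_binary_grid(raster, v):
    """Map a w x h raster with values in [0, v] to a binary grid
    with v + 1 rows and w * h columns (one column per pixel)."""
    h = len(raster)
    w = len(raster[0])
    grid = [[0] * (w * h) for _ in range(v + 1)]
    for y, row in enumerate(raster):
        for x, z in enumerate(row):
            if not 0 <= z <= v:
                raise ValueError(f"value {z} outside [0, {v}]")
            # Assumed column order: row-major position of pixel (x, y).
            grid[z][y * w + x] = 1
    return grid


# Hypothetical 2 x 2 raster with values between 0 and 2.
raster = [[0, 2],
          [1, 2]]
for row in raster_to_binary_grid(raster, v=2):
    print(row)
```

Each column of the resulting grid contains exactly one set bit, located in the row that corresponds to the pixel value.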

### **3. Experimental Results**

In this section, we present an exhaustive comparison of the different integer encoders for use with *k*<sup>2</sup>-raster. First, though, we report results from experiments for finding the best *k* value. We also report experimental results on whether the heuristic *k*<sup>2</sup>-raster and the 3D-2D mapping yield smaller storage sizes. All storage sizes in this section are expressed in bits per pixel per band (bpppb).
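As a reminder of how the unit is obtained, the following sketch converts a storage size into bpppb. It assumes the size is given in bytes; the function name and the scene dimensions are hypothetical.

```python
def bpppb(storage_bytes, x, y, z):
    """Bits per pixel per band for a scene of width x, height y, and z bands."""
    return storage_bytes * 8 / (x * y * z)


# Hypothetical scene: 100 x 50 pixels, 10 bands, stored in 62,500 bytes.
print(bpppb(62_500, x=100, y=50, z=10))
```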

The hyperspectral scenes were captured by different sensors: Atmospheric Infrared Sounder (AIRS), AVIRIS, Compact Reconnaissance Imaging Spectrometer for Mars (CRISM), Hyperion, and IASI. Except for IASI, all of them are publicly available for download (http://cwe.ccsds.org/sls/docs/slsdc/123.0-B-Info/TestData). The hyperspectral scenes used are listed in Table 8.

The implementations for *k*<sup>2</sup>-raster and *k*<sup>2</sup><sub>*H*</sub>-raster were based on the algorithms presented in the paper by Ladra et al. [33]. The sdsl-lite implementation of *k*<sup>2</sup>-tree by Simon Gog [38] (https://github.com/simongog/sdsl-lite/blob/master/include/sdsl/k2\_tree.hpp) was used for testing the 3D-2D mapping described in the paper by Cruces et al. [34]. The DACs software was downloaded from a package called "DACs, optimization with no further restrictions" at the Universidade da Coruña's Database Laboratory website (http://lbd.udc.es/research/DACS/). The programming code for the Rice, PForDelta, Simple9, and Simple16 codes was written by the programmers Diego Caro, Michael Dipperstein, and Christopher Hoobin and was downloaded from these authors' GitHub web pages. Slight modifications to the code were made to meet our requirements to perform the experiments. All programs for this paper were written in C and C++ and compiled with GNU g++ 5.4.0 20160609 with -Ofast optimization. The experiments were carried out on an Intel Core 2 Duo CPU E7400 @2.80GHz with 3072KB of cache and 3GB of RAM. The operating system was Ubuntu 16.04.5 LTS with kernel 4.15.0-47-generic (64 bits). The software code is available at http://gici.uab.cat/GiciWebPage/downloads.php.

**Table 8.** Hyperspectral scenes used in our experiments. Also shown are the bit rate and bit rate reduction using *k*2-raster. *x* is the scene width, *y* the scene height, and *z* the number of spectral bands. bpppb, bits per pixel per band; CRISM, Compact Reconnaissance Imaging Spectrometer for Mars; IASI, Infrared Atmospheric Sounding Interferometer.



#### *3.1. Best k Value Selection*

Our previous research [20] showed that the choice of the *k* value when building a *k*<sup>2</sup>-raster has a great effect on the resulting size of the structure, as well as on the access time needed to query its elements. To investigate this further, we looked for ways of choosing the best *k* value. One way was to build the *k*<sup>2</sup>-raster structure with different *k* values for scene data from each sensor to see how the matrix size affected the choice of the *k* value. Additionally, we measured the time it took to build the *k*<sup>2</sup>-raster and the size of the structure. The results are shown in Table 9. For most tested data, the *k* value leading to the smallest extended matrix size (attribute S in the table) usually provided the fastest build time and the smallest storage size.

In general, when *k* = 2, the compressed data size was large, sometimes even larger than the size of the original scene. As the value of *k* became larger, beginning with *k* = 3, the compressed data size was reduced. As far as the compressed size was concerned, the best value was in the range from 3 to 10 for matrices with a small raster size (i.e., both the original width and the original height were less than 1000), such as the AIRS Granule or AVIRIS Yellowstone scenes. If at least one dimension was larger than 1000, as in the Hyperion calibrated or uncalibrated scenes, a larger range, typically between 3 and 20, needed to be considered.
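The dependence of the extended matrix size on *k* can be sketched as follows. The sketch assumes that the raster is padded to a square whose side is the smallest power of *k* not less than either dimension, as is usual for *k*<sup>2</sup>-tree-style structures. The function names and the 90 × 135 raster are hypothetical and are used only for illustration.

```python
def extended_side(width, height, k):
    """Smallest power of k that is >= max(width, height) (integer arithmetic)."""
    side = 1
    while side < max(width, height):
        side *= k
    return side


def extended_size(width, height, k):
    """Number of pixels in the padded (extended) square matrix."""
    side = extended_side(width, height, k)
    return side * side


# Hypothetical raster of 90 x 135 pixels, scanning k from 2 to 20.
width, height = 90, 135
sizes = {k: extended_size(width, height, k) for k in range(2, 21)}
smallest_k = min(sizes, key=sizes.get)
for k, size in sizes.items():
    print(f"k = {k:2d}  S = {size}")
print("k with the smallest extended matrix:", smallest_k)
```

Under this padding assumption, a larger *k* does not always give a smaller extended matrix: the size depends on how close a power of *k* lies above the larger raster dimension, which is why a range of *k* values has to be scanned.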



**Table 9.** Results for different *k* values using the scene data from each sensor for the following attributes: (S) the extended matrix Size (pixels), (C) the


The above experiments were repeated to compare the access time for the different *k* values. For each scene, the average time over 100,000 consecutive queries is reported. Results are shown in Table 10, and Figure 6 shows how the access time and the size varied with the *k* value. As can be observed, the access time decreased as the value of *k* increased. The plotted data suggested a trade-off between access time and size with respect to the *k* value. We considered the optimal *k* value to be the one that produced a relatively small size with a minimal access time. For example, for AIRS Granule 9 (AG9), the difference in storage size (in bits per pixel per band) between *k* = 6 and *k* = 15 was not very significant, but the reduction in access time was. Therefore, for this scene, *k* = 15 was considered an optimal value.

**Figure 6.** A comparison of the storage size (bpppb) and access time (μs) for different *k* values of *k*2-raster built from scenes in our datasets. Access time is the average time of 100,000 consecutive queries. For AIRS Granule 9, the best value is marked with a red circle, and the optimal value is marked with a blue square.
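One possible way to formalize this trade-off is sketched below. It is an assumption made for illustration, not the selection procedure used in the experiments: the "best" *k* is taken as the one with the smallest size, and the "optimal" *k* as the one with the lowest access time among those whose size is within a tolerance of the smallest. The function name, the `size_tolerance` parameter, and all measurements are made up.

```python
def best_and_optimal_k(results, size_tolerance=0.05):
    """results maps k -> (size in bpppb, access time in microseconds).

    Returns (best_k, optimal_k): best_k has the smallest size; optimal_k has
    the lowest access time among k values whose size is within
    size_tolerance (relative) of the smallest size.
    """
    best_k = min(results, key=lambda k: results[k][0])
    limit = results[best_k][0] * (1 + size_tolerance)
    candidates = [k for k in results if results[k][0] <= limit]
    optimal_k = min(candidates, key=lambda k: results[k][1])
    return best_k, optimal_k


# Made-up measurements, not values from Table 10 or Figure 6.
measurements = {
    2: (12.0, 3.0),
    6: (9.0, 1.2),
    10: (9.2, 0.9),
    15: (9.3, 0.6),
    20: (10.5, 0.5),
}
best_k, optimal_k = best_and_optimal_k(measurements)
print("best k (smallest size):", best_k)
print("optimal k (fast access, near-smallest size):", optimal_k)
```

The tolerance makes explicit how much storage one is willing to give up for faster access; a tolerance of zero reduces the optimal value to the best value.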


*w* = scene width; *h* = scene height.
