#### *2.1. K*<sup>2</sup>*-Raster*

The *k*<sup>2</sup>-tree structure was originally proposed by Ladra et al. [35] as a compact representation of the adjacency matrix of a directed graph. Its applications include web graphs and social networks. Based on the *k*<sup>2</sup>-tree, the same authors also proposed the *k*<sup>2</sup>-raster [33], which is specifically designed for raster data, including images. A *k*<sup>2</sup>-raster is built from a matrix of width *w* and height *h*. If the matrix can be partitioned into *k*<sup>2</sup> square subquadrants of equal size, it can be used directly. Otherwise, it is necessary to enlarge the matrix to size *s* × *s*, where *s* is computed as:

$$s = k^{\lceil \log_k \max(w, h) \rceil}, \tag{1}$$

setting the new elements to 0. This extended matrix is then recursively partitioned into *k*<sup>2</sup> submatrices of identical size, referred to as quadrants. This process is repeated until all cells in a quadrant have the same value, or until the submatrix has size 1 × 1 and cannot be further subdivided. This partitioning induces a tree topology, which is represented in a bitmap *T*. Elements can then be accessed via a rank function. At each tree level, the maximum and minimum values of each quadrant are computed. These are then compared with the corresponding maximum and minimum values of the parent, and the differences are stored in the *Vmax* and *Vmin* arrays of each level. Storing the differences instead of the original values yields smaller values for each node, which in turn allows better compression with DACs or other integer encoders such as Simple9 and PForDelta.

An example of a simple 8 × 8 matrix is given in Figure 2 to illustrate this process. A *k*<sup>2</sup>-raster is constructed from this matrix with maximum and minimum values as given in Figure 3. Differences from the parents' extrema are then computed as explained above, resulting in the structure shown in Figure 4. Next, with the exception of the root node at the top level, the *Vmax* and *Vmin* arrays at all levels are concatenated to form *Lmax* and *Lmin*, respectively. Both arrays are then compressed by an integer encoder such as DACs. The root's maximum (*rMax*) and minimum (*rMin*) values remain uncompressed. The resulting elements, which fully describe this *k*<sup>2</sup>-raster structure, are given in Table 1.
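As a rough illustration of the enlargement step, the following Python sketch computes the side *s* of the extended matrix and pads the input with zeros. The function names `extended_side` and `pad_matrix` are hypothetical and are not taken from [33]. The sketch assumes that the exponent in Equation (1) is rounded up, so that *s* ≥ max(*w*, *h*), and it uses integer arithmetic to avoid floating-point logarithms.

```python
def extended_side(w: int, h: int, k: int) -> int:
    """Smallest power of k that is >= max(w, h), using integers only."""
    if k < 2 or w < 1 or h < 1:
        raise ValueError("need k >= 2 and a non-empty matrix")
    s = 1
    while s < max(w, h):
        s *= k
    return s


def pad_matrix(matrix, k):
    """Return a copy of `matrix` enlarged to s x s; new cells are set to 0."""
    h, w = len(matrix), len(matrix[0])
    s = extended_side(w, h, k)
    # Extend every existing row to width s, then append all-zero rows.
    padded = [list(row) + [0] * (s - w) for row in matrix]
    padded += [[0] * s for _ in range(s - h)]
    return padded
```

For instance, a 5 × 3 matrix with *k* = 2 would be extended to 8 × 8, whereas an 8 × 8 matrix is used as is.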

#### *2.2. Unary Codes and Notation*

Let *x* be a non-negative integer. The expression |*x*| gives the minimum bit length needed to express *x*, i.e., |*x*| = ⌊log<sub>2</sub> *x*⌋ + 1 for *x* > 0.

Unary codes are generally used for small integers and have the following form:

$$
u(x) = 0^{x}\,1, \tag{2}
$$

where the superscript *x* indicates the number of consecutive 0 bits in the code. For example, *u*(1<sub>d</sub>) = 0<sup>1</sup> 1 = 01<sub>b</sub>, *u*(6<sub>d</sub>) = 0<sup>6</sup> 1 = 0000001<sub>b</sub>, *u*(9<sub>d</sub>) = 0<sup>9</sup> 1 = 0000000001<sub>b</sub>. Here, bits are denoted by a subscript *b* and decimal numbers by a subscript *d*. Furthermore, when codes are composed of two parts, the parts are spaced apart for readability. In general, the notation used in [15] is adopted in this paper.
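The notation above can be mirrored in a few lines of Python. This is only a sketch: the function names are hypothetical, and codewords are represented as strings of `'0'` and `'1'` characters for readability rather than as packed bits.

```python
def bit_length(x: int) -> int:
    """|x|: minimum number of bits needed to write x > 0 in binary."""
    if x <= 0:
        raise ValueError("|x| is only defined here for x > 0")
    return x.bit_length()


def unary(x: int) -> str:
    """u(x) = 0^x 1: x zero bits followed by a single one bit."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return "0" * x + "1"


def unary_decode(bits: str) -> int:
    """Recover x by counting the zeros before the first one bit."""
    return bits.index("1")
```

With these definitions, `unary(6)` gives `0000001`, in agreement with the example for *u*(6<sub>d</sub>).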

**Figure 2.** Subdivision of an example 8 × 8 matrix for *k*<sup>2</sup>-raster (*k* = 2).

**Figure 3.** A *k*<sup>2</sup>-raster (*k* = 2) tree storing the maximum and minimum values for each quadrant of every recursive subdivision of the matrix in Figure 2. Every node contains the maximum and minimum values of the subquadrant, separated by a dash. On the last level, only one value is shown as each subquadrant contains only one cell.

**Figure 4.** Based on the tree in Figure 3, the maximum value of each node is subtracted from that of its parent while the minimum value of the parent is subtracted from the node's minimum value. These differences then replace their corresponding values in the node. The maximum and minimum values of the root remain the same.


**Table 1.** An example of the elements of a *k*<sup>2</sup>-raster based on Figures 2–4.

#### *2.3. Elias Codes*

Elias codes include Gamma (*γ*) codes and Delta (*δ*) codes. They were developed by Peter Elias [21] to encode natural numbers, and in general, they work well with sequences of small numbers.

Gamma codes have the following form:

$$\gamma(x) = 0^{|x| - 1}\,[x]_{|x|} = u(|x| - 1)\,[x]_{|x| - 1}, \tag{3}$$

where [*x*]<sub>*l*</sub> represents the *l* least significant bits of *x*. For example, *γ*(1<sub>d</sub>) = *γ*(1<sub>b</sub>) = 1<sub>b</sub>, *γ*(4<sub>d</sub>) = *γ*(100<sub>b</sub>) = 001 00<sub>b</sub>, *γ*(6<sub>d</sub>) = *γ*(110<sub>b</sub>) = 001 10<sub>b</sub>, *γ*(9<sub>d</sub>) = *γ*(1001<sub>b</sub>) = 0001 001<sub>b</sub>, *γ*(14<sub>d</sub>) = *γ*(1110<sub>b</sub>) = 0001 110<sub>b</sub>.
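A minimal Python sketch of Equation (3) is given below. The function names are hypothetical, and codewords are again represented as bit strings, without the readability space between the two parts.

```python
def gamma_encode(x: int) -> str:
    """Elias gamma code: |x| - 1 zeros followed by x written in |x| bits."""
    if x < 1:
        raise ValueError("gamma codes are defined for x >= 1")
    n = x.bit_length()                 # |x|
    return "0" * (n - 1) + format(x, "b")


def gamma_decode(bits: str) -> int:
    """Decode one gamma codeword located at the start of `bits`."""
    zeros = bits.index("1")            # equals |x| - 1
    # The value occupies the next |x| bits, starting at the first one bit.
    return int(bits[zeros:2 * zeros + 1], 2)
```

For example, `gamma_encode(9)` gives `0001001`, which corresponds to 0001 001<sub>b</sub> above.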

Delta codes have the following form:

$$\delta(x) = \gamma(|x|)\,[x]_{|x| - 1}. \tag{4}$$

For values that are larger than 31, Delta codes produce shorter codewords than Gamma codes. This is because the bit length |*x*| in the first part of a Delta code is itself Gamma-coded, so this prefix grows more slowly than the unary prefix of a Gamma code as the number becomes larger. Some examples are: *δ*(1<sub>d</sub>) = *δ*(1<sub>b</sub>) = 1<sub>b</sub>, *δ*(6<sub>d</sub>) = *δ*(110<sub>b</sub>) = 011 10<sub>b</sub>, *δ*(9<sub>d</sub>) = *δ*(1001<sub>b</sub>) = 00100 001<sub>b</sub>, *δ*(14<sub>d</sub>) = *δ*(1110<sub>b</sub>) = 00100 110<sub>b</sub>.
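The Delta code of Equation (4) and the crossover with Gamma codes can be checked with a short Python sketch. The function names are hypothetical, and the Gamma encoder is repeated here so that the block is self-contained.

```python
def gamma_encode(x: int) -> str:
    """Elias gamma code: |x| - 1 zeros followed by x written in |x| bits."""
    if x < 1:
        raise ValueError("gamma codes are defined for x >= 1")
    return "0" * (x.bit_length() - 1) + format(x, "b")


def delta_encode(x: int) -> str:
    """Elias delta code: gamma(|x|) followed by the |x| - 1 low bits of x."""
    if x < 1:
        raise ValueError("delta codes are defined for x >= 1")
    n = x.bit_length()                 # |x|
    low_bits = format(x, "b")[1:]      # drop the leading one bit
    return gamma_encode(n) + low_bits


# Smallest value whose delta codeword is shorter than its gamma codeword.
first_shorter = next(
    x for x in range(1, 1000)
    if len(delta_encode(x)) < len(gamma_encode(x))
)
```

Here `first_shorter` evaluates to 32: Delta and Gamma codewords both take 9 bits for values from 16 to 31, while 32 takes 10 bits with Delta and 11 bits with Gamma.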

#### *2.4. Rice Codes*

Rice codes [25] are a special case of Golomb codes. Let *x* be an integer value in the sequence, and let *y* = ⌊*x*/2<sup>*l*</sup>⌋, where *l* is a non-negative integer parameter. The Rice codes for this parameter are defined as:

$$R_l(x) = u(y+1)\,[x]_l . \tag{5}$$

Some examples are shown for different values of *l* in Table 2.

**Table 2.** Some examples of Rice codes.


To obtain the best performance from Rice codes, *l* should be selected so that 2<sup>*l*</sup> is close to the expected value of the input integers. In general, Rice codes give better compression performance than Elias *γ* and *δ* codes.
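A literal reading of Equations (2) and (5) can be sketched in Python as follows. The function names are hypothetical. The prefix is taken to be *u*(*y* + 1) = 0<sup>*y*+1</sup> 1 exactly as the two equations are written, which is an assumption of this sketch; Table 2 remains the authority on the intended bit patterns, and other presentations of Rice codes use a unary convention that is one bit shorter.

```python
def rice_encode(x: int, l: int) -> str:
    """R_l(x) = u(y + 1) [x]_l with y = floor(x / 2^l), read literally."""
    if x < 0 or l < 0:
        raise ValueError("x and l must be non-negative")
    y = x >> l
    prefix = "0" * (y + 1) + "1"                   # u(y + 1) = 0^(y+1) 1
    low = x & ((1 << l) - 1)                       # l least significant bits
    suffix = format(low, "b").zfill(l) if l else ""
    return prefix + suffix


def rice_decode(bits: str, l: int) -> int:
    """Inverse of rice_encode for one codeword at the start of `bits`."""
    zeros = bits.index("1")                        # equals y + 1
    y = zeros - 1
    low = bits[zeros + 1:zeros + 1 + l]
    return (y << l) | (int(low, 2) if l else 0)
```

The codeword length is *y* + 2 + *l* bits under this reading, which shows why a parameter *l* that is too small makes the unary prefix long, while one that is too large wastes bits in the fixed-length suffix.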

#### *2.5. Simple9, Simple16, and PForDelta*

Unlike Elias and Rice codes, the codes in this section achieve data compression by packing the integers into one or more word-sized elements. They have been shown to have good compression ratios [30].

Simple9 [36] packs as many integers of a common bit length as possible into the 28-bit packing space of a 32-bit word. The remaining 4 bits hold a selector with a value ranging from 0 to 8. Each selector indicates how the integers are stored: the number of integers in the packing space and the maximum number of bits allowed for each. For example, Selector 0 tests whether the next 28 integers in the data all have a value of 0 or 1, i.e., a bit length of 1. If they do, they are stored in this 28-bit segment. Otherwise, Selector 1 tests whether 14 integers can be packed into the segment with at most 2 bits each. If this still does not work, Selector 2 tests whether 9 integers can be packed with at most 3 bits each. This testing continues until a selector is found whose integers fit in these 28 bits. Table 3 shows the 9 different ways of using 28 bits in a word of 32 bits in Simple9.

Simple16 [37] is a variant of Simple9 that uses all 16 possible values of the 4-bit selector, from 0 to 15. Table 4 shows the 16 different ways of packing integers into the 28-bit segment in Simple16.

PForDelta [27] is also similar to both Simple9 and Simple16, but encodes a fixed group of numbers at a time. To do so, 128- or 256-bit words are used.

Due to its relative simplicity, Simple9 is used here as an example to illustrate how an integer sequence is stored by the encoders described in this section. The sequence <3591 25 13 12 15 12 11 26 20 8 13 8 9 7 13 10 12 0 10><sub>d</sub> is taken from the *Lmax* array of one of our data scenes, AG9, and the bit-packing is shown in Table 5. There are 19 integers in the sequence. Assuming 16 bits per integer, the sequence has a total size of 38 bytes. After packing, the sequence occupies only 16 bytes, i.e., four 32-bit words.
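The greedy selector test described in this section can be sketched in Python as follows. The function names are hypothetical, and two details are assumptions of the sketch rather than statements from [36]: the selector is placed in the four highest bits of the word, and a selector is only accepted when enough integers remain to fill all of its slots.

```python
# (number of integers, bits per integer) for selectors 0 to 8
SIMPLE9 = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5),
           (4, 7), (3, 9), (2, 14), (1, 28)]


def simple9_encode(values):
    """Pack non-negative integers below 2^28 into a list of 32-bit words."""
    words, i = [], 0
    while i < len(values):
        for selector, (count, bits) in enumerate(SIMPLE9):
            group = values[i:i + count]
            if len(group) == count and all(0 <= v < (1 << bits) for v in group):
                word = selector << 28              # selector in the top 4 bits
                for j, v in enumerate(group):
                    word |= v << (j * bits)        # j-th integer in its slot
                words.append(word)
                i += count
                break
        else:
            raise ValueError("values must lie in the range [0, 2^28)")
    return words


def simple9_decode(words):
    """Unpack the words produced by simple9_encode."""
    out = []
    for word in words:
        count, bits = SIMPLE9[word >> 28]
        mask = (1 << bits) - 1
        out.extend((word >> (j * bits)) & mask for j in range(count))
    return out


seq = [3591, 25, 13, 12, 15, 12, 11, 26, 20, 8, 13, 8, 9, 7, 13, 10, 12, 0, 10]
words = simple9_encode(seq)
```

Under these assumptions, the 19 integers of the example sequence are packed into four words (2, 5, 5, and 7 integers, respectively), which matches the 16 bytes stated above.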

**Table 3.** Nine different ways of encoding numbers in the 28-bit packing space in Simple9.


**Table 4.** Sixteen different ways of encoding numbers in the 28-bit packing space in Simple16. There are no wasted bits in any of the selectors.


**Table 5.** Example to show how the integer sequence <3591 25 13 12 15 12 11 26 20 8 13 8 9 7 13 10 12 0 10><sub>d</sub> is stored with Simple9.


#### *2.6. Directly Addressable Codes*

Directly Addressable Codes (DACs) can be used to compress *k*<sup>2</sup>-raster and provide access to variable-length codes. Based on the concept of compact data structures, DACs were proposed in the papers published by Brisaboa et al. in 2009 [30] and 2013 [31]. This structure has been shown to yield good compression ratios for variable-length integer sequences. By means of the rank function, it provides fast direct access to any position of the sequence in a very compact space. The original authors also asserted that it is best suited for a sequence of integers with a frequency distribution skewed toward smaller integer values.

Different types of encoding are used for DACs, and the one that we are interested in for *k*<sup>2</sup>-raster is called VByte coding. Consider a sequence of integers *x*. Each integer *x*<sub>*i*</sub>, which is represented by ⌊log<sub>2</sub> *x*<sub>*i*</sub>⌋ + 1 bits, is broken into chunks of *CS* bits. Each chunk is stored in a block of size *CS* + 1, with the additional bit used as a control bit. The chunk occupies the lower bits of the block and the control bit the highest bit. The block that holds the most significant bits of the integer has its control bit set to 0, while the others have it set to 1. For example, the integer 41<sub>d</sub> (101001<sub>b</sub>) is 6 bits long, so with a chunk size of *CS* = 3 it yields 2 blocks: <u>0</u>101 <u>1</u>001<sub>b</sub>. The control bit in each block is shown underlined. To show how the blocks are organized and stored, we again use an example. Given five integers of variable length, 7<sub>d</sub> (111<sub>b</sub>), 41<sub>d</sub> (101001<sub>b</sub>), 100<sub>d</sub> (1100100<sub>b</sub>), 63<sub>d</sub> (111111<sub>b</sub>), and 427<sub>d</sub> (110101011<sub>b</sub>), and a chunk size of 3 (the block size is 4), their representations are listed in Table 6.
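The splitting of an integer into blocks can be sketched in Python as follows. The function name `vbyte_blocks` is hypothetical, and the blocks are returned as bit strings with the most significant block first, following the order used in the example for 41<sub>d</sub>.

```python
def vbyte_blocks(x: int, cs: int):
    """Split x into blocks of cs + 1 bits: one control bit plus a cs-bit chunk."""
    if x < 0 or cs < 1:
        raise ValueError("need x >= 0 and cs >= 1")
    chunks = []
    while True:
        chunks.append(x & ((1 << cs) - 1))   # least significant chunk first
        x >>= cs
        if x == 0:
            break
    blocks = []
    for idx, chunk in enumerate(chunks):
        # Control bit is 0 only for the block with the most significant bits.
        control = "0" if idx == len(chunks) - 1 else "1"
        blocks.append(control + format(chunk, "b").zfill(cs))
    return blocks[::-1]                      # most significant block first
```

For example, `vbyte_blocks(41, 3)` gives `['0101', '1001']`, and `vbyte_blocks(7, 3)` gives the single block `['0111']`.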

**Table 6.** Example of an integer sequence and the corresponding DACs blocks of the integers.


We store them in three blocks of arrays *A* and control bitmaps *B*. This is depicted in Figure 5. To retrieve the values in the arrays *A*, we make use of the corresponding bitmaps *B* with the rank function. This function returns the number of bits set to 1 from the beginning of the control bitmap *B*<sub>*i*</sub> up to the queried position. An example of how the function is used follows. If we want to access the third integer (100<sub>d</sub>) in the sequence in Figure 5, we start by looking for the third element in the array *A*<sub>1</sub> in Block1 and find *A*<sub>3,1</sub> with its corresponding control bit *B*<sub>3,1</sub>. The function rank(*B*<sub>3,1</sub>) then gives a result of 2, which means that the second element *A*<sub>3,2</sub> in the array *A*<sub>2</sub> in Block2 contains the next block. With the control bit in *B*<sub>3,2</sub>, we compute the function rank(*B*<sub>3,2</sub>) and obtain a result of 1. This means that the next block in Block3 can be found in the first element *A*<sub>3,3</sub>. Since its corresponding control bit *B*<sub>3,3</sub> is set to 0, the search ends here. All the blocks found are finally concatenated to form the third integer in the sequence.
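The access procedure can be sketched in Python as follows. The function names are hypothetical and the code is not taken from the DACs software of [30,31]. Positions are 0-based in the code, and the rank function is a naive linear scan, whereas an actual DACs implementation would rely on a succinct rank structure to answer such queries quickly.

```python
def build_dac(values, cs):
    """Return the levels [(A1, B1), (A2, B2), ...] for chunk size cs."""
    levels, current = [], list(values)
    mask = (1 << cs) - 1
    while current:
        chunks = [v & mask for v in current]               # array A of this level
        control = [1 if v >> cs else 0 for v in current]   # 1: more chunks follow
        levels.append((chunks, control))
        current = [v >> cs for v in current if v >> cs]
    return levels


def rank1(bitmap, pos):
    """Number of bits set to 1 in bitmap[0..pos], inclusive (naive scan)."""
    return sum(bitmap[:pos + 1])


def dac_access(levels, i, cs):
    """Rebuild the i-th integer (0-based) by following the control bits."""
    value, shift, pos = 0, 0, i
    for chunks, control in levels:
        value |= chunks[pos] << shift
        if control[pos] == 0:                  # most significant chunk reached
            break
        pos = rank1(control, pos) - 1          # position in the next level
        shift += cs
    return value


levels = build_dac([7, 41, 100, 63, 427], 3)
third = dac_access(levels, 2, 3)
```

For the five example integers, `levels` contains three levels, and `third` evaluates to 100. The two rank results along the way are 2 and 1, as in the walk-through above.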

More information on DACs and the software code can be found in the papers [30,31] by Brisaboa et al.


**Figure 5.** Organization of 3 DACs blocks.

#### *2.7. Selection of the k Value*

Following the description in Section 2.1, using different *k* values leads to *Lmax* and *Lmin* arrays of different lengths. This, in turn, affects the final size of the *k*<sup>2</sup>-raster. With this in mind, we present a heuristic approach that can be used to determine the best *k* value for obtaining the smallest storage size. First, we compute the sizes of the extended matrix for different values of *k* within a suitable range using Equation (1). Then, we find the *k* value that corresponds to the matrix with the smallest size, and the result can be considered as the best *k* value. Before the start of the *k*<sup>2</sup>-raster building process, the program can find the best *k* value and use it as the default.
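The heuristic can be sketched in Python as follows. The function names, the default range of candidate *k* values (2 to 20), and the tie-breaking rule (the smallest *k* wins) are assumptions made for illustration only. The sketch also assumes that the exponent in Equation (1) is rounded up, so that the extended matrix is never smaller than the original.

```python
def extended_side(w: int, h: int, k: int) -> int:
    """Smallest power of k that is >= max(w, h), using integers only."""
    s = 1
    while s < max(w, h):
        s *= k
    return s


def best_k(w: int, h: int, k_values=range(2, 21)):
    """Return (k, s) with the smallest extended side s among the candidates."""
    candidates = ((k, extended_side(w, h, k)) for k in k_values)
    # Sort key: smallest s first, then smallest k to break ties (assumption).
    return min(candidates, key=lambda ks: (ks[1], ks[0]))
```

For a 100 × 100 matrix, for example, `best_k(100, 100)` gives *k* = 10 with *s* = 100, whereas *k* = 2 would require an extended matrix of side 128.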
