Latitude-Adaptive Integer Bit Allocation for Quantization of Omnidirectional Images

Qian Sima; Hui Feng; Bo Hu

doi:10.3390/app14051861

Abstract

Omnidirectional images have gained significant popularity and drawn great attention, which challenges omnidirectional image processing to overcome the bottleneck of storage and transmission. An omnidirectional image is generally projected onto a two-dimensional image plane before compression. However, the most commonly used projection format, the equirectangular projection (ERP), produces a significant number of redundant samples in the polar areas, thus incurring extra bitrate and geometric distortion. We derive the optimal latitude-adaptive bit allocation for each image tile. Subsequently, we propose a greedy algorithm for non-negative integer bit allocation (NNIBA) for non-uniform quantization under the omnidirectional image quality metric WMSE. In our experiments, we design quantization tables based on JPEG and compare our approach with other sampling-related methods. Our method achieves an average bit saving of 7.9% compared with JPEG while outperforming other sampling-related methods. In addition, we compare our non-uniform quantization approach with two previously proposed bit allocation methods, achieving average improvements of 0.35 dB and 2.66 dB under WS-PSNR, respectively. The visual quality assessment also confirms the superiority of our method.

Keywords:

omnidirectional image; transform coding; quantization; bit allocation

1. Introduction

Omnidirectional images, alternatively referred to as panoramic, spherical, or 360° images, represent a novel multimedia format. These images are typically used for virtual reality (VR) [1,2], augmented reality (AR) [3], and various immersive experiences [4]. The content of 360° images is on the sphere covering the whole

360° \times 180°

viewing range. In essence, a 360° image provides a seamless view of a full sphere, allowing viewers to explore any direction as if they were centrally positioned within the captured space, which is different from the traditional 2-dimensional (2D) image that only covers a limited plane.

The omnidirectional view and the immersive experience result in large data sizes compared to standard 2D images, especially when dealing with high-resolution 360° images [4]. Additionally, substantial data of omnidirectional images can introduce latency in transmission as they require more time to load and process, while reduced latency is crucial in applications like live streaming and real-time interaction in AR and VR. With the high-resolution demand for panoramic images and their rapid proliferation on the Internet, developing efficient compression techniques becomes crucial to meet the demands for storage and transmission of omnidirectional images [5].

There are mainly two methods to compress omnidirectional images, as shown in Figure 1. One strategy involves processing omnidirectional images within the spherical domain. Several techniques have been developed for representing signals on the entire sphere, including the spherical harmonic transform [6], the spherical wavelet transform [7], the Gaussian function on the sphere [8], and block spherical transforms like the graph Fourier transform [9]. Methods in the spherical domain avoid the distortion induced by projection, but they require further development and may not yet outperform well-established 2D image compression techniques [10].

Figure 1. Overview of omnidirectional image compression methods.

The 360° images can also be projected to 2D planes to leverage the existing traditional image compression standards [4]. Although 2D image compression techniques have reached a high level of maturity, these methods do not adequately accommodate the spherical nature of omnidirectional images. Therefore, various methods have been proposed to address the issues of redundancy and distortion in projected omnidirectional images, which can be categorized into three main groups: re-projection, perceptual methods, and sampling density correction.

Re-projection methods seek to leverage compression-friendly re-projection techniques. Certain approaches involve rotating the panoramic content within the spherical domain [11,12]. Others develop new projection methods, including HEC [13], HCP [14], OGM [15], etc. Re-projection methods allow regions demanding high visual quality to be situated in less distorted areas on the projected 2D plane before compression. However, it is important to note that these methods are highly content-dependent. Perceptual compression methods aim to improve the perceptual quality of omnidirectional images within the viewport. This category can be divided into viewport-based coding, such as [16,17], and saliency-based adaptive coding, such as [18,19]. Perceptual compression methods can yield improved results while necessitating additional data and increased computational resources.

Sampling density correction methods aim to mitigate the oversampling induced by the projection, thereby enhancing the compression efficiency of panoramic images. Sampling-related methods primarily include down-sampling [20,21] and adaptive quantization [10,22] methods. There are also learned image compression methods leveraging sampling-related approaches [5]. Because these methods require complex methodologies and extensive training data, they fall outside the scope of our consideration. Sampling density correction methods, primarily based on conventional 2D methods with enhancements, are relatively straightforward to understand and implement. However, down-sampling methods predominantly hinge on intuitive insights and have limited theoretical support. In addition, adaptive quantization methods partly rely on conventional techniques that were not originally designed for omnidirectional images.

Our paper falls within the category of sampling density correction methods and focuses on omnidirectional images under ERP, where the design of quantizers plays a pivotal role. Methods already proposed predominantly employ uniform quantization and rely on conventional quantization techniques such as JPEG [23] and quantization parameter (QP) in HEVC [24]. However, these conventional methods, designed for 2D images and videos, may not be well-suited for omnidirectional images, whereas our method can produce non-uniform scalar quantizers with assigned bits.

In this paper, we introduce a novel NNIBA technique to achieve high-quality image quantization through a combination of greedy bit allocation and non-uniform quantization of the transform coefficients. It is crucial to emphasize that our method is compatible with any block-based transformation, like the discrete cosine transform (DCT). The process begins by performing the transform block by block, obtaining statistical parameters of a batch of images, followed by deriving the optimal real-valued bit allocation (ORBA). Due to the characteristics of spherical image projection, we posit that coefficients at different latitudes exhibit distinct distributions. To capture this, we utilize empirical distributions instead of the Gaussian distribution assumed in [25]. The subsequent step involves a two-step greedy algorithm to allocate the non-negative integer (NNI) bits. Further, we design non-uniform quantizers using the Lloyd algorithm instead of conventional quantization techniques, which have demonstrated superior performance to the uniform quantizers employed by [20,21]. Because of certain spherical properties inherent in panoramic images, conventional 2D image quality metrics such as peak signal-to-noise ratio (PSNR) are no longer suitable for evaluating panoramic images. Consequently, for practicality and simplicity, we have adopted the WS-PSNR metric and WMSE distortion [26]. Through experiments, we validate that the proposed NNIBA technique outperforms previously proposed sampling correction and bit allocation methods under WS-PSNR. Our proposed method effectively reduces redundancy in omnidirectional image compression and enhances the quantization of transform coefficients, thereby improving overall compression quality. This improvement enhances the storage and transmission performance of omnidirectional images. In applications such as AR and VR, where real-time requirements are critical, it can reduce image latency and provide superior image quality.

Our main contributions can be summarized in the following aspects:

(1): To quantize transform coefficients with non-uniform scalar quantizers, we derive the theoretical ORBA and provide a detailed analysis under WMSE distortion.
(2): Two consecutive low-complexity greedy algorithms are employed to obtain the integer bit allocation.
(3): We empirically verify the effectiveness of our method through simulation.

The rest of the paper is organized as follows. In Section 2, we provide an overview of related work on sampling-related methods and integer bit allocation. In Section 3, we introduce the system model and formulate the optimization problem. The theoretical analysis for ORBA and the proposed greedy algorithms are presented in Section 4 and Section 5, respectively. Section 6 shows the validation experiment. Finally, Section 7 concludes the whole paper. The notation used in the paper is listed in Table 1.

Table 1. Summary of the main symbols used in the document.

2. Related Work

Sampling density correction can be divided into two main types, down-sampling and adaptive quantization. In the case of down-sampling methods, Budagavi et al. [20] introduced a variable smoothing approach that employs Gaussian smoothing filters on the upper and lower regions of 360° video under ERP. Similarly, Youvalari et al. [21] proposed a strategy to divide panoramic images under ERP into several down-sampled strips based on latitude. Note that in both approaches [20,21], down-sampling becomes progressively harsher in the two vertical directions towards the top and bottom borders. Additionally, Lee et al. [27] presented a scheme that employs rhombus-shaped latitude down-sampling followed by pixel rearrangement. Nevertheless, the preprocessing approaches may not have direct control over the bitrate for panoramic image coding. Consequently, compressed panoramic images may struggle to regain their original quality at higher rates.

When it comes to adaptive quantization, rate-distortion optimization (RDO) methods have been adapted for omnidirectional videos and images. Liu et al. [22] proposed optimizing panoramic video encoding by maximizing S-PSNR [28] at a given bitrate. Li et al. [24] suggested incorporating the weight value of WS-PSNR [26] at the center of each block as a scalar for the Lagrangian multiplier

λ

in RDO. However, the RDO methods mainly relied on the HEVC technique and did not provide a formulation for integer bit allocation. Apart from RDO methods, De Simone et al. [10] proposed a method to adapt the typical quantization tables of JPEG according to frequency shift, which was easy to implement but constrained by the JPEG table. Both the RDO methods and the method proposed by De Simone et al. used quantization techniques designed for 2D images rather than for omnidirectional images.

Integer bit allocation is a pivotal technique in image transform coding: it determines how the available bits are distributed among the quantizers of the transform coefficients. Huang et al. [29] used the high-resolution quantization approximations and found the solution to the ORBA problem for the transform coders under mean squared error (MSE). However, the practical solution for quantizing the transform coefficient has to be an NNI value. Fox [30] improved the ORBA algorithm and proposed an NNIBA algorithm under MSE. Bit allocation approaches have been widely used in image processing. Thakur et al. [25] proposed a greedy algorithm under the MSE measure and obtained a lookup table for the quantization table elements via non-linear regression analysis, thus reducing the additional side information required. Li et al. [31] conducted an in-depth study focusing on the joint design of graph signal sampling along with quantization for graph signal compression in a task-based quantization manner. They proposed a joint design of the sampling and recovery mechanisms for a fixed quantization mapping and presented an iterative algorithm for dividing the available bit budget. Nevertheless, none of these methods were designed for omnidirectional images. In addition, neither Thakur et al. nor Li et al. took the distribution of coefficients into consideration, and their methods cannot be directly applied to 360° images.

3. Problem Formulation

3.1. System Model

In the initial stage, we outline the processing chain of our method, illustrated in Figure 2. We consider a batch of omnidirectional images under ERP, with sizes

M \times N

where

N = 2 M

, as depicted in Figure 2a. We partition the images into blocks of size

L \times L

to facilitate subsequent transform, similar to traditional image processing techniques. The signal in a block is represented as an

L \times L

matrix denoted as

X

. After performing block-wise transform, the coefficient matrices can be written as

C = T X T^{T}

, where

T

is an

L \times L

transform matrix, e.g., the DCT matrix. Subsequently, we vectorize the matrix

C

and denote the coefficients by

c \in R^{L^{2}}

. The omnidirectional image is then divided into

K = M / L

rows and

2 K

columns, with latitudes denoted by k for

k = 1, \dots, K

, as depicted in Figure 2b. We denote

c

at latitude k by

c_{k}

and the

L^{2}

elements in

c_{k}

by

c_{k, l}

where

l = 1, 2, \dots, L^{2}

. Throughout this process, we quantize the coefficients with

K L^{2}

scalar quantizers produced by the bit allocation process, which will be elaborated in detail subsequently. Figure 2c presents the recovered images after quantization and inverse transform. At last, we evaluate the quality of quantized images by WS-PSNR.
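The block transform and vectorization described above can be sketched in a few lines. This is a minimal illustration, assuming an orthonormal DCT-II for T and a row-major vectorization of C; the paper only requires a block-based transform and does not fix the scan order, and the function names are hypothetical.

```python
import math

def dct_matrix(L):
    """Orthonormal DCT-II matrix T of size L x L (rows are basis vectors)."""
    T = []
    for u in range(L):
        scale = math.sqrt(1.0 / L) if u == 0 else math.sqrt(2.0 / L)
        T.append([scale * math.cos((2 * n + 1) * u * math.pi / (2 * L))
                  for n in range(L)])
    return T

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def block_transform(X, T):
    """C = T X T^T for one L x L block X."""
    return matmul(matmul(T, X), transpose(T))

def vectorize(C):
    """Row-major vectorization (an assumption); c[0] is the DC coefficient."""
    return [v for row in C for v in row]
```

Because T is orthonormal, the transform preserves squared error, which is what later allows pixel weights to be replaced by a single weight per block.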

Figure 2. Processing chain of our method. (a) A batch of original images. (b) Coefficient matrices partitioned into blocks. (c) Recovered images after quantization. (d) Distribution modeling of coefficients at different latitudes. (e) Bit allocation among tiles. (f) Subsequent bit allocation in each tile.

3.2. Latitude-Adaptive Bit Allocation Problem

The right part of Figure 2 illustrates the integer bit allocation process proposed in our study. We allocate different bits to tiles at various latitudes, resulting in distinct quantizers for each tile. We assume that the coefficients

c_{k, l}

of different blocks at the same latitude k follow the same distribution and share quantizers, so every tile has

L^{2}

scalar quantizers. Additionally, we assume coefficients in different tiles follow their respective distribution, necessitating the production of their own quantizers. As there are K tiles, the total number of quantizers is

K L^{2}

.

In our research, we delve into the strategy of allocating integer bits across various quantizers. The distributions of coefficients are needed in the bit allocation method. Therefore, we begin by analyzing the distribution of

K L^{2}

coefficients, as shown in Figure 2d. Notably, we observe that the distribution of

c_{k, 1}

is different from that of other coefficients

c_{k, l}

, where

l = 2, \dots, L^{2}

, which will be further demonstrated in Section 6. As the transform removes the correlation between coefficients, we treat

c_{k, l}

independently and employ empirical distributions to model their distribution. In Figure 2e, we consider allocating bits to K latitudes, where each block at latitude k is allocated

b_{k}

bits. Moving to Figure 2f, we proceed to allocate bits to the

L^{2}

quantizers within each tile, where each quantizer at latitude k is allocated

b_{k, l}

bits. Once every scalar quantizer is allocated with NNI bits, we can produce quantizers and use them to quantize coefficients in Figure 2b.

Our goal is to minimize the reconstruction error under WMSE to recover the quantized panoramic images. The WMSE is defined as

WMSE = \frac{\sum_{i = 0}^{M - 1} \sum_{j = 0}^{N - 1} (w (i, j) {(y (i, j) - y^{'} (i, j))}^{2})}{\sum_{i = 0}^{M - 1} \sum_{j = 0}^{N - 1} w (i, j)},

(1)

where

y (i, j)

and

y^{'} (i, j)

are samples at position

(i, j)

on the

M \times N

original and recovered images, respectively. The weights of ERP are calculated as

w (i, j) = cos \frac{(i + 0.5 - M / 2) π}{M} .

(2)

Similar to PSNR, WS-PSNR for the omnidirectional image is calculated via the WMSE:

WS-PSNR = 10 {log}_{10} (\frac{{MAX}_{I}^{2}}{WMSE}),

(3)

where

{MAX}_{I}

is the maximum possible intensity level of the image.
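A small sketch of the metric may help make the definitions concrete. It assumes that the weight varies with the row index i of an M \times N image (the latitude direction, with N = 2M) and that images are given as lists of rows; the default peak value of 255 is an assumption for 8-bit images.

```python
import math

def erp_weight(i, M):
    """ERP weight of row i (0-based) in an image with M rows."""
    return math.cos((i + 0.5 - M / 2) * math.pi / M)

def wmse(y, y_rec):
    """Weighted MSE between an original and a recovered image."""
    M, N = len(y), len(y[0])
    num = 0.0
    den = 0.0
    for i in range(M):
        w = erp_weight(i, M)
        for j in range(N):
            num += w * (y[i][j] - y_rec[i][j]) ** 2
            den += w
    return num / den

def ws_psnr(y, y_rec, max_i=255.0):
    """WS-PSNR in dB; infinite when the two images are identical."""
    err = wmse(y, y_rec)
    return math.inf if err == 0 else 10 * math.log10(max_i ** 2 / err)
```

With these weights, an error placed in a polar row lowers WS-PSNR less than the same error placed near the equator, which is the property the latitude-adaptive allocation exploits.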

Since our method is block-based and involves an orthogonal transform preserving the distance, we employ an average block weight of ERP as a replacement for the individual point weights within each block. The average block weight, denoted by

ω_{k}

, is solely dependent on the latitude of the block and is determined by the weight of the center point of the block. As mentioned earlier in this subsection, we make the assumption that distinct blocks within the same tile conform to the same distribution and the total bit budget is evenly allocated among blocks at the same latitude. With this assumption, we can formulate our problem as

\begin{matrix} min_{Q (\cdot)} E \{\sum_{k = 1}^{K} ω_{k} {∥c_{k} - E \{c_{k} ∣ Q^{b_{k}} (c_{k})\}∥}^{2}\} \\ s . t . \sum_{k = 1}^{K} b_{k} \leq \frac{B}{2 K}, b_{k} \in N, \end{matrix}

(4)

where B is the total budget assigned to the image,

b_{k}

is the bit budget allocated to quantize

c_{k}

, and

Q^{b_{k}}

is the vector quantizer with budget

b_{k}

.
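The block weight ω_{k} used in (4) can be computed directly from the latitude index. The sketch below assumes block rows are indexed k = 1, ..., K from the top of the image and that the weight is evaluated at the vertical center of the block row, which reduces the pixel-level cosine weight to an expression in k and K only.

```python
import math

def block_weights(K):
    """omega_k for k = 1..K, evaluated at the vertical center of each block row."""
    return [math.cos((k - 0.5 - K / 2) * math.pi / K) for k in range(1, K + 1)]
```

The weights are symmetric about the equator and stay strictly positive, so they can safely enter the logarithms that appear in the real-valued allocation.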

In the following sections of the paper, we utilize

L^{2}

scalar quantizers for coefficients within each block. In this way,

\sum_{l = 1}^{L^{2}} b_{k, l} = b_{k}

and this optimization problem involving NNI can be formulated as follows:

\begin{matrix} min_{Q (\cdot)} E \{\sum_{k = 1}^{K} ω_{k} {∥c_{k} - E \{c_{k} ∣ Q^{b_{k, l}} (c_{k, l})\}∥}^{2}\} \\ s . t . \sum_{k = 1}^{K} \sum_{l = 1}^{L^{2}} b_{k, l} \leq \frac{B}{2 K}, b_{k, l} \in N, \end{matrix}

(5)

where

Q^{b_{k, l}}

is the scalar quantizer with budget

b_{k, l}

and

b_{k, l}

is the bit budget allocated to quantize

c_{k, l}

.

4. Theoretical Analysis

The optimization problem (5) is NP-hard with integer constraints, making it challenging to solve directly. In this section, we present the ORBA theorem when NNI constraints are relaxed under the WMSE measure. To start with, we focus on the bit allocation for

L^{2}

scalar quantizers in a block at latitude k with bit quota

b_{k}

, where the coefficients to be quantized are denoted by

c_{k, l}

, where

l = 1, \dots, L^{2}

. When the high-resolution assumptions are retained and

c_{k, l}

are non-identically distributed, the MSE distortion of the l-th scalar quantizer is given by the following formula [32],

d_{k, l} = h_{k, l} σ_{k, l}^{2} 2^{- 2 b_{k, l}},

(6)

where

σ_{k, l}^{2}

is the variance of random variable

c_{k, l}

,

h_{k, l} = \frac{1}{12} {\{\int_{- \infty}^{\infty} {[f_{k, l} (c_{k, l})]}^{1 / 3} d c_{k, l}\}}^{3}

, and

f_{k, l} (c_{k, l})

is the normalized probability density function (pdf) of

c_{k, l}

with unit variance. In other words, the optimization problem can be formulated as

\begin{matrix} min_{b_{k, l}} E \{\sum_{l = 1}^{L^{2}} h_{k, l} σ_{k, l}^{2} 2^{- 2 b_{k, l}}\} \\ s . t . \sum_{l = 1}^{L^{2}} b_{k, l} \leq b_{k}, \end{matrix}

(7)

and we can derive a formal expression for bit allocation without NNI constraints, as stated in Lemma 1.
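The constant h_{k, l} in (6) depends only on the shape of the unit-variance pdf, so different distributions lead to different allocations even at equal variance. As an illustration, the sketch below evaluates the integral numerically for two textbook densities; the integration range, step count, and function names are assumptions of this example.

```python
import math

def high_res_constant(pdf, lo=-40.0, hi=40.0, n=80000):
    """h = (1/12) * (integral of pdf(x)^(1/3) dx)^3 by the trapezoidal rule.

    `pdf` must be a unit-variance density, and [lo, hi] must be wide enough
    that the truncated tails are negligible.
    """
    step = (hi - lo) / n
    total = 0.5 * (pdf(lo) ** (1 / 3) + pdf(hi) ** (1 / 3))
    for i in range(1, n):
        total += pdf(lo + i * step) ** (1 / 3)
    return (total * step) ** 3 / 12.0

def gaussian(x):
    """Zero-mean, unit-variance Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def laplacian(x):
    """Zero-mean, unit-variance Laplacian density."""
    return math.exp(-math.sqrt(2) * abs(x)) / math.sqrt(2)
```

For these two densities the closed-form values are \sqrt{3} π / 2 ≈ 2.72 and 9/2, respectively, so a heavier-tailed coefficient needs more bits than a Gaussian one of the same variance to reach the same distortion.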

Lemma 1.

When

{b_{k, l}}

are not constrained to be NNI and the high-resolution assumption holds, the bit allocation to minimize MSE per block given quota

b_{k}

can be formulated as

b_{k, l} = {\bar{b}}_{k} + \frac{1}{2} {log}_{2} \frac{σ_{k, l}^{2}}{ρ_{k}^{2}} + \frac{1}{2} {log}_{2} \frac{h_{k, l}}{H_{k}},

(8)

where

ρ_{k}^{2} = {(\prod_{l = 1}^{L^{2}} σ_{k, l}^{2})}^{\frac{1}{L^{2}}}

,

H_{k} = {(\prod_{l = 1}^{L^{2}} h_{k, l})}^{\frac{1}{L^{2}}}

, and

{\bar{b}}_{k} = \frac{b_{k}}{L^{2}}

.

The MSE distortion per block is

H_{k} L^{2} ρ_{k}^{2} 2^{- 2 {\bar{b}}_{k}} .

(9)
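One standard route to (8) and (9), given here as a sketch rather than as the authors' proof, is a Lagrange multiplier argument. The objective in (7) is convex and decreasing in every b_{k, l}, so the budget constraint is active at the optimum and the stationary point is the minimizer:

```latex
\begin{aligned}
J &= \sum_{l=1}^{L^{2}} h_{k,l}\,\sigma_{k,l}^{2}\,2^{-2 b_{k,l}}
     + \lambda \Big( \sum_{l=1}^{L^{2}} b_{k,l} - b_{k} \Big), \\
\frac{\partial J}{\partial b_{k,l}} = 0
  \;&\Rightarrow\; h_{k,l}\,\sigma_{k,l}^{2}\,2^{-2 b_{k,l}} = \frac{\lambda}{2 \ln 2} =: \theta
  \;\Rightarrow\; b_{k,l} = \tfrac{1}{2} \log_{2} \frac{h_{k,l}\,\sigma_{k,l}^{2}}{\theta}, \\
\sum_{l=1}^{L^{2}} b_{k,l} = b_{k}
  \;&\Rightarrow\; \log_{2} \theta = \log_{2} \big( H_{k}\,\rho_{k}^{2} \big) - 2 \bar{b}_{k}
  \;\Rightarrow\; \theta = H_{k}\,\rho_{k}^{2}\,2^{-2 \bar{b}_{k}}, \\
b_{k,l} &= \bar{b}_{k} + \tfrac{1}{2} \log_{2} \frac{\sigma_{k,l}^{2}}{\rho_{k}^{2}}
          + \tfrac{1}{2} \log_{2} \frac{h_{k,l}}{H_{k}} .
\end{aligned}
```

Every quantizer then contributes the same distortion θ, and summing over the L^{2} quantizers gives the per-block distortion L^{2} θ stated in (9).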

Remark 1.

When the minimum expected MSE distortion in Lemma 1 is achieved,

L^{2}

quantizers produce identical distortion, which is

H_{k} ρ_{k}^{2} 2^{- 2 {\bar{b}}_{k}}

.
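Lemma 1 and Remark 1 can be checked numerically. The sketch below evaluates (8) for a small hypothetical block with four quantizers (the variances and h values are made up for illustration) and then evaluates (6) with the resulting bits.

```python
import math

def real_valued_allocation(variances, h, b_k):
    """Eq. (8): real-valued bits for the quantizers of one block."""
    n = len(variances)
    log_rho2 = sum(math.log2(v) for v in variances) / n  # log2 of geometric mean
    log_H = sum(math.log2(x) for x in h) / n
    b_bar = b_k / n
    return [b_bar + 0.5 * (math.log2(v) - log_rho2) + 0.5 * (math.log2(x) - log_H)
            for v, x in zip(variances, h)]

def distortions(variances, h, bits):
    """Eq. (6) evaluated for every quantizer."""
    return [x * v * 2.0 ** (-2.0 * b) for v, x, b in zip(variances, h, bits)]

# Hypothetical statistics of a 4-coefficient block.
variances = [100.0, 25.0, 4.0, 1.0]
h = [2.72, 4.5, 4.5, 4.5]
bits = real_valued_allocation(variances, h, 12.0)
```

With a quota of 12 bits the four distortions coincide, as Remark 1 states. With a quota of only 2 bits the same formula assigns a negative number of bits to the low-variance coefficient, which is why the real-valued solution cannot be used directly and an NNI allocation is needed.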

Lemma 1 presents the minimum overall distortion (9) under MSE distortion. To derive the overall WMSE distortion, we use

ω_{k}

to represent the average weight of the block at latitude k. Thus, the overall WMSE distortion in the block with ERP weight

ω_{k}

is

D_{k} = ω_{k} H_{k} L^{2} ρ_{k}^{2} 2^{- 2 {\bar{b}}_{k}} = g_{k} 2^{- 2 {\bar{b}}_{k}},

(10)

where we let

g_{k} = ω_{k} H_{k} L^{2} ρ_{k}^{2}

.

From Lemma 1, we know that when the bits allocated to one block

b_{k}

are given, the optimal bit allocation among

L^{2}

scalar quantizers is decided by the

σ_{k, l}^{2}

and

h_{k, l}

, which are related to the distributions of the transform coefficients. Meanwhile, the overall WMSE distortion in a block is related to the weight

ω_{k}

,

L^{2}

distributions, and

b_{k}

. As emphasized in Remark 1, each scalar quantizer produces equivalent distortion when the minimum distortion is achieved. Subsequently, we need to consider how we allocate

\frac{B}{2 K}

bits to K tiles.

Utilizing Lemma 1 in conjunction with (10), we proceed to readdress our relaxed optimization problem, which now can be formulated as

\begin{matrix} min_{b_{k}} E \{\sum_{k = 1}^{K} g_{k} 2^{- 2 \frac{b_{k}}{L^{2}}}\} \\ s . t . \sum_{k = 1}^{K} b_{k} \leq \frac{B}{2 K} . \end{matrix}

(11)

Subsequently, we present the solution along with the minimum WMSE distortion to the optimization problem (11) in Theorem 1.

Theorem 1.

When the assumption of Lemma 1 holds, the expected WMSE distortion in one image is

D = 2 K^{2} G 2^{- 2 \bar{b}} .

(12)

The bit allocated to one block at latitude k is

b_{k} = \frac{B}{2 K^{2}} + \frac{L^{2}}{2} {log}_{2} \frac{g_{k}}{G},

(13)

where

\bar{b} = \frac{B}{M N}

,

G = {(\prod_{k = 1}^{K} g_{k})}^{\frac{1}{K}}

.

Proof.

The inequality of arithmetic and geometric means states that any positive numbers

a_{k}, k = 1, 2, \dots, K

satisfy

\frac{1}{K} \sum_{k = 1}^{K} a_{k} \geq {(\prod_{k = 1}^{K} a_{k})}^{\frac{1}{K}},

(14)

with equality if and only if the

a_{k}

are all equal.

Apply this inequality to (11), considering

\sum_{k = 1}^{K} b_{k} \leq \frac{B}{2 K}

, and then we get

\begin{matrix} \frac{1}{K} \sum_{k = 1}^{K} g_{k} 2^{- 2 \frac{b_{k}}{L^{2}}} \geq {(\prod_{k = 1}^{K} g_{k} 2^{- 2 \frac{b_{k}}{L^{2}}})}^{\frac{1}{K}} \\ = {(\prod_{k = 1}^{K} g_{k})}^{\frac{1}{K}} 2^{- \frac{2}{K} \sum_{k = 1}^{K} \frac{b_{k}}{L^{2}}} \\ = G 2^{- 2 \sum_{k = 1}^{K} \frac{b_{k}}{M L}} \\ \geq G 2^{- 2 \frac{B}{2 K M L}} \\ = G 2^{- 2 \bar{b}} . \end{matrix}

(15)

Equality holds if, for each k,

g_{k} 2^{- 2 \frac{b_{k}}{L^{2}}} = G 2^{- 2 \frac{B}{M N}},

(16)

so that

b_{k} = \frac{B}{2 K^{2}} + \frac{L^{2}}{2} {log}_{2} \frac{g_{k}}{G}

, which is (13). Considering there are

2 K

blocks in a tile, the expected overall distortion in one image is (12). □

We establish Theorem 1 by leveraging the WMSE distortion of the block without NNI constraints, and obtain the bits to be allocated to latitude k. Once

b_{k}

is determined, the computation of

b_{k, l}

according to (8) is straightforward. Building upon Theorem 1, we derive Corollary 1.

Corollary 1.

When the minimum expected WMSE distortion in Theorem 1 is achieved, quantizers of different tiles produce identical distortion, which is

D_{tile} = 2 K G 2^{- 2 \bar{b}}

.

The proposed Corollary 1 can be used to present the theoretical underpinnings of our greedy algorithm, which indicates that we consistently assign bits to the tile incurring the highest distortion.
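Theorem 1 and Corollary 1 can be verified on a toy configuration. The sketch below evaluates (13) and (10); the values of g_{k}, B, and L are hypothetical and chosen so that the arithmetic is exact (K = 4, L = 2, hence M = 8, N = 16, and B = 256 gives \bar{b} = 2).

```python
import math

def tile_allocation(g, B, L):
    """Eq. (13): real-valued bits b_k per block at each of the K latitudes."""
    K = len(g)
    log_G = sum(math.log2(x) for x in g) / K  # log2 of the geometric mean G
    return [B / (2 * K * K) + 0.5 * L * L * (math.log2(x) - log_G) for x in g]

def block_distortion(g_k, b_k, L):
    """Eq. (10): WMSE distortion of one block that holds b_k bits."""
    return g_k * 2.0 ** (-2.0 * b_k / (L * L))

g = [1.0, 4.0, 16.0, 4.0]          # hypothetical g_k, so G = 4
b = tile_allocation(g, B=256, L=2)  # evaluates to 4, 8, 12, 8 bits
```

The allocations sum to B / 2K = 32, and every block ends up with the same distortion G 2^{-2 \bar{b}} = 0.25, so each tile of 2K blocks contributes equally, as Corollary 1 states.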

5. Algorithm

5.1. Two-Step NNI Bit Allocation

It is desirable to implement a solution with an NNI number of bits for each quantizer. However, the problem (5) is a combinatorial optimization problem, and the derived

b_{k, l}

in Section 4 does not guarantee the allocation of NNI bits. The greedy algorithm is a heuristic algorithm commonly used to find approximate solutions to combinatorial optimization problems, with lower complexity compared to exhaustive search. Compared with methods like genetic algorithms and simulated annealing algorithms, the greedy algorithm is relatively simple to design and implement. Allocating integer bits to a large number of quantizers directly is challenging. Hence, we propose a two-step algorithm to find an NNIBA. Referring back to Figure 2, our bit allocation method is executed in two steps. First, we consider allocating bits to blocks at different latitudes according to (10) and then proceed to allocate bits to each quantizer within each block, as illustrated in Figure 2e,f, respectively. The impact of blocks at different latitudes on WMSE varies, justifying the allocation of different numbers of bits. Since

ω_{k}

increases when getting close to the equator, intuitively, we allocate more bits to blocks near the equatorial region to account for this difference.

As discussed in Theorem 1, we can replace the WMSE distortion of all quantizers within a block with the average distortion

D_{k} (b_{k}) = g_{k} 2^{- 2 \frac{b_{k}}{L^{2}}}

, and the overall distortion D is denoted as

D = 2 K \sum_{k = 1}^{K} D_{k}

. Notice that

g_{k}

is related to the normalized pdf and variance of the coefficients at latitude k, necessitating the calculation of their empirical distribution, as seen in Figure 2d. Based on Corollary 1, our approach involves a progressive allocation of bits to the neediest quantizer, meaning the one that incurs the highest distortion with the current bit assignment. This iterative process continues one bit at a time until all available bits are allocated, as detailed in Algorithm 1.

Algorithm 1 Tile bit allocation

Require: B, $g_{k}$ , K, L
1: Initialize the bit allocation to zero, so that $b_{k}^{0} = 0$ for each $k = 1, \dots, K$ and $m = 0$ . Set $s_{k}^{0} = D_{k} (0)$ for each k.
2: while $m < \frac{B}{2 K}$ do
3:     Find the index j with the maximum $s_{k}^{m}$ over $k = 1, \dots, K$ .
4:     Set $b_{j}^{m + 1} = b_{j}^{m} + 1$ .
5:     Set $b_{k}^{m + 1} = b_{k}^{m}$ for each $k \neq j$ .
6:     Set $s_{k}^{m + 1} = D_{k} (b_{k}^{m + 1})$ for each k.
7:     Increment m by 1.
8: end while
9: return $b_{k}^{m}$ for $k = 1, \dots, K$ .
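A direct transcription of Algorithm 1 into Python might look as follows. It is a sketch under the assumption that the per-column budget B / 2K is passed in as an integer `budget`; ties are broken in favor of the smallest index, a detail the algorithm leaves open.

```python
def tile_bit_allocation(g, budget, L):
    """Greedy NNI allocation of `budget` bits over K latitudes.

    g[k] is g_k from Eq. (10); `budget` plays the role of B / (2K).
    """
    K = len(g)
    bits = [0] * K
    s = list(g)                        # s_k = D_k(0) = g_k
    shrink = 2.0 ** (-2.0 / (L * L))   # D_k shrinks by this factor per extra bit
    for _ in range(budget):
        j = max(range(K), key=s.__getitem__)  # latitude with the largest distortion
        bits[j] += 1
        s[j] *= shrink
    return bits
```

For the hypothetical values g = [1, 4, 16, 4] with L = 2 and a budget of 32 bits, the greedy result coincides with the real-valued optimum of (13). Only the distortion of the chosen latitude changes in each round, so the sketch updates a single entry of s instead of recomputing all of them.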

After we allocate NNI bits to each tile, the subsequent step involves the allocation of

b_{k}

bits within one block of size

L \times L

at latitude k, as depicted in Figure 2f. Remark 1 provides the theoretical justification for such an allocation, which states that we allocate bits to the quantizer generating the maximum distortion in a block. The allocation process follows a similar methodology as before. The budget of one block with latitude k is

b_{k}

, and the distortion of the l-th quantizer is

d_{l} (b_{k, l}) = h_{k, l} σ_{k, l}^{2} 2^{- 2 b_{k, l}}

. In light of these considerations, we derive Algorithm 2.

The output of Algorithm 2 represents the NNIBA for

b_{k, l}

at latitude k. By repeating Algorithm 2 K times, we successfully complete the process of assigning bits to each scalar quantizer. Our bit allocation methodology is versatile, applicable to the design of quantizers, as well as the enhancement of other quantization approaches.

We compare the time complexity between our two-stage greedy method and the method of directly applying the greedy algorithm to all quantizers. As described in the optimization problem (5), we need to allocate

\frac{B}{2 K}

bits to

K L^{2}

scalar quantizers. First, we consider the complexity of applying the greedy algorithm directly to

K L^{2}

quantizers. In each round of bit allocation, we need to determine the maximum distortion among

K L^{2}

quantizers. Thus, the time complexity of applying the greedy algorithm directly to

K L^{2}

quantizers is

O (\frac{B}{2 K} \cdot K L^{2}) = O (\frac{B L^{2}}{2})

. Next, we analyze the time complexity of the proposed two-step algorithms. As described in Algorithm 1, allocating bits to K latitudes has a time complexity of

O (\frac{B}{2 K} \cdot K) = O (\frac{B}{2})

. As shown in Algorithm 2, allocating within each block has a complexity of

O (\sum_{k = 1}^{K} b_{k} L^{2}) = O (\frac{B L^{2}}{2 K})

. Thus, the total time complexity of our two-step algorithm is

O (\frac{B}{2} + \frac{B L^{2}}{2 K})

, offering lower complexity compared to directly applying a greedy algorithm to

K L^{2}

quantizers.
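To give a feel for the gap, the counts above can be evaluated for concrete sizes. The sketch uses the image size 256 \times 512 and block length L = 8 from Section 6.2 together with a hypothetical budget of 1 bpp (B = 131072); constants hidden by the O-notation are ignored.

```python
def greedy_costs(B, K, L):
    """Comparison counts (up to constants) from the complexity analysis."""
    per_column = B // (2 * K)         # bits shared by the K tiles of one block column
    direct = per_column * K * L * L   # one-step greedy over all K L^2 quantizers
    step1 = per_column * K            # Algorithm 1 over K latitudes
    step2 = per_column * L * L        # Algorithm 2 summed over k (sum of b_k is B / 2K)
    return direct, step1 + step2

direct, two_step = greedy_costs(B=131072, K=32, L=8)
```

Under these assumptions the direct search needs about 4.2 million comparisons and the two-step scheme about 0.2 million, a reduction by a factor of roughly 21.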

Algorithm 2 Block bit allocation

Require: $b_{k}$ , L, $h_{k, l}$ , $σ_{k, l}^{2}$
1: Initialize the bit allocation to zero, so that $b_{k, l}^{0} = 0$ for each $l = 1, \dots, L^{2}$ and $m = 0$ . Set $s_{l}^{0} = d_{l} (0)$ for each l.
2: while $m < b_{k}$ do
3:     Find the index j with the maximum $s_{l}^{m}$ over $l = 1, \dots, L^{2}$ .
4:     Set $b_{k, j}^{m + 1} = b_{k, j}^{m} + 1$ .
5:     Set $b_{k, l}^{m + 1} = b_{k, l}^{m}$ for each $l \neq j$ .
6:     Set $s_{l}^{m + 1} = d_{l} (b_{k, l}^{m + 1})$ for each l.
7:     Increment m by 1.
8: end while
9: return $b_{k, l}^{m}$ for $l = 1, \dots, L^{2}$ .
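Algorithm 2 admits an equally short sketch. The inputs below are hypothetical (four quantizers with h = 1 and variances 64, 16, 4, 1), and ties are again resolved toward the smallest index.

```python
def block_bit_allocation(b_k, h, variances):
    """Greedy NNI allocation of b_k bits over the quantizers of one block."""
    n = len(variances)
    bits = [0] * n
    s = [x * v for x, v in zip(h, variances)]  # s_l = d_l(0) = h * sigma^2
    for _ in range(b_k):
        j = max(range(n), key=s.__getitem__)   # quantizer with the largest distortion
        bits[j] += 1
        s[j] *= 0.25                           # one more bit scales d_l by 2^{-2}
    return bits

bits = block_bit_allocation(6, [1.0] * 4, [64.0, 16.0, 4.0, 1.0])
```

With a quota of 6 bits the quantizers receive 3, 2, 1, and 0 bits, which here equals the real-valued solution of (8). The last coefficient receives no bits at all, so in practice it would be reconstructed by a single level.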

5.2. Distribution Modeling and Non-Uniform Quantizer Design

When the resolution of the quantizer is high, non-uniform quantizers demonstrate a heightened ability to closely align with an input pdf. This enhanced alignment results in a notably reduced average distortion, thereby offering a more efficient quantization performance [32]. Leveraging the high-resolution assumption in our mathematical derivation, we obtain a non-uniform quantization table for each of

K L^{2}

quantizers using Lloyd algorithm [33] based on the assigned bits and the corresponding coefficients.

In the absence of knowledge about the pdf, we resort to employing empirical data and numerical integration techniques. Consider the l-th scalar quantizer at latitude k, whose number of allocated bits is

b_{k, l}

. Given a batch of Z images, we obtain

2 K Z

coefficient samples for this quantizer, one from each of the 2K blocks at latitude k in each image. These samples serve as direct random samples in a Monte Carlo integration. With the resolution

r = 2^{b_{k, l}}

specified, the quantizer designed for the empirical distribution can be implemented by the Lloyd algorithm. Repeatedly applying the Lloyd algorithm to all

K L^{2}

quantizers, we obtain the corresponding quantization tables.
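The quantizer design step can be illustrated with a compact Lloyd iteration on empirical samples. This is a sketch, not the authors' MATLAB implementation: the quantile-based initialization and the fixed iteration count are assumptions, and all names are hypothetical.

```python
def lloyd_quantizer(samples, bits, iters=50):
    """Design a 2^bits-level scalar quantizer on empirical samples.

    Returns the sorted reconstruction levels. Initial levels are sample
    quantiles, one common choice and an assumption of this sketch.
    """
    xs = sorted(samples)
    r = 2 ** bits
    levels = [xs[(2 * i + 1) * len(xs) // (2 * r)] for i in range(r)]
    for _ in range(iters):
        # Nearest-neighbor step: cell boundaries are midpoints between levels.
        bounds = [(a + b) / 2 for a, b in zip(levels, levels[1:])]
        sums, counts = [0.0] * r, [0] * r
        cell = 0
        for x in xs:  # xs is sorted, so the cell index only moves forward
            while cell < r - 1 and x > bounds[cell]:
                cell += 1
            sums[cell] += x
            counts[cell] += 1
        # Centroid step: each level becomes the mean of its cell (kept if empty).
        levels = [s / c if c else lv for s, c, lv in zip(sums, counts, levels)]
    return levels

def quantize(x, levels):
    """Map x to its nearest reconstruction level."""
    return min(levels, key=lambda lv: abs(x - lv))

def mse(samples, levels):
    """Empirical mean squared quantization error."""
    return sum((x - quantize(x, levels)) ** 2 for x in samples) / len(samples)
```

In the setting above, `samples` would hold the 2KZ values of c_{k, l} and `bits` the allocated b_{k, l}. A quantizer with zero bits degenerates to a single level at the sample mean.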

6. Experiments

In the preceding section, the detailed theoretical analysis and algorithms for allocating NNI bits have been successfully developed. In this section, we present the experimental results showcasing the final quantization performance measured under WS-PSNR.

6.1. Experimental Setup

Several experiments are conducted to assess the compression performance of the proposed method, categorized into three parts. In the first part, we present the distribution of DCT coefficients produced by images from the Pano3D dataset [34] and explain our rationale for using empirical distributions. In the second part, we apply our methodologies to the JPEG quantization table. Nearly all RDO methods are applied in the context of video processing and typically utilize the HEVC reference software (https://vcgit.hhi.fraunhofer.de/jvet/HM). Because they cannot be applied directly to omnidirectional images, we have excluded these methods from our comparative analysis. Instead, we compare our methods against two previously proposed JPEG-based approaches, utilizing the LIC360 dataset [5]. In the third part, our methods are compared with two novel integer bit allocation approaches, utilizing both the Pano3D dataset [34] and LIC360 dataset [5].

Compression methods are evaluated in two terms, i.e., rate and distortion. In the second part, we remove the entropy coding block from the JPEG coder and estimate the final rate using the first-order entropy (FOE) to assess the effectiveness of the technique. Besides, we compare our methods with others in the Bjontegaard gain (

Δ_{B D}

) [35] in terms of the average percentage of rate difference for the same objective quality. In the third part, we measure the rate by the bit budget allocated to compress the image, and the bits per pixel (bpp) are calculated as the total bit budget divided by the number of pixels. The distortion is calculated using the WMSE, and our metric is WS-PSNR, where a higher value indicates superior performance. All experiments were conducted using MATLAB 2020b.
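
The rate and distortion measures above can be sketched in a few lines of Python. This is an illustrative sketch rather than the evaluation code used in the experiments: it assumes the usual ERP weights w(i, j) = cos((i + 0.5 - M/2)π/M) from [26], a peak intensity of 255 for 8-bit images, and hypothetical function names.

```python
import numpy as np

def erp_weights(height, width):
    """ERP weights w(i, j): cosine of the latitude of row i, constant along each row."""
    rows = np.arange(height)
    w = np.cos((rows + 0.5 - height / 2) * np.pi / height)
    return np.repeat(w[:, None], width, axis=1)

def ws_psnr(original, recovered, max_i=255.0):
    """WS-PSNR in dB, computed from the weighted MSE (WMSE)."""
    y = np.asarray(original, dtype=float)
    y_rec = np.asarray(recovered, dtype=float)
    w = erp_weights(*y.shape)
    wmse = np.sum(w * (y - y_rec) ** 2) / np.sum(w)
    return np.inf if wmse == 0 else 10 * np.log10(max_i ** 2 / wmse)

def first_order_entropy(symbols):
    """First-order entropy (FOE) of the quantized symbols, in bits per symbol."""
    _, counts = np.unique(np.asarray(symbols).ravel(), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def bits_per_pixel(total_bits, height, width):
    """Total bit budget divided by the number of pixels."""
    return total_bits / (height * width)
```

Because the weights shrink towards the poles, an error of a given magnitude near the top or bottom rows lowers WS-PSNR less than the same error near the equator.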

6.2. Empirical Distribution of Coefficients

In this section, we present the coefficient statistics of 100 images sourced from the Pano3D dataset [34]. Our analysis adopts a block length configuration of

L = 8

, resulting in a total of 64 coefficients within each block, derived from the

8 \times 8

DCT matrix. The image size is

256 \times 512

, so there are

K = \frac{M}{L} = 32

latitudes. We denote the coefficients by

c_{k, l} \in R^{64}

where k represents the latitude and l represents the l-th quantizer in the block. In the context of the DCT, the initial coefficient

c_{k, 1}

is identified as the direct current (DC) coefficient at latitude k, which represents the average intensity of the block. The subsequent 63 coefficients are classified as alternating current (AC) coefficients, encapsulating higher frequency components within the image block.
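
To make the indexing concrete, the sketch below computes the blockwise DCT of one image and groups the coefficients by latitude index k. It is only an illustration under stated assumptions: the helper names are hypothetical, the DCT matrix is the orthonormal DCT-II, and each block is vectorized in row-major order, whereas the ordering used in the paper (for example zigzag) is not specified here.

```python
import numpy as np

def dct_matrix(L=8):
    """Orthonormal L x L DCT-II matrix T, so that C = T X T^T."""
    n = np.arange(L)
    T = np.sqrt(2.0 / L) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * L))
    T[0, :] = np.sqrt(1.0 / L)
    return T

def coefficients_by_latitude(image, L=8):
    """Return an array of shape (K, blocks_per_row, L*L) of vectorized block coefficients."""
    img = np.asarray(image, dtype=float)
    M, N = img.shape
    K, B = M // L, N // L
    T = dct_matrix(L)
    # Split the image into a (K, B, L, L) grid of blocks.
    blocks = img[:K * L, :B * L].reshape(K, L, B, L).transpose(0, 2, 1, 3)
    # Separable 2-D DCT applied to every block at once.
    coeffs = T @ blocks @ T.T
    # Row-major vectorization (assumed): index 0 holds the DC coefficient.
    return coeffs.reshape(K, B, L * L)
```

For a 256 × 512 image this gives an array of shape (32, 64, 64); stacking the slice `[k, :, l]` over many images provides the empirical samples of one coefficient at one latitude. With this normalisation the DC entry equals L times the block mean.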

Figure 3 displays the frequency of coefficients at latitude 1 in each interval, used to demonstrate the distribution of coefficients. Our observations distinctly reveal a substantial disparity in the distribution patterns of AC and DC coefficients. Consequently, it is not optimal to operate under the assumption that all coefficients adhere to the same distribution. Instead, employing an empirical distribution presents a more efficacious approach, as it better accommodates the inherent structure of the coefficients. Figure 4 displays the coefficient frequencies at latitude 17 within each interval. As presented in Figure 3 and Figure 4, it is evident that the distribution of coefficients varies with latitude. This variation implies that the bit allocation and the design of distinct scalar quantizers for coefficients at various latitudes could yield significant benefits in terms of quantization efficiency.

In this aspect of our research, we primarily encountered two obstacles. The first obstacle pertains to how we model distributions. Initially, we modeled each coefficient using the parametric model. Fitting all coefficients with the Gaussian distribution overlooks the differences in distribution among coefficients. Thus, we applied a Gaussian mixture distribution to the DC coefficient and a Laplace distribution to the AC coefficients. Nevertheless, the effectiveness of fitting remains unsatisfactory. Ultimately, we resorted to empirical distributions and improved the performance of the quantizers.

The second problem was that the number of bits allocated to DC coefficients was comparatively lower than in other methods. However, the quality of DC coefficients has a greater impact on image quality compared to AC coefficients. Therefore, we increased the amount of input data and enhanced the importance of DC coefficients in bit allocation, which led to improved results.

6.3. Results and Analysis for JPEG-Based Methods

In this part, our bit allocation methods are applied to adjust JPEG quantization tables, and a comparative analysis is conducted with other JPEG-based methods, including traditional codec JPEG [23], and two improved methods designed for omnidirectional images by Budagavi et al. [20] and De Simone et al. [10]. This comparison is performed on images with a resolution of

512 \times 1024

pixels.

Within our methods, bit allocation is computed according to the proposed greedy algorithm. We denote the bits allocated to blocks at latitude k by

b_{k}

and denote the block length by L where L is set to

L = 8

. The quality ratio is then calculated as

\frac{b_{k}}{L^{2}}

. We adjust the quality factor of the JPEG quantization table according to the following formula, where a larger quality factor indicates better image quality.

Q_{cur} = \frac{b_{k}}{L^{2}} \times Q_{base},

(17)

where the base quality factor

Q_{base}

is set to 50, and

Q_{cur}

represents the quantization quality factor of the current block. De Simone et al. [10] associate each block of the omnidirectional image with an elevation angle varying from 0 to

π

, corresponding to the elevation of the block projected on the sphere. Similar to our method, its quantization table also varies with the latitude. In the following examples, we illustrate some instances of the adjusted JPEG table in our methods. The quantization table at the elevation angle

\frac{π}{4}

with

\mathrm{ratio} = 0.67

is presented in Table 2. To demonstrate the variation of our quantization tables across different ratios and latitudes, we change the elevation angle to

\frac{π}{2}

in Table 3, where the block is at the center of the omnidirectional image. We increase the bits allocated to the whole image, resulting in a higher quality ratio at the elevation angle

\frac{π}{4}

, and the quantization table is shown in Table 4.
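
The mapping from allocated bits to a quantization table can be sketched as follows. Equation (17) is taken from the text; how the quality factor is then turned into step sizes is not stated, so this sketch assumes the widely used IJG (libjpeg) quality-scaling convention applied to the example luminance table of the JPEG standard. The function names are hypothetical, and the resulting entries need not coincide exactly with Tables 2 to 4.

```python
import numpy as np

# Example luminance quantization table from Annex K of the JPEG standard.
JPEG_LUMA = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def quality_factor(b_k, L=8, q_base=50.0):
    """Equation (17): Q_cur = (b_k / L^2) * Q_base."""
    return b_k / L ** 2 * q_base

def scaled_table(q, base=JPEG_LUMA):
    """Scale a base table by quality factor q (assumed IJG convention)."""
    q = min(max(q, 1.0), 100.0)
    # Below 50 the step sizes grow as 50/q; above 50 they shrink linearly.
    scale = 5000.0 / q if q < 50 else 200.0 - 2.0 * q
    table = np.floor((base * scale + 50.0) / 100.0)
    # Keep step sizes in the baseline JPEG range.
    return np.clip(table, 1, 255).astype(int)
```

Under this assumption a quality ratio of 1 reproduces the base table, ratios below 1 coarsen it, and ratios above 1 refine it, which matches the trend described for the tables.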

Figure 3. Coefficient frequencies at latitude 1. (a) DC coefficients

c_{1, 1}

. (b) AC coefficients

c_{1, 2}

. (c) AC coefficients

c_{1, 30}

. (d) AC coefficients

c_{1, 64}

.

Figure 4. Coefficient frequencies at latitude 17. (a) DC coefficients

c_{17, 1}

. (b) AC coefficients

c_{17, 2}

. (c) AC coefficients

c_{17, 30}

. (d) AC coefficients

c_{17, 64}

.

Table 2. The quantization table with

\mathrm{ratio} = 0.67

used in our method at the elevation angle

\frac{π}{4}

.

Table 3. The quantization table with

\mathrm{ratio} = 0.67

used in our method at the elevation angle

\frac{π}{2}

.

Table 4. The quantization table with

\mathrm{ratio} = 1.03

used in our method at the elevation angle

\frac{π}{4}

.

A comparison of the quantization tables in Table 2 and Table 3 reveals that our method employs more refined quantization tables near the equator, which significantly contributes to WS-PSNR. This implies that our quantization strategy places heightened importance on regions near the equator. Furthermore, a comparative analysis between Table 2 and Table 4 reveals a discernible inverse relationship between the allocated number of bits and the quantization step size. As the bit allocation increases, there is a corresponding reduction in the quantization step size, leading to a more refined quantization process. This observed trend aligns consistently with theoretical expectations and intuitive understanding.

We then execute a comprehensive rate-distortion analysis, wherein our approach is benchmarked against alternative methods. In each experiment, 100 images are randomly selected from the LIC360 dataset [5]. We repeat the experiment 10 times and calculate the average value. As a result, the rate-distortion curves are shown in Figure 5.

Figure 5. Rate distortion curves for JPEG-based methods over 100 images with a resolution of

512 \times 1024

pixels [10,20,23].

Figure 5 illustrates that our method consistently achieves higher WS-PSNR compared to alternative approaches at equivalent entropy levels. In contrast to our methodology, JPEG [23] does not take into account the weights of WS-PSNR. The approach proposed by De Simone et al. [10] adjusts the quantization table based on frequency, constrained by JPEG’s quantization table and lacking the flexibility inherent in our approach. Additionally, the method of Budagavi et al. [20] applies Gaussian filters at different latitudes to omnidirectional images, followed by the application of the JPEG method [23]. While their approach exhibits commendable performance at low rates, its efficacy diminishes with increasing rates. At lower rates, the filtering impact minimally influences images, effectively reducing the rate and outperforming De Simone et al.’s method [10] and the original JPEG method. However, at higher rates, image recovery outcomes are more significantly affected by the filtering process, limiting the final recovery performance of panoramic images.

Additionally, the averaged

Δ_{B D}

results of 100 omnidirectional images are given in Table 5. The method has a better performance than the benchmark when

Δ_{B D}

is negative, meaning that for the same objective quality, a rate saving is reached when compressing the omnidirectional images. As illustrated in Table 5, our method achieves an average bit saving of 7.9% compared with JPEG, which is the highest among the compared techniques. This result underscores the superior efficiency of our approach in image compression.
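
For readers unfamiliar with the Bjontegaard measure, the following sketch shows the common way of computing the average rate difference: the logarithm of the rate is fitted as a cubic polynomial of the quality, both fits are integrated over the overlapping quality interval, and the mean difference is converted to a percentage. The function name and the choice of a cubic fit are assumptions based on the usual procedure of [35], not a description of the exact script used for Table 5.

```python
import numpy as np

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Average rate difference (%) of the test curve against the reference at equal quality."""
    log_ref, log_test = np.log(rate_ref), np.log(rate_test)
    # Fit log-rate as a cubic polynomial of quality for each curve.
    p_ref = np.polyfit(psnr_ref, log_ref, 3)
    p_test = np.polyfit(psnr_test, log_test, 3)
    # Integrate both fits over the overlapping quality interval.
    lo = max(np.min(psnr_ref), np.min(psnr_test))
    hi = min(np.max(psnr_ref), np.max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    # Mean log-rate difference, converted back to a percentage.
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0
```

A test curve that needs 10% less rate than the reference at every quality level yields a value of about -10, which is why negative entries indicate a rate saving.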

Table 5.

Δ_{B D}

of the average percentage of rate difference between JPEG-based omnidirectional methods and JPEG for WS-PSNR.

6.4. Results and Analysis for NNI-Based Methods

Given the absence of integer bit allocation methods designed specifically for omnidirectional images, we compare our method with those proposed by Thakur et al. [25] and Li et al. [31], whose bit allocation methods were designed for planar images. All methods are given the same bit budget and their quality is evaluated using WS-PSNR. In contrast to the method described in Section 6.3, we utilize non-uniform scalar quantization based on the Lloyd algorithm following the greedy algorithms. To compare the effects of uniform quantization and non-uniform quantization, we have also designed uniform quantizers after bit allocation. This comparative analysis is conducted both on images with dimensions of

256 \times 512

pixels and

512 \times 1024

pixels. The results are shown in Figure 6 and Figure 7, respectively. The black line and blue line represent the results obtained by generating non-uniform quantizers and uniform quantizers after NNIBA, respectively.
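
For reference, a uniform quantizer with the same resolution can be sketched as below. The design details of the uniform quantizers used in the experiments are not given, so the choices here (equal-width cells spanning the sample range, reconstruction at cell midpoints, and the name `uniform_quantize`) are assumptions for illustration only.

```python
import numpy as np

def uniform_quantize(samples, bits):
    """Quantize with 2**bits equal-width cells over the sample range, reconstructing at midpoints."""
    x = np.asarray(samples, dtype=float)
    r = 2 ** bits
    lo, hi = x.min(), x.max()
    if hi == lo:
        # Degenerate case: all samples are identical, so nothing is lost.
        return x.copy()
    step = (hi - lo) / r
    # Cell index of each sample; the maximum is folded into the last cell.
    idx = np.clip(np.floor((x - lo) / step), 0, r - 1)
    return lo + (idx + 0.5) * step
```

Unlike a Lloyd-designed codebook, the cells here ignore where the samples concentrate, so peaked distributions such as those of the AC coefficients waste levels on sparsely populated tails.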

Figure 6. Rate distortion curves for non-negative integer bit allocation (NNIBA) methods over 100 images with a resolution of

256 \times 512

pixels [25,31].

Figure 7. Rate distortion curves for non-negative integer bit allocation (NNIBA) methods over 100 images with a resolution of

512 \times 1024

pixels [25,31].

Combining the rate-distortion curves in Figure 6 and Figure 7, our method achieves an average WS-PSNR improvement of 0.35 dB compared to the method proposed by Li et al. [31], and an average improvement of 2.66 dB compared to the method proposed by Thakur et al. [25] at the same bit budget. The approach of Li et al. [31] performs comparably to our method at relatively low rates. This can be attributed to its allocation being made not in integer bits but in an integer resolution r, where

r = 2^{b}

and b is the number of allocated bits. However, integer resolution allocation consumes more time and computational resources in the greedy algorithm than our integer bit allocation method, introducing additional complexity. Compared to non-uniform quantizers, the performance of uniform quantizers is not satisfactory, which may be due to our method being designed for non-uniform quantizers. This also demonstrates the effectiveness of using non-uniform quantization in our method. In contrast to the aforementioned methods, our non-uniform approach incorporates a more sophisticated bit allocation strategy across tiles, complemented by the use of non-uniform scalar quantization techniques. Through bit allocation across latitudes, we allocate bits to the more crucial parts of the image, considering both the impact of coefficient distribution and the influence of different latitudes. Additionally, as described in Section 6.2, we have integrated empirical distributions into our framework to further optimize the bit distribution and quantization process, enhancing its overall effectiveness.
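
The idea of allocating integer bits greedily can be illustrated with a generic marginal-analysis sketch in the spirit of [30]. It is not the two-step algorithm of this paper: it runs in a single stage over all quantizers, and it replaces the empirical distortion of the Lloyd quantizers with an assumed high-resolution model, weight × variance × 2^(-2b). All names are hypothetical.

```python
import heapq
import numpy as np

def greedy_bit_allocation(variances, weights, budget):
    """Give out `budget` bits one at a time to the quantizer with the largest distortion drop."""
    var = np.asarray(variances, dtype=float)
    w = np.asarray(weights, dtype=float)
    bits = np.zeros(var.size, dtype=int)

    def distortion(i, b):
        # Assumed model of the weighted distortion of quantizer i with b bits.
        return w[i] * var[i] * 2.0 ** (-2 * b)

    # Max-heap (negated gains) of the marginal gain of the next bit per quantizer.
    heap = [(-(distortion(i, 0) - distortion(i, 1)), i) for i in range(var.size)]
    heapq.heapify(heap)
    for _ in range(budget):
        _, i = heapq.heappop(heap)
        bits[i] += 1
        gain = distortion(i, bits[i]) - distortion(i, bits[i] + 1)
        heapq.heappush(heap, (-gain, i))
    total = sum(distortion(i, bits[i]) for i in range(var.size))
    return bits, total
```

With variances 10, 3, 1 and 0.1, equal weights and a budget of 6 bits, the sketch assigns 3, 2, 1 and 0 bits. Lowering the weight of a quantizer, as happens towards the poles, shifts bits away from it even when its variance is unchanged.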

Then, we present a visual comparison among the three methods at the same rate, which is evaluated by the bit budget. Figure 8 shows the test images from the Pano3D dataset [34]. All four test images are converted to 8-bit greyscale images. Figure 9 and Figure 10 show the zoomed versions of test image A and test image B at 0.4 bpp. Figure 11 and Figure 12 present the detailed views of test image C and test image D at 0.75 bpp, respectively. Figure 9, Figure 10, Figure 11 and Figure 12 demonstrate that, in comparison to the alternative methods, the recovered images quantized by our approach exhibit enhanced details near the equator, effectively capturing high-frequency information within the image. This can be attributed to our method's allocation of more bits to the regions near the equator. In contrast to the other two methods, the zoomed versions of the test images reveal a relatively higher allocation of bits to higher-frequency coefficients, resulting in superior quantization of high-frequency coefficients.

Figure 8. Four test images from the Pano3D dataset with a resolution of

256 \times 512

. The red box is used for subsequent zoomed comparisons.

Figure 9. Zoomed version of test image A at 0.4 bpp [25,31].

Figure 10. Zoomed version of test image B at 0.4 bpp [25,31].

Figure 11. Zoomed version of test image C at 0.75 bpp [25,31].

Figure 12. Zoomed version of test image D at 0.75 bpp [25,31].

7. Conclusions

In this paper, we introduce a novel NNIBA technique, specifically designed to enhance omnidirectional compression quality via efficient quantization of transform coefficients. Initially, we derive the ORBA using the WMSE measure. This is followed by the implementation of a two-step consecutive greedy algorithm for the allocation of NNI bits to each scalar quantizer. Furthermore, with the help of experiments, it has been validated that the proposed NNIBA technique outperforms the existing sampling density and integer bit allocation methods on omnidirectional images.

Our method still has some limitations. The proposed method relies on coefficients to model the empirical distribution and generate non-uniform quantizers, so its performance is influenced by the input omnidirectional images. Furthermore, our method requires the Lloyd algorithm to iterate to convergence, which entails higher computational complexity than uniform quantization. Additionally, our greedy algorithm is not guaranteed to find the optimal solution to the combinatorial optimization problem (5).

In our future work, we plan to continue our research in three main directions. Firstly, we aim to explore the application of our bit allocation method to neural networks or adapt it to existing image/video compression frameworks for improvement. Secondly, our method is independent of the transform. We may consider utilizing other transforms such as the graph Fourier transform (GFT) to effectively exploit spherical characteristics and further enhance performance. Lastly, we will consider other methods such as simulated annealing or genetic algorithms to obtain approximate optimal solutions for bit allocation.

Author Contributions

Conceptualization, Q.S., H.F. and B.H.; methodology, Q.S. and H.F.; software, Q.S.; validation, Q.S. and H.F.; formal analysis, Q.S.; investigation, Q.S.; resources, Q.S.; data curation, Q.S.; writing—original draft preparation, Q.S.; writing—review and editing, H.F. and B.H.; visualization, Q.S.; supervision, H.F. and B.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ERP	Equirectangular projection
NNIBA	Non-negative integer bit allocation
PSNR	Peak signal-to-noise ratio
NNI	Non-negative integer
VR	Virtual reality
AR	Augmented reality
2D	Two-dimensional
QP	Quantization parameter
ORBA	Optimal real-valued bit allocation
DCT	Discrete cosine transform
RDO	Rate-distortion optimization
pdf	Probability density function
MSE	Mean squared error
FOE	First-order entropy
DC	Direct current
AC	Alternating current
GFT	Graph Fourier transform

References

1. Li, J.; He, Y.; Hu, Y.; Han, Y.; Wen, J. Learning To Compose 6-DOF Omnidirectional Videos Using Multi-Sphere Images. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 3298–3302.
2. Miura, Y.; Li, X.; Kang, S.; Sakamoto, Y. Data Hiding Technique for Omnidirectional JPEG Images Displayed on VR Spaces. In Proceedings of the 2018 International Workshop on Advanced Image Technology (IWAIT), Chiang Mai, Thailand, 7–9 January 2018; pp. 1–4.
3. Chen, J.; Yu, X. Research on Cylindrical Panoramic Video Stitching and AR Perspective Observation Algorithm. In Proceedings of the 2018 International Conference on Virtual Reality and Visualization (ICVRV), Qingdao, China, 22–24 October 2018; pp. 66–69.
4. Xu, M.; Li, C.; Zhang, S.; Callet, P.L. State-of-the-Art in 360° Video/Image Processing: Perception, Assessment and Compression. IEEE J. Sel. Top. Signal Process. 2020, 14, 5–26.
5. Li, M.; Li, J.; Gu, S.; Wu, F.; Zhang, D. End-to-End Optimized 360° Image Compression. IEEE Trans. Image Process. 2022, 31, 6267–6281.
6. Wieczorek, M.A.; Meschede, M. SHTools: Tools for working with spherical harmonics. Geochem. Geophys. Geosyst. 2018, 19, 2574–2592.
7. Liu, Y.; Liu, J.; Argyriou, A.; Ma, S.; Wang, L.; Xu, Z. 360-degree VR video watermarking based on spherical wavelet transform. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2021, 17, 1–23.
8. Tosic, I.; Frossard, P. Low Bit-Rate Compression of Omnidirectional Images. In Proceedings of the 2009 Picture Coding Symposium, Chicago, IL, USA, 6–8 May 2009; pp. 1–4.
9. Bidgoli, N.M.; Maugey, T.; Roumy, A. Intra-Coding of 360-Degree Images on the Sphere. In Proceedings of the 2019 Picture Coding Symposium (PCS), Ningbo, China, 12–15 November 2019; pp. 1–5.
10. De Simone, F.; Frossard, P.; Wilkins, P.; Birkbeck, N.; Kokaram, A. Geometry-Driven Quantization for Omnidirectional Image Coding. In Proceedings of the 2016 Picture Coding Symposium (PCS), Nuremberg, Germany, 4–7 December 2016; pp. 1–5.
11. Boyce, J.; Xu, Q. Spherical rotation orientation indication for HEVC and JEM coding of 360 degree video. In Proceedings of the Applications of Digital Image Processing XL, San Diego, CA, USA, 6–10 August 2017; Volume 10396, pp. 61–67.
12. Su, Y.C.; Grauman, K. Learning Compressible 360° Video Isomers. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7824–7833.
13. Lin, J.L.; Lee, Y.H.; Shih, C.H.; Lin, S.Y.; Lin, H.C.; Chang, S.K.; Wang, P.; Liu, L.; Ju, C.C. Efficient Projection and Coding Tools for 360° Video. IEEE J. Emerg. Sel. Top. Circuits Syst. 2019, 9, 84–97.
14. He, Y.; Xiu, X.; Hanhart, P.; Ye, Y.; Duanmu, F.; Wang, Y. Content-Adaptive 360-Degree Video Coding Using Hybrid Cubemap Projection. In Proceedings of the 2018 Picture Coding Symposium (PCS), San Francisco, CA, USA, 24–27 June 2018; pp. 313–317.
15. Chengjia, W.; Haiwu, Z.; Xiwu, S. Octagonal Mapping Scheme for Panoramic Video Encoding. IEEE Trans. Circuits Syst. Video Technol. 2018, 28, 2402–2406.
16. Sreedhar, K.K.; Aminlou, A.; Hannuksela, M.M.; Gabbouj, M. Viewport-Adaptive Encoding and Streaming of 360-Degree Video for Virtual Reality Applications. In Proceedings of the 2016 IEEE International Symposium on Multimedia (ISM), San Jose, CA, USA, 11–13 December 2016; pp. 583–586.
17. De La Fuente, Y.S.; Bhullar, G.S.; Skupin, R.; Hellge, C.; Schierl, T. Delay Impact on MPEG OMAF’s Tile-Based Viewport-Dependent 360° Video Streaming. IEEE J. Emerg. Sel. Top. Circuits Syst. 2019, 9, 18–28.
18. Sitzmann, V.; Serrano, A.; Pavel, A.; Agrawala, M.; Gutierrez, D.; Masia, B.; Wetzstein, G. Saliency in VR: How Do People Explore Virtual Environments? IEEE Trans. Vis. Comput. Graph. 2018, 24, 1633–1642.
19. Luz, G.; Ascenso, J.; Brites, C.; Pereira, F. Saliency-Driven Omnidirectional Imaging Adaptive Coding: Modeling and Assessment. In Proceedings of the 2017 IEEE 19th International Workshop on Multimedia Signal Processing (MMSP), Luton, UK, 16–18 October 2017; pp. 1–6.
20. Budagavi, M.; Furton, J.; Jin, G.; Saxena, A.; Wilkinson, J.; Dickerson, A. 360 Degrees Video Coding Using Region Adaptive Smoothing. In Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015; pp. 750–754.
21. Youvalari, R.G.; Aminlou, A.; Hannuksela, M.M. Analysis of Regional Down-Sampling Methods for Coding of Omnidirectional Video. In Proceedings of the 2016 Picture Coding Symposium (PCS), Nuremberg, Germany, 4–7 December 2016; pp. 1–5.
22. Liu, Y.; Xu, M.; Li, C.; Li, S.; Wang, Z. A novel rate control scheme for panoramic video coding. In Proceedings of the 2017 IEEE International Conference on Multimedia and Expo (ICME), Venice, Italy, 22–29 October 2017; pp. 691–696.
23. Wallace, G. The JPEG Still Picture Compression Standard. IEEE Trans. Consum. Electron. 1992, 38, xviii–xxxiv.
24. Li, Y.; Xu, J.; Chen, Z. Spherical Domain Rate-Distortion Optimization for 360-Degree Video Coding. In Proceedings of the 2017 IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, China, 10–14 July 2017; pp. 709–714.
25. Thakur, V.S.; Thakur, K.; Gupta, S.; Rao, K.R. Image-Independent Optimal Non-Negative Integer Bit Allocation Technique for the DCT-Based Image Transform Coders. IET Image Process. 2019, 14, 11–24.
26. Sun, Y.; Lu, A.; Yu, L. Weighted-to-Spherically-Uniform Quality Evaluation for Omnidirectional Video. IEEE Signal Process. Lett. 2017, 24, 1408–1412.
27. Lee, S.H.; Kim, S.T.; Yip, E.; Choi, B.D.; Song, J.; Ko, S.J. Omnidirectional video coding using latitude adaptive down-sampling and pixel rearrangement. Electron. Lett. 2017, 53, 655–657.
28. Yu, M.; Lakshman, H.; Girod, B. A framework to evaluate omnidirectional video coding schemes. In Proceedings of the 2015 IEEE International Symposium on Mixed and Augmented Reality, Fukuoka, Japan, 29 September–3 October 2015; pp. 31–36.
29. Huang, J.; Schultheiss, P. Block Quantization of Correlated Gaussian Random Variables. IEEE Trans. Commun. 1963, 11, 289–296.
30. Fox, B. Discrete Optimization Via Marginal Analysis. Manag. Sci. 1966, 13, 210–216.
31. Li, P.; Shlezinger, N.; Zhang, H.; Wang, B.; Eldar, Y.C. Graph Signal Compression by Joint Quantization and Sampling. IEEE Trans. Signal Process. 2022, 70, 4512–4527.
32. Gersho, A.; Gray, R.M. Vector Quantization and Signal Compression; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012; Volume 159.
33. Hamerly, G.; Drake, J. Accelerating Lloyd’s algorithm for k-means clustering. In Partitional Clustering Algorithms; Springer: Berlin/Heidelberg, Germany, 2015; pp. 41–78.
34. Albanis, G.; Zioulis, N.; Drakoulis, P.; Gkitsas, V.; Sterzentsenko, V.; Alvarez, F.; Zarpalas, D.; Daras, P. Pano3d: A holistic benchmark and a solid baseline for 360deg depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 3727–3737.
35. Bjontegaard, G. Calculation of Average PSNR Differences between RD-Curves. ITU SG16 Doc. VCEG-M33. 2001. Available online: https://cir.nii.ac.jp/crid/1570009749353497472 (accessed on 26 October 2023).

Figure 1. Overview of omnidirectional image compression methods.

Figure 2. Processing chain of our method. (a) A batch of original images. (b) Coefficient matrices partitioned into blocks. (c) Recovered images after quantization. (d) Distribution modeling of coefficients at different latitudes. (e) Bit allocation among tiles. (f) Subsequent bit allocation in each tile.

Table 1. Summary of the main symbols used in the document.

Symbol	Description
M	the height of image
N	the width of image $N = 2 M$
L	the block length
l	index in a block where $l = 1, \dots, L^{2}$
$X$	signal in a block
$C$	coefficient in a block
$T$	$L \times L$ DCT matrix
$c$	vectorized $C$
K	total latitudes in one omnidirectional image
k	index for latitude $k = 1, \dots, K$
$c_{k}$	vectorized $C$ at latitude k
$c_{k, l}$	l-th coefficient in $c_{k}$
$y (i, j)$	samples at position $(i, j)$ on the original image
$y^{'} (i, j)$	samples at position $(i, j)$ on the recovered image
$w (i, j)$	weight at position $(i, j)$
${MAX}_{I}$	the maximum possible intensity level of the image
$ω_{k}$	the average block weight at latitude k
$b_{k}$	the bit budget allocated to quantize $c_{k}$
$Q^{b_{k}}$	the vector quantizer with budget $b_{k}$
$b_{k, l}$	the budget assigned to the scalar quantizer at latitude k and index l
$Q^{b_{k, l}}$	the scalar quantizer with budget $b_{k, l}$
$σ_{k, l}^{2}$	the variance of random variable $c_{k, l}$
$f_{l} (c_{k, l})$	the normalized pdf of $c_{k, l}$ which has unit variance
$ρ_{k}^{2}$	the geometric mean of $σ_{k, l}^{2}$
$H_{k}$	the geometric mean of $h_{k, l}$
${\bar{b}}_{k}$	mean of bits in block with budget $b_{k}$
$D_{k}$	the WMSE distortion in one block at latitude k
D	the expected WMSE distortion in one image
$\bar{b}$	the average bit allocated to one block
G	the geometric mean of $g_{k}$
$D_{tile}$	the identical WMSE distortion produced by quantizers of different tiles
$D_{k} (b_{k})$	WMSE distortion at latitude k with bit budget $b_{k}$
$d_{l} (b_{k, l})$	WMSE distortion of the l-th scalar quantizer with bit budget $b_{k, l}$ at latitude k

Table 2. The quantization table with

\mathrm{ratio} = 0.67

used in our method at the elevation angle

\frac{π}{4}

.

24	16	15	24	36	60	76	91
18	18	21	28	39	86	89	82
21	19	24	36	60	85	103	83
21	25	33	43	76	129	119	92
27	33	55	83	101	162	153	115
36	52	82	95	121	155	168	137
73	95	116	129	153	180	179	150
107	137	141	146	167	149	153	147

Table 3. The quantization table with

\mathrm{ratio} = 0.67

used in our method at the elevation angle

\frac{π}{2}

.

17	12	11	17	26	43	54	65
13	13	15	20	28	62	64	59
15	14	17	26	43	61	74	60
15	18	23	31	54	93	85	66
19	23	39	60	73	116	110	82
26	37	59	68	86	111	121	98
52	68	83	93	110	129	128	108
77	98	101	105	119	107	110	106

Table 4. The quantization table with

\mathrm{ratio} = 1.03

used in our method at the elevation angle

\frac{π}{4}

.

16	11	10	16	23	39	49	59
12	12	14	18	25	56	58	53
14	13	16	23	39	55	67	54
14	16	21	28	49	84	78	60
17	21	36	54	66	106	100	75
23	34	53	62	78	101	109	89
47	62	76	84	100	117	116	98
70	89	92	95	109	97	100	96

Table 5.

Δ_{B D}

of the average percentage of rate difference between JPEG-based omnidirectional methods and JPEG for WS-PSNR.

Method	$Δ_{BD}$ Rate (%)
Budagavi et al. [20]	2.06
De Simone et al. [10]	−4.41
Proposed Method	−7.90

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
