1. Introduction
Omnidirectional images, alternatively referred to as panoramic, spherical, or 360° images, represent a novel multimedia format. These images are typically used for virtual reality (VR) [1, 2], augmented reality (AR) [3], and various immersive experiences [4]. The content of a 360° image lies on a sphere covering the whole viewing range. In essence, a 360° image provides a seamless view of a full sphere, allowing viewers to explore any direction as if they were positioned at the center of the captured space, unlike a traditional 2-dimensional (2D) image, which covers only a limited plane.
The omnidirectional view and the immersive experience result in large data sizes compared to standard 2D images, especially for high-resolution 360° images [4]. This large data volume can also introduce transmission latency, since the images take longer to load and process, whereas low latency is crucial in applications such as live streaming and real-time interaction in AR and VR. With the demand for high-resolution panoramic images and their rapid proliferation on the Internet, efficient compression techniques are essential for the storage and transmission of omnidirectional images [5].
There are two main approaches to compressing omnidirectional images, as shown in Figure 1. One strategy processes omnidirectional images within the spherical domain. Several techniques have been developed for representing signals on the entire sphere, including the spherical harmonic transform [6], the spherical wavelet transform [7], the Gaussian function on the sphere [8], and block spherical transforms like the graph Fourier transform [9]. Methods in the spherical domain avoid the distortion induced by projection, but they may not outperform well-established 2D image compression techniques and still require further development [10].
The 360° images can also be projected to 2D planes to leverage the existing traditional image compression standards [4]. Although 2D image compression techniques have reached a high level of maturity, these methods do not adequately accommodate the spherical nature of omnidirectional images. Therefore, various methods have been proposed to address the issues of redundancy and distortion in projected omnidirectional images, which can be categorized into three main groups: re-projection, perceptual methods, and sampling density correction.
Re-projection methods seek to leverage compression-friendly re-projection techniques. Certain approaches rotate the panoramic content within the spherical domain [11, 12]. Others develop new projection methods, including HEC [13], HCP [14], and OGM [15]. Re-projection methods allow regions demanding high visual quality to be placed in less distorted areas of the projected 2D plane before compression. However, these methods are highly content-dependent.
Perceptual compression methods aim to improve the perceptual quality of omnidirectional images within the viewport. This category can be divided into viewport-based coding, such as [16, 17], and saliency-based adaptive coding, such as [18, 19]. Perceptual compression methods can yield improved results, but they require additional data and more computational resources.
Sampling density correction methods aim to mitigate the oversampling induced by the projection, thereby enhancing the compression efficiency of panoramic images. Sampling-related methods primarily include down-sampling [20, 21] and adaptive quantization [10, 22] methods. There are also learned image compression methods that leverage sampling-related approaches [5]. Because they require complex methodologies and extensive training data, they fall outside the scope of this paper. Sampling density correction methods, which are primarily conventional 2D methods with enhancements, are relatively straightforward to understand and implement. However, down-sampling methods rest largely on intuition and have limited theoretical support. In addition, adaptive quantization methods partly rely on conventional techniques that were not originally designed for omnidirectional images.
Our paper falls within the category of sampling density correction methods and focuses on omnidirectional images under equirectangular projection (ERP), where the design of quantizers plays a pivotal role. Existing methods predominantly employ uniform quantization and rely on conventional quantization techniques such as those of JPEG [23] and the quantization parameter (QP) in HEVC [24]. However, these conventional methods, designed for 2D images and videos, may not be well-suited for omnidirectional images, whereas our method produces non-uniform scalar quantizers with assigned bits.
In this paper, we introduce a novel non-negative integer bit allocation (NNIBA) technique that achieves high-quality image quantization by combining greedy bit allocation with non-uniform quantization of the transform coefficients. Our method is compatible with any block-based transform, such as the discrete cosine transform (DCT). The process begins by performing the transform block by block and obtaining statistical parameters from a batch of images, followed by deriving the optimal real-valued bit allocation (ORBA). Owing to the characteristics of spherical image projection, we posit that coefficients at different latitudes exhibit distinct distributions. To capture this, we use the empirical distribution instead of the Gaussian distribution assumed in [25]. A two-step greedy algorithm then allocates the non-negative integer (NNI) bits. Further, we design non-uniform quantizers using the Lloyd algorithm instead of conventional quantization techniques; these quantizers perform better than the uniform quantizers employed by [20, 21]. Because of the spherical properties inherent in panoramic images, conventional 2D image quality metrics such as peak signal-to-noise ratio (PSNR) are no longer suitable for evaluating panoramic images. Consequently, for practicality and simplicity, we adopt the WS-PSNR metric and the weighted mean squared error (WMSE) distortion [26]. Our experiments show that the proposed NNIBA technique outperforms previously proposed sampling correction and bit allocation methods under WS-PSNR. The proposed method reduces redundancy in omnidirectional image compression and improves the quantization of transform coefficients, thereby improving overall compression quality and, in turn, the storage and transmission of omnidirectional images. In applications such as AR and VR, where real-time requirements are critical, it can reduce latency and provide better image quality.
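As an illustration of the non-uniform quantizer design mentioned above, the following is a minimal sketch of Lloyd's algorithm run on empirical samples. The function name `lloyd_quantizer`, the quantile-based initialization, and the stopping rule are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def lloyd_quantizer(samples, bits, iters=100, tol=1e-9):
    """Design a 2**bits-level scalar quantizer from empirical samples.

    Hypothetical helper: alternates the nearest-neighbour and centroid
    conditions of Lloyd's algorithm until the MSE stops improving.
    """
    x = np.asarray(samples, dtype=float)
    levels = 2 ** bits
    # Assumed initialization: evenly spaced empirical quantiles.
    codebook = np.quantile(x, (np.arange(levels) + 0.5) / levels)
    prev = np.inf
    mse = np.inf
    for _ in range(iters):
        # Nearest-neighbour condition: cell borders are midpoints
        # between adjacent reproduction points.
        thresholds = (codebook[:-1] + codebook[1:]) / 2
        idx = np.searchsorted(thresholds, x)
        # Centroid condition: move each point to the mean of its cell
        # (empty cells keep their previous value).
        for j in range(levels):
            cell = x[idx == j]
            if cell.size:
                codebook[j] = cell.mean()
        # MSE of the current partition with the updated centroids.
        mse = np.mean((x - codebook[idx]) ** 2)
        if prev - mse < tol:
            break
        prev = mse
    return codebook, mse
```

Because the reproduction points follow the sample density, heavy-tailed coefficient distributions receive finer steps near zero and coarser steps in the tails, which is the property a uniform quantizer lacks. The iteration to convergence is also where the extra computational cost relative to uniform quantization comes from.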
Our main contributions can be summarized in the following aspects:
- (1) To quantize transform coefficients with non-uniform scalar quantizers, we derive the theoretical ORBA and provide a detailed analysis under WMSE distortion.
- (2) Two consecutive low-complexity greedy algorithms are employed to obtain the integer bit allocation.
- (3) We empirically verify the effectiveness of our method through simulation.
The rest of the paper is organized as follows. In Section 2, we provide an overview of related work on sampling-related methods and integer bit allocation. In Section 3, we introduce the system model and formulate the optimization problem. The theoretical analysis for ORBA and the proposed greedy algorithms are presented in Section 4 and Section 5, respectively. Section 6 presents the validation experiments. Finally, Section 7 concludes the paper. The notation used in the paper is listed in Table 1 below.
2. Related Work
Sampling density correction can be divided into two main types: down-sampling and adaptive quantization. Among down-sampling methods, Budagavi et al. [20] introduced a variable smoothing approach that applies Gaussian smoothing filters to the upper and lower regions of 360° video under ERP. Similarly, Youvalari et al. [21] proposed dividing panoramic images under ERP into several down-sampled strips based on latitude. In both approaches [20, 21], down-sampling becomes progressively more aggressive towards the top and bottom borders. Additionally, Lee et al. [27] presented a scheme that employs rhombus-shaped latitude down-sampling followed by pixel rearrangement. Nevertheless, these preprocessing approaches may not have direct control over the bitrate for panoramic image coding. Consequently, compressed panoramic images may struggle to regain their original quality at higher rates.
When it comes to adaptive quantization, rate-distortion optimization (RDO) methods have been adapted for omnidirectional videos and images. Liu et al. [22] proposed optimizing panoramic video encoding by maximizing S-PSNR [28] at a given bitrate. Li et al. [24] suggested using the WS-PSNR [26] weight at the center of each block as a scaling factor for the Lagrangian multiplier in RDO. However, the RDO methods mainly relied on the HEVC technique and did not provide a formulation for integer bit allocation. Apart from RDO methods, De Simone et al. [10] proposed a method to adapt the typical quantization tables of JPEG according to frequency shift, which was easy to implement but constrained by the JPEG table. Both the RDO methods and the method proposed by De Simone et al. used quantization techniques designed for 2D images rather than for omnidirectional images.
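The latitude-dependent weighting behind WS-PSNR can be sketched as follows for a single-channel ERP image. The function names `erp_weights` and `ws_psnr` are hypothetical, and the sketch assumes a 2D grayscale array with an 8-bit peak value; it is an illustration of the commonly used cosine-of-latitude weighting, not the reference implementation.

```python
import numpy as np

def erp_weights(height, width):
    """Per-pixel weights for an ERP image of the given size.

    Row i is centred at latitude (i + 0.5 - height / 2) * pi / height;
    the cosine of that latitude is proportional to the sphere area
    covered by each pixel of the row.
    """
    rows = np.arange(height)
    w = np.cos((rows + 0.5 - height / 2) * np.pi / height)
    return np.repeat(w[:, None], width, axis=1)

def ws_psnr(ref, test, max_val=255.0):
    """Weighted PSNR of `test` against `ref` (2D arrays, same shape)."""
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    w = erp_weights(*ref.shape)
    # WMSE: squared error averaged with the ERP weights.
    wmse = np.sum(w * (ref - test) ** 2) / np.sum(w)
    return np.inf if wmse == 0 else 10 * np.log10(max_val ** 2 / wmse)
```

Under this weighting, an error near the poles lowers the score less than the same error at the equator, which is why block-level weights can be used to steer bits towards low latitudes.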
Integer bit allocation is a pivotal technique in image transform coding, where it governs the quantization of transform coefficients by distributing the available bits among them. Huang et al. [29] used high-resolution quantization approximations to solve the ORBA problem for transform coders under mean squared error (MSE). However, a practical bit allocation for quantizing transform coefficients has to take NNI values. Fox [30] improved the ORBA algorithm and proposed an NNIBA algorithm under MSE. Bit allocation approaches have been widely used in image processing. Thakur et al. [25] proposed a greedy algorithm under the MSE measure and obtained a lookup table for the quantization table elements via non-linear regression analysis, thereby reducing the required side information. Li et al. [31] studied the joint design of graph signal sampling and quantization for graph signal compression in a task-based quantization manner. They proposed a joint design of the sampling and recovery mechanisms for a fixed quantization mapping and presented an iterative algorithm for dividing the available bit budget. Nevertheless, none of these methods was designed for omnidirectional images. Moreover, neither Thakur et al. nor Li et al. took the distribution of coefficients into consideration, so their methods cannot be directly applied to 360° images.
4. Theoretical Analysis
The optimization problem (5) is NP-hard with integer constraints, making it challenging to solve directly. In this section, we present the ORBA theorem when NNI constraints are relaxed under the WMSE measure. To start with, we focus on the bit allocation for
scalar quantizers in a block at latitude
k with bit quota
, where the coefficients to be quantized are denoted by
, where
. When the high-resolution assumptions are retained and
are non-identically distributed, the MSE distortion of the
l-th scalar quantizer is given by the following formula [32],
where
is the variance of random variable
,
, and
is the normalized probability density function (pdf) of
with unit variance. In other words, the optimization problem can be formulated as
and we can derive a formal expression for bit allocation without NNI constraints, as stated in Lemma 1.
Lemma 1. When are not constrained to be NNI and the high-resolution assumption holds, the bit allocation to minimize MSE per block given quota can be formulated as
where , , and . The MSE distortion per block is
Remark 1. When the minimum expected MSE distortion in Lemma 1 is achieved, quantizers produce identical distortion, which is .
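For orientation, the classical high-resolution bit allocation result that a lemma of this form builds on can be written out as below. The symbols $L$ (quantizers per block), $b_l$ (bits of the $l$-th quantizer), $B$ (block quota), $\sigma_l^2$, $h_l$, $H$, $\rho^2$, and $f_l$ (unit-variance pdf) are notation assumed for this sketch and may differ from the paper's own.

```latex
% Assumed notation; a sketch of the standard high-resolution result.
\begin{align}
  D_l &= h_l \, \sigma_l^2 \, 2^{-2 b_l},
  \qquad
  h_l = \frac{1}{12} \left( \int_{-\infty}^{\infty} f_l(x)^{1/3} \, dx \right)^{3}, \\
  &\min_{b_1, \dots, b_L} \; \sum_{l=1}^{L} D_l
  \quad \text{subject to} \quad \sum_{l=1}^{L} b_l = B, \\
  b_l &= \frac{B}{L} + \frac{1}{2} \log_2 \frac{h_l \, \sigma_l^2}{H \rho^2},
  \qquad
  H = \Big( \prod_{l=1}^{L} h_l \Big)^{1/L},
  \quad
  \rho^2 = \Big( \prod_{l=1}^{L} \sigma_l^2 \Big)^{1/L}, \\
  D_l &= H \rho^2 \, 2^{-2B/L} \;\; \text{for every } l,
  \qquad
  \sum_{l=1}^{L} D_l = L \, H \rho^2 \, 2^{-2B/L}.
\end{align}
```

Substituting the allocation in the third line into the first line cancels the factor $h_l \sigma_l^2$, which is why every quantizer ends up with the same distortion at the optimum.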
Lemma 1 presents the minimum overall distortion (9) under MSE distortion. To derive the overall WMSE distortion, we use
to represent the average weight of the block at latitude
k. Thus, the overall WMSE distortion in the block with ERP weight
is
where we let
.
From Lemma 1, we know that when the bits allocated to one block are given, the optimal bit allocation among scalar quantizers is decided by the and , which is related to the distributions of the transform coefficients. Meanwhile, the overall WMSE distortion in a block is related to the weight , distributions, and . As emphasized in Remark 1, each scalar quantizer produces equivalent distortion when the minimum distortion is achieved. Subsequently, we need to consider how we allocate bits to K tiles.
Utilizing Lemma 1 in conjunction with (10), we proceed to readdress our relaxed optimization problem, which now can be formulated as
Subsequently, we present the solution along with the minimum WMSE distortion to the optimization problem (11) in Theorem 1.
Theorem 1. When the assumption of Lemma 1 holds, the expected WMSE distortion in one image is
The bit allocated to one block at latitude k is
where , .
Proof. The arithmetic-geometric mean inequality states that for any positive numbers
, it satisfies that
with equality if the
are all equal.
Apply this inequality to (11), considering
, and then we get
Equality holds if, for each
k,
so that
, which is (13). Considering there are
blocks in a tile, the expected overall distortion in one image is (12). □
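The inequality argument used in the proof can be illustrated with a simplified worked version. It assumes one block per latitude, writes $c_k > 0$ for the product of the latitude weight and the block statistics at latitude $k$, $B_k$ for the real-valued bits given to that block, $L$ for the number of quantizers per block, and $B_{\mathrm{tot}}$ for the total budget over $K$ latitudes. All of these symbols are assumptions of this sketch rather than the paper's notation.

```latex
% Simplified sketch under assumed notation: one block per latitude.
\begin{align}
  \sum_{k=1}^{K} c_k \, 2^{-2 B_k / L}
  &\ge K \Big( \prod_{k=1}^{K} c_k \, 2^{-2 B_k / L} \Big)^{1/K}
   = K \Big( \prod_{k=1}^{K} c_k \Big)^{1/K} 2^{-2 B_{\mathrm{tot}} / (K L)},
  \qquad \sum_{k=1}^{K} B_k = B_{\mathrm{tot}}, \\
  \text{with equality iff} \quad
  c_k \, 2^{-2 B_k / L} &= \Big( \prod_{j=1}^{K} c_j \Big)^{1/K} 2^{-2 B_{\mathrm{tot}} / (K L)}
  \quad \text{for all } k, \\
  \text{that is,} \quad
  B_k &= \frac{B_{\mathrm{tot}}}{K}
       + \frac{L}{2} \log_2 \frac{c_k}{\big( \prod_{j=1}^{K} c_j \big)^{1/K}}.
\end{align}
```

The right-hand side of the first line does not depend on how the budget is split, so it is a lower bound attained exactly when all weighted distortions are equal. The resulting $B_k$ is real-valued and can be negative when $c_k$ is small, which is why a separate step is needed to obtain NNI allocations.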
We establish Theorem 1 by leveraging the WMSE distortion of the block without NNI constraints, and obtain the bits to be allocated to latitude
k. Once
is determined, the computation of
according to (8) is straightforward. Building upon Theorem 1, we derive Corollary 1.
Corollary 1. When the minimum expected WMSE distortion in Theorem 1 is achieved, quantizers of different tiles produce identical distortion, which is .
Corollary 1 provides the theoretical underpinning of our greedy algorithm: because the optimum equalizes distortion across tiles, the algorithm consistently assigns bits to the tile incurring the highest distortion.
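The greedy principle stated above can be sketched in a few lines. This is a generic illustration under the high-resolution model, in which one extra bit divides a component's distortion by four; the function name `greedy_bit_allocation` and the `scales` input (weighted distortion of each component at zero bits) are hypothetical, and the sketch is not the paper's two-step algorithm.

```python
import heapq

def greedy_bit_allocation(scales, total_bits):
    """Assign integer bits one at a time to the worst component.

    Assumed distortion model: D_k(b) = scales[k] * 2 ** (-2 * b).
    Returns a list of non-negative integers summing to total_bits.
    """
    bits = [0] * len(scales)
    # heapq is a min-heap, so store negated distortions to pop the largest.
    heap = [(-c, k) for k, c in enumerate(scales)]
    heapq.heapify(heap)
    for _ in range(total_bits):
        neg_d, k = heapq.heappop(heap)
        bits[k] += 1
        # One more bit divides the distortion of component k by 4.
        heapq.heappush(heap, (neg_d / 4.0, k))
    return bits

# Example: three components whose zero-bit distortions are 16, 1, and 4.
allocation = greedy_bit_allocation([16.0, 1.0, 4.0], 3)
```

In this example the three bits go to the components in a way that brings all distortions to the same value, mirroring the equal-distortion property of the real-valued optimum while keeping every allocation a non-negative integer.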
7. Conclusions
In this paper, we introduce a novel NNIBA technique designed to enhance omnidirectional image compression quality via efficient quantization of transform coefficients. We first derive the ORBA under the WMSE measure. We then apply a two-step consecutive greedy algorithm to allocate NNI bits to each scalar quantizer. Our experiments show that the proposed NNIBA technique outperforms existing sampling density correction and integer bit allocation methods on omnidirectional images.
Our method still has some limitations. It relies on the transform coefficients to model the empirical distribution and generate non-uniform quantizers, so its performance is influenced by the input omnidirectional images. Furthermore, it requires the Lloyd algorithm to iterate to convergence, which entails higher computational complexity than uniform quantization. Additionally, our greedy algorithm is not guaranteed to find the optimal solution to the combinatorial optimization problem (5).
In future work, we plan to continue our research in three main directions. First, we aim to explore applying our bit allocation method to neural networks or adapting it to existing image/video compression frameworks. Second, because our method is independent of the transform, we may consider other transforms such as the graph Fourier transform (GFT) to exploit spherical characteristics and further enhance performance. Lastly, we will consider other methods such as simulated annealing or genetic algorithms to obtain near-optimal solutions for bit allocation.