1. Introduction
There are two approaches to lossless image compression. (These two approaches are detailed in Section 1 of our previous study [1].) Most previous studies (e.g., [2,3,4]) adopted an approach in which they constructed a preprocessing function $f$ that outputs a code length assignment vector $\boldsymbol{l}$ from the past pixel values $x^{t-1}$. The vector $\boldsymbol{l}$ determines the code length of the next pixel value $x_t$ or, typically, of a value $y_t$ equivalent to $x_t$ in the sense that there exists a one-to-one mapping $g$ computable by both the encoder and the decoder. Then, $y_t$ and $\boldsymbol{l}$ are passed to a subsequent entropy coding process, such as [5,6]. In this approach, the elements $l_i$ of the code length assignment vector $\boldsymbol{l}$ satisfy $\sum_i 2^{-l_i} = 1$. Therefore, $\boldsymbol{l}$ superficially resembles a probability distribution. However, it does not directly govern the stochastic generation of the original pixel value $x_t$. Hence, we cannot define the entropy of the source of the pixel value $x_t$, and we cannot discuss the theoretical optimality of the preprocessing function $f$ and the one-to-one mapping $g$.
In contrast, we adopted an approach in which we estimated a stochastic generative model $p(x_t \mid x^{t-1}, \boldsymbol{\theta}, m)$ with an unknown parameter $\boldsymbol{\theta}$ and a model variable $m$, which is directly and explicitly assumed on the original pixel value $x_t$ [1,7,8,9]. Therefore, we can discuss the theoretical optimality of the entire algorithm with respect to the entropy defined from the assumed stochastic model. In particular, we can achieve the theoretically optimal coding under the Bayes criterion in statistical decision theory (see, e.g., [10]) by assuming prior distributions $p(m)$ and $p(\boldsymbol{\theta} \mid m)$ on the model variable $m$ and the unknown parameter $\boldsymbol{\theta}$. Such codes are known as Bayes codes [11] in information theory. It is known that the Bayes code asymptotically achieves the entropy of the true stochastic model and that its convergence speed achieves the theoretical limit [12]. Bayes codes have also shown remarkable performance in text compression (e.g., [13]). Therefore, we adopt this approach.
We assume herein that the target image has non-stationarity; that is, the properties of the pixel values differ among positions in the image. For such images, researchers following the former approach have performed quadtree block segmentation as a component of the preprocessing function $f$ and the one-to-one mapping $g$, and its practical efficiency has been reported in many previous studies (e.g., [4,14]). Following the latter approach, we proposed a stochastic generative model that contains a quadtree as the model variable $m$. By assuming a prior distribution $p(m)$ on it, we derived the optimal code under the Bayes criterion, and we constructed a polynomial-order algorithm to calculate it without loss of optimality [1]. However, in all these studies [1,4,14], the class of quadtrees is restricted to that of proper trees, whose inner nodes have exactly four children.
In this paper, we propose a stochastic generative model based on an improper quadtree $m$, in which an inner node may have fewer than four children, and derive the code optimal under the Bayes criterion. In general, codes optimal under the Bayes criterion require a summation whose cost grows exponentially with the data length. However, we herein construct an algorithm that requires only polynomial-order computation without losing optimality by applying a theory of probability distributions on general rooted trees [15] to the improper quadtree representing the block segmentation.
2. Proposed Stochastic Generative Model
Let $\mathcal{X}$ denote the set of possible values of a pixel. For example, we have $\mathcal{X} = \{0, 1\}$ for binary images and $\mathcal{X} = \{0, 1, \dots, 255\}$ for grayscale images. Let $h$ and $w$ denote the height and the width of an image, respectively. Although our model is able to represent any rectangular image, we assume that $h = w = 2^{d_{\max}}$ for $d_{\max} \in \mathbb{N}$ in the following for simplicity of notation. Then, let $X_t$ denote the random variable of the $t$-th pixel value in order of the raster scan, and let $x_t \in \mathcal{X}$ denote its realization. Note that $x_t$ is at the $i$-th row and $j$-th column, where $t$ divided by $w$ is $i$ with a remainder of $j$. In addition, let $x^t$ denote the sequence of pixel values $x_0, x_1, \dots, x_t$. Note that all the indices start from zero herein.
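For instance, the raster-scan index $t$ and the row–column pair $(i, j)$ convert into each other as in the following sketch (Python is our choice for illustration; the paper only fixes the mathematical convention):

```python
def raster_to_rowcol(t: int, w: int) -> tuple:
    """Return (row, col) of the t-th raster-scan pixel in a width-w image."""
    return divmod(t, w)  # t = i * w + j with 0 <= j < w

def rowcol_to_raster(i: int, j: int, w: int) -> int:
    """Inverse conversion: raster-scan index of the pixel at row i, column j."""
    return i * w + j

assert raster_to_rowcol(rowcol_to_raster(2, 3, w=8), w=8) == (2, 3)
```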
We assume that $x_t$ is generated from a probability distribution $p(x_t \mid x^{t-1}, \boldsymbol{\theta}, m)$ depending on an unknown model $m$ and unknown parameters $\boldsymbol{\theta}$. (For $t = 0$, we assume $x_0$ follows $p(x_0 \mid \boldsymbol{\theta}, m)$.) We define $m$ and $\boldsymbol{\theta}$ in the following.
Definition 1 ([1]). For a depth $d \in \{0, 1, \dots, d_{\max}\}$ and a corner $(i, j)$ with $i = a \, 2^{d_{\max}-d}$ and $j = b \, 2^{d_{\max}-d}$ for $a, b \in \{0, 1, \dots, 2^d - 1\}$, let $s$ denote the following index set called a “block”:
$$s := \{ (i', j') \mid i \le i' < i + 2^{d_{\max}-d}, \; j \le j' < j + 2^{d_{\max}-d} \}.$$
In addition, let $s_\lambda$ be the set of whole indices $\{0, \dots, h-1\} \times \{0, \dots, w-1\}$. Then, let $\mathcal{S}$ denote the set that consists of all the above index sets.

Example 1 ([1]). For $h = w = 4$, the block with depth $d = 1$ and upper-left corner $(0, 2)$ is $\{(0,2), (0,3), (1,2), (1,3)\}$. Therefore, it represents the indices of the upper-right region. In a similar manner, the block with depth $1$ and upper-left corner $(2, 0)$ represents the lower-left region. It should be noted that the cardinality $|s|$ of each $s \in \mathcal{S}$ represents the number of pixels in the block.

Definition 2. We define the model $m$ as a quadtree whose nodes are elements of $\mathcal{S}$ and whose root is $s_\lambda$. Let $\mathcal{M}$ denote the set of the models. Let $\mathcal{T}(m)$, $\mathcal{L}(m)$, and $\mathcal{I}(m)$ denote the set of the nodes, the leaf nodes, and the inner nodes of $m$, respectively. Let $\mathcal{L}'(m)$ denote the set of nodes that have fewer than four children. Then, $m$ corresponds to a pattern of variable block size segmentation, as shown in Figure 1.

Definition 3. Each node $s \in \mathcal{L}'(m)$ of the model $m$ has a parameter $\theta_s$ whose parameter space is $\Theta$. We define $\boldsymbol{\theta}$ as the tuple of parameters $(\theta_s)_{s \in \mathcal{L}'(m)}$, and we let $\Theta(m)$ denote its space.
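The block set $\mathcal{S}$ is finite and easy to enumerate. The following sketch (ours, not code from the paper) lists every block for $h = w = 2^{d_{\max}}$, representing a block by its depth and upper-left corner:

```python
def enumerate_blocks(d_max: int) -> list:
    """All blocks of S for an image of side 2**d_max.

    Each block is a (depth, top, left) triple; its side length is 2**(d_max - depth).
    """
    blocks = []
    for d in range(d_max + 1):
        side = 2 ** (d_max - d)
        for a in range(2 ** d):
            for b in range(2 ** d):
                blocks.append((d, a * side, b * side))
    return blocks

# For h = w = 4 (d_max = 2): 1 root + 4 quarters + 16 single-pixel blocks = 21.
assert len(enumerate_blocks(2)) == 21
```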
Notably, compared with an equivalent model represented by a proper tree with added dummy child nodes, we can reduce the number of parameters. See the following example.
Example 2. For $h = w = 2$, consider the model represented by the left-hand side image in Figure 2. It has three parameters: $\theta_{s_\lambda}$ of the root and the parameters of its two child nodes. An equivalent model can be represented by the proper quadtree shown on the right-hand side of Figure 2 if the parameters of the two added dummy children happen to coincide by chance. However, it requires four parameters, one for each of the four leaves. Therefore, it causes inefficient learning.

Under the model $m$ and the parameters $\boldsymbol{\theta}$, we assume that the $t$-th pixel value is generated as follows.
Assumption A1. We assume that
$$p(x_t \mid x^{t-1}, \boldsymbol{\theta}, m) = p(x_t \mid x^{t-1}, \theta_s),$$
where $s$ is the minimal block that satisfies $(i, j) \in s$ for the position $(i, j)$ of $x_t$ (in other words, $s$ is the deepest node of $m$ that contains $x_t$). For $t = 0$, we assume a similar condition $p(x_0 \mid \boldsymbol{\theta}, m) = p(x_0 \mid \theta_s)$. Thus, the pixel value $x_t$ given the past sequence $x^{t-1}$ depends only on the parameter $\theta_s$ of the minimal block $s$ that contains it. Note that we do not assume a specific form of $p(x_t \mid x^{t-1}, \theta_s)$ at this point. For example, we can assume the Bernoulli distribution for $\mathcal{X} = \{0, 1\}$ and also the Gaussian distribution (with an appropriate normalization and quantization) for $\mathcal{X} = \{0, 1, \dots, 255\}$.
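To illustrate Assumption A1, the following sketch (ours, using the same hypothetical (depth, top, left) block encoding as above) finds the deepest node of a quadtree $m$ that contains a given pixel, where $m$ is stored as the set of its nodes:

```python
def minimal_block(m: set, d_max: int, i: int, j: int) -> tuple:
    """Deepest node of the quadtree m containing pixel (i, j).

    m is a set of (depth, top, left) triples that contains the root (0, 0, 0)
    and in which every node's parent is also present.
    """
    best = (0, 0, 0)  # the root block contains every pixel
    for d in range(1, d_max + 1):
        side = 2 ** (d_max - d)
        node = (d, (i // side) * side, (j // side) * side)
        if node not in m:
            break  # the child covering (i, j) is absent: its parent is minimal
        best = node
    return best

# Root divided only into its upper-left quarter (an improper division):
m = {(0, 0, 0), (1, 0, 0)}
assert minimal_block(m, d_max=2, i=0, j=0) == (1, 0, 0)  # inside the quarter
assert minimal_block(m, d_max=2, i=3, j=3) == (0, 0, 0)  # uncovered: root is minimal
```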
3. The Bayes Code for Proposed Model
Since the true $m$ and $\boldsymbol{\theta}$ are unknown, we assume prior distributions $p(m)$ and $p(\boldsymbol{\theta} \mid m)$. Then, we estimate the true generative probability $p(x_t \mid x^{t-1}, \boldsymbol{\theta}, m)$ by the coding probability $q(x_t \mid x^{t-1})$ under the Bayes criterion in statistical decision theory (see, e.g., [10]). Subsequently, we use $q(x_t \mid x^{t-1})$ as the coding probability of an entropy code such as [16]. Such a code is known as a Bayes code [11] in information theory. The expected code length of the Bayes code converges to the entropy of the true stochastic model for sufficiently large data length, and its convergence speed achieves the theoretical limit [12]. The Bayes code has shown remarkable performance in text compression (e.g., [13]).

The optimal coding probability of the Bayes code for the proposed model is derived as follows, according to the general formula in [11].
Proposition 1. The optimal coding probability under the Bayes criterion is given by
$$q^*(x_t \mid x^{t-1}) = \sum_{m \in \mathcal{M}} p(m \mid x^{t-1}) \int_{\Theta(m)} p(x_t \mid x^{t-1}, \boldsymbol{\theta}, m) \, p(\boldsymbol{\theta} \mid x^{t-1}, m) \, \mathrm{d}\boldsymbol{\theta}. \quad (4)$$
We call $q^*(x_t \mid x^{t-1})$ the Bayes-optimal coding probability.

Proposition 1 implies that we should use the coding probability that is a weighted mixture of $p(x_t \mid x^{t-1}, \boldsymbol{\theta}, m)$ over every block segmentation pattern $m$ and parameters $\boldsymbol{\theta}$ according to the posteriors $p(m \mid x^{t-1})$ and $p(\boldsymbol{\theta} \mid x^{t-1}, m)$. (For $t = 0$, $p(x_0 \mid \boldsymbol{\theta}, m)$ is mixed with weights according to the priors $p(m)$ and $p(\boldsymbol{\theta} \mid m)$, which corresponds to the initialization of the algorithm.) Notably, $\mathcal{M}$ is generalized to the set of improper quadtrees from the set of proper quadtrees, although (4) has a similar form to Formula (5) in [1].
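Computed naively, (4) is a posterior-weighted average over an explicit list of candidate models. The following sketch (ours, with hypothetical callables supplied by the caller) makes that structure concrete and shows why enumerating $\mathcal{M}$ is the bottleneck that Section 4 removes:

```python
import math

def bayes_mixture(x_next, history, models, log_post, predictive):
    """Naive evaluation of the Bayes-optimal coding probability (4).

    models    : explicit list of candidate quadtrees m (exponentially many in general)
    log_post  : m -> log p(m | history), the model posterior
    predictive: (x, history, m) -> per-model predictive probability, i.e., the
                integral over theta (closed form for conjugate priors)
    """
    return sum(math.exp(log_post(m)) * predictive(x_next, history, m)
               for m in models)

# Toy demo with two hand-set "models":
post = {"m_coarse": math.log(0.3), "m_fine": math.log(0.7)}
pred = {"m_coarse": 0.5, "m_fine": 0.9}
p = bayes_mixture(1, None, list(post), post.__getitem__, lambda x, h, m: pred[m])
assert abs(p - (0.3 * 0.5 + 0.7 * 0.9)) < 1e-12
```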
4. Polynomial Order Algorithm to Calculate Bayes-Optimal Coding Probability
Unfortunately, the Bayes-optimal coding probability (4) contains a computationally hard calculation. (Herein, we assume that the integral with respect to $\boldsymbol{\theta}$ is feasible. Examples of feasible settings will be described in the next section.) The cost of the summation over $m$ increases exponentially with respect to the number of pixels. Therefore, we propose a polynomial-order algorithm to calculate (4) without loss of optimality by applying a theory of probability distributions on general rooted trees [15] to the improper quadtree $m$. In this section, we focus on the procedure of the constructed algorithm. Its validity is described in Appendix A.
Definition 4. Let $\mathrm{Ch}(s)$ be the set of the child node candidates of $s$, i.e., its four quarters in $\mathcal{S}$. We define a vector representing the block division pattern of $s$ in $m$ as $\boldsymbol{k}^m_s := (\mathbb{1}\{ s_{\mathrm{c}} \in \mathcal{T}(m) \})_{s_{\mathrm{c}} \in \mathrm{Ch}(s)} \in \{0, 1\}^4$, where $\mathbb{1}\{\cdot\}$ denotes the indicator function. Examples of $\boldsymbol{k}^m_s$ are shown in Figure 3. For leaf nodes, $\boldsymbol{k}^m_s = (0, 0, 0, 0)$.
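Under the same hypothetical (depth, top, left) block encoding as in the earlier sketches, the division pattern vector $\boldsymbol{k}^m_s$ of Definition 4 can be read off as follows (our illustration):

```python
def division_pattern(m: set, node: tuple, d_max: int) -> tuple:
    """Division pattern k_s of a block in the quadtree m.

    Children are ordered (upper-left, upper-right, lower-left, lower-right);
    component c is 1 iff that child block is a node of m.
    """
    d, top, left = node
    half = 2 ** (d_max - d - 1)  # side length of a child block
    children = [(d + 1, top, left), (d + 1, top, left + half),
                (d + 1, top + half, left), (d + 1, top + half, left + half)]
    return tuple(int(c in m) for c in children)

m = {(0, 0, 0), (1, 0, 0)}          # root divided only into its upper-left quarter
assert division_pattern(m, (0, 0, 0), d_max=2) == (1, 0, 0, 0)
assert division_pattern(m, (1, 0, 0), d_max=2) == (0, 0, 0, 0)  # leaf
```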
First, we assume the following prior distributions as $p(m)$ and $p(\boldsymbol{\theta} \mid m)$.

Assumption A2. Let $g_{s, \boldsymbol{k}} \in [0, 1]$ be a given hyperparameter of a block $s \in \mathcal{S}$ and a division pattern $\boldsymbol{k} \in \{0, 1\}^4$, which satisfies $\sum_{\boldsymbol{k} \in \{0, 1\}^4} g_{s, \boldsymbol{k}} = 1$. Then, we assume that the prior on $m$ is represented as follows:
$$p(m) = \prod_{s \in \mathcal{T}(m)} g_{s, \boldsymbol{k}^m_s}, \quad (5)$$
where $g_{s, (0,0,0,0)} = 1$ for $s$ whose cardinality is equal to 1.

Intuitively, $g_{s, \boldsymbol{k}}$ represents the conditional probability that $s$ has the block division pattern $\boldsymbol{k}$ under the condition that $s \in \mathcal{T}(m)$. The above prior actually satisfies the condition $\sum_{m \in \mathcal{M}} p(m) = 1$. Although this is proved for any rooted tree in [15], we briefly describe a proof restricted to our model in Appendix A to make this paper self-contained. Note that the above assumption does not restrict the expressive capability of the general prior, in the sense that every model $m$ can still be assigned a non-zero probability $p(m)$.
Assumption A3. For each model $m$, we assume that
$$p(\boldsymbol{\theta} \mid m) = \prod_{s \in \mathcal{L}'(m)} p(\theta_s).$$
Moreover, for any $m$, $m'$, and $s \in \mathcal{L}'(m) \cap \mathcal{L}'(m')$, we assume that $p(\theta_s)$ is identical under $m$ and $m'$. Therefore, each element $\theta_s$ of the parameters depends only on $s$, and it is independent of both the other elements and the model $m$.
From Assumptions A1 and A3, the following lemma holds.
Lemma 1. For any $m, m' \in \mathcal{M}$, let $s$ and $s'$ denote the minimal nodes that satisfy $(i, j) \in s \in \mathcal{T}(m)$ and $(i, j) \in s' \in \mathcal{T}(m')$, respectively, where $(i, j)$ is the position of $x_t$. If $s = s'$ and $\boldsymbol{k}^m_s = \boldsymbol{k}^{m'}_{s'}$, that is, they are the same block and their division patterns are also the same, then
$$\int p(x_t \mid x^{t-1}, \boldsymbol{\theta}, m) \, p(\boldsymbol{\theta} \mid x^{t-1}, m) \, \mathrm{d}\boldsymbol{\theta} = \int p(x_t \mid x^{t-1}, \boldsymbol{\theta}', m') \, p(\boldsymbol{\theta}' \mid x^{t-1}, m') \, \mathrm{d}\boldsymbol{\theta}'.$$
Hence, we represent this quantity by $q(x_t \mid x^{t-1}, s, \boldsymbol{k}_s)$ because it does not depend on $m$ itself but only on $s$ and $\boldsymbol{k}_s$. For $t = 0$, let $q(x_0 \mid s, \boldsymbol{k}_s)$ denote the corresponding quantity defined with the priors. Lemma 1 means that the optimal coding probability for $x_t$ depends only on the minimal block $s$ that contains it and its division pattern $\boldsymbol{k}_s$. Therefore, it could be calculated as if only $s$ and $\boldsymbol{k}_s$ were known.
At last, the Bayes-optimal coding probability $q^*(x_t \mid x^{t-1})$ can be calculated by a recursive function for the nodes on a path of the perfect quadtree on $\mathcal{S}$. The definition of the path is the same as that in [1].

Definition 5 ([1]). Let $s_0, s_1, \dots, s_{d_{\max}}$ denote the set of nodes that contain the position of $x_t$. They construct a path from the leaf node $s_{d_{\max}}$ to the root node $s_0 = s_\lambda$ on the perfect quadtree of depth $d_{\max}$ on $\mathcal{S}$, as shown in Figure 4. In addition, let $s_{\mathrm{ch}}$ denote the child node of $s$ on that path.

Definition 6. We define the following recursive function for the nodes $s$ on the path:
$$\tilde{q}(x_t \mid x^{t-1}, s) := \begin{cases} q(x_t \mid x^{t-1}, s, \boldsymbol{0}), & |s| = 1, \\ \sum_{\boldsymbol{k} : k_{\mathrm{ch}} = 0} g^{t-1}_{s, \boldsymbol{k}} \, q(x_t \mid x^{t-1}, s, \boldsymbol{k}) + \Bigl( \sum_{\boldsymbol{k} : k_{\mathrm{ch}} = 1} g^{t-1}_{s, \boldsymbol{k}} \Bigr) \tilde{q}(x_t \mid x^{t-1}, s_{\mathrm{ch}}), & \text{otherwise}, \end{cases}$$
where $k_{\mathrm{ch}}$ denotes the component of $\boldsymbol{k}$ corresponding to $s_{\mathrm{ch}}$, $g^{-1}_{s, \boldsymbol{k}} := g_{s, \boldsymbol{k}}$, and the weight $g^{t}_{s, \boldsymbol{k}}$ is also recursively updated for the nodes $s$ on the path as follows:
$$g^{t}_{s, \boldsymbol{k}} := \begin{cases} g^{t-1}_{s, \boldsymbol{k}} \, q(x_t \mid x^{t-1}, s, \boldsymbol{k}) / \tilde{q}(x_t \mid x^{t-1}, s), & k_{\mathrm{ch}} = 0, \\ g^{t-1}_{s, \boldsymbol{k}} \, \tilde{q}(x_t \mid x^{t-1}, s_{\mathrm{ch}}) / \tilde{q}(x_t \mid x^{t-1}, s), & k_{\mathrm{ch}} = 1, \end{cases}$$
with $g^{t}_{s, \boldsymbol{k}} := g^{t-1}_{s, \boldsymbol{k}}$ for the nodes not on the path. Consequently, the following theorem holds.
Theorem 1. The Bayes-optimal coding probability for the proposed model is calculated by
$$q^*(x_t \mid x^{t-1}) = \tilde{q}(x_t \mid x^{t-1}, s_\lambda).$$
Although Theorem 1 is proved by applying Corollary 2 of Theorem 7 in [15], we briefly describe a proof restricted to our model in Appendix A to make this paper self-contained. Theorem 1 means that the summation with respect to $m \in \mathcal{M}$ in (4) can be replaced by a summation with respect to the nodes on the path and the division patterns $\boldsymbol{k}$, which costs only a polynomial order of the image size. The proposed algorithm recursively calculates a weighted mixture of the coding probability $q(x_t \mid x^{t-1}, s, \boldsymbol{k})$ for the case where the block $s$ is not divided at $s_{\mathrm{ch}}$ (i.e., $k_{\mathrm{ch}} = 0$) and the coding probability $\tilde{q}(x_t \mid x^{t-1}, s_{\mathrm{ch}})$ for the case where the block $s$ is divided at $s_{\mathrm{ch}}$ (i.e., $k_{\mathrm{ch}} = 1$).
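To make the procedure concrete, the following is a minimal sketch of the whole sequential coder for binary images under the Beta–Bernoulli setting used in Section 5, written by us from Definitions 5 and 6; the state layout (per-quadrant counts and per-node pattern weights), the class name, and all method names are our own illustrative choices, not the paper's.

```python
import itertools
import math

PATTERNS = list(itertools.product((0, 1), repeat=4))  # the 16 division patterns

class ImproperQuadtreeCoder:
    """Sequential Bayes coder for binary images (Beta-Bernoulli within blocks).

    Blocks are (depth, top, left) triples; the quadrants of a block are ordered
    (upper-left, upper-right, lower-left, lower-right).
    """

    def __init__(self, d_max, alpha=0.5, beta=0.5):
        self.d_max, self.alpha, self.beta = d_max, alpha, beta
        self.counts = {}   # (block, quadrant) -> [n0, n1] counts of past pixels there
        self.weights = {}  # block -> posterior over the 16 patterns (prior: uniform)

    def _pred(self, s, k, x):
        # Predictive q(x | past, s, k): theta_s governs the quadrants c with k[c] == 0.
        n0 = sum(self.counts.get((s, c), (0, 0))[0] for c in range(4) if k[c] == 0)
        n1 = sum(self.counts.get((s, c), (0, 0))[1] for c in range(4) if k[c] == 0)
        p1 = (n1 + self.alpha) / (n0 + n1 + self.alpha + self.beta)
        return p1 if x else 1.0 - p1

    def update(self, i, j, x):
        # Path of Definition 5 and the child quadrant containing (i, j) at each depth.
        path, quads = [], []
        for d in range(self.d_max + 1):
            side = 2 ** (self.d_max - d)
            top, left = (i // side) * side, (j // side) * side
            path.append((d, top, left))
            if d < self.d_max:
                half = side // 2
                quads.append(2 * ((i - top) // half) + (j - left) // half)
        # Recursion of Definition 6, from the single-pixel block up to the root.
        tilde = self._pred(path[-1], (0, 0, 0, 0), x)
        per_node = []
        for d in range(self.d_max - 1, -1, -1):
            s, ch = path[d], quads[d]
            g = self.weights.setdefault(s, [1.0 / 16] * 16)
            fac = [self._pred(s, k, x) if k[ch] == 0 else tilde for k in PATTERNS]
            tilde = sum(gk * fk for gk, fk in zip(g, fac))
            per_node.append((s, fac, tilde))
        # Posterior updates of the pattern weights and the Beta counts.
        for s, fac, t in per_node:
            self.weights[s] = [gk * fk / t for gk, fk in zip(self.weights[s], fac)]
        for d in range(self.d_max):
            self.counts.setdefault((path[d], quads[d]), [0, 0])[x] += 1
        self.counts.setdefault((path[-1], 0), [0, 0])[x] += 1
        return tilde  # Theorem 1: the Bayes-optimal coding probability of x

# Code length of a tiny 4x4 binary image:
coder = ImproperQuadtreeCoder(d_max=2)
img = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1], [1, 0, 1, 0]]
bits = sum(-math.log2(coder.update(i, j, img[i][j])) for i in range(4) for j in range(4))
print(f"total code length: {bits:.2f} bits for 16 pixels")
```

In this sketch, each pixel touches $d_{\max} + 1$ path nodes and 16 patterns per node, so coding an $h \times h$ image costs on the order of $16\, h^2 \log_2 h$ operations, which illustrates the polynomial-order claim of Theorem 1.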
5. Experiments
In this section, we perform four experiments. Three of them are similar to the experiments in [1]; the fourth one is newly added. In Experiments 1, 2, and 3, we assume $\mathcal{X} = \{0, 1\}$, which is the simplest setting, to focus on the effect of the improper quadtrees. In Experiment 4, we assume $\mathcal{X} = \{0, 1, \dots, 255\}$ to show that our method is also applicable to grayscale images. The purpose of the first experiment is to confirm the Bayes optimality of our code on synthetic images generated from the proposed model. The purpose of the second experiment is to show an example image suitable for our model. The purpose of the third experiment is to compare the average coding rates of our proposed algorithm with those of current image coding procedures on real images. The purpose of the fourth experiment is to show that our method is applicable to grayscale images.
In Experiments 1 and 2, $p(x_t \mid x^{t-1}, \theta_s)$ is the Bernoulli distribution $\mathrm{Bern}(x_t \mid \theta_s)$ for the minimal $s$ that satisfies $(i, j) \in s$. Each element of $\boldsymbol{\theta}$ is i.i.d. distributed according to the beta distribution $\mathrm{Beta}(\alpha, \beta)$, which is the conjugate prior distribution of the Bernoulli distribution. Therefore, the integral in (4) has a closed form. The hyperparameter of the model prior takes the common value $g_{s, \boldsymbol{k}} = 1/16$ for every $s$ and $\boldsymbol{k}$, and the hyperparameters $\alpha$ and $\beta$ of the beta distribution are set to common fixed values. For comparison, we used the previous method based on proper quadtrees, whose hyperparameters are the same as those of the experiments in [1], and the standard methods known as JBIG [17] and JBIG2 [18].
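Concretely, with the conjugate beta prior, the per-block predictive probability appearing in (4) reduces to the familiar closed form below (a sketch of ours):

```python
def beta_bernoulli_pred(n0: int, n1: int, x: int, alpha: float, beta: float) -> float:
    """Posterior predictive p(x | n0 zeros, n1 ones) under a Beta(alpha, beta) prior."""
    p1 = (n1 + alpha) / (n0 + n1 + alpha + beta)
    return p1 if x == 1 else 1.0 - p1
```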
5.1. Experiment 1
The setting of Experiment 1 is as follows. The width and height of the images are equal to a common power of two. We generate 1000 images according to the following procedure (a code sketch of Steps 1 to 3 follows the list).
1. Generate $m$ according to (5).
2. Generate $\theta_s$ according to $\mathrm{Beta}(\alpha, \beta)$ for each $s \in \mathcal{L}'(m)$.
3. Generate the pixel values $x_t$ according to $\mathrm{Bern}(x_t \mid \theta_s)$ for $t = 0, 1, \dots, hw - 1$, where $s$ is the minimal block containing $x_t$.
4. Repeat Steps 1 to 3 for 1000 times.
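A minimal sketch of Steps 1 to 3 (ours; it draws each candidate child independently with probability 1/2, which is equivalent to a uniform distribution over the 16 division patterns, reuses the block encoding and `minimal_block` function from the earlier sketches, and uses placeholder beta hyperparameters):

```python
import random

def sample_model(d_max: int) -> set:
    """Step 1: sample an improper quadtree m as a set of (depth, top, left) blocks."""
    m, stack = {(0, 0, 0)}, [(0, 0, 0)]
    while stack:
        d, top, left = stack.pop()
        if d == d_max:
            continue  # single-pixel blocks are never divided
        half = 2 ** (d_max - d - 1)
        for child in [(d + 1, top, left), (d + 1, top, left + half),
                      (d + 1, top + half, left), (d + 1, top + half, left + half)]:
            if random.random() < 0.5:  # each child present independently
                m.add(child)
                stack.append(child)
    return m

def sample_image(m: set, d_max: int, alpha: float = 0.5, beta: float = 0.5) -> list:
    """Steps 2 and 3: draw theta_s for every block of m, then the pixels."""
    side = 2 ** d_max
    theta = {s: random.betavariate(alpha, beta) for s in m}
    return [[int(random.random() < theta[minimal_block(m, d_max, i, j)])
             for j in range(side)] for i in range(side)]
```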
Examples of the generated images are shown in Figure 5. Subsequently, we compress these 1000 images. The size of the image is saved in the header of the compressed file using 4 bytes. The coding probability calculated by the proposed algorithm is quantized into a fixed number of levels and substituted into the range coder [16].
Table 1 shows the coding rates (bit/pel) averaged over all the images. Our proposed code has the minimum coding rate, as expected from the Bayes optimality.
5.2. Experiment 2
In Experiment 2, we compress camera.tif in [19], which is binarized with a threshold of 128. The settings of the header and the range coder are the same as those of Experiment 1.
Figure 6 visualizes the maximum a posteriori (MAP) estimates of the block segmentation based on the improper quadtree model and the proper quadtree model [1], which are by-products of the compression. They are obtained by applying Theorem 3 in [15] and the algorithm in Appendix B of the preprint of the full version of [15], which is available on arXiv. The improper quadtree represents the non-stationarity with fewer regions (i.e., fewer parameters) than the proper quadtree [1].
Table 2 shows that the coding rate of our proposed model for camera.tif is lower than those of the previous model based on the proper quadtree [1] and JBIG [17], without any special tuning. However, JBIG2 [18] showed the lowest coding rate. The improvement of our method for real images will be described in the next experiment.
5.3. Experiment 3
In Experiment 3, we compare the proposed algorithm with the proper-quadtree-based algorithm [1], JBIG [17], and JBIG2 [18] on real images from [19]. They are binarized in a similar manner to Experiment 2. The settings of the header and the range coder are the same as those of Experiments 1 and 2. A difference from Experiments 1 and 2 lies in the stochastic generative model assumed on each block $s$. We assume another model represented as a Bernoulli distribution whose parameter depends on the neighboring four pixels. (If the indices go out of the image, we use the nearest past pixel in Manhattan distance.) Therefore, the model has a kind of Markov property. In other words, there are 16 parameters $\theta_{s,0}, \dots, \theta_{s,15}$ for each block $s$ of the model $m$, and one of them is chosen by the observed values $x_{t-1}$, $x_{t-w-1}$, $x_{t-w}$, and $x_{t-w+1}$ in the past. Each parameter is i.i.d. distributed according to a beta distribution with fixed common hyperparameters. The results are shown in Table 3. The algorithms labeled Improper-i.i.d. and Proper-i.i.d. are the same as those in Experiments 1 and 2. The algorithms labeled Improper-Markov and Proper-Markov are the aforementioned ones.
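The 4-bit context selecting one of the 16 Bernoulli parameters can be computed as in this sketch (ours; the neighbor order and the out-of-image fallback are our reading of the description above, with the fallback simplified to clipping):

```python
def context_index(img, i, j):
    """4-bit context of pixel (i, j) from its left, upper-left, upper, and
    upper-right neighbors; out-of-image indices are clipped into the image
    as a simple stand-in for the nearest-past-pixel rule."""
    h, w = len(img), len(img[0])

    def past(pi, pj):
        # clip to the image; pixels not yet coded in raster order count as 0
        pi, pj = max(0, min(pi, h - 1)), max(0, min(pj, w - 1))
        return img[pi][pj] if (pi, pj) < (i, j) else 0

    bits = [past(i, j - 1), past(i - 1, j - 1), past(i - 1, j), past(i - 1, j + 1)]
    return bits[0] | bits[1] << 1 | bits[2] << 2 | bits[3] << 3

img = [[0, 1, 0], [1, 0, 1], [0, 0, 0]]
assert 0 <= context_index(img, 1, 1) < 16
```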
Improper-Markov outperforms the other methods from the perspective of the average coding rate. The effect of the improper quadtree is probably amplified because the number of parameters for each block is increased. However, JBIG2 [18] still outperforms our algorithms, only for text. We consider this is because JBIG2 [18] is designed for text images such as faxes, in contrast to our general-purpose algorithm. Note that our algorithm has room for improvement by tuning the hyperparameters $\alpha$ and $\beta$ of the beta distribution for each of the parameters.
5.4. Experiment 4
Through Experiment 4, we show that our method is applicable to grayscale images. Herein, we assume two types of stochastic generative models within the blocks, for both the proper quadtree and the improper quadtree. The first one is the i.i.d. Gaussian distribution $\mathcal{N}(x_t \mid \mu_s, \lambda_s^{-1})$. In this case, $\theta_s$ can be regarded as $(\mu_s, \lambda_s)$. The second one is the two-dimensional autoregressive (AR) model [7] of the neighboring four pixels, i.e., $\mathcal{N}(x_t \mid \boldsymbol{a}_s^\top \boldsymbol{x}_{t,\mathrm{nb}}, \lambda_s^{-1})$, where $\boldsymbol{x}_{t,\mathrm{nb}}$ denotes the vector of the four neighboring pixel values. (If the indices go out of the image, we use the nearest past pixel in Manhattan distance.) In this case, $\theta_s$ can be regarded as $(\boldsymbol{a}_s, \lambda_s)$. For both models, the predictive distribution of $x_t$ is normalized and quantized into $\mathcal{X} = \{0, 1, \dots, 255\}$ in a similar manner to [7]. The prior distributions for each model are assumed to be the Gauss–gamma distributions $p(\mu_s, \lambda_s)$ and $p(\boldsymbol{a}_s, \lambda_s)$, whose hyperparameters are common to all the blocks; the prior covariance matrix of $\boldsymbol{a}_s$ is a scalar multiple of the identity matrix $I$.
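For the i.i.d. Gaussian case, the per-block Bayesian predictive before the normalization and quantization step is the standard Gauss–gamma (normal–gamma) posterior predictive, a Student-t density. The following sketch (ours; the hyperparameter values are generic placeholders, not the paper's settings) maintains it sequentially:

```python
import math

class GaussGammaPredictor:
    """Gauss-gamma sequential predictive for one block.

    Prior: mu, lam ~ N(mu | m0, (beta0 * lam)^-1) * Gamma(lam | a0, b0).
    """

    def __init__(self, m0=128.0, beta0=1.0, a0=1.0, b0=1.0):
        self.m, self.beta, self.a, self.b = m0, beta0, a0, b0

    def update(self, x):
        """Absorb one observation (standard conjugate update)."""
        self.b += self.beta * (x - self.m) ** 2 / (2.0 * (self.beta + 1.0))
        self.m = (self.beta * self.m + x) / (self.beta + 1.0)
        self.beta += 1.0
        self.a += 0.5

    def pdf(self, x):
        """Posterior predictive density: Student-t with 2a degrees of freedom,
        location m, and squared scale b * (beta + 1) / (a * beta)."""
        nu = 2.0 * self.a
        scale2 = self.b * (self.beta + 1.0) / (self.a * self.beta)
        z2 = (x - self.m) ** 2 / (nu * scale2)
        log_norm = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
                    - 0.5 * math.log(nu * math.pi * scale2))
        return math.exp(log_norm - (nu + 1.0) / 2.0 * math.log1p(z2))

pred = GaussGammaPredictor()
for v in [120, 125, 130]:
    pred.update(v)
print(f"predictive density at 128: {pred.pdf(128.0):.4f}")
```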
The results are shown in Table 4. (The values for the previous studies [2,4,20,21] are cited from [21].)
The coding rates of the proper-quadtree-based algorithm are improved by our proposed method for all the images in this data set and for both settings of the stochastic generative model assumed within the blocks. This indicates the superiority of the improper-quadtree-based model over the proper-quadtree-based model. The method labeled Improper-AR showed an average coding rate lower than that of JPEG 2000 when averaging over the whole set of images. It also showed an average coding rate lower than that of JPEG-LS when averaging over the natural images. Although it does not outperform recent methods such as MRP and Vanilc, we consider this is because of the suitability of the stochastic generative model within the blocks, which is out of the scope of this paper.
6. Conclusions
We proposed a novel stochastic model based on the improper quadtree, so that our model effectively represents the variable block size segmentation of images. Then, we constructed the Bayes code for the proposed stochastic model. Moreover, we introduced an algorithm that implements it in polynomial order of the data size without loss of optimality. Experiments on both synthetic and real images demonstrated the flexibility of our stochastic model and the efficiency of our algorithm. As a result, the derived algorithm showed a better average coding rate than that of JBIG2 [18].
Author Contributions
Conceptualization, Y.N. and T.M.; methodology, Y.N.; software, Y.N.; validation, Y.N. and T.M.; formal analysis, Y.N. and T.M.; investigation, Y.N. and T.M.; resources, Y.N.; data curation, Y.N.; writing—original draft preparation, Y.N.; writing—review and editing, Y.N. and T.M.; visualization, Y.N.; supervision, T.M.; project administration, T.M.; funding acquisition, T.M. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported in part by JSPS KAKENHI Grant Numbers 17K06446, JP19K04914 and 22K02811.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Acknowledgments
We would like to thank the members of Matsushima laboratory for their meaningful discussions.
Conflicts of Interest
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Appendix A. Validity of Proposed Algorithm
References
- Nakahara, Y.; Matsushima, T. A Stochastic Model for Block Segmentation of Images Based on the Quadtree and the Bayes Code for It. Entropy 2021, 23, 991. [Google Scholar] [CrossRef] [PubMed]
- Weinberger, M.J.; Seroussi, G.; Sapiro, G. The LOCO-I lossless image compression algorithm: Principles and standardization into JPEG-LS. IEEE Trans. Image Process. 2000, 9, 1309–1324. [Google Scholar] [CrossRef] [PubMed]
- Wu, X.; Memon, N. Context-based, adaptive, lossless image coding. IEEE Trans. Commun. 1997, 45, 437–444. [Google Scholar] [CrossRef]
- Matsuda, I.; Ozaki, N.; Umezu, Y.; Itoh, S. Lossless coding using variable block-size adaptive prediction optimized for each image. In Proceedings of the 2005 13th European Signal Processing Conference, Antalya, Turkey, 4–8 September 2005; pp. 1–4. [Google Scholar]
- Huffman, D.A. A Method for the Construction of Minimum-Redundancy Codes. Proc. IRE 1952, 40, 1098–1101. [Google Scholar] [CrossRef]
- Rissanen, J.; Langdon, G. Universal modeling and coding. IEEE Trans. Inf. Theory 1981, 27, 12–23. [Google Scholar] [CrossRef]
- Nakahara, Y.; Matsushima, T. Autoregressive Image Generative Models with Normal and t-distributed Noise and the Bayes Codes for Them. In Proceedings of the 2020 International Symposium on Information Theory and Its Applications (ISITA), Kapolei, HI, USA, 24–27 October 2020; pp. 81–85. [Google Scholar]
- Nakahara, Y.; Matsushima, T. Hyperparameter Learning of Stochastic Image Generative Models with Bayesian Hierarchical Modeling and Its Effect on Lossless Image Coding. In Proceedings of the 2021 IEEE Information Theory Workshop (ITW), Kanazawa, Japan, 17–21 October 2021. [Google Scholar]
- Nakahara, Y.; Matsushima, T. Bayes code for two-dimensional auto-regressive hidden Markov model and its application to lossless image compression. In Proceedings of the International Workshop on Advanced Imaging Technology (IWAIT) 2020, Yogyakarta, Indonesia, 1 June 2020; SPIE: Bellingham, WA, USA, 2020; Volume 11515, pp. 330–335. [Google Scholar] [CrossRef]
- Berger, J.O. Statistical Decision Theory and Bayesian Analysis; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
- Matsushima, T.; Inazumi, H.; Hirasawa, S. A class of distortionless codes designed by Bayes decision theory. IEEE Trans. Inf. Theory 1991, 37, 1288–1293. [Google Scholar] [CrossRef]
- Clarke, B.S.; Barron, A.R. Information-theoretic asymptotics of Bayes methods. IEEE Trans. Inf. Theory 1990, 36, 453–471. [Google Scholar] [CrossRef]
- Matsushima, T.; Hirasawa, S. Reducing the space complexity of a Bayes coding algorithm using an expanded context tree. In Proceedings of the 2009 IEEE International Symposium on Information Theory, Seoul, Korea, 28 June–3 July 2009; pp. 719–723. [Google Scholar] [CrossRef]
- Sullivan, G.J.; Ohm, J.; Han, W.; Wiegand, T. Overview of the High Efficiency Video Coding (HEVC) Standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1649–1668. [Google Scholar] [CrossRef]
- Nakahara, Y.; Saito, S.; Kamatsuka, A.; Matsushima, T. Probability Distribution on Rooted Trees. In Proceedings of the 2022 IEEE International Symposium on Information Theory, Espoo, Finland, 26 June–1 July 2022. [Google Scholar]
- Martín, G. Range encoding: An algorithm for removing redundancy from a digitised message. In Proceedings of the Video and Data Recording Conference, Southampton, UK, 24–27 July 1979; pp. 24–27. [Google Scholar]
- Kuhn, M. JBIG-KIT. Available online: https://www.cl.cam.ac.uk/~mgk25/jbigkit/ (accessed on 24 July 2022).
- Langley, A. jbig2enc. Available online: https://github.com/agl/jbig2enc (accessed on 24 July 2022).
- Image Repository of the University of Waterloo. Available online: http://links.uwaterloo.ca/Repository.html (accessed on 8 November 2021).
- Skodras, A.; Christopoulos, C.; Ebrahimi, T. The JPEG 2000 still image compression standard. IEEE Signal Process. Mag. 2001, 18, 36–58. [Google Scholar] [CrossRef]
- Weinlich, A.; Amon, P.; Hutter, A.; Kaup, A. Probability Distribution Estimation for Autoregressive Pixel-Predictive Image Coding. IEEE Trans. Image Process. 2016, 25, 1382–1395. [Google Scholar] [CrossRef] [PubMed]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).