1. Introduction
In the past few years, as researchers have paid more attention to privacy protection [1], the reversible data hiding (RDH) technique [2,3,4,5] has developed greatly. With RDH, secret information can be embedded into a multimedia medium without any obvious distortion. Moreover, the embedded data can be extracted losslessly, and the restored host medium is exactly the same as the original. Because of these characteristics, RDH has a significant influence in various fields, such as medical applications [6] and multimedia archive management [7]. RDH methods have been investigated extensively. The embedding mechanisms can be classified into three categories: histogram shifting [8,9,10,11], difference expansion [12,13,14], and lossless compression [15,16]. Specifically, the embedding methods based on histogram shifting exploit the peak and zero points of the image histogram to embed the watermark bits, while the methods based on difference expansion insert secret bits into the differences between adjacent pixels. In addition, researchers have also carried out reversible embedding with dual cover images [17,18]. Since the RDH methods mentioned above all involve plaintext images, which are easily tampered with or stolen by an unauthorized user during transmission, reversible data hiding in encrypted images (RDH-EI) [19,20,21] was introduced to further improve the security of the image.
Generally speaking, RDH-EI methods can be divided into two categories according to whether embeddable space is reserved before the image encryption operation: vacating room after encryption (VRAE) [22,23,24,25,26,27,28,29,30] and reserving room before encryption (RRBE) [31,32,33,34,35,36]. The first RDH-EI method was proposed by Puech et al. [22], where the local standard deviation of the marked encrypted image is analyzed and the embedded information is extracted during image decryption. In the method of Zhang [23], the encrypted image is divided into several non-overlapping blocks, and the secret data is embedded by modifying some of the encrypted pixels in each block. The disadvantage of this method is that if the blocks are quite small, there will be errors in the extracted data. In another method proposed by Zhang [24], spare space for additional data is created by compressing the least significant bits of the encrypted image. Hong et al. [25] utilized a side match approach to reduce the error rate of the extracted data. Moreover, Liao et al. [26] proposed a method based on the absolute mean difference of multiple neighboring pixels, in which the complexity of an image block is evaluated using multiple adjacent pixels. In Qin et al.’s method [27], the encrypted image is divided into non-overlapping blocks that are processed by a run-length coding algorithm, and then Huffman coding is applied to encode each block as an embeddable block that can contain secret data. Yu et al. [28] implemented secret data embedding with the help of histogram shifting, based on two-layer adjacent encrypted pixel errors. Fu et al. [29] adopted block rearrangement and stream cipher encryption to encrypt the original image, and then adaptively compressed the most significant bits of the embeddable blocks to provide embedding space for additional data. Wu et al. [30] employed a parametric binary tree to mark encrypted pixels, which makes full use of the spatial correlation of the entire image and reserves room for the embedding of additional data.
Since image encryption destroys the spatial correlation of the pixels and drives the entropy of the image toward its maximum, it is difficult to vacate spare room for additional data in the encrypted image. Therefore, in order to increase the embedding capacity, Ma et al. [31] were the first to consider reserving room before encryption. On this basis, Cao et al. [32] put forward a patch-based RDH-EI method that takes advantage of the correlation of adjacent pixels, in which each patch is regarded as a whole and represented by a limited number of coefficients, thereby producing a large amount of room. Additionally, Yi and Zhou [33] exploited a binary-block embedding (BBE) method to embed the binary values of less significant bit-planes of the host image into more significant bit-planes, vacating space for the embedding of additional data. In Wu et al.’s method [34], before encryption, the most significant bit is predicted and the binary bits in the least significant bit-plane are reversibly saved into other bit-planes. Arun et al. [35] used a parametric binary tree to mark pixels, with which each pixel is identified as a regular or irregular pixel; regular pixels are then used to accommodate secret bits. Mittal et al. [36] adopted two schemes to embed secret bits based on a checkerboard pattern derived from the cover image, and obtained more room for embedding.
In many existing RDH-EI methods, secret data is embedded by substituting the least significant bits of pixels, which limits the embedding capacity and complicates image recovery. For these reasons, researchers’ attention has gradually shifted from the least significant bit (LSB) to the most significant bit (MSB). In 2018, Puteaux and Puech [37] introduced an RDH-EI algorithm based on MSB prediction, in which the most significant bits of the original image are predicted and the most significant bits of pixels without prediction errors are then replaced by bits of secret data. In another method proposed by Puteaux and Puech [38], the other bit-planes of the original image are predicted and analyzed recursively from MSB to LSB to improve the embedding capacity as much as possible. In Yin et al.’s method [39], the median edge detector predictor is utilized to predict the multi-MSB of the image and generate a label map that contains the labels of all pixels; the label map is encoded using Huffman coding to reduce the space it occupies as auxiliary information. Then, according to the label map, the multi-MSB can be substituted by the secret bits.
As pointed out in [3], embedding capacity and visual symmetry/quality trade off against each other, so that pushing the capacity higher can yield a recovered image of very poor quality. Also, as can be seen from the above discussion, most methods apply only one main mechanism to create spare space for embedding secret data, which seriously limits the improvement of embedding capacity. To make full use of the redundant space of an original image, this paper proposes a hybrid prediction strategy to implement reversible data hiding based on Huffman coding. In the proposed method, the original image is decomposed into two parts, i.e., the most significant bit-plane and the remaining bit-planes, to which two different embedding mechanisms are applied. In the first part, the predicted value of each pixel is obtained by calculating the average value of its two neighboring pixels, from which any prediction errors are identified. After modifying the pixels with prediction errors, which are indicated in the error location map, all of the most significant bits included in the first part can be vacated to record secret data. In the second part, a new image is considered, comprised of the seven least significant bits of all pixels. Using the correlation between adjacent pixels, vacant bits can be obtained from the higher bit-planes of this new image. Each pixel in this image is assigned a tag by comparing its actual value to its predicted value, where the predicted value is calculated using the median edge detector predictor. Finally, a tag map comprised of all tag values is built for this image and compressed using Huffman coding, to obtain a large amount of vacant space. It should be noted that, through compression of the error location map and reference pixels in the first part, the number of embeddable bits can be increased further.
Extensive experiments are performed to demonstrate the performance of the proposed method in terms of several characteristics such as security, embedding capacity, and robustness against potential attacks.
The rest of this paper is organized as follows. Section 2 presents the proposed method, including detailed descriptions of the generation of the tag and error location maps, encryption of the original image, secret data embedding, and data extraction and image recovery. In Section 3, a series of experiments and analyses are presented that demonstrate the performance of the proposed method. Finally, a brief conclusion is given in Section 4.
2. Proposed Method
In this section, a new reversible data hiding method based on hybrid prediction and Huffman coding is described in detail, which achieves not only higher embedding capacity but also error-free secret data extraction and image recovery. As shown in Figure 1, the schematic diagram comprises three main processes, i.e., encryption of the original image, embedding of the secret data, and extraction of the secret data together with recovery of the original image. In the encryption process, the most significant bits of the pixels are predicted to reserve redundant space for hiding data. The pixels with prediction errors are marked to generate the error location map, and the original image is modified accordingly. Then, the pixels composed of the seven least significant bit-planes of the modified image are predicted to obtain the pixel tag values. A part of the auxiliary data is embedded into the most significant bits of encrypted pixels with the help of the error location map. To save embeddable bits, the error location map is compressed using the run-length coding algorithm. In the embedding process, the remainder of the auxiliary data and the secret data are adaptively accommodated, by bit substitution, into bits of other high bit-planes by referencing the tag map. Finally, secret data extraction and image recovery can be performed separately, depending on which secret key is available, i.e., the image encryption key or the data hiding key; the retrieved secret data is error-free and the restored image is the same as the original image.
2.1. Auxiliary Data Generation
2.1.1. Error Location Map Generation
Since a part of the calculated auxiliary data is embedded into most significant bits of encrypted pixels, their original values are modified after secret data hiding. To recover the original image without errors in the future, it is important to be able to predict them accurately. For this purpose, a binary error location map is built, which consists of basic and additional parts. Except for pixels in the first row and column, which are used as reference pixels, each pixel is analyzed to identify any prediction errors. If there is an error, the corresponding bit in the error location map is set to 1, otherwise it is set to 0. Arranging the error location map column by column, a sequence can be obtained as the basic part as shown in
Figure 2a.
Suppose that pixel values are in the range and the size of the original image is pixels, an initial binary error location map denoted as is created, where all bits are set to 0. Scanning all pixels in the original image, the basic part of the error location map is generated as follows:
- (1)
For the current pixel
, where
and
, its inverse value
is calculated as
where
denotes the inverse value.
- (2)
To predict the current pixel more accurately, especially when there is a large difference between the pixel and its neighboring values, the average of its top pixel
and left pixel
is considered as its predicted value and calculated as
where
denotes the prediction value.
- (3)
Two absolute difference values between
and
, and between
and
are calculated and denoted as:
where
and
represent two absolute difference values, respectively.
- (4)
If the original value is closer to its prediction value than the inverse value, it is assumed that there is no prediction error according to the current pixel, and the bit
in the error location map remains unchanged. Otherwise, there is an error and the bit
is set to 1. The details can be described as
- (5)
Finally, concatenating all bits column by column, i.e., from top to bottom within each column and from left to right across columns, a binary sequence is generated as the basic part of the error location map.
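The five steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the inverse value is assumed to be the pixel with its most significant bit flipped, i.e., (p + 128) mod 256, and the average of the top and left pixels is assumed to be rounded down.

```python
def inverse_value(p):
    # Assumption: the inverse value flips the MSB of an 8-bit pixel.
    return (p + 128) % 256


def build_error_map(img):
    """Return a binary map with 1 where the MSB prediction fails."""
    rows, cols = len(img), len(img[0])
    err = [[0] * cols for _ in range(rows)]
    # The first row and first column are reference pixels and stay 0.
    for i in range(1, rows):
        for j in range(1, cols):
            # Assumption: floor of the average of the top and left pixels.
            pred = (img[i - 1][j] + img[i][j - 1]) // 2
            d = abs(img[i][j] - pred)
            d_inv = abs(inverse_value(img[i][j]) - pred)
            # No error only if the original value is strictly closer.
            if d >= d_inv:
                err[i][j] = 1
    return err


def basic_part(err):
    """Concatenate the map column by column, top to bottom."""
    rows, cols = len(err), len(err[0])
    return [err[i][j] for j in range(cols) for i in range(rows)]


image = [[100, 100, 100],
         [100, 102, 230],
         [100, 101, 1]]
error_map = build_error_map(image)
sequence = basic_part(error_map)
```

In this toy image, the pixels 230 and 1 differ strongly from their neighbors, so they are the only positions marked with 1.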
Generally, the most significant bits of pixels with prediction errors cannot be vacated for secret data embedding. For example, in the high-capacity reversible data hiding approach proposed by Puteaux and Puech [37], if at least one prediction error is found in a block of eight pixels, no information is embedded in the 24 most significant bits of this block and its previous and following blocks. Therefore, the maximum embedding capacity of this method is less than 1 bit per pixel (bpp). To solve this issue, in the proposed method, a new image is constructed by modifying the original values of pixels with prediction errors, and then all information, including auxiliary data and secret data, is embedded into it. Meanwhile, the additional part of the error location map, as shown in Figure 2b, is designed to identify what kind of modification is performed for the pixels with prediction errors.
2.1.2. Original Image Modification
In order to make all most significant bits of the original image embeddable, it is necessary to modify the problematic pixels before embedding, so that a new image without any prediction errors can be obtained. In this proposed method, the new image denoted as
is formulated by taking into account two aspects, i.e., the two absolute difference values calculated using Equation (3) and the distribution of pixel values. For each pixel in the original image, whether it is modified depends on the following provisions:
From Equation (5), it can be seen that the original pixel value remains unchanged in the new image if there is no prediction error; otherwise, it is modified. When the absolute difference between the original value and its prediction value is larger than that between the inverse value and the prediction value, the original pixel value is directly substituted with its inverse value. When the two absolute difference values are equal, the original pixel value is either increased by 1 or decreased by 1, depending on which of the two value ranges it falls in. Thus, the problematic pixels with equal absolute differences are transformed into pixels without prediction errors. For example, consider a problematic pixel with equal absolute differences whose inverse value is 129, as calculated using Equation (1), and whose prediction value is 65. Adding 1 to the original pixel value, according to Equation (5), changes its inverse value to 130. At the same time, the two absolute difference values become 63 and 65, respectively, as calculated using Equation (3), so the original value is now closer to the prediction value than the inverse value is. This means that a pixel without prediction error is obtained.
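The modification rule can be illustrated with a short Python sketch. The function name `modify_pixel` is hypothetical. The inverse value is assumed to be (p + 128) mod 256, and the two ranges for the tie case are assumed to be [0, 127] (increase by 1) and [128, 255] (decrease by 1); these assumptions were chosen because they reproduce the worked example in the text.

```python
def modify_pixel(p, pred):
    """Return the pixel value used in the modified image."""
    inv = (p + 128) % 256          # assumed inverse value (MSB flipped)
    d = abs(p - pred)              # distance of the original value
    d_inv = abs(inv - pred)        # distance of the inverse value
    if d < d_inv:
        return p                   # no prediction error: unchanged
    if d > d_inv:
        return inv                 # substitute the inverse value
    # Tie: nudge the value so that it becomes strictly closer.
    # Assumed ranges: [0, 127] is increased, [128, 255] is decreased.
    return p + 1 if p < 128 else p - 1


# Worked example from the text: inverse value 129, prediction value 65.
new_value = modify_pixel(1, 65)
new_d = abs(new_value - 65)
new_d_inv = abs((new_value + 128) % 256 - 65)
```

Under these assumptions, the original pixel in the worked example is 1, the modified pixel is 2, and the two distances become 63 and 65, matching the text.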
Correspondingly, the additional part of the error location map as shown in
Figure 2b is used to identify what kind of modification took place for pixels with prediction errors. Because there are three kinds of pixel value modifications in the new image, as expressed in Equation (5), two bits are required to identify them. As shown in
Figure 2b, suppose that there are
pixels with prediction errors in the basic part. Scanning all pixels in the basic part individually, if a pixel is marked with 1 and the condition
is satisfied, two bits “00” are appended in the second part of the error location map. For example, for the
pixel in the basic part, which is the first pixel with prediction error, the first two bits in the second part are set to “00”. If a pixel is marked with 1 and its original value is not larger than 128, two bits “01” are appended in the second part, otherwise two bits “10” are added. For example, for the
pixel in the basic part, which is the last pixel with a prediction error, the last two bits in the second part of the error location map are set to “10”. Thus, a total of
bits denoted as
are used to record the modification of the pixel values.
Obviously, because the basic part of the error location map is employed to identify all pixels of the original image as described in Equation (4), a total of
bits are needed. If it is directly embedded as a part of the auxiliary data, a large number of embeddable bits will be consumed. To solve this issue, it should be further compressed. Run-length coding is a lossless data compression algorithm that works well when the same value occurs in many consecutive data elements: each sequence of identical values, i.e., a run, is replaced by a code that indicates the value and the number of times it occurs. Because the number of pixels with prediction errors is very low, there will be many long runs of “0” in the basic part of the error location map, so it can be compressed efficiently using run-length coding. Moreover, the first run of the sequence corresponding to the error location map of any image is always a run of “0”. The reason is that when the average value of adjacent pixels is used to predict the most significant bit of the current pixel, the pixels of the first row and the first column are taken as reference pixels without processing, so the elements in the first row and column of the corresponding error location map are 0. As shown in
Figure 3, there are four parts of data in the compressed basic part. Among them are
bits to record the length of the compressed basic part,
and
bits to record the length of the maximum run of “0” and “1”, respectively. The remaining bits, denoted as
are the binary sequence of the compressed basic part.
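As a rough illustration of run-length coding on a binary sequence, the following Python sketch converts a bit list into alternating run lengths and back. It does not reproduce the exact bit layout of Figure 3, which is not specified here; the function names are hypothetical, and the maximum run lengths are computed only to show the quantities that the compressed structure records.

```python
def run_lengths(bits):
    """Return (value of the first run, list of alternating run lengths)."""
    if not bits:
        return 0, []
    runs, count = [], 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return bits[0], runs


def expand(first, runs):
    """Inverse of run_lengths: rebuild the original bit list."""
    out, value = [], first
    for length in runs:
        out.extend([value] * length)
        value ^= 1                 # runs alternate between 0 and 1
    return out


basic = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
first, runs = run_lengths(basic)
# With a first run of "0", even positions are runs of 0 and odd ones runs of 1.
max_zero_run = max(runs[0::2])
max_one_run = max(runs[1::2], default=0)
```

Because the first run of the error location map is always a run of "0", no extra bit is needed to signal its value, whereas the reference-pixel sequence described next requires one.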
Since most significant bits of reference pixels are utilized to embed information in the proposed method, their original values also should be compressed and recorded as a part of the auxiliary data. Concatenating all of the most significant bits of pixels in the first row and column, the obtained sequence is compressed using the run-length coding. Similar to the structure of the compressed basic part, as shown in
Figure 3, the compressed result denoted as
includes five parts of data, as shown in
Figure 4. In this, 1 bit, denoted as
, is used to identify whether the initial run is a run of “0” or “1”,
bits are used to record the length of the compressed result,
and
bits to record the length of the maximum run of “0” and “1”, respectively, and the remaining bits denoted as
are the resultant sequence after compression.
2.1.3. Tag Map Generation
As described above, most significant bits of the modified image can only accommodate a part of the auxiliary data. In order to record the remaining part and the secret data, embeddable bits should be reserved from other high bit-planes of each pixel. To achieve this goal, a new image denoted as is constituted with seven least significant bit-planes of the modified image in the beginning. Obviously, all pixel values are in the range in this new image. Then, multiple bits in high bit-planes of each pixel in are predicted adaptively and assigned to the corresponding Huffman code.
Except for the first pixel
in the new image, the prediction value of each pixel in
should be calculated at first. If the current pixel is in the first row, the value of its neighboring left pixel is regarded as its prediction value. Similarly, if the current pixel is in the first column, the value of its neighboring top pixel is regarded as its prediction value. For other pixels, the corresponding prediction values can be evaluated using the median edge detector (MED) predictor. For the current pixel
in the new image, its prediction value can be calculated as
where
,
,
and
is the corresponding prediction value. Next, the original value and the prediction value are converted into two 7-bit binary sequences, respectively, using the following formulas as
where
and
represent the
bit of two obtained binary sequences, respectively. Compare these two sequences from the most significant bit to the least significant bit until a pair of bits are different. Then, a tag denoted as
is used to mark the current pixel, which indicates the number of bits in high bit-planes for which their original and prediction values are equal. This is calculated as
where
and
are the corresponding
most significant bits, respectively. Because two sequences have 7 bits, the tag
is an integer that is no larger than 7, which means that there are 8 kinds of tags at most. In the embedding process, one more bit than the tag value can be reserved in the high bit-planes to save information. For example, if a pixel value is 108 and its prediction value is 116, the two values are converted into “1101100” and “1110100”, respectively. Comparing them in turn from the high to the low bit-plane, the first 2 bits are equal and the third pair of bits differs. Therefore, the tag of this pixel is set to 2, and the bits in the three highest bit-planes can be vacated to embed information. Notably, the pixel with the tag
only affords 7 embeddable bits, because the binary sequence of the pixel contains only 7 bits.
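The tag computation can be sketched in Python as follows. The MED predictor is written in its standard form (as used in JPEG-LS), with the left, top, and top-left neighbors as inputs; the function and parameter names are illustrative assumptions rather than the notation of the paper.

```python
def med_predict(left, top, top_left):
    """Standard median edge detector (MED) predictor."""
    if top_left >= max(left, top):
        return min(left, top)
    if top_left <= min(left, top):
        return max(left, top)
    return left + top - top_left


def compute_tag(value, pred, nbits=7):
    """Count equal leading bits of two nbits-wide values (0..nbits)."""
    tag = 0
    for k in range(nbits - 1, -1, -1):     # from the highest bit down
        if (value >> k) & 1 != (pred >> k) & 1:
            break
        tag += 1
    return tag


def embeddable_bits(tag, nbits=7):
    """A tag t frees t + 1 bits, capped at the nbits available."""
    return min(tag + 1, nbits)


example_tag = compute_tag(108, 116)        # "1101100" vs "1110100"
example_room = embeddable_bits(example_tag)
```

For the example in the text, the tag is 2 and three bits become available; a tag of 7 still yields only 7 bits.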
Except for the first pixel, scanning all other pixels using the aforementioned procedure, a tag map of the modified image can be obtained. Meanwhile, a large number of embeddable bits are created in high bit-planes of pixels in to accommodate information. Note that the pixel does not provide spare bits and remains unchanged. It can be directly used as the prediction value of some pixels, such as and , not only in the embedding process but also in the image recovery process.
2.2. Huffman Coding
To ensure that the original image can be recovered without any error, the tag map generated in also should be considered as a part of the auxiliary data and can be embedded after the modified image is encrypted. It is obvious that the number of each tag in is different due to region characteristics of the original image. Calculating the occurrence frequency for each kind of tag, the tag map is recorded through Huffman coding, which will be very helpful to decrease the amount in this part of the auxiliary data, thereby improving the embedding capacity.
As described in
Section 2.1.3, there are 8 kinds of tags at most, where
is an integer in the range
. If a pixel in
is assigned with the tag 0, the bit in the
bit-plane can be vacated to record information. In the recovery process, the original bit can be obtained by reversing the value of the corresponding position of its prediction value. As the tag is an integer in the range
, a total of
bits from the
bit-plane to the
bit-plane are reserved. The first
original bits can be recovered by retrieving the
most significant bits of its prediction value, and the last bit is obtained using the method similar to that with the tag as 0. If the tag is equal to 7, the bits can be directly recovered from 7 bits occupied by its prediction value. Correspondingly, a group of 8 Huffman codes is defined to represent 8 kinds of tags, which are “00”, “01”, “100”, “101”, “1100”, “1101”, “1110”, and “1111”.
In the Huffman coding process, the numbers of different kinds of tags are calculated initially in accordance with the generated image
of an original image, and then all kinds of tags are sorted. According to their occurrence frequency from high to low, the 8 Huffman codes are assigned to them in turn. In other words, the tag with the maximum frequency is allocated the shortest Huffman code, such as “00” or “01”, while the tag with the minimum frequency is assigned the longest Huffman code, such as “1100”, “1101”, “1110”, or “1111”. In this way, the amount of data in the tag map generated in
can be compressed greatly. Taking the test image Baboon as an example, where the image size is
pixels, the numbers of all kinds of tags and the corresponding Huffman code are listed in
Table 1. It can be intuitively observed that the number of pixels with the tag 0 is the largest, so the corresponding pixels are represented by the Huffman code “00”, while the numbers of pixels with tag 6 and 7 are smaller than others; therefore, the corresponding pixels are marked with “1111” and “1110”, respectively. It is notable that the Huffman coding rule composed of 8 codes should be considered as a part of auxiliary data to be embedded.
After the process of Huffman coding is implemented, the compressed tag map occupying
bits can be obtained, where its length can be calculated by the formula as
where
represents the number of pixels with the tag
and
denotes the length of the corresponding Huffman code. At the same time,
bits are applied to record the length of the compressed tag map.
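The code assignment and the length of the compressed tag map can be sketched as below. The eight codes are those listed in the text; the tie-breaking rule (smaller tag first when two frequencies are equal) and the function names are assumptions made for illustration.

```python
from collections import Counter

# The fixed group of 8 Huffman codes given in the text, shortest first.
CODES = ["00", "01", "100", "101", "1100", "1101", "1110", "1111"]


def assign_codes(tags):
    """Map each tag 0..7 to a code, most frequent tag to shortest code."""
    counts = Counter(tags)
    # Assumption: ties are broken by the smaller tag value.
    order = sorted(range(8), key=lambda t: (-counts[t], t))
    return {t: CODES[rank] for rank, t in enumerate(order)}


def compressed_length(tags, rule):
    """Sum over all pixels of the code length of their tag."""
    return sum(len(rule[t]) for t in tags)


def encode(tags, rule):
    """Concatenate the codes of all tags into one bit string."""
    return "".join(rule[t] for t in tags)


tag_map = [0, 0, 0, 1, 1, 2]
rule = assign_codes(tag_map)
length = compressed_length(tag_map, rule)
rule_bits = sum(len(code) for code in CODES)
```

The total length of the eight codes is 2 + 2 + 3 + 3 + 4 + 4 + 4 + 4 = 26 bits, which agrees with the fixed size of the Huffman coding rule stated in Section 2.4.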
2.3. Image Encryption
To enhance the security level, each pixel of the modified image
generated in
Section 2.1.2 is encrypted with the aid of the encryption key
in the proposed method. Initially, a pseudo-random matrix
with the same size as
is generated using the encryption key. Then, the bitwise exclusive-OR (XOR) calculation is applied between the current pixel
and the corresponding element
in the pseudo-random matrix. In other words, each pixel in
can be processed as
where
is the encrypted result of the current pixel and
denotes the bitwise exclusive-OR operator. After all pixels in the modified image
are processed using Equation (10), the encrypted image denoted as
is obtained. Importantly, the encrypted image appears very noisy and no valid information can be observed in it. In addition, if the security level needs to be improved further, the pseudo-random matrix used in Equation (10) can be generated with chaotic functions, such as the Chen system. This can guarantee a random matrix with non-periodic and non-convergent properties that is more suitable for encrypting the image.
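The XOR encryption step can be sketched as follows. Python's `random.Random` is used here only as a stand-in for the key-driven pseudo-random generator; it is not cryptographically secure and is not the generator used in the paper. The function names are hypothetical.

```python
import random


def keystream(key, rows, cols):
    """Pseudo-random byte matrix derived from the key (illustrative only)."""
    rng = random.Random(key)       # stand-in generator, NOT secure
    return [[rng.randrange(256) for _ in range(cols)] for _ in range(rows)]


def xor_image(img, key):
    """XOR every pixel with the matching keystream element."""
    stream = keystream(key, len(img), len(img[0]))
    return [[p ^ k for p, k in zip(row, krow)]
            for row, krow in zip(img, stream)]


modified = [[100, 100, 100],
            [100, 102, 102],
            [100, 101, 129]]
encrypted = xor_image(modified, key=2021)
decrypted = xor_image(encrypted, key=2021)
```

Since XOR is its own inverse, applying the same function with the same key restores the image, which is the property that decryption relies on.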
2.4. Auxiliary Data Embedding
As discussed in previous subsections, there are four kinds of auxiliary data in the proposed method which are used to describe the compressed error location map, the compressed binary sequence of most significant bits of reference pixels, the Huffman coding rule, and the compressed tag map. To implement extraction of the secret data and recovery of the original image successfully, all auxiliary data are rearranged as shown in
Figure 5. In doing this, a long binary sequence is formed, and its front part is directly embedded in the most significant bits of the encrypted image
using bit substitution.
According to the structure as shown in
Figure 5, it should be pointed out that the number of bits occupied by the Huffman coding rule is fixed, that is, 26 bits. At the same time, the 8 Huffman codes are arranged in the order of the tags they represent, from 0 to 7. Most importantly, because all of the most significant bits of the encrypted image
are used to accommodate information, a large portion of the auxiliary data can be stored in them by bit substitution. Therefore, when recovering the original image, the necessary information, such as the Huffman coding rule, the error location map, and the original most significant bits of reference pixels, as shown in
Figure 5, can be easily retrieved at first. Based on the extracted Huffman coding rule, the tag map can be gradually retrieved from the 7 least significant bits of pixels. After the auxiliary data are completely obtained, the original image also can be easily recovered in accordance with the error location map and the original most significant bits of reference pixels.
Finally, scanning
from left to right and from top to bottom, the remaining part of the auxiliary data is embedded into a part of the pixels by replacing multiple bits of high bit-planes, which can be described as
where
represents the current pixel containing the auxiliary information,
is its tag and
is a bit of the remaining auxiliary information to be embedded. From Equation (11), it can be seen that vacant bits for data accommodation are taken only from the 7 least significant bits of each pixel. After all the auxiliary information is embedded, the encrypted image containing auxiliary data, denoted as
, is obtained, in which enough vacant bits will be reserved from the unprocessed pixels to embed the secret data.
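The bit substitution described above can be sketched in Python. This is an illustration under stated assumptions rather than a transcription of Equation (11): the payload is assumed to overwrite the highest bits of the 7 least significant bits, the number of overwritten bits is the tag plus one (capped at 7), and the most significant bit and the lower bits are left untouched. The function names are hypothetical.

```python
def embed_bits(pixel, tag, bits):
    """Overwrite the top min(tag + 1, 7) of the 7 low bits with `bits`."""
    n = min(tag + 1, 7)
    if len(bits) != n:
        raise ValueError("payload length must match the pixel capacity")
    payload = 0
    for b in bits:                         # first bit goes to the highest plane
        payload = (payload << 1) | b
    keep = 7 - n                           # low bits that stay untouched
    msb = pixel & 0x80                     # 8th bit is not modified here
    low = pixel & ((1 << keep) - 1)
    return msb | (payload << keep) | low


def extract_bits(pixel, tag):
    """Read back the bits written by embed_bits."""
    n = min(tag + 1, 7)
    payload = (pixel & 0x7F) >> (7 - n)
    return [(payload >> k) & 1 for k in range(n - 1, -1, -1)]


marked = embed_bits(236, 2, [0, 1, 0])     # 236 = 1 1101100 in binary
read_back = extract_bits(marked, 2)
```

The same routine serves both the remaining auxiliary data and, later, the secret data, since both are written according to the pixel tags.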
2.5. Secret Data Hiding
Before embedding the secret data, the Huffman coding rule and the tag of the pixel containing all of the auxiliary data should be recovered from the encrypted image
. In the beginning, partial auxiliary information is retrieved from all of the most significant bits of
, and the necessary information, such as the Huffman coding rule and the partial compressed tag map as shown in
Figure 5, can be easily obtained. Then, a part of the pixels in
are scanned from left to right and from top to bottom. For the current pixel, its Huffman code is retrieved from the previously restored compressed tag map, from which its tag
is obtained by referencing the Huffman coding rule. Only considering its 7 least significant bits, the
bits of auxiliary information can be extracted from high bit-planes. In this way, after all the auxiliary data are retrieved, the tag map is restored. Finally, according to their tags, bits in high bit-planes of the remaining pixels can be vacated to accommodate the secret data. Using Equation (11), the marked encrypted image
containing the secret data is obtained. It should be noted that the secret data is encrypted before embedding in order to enhance its security. Similar to the image encryption using Equation (10), a sequence of binary bits is first generated with the help of the data hiding key
and then the secret bits are encrypted by XORing them with the generated binary bits.
2.6. Data Extraction and Image Recovery
To extract the secret data and recover the original image successfully, the entire auxiliary data should be extracted from the marked encrypted image
first. Since all of the most significant bits are used to store the front part of the auxiliary data, information such as the compressed error location map, the compressed result of the most significant bits of reference pixels, the Huffman coding rule, and a part of the compressed tag map, as shown in
Figure 5, can be extracted directly. The remainder of the compressed tag map is recovered by scanning the 7 least significant bits of pixels from left to right and from top to bottom. For a pixel, its Huffman code is retrieved from the previously obtained compressed tag map, and its tag is obtained by referencing the Huffman coding rule. Thus, several corresponding bits containing the information of the remaining compressed tag map can be extracted from its high bit-planes. Once the recorded length of the compressed tag map is reached, the tag map can be completely restored from all processed pixels. Notably, the entire auxiliary data are recovered without involving any secret key; the subsequent results depend on which key is available.
If the data hiding key is accessible, with the aid of the restored tag map, the embedded secret information can be extracted from the 7 least significant bits of the remaining unprocessed pixels. For each unprocessed pixel, according to its tag, several spare bits containing a part of the secret information are extracted from its high bit-planes. After all the remaining pixels are processed, the entire secret information is extracted and then decrypted with the help of the binary bits generated with the data hiding key. The decryption is implemented by XORing the extracted secret bits with the generated binary bits. Notably, the information of the original image is not revealed due to the absence of the encryption key.
If the encryption key
is available, the original image can be recovered without any error with the help of the entire auxiliary data. First, using the reverse process described in Equation (10), where the pseudo-random matrix is the same and generated with the encryption key, the decrypted image
is directly obtained from the marked encrypted image. After
is decompressed, the obtained most significant bits are put back to the first row and column. Next, an image
is reconstituted from the seven least significant bit-planes of the decrypted image and scanned from left to right and from top to bottom to recover original bits for the 7 least significant bit-planes. For the current pixel, its tag
can be retrieved from the restored tag map. With the aid of the MED predictor, its prediction value
is calculated employing its neighboring pixels that are confirmed in the process of recovering its previous pixel. Thus, its original value is recovered by means of its tag and its prediction value, which can be described as
where
is calculated by using Equation (8). Based on this equation, it can be seen that
bits from the
bit-plane to the
bit-plane of
are directly recovered from the corresponding bits of
while the bit in the
bit-plane is obtained by negating the corresponding bit. Meanwhile, if the tag is equal to 7, the original value is equal to its prediction value. In this way, two parts of the original information are recovered without any error: the most significant bits of the pixels in the first row and column, and the 7 least significant bits of all pixels.
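The tag-based recovery of a 7-bit value can be sketched as below. Two assumptions are made: the MED predictor is written in its common form (as used in LOCO-I/JPEG-LS) with the left, top, and top-left neighbors, which may differ in detail from Equation (8); and the tag is taken to be the number of leading bits shared by the value and its prediction. The function names are hypothetical.

```python
def med_predict(left, top, top_left):
    """Median edge detector (MED) predictor in its common form."""
    if top_left >= max(left, top):
        return min(left, top)
    if top_left <= min(left, top):
        return max(left, top)
    return left + top - top_left


def tag_of(value, pred, nbits=7):
    """Number of leading bits (out of nbits) shared by value and pred."""
    for t in range(nbits):
        shift = nbits - 1 - t
        if (value >> shift) & 1 != (pred >> shift) & 1:
            return t
    return nbits


def recover(stored_low, pred, tag, nbits=7):
    """Rebuild the original nbits-bit value from its tag and prediction."""
    if tag == nbits:                      # value equals its prediction
        return pred
    low_len = nbits - tag - 1             # bits kept unchanged in the pixel
    high = (pred >> (nbits - tag)) << (nbits - tag)   # top `tag` bits of pred
    flipped = ((pred >> low_len) & 1) ^ 1             # negated next bit
    return high | (flipped << low_len) | (stored_low & ((1 << low_len) - 1))
```

Only the `low_len` lowest bits have to stay in the pixel; the `tag + 1` higher bits are implied by the prediction and are therefore free to carry embedded data.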
Next, the most significant bits of other pixels are recovered according to the compressed error location map. After
is decompressed, the basic part of the error location map can be obtained, while the additional part is directly acquired from
as shown in
Figure 5. The obtained image is denoted as
, for the sake of simplicity. Except for pixels in the first row and column, all pixels are scanned from left to right and from top to bottom. For the current pixel
, its most significant bit is recovered as follows:
- (1)
The average value of its left and top pixels, which have already been recovered, is taken as its prediction value, as in the calculation using Equation (2).
- (2)
Two candidate values are considered. Keeping the 7 least significant bits of the current pixel unchanged, the two candidates are obtained by setting its most significant bit to 0 and 1, respectively.
- (3)
The absolute difference values between
and
and between
and
are calculated as
- (4)
Finally, its most significant bit is determined by comparing the two absolute differences described above. If the condition is satisfied, the most significant bit is set to 0; otherwise, it is set to 1.
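Steps (1) to (4) can be sketched in Python as follows. Several details are assumptions: the average is computed with floor division, the comparison is taken to be "the candidate with most significant bit 0 is strictly closer to the prediction", and a tie therefore selects 1. The function name `recover_msb` is hypothetical.

```python
def recover_msb(low7, left, top):
    """Choose the MSB of a pixel whose 7 low bits are known.

    left and top are already-recovered 8-bit neighbors.
    """
    pred = (left + top) // 2          # step (1): prediction (floor is assumed)
    cand0 = low7 & 0x7F               # step (2): candidate with MSB = 0
    cand1 = cand0 | 0x80              #           candidate with MSB = 1
    d0 = abs(pred - cand0)            # step (3): absolute differences
    d1 = abs(pred - cand1)
    return cand0 if d0 < d1 else cand1   # step (4): pick the closer candidate


print(recover_msb(0x05, 130, 140))  # 133: candidate 5 is far from prediction 135
print(recover_msb(100, 90, 110))    # 100: candidate 100 equals the prediction
```

This choice is wrong exactly when the true value is the candidate farther from the prediction, which is why such pixels have to be identified beforehand and corrected afterwards with the error location map.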
Repeat the above steps for each pixel, and the decrypted modified image, denoted as
, can be obtained, where the most significant bits of some pixels are modified as described by Equation (5). Using the decompressed basic and additional parts of the error location map, these most significant bits can be corrected. First, the basic part of the error location map is arranged into a two-dimensional binary matrix
as described with Equation (4), and all pixels in
are scanned from left to right and from top to bottom. For a pixel, when
is satisfied, two corresponding bits are retrieved from the second part of the error location map as shown in
Figure 2. Then, if the two bits are “00”, its value is corrected as
When the retrieved two bits are “01” or “10”, its value is corrected by subtracting 1 or adding 1, respectively. Clearly, this correction is the reverse of the process described by Equation (5). Consequently, the original image is obtained without any loss.
3. Experimental Results and Analysis
To demonstrate the performance of the proposed method, experimental results are analyzed in terms of security and embedding rate, and compared against state-of-the-art methods. In the experiments, six commonly used test images with size of
pixels, including smooth and textural images, as shown in
Figure 6 are taken as examples. A randomly generated bit stream is used as the secret data. The effectiveness of the proposed method is measured using images from different image databases, such as BOSSBase and BOWS2. In addition, for the error location map as shown in
Figure 3,
and
bits are utilized to save the length of the maximum run of “0” and “1”, respectively, and
bits are used to save the length of the compressed basic part. For the compressed results, as shown in
Figure 4,
occupies 1 bit,
and
bits are applied to save the length of the maximum run of “0” and “1”, respectively, and
bits are used to record the length of the compressed most significant bits of reference pixels. For the compressed tag map,
bits are used to record its occupied bits.
3.1. Security Analysis
Before embedding the secret data, the original image is encrypted using the XOR calculation between it and a pseudo-random matrix generated by the encryption key, as described by Equation (10). In order to verify the security of the proposed method, experimental results are analyzed in terms of three measures: the histogram, the information entropy, and the correlation coefficient.
Using the image Lena as an example, the experimental results (the encrypted image, the encrypted image containing auxiliary data, the marked encrypted image, and the recovered image) are shown in
Figure 7. It can be seen that all encrypted images, especially the marked encrypted image, are noisy and no valid information can be observed from them. The histograms corresponding to the original image, the encrypted image, and the marked encrypted image are shown in
Figure 8. Clearly, statistical features can be easily observed from the histogram of the original image’s data, while no valid information can be retrieved from other histograms due to their uniform distribution. Furthermore, after embedding the auxiliary information and secret data, the overall pixel distribution has somewhat changed, as shown in
Figure 8c; however, it is still difficult to recover the original image content. Similar results were obtained for other test images.
As a quantitative measure of the randomness of a signal source, information entropy is used to evaluate the randomness of the marked encrypted image, which is mathematically defined as
where
is the gray value and
is its probability. In general, for an 8-bit encrypted image, the ideal entropy is 8. The more uniform the frequency distribution is, the higher its information entropy. The information entropy values of the six test images shown in
Figure 6 are listed in
Table 2, and they are all very close to 8. Therefore, the proposed method can efficiently withstand this kind of statistical attack. Furthermore, in order to measure the degree of correlation between the original image and the marked encrypted image, the corresponding correlation coefficient is calculated as
where
is the cross covariance between the data representing the original image and the marked encrypted image, and
and
are their standard deviations, respectively. The correlation coefficient values of six test images are listed in
Table 3, from which it can be seen that all values are close to zero. This means that each marked encrypted image has a very low correlation with its corresponding original image. In addition, the pseudo-random matrix generated with the key
, used to encrypt the original image as described by Equation (10), can be regarded as a binary sequence of
bits, for which there are
possibilities. Due to the very large number of possibilities, it is practically infeasible to guess this pseudo-random matrix. This guarantees that the proposed method can resist a brute-force attack.
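The two quantitative measures can be illustrated with a short Python sketch. It uses the standard Shannon entropy over gray values and the standard Pearson correlation coefficient (covariance divided by the product of the standard deviations), which is how the definitions above are read here. The toy 64 × 64 gradient image and the seeded `random.Random` key stream are assumptions made only to keep the example self-contained; they are not the test images or the generator of the paper.

```python
import math
import random
from collections import Counter


def entropy(pixels):
    """Shannon entropy (in bits) of a flat sequence of gray values."""
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


def correlation(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


# Toy data: a smooth gradient "image" XOR-ed with pseudo-random bytes.
rng = random.Random(0)
original = [(r + c) // 2 for r in range(64) for c in range(64)]
encrypted = [p ^ rng.getrandbits(8) for p in original]

print(entropy(original), entropy(encrypted))
print(correlation(original, encrypted))
```

For the toy data, the entropy of the encrypted version is close to 8 and its correlation with the original is close to 0, which is the behavior reported for the test images in Table 2 and Table 3.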
3.2. Performance Analysis
As described in
Section 2, in order to make full use of pixel redundancy to embed information, the original image is divided into two parts in the proposed method. The first part consists of all the most significant bits. After identifying and modifying pixels with prediction errors, all the most significant bits can be vacated to accommodate information. The second part is a new image constituted with the 7 least significant bits of pixels. Utilizing the correlation between neighbor pixels, vacant bits can be obtained from higher bit-planes of this new image. In the above two processes, different kinds of auxiliary data are generated, as shown in
Figure 5. Among them, the tag map obtained from the second part occupies a large number of bits. Using Huffman coding, a compressed tag map can be generated, which greatly reduces the volume of these data. At the same time, by compressing the error location map and the most significant bits of the reference pixels in the first part, the number of embeddable bits is further increased. Consequently, the embedding capacity of the proposed method is efficiently improved.
After all the auxiliary data are generated in the two parts of the proposed method, the number of bits they occupy is known, and the total embedding capacity of the original image can be calculated. Subtracting the number of bits consumed by the auxiliary data from the total capacity gives the net embedding capacity, i.e., the maximum length of the secret data. Using the test image Tiffany as an example, the total embedding capacity generated in the first part and the number of bits occupied by the corresponding auxiliary data are listed in
Table 4. It can be seen that many embeddable bits are consumed to save the compressed basic part of the error location map. Additionally, a fixed number of extra bits are used to record items such as the length of the compressed basic part of the error location map, the length of the compressed bits of reference pixels, and the maximum length of each run. In this example, the net payload of the first part is 256,195 bits. For the second part, excluding the 26 bits used to record the Huffman coding rule, the tag distribution and encoding of pixels are listed in
Table 5. The first column lists the 8 types of tags, and the second column lists the corresponding Huffman codes. The third and fourth columns list the numbers of tags and the length of each tag type, respectively. The total number of bits occupied by the compressed tag map is 779,926 bits, using Equation (9). The fifth column lists the number of bits that can be embedded in each pixel, and the total embedding capacity of the second part is 1,271,386 bits. The last column shows the net payload that each pixel can provide. In addition, a fixed number of bits, namely 20 bits, are used to record the length of the compressed tag map. Thus, the total net payload of the second part is 491,414 bits, obtained by subtracting the 779,926 bits of the compressed tag map, the 20 bits recording its length, and the 26 bits used for the Huffman coding rule from the total embedding capacity. Finally, adding the net payloads of the two parts, the total net embedding capacity is 747,609 bits, and the corresponding embedding rate reaches 2.8519 bpp.
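The arithmetic for Tiffany can be checked with a few lines of Python. All bit counts are taken from the text; the image size of 512 × 512 pixels is an assumption inferred from the reported rates (747,609 bits divided by 2.8519 bpp gives about 262,144 pixels), and the variable names are hypothetical.

```python
WIDTH = HEIGHT = 512              # assumed; inferred from the reported rate

first_part_net = 256_195          # net payload of the first part (MSB plane)

second_part_total = 1_271_386     # vacated bits in the 7-LSB image
tag_map_bits = 779_926            # compressed tag map
tag_map_length_field = 20         # bits recording the tag-map length
huffman_rule_bits = 26            # bits recording the Huffman coding rule

second_part_net = (second_part_total - tag_map_bits
                   - tag_map_length_field - huffman_rule_bits)
total_net = first_part_net + second_part_net
rate_bpp = total_net / (WIDTH * HEIGHT)

print(second_part_net, total_net, round(rate_bpp, 4))  # 491414 747609 2.8519
```

The same division reproduces the rates quoted for the other images, for example 874,758 / 262,144 ≈ 3.3369 bpp for Splash and 315,512 / 262,144 ≈ 1.2036 bpp for Baboon.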
The net payloads for each of the six test images as shown in
Figure 6 were calculated, and their corresponding maximum embedding rates are listed in
Table 6. From this table, it can be seen that a high embedding rate can be obtained when the original image’s histogram is smoothly distributed. For example, a total of 1,637,384 embeddable bits are vacated in the smooth image Splash. Although 762,626 bits are used to record the entire auxiliary data, there are still 874,758 bits available to record secret data, which yields a high embedding rate of 3.3369 bpp. For a textural image, with little visual uniformity, the total number of embeddable bits is relatively low, and most of them are consumed to save auxiliary data. Using the image Baboon as an example, a total of 1,093,027 bits are vacated from the image, which is clearly fewer than for the image Splash. At the same time, only 315,512 bits remain available for the secret data, so that a relatively low embedding rate of 1.2036 bpp is obtained. Nevertheless, the original image can be recovered without any error, regardless of whether it is smooth or textural. For example, as shown in
Figure 7d, although a total of 693,570 secret bits are embedded in the marked encrypted image and the net embedding rate achieves 2.6458 bpp, the test image Lena can be completely reconstructed as long as the encryption key is accessible.
To demonstrate the performance of the proposed method in terms of maximum embedding rate, extensive experiments were performed on two independent databases, BOSSBase [
40] and BOWS2 [
41]. Each of the BOSSBase and BOWS2 databases contains 10,000 standard test images with a size of
pixels. As shown in
Table 7, for images in BOSSBase and BOWS2, the best embedding rates are 5.9212 bpp and 5.6459 bpp, respectively, while the worst embedding rates are 0.6886 bpp and 0.5348 bpp, respectively. Meanwhile, the average embedding rates are 3.3894 bpp and 3.2824 bpp, respectively. Most importantly, regardless of which image in these databases is tested, it can be reconstructed without errors, i.e., the peak signal-to-noise ratio (PSNR) between the original image and its recovered image tends to infinity, and the maximum structural similarity (SSIM) between these images equals 1.
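The statement that the PSNR tends to infinity follows directly from its definition: for lossless recovery the mean squared error is zero. A minimal sketch, assuming 8-bit images given as flat lists and using the usual definition of PSNR (the function name `psnr` is hypothetical):

```python
import math


def psnr(original, recovered, peak=255):
    """Peak signal-to-noise ratio in dB between two equally long sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(original, recovered)) / len(original)
    if mse == 0:
        return math.inf          # identical images: lossless recovery
    return 10 * math.log10(peak ** 2 / mse)


image = [12, 200, 37, 255, 0, 91]
print(psnr(image, image))        # inf
```

Any single differing pixel makes the result finite, so an infinite PSNR over a whole database is a strict check that every image was recovered bit for bit.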
3.3. Comparison with State-of-the-Art Methods
For the purpose of evaluating the proposed method relative to others, the embedding rates of this method and five state-of-the-art methods [
29,
30,
33,
37,
39], using the six test images shown in
Figure 6, are presented in
Figure 9. In Fu et al.’s method [
29], the block size that affects the embedding capacity was set to
pixels to achieve the maximum embedding rate. Meanwhile, the number of higher significant bit-layers of the image was set to 5, and the relevant threshold was set to 4. Similarly, the block partition in an original image was utilized and the parameters were set as
and
in Wu et al.’s method [
30]. It can be seen from
Figure 9 that in the two methods described above, excluding the image Baboon, the embedding rate of Wu et al.’s method is higher than that of Fu et al.’s. Using the images Lena and Splash to illustrate, the embedding rates of the method described in [
30] are 2.6447 bpp and 2.6515 bpp, respectively, while the embedding rates are 2.4797 bpp and 2.5908 bpp using the method in [
29]. For the method of Yi and Zhou [
33], the block size was also set to
pixels, but the embedding rates of this method on the six test images shown in
Figure 6 are lower than those of the method in [
29]. This is because the number of embeddable blocks is too low, such that no more vacant bits were created for embedding [
33]. In Puteaux and Puech’s method [
37], the secret data is mainly embedded using MSB substitution. Pixels with prediction errors must be marked, which consumes some of the most significant bits. Therefore, the maximum embedding rates of this method on the six test images are less than 1 bpp. However, in Yin et al.’s method [
39], the secret data is embedded using adaptive multi-MSB substitution, and better embedding performance can be obtained. For example, for smooth images such as Splash and Airplane, the highest embedding rate can be achieved, i.e., 3.3369 bpp and 3.0671 bpp, respectively. For the textural image Baboon, no matter which method was used, its embedding rate is the lowest among all test images. Nevertheless, it is obvious from
Figure 9 that regardless of which image is tested, the proposed method can provide higher embedding capacity than other methods.
In
Table 8, results from experiments performed using the proposed method and other methods on all images in the two comparison databases are presented. These results further demonstrate that, among the tested methods, the proposed method can achieve the highest embedding capacity. In detail, the average embedding rates of Fu et al.’s method [
29] for the two databases are 2.1733 bpp and 2.0454 bpp, respectively, while Wu et al.’s [
30] method’s performance was higher by 0.388 bpp and 0.474 bpp, respectively. In Yi and Zhou’s scheme [
33], the average embedding rate using images in BOSSBase is slightly higher than that using the images in BOWS2. In addition, the average embedding rates of Puteaux and Puech’s method [
37] are less than 1 bpp. Yin et al.’s method [
39] takes full advantage of the local correlation in an original image to achieve better performance, where the embedding rates for the two databases are 3.3613 bpp and 3.2455 bpp, respectively. Compared with that method, the proposed method vacates more redundant space and spends fewer bits on auxiliary information. The average embedding rates of the proposed method are 3.3894 bpp and 3.2824 bpp, respectively; thus, the proposed method outperforms the other methods in terms of average embedding rate.