In this section, a lossless data hiding scheme for VQ compressed images using adaptive prediction difference encoding is proposed. We first provide an overview of the proposed scheme. Afterward, we present the detailed procedures of data embedding, data extraction, and cover image recovery. Lastly, an example is illustrated for easier understanding of the proposed scheme.
3.2. Codebook Rearrangement
The original standard codebook is denoted by C = {c_0, c_1, …, c_{N−1}}, where c_k is the k-th codeword for 0 ≤ k ≤ N−1. N is the size of the codebook and is typically a power of 2; here, we set N to 2^t, where t is a positive integer. According to the codebook, the original image with a size of W × H is compressed into an index table X using the VQ compression algorithm. Let x(i, j) denote the value in the index table at the i-th row and j-th column; its range is [0, N−1]. It is the index of the codeword corresponding to the image block located in the i-th row and j-th column. We use an adjusted AIR algorithm to rearrange the codebook. The details of the adjusted AIR are described as follows.
Step 1. Calculate the adjacency matrix A with a size of N × N, where the element a(u, v) denotes the occurrence frequency of adjacency between index u and index v in the index table X. The adjacent directions include the horizontal, vertical, and downward diagonal.
Step 2. Construct a new index list L to record the new order of the codewords. The original index list of the codebook is denoted as P, which is represented as P = (0, 1, …, N−1). Let us assume that a(u, v) is the maximum value of A. Then, L is initialized as (u, v), and u and v are deleted from P. That is, P is updated as P − {u, v}.
Step 3. Perform iterative loops until P is empty. For each iteration, use Equation (3) to search for an index u_l from P that has the maximal occurrence of adjacency to the indices of the left two-thirds of L, and the corresponding maximum value is represented as s_l. Meanwhile, use Equation (4) to search for an index u_r from P that has the maximal occurrence of adjacency to the indices of the right two-thirds of L, and the corresponding maximum value is represented as s_r. If s_r ≥ s_l, then u_r is pushed onto the right of L and P is updated by deleting u_r, that is, P = P − {u_r}; otherwise, u_l is pushed onto the left of L and P should be updated as P = P − {u_l}.
Here, we search for two indices that have the maximum occurrence of adjacency within the left or right two-thirds of the list L. The goal is to select an index with a high occurrence of adjacency to L while determining the side to which the selected index should be added. Under this asymmetric computation, we ensure that the selected index maintains a high correlation with the side it is added to, while the list as a whole also maintains a high correlation.
The final new index list is represented as L = (σ(0), σ(1), …, σ(N−1)), where σ is a one-to-one mapping from {0, 1, …, N−1} to itself. Therefore, the rearranged codebook is denoted by C′ = {c_σ(0), c_σ(1), …, c_σ(N−1)}. According to the rearranged codebook C′, the original image is compressed into a new index table X′ using the VQ compression algorithm.
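As a concrete reading of the rearrangement procedure above, the sketch below implements Steps 1–3 in Python. The function names, the tie-breaking between candidates, and the exact two-thirds windowing are our assumptions where the text leaves details open; Equations (3) and (4) are realized here as simple adjacency sums over the corresponding portion of the list.

```python
from itertools import product

def adjacency_matrix(index_table, N):
    """Count adjacencies (horizontal, vertical, downward diagonal) in the index table."""
    A = [[0] * N for _ in range(N)]
    rows, cols = len(index_table), len(index_table[0])
    for i, j in product(range(rows), range(cols)):
        u = index_table[i][j]
        for di, dj in ((0, 1), (1, 0), (1, 1)):  # the three adjacent directions
            if i + di < rows and j + dj < cols:
                v = index_table[i + di][j + dj]
                A[u][v] += 1
                A[v][u] += 1  # count adjacency symmetrically
    return A

def rearrange(index_table, N):
    """Adjusted AIR sketch: grow the new list L from the most frequent adjacent pair."""
    A = adjacency_matrix(index_table, N)
    u, v = max(((p, q) for p in range(N) for q in range(N) if p != q),
               key=lambda pq: A[pq[0]][pq[1]])
    L = [u, v]
    P = [p for p in range(N) if p not in (u, v)]
    while P:
        k = max(1, (2 * len(L)) // 3)
        left, right = L[:k], L[-k:]          # left / right two-thirds of L
        a = max(P, key=lambda p: sum(A[p][q] for q in left))
        b = max(P, key=lambda p: sum(A[p][q] for q in right))
        if sum(A[b][q] for q in right) >= sum(A[a][q] for q in left):
            L.append(b); P.remove(b)         # right-side candidate wins
        else:
            L.insert(0, a); P.remove(a)      # left-side candidate wins
    return L  # L[k] is the original codeword index placed at new position k
```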
3.3. Prediction Difference Coding and Data Embedding
In the original image, adjacent image blocks exhibit a strong correlation, resulting in a high correlation between their corresponding VQ indices. Because the codebook is rearranged, similar codewords are brought together. Therefore, the VQ index table obtained after VQ compression also exhibits high redundancy, similar to that of the original image. Based on the redundancy of the VQ index table, we apply the improved MED [30] as the predictor to predict the indices and calculate the difference between the real indices and the predicted values. Furthermore, we present a prediction difference coding method to encode the prediction differences, thereby vacating the embedding room in the index table for data hiding.
3.3.1. Prediction Difference Calculation
For the VQ index table X′ of the original image, we calculate the predicted values of the indices in X′ in a raster order and then compute the corresponding prediction differences to obtain a prediction difference table D. Here, x′(i, j) denotes the index located in the i-th row and j-th column, and the predicted value p(i, j) of the index x′(i, j) is calculated by Equation (5), in which the function f is defined by Equation (6), where a, b, and c denote the left, upper, and upper-left neighboring indices of x′(i, j), respectively. Then, the prediction difference d(i, j) between the real index value and the predicted value is calculated using the XOR operation as follows:

d(i, j) = x′(i, j) ⊕ p(i, j).
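The prediction step can be sketched as follows. Since Equations (5) and (6) are not reproduced above, the sketch uses the standard MED predictor as a stand-in for the paper's improved MED; `med_predict` and `prediction_difference` are hypothetical names.

```python
def med_predict(left, up, up_left):
    """Standard MED predictor (a stand-in for the paper's improved MED,
    Equations (5)-(6)): left, up, up_left are the neighbors a, b, c."""
    if up_left >= max(left, up):
        return min(left, up)
    if up_left <= min(left, up):
        return max(left, up)
    return left + up - up_left

def prediction_difference(index, predicted):
    """Equation (7): the prediction difference is the bitwise XOR."""
    return index ^ predicted
```

For example, with left = 18, up = 27, and up-left = 21 the standard MED predicts 18 + 27 − 21 = 24, and the XOR difference between index 21 and prediction 24 is 13.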
3.3.2. Prediction Difference Coding
Based on the prediction difference, we can determine the number of consecutive identical high-significance bits (HSBs) between an original index and its predicted value. The number of consecutive identical HSBs between x′(i, j) and p(i, j) is denoted by n(i, j) and can be calculated as follows:

n(i, j) = t − ⌈log₂(d(i, j) + 1)⌉.

The value range of n(i, j) is from 0 to t. Therefore, n(i, j) can take on t + 1 possible values. To reduce the cost of storing the prediction differences, we apply adaptive binary tree encoding to label the different possible values of n(i, j).
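Counting the consecutive identical HSBs follows directly from the XOR difference: every bit above the highest set bit of the difference agrees between the index and its prediction. A minimal sketch, assuming t-bit indices:

```python
def identical_hsbs(diff, t):
    """Number of consecutive identical high-significance bits between an
    index and its prediction, derived from their XOR difference (cf.
    Equation (8)). t is the bit length of an index (codebook size 2**t)."""
    if diff == 0:
        return t                      # all t bits agree
    return t - diff.bit_length()      # bits above the highest differing bit agree
```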
Let us assume that the total number of types used to label the possible values is k. Figure 3 shows two binary coding trees, which are designed according to whether the number of leaf nodes k is odd or even. A path from the root node to each leaf node represents a unique code that will be used to label a type of possible value. Thus, if k is an odd number, then the k types of codes are {00, 01, 100, 101, …}; if k is an even number, then the k types of codes are {00, 01, 100, 101, …}. The difference between the two cases lies in the bottom portions of the coding trees.
In our scheme, we classify the prediction differences into t types, as shown in Table 1. Type 1 denotes that the prediction difference is 0, meaning that the number of identical HSBs between the original index and its predicted value is t. Type 2 denotes that the number of identical HSBs is t − 1. Type k denotes that the prediction difference is within [2^(k−2), 2^(k−1) − 1], meaning that the number of identical HSBs is t − k + 1, where 3 ≤ k ≤ t − 1. Finally, for the prediction differences greater than 2^(t−2) − 1, we classify them into Type t, where the number of identical HSBs is 0 or 1. To assign the binary tree codes to label the t types of prediction differences, we first define a vacancy capacity v_k for the k-th type as follows:
where f_k represents the occurrence frequency of type k in the entire prediction difference table D. Then, we sort the vacancy capacity sequence (v_1, v_2, …, v_t) in descending order to obtain a sorted sequence (v_{o(1)}, v_{o(2)}, …, v_{o(t)}), where o is a one-to-one mapping from {1, 2, …, t} to itself, and v_{o(i)} ≥ v_{o(j)} when i < j. The designed codes are assigned to the t types of prediction differences as indicators according to the order (o(1), o(2), …, o(t)), which is shown in Table 2.
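The assignment of tree codes by sorted vacancy capacity can be sketched as below. Since Equation (9) is not reproduced above, the capacities are taken as given inputs; the code list corresponds to the even coding tree with eight leaves used later in the example, and `assign_indicators` is a hypothetical name.

```python
# Prefix-free codes read off the even coding tree with eight leaf nodes,
# matching the indicator set used in the Table 3 example (t = 8).
TREE_CODES = ["00", "01", "100", "101", "1100", "1101", "1110", "1111"]

def assign_indicators(capacities):
    """Give the shortest codes to the types with the largest vacancy
    capacities. capacities[k] is the capacity of Type k+1 (values taken
    as given); returns a {type_number: indicator} mapping."""
    order = sorted(range(len(capacities)), key=lambda k: -capacities[k])
    return {k + 1: TREE_CODES[rank] for rank, k in enumerate(order)}
```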
3.3.3. Data Embedding
In the index table X′, each index value is represented by a t-bit binary. According to the prediction method, we can recover an index using the prediction difference. Therefore, we only need to record the prediction differences for each index except for the first one. We first calculate the vacancy capacities of the prediction difference types of the index table and then determine the type order (o(1), o(2), …, o(t)). We can reduce the storage cost of an index table and vacate the embedding room for data hiding using the adaptive prediction difference coding. In the data embedding process, we use indicators to label the types of the prediction differences and embed secret data in the vacated room.
For each index value x′(i, j) except for the first one, the data embedding procedure is presented as follows:
Step 1. Calculate the predicted value p(i, j) of the index value x′(i, j) by Equation (5).
Step 2. Calculate the prediction difference d(i, j) by Equation (7) and then determine n(i, j) by Equation (8), which is the number of consecutive identical HSBs between x′(i, j) and p(i, j).
Step 3. Convert the prediction difference d(i, j) to the corresponding indicator according to the coding rules in Table 2, denoted as c(i, j), with its length denoted as l(i, j).
Step 4. Determine the number of least significant bits (LSBs) of x′(i, j) that must be retained after coding, denoted as r(i, j). Because we can be certain that the (n(i, j) + 1)-th HSB of x′(i, j) differs from that of p(i, j), there is no need to record the (n(i, j) + 1)-th HSB of x′(i, j). Therefore, r(i, j) can be calculated as follows:

r(i, j) = t − n(i, j) − 1.

Retain the r(i, j) LSBs of x′(i, j), denoted as s(i, j).
Step 5. Calculate the number of vacated bits after prediction difference coding, denoted as v(i, j). After coding, the prediction difference is converted into two parts: an indicator and the retained LSBs. Therefore, v(i, j) can be calculated as follows:

v(i, j) = t − l(i, j) − r(i, j).

If v(i, j) is greater than 0, it means that we have v(i, j) bits of room available for embedding secret data. If v(i, j) is less than 0, we record the LSBs of x′(i, j) that cannot be accommodated in the marked index in the auxiliary information sequence for index recovery during the recovery process.
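Steps 4 and 5 amount to simple bit arithmetic. A minimal sketch, assuming r(i, j) = t − n(i, j) − 1 for n(i, j) < t and r(i, j) = 0 when the prediction difference is 0 (`coding_budget` is a hypothetical name):

```python
def coding_budget(n_hsbs, indicator_len, t=8):
    """Per-index bit budget after prediction-difference coding.
    n_hsbs: consecutive identical HSBs; the (n_hsbs + 1)-th HSB is known
    to differ, so only the bits below it must be kept (Equation (10)).
    Returns (retained_lsbs, vacated_bits) per Equations (10) and (11)."""
    retained = 0 if n_hsbs == t else t - n_hsbs - 1
    vacated = t - indicator_len - retained   # negative => auxiliary info needed
    return retained, vacated
```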
Repeat Step 1 to Step 5 until all the indices are processed. The determined indicators of Type 1 to Type t are concatenated into an indicator sequence, denoted as S. We use S to substitute the first L_S bits of the prediction differences, and the original prediction difference bits are recorded in the auxiliary information sequence, where L_S is the length of S; its value depends on whether t is even or odd. Concatenate the auxiliary information sequence with the secret data; all of these data are encrypted with a data-hiding key. Then, the marked index y(i, j) is generated after embedding data into x′(i, j). If v(i, j) > 0, v(i, j) bits of the encrypted secret data are embedded into x′(i, j), denoted as e(i, j). Concatenate the indicator c(i, j), the retained LSBs s(i, j), and the encrypted data e(i, j) to generate the t-bit marked index value y(i, j), that is,

y(i, j) = c(i, j) ‖ s(i, j) ‖ e(i, j).
If v(i, j) = 0, then the t-bit marked index value y(i, j) is generated as follows:

y(i, j) = c(i, j) ‖ s(i, j).

If v(i, j) < 0, the (t − l(i, j)) HSBs of x′(i, j), denoted as h(i, j), are concatenated with the indicator c(i, j) to generate the marked index value y(i, j), that is,

y(i, j) = c(i, j) ‖ h(i, j),

and the remaining l(i, j) LSBs of x′(i, j) are recorded in the auxiliary information sequence.
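Putting the positive-capacity case together, one marked index can be assembled as below; `embed_index` is a hypothetical helper, and the indicator is assumed to have already been looked up from Table 2.

```python
def embed_index(index, predicted, indicator, t, secret_bits):
    """Build one marked index as indicator || retained LSBs || secret bits
    (the v(i, j) >= 0 case). secret_bits is a bit string like '101'."""
    diff = index ^ predicted
    n = t if diff == 0 else t - diff.bit_length()   # identical HSBs
    retained = 0 if n == t else t - n - 1           # Equation (10)
    vacated = t - len(indicator) - retained         # Equation (11)
    assert vacated >= 0 and len(secret_bits) == vacated
    lsbs = format(index, f"0{t}b")[t - retained:] if retained else ""
    return int(indicator + lsbs + secret_bits, 2)
```

For instance, embedding the secret bit '1' into index 21 with prediction 24 and indicator '1100' yields the 8-bit marked index 11001011.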
Once all the encrypted secret data are embedded into the index table, the marked index table Y is obtained. Then, Y and the rearranged codebook C′ constitute the marked VQ compression stream, which is then transmitted to the receiver or stored.
3.4. Data Extraction and Cover Image Recovery
After receiving the marked VQ compressed stream, the receiver can obtain the marked index table Y and the rearranged codebook C′. The secret data can then be extracted and the original VQ index table recovered with the shared data-hiding key.
According to the marked index table and the codebook, the size of the codebook, and hence the value of t, can be obtained. The length L_S of the indicator sequence S can be calculated according to t, and then the indicator sequence S can be extracted from the L_S bits following the first index. The prediction difference coding rule table, as shown in Table 2, can be constructed afterward. The data extraction and index recovery procedure for each marked index y(i, j) after the first one is described as follows:
Step 1. Convert y(i, j) into a t-bit binary. According to the coding tree, separate the indicator c(i, j) from the t-bit binary starting from the most significant bit. The length of c(i, j) is represented as l(i, j).
Step 2. According to the prediction difference coding rule table, obtain the number of consecutive identical HSBs n(i, j) based on the indicator c(i, j).
Step 3. Calculate r(i, j), the length of the retained LSBs of x′(i, j), using Equation (10). Then, calculate v(i, j), the number of vacated bits after prediction difference coding, using Equation (11).
Step 4. If v(i, j) > 0, we extract v(i, j) bits of the message from the LSBs of y(i, j), denoted as e(i, j). If v(i, j) ≤ 0, there are no secret bits embedded in y(i, j).
When all the embedded data are extracted, we decrypt them with the data-hiding key and split them into two parts: the secret message and the auxiliary information sequence. Next, we recover the first L_S prediction difference bits from the top L_S bits of the auxiliary information sequence. After that, we delete the top L_S bits from the auxiliary information sequence.
We recover the index table in a raster scan order starting from the second index. To recover the index x′(i, j), we calculate the predicted index p(i, j) using Equation (5) and then follow one of the three cases below:
Case 1: If n(i, j) = t, then x′(i, j) = p(i, j).
Case 2: If 1 < n(i, j) < t, then set the first n(i, j) HSBs of x′(i, j) to be the same as those of p(i, j) and set the (n(i, j) + 1)-th HSB of x′(i, j) to be different from that of p(i, j). Next, if v(i, j) ≥ 0, then the r(i, j) LSBs of x′(i, j) are set to the r(i, j) bits following c(i, j) in y(i, j); if v(i, j) < 0, then the lowest LSBs of x′(i, j) are set to the top bits of the auxiliary information sequence, and the middle bits of x′(i, j) are set to the retained bits of y(i, j).
Case 3: If the indicator corresponds to Type t, then set the (t − l(i, j)) HSBs of x′(i, j) to be the remaining (t − l(i, j)) bits of y(i, j) after the indicator, and set the l(i, j) LSBs of x′(i, j) to be the top l(i, j) bits of the auxiliary information sequence.
In the latter two cases, we delete the corresponding top bits from the auxiliary information sequence before moving on to recover the next index.
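Case 2 with non-negative capacity can be sketched as below. The indicator-to-HSB-count table assumes the identity type order used in the later example (Table 3); Type t ('1111') is omitted because recovering it also needs the auxiliary information sequence. `recover_index` is a hypothetical name.

```python
def recover_index(marked, predicted, indicator, t=8):
    """Recover one index (Cases 1 and 2, non-negative capacity): copy n
    HSBs from the prediction, flip the (n + 1)-th bit, and read the
    retained LSBs back out of the marked index."""
    # Indicator -> identical-HSB count, assuming the identity type order
    # of the Table 3 example with t = 8; Type 8 ('1111') is not handled.
    table = {"00": 8, "01": 7, "100": 6, "101": 5,
             "1100": 4, "1101": 3, "1110": 2}
    n = table[indicator]
    if n == t:
        return predicted                     # Case 1: difference is 0
    pred_bits = format(predicted, f"0{t}b")
    flipped = "1" if pred_bits[n] == "0" else "0"
    retained = t - n - 1
    marked_bits = format(marked, f"0{t}b")
    lsbs = marked_bits[len(indicator):len(indicator) + retained]
    return int(pred_bits[:n] + flipped + lsbs, 2)
```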
3.5. Example Illustration
In this subsection, we present an example to illustrate the process of data embedding, data extraction, and index recovery. Let us assume that the length of the codebook is 256, that is, t = 8. The prediction differences are classified into 8 types according to Table 1. Based on the coding tree with eight leaf nodes and the vacancy capacities of the prediction difference types, we can generate the coding rules for the prediction differences. We assume that the order (o(1), o(2), …, o(8)) is (1, 2, 3, 4, 5, 6, 7, 8), and the corresponding coding rules are as shown in Table 3. We use the set of indicators {00, 01, 100, 101, 1100, 1101, 1110, 1111} to label Type 1 through Type 8. For Type 8, we use '1111' to indicate the case where the number of consecutive identical HSBs is either 0 or 1. Table 3 also lists the corresponding numbers of retained LSBs and vacated bits after prediction difference encoding. We illustrate the secret data embedding only; the indicator sequence and the auxiliary information sequence are excluded.
Assuming the index table X′ is (21, 27, 27, 30; 18, 21, 21, 31; 18, 21, 31, 30; 20, 21, 29, 29), the procedure of data embedding is shown in Figure 4. The predicted index table is (21, 21, 27, 27; 21, 24, 21, 30; 18, 21, 31, 31; 18, 20, 31, 31), where the first index remains unchanged. We calculate the prediction difference table D by Equation (7) and then calculate the numbers of consecutive identical HSBs between the original index values and the predicted indices by Equation (8); the results are shown in N. Based on the numbers of identical HSBs, we determine the corresponding indicators, the numbers of retained LSBs, and the numbers of vacated bits after prediction difference encoding, which are shown in C, R, and V, respectively. Because each v(i, j) ≥ 0, each marked index y(i, j) consists of the indicator c(i, j), the r(i, j) LSBs of x′(i, j), and v(i, j) bits of the encrypted secret message. The final marked index value table Y is shown in Figure 4.
The illustration of the data extraction and the index recovery is shown in Figure 5. The marked indices are converted into 8-bit binaries. Assume that the indicator sequence has been extracted. We can obtain the indicator of each index and then determine the number of retained LSBs in each index, which are shown in C and R, respectively. Based on C and R, we can calculate the number of secret data bits embedded in each marked index, as shown in V. According to V, we extract the embedded message from the corresponding LSBs of the marked indices. After the data extraction, we can recover the original indices in a raster scan order. Let us assume that the indices before x′(2, 2) have been recovered. To recover x′(2, 2), we first calculate its predicted index p(2, 2), which is 24. We convert y(2, 2) and p(2, 2) into 8-bit binaries, respectively. According to the indicator sequence, we know that the indicator c(2, 2) = 1100. Then, based on the coding rules, we obtain n(2, 2) = 4 and r(2, 2) = 3. Because n(2, 2) = 4, the 4 HSBs of x′(2, 2) are the same as those of p(2, 2), and the 5th HSB differs from the 5th HSB of p(2, 2). Since r(2, 2) = 3, we can determine that the 3 LSBs of x′(2, 2) are the 3 bits following c(2, 2) in y(2, 2). Therefore, x′(2, 2) = 21. Next, we can move on to recover the next index until we reach x′(4, 4), which is the final index.
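The worked example above can be checked end to end with a few lines of bit arithmetic. The single secret bit '1' is our assumption, as the secret data of Figure 4 are not reproduced here:

```python
# Round-trip check of the worked example: x'(2,2) = 21, predicted p(2,2) = 24, t = 8.
t, x, p = 8, 21, 24
d = x ^ p                                  # Equation (7): 21 XOR 24 = 13
n = t - d.bit_length()                     # 4 consecutive identical HSBs -> Type 5
indicator = "1100"                         # Type 5 indicator from Table 3
r = t - n - 1                              # 3 retained LSBs
secret = "1"                               # t - len(indicator) - r = 1 vacated bit
marked = indicator + format(x, "08b")[-r:] + secret

# Recovery: copy n HSBs of p, flip the (n + 1)-th bit, read the LSBs back.
pb = format(p, "08b")
recovered = int(pb[:n] + ("1" if pb[n] == "0" else "0")
                + marked[len(indicator):len(indicator) + r], 2)
extracted = marked[len(indicator) + r:]    # the embedded secret bit
```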