*Article* **Reversible Data Hiding Using Inter-Component Prediction in Multiview Video Plus Depth**

**Jin Young Lee 1,†, Cheonshik Kim 2,\*,† and Ching-Nung Yang 3,†**


Received: 5 April 2019; Accepted: 22 April 2019; Published: 9 May 2019

**Abstract:** With the advent of 3D video compression and Internet technology, 3D videos have been deployed worldwide. Data hiding is a branch of watermarking technology with many capabilities. In this paper, we use 3D video as a cover medium for secret communication using reversible data hiding (RDH). RDH is advantageous because the cover image can be completely recovered after extraction of the hidden data. Recently, Chung et al. introduced an RDH scheme for depth maps using prediction-error expansion (PEE) and rhombus prediction for marking 3D videos. Chung et al.'s method is efficient, but it does not fully exploit the available pixel resources to maximize embedding capacity. In this paper, we improve the embedding capacity using PEE, inter-component prediction, and allowable pixel ranges. Inter-component prediction exploits the strong correlation between the texture image and the depth map in multiview video plus depth (MVD). Moreover, the proposed scheme can control the quality of the depth map with a simple formula. Experimental results demonstrate that the proposed method is more efficient than existing RDH methods in terms of capacity.

**Keywords:** 3D; depth map; inter-component prediction; MVD; reversible data hiding; texture

### **1. Introduction**

Data hiding (DH) [1] plays an important role in secret communication. For this purpose, secret information and metadata are embedded in cover media, such as still images, video, audio, 3D video, and so on. The visual quality and capacity of the cover image are important criteria for DH schemes. In addition, reversible DH (RDH) techniques have been developed to extract the embedded secret information and losslessly restore the cover image.

To date, various RDH algorithms have been proposed, e.g., difference expansion-based algorithms [2–6], histogram shifting [7–10], prediction-error expansion (PEE) [11–14], and integer-to-integer transforms [15–17].

Difference expansion (DE) approaches show good performance in terms of capacity. The algorithm was first introduced by Tian and has been extended by [3,4]. For data embedding, DE makes room for a secret bit by expanding a pixel difference and inserts the bit therein. Alattar [3] improved Tian's work by generalizing the DE technique to arbitrary integer transforms. In the method proposed by Sachnev et al., image pixels are separated into the black and white squares of a chessboard, forming two identical, diagonally connected sets. This prediction method, called rhombus prediction, outperforms existing predictors (e.g., the median edge detector (MED) and the gradient-adjusted predictor (GAP)) by using the average value of the four adjacent neighbors of a pixel as its predicted value. Thereafter, various methods for improving prediction performance were introduced.
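Tian's difference expansion on a single pixel pair can be sketched as follows; the function names and the integer rounding convention are illustrative, not the paper's notation:

```python
def de_embed(x, y, bit):
    """Embed one bit into the pixel pair (x, y); returns the marked pair."""
    l = (x + y) // 2          # integer average (preserved by embedding)
    h = x - y                 # difference
    h_marked = 2 * h + bit    # expand the difference and append the bit
    x_m = l + (h_marked + 1) // 2
    y_m = l - h_marked // 2
    return x_m, y_m

def de_extract(x_m, y_m):
    """Recover the bit and the original pair from a marked pair."""
    l = (x_m + y_m) // 2
    h_marked = x_m - y_m
    bit = h_marked & 1        # the appended bit is the LSB of the expanded difference
    h = h_marked >> 1         # floor division restores the original difference
    x = l + (h + 1) // 2
    y = l - h // 2
    return bit, x, y
```

Because the integer average `l` is unchanged by the expansion, the pair (and hence the cover image) is restored exactly after extraction; a real implementation must also skip pairs whose expanded values would overflow the pixel range.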

The histogram shifting (HS) technique is also known as a method that introduces relatively little distortion into the cover image. However, it requires a location map to embed data and restore the cover image. Al-Qershi and Khoo proposed a two-dimensional DE (2D-DE) scheme achieving about 1 bpp [6]. Histogram-based schemes may achieve good visual quality and adequate embedding capacity, but they have the drawback of having to send the pairs of peak and zero points to the receiver.
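A minimal sketch of classical histogram shifting, assuming an 8-bit image with an empty bin to the right of the histogram peak, and assuming the receiver knows the peak, the zero point, and the payload length (exactly the side information the text mentions):

```python
import numpy as np

def hs_embed(img, bits):
    """Shift the bins between the peak and a zero bin right of it,
    then embed one bit per peak-valued pixel in scan order."""
    hist = np.bincount(img.ravel(), minlength=256)
    peak = int(hist.argmax())
    zero = peak + 1 + int(hist[peak + 1:].argmin())  # (ideally empty) zero bin
    flat = img.ravel().astype(np.int32).copy()
    flat[(flat > peak) & (flat < zero)] += 1         # make room next to the peak
    k = 0
    for i in range(flat.size):
        if k < len(bits) and flat[i] == peak:
            flat[i] += bits[k]                       # peak -> bit 0, peak+1 -> bit 1
            k += 1
    return flat.reshape(img.shape), peak, zero

def hs_extract(marked, peak, zero, nbits):
    """Recover the bits and the original image from a marked image."""
    flat = marked.ravel().astype(np.int32).copy()
    bits = []
    for i in range(flat.size):
        if len(bits) < nbits and flat[i] in (peak, peak + 1):
            bits.append(int(flat[i]) - peak)
            flat[i] = peak                           # restore embedded pixels
    flat[(flat > peak + 1) & (flat <= zero)] -= 1    # undo the shift
    return bits, flat.reshape(marked.shape)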

PEE obtains the prediction error (PE) from the neighborhood of a pixel and embeds information bits into the expanded errors. If the difference between an original pixel and its predicted value is large, the distortion of the cover image grows considerably during embedding. In this case, concentrating the embedding in low-frequency (smooth) regions of the cover image helps maintain both the image quality and the embedding capacity. It is well known that PEE performs better than DE- and HS-based methods. Among existing PEE methods, DH with order prediction yields less distortion at low embedding rates.
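The per-pixel PEE step, combined with the rhombus predictor described above, can be sketched as follows; the threshold `T` and the function names are illustrative choices, not the paper's notation:

```python
import numpy as np

def rhombus_predict(img, i, j):
    """Predict pixel (i, j) as the rounded mean of its four rhombus neighbors."""
    return int(round((int(img[i - 1, j]) + int(img[i + 1, j])
                      + int(img[i, j - 1]) + int(img[i, j + 1])) / 4))

def pee_embed_pixel(x, pred, bit, T=1):
    """Expand small prediction errors (-T <= e < T) to carry one bit;
    shift the remaining errors away so embedding stays reversible."""
    e = x - pred
    if -T <= e < T:
        return pred + 2 * e + bit, True      # expandable: carries the bit
    return ((x + T) if e >= T else (x - T)), False

def pee_extract_pixel(x_m, pred, T=1):
    """Recover the bit (or None) and the original pixel from a marked pixel."""
    e = x_m - pred
    if -2 * T <= e < 2 * T:                  # expanded error: carried a bit
        return e & 1, pred + (e >> 1)
    return None, ((x_m - T) if e >= 2 * T else (x_m + T))
```

Expanded errors land in [-2T, 2T-1] and shifted errors land outside it, so the decoder can tell the two cases apart from the marked error alone; small errors (smooth regions) carry the payload, which is why PEE concentrates embedding in low-frequency regions.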

Meanwhile, with the rapid development of multiview video technologies, viewers can experience more realistic 3D scenes with highly advanced multimedia systems, such as 3D television and free-viewpoint television. To overcome limited bandwidth, multiview video plus depth (MVD) has been adopted as a 3D video format [18,19]. In MVD (see Figure 1), a texture image indicates the intensities of an object, whereas a depth map represents the distance between an object and the camera as a grayscale image with values between 0 and 255. Because MVD enables an advanced video system to arbitrarily generate virtual views using a depth image-based rendering (DIBR) method [20], only a small number of views needs to be transmitted.

**Figure 1.** MVD consisting of a texture image (**left**) and its corresponding depth map (**right**).

Until now, various watermarking technologies [21–25] have been introduced for marking 3D videos. Asikuzzaman and Pickering [21] proposed a digital watermarking approach that inserts a watermark into the dual-tree complex wavelet transform (DT-CWT) coefficients. Pei and Wang [22] introduced a 3D watermarking technique based on the D-NOSE model, which can detect suitable regions of the depth image for watermark embedding. Since view synthesis is very sensitive to variations in depth values, this scheme focuses mainly on the synthesis error. Wang et al. [23] exploited scale-invariant feature transform (SIFT)-based feature points to synchronize a watermark, but focused only on signal processing and did not consider geometric attacks.

Based on MVD, Chung et al. [26] and Shi et al. [27] proposed RDH schemes for depth maps using a depth no-synthesis-error (D-NOSE) model [28] and PEE. Each pixel in a depth map has an allowable range, and its value may be increased or decreased within that range. The D-NOSE model guarantees that no errors appear in the synthesized image as long as the marked pixels remain within their allowable ranges; this characteristic can be exploited to improve the embedding capacity of RDH. Chung et al. first proposed a PEE-based method that effectively hides data in depth maps of 3D images, but its drawback is that it does not provide sufficient embedding capacity. Shi et al. proposed to use the full allowable range of each pixel to address this problem; however, they did not suggest a systematic way of adjusting the embedding capacity while considering the quality of the depth map.

In this paper, we first analyze the disadvantages of Chung et al.'s reversible data hiding algorithm and propose a PEE-based DH technique that fully uses the allowable range of each pixel via an inter-component prediction method. The performance of the proposed RDH is improved by exploiting the correlation between the texture image and the depth map, which is the advantage of inter-component prediction. We also propose a method to systematically control the embedding rate and image quality, which may be applied to various RDH applications such as medical or military fields.

The remainder of this paper is organized as follows. Section 2 briefly discusses view synthesis, difference expansion, and related RDH and watermarking methods. In Section 3, we introduce a reversible data hiding method based on the D-NOSE model, PEE, and inter-component prediction. In Section 4, we compare and analyze the experimental results of conventional RDH methods and the proposed RDH. Finally, Section 5 concludes this paper.

### **2. Related Works**

This section explains the 3D view synthesis principle, the difference expansion (DE) method, and the RDH methods of Chung et al. [26] and Shi et al. [27] based on 3D view synthesis. In addition, the digital watermarking methods of Zhu et al. [24], Wang et al. [23], and Asikuzzaman et al. [25] are also introduced.

### *2.1. View Synthesis*

We can obtain a 3D version of classical 2D videos with depth information via 3D view synthesis. Depth information plays a key role in synthesizing virtual views, and the quality of the synthesized views is critical in 3D video systems. In view synthesis, a pixel in the texture image is mapped to a new position in the virtual view using the corresponding depth value. First, the disparity *di* of a pixel in the depth map is obtained using the following equation:

$$d_{i} = \frac{f \cdot l}{255} \left( \frac{1}{z_{near}} - \frac{1}{z_{far}} \right) \times q_{i}. \tag{1}$$

where *f* and *l* denote the focal length and the baseline distance between two horizontally adjacent cameras, respectively, and *znear* and *zfar* denote the nearest and farthest depth values, respectively. The pixel *qi* indicates the *i*th depth pixel value. After the disparity *di* is calculated, it is rounded to an integer value. Then, the pixel (*x*', *y*') is filled by shifting the pixel (*x*, *y*) by *d* (see Figure 2).

$$
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} x \pm d \\ y \end{pmatrix} \tag{2}
$$

**Figure 2.** View synthesis based on disparity information.
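Equations (1) and (2) can be sketched directly in code; the camera parameters below are example values (they are not given in the text) and the function names are illustrative:

```python
# Example camera parameters (assumptions, not values from the paper).
F, L = 1732.875, 5.0          # focal length f and camera baseline l
Z_NEAR, Z_FAR = 44.0, 120.0   # nearest and farthest scene depths

def disparity(q):
    """Equation (1): disparity of a depth pixel value q in [0, 255],
    rounded to an integer before warping."""
    d = (F * L / 255.0) * (1.0 / Z_NEAR - 1.0 / Z_FAR) * q
    return int(round(d))

def warp(x, y, q, sign=+1):
    """Equation (2): shift pixel (x, y) horizontally by the disparity
    (the sign depends on whether the virtual view is left or right)."""
    return x + sign * disparity(q), y
```

Note that disparity grows monotonically with the depth value *q*, so nearby objects (large *q*) shift farther between views than distant ones.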

Based on the D-NOSE model [28], the symbols *L* and *U* indicate the minimum and maximum depth values within the allowable distortion range. If the marked pixel *q'i* is still in the range [*L*, *U*] after hiding data in *qi*, the virtual view will not be distorted. The notation *ϕn* is the set collecting the depth pixel values with disparity *di* = *n*, and *N* denotes the number of pixels (width × height) in the depth map.

For example, assume that for a disparity *n* = 32 the minimum and maximum pixel values belonging to *ϕn* are [*Lq*, *Uq*] = [21, 24]. This means that four pixel values, 21, ..., 24, correspond to the disparity *di* = 32. The way to compute the allowable range for each pixel of the depth map is shown stepwise in Algorithm 1.

$$\varphi\_n = \\{q\_i \mid d\_i = n\\}\_{i=1}^{N} \tag{3}$$
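The grouping in Equation (3) and the resulting [*L*, *U*] bounds can be sketched as follows. The toy disparity function below, which groups every four consecutive depth values, is a stand-in for Equation (1) chosen to mirror the four-pixel example above; it is not the paper's actual mapping:

```python
def allowable_ranges(disparity_fn):
    """For each disparity n, collect the set of depth values that map to n
    (Equation (3)) and return its [L, U] bounds. Any pixel whose marked
    value stays inside the bounds of its own set keeps its disparity, so
    the synthesized view is unchanged (the D-NOSE idea)."""
    ranges = {}
    for q in range(256):                # all possible 8-bit depth values
        n = disparity_fn(q)
        lo, hi = ranges.get(n, (q, q))
        ranges[n] = (min(lo, q), max(hi, q))
    return ranges

# Toy disparity grouping every four consecutive depth values (assumption).
toy = allowable_ranges(lambda q: q // 4)
```

With this toy mapping, a depth pixel with value 22 belongs to the set with bounds [20, 23] and may take any value in that interval without altering the rounded disparity, which is exactly the slack the RDH schemes exploit for embedding.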
