**Algorithm 1** DRange(*q*)

```
Input: input pixel q
Output: [L, U]
 1: d ← depth2disparity(q)              // Equation (1)
 2: for i = q − 1 to 0 step −1 do
 3:     if depth2disparity(i) ≠ d then break
 4:     end if
 5: end for
 6: L ← i + 1
 7: for i = q + 1 to 255 step 1 do      // 8-bit depth pixel
 8:     if depth2disparity(i) ≠ d then break
 9:     end if
10: end for
11: U ← i − 1
12: return [L, U]
```
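To make the range computation concrete, the following Python sketch mirrors Algorithm 1. The function `depth2disparity` stands in for the quantization of Equation (1); the camera constants `F_B`, `Z_NEAR`, and `Z_FAR` are placeholder assumptions, not values from the paper.

```python
# A minimal sketch of Algorithm 1 (DRange). depth2disparity() is a stand-in
# for Equation (1); F_B, Z_NEAR, and Z_FAR are assumed placeholder constants.
F_B, Z_NEAR, Z_FAR = 1000.0, 40.0, 120.0

def depth2disparity(v):
    """Assumed quantization: map an 8-bit depth value to an integer disparity."""
    z_inv = (v / 255.0) * (1.0 / Z_NEAR - 1.0 / Z_FAR) + 1.0 / Z_FAR
    return round(F_B * z_inv)

def drange(q):
    """Widest interval [L, U] around q whose values share q's disparity."""
    d = depth2disparity(q)
    L = q
    while L > 0 and depth2disparity(L - 1) == d:
        L -= 1
    U = q
    while U < 255 and depth2disparity(U + 1) == d:
        U += 1
    return L, U

print(drange(128))  # the interval of depth values sharing one disparity
```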
### *2.2. Difference Expansion*

In this section, we describe the concept of RDH using pixel prediction and DE, where $p$ and $\hat{p}$ denote an original pixel and a predicted pixel, respectively. A prediction error is determined by $e_{i,j} = p_{i,j} - \hat{p}_{i,j}$. If $|e_{i,j}| < T$ and neither overflow nor underflow occurs on the pixel, a secret bit $b$ may be embedded into the pixel $p$ as $p'_{i,j} = p_{i,j} + e_{i,j} + b$. If $|e_{i,j}| \ge T$, it is not appropriate to embed secret bits, because the carrier pixel $p$ would have a higher prediction error than the other embedded pixels. Such a pixel is modified as follows:

$$p'_{i,j} = \begin{cases} p_{i,j} + T, & \text{if } (e_{i,j} \ge T) \\ p_{i,j} - (T - 1), & \text{if } (e_{i,j} \le -T) \end{cases}$$

The locations of overflow or underflow pixels are recorded in the location map, which is used for decoding. If the predicted values are the same before and after data hiding, the reversibility of the watermarking is guaranteed. The shifted values must be restored by $T$ (or $T - 1$) before obtaining the predicted value; then, the predicted values can be obtained correctly.
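The following sketch illustrates the expansion and shift operations above, assuming a precomputed prediction `p_hat`; overflow/underflow handling and the location map are omitted for brevity.

```python
# A minimal sketch of DE embedding/extraction with threshold T, assuming a
# precomputed prediction p_hat; overflow/underflow handling and the location
# map are omitted.
def de_embed(p, p_hat, bit, T=2):
    """Embed one bit if the error is expandable; otherwise shift the pixel."""
    e = p - p_hat
    if abs(e) < T:
        return p + e + bit           # expandable: p' = p + e + b
    return p + T if e >= T else p - (T - 1)      # shifted (the two cases above)

def de_extract(p_marked, p_hat, T=2):
    """Return (bit or None, original pixel) from a marked pixel."""
    e_m = p_marked - p_hat           # e' = 2e + b for expandable pixels
    if -2 * T + 1 < e_m < 2 * T:     # expandable range [-2T + 2, 2T - 1]
        bit = e_m % 2
        return bit, p_hat + (e_m - bit) // 2
    if e_m >= 2 * T:                 # was shifted by +T
        return None, p_marked - T
    return None, p_marked + (T - 1)  # was shifted by -(T - 1)

# Round trip: p = 100, p_hat = 99 (e = 1 < T), hide bit 1.
marked = de_embed(100, 99, 1)        # -> 102
print(de_extract(102, 99))           # -> (1, 100)
```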

### *2.3. Chung et al.'s Reversible Data Hiding*

The D-NOSE model can guarantee zero synthesis distortion by determining an allowable distortion range for each depth pixel. The two previous methods [26,27] use rhombus prediction to obtain the predicted pixel $\hat{p} = \lceil (p_1 + p_2 + p_3 + p_4)/4 \rceil$, and the prediction error $e = p - \hat{p}$ is the difference between the original pixel $p$ and the predicted depth pixel $\hat{p}$. Here, $p_1$, $p_2$, $p_3$, and $p_4$ denote the four adjacent pixels placed on the top, left, bottom, and right sides of $p$, respectively; the average value is rounded up in the calculation. The number of hidden bits $m$ for a pixel $p$ is $m = \lfloor \log_2(U - \hat{p}_i + 1) \rfloor$ when $e = 0$ and $m = \lfloor \log_2(\hat{p}_i - L + 1) \rfloor - 1$ when $e = -1$. Otherwise, $m = 0$.

The marked pixel $p'$ is obtained from $p' = \hat{p} + b$ when $e = 0$ and $p' = \hat{p} - b - 1$ when $e = -1$. Here, $b$ denotes a binary number of $m$ bits, and the allowable range is $[L, U]$. On the receiving side, the secret bits $b$ can simply be obtained by the expression $b = (e' \bmod 2^m)$, where $e' = p' - \hat{p}$.

The location map in Chung et al.'s method is simple: mark "1" if there is no hidden data at the position and "0" otherwise. If the surface of the image has uniform color, most of the location map may be zeros, since most prediction errors will be $e = 0$. For this reason, Chung et al. used arithmetic coding to compress the location map and then hid it at the front or back end of the depth map. Chung et al.'s method achieves the purpose of RDH for 3D synthesized images. Unfortunately, it does not embed a sufficient amount of data.

There is a weakness in Chung et al.'s method. For example, if a pixel $p$ and an error $e$ are 85 and 0, respectively, and the allowable range of the pixel $p$ is $[83, 87]$, then only 1 bit may be hidden in the pixel $p$, because $m = \lfloor \log_2(87 - 85 + 1) \rfloor = 1$. If the pixel is $p = 87$ and $e = 0$, then no bits can be embedded at all. That is because the room for hiding bits is determined by the position of the predicted pixel within the allowable range; thus, there is no room to hide a bit when the pixel is $p = 87$. Another weakness is that the quality of the depth map is not considered sufficiently, because the quality depends only on the features of the depth map.

### *2.4. Shi et al.'s Reversible Data Hiding*

Shi et al. [27] proposed an RDH method based on the D-NOSE model, which embeds information into a double layer of depth maps. Here, the prediction for PEE obtains $\hat{p}$ by rhombus prediction. In order to use the allowable range fully, the data are embedded into pixels whose prediction error ($e = p - \hat{p}$) is 0, and the pixel is expanded toward either the maximum or the minimum value. Within the allowable range $[l_n, u_n]$ of the disparity $d_i = n$ of the pixel, the number of bits for the pixel $q_i$ is $m_i = \lfloor \log_2(u_n^* - l_n^* + 1) \rfloor$, where $\hat{q}_i \le u_n^* \le u_n$ and $l_n \le l_n^* \le \hat{q}_i$, when $e_i = 0$; otherwise, $m_i = 0$. The marked pixel $q'_i$ may be expressed as $q'_i = l_n^* + b$, where $b \in \{0, 1, 2, \ldots, 2^{m_i} - 1\}$. For example, if a pixel $p$ and an error $e$ are 85 and 0, respectively, and the allowable range of the pixel $p$ is $[83, 87]$, then $m = \lfloor \log_2(87 - 83 + 1) \rfloor = \lfloor \log_2 5 \rfloor = 2$ bits may be hidden.
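To make the contrast with Chung et al.'s method concrete, the following sketch (a hypothetical re-implementation of the two capacity rules, not the authors' code) computes the embeddable bit counts for the worked example above.

```python
import math

# Contrasting the per-pixel capacity rules of Chung et al. and Shi et al.
# on the example above: allowable range [83, 87], prediction 85, error 0.
def capacity_chung(p_hat, L, U, e):
    """Chung et al.: capacity limited by p_hat's position inside [L, U]."""
    if e == 0:
        return int(math.floor(math.log2(U - p_hat + 1)))
    if e == -1:
        return max(int(math.floor(math.log2(p_hat - L + 1))) - 1, 0)
    return 0

def capacity_shi(L, U, e):
    """Shi et al.: the full allowable range [L, U] is usable when e = 0."""
    return int(math.floor(math.log2(U - L + 1))) if e == 0 else 0

print(capacity_chung(85, 83, 87, 0))  # -> 1 bit
print(capacity_shi(83, 87, 0))        # -> 2 bits
```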

### *2.5. Zhu et al.'s Digital Watermarking*

Zhu et al. [24] proposed a watermarking method for new viewpoint video frames generated by DIBR (Depth Image-Based Rendering). To preserve the watermark information during the generation of the viewpoint video frame, blocks in the foreground objects of the original video frame are selected for embedding, because pixels in such objects are more likely to be preserved during warping. For watermarking, a DCT transform of the foreground blocks is first performed; then, after embedding the watermark into the DCT coefficients, an IDCT is performed before DIBR.

### *2.6. Wang et al.'s Digital Watermarking*

Wang et al. [23] proposed a novel watermarking method for DIBR 3D images that uses SIFT to select the areas where the watermark should be embedded. The watermark information is then embedded into the DCT coefficients of the selected areas: an $n \times n$ 2D-DCT is applied to each selected area, and a spread-spectrum technique with orthogonal spread-spectrum codes is used to embed the watermark. To extract the watermark from an image, the correlation between the DCT coefficients of every area and the spread-spectrum code is computed to estimate the embedded message.

### *2.7. Asikuzzaman et al.'s Digital Watermarking*

Asikuzzaman et al. [25] proposed a blind video watermarking algorithm in which a watermark is embedded into the two chrominance channels using the dual-tree complex wavelet transform. Embedding in the chrominance channels preserves the original video quality, and the dual-tree complex wavelet transform ensures robustness against geometric attacks owing to its shift-invariant nature. The watermark can be extracted from a single frame without the original frame. This approach is also robust to downscaling to arbitrary resolutions, aspect-ratio change, compression, and camcording.

### **3. Proposed Scheme**

In this section, we introduce RDH based on the D-NOSE model, using PEE and inter-component prediction to improve the accuracy of the predicted pixel. The accuracy of the prediction maximizes the performance of the proposed RDH based on the D-NOSE model, which uses the maximum allowable pixel range of each pixel. Moreover, our proposed scheme is capable of controlling the quality and embedding capacity of the depth map.

RDH methods based on 2D texture images usually have a trade-off between capacity and the quality of the cover signal. However, since the depth map is non-visual data used to synthesize virtual views, we may embed secret data into the depth map without degrading the quality of the virtual view synthesis. The D-NOSE model can guarantee zero synthesis distortion by determining an allowable distortion range for each depth pixel. Moreover, since we apply the existing PEE technique to the 3D depth map and use a location map, our proposed method also restores the original depth map perfectly.

### *3.1. Inter-Component Prediction*

Here, we introduce inter-component prediction to obtain a better prediction for high embedding capacity. Figure 3 illustrates the configuration of the depth pixels and the inter-component prediction based on the corresponding texture pixels. The depth map in Figure 3a is composed of two types of marked pixels, which subdivide the map into two interleaved sets, $\Phi_1$ and $\Phi_2$. The inter-component prediction (Figure 3b) adaptively selects the prediction direction for a depth pixel by comparing the corresponding texture pixel with its adjacent pixels. When the depth pixel $q_{i,j}$ is predicted from its neighbor pixels $q_1$, $q_2$, $q_3$, and $q_4$, as shown in Figure 3b, the corresponding texture pixel $t_{i,j}$ is compared with $t_1$, $t_2$, $t_3$, and $t_4$ as follows:

$$dirD = \underset{dirT}{\arg\min}\left(|t_1 - t|, |t_2 - t|, |t_3 - t|, |t_4 - t|\right), \tag{4}$$

where $dirT$ and $dirD$ denote the prediction direction in the texture image and the depth map, respectively. For example, if the pixel having the minimum texture difference is $t_1$, then $q_{dirD}$ is set to $q_1$. The existing rhombus prediction is an efficient method for color and grayscale images; however, it is less effective than the proposed method for depth map pixels. That is why we propose predicting the depth map pixel by considering the texture image. Since the texture image and the depth map are aligned for 3D synthesis, inter-component prediction naturally exploits this correlation.

The prediction is performed alternately for the two sets, $\Phi_1$ and $\Phi_2$: $\Phi_2$ is used to predict $\Phi_1$, and vice versa. When predicting $\Phi_2$, the prediction may be inaccurate because $\Phi_1$ already contains marked pixels, whose correlation with the associated pixels is low. To tackle this, we use the average of the unmarked pixels in $\Phi_1$. A minimal code sketch of the direction selection is given after Figure 3.

**Figure 3.** Diagram for configuration of the depth map, and inter-component prediction based on the corresponding texture pixels.
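The following sketch illustrates the direction selection of Equation (4); the 2D lists `texture` and `depth` and the helper name `predict_depth` are illustrative assumptions, and image-border handling is omitted.

```python
# A minimal sketch of the direction selection in Equation (4); names and
# data are illustrative assumptions, and border handling is omitted.
def predict_depth(texture, depth, i, j):
    """Predict depth[i][j] from the neighbor whose texture value is closest."""
    t = texture[i][j]
    dirs = [(-1, 0), (0, -1), (1, 0), (0, 1)]   # top, left, bottom, right
    di, dj = min(dirs, key=lambda d: abs(texture[i + d[0]][j + d[1]] - t))
    return depth[i + di][j + dj]                # q_dirD

# The vertical texture edge steers the prediction to the top neighbor.
texture = [[50, 51, 90],
           [49, 52, 91],
           [48, 50, 92]]
depth   = [[22, 22, 40],
           [21, 22, 41],
           [22, 23, 40]]
print(predict_depth(texture, depth, 1, 1))      # -> 22
```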

### *3.2. Embedding Algorithm*

Here, we examine the data embedding procedure in detail. To extract the hidden bits and recover the original depth map, we have to record, for each pixel, whether it contains hidden bits or not.

**Step 1:** Read a 3 × 3 block from the depth map, and obtain the pixel $q_{i,j}$ and the predicted pixel $q_{dirD}$ by inter-component prediction (Equation (4)).

**Step 2:** Compute the prediction error $e_{i,j}$ of the pixel $q_{i,j}$ using Equation (5).

$$e_{i,j} = q_{i,j} - q_{dirD}. \tag{5}$$

**Step 3:** If ($e_{i,j} = 0$), the number of bits $m$ that can be embedded into the pixel $q_{i,j}$ is calculated from the allowable pixel range of $q_{i,j}$ (Equation (6)), where $[L_q, U_q]$ is obtained from Algorithm 1.

$$\begin{cases} m = \lfloor \log_2(U_q - L_q + 1) \rfloor \\ \text{if } (m < 1), \text{ go to Step 1} \end{cases} \tag{6}$$

Here, $\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$.

**Step 4:** The binary secret bits $\eta_m$ are embedded into $q_{i,j}$ using Equation (7). (Note: the function $b2d(\cdot)$ converts binary bits to a decimal value.) That is, $\eta_m$ is included in the expanded error $e'$ by DE.

$$q'_{i,j} = \begin{cases} e'_{i,j} = 2^m \times e_{i,j} + b2d(\eta_m) \\ L_q + e'_{i,j} \end{cases} \tag{7}$$

**Step 5:** If ($e_{i,j} = 0$), set $LM_{i,j} = 0$; otherwise, set $LM_{i,j} = 1$ (Equation (8)). (Note: the location map value $LM_{i,j} = 0$ means that hidden bits exist at the position.)

$$LM_{i,j} = \begin{cases} 0, & \text{if } (e_{i,j} = 0) \\ 1, & \text{otherwise} \end{cases} \tag{8}$$

**Step 6:** Go to Step 1 until all pixels are processed.

After the embedding procedure is finished, all pixels, including those carrying hidden bits, still lie within their allowable ranges. Therefore, the 3D synthesized images have no distortion.
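Putting Steps 1–6 together, the following sketch embeds bits into a single pixel; `drange` is Algorithm 1 and `q_dirD` comes from the inter-component prediction of Section 3.1. Treating the $m < 1$ case as "not embedded" ($LM = 1$) is a simplifying assumption of this sketch.

```python
import math

# A condensed sketch of embedding Steps 1-6 for a single pixel. `bits` is an
# iterator over secret bits; `drange` implements Algorithm 1.
def embed_pixel(q, q_dirD, bits, drange):
    """Return (marked pixel q', location-map bit LM)."""
    e = q - q_dirD                   # Equation (5)
    if e != 0:
        return q, 1                  # Step 5: LM = 1, nothing embedded
    L_q, U_q = drange(q)             # allowable range from Algorithm 1
    m = int(math.floor(math.log2(U_q - L_q + 1)))    # Equation (6)
    if m < 1:
        return q, 1                  # no room (simplifying assumption: LM = 1)
    eta = [next(bits) for _ in range(m)]             # take m secret bits
    e_exp = (2 ** m) * e + int("".join(map(str, eta)), 2)  # Equation (7), e = 0
    return L_q + e_exp, 0            # q' = L_q + e'; Step 5: LM = 0

# Worked Example 1: q = 22, q_dirD = 22, range [20, 23], secret bits 1, 0.
print(embed_pixel(22, 22, iter([1, 0]), lambda q: (20, 23)))  # -> (22, 0)
```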

**Example 1.** *Given the two blocks in Figure 4a, we demonstrate how to hide secret bits in the pixel $q_{i,j}$. First, we obtain the predicted pixel from the texture block and the depth block. Applying Equation (4), we observe that $t_1 = 51$ is the optimum pixel, so $q_{dirD}$ becomes $q_{i-1,j} = 22$.*

After applying Equation (5), since $e_{i,j} = (q_{i,j} - q_{dirD}) = 22 - 22 = 0$, we may obtain the disparity and allowable range through Algorithm 1, i.e., $[L_q, U_q] = [20, 23]$. The number of embeddable bits for the pixel is obtained via Equation (6), i.e., $m = 2$. Thus, 2 bits are taken from the secret data and converted into a decimal value. Applying Equation (7), $q'_{i,j} = 20 + 2 = 22$. Finally, the pixel $q'$ carries the secret bits $(10)_2$. In this case, $q_{i,j}$ is the same as $q'_{i,j}$, so there is no noise in the marked pixel $q'_{i,j}$.

**Figure 4.** Example of the data embedding using the proposed method.

### *3.3. Extraction Algorithm*

Suppose that a depth map containing secret data and a location map has been delivered to the receiver. We describe the procedure for extracting the hidden data and recovering the original pixels from the depth map. The steps are as follows.

**Step 1:** Read a 3 × 3 block from the given depth map and assign it to the variable $B$. Obtain the pixel $q'_{i,j}$ and the predicted pixel $q_{dirD}$ from $B$.


**Step 2:** If $LM_{i,j} = 0$, hidden bits exist at the position $(i, j)$; compute the number of hidden bits $m$ from the allowable range $[L_q, U_q]$ of the pixel using Equation (6). Otherwise, go to Step 1.

**Step 3:** Extract the hidden bits $\eta_m$ using Equation (9). (Note: the function $d2b(\cdot)$ converts a decimal value to $m$ binary bits.)

$$\eta_m = \begin{cases} e'_{i,j} = q'_{i,j} - L_q \\ b = d2b(e'_{i,j}, m) \end{cases} \tag{9}$$

**Step 4:** The original pixel $q_{i,j}$ is restored by replacing the pixel $q'_{i,j}$ with $q_{dirD}$ (since $e_{i,j} = 0$ for every embedded pixel). Go to Step 1 until all pixels are processed.

**Example 2.** *Assume that, through the data embedding procedure of Example 1 (see Figure 4), the pixel $q'_{i,j}$ carrying the binary bits $(10)_2$ was transferred to the receiver. On the receiving side, part of the procedure for extracting the hidden bits from $q'_{i,j}$ is similar to the embedding procedure. The pixel $q_{dirD}$ is determined from the neighboring pixels of $q'_{i,j}$ using the inter-component prediction method; in this case, the predicted pixel is $q_{dirD} = 22$. Since the location map has $LM_{i,j} = 0$ at the position $(i, j)$ of $q'_{i,j}$, it can be seen that data is hidden there. Thus, the number of hidden bits may be determined from the allowable pixel range of $q'_{i,j}$: applying Equation (6), we obtain $m = \lfloor \log_2(U_q - L_q + 1) \rfloor = \lfloor \log_2(23 - 20 + 1) \rfloor = 2$. Next, the error value $e'_{i,j} = 2$ is obtained from $q'_{i,j} = 22$ and $L_q = 20$ by Equation (9). The value hidden in the pixel $q_{i,j}$ is $e'_{i,j} = q'_{i,j} - L_q = 22 - 20 = 2$, which is converted to the binary number $(10)_2$. The marked pixel $q'_{i,j}$ is reconstructed to the original pixel $q_{i,j}$ by assigning the prediction value $q_{dirD}$ to the position $(i, j)$.*
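A matching sketch of the extraction side follows, assuming `drange` and the prediction `q_dirD` are recomputed exactly as on the embedding side. Note that a marked pixel stays within $[L_q, U_q]$, so its allowable range (and hence $m$) is unchanged.

```python
import math

# A minimal sketch of extraction Steps 1-4; drange() implements Algorithm 1.
def extract_pixel(q_marked, q_dirD, lm_bit, drange):
    """Return (list of hidden bits, restored original pixel)."""
    if lm_bit == 1:                  # Step 2: no hidden data here
        return [], q_marked
    L_q, U_q = drange(q_marked)      # same range as at embedding time
    m = int(math.floor(math.log2(U_q - L_q + 1)))    # Equation (6)
    e_marked = q_marked - L_q        # Equation (9): e' = q' - L_q
    eta = [(e_marked >> k) & 1 for k in range(m - 1, -1, -1)]  # d2b(e', m)
    return eta, q_dirD               # Step 4: original pixel equals q_dirD

# Worked Example 2: q' = 22, q_dirD = 22, range [20, 23] -> bits (10)_2.
print(extract_pixel(22, 22, 0, lambda q: (20, 23)))  # -> ([1, 0], 22)
```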

### *3.4. Embedding/Extraction Procedure for Location Map*

The location map (LM) is required for data extraction and depth map restoration, and thus it must be transferred to the receiver. There are many ways to transfer the location map, but the most common is to hide it in a cover image. At this step, the location map should be made as small as possible, because the capacity for secret data decreases with the size of the location map. Thus, compressing the location map before embedding it into the cover image is common practice. Several compression methods exist; here, we use arithmetic coding. The map size can be reduced to less than 10% of its original size by arithmetic coding, because the ratio of "0" in the map is more than 90%.

The location map $LM$ compressed by arithmetic coding is assigned to the variable $\delta$. The compressed location data $\delta$ are embedded by LSB replacement as follows, where the data are embedded considering the allowable pixel range:

$$q'_{i,j} = \begin{cases} q_{i,j}, & \text{if } (\delta_{i,j} = q_{i,j} \bmod 2) \\ q_{i,j} + 1, & \text{if } (\delta_{i,j} \ne q_{i,j} \bmod 2) \text{ and } (q_{i,j} < U_q) \\ q_{i,j} - 1, & \text{if } (\delta_{i,j} \ne q_{i,j} \bmod 2) \text{ and } (q_{i,j} = U_q), \end{cases} \quad \text{for pixels with } U_q - L_q \ge 3 \tag{10}$$

The compressed location data $\delta$ are embedded at the front of the depth map. The data $\delta$ do not cause a synthesis error, since the embedding also follows the D-NOSE model and hides the data within the allowable pixel range. The last position of the location information is sent over a separate channel. On the receiving side, the compressed location map can be extracted by Equation (11).

$$\delta_{i,j} = \begin{cases} 0, & \text{if } (q'_{i,j} \bmod 2 = 0) \\ 1, & \text{otherwise,} \end{cases} \quad \text{for pixels with } U_q - L_q \ge 3 \tag{11}$$
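The following sketch illustrates this LSB-based embedding and extraction of the compressed map bits, under the reconstruction of Equations (10) and (11) given above; only pixels with $U_q - L_q \ge 3$ are assumed to carry map bits.

```python
# A sketch of the LSB replacement for the compressed location map, under
# the reconstruction of Equations (10) and (11) assumed above.
def lsb_embed(q, delta_bit, L_q, U_q):
    """Set the LSB of q to delta_bit without leaving [L_q, U_q]."""
    if U_q - L_q < 3:
        return q                     # pixel carries no map bit
    if q % 2 == delta_bit:
        return q                     # LSB already matches
    return q + 1 if q < U_q else q - 1   # the +-1 step stays inside the range

def lsb_extract(q_marked, L_q, U_q):
    """Equation (11): read the map bit back; None if the pixel carries none."""
    return q_marked % 2 if U_q - L_q >= 3 else None
```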

### *3.5. Quality Control*

In our proposed method, the quality of the depth map can be somewhat reduced if the allowable pixel range of every pixel is fully used. This is certainly an advantage in terms of embedding capacity, but it is not desirable in terms of quality; therefore, it is important to find a balance between the two criteria. Adjusting the allowable pixel range serves this purpose. Equations (12) and (13) manage depth map quality and embedding capacity through Equation (6): the control is achieved by using a limited allowable range $[L'_q, U'_q]$ instead of $[L_q, U_q]$.

$$L'_q = \begin{cases} q_{dirD} - \sigma, & \text{if } (q_{dirD} - \sigma \ge L_q) \\ L_q, & \text{otherwise} \end{cases} \tag{12}$$

$$U'_q = \begin{cases} L'_q + 2^\sigma - 1, & \text{if } (L'_q + 2^\sigma - 1 \le U_q) \\ U_q, & \text{otherwise} \end{cases} \tag{13}$$

Here, $\sigma$ is an integer variable with range $1 \le \sigma \le n$. Increasing the value of $\sigma$ increases the embedding capacity of the RDH, while decreasing $\sigma$ improves the quality of the depth map. The usefulness of the proposed method increases when $\sigma$ is adjusted appropriately for the application.
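As a sketch, the limited range of Equations (12) and (13) can be computed as follows, under the reconstruction of those equations assumed above (the range is clamped around the prediction $q_{dirD}$).

```python
# A sketch of the sigma-limited allowable range of Equations (12) and (13),
# under the reconstruction assumed above (range clamped around q_dirD).
def limited_range(q_dirD, L_q, U_q, sigma):
    """Return [L'_q, U'_q] restricted by the control variable sigma."""
    L_lim = max(q_dirD - sigma, L_q)             # Equation (12)
    U_lim = min(L_lim + 2 ** sigma - 1, U_q)     # Equation (13)
    return L_lim, U_lim

# Larger sigma widens the usable range: more capacity, lower quality.
print(limited_range(85, 80, 95, 1))  # -> (84, 85)
print(limited_range(85, 80, 95, 3))  # -> (82, 89)
```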

### **4. Experimental Results**

To demonstrate the performance of our proposed scheme, we graphically show the results of experiments and analyses on 3D images with various features. All experiments are performed with eight 3D sequences of size 1920 × 1088 (or 1024 × 768): "Poznan\_Hall2", "Poznan\_Street", "Undo\_Dancer", "GT\_Fly", "Kendo", "Balloons", "Newspaper", and "Shark" (see Figure 5), which are often used for 3D video coding standards such as 3D-AVC [19] and 3D-HEVC [18], together with the view synthesis reference software (VSRS) of JCT-3V [29]. Since Chung et al.'s, Shi et al.'s, and the proposed methods all adopt the D-NOSE model, the synthesis using the original depth map is identical to the synthesis using the marked depth map.

In this paper, we use two criteria to evaluate the performance of the existing and proposed schemes: the embedding rate (ER) and the peak signal-to-noise ratio (PSNR). PSNR, the best-known measurement for objective evaluation of images, is the ratio of the maximum possible signal intensity to the intensity of the noise. The MSE used in the PSNR is the average squared intensity difference between the marked depth map and the reference depth map; a low MSE value means good image quality. The MSE is calculated from a reference depth map $p$ and the distorted depth map $p'$ as follows:

$$MSE(p, p') = \frac{1}{N} \sum_{i=1}^{N} (p_i - p'_i)^2 \tag{14}$$

The error value $\varepsilon = p_i - p'_i$ denotes the difference between the original and the distorted depth map signals. The $255^2$ in Equation (15) is the square of the maximum allowable pixel intensity.

$$PSNR = 10 \log_{10} \frac{255^2}{MSE} \tag{15}$$

Meanwhile, the ER measures how much information is included in the marked depth map, i.e., the ratio of embedded information to the size of the marked depth map. In Equation (16), $N$ is the total number of pixels and $||\eta||$ denotes the number of message bits:

$$\rho = \frac{||\eta||}{N} \tag{16}$$
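For reference, the two evaluation metrics can be computed as in the following sketch, where the depth maps are flattened into 1D lists of 8-bit values.

```python
import math

# A small sketch of the evaluation metrics in Equations (14)-(16).
def psnr(ref, marked):
    """PSNR in dB between a reference and a marked depth map."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, marked)) / len(ref)  # Eq. (14)
    return float("inf") if mse == 0 else 10 * math.log10(255 ** 2 / mse)

def embedding_rate(num_message_bits, num_pixels):
    """ER in bits per pixel (BPP), Equation (16)."""
    return num_message_bits / num_pixels

print(psnr([20, 21, 22, 23], [20, 21, 22, 22]))   # ~54.2 dB, one pixel off by 1
print(embedding_rate(870_000, 1920 * 1088))       # ~0.42 BPP
```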

**Figure 5.** Test images; (**a**–**d**,**h**): 1920 × 1088, (**e**–**g**): 1024 × 768.

Figure 6 shows a comparison of the data embedding rates of the three methods (the two existing methods and the proposed method) on the eight depth maps: (a) Poznan\_Hall2, (b) Poznan\_Street, (c) Undo\_Dancer, (d) GT\_Fly, (e) Kendo, (f) Balloons, (g) Newspaper, and (h) Shark; (a)–(d) and (h) are 1920 × 1088, and (e)–(g) are 1024 × 768.

The ER of Chung et al.'s method is much lower than that of Shi et al.'s and our proposed method, because Chung et al.'s method does not fully use the allowable range. On the other hand, Shi et al.'s and our proposed method achieve better results by exploiting the allowable pixel range. For example, the ER of "Poznan\_Hall2" under Chung et al.'s method is very low. For "GT\_Fly", the embedding rate of our proposed method is 0.22 BPP higher than that of Shi et al.'s method. As shown in Figure 6, the differences between these two methods indicate that our inter-component prediction is superior to rhombus prediction. Therefore, our proposed method outperforms the two existing methods of Chung et al. and Shi et al.

**Figure 6.** Performance comparison of three data hiding methods using eight depth maps.

In Table 1, we compare the performance of the proposed method and the existing methods in PSNR at various BPPs on the depth map "Shark". As the embedding rate increases, Chung et al.'s PSNR decreases sharply, because Chung et al.'s method uses neither quality control nor the allowable depth ranges in the data embedding procedure. On the other hand, Shi et al.'s method and ours use the allowable pixel ranges and quality control and thus perform well. In addition, our method performs slightly better than Shi et al.'s because it uses a more accurate prediction method.


**Table 1.** Comparison of the PSNR at different BPPs for the depth map "Shark".

As shown in Table 2, the maximum BPP of Chung et al.'s method is less than that of both Shi et al.'s and our proposed method. However, since its BPP is low, its PSNR is relatively higher than that of Shi et al.'s method and ours.


**Table 2.** PSNR comparison for maximum BPP on each image using various methods.

Therefore, at the same BPP, the PSNRs of Shi et al.'s and our proposed method are better than that of Chung et al.'s method. On "Newspaper", the PSNR difference between our proposed method and Shi et al.'s is the largest, which appears to be because the depth map contains high-frequency content; thus, the proposed method shows high prediction competence on depth maps with high-frequency characteristics. Moreover, the average BPP of our proposed method is 0.09 higher than that of Shi et al.'s method, while its PSNR is at most 0.39 dB lower, a difference that is indistinguishable to the usual human visual system. Therefore, the proposed scheme can be recognized as an improvement in terms of BPP.

In Figure 7, the control variable $\sigma$ (Equations (12) and (13)) is applied to adjust the embedding capacity and quality of the depth map. The embedding capacity increases in proportion to the value of $\sigma$, while the quality of the depth map decreases as $\sigma$ increases. When $\sigma = 1$, the maximum embedding rate is 0.8 BPP and the PSNR is 52 dB. When $\sigma = 7$, the BPP ranges from 0.1 to a maximum of 1.6, and the PSNR ranges from 48.5 dB down to 41 dB. Under a strict communication environment, the control variable may be useful for secret communication.

**Figure 7.** The relationship between BPP and PSNR according to control variable *σ* with depth map "Shark".

Table 3 shows the PSNRs for the eight depth maps under various BPPs using the three methods (the two existing methods and the proposed one) when $\sigma = 1$. In the table, we can see that depth maps (a), (e), and (f) may hide data at up to 0.9 BPP.


**Table 3.** PSNR measurement for the eight depth maps when *σ* = 1.

The depth maps (a), (e), and (f) have a higher embedding capacity than the other depth maps because the sums of pixels $\phi_n$ (Equation (3)) of these depth maps are high; in other words, they contain many pixels with wide allowable ranges. From this point of view, it can be seen that the sum of pixels $\phi_n$ of depth map (b) is low. The various simulations show that the proposed method maintains a very high PSNR, like the conventional DH methods.

Figure 8 is a visual comparison of the marked depth maps generated by the three methods on "Poznan\_Street". In Figure 8, the BPPs for (c), (d), and (f) are 0.4222, 0.7713, and 0.8339, respectively, and their PSNRs are 53.9985 dB, 49.3750 dB, and 48.9424 dB, respectively. As mentioned above, the embedding rate of the proposed method is about twice that of Chung et al.'s method, and our proposed scheme still provides a good depth map quality of about 49 dB.

**Figure 8.** The relationship between BPP and PSNR according to the control variable *σ* for the depth map "Poznan\_Street".

Figure 9 visualizes how much the pixels are distorted during the data embedding procedure for the depth map "Shark". The two depth maps marked by Shi et al.'s method and the proposed method each carry 0.4 BPP. We can observe the distortion of the original pixels by comparing the histograms produced by Shi et al.'s method and the proposed method. As shown in the figure, the marked histograms are very similar to the original histogram; for this reason, the two marked depth maps show a very high image quality of about 55 dB. At gray levels 106 and 107, there is a small difference between the two methods. As a result, our proposed method introduces fewer errors than Shi et al.'s method.

**Figure 9.** Comparison of the histograms of the depth map "Shark" for Shi et al.'s method and the proposed method.

### **5. Conclusions**

In this paper, we introduced a method to hide metadata in 3D videos using RDH, one of many watermarking technologies. The proposed RDH is a PEE method that hides a large amount of data while minimizing the damage to the cover image using the LSBs of the depth map. The accuracy of the pixel prediction is very important to the performance of the PEE method. To improve the accuracy, we proposed an efficient MVD-based RDH using inter-component prediction, which predicts depth pixels using the MVD-associated texture pixels. The newly introduced inter-component prediction improves the performance of the RDH because its prediction precision is higher than that of the conventional diamond-shaped (rhombus) prediction. In particular, the prediction of depth maps whose texture images have high-frequency characteristics showed excellent performance. Experimental results demonstrated that the proposed method achieves a higher embedding capacity than all the previous methods by improving the prediction accuracy.

**Author Contributions:** J.Y.L. and C.K. conceived and designed the model for this research and pre-processed and analyzed the data and the obtained inferences. J.Y.L. performed the simulations using Visual C++. C.K. and C.-N.Y. wrote the paper. J.Y.L., C.K., and C.-N.Y. checked and edited the manuscript. All authors have read and approved the final manuscript.

**Funding:** This research was supported in part by the Ministry of Science and Technology (MOST), under Grant 107-2221-E-259-007. This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by (2015R1D1A1A01059253), and was supported under the framework of the international cooperation program managed by the NRF (2016K2A9A2A05005255). This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2018R1C1B5086072).

**Acknowledgments:** The authors are grateful to the editors and the anonymous reviewers for providing us with insightful comments and suggestions throughout the revision process.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
