Article

A Robust Video Watermarking Algorithm Based on Two-Dimensional Discrete Fourier Transform

Department of Information Engineering, Beijing Institute of Graphic Communication, Beijing 102600, China
*
Author to whom correspondence should be addressed.
Electronics 2023, 12(15), 3271; https://doi.org/10.3390/electronics12153271
Submission received: 13 July 2023 / Revised: 27 July 2023 / Accepted: 28 July 2023 / Published: 30 July 2023
(This article belongs to the Section Computer Science & Engineering)

Abstract
Due to the continuous development and popularity of digital video technology, the copyright protection of digital video content has become an increasingly prominent issue. Digital video watermarking, as an effective means of digital copyright protection, has attracted widespread attention from academia and industry. The two-dimensional discrete Fourier transform (2D-DFT) template-based method has the advantages of good real-time performance and robustness, but its embedding capacity is small and it cannot resist frame-dropping attacks. To address this problem, a new template construction method is proposed in this paper, which extends the embedding capacity from 1 bit per group of pictures (GOP) to theoretically n bits per GOP. In addition, by changing the GOP pattern, the method gains the ability to resist frame-dropping attacks. Experimental results show that the proposed method achieves a larger watermark capacity while maintaining strong robustness against image-processing attacks, geometric attacks, video-processing attacks, and compression attacks.

1. Introduction

The popularity and development of digital video technology brings convenience but also causes a series of problems, such as copyright infringement and fake information. Video data hiding, including video steganography [1,2] and video watermarking [3,4], has received extensive use and continuous study. Here, we focus on video watermarking algorithms, which can hide copyright information in videos without being perceived and are therefore well suited for copyright protection. In addition, by embedding watermarks such as brand logos and trademarks in videos, video watermarking technology can increase brand value and revenue, further expanding the commercial value of the video industry.
Since a robust video watermarking algorithm, which can extract the watermark correctly even after intentional and unintentional attacks, is a key requirement of copyright protection, we focus on robust video watermarking in this paper. According to the embedding and extraction methods, video watermarking algorithms can be classified into compression domain-based and content-based video watermarking [5]. In compression domain-based techniques, the watermark is usually embedded during the encoding process, such as MPEG-2, MPEG-4, H.264/AVC, or H.265/HEVC. There are a number of MPEG-based video watermarking algorithms [6,7,8]. In 2017, Li et al. [9] proposed an MPEG-2 video watermarking technique based on the DC coefficient; the watermarked video only needs to be partially decoded, which greatly improves the efficiency of the algorithm. Also in 2017, Su et al. [10] proposed an adaptive watermarking approach for MPEG-4 video based on the Watson perceptual model. For HEVC and AVC, the other two popular encoders, there are also several video watermarking algorithms [11,12,13,14]. In 2021, Sun et al. [15] proposed a scalable watermarking algorithm using discrete cosine transform (DCT) coefficients for watermark embedding in the H.264 compressed domain. Similarly, Cai et al. [16] employed DCT coefficients and the prediction unit partition technique to embed watermarks in HEVC video.
Compression domain-based watermarking algorithms have good real-time performance but are less robust and are also vulnerable to changes of video codec. Therefore, many scholars focus on content-based watermarking algorithms, which usually have better robustness and can be applied to various video codecs. In 2017, Bayoudh et al. [17] proposed a new method based on Krawtchouk moments and the DCT, which was robust to various attacks. However, different transform domains have different limitations, so researchers turned to combinations of multiple transform domains to improve the robustness of watermarking algorithms. In 2020, Liu et al. [18] proposed a video watermarking scheme based on the discrete wavelet transform (DWT) and singular value decomposition (SVD), which greatly improves imperceptibility. In 2022, Wang et al. [19] moved from the DWT domain to the dual-tree complex wavelet transform (DTCWT) domain, performed SVD on DTCWT-transformed video frames, and implemented watermark embedding by adjusting the shape of candidate coefficients; the resulting algorithm is robust against temporal asynchrony attacks. Fan et al. [20] proposed a video watermarking algorithm combining the non-subsampled contourlet transform, the 3D-DCT, and non-negative matrix factorization (NMF) to embed an encrypted 2D copyright watermark into the base matrix of the NMF. Although these algorithms achieve good robustness, most of them require geometric synchronization after an attack and are more time-consuming because they embed in the transform domain.
Sun et al. [21] proposed a method to deal with both the synchronization and the real-time issues. Two spatial masks representing secret bits 0 and 1 were generated, and by using them, the watermark embedding process could be converted from the frequency domain to the spatial domain, greatly reducing the watermark embedding time. However, the disadvantage of this algorithm is that only one bit can be embedded in one GOP, which limits its embedding capacity; furthermore, it is not resistant to frame-dropping attacks. Aiming at these issues, we sought to enlarge the embedding capacity and strengthen the robustness against frame-dropping attacks. First, a prime factorization method was introduced to design the corresponding watermark template sequences for an n-bit watermark, extending the embedding capacity from 1 bit to theoretically n bits per GOP. Furthermore, we modified the GOP segmentation method, which indirectly enlarges the embedding capacity on the one hand and, on the other, resists frame-dropping attacks by sending the number of GOPs as a key to the verifier, thus reducing the impact of unsynchronized watermark information.
The main contributions of this paper are as follows: (1) We increased the watermark embedding capacity from 1 bit to theoretically n bits for each GOP by using a modified template sequence. (2) We changed the GOP grouping pattern and resized the GOP to an adaptive method, so that the video can remain robust in the case of frame-dropping attack. The rest of this paper is organized as follows: Section 2 briefly introduces the related concepts, including DFT properties, and the proposed watermark template sequences. Section 3 describes the embedding and extraction steps of the proposed method in detail. Section 4 gives the experimental results as well as the comparative analysis. Section 5 concludes this paper and discusses the future research directions.

2. Preliminaries

2.1. Property of 2D-DFT

The 2D-DFT converts a discrete spatial-domain signal into a discrete frequency-domain signal and is widely used in image processing. Its definition is shown in Equation (1), where f(x, y) is the pixel value at the xth row and yth column, F(u, v) is the value at position (u, v) in the transform domain, the ranges of u and v are the same as those of x and y, and M and N represent the length and width of the image, respectively.
$$F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j2\pi\left(\frac{xu}{M} + \frac{yv}{N}\right)} \tag{1}$$
$$f(x, y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u, v)\, e^{j2\pi\left(\frac{ux}{M} + \frac{vy}{N}\right)} \tag{2}$$
The definition of the inverse discrete Fourier transform (IDFT) is shown in Equation (2). The linear property is one important property of the 2D-DFT; it is shown in Equation (3), where G(u, v) is the 2D-DFT of g(x, y).
$$a f(x, y) + b g(x, y) \leftrightarrow a F(u, v) + b G(u, v) \tag{3}$$
From Equation (3), we can see that a change in the DFT coefficients can be implemented by altering the corresponding pixel values in the spatial domain. That is to say, watermark embedding in the DFT domain is equivalent to increasing or decreasing the pixel values of the cover image, which saves time.
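As a quick numerical check of this equivalence, the linearity in Equation (3) can be verified directly; the frame and mask values below are arbitrary stand-ins, not data from the paper:

```python
# Sketch: verifying the 2D-DFT linearity property that adding a spatial
# mask to a frame equals adding the mask's DFT to the frame's DFT.
import numpy as np

rng = np.random.default_rng(0)
frame = rng.uniform(0, 255, (64, 64))   # stand-in for a video frame
mask = rng.uniform(-2, 2, (64, 64))     # stand-in for a spatial watermark mask

# Embedding in the spatial domain ...
dft_of_sum = np.fft.fft2(frame + mask)
# ... gives the same spectrum as embedding in the frequency domain.
sum_of_dfts = np.fft.fft2(frame) + np.fft.fft2(mask)

max_err = float(np.max(np.abs(dft_of_sum - sum_of_dfts)))
```

The discrepancy `max_err` is only floating-point rounding noise, confirming that the two embedding routes are interchangeable.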
The DFT rotation property is also very important. As shown in Equation (4), if an image is rotated by θ degrees in the spatial domain, the corresponding content in the frequency domain is also rotated by θ degrees. The scaling property of the DFT, shown in Equation (5), indicates that shrinking or stretching the image in the spatial domain expands or shrinks its spectrum by the corresponding ratio.
$$f(x \cos\theta - y \sin\theta,\ x \sin\theta + y \cos\theta) \leftrightarrow F(u \cos\theta - v \sin\theta,\ u \sin\theta + v \cos\theta) \tag{4}$$
$$f(ax, by) \leftrightarrow \frac{1}{|ab|} F\!\left(\frac{u}{a}, \frac{v}{b}\right) \tag{5}$$

2.2. Watermark Template Construction

According to Section 2.1, the watermark embedding process can be converted from the 2D-DFT frequency domain to the spatial domain, which improves the real-time performance. Consequently, we can first design two binary sequences in the 2D-DFT domain, called the zero-template and the one-template, to represent watermark bits 0 and 1, respectively. Then, we calculate the spatial mask corresponding to each watermark template by using Equation (3). Finally, watermark embedding is conducted by adding the spatial mask to the pixel values of the cover video frame. However, two templates mean that only one bit can be embedded per video frame, so the embedding capacity is limited. Thus, in this section, we first introduce the watermark template construction principle and then illustrate our design of the template sequences that enlarge the embedding capacity.

2.2.1. Watermark Template and Corresponding Spatial Mask

In the process of designing the template, there are three issues to consider. The first is the shape of the watermark template. Choosing a circular template helps resist rotation attacks: according to the rotation property of the 2D-DFT, the circular template keeps the same position after rotation, which makes watermark extraction faster and more accurate. The zero-template, which denotes watermark bit zero, is presented in Figure 1c. The corresponding binary sequence is 10101010101010101010, where each 1 forms a peak and each 0 leaves a gap. To realize this, a constant C is added to the DFT coefficient at the position corresponding to each bit "1" in the sequence. To ensure that the video frames remain real-valued after the IDFT, the watermark template must be centrally symmetric; therefore, the same template sequence is embedded twice, at centrally symmetric positions.
The second issue is the location of the watermark template. Low-frequency DFT coefficients carry the majority of the energy of a video frame, and modifying them would greatly reduce the visual quality. Conversely, high-frequency DFT coefficients carry little energy, but they are easily removed by various video-processing operations. Therefore, it is most appropriate to embed the watermark template in the mid-frequency band. Since a small radius corresponds to low-frequency DFT coefficients, we set the embedding radius to 1.2 × L/2, where L is the width of the video frame. The last issue is the spatial mask generation process, whose specific construction is as follows.
Step 1: A grayscale image of size L × L is constructed, and all its pixel values are set to 128. This ensures that the DFT magnitude is 0 everywhere except at the center, as shown in Figure 1a,b.
Step 2: The watermark template is added to the 2D-DFT of the grayscale image. Then, the IDFT is performed to form a grayscale image carrying the watermark information, as shown in Figure 1c–e.
Step 3: The spatial mask corresponding to the watermark template is generated by subtracting Figure 1a from Figure 1e. By the linear property of the 2D-DFT, when embedding the watermark information, we only need to add the corresponding spatial mask to the video frame.
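The three steps can be sketched as follows. The embedding strength `C` and the circle radius fraction `radius_frac` below are hypothetical values chosen only for illustration; the paper fixes the template shape and mid-frequency placement but not these exact numbers:

```python
import numpy as np

def build_spatial_mask(L=128, seq="10101010101010101010", C=5e4, radius_frac=0.3):
    """Sketch of Steps 1-3: build the spatial mask for one watermark template.
    C and radius_frac are illustrative assumptions, not values from the paper."""
    base = np.full((L, L), 128.0)                  # Step 1: flat gray image
    F = np.fft.fftshift(np.fft.fft2(base))         # centered spectrum
    r = radius_frac * L
    cx = cy = L // 2
    n = len(seq)
    for i, bit in enumerate(seq):
        if bit == "1":                             # Step 2: add C on the circle,
            t = 2 * np.pi * i / n
            u = int(round(cx + r * np.cos(t)))
            v = int(round(cy + r * np.sin(t)))
            # ... and at the centrally symmetric point, keeping the spectrum
            # conjugate-symmetric so that the IDFT result stays real-valued.
            for uu, vv in ((u, v), ((L - u) % L, (L - v) % L)):
                F[uu, vv] += C
    marked = np.real(np.fft.ifft2(np.fft.ifftshift(F)))
    return marked - base                           # Step 3: the spatial mask

mask = build_spatial_mask()
```

In the actual scheme this mask would be precomputed once per template and simply added to cover frames, per the linear property of Equation (3).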

2.2.2. Template Sequence

According to [21], in order to ensure good robustness in the watermark extraction process, the two bit sequences shown in Equations (6) and (7) were selected as the templates for watermark bits 0 and 1, where S0 represents the bit sequence for watermark bit 0 and S1 represents the bit sequence for watermark bit 1. The advantage is that the normalized correlation (NC) value between the two templates remains at 0.5 regardless of the rotation strength they undergo.
$$S_0(i) = \begin{cases} 0, & i \bmod 2 = 0 \\ 1, & i \bmod 2 = 1 \end{cases} \qquad \text{for } i = 0, 1, \ldots, l-1 \tag{6}$$
$$S_1(i) = \begin{cases} 0, & i \bmod 4 = 0 \text{ or } i \bmod 4 = 1 \\ 1, & i \bmod 4 = 2 \text{ or } i \bmod 4 = 3 \end{cases} \qquad \text{for } i = 0, 1, \ldots, l-1 \tag{7}$$
However, two watermark templates mean only 1 bit can be embedded in one GOP, and this leads to a low embedding capacity. To solve the problem, a new approach is proposed to extend the embedding capacity from 1 bit to a theoretical n bits by designing a number of watermark template sequences.
When constructing the watermark template, the divisor, such as the 2 and 4 in Equations (6) and (7), is a key parameter. First of all, the divisors of different template sequences should be distinct from each other, so that the watermark templates differ and are easy to distinguish in the watermark extraction process. Second, the divisor affects the bit error ratio (BER) between any two watermark template sequences, especially under rotation attacks. Here, the BER is used to measure the similarity between template sequences, which facilitates the following analysis. Therefore, to ensure that n bits can be embedded in one video frame, 2^n watermark template sequences with 2^n different and robust divisors must be built. To this end, integer factorization is introduced to derive the template sequences, which ensures the accuracy of the algorithm.
Equation (8) shows the prime factorization of a positive integer cn,
$$c_n = p_{p_1}^{a_1} \times p_{p_2}^{a_2} \times \cdots \times p_{p_n}^{a_n} \tag{8}$$
where pp1, pp2, ⋯, ppn are distinct prime numbers (none equal to 1), and a1, a2, ⋯, an are the exponents of pp1, pp2, ⋯, ppn, respectively, all greater than or equal to 1. The factors of cn can be used as divisors to construct watermark template sequences, and 2^n different divisors are required. The number of factors of cn is obtained according to Equation (9).
$$\tau(c_n) = (a_1 + 1) \times (a_2 + 1) \times \cdots \times (a_n + 1) \tag{9}$$
To ensure that the number of templates embedded in the cover video is exactly 2^n, i.e., that the factor number of cn is 2^n, we invert the integer factorization, constructing a suitable integer from the desired factor count. This inverse operation is defined in Equations (10) and (11).
$$m_1 = \left\lfloor \frac{n}{n_1} \right\rfloor, \qquad m_2 = \operatorname{mod}(n, n_1), \qquad 1 \le n_1 \le n \tag{10}$$
$$c_{n1} = p_{p_1}^{\,2^{m_1+1}-1} \times p_{p_2}^{\,2^{m_1+1}-1} \times \cdots \times p_{p_{m_2}}^{\,2^{m_1+1}-1} \times p_{p_{m_2+1}}^{\,2^{m_1}-1} \times \cdots \times p_{p_{n_1}}^{\,2^{m_1}-1} \tag{11}$$
cn1 has exactly 2^n mutually different factors, each of which is a product of powers of pp1, pp2, ⋯, pp_{n1}. After obtaining the divisors, we can construct the watermark template sequences by using Equation (12), where q represents the divisor, ⌈·⌉ denotes the ceiling function, and Sk represents the kth watermark template sequence.
$$S_k(i) = \begin{cases} 1, & i \bmod q < \left\lceil \frac{q}{2} \right\rceil \\[2pt] 0, & i \bmod q \ge \left\lceil \frac{q}{2} \right\rceil \end{cases} \tag{12}$$
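As a small illustration, Equation (12) can be implemented directly, with `math.ceil` standing in for the ceiling function; the sequence length of 24 is only an example:

```python
import math

def template_sequence(q, length=24):
    """S_k(i) per Equation (12): 1 when (i mod q) < ceil(q/2), else 0."""
    half = math.ceil(q / 2)
    return [1 if i % q < half else 0 for i in range(length)]

# Divisor 2 yields the alternating pattern of Equation (6) (up to the 0/1
# labeling used there), and divisor 4 yields the paired pattern of Equation (7).
s2 = template_sequence(2)
s4 = template_sequence(4)
```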
Next, we elaborate the BER between any two watermark template sequences. Suppose q1 and q2 are two different divisors for two watermark templates; there are two possible relations between them: either one is a multiple of the other, or neither evenly divides the other. Figure 2 shows the first relation. In Figure 2a, q1 and q2 are equal to 6 and 12, respectively, so q2 is an even multiple of q1. The first row is the watermark template sequence S1(i) constructed with divisor q1 = 6, and the second row is the watermark template sequence S2(i) constructed with divisor q2 = 12. In Figure 2b, q1 and q2 are equal to 6 and 18, respectively, so q2 is an odd multiple of q1.
From Figure 2a, the watermark template sequence with divisor q2 always consists of two parts: an all-one run and an all-zero run. In the all-one run, the number of bits differing between S1(i) and S2(i) is ⌊q1/2⌋ per period of S1, and in the all-zero run it is ⌈q1/2⌉ per period, where ⌊·⌋ denotes the floor function. Therefore, the BER between S1(i) and S2(i) can be described as in Equation (13), which covers the case in which q1 or q2 is an even multiple of the other. By the same analysis, we can obtain the BER for the case in which q1 or q2 is an odd multiple of the other; the example is shown in Figure 2b, and the BER is described in Equation (14).
$$\mathrm{BER} = \frac{\left\lfloor \frac{q_1}{2} \right\rfloor \cdot \frac{q_2}{2 q_1} + \left\lceil \frac{q_1}{2} \right\rceil \cdot \frac{q_2}{2 q_1}}{q_2} = \frac{q_1 \cdot \frac{q_2}{2 q_1}}{q_2} = \frac{1}{2} \tag{13}$$
$$\mathrm{BER} = \frac{\left\lfloor \frac{q_1}{2} \right\rfloor \cdot \frac{\frac{q_2}{q_1}-1}{2} + \left\lceil \frac{q_1}{2} \right\rceil \cdot \frac{\frac{q_2}{q_1}-1}{2}}{q_2} = \frac{q_1 \cdot \frac{\frac{q_2}{q_1}-1}{2}}{q_2} = \frac{q_1 \left\lfloor \frac{q_2}{2 q_1} \right\rfloor}{q_2} \tag{14}$$
Then, we should consider the BER of S 1 (i) and S 2 (i) when they are rotated since it is better if the watermark template can resist rotation attack. From Equation (12), we can see that the structure of S k (i) is the all-one sequence first and then the all-zero sequence. As illustrated in Figure 3a, the first row is the watermark sequence S 1 (i) with divisor q 1 = 4, and the second row is the watermark sequence S 2 (i) with divisor q 2 = 20. When no rotation appears, S 1 (i) and S 2 (i) are most aligned, which means the BER is lowest at this time. When S 2 (i) is rotated to a sequence which is the opposite of the original one, as shown in Figure 3b, the BER value is the largest. Therefore, we can obtain the BER range, which is displayed in Equation (15).
$$\frac{q_1 \left\lfloor \frac{q_2}{2 q_1} \right\rfloor}{q_2} \le \mathrm{BER} \le 1 - \frac{q_1 \left\lfloor \frac{q_2}{2 q_1} \right\rfloor}{q_2} \tag{15}$$
From the above analysis, we can see that, when q1 or q2 is an even multiple of the other, the BER always remains at 0.5. When q1 or q2 is an odd multiple of the other, the BER gradually approaches 0.5 as the multiple increases. In particular, when q2 is three times q1, the BER reaches its minimum value of 0.33.
Equation (15) covers the case in which q1 or q2 is a multiple of the other. For the other case, in which neither q1 nor q2 evenly divides the other, extensive testing showed that the BER between S1(i) and S2(i) also satisfies Equation (15).
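These BER claims can be checked empirically by brute force over all cyclic shifts (the discrete analogue of rotating one template); the divisors 6, 12, and 18 below are the ones used in Figure 2:

```python
import math

def template(q, length):
    # Template per Equation (12): first ceil(q/2) positions of each period are 1.
    half = math.ceil(q / 2)
    return [1 if i % q < half else 0 for i in range(length)]

def ber(a, b):
    return sum(x != y for x, y in zip(a, b)) / len(a)

def ber_range(q1, q2, length):
    """Min/max BER over all cyclic shifts (rotations) of the second template."""
    s1, s2 = template(q1, length), template(q2, length)
    vals = [ber(s1, s2[k:] + s2[:k]) for k in range(length)]
    return min(vals), max(vals)

lo_even, hi_even = ber_range(6, 12, 120)   # q2 an even multiple of q1
lo_odd, hi_odd = ber_range(6, 18, 108)     # q2 = 3 * q1, the worst odd multiple
```

For the even multiple the BER stays pinned at 0.5 under every rotation, while for q2 = 3·q1 it ranges between 1/3 and 2/3, matching the bounds of Equation (15).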
Next, we take n = 3 as an example. Three bits means 2^3 = 8 templates are needed for embedding. We chose n1 = 2 and pp1, pp2 equal to 2 and 3, respectively. Using Equations (10) and (11), we obtain cn1 = 24, which has 8 factors, including the factor 1. The eight factors are used as divisors according to Equation (12), and the watermark template sequences are obtained as in Equation (16). Please note that, for divisor 1, i.e., the eighth watermark template S7(i), Equation (12) cannot be applied (it would produce an all-one sequence), so we searched programmatically for the sequence whose BER with S0(i), ⋯, S6(i) is nearest to 0.5.
$$\begin{aligned}
S_0(i) &= \begin{cases} 1, & i \bmod 2 < 1 \\ 0, & i \bmod 2 \ge 1 \end{cases} & \text{for } i = 0, 1, \ldots, 23 \\
S_1(i) &= \begin{cases} 1, & i \bmod 3 < 1 \\ 0, & i \bmod 3 \ge 1 \end{cases} & \text{for } i = 0, 1, \ldots, 23 \\
S_2(i) &= \begin{cases} 1, & i \bmod 4 < 2 \\ 0, & i \bmod 4 \ge 2 \end{cases} & \text{for } i = 0, 1, \ldots, 23 \\
S_3(i) &= \begin{cases} 1, & i \bmod 6 < 3 \\ 0, & i \bmod 6 \ge 3 \end{cases} & \text{for } i = 0, 1, \ldots, 23 \\
S_4(i) &= \begin{cases} 1, & i \bmod 8 < 4 \\ 0, & i \bmod 8 \ge 4 \end{cases} & \text{for } i = 0, 1, \ldots, 23 \\
S_5(i) &= \begin{cases} 1, & i \bmod 12 < 6 \\ 0, & i \bmod 12 \ge 6 \end{cases} & \text{for } i = 0, 1, \ldots, 23 \\
S_6(i) &= \begin{cases} 1, & i \bmod 24 < 12 \\ 0, & i \bmod 24 \ge 12 \end{cases} & \text{for } i = 0, 1, \ldots, 23 \\
S_7(i) &= \begin{cases} 1, & i \bmod 2 < 1 \\ 0, & i \bmod 2 \ge 1 \end{cases} \text{for } i = 0, 1, \ldots, 11; \quad \begin{cases} 0, & i \bmod 2 < 1 \\ 1, & i \bmod 2 \ge 1 \end{cases} \text{for } i = 12, 13, \ldots, 23
\end{aligned} \tag{16}$$
According to Equation (15), we calculated the BER between every two sequences, and the results are shown in Table 1. It can be seen that the BER between any two template sequences is always at least 0.33 and as high as 0.67 in some cases, with most values remaining at 0.5.

3. Proposed Watermarking System

In this section, we describe the proposed watermarking algorithm, including watermark embedding and watermark extraction. In the embedding process, the video frames are first divided into multiple GOPs, then the watermark template is embedded in each GOP, and finally a perceptual model is utilized to increase the fidelity of the video. In the extraction process, the GOPs are first recovered according to the GOP number, then the watermark in each GOP is extracted by averaging the video frames, and finally the watermark information is obtained by calculating the correlation between the extracted sequence and the template sequences.

3.1. Watermark Embedding

In this section, the process of watermark embedding is presented, as illustrated in Figure 4.
In addition to watermark template embedding, there are two other key steps in the embedding process. The first is the choice of GOP length, which affects the embedding capacity and the robustness of the proposed algorithm. The second is the introduction of a perceptual model, which improves the invisibility of the algorithm.
Frame attacks, including frame averaging, frame swapping, and frame dropping, cause synchronization issues, which lead to imprecise watermark extraction. To mitigate this, it is better to embed the watermark information repeatedly. Therefore, the video frames are divided into GOPs, and one watermark bit is repeatedly embedded in every video frame of a GOP. First, we need to calculate the number of frames contained in each GOP. The GOP size is determined by the length of the watermark to be embedded, as shown in Equation (17), where Nc is the frame number of the cover video and Nw is the bit number of the watermark information.
$$G_s = \left\lfloor \frac{N_c}{N_w} \right\rfloor \tag{17}$$
The GOP structure is shown in Figure 5. In the extreme case, each GOP contains only one frame. This maximizes the watermark capacity, but the watermark is sensitive to frame attacks. We will discuss the selection of GOP size in Section 4.2.
Inter-frame encoding is an inevitable step in video coding, which uses a reference frame to predict the current frame. However, inter-frame encoding can disturb the embedding process of watermarks. Assuming that frame F and its reference frame belong to different GOPs, they have different embedded watermark information. When encoding frame F, its reference frame, carrying its own watermark information, is used to predict it; thus, the watermark in the reference frame affects the watermark information in frame F and causes watermark extraction imprecision. To avoid this phenomenon, the last 1/3 part of each GOP is not selected to embed the watermark in the proposed method.
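The GOP layout described above, with Equation (17) and the untouched last 1/3 of each GOP, can be sketched as follows; the 300-frame/30-bit numbers are hypothetical:

```python
def gop_plan(n_frames, n_bits):
    """GOP layout sketch: G_s = floor(N_c / N_w) per Equation (17); within each
    GOP, only the first 2/3 of the frames carry the watermark, and the last
    1/3 is left untouched so inter-frame prediction does not mix GOPs."""
    gs = n_frames // n_bits
    plan = []
    for g in range(n_bits):
        start = g * gs
        plan.append((start, start + (2 * gs) // 3, start + gs))  # (start, embed_end, end)
    return gs, plan

# Hypothetical example: a 300-frame clip carrying a 30-bit watermark.
gs, plan = gop_plan(300, 30)
```

Here each GOP holds 10 frames, of which the first 6 are watermarked and the last 4 are left as a buffer against inter-frame prediction.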
In order to improve the video watermark transparency, a perceptual model was introduced. In the human visual system, low-contrast areas have a great impact on image readability and visual performance because the human eye is sensitive to low-frequency noise. Therefore, in order to ensure the video fidelity, the embedding intensity in the high-contrast region can be appropriately increased, while it can be reduced in the low-contrast region. The specific operation process of information embedding can be referred to in reference [21], and the main principle is to adjust the intensity of information embedding by calculating the local contrast.
The detailed watermark embedding steps are as follows:
Step 1: Watermark template generation. Watermark templates are generated according to Section 2.2.2, and the spatial mask in spatial domain for each watermark template is saved to reduce the time cost of watermark embedding.
Step 2: GOP division of the cover video. Video frames are divided into a series of GOPs according to the bit number of the watermark. Each GOP is divided into two parts: the first 2/3 is used for watermark embedding, while the last 1/3 remains unchanged, so that inter-frame prediction does not propagate watermark information across GOP boundaries. Please note that the GOP number needs to be transmitted to the watermark extractor as a key.
Step 3: Embedding intensity adjusting. The spatial mask is adjusted according to the perceptual model. Then, the corresponding spatial mask is added to the Y-component of each video frame.
Step 4: Video encoding. The watermark-embedded video is compressed by using the video codecs.
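Steps 2 and 3 can be sketched as below. The `spatial_masks` mapping and the constant ±1 masks are hypothetical stand-ins for the precomputed template masks, and the perceptual-model scaling of [21] is omitted:

```python
import numpy as np

def embed_gops(frames_y, spatial_masks, symbols, gop_size):
    """Sketch: for each GOP, add the spatial mask of its watermark symbol to
    the Y-component of the first 2/3 of the GOP's frames (Steps 2-3).
    'spatial_masks' maps symbol -> precomputed mask (assumed interface)."""
    out = frames_y.astype(np.float64).copy()
    embed_len = (2 * gop_size) // 3        # only the first 2/3 of each GOP
    for g, sym in enumerate(symbols):
        start = g * gop_size
        out[start:start + embed_len] += spatial_masks[sym]
    return out

# Toy data: 2 GOPs of 6 frames, constant +/-1 masks standing in for the real ones.
rng = np.random.default_rng(1)
frames = rng.uniform(0.0, 255.0, (12, 8, 8))
masks = {0: np.full((8, 8), -1.0), 1: np.full((8, 8), 1.0)}
marked = embed_gops(frames, masks, [0, 1], gop_size=6)
```

With a GOP size of 6, frames 0–3 and 6–9 are modified, while frames 4–5 and 10–11 are left untouched.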

3.2. Watermark Extraction

In this section, the process of watermark extraction is presented and is illustrated in Figure 6.
The extraction part is composed of four steps.
The first step is GOP segmentation. The video frames are divided according to the GOP number, and then the GOP segmentation is performed in the same way as the embedding part.
The second step is to recover the embedded sequence with the DFT. The average frame of the first 2/3 of each GOP is computed and subjected to the DFT. Performing the DFT on every frame would be costly; by the linear property of the DFT, performing it once on the average frame of each GOP is sufficient and much faster.
In the third step, some preprocessing is carried out before watermark extraction. First, the DFT values on the horizontal and vertical lines through the middle of the magnitude image are replaced by zero. This is because the DFT is implicitly periodic in both the spatial and frequency domains, which leads to strong horizontal and vertical discontinuities at the cycle boundaries. The intensity of these pixels is often much larger than the watermark amplitude and therefore needs to be suppressed. Second, to eliminate the effect of the large energy in the low-frequency part of the DFT domain, a circular region of radius R centered at the origin is set up, and all DFT values inside it are replaced with zero. For most natural images, it is recommended to set R to 0.35 times the shorter side of the image. For enlarging (upscaling) attacks, a smaller R should be used to ensure that the watermark in the enlarged video is not removed. Then, we can extract the watermark sequence in the DFT domain.
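This preprocessing can be sketched on a centered (fftshifted) magnitude image; the all-ones input below is just a toy spectrum:

```python
import numpy as np

def preprocess_spectrum(mag, r_frac=0.35):
    """Third-step preprocessing sketch: zero the central horizontal and
    vertical lines and a low-frequency disc of radius R = r_frac * (shorter
    side) in a centered (fftshifted) DFT magnitude image."""
    h, w = mag.shape
    out = mag.copy()
    out[h // 2, :] = 0.0                  # horizontal center line
    out[:, w // 2] = 0.0                  # vertical center line
    R = r_frac * min(h, w)                # recommended: 0.35 * shorter side
    yy, xx = np.ogrid[:h, :w]
    out[(yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= R ** 2] = 0.0
    return out

cleaned = preprocess_spectrum(np.ones((64, 64)))
```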
In the fourth step, we need to determine which template the extracted watermark sequence belongs to. We calculated the correlations between the extracted watermark sequence and the watermark template sequence by using Equation (18),
$$\operatorname{corr}_{XY}(n) = \sum_{m=0}^{M-1} X(m)\, Y(m+n) \tag{18}$$
where X(m) represents the extracted sequence in the DFT domain, Y(m+n) represents the template sequence, and M represents the length of X(m). To extract the watermark accurately, we need to calculate the correlations for different radii r and different rotation angles. The watermark is extracted M points at a time, with an angle of 360/M degrees between every two points at radius r. Since M points were selected when embedding the template, the extracted points are likewise separated by 360/M degrees. To ensure correct extraction, the extraction process is carried out every 0.5 degrees; because the embedding angle between two points is 360/M degrees, we only need to rotate 180/M times to extract the watermark correctly. Finally, the largest correlation value indicates the watermark template to which the extracted sequence belongs.
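The fourth step can be sketched with a discrete shift search, where cycling the template by one position stands in for the 0.5-degree rotation scan; the templates and the noisy sample below are toy values:

```python
import numpy as np

def best_template(extracted, templates):
    """Pick the template with the largest circular correlation, Equation (18),
    maximized over all shifts n (a stand-in for the rotation scan)."""
    best_idx, best_val = -1, -np.inf
    x = np.asarray(extracted, dtype=float)
    for k, t in enumerate(templates):
        t = np.asarray(t, dtype=float)
        for n in range(len(x)):
            val = float(np.dot(x, np.roll(t, -n)))   # corr_XY(n)
            if val > best_val:
                best_idx, best_val = k, val
    return best_idx

# Toy check: a noisy copy of the second template should match index 1.
templates = [[1, 0, 1, 0, 1, 0, 1, 0], [1, 1, 0, 0, 1, 1, 0, 0]]
noisy = [1.0, 1.0, 0.1, 0.0, 0.9, 1.0, 0.0, 0.1]
idx = best_template(noisy, templates)
```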

4. Experimental Results

4.1. Experimental Setup

The proposed algorithm was run on an AMD Ryzen 9 5900HX processor at 3.30 GHz with 32 GB RAM, and the experiments were implemented in MATLAB 2021b on the Windows 11 operating system. To evaluate its effectiveness, the algorithm was tested on four CIF videos and four 1080p videos, most of them downloaded from https://media.xiph.org/video/derf/ (accessed on 13 July 2023). The parameters of these videos are shown in Table 2, snapshots are shown in Figure 7, and all videos are in YUV 4:2:0 format. The watermark is the binary sequence "000 001 010 011 100 101 110 111 000 001". The eight watermark templates constructed by Equation (16) were adopted, which means 3 bits were embedded in each GOP. The performance of the proposed algorithm was tested in four aspects: imperceptibility, capacity, robustness, and time efficiency.

4.2. Selection of GOP Size

The GOP size affects the robustness of the algorithm against video frame attacks, especially frame-swapping and frame-dropping attacks, so it is important to test its effect on the performance of the algorithm and to select the most suitable size. In this part, GOP sizes from 3 to 30, in steps of three, were tested on four different CIF videos. We tested the four videos by calculating their average BER under three conditions: no attack, a 30% frame-swapping attack, and a 30% frame-dropping attack. The results are presented in Table 3.
According to the results in Table 3, under the frame-swapping and frame-dropping attacks, the BER of the watermark decreases to 0 as the GOP size increases. When the GOP size is greater than 15, the algorithm resists the 30% frame-swapping attack well, and when it is greater than 30, the algorithm exhibits excellent robustness against the 30% frame-dropping attack. Therefore, we suggest selecting a GOP size larger than 30 frames whenever possible. Please note that, in our algorithm, the GOP size is determined by the watermark length, but it is better to set it to larger than 30.

4.3. Imperceptibility and Capacity

Imperceptibility means that the watermark should be transparent, i.e., it cannot be easily perceived by human eyes. The evaluation standard of imperceptibility includes subjective visual quality and objective visual quality.
Subjective visual quality is judged by determining whether the watermark is visible to the human eye. Figure 8 shows snapshots of the same frame of two 1080p videos before and after watermark embedding; there are no obvious visual changes.
In terms of objective quality evaluation, the two most commonly used transparency metrics, PSNR and SSIM, were adopted to calculate the fidelity of the proposed method.
PSNR, short for peak signal-to-noise ratio, is a full-reference image quality evaluation index, which is formulated as Equations (19) and (20),
$$\mathrm{PSNR} = 10 \times \log_{10}\!\left(\frac{MAX^2}{\mathrm{MSE}}\right) \tag{19}$$
$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i,j) - K(i,j) \right]^2 \tag{20}$$
where MAX is the maximum possible pixel value (255 for 8-bit frames), and I(i,j) and K(i,j) are the pixel values of the original frame and the embedded frame at position (i,j), respectively.
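Equations (19) and (20) translate directly to code; the uniform-error toy frames below are illustrative only:

```python
import math
import numpy as np

def psnr(original, distorted, max_val=255.0):
    """PSNR per Equations (19) and (20)."""
    diff = original.astype(np.float64) - distorted.astype(np.float64)
    mse = float(np.mean(diff ** 2))                 # Equation (20)
    return 10.0 * math.log10(max_val ** 2 / mse)    # Equation (19)

# Toy check: a uniform error of 5 gray levels gives MSE = 25.
a = np.zeros((4, 4))
b = np.full((4, 4), 5.0)
val = psnr(a, b)
```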
In addition, SSIM is a commonly used image quality evaluation index. The calculation of SSIM is shown in Equation (21),
$$\mathrm{SSIM}(x, y) = l(x, y)^{\alpha} \times c(x, y)^{\beta} \times s(x, y)^{\gamma} \tag{21}$$
where l, c, and s represent the luminance, contrast, and structure components, respectively, and α, β, and γ represent the weight of each component in the SSIM. When α, β, and γ are all equal to 1, we obtain Equation (22),
$$\mathrm{SSIM}(x, y) = \frac{(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \tag{22}$$
where x and y denote the two tested images, μx and μy denote their mean pixel values, σx² and σy² their variances, and σxy their covariance. C1 and C2 are two small constants used to keep the denominators from being zero.
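A single-window version of Equation (22) can be sketched as follows. The constants C1 = (0.01·MAX)² and C2 = (0.03·MAX)² are the conventional choices, assumed here rather than taken from the paper, and practical SSIM averages this quantity over sliding local windows:

```python
import numpy as np

def ssim_global(x, y, max_val=255.0):
    """Single-window SSIM per Equation (22); constants are the conventional
    values (an assumption), and no local windowing is performed (a sketch)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    C1 = (0.01 * max_val) ** 2
    C2 = (0.03 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

img = np.arange(64, dtype=np.float64).reshape(8, 8)
same = ssim_global(img, img)        # identical frames
shifted = ssim_global(img, img + 20.0)  # brightness-shifted frame scores lower
```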
We calculated the PSNR and SSIM of the tested videos. Since the watermark was embedded in the Y component of the video, we only calculated the PSNR and SSIM of the Y component of the watermarked frames. As shown in Table 4, the average PSNR value of the eight tested videos reaches 40.7, with a maximum of 42.14. In addition, the average SSIM value of the eight videos reaches 0.93. These results indicate that the proposed algorithm has good imperceptibility.
In addition to invisibility, capacity is an important evaluation index of watermarking algorithms. Watermark capacity refers to how much watermark information can be embedded in a media file.
We also tested the capacity of the proposed method. Since 3 bits were embedded in each GOP in the experiment, the capacity can be calculated from the number of frames in the video and grows in proportion to it. In our experiments, the embedding capacity was set to 30 bits per video.
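The capacity calculation itself is straightforward. The sketch below is our own reading of the setup: 3 bits per GOP and a 30-frame GOP, which reproduces the 30-bit figure for the 300-frame CIF sequences; the GOP size is inferred, not stated in this passage.

```python
def watermark_capacity(frame_count: int, gop_size: int, bits_per_gop: int) -> int:
    """Bits that fit in a video: each complete GOP carries `bits_per_gop` bits."""
    return (frame_count // gop_size) * bits_per_gop

# A 300-frame CIF sequence with 30-frame GOPs and 3 bits per GOP:
print(watermark_capacity(300, 30, 3))  # → 30
```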

4.4. Robustness of the Proposed Method

Robustness refers to the ability of a digital watermark to remain detectable or extractable when subjected to various attacks. In this study, robustness was measured using the normalized correlation (NC) value, which is calculated as Equation (23):

$$NC = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M} w(i,j)\, w'(i,j)}{\sqrt{\sum_{i=1}^{N}\sum_{j=1}^{M} w^{2}(i,j)}\,\sqrt{\sum_{i=1}^{N}\sum_{j=1}^{M} w'^{2}(i,j)}} \quad (23)$$

where w is the original watermark and w′ is the extracted watermark. The NC value ranges from 0 to 1, and the closer it is to 1, the stronger the robustness of the method. In this part, image-processing attacks, geometric attacks, video-processing attacks, and compression attacks were tested.
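Equation (23) reduces to a few NumPy operations; a minimal sketch of our own:

```python
import numpy as np

def normalized_correlation(w, w_prime):
    """NC per Equation (23): correlation between the original and extracted
    watermarks, normalised by the square roots of their energies."""
    w = np.asarray(w, dtype=np.float64)
    w_prime = np.asarray(w_prime, dtype=np.float64)
    den = np.sqrt(np.sum(w ** 2)) * np.sqrt(np.sum(w_prime ** 2))
    return float(np.sum(w * w_prime) / den) if den else 0.0
```

For a 30-bit binary watermark, a single flipped 1-bit already lowers NC to about 0.98, which is why NC values near 1 in the tables below indicate essentially error-free extraction.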

4.4.1. Image Processing Attacks

Image-processing attacks generally include noise addition, histogram equalization, and median filtering. The two most common noise types, Gaussian noise and salt-and-pepper noise, were tested: Gaussian noise with mean 0 and variance 0.01, and salt-and-pepper noise with density 0.05. In addition to noise, we tested median filtering and histogram equalization. A 3 × 3 median filter was selected. Histogram equalization redistributes pixel values, resulting in a more uniform distribution; the sum of the histogram bins was set to 255 in the experiment.
We tested the robustness of the proposed method against Gaussian noise, salt-and-pepper noise, median filtering, and histogram equalization. The NC values of the extracted watermark reach 1 for all eight tested videos after these image-processing attacks. The results show that common image-processing attacks hardly affect the robustness of the proposed method.
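The two noise attacks with the stated parameters are easy to reproduce; the sketch below is our own NumPy implementation (median filtering and histogram equalization would typically come from a library such as scipy.ndimage or OpenCV).

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def gaussian_noise(frame, mean=0.0, var=0.01):
    """Additive Gaussian noise (mean 0, variance 0.01) on a [0,1]-normalised frame."""
    f = frame.astype(np.float64) / 255.0
    noisy = f + rng.normal(mean, np.sqrt(var), size=f.shape)
    return (np.clip(noisy, 0.0, 1.0) * 255.0).astype(np.uint8)

def salt_and_pepper(frame, density=0.05):
    """Set a `density` fraction of pixels to 0 (pepper) or 255 (salt)."""
    out = frame.copy()
    u = rng.random(frame.shape)
    out[u < density / 2] = 0
    out[(u >= density / 2) & (u < density)] = 255
    return out
```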

4.4.2. Geometric Attacks

Geometric attacks usually cause the target object in the video frame to be distorted, deformed, or shrunken, greatly increasing the difficulty of watermark extraction. In this experiment, rotation attacks (1°, 30°, 45°), cropping attacks (cropping the top and bottom by 10%), and scaling attacks (shrinking to 3/4, shrinking to 2/3, enlarging to 5/4, and enlarging to 4/3) were tested.
According to the results in Table 5, the NC value of the algorithm is 1 for all eight videos under cropping attacks, so cropping rarely affects the performance of the algorithm. For most of the rotation and scaling attacks, the watermark can still be extracted correctly from the eight videos. This robustness is partly attributed to the design of robust watermark templates and partly to the appropriate setting of the constant C introduced in Section 2.2.1, which is discussed in Section 4.5. However, the 45° rotation attack slightly affects four of the tested videos. Although this causes some watermark extraction errors, the high NC values show that the proposed algorithm is still robust to rotation and scaling attacks.
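Geometric attacks of this kind can be simulated with scipy.ndimage; the sketch below is our own, with cropping modelled as blanking the removed rows, one common convention in watermarking experiments (the paper does not specify whether cropped regions are blanked or removed).

```python
import numpy as np
from scipy import ndimage

def rotate(frame, angle_deg):
    """Rotate about the centre, keeping the original frame size."""
    return ndimage.rotate(frame, angle_deg, reshape=False, order=1)

def scale(frame, factor):
    """Shrink or enlarge by `factor` (e.g. 2/3 or 4/3)."""
    return ndimage.zoom(frame, factor, order=1)

def crop_top_bottom(frame, ratio=0.10):
    """Blank the top and bottom `ratio` of rows (10% cropping attack)."""
    out = frame.copy()
    band = int(round(frame.shape[0] * ratio))
    out[:band] = 0
    out[-band:] = 0
    return out
```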

4.4.3. Video-Processing Attack

Video-processing attacks can destroy the synchronization information during the watermark extraction process, which may lead to information loss or misalignment and cause serious errors in extraction. Therefore, the robustness of the algorithm to video-processing attacks is crucial. In this section, three main attacks, frame averaging, frame swapping, and frame dropping, were tested. A frame-swapping attack swaps two or more frames in a video sequence, disrupting its temporal continuity; 10%, 20%, and 30% frame swapping were tested, where a% denotes that a% of the video frames were swapped. A frame-dropping attack deletes some frames of the video sequence, disrupting its temporal structure; 10%, 20%, and 30% of the frames were randomly deleted. A frame-averaging attack averages multiple frames of a video sequence to produce a new frame; in this experiment, each frame was replaced by the average of its 5 neighboring frames.
We conducted experiments on the eight tested videos with different video-processing attacks. The NC values of the extracted watermark are 1 for all eight tested videos under frame averaging, 10%, 20%, and 30% frame swapping, and 10%, 20%, and 30% frame dropping. This demonstrates that the proposed algorithm has strong robustness against video-processing attacks.
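The three frame attacks can be sketched directly on a list of frames; this is our own illustration of the attack models described above, not the authors' test harness.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def swap_frames(frames, fraction=0.30):
    """Swap a `fraction` of the frames pairwise (frame-swapping attack)."""
    out = list(frames)
    n_pairs = int(len(out) * fraction) // 2
    idx = rng.choice(len(out), size=2 * n_pairs, replace=False)
    for a, b in idx.reshape(-1, 2):
        out[a], out[b] = out[b], out[a]
    return out

def drop_frames(frames, fraction=0.30):
    """Randomly delete a `fraction` of the frames (frame-dropping attack)."""
    n_drop = int(len(frames) * fraction)
    dropped = set(rng.choice(len(frames), size=n_drop, replace=False).tolist())
    return [f for i, f in enumerate(frames) if i not in dropped]

def average_frames(frames, window=5):
    """Replace each frame by the mean of its `window` temporal neighbours."""
    stack = np.stack(frames).astype(np.float64)
    half = window // 2
    out = []
    for i in range(len(frames)):
        lo, hi = max(0, i - half), min(len(frames), i + half + 1)
        out.append(stack[lo:hi].mean(axis=0).astype(np.uint8))
    return out
```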

4.4.4. Robustness against CRF and Frame Rate Changing

In the video coding process, CRF (constant rate factor) is a parameter that controls the size and compression ratio of a video. Frame rate conversion is a video-processing technique that transforms the frame rate of a video from one value to another; different video devices and standards usually use different frame rates. Changing the CRF value or the frame rate affects watermark extraction: CRF affects the pixel values of video frames and thus the accuracy of extraction, while the frame rate affects the number of video frames and thus the synchronization information. Therefore, we also tested the robustness of the proposed method against CRF and frame-rate-changing attacks, using CRF values of 23, 25, and 28 and frame rate conversion to 15 fps and 60 fps.
The results in Table 6 show that the NC value of the extracted watermark is 1 after frame rate changing for both CIF and 1080p resolutions, which indicates that the proposed algorithm copes well with frame-rate-changing attacks. This is because frame rate changing can be regarded as a special combination of frame dropping and frame insertion, and the proposed GOP grouping method can effectively resist it. For the compression attack, we tested robustness after compression at different CRF values. When the CRF is 25, the NC value of all eight tested videos is 1. However, when the CRF is 28, the average NC value of the 1080p videos drops to 0.92, and that of the CIF videos to 0.97. This is because compression lowers the amplitude of the template sequence, which may lead to extraction errors. Therefore, the algorithm's handling of strong compression attacks needs to be improved in future research.
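The observation that frame rate changing acts like combined frame dropping and insertion can be made explicit with a nearest-source-frame conversion model. This is our own simplified model of frame rate conversion, not the resampler an actual encoder uses:

```python
import numpy as np

def change_frame_rate(frames, src_fps, dst_fps):
    """Nearest-source-frame rate conversion: each output timestamp maps back
    to the closest source frame, so downconversion drops frames and
    upconversion duplicates them."""
    n_out = int(round(len(frames) * dst_fps / src_fps))
    idx = np.minimum(np.arange(n_out) * src_fps // dst_fps, len(frames) - 1)
    return [frames[i] for i in idx]
```

Converting 30 fps to 15 fps keeps every second frame (pure dropping), while converting to 60 fps duplicates each frame (pure insertion); a GOP re-grouping step at extraction time can compensate for both.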

4.5. The Effect of Constant C

In the proposed algorithm, the constant C has a great impact on performance, affecting both invisibility and robustness. Usually, a larger C decreases invisibility but increases robustness. Table 7 shows the changes in transparency and robustness for different C values on the tested CIF videos. For rotation attacks, we calculated the average NC value over rotations of 1°, 30°, and 45°. For scaling attacks, the average NC value was calculated after scaling down to 2/3 and scaling up to 4/3. For frame attacks, we calculated the average NC value after frame swapping (30%) and frame dropping (30%).
According to the data in Table 7 and Figure 9, for each video, the NC value gradually increases with C, while the PSNR and SSIM gradually decrease. For the tested "News" video, when C = 7000, the average NC value is 0.86 and the PSNR is 42; when C = 8000, the average NC value increases to 0.92 but the PSNR decreases to 41; when C = 9000, the average NC value further increases to 0.99 while the PSNR further decreases to 40. Therefore, 9000 was selected as the C value for the CIF videos to ensure that the watermarked video has good robustness while maintaining relatively high imperceptibility.

4.6. Time Efficiency

Time efficiency reflects the time resources needed to process the data. Since a video is composed of a series of still images, real-time performance is a very important factor for designers of video watermarking algorithms. We tested the time required to embed and extract the watermark for different videos and also calculated the average embedding and extraction time per frame. The results are shown in Table 8.
Experiments show that for the four tested CIF videos, the average embedding time is roughly 0.07 s per frame and the average extraction time is about 0.005 s per frame, while for the four tested 1080p videos, the average embedding and extraction times per frame are about 1.35 s and 0.05 s, respectively. The results show that the proposed algorithm can complete watermark embedding and extraction in a short time and with high efficiency.

4.7. Algorithm Comparison

References [21,22] were selected for comparison. We applied the proposed method to the same tested videos as references [21,22]; the experimental results are shown in Table 9.
In [21,22], binary watermark strings of 16 bits were used; we instead used a 30-bit binary string. Theoretically, our method embeds on average 1 bit per 10 video frames. Reference [22] requires an average of nine video frames to embed one bit, while reference [21] requires an average of 42. Our embedding capacity is therefore increased by more than three times compared with that of reference [21], while remaining basically the same as that of reference [22]. In terms of PSNR, our algorithm achieves approximately the same value as [21,22], around 40 dB. Against image-processing attacks, our algorithm is significantly more robust than [22] and more robust to salt-and-pepper noise than [21]. Our algorithm also performs well under geometric attacks and, unlike [22], does not require geometric resynchronization. For rotation, the NC value of our algorithm is 0.09 higher than that of [21] and 0.16 higher than that of [22]; for scaling, our algorithm likewise has higher NC values than [21,22]. In addition, [21] cannot resist cropping attacks because cropping breaks its spatial synchronization. Finally, our algorithm is effective against frame-dropping attacks, to which [21,22] are not resistant, because we can reset the GOP size when extracting the watermark.

5. Conclusions

In this study, the 2D-DFT template algorithm was improved. We introduced a prime factorization method to design watermark template sequences, and the embedding capacity in each GOP can be increased from 1 bit to a theoretical n bits. In addition, we modified the GOP segmentation method to provide frame deletion resistance for the proposed method.
The experiments show that the algorithm is robust to image-processing attacks, geometric attacks, frame attacks, and compression attacks. Compared with other algorithms, the proposed algorithm performs well in terms of robustness, real-time performance, and capacity.
In future work, we will study the visual perceptual factors of videos to improve watermark transparency. We will also explore methods for joint attack resistance to optimize the performance of watermarking algorithms.

Author Contributions

Conceptualization, X.Y.; methodology, Z.Z. and X.Y.; software, X.Y.; validation, Y.J.; data curation, X.Y. and Y.J.; writing—original draft preparation, X.Y.; writing—review and editing, Z.Z. and Z.L.; funding acquisition, Z.Z. and Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Scientific Research Common Program of Beijing Municipal Commission of Education (KM202110015004), BIGC Project (Ec202201), the general research project of Beijing Association of Higher Education in 2022 (No. MS2022093), and the R&D Program of Beijing Municipal Education Commission (KM202310015002).

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Yang, Y.; Li, Z.; Xie, W. High capacity and multilevel information hiding algorithm based on PU partition modes for HEVC videos. Multimed. Tools Appl. 2019, 78, 8423–8446.
2. Li, Z.; Meng, L.; Jiang, X.; Li, Z. High Capacity HEVC Video Hiding Algorithm Based on EMD Coded PU Partition Modes. Symmetry 2019, 11, 1015.
3. Lv, Z.; Huang, Y.; Guan, H.; Liu, J.; Zhang, S.; Zheng, Y. Adaptive Video Watermarking against Scaling Attacks Based on Quantization Index Modulation. Electronics 2021, 10, 1655.
4. Hou, J.U. MPEG and DA-AD Resilient DCT-Based Video Watermarking Using Adaptive Frame Selection. Electronics 2021, 10, 2467.
5. Joseph, I.; Mandala, J. Comprehensive Review on Video Watermarking Security Threats, Challenges, and Its Applications. ECS Trans. 2022, 107, 13833.
6. Huang, H.Y.; Yang, C.H.; Hsu, W.H. A video watermarking technique based on pseudo-3-D DCT and quantization index modulation. IEEE Trans. Inf. Forensics Secur. 2010, 5, 625–637.
7. Choi, D.; Do, H.; Choi, H.; Kim, T. A blind MPEG-2 video watermarking robust to camcorder recording. Signal Process. 2010, 90, 1327–1332.
8. Belhaj, M.; Mitrea, M.; Prêteux, F.; Duta, S. MPEG-4 AVC robust video watermarking based on QIM and perceptual masking. In Proceedings of the 8th International Conference on Communication, Bucharest, Romania, 10–12 June 2010; pp. 477–480.
9. Li, J.; Wang, Y.; Dong, S. Video watermarking algorithm based on DC coefficient. In Proceedings of the 2nd International Conference on Image, Vision and Computing (ICIVC), Chengdu, China, 2–4 June 2017; pp. 454–458.
10. Su, P.C.; Kuo, T.Y.; Li, M.H. A practical design of digital watermarking for video streaming services. J. Vis. Commun. Image Represent. 2017, 42, 161–172.
11. Mansouri, A.; Aznaveh, A.M.; Torkamani-Azar, F.; Kurugollu, F. A low complexity video watermarking in H.264 compressed domain. IEEE Trans. Inf. Forensics Secur. 2010, 5, 649–657.
12. Xu, D.; Wang, R.; Shi, Y.Q. Data hiding in encrypted H.264/AVC video streams by codeword substitution. IEEE Trans. Inf. Forensics Secur. 2014, 9, 596–606.
13. Gaj, S.; Kanetkar, A.; Sur, A.; Bora, P.K. Drift-compensated robust watermarking algorithm for H.265/HEVC video stream. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2017, 13, 1–24.
14. Elrowayati, A.A.; Abdullah, M.F.L.; Manaf, A.A.; Alfagi, A.S. Robust HEVC video watermarking scheme based on repetition-BCH syndrome code. Int. J. Softw. Eng. Its Appl. 2016, 10, 263–270.
15. Sun, Y.; Wang, J.; Huang, H.; Chen, Q. Research on scalable video watermarking algorithm based on H.264 compressed domain. Optik 2021, 227, 165911.
16. Cai, C.; Feng, G.; Wang, C.; Han, X. A reversible watermarking algorithm for high efficiency video coding. In Proceedings of the 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, Shanghai, China, 14–16 October 2017; pp. 1–6.
17. Bayoudh, I.; Jabra, S.B.; Zagrouba, E. A robust video watermarking for real-time application. In Proceedings of the 18th International Conference, ACIVS, Antwerp, Belgium, 18–21 September 2017; pp. 493–504.
18. Liu, Q.; Yang, S.; Liu, J.; Xiong, P.; Zhou, M. A discrete wavelet transform and singular value decomposition-based digital video watermark method. Appl. Math. Model. 2020, 85, 273–293.
19. Wang, Y.; Ying, Q.; Sun, Y.; Qian, Z.; Zhang, X. A DTCWT-SVD based video watermarking resistant to frame rate conversion. In Proceedings of the 3rd International Conference on Culture-Oriented Science and Technology (CoST 2022), Lanzhou (Hybrid), China, 18–21 August 2022; pp. 36–40.
20. Fan, D.; Zhang, X.; Kang, W.; Zhao, H.; Lv, Y. Video watermarking algorithm based on NSCT, pseudo 3D-DCT and NMF. Sensors 2022, 22, 4752.
21. Sun, X.C.; Lu, Z.M.; Wang, Z.; Liu, Y.L. A geometrically robust multi-bit video watermarking algorithm based on 2-D DFT. Multimed. Tools Appl. 2021, 80, 13491–13511.
22. Chen, L.; Zhao, J. Contourlet-based image and video watermarking robust to geometric attacks and compressions. Multimed. Tools Appl. 2018, 77, 7187–7204.
Figure 1. Zero-template in 2D-DFT domain: (a) grayscale image with all pixel values of 128, (b) 2D-DFT magnitude of the grayscale image, (c) zero-template in 2D-DFT domain, (d) adding the watermark template. (e) IDFT of Figure 1d, (f) spatial mask of watermark template.
Figure 2. Watermark template sequences with q 2 , which is a multiple of q 1 : (a) watermark template sequences constructed by q 1 and q 2 , where q 2 is an even multiple of q 1 ; (b) watermark template sequences constructed by q 1 and q 2 , where q 2 is an odd multiple of q 1 .
Figure 3. Two sequences with rotation and non-rotation cases, where red digits represent differing bits between the two watermark template sequences: (a) two unrotated sequences, (b) a sequence rotated 180 degrees.
Figure 4. Watermark embedding process.
Figure 5. GOP structure.
Figure 6. Watermark extraction process.
Figure 7. Snapshots of the tested videos: (a) hall, (b) news, (c) city, (d) akiyo, (e) station2, (f) pedestrian-area, (g) BasketballDrive, (h) BQterrace.
Figure 8. Subjective visual quality of tested videos before and after watermark embedding: (a) station2 before watermark embedding, (b) station2 after watermark embedding, (c) BasketballDrive before watermark embedding, (d) BasketballDrive after watermark embedding.
Figure 9. The impact of C value on algorithm performance: (a) PSNR of videos with different C values, (b) NC value of videos with different C values.
Table 1. BER values between two of the eight sequences.

BER   S1    S2    S3          S4    S5          S6          S7
S0    0.5   0.5   0.33–0.67   0.5   0.5         0.5         0.5
S1          0.5   0.5         0.5   0.5         0.33–0.67   0.5
S2                0.5         0.5   0.5         0.5         0.5
S3                            0.5   0.5         0.5         0.5
S4                                  0.5         0.33–0.67   0.42–0.58
S5                                              0.5         0.5
S6                                                          0.42–0.58
Table 2. Parameters of tested videos.

Video Name        Size          Frame Number
Hall              352 × 288     300
News              352 × 288     300
City              352 × 288     300
Akiyo             352 × 288     300
Station2          1920 × 1080   313
Pedestrian-area   1920 × 1080   375
BasketballDrive   1920 × 1080   501
BQterrace         1920 × 1080   601
Table 3. Average BER of different GOP sizes under frame-swapping and frame-dropping attacks.

GOP Size   No Attack   Frame Swapping (30%)   Frame Dropping (30%)
3          0.4667      0.43                   0.44
6          0           0.08                   0.40
9          0           0.06                   0.22
12         0           0.05                   0.20
15         0           0.02                   0.15
18         0           0                      0.06
21         0           0                      0.03
24         0           0                      0.03
27         0           0                      0.02
30         0           0                      0
Table 4. PSNR, SSIM, and capacity of the tested videos.

Video Name        PSNR    SSIM
Hall              40.37   0.933
News              40.31   0.936
City              38.54   0.935
Akiyo             41.21   0.928
Station2          41.54   0.927
Pedestrian-area   42.14   0.929
BasketballDrive   41.09   0.921
BQterrace         40.11   0.929
Table 5. NC values of geometric attacks.

Attack           Hall   News   City   Akiyo   Station2   Pedestrian-Area   BasketballDrive   BQterrace
Cropping (10%)   1      1      1      1       1          1                 1                 1
Scaling (2/3)    1      1      1      1       1          1                 1                 1
Scaling (3/4)    1      1      1      1       1          1                 1                 1
Scaling (5/4)    1      1      1      1       1          1                 1                 1
Scaling (4/3)    1      1      1      1       1          1                 1                 1
Rotation (1°)    1      1      1      1       1          1                 1                 1
Rotation (30°)   1      1      1      1       1          1                 1                 1
Rotation (45°)   0.96   0.93   0.96   0.96    1          1                 1                 1
Table 6. NC values after video attack.

Attack       Hall   News   City   Akiyo   Station2   Pedestrian-Area   BasketballDrive   BQterrace
CRF 23       1      1      1      1       1          1                 1                 1
CRF 25       1      1      1      1       1          1                 1                 1
CRF 28       0.86   1      1      1       0.96       0.89              0.82              1
FRC (15) 1   1      1      1      1       1          1                 1                 1
FRC (60) 2   1      1      1      1       1          1                 1                 1

1 Frame rate changing (15 fps). 2 Frame rate changing (60 fps).
Table 7. The value of NC and imperceptibility for different C.

               Hall                    News                    City                    Akiyo
C              7000    8000    9000    7000    8000    9000    7000    8000    9000    7000    8000    9000
PSNR           42.52   41.38   40.37   42.47   41.31   40.3    40.7    39.55   38.54   43.37   42.22   41.21
SSIM           0.96    0.95    0.93    0.96    0.95    0.94    0.96    0.95    0.93    0.96    0.94    0.93
Rotation       0.71    0.94    0.99    0.93    0.98    0.98    0.98    0.99    0.99    0.99    0.99    0.99
Scaling        1       1       1       0.69    0.71    1       1       1       1       1       1       1
Frame attack   0.99    1       1       1       1       1       0.99    1       1       1       1       1
CRF28          0.59    0.82    0.86    0.69    0.96    1       0.93    1       1       0.89    0.96    1
Table 8. Time efficiency.

Video             Embedding Time (s)   Embedding Time per Frame (s)   Extraction Time (s)   Extraction Time per Frame (s)
Hall              20.86                0.07                           2.04                  0.007
News              21.49                0.07                           8.05                  0.004
City              21.63                0.07                           1.55                  0.005
Akiyo             21.56                0.07                           5.20                  0.005
Station2          681                  1.35                           26.19                 0.05
Pedestrian-area   509                  1.35                           20.26                 0.05
BasketballDrive   662                  1.32                           24.51                 0.05
BQterrace         822                  1.30                           30.50                 0.05
Table 9. Comparison with different methods.

                              Sun [21]       Chen [22]    Proposed
PSNR                          40.80          39.59        40.82
Capacity (bits)               16             16           30
Gaussian noise                1              0.75         1
Salt and pepper noise         0.95           0.84         1
Geometric resynchronization   Not required   Required     Not required
Rotation                      0.91           0.84         1
Scaling                       0.97           0.85         1
Frame dropping                Non-robust     Non-robust   1
Yang, X.; Zhang, Z.; Jiao, Y.; Li, Z. A Robust Video Watermarking Algorithm Based on Two-Dimensional Discrete Fourier Transform. Electronics 2023, 12, 3271. https://doi.org/10.3390/electronics12153271
