Article

Two-Stage Robust Lossless DWI Watermarking Based on Transformer Networks in the Wavelet Domain

Zhangyu Liu, Zhi Li, Long Zheng and Dandan Li
State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(12), 6886; https://doi.org/10.3390/app13126886
Submission received: 18 April 2023 / Revised: 31 May 2023 / Accepted: 1 June 2023 / Published: 6 June 2023
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

For copyright protection of diffusion-weighted imaging (DWI) images, traditional robust watermarking techniques introduce irreversible distortion, while reversible watermarking methods exhibit poor robustness. We propose a two-stage lossless watermarking algorithm based on a Transformer network to solve this problem. In the first stage, a robust watermarking network is trained: the watermark is embedded into the cover image in the wavelet domain, and a frequency information enhancement module is designed to improve reconstruction quality. In the second stage, based on the pre-trained robust watermarking network, the difference image between the watermarked image and the cover image is reversibly embedded into the watermarked image as compensation information, enabling lossless recovery of the cover image. The difference image is compressed using the DCT and Huffman coding to reduce the amount of compensation information. Finally, the watermark extraction network is trained on the second embedding result so that the reversible embedding does not weaken the robustness achieved in the first stage. The experimental results demonstrate that the PSNR of the watermarked image reaches 60.18 dB. Under various types of image attacks, the watermark extraction BER stays below 0.003, indicating excellent robustness. Under no attack, the cover image can be recovered losslessly.

1. Introduction

With the continuous development and maturation of medical imaging technology, digital medical images play an increasingly important role in hospital diagnostic and therapeutic information. Among them, diffusion-weighted imaging (DWI) [1] is a non-invasive imaging technique that can observe the diffusion motion of water molecules within tissues. As a four-dimensional medical image, DWI is widely used to diagnose conditions in fields such as neurology, cardiology, and hepatology, as well as diseases such as breast cancer and cervical myelopathy. With the continuous rise in medical costs, remote medical care has become an economical and convenient mode of diagnosis and treatment, attracting growing attention and application. Remote consultations and surgeries have been widely implemented in hospitals domestically and internationally, and a large amount of digital medical imaging data is transmitted over public networks. Because medical images contain patient health information and privacy, the direct dissemination of unprotected medical images on public networks can easily be misused and tampered with by unauthorized users, leading to problems such as patient information leakage, medical misdiagnosis, and copyright disputes. In order to effectively protect the copyright information of medical images, digital watermarking has become an essential means of protecting the copyright of digital medical images.
In medical diagnostics, applying digital watermarking techniques for copyright protection must meet specific requirements, including ensuring the integrity of the medical image data and the robustness of the watermark. Traditional robust watermarking algorithms [2,3,4] can correctly extract the watermark information for copyright authentication under various types of image attacks, but they inevitably cause distortion of the cover image. Reversible watermarking algorithms [5,6,7], on the other hand, focus on lossless embedding and extraction of the watermark information, but they exhibit weaker resistance to attacks and are susceptible to common attack methods. Fragile watermarking [8] is capable of tamper detection and localization for protected images, but cannot extract watermark information for copyright authentication once the image is attacked. For medical images, the distortion caused by robust watermarking, the vulnerability of reversible watermarking to various attacks, and the low robustness of fragile watermarking are all unacceptable. This has prompted researchers to develop new watermarking methods that combine robustness and reversibility.
In recent years, robust reversible watermarking algorithms have emerged by integrating robust and reversible techniques. These algorithms achieve both lossless recovery of the cover image and robustness of the watermark, thereby better fulfilling the requirements for copyright protection of medical images. Coltuc [9] first proposed a lossless robust watermarking algorithm based on a two-stage watermarking scheme (TSW). In this algorithm, the watermark information is first embedded into the DCT coefficients of the image to generate a robust watermarked image; then the data required to recover the cover image is used as compensation information and embedded into the watermarked image using a reversible watermarking algorithm. Wang et al. [10] improved the conventional TSW framework by embedding the watermark and the compensation information into two independent regions to avoid the negative impact of the second embedding on the first. Liu et al. [6] used recursive dither modulation to achieve reversible information embedding and provided a reliable solution for image authenticity protection using the Slantlet transform and singular value decomposition. Hu et al. [11] designed a new quantization watermarking strategy that embeds robust watermarks into the polar harmonic transform matrix, obtaining the ability to resist geometric distortion; furthermore, using quantization errors, watermark errors, and rounding errors to represent the differences between the cover image and the watermarked image can reduce the compensation information required to restore the cover image. Hu et al. [12] proposed a cover-lossless watermarking method that effectively embeds the watermark into low-order Zernike moments and reversibly hides the distortion caused by the robust watermarking as compensation information in the watermarked image.
Traditional robust lossless watermarking algorithms possess a certain degree of robustness and visual quality. However, most of them are designed against specific types of image attack and lack scalability; they become inadequate when the watermarked image is subjected to new noise attacks. In contrast, deep learning-based robust watermarking algorithms can achieve robustness against such attacks by incorporating them into model training. To address these issues, this paper proposes a two-stage robust lossless watermarking algorithm for DWI images based on deep learning. In the first stage, a robust watermarking framework combining the wavelet transform and a Transformer is employed to accurately extract the watermark information for copyright authentication, achieving robustness against various types of image attacks; in addition, the reconstructed watermarked image exhibits excellent visual quality. In the second stage, reversible watermarking techniques embed the compensation information and an image hash value into the watermarked image. The image hash value is used to detect whether the protected image has been attacked, and under the no-attack condition, the compensation information allows lossless recovery of the cover image, ensuring the integrity of the medical image data. The major contributions of this study are as follows.
(1)
We propose a two-stage robust lossless DWI watermarking framework based on deep learning in the wavelet domain. Separate branching networks are designed for different frequency bands, which can effectively reduce inter-frequency conflicts and improve the image reconstruction quality and robustness of watermarking.
(2)
In the watermark embedding network, we design a frequency-enhanced attention module using both low- and high-frequency information, and this approach effectively integrates multi-frequency features to improve the reconstruction performance.
(3)
Compared with other algorithms, the proposed algorithm achieves better image reconstruction quality and watermark robustness.
The remainder of this paper is structured as follows. Section 2 introduces the related work. In Section 3, the proposed method is described. In Section 4, the experiments are described, and the experimental results are discussed. Finally, the conclusions of this paper are drawn in Section 5.

2. Related Works

2.1. Diffusion-Weighted Imaging

DWI [1] is currently the only non-invasive method for observing the movement of water molecules inside living tissues. A DWI acquisition applies diffusion-gradient fields in six or more directions; the fibre orientation of an organ is derived from two-dimensional cross-sections of multiple slices under the different diffusion-gradient directions. A DWI image can thus be represented as a four-dimensional array $I_D(x, y, s, d)$. The first two dimensions index the organ cross-sections $f_i(x, y)$ $(i \in [1, s \times d])$, the third dimension $s$ indexes the tomographic slice, and the fourth dimension $d$ indexes the diffusion-gradient direction. As shown in Figure 1, $N$ denotes the number of diffusion-gradient directions and $M$ the number of two-dimensional cross-sections.
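To make this layout concrete, the following NumPy sketch indexes a DWI volume as described above. The dimensions are illustrative assumptions for a hypothetical scan, not values from the dataset used later.

```python
import numpy as np

# Hypothetical DWI volume: 96 x 96 cross-sections, 60 slices (s),
# 6 diffusion-gradient directions (d); real dimensions depend on the scan.
dwi = np.zeros((96, 96, 60, 6), dtype=np.float32)   # I_D(x, y, s, d)

# Each 2D cross-section f_i(x, y) corresponds to one (slice, direction)
# pair, so there are s * d = 360 cross-sections in total.
cross_sections = dwi.reshape(96, 96, -1)
print(cross_sections.shape[-1])   # 360
```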

2.2. Robust Watermark

Robust watermarking algorithms based on deep learning have become a popular research topic due to their excellent image reconstruction quality and robustness. Zhu et al. [13] first proposed a deep learning framework for robust watermarking comprising four components: a watermark embedding network, an attack network, a watermark extraction network, and a generative adversarial network (GAN). Later, Ahmadi et al. [14] proposed a fully convolutional neural network with a residual structure, improving the imperceptibility and robustness of watermarking. Luo et al. [15] used a multilayer convolutional neural network to model the geometric and noise attacks that digital images may encounter and trained it adversarially to improve robustness to unknown and indistinguishable distortions. Liu et al. [16] proposed a watermark embedding framework with staged training, achieving robustness to signal-processing and geometric attacks while ensuring the visual quality of the image.
The above algorithms all focus on robust watermarking for natural images. For medical images, Fan et al. [17] proposed a robust watermarking scheme for DWI images that combines global and local features for image reconstruction and embeds watermark redundancy into multi-scale reconstruction features. Chacko et al. [18] combined a discrete cosine transform with a deep learning convolutional neural network (DLCNN), using the DCT-transformed cover image and the IF coefficients of the watermark as inputs for watermark embedding. Dhaya [19] proposed a lightweight convolutional neural network that uses pre-trained network parameters to reduce the time needed to reconstruct the watermarked image and extract the watermark, achieving high robustness to geometric attacks. Chen et al. [20] proposed a deep learning-based copyright watermark recognition model to replace traditional copyright authentication based on bit error rate and normalized equivalence similarity estimation.

2.3. Discrete Wavelet Transform

The 2D discrete wavelet transform [21] (DWT) is a method for decomposing a 2D signal into different frequency components. It can separate an image into low- and high-frequency parts, allowing for image compression, de-noising, and feature extraction. The basic idea is to decompose the signal into multiple scales and different directional wavelet sub-bands, separating the local and global features of the signal for practical analysis and processing.
The Haar wavelet [22] is the simplest discrete wavelet transform and the foundation for other wavelet transforms. Compared to other discrete wavelet transforms, the Haar wavelet has lower computational complexity, making it suitable for applications with high computational requirements. The first-level decomposition of the Haar wavelet divides an $N \times N$ matrix into four sub-matrices of size $N/2 \times N/2$: the low-frequency information $LL_1$ that preserves the cover image content, the vertical high-frequency information $LH_1$, the horizontal high-frequency information $HL_1$, and the diagonal high-frequency information $HH_1$. Building on the first-level decomposition, the low-frequency information $LL_1$ can be used as the input for the next level of decomposition, yielding the four sub-matrices $LL_2$, $LH_2$, $HL_2$, and $HH_2$. The second-level decomposition and reconstruction process of the Haar wavelet for a 2D image is illustrated in Figure 2. For an $N \times N$ image $I$, the Haar decomposition can be expressed as follows:
$$\begin{aligned} LL(i,j) &= \big(I(i,\,2j-1) + I(i,\,2j)\big)/2, \\ LH(i,j) &= \big(I(i,\,2j-1) - I(i,\,2j)\big)/2, \\ HL(i,j) &= \big(LL(2i-1,\,j) + LL(2i,\,j)\big)/2, \\ HH(i,j) &= \big(LL(2i-1,\,j) - LL(2i,\,j)\big)/2, \end{aligned} \qquad (1 \le i,\, j \le N/2)$$
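As a concrete reference, here is a minimal NumPy sketch of one level of the (unnormalized) Haar decomposition in the separable form implied by the formulas above: pairwise averages and differences along one axis, then the other. Subband naming conventions vary across libraries; this sketch is illustrative, not the paper's implementation.

```python
import numpy as np

def haar_level(img):
    """One-level Haar decomposition of an N x N array (N even)."""
    # Horizontal pass: pairwise average (low) and difference (high) of columns.
    lo = (img[:, 0::2] + img[:, 1::2]) / 2.0
    hi = (img[:, 0::2] - img[:, 1::2]) / 2.0
    # Vertical pass on each half yields the four N/2 x N/2 subbands.
    LL = (lo[0::2, :] + lo[1::2, :]) / 2.0
    LH = (lo[0::2, :] - lo[1::2, :]) / 2.0
    HL = (hi[0::2, :] + hi[1::2, :]) / 2.0
    HH = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return LL, LH, HL, HH

img = np.random.rand(96, 96)
LL1, LH1, HL1, HH1 = haar_level(img)   # first level
LL2, LH2, HL2, HH2 = haar_level(LL1)   # second level operates on LL1
```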

3. Proposed Method

3.1. Watermark Frame

The two-stage robust reversible watermarking framework based on deep learning is illustrated in Figure 3 and consists of two stages. The first stage is the pre-training of the robust watermarking network, which focuses on pre-training the robust watermark embedding and extraction networks. First, the cover image $I_o$ and the original watermark $W_o$ are fed into the watermark embedding network $N_{enc}$ to reconstruct the watermarked image $I_e$. Then, the attack layer network, which contains common image-processing and geometric attack methods and randomly selects an attack type with its corresponding parameters, attacks the watermarked image $I_e$ to generate the distorted image $I_a$. Finally, the watermark extraction network $N_{dec}$ extracts the watermark information $W_d$ from the distorted image $I_a$.
The second stage adds reversible information embedding on top of the first stage's robust watermarking network. In this stage, the watermark embedding network $N_{enc}$ and extraction network $N_{dec}$ load the pre-trained parameters from the first stage, and $N_{enc}$ is fixed without gradient backpropagation. Reversible information embedding uses a traditional algorithm to reversibly embed the difference image between the cover image $I_o$ and the reconstructed watermarked image $I_e$ as compensation information into $I_e$, generating the marked image $I_m$. The attack layer network is migrated directly to attack the marked image $I_m$, generating the distorted image $I_a$. Finally, the watermark extraction network $N_{dec}$ extracts the watermark information $W_d$ from the distorted image $I_a$.

3.2. Watermark Embedding Network

The robust watermarking algorithms based on deep learning discussed above perform watermark embedding in the spatial domain. However, since DWI images contain rich texture and greyscale information, we adopt a two-dimensional discrete wavelet transform (DWT) to decompose the DWI image into its frequency components and embed the watermark in the low-frequency part in the wavelet domain, preserving texture details and fibre direction. The discrete wavelet transform decomposes the cover image into different spatial and temporal scales and yields the corresponding frequency components, which is effective for many image processing tasks; it is now widely used in deep learning-based computer vision tasks such as super-resolution reconstruction, style transfer, and quality enhancement. Unlike prior work that passes all wavelet frequency information directly to convolutional layers, the proposed algorithm designs an independent network branch for each group of frequency information and combines the features of the other branches to achieve better reconstruction performance.
The proposed watermark embedding network is shown in Figure 4. The input DWI image undergoes a two-level discrete wavelet transform to obtain its wavelet frequency information. The horizontal ($LH_1$), vertical ($HL_1$), and diagonal ($HH_1$) frequency information from the first-level decomposition is combined and denoted $F_1$. Similarly, the horizontal ($LH_2$), vertical ($HL_2$), and diagonal ($HH_2$) frequency information from the second-level decomposition is combined and denoted $F_2$, and the low-frequency part of the second-level decomposition is denoted $LL_2$. To avoid damaging the texture details and fibre direction of the DWI image, the three frequency bands $F_1$, $F_2$, and $LL_2$ are fed into three separate network branches for reconstruction.
Firstly, watermark embedding is performed on the low-frequency information $LL_2$ of the image. To enhance the robustness of watermark extraction, multiple redundant embedding is adopted to embed the watermark information and obtain the reconstructed low-frequency information $LL_2^e$, as shown in Figure 5.
Secondly, to transmit the embedded watermark information from the low-frequency information $LL_2^e$ to the high-frequency information, we propose a low-high-frequency attention (LHFA) block based on frequency information enhancement, shown in Figure 6. The low-frequency information $LL_2$ of the second-level wavelet decomposition contains global information and important low-frequency components, the smooth parts of the cover image, while the high-frequency information contains details and local information. By using $LL_2$ and the reconstructed low-frequency information $LL_2^e$ as Q and K, respectively, the model can focus on the important low-frequency components, enhancing the global information and stability of the image. Meanwhile, the content of the high-frequency part $F_2$ is used as V, which transmits details and local information to the reconstructed frequency band, improving the accuracy and detail expression of the image. The frequency information output by the LHFA block passes through a Transformer block and convolution and is then added to the original frequency band $F_2$ to obtain the reconstructed band $F_2^e$, completing the transfer of watermark information from the low- to the high-frequency bands and improving reconstruction quality and robustness.
Thirdly, similar to the second step, the high-frequency information $F_2$ and its reconstruction $F_2^e$ from the second-level decomposition are used as Q and K, respectively, and the high-frequency information $F_1$ from the first-level decomposition is used as V. The frequency information processed by the LHFA block passes through a Transformer block and convolution and is then added to the content of the frequency band $F_1$ to obtain the reconstructed band $F_1^e$. Finally, the three reconstructed bands $LL_2^e$, $F_2^e$, and $F_1^e$ undergo a two-level inverse discrete wavelet transform to obtain the watermarked reconstructed image $I_e$. The Transformer block in the network uses a self-attention module [23], as shown in Figure 7.
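The following PyTorch sketch illustrates the cross-attention pattern at the core of the LHFA block for the second-level bands (Q from $LL_2$, K from $LL_2^e$, V from $F_2$). The channel width, head count, and projection layers are illustrative assumptions; the full block additionally includes the Transformer block and convolution described above.

```python
import torch
import torch.nn as nn

class LHFACrossAttention(nn.Module):
    """Q from LL2, K from reconstructed LL2e, V from high-frequency F2."""
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.q_proj = nn.Conv2d(channels, channels, 1)
        self.k_proj = nn.Conv2d(channels, channels, 1)
        self.v_proj = nn.Conv2d(channels, channels, 1)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.out = nn.Conv2d(channels, channels, 1)

    def forward(self, ll2, ll2e, f2):
        # All three feature maps are assumed pre-projected to the same
        # channel width and spatial size (true for the second-level bands).
        b, c, h, w = f2.shape
        q = self.q_proj(ll2).flatten(2).transpose(1, 2)    # (B, h*w, C)
        k = self.k_proj(ll2e).flatten(2).transpose(1, 2)
        v = self.v_proj(f2).flatten(2).transpose(1, 2)
        fused, _ = self.attn(q, k, v)
        fused = fused.transpose(1, 2).reshape(b, c, h, w)
        return f2 + self.out(fused)   # residual onto the high-frequency band

x = torch.randn(2, 64, 24, 24)   # a 96x96 image gives 24x24 second-level bands
print(LHFACrossAttention()(x, x, x).shape)   # torch.Size([2, 64, 24, 24])
```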

3.3. Reversible Information Embedding

3.3.1. Difference Image Compression

During the watermark embedding stage, the watermark information $W_o$ is embedded into the cover image $I_o$ to obtain the reconstructed watermarked image $I_e$. To recover $I_o$, the difference image $D$ between $I_e$ and $I_o$ is used as compensation information and embedded into $I_e$ using a reversible watermarking method. Ideally, we could losslessly compress $D$ into a binary sequence and reversibly embed it into $I_e$ to obtain the marked image $I_m$; in the absence of attacks, the encoded information could then be extracted from $I_m$ and decompressed to obtain $D$ for lossless restoration of $I_o$. However, lossless compression of $D$ typically produces a large amount of data, leading to low visual quality of $I_m$. To reduce the amount of compensation information, we use a combination of the DCT and Huffman coding to compress $D$. Specifically, we divide $D$ into blocks of size 8 × 8 and apply the DCT to each block to obtain the DCT coefficients $F$, determined by the following equation,
$$F_{uv} = \frac{1}{4} C_u C_v \sum_{x=0}^{7} \sum_{y=0}^{7} I_{x,y} \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}, \qquad C_u, C_v = \begin{cases} \dfrac{1}{\sqrt{2}}, & u, v = 0 \\ 1, & \text{otherwise} \end{cases}$$
where $F_{uv}$ represents the coefficient at position $(u, v)$ in the DCT domain, the DC coefficient is the first coefficient $F(0,0)$ after transformation, and the AC coefficients are the remaining coefficients. Since the values of $D$ are usually small, the DCT coefficients need to be amplified and then quantized using the quantization matrix $Q$ and the scaling factor $n$, as shown in the following equation.
$$F_q = \mathrm{round}\left(\frac{F_{uv} \times 2^n}{Q}\right)$$
The quantized DCT coefficients $F_q$ and the scaling factor $n$ are Huffman coded to obtain the binary sequence $D_c$, as shown in the following equation.
$$D_c = H(F_q, n)$$
where $H$ represents the Huffman coding function. Since quantization is a lossy compression process, there may be differences between the decompressed difference image $D'$ and $D$. To eliminate these differences, we calculate the error matrix $e = D - D'$ and add it directly to the watermarked image $I_e$ to obtain $I_e'$.
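A minimal sketch of this pipeline is shown below, assuming a flat placeholder quantization matrix and eliding the Huffman codec (any lossless entropy coder serves for the last step); the sign convention $D = I_o - I_e$ follows the recovery step in Section 3.6.

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress_diff(D, n=4, q=16, block=8):
    """Block-wise DCT of D, amplified by 2^n and quantized by q."""
    Fq = np.zeros(D.shape, dtype=np.int32)
    for i in range(0, D.shape[0], block):
        for j in range(0, D.shape[1], block):
            F = dctn(D[i:i+block, j:j+block].astype(float), norm='ortho')
            Fq[i:i+block, j:j+block] = np.round(F * 2**n / q)
    return Fq   # Fq and n would then be Huffman coded into D_c

def decompress_diff(Fq, n=4, q=16, block=8):
    D_hat = np.zeros(Fq.shape)
    for i in range(0, Fq.shape[0], block):
        for j in range(0, Fq.shape[1], block):
            F = Fq[i:i+block, j:j+block] * q / 2.0**n
            D_hat[i:i+block, j:j+block] = idctn(F, norm='ortho')
    return np.round(D_hat).astype(np.int32)

D = np.random.randint(-3, 4, (96, 96))       # difference image D
D_hat = decompress_diff(compress_diff(D))    # lossy round trip gives D'
e = D - D_hat   # error matrix folded into I_e so recovery stays lossless
```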

3.3.2. Integrity Information

In order to verify whether the marked image $I_m$ has been attacked during transmission, we need to ensure the integrity of the image $I_e$ before reversible embedding. We use the hash value $H_o$ of $I_e$ as the integrity information and embed it into $I_e$ together with $D_c$ using the reversible watermarking method, as shown in Figure 8. When verifying the integrity of $I_m$ after transmission, we first use the same reversible watermarking method to extract $D_c$ and $H_o$. Then, we restore the image $\tilde{I}_e$ and calculate its hash value $H_d$. If $H_o$ and $H_d$ are equal, we consider that $I_m$ has not been attacked during transmission, and the cover image can be restored losslessly. Otherwise, $I_m$ has been attacked during transmission and the restoration information is permanently lost; in this case, we can directly extract the watermark information from the distorted image $I_a$ for copyright verification.
Our scheme fits a generic reversible embedding framework and can adopt traditional reversible watermarking methods such as histogram modification (HM) [24], pixel value ordering (PVO) [25], difference expansion (DE) [26], and prediction error expansion (PEE) [27]; in this paper, we specifically use difference expansion. The details of directly extracting the robust watermark and of restoring the cover image are presented in Section 3.5 and Section 3.6, respectively.
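For reference, a hedged sketch of the classic difference expansion step on a single pixel pair follows; the location map and overflow handling of a complete DE embedder are omitted.

```python
# Difference expansion on one pixel pair: the average l is invariant and
# the difference h is expanded by one bit, so the step is exactly invertible.
def de_embed(x, y, bit):
    l = (x + y) // 2          # pair average (kept invariant)
    h = x - y                 # pair difference
    h2 = 2 * h + bit          # expand the difference, append one bit
    return l + (h2 + 1) // 2, l - h2 // 2

def de_extract(x2, y2):
    h2 = x2 - y2
    bit = h2 & 1              # the embedded bit is the LSB of h'
    h = h2 >> 1               # original difference
    l = (x2 + y2) // 2
    return l + (h + 1) // 2, l - h // 2, bit

x2, y2 = de_embed(120, 118, 1)
x, y, bit = de_extract(x2, y2)
assert (x, y, bit) == (120, 118, 1)   # round trip is lossless
```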

3.4. Attack Layer Network

Digital medical images are vulnerable to channel noise pollution, unauthorized use, and tampering during transmission over public networks. These intentional or unintentional attacks can distort the protected medical images to a certain degree, causing the embedded watermark information to be extracted incorrectly and undermining copyright authentication. To improve the robustness of the watermarking algorithm, common attack methods can be incorporated into the model training process to strengthen the watermark extraction network. We selected eight common image attack methods to form the attack layer network: median filtering (MF), Gaussian filtering (GF), salt and pepper noise (SPN), Gaussian noise (GN), rotation, cropping, JPEG compression (JPEG), and scaling. Figure 9 visualizes the distortion caused by these eight attack methods.
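A hedged sketch of such an attack layer is shown below: one attack type and one parameter are sampled per call. Only four of the eight attacks are implemented; median filtering, cropping, JPEG, and rotation follow the same pattern, and the parameter ranges mirror those used in Section 4.2.

```python
import random
import torch
import torchvision.transforms.functional as TF

def attack_layer(img):
    """Apply one randomly chosen distortion to a (B, C, H, W) tensor in [0, 1]."""
    choice = random.choice(['gaussian_noise', 'spn', 'gaussian_filter', 'scaling'])
    if choice == 'gaussian_noise':
        v = random.choice([0.01, 0.02, 0.03])               # variance
        return (img + v**0.5 * torch.randn_like(img)).clamp(0, 1)
    if choice == 'spn':
        p = random.choice([0.01, 0.02, 0.03])               # noise density
        mask = torch.rand_like(img)
        out = img.clone()
        out[mask < p / 2] = 0.0          # pepper
        out[mask > 1 - p / 2] = 1.0      # salt
        return out
    if choice == 'gaussian_filter':
        k = random.choice([3, 5, 7])                        # kernel size
        return TF.gaussian_blur(img, kernel_size=k)
    # scaling: resize, then interpolate back to the original size
    r = random.choice([0.5, 0.75, 1.25, 1.5, 2.0])
    h, w = img.shape[-2:]
    scaled = TF.resize(img, [int(h * r), int(w * r)], antialias=True)
    return TF.resize(scaled, [h, w], antialias=True)

out = attack_layer(torch.rand(1, 1, 96, 96))
```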

3.5. Watermark Extraction Network

The watermark extraction network is essentially the inverse of the watermark embedding network, as shown in Figure 10. There are two types of input image: one containing the watermark information, denoted $I_a$, and one without it, denoted $I_n$. Firstly, a two-level DWT is applied to the input image to obtain the three groups of frequency information $F_1$, $F_2$, and $LL_2$. Similar to the embedding process, three independent branch networks extract features from $F_1$, $F_2$, and $LL_2$, and the extracted features are concatenated along the channel dimension; the features extracted from $F_1$ are downsampled before concatenation.
Subsequently, the extracted watermark information is obtained through Transformer blocks and convolutional processing. If the input image is $I_a$, containing the watermark, the extracted information is denoted $W_d$. If the input image is $I_n$, containing no watermark, the extracted information should be a matrix of the same size as the watermark image with all values equal to 0, denoted $W_z$; this prevents over-fitting of the watermark extraction network. The discriminator decides that an image contains a watermark when the MSE between the extracted watermark information and $W_o$ is below a threshold of 0.01.
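A direct rendering of this check (names illustrative):

```python
import torch

def contains_watermark(w_extracted, w_o, threshold=0.01):
    """Discriminator: MSE below the threshold means 'watermarked'."""
    return torch.mean((w_extracted - w_o) ** 2).item() < threshold
```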

3.6. Image Recovery

The process of image restoration is shown in Figure 11. First, the discriminator determines whether the marked image $I_m$ contains watermark information. If it does, the same reversible watermarking method is used to extract $\tilde{I}_e$, $H_o$, and $D_c$, and the hash value $H_d$ of $\tilde{I}_e$ is calculated. Then, by comparing $H_d$ with $H_o$, we verify the integrity of $I_m$. If $H_o$ and $H_d$ are equal, we assume $I_m$ has not been attacked during transmission; we then decompress $D_c$ to obtain $D$ and add $D$ to $\tilde{I}_e$ to restore the cover image $I_o$. Otherwise, the image has been attacked during transmission and the restoration information is permanently lost; in this case, we directly use the watermark extraction network $N_{dec}$ to extract the watermark from the attacked image $I_a$ for copyright verification.
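The recovery control flow can be summarized as below; `reversible_extract` and `decompress_diff` stand in for the components of Section 3.3, the images are assumed to be NumPy arrays, and the MD5 hash follows the setting in Section 4.1. This is a sketch of the logic, not the full implementation.

```python
import hashlib

def recover_cover(I_m, reversible_extract, decompress_diff):
    # Reversible extraction returns the restored image, the embedded
    # hash H_o, and the compressed difference D_c.
    I_e_tilde, H_o, D_c = reversible_extract(I_m)
    H_d = hashlib.md5(I_e_tilde.tobytes()).hexdigest()
    if H_d == H_o:                     # integrity verified: no attack
        D = decompress_diff(D_c)
        return I_e_tilde + D           # lossless cover image I_o
    return None   # attacked: fall back to robust extraction from I_a
```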

3.7. Loss Functions

The network architecture of our proposed algorithm consists of two stages. In the first stage, we train the watermark embedding and extraction networks. The watermark embedding network takes the original watermark information $W_o$ and the cover image $I_o$ as inputs to generate the watermarked image $I_e$. The watermark extraction network extracts the watermark $W_d$ from the attacked image $I_a$. The total loss function of this stage consists of three parts, as shown in the following formula,
$$L_{stage1} = L_{mse}(I_o, I_e) + L_{per}^{total}(I_o, I_e) + L_{mse}(LL_2, LL_2^e) + \lambda L_{mse}(W_o, W_d) + (1-\lambda) L_{mse}(W_o, W_z)$$
Specifically, we use the watermark loss $L_{mse}(W_o, W_d)$ to improve the accuracy of watermark extraction, the perceptual loss $L_{per}^{total}(I_o, I_e)$ to reduce the semantic differences between the two images, and the visual similarity losses $L_{mse}(I_o, I_e)$ and $L_{mse}(LL_2, LL_2^e)$ to enhance the visual similarity of the images. Here, $L_{mse}$ denotes the $l_2$ loss, $\lambda = 1$ indicates that the input contains a watermark, and $\lambda = 0$ indicates that it does not. The perceptual loss is the $l_2$ loss over the output features of the first three modules ($F_1$, $F_2$, $F_3$) of a pre-trained VGG16 network, expressed as the following equation.
$$L_{per}^{total}(I_o, I_e) = 0.65\, L_{mse}\big(F_1(I_o), F_1(I_e)\big) + 0.3\, L_{mse}\big(F_2(I_o), F_2(I_e)\big) + 0.05\, L_{mse}\big(F_3(I_o), F_3(I_e)\big)$$
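A hedged PyTorch rendering of the stage-1 objective is given below. The VGG16 cut points for $F_1$ to $F_3$ and the interpretation of the $(1-\lambda)$ term as a zero-target loss on the decoder output are assumptions; only the weights (0.65/0.3/0.05) and the overall structure follow the equations above. Greyscale slices are assumed to be replicated to three channels before the VGG pass.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

vgg = vgg16(weights='IMAGENET1K_V1').features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)
blocks = [vgg[:4], vgg[:9], vgg[:16]]   # assumed cut points for F1, F2, F3

def perceptual_loss(x, y, weights=(0.65, 0.3, 0.05)):
    x3, y3 = x.repeat(1, 3, 1, 1), y.repeat(1, 3, 1, 1)  # grey -> 3 channels
    return sum(w * F.mse_loss(b(x3), b(y3)) for w, b in zip(weights, blocks))

def stage1_loss(I_o, I_e, LL2, LL2e, W_o, W_out, lam):
    # lam = 1 for a watermarked input, 0 for a clean one; W_out is the
    # decoder output, pulled toward W_o or toward the all-zero target W_z.
    image_term = (F.mse_loss(I_o, I_e) + perceptual_loss(I_o, I_e)
                  + F.mse_loss(LL2, LL2e))
    wm_term = (lam * F.mse_loss(W_out, W_o)
               + (1 - lam) * F.mse_loss(W_out, torch.zeros_like(W_out)))
    return image_term + wm_term
```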
The second stage focuses on embedding the reversible information and training the watermark extraction network. The watermark embedding network uses the pre-trained parameters from the first stage, which are fixed without gradient backpropagation. Since the reversible embedding introduces distortion into the watermarked image $I_e$, the watermark extraction network continues training from the first-stage parameters. The total loss function for this stage is shown in the following equation.
$$L_{stage2} = \lambda L_{mse}(W_o, W_d) + (1-\lambda) L_{mse}(W_o, W_z)$$
This stage retains only the watermark loss terms, which improve the watermark extraction accuracy.

3.8. Algorithm Process

To better illustrate the algorithm flow, Algorithm 1 shows the watermark embedding process and Algorithm 2 shows the watermark extraction process.
Algorithm 1 Watermark embedding algorithm.
Stage 1: Training the watermark embedding network $N_{enc}$ and the watermark extraction network $N_{dec}$
  Input: original DWI image $I_o$, original watermark $W_o$
  Output: well-trained $N_{enc}$, $N_{dec}$
  Required models: $N_{enc}$, $N_{dec}$, attack layer $Att$
  while epoch < max_epochs:
    1: Robust watermark embedding $I_e = N_{enc}(I_o, W_o)$
    2: Random attack $I_a = Att(I_e)$ or $I_n = Att(I_o)$
    3: Watermark extraction $W_d = N_{dec}(I_a)$ or $W_z = N_{dec}(I_n)$
    4: Total loss $L_{stage1}(I_o, I_e, W_o, W_d, W_z)$
    5: Update the parameters of $N_{enc}$ and $N_{dec}$
Stage 2: Training the watermark extraction network $N_{dec}$
  Input: original DWI image $I_o$, original watermark $W_o$, pre-trained $N_{enc}$, $N_{dec}$
  Output: well-trained $N_{dec}$, marked image $I_m$
  Required models: $N_{enc}$, $N_{dec}$, $Att$, compensation information generation $R_{gen}$, reversible embedding $R_{emb}$
  while epoch < max_epochs:
    1: Robust watermark embedding $I_e = N_{enc}(I_o, W_o)$
    2: Compensation information generation $R_{inf} = R_{gen}(I_o, I_e)$ (Section 3.3)
    3: Reversible embedding $I_m = R_{emb}(I_e, R_{inf})$
    4: Random attack $I_a = Att(I_m)$ or $I_n = Att(I_o)$
    5: Watermark extraction $W_d = N_{dec}(I_a)$ or $W_z = N_{dec}(I_n)$
    6: Total loss $L_{stage2}(W_o, W_d, W_z)$
    7: Update the parameters of $N_{dec}$
Algorithm 2 Watermark extraction algorithm.
  Input: marked image $I_m$, original watermark $W_o$
  Output: watermark $W_d$, original DWI image $I_o$
  Required models: $N_{dec}$, watermark presence test $W_{tes}$, copyright authentication $C_{aut}$, integrity verification $A_{tes}$, image recovery $R_{rec}$
    1: Watermark extraction $W_d = N_{dec}(I_m)$
    2: Watermark presence test $True|False = W_{tes}(W_o, W_d)$ (Section 3.5)
    3: Copyright authentication $True|False = C_{aut}(W_o, W_d)$ (Section 3.5)
    4: Integrity verification $True|False = A_{tes}(I_m)$ (Section 3.6)
    5: Image recovery $I_o = R_{rec}(I_m)$ (Section 3.6)

4. Experiments

4.1. Experimental Settings

We selected publicly available brain DWI image data from the literature [28]. By reducing the dimensionality of the DWI data, we obtained approximately 80,000 2D slice images. All slices were cropped to 96 × 96; 64,000 slices were used as the training set and 16,000 as the test set. We used the Adam optimizer with a learning rate of 0.0001 and betas of (0.5, 0.99). The number of epochs was set to 100 with a batch size of 32. The watermark was a binary image of size 24 × 24. The reversible watermarking method is difference expansion, and the image hash is computed with the MD5 algorithm. The model parameters and sizes of the robust watermarking network are presented in Table 1.
The robustness of a watermarking algorithm refers to the accuracy of the watermark information extracted from the attacked protected image; the higher the accuracy, the better the robustness of the algorithm. We use the bit error rate (BER) to evaluate the robustness of the algorithm. The formula of BER is shown in the following,
$$BER = \frac{B_e}{N_h \times N_w}, \qquad BER \in [0, 1]$$
where $B_e$ is the number of erroneously extracted watermark bits and $N_h \times N_w$ is the total number of bits of the binary watermark image. To evaluate the difference in visual quality between the watermarked image $I_e$ and the cover image $I_o$, we used the peak signal-to-noise ratio (PSNR); the higher the PSNR value, the higher the quality of $I_e$. The PSNR is defined as follows,
$$PSNR = 10 \log_{10}\left(\frac{MAX^2}{MSE}\right)$$
where MAX is the maximum grey-level value of the image and MSE is the mean squared error between the $I_o$ and $I_e$ images.
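Both metrics are straightforward to compute; a plain NumPy version:

```python
import numpy as np

def ber(w_extracted, w_original):
    """Bit error rate of a binary watermark."""
    errors = np.count_nonzero(w_extracted.astype(bool) != w_original.astype(bool))
    return errors / w_original.size      # B_e / (N_h * N_w)

def psnr(I_o, I_e, max_val=255.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((I_o.astype(np.float64) - I_e.astype(np.float64)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(max_val**2 / mse)
```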

4.2. Robustness Experiment

This section evaluates the robustness of our proposed algorithm under no attacks, single attacks (noise, filtering, JPEG compression, scaling, cropping, rotation) and multiple attacks.

4.2.1. No Attack

Table 2 shows the average BER of the extracted watermark and the average PSNR between the recovered image and the cover image for 64 slice images under the no-attack condition. Figure 12 displays the visual differences between six slice images and their corresponding recovered images. The average BER of our proposed algorithm is 0, and the recovered images are identical to the cover images (the PSNR is infinite), indicating that our proposed algorithm extracts the watermark and recovers the cover image losslessly under no attack.

4.2.2. Common Attacks

The common image attacks considered include median filtering and Gaussian filtering with convolution kernel sizes K of 3 × 3, 5 × 5, and 7 × 7; SPN with noise densities p of 0.01, 0.02, and 0.03; Gaussian noise with variances v of 0.01, 0.02, and 0.03; and JPEG compression with quality factors Q of 90, 70, 30, and 10. These attacks were applied to the slice images in each gradient direction of the DWI, and the watermark extraction BER values are shown in Table 3. Almost all BER values are near 0 after the images are subjected to these attacks, indicating that our proposed algorithm is highly robust against common attacks.

4.2.3. Geometric Attacks

The geometric attacks include anticlockwise rotation with rotation angles θ of 5°, 15°, 30°, and 45°; random cropping with cropping scales q of 0.125, 0.25, 0.5, and 0.7; and scaling with scaling factors r of 0.25, 0.5, 0.75, 1.25, 1.5, and 2. After a scaling attack, the image is rescaled to the cover image size by interpolation before being used as input to the watermark extraction network $N_{dec}$. These attacks were applied to the slice images in each gradient direction of the DWI, and the watermark extraction BER values are shown in Table 4. Almost all BER values are near 0 after the various geometric attacks, indicating that our proposed algorithm is highly robust to geometric attacks.

4.2.4. Multiple Attacks

Considering that images are vulnerable to several types of attack during actual transmission, the algorithm's robustness can be further verified with multiple-attack experiments. We randomly selected three attack methods and applied them in sequence to the slice images in each gradient direction of the DWI; the BER values of watermark extraction under these multiple attacks are shown in Table 5. All BER values remain below 0.09 after three superimposed attacks of different types, indicating that our proposed algorithm retains good robustness against multiple attacks.

4.3. Ablation Experiment

In this section, we conducted experiments to demonstrate the effectiveness of the LHFA block by adding or removing it from the watermark embedding network N e n c . The results are shown in Table 6, and the average PSNR improves to 60.18 dB after adding the LHFA block. This indicates that the attention module that combines low- and high-frequency information in the watermark embedding network with the LHFA block can improve the image’s accuracy and detail expression ability, thus reconstructing visually higher-quality watermarked images.

4.4. Comparative Experiment

In order to measure its performance comprehensively, the proposed algorithm was compared with (1) HiDDeN [13] and DwiMark [17], robust watermarking algorithms based on deep learning, also trained in a two-stage separated manner for a fair comparison, and (2) three well-performing traditional robust lossless watermarking algorithms [10,11,12]. Table 7 compares the visual quality of the watermarked images; the proposed algorithm achieved the highest reconstruction quality. Table 8 compares robustness. The proposed algorithm exhibited the best robustness against Gaussian noise and scaling attacks, with a watermark extraction BER of 0. For rotation attacks, the proposed algorithm achieved a BER of 0.002 at a rotation angle of 15°. Moreover, the proposed algorithm also performed well against the other common attacks on medical images, achieving a BER of 0 for most of them and demonstrating robustness against most attacks.

5. Conclusions

We proposed a two-stage deep learning-based robust lossless watermarking algorithm for DWI images that combines the advantages of robust and reversible watermarking. In the first stage, we pre-trained a robust watermarking network using a framework based on the DWT and a Transformer. The watermark redundancy was embedded into the decomposed low-frequency components, and an attention module was designed to effectively fuse multi-frequency features for better reconstruction performance. In the second stage, the compensation information and the image hash value were embedded into the watermarked image using reversible watermarking techniques. The image hash value detects whether the image has been attacked, and under the no-attack condition, lossless recovery of the cover image is achieved using the compensation information, ensuring the integrity of the medical image data. The proposed algorithm enables effective copyright protection for DWI images by extracting the watermark for copyright authentication under various types of attack. Currently, the embedded watermark image is fixed; in the future, random watermarks could be used to enhance security. In addition, the robustness of the watermark is designed for known attacks; adversarial training could be employed to improve robustness against unknown attacks.

Author Contributions

Conceptualization, methodology, Z.L. (Zhangyu Liu) and Z.L. (Zhi Li); writing—original draft preparation, Z.L. (Zhangyu Liu); validation, Z.L. (Zhangyu Liu) and L.Z.; software, data curation, D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China: 62062023, Guizhou Science and Technology Plan Project: ZK[2021]-YB314.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

https://db.humanconnectome.org/ (accessed on 17 April 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Bammer, R. Basic principles of diffusion-weighted imaging. Eur. J. Radiol. 2003, 45, 169–184.
2. Zhong, X.; Huang, P.C.; Mastorakis, S.; Shih, F.Y. An automated and robust image watermarking scheme based on deep neural networks. IEEE Trans. Multimed. 2020, 23, 1951–1961.
3. Yang, P.; Lao, Y.; Li, P. Robust watermarking for deep neural networks via bi-level optimization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021.
4. Zhang, J.; Chen, D.; Liao, J.; Zhang, W.; Feng, H.; Hua, G.; Yu, N. Deep model intellectual property protection via deep watermarking. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 4005–4020.
5. Menendez-Ortiz, A.; Feregrino-Uribe, C.; Hasimoto-Beltran, R.; Garcia-Hernandez, J.J. A survey on reversible watermarking for multimedia content: A robustness overview. IEEE Access 2019, 7, 132662–132681.
6. Liu, X.; Lou, J.; Fang, H.; Chen, Y.; Ouyang, P.; Wang, Y.; Zou, B.; Wang, L. A novel robust reversible watermarking scheme for protecting authenticity and integrity of medical images. IEEE Access 2019, 7, 76580–76598.
7. Kamil, K.S.; Sahu, M.; K. R., R.; Sahu, A.K. Secure Reversible Data Hiding Using Block-Wise Histogram Shifting. Electronics 2023, 12, 1222.
8. Sahu, A.K. A logistic map based blind and fragile watermarking for tamper detection and localization in images. J. Ambient Intell. Humaniz. Comput. 2022, 13, 3869–3881.
9. Coltuc, D. Towards distortion-free robust image authentication. J. Phys. Conf. Ser. 2007, 77, 012005.
10. Wang, X.; Li, X.; Pei, Q. Independent embedding domain based two-stage robust reversible watermarking. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 2406–2417.
11. Hu, R.; Xiang, S. Lossless robust image watermarking by using polar harmonic transform. Signal Process. 2021, 179, 107833.
12. Hu, R.; Xiang, S. Cover-lossless robust image watermarking against geometric deformations. IEEE Trans. Image Process. 2020, 30, 318–331.
13. Zhu, J.; Kaplan, R.; Johnson, J.; Fei-Fei, L. HiDDeN: Hiding data with deep networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
14. Ahmadi, M.; Norouzi, A.; Karimi, N.; Samavi, S.; Emami, A. ReDMark: Framework for residual diffusion watermarking based on deep networks. Expert Syst. Appl. 2020, 146, 113157.
15. Luo, X.; Zhan, R.; Chang, H.; Yang, F.; Milanfar, P. Distortion agnostic deep watermarking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020.
16. Liu, Y.; Guo, M.; Zhang, J.; Zhu, Y.; Xie, X. A novel two-stage separable deep learning framework for practical blind watermarking. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019.
17. Fan, B.; Li, Z.; Gao, J. DwiMark: A multiscale robust deep watermarking framework for diffusion-weighted imaging images. Multimed. Syst. 2022, 28, 295–310.
18. Chacko, A.; Chacko, S. Deep learning-based robust medical image watermarking exploiting DCT and Harris hawks optimization. Int. J. Intell. Syst. 2022, 37, 4810–4844.
19. Dhaya, R. Light weight CNN based robust image watermarking scheme for security. J. Inf. Technol. Digit. World 2021, 3, 118–132.
20. Chen, Y.P.; Fan, T.Y.; Chao, H.C. WMNet: A lossless watermarking technique using deep learning for medical image authentication. Electronics 2021, 10, 932.
21. Zhang, D.; Zhang, D. Wavelet transform. Fundam. Image Data Mining Anal. Featur. Classif. Retr. 2019, 35–44.
22. Stanković, R.S.; Falkowski, B.J. The Haar wavelet transform: Its status and achievements. Comput. Electr. Eng. 2003, 29, 25–44.
23. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5728–5739.
24. Yuan, J.; Zheng, H.; Ni, J. Efficient Reversible Data Hiding Using Two-Dimensional Pixel Clustering. Electronics 2023, 12, 1645.
25. Huang, P.; Li, D.; Wang, Y.; Zhao, H.; Deng, W. A Novel Color Image Encryption Algorithm Using Coupled Map Lattice with Polymorphic Mapping. Electronics 2022, 11, 3436.
26. Chen, P.; Lei, Y.; Niu, K.; Yang, X. A Novel Separable Scheme for Encryption and Reversible Data Hiding. Electronics 2022, 11, 3505.
27. Lee, J.Y.; Kim, C.; Yang, C.N. Reversible Data Hiding Using Inter-Component Prediction in Multiview Video Plus Depth. Electronics 2019, 8, 514.
28. Van Essen, D.C.; Ugurbil, K.; Auerbach, E.; Barch, D.; Behrens, T.E.; Bucholz, R.; Chang, A.; Chen, L.; Corbetta, M.; Curtiss, S.W.; et al. The Human Connectome Project: A data acquisition perspective. Neuroimage 2012, 62, 2222–2231.
Figure 1. The data structure of DWI images.
Figure 2. Two-level Haar wavelet decomposition and inverse transform.
Figure 3. A two-stage robust lossless watermarking architecture based on deep learning.
Figure 4. The proposed architecture of the encoder.
Figure 5. The proposed architecture of the WE block.
Figure 6. The proposed architecture of the LHFA block.
Figure 7. The proposed architecture of the Transformer block.
Figure 8. The proposed reversible information embedding process.
Figure 9. Visualization of image distortion loss under eight attack types.
Figure 10. The proposed architecture of the decoder.
Figure 11. The proposed process of image recovery and watermark extraction.
Figure 12. Visualization results of the recovered images of six slice images.
Table 1. The number of parameters, model size, and runtime of the network.

Model Component | Parameters | Size | Time
Encoder | 1,864,508 | 7.25 MB | 1.514 s
Decoder | 3,588,616 | 13.96 MB | 1.128 s
Total | 5,453,124 | 21.22 MB | 2.642 s
Table 2. BER and PSNR under no attack.

Images | Avg BER/bit | Avg PSNR/dB
64 | 0 | ∞
Table 3. BER values under common attacks.

Attack Type | Parameter | BER/bit
median filtering | K = 3 | 0.000004
median filtering | K = 5 | 0.000004
median filtering | K = 7 | 0.000016
Gaussian filtering | K = 3 | 0.000005
Gaussian filtering | K = 5 | 0.00006
Gaussian filtering | K = 7 | 0.000081
SPN | p = 0.01 | 0.000004
SPN | p = 0.02 | 0.000006
SPN | p = 0.03 | 0.000006
Gaussian noise | v = 0.01 | 0.000004
Gaussian noise | v = 0.02 | 0.000385
Gaussian noise | v = 0.03 | 0.000013
JPEG compression | Q = 90 | 0.000004
JPEG compression | Q = 70 | 0.000011
JPEG compression | Q = 30 | 0.000005
JPEG compression | Q = 10 | 0.002077
Table 4. BER values under geometric attacks.

Attack Type | Parameter | BER/bit
rotation | θ = 5° | 0.000785
rotation | θ = 15° | 0.002837
rotation | θ = 30° | 0.001154
rotation | θ = 45° | 0.001025
cropping | q = 0.125 | 0.000004
cropping | q = 0.25 | 0.000004
cropping | q = 0.5 | 0.000004
cropping | q = 0.7 | 0.000004
scaling | r = 0.5 | 0.000003
scaling | r = 0.75 | 0.000004
scaling | r = 1.25 | 0.000003
scaling | r = 1.5 | 0.000004
scaling | r = 2 | 0.000004
Table 5. BER values under multiple attacks.

Attack 1 | Attack 2 | Attack 3 | BER/bit
MF K = 7 | GN v = 0.03 | scaling r = 0.5 | 0.0003
GF K = 5 | rotation θ = 45° | JPEG Q = 30 | 0.0632
GF K = 3 | cropping q = 0.125 | JPEG Q = 90 | 0.0011
SPN p = 0.02 | cropping q = 0.25 | JPEG Q = 70 | 0.0756
MF K = 5 | GN v = 0.01 | JPEG Q = 30 | 0.0504
SPN p = 0.02 | rotation θ = 30° | JPEG Q = 30 | 0.0876
Table 6. Validation of the effectiveness of the LHFA block in image reconstruction quality.

No. | LHFA Block | Avg PSNR/dB
1 | with | 60.18
2 | without | 58.43
Table 7. Comparison with the PSNR of watermarked images from other methods.

Metric | [13] | [17] | [10] | [11] | [12] | Proposed
PSNR/dB | 39.08 | 58.69 | 43.7 | 43.1 | 46.8 | 60.18
Table 8. Robustness comparison with other methods (BER; — indicates no reported result).

Attack | [13] | [17] | [10] | [11] | [12] | Proposed
GN (0.01) | 0.002 | 0.000 | 0.000 | 0.010 | 0.000 | 0.000
GN (0.02) | 0.024 | 0.000 | 0.000 | 0.060 | 0.010 | 0.000
GN (0.03) | 0.025 | 0.026 | 0.000 | 0.180 | 0.050 | 0.000
JPEG (90) | 0.164 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
JPEG (70) | 0.333 | 0.012 | 0.000 | 0.000 | 0.000 | 0.000
JPEG (30) | 0.394 | 0.343 | 0.000 | 0.000 | 0.000 | 0.000
JPEG (10) | 0.430 | 0.375 | 0.150 | 0.000 | 0.000 | 0.002
rotation (5°) | 0.004 | 0.000 | — | 0.000 | 0.000 | 0.000
rotation (15°) | 0.009 | 0.010 | — | 0.000 | 0.000 | 0.002
rotation (30°) | 0.011 | 0.010 | — | 0.000 | 0.000 | 0.001
rotation (45°) | 0.011 | 0.020 | — | 0.000 | 0.000 | 0.001
scaling (0.5) | 0.063 | 0.001 | — | 0.000 | 0.000 | 0.000
scaling (0.75) | 0.096 | 0.010 | — | 0.006 | 0.002 | 0.000
scaling (1.25) | 0.131 | 0.010 | — | 0.000 | 0.000 | 0.000
scaling (1.5) | 0.150 | 0.015 | — | 0.000 | 0.030 | 0.000
scaling (2) | 0.206 | 0.025 | — | 0.020 | 0.010 | 0.000