**1. Introduction**

Currently, the ubiquity of office computing and the development of communication technology render digital image storage, copying, and transmission convenient and fast. Nevertheless, security problems involving digital images, such as leakage, malicious theft, and illegal dissemination, still occur frequently. To protect data on the computer side, data encryption [1–6] and software watermarking [7,8] schemes have been proposed. Similarly, some scholars have investigated access control technology [9–11] to prevent illegal copying and transmission of data by restricting internal operations. Although these methods can effectively prevent the illegal acquisition of digital data directly from a computer, they cannot prohibit using a camera to photograph sensitive information displayed on the screen. In particular, the popularization of smartphones has made data leakage by capturing on-screen content much easier and more common.

However, verifying the copyright of a camera-captured image that is published without permission remains challenging. Digital image watermarking technology has been widely used for copyright protection and leakage tracking [12–16]. To hold someone accountable for taking surreptitious photographs, an invisible screen-cam robust watermark can be embedded in images. In this circumstance, the embedded watermark will survive in a camera-captured image.

To design a screen-cam robust watermarking scheme that can extract watermark information from a single screen-captured photo, we first need to analyze the screen-cam process. The screen-cam process can be considered a compound attack comprising moiré noise, luminance distortion, and geometric distortion; consequently, most traditional watermarking methods are not applicable to it. The process of generating a new image by camera capture can be regarded as a cross-media information transfer process that contains digital-to-analog and analog-to-digital conversion attacks. Thus, the print-scan and print-cam processes share certain similarities with the screen-cam process.

Many researchers have focused on extracting watermark information from printed images. Based on embedding domains, these methods can be divided into spatial domain-based and frequency domain-based methods. The spatial domain-based algorithms primarily employed watermark patterns as watermark information. Nakamura et al. [17] proposed a sinusoidal watermark pattern-based adaptive additive embedding algorithm. The pattern-based method was continuously improved with a frame detection method [18] to achieve autocorrection during extraction, a new pattern design [19–21] to achieve better imperceptibility and robustness, or a combination with computational photography technology [22,23] to solve the lens focusing problem.

However, these pattern-based watermarking methods are vulnerable to cropping attacks, and their detection performance decreases if an image is zoomed out before it is printed. The frequency domain-based algorithms include Fourier domain-based methods and wavelet domain-based methods. The Fourier domain-based algorithms can be subdivided into two categories. The first category comprises transform-invariant domain-based methods [24,25], which employ log-polar mapping (LPM) and the discrete Fourier transform (DFT) to achieve robustness to rotation, scaling, and translation (RST) distortions. The second category uses selected magnitude coefficients of the DFT domain as the message carrier [26–29]. Researchers have improved this approach by choosing the optimal embedding radius to minimize quality degradation [28] or by applying color correction to enhance extraction accuracy [29]. In the wavelet domain-based algorithms, messages are embedded by modulating wavelet transform coefficients. These algorithms are commonly combined with a watermark in another domain to achieve robustness to affine distortion attacks [30] or to rotation and scaling attacks [31], or to implement fragile watermarking [32]. Because the frequency domain-based methods are global watermarks, they are likewise vulnerable to cropping attacks. Furthermore, the experiments in [33] show that these print-scan and print-cam robust watermarking methods have relatively high bit error rates and thus cannot be directly applied to the screen-cam process.
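To make the second category concrete, the sketch below embeds one bit by strengthening a conjugate-symmetric pair of DFT magnitude coefficients on a chosen ring, and detects it by comparing the target magnitude with its ring neighbors. The radius, strength, and threshold are illustrative assumptions, not parameters from [26–29].

```python
import numpy as np

def embed_bit_dft(img, bit, radius=30, strength=2000.0):
    """Embed one bit by raising a conjugate-symmetric pair of DFT
    magnitudes on a ring (hypothetical parameters, for illustration)."""
    f = np.fft.fftshift(np.fft.fft2(img.astype(np.float64)))
    cy, cx = img.shape[0] // 2, img.shape[1] // 2
    y, x = cy, cx + radius
    if bit:
        # Modify the coefficient and its symmetric twin equally, keeping
        # each phase, so the inverse transform stays real-valued.
        for yy, xx in ((y, x), (2 * cy - y, 2 * cx - x)):
            phase = np.angle(f[yy, xx])
            f[yy, xx] = (np.abs(f[yy, xx]) + strength) * np.exp(1j * phase)
    out = np.real(np.fft.ifft2(np.fft.ifftshift(f)))
    return np.clip(out, 0, 255)

def detect_bit_dft(img, radius=30, thresh=1500.0):
    """Decide the bit by whether the target magnitude stands out from
    nearby magnitudes on the same ring axis."""
    f = np.fft.fftshift(np.fft.fft2(img.astype(np.float64)))
    cy, cx = img.shape[0] // 2, img.shape[1] // 2
    neighbors = [np.abs(f[cy, cx + r])
                 for r in range(radius - 5, radius + 6) if r != radius]
    return bool(np.abs(f[cy, cx + radius]) - np.mean(neighbors) > thresh)
```

Because the watermark lives in global DFT magnitudes, cropping removes part of the signal energy, which is why such global schemes degrade under cropping attacks.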

Considering the particularity of the screen-cam process, Fang et al. [33] proposed an effective screen-cam robust image watermarking algorithm with intensity-based scale-invariant feature transform (I-SIFT)-based synchronization. The message is embedded in the discrete cosine transform (DCT) domain of each feature region. However, this method needs to record the four vertices of the image in advance, which means it cannot handle situations where the image's original size is unknown. Moreover, it cannot address orientation or scale desynchronization attacks.
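As a generic illustration of DCT-domain embedding (not the specific modulation used in [33]), the sketch below hides one bit in a mid-frequency coefficient of an 8×8 block via dithered quantization (QIM); the coefficient position and step size are assumptions.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix."""
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

C = dct_matrix(8)

def embed_bit_dct(block, bit, u=3, v=2, delta=16.0):
    """Quantize one mid-frequency DCT coefficient to an even or odd
    lattice depending on the bit (illustrative QIM; u, v, delta are
    assumed values)."""
    D = C @ block.astype(np.float64) @ C.T
    dither = delta / 2 if bit else 0.0
    D[u, v] = delta * np.round((D[u, v] - dither) / delta) + dither
    return C.T @ D @ C  # inverse 2D DCT

def extract_bit_dct(block, u=3, v=2, delta=16.0):
    """Recover the bit from the coefficient's position between lattices."""
    D = C @ block.astype(np.float64) @ C.T
    return int(abs(D[u, v] / delta - np.round(D[u, v] / delta)) > 0.25)
```

Quantization-based embedding is blind (no original image needed at extraction), which matters for leakage tracking where only the captured photo is available.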

However, normal user operations may inevitably cause watermark desynchronization. For example, the image displayed on the screen may be blocked by other application windows, or the image may have been zoomed, rotated, or cropped for use. Because a spy may capture only the content displayed on the screen, if the original information of the captured image is unknown, we may not be able to restore the image to its original orientation and scale. Therefore, to address possible desynchronization attacks in addition to the screen-cam attack itself, a watermark synchronization method that is invariant to screen-cam, RST, and cropping attacks needs to be further investigated.

To solve these issues, a feature and Fourier-based screen-cam robust watermarking scheme is proposed in this paper. The main contributions are as follows:


The remainder of the paper is organized as follows: Section 2 summarizes different distortions in the screen-cam process. Section 3 describes the implementation details of the proposed method. The selection of parameters and experiment results are presented in Sections 4 and 5. Finally, Section 6 concludes the paper.

### **2. Screen-Cam Process Analysis**

The screen-cam process contains various distortions [34]. The subprocesses of the screen-cam process produce different types of distortions, as shown in Figure 1, and cause severe image quality degradation. This section aims to provide a basis for the design of screen-cam robust watermarking schemes by analyzing the different types of distortions generated in each step of the screen-cam process.

**Figure 1.** Various processes and corresponding distortions in the screen-cam process.

The screen-cam process can be divided into three subprocesses: screen display, shooting, and camera imaging.

In the screen display process, the main factors that affect the image signal are the quality of different monitors and their settings. Regular user operations can also cause distortions.

With regard to the shooting process, the main factors are the shooting environment, the relative position of the screen and camera, and the moiré phenomenon. When shooting at a large angle, the focusing problem cannot be disregarded [23]. In addition, camera shake may occur when the shutter is pressed.

The camera imaging process of a mobile phone converts optical signals into digital image signals and processes them. The main components involved are the optical lens, the CMOS sensor [35], and the digital signal processor (DSP). Within the CMOS sensor, the most important components for receiving and processing signals are the photoelectric sensor and the analog-to-digital converter. Therefore, the influencing factors can be divided into hardware and software parts. The hardware part relates to the quality of the camera's hardware devices. The software part relates to the image processing algorithms performed by the image signal processor (ISP), which include signal correction and format conversion before storage.

Assuming an original image is displayed on a screen and well captured by a camera, the distortions created during the screen-cam process can be divided into five categories.

Linear distortion: The luminance, contrast, and color distortions caused by the quality and settings of the monitor can be approximated as a linear change. Linear distortion can also occur through linear correction of the image by the ISP.
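The affine approximation above can be written as a one-line model; the gain and offset values here are arbitrary illustrations, not measured monitor characteristics.

```python
import numpy as np

def linear_distortion(img, gain=0.9, offset=12.0):
    """Affine (gain/offset) approximation of monitor and ISP
    luminance-contrast change, clipped to the valid pixel range.
    The gain and offset are assumed values for illustration."""
    return np.clip(gain * img.astype(np.float64) + offset, 0.0, 255.0)
```

A channel-wise color cast can be modeled the same way by applying per-channel gains and offsets.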

Gamma tweaking: To match human visual perception, the monitor performs gamma correction on the digital image, which is a nonlinear distortion. The ISP of the mobile phone then applies gamma compensation to the digital image according to its algorithms. The effect of gamma compensation may be amplified by other preceding attacks.
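A simple model of this cascade applies the display gamma and then the camera-side compensation; when the two exponents differ, a residual nonlinearity remains. The exponents below are assumed values, not calibrated device parameters.

```python
import numpy as np

def gamma_tweak(img, display_gamma=2.2, isp_gamma=2.4):
    """Display gamma followed by ISP gamma compensation. With
    mismatched exponents the net effect is a power law with exponent
    display_gamma / isp_gamma (assumed values, toy model)."""
    x = np.clip(img, 0, 255) / 255.0
    displayed = x ** display_gamma              # monitor response
    compensated = displayed ** (1.0 / isp_gamma)  # camera-side compensation
    return 255.0 * compensated
```

Note that black and white are fixed points of any power law, so gamma mismatch mainly shifts midtones, which is where watermark energy is most affected.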

Geometric distortion: (1) Different degrees of perspective distortion are caused by the distance and angle of capture, which produce an uneven scaling attack on the digital image. (2) Other geometric distortions, such as pincushion and barrel distortion, are caused by imperfections of the optical lens. By controlling the shooting conditions, the effects of these attacks can be reduced to a certain extent.
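Perspective distortion maps the screen rectangle to a general quadrilateral, and rectification inverts that map with a 3×3 homography. A minimal sketch of estimating the homography from four corner correspondences via the standard direct linear transform (DLT) might look like this; it makes no claim about the paper's actual correction implementation.

```python
import numpy as np

def homography_from_points(src, dst):
    """Solve the 3x3 projective transform mapping four source points to
    four destination points (standard DLT via the SVD null space)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.array(A, dtype=np.float64))
    H = vt[-1].reshape(3, 3)       # null vector of the 8x9 system
    return H / H[2, 2]             # normalize the scale

def apply_homography(H, pt):
    """Map one 2D point through H using homogeneous coordinates."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]
```

In practice the four screen corners detected in the photo serve as `src` and the known display rectangle as `dst`, after which every pixel is warped through the inverse map.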

Noise attack: Noise attacks are important factors that cause a sharp reduction in image quality during the screen-cam process. They can be divided into four categories: moiré noise, noise caused by the external environment, hardware noise, and software noise. (1) In physics, moiré is a beat phenomenon, i.e., wave interference that occurs between two different objects [36]. During the screen-cam process, the light-sensitive elements in a camera emit high-frequency electromagnetic waves, and the object being photographed, such as a liquid crystal display (LCD), emits its own electromagnetic interference. When the two electromagnetic waves intersect, the waveforms superimpose and cause electronic fluctuations in the acquiring device; the original waveforms combine into a new waveform, which appears as moiré. Generally, the moiré phenomenon is most severe when the two wave frequencies are similar or in a multiple relationship. (2) External noise is caused by unstable ambient light and screen reflections while shooting. (3) Hardware factors, such as sensor material properties, electronic components, and circuit structures, introduce various kinds of noise. The most typical is quantization error noise, produced when the CMOS processor digitizes the captured signal by quantizing the continuous signal into the pixel values of the digital image. (4) The ISP's corrections to the captured signal are, to some extent, independent of the original image, so the noise-reduction process itself also introduces new noise.
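The beat mechanism in (1) can be sketched as the superposition of two gratings with close spatial frequencies; this toy model leaves out the camera's sampling step, which is what actually makes the fringes visible, and the frequencies are assumed values.

```python
import numpy as np

def moire_pattern(n=256, f_screen=32.0, f_sensor=30.0):
    """Sum of two sinusoidal gratings. By the identity
    sin a + sin b = 2 sin((a+b)/2) cos((a-b)/2), the result is a
    carrier at the mean frequency modulated by a slow beat envelope
    at half the frequency difference (toy model, assumed values)."""
    x = np.arange(n) / n
    return np.sin(2 * np.pi * f_screen * x) + np.sin(2 * np.pi * f_sensor * x)
```

Consistent with the text, the beat is strongest when the two frequencies are close: as `f_screen - f_sensor` shrinks, the envelope period grows and the fringes become coarser and more visible.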

Low-pass filtering attack: Although a camera has a high resolution and the number of pixels in the captured image is commonly larger than that of the original image, the signal acquisition process does not record each pixel of the image independently. Interference between light rays causes blurring across adjacent pixels, which approximates a low-pass filtering attack. Blurring caused by unfocused pixels has a similar effect.
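A minimal model of this effect is convolution with a normalized Gaussian kernel standing in for the true optical point-spread function; the kernel width is an assumption.

```python
import numpy as np

def gaussian_blur_1d(signal, sigma=1.5):
    """Model inter-pixel light interference as convolution with a
    normalized Gaussian kernel, a separable low-pass filter (sigma is
    an assumed stand-in for the optical point-spread function)."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2 * sigma**2))
    kernel /= kernel.sum()  # preserve overall brightness
    return np.convolve(signal, kernel, mode="same")
```

Applied to the rows and then the columns of an image, this attenuates exactly the high-frequency components where fragile watermark energy would otherwise reside.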

### **3. Proposed Watermarking Scheme**

This section presents the watermark embedding and detection procedures of the proposed watermarking scheme and explains the rationale behind them. In the embedding process, we first construct LSFRs as message embedding regions; then, a watermark message is embedded repeatedly in each LSFR with the proposed algorithm. In the detection process, we first perform perspective correction on the captured image; then, we identify all candidate regions and attempt message extraction on each of them in turn. Accordingly, this section is organized as follows: Section 3.1 analyzes the selection of feature operators and details the proposed LSFR construction method. Section 3.2 analyzes the embedding operations and details the proposed embedding method. The corresponding watermark detection method is given in Section 3.3.
