#### 3.3.1. Perspective Correction

Because different shooting angles and distances cause perspective distortion, we need to correct the distortion and extract the required portion from the captured image. The perspective correction can be written as:

$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \quad \text{where } H = \begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{bmatrix} \tag{16}
$$

where [*x*′, *y*′, 1]<sup>*T*</sup> and [*x*, *y*, 1]<sup>*T*</sup> define the homogeneous point coordinates in the corrected image and in the photo, respectively. *H* defines a nonsingular 3 × 3 homogeneous matrix. Because *H* is only defined up to a scale factor, it has eight degrees of freedom (DOF). Each pair of corresponding points provides two equations, so at least four point pairs are required to calculate *H*.
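To illustrate why four point pairs suffice, the following minimal sketch estimates *H* with the direct linear transform (DLT): each correspondence contributes two linear equations in the nine entries of *H*, and the solution is the null vector of the stacked system. The function names `estimate_homography` and `apply_homography` are hypothetical, NumPy is assumed, and the source does not state which solver was actually used.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 matrix H with dst ~ H @ src from >= 4 point pairs (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, from eliminating the
        # unknown projective scale in Equation (16).
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows, dtype=float)
    # The null vector of A is the right singular vector that belongs to
    # the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    # Fix the free scale; this assumes m33 != 0 (true for ordinary photos).
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map an (N, 2) array of points through H and dehomogenize."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

With four manually selected vertices as `src` and the corners of the target rectangle as `dst`, `apply_homography(H, src)` should reproduce `dst` up to floating-point error.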

We manually select the four required vertices from the captured image. Because the proposed watermarking scheme is designed for leak tracking, manual selection is acceptable. Since the watermark synchronization method is robust to scaling, the images do not have to be recovered to the original pixel size. If the original size of the image is unknown, or the image may have been cropped, the four vertices of the screen can be used instead for perspective correction, as shown in Figure 9. At a minimum, we need to know the size or aspect ratio of the screen, or the aspect ratio of the image if it has not been cropped.

Because smartphones have high-megapixel cameras, the captured image usually has a substantially higher resolution than the original image. To fully utilize the captured information, the shortest distance between any two of the four vertices is checked before the correction. If it is larger than 1500 pixels, the image is recovered to two different sizes. The first recovered image, *I*<sub>1</sub>, is recovered to the original size, if known, or otherwise to a relatively small size; it is used to calculate the candidate LSFRs, and the smaller size also accelerates the calculation. The second recovered image, *I*<sub>2</sub>, is recovered at a size based on the shortest distance between two of the four vertices, as shown in Figure 9, and is used for message extraction. Otherwise, only one image is recovered.
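The size decision can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `fallback_side` value of 1024 pixels used when the original size is unknown, and the choice to recover the single image at the shortest-distance size are hypothetical and are not specified in the text. Only the 1500-pixel threshold comes from the description above.

```python
import itertools
import math

def shortest_vertex_distance(vertices):
    """Shortest Euclidean distance between any two of the selected vertices."""
    return min(math.dist(p, q) for p, q in itertools.combinations(vertices, 2))

def plan_recovery_sides(vertices, original_side=None,
                        threshold=1500, fallback_side=1024):
    """Return the short-side length (in pixels) for I1 and I2.

    threshold follows the text; fallback_side is an assumed placeholder.
    """
    d = shortest_vertex_distance(vertices)
    large = int(round(d))
    if d > threshold:
        # Two recoveries: a small image for locating candidate LSFRs and a
        # large one that keeps the captured detail for message extraction.
        small = original_side if original_side is not None else fallback_side
        return {"I1": small, "I2": large}
    # Assumption: with a single recovery, the same image serves both steps.
    return {"I1": large, "I2": large}
```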

**Figure 9.** Schematic diagram of correction process.

#### 3.3.2. Candidate Regions Locating

The candidate LSFRs are calculated on *I*<sub>1</sub> in the same way as in the embedding process. A Gaussian filter is applied first to reduce the impact of noise. The feature points and their associated orientations are then calculated. To avoid missed detections, all feature points that may be used for watermark synchronization are selected based on scale and spatial location. This yields the candidate LSFR set of *I*<sub>1</sub>. The corresponding regions are then extracted from *I*<sub>2</sub> for message extraction.

#### 3.3.3. Message Extraction

Watermark detection is an iterative search over the candidate LSFRs. As soon as watermark information is detected in one LSFR, the watermark detection of the captured image is complete. In each iteration, one candidate LSFR is orientation normalized and discrete Fourier transformed. According to the nature of the DFT coefficients, although we do not know the original size, the radius of the watermark locations will not vary as long as the area corresponding to the feature scale has not varied. In practice, however, the feature scale and its corresponding area vary slightly, which results in a slight variation in the radius of the watermark locations. Therefore, the search covers the radii *R*<sub>*i*</sub> ∈ (*R* − 10, *R* + 10) at a step of 1 pixel. We also need to consider the variation in the feature orientation. As investigated in Section 3.1, the orientation variation is mostly less than five degrees. Therefore, at each radius *R*<sub>*i*</sub>, the starting position is searched between −5° and +5° of the initial position at a step of one degree.
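The resulting search space is small enough to enumerate exhaustively. The sketch below lists the (radius, angle offset) pairs tried for one candidate region. The function name `search_grid` is hypothetical, and treating the radius interval as open (as the notation suggests) while including both angle endpoints is an assumption.

```python
def search_grid(R, radius_margin=10, angle_margin=5):
    """Enumerate (radius, angle offset in degrees) pairs for one candidate LSFR."""
    # Open interval (R - 10, R + 10) at a step of 1 pixel.
    radii = range(R - radius_margin + 1, R + radius_margin)
    # Offsets from -5 to +5 degrees at a step of 1 degree, endpoints included.
    angles = range(-angle_margin, angle_margin + 1)
    return [(r, a) for r in radii for a in angles]
```

Under these assumptions, each candidate region requires 19 radii times 11 angle offsets, that is, 209 extraction attempts at most.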

The correction of perspective distortion inevitably causes some shift of the coefficients and imperfections in resampling. This results in variations in the coefficients at adjacent points. An example is shown in Figure 10. In addition, because the feature orientation varies, the starting position cannot be located directly. Therefore, in each iteration, the maximum magnitude over each candidate position and its neighborhood is extracted to obtain the message *V*.

**Figure 10.** Magnitude coefficients before (**a**) and after (**b**) screen-cam.

Based on local statistical features, the extracted message *w* is defined as:

$$w(i) = \begin{cases} 1 & \text{if } V(i) \ge \overline{M}_w + k_2 \sigma_w \\ -1 & \text{if } V(i) < \overline{M}_w + k_2 \sigma_w \end{cases} \tag{17}$$

where $\overline{M}_w$ and σ<sub>*w*</sub> define the mean value and the standard deviation of all the magnitudes in the radius range [*R*<sub>*i*</sub> − 2, *R*<sub>*i*</sub> + 2]. *V*(*i*) is the maximum value extracted within the 3 × 3 neighborhood of magnitudes, and *k*<sub>2</sub> is a parameter used to determine the threshold for message extraction.
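A minimal sketch of Equation (17) follows, assuming a centered DFT magnitude array and watermark positions given as (row, column) indices that do not lie on the array border. The function names and the way positions are supplied are hypothetical. Only the ring statistics, the 3 × 3 maximum, and the threshold rule are taken from the text.

```python
import numpy as np

def ring_stats(mag, center, radius, half_width=2):
    """Mean and standard deviation of magnitudes with radius in [R_i - 2, R_i + 2]."""
    yy, xx = np.indices(mag.shape)
    dist = np.hypot(yy - center[0], xx - center[1])
    ring = mag[(dist >= radius - half_width) & (dist <= radius + half_width)]
    return ring.mean(), ring.std()

def extract_bits(mag, positions, center, radius, k2):
    """Apply Equation (17) at each candidate watermark position."""
    mean_w, sigma_w = ring_stats(mag, center, radius)
    threshold = mean_w + k2 * sigma_w
    bits = []
    for row, col in positions:
        # V(i): maximum magnitude in the 3 x 3 neighborhood of the position.
        v = mag[row - 1:row + 2, col - 1:col + 2].max()
        bits.append(1 if v >= threshold else -1)
    return np.array(bits)
```

Taking the neighborhood maximum rather than the single coefficient is what absorbs the small coefficient shifts introduced by resampling.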

The extracted *w* is compared with the pseudorandom sequence *W* generated by the secret key to calculate the number of erroneous bits. The watermark detection is positive if the number of erroneous bits is below the predefined threshold *T*. If the detection is negative, the iterative process continues.
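The decision rule and the iteration over candidates can be sketched as below. The names `count_errors` and `detect` are hypothetical, and each candidate message is assumed to have already been extracted as a sequence of ±1 values of the same length as *W*.

```python
import numpy as np

def count_errors(w, W):
    """Number of positions where the extracted bits differ from the reference."""
    return int(np.count_nonzero(np.asarray(w) != np.asarray(W)))

def detect(candidate_messages, W, T):
    """Return (index, errors) of the first candidate with fewer than T errors."""
    for index, w in enumerate(candidate_messages):
        errors = count_errors(w, W)
        if errors < T:
            # Positive detection: stop the iterative search.
            return index, errors
    # No candidate matched the reference sequence closely enough.
    return None
```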

#### **4. Parameter Settings**

For demonstration and experimental purposes, the watermark length *l* is set to 60, which can be considered a reasonable message length for real use cases. Based on this, we designed a series of experiments to select the most appropriate values for the parameters mentioned above.

#### *4.1. The Selection of Embedding Radius*

The magnitudes at different embedding radii *R* vary in different ways, which affects the robustness of the algorithm. To preserve imperceptibility, the embedding strength β can be varied according to the embedding radius.

To select the most suitable embedding radius for the algorithm, we designed an experiment. The eight host images are resized to 241 × 241, so that each can be treated as an LSFR. We generate the watermark information with the key *K*<sub>1</sub>, in which a total of 32 watermark bits are "1".

Based on the discussion in Section 3.3.3, the embedding radius should be no less than 55 to prevent watermark bits that are too close together from affecting each other. According to the method in Section 3.2.2, the DFT magnitudes of the experiment images are preprocessed first. Then, watermark information is embedded at different radii for all images based on Equation (15). The PSNR value of the watermarked images is controlled to be around 42 dB by adjusting the embedding strength. The relationship between the embedding radius *R* and the average embedding strength β is shown in Figure 11a. As the embedding radius increases, the embedding strength can be increased.
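For reference, the PSNR used as the imperceptibility target can be computed as in the sketch below. It assumes 8-bit images with a peak value of 255, which the text does not state explicitly, and the function name `psnr` is hypothetical.

```python
import numpy as np

def psnr(original, watermarked, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    a = np.asarray(original, dtype=float)
    b = np.asarray(watermarked, dtype=float)
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        # Identical images: the ratio is unbounded.
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))
```

Under this assumption, a PSNR of about 42 dB corresponds to a mean squared error of roughly 4, that is, an average pixel change of about two gray levels.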

To compare the variation of the watermarked magnitudes at different radii and at different shooting distances, we designed an index *K*<sub>*r*,*d*</sub> as an evaluation indicator that describes the significance of the watermark information. Because only the magnitudes at the positions where the watermark bit is "1" are modified, *K*<sub>*r*,*d*</sub> only needs to consider the modified magnitudes. According to Equation (17), it is defined as:

$$K_{r,d} = \frac{\sum_{i=1}^{32} \left( m_{c(r,i)} - \overline{M}_w \right)}{32 \cdot \sigma_w} \tag{18}$$

where *K*<sub>*r*,*d*</sub> defines the index of the image captured at the distance *d* with embedding radius *r*, and *m*<sub>*c*(*r*,*i*)</sub> defines the magnitude at the *i*-th position where the watermark bit is "1" in the captured image with embedding radius *r*.
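Equation (18) is the average excess of the modified magnitudes over the ring mean, expressed in units of the ring standard deviation, so it is directly comparable to the threshold factor *k*<sub>2</sub> in Equation (17). A minimal sketch follows. The function name is hypothetical, and the number of "1" bits is taken from the input length rather than fixed at 32.

```python
import numpy as np

def significance_index(marked_magnitudes, mean_w, sigma_w):
    """K_{r,d}: mean excess of the '1'-bit magnitudes over mean_w, in sigma_w units."""
    m = np.asarray(marked_magnitudes, dtype=float)
    return float((m - mean_w).sum() / (m.size * sigma_w))
```

For example, 32 magnitudes of 12 with a ring mean of 4 and a standard deviation of 2 give an index of 4, meaning the marked coefficients sit four standard deviations above the local mean on average.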

The relationship between the average of the calculated *K*<sub>*r*,*d*</sub> and the shooting distance for different embedding radii is shown in Figure 11b. When the shooting distance is short, the watermark information with a larger embedding radius is more significant because of the higher embedding strength. However, fewer details of the watermark are captured as the shooting distance increases, so the higher-frequency coefficients are poorly preserved. When the embedding radius is 56 or 60, the watermark information is better preserved across the different shooting distances. In a real scene, in order to better capture the image displayed on the screen, we usually shoot at 40 to 60 cm. At these distances, the results with an embedding radius of 60 are better. Therefore, *R* is set to 60 in our experiment.

**Figure 11.** Influence of different embedding radii.
