Mutual Effects of Face-Swap Deepfakes and Digital Watermarking—A Region-Aware Study
Abstract
Highlights
- Region-aware evaluation across visible and invisible watermarks with tunable strength and six face-swap families shows that edits are non-local and non-monotonic—background changes introduced by generators even degrade watermarks that are far from the face, and retention does not vary linearly with embedding strength.
- A locality-preserving baseline bounds the minimal impact—architectures that better confine edits to the facial region, typically those with segmentation-weighted objectives, preserve background watermark signal more reliably than globally trained GAN pipelines.
- Classical robustness tests for watermarking are not sufficient on their own—evaluation should include generator-induced transformations from face swap and report region-wise metrics for face and background.
- Watermark strength and placement should be selected in an architecture-aware manner—in our sweeps, appropriately tuned invisible marks achieved higher background correlation under manipulation than visible overlays at comparable perceptual distortion.
1. Introduction
- A two-sided, region-aware evaluation protocol that quantifies both identity transfer and watermark retention;
- Empirical evidence that generator edits are non-local and that the relationship between watermark strength, identity transfer, and retention is non-monotonic, challenging the common assumption that placing a mark away from the face suffices;
- An architecture-aware analysis showing that methods which better confine edits to the facial region—typically those leveraging segmentation-weighted objectives—preserve background watermark signal more reliably than globally trained GAN pipelines;
- Practical guidance for robustness evaluation in sensing workflows, indicating when tuned invisible marks retain more background correlation than visible overlays at comparable perceptual impact.
2. Materials and Methods
2.1. Watermark
2.1.1. Visible Watermark
2.1.2. Invisible Watermark
2.2. Face Swap
2.2.1. SimSwap
2.2.2. FaceShifter
2.2.3. Ghost
2.2.4. FastFake
2.2.5. InsightFace
2.2.6. Baseline
- U-Net—typical of diffusion models; responsible for removing the noise.
- Identity encoder—compresses the input into a one-dimensional latent vector; it receives a photo of the same person, but in a different shot, pose, or lighting.
- Attribute encoder—also compresses its input into a latent vector, but accepts the target image in its original form.
- Masking—randomly zeroing fragments of the attribute vector, which forces the model to draw information from the identity encoder rather than relying on the attribute features alone.
- Dropout—increases the dispersion of information across the vector, preventing information from concentrating in rarely masked fragments.
- Normal distribution constraint (KL divergence loss)—inspired by the VAE approach [38]; forces the elements of the attribute vector to carry only a limited amount of information about the details of a specific image.
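For illustration, the masking and KL terms above can be sketched as follows. The fragment length, masking probability, and function names are our own illustrative choices, not taken from the paper's implementation:

```python
import math
import random

def mask_fragments(z, frag_len=8, p_mask=0.3, rng=None):
    """Randomly zero contiguous fragments of the attribute vector,
    pushing the model to recover missing content from the identity encoder."""
    rng = rng or random.Random(0)
    out = list(z)
    for i in range(len(z) // frag_len):
        if rng.random() < p_mask:
            out[i * frag_len:(i + 1) * frag_len] = [0.0] * frag_len
    return out

def kl_loss(mu, logvar):
    """Mean KL divergence between N(mu, exp(logvar)) and N(0, 1),
    the VAE-style constraint limiting information in the attribute vector."""
    terms = [1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar)]
    return -0.5 * sum(terms) / len(terms)

# Example: a 64-dimensional attribute vector.
rng = random.Random(1)
z = [rng.gauss(0, 1) for _ in range(64)]
z_masked = mask_fragments(z)
```

Note that `kl_loss` is zero when the attribute distribution is already standard normal (mu = 0, logvar = 0), so the penalty only activates when the encoder tries to pack extra image-specific detail into the vector.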
2.2.7. Examples of Implemented Deepfakes
- (a) Image of the target face—the one that will be replaced;
- (b) Source identity for the face swap algorithms.

2.3. Experiments
- ArcFace and CurricularFace [39] distance—the cosine distance between feature vectors extracted by these models, used to assess whether the persons depicted in the compared images are recognized as the same individual.
- Pearson correlation—Pearson correlation coefficient between the image with the watermark and the image after applying face swap to the material containing the watermark.
- PSNR (Peak Signal-to-Noise Ratio)—used additionally in local analyses for a selected area (background).
- Heatmaps of differences were generated between the face-swap output produced from watermarked material and the face-swap output produced from the same material without a watermark.
- The ArcFace and CurricularFace distance between these two variants was calculated to assess the impact of marking on identity recognition.
- Watermark retention:
- The Pearson correlation coefficient was calculated between the original watermarked image and the result of applying face swap to that image.
- Correlation maps and heat maps showing the distribution of changes in the image were generated.
- In local analyses, the background area was examined separately by calculating the PSNR for this region to estimate the impact of face swap outside the face area.
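As a sketch of the metrics above, the cosine identity distance, Pearson correlation, and region-restricted PSNR can be computed as below. The flat pixel-list representation and function names are illustrative; a real pipeline would operate on image arrays and on embeddings extracted by ArcFace/CurricularFace:

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two identity embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def pearson(x, y):
    """Pearson correlation between two equal-length pixel sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio; infinite for identical inputs."""
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)

def background_psnr(img_a, img_b, face_mask, peak=255.0):
    """PSNR restricted to pixels where the face mask is 0 (background)."""
    xs = [a for a, m in zip(img_a, face_mask) if m == 0]
    ys = [b for b, m in zip(img_b, face_mask) if m == 0]
    return psnr(xs, ys, peak)
```

In this setup, a high `background_psnr` after face swap would indicate that the generator's edits stayed local to the face region, which is exactly what the region-aware protocol is designed to measure.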
3. Results
3.1. Visible Watermark
3.1.1. The Impact of Watermarks on the Face Swap Algorithm
3.1.2. Watermark Resistance
3.2. Hidden Watermark
3.2.1. The Impact of Watermarks on the Face Swap Algorithm
3.2.2. Watermark Resistance
4. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Full Name |
|---|---|
| AAD | Adaptive Attentional Denormalization |
| AdaIN | Adaptive Instance Normalization |
| AEI-Net | Adaptive Embedding Integration Network |
| ArcFace | Additive angular margin face recognition model |
| BCE | Binary Cross-Entropy |
| BCEWithLogitsLoss | Binary Cross-Entropy with logits |
| BiSeNet | Bilateral Segmentation Network |
| CV | Computer Vision |
| DIP | Digital Image Processing |
| FiLM | Feature-wise Linear Modulation |
| GAN | Generative Adversarial Network |
| ID | Identity embedding |
| KL | Kullback–Leibler divergence |
| LPIPS | Learned Perceptual Image Patch Similarity |
| MSE | Mean Squared Error |
| PSNR | Peak Signal-to-Noise Ratio |
| QR | Quick Response code |
| ResNet | Residual Network |
| SDK | Software Development Kit |
| SSIM | Structural Similarity Index Measure |
| SOTA | State of the Art |
| U-Net | U-shaped convolutional neural network |
| VAE | Variational Autoencoder |
| VGGFace2 | VGG Face dataset (version 2) |
References
1. Qureshi, A.; Megías, D.; Kuribayashi, M. Detecting Deepfake Videos Using Digital Watermarking. In Proceedings of the 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Tokyo, Japan, 14–17 December 2021; pp. 1786–1793.
2. Duszejko, P.; Walczyna, T.; Piotrowski, Z. Detection of Manipulations in Digital Images: A Review of Passive and Active Methods Utilizing Deep Learning. Appl. Sci. 2025, 15, 881.
3. Mahmud, B.U.; Sharmin, A. Deep Insights of Deepfake Technology: A Review. DUJASE 2023, 5, 13–23.
4. Westerlund, M. The Emergence of Deepfake Technology: A Review. Technol. Innov. Manag. Rev. 2019, 9, 40–53.
5. Amerini, I.; Barni, M.; Battiato, S.; Bestagini, P.; Boato, G.; Bruni, V.; Caldelli, R.; De Natale, F.; De Nicola, R.; Guarnera, L.; et al. Deepfake Media Forensics: Status and Future Challenges. J. Imaging 2025, 11, 73.
6. Lai, Z.; Yao, Z.; Lai, G.; Wang, C.; Feng, R. A Novel Face Swapping Detection Scheme Using the Pseudo Zernike Transform Based Robust Watermarking. Electronics 2024, 13, 4955.
7. Zhu, J.; Kaplan, R.; Johnson, J.; Fei-Fei, L. HiDDeN: Hiding Data with Deep Networks. In Proceedings of the Computer Vision—ECCV 2018: 15th European Conference, Munich, Germany, 8–14 September 2018.
8. Tancik, M.; Mildenhall, B.; Ng, R. StegaStamp: Invisible Hyperlinks in Physical Photographs. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
9. Yao, Y.; Grosz, S.; Liu, S.; Jain, A. Hide and Seek: How Does Watermarking Impact Face Recognition? arXiv 2024, arXiv:2404.18890.
10. Begum, M.; Uddin, M.S. Digital Image Watermarking Techniques: A Review. Information 2020, 11, 110.
11. Walczyna, T.; Piotrowski, Z. Quick Overview of Face Swap Deep Fakes. Appl. Sci. 2023, 13, 6711.
12. Chen, R.; Chen, X.; Ni, B.; Ge, Y. SimSwap: An Efficient Framework for High Fidelity Face Swapping. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12 October 2020; pp. 2003–2011.
13. Li, L.; Bao, J.; Yang, H.; Chen, D.; Wen, F. FaceShifter: Towards High Fidelity and Occlusion Aware Face Swapping. arXiv 2020, arXiv:1912.13457.
14. Groshev, A.; Maltseva, A.; Chesakov, D.; Kuznetsov, A.; Dimitrov, D. GHOST—A New Face Swap Approach for Image and Video Domains. IEEE Access 2022, 10, 83452–83462.
15. Walczyna, T.; Piotrowski, Z. Fast Fake: Easy-to-Train Face Swap Model. Appl. Sci. 2024, 14, 2149.
16. Deepinsight/Insightface. Available online: https://github.com/deepinsight/insightface (accessed on 1 September 2025).
17. Vasiljević, I.; Obradović, R.; Đurić, I.; Popkonstantinović, B.; Budak, I.; Kulić, L.; Milojević, Z. Copyright Protection of 3D Digitized Artistic Sculptures by Adding Unique Local Inconspicuous Errors by Sculptors. Appl. Sci. 2021, 11, 7481.
18. Li, Q.; Wang, X.; Ma, B.; Wang, X.; Wang, C.; Gao, S.; Shi, Y. Concealed Attack for Robust Watermarking Based on Generative Model and Perceptual Loss. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 5695–5706.
19. Zhao, Y.; Wang, C.; Zhou, X.; Qin, Z. DARI-Mark: Deep Learning and Attention Network for Robust Image Watermarking. Mathematics 2023, 11, 209.
20. Kaczyński, M.; Piotrowski, Z. High-Quality Video Watermarking Based on Deep Neural Networks and Adjustable Subsquares Properties Algorithm. Sensors 2022, 22, 5376.
21. Wadhera, S.; Kamra, D.; Rajpal, A.; Jain, A.; Jain, V. A Comprehensive Review on Digital Image Watermarking. arXiv 2022, arXiv:2207.06909.
22. Horé, A.; Ziou, D. Image Quality Metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369.
23. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241.
24. Perez, E.; Strub, F.; de Vries, H.; Dumoulin, V.; Courville, A. FiLM: Visual Reasoning with a General Conditioning Layer. arXiv 2017, arXiv:1709.07871.
25. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
26. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
27. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
28. Cao, Q.; Shen, L.; Xie, W.; Parkhi, O.M.; Zisserman, A. VGGFace2: A Dataset for Recognising Faces across Pose and Age. In Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018.
29. Usukhbayar, B. Deepfake Videos: The Future of Entertainment. 2020. Available online: https://www.researchgate.net/publication/340862112_Deepfake_Videos_The_Future_of_Entertainment (accessed on 1 September 2025).
30. Deng, J.; Guo, J.; Yang, J.; Xue, N.; Kotsia, I.; Zafeiriou, S. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 5962–5979.
31. Huang, X.; Belongie, S. Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
32. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein Generative Adversarial Networks. In Proceedings of the 34th International Conference on Machine Learning, PMLR, Sydney, Australia, 17 July 2017; pp. 214–223.
33. Liu, B.; Zhu, Y.; Song, K.; Elgammal, A. Towards Faster and Stabilized GAN Training for High-Fidelity Few-Shot Image Synthesis. arXiv 2021, arXiv:2101.04775.
34. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
35. Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. BiSeNet: Bilateral Segmentation Network for Real-Time Semantic Segmentation. arXiv 2018, arXiv:1808.00897.
36. Estanislao, K. Hacksider/Deep-Live-Cam. Available online: https://github.com/hacksider/Deep-Live-Cam (accessed on 1 September 2025).
37. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. arXiv 2020, arXiv:2006.11239.
38. Kingma, D.P.; Welling, M. An Introduction to Variational Autoencoders. Found. Trends Mach. Learn. 2019, 12, 307–392.
39. Huang, Y.; Wang, Y.; Tai, Y.; Liu, X.; Shen, P.; Li, S.; Li, J.; Huang, F. CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition. arXiv 2020, arXiv:2004.00288.
40. Liu, Y.; Wang, C.; Lu, M.; Yang, J.; Gui, J.; Zhang, S. From Simple to Complex Scenes: Learning Robust Feature Representations for Accurate Human Parsing. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5449–5462.
[Table omitted: example face-swap outputs and corresponding difference heatmaps for SimSwap, FaceShifter, Ghost, FastFake, InsightFace, and Baseline High/Medium/Low Mask.]
[Table omitted: watermarked input and face-swap outputs for each method (SimSwap, FaceShifter, Ghost, FastFake, InsightFace, Baseline High/Medium/Low Mask) at watermark strengths of 5%, 10%, 20%, 30%, 50%, 75%, and 100%.]
[Table omitted: face-swap outputs for each method at watermark strengths of 0%, 5%, 10%, 20%, 30%, 50%, 75%, and 100%.]
[Table omitted: watermarked input and face-swap outputs for each method at watermark strengths of 5%, 10%, 20%, 30%, 50%, 75%, and 100%.]
[Table omitted: face-swap outputs for each method at watermark strengths of 0%, 5%, 10%, 20%, 30%, 50%, 75%, and 100%.]
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Walczyna, T.; Piotrowski, Z. Mutual Effects of Face-Swap Deepfakes and Digital Watermarking—A Region-Aware Study. Sensors 2025, 25, 6015. https://doi.org/10.3390/s25196015