Clean-Label Backdoor Watermarking for Dataset Copyright Protection via Trigger Optimization
Abstract
1. Introduction
- We propose a dataset copyright protection method based on clean-label backdoor watermarking and trigger optimization, which can be used to verify whether a suspect model has used our dataset during training.
- We construct a surrogate model and optimize the trigger through iterative, gradient-guided updates on target-class samples. The optimized trigger therefore captures robust feature representations of the target class, making it harmless and more effective.
- We utilize a clean-label watermark embedding setting to ensure that the semantic information of the trigger samples aligns with their labels, making the watermark information imperceptible.
- Extensive experiments have demonstrated the effectiveness, high imperceptibility, and strong robustness of the proposed method. It can maintain effectiveness at a considerably lower watermarking rate compared with existing methods.
2. Preliminaries and Objectives
2.1. The Mechanism of Backdoor-Based Dataset Watermarking
2.2. Design Goals
- Effectiveness: The verification algorithm should accurately detect whether the target model has used the protected dataset.
- Imperceptibility: Watermarking should not be noticeable to those who might steal the dataset.
- Harmlessness: The watermarking algorithm should not affect the performance of the model regarding primary tasks.
- Robustness: Watermarks should be resistant to potential attacks, such as fine tuning and model pruning.
2.2.1. Effectiveness
2.2.2. Imperceptibility
- 1. Visual Quality: This factor measures the visual difference between watermarked samples and their original counterparts; the smaller the difference, the less likely an inspector is to notice that samples have been modified.
- 2. Label Alignment: This factor indicates whether the labels of watermarked samples are consistent with their ground truth. If they are not, it is easy to detect manually that the samples have been mislabeled.
- 3. Watermarking Rate: This factor indicates the proportion of watermarked samples relative to the total number of samples in the dataset. The lower the watermarking rate, the harder it is for adversaries to detect the watermark. A minimal sketch of how the first and third factors can be quantified follows this list.
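To make these factors concrete, below is a minimal sketch (ours, not taken from the paper) of how visual quality and the watermarking rate might be quantified in PyTorch; the tensor names `clean_imgs` and `wm_imgs` and the helper functions are illustrative assumptions.

```python
import torch

def linf_distortion(clean_imgs: torch.Tensor, wm_imgs: torch.Tensor) -> float:
    # Largest absolute per-pixel change introduced by the watermark;
    # a small value means the trigger is hard to spot visually.
    return (wm_imgs - clean_imgs).abs().max().item()

def watermarking_rate(num_watermarked: int, num_total: int) -> float:
    # Fraction of the released dataset that carries the trigger.
    return num_watermarked / num_total

# Example: 500 watermarked samples in a 50,000-sample dataset -> 1% watermarking rate.
print(watermarking_rate(500, 50_000))  # 0.01
```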
2.2.3. Harmlessness
2.2.4. Robustness
3. Proposed Methods
3.1. Trigger Optimization
Algorithm 1: Trigger optimization algorithm.
- 1. Trigger Initialization: Initialize a trigger δ, typically as a zero vector or small random noise.
- 2. Iterative Update: Freeze the parameters of the surrogate model and feed in the target-class samples. Perform several rounds of iterations, updating δ in each round based on the target-class data. In each iteration, the gradient of the loss function with respect to the current trigger δ is computed on every batch.
- 3. Gradient Descent: After averaging these gradients, update the trigger by gradient descent, δ ← δ − η·g, where η is the learning rate and g is the averaged gradient.
- 4. Trigger Optimization Constraint: To ensure the imperceptibility of the trigger, δ is projected onto the allowed set S after each update. S is defined via the ℓ∞-norm, i.e., S = {δ : ‖δ‖∞ ≤ ε}. This projection can be performed by simply clipping each dimension of δ to [−ε, ε].
- 5. Optimization Complete: After the iterations finish, output the optimized trigger δ*. A minimal PyTorch sketch of this loop is given after the list.
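The following sketch illustrates the loop described above under several assumptions of ours: an additive trigger, a frozen surrogate classifier `surrogate`, a DataLoader `target_loader` over target-class samples, CIFAR-sized inputs, and a cross-entropy loss toward the target label. The names and hyperparameter values are illustrative rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def optimize_trigger(surrogate, target_loader, target_label,
                     eps=8 / 255, lr=0.01, rounds=100, device="cuda"):
    # Step 1: zero-initialized additive trigger (CIFAR-sized images assumed).
    delta = torch.zeros(1, 3, 32, 32, device=device)

    # Freeze the surrogate so only the trigger is updated.
    surrogate.eval()
    for p in surrogate.parameters():
        p.requires_grad_(False)

    for _ in range(rounds):                      # Step 2: iterative update
        grad_sum = torch.zeros_like(delta)
        num_batches = 0
        for x, _ in target_loader:               # target-class samples only
            x = x.to(device)
            d = delta.clone().requires_grad_(True)
            logits = surrogate((x + d).clamp(0, 1))
            y = torch.full((x.size(0),), target_label,
                           dtype=torch.long, device=device)
            F.cross_entropy(logits, y).backward()
            grad_sum += d.grad
            num_batches += 1

        # Step 3: gradient descent with the averaged gradient.
        delta = delta - lr * grad_sum / num_batches
        # Step 4: project onto the l_inf ball by clipping each dimension.
        delta = delta.clamp(-eps, eps)

    # Step 5: return the optimized trigger.
    return delta.detach()
```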
3.2. Dataset Watermark Embedding
3.3. Adversary Model Manipulating
3.4. Watermark Verification
Algorithm 2: Watermark verification algorithm.
Input: the suspicious model, the trigger embedding function, a set of randomly selected samples, and the target label t.
Output: the verification result (the p-value of the hypothesis test).
/* 1. Generate the watermark verification set */
/* 2. Record the prediction results of the suspicious model on the watermark verification set */
/* 3. Perform the Wilcoxon signed-rank test */
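Below is a minimal sketch of this verification procedure, assuming an additive trigger and using SciPy's Wilcoxon signed-rank test on the suspect model's paired target-class confidences for benign versus triggered samples; the function and variable names are illustrative, not the authors' code.

```python
import torch
import torch.nn.functional as F
from scipy.stats import wilcoxon

@torch.no_grad()
def verify_watermark(suspect_model, samples, trigger, target_label,
                     alpha=0.05, device="cuda"):
    suspect_model.eval()
    x = samples.to(device)

    # 1. Build the verification set by embedding the (additive) trigger.
    x_wm = (x + trigger.to(device)).clamp(0, 1)

    # 2. Record the suspect model's confidence on the target class
    #    for benign and triggered versions of the same samples.
    p_clean = F.softmax(suspect_model(x), dim=1)[:, target_label]
    p_wm = F.softmax(suspect_model(x_wm), dim=1)[:, target_label]

    # 3. One-sided Wilcoxon signed-rank test: triggered confidence should be
    #    significantly higher only if the model was trained on our dataset.
    _, p_value = wilcoxon(p_wm.cpu().numpy(), p_clean.cpu().numpy(),
                          alternative="greater")
    return p_value, p_value < alpha
```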
4. Experiments and Analysis
4.1. General Set-Ups
4.1.1. Datasets
4.1.2. Models and Parameters
4.1.3. Hardware Platform and Execution Time
4.2. Evaluation Metrics
- 1. WSR (Watermark Success Rate): WSR refers to the proportion of samples in the watermark verification set that are predicted as the target class by the suspect model. It is widely used to evaluate the effectiveness of watermarking in protecting intellectual property. A higher WSR indicates a more robust watermark, enabling more accurate recognition and verification of the watermark information.
- 2. MA (Model Accuracy): MA refers to the model's prediction accuracy on the original task. By evaluating MA, we can assess the impact of watermarking on the model's performance. Considering the scenario where authorized users train models on the watermarked dataset, dataset watermarking should be harmless to the model. A higher MA indicates that the watermark has a lower impact on the overall performance of the model.
- 3. TA (Target Accuracy): TA refers to the model's prediction accuracy on clean samples from the target class. Since clean-label dataset-watermarking methods embed watermark triggers only in samples of the target class, the accuracy of the target class may be affected more strongly by the watermark than that of other classes. A minimal evaluation sketch for these three metrics follows this list.
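As referenced above, here is a minimal PyTorch evaluation sketch that computes MA, TA, and WSR from a clean test loader and a watermark verification loader; the loader and variable names are illustrative assumptions.

```python
import torch

@torch.no_grad()
def evaluate(model, clean_loader, wm_loader, target_label, device="cuda"):
    model.eval()
    correct = total = 0          # for MA
    t_correct = t_total = 0      # for TA (clean target-class samples)
    wm_hit = wm_total = 0        # for WSR (triggered verification samples)

    for x, y in clean_loader:
        x, y = x.to(device), y.to(device)
        pred = model(x).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
        mask = y == target_label
        t_correct += (pred[mask] == y[mask]).sum().item()
        t_total += mask.sum().item()

    for x, _ in wm_loader:
        pred = model(x.to(device)).argmax(dim=1)
        wm_hit += (pred == target_label).sum().item()
        wm_total += pred.numel()

    ma = 100.0 * correct / total
    ta = 100.0 * t_correct / max(t_total, 1)
    wsr = 100.0 * wm_hit / max(wm_total, 1)
    return ma, ta, wsr
```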
4.3. Performance Analysis
4.3.1. Comparison on Effectiveness
4.3.2. Comparison on Imperceptibility
4.3.3. Comparison on Watermarking Rate
4.3.4. Comparison on Robustness
4.4. Ablation Studies
4.4.1. Trigger Budgets
4.4.2. Misalignment of Surrogate and Target Models
4.5. Analysis of Limitations
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Wang, J.; Ma, Y.; Zhang, L.; Gao, R.X.; Wu, D. Deep learning for smart manufacturing: Methods and applications. J. Manuf. Syst. 2018, 48, 144–156. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations, Virtual, 3–7 May 2021. [Google Scholar]
- Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 10–15 June 2019; pp. 6105–6114. [Google Scholar]
- Janiesch, C.; Zschech, P.; Heinrich, K. Machine learning and deep learning. Electron. Mark. 2021, 31, 685–695. [Google Scholar] [CrossRef]
- Liang, W.; Tadesse, G.A.; Ho, D.; Fei-Fei, L.; Zaharia, M.; Zhang, C.; Zou, J. Advances, challenges and opportunities in creating data for trustworthy AI. Nat. Mach. Intell. 2022, 4, 669–677. [Google Scholar] [CrossRef]
- Roh, Y.; Heo, G.; Whang, S.E. A survey on data collection for machine learning: A big data-ai integration perspective. IEEE Trans. Knowl. Data Eng. 2019, 33, 1328–1347. [Google Scholar] [CrossRef]
- Abdollahi, A.; Pradhan, B.; Shukla, N.; Chakraborty, S.; Alamri, A. Deep learning approaches applied to remote sensing datasets for road extraction: A state-of-the-art review. Remote Sens. 2020, 12, 1444. [Google Scholar] [CrossRef]
- Maini, P.; Yaghini, M.; Papernot, N. Dataset inference: Ownership resolution in machine learning. arXiv 2021, arXiv:2104.10706. [Google Scholar]
- Ali, A.; Pinciroli, R.; Yan, F.; Smirni, E. Batch: Machine learning inference serving on serverless platforms with adaptive batching. In Proceedings of the SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, Virtual, 9–19 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–15. [Google Scholar]
- Wu, C.J.; Brooks, D.; Chen, K.; Chen, D.; Choudhury, S.; Dukhan, M.; Hazelwood, K.; Isaac, E.; Jia, Y.; Jia, B.; et al. Machine learning at facebook: Understanding inference at the edge. In Proceedings of the 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), Washington, DC, USA, 16–20 February 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 331–344. [Google Scholar]
- Taheri, R.; Shojafar, M.; Arabikhan, F.; Gegov, A. Unveiling vulnerabilities in deep learning-based malware detection: Differential privacy driven adversarial attacks. Comput. Secur. 2024, 146, 104035. [Google Scholar] [CrossRef]
- Begum, M.; Uddin, M.S. Digital image watermarking techniques: A review. Information 2020, 11, 110. [Google Scholar] [CrossRef]
- Deng, H.; Qin, Z.; Wu, Q.; Guan, Z.; Deng, R.H.; Wang, Y.; Zhou, Y. Identity-based encryption transformation for flexible sharing of encrypted data in public cloud. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3168–3180. [Google Scholar] [CrossRef]
- Li, Y.; Zhu, M.; Yang, X.; Jiang, Y.; Wei, T.; Xia, S.T. Black-box dataset ownership verification via backdoor watermarking. IEEE Trans. Inf. Forensics Secur. 2023, 18, 2318–2332. [Google Scholar] [CrossRef]
- Chen, X.; Liu, C.; Li, B.; Lu, K.; Song, D. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv 2017, arXiv:1712.05526. [Google Scholar]
- Gu, T.; Liu, K.; Dolan-Gavitt, B.; Garg, S. Badnets: Evaluating backdooring attacks on deep neural networks. IEEE Access 2019, 7, 47230–47244. [Google Scholar] [CrossRef]
- Bagdasaryan, E.; Veit, A.; Hua, Y.; Estrin, D.; Shmatikov, V. How to backdoor federated learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Virtual, 26–28 August 2020; pp. 2938–2948. [Google Scholar]
- Liu, Y.; Ma, X.; Bailey, J.; Lu, F. Reflection backdoor: A natural backdoor attack on deep neural networks. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 182–199. [Google Scholar]
- Li, S.; Xue, M.; Zhao, B.Z.H.; Zhu, H.; Zhang, X. Invisible backdoor attacks on deep neural networks via steganography and regularization. IEEE Trans. Dependable Secur. Comput. 2020, 18, 2088–2105. [Google Scholar] [CrossRef]
- Li, Y.; Zhang, Z.; Bai, J.; Wu, B.; Jiang, Y.; Xia, S.T. Open-sourced dataset protection via backdoor watermarking. arXiv 2020, arXiv:2010.05821. [Google Scholar]
- Zeng, Y.; Pan, M.; Just, H.A.; Lyu, L.; Qiu, M.; Jia, R. Narcissus: A practical clean-label backdoor attack with limited information. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, Copenhagen, Denmark, 26–30 November 2023; pp. 771–785. [Google Scholar]
- Tang, R.; Feng, Q.; Liu, N.; Yang, F.; Hu, X. Did you train on my dataset? Towards public dataset protection with clean-label backdoor watermarking. ACM SIGKDD Explor. Newsl. 2023, 25, 43–53. [Google Scholar] [CrossRef]
- Turner, A.; Tsipras, D.; Madry, A. Label-consistent backdoor attacks. arXiv 2019, arXiv:1912.02771. [Google Scholar]
- Souri, H.; Fowl, L.; Chellappa, R.; Goldblum, M.; Goldstein, T. Sleeper agent: Scalable hidden trigger backdoors for neural networks trained from scratch. Adv. Neural Inf. Process. Syst. 2022, 35, 19165–19178. [Google Scholar]
- Taheri, R.; Javidan, R.; Shojafar, M.; Pooranian, Z.; Miri, A.; Conti, M. On defending against label flipping attacks on malware detection systems. Neural Comput. Appl. 2020, 32, 14781–14800. [Google Scholar] [CrossRef]
- Li, Y.; Bai, Y.; Jiang, Y.; Yang, Y.; Xia, S.T.; Li, B. Untargeted backdoor watermark: Towards harmless and stealthy dataset copyright protection. Adv. Neural Inf. Process. Syst. 2022, 35, 13238–13250. [Google Scholar]
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images. Master’s Thesis, University of Toronto, Toronto, ON, Canada, 2009. [Google Scholar]
- Le, Y.; Yang, X. Tiny imagenet visual recognition challenge. CS 231N 2015, 7, 3. [Google Scholar]
- Kumar, N.; Berg, A.C.; Belhumeur, P.N.; Nayar, S.K. Attribute and simile classifiers for face verification. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 365–372. [Google Scholar]
- Liu, Z.; Luo, P.; Wang, X.; Tang, X. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, ICCV, Santiago, Chile, 7–13 December 2015; pp. 3730–3738. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Saha, A.; Subramanya, A.; Pirsiavash, H. Hidden trigger backdoor attacks. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11957–11965. [Google Scholar]
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
| | CIFAR-10 (Tiny-ImageNet OOD) | Tiny-ImageNet (CIFAR-100 OOD) | PubFig (CelebA OOD) |
|---|---|---|---|
| Surrogate model training time | 5 min | 12 min | 35 min |
| Average trigger optimization time | 16.7 ms | 23.8 ms | 21.6 ms |
| Watermarking time () | 486.5 ms | 1437.6 ms | 149.6 ms |
| Watermarking time () | 967.8 ms | 2988.7 ms | 306.1 ms |
| Dataset | Metric | Benign | BadNets [16] | Blend [15] | HTBA [32] | LCBA [23] | UBW-P [26] | UBW-C [26] | Ours |
|---|---|---|---|---|---|---|---|---|---|
| CIFAR-10 | p-value | 1.00 | | | 1.00 | 1.00 | | | |
| | WSR% | | 88.76 | 77.79 | 5.28 | 4.09 | 89.98 | 84.32 | 99.03 |
| | MA% | 95.62 | 95.26 | 94.66 | 95.13 | 95.57 | 90.59 | 87.21 | 95.20 |
| | TA% | 93.62 | 93.64 | 94.08 | 93.32 | 93.38 | 89.21 | 85.53 | 94.10 |
| Tiny-ImageNet | p-value | | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |
| | WSR% | | 0.80 | 0.50 | 2.16 | 1.11 | 1.95 | 0.11 | 85.81 |
| | MA% | 63.22 | 65.10 | 64.59 | 64.08 | 65.08 | 63.94 | 63.31 | 64.65 |
| | TA% | 69.11 | 70.36 | 68.13 | 67.84 | 68.20 | 67.69 | 67.80 | 70.00 |
| PubFig | p-value | 1.00 | 1.00 | | 1.00 | 1.00 | 1.00 | 1.00 | |
| | WSR% | | 0.00 | 30.96 | 0.00 | 0.85 | 1.21 | 2.11 | 99.89 |
| | MA% | 93.64 | 93.47 | 93.15 | 93.85 | 93.54 | 93.04 | 93.07 | 93.28 |
| | TA% | 95.96 | 92.21 | 96.53 | 97.36 | 97.46 | 97.91 | 96.66 | 95.62 |
| | BadNets | LCBA | UBW-P | UBW-C | Ours |
|---|---|---|---|---|---|
| CIFAR-10 | 0.3833 | 0.0060 | 0.3834 | 0.0008 | 0.0067 |
| Tiny-ImageNet | 0.0239 | 0.0349 | 0.0255 | 0.0443 | 0.0215 |
| PubFig | 0.2401 | 0.2702 | 0.2396 | 0.3155 | 0.2285 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).