TACO: Adversarial Camouflage Optimization on Trucks to Fool Object Detectors
Abstract
1. Introduction
- We are the first to utilize UE5 for generating adversarial patterns within a fully differentiable rendering pipeline. This advancement builds upon prior methods that employed Unreal Engine 4 (UE4), offering improved graphical fidelity and rendering capabilities. Using UE5 we reduce the domain gap between the simulated environment and real-world deployment, ensuring adversarial patterns remain effective in the real world.
- We introduce an additional neural rendering input, a gray-textured truck image, to accurately capture and reproduce lighting and shadow conditions.
- We are the first to design adversarial patterns specifically for YOLOv8 in the context of vehicle detection, moving beyond previous work that focused on older models like YOLOv3 [9].
- We introduce Intersection over Prediction-based (IoP-based) filtering as part of the class loss formulation, enhancing the stealthiness of adversarial optimization by considering bounding boxes that significantly overlap with the target object. This method reduces false detections and improves the overall effectiveness of adversarial patterns.
- We propose the Convolutional Smooth Loss function, a novel smooth loss function for ensuring that the adversarial textures are not only effective but also visually plausible.
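To make the IoP criterion concrete, here is a minimal sketch. It assumes IoP is defined as the intersection area divided by the prediction box's own area; `iop`, `filter_boxes`, and the `(x1, y1, x2, y2)` box format are illustrative helpers, not the paper's exact implementation:

```python
def iop(pred_box, target_box):
    """Intersection over Prediction: intersection area / predicted-box area.

    Boxes are (x1, y1, x2, y2). Unlike IoU, the denominator is the
    prediction's area alone, so predictions lying mostly inside the
    target score high even when the target is much larger.
    """
    x1 = max(pred_box[0], target_box[0])
    y1 = max(pred_box[1], target_box[1])
    x2 = min(pred_box[2], target_box[2])
    y2 = min(pred_box[3], target_box[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    pred_area = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    return inter / pred_area if pred_area > 0 else 0.0


def filter_boxes(pred_boxes, target_box, tau=0.5):
    """Keep indices of predictions whose IoP with the target exceeds tau."""
    return [i for i, b in enumerate(pred_boxes) if iop(b, target_box) > tau]
```

A box fully inside the truck region gets IoP 1.0 regardless of the truck's size, which is exactly the overlap behavior the filtering step relies on.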
2. Related Works
3. Materials and Methods
3.1. Problem Statement
- Reference Image (): A photorealistic image of the truck in the scene, rendered with UE5’s advanced lighting and shading techniques. The texture for this image is randomly sampled from a High-Resolution Texture Dataset defined in Section 4.2.
- Gray Textured Truck Image (): A version of the truck rendered in a neutral gray texture (RGB: 127, 127, 127).
- Depth Map (): A depth map that provides the distance from the camera to each pixel on the truck surface.
- Binary Mask (): A mask identifying the pixels corresponding to the truck in each image. It is generated with a custom UE5 material that renders the truck in solid black, enabling accurate segmentation.
- Camera Parameters (): Parameters that define the camera’s position and orientation in the scene, used as input for differentiable rendering.
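These inputs feed the rendering pipeline described next. As an illustration of how the binary mask is typically used, here is a minimal mask-compositing sketch; the PRN itself is a learned network, so this shows only the naive baseline operation:

```python
import numpy as np


def composite(truck_rgb, background_rgb, mask):
    """Paste the rendered truck into the scene using the binary mask.

    truck_rgb, background_rgb: HxWx3 float arrays in [0, 1].
    mask: HxW array, 1 where the truck is visible, 0 elsewhere.
    """
    m = mask[..., None].astype(truck_rgb.dtype)  # broadcast over channels
    return m * truck_rgb + (1.0 - m) * background_rgb
```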
3.2. Truck Adversarial Camouflage Optimization
3.2.1. Neural Renderer: Overview of the Rendering Process
3.2.2. Photorealistic Rendering Network (PRN)
3.2.3. Attack Loss
- denotes the set of indices for bounding boxes with an IoP greater than the threshold ;
- is the confidence score for class c in bounding box i;
- C is the number of classes (80 in the case of YOLOv8).
- denotes the set of predicted bounding boxes with relatively large IoU values that we want to minimize.
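Putting the terms above together, a hedged sketch of the IoP-filtered class loss follows; the exact reduction used in the paper may differ, and the target-class index is an assumption:

```python
import numpy as np


def class_loss(class_scores, keep_indices, target_class):
    """Sum target-class confidences over boxes that passed IoP filtering.

    class_scores: (N, C) array of per-box class confidence scores.
    keep_indices: indices of boxes whose IoP exceeds the threshold.
    target_class: index of the class to suppress (e.g. COCO 'truck').
    Minimizing this drives the detector's truck confidence toward zero
    on exactly the boxes that overlap the target object.
    """
    if len(keep_indices) == 0:
        return 0.0
    return float(np.sum(class_scores[keep_indices, target_class]))
```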
3.2.4. Convolutional Smooth Loss
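As a hedged sketch of a convolution-based smoothness penalty, one plausible form compares the texture against a box-filtered copy of itself; the kernel size and squared-error form here are assumptions, not the paper's exact definition:

```python
import numpy as np


def conv_smooth_loss(texture, k=3):
    """Penalize deviation of each pixel from its local k x k mean.

    texture: HxWx3 float array in [0, 1]. A box-filter mean is computed
    per channel; the loss is the mean squared difference between the
    texture and its smoothed version, so noisy high-frequency patterns
    cost more than large, visually plausible patches of color.
    """
    pad = k // 2
    padded = np.pad(texture, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w, _ = texture.shape
    smoothed = np.zeros_like(texture)
    for dy in range(k):          # accumulate the k*k shifted windows
        for dx in range(k):
            smoothed += padded[dy:dy + h, dx:dx + w]
    smoothed /= k * k
    return float(np.mean((texture - smoothed) ** 2))
```

A uniform texture incurs zero loss, while random noise is penalized heavily, matching the stated goal of visually plausible adversarial textures.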
3.2.5. Projected Gradient Descent with Adam
Standard Clipping and Its Limitations
PGD with Adam
- Compute Raw Updates:
- Update step:
- Texture projection step:
Gradient Projection
- Compute Raw Updates:
- Gradient projection step:
- Update step:
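The two schemes above can be sketched side by side. This is a simplified NumPy illustration of a PGD-with-Adam step and the gradient-projection variant; the learning rate, moment coefficients, and the exact projection rule are assumed defaults, not the paper's reported hyperparameters:

```python
import numpy as np


def adam_pgd_step(theta, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One PGD-with-Adam step: Adam update, then project the texture
    back onto the valid range [0, 1]."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    theta = np.clip(theta, 0.0, 1.0)   # texture projection step
    return theta, m, v


def grad_projection_step(theta, grad, lr=0.01):
    """Gradient-projection variant: zero gradient components whose
    update would leave [0, 1], so that optimizer statistics are not
    polluted by updates the projection would discard anyway."""
    blocked = ((theta <= 0.0) & (grad > 0)) | ((theta >= 1.0) & (grad < 0))
    grad = np.where(blocked, 0.0, grad)
    return np.clip(theta - lr * grad, 0.0, 1.0), grad
```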
4. Experimental Setup
4.1. Truck Model
- Body Parts: This segment covers the main body of the truck, such as the body shell (carrosserie) and tarp, which are typically painted for camouflage. Although these parts comprise only 1282 triangular faces (5% of the total), they cover approximately 80% of the truck’s visible surface area, because they consist of large, relatively simple surfaces compared to auxiliary parts such as the wheels.
- Auxiliary Parts: The remaining components of the truck—such as the wheels, bumper, exhaust stacks, and other parts that are not typically painted or are impractical to paint—fall under this category. These parts contain the remaining 23,502 triangular faces (95% of the faces), but they account for only 20% of the truck’s visible surface area due to their smaller individual sizes and more intricate geometry.
4.2. Dataset
- Core Truck Dataset: A large collection of rendered truck images under diverse positions, camera parameters, and textures.
- High-Resolution Texture Dataset: A complementary set of 4000 high-resolution texture images.
4.2.1. Core Truck Dataset
- Camera viewpoints and distances: For each image, the camera was placed between 5 m and 35 m from the truck, randomly oriented with azimuth angles between 0° and 360° and elevation angles from 5° to 90°.
- Truck textures: A random selection from our High-Resolution Texture Dataset (see below).
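The viewpoint sampling can be sketched as follows. `sample_camera` is a hypothetical helper; it assumes the azimuth spans the full circle while the elevation stays between 5° and 90° so the camera never dips underground, with the truck at the origin:

```python
import math
import random


def sample_camera(d_min=5.0, d_max=35.0):
    """Sample a camera position on a spherical shell around the truck.

    Distance, azimuth, and elevation are drawn uniformly from the
    ranges described in the dataset section. Returns an (x, y, z)
    position with the truck at the origin.
    """
    d = random.uniform(d_min, d_max)
    azimuth = math.radians(random.uniform(0.0, 360.0))
    elevation = math.radians(random.uniform(5.0, 90.0))
    x = d * math.cos(elevation) * math.cos(azimuth)
    y = d * math.cos(elevation) * math.sin(azimuth)
    z = d * math.sin(elevation)
    return x, y, z
```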
4.2.2. High-Resolution Texture Dataset
- Describable Textures Dataset [30]: A total of 1500 images were carefully selected from this dataset to provide a variety of texture patterns.
- Van Gogh Paintings [31]: A total of 300 texture images were sourced from Van Gogh’s paintings, chosen for their distinct color patterns.
- Random Uniform Color Images: A total of 200 images consisting of uniform color values were generated.
- Random Noise Images: A total of 2000 noise textures were randomly generated to simulate non-structured patterns visually similar to adversarial patterns.
4.3. Implementation Details
4.3.1. PRN Training
4.3.2. Adversarial Texture Generation
- We used 1000 images per truck location (instead of the full 2000) to reduce training time. This subset thus contains 25,000 images in total.
- We performed a location-based split into 18 locations for training (18,000 images) and 7 locations for testing (7000 images).
5. Results
5.1. Evaluation Metrics and Models
- Average Precision at IoU threshold 0.5 (AP@0.5): This metric evaluates the precision of the object detector when the Intersection over Union (IoU) between the predicted bounding box and the ground truth exceeds 50%.
- Attack Detection Rate (ADR): The proportion of images in which the object detector successfully identifies the truck.
- YOLOv8X [8]: Our target model for the adversarial attack.
- YOLOv3u [9]: A previous generation of the YOLO family with an upgraded detection head.
- YOLOv5Xu: An intermediate version of the YOLO models with upgraded detection heads.
- Faster R-CNN v2 (FRCNN) [32]: An improved version of the two-stage object detection model.
- Fully Convolutional One-Stage Object Detection (FCOS) [33]: An anchor-free object detection framework.
- Detection Transformer (DETR) [34]: A transformer-based object detection model.
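The ADR metric reduces to a simple fraction over the evaluation set. A sketch, where `detected_flags` is a hypothetical per-image list of whether the detector found the truck (e.g. any predicted box with the correct class and IoU at least 0.5 against the ground truth):

```python
def attack_detection_rate(detected_flags):
    """ADR: fraction of images in which the detector still finds the truck.

    detected_flags: one bool per evaluated image, True if the truck was
    detected. From the attacker's perspective, lower ADR means a
    stronger camouflage.
    """
    if not detected_flags:
        return 0.0
    return sum(detected_flags) / len(detected_flags)
```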
5.2. Performance Comparison of Different Textures
- Base: The original single-color texture of the truck without any adversarial modifications.
- Naive: A simple camouflage texture common on military trucks.
- Random: A texture initialized with random pixel values.
- DTA: The Differentiable Transformation Attack (DTA), an existing adversarial camouflage method based on a differentiable transformation network, reimplemented for comparison [14].
5.3. Impact of Different Loss Functions
5.4. Texture Initialization Study
- Zeros: Initializing the texture with all zeros (black texture).
- Ones: Initializing the texture with all ones (white texture).
- Random: Initializing with random values.
- Base: Starting from the truck’s original texture.
5.5. Smoothness Loss Coefficient Analysis
5.6. Class Activation Mapping Analysis
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Terven, J.; Córdova-Esparza, D.M.; Romero-González, J.A. A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
- Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015; Conference Track Proceedings; Bengio, Y., LeCun, Y., Eds.; ICLR: Appleton, WI, USA, 2015. [Google Scholar]
- Hendrik Metzen, J.; Chaithanya Kumar, M.; Brox, T.; Fischer, V. Universal adversarial perturbations against semantic image segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2755–2764. [Google Scholar]
- Lin, Y.C.; Hong, Z.W.; Liao, Y.H.; Shih, M.L.; Liu, M.Y.; Sun, M. Tactics of adversarial attack on deep reinforcement learning agents. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 3756–3762. [Google Scholar]
- Shayegani, E.; Mamun, M.A.A.; Fu, Y.; Zaree, P.; Dong, Y.; Abu-Ghazaleh, N. Survey of vulnerabilities in large language models revealed by adversarial attacks. arXiv 2023, arXiv:2310.10844. [Google Scholar]
- Zhong, Y.; Liu, X.; Zhai, D.; Jiang, J.; Ji, X. Shadows can be dangerous: Stealthy and effective physical-world adversarial attack by natural phenomenon. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 15345–15354. [Google Scholar]
- Lian, J.; Mei, S.; Wang, X.; Wang, Y.; Wang, L.; Lu, Y.; Ma, M.; Chau, L.P. Attack Anything: Blind DNNs via Universal Background Adversarial Attack. arXiv 2024, arXiv:2409.00029. [Google Scholar]
- Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8. Available online: https://github.com/ultralytics/ultralytics (accessed on 12 October 2024).
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Liu, X.; Yang, H.; Liu, Z.; Song, L.; Li, H.; Chen, Y. Dpatch: An adversarial patch attack on object detectors. arXiv 2018, arXiv:1806.02299. [Google Scholar]
- Thys, S.; Van Ranst, W.; Goedemé, T. Fooling automated surveillance cameras: Adversarial patches to attack person detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019. [Google Scholar]
- Hoory, S.; Shapira, T.; Shabtai, A.; Elovici, Y. Dynamic adversarial patch for evading object detection models. arXiv 2020, arXiv:2010.13070. [Google Scholar]
- Zhang, Y.; Foroosh, P.H.; Gong, B. Camou: Learning a vehicle camouflage for physical adversarial attack on object detections in the wild. In Proceedings of the ICLR, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Suryanto, N.; Kim, Y.; Kang, H.; Larasati, H.T.; Yun, Y.; Le, T.T.H.; Yang, H.; Oh, S.Y.; Kim, H. Dta: Physical camouflage attacks using differentiable transformation network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 15305–15314. [Google Scholar]
- Suryanto, N.; Kim, Y.; Larasati, H.T.; Kang, H.; Le, T.T.H.; Hong, Y.; Yang, H.; Oh, S.Y.; Kim, H. Active: Towards highly transferable 3d physical camouflage for universal and robust vehicle evasion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 4305–4314. [Google Scholar]
- Kato, H.; Ushiku, Y.; Harada, T. Neural 3d mesh renderer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3907–3916. [Google Scholar]
- Wang, J.; Liu, A.; Yin, Z.; Liu, S.; Tang, S.; Liu, X. Dual attention suppression attack: Generate adversarial camouflage in physical world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8565–8574. [Google Scholar]
- Wang, D.; Jiang, T.; Sun, J.; Zhou, W.; Gong, Z.; Zhang, X.; Yao, W.; Chen, X. FCA: Learning a 3d full-coverage vehicle camouflage for multi-view physical adversarial attack. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 22 February–1 March 2022; Volume 36, pp. 2414–2422. [Google Scholar]
- Zhou, J.; Lyu, L.; He, D.; Li, Y. RAUCA: A Novel Physical Adversarial Attack on Vehicle Detectors via Robust and Accurate Camouflage Generation. In Proceedings of the Forty-First International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024. [Google Scholar]
- Duan, Y.; Chen, J.; Zhou, X.; Zou, J.; He, Z.; Zhang, J.; Zhang, W.; Pan, Z. Learning Coated Adversarial Camouflages for Object Detectors. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, Vienna, Austria, 23–29 July 2022; IJCAI-2022. pp. 891–897. [Google Scholar] [CrossRef]
- Li, Y.; Tan, W.; Zhao, C.; Zhou, S.; Liang, X.; Pan, Q. Flexible Physical Camouflage Generation Based on a Differential Approach. arXiv 2024, arXiv:2402.13575. [Google Scholar]
- Lyu, L.; Zhou, J.; He, D.; Li, Y. CNCA: Toward Customizable and Natural Generation of Adversarial Camouflage for Vehicle Detectors. arXiv 2024, arXiv:2409.17963. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
- Ultralytics. YOLOv3 — docs.ultralytics.com. Available online: https://docs.ultralytics.com/models/yolov3/ (accessed on 25 October 2024).
- Rudin, L.I.; Osher, S.; Fatemi, E. Nonlinear total variation based noise removal algorithms. Phys. D Nonlinear Phenom. 1992, 60, 259–268. [Google Scholar] [CrossRef]
- Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards Deep Learning Models Resistant to Adversarial Attacks. arXiv 2019, arXiv:1706.06083. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:1412.6980. [Google Scholar]
- Cimpoi, M.; Maji, S.; Kokkinos, I.; Mohamed, S.; Vedaldi, A. Describing Textures in the Wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014. [Google Scholar]
- Kim, A. Van Gogh Paintings Dataset. Mendeley Data. 2022. Available online: https://data.mendeley.com/datasets/3sjjtjfhx7/2 (accessed on 12 October 2024).
- Li, Y.; Xie, S.; Chen, X.; Dollar, P.; He, K.; Girshick, R. Benchmarking detection transfer learning with vision transformers. arXiv 2021, arXiv:2111.11429. [Google Scholar]
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar] [CrossRef]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229. [Google Scholar]
- Lin, T.; Maire, M.; Belongie, S.J.; Bourdev, L.D.; Girshick, R.B.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. arXiv 2014, arXiv:1405.0312. [Google Scholar]
- Desai, S.; Ramaswamy, H.G. Ablation-cam: Visual explanations for deep convolutional network via gradient-free localization. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Seattle, WA, USA, 13–19 June 2020; pp. 983–991. [Google Scholar]
| Configuration | L1 Loss | SSIM |
|---|---|---|
| Without | 0.031 | 0.9862 |
| With | 0.025 | 0.9901 |
AP@0.5 by detector:

| Method | YOLOv8 | YOLOv3 | YOLOv5 | FRCNN | FCOS | DETR |
|---|---|---|---|---|---|---|
| Base | 0.7295 | 0.7216 | 0.6132 | 0.8377 | 0.6361 | 0.6441 |
| Naive | 0.8057 | 0.7305 | 0.6518 | 0.7770 | 0.6317 | 0.6619 |
| Random | 0.6705 | 0.7214 | 0.5537 | 0.8202 | 0.5948 | 0.6470 |
| DTA (optimized) | 0.2865 | 0.3663 | 0.3068 | 0.4532 | 0.3633 | 0.4121 |
| TACO | 0.0099 | 0.0491 | 0.1381 | 0.3157 | 0.2410 | 0.2600 |
ADR by detector:

| Method | YOLOv8 | YOLOv3 | YOLOv5 | FRCNN | FCOS | DETR |
|---|---|---|---|---|---|---|
| Base | 0.7453 | 0.7258 | 0.6195 | 0.8689 | 0.7475 | 0.6678 |
| Naive | 0.8241 | 0.7376 | 0.6561 | 0.8234 | 0.7719 | 0.7045 |
| Random | 0.6814 | 0.7313 | 0.5614 | 0.8440 | 0.6959 | 0.6586 |
| DTA (optimized) | 0.2906 | 0.3646 | 0.3011 | 0.4560 | 0.3851 | 0.4130 |
| TACO | 0.0097 | 0.0448 | 0.1354 | 0.3558 | 0.5511 | 0.3247 |
AP@0.5 by detector:

| Loss configuration | YOLOv8 | YOLOv3 | YOLOv5 | FRCNN | FCOS | DETR |
|---|---|---|---|---|---|---|
| | 0.0283 | 0.1283 | 0.1976 | 0.4884 | 0.4400 | 0.4280 |
| | 0.0189 | 0.0690 | 0.1677 | 0.3786 | 0.2586 | 0.3105 |
| | 0.0373 | 0.1283 | 0.2371 | 0.4979 | 0.4892 | 0.4638 |
| | 0.0099 | 0.0491 | 0.1381 | 0.3157 | 0.2410 | 0.2600 |
ADR by detector:

| Loss configuration | YOLOv8 | YOLOv3 | YOLOv5 | FRCNN | FCOS | DETR |
|---|---|---|---|---|---|---|
| | 0.0317 | 0.1262 | 0.1953 | 0.5067 | 0.5542 | 0.4451 |
| | 0.0189 | 0.0690 | 0.1677 | 0.3786 | 0.2586 | 0.3105 |
| | 0.0361 | 0.1298 | 0.2317 | 0.5097 | 0.6220 | 0.4867 |
| | 0.0097 | 0.0448 | 0.1354 | 0.3558 | 0.5511 | 0.3247 |
Share and Cite
Dimitriu, A.; Michaletzky, T.V.; Remeli, V. TACO: Adversarial Camouflage Optimization on Trucks to Fool Object Detectors. Big Data Cogn. Comput. 2025, 9, 72. https://doi.org/10.3390/bdcc9030072