Article

Reflective Adversarial Attacks against Pedestrian Detection Systems for Vehicles at Night

by Yuanwan Chen 1,2, Yalun Wu 1,2, Xiaoshu Cui 1,2, Qiong Li 1,2, Jiqiang Liu 1,2 and Wenjia Niu 1,2,*
1 Beijing Key Laboratory of Security and Privacy in Intelligent Transportation, Beijing Jiaotong University, Beijing 100044, China
2 School of Cyberspace Science and Technology, Beijing Jiaotong University, Beijing 100044, China
* Author to whom correspondence should be addressed.
Symmetry 2024, 16(10), 1262; https://doi.org/10.3390/sym16101262
Submission received: 30 August 2024 / Revised: 14 September 2024 / Accepted: 23 September 2024 / Published: 25 September 2024
(This article belongs to the Special Issue Advanced Studies of Symmetry/Asymmetry in Cybersecurity)

Abstract
The advancements in deep learning have significantly enhanced the accuracy and robustness of pedestrian detection. However, recent studies reveal that adversarial attacks can exploit the vulnerabilities of deep learning models to mislead detection systems. These attacks are effective not only in digital environments but also pose significant threats to the reliability of pedestrian detection systems in the physical world. Existing adversarial attacks targeting pedestrian detection primarily focus on daytime scenarios and are easily noticeable by road observers. In this paper, we propose a novel adversarial attack method against vehicle–pedestrian detection systems at night. Our approach utilizes reflective optical materials that can effectively reflect light back to its source. We optimize the placement of these reflective patches using the particle swarm optimization (PSO) algorithm and deploy patches that blend with the color of pedestrian clothing in real-world scenarios. These patches remain inconspicuous during the day or under low-light conditions, but at night, the reflected light from vehicle headlights effectively disrupts the vehicle’s pedestrian detection systems. Considering that real-world detection models are often black-box systems, we propose a “symmetry” strategy, which involves using the behavior of an alternative model to simulate the response of the target model to adversarial patches. We generate adversarial examples using YOLOv5 and apply our attack to various types of pedestrian detection models. Experiments demonstrate that our approach is both effective and broadly applicable.

1. Introduction

Pedestrian detection is a computer vision technique used to identify the presence of pedestrians in images or video sequences and provide precise localization [1]. With the rise of autonomous driving and intelligent transportation systems, pedestrian detection technology plays a crucial role in enhancing road safety, optimizing traffic management, and advancing smart transportation systems. The development of deep learning has significantly improved the accuracy and robustness of pedestrian detection. However, numerous challenges remain in practical applications, such as complex backgrounds, the diversity of pedestrians, and varying lighting conditions. One emerging threat is adversarial attacks, which exploit the vulnerabilities of deep learning models and introduce small perturbations into the input data, thereby misleading detection systems [2]. These attacks are not only effective in digital environments but also pose significant threats in the physical world [3].
Recent research has demonstrated several successful applications of adversarial attacks to evade pedestrian detectors. For instance, Thys et al. [4] attached a patch to a piece of cardboard and placed it in front of the body to deceive surveillance cameras, thus maliciously bypassing the monitoring system. Xu et al. [5] proposed an adversarial T-shirt printed with adversarial patches, achieving physical adversarial instances on non-rigid objects. Hu et al. [6] further introduced AdvTexture, an adversarial texture that can cover clothing of any shape, enabling multi-angle attacks. Zhu et al. [7] focused on infrared detection systems and used small light bulbs to realize physical attacks on the thermal infrared pedestrian detector. Although more and more adversarial attack methods have been proven to be effective in certain situations, they still face some challenges. Firstly, these attack methods often lack stealth, making it easy for drivers, road guards, or other observers to detect them before a successful attack is carried out, thus allowing for the adoption of preventive measures. Additionally, most of the current research on using adversarial attacks to evade pedestrian detection focuses on daytime environments. However, how to achieve effective attacks in night-time conditions remains a question that requires further exploration.
To address the above challenges, in this paper, we propose an adversarial attack method against pedestrian detection systems for vehicles at night. We utilize optical reflective materials with a microprism structure as adversarial patches, which efficiently reflect vehicle headlights back to the vehicle’s camera, thereby disrupting the accuracy of the pedestrian detection model. Initially, we simulate the performance of these reflective materials under vehicle headlights in a digital environment. Following this, we employ a particle swarm optimization (PSO) algorithm [8] to determine the optimal deployment positions for these materials, focusing on locations on pedestrian clothing. In the physical world, based on the PSO algorithm’s results, we strategically deploy the attack using reflective materials that are either transparent or matched to the color of the pedestrian’s clothing. This approach ensures that the patches remain discreet in both daylight and low-light conditions, significantly enhancing the stealthiness of the attack. Moreover, recognizing that attackers in the real world often do not have knowledge of the specific object detection model used by their target vehicle, we propose a “symmetry” strategy. This approach involves using the behavior of YOLOv5 to simulate the response of the target model to adversarial patches. By optimizing adversarial examples on YOLOv5, we explore the interference these samples produce in the target model, ensuring the effectiveness of the attack.
In our study, we simulated the behavior of reflective materials under vehicle headlights in a digital environment and used these materials as adversarial patches. We targeted the YOLOv5 model and applied an improved particle swarm optimization (PSO) algorithm to determine the optimal K deployment positions for the adversarial patches, maximizing the attack’s effectiveness. To demonstrate the effectiveness of the attack across different types of pedestrian detection models, we implemented our attack strategy on single-stage models such as YOLOv3 and RetinaNet, as well as multi-stage detection methods like faster R-CNN and cascade R-CNN. Through experiments conducted in the digital domain, we validated the effectiveness of the reflective adversarial attack. Additionally, we explored the impact of the number of adversarial patches, K, on the detection results of different models and the attack success rate of the reflective adversarial attack. We further conducted experiments on pedestrian datasets with individuals wearing different colored shirts, examining the impact of clothing color on the attack’s effectiveness, and providing experimental evidence for optimizing reflective effects in practical applications. The contributions of this work can be summarized as follows:
  • We propose a novel night-time pedestrian detection adversarial attack method for autonomous vehicle systems. This approach leverages the optical properties of reflective materials to redirect light back to its source, effectively interfering with pedestrian detection systems while remaining inconspicuous during daytime and low-light conditions.
  • We utilize the vehicle’s own equipment as the source of the attack, ensuring that deployment in real-world settings does not draw attention from road observers.
  • We simulated the reflective material in a digital environment, optimizing and evaluating its performance. The effectiveness of the reflective adversarial attack was demonstrated by successfully compromising various types of pedestrian detection models.
The rest of the paper is organized as follows. Section 2 reviews related work on pedestrian detection and adversarial attacks. Section 3 defines our threat model, and Section 4 provides a detailed description of the optimization process for adversarial reflective patch attacks. Section 5 describes the experimental setup, the datasets used, and the evaluation metrics, followed by the results and analysis of the proposed attack method on various pedestrian detection models. Finally, Section 6 concludes the paper and suggests directions for future work.

2. Related Work

2.1. Pedestrian Detection

Pedestrian detection [9] is key to identifying and locating pedestrians and ensuring safety in autonomous driving. With the rapid development of deep learning technology, the accuracy and efficiency of pedestrian detection algorithms have improved significantly. Current pedestrian detection models can be divided into single-stage and two-stage methods. Single-stage models such as YOLO [10] and SSD [11] perform detection in one step and offer high real-time performance, making them widely used in practical applications. YOLO divides the input image into a grid of fixed size and predicts bounding boxes, class probabilities, and confidence scores for each grid cell. Similarly, SSD performs detection using multi-scale feature maps, making single-stage models particularly effective for real-time applications. Two-stage models like faster R-CNN [12] and mask R-CNN [13] involve two steps: generating candidate regions and then classifying and refining those regions. They typically provide higher accuracy at the cost of increased computational requirements. Faster R-CNN first generates potential object locations using a region proposal network (RPN) and then classifies and regresses these regions. Mask R-CNN extends faster R-CNN by adding instance segmentation capabilities, further enhancing detection accuracy.
In order to fully evaluate the effectiveness of our proposed method on different types of detection models, we conduct experiments on a variety of single- and two-stage detection methods. These methods include single-stage YOLOv3 and RetinaNet [14], as well as two-stage faster R-CNN and cascade R-CNN [15]. By conducting comprehensive tests on various mainstream pedestrian detection models, we verify the universality and robustness of the proposed method.

2.2. Adversarial Attacks

Adversarial attacks are techniques designed to mislead deep learning models into making incorrect decisions. They are mainly divided into two categories: attacks in digital environments and attacks in physical environments [16].

2.2.1. Adversarial Attacks in Digital Settings

Digital environment adversarial attacks mislead deep learning models by adding small, carefully designed perturbations to the input data. Goodfellow et al. [2] proposed the fast gradient sign method (FGSM), which generates adversarial samples using the gradient information of the loss function. Kurakin et al. [17] proposed the basic iterative method (BIM), which enhances the effectiveness of adversarial samples by adding small perturbations over multiple iterations. Madry et al. [18] proposed the projected gradient descent (PGD) method, which projects adversarial samples back into the allowed perturbation range after each iteration to generate stronger adversarial samples. Carlini and Wagner [19] designed the C&W attack, which generates adversarial samples by minimizing specific adversarial loss functions. Laidlaw et al. [20] proposed perceptual adversarial attacks, which generate adversarial samples that retain perceptual features by maximizing perceptual similarity. Kwon [21] introduced a dual-mode method for generating adversarial examples, allowing the prioritization of attack success over minimizing distortion based on the scenario at hand. Additionally, Kwon [22] proposed a method for generating adversarial examples by applying different weights to the RGB channels, which reduces human-perceptible distortions while maintaining high attack success rates. However, these digital-domain attack methods are difficult to deploy directly in the real world, as they require direct modification of the input data. Our work addresses this limitation and proposes an adversarial attack approach that can be implemented in the real world.

2.2.2. Adversarial Attacks in Physical Settings

Physical environment adversarial attacks are designed to create real objects or perturbations that can mislead deep learning models in real environments.
Research in this area has gone through multiple stages of development. Early studies focused on physically realizing adversarial samples. Kurakin et al. [17] first demonstrated the existence of adversarial samples in the physical world by printing digital adversarial samples. Subsequently, Sharif et al. [23] printed adversarial patterns on eyeglass frames to fool facial recognition systems, Brown et al. [24] introduced adversarial patches, Eykholt et al. [3] showed how minor modifications to physical stop signs can mislead image classifiers, Athalye et al. [25] developed 3D-printed adversarial objects, and Xu et al. [5] proposed an adversarial T-shirt with a specially designed pattern. However, these patch- or decal-based techniques are often conspicuous, semi-permanent, and immutable, making them easy to recognize. To overcome these limitations, researchers began to explore the use of light for physical attacks: Lovisotto et al. [26] used a projector, Duan et al. [27] used a laser beam, Li et al. [28] used a focused spotlight, and Zhong et al. [29] and Cui et al. [30] explored the use of shadows to create perturbations. Although these methods exploit light rather than printed patterns, they require placing additional equipment, which increases the likelihood of detection.
In this study, we were inspired by the work of Tsuruoka et al. [31], who interfered with night-time traffic sign recognition systems by attaching reflective patches to traffic signs. We optimize and extend this idea and propose an adversarial attack method against night-time pedestrian detection based on reflective materials. Our approach has a distinct advantage over previous work in that it is less likely to leave recognizable traces, making it difficult to detect in daytime and low-light conditions.

3. Threat Models

In pedestrian detection systems, adversarial attacks pose a significant threat by introducing subtle yet strategically crafted perturbations, causing the detection system to misidentify pedestrians [3,18]. Such attacks can be executed in digital environments by modifying pixel values of input images and can also be implemented in the physical world through special materials or patterns added to pedestrian clothing [32]. Our threat model focuses on adversarial attacks in the physical world, specifically by leveraging the reflective properties of materials to interfere with pedestrian detection systems in night-time conditions.
  • Attacker’s Knowledge: We assume that the adversary has knowledge of the camera and headlights used in the victim’s vehicle but lacks internal information about the target pedestrian detection system, including model architecture, parameter settings, and training data. This represents a black-box attack, where the attacker can only infer the attack strategy by observing the relationship between the input data (e.g., images captured by the camera) and the output results (the pedestrian detection system’s recognition output) [33].
  • Attacker’s Goal: Our attack method employs reflective materials as adversarial patches, designed to reflect the majority of the vehicle’s headlight illumination back to the source during night-time while remaining inconspicuous during daytime or low-light conditions. Therefore, the objective of our attack is to cause the pedestrian detection system to fail in recognizing pedestrians equipped with reflective patches at night, effectively making them “invisible” to the detection system. This increases the risk of pedestrians being hit by vehicles at night.

4. Methodology

In this section, we present a reflective adversarial attack targeting pedestrian detection systems. Section 4.1 outlines the problem formulation and objectives of the pedestrian detection model. In Section 4.2, we detail the attack method, divided into three components: optimization, digital world attack, and physical world attack. Section 4.3 discusses the selection and simulation of reflective materials for our experiments. Finally, Section 4.4 describes the Particle Swarm Optimization (PSO) process used to enhance the placement of reflective patches. The entire pipeline is illustrated in Figure 1.

4.1. Problem Formulation

  • Pedestrian Detection: Pedestrian detection is a critical task in computer vision aimed at automatically identifying and locating pedestrians within images or video frames. Specifically, the goal of pedestrian detection is to generate bounding boxes for each pedestrian in the image and determine whether these bounding boxes contain pedestrians [34].
Given an image dataset $D = \{I_k\}_{k=1}^{K}$, where each image $I_k$ has dimensions $H \times W \times C$ (with $H$ as height, $W$ as width, and $C$ as the number of channels), the objective is to design a pedestrian detection model $f$. This model should generate a set of bounding boxes $B_k = \{b_{k,1}, b_{k,2}, \ldots, b_{k,n_k}\}$ for each image $I_k$, where each bounding box $b_{k,i}$ is represented by coordinates $(x_{k,i}^{1}, y_{k,i}^{1}, x_{k,i}^{2}, y_{k,i}^{2})$ and an associated label $l_{k,i}$. The label $l_{k,i}$ is binary, with $l_{k,i} = 1$ indicating the presence of a pedestrian within the bounding box, and $l_{k,i} = 0$ indicating its absence.
The objective is to design an effective pedestrian detection model $f$ that accurately detects and locates pedestrians in the images from dataset $D$. To achieve this, the model should optimize the following loss function:
$$\min_{f} \; \mathcal{L}_{det}\big(f(I_k), \{b_{k,i}, l_{k,i}\}\big),$$
where $\mathcal{L}_{det}$ represents the detection task's loss function, which includes classification and localization losses, and $\{b_{k,i}, l_{k,i}\}$ are the ground-truth bounding boxes and labels for all images in the dataset.
  • Adversarial Patch Attack: The adversarial patch attack is a method that interferes with model predictions by adding adversarial patches to images. An adversarial patch is a carefully designed small image region that can be placed anywhere in an image to affect the model’s detection results.
The attack objective is to generate an adversarial patch $P$ such that when $P$ is added to any image $I_k$ from the image dataset $D$, the pedestrian detection model’s performance significantly deteriorates. Specifically, the optimization objective for the adversarial patch attack is as follows:
$$\min_{P} \; \mathcal{L}_{adv}\big(f(I_k + P), \{b_{k,i}, l_{k,i}\}\big),$$
where $\mathcal{L}_{adv}$ is the loss function of the model on images with the adversarial patch, and $\{b_{k,i}, l_{k,i}\}$ represents the true labels and bounding boxes for all images in the dataset.
The optimization process to generate an effective adversarial patch typically includes the following objective:
$$\min_{P} \; \mathcal{L}_{adv}\big(f(I_k + P), \{b_{k,i}, l_{k,i}\}\big) + \lambda_{patch} \cdot \|\mathrm{Mask}(P)\|_{p},$$
where $\mathrm{Mask}(P)$ is the mask of the adversarial patch on the image, $\|\mathrm{Mask}(P)\|_{p}$ is the norm of the mask used to control the size and location of the patch, and $\lambda_{patch}$ is a weight hyperparameter that balances loss and patch size.

4.2. Reflective Adversarial Attack Method

As illustrated in Figure 1, our attack is divided into three main components:
  • Optimization: In the digital environment, we simulated the behavior of reflective materials under car headlights. We used white to represent the reflective patches and overlaid a white light source on them. The light source was centered at the intersection of the patch’s diagonal lines, with a radius equal to the length of the patch’s longer side, to more accurately replicate the effect of reflective materials under illumination. Using these simulated reflective patches, we attacked the YOLOv5 pedestrian detection model and optimized the patch placement on pedestrian clothing using a particle swarm optimization (PSO) algorithm, which is detailed in Section 4.4. The optimization results will serve as the basis for applying reflective patches in real-world scenarios.
  • Digital World Attack: Based on our optimization results, we applied the simulated reflective patches technique mentioned in the prior optimization search to generate a series of test images that could potentially cause misclassification by pedestrian detection models. Notably, while the placement parameters for our reflective adversarial patches were determined on the YOLOv5 model, the actual victim model used was a black-box model.
  • Physical World Attack: Based on our optimization results, we deployed reflective materials that closely matched the color of pedestrian clothing. In night-time conditions, we tested the clothing with applied reflective materials in real-world scenarios. We observed and recorded the performance of pedestrian detection models in vehicles passing by at various distances when car headlights illuminated the pedestrians. It is important to note that the victim vehicles were aware of the experiment to avoid causing any real harm to pedestrians.

4.3. Selection and Simulation of Reflective Materials

Reflective materials on the market are primarily divided into two categories: glass bead structures and microprism structures. Reflective films with a glass bead structure generally have lower reflectivity and more dispersed reflected light, making them less effective at long distances or under large angles of incidence compared to microprism films. They are more suited for low-cost applications. To achieve better attack performance, we compared different microprism reflective materials available on the market and selected 3M’s DG3 diamond grade reflective sheeting as the final material for our method. It employs full-cube microprism technology, where light undergoes multiple reflections within each microprism and is almost entirely reflected back along the incident direction, maximizing reflectivity. Compared to partial microprism or glass bead technologies, 3M’s full-cube microprisms provide significantly higher reflection efficiency.
During the attack, when vehicle headlights illuminate the reflective material, the microprism structure nearly completely reflects light back toward the headlights. Since the light does not scatter in other directions, the reflective material appears exceptionally bright in the direction of the light source. Additionally, the limited dynamic range of onboard cameras makes it difficult for them to capture both extremely bright light and darker backgrounds simultaneously. As a result, when the strong light reflected by the material enters the camera, the camera records the area as being much brighter than the actual surroundings, making the reflective material appear as a light source.
To simulate the effect of microprism reflective materials under headlight illumination, our method treats the highlighted region as an area of concentrated reflection and reproduces the retroreflective properties of the material through image processing techniques. The core idea is to highlight specific bounding box regions in the image and simulate the light source effect, making the area appear abnormally bright from the camera’s perspective.
For each region requiring reflection simulation, we first define its coordinate range from $(x_1, y_1)$ to $(x_2, y_2)$, with width $w = x_2 - x_1$ and height $h = y_2 - y_1$. The center of the reflective area is $(x_c, y_c) = \left(\frac{x_1 + x_2}{2}, \frac{y_1 + y_2}{2}\right)$, which serves as the basis for the light source position. The light source radius $r$ is set equal to the width $w$ of the region, simulating the light diffusion range of the reflective material.
To create the light source effect, we generate a light source mask $L(x, y)$, defined as follows:
$$L(x, y) = \begin{cases} 255 & \text{if } \sqrt{(x - x_c)^2 + (y - y_c)^2} \le r, \\ 0 & \text{otherwise.} \end{cases}$$
This mask represents the initial brightness distribution of the reflective material region. To simulate light diffusion, we apply a Gaussian blur $G_{\sigma}$ with a standard deviation $\sigma$ equal to half of the light source radius $r$:
$$L'(x, y) = G_{\sigma} * L(x, y).$$
The light source intensity $\alpha$ (in the range $[0, 1]$) is used to control brightness, and the brightness distribution of the light source is given as follows:
$$I_{\mathrm{light}}(x, y) = \alpha \cdot \frac{L'(x, y)}{255}.$$
Finally, the light source effect $I_{\mathrm{light}}$ is combined with the original image $I$ and the highlight mask $M$, generating the final highlighted image:
$$I_{\mathrm{highlighted}} = I + I_{\mathrm{light}} + M.$$
Here, the mask $M(x, y)$ is a binary mask identifying the highlighted region, defined as follows:
$$M(x, y) = \begin{cases} 255 & \text{if } x_1 \le x \le x_2 \text{ and } y_1 \le y \le y_2, \\ 0 & \text{otherwise.} \end{cases}$$
Through these steps, we successfully simulate the bright reflective effect of the materials under headlight illumination in a digital environment, replicating the strong light behavior of reflective materials in the camera’s view in the physical world.
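A minimal Python sketch of these steps, assuming OpenCV and NumPy (the function name, the intensity scaling, and the final clipping to the valid pixel range are illustrative assumptions rather than details specified in the paper), is as follows:

```python
# Sketch of the reflective-patch simulation from Section 4.3: a circular light-source
# mask centered on the patch, Gaussian-blurred to mimic diffusion, scaled by an
# intensity alpha, and added to the image together with the rectangular highlight mask.
import cv2
import numpy as np

def simulate_reflection(image: np.ndarray, box: tuple, alpha: float = 0.9) -> np.ndarray:
    """image: HxWx3 uint8; box: (x1, y1, x2, y2) patch region in pixels."""
    x1, y1, x2, y2 = box
    w = x2 - x1
    xc, yc = (x1 + x2) // 2, (y1 + y2) // 2
    r = w  # light-source radius equal to the patch width, as in the formulation above

    H, W = image.shape[:2]
    # Initial light-source mask L(x, y): 255 inside the circle, 0 outside.
    light = np.zeros((H, W), dtype=np.float32)
    cv2.circle(light, (xc, yc), r, 255, thickness=-1)
    # Diffusion: Gaussian blur with sigma = r / 2.
    light = cv2.GaussianBlur(light, (0, 0), sigmaX=r / 2)
    # Brightness distribution; alpha * light corresponds to alpha * (L'/255) rescaled to 0..255.
    i_light = alpha * light
    # Rectangular highlight mask M(x, y) over the patch itself.
    m = np.zeros((H, W), dtype=np.float32)
    m[y1:y2, x1:x2] = 255
    # Combine with the original image and clip to the valid range.
    out = image.astype(np.float32) + i_light[..., None] + m[..., None]
    return np.clip(out, 0, 255).astype(np.uint8)
```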

4.4. Optimization for Reflective Adversarial Attacks

During the materials preparation phase, we collected images of pedestrians illuminated by vehicle headlights at night. In this section, we provide a detailed description of how to optimize the placement of reflective materials in the digital environment to effectively perform adversarial attacks on pedestrian detection systems.
We employed YOLOv5s as the pedestrian detector in our system. YOLOv5s is the smallest model in the YOLOv5 series, characterized by a relatively low number of parameters and computational demands, making it well-suited for deployment on mobile devices or embedded systems. Each detected pedestrian bounding box is divided into a 10 × 5 grid, providing 50 candidate regions for placing reflective patches. Each candidate region serves as a potential position for a particle in the PSO algorithm during the optimization process.
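As an illustration, the following is a minimal sketch of this grid construction; the helper name and the row-major cell ordering are our assumptions.

```python
# Split a detected pedestrian bounding box into a 10 x 5 grid of candidate patch regions.
def make_grid(bbox, rows=10, cols=5):
    """bbox: (x1, y1, x2, y2) in pixels; returns rows*cols candidate cells as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = bbox
    cell_w, cell_h = (x2 - x1) / cols, (y2 - y1) / rows
    return [
        (x1 + c * cell_w, y1 + r * cell_h, x1 + (c + 1) * cell_w, y1 + (r + 1) * cell_h)
        for r in range(rows) for c in range(cols)  # 50 candidate regions in total
    ]
```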
We chose particle swarm optimization (PSO) as our optimization search algorithm. PSO is an optimization technique inspired by the foraging behavior of birds and the schooling behavior of fish. The fundamental idea is to find the optimal solution through collaboration and information sharing among individuals in a population. It is a population-based stochastic search method that, like simulated annealing, starts with random solutions and iteratively searches for the optimal solution by evaluating the quality of solutions using a fitness function.
The PSO algorithm maintains a swarm of particles, where each particle represents a potential solution in the search space. In our scenario, each particle’s position represents a candidate location for placing the reflective patch within a 10 × 5 grid inside the pedestrian bounding box. Each particle is initialized with a random position $p_i$ and a random velocity $v_i$. The position $p_i$ represents the candidate location of the reflective patch within the grid, while the velocity $v_i$ determines the particle’s movement direction and speed within the grid.
We use a fitness function to evaluate the fitness of each particle. Our objective is to maximize the interference effect of the reflective patch on the detection system, thereby minimizing the detection probability of YOLOv5 for pedestrians. The objective function $f(p)$ is defined as the negative value of the detection system’s confidence output $C(p)$, where $p$ is the position of the patch in the grid:
$$f(p) = -C(p).$$
Each particle updates its velocity and position based on its personal best position $p_i^{best}$ and the global best position $p^{global}$. The update equations are as follows:
$$v_i(t+1) = \omega v_i(t) + c_1 r_1 \left(p_i^{best}(t) - p_i(t)\right) + c_2 r_2 \left(p^{global}(t) - p_i(t)\right),$$
$$p_i(t+1) = p_i(t) + v_i(t+1),$$
where $\omega$ is the inertia weight that controls the influence of the previous velocity, $c_1$ and $c_2$ are cognitive and social constants, and $r_1$ and $r_2$ are random numbers between 0 and 1.
After updating the positions, the fitness of each particle is re-evaluated. In standard PSO, if a particle’s current position has a better fitness than its previous best position, it is updated as $p_i^{best}$. Similarly, if any particle’s current position is better than the global best position, it is updated as $p^{global}$. In this paper, we modify this approach by recording and updating the best K positions rather than just one.
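For clarity, the following is a minimal sketch of one velocity/position update under the equations above; encoding each particle as a vector of scores over the grid cells and the specific hyperparameter values are illustrative assumptions, not the settings used in our experiments.

```python
# One PSO update step following the velocity/position equations above.
import numpy as np

def pso_step(pos, vel, p_best, g_best, omega=0.7, c1=1.5, c2=1.5, rng=None):
    """pos, vel, p_best: arrays of shape (num_particles, num_cells); g_best: (num_cells,)."""
    if rng is None:
        rng = np.random.default_rng()
    r1 = rng.random(pos.shape)  # per-dimension random factors in [0, 1)
    r2 = rng.random(pos.shape)
    vel = omega * vel + c1 * r1 * (p_best - pos) + c2 * r2 * (g_best - pos)
    return pos + vel, vel
```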
The implementation details are outlined in Algorithm 1. The ‘run pso’ algorithm initializes the PSO optimizer, setting the number of particles and grid dimensions. It then configures the optimization process according to defined parameters. The algorithm proceeds by running the PSO optimizer, iteratively adjusting candidate solutions to minimize the fitness function. The optimization process involves evaluating various bounding box configurations to identify the best-performing solution. Once optimization is complete, the algorithm ranks the results based on scores and selects the highest-scoring grids. These grids are considered the optimal bounding box candidates according to the optimization criteria.
Algorithm 2 defines the ’fitness function’, which assesses the quality of candidate solutions by calculating fitness scores. For each candidate solution, the function generates images with simulated patches and uses YOLO for object detection to evaluate the performance of the chosen bounding boxes. YOLO returns not only confidence scores but also corresponding bounding box locations, as there may be multiple pedestrians in the image. We determine the confidence value corresponding to the deployed pedestrian by calculating the intersection over union (IoU) between the detection boxes and candidate boxes, using this value as the output of the fitness function.
Algorithm 1 Run PSO
Input: number of particles num_particles, number of iterations num_iterations, list of grids grids, image image, bounding box bbox2
Output: best grids
  Initialize the PSO optimizer with the given parameters
  optimizer ← InitializePSO(num_particles, len(grids))
  best_cost, best_pos ← optimizer.optimize(fitness_function, iters=num_iterations, image=image, grids=grids, bbox2=bbox2)
  sorted_indices ← argsort(best_pos)
  best_grids ← list of grids indexed by sorted_indices[:3]
  return best_grids
Algorithm 2 Fitness Function for PSO
Input: list of candidates candidates, image image, list of grids grids, bounding box bbox2
Output: fitness scores
  Initialize empty list fitness_scores
  for each candidate in candidates do
      selected_grids ← list of top K grids based on candidate scores
      highlighted_image ← HighlightImage(image, selected_grids)
      Save highlighted_image to a temporary file
      score, pos ← RunYOLODetection(temporary file)
      score_list ← empty list
      if score is not empty then
          for each (score1, pos1) in zip(score, pos) do
              nobbox, bbox1 ← ConvertYOLOToBBox(pos1, width, height)
              iou ← ComputeIoU(bbox1, bbox2)
              Append score1 to score_list
          end for
      end if
      Append score_list to fitness_scores
  end for
  return fitness_scores
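For reference, a minimal sketch of the IoU computation that Algorithm 2 uses to match YOLO detections to the attacked pedestrian’s bounding box is given below; the function name mirrors ComputeIoU, and the (x1, y1, x2, y2) pixel box format is an assumption.

```python
# Intersection-over-union between two axis-aligned boxes given as (x1, y1, x2, y2).
def compute_iou(box_a, box_b):
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)          # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```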

5. Experiments

5.1. Experimental Setup

5.1.1. Models and Datasets

We conducted comprehensive experiments to validate the effectiveness of our proposed adversarial attack on state-of-the-art pedestrian detection models. To demonstrate the generality of the adversarial attack across various pedestrian detection algorithms, we selected four representative methods from different categories: the single-stage RetinaNet and YOLOv3, and the multi-stage faster R-CNN and cascade R-CNN.
Considering our method is illumination-related, we set up two datasets to train the pedestrian detection models to show that our attack remains effective even for models trained under different lighting conditions. As a baseline dataset, we selected the widely used COCO dataset for computer vision tasks. The COCO dataset is a large-scale, rich object detection, segmentation, and captioning dataset, providing 80 categories with over 330,000 images, 200,000 of which are annotated, and more than 1.5 million object instances in total. Additionally, we created an augmented dataset, ARG, which includes 200 low-light samples and 200 samples simulating night-time car headlights. This dataset was used to fine-tune the pedestrian detection model on top of the COCO dataset.
To evaluate the performance of the proposed method, we conducted tests on two datasets. Firstly, we captured night-time pedestrian videos in real-world scenarios, from which we extracted a total of 240 frames. These images covered various locations and pedestrians. To demonstrate the broad applicability of the proposed method, we also included the KITTI dataset, a widely used computer vision dataset that provides images under different weather conditions and times, offering additional pedestrian features. For this purpose, we selected 400 pedestrian-related images from the KITTI dataset and conducted attacks based on these images.

5.1.2. Evaluation Metric

To evaluate the impact of our reflective adversarial attack on pedestrian detectors, we used two evaluation metrics: average recall (AR) and attack success rate (ASR). Specifically, AR refers to AR@0.5:0.95, which measures the recall capability of object detection models across various intersection over union (IoU) thresholds, making it particularly suitable for adversarial attack scenarios. The mathematical expression for AR@0.5:0.95 is as follows:
$$\mathrm{AR}@0.5{:}0.95 = \frac{1}{N} \sum_{i=1}^{N} \frac{\text{Number of Correctly Detected Targets at } \mathrm{IoU}_i}{\text{Number of Actual Targets}} \times 100\%,$$
where N is the number of IoU thresholds.
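As an illustration, the following is a minimal sketch of AR@0.5:0.95 under the simplifying assumption that each ground-truth pedestrian is represented by the best IoU achieved by any detection.

```python
# Recall averaged over IoU thresholds 0.50, 0.55, ..., 0.95.
import numpy as np

def average_recall(best_ious, thresholds=None):
    """best_ious: best detection IoU for each ground-truth pedestrian (iterable of floats)."""
    thresholds = np.arange(0.50, 1.00, 0.05) if thresholds is None else thresholds
    ious = np.asarray(list(best_ious), dtype=float)
    recalls = [(ious >= t).mean() for t in thresholds]  # recall at each IoU threshold
    return 100.0 * float(np.mean(recalls))
```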
Attack success rate (ASR) is defined as the proportion of adversarial examples that successfully mislead the model among all generated adversarial examples. It quantifies the effectiveness of the attack by measuring how well the designed adversarial inputs cause the model to make incorrect predictions. The mathematical expression for ASR is as follows:
$$\mathrm{ASR} = \frac{N_{\mathrm{success}}}{N_{\mathrm{total}}} \times 100\%,$$
where $N_{\mathrm{success}}$ is the number of adversarial examples that successfully induce misclassification, and $N_{\mathrm{total}}$ is the total number of adversarial examples generated. A higher ASR indicates a more effective adversarial attack, as it demonstrates that the attack significantly reduces the model’s detection capability.

5.1.3. Implementation Details

To determine the optimal locations for deploying reflective adversarial patches, we targeted the YOLOv5 detection model, trained on the COCO dataset, to identify K positions that effectively reduce confidence. In our experiments, we used four pedestrian detection models as primary targets: the single-stage RetinaNet and YOLOv3, and the multi-stage faster R-CNN and cascade R-CNN. These models were trained on the COCO and ARG datasets, with detectors trained for 10 epochs using stochastic gradient descent (SGD) with a learning rate of 0.001, momentum of 0.9, and weight decay of 0.0001. All experiments were conducted on a server equipped with an NVIDIA GeForce RTX 3080 graphics processing unit (NVIDIA Corporation, Beijing, China).
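For reproducibility, a minimal PyTorch sketch of the optimizer settings listed above is given below; the placeholder module and the choice of framework are assumptions, since the training code is not specified in the paper.

```python
# Optimizer configuration matching Section 5.1.3 (SGD, lr 0.001, momentum 0.9, weight decay 0.0001).
import torch

detector = torch.nn.Linear(8, 4)  # placeholder module; stands in for the actual pedestrian detector
optimizer = torch.optim.SGD(
    detector.parameters(),
    lr=0.001,
    momentum=0.9,
    weight_decay=0.0001,
)
# Detectors were trained for 10 epochs on COCO and then fine-tuned with the ARG samples.
```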

5.2. Attack Effectiveness

Existing adversarial attack methods against pedestrian detection models primarily assume daytime lighting conditions, and their conspicuous adversarial examples are likely to attract the attention of road observers. In this paper, we introduce reflective materials as adversarial examples and propose a stealthy attack method effective against night-time vehicle–pedestrian detection systems. Figure 2 provides a visual comparison of our proposed attack with other methods under normal lighting, low-light conditions, and night-time headlight illumination. To evaluate the effectiveness of the proposed attack model, we conducted a comprehensive assessment. First, we trained pedestrian detection models on the COCO dataset and the ARG dataset, which features varying lighting conditions, to investigate the impact of the reflective adversarial attack on detection performance. Second, we selected four detection models, including both single-stage and multi-stage architectures, to explore the effect of our attack on different detection frameworks. Additionally, to demonstrate the broad applicability of the proposed method, we not only conducted thorough testing on our own dataset but also extended our evaluation to the KITTI dataset. The KITTI dataset provides a wider range of environmental conditions and pedestrian features, thereby ensuring the comprehensiveness and accuracy of the evaluation results. Specifically, we set the number of reflective patches to 13 and assessed the impact of the reflective adversarial attack on the single-stage models YOLOv3 and RetinaNet, as well as the multi-stage models faster R-CNN and cascade R-CNN.
Table 1 shows the results of different models based on various training datasets in terms of the attack success rate (ASR). As shown in the table, the reflective adversarial attack affects different detection models. When applied to the test dataset we created, the attack success rate for RetinaNet exceeded 80%. After applying targeted adversarial training to enhance the pedestrian detection models, the attack success rate generally decreased but still remained above 20% across all models. Notably, RetinaNet maintained a 67% success rate under these conditions. Furthermore, when evaluating the KITTI dataset, the attack success rate for RetinaNet reached 93.6%, while other models also maintained success rates above 20%. This result further confirms the effectiveness and robustness of our method.
In Section 4.4, we introduced our modified particle swarm optimization algorithm, which selects the optimal K positions for deploying reflective patches. To explore the optimal balance between efficiency and cost in terms of the number of reflective patches, we conducted experiments analyzing the relationship between the number of patches, average recall (AR), and attack success rate (ASR). We trained different pedestrian detection models on the COCO dataset, varied the K value in the optimization algorithm, which corresponds to the number of reflective patches, and measured both the ASR and AR of the models. The experimental results are shown in Figure 3. Figure 3a,b illustrate the AR and ASR, respectively, of RetinaNet, YOLOv3, faster R-CNN, and cascade R-CNN under reflective adversarial attacks with varying K values. The figure shows that as the number of reflective patches increases, the model’s AR gradually decreases while the ASR steadily rises. Notably, our attack is particularly effective against the single-stage RetinaNet and YOLOv3 models, with the ASR exceeding 90%.

5.3. Ablation Study

In previous experiments, the use of reflective materials has been shown to effectively interfere with the performance of detection models. In Section 4.3, we discussed the comparison and selection of different reflective materials. However, the reflective effect of these materials is not only dependent on their intrinsic physical properties but may also be influenced by the color of the clothing worn by the individual. On the one hand, different colors of clothing may affect the intensity and spectral characteristics of the reflected light from the reflective materials. For example, light-colored clothing may enhance the reflective effect of the materials, while dark-colored clothing may reduce the intensity of the reflected light. On the other hand, light-colored clothing provides less contrast with the reflective material illuminated by vehicle headlights, while dark-colored clothing offers stronger visual salience with the illuminated reflective material. Thus, exploring the impact of different shirt colors on reflective materials helps in understanding how to select appropriate shirt colors to optimize the reflective effect in practical applications.
In our experiments, we investigated the impact of clothing color on our reflective adversarial attack using six different colors: white, blue, red, green, gray, and black. Specifically, we collected 50 images of pedestrians wearing each color of clothing and deployed 12 reflective patches on each pedestrian’s upper garment. We then conducted attacks on both the single-stage RetinaNet and multi-stage cascade R-CNN models. The attack success rates are illustrated in Figure 4. The results indicate that clothing color does affect the success rate of the reflective adversarial attack. Notably, deploying reflective materials on black clothing yielded the highest attack success rate for the RetinaNet model, while deploying them on gray clothing achieved the highest attack success rate for the cascade R-CNN model.

6. Conclusions

In this paper, we proposed an adversarial attack method for night-time vehicle–pedestrian detection systems. By leveraging the optical properties of microprism reflective materials, we deployed adversarial examples on pedestrian clothing to reflect vehicle headlights and disrupt detection systems. In the digital domain, we simulated the reflective material using white patches overlaid with light sources and employed a particle swarm optimization (PSO) algorithm to identify the optimal deployment locations. The experimental results demonstrate that our attack method is effective against various detection models, with the single-stage detection model RetinaNet being the most susceptible, achieving an attack success rate of over 80% on our dataset. Even when applied to single-stage detection models that have undergone targeted data augmentation training, the attack success rate still reaches 60%. On the KITTI dataset, our method achieves a 93.6% success rate against RetinaNet, with success rates exceeding 20% for all detection models, further confirming the broad applicability of our approach.

Author Contributions

Conceptualization, Y.C., Y.W., X.C. and Q.L.; methodology, Y.C., Y.W., X.C. and Q.L.; validation, Y.C., Y.W., X.C. and Q.L.; formal analysis, Y.C., Y.W., X.C. and Q.L.; investigation, Y.C., Y.W., X.C. and Q.L.; resources, Y.C., Y.W., X.C. and Q.L.; data curation, Y.C., Y.W., X.C. and Q.L.; writing—original draft preparation, Y.C.; writing—review and editing, Y.C., Y.W., X.C., Q.L., J.L. and W.N.; visualization, Y.C., Y.W., X.C. and Q.L.; supervision, J.L. and W.N.; project administration, J.L. and W.N.; funding acquisition, J.L. and W.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Central Funds Guiding the Local Science and Technology Development under Grant No. 236Z0806G, the Fundamental Research Funds for the Central Universities under Grant No. 2023JBMC055, the National Natural Science Foundation of China under Grant No. 62372021, the Hebei Natural Science Foundation under Grant No. F2023105005, and the Open Competition Mechanism to Select the Best Candidates in Shijiazhuang, Hebei Province, China.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Dollár, P.; Wojek, C.; Schiele, B.; Perona, P. Pedestrian detection: A benchmark. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 304–311. [Google Scholar]
  2. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv 2014, arXiv:1412.6572. [Google Scholar]
  3. Eykholt, K.; Evtimov, I.; Fernandes, E.; Li, B.; Rahmati, A.; Xiao, C.; Prakash, A.; Kohno, T.; Song, D. Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1625–1634. [Google Scholar]
  4. Thys, S.; Van Ranst, W.; Goedemé, T. Fooling automated surveillance cameras: Adversarial patches to attack person detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019. [Google Scholar]
  5. Xu, K.; Zhang, G.; Liu, S.; Fan, Q.; Sun, M.; Chen, H.; Chen, P.Y.; Wang, Y.; Lin, X. Adversarial t-shirt! evading person detectors in a physical world. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part V 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 665–681. [Google Scholar]
  6. Hu, Z.; Huang, S.; Zhu, X.; Sun, F.; Zhang, B.; Hu, X. Adversarial texture for fooling person detectors in the physical world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13307–13316. [Google Scholar]
  7. Zhu, X.; Li, X.; Li, J.; Wang, Z.; Hu, X. Fooling thermal infrared pedestrian detectors in real world using small bulbs. In Proceedings of the the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; Volume 35, pp. 3616–3624. [Google Scholar]
  8. Marini, F.; Walczak, B. Particle swarm optimization (PSO). A tutorial. Chemom. Intell. Lab. Syst. 2015, 149, 153–165. [Google Scholar] [CrossRef]
  9. Wu, Y.; Xiang, Y.; Tong, E.; Ye, Y.; Cui, Z.; Tian, Y.; Zhang, L.; Liu, J.; Han, Z.; Niu, W. Improving the Robustness of Pedestrian Detection in Autonomous Driving with Generative Data Augmentation. IEEE Netw. 2024, 38, 63–69. [Google Scholar] [CrossRef]
  10. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  11. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
  12. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  13. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  14. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  15. Cai, Z.; Vasconcelos, N. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162. [Google Scholar]
  16. Papernot, N.; McDaniel, P.; Jha, S.; Fredrikson, M.; Celik, Z.B.; Swami, A. The limitations of deep learning in adversarial settings. In Proceedings of the 2016 IEEE European Symposium on Security and Privacy (EuroS&P), Saarbrucken, Germany, 21–24 March 2016; pp. 372–387. [Google Scholar]
  17. Kurakin, A.; Goodfellow, I.; Bengio, S. Adversarial examples in the physical world. arXiv 2016, arXiv:1607.02533. [Google Scholar]
  18. Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards deep learning models resistant to adversarial attacks. arXiv 2017, arXiv:1706.06083. [Google Scholar]
  19. Carlini, N.; Wagner, D. Towards evaluating the robustness of neural networks. In Proceedings of the 2017 IEEE Symposium on Security and Privacy, San Jose, CA, USA, 22–26 May 2017; pp. 39–57. [Google Scholar]
  20. Laidlaw, C.; Singla, S.; Feizi, S. Perceptual adversarial robustness: Defense against unseen threat models. arXiv 2020, arXiv:2006.12655. [Google Scholar]
  21. Kwon, H.; Kim, S. Dual-mode method for generating adversarial examples to attack deep neural networks. IEEE Access 2023, 1. [Google Scholar] [CrossRef]
  22. Kwon, H. Adversarial image perturbations with distortions weighted by color on deep neural networks. Multimed. Tools Appl. 2023, 82, 13779–13795. [Google Scholar] [CrossRef]
  23. Sharif, M.; Bhagavatula, S.; Bauer, L.; Reiter, M.K. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM Sigsac Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 1528–1540. [Google Scholar]
  24. Brown, T.B.; Mané, D.; Roy, A.; Abadi, M.; Gilmer, J. Adversarial patch. arXiv 2017, arXiv:1712.09665. [Google Scholar]
  25. Athalye, A.; Engstrom, L.; Ilyas, A.; Kwok, K. Synthesizing robust adversarial examples. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 284–293. [Google Scholar]
  26. Lovisotto, G.; Turner, H.; Sluganovic, I.; Strohmeier, M.; Martinovic, I. {SLAP}: Improving physical adversarial examples with {Short-Lived} adversarial perturbations. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Virtual, 11–13 August 2021; pp. 1865–1882. [Google Scholar]
  27. Duan, R.; Mao, X.; Qin, A.K.; Chen, Y.; Ye, S.; He, Y.; Yang, Y. Adversarial laser beam: Effective physical-world attack to dnns in a blink. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 16062–16071. [Google Scholar]
  28. Yufeng, L.; Fengyu, Y.; Qi, L.; Jiangtao, L.; Chenhong, C. Light can be dangerous: Stealthy and effective physical-world adversarial attack by spot light. Comput. Secur. 2023, 132, 103345. [Google Scholar]
  29. Zhong, Y.; Liu, X.; Zhai, D.; Jiang, J.; Ji, X. Shadows can be dangerous: Stealthy and effective physical-world adversarial attack by natural phenomenon. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 15345–15354. [Google Scholar]
  30. Cui, X.; Wu, Y.; Gu, Y.; Li, Q.; Tong, E.; Liu, J.; Niu, W. Lurking in the Shadows: Imperceptible Shadow Black-Box Attacks Against Lane Detection Models. In International Conference on Knowledge Science, Engineering and Management; Springer: Berlin/Heidelberg, Germany, 2024; pp. 220–232. [Google Scholar]
  31. Tsuruoka, G.; Sato, T.; Chen, Q.A.; Nomoto, K.; Tanaka, Y.; Kobayashi, R.; Mori, T. WIP: Adversarial Retroreflective Patches: A Novel Stealthy Attack on Traffic Sign Recognition at Night. Available online: https://www.ndss-symposium.org/wp-content/uploads/vehiclesec2024-25-paper.pdf (accessed on 15 August 2024).
  32. Zhang, H.; Wang, J. Towards adversarially robust object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 421–430. [Google Scholar]
  33. Brendel, W.; Rauber, J.; Bethge, M. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv 2017, arXiv:1712.04248. [Google Scholar]
  34. Dollar, P.; Wojek, C.; Schiele, B.; Perona, P. Pedestrian detection: An evaluation of the state of the art. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 743–761. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The main pipeline of the reflective adversarial attack. We trained our models using both standard pedestrian detection datasets and augmented datasets, simulated the behavior of reflective materials in a digital environment, optimized the patch placement using a particle swarm optimization algorithm, and ultimately validated the impact of the reflective materials on pedestrian detection in real-world scenarios.
Figure 2. Examples of Xu, Thys, and our methods under different lighting conditions.
Figure 3. AR and ASR of four pedestrian detection models trained on the COCO dataset under reflective adversarial attacks with varying numbers of reflective patches.
Figure 4. ASR of two pedestrian detection models under reflective adversarial attacks with reflective patches deployed on different colored clothing.
Table 1. Attack success rate (ASR) of the reflective adversarial attack against single-stage and two-stage pedestrian detection models.
Train Dataset              | RetinaNet (Single-Stage) | YOLOv3 (Single-Stage) | Faster R-CNN (Two-Stage) | Cascade R-CNN (Two-Stage)
COCO (Ours Test)           | 83.2%                    | 36.7%                 | 37.7%                    | 34.0%
COCO + ARG (Ours Test)     | 67.0%                    | 21.1%                 | 29.2%                    | 25.1%
COCO + ARG (KITTI Test)    | 93.6%                    | 23.6%                 | 30.0%                    | 26.9%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


