PAE Attack
Reading: PAE
Trustworthy deep learning.
A survey on physical adversarial example (PAE) attacks.
Overview
Where does it go?
The challenges are distributed across many AI application areas.
Risks come from applications:
 CV
 NLP
 ASR
More precisely:
autonomous driving, vision-based automatic checkout systems, vehicle classification and detection models, ……
Where does it come from?
A critical question: what makes physical adversarial examples different from digital ones?
Basically, three aspects:
 characterization
 generating strategy
 attacking ability
Substantially:
 the digital-physical domain gap
The physical world is a complex and open environment with many dynamics, such as lighting, natural noise, and diverse transformations.
On the one hand, this makes attacks more diverse; on the other hand, it also makes them harder to mount.
What do we do?
A clearer hierarchy of physical-world adversarial example generation methods
Understanding of physical adversarial examples
 Revisit the critical particularities of physical adversarial examples from a workflow perspective, and analyze them in depth to identify the typical processes that strongly influence adversarial example generation.
Three important processes:
 adversarial example optimization process
 adversarial example manufacturing process
 adversarial example resampling process
The last two processes are specific to physical adversarial attacks.
Classify the PAEs:
Based on the summarized typical particularities and critical attributes of the identified typical processes, drawn from hundreds of physical-world attack studies and backed up by the attacking particularities of the key adversarial example generation processes,
propose a hierarchy.
Section II. Going Deep into Physical Adversarial Examples
Overview
Depending on whether they act in the digital world or the physical world, adversarial examples are divided into digital and physical kinds.
The Key Particularities of PAEs
What makes PAEs different from digital ones is their particular generation process:
the digitally trained adversarial patterns are manufactured into the physical environment of existing objects, a “virtual-to-real” process.
Key factors:
 manufacture technique
 manufacture carrier
 sampling environment
 sampler quality
 basic attributes
 core attributes
 epitaxial attributes
Definition of PAEs
Adversarial examples (digital world): $$ y^x \neq \mathcal{F}(x_{adv}^{d}),\quad x_{adv}^{d}=x + \delta, $$
where $y^x$ is the ground-truth label of the input instance $x$, $\delta$ is the adversarial perturbation, and it satisfies $\Vert\delta\Vert<\varepsilon$ ($\varepsilon>0$ is a sufficiently small radius).
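To make the digital definition concrete, here is a minimal Python sketch that checks both conditions for one input. The classifier `linear_classifier`, its weights, and the choice of the L-infinity norm for $\Vert\cdot\Vert$ are illustrative assumptions, not part of the survey.

```python
# Toy check of the digital adversarial-example definition:
#   y^x != F(x + delta)  and  ||delta|| < eps
# Assumptions: a hypothetical 2-class linear classifier F and the L-infinity norm.

def linf_norm(v):
    """L-infinity norm: largest absolute component."""
    return max(abs(a) for a in v)

def linear_classifier(x, w=(1.0, -1.0), b=0.0):
    """Hypothetical classifier F: label 1 if w.x + b > 0, else 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else 0

def is_digital_adversarial(x, delta, y_true, eps, model=linear_classifier):
    """True if x + delta is misclassified and the perturbation is inside the eps-ball."""
    x_adv = [xi + di for xi, di in zip(x, delta)]
    return model(x_adv) != y_true and linf_norm(delta) < eps

x = [0.55, 0.50]        # clean input: score 0.05 > 0, label 1
delta = [-0.04, 0.04]   # small perturbation with ||delta||_inf = 0.04
print(linear_classifier(x))                            # 1
print(is_digital_adversarial(x, delta, 1, eps=0.05))   # True
```

With `eps=0.03` the same perturbation is no longer admissible, which shows how the radius $\varepsilon$ bounds what counts as an adversarial example.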
Things change in the physical world.
Modified definition for the physical world:
$$ \begin{aligned} & y^x\neq\mathcal{F}(x_{adv}^p),\quad \text{s.t.}\quad \Vert x_{adv}^p\Vert_\aleph<\varepsilon, \\ & x_{adv}^p=x+\mathcal{R}(\mathcal{M}(\delta),c), \end{aligned} \tag{1} $$
 $x_{adv}^p$: the physical adversarial example input to the deployed deep models
 $\mathcal{R}(\cdot)$: the resampling function that represents the resampling process
 $\mathcal{M}(\cdot)$: the manufacturing function that represents the manufacturing process
 $c$: a certain environment condition drawn from the real, infinite set of environment conditions $\mathbb{E}$, i.e., $c\in\mathbb{E}$
 $\Vert\cdot\Vert_\aleph$: the evaluation metric that measures the naturalness of the PAE input to the deployed artificial intelligence system
 $\aleph$: the space in which human beings can recognize PAEs
In brief, the $\aleph$ constraint imposed on $\delta$ corresponds to the **“suspicious”** extent of PAEs: a highly perceptible adversarial perturbation is not acceptable in real scenarios.
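The physical definition chains three steps: perturb, manufacture, resample. The sketch below walks through $x_{adv}^p=x+\mathcal{R}(\mathcal{M}(\delta),c)$ with deliberately simple, hypothetical stand-ins: `manufacture` models $\mathcal{M}$ as gamut clipping plus quantization, and `resample` models $\mathcal{R}$ under a condition $c$ = (brightness gain, sensor-noise std). Real manufacturing and resampling are far richer; this only illustrates how a perturbation is distorted before it reaches the model.

```python
import random

def manufacture(delta, levels=8, lo=-0.1, hi=0.1):
    """Hypothetical M(.): the 'printer' can only realize perturbation values in
    [lo, hi], quantized to `levels` evenly spaced steps."""
    step = (hi - lo) / (levels - 1)
    out = []
    for d in delta:
        d = min(max(d, lo), hi)                          # gamut clipping
        out.append(lo + round((d - lo) / step) * step)   # quantization
    return out

def resample(m_delta, c, seed=0):
    """Hypothetical R(., c): the camera sees the manufactured perturbation scaled
    by a brightness gain and corrupted by Gaussian sensor noise."""
    gain, noise_std = c
    rng = random.Random(seed)
    return [gain * v + rng.gauss(0.0, noise_std) for v in m_delta]

def physical_example(x, delta, c, seed=0):
    """x_adv^p = x + R(M(delta), c), following the definition above."""
    r = resample(manufacture(delta), c, seed)
    return [xi + ri for xi, ri in zip(x, r)]

x = [0.5, 0.5, 0.5]
delta = [0.03, -0.25, 0.07]
print(manufacture(delta))                        # what the printer can realize
print(physical_example(x, delta, c=(0.8, 0.02)))  # what the model finally sees
```

Even with zero noise and unit gain, quantization in `manufacture` means the model never sees exactly $\delta$, one concrete source of the digital-physical gap.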
Section III. Classifying PAEs
 manufacturing process-oriented ones
 resampling process-oriented ones
 others (aiming at naturalness, transferability, etc.)
The Manufacturing Process-Oriented PAEs
Principles:
 material-driven
 task-driven
Our categories:
 touchable attacks
 untouchable attacks
The former produce adversarial examples that can be touched by hand; the latter produce ones that cannot.
Touchable Attacks:
2D attacks
[22] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter, “Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 1528–1540, 2016.
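Smoothness and practicability
Optimization process
Total Variation Loss (using the total-variation norm), where $p_{i,j}$ is the pixel value at coordinates $(i,j)$: $$ L_{tv}=\sum_{i,j}\sqrt{\left(p_{i+1,j}-p_{i,j}\right)^2+\left(p_{i,j+1}-p_{i,j}\right)^2}. $$ Minimizing the loss function $L_{tv}$ makes the perturbation smoother.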
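A direct translation of $L_{tv}$ for a single-channel patch stored as nested lists. Skipping the last row and column, where one neighbor is missing, is a boundary convention assumed here; implementations differ on this point.

```python
import math

def total_variation(p):
    """L_tv = sum_{i,j} sqrt((p[i+1][j]-p[i][j])^2 + (p[i][j+1]-p[i][j])^2)
    for a 2D grayscale patch p (list of rows). Only (i, j) with both a lower
    and a right neighbour contribute (assumed boundary convention)."""
    h, w = len(p), len(p[0])
    tv = 0.0
    for i in range(h - 1):
        for j in range(w - 1):
            dv = p[i + 1][j] - p[i][j]   # vertical difference
            dh = p[i][j + 1] - p[i][j]   # horizontal difference
            tv += math.sqrt(dv * dv + dh * dh)
    return tv

flat = [[0.5] * 4 for _ in range(4)]
checker = [[(i + j) % 2 for j in range(4)] for i in range(4)]
print(total_variation(flat))      # 0.0: a constant patch is perfectly smooth
print(total_variation(checker))   # large: every pixel differs from its neighbours
```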
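Practicability
Non-Printability Score (NPS)
Loss function $NPS(\hat{p})$, where $\hat{p}$ is a pixel of the perturbation and $P$ is the set of printable colors: $$ NPS(\hat{p})=\prod_{p\in P}\left|\hat{p}-p\right|. $$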
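A sketch of the NPS term for RGB pixels. The product is zero whenever a pixel exactly matches some printable color, so minimizing it pulls pixels toward the printable palette. Measuring $|\hat{p}-p|$ as Euclidean distance in RGB, summing per-pixel scores over the perturbation, and the three-color `PRINTABLE` palette are assumptions for illustration; a real palette would be measured from printed samples.

```python
import math

def nps_pixel(p_hat, printable):
    """NPS(p_hat) = prod_{p in P} |p_hat - p| for one RGB pixel p_hat,
    with |.| taken as Euclidean distance in RGB space (an assumption)."""
    score = 1.0
    for p in printable:
        score *= math.dist(p_hat, p)
    return score

def nps_image(pixels, printable):
    """Sum of per-pixel scores over the perturbation (assumed aggregation)."""
    return sum(nps_pixel(px, printable) for px in pixels)

# Hypothetical printable palette, for illustration only.
PRINTABLE = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.8, 0.1, 0.1)]
print(nps_pixel((0.8, 0.1, 0.1), PRINTABLE))      # 0.0: exactly printable
print(nps_pixel((0.2, 0.9, 0.3), PRINTABLE) > 0)  # True: off-palette pixel is penalized
```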
The form of these PAEs is relatively simple.
[23] J. Lu, H. Sibai, and E. Fabry, “Adversarial examples that fool detectors,” arXiv preprint arXiv:1712.02494, 2017, demonstrated a minimization procedure to create adversarial examples that fool Faster R-CNN in stop sign and face detection tasks. However, due to restrictive environmental conditions, this attack did not perform well in the physical world.
[24] A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” in 5th International Conference on Learning Representations, pp. 24–26, 2017.
demonstrated the possibility of crafting adversarial examples in the physical world by simply printing out adversarial examples (manufacturing), capturing them with a cell-phone camera (resampling), and then feeding them into an image classification model.
[4] K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, C. Xiao, A. Prakash, T. Kohno, and D. Song, “Robust physical-world attacks on deep learning visual classification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1625–1634, 2018.
first generated adversarial perturbations in the physical world against road sign classifiers and proposed the Robust Physical Perturbations (RP2) algorithm. By optimizing and manufacturing black-and-white block perturbations, the authors successfully attacked a traffic sign recognition model.
Robust Physical Perturbations (RP2) algorithm.
[30] D. Song, K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, F. Tramèr, A. Prakash, and T. Kohno, “Physical adversarial examples for object detectors,” in 12th USENIX Workshop on Offensive Technologies (WOOT 18), (Baltimore, MD), USENIX Association, Aug. 2018.
extended the RP2 algorithm to object detection tasks and manufactured colorful adversarial stop sign posters.
[25] M. Lee and Z. Kolter, “On physical adversarial patches for object detection,” arXiv preprint arXiv:1906.11897, 2019, first proposed an adversarial patch attack that could successfully attack detectors without overlapping the target objects.
[26] S. Thys, W. V. Ranst, and T. Goedemé, “Fooling automated surveillance cameras: Adversarial patches to attack person detection,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 49–55, 2019.
first generated physical adversarial patches against pedestrian detectors by optimizing a combination of adversarial objectness loss, TV loss, and NPS loss.
New Framework
Previous methods modify adversarial examples during the perturbation process to meet additional objectives.
This work proposed a general framework to generate diverse adversarial examples. The authors used GANs to construct adversarial generative nets (AGNs), which are flexible enough to accommodate various objectives,
e.g., inconspicuousness, robustness, and scalability.
[31] T. Malzbender, D. Gelb, and H. Wolters, “Polynomial texture maps,” in Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pp. 519–528, 2001.
The authors leveraged the Polynomial Texture Maps approach [31] to get eyeglasses’ RGB values under specific luminance. By using this framework, the authors constructed adversarial eyeglasses and fooled classifiers for face recognition.
SNPS
[32] D. Wang, C. Li, S. Wen, Q.L. Han, S. Nepal, X. Zhang, and Y. Xiang, “Daedalus: Breaking nonmaximum suppression in object detection via adversarial examples,” IEEE Transactions on Cybernetics, 2021.
3D Attacks
Different 3D attack scenes.
2D patches are a good idea, but their spatial transformations are quite different from those in a real scene.
UPC (Universal Physical Camouflage)
Technique and carrier
 non-rigid or non-planar objects
 flexible or inflexible surfaces
 changing appearance, e.g., color/texture fading
 specific shapes, e.g.
 partial coverage
Optimization process
Let $f$ be an attack loss for misdetection and $g$ be the total-variation norm that enhances the perturbations’ smoothness;
then the optimization process can be formulated as:
$$ \min_\delta \sum_i\mathbb{E}_{t,t_{TPS},v}[f(x_i,\delta)]+\lambda g(\delta) $$
where the expectation is taken over environment conditions containing a TPS transformation $t_{TPS}\in\mathcal{T}_{TPS}$, a conventional transformation $t\in\mathcal{T}$, and Gaussian noise $v$. Considering the difference between flexible and rigid materials, Hu et al. [29] utilized the toroidal cropping method to manufacture expandable adversarial textures of arbitrary length.
TPS: Thin Plate Spline
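The expectation in the UPC objective is usually estimated by sampling. The sketch below shows that Monte Carlo structure under heavy simplifications that are not from the source: images are 1D intensity lists, `detector_confidence` is a hypothetical toy detector playing the role of $f$, the conventional transformation $t$ is a random brightness scaling, $v$ is Gaussian noise, $g$ is a 1D total variation, and the TPS term $t_{TPS}$ is omitted.

```python
import math
import random

def detector_confidence(x):
    """Hypothetical attack loss f: a toy detector's confidence, logistic in the
    mean intensity. The attacker wants this to be low."""
    return 1.0 / (1.0 + math.exp(-10.0 * (sum(x) / len(x) - 0.5)))

def tv_1d(delta):
    """g(delta): 1D total variation used as the smoothness regularizer."""
    return sum(abs(b - a) for a, b in zip(delta, delta[1:]))

def upc_objective(images, delta, lam=0.1, n_samples=200, noise_std=0.02, seed=0):
    """Monte Carlo estimate of sum_i E_{t,v}[f(x_i, delta)] + lam * g(delta).
    t: random brightness scaling; v: Gaussian noise; t_TPS is omitted."""
    rng = random.Random(seed)
    total = 0.0
    for x in images:
        acc = 0.0
        for _ in range(n_samples):
            scale = rng.uniform(0.8, 1.2)                            # conventional t
            x_t = [scale * (xi + di) + rng.gauss(0.0, noise_std)     # t(x + delta) + v
                   for xi, di in zip(x, delta)]
            acc += detector_confidence(x_t)
        total += acc / n_samples
    return total + lam * tv_1d(delta)

images = [[0.6] * 8, [0.55] * 8]
print(upc_objective(images, [0.0] * 8))
print(upc_objective(images, [-0.2] * 8))   # lower: darker, and still smooth
```

Because the same seed is used, both calls see identical sampled transformations, so the comparison isolates the effect of $\delta$; a real attack would instead backpropagate through this estimate.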
Untouchable Attacks
The untouchable attacks consist of lighting attacks and audio/speech attacks.
Lighting attacks

placing a special light source, e.g., a programmable LED

placing a spatial light modulator (SLM) in front of the camera

modifying optical parameters to which humans are insensitive

adding easily overlooked shadows or projections of special shapes
All of these are optimized.
The perturbation generation process can be formulated as follows, where $\mathcal{T}$ models the environment conditions $\mathbb{E}$ during the resampling process and thus corresponds to $\mathcal{R}(\cdot)$ mentioned previously. Let $I_{amb}$ represent the image captured under ambient light, $I_{sig}$ the image taken under fully illuminated attacker-controlled lighting, and $g(y+\delta)$ the average impact of the signal on row $y$: $$ x_{adv}^p=\mathcal{T}(I_{amb})+\mathcal{T}(I_{sig})\cdot g(y+\delta). $$
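A sketch of the row-wise formula above on a tiny grayscale image. Here $\mathcal{T}$ is taken to be the identity, and `g_signal` is a hypothetical sinusoidal LED modulation; the signal shape and $\mathcal{T}$ in the cited attack differ. The example shows how changing $\delta$ shifts the bright and dark bands across rows.

```python
import math

def g_signal(y, freq=0.25):
    """Hypothetical average impact of the attacker's LED signal on row y:
    a sinusoidally modulated light seen through a rolling shutter, in [0, 1]."""
    return 0.5 * (1.0 + math.sin(2.0 * math.pi * freq * y))

def identity_transform(img):
    """T(.): modeled as the identity here; a real T adds viewpoint and lighting changes."""
    return [row[:] for row in img]

def led_adversarial(i_amb, i_sig, delta, T=identity_transform, g=g_signal):
    """x_adv^p = T(I_amb) + T(I_sig) * g(y + delta), applied row by row."""
    a, s = T(i_amb), T(i_sig)
    return [[a[y][x] + s[y][x] * g(y + delta) for x in range(len(a[y]))]
            for y in range(len(a))]

i_amb = [[0.3] * 3 for _ in range(4)]   # capture under ambient light
i_sig = [[0.5] * 3 for _ in range(4)]   # capture under full attacker lighting
for row in led_adversarial(i_amb, i_sig, delta=0.0):
    print([round(v, 3) for v in row])   # rows: 0.55, 0.8, 0.55, 0.3
```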
Audio/Speech attacks
Speech recognition transcribes audio/speech into text, which is then used to control a system, such as the voice assistant in mobile phones or in automatic driving. Some researchers have developed over-the-air attacks against deployed speech recognition systems by playing audio, impulses, and so on.
To avoid instability during backpropagation in the frequency domain, the authors exploited the symmetry properties of the Discrete Fourier Transform (DFT): after the perceptual measures are extracted from the original audio, the attack can be performed in the time domain.
The loss function of the manufacturing process can be summarized as:
$$ \mathcal{L}(x,\delta,y)=\mathbb{E}_{t\in\mathcal{T}}\left[\mathcal{L}_{net}(f(t(x+\delta)),y)+\alpha\cdot\mathcal{L}_{\theta}(x,\delta)\right] $$
where the former term and the latter term refer to the robustness loss and imperceptibility loss, respectively.
 Voice assistants, Karplus-Strong algorithm
 voice-controllable devices
 DNN-based speaker recognition systems
 ……
The Resampling Process-Oriented PAEs
The shift from digital to physical comes with a loss.
After manufacturing, the physical adversarial examples take effect by being resampled and input into the deployed deep models of real artificial intelligence systems. During this process, some of the key information tied to the adversarial characteristics of the PAEs may be altered by imperfect resampling, degrading the attacking ability; this is also called the physical-digital domain shift. More precisely, this shift consists of two types, as shown in Figure 9: environment-caused and sampler-caused. This motivates us to categorize the resampling process-oriented PAEs into environment-oriented attacks and sampler-oriented attacks.
Environment-oriented Attacks
Interference by natural factors:
 environmental lights (CV)
 environmental noises (ASR)
 ……
The physical attack performance is significantly impacted by environmental factors, such as lighting and weather, which motivates researchers to take these factors into account when optimizing PAEs. Du et al. [75] proposed a physical adversarial attack on aerial imagery object detectors to evade remote sensing reconnaissance. They designed tools for simulating resampling differences caused by atmospheric factors, including lighting, weather, and seasons. Finally, they optimize the adversarial patch by minimizing the following loss function:
$$ \mathcal{L}=\mathbb{E}_{t\in\mathcal{T}}\left[\max\left(\mathcal{F}^b(t(x_{adv}^d))\right)\right]+\lambda_1\mathcal{L}_{nps}(\delta)+\lambda_2\mathcal{L}_{tv}(\delta) $$
where the first term is the adversarial loss that suppresses the maximum predicted objectness score over the transformation distribution $\mathcal{T}$, the second term ensures that the optimized colors are printable, and the last term ensures the naturalness of the adversarial patch. The patch is obtained by solving $\min_\delta \mathcal{L}$.
Introducing DTN
 brightness, contrast, color
 shadow
Thus, the authors devised a differentiable transformation network (DTN) to learn potential physical transformations (e.g., shadows). Once the DTN is trained, the authors optimize a robust adversarial texture for the vehicle through it.
With lighting attacks
As mentioned above, the adversarial LED lighting attack [49] also accounts for the environment of the attacking scenario. During perturbation generation, the authors propose the function $\mathcal{T}$, which can be regarded as $\mathcal{R}(\cdot)$ in our definition, to model environment conditions (including viewpoint and lighting changes). In this way, they take the environmental variation during resampling into account, preserving the attacking ability across the digital-physical domain gap.
For the physical characteristics of voice
To alleviate the potential distortion caused by the environment, a line of works [16], [63], [66], [68], [69] adopted the room impulse response (RIR) to mimic the distortion caused by playing and recording the speech, which can be expressed as: $$ y_{adv}^p(t)=r(t)*x_{adv}^d(t), $$
where $x_{adv}^d(t)$ is the audio clip, $y_{adv}^p(t)$ is the corresponding estimated recorded audio clip, $r(t)$ is the RIR, and $*$ denotes convolution. The RIR $r(t)$ is then incorporated into the generation of $\delta$ through a transform $T(x)=x*r$, which reduces the impact of distortion introduced by the hardware and the physical signal path, significantly improving the robustness of physical speech attacks.
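The playback-and-recording model $y_{adv}^p(t)=r(t)*x_{adv}^d(t)$ can be sketched with a plain discrete convolution. The two-tap `rir` is a hypothetical toy impulse response (direct path plus one echo); measured or simulated RIRs are much longer, and robust attacks typically average the attack loss over many sampled RIRs. Truncating the output to the input length is a convention assumed here.

```python
def convolve(x, r):
    """Full discrete convolution: (x * r)[n] = sum_k x[k] * r[n - k]."""
    out = [0.0] * (len(x) + len(r) - 1)
    for k, xk in enumerate(x):
        for m, rm in enumerate(r):
            out[k + m] += xk * rm
    return out

def simulate_playback(x_adv, rir):
    """T(x) = x * r: estimated recording of the played adversarial clip,
    truncated to the input length (assumed convention)."""
    return convolve(x_adv, rir)[: len(x_adv)]

rir = [1.0, 0.0, 0.5]          # hypothetical RIR: direct path plus one echo
clip = [1.0, 0.0, 0.0, 0.0, 0.0]
print(simulate_playback(clip, rir))   # [1.0, 0.0, 0.5, 0.0, 0.0]
```

An impulse input returns the RIR itself, which is a quick sanity check that the convolution is oriented correctly.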
Sampler-oriented Attacks
Samplers lose adversarial information.
A case in point is the sampling angle in computer vision tasks: when photos are taken from different perspectives, the sampled instances may show slight differences in shape and color, e.g., affine-transformation-like differences and overexposure.
To confront viewpoint changes in the physical world, Athalye et al. [37] formulated the potential physical transformations (e.g., rotation, scaling, resizing) in a unified form, the expectation over transformation (EOT), which is mathematically denoted as follows.
$$ \delta=\mathbb{E}_{t\sim\mathcal{T}}[d(t(x_{adv}^d),t(x))]. $$
The above formula is designed to alleviate the data domain gap caused by transformations, enhancing the robustness of the adversarial texture.
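EOT replaces a single transformation with an expectation over a distribution $\mathcal{T}$, which in practice is estimated by sampling. The sketch estimates $\mathbb{E}_{t\sim\mathcal{T}}[d(t(x_{adv}^d),t(x))]$ with $d$ as the L2 distance; the transformation family (random brightness scaling plus a cyclic shift standing in for rotation or translation of a 1D signal) is a hypothetical choice. In Athalye et al.'s formulation this expected distance acts as a constraint, while the attack loss is averaged over the same distribution.

```python
import math
import random

def random_transform(rng, n):
    """Sample t from a hypothetical distribution T: brightness scaling plus a
    cyclic shift of a length-n signal (a 1D stand-in for rotation/translation)."""
    scale = rng.uniform(0.7, 1.3)
    shift = rng.randrange(n)
    def t(v):
        shifted = v[-shift:] + v[:-shift] if shift else v[:]
        return [scale * a for a in shifted]
    return t

def l2(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def expected_distance(x_adv, x, n_samples=1000, seed=0):
    """Monte Carlo estimate of E_{t~T}[d(t(x_adv), t(x))] with d = L2.
    The same sampled t is applied to both inputs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        t = random_transform(rng, len(x))
        total += l2(t(x_adv), t(x))
    return total / n_samples

x = [0.0, 0.0, 0.0, 0.0]
x_adv = [0.1, 0.0, 0.0, 0.0]
print(expected_distance(x_adv, x))   # close to 0.1, since E[scale] = 1
```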
Useful tool functions introduced
Staying imperceptible and adversarial after transformation.
Specifically, they utilized a mask to constrain the perturbation to the traffic sign area, and the position of the perturbation is optimized by imposing the L1 norm. The above optimization can be expressed as: $$ \arg\min_\delta \lambda\Vert\delta\Vert_p+\mathcal{L}_{nps}(\delta)+\mathbb{E}_{x\sim X^V}\mathcal{L}(\mathcal{F}(x+t(\delta)),y), $$
where the first term bounds the norm of $\delta$ to keep the patch imperceptible, the second term ensures printability, the third term accounts for the transformations of $x$ by applying the same transformation $t$ to $\delta$, and $X^V$ includes both digitally and physically collected training data.
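A sketch of this masked objective on a toy problem. `mask` confines the perturbation to the allowed region (the traffic sign area in the text), `w` defines a hypothetical linear model standing in for $\mathcal{F}$, `transforms` stands in for $t$, and `nps_fn` is a placeholder for $\mathcal{L}_{nps}$. Because the objective is minimized, `y_target` is read as the attacker's target label; that interpretation is an assumption.

```python
import math

def masked(delta, mask):
    """Restrict the perturbation to the masked region."""
    return [d * m for d, m in zip(delta, mask)]

def logistic_loss(score, y):
    """Binary cross-entropy of a logit `score` against a label y in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-score))
    return -math.log(p if y == 1 else 1.0 - p)

def rp2_style_objective(delta, mask, dataset, y_target, w, lam=0.01,
                        nps_fn=lambda d: 0.0, transforms=(lambda v: v,)):
    """lam*||M.delta||_1 + L_nps(M.delta) + mean_{x, t} L(F(x + t(M.delta)), y).
    F is a hypothetical linear model with weights w."""
    md = masked(delta, mask)
    reg = lam * sum(abs(v) for v in md)   # p = 1, matching the L1 norm in the text
    total, count = 0.0, 0
    for x in dataset:
        for t in transforms:
            td = t(md)
            score = sum(wi * (xi + di) for wi, xi, di in zip(w, x, td))
            total += logistic_loss(score, y_target)
            count += 1
    return reg + nps_fn(md) + total / count

data = [[-0.5, -0.5, 0.0, 0.0]]
mask = [0, 0, 1, 1]
w = [1.0, 1.0, 1.0, 1.0]
base = rp2_style_objective([0.0] * 4, mask, data, 1, w)
inside = rp2_style_objective([0.0, 0.0, 2.0, 2.0], mask, data, 1, w)
outside = rp2_style_objective([2.0, 2.0, 0.0, 0.0], mask, data, 1, w)
print(inside < base, outside == base)   # True True
```

A perturbation outside the mask has no effect at all, which is exactly what the mask is for.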
To mimic perspective changes in the physical world as closely as possible
The adversarial UV texture is wrapped over the vehicle, and by changing the camera position the vehicle is rendered into multi-view images. Thus, the adversarial UV texture is trained to optimize the following objective function: $$ \arg\min_\delta\mathbb{E}_{x\sim X,e\sim\mathbf{E}}\left[\frac{1}{n}\sum_{p_i\in P}\mathcal{L}(\mathcal{F}(x_{adv}^d,p_i),y)\right], $$
where $\mathbf{E}$ denotes the environment condition determined by the physical renderer, such as different viewpoints and distances; $P$ indicates the output proposals of each image for the two-stage detector (e.g., Faster R-CNN), and $n$ is the number of proposals.
Adversarial viewpoints
Recently, Dong et al. [87] demonstrated that there exist adversarial viewpoints, under which captured images are hard for DNN models to recognize. They leveraged the Neural Radiance Fields (NeRF) technique to find the adversarial viewpoints by solving the following problem: $$ \max_{p(v)} \left\{ \mathbb{E}_{p(v)}[\mathcal{L}(\mathcal{F}(\mathcal{G}(v)),y)]+\lambda\cdot\mathcal{H}(p(v)) \right\} $$ where $p(v)$ denotes the adversarial viewpoint distribution; $\mathcal{G}(v)$ is the render function of NeRF, which renders an image from the input viewpoint; and $\mathcal{H}(p(v))=-\mathbb{E}_{p(v)}[\log p(v)]$ is the entropy of $p(v)$.
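With a finite set of candidate viewpoints, $p(v)$ becomes a probability vector and the objective can be evaluated directly. The per-viewpoint `losses` below are hypothetical numbers standing in for $\mathcal{L}(\mathcal{F}(\mathcal{G}(v)),y)$; the NeRF renderer is not modeled. The entropy term rewards spreading probability mass over several adversarial viewpoints instead of collapsing onto one.

```python
import math

def entropy(p):
    """H(p) = -E_p[log p] for a discrete distribution (0 log 0 := 0)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def viewpoint_objective(p, losses, lam=0.1):
    """E_{p(v)}[L(F(G(v)), y)] + lam * H(p(v)) over a discrete set of viewpoints.
    losses[k] stands in for the loss on the image rendered at viewpoint k."""
    assert abs(sum(p) - 1.0) < 1e-9, "p must be a probability vector"
    expected_loss = sum(pi * li for pi, li in zip(p, losses))
    return expected_loss + lam * entropy(p)

losses = [0.1, 0.2, 2.5, 2.3]   # hypothetical: viewpoints 2 and 3 fool the model
uniform = [0.25] * 4
peaked = [0.0, 0.0, 0.5, 0.5]
# The distribution concentrated on adversarial viewpoints scores higher.
print(viewpoint_objective(uniform, losses), viewpoint_objective(peaked, losses))
```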
To alleviate the influence of deformation
Xu et al. [14] took the Thin Plate Spline (TPS) [88] method into account when optimizing the wearable adversarial patch, to model the topological transformation from texture to cloth caused by body movement. Specifically, they construct the adversarial examples as follows: $$ x_{adv}^d=t_{env}\left(A+t\left(B-C+t_{color}\left(M_{c,i}\circ t_{TPS}(\delta+\mu v)\right)\right)\right), $$ where $t_{env}\in\mathcal{T}$ indicates the environmental brightness transform, $t_{color}$ is a regression model that learns the color conversion between the digital image and its printed counterpart, $t_{TPS}$ denotes the TPS transform, $A$ is the background region except the person, $B$ is the person-bounded region, $C$ is the cloth region of the person, and $v\sim\mathcal{N}(0,1)$ is noise that improves the diversity of the perturbation.
Other PAE Topics
The Natural Physical Adversarial Attacks
Physical adversarial attacks often prioritize high attack performance while ignoring the extent of the modifications made by adversarial patches or camouflages. However, noticeable alterations can alert potential victims, leading to the failure of the attack. To address this, research has concentrated on creating subtle perturbations that can be deployed in real-world scenarios without detection, enabling natural physical adversarial attacks. The primary techniques employed in this area are divided into two categories:

Optimization-based Methods: These methods focus on refining individual adversarial examples to ensure that they are imperceptible while still effective in attacking the target model.

Generative Model-based Methods: In contrast to optimization-based methods, these approaches operate within the latent space of generative models that are trained on data. They leverage the learned distribution to generate adversarial examples that are both effective and difficult to detect.
The goal of this research is to develop adversarial attacks that maintain a natural appearance, increasing their stealthiness and likelihood of success when deployed against real-world systems.
Optimization-based methods
Introducing another metric function.
Initially, researchers attempted to make adversarial patches look like a particular benign patch to expose the security problems of deep learning models. A classical method applies the total-variation optimization objective mentioned in the previous sections, which improves the naturalness of the adversarial example in addition to its printability. Duan et al. propose AdvCam [91], which minimizes $\mathcal{L}_s$, $\mathcal{L}_c$, and $\mathcal{L}_{tv}$. The naturalness loss can be formalized as: $$ \mathcal{L}_{\text{nature}}=\mathcal{L}_s+\mathcal{L}_c+\mathcal{L}_{tv}. $$
 the style distance $\mathcal{L}_s$ between the patch and the reference image
 the content distance $\mathcal{L}_c$ between the patch and the background
 the smoothness loss $\mathcal{L}_{tv}$ defined by the total-variation loss (minimizing it maximizes smoothness)
Generative model-based methods
In general, the generative method can be formulated as: $$ \mathcal{L}_{natural}=\mathbb{E}_{x\sim P_{real},y\sim P_{adv}}[\mathcal{D}(x,y)], $$ where $x\sim P_{real}$ are real data sampled from the training dataset, $P_{adv}$ is the distribution generated by the attack model $G_{\theta}(P_{real})$, and $\mathcal{D}(\cdot,\cdot)$ is the predefined (in VAE models) or adversarially learned (in GAN models) distance metric. Specifically, $\tilde{\mathcal{D}}(x,y)=-\log(D_\theta(x))-\log(1-D_\theta(y))$ in the vanilla GAN, where $D_\theta(\cdot)$ is the adversarially trained discriminator network.
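For the vanilla GAN case, $\tilde{\mathcal{D}}$ depends only on the discriminator's outputs, so it can be computed from probabilities. `natural_loss` averages it over paired real/generated samples as an empirical estimate of the expectation; the pairing and the `eps` clamp are implementation choices assumed here. The discriminator is trained to make this quantity small, and a generated sample that the discriminator scores as real ($D_\theta(y)$ near 1) makes it large.

```python
import math

def vanilla_gan_distance(d_real, d_fake, eps=1e-12):
    """D~(x, y) = -log D(x) - log(1 - D(y)), where d_real = D_theta(x) and
    d_fake = D_theta(y) are discriminator probabilities; eps guards log(0)."""
    return -math.log(max(d_real, eps)) - math.log(max(1.0 - d_fake, eps))

def natural_loss(d_real_batch, d_fake_batch):
    """Empirical estimate of E_{x~P_real, y~P_adv}[D~(x, y)] over paired samples."""
    pairs = list(zip(d_real_batch, d_fake_batch))
    return sum(vanilla_gan_distance(a, b) for a, b in pairs) / len(pairs)

print(vanilla_gan_distance(0.5, 0.5))                                    # 2 log 2
print(vanilla_gan_distance(0.9, 0.1) < vanilla_gan_distance(0.9, 0.9))  # True
```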
The Transferable Physical Adversarial Attacks (Transferability)
The transferability of a PAE measures whether the adversarial example remains effective across models.
Previous work on adversarial attacks in the digital world has shown that the same adversarial sample can exhibit generic attack capabilities for different deep learning models [101].
Formally, referring to Eq. (1), for the generator $\delta(x)$ trained to maximize $\mathcal{D}(y^x,\mathcal{F}_1(x_{adv}^p))$, s.t. $\Vert x_{adv}^p\Vert_{\aleph}<\varepsilon$, the scenario of transferable physical adversarial attacks requires that the adversarial example $\delta(x)$ be evaluated and tested on other models:
$$ \mathcal{D}(y^x,\mathcal{F}_2(x_{adv}^p)),\quad x_{adv}^p=x+\mathcal{R}(\mathcal{M}(\delta(x)),c), $$
where $\mathcal{F}_1$ and $\mathcal{F}_2$ are different models.
The Generalized Physical Adversarial Attacks (Robustness)
The generalization ability of physical adversarial attacks is another key to studying the limitations of deep learning models in the real world.
In general, generalization over different target objects and generalization over different transformations are the two important problems to consider.
Formally, referring to Eq. (1), consider the generator $\delta(x)$ trained to maximize $\mathcal{D}(y^x,\mathcal{F}(x_{adv}^p))$, s.t. $\Vert x_{adv}^p\Vert_\aleph<\varepsilon$, with
$$ x_{adv}^p=x+\mathcal{R}(\mathcal{M}(\delta(x)),c),\quad x\sim P_x(x),\ c\sim P_c(c). $$
The scenario of generalized physical adversarial attacks requires that the adversarial example $\delta(x)$ be evaluated on other datasets and environment conditions, i.e., tested under:
$$ x_{adv}^p=x+\mathcal{R}(\mathcal{M}(\delta(x)),c),\quad x\sim P_x^{\prime}(x),\ c\sim P_c^{\prime}(c), $$ where $P_x$ and $P_x^{\prime}$ are different data distributions, and $P_c$ and $P_c^{\prime}$ are different environmental condition distributions.
Section IV. Confronting Physical Adversarial Examples
These threats make it necessary to protect intelligent applications.
Mainstream strategies:
 data-end defenses
 model-end defenses
Defend against PAEs
We still take the three processes of PAEs as the starting point, consider the two perspectives of the data end and the model end, and explore possible defenses in each direction.
Data-end Defense Strategies
The data-end defense strategies aim to reduce the influence of adversarial perturbations, so that the sampled adversarial examples can no longer mislead the deep models in deployed systems.
Adversarial detection
Determine whether the input instances are adversarial; if so, reject the input and thereby evade the attack.
The idea is usually simple, but implementations follow different strategies.
Summary of Adversarial Detection Methods
SentiNet (Chou et al.)
 Detects universal adversarial patches.
 No model modifications required.
 Practical for real-world scenarios.
Ad-YOLO (Ji et al.)
 Utilizes the YOLO architecture with an added “patch” class label.
 More effective at detecting adversarial patches than standard YOLO.
TaintRadar (Li et al.)
 Detects localized adversarial examples by identifying regions that cause significant label variance.
 Demonstrates effectiveness in both digital and physical environments.
Segmentation Approach (Liu et al.)
 Trains a patch segmenter and performs shape completion to detect and remove adversarial patches from images.
Patch-Feature Energy-Driven Method
 Removes the deep features of adversarial patches to protect detection models.
PatchZero
 Detects and nullifies adversarial patches to mitigate their influence.
Each method addresses different aspects of detecting and mitigating adversarial attacks in machine learning models.
Adversarial denoising
This kind of defense prevents models from being fooled by adversarial attacks at the instance level, i.e., by directly removing the injected perturbations or noise inside the adversarial examples. It can also be combined with the aforementioned adversarial detection strategy for better defense.
A series of methods follow this idea:
Summary of Adversarial Defense Methods
Instance-Level Defense
 Goal: Prevent models from being fooled by removing perturbations or noise within adversarial examples.
 Combination: Can be combined with adversarial detection strategies for enhanced defense.
Local Gradient Smoothing (LGS) (Naseer et al.)
 Targets physical attacks such as Localized and Visible Adversarial Noise (LaVAN) and adversarial patches.
 Estimates regions with a high probability of containing adversarial noise.
 Reduces gradient activity in these regions so that adversarial examples are correctly recognized.
Occlusion Method (McCoyd et al.)
 Mitigates the influence of adversarial patches by partially occluding the image around candidate patch locations.
 Considered a form of denoising that destroys adversarial patches through occlusion.
Adversarial Pixel Masking (APM)
 Defends against physical attacks, such as adversarial patches.
 Trains an adversarial pixel mask module and removes patches based on the generated mask.
PatchZero
 Functions as a denoising strategy.
 Combines adversarial detection and denoising to tackle adversarial attacks.
These methods enhance the robustness of models by either directly removing adversarial perturbations or by combining detection and denoising techniques.
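LGS is only summarized above; the sketch below follows its general idea under stated simplifications. It assumes the suppression rule $\hat{x}=x\odot(1-\lambda\,g(x))$, with $g$ a normalized gradient-magnitude map, and it thresholds per pixel instead of over windows; `lam` and `threshold` are illustrative values, not published settings.

```python
import math

def gradient_magnitude(img):
    """First-order gradient magnitude with forward differences (zero past the edges)."""
    h, w = len(img), len(img[0])
    g = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            dx = img[i][j + 1] - img[i][j] if j + 1 < w else 0.0
            dy = img[i + 1][j] - img[i][j] if i + 1 < h else 0.0
            g[i][j] = math.sqrt(dx * dx + dy * dy)
    return g

def local_gradient_smoothing(img, lam=1.0, threshold=0.1):
    """Simplified LGS-style filter: x_hat = x * (1 - lam * g(x)), where g is the
    normalized gradient magnitude, zeroed below `threshold`, and the factor is
    clipped to [0, 1]. Per-pixel thresholding replaces window selection."""
    g = gradient_magnitude(img)
    g_max = max(max(row) for row in g) or 1.0   # avoid dividing by zero on flat images
    out = []
    for i, row in enumerate(img):
        new_row = []
        for j, v in enumerate(row):
            gn = g[i][j] / g_max
            gn = gn if gn >= threshold else 0.0
            factor = min(max(1.0 - lam * gn, 0.0), 1.0)
            new_row.append(v * factor)
        out.append(new_row)
    return out

spot = [[0.5] * 4 for _ in range(4)]
spot[1][1] = 1.0   # a high-contrast 'patch' pixel
print(local_gradient_smoothing(spot)[1][1])   # 0.0: strongest-gradient pixel suppressed
```

Smooth regions pass through unchanged, while high-frequency regions typical of localized patches are attenuated.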
Adversarial prompting
Add information to offset the negative impact of adversarial perturbations, prompting the models toward the labels they should truly predict via positive injections.
Summary of Adversarial Prompting Defense Methods
Adversarial Prompting
 Goal: Achieve defense by adding information to counteract the negative impacts of adversarial perturbations, prompting models towards correct predictions with positive injections.
Unadversarial Examples (Salman et al.) [118]
 Generates textures with prompting ability in a 3D environment.
 Creates “robust objects” based on deep models' sensitivity to input perturbations.
 Provides a new approach for physical adversarial defenses.
Preemptive Robustification (Moon et al.) [119]
 Defends against intercept-and-perturb behaviors in real scenarios.
 Utilizes a bi-level optimization scheme to discover robust perturbations that can be added to images.
Defensive Patch (Wang et al.) [120]
 Pre-injects positive patches into instances to aid image recognition.
 Enhances prompting intensity with strong global perceptual correlations and local identifiable patterns.
 Effective against both adversarial patches and common corruptions.
Amicable Aid (Unnamed Study) [121]
 Generates visual prompting perturbations from the perspective of the underlying manifold.
 Provides universal improvement for classification.
Class-wise Adversarial Visual Prompting (Chen et al.) [122]
 Addresses the ineffectiveness of universal visual prompting.
 Proposes class-specific adversarial visual prompting for enhanced effectiveness.
Angelic Patch (Si et al.)
 Investigates visual adversarial prompting to enhance detection abilities of detectors.
These methods use various forms of positive injections and perturbations to guide models towards correct predictions, countering adversarial attacks.