Towards Evaluating the Robustness of Neural Networks: Collected Paper Notes

https://lifengjun.xin/2020/03/14/【论文笔记】-Towards-Evaluating-the-Robustness-of-Neural-Networks/

The video below is the author's own oral presentation; it walks through the ideas of the paper and is well worth watching.

https://www.bilibili.com/video/av884481653/

The paper first formalizes the process of generating adversarial examples and casts it as an optimization problem to be solved.
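Concretely, the initial formulation in the paper is (with D a distance metric such as L0, L2, or L∞, and t the target class):

```latex
\begin{aligned}
\text{minimize}\quad & D(x,\; x+\delta) \\
\text{such that}\quad & C(x+\delta) = t \\
& x+\delta \in [0,1]^n
\end{aligned}
```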

The constraint C(x+δ) = t is then replaced by minimizing an objective (loss) function f that is satisfied, i.e. f(x+δ) ≤ 0, exactly when the input is classified as t.
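The relaxed problem the paper actually solves then looks roughly like this, where the constant c > 0 is chosen by binary search:

```latex
\begin{aligned}
\text{minimize}\quad & \|\delta\|_p + c \cdot f(x+\delta) \\
\text{such that}\quad & x+\delta \in [0,1]^n
\end{aligned}
```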

Next comes constructing this loss function. How should it be built?

The paper proposes seven different candidate constructions (f_1 through f_7).
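The one used in the paper's final attacks (a variant of its f_6, defined on the logits Z rather than the softmax output) is:

```latex
f(x') = \max\Bigl(\max_{i \neq t} Z(x')_i - Z(x')_t,\; -\kappa\Bigr)
```

Here κ controls how confidently the adversarial example must be classified as t; the paper uses κ = 0 by default.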

Next, how should the attack proposed in the paper be evaluated?

1、Compare its effectiveness against previously published attacks.

The authors made an important advance against existing defenses: their new attack effectively bypasses defensive distillation, a defense proposed in 2016, while introducing very little distortion.

The authors then discuss how to judge whether a newly proposed defense is actually effective, and make two recommendations:

1、Release source code

2、Evaluate against the strongest attacks as a baseline

Q&A:

1、What is the computational overhead of generating the attacks discussed earlier, and can that cost be reduced?

The method I use is not particularly fast; on average it takes roughly 30 seconds to a minute to produce an attack.

There are other systems that generate attacks much faster, but the results are roughly four to five times worse, depending on how you measure, so whether that trade-off is worthwhile depends on your objective. An online system, for example, would have to prioritize speed.

2、How do you choose which pixels to change?

There are multiple ways to choose which pixels of the image to adjust; we evaluate the attack under three distinct distance metrics.

One of them, which I showed, is L0.

With that approach, you start by allowing every pixel to be modified and then gradually shrink the set of modified pixels until only a few remain.

The alternative is to use a different distance metric where, instead of changing a minimal number of pixels, the goal is to change as many pixels as needed but only by a very small amount each.
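As a rough illustration of the pixel-shrinking idea (not the paper's exact procedure, which uses gradient information to decide which pixels to freeze), here is a minimal sketch; l2_attack is a hypothetical helper that returns an adversarial image restricted to a pixel mask, or None if it fails:

```python
import numpy as np

def l0_style_attack(x, target, l2_attack):
    """Iteratively shrink the set of pixels that are allowed to change.

    x         : clean image as a float array in [0, 1]
    target    : target class t
    l2_attack : hypothetical helper; l2_attack(x, target, mask) returns an
                adversarial image that differs from x only where mask == 1,
                or None when no such example can be found.
    """
    mask = np.ones_like(x)              # start by allowing every pixel to change
    best = None
    while mask.any():
        adv = l2_attack(x, target, mask)
        if adv is None:                 # attack failed on the restricted pixel set
            break
        best = adv
        delta = np.abs(adv - x)
        # freeze the allowed pixel whose change mattered least
        # (a simplification; the paper weights this by the gradient)
        contrib = np.where(mask == 1, delta, np.inf)
        mask[np.unravel_index(np.argmin(contrib), x.shape)] = 0
    return best                          # last successful adversarial example
```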

3、Your presentation assumes a white-box setting; can you comment on how robust your attack would be in a black-box setting?

Essentially, to attack without assuming access to the specific model, you take the method you already have and train your own model.

Assuming the defender has already trained their model, when I train my own model I hope that, because they learn from similar data, the two models end up with similar decision boundaries.

I then run the attack against my own model. It turns out there is a transferability property: attacks generated against my model are often also effective against yours, so even without knowing the details of your model or having access to its parameters, I can use this strategy to attack it.
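A very rough sketch of that substitute-model strategy; every name here (train_model, cw_attack, target_model) is a hypothetical placeholder rather than an API from the paper:

```python
def black_box_transfer_attack(x, target_class, training_data,
                              train_model, cw_attack, target_model):
    """Black-box attack via transferability.

    1. Train a local substitute on data similar to what the unknown target
       model was trained on, hoping both models learn similar decision
       boundaries.
    2. Run the white-box attack against the substitute.
    3. Check whether the adversarial example also fools the target model.
    """
    substitute = train_model(training_data)              # step 1
    adv = cw_attack(substitute, x, target_class)         # step 2 (white-box)
    transferred = target_model(adv) == target_class      # step 3
    return adv, transferred
```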

4、Have you evaluated whether the adversarial perturbations you generate transfer across different models, i.e. in a black-box setting?

Do these attacks transfer, and can you use the method I described to control them so that they do transfer? Yes, but to transfer reliably you generally need to increase the distortion somewhat. How much depends on how much access you have to the other model and on how much the defense depends on that particular model. If I know nothing about the other model, I might need roughly two to three times more distortion; if I can make oracle queries against it, I can get away with significantly less.

5、One aspect of your technique, arguably a drawback, is that it keeps distorting the image until the classification changes, however large that distortion ends up being, and that amount of distortion is then what you measure. Could you comment on that?

Yes, there are two types of attacks. One type always tries to minimize the total distortion, accepting whatever distortion is needed to succeed; the other fixes a distortion budget and tries to do as well as possible within it. My method is of the former type: it always minimizes the total distortion. If you want the other behaviour, you can set a threshold at any given point and report failure whenever more distortion than that would be required; because the objective is to minimize total distortion, this rarely fails.
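Stated as optimization problems, the two attack types are roughly:

```latex
\begin{aligned}
\text{(1) minimum distortion:}\quad & \min_{\delta}\ \|\delta\| \quad \text{s.t. } C(x+\delta) = t \\
\text{(2) bounded distortion:}\quad & \min_{\delta}\ \mathcal{L}\bigl(x+\delta,\, t\bigr) \quad \text{s.t. } \|\delta\| \le \epsilon
\end{aligned}
```

The paper's attack is of type (1); thresholding its distortion turns it into a type (2) attack.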
