[Paper Reading] APMSA: Adversarial Perturbation Against Model Stealing Attacks (2023)
This paper proposes a defense against model stealing attacks based on adversarial confidence perturbation (APMSA). An attacker pays for queries to an MLaaS API to obtain the confidence distribution of the target model (the model under attack, MUA) and then reconstructs a substitute model, threatening the model's privacy and business value. To address this, the authors add delicate noise to each input query so that the returned confidence distribution lies close to the decision boundary, hiding the information an attacker could exploit. The method requires no change to the original model and can be deployed as a plug-in front end. Experiments on CIFAR10 and GTSRB validate its effectiveness: it substantially degrades the inference accuracy of the stolen model while leaving normal users' hard-label inference accuracy unaffected.
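A plausible way to formalize this per-query perturbation (my own reading of the abstract; the paper's exact objective may differ) is as a small-budget optimization over an input noise $\delta$:

$$\max_{\|\delta\|_\infty \le \epsilon} \; H\big(f(x+\delta)\big) \quad \text{s.t.} \quad \arg\max_k f_k(x+\delta) = \arg\max_k f_k(x),$$

where $f$ denotes the softmax output of the MUA and $H$ is the entropy; the constraint keeps the hard label, and hence normal users' accuracy, unchanged, while the returned confidence is pushed toward the decision boundary.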

Abstract
Training a Deep Learning (DL) model requires proprietary data and computing-intensive resources. To recoup their training costs, a model provider can monetize DL models through Machine Learning as a Service (MLaaS). Generally, the model is deployed in the cloud while exposing a publicly accessible Application Programming Interface (API) for paid queries. However, model stealing attacks pose a security threat to this monetization scheme, since they let an attacker steal the model and avoid paying for future extensive queries. Specifically, an adversary queries a targeted model to obtain input-output pairs and thus infers the model's internal working mechanism by reverse-engineering a substitute model, which deprives the model owner of its business advantage and leaks the privacy of the model. In this work, we observe that the confidence vector or the top-1 confidence returned by the model under attack (MUA) varies to a relatively large degree across different queried inputs. Therefore, rich internal information of the MUA is leaked to the attacker, which facilitates her reconstruction of a substitute model. We thus propose to leverage adversarial confidence perturbation to hide such varied confidence distributions across different queries, consequently defending against model stealing attacks (dubbed APMSA). In other words, the confidence vectors returned for queries from a specific category are now similar, considerably reducing information leakage from the MUA. To achieve this objective, through automated optimization we constructively add delicate noise into each input query to make its confidence close to the decision boundary of the MUA. Generally, this process is achieved in a manner similar to crafting adversarial examples, with the distinction that the hard label is preserved to be the same as that of the queried input. This retains the inference utility (i.e., without sacrificing inference accuracy) for normal users but bounds the confidence information leaked to the attacker within a small constrained area (i.e., close to the decision boundary). The latter greatly deteriorates the accuracy of the attacker's substitute model. As APMSA serves as a plug-in front end and requires no change to the MUA, it is generic and easy to deploy. The high efficacy of APMSA is validated through experiments on the CIFAR10 and GTSRB datasets. Given a MUA of ResNet-18 on CIFAR10, our defense can degrade the accuracy of the stolen model by up to 15% (rendering the stolen model useless to a large extent) with a 0% accuracy drop for normal users' hard-label inference requests.
In short, the paper presents an adversarial-perturbation-based defense against model stealing attacks that demonstrates strong protective performance.
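To make this concrete, here is a minimal sketch of such a confidence-perturbation step in PyTorch, assuming a PGD-style optimization that maximizes output entropy; `model`, `epsilon`, `step_size`, and `num_steps` are illustrative choices, not the authors' released implementation.

```python
# Hedged sketch: push the confidence of a query toward the decision boundary
# while preserving its top-1 (hard) label. All hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def perturb_query(model, x, epsilon=8 / 255, step_size=1 / 255, num_steps=20):
    """Return a perturbed copy of x whose confidence vector is close to the
    decision boundary (high entropy) but whose hard label is unchanged."""
    model.eval()
    with torch.no_grad():
        orig_label = model(x).argmax(dim=1)

    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(num_steps):
        probs = F.softmax(model(x + delta), dim=1)
        # Gradient ascent on the output entropy drives the confidence toward
        # uniform, i.e., toward the decision boundary of the MUA.
        entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=1).mean()
        entropy.backward()
        with torch.no_grad():
            prev = delta.clone()
            delta += step_size * delta.grad.sign()   # ascend the entropy
            delta.clamp_(-epsilon, epsilon)          # keep the noise "delicate"
            new_label = model((x + delta).clamp(0, 1)).argmax(dim=1)
            flipped = new_label != orig_label
            delta[flipped] = prev[flipped]           # revert steps that would flip the hard label
        delta.grad.zero_()
    return (x + delta).clamp(0, 1).detach()
```

Because the hard label never flips, a normal user who only consumes top-1 predictions sees no accuracy drop, while the attacker observes nearly identical, near-boundary confidence vectors within each class.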
Related Work
Model Stealing
Model stealing attacks mainly follow a query-predict-retrain procedure. Using the API exposed by an MLaaS service, the attacker submits a query input x and obtains the prediction y. Depending on the service, the prediction may be returned as a probability vector (the confidence that x belongs to each class) or only as a label (e.g., top-1). The attacker then uses the collected input-output pairs to train a substitute model that is functionally similar to the MUA. Afterwards, whenever inference is needed, the attacker simply uses the substitute model and no longer pays for further queries.
In general, the attacker has no access to the architecture or hyperparameters of the MUA, but she can obtain a public dataset whose distribution is similar to that of the training data and select suitable samples from it to query the API. In model stealing attacks that threaten model privacy, the attacker typically issues an acceptable (usually small) number of API queries on such public data and uses the predictions returned by the MUA to build a "transfer dataset" for training the substitute (stolen) model, thereby compromising the MUA's privacy and undermining its business value. Notably, beyond enabling free local inference in place of paid queries, the stolen model can also be used to generate transferable adversarial examples that fool the MUA [12], [13].
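As a point of reference, the sketch below illustrates the query-predict-retrain pipeline described above; `victim_api`, `public_loader`, and the distillation loss are hypothetical stand-ins for the paid API, the public dataset, and the attacker's training recipe.

```python
# Hedged sketch of a confidence-based model stealing attack (illustrative only).
import torch
import torch.nn.functional as F

def steal_model(victim_api, public_loader, substitute, epochs=10, lr=1e-3):
    """Query the victim on public data, collect (input, confidence-vector) pairs,
    and distill them into a substitute model."""
    # 1. Build the "transfer dataset" from API responses.
    transfer_set = []
    for x, _ in public_loader:                  # labels of the public data are ignored
        with torch.no_grad():
            soft_labels = victim_api(x)         # confidence vectors returned by the MLaaS API
        transfer_set.append((x, soft_labels))

    # 2. Retrain the substitute to imitate the victim's outputs (soft-label distillation).
    optimizer = torch.optim.Adam(substitute.parameters(), lr=lr)
    for _ in range(epochs):
        for x, soft_labels in transfer_set:
            log_probs = F.log_softmax(substitute(x), dim=1)
            loss = F.kl_div(log_probs, soft_labels, reduction="batchmean")
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return substitute
```

APMSA targets step 1: when every returned confidence vector within a class looks nearly the same and sits near the decision boundary, the transfer dataset carries far less information about the MUA, and the distilled substitute degrades accordingly.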
model under attack (MUA): the model targeted by the stealing attack.
[12] N. Papernot et al., "Practical black-box attacks against machine learning," in Proceedings of AsiaCCS, Apr. 2017, pp. 506–519.
[13] X. Liu et al., "ATMPA: A framework for attacking machine learning-based malware visualization detection methods using adversarial examples," in Proceedings of IWQoS, Jun. 2019, pp. 1–10.
Defense Strategies

Method

Paper Link
APMSA: Adversarial Perturbation Against Model Stealing Attacks
