Recent Artificial Intelligence (AI) achievements have depended on the availability of massive amounts of labeled data. AlphaGo (Silver et al. 2016) used 30 million moves from 160,000 actual games. The ImageNet dataset (Deng et al. 2009) has over 14 million images.
AI has advanced rapidly on large amounts of labeled data, but in reality:
poor data quality (dirty data, sparsely labeled data, unlabeled data)
fragmented, siloed data
the need to protect data privacy
make joint use of data across parties very difficult.
Meanwhile, the vast majority of enterprises own only limited data (few samples and features), and some hold only unlabeled data; such low-quality, barely usable data is highly unfavorable for training data-hungry AI models.
Google first introduced a federated learning (FL) system (McMahan et al. 2016), in which a global machine learning model is updated by a federation of distributed participants while keeping their data local.
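The federated-averaging idea behind such a system can be sketched as follows; this is a minimal simulation (model, data sizes, learning rate, and round counts are all illustrative assumptions, not the paper's setup), showing that only model parameters leave the clients, never raw data.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Each client holds its own local data, which is never sent to the server.
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

w = np.zeros(2)  # global model held by the server
for _round in range(100):
    local_ws = []
    for X, y in clients:
        w_local = w.copy()
        for _ in range(5):  # a few local gradient steps on private data
            grad = 2 * X.T @ (X @ w_local - y) / len(y)
            w_local -= 0.05 * grad
        local_ws.append(w_local)
    # The server only averages the locally updated parameters.
    w = np.mean(local_ws, axis=0)
```

After enough rounds the averaged global model recovers the underlying weights, even though no client ever shared its samples.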
In reality, however, the set of common entities can be small, making a federation less attractive and leaving the majority of non-overlapping data underutilized.
The paper therefore proposes a feasible solution: Federated Transfer Learning (FTL).
Main contributions:
We introduce federated transfer learning in a privacy-preserving setting to provide solutions for federation problems beyond the scope of existing federated learning approaches;
We provide an end-to-end solution to the proposed FTL problem and show that the convergence and accuracy of the proposed approach are comparable to the non-privacy-preserving approach;
We provide a novel approach for adopting additively homomorphic encryption (HE) for multi-party computation (MPC) with neural networks, such that only minimal modifications to the neural network are required and the accuracy is almost lossless, whereas most existing secure deep learning frameworks suffer a loss of accuracy when adopting privacy-preserving techniques.
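The additive homomorphism being exploited can be demonstrated with a toy Paillier cryptosystem: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a party can aggregate encrypted values it cannot read. The tiny primes below are an illustrative assumption for readability only (a real deployment uses ~2048-bit moduli); this is a sketch of the primitive, not the paper's implementation.

```python
import math
import random

# Toy Paillier key generation with insecure, illustrative primes.
p, q = 293, 433
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # decryption helper

def encrypt(m):
    r = random.randrange(1, n)        # random blinding factor
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Additive homomorphism: E(m1) * E(m2) decrypts to m1 + m2 (mod n).
c1, c2 = encrypt(17), encrypt(25)
assert decrypt((c1 * c2) % n2) == 42
```

This is exactly the property that lets encrypted gradients and hidden-representation terms be summed by the other party without decryption.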
DeepSecure: uses Yao's Garbled Circuit Protocol for data encryption instead of HE.
SML (secure multi-party computation): uses secret sharing and Yao's Garbled Circuits for encryption, and supports collaborative training for linear regression, logistic regression, and neural networks.
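The secret-sharing building block used by such MPC frameworks can be sketched in a few lines: a value is split into random shares that individually reveal nothing, yet support addition share-by-share. The modulus and values below are illustrative assumptions.

```python
import random

P = 2**31 - 1  # a public prime modulus (illustrative choice)

def share(x):
    """Split x into two random additive shares that sum to x mod P."""
    s0 = random.randrange(P)
    return s0, (x - s0) % P

def reconstruct(s0, s1):
    return (s0 + s1) % P

# Each party adds its shares locally; reconstructing the summed shares
# yields the sum of the secrets without either input being revealed.
a0, a1 = share(100)
b0, b1 = share(23)
assert reconstruct((a0 + b0) % P, (a1 + b1) % P) == 123
```

Multiplication of shared values needs extra machinery (e.g. garbled circuits or precomputed triples), which is where the heavier protocols above come in.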
Differential Privacy
Transfer Learning ------- a powerful technique for small datasets and weakly supervised settings
In recent years there has been a tremendous amount of research on applying transfer learning techniques to various fields such as image classification (Zhu et al. 2010) and sentiment analysis (Pan et al. 2010; Li et al. 2017). The performance of transfer learning relies on how related the domains are. Intuitively, parties in the same data federation are usually organizations from the same or related industries, and are therefore more amenable to knowledge propagation.
In summary, we provide both data security and performance gain in the proposed FTL framework. Data security is provided because the raw data D_A and D_B, as well as the local models Net_A and Net_B, are never exposed, and only the encrypted common hidden representations are exchanged. In each iteration, the only non-encrypted values party A and party B receive are the gradients of their model parameters, which are aggregated from all sample variants. Performance gain is provided by the combination of transfer learning, transfer cross-validation, and a safeguard with the self-learning supervised model.
Experiments
Experiments were conducted on two datasets (dataset 1: NUS-WIDE; dataset 2: Default-Credit).