论文笔记:Secure Federated Matrix Factorization
目录
- 1. Introduction
- 2. Preliminary Analysis
- 3. Personal-Level Distributed Matrix Factorization
- 4. Gradients Reveal Information
- 5. FedMF: A Framework for Federated Matrix Factorization
- 6. Prototype Implementation and Performance Evaluation
-
-
6.1 Prototype Development
- 6.2 Performance Assessment
-
7.Conclusion and Future Work
-
1. Introduction
In recommendation systems, two types of users' private information are exposed in traditional MF (Matrix Factorization):
(i) users' explicit preference records
(ii) users' understood latent feature representations
Existing research has explored privacy-preserving matrix factorization (MF) techniques in two primary categories.
(1) Data obfuscation techniques: These methods render users' preference data inconspicuous prior to submission to the central server, ensuring a certain level of privacy protection, such as through differential privacy mechanisms.
(2) Encryption approaches: Advanced encryption methods, including homomorphic encryption, are employed for implementing privacy-preserving MF in a secure manner.
2.Preliminaries
Horizontal Federated Learning
Horizontal federated learning is applied in scenarios where the data from different contributors share a common feature space but differ in sample datasets.
Additively Homomorphic Encryption


3.User-level Distributed Matrix Factorization
The matrix M is an element of the Cartesian product of sets [n] and [m], denoting user-item rating pairs where a rating has been generated. M represents the total number of ratings, while ri,j denotes the specific rating given by user i to item j.
The matrix decomposition represents the issue as an estimation of a bilinear model based on existing ratings. The computation of U and V can be achieved through solving the subsequent regularized least squares minimization problem.




4.Gradients Leak Information
The gradients of a user-uploaded file can be known in two continuous steps, enabling us to infer this user's rating information.
5.FedMF: Federated Matrix Factorization
The server operates with honesty and curiosity about sensitive information, while ensuring that a user base composed solely of honest individuals benefits from enhanced security measures.
Matrix Factorization: Major steps include,
The server enciphers item profile V with a public key, generating ciphertext CV. From this point onward, the most recent ciphertext CV will be accessible for all users to download.
Each user retrieves the latest ciphertext CV from the server and decrypts it with a private key to obtain plain text V. This plaintext V is then used to perform a local update and calculate gradient G. Subsequently, G is enciphered using a public key to produce ciphertext CV. Finally, a TLS/SSL secure channel is established, enabling CV to be securely transmitted back to the server through this channel.
Upon receiving an encrypted gradient from a user, the server updates item profile V using this ciphertext. Following this update, once more prepare for users' downloading of the latest ciphertext.
Steps 2 and 3 are sequentially executed until convergence occurs.

6.Prototype and Evaluation
6.1 Prototype Implementation
use Paillier encryption1 to build a prototype of FedMF
Given publick key and encrypted plaintext, Paillier encryption possesses the homomorphic characteristic functions.

The Paillier encryption algorithm mandates that the plaintext must be a positive integer. To ensure compatibility with this system, it is necessary to modify the encryption method.
Float:将基底指数与小数相乘, 其中乘积结果的整数部分_I_和基底指数_e_被视为浮点数的整数表示, 即浮点数值表示为(I, e)_。
The handling of negative numbers in this system involves setting a maximum value for each plaintext block, denoted as max_num, which is equal to half of n in pk. This approach assumes that every original data value does not exceed this maximum value. Given that n is typically chosen as a very large prime number, such an assumption tends to hold true in practice. By performing modular reduction with respect to n, any original negative values are converted into their corresponding positive counterparts, each exceeding the maximum value by at least one unit.
Within a category referred to as FullTex, users submit differential metrics for every item; each item's metric is assigned a value of zero if the user hasn't provided a rating. The following setup, designated as PartText, allows users to submit gradient data exclusively for items they have rated.
6.2 Evaluation
Dataset 2: the real compiled movie rating dataset from MovieLens, including 100K rating information, compiled by 610 users and 9724 movies.
Parameter settings: Within the Paillier encryption scheme, the public key's length is configured as 1024 bits. The communication channel's bandwidth is established at a rate of 1 gigabit per second. In the matrix factorization phase, both user and item profiles are assigned a dimensionality of 100 dimensions.
Our test experiments are conducted on a server running at 5.0GHz with a 6-core Intel Core CPU and 32GB RAM, which operates at a Windows operating system using Python as the programming language. The homomorphic encryption component in our Python implementation leverages the gmpy 4 module to achieve performance comparable to C++ implementations.
Performance:
The FedMF algorithm will generate similar user and item profiles without being influenced by distributed computing or homomorphic encryption mechanisms. Consequently, the primary objective of these experiments is to evaluate FedMF's computational efficiency.


7.Conclusion and Future Work
PROBLEM
- The computational efficiency of FedMF maintains a satisfactory level when the count of items remains within a reasonable range.
- The system's computational burden exhibits a linear growth trend as the number of items increases.
CHALLENGES
- Homomorphic encryption has seen significant advancements in efficiency.
- When comparing the full-text version to the partial-text version, certain differences emerge.
- Definitions have been enhanced to ensure a higher level of security.
Public-key cryptosystems constructed from composite degree residuosity classes. In the International Conference on the Theory and Applications of Cryptographic Techniques, pages 223–238. Springer, 1999.
Maxwell F Harper and Joseph A Konstan. The history and context of the movielens datasets: ACM Transactions on Interactive Intelligent Systems (TIIS), vol.5, no.4:19, 2016
