Literature
Privacy-Preserving Federated Learning
Federated learning is a machine learning technique that allows independent parties to collaboratively train a global model while keeping their local data decentralized. Recently, this approach has attracted considerable attention because valuable data is dispersed across many sources, and combining it can produce more efficient and accurate machine learning models. Federated learning offers the possibility of generating these complex and comprehensive models from the information of different data providers, without revealing the local data of any participant to the remaining parties involved. This can be achieved by each party training a local model on its own data and then sharing the model's parameters with the other parties and/or a centralized server to generate a global model. Although the data itself remains decentralized, revealing local model parameters to others can result in the training data being exposed to unauthorized parties. Privacy-preserving federated learning has emerged as a research topic to address such problems. This article shares some work in the field of privacy-preserving federated learning, which could be useful in familiarizing yourself with the current state-of-the-art.
The FALKOR protocol is an aggregation-based federated learning protocol in a multi-server setting. The protocol assumes that clients and servers possess shared keys, which are used to generate random masks. These random masks are applied to each client's local model. The masked models, publicly shared by clients, are aggregated using a public map-reduce service. Following that, each server computes its share of the global model, and the global model is made available by revealing these secret shares. This protocol is adaptable to non-linear operations and improves upon naive federated learning protocols by keeping the communication complexity independent of the number of servers.
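The mask-and-aggregate idea behind this kind of protocol can be illustrated with a minimal sketch. This is not FALKOR itself: the key material, the toy PRG, and the model values are all hypothetical, but it shows how clients mask their models with server-shared keys, a public aggregator sums the masked models, and the servers jointly remove the masks so only the aggregate is revealed.

```python
import hashlib

MOD = 2**31 - 1  # toy modulus for masking; a real protocol works in a proper group

def prg(key: bytes, round_id: int, dim: int):
    """Derive a pseudorandom mask vector from a shared key (toy PRG via SHA-256)."""
    out = []
    for t in range(dim):
        h = hashlib.sha256(key + round_id.to_bytes(4, "big") + t.to_bytes(4, "big"))
        out.append(int.from_bytes(h.digest()[:4], "big") % MOD)
    return out

# Hypothetical setup: 3 clients, 2 servers, pairwise client-server shared keys.
clients = ["c0", "c1", "c2"]
servers = ["s0", "s1"]
keys = {(c, s): f"{c}|{s}".encode() for c in clients for s in servers}

dim, rnd = 4, 7
local_models = {"c0": [1, 2, 3, 4], "c1": [5, 6, 7, 8], "c2": [9, 10, 11, 12]}

# Each client masks its model with the sum of the PRG outputs of all its server keys.
masked = {}
for c in clients:
    mask = [0] * dim
    for s in servers:
        r = prg(keys[(c, s)], rnd, dim)
        mask = [(m + x) % MOD for m, x in zip(mask, r)]
    masked[c] = [(w + m) % MOD for w, m in zip(local_models[c], mask)]

# A public aggregator sums the masked models (map-reduce style).
agg = [sum(v) % MOD for v in zip(*masked.values())]

# Each server computes and reveals its share of the total mask; subtracting
# all server shares removes every client's mask from the aggregate.
for s in servers:
    share = [0] * dim
    for c in clients:
        r = prg(keys[(c, s)], rnd, dim)
        share = [(x + y) % MOD for x, y in zip(share, r)]
    agg = [(a - x) % MOD for a, x in zip(agg, share)]

print(agg)  # sum of the plaintext local models: [15, 18, 21, 24]
```

Note that no single server can unmask an individual client's model, since each client's mask is spread across all server keys.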
Bonawitz, Keith, et al. “Practical secure aggregation for federated learning on user-held data”:
In this work, the authors propose several secure aggregation protocols for federated learning systems in a single-server setting. These include masking using one-time pads; secret-sharing mechanisms to recover in user-dropout scenarios; double-masking to overcome possible information leakage in the presence of a malicious server; an efficient key-exchange protocol that reduces the overall communication cost; and finally, a mobile-device-deployable protocol that ensures pairwise secure connections, authentication, and forward secrecy.
This work proposes the use of multiparty homomorphic encryption to train a federated neural network model using a variant of the FedAvg algorithm. In the case of prediction-as-a-service, the encryption of the query and its evaluation are achieved using the collective public key of the participants. A collective key-switching protocol is applied to switch the result to the public key of the querier.
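The core trick in pairwise-mask secure aggregation of this kind is that every pair of clients shares a seed; one of the pair adds the derived mask and the other subtracts it, so the masks cancel in the server's sum. A minimal sketch (the seeds here are hard-coded stand-ins; in the actual protocol they come from a Diffie-Hellman key agreement, and the dropout-recovery and double-masking layers are omitted):

```python
import hashlib

MOD = 2**31 - 1  # toy modulus; real protocols fix an appropriate group

def prg(seed: bytes, dim: int):
    """Expand a pairwise seed into a mask vector (toy PRG via SHA-256)."""
    return [int.from_bytes(hashlib.sha256(seed + t.to_bytes(4, "big")).digest()[:4],
                           "big") % MOD
            for t in range(dim)]

n, dim = 3, 4
models = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

# Stand-in pairwise seeds s_ij, symmetric in (i, j).
seed = lambda i, j: f"{min(i, j)}|{max(i, j)}".encode()

masked = []
for i in range(n):
    y = list(models[i])
    for j in range(n):
        if j == i:
            continue
        p = prg(seed(i, j), dim)
        if i < j:   # the lower-indexed client adds the pairwise mask...
            y = [(a + b) % MOD for a, b in zip(y, p)]
        else:       # ...and the higher-indexed one subtracts it, so they cancel
            y = [(a - b) % MOD for a, b in zip(y, p)]
    masked.append(y)

# The server sees only masked vectors, yet their sum equals the sum of the models.
total = [sum(col) % MOD for col in zip(*masked)]
print(total)  # [15, 18, 21, 24]
```

The secret-sharing and double-masking machinery in the paper exists precisely to keep this cancellation working when clients drop out mid-round.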
Deforth, Kevin, et al. “XORBoost: Tree boosting in the multiparty computation setting”:
XORBoost is a protocol for training and inference of gradient-boosted trees in an MPC setting using the Manticore MPC framework, which operates with an offline trusted dealer and provides full-threshold security across an arbitrary number of players. The XORBoost framework supports training on both vertically and horizontally split datasets (i.e., split by feature space vs. split by data points).
Cheng, Kewei, et al. “SecureBoost: A lossless federated learning framework”:
SecureBoost is a vertical federated tree-boosting protocol that uses homomorphic encryption, specifically the Paillier cryptosystem, to achieve security. The gradient values are calculated by the active party, who owns the label information. The gradients are encrypted and sent to the passive parties, who have access only to certain feature values and not to the labels. Passive parties aggregate these encrypted gradients based on their feature values. The aggregated gradients are sent back to the active party, who decrypts them to decide the split point for the given node of the tree.
Tian, Zhihua, et al. “FederBoost: Private Federated Learning for GBDT”:
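The property that makes this work is Paillier's additive homomorphism: multiplying ciphertexts adds the underlying plaintexts, so a passive party can aggregate gradients it cannot read. A toy sketch with insecurely small primes and made-up gradient values (real deployments use key sizes of 2048 bits or more):

```python
from math import gcd

# Toy Paillier keypair with tiny primes -- insecure, for illustration only.
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
L = lambda x: (x - 1) // n
mu = pow(L(pow(g, lam, n2)), -1, n)

def encrypt(m, r):
    """Paillier encryption c = g^m * r^n mod n^2 (r must be coprime to n)."""
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(c):
    return L(pow(c, lam, n2)) * mu % n

# The active party encrypts per-sample gradients (hypothetical values)
# and sends the ciphertexts to a passive party.
grads = [5, 11, 7]
cts = [encrypt(m, r) for m, r in zip(grads, [17, 23, 31])]

# The passive party aggregates under encryption:
# multiplying ciphertexts adds the plaintext gradients.
agg = 1
for c in cts:
    agg = agg * c % n2

# The active party decrypts the aggregate to evaluate the candidate split.
print(decrypt(agg))  # 5 + 11 + 7 = 23
```

The passive party thus contributes its feature-based grouping without ever learning an individual gradient, which would otherwise leak label information.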
FederBoost introduces vertical and horizontal federated learning protocols for training gradient-boosted tree models using differential privacy and secure aggregation. The vertical protocol does not require any cryptographic operations. The horizontal protocol uses secure aggregation to aggregate the gradients and find the split point for a given node. It also includes a secure quantile look-up technique that utilizes secure aggregation for distributed bucket construction, which is a non-trivial operation in the horizontal setting because no single party sees the full value distribution of a feature.
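Why distributed bucket construction needs secure aggregation can be sketched as follows: each party holds a local histogram over candidate value ranges, the histograms are summed under zero-sum masks so no individual distribution is revealed, and quantile bucket boundaries are read off the aggregate. This is a simplified stand-in for FederBoost's technique; the histograms and the quartile granularity are hypothetical.

```python
import random

MOD = 2**31 - 1
random.seed(0)  # for reproducibility of the masks only; any seeds cancel out

# Hypothetical local histograms of one feature over 8 fixed value ranges.
local_hists = [
    [4, 1, 0, 3, 2, 0, 1, 5],
    [0, 2, 6, 1, 0, 3, 2, 0],
    [1, 0, 2, 2, 4, 1, 0, 3],
]
n, bins = len(local_hists), len(local_hists[0])

# Zero-sum masking: each pair (i, j) shares a random vector that party i adds
# and party j subtracts, hiding individual histograms while the masks cancel
# in the aggregate.
masked = [list(h) for h in local_hists]
for i in range(n):
    for j in range(i + 1, n):
        r = [random.randrange(MOD) for _ in range(bins)]
        masked[i] = [(a + b) % MOD for a, b in zip(masked[i], r)]
        masked[j] = [(a - b) % MOD for a, b in zip(masked[j], r)]

global_hist = [sum(col) % MOD for col in zip(*masked)]
print(global_hist)  # element-wise sum of the local histograms

# Derive approximate quartile bucket boundaries from the aggregated counts.
total = sum(global_hist)
boundaries, acc, k = [], 0, 1
for b, cnt in enumerate(global_hist):
    acc += cnt
    while k < 4 and acc >= k * total / 4:
        boundaries.append(b)
        k += 1
print(boundaries)  # bin indices where each quartile threshold is reached
```

The boundaries are only as fine as the agreed bin grid, which is why a careful quantile look-up over the aggregated counts is the non-trivial part of the construction.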