Federated Learning - A Step Towards Privacy

*Co-Authored with Gavin Le Ber, Lawyer*

Nearly all of our daily activities depend on algorithms, machine learning, and artificial intelligence (AI). Improving these algorithms, and making daily activities more convenient and efficient, requires a significant amount of data, including personal data often collected from a user’s device. That data is transmitted and uploaded to a single central server where the algorithmic model resides. The model identifies and learns patterns from the data, allowing it to improve itself. This is a centralized process.

As individuals seek convenience and technological advancement, they must come to terms with the fact that they need to give up some of their personal data for the betterment of the algorithmic model. This trade-off is also known as the “Privacy Paradox”.

Federated learning flips the centralized training model on its head and creates a decentralized approach in which users’ data never leaves their devices. Instead, models are deployed to users’ devices, where they are trained on local data. Once trained, those models are aggregated and transmitted back to the central server, whose model learns from the trained device models instead of from the data itself.

There are many applications of federated learning. The most well-known use-case is Google’s GBoard, a keyboard app that provides predictive text. Rather than uploading every word typed on the keyboard to its server to train its model, Google sends a model to the user’s device, where it is trained directly. The trained model is then aggregated with other users’ models and sent back to the central server model. GBoard does not collect individuals’ typing data because that data never leaves their devices. This decentralized method is a true example of a positive-sum outcome, where all parties win, and a well-illustrated example of Privacy by Design.

The Traditional Machine Learning Model and Its Privacy Challenges

The most common “centralized” machine learning model involves a central server that receives anonymized data from end-users’ devices and sensors. The central model processes the data, extracts insights from it, and is trained on those insights. Generally, the more data that is processed, the more accurate the model becomes. End-users benefit from the accurate model (or algorithm) through the convenience of relevant newsfeeds, traffic directions, or targeted advertisements. Yet while end-users benefit from a customized experience, consumers have expressed unease about sharing their data, out of concern that it will be used for a secondary purpose over which they have little or no control.

The Privacy-Centric Mechanics of Federated Learning

Instead of a centralized model, federated learning relies on a decentralized approach whereby a copy of the model is transferred to end-user devices. The model is trained and refined on the data directly on each device. The trained models are then aggregated with those from other devices and collectively transmitted back to the central model on the server. The server model is trained by the aggregated device models, which do not contain personal data because the data never leaves the device.

Once the server model has been updated and improved, it is tested on end-users, who receive a new model from the server. That model is once again trained and refined on the local data on the device, returned to the server model, and the cycle continues.
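To make this cycle concrete, here is a minimal sketch in Python (using NumPy) of a single federated averaging round with toy weight vectors. The function names, the simple linear model, and the learning rate are illustrative assumptions, not a description of any vendor’s actual implementation.

```python
import numpy as np

def local_update(global_weights, local_data, lr=0.1):
    """Hypothetical on-device step: start from the global model and
    adjust the weights using only data that stays on the device."""
    weights = global_weights.copy()
    for features, target in local_data:
        error = weights @ features - target
        weights -= lr * error * features   # one simple gradient step per example
    return weights

def federated_round(global_weights, devices):
    """One training round: every device trains locally, and only the
    resulting model weights (never the raw data) are averaged together."""
    local_models = [local_update(global_weights, data) for data in devices]
    return np.mean(local_models, axis=0)   # federated averaging

# Toy setup: three devices, each holding its own private data points.
rng = np.random.default_rng(0)
devices = [[(rng.normal(size=3), rng.normal()) for _ in range(5)] for _ in range(3)]

global_weights = np.zeros(3)
for _ in range(10):                        # repeated rounds refine the global model
    global_weights = federated_round(global_weights, devices)
```

The essential point the sketch illustrates is that only model weights cross the network; the training examples themselves remain on the devices throughout every round.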

Federated learning also has various levels of security built into its design. The server, which collects device-level models, accepts only encrypted model data and holds no key to decrypt any individual device’s update. To guard against a “man-in-the-middle attack” during transmission, where data might be reverse engineered, a Secure Aggregation protocol is incorporated. Secure Aggregation combines end-users’ individual models in encrypted form, so that decryption can only be performed on the aggregate, ensuring that even the server host cannot view any single end-user’s contribution. To further reduce the risk of re-identification and enhance user privacy, unique data contributed by a device, which might reveal personal information, is masked by adding noise before it is transmitted to the central server. This process is known as Differential Privacy.
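The following sketch, again in Python with NumPy, gives a rough numerical intuition for both mechanisms. The pairwise cancelling masks and the fixed noise scale are simplified stand-ins chosen for illustration; production Secure Aggregation and Differential Privacy rely on carefully designed cryptographic protocols and calibrated noise budgets.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three devices, each holding a small model update it wants to keep private.
updates = [rng.normal(size=4) for _ in range(3)]

# Differential Privacy intuition: each device adds random noise to its own
# update before anything leaves the device, masking unique contributions.
noise_scale = 0.5   # illustrative value; real deployments calibrate this carefully
noisy = [u + rng.normal(scale=noise_scale, size=4) for u in updates]

# Secure Aggregation intuition: every pair of devices (i, j) agrees on a
# shared random mask; device i adds it and device j subtracts it, so the
# masks cancel out once the server sums all the contributions.
n = len(noisy)
masked = [u.copy() for u in noisy]
for i in range(n):
    for j in range(i + 1, n):
        mask = rng.normal(size=4)
        masked[i] += mask
        masked[j] -= mask

# Each masked update on its own looks like random noise to the server,
# yet the sum of the masked updates equals the sum of the noisy updates.
assert np.allclose(sum(masked), sum(noisy))
```

Seen this way, the server only ever learns an aggregate that is both masked and noised, which is the intuition behind the claim that the host cannot view any individual end-user’s data.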

Conclusion

Federated learning can develop models for both personalized use and global-scale adoption. Multiple federated learning use-cases have been identified, such as Google’s GBoard, autonomous vehicles, healthcare, and the adtech industry, which must rethink its business model now that third-party cookies are becoming obsolete. Although it is a relatively new concept that still needs to be refined and explored through other use-cases, federated learning is a positive step towards giving users the convenience they want while balancing it with their privacy. In short, it is a positive step forward towards Privacy by Design.
