End-to-end Privacy Preserving Training and Inference for Air Pollution Forecasting with Data from Rival Fleets

Authors: Gauri Gupta (MIT), Krithika Ramesh (Johns Hopkins University), Anwesh Bhattacharya (Microsoft Research India), Divya Gupta (Microsoft Research India), Rahul Sharma (Microsoft Research India), Nishanth Chandran (Microsoft Research India), Rijurekha Sen (IIT Delhi)

Volume: 2023
Issue: 4
Pages: 436–451
DOI: https://doi.org/10.56553/popets-2023-0118

Abstract: Privacy-preserving machine learning (PPML) promises to train machine learning (ML) models on data spread across multiple data silos. In theory, secure multiparty computation (MPC) allows multiple data owners to train models on their joint data without revealing that data to each other. However, prior implementations of secure training with MPC have three limitations: they have been evaluated only on CNNs, leaving LSTMs unexplored; fixed-point approximations have reduced training accuracy relative to floating-point training; and because secure training via MPC incurs significant latency overhead, its relevance to practical tasks with streaming data remains unclear. The motivation of this work is to report our experience addressing the practical problem of secure training and inference of models for urban sensing tasks, such as traffic congestion estimation or air pollution monitoring in large cities, where data can be contributed by rival fleet companies, using MPC-based techniques that balance the privacy-accuracy trade-off.

Our first contribution is a custom ML model for this task that can be trained efficiently with MPC within acceptable latency. In particular, we design a GCN-LSTM and securely train it on time-series sensor data for accurate forecasting, within 7 minutes per epoch. Second, we build an end-to-end system for private training and inference that provably matches the training accuracy of cleartext ML training. This work is the first to securely train a model with LSTM cells. Third, the trained model is kept secret-shared between the fleet companies and allows clients to make sensitive queries to it while carefully handling potentially invalid queries. Our custom protocols allow clients to query predictions from privately trained models in milliseconds, while maintaining accuracy and cryptographic security.
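The abstract relies on two MPC building blocks: keeping values (sensor data or model weights) secret-shared between parties, and encoding real numbers in fixed point. The following is a minimal sketch of both, assuming two-party additive secret sharing over the ring of integers modulo 2^64 with 12 fractional bits. The ring size, scale, function names, and sensor readings are illustrative assumptions, not the authors' protocol or parameters.

```python
import secrets

# Illustrative parameters (assumptions): the abstract does not state the
# ring size or fixed-point scale used in the paper.
RING_BITS = 64
MOD = 1 << RING_BITS
FRAC_BITS = 12            # fractional bits in the fixed-point encoding
SCALE = 1 << FRAC_BITS    # resolution is 1 / SCALE


def encode(x: float) -> int:
    """Map a real number to a ring element (fixed point, two's complement)."""
    return round(x * SCALE) % MOD


def decode(v: int) -> float:
    """Map a ring element back to a real; the upper half of the ring is negative."""
    if v >= MOD // 2:
        v -= MOD
    return v / SCALE


def share(v: int) -> tuple[int, int]:
    """Split v into two additive shares; either share alone is uniformly random."""
    r = secrets.randbelow(MOD)
    return r, (v - r) % MOD


def reconstruct(s0: int, s1: int) -> int:
    """Recombine two additive shares into the underlying ring element."""
    return (s0 + s1) % MOD


def add_shares(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Each party adds the shares it holds locally; no communication is needed."""
    return (a[0] + b[0]) % MOD, (a[1] + b[1]) % MOD


# Hypothetical readings from two rival fleets, secret-shared before combining.
pm_a = share(encode(41.37))
pm_b = share(encode(58.91))
total = add_shares(pm_a, pm_b)
# Prints a value within 2 / SCALE of 100.28 (fixed-point rounding error).
print(decode(reconstruct(*total)))
```

Addition of shares is free, but multiplying two fixed-point values doubles the scale and requires a truncation step; in MPC that truncation can add small errors, which is one plausible route by which fixed-point training drifts from floating-point training.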

Keywords: secure multiparty computation, privacy-preserving machine learning, air pollution forecasting

Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution 4.0 license.