SPRINT: Scalable Secure & Differentially Private Inference for Transformers
Authors: Francesco Capano (SAP SE), Jonas Böhler (SAP SE), Benjamin Weggenmann (Technische Hochschule Würzburg-Schweinfurt)
Volume: 2026
Issue: 1
Pages: 134–153
DOI: https://doi.org/10.56553/popets-2026-0008
Abstract: Machine learning as a service (MLaaS) enables scalable model deployment and inference on cloud servers. However, MLaaS exposes user queries and model parameters to the servers. To guarantee confidentiality of queries and model parameters, multi-party computation (MPC) enables secure inference by distributing data and computations across multiple service providers. MPC eliminates single points of failure, mitigates provider breaches, and ensures confidentiality beyond legal agreements. Beyond confidentiality of queries and parameters, the model itself can memorize and leak training data during inference. To mitigate this privacy concern, differential privacy (DP) provides a formal privacy guarantee for training data, which can be satisfied by injecting carefully calibrated noise into gradients during training. However, naive combinations of DP and MPC amplify accuracy loss due to DP noise and MPC approximations, and incur high computational and communication overhead due to cryptographic operations. We present SPRINT, the first scalable solution for efficient MPC inference on DP fine-tuned models with high accuracy. SPRINT fine-tunes public pre-trained models on private data using DP. It integrates DP-specific optimizations, e.g., parameter-efficient fine-tuning and noise-aware optimizers, with MPC optimizations, e.g., cleartext public parameters and efficient approximations of non-linear functions. We evaluate SPRINT on the GLUE benchmark with RoBERTa, achieving up to 1.6x faster MPC inference than the state-of-the-art non-DP solution SHAFT while reducing communication by 1.6x. Notably, SPRINT maintains high accuracy during MPC inference, with a gap of less than 1 percentage point compared to cleartext accuracy.
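The abstract describes DP training as injecting calibrated noise into gradients. The sketch below illustrates that general mechanism (per-example gradient clipping followed by Gaussian noise, as in DP-SGD); it is a minimal illustration under assumed hyperparameters, not SPRINT's implementation, and the function name, the naive per-example loop, and the batch format are assumptions made for clarity.

```python
import torch

# Hypothetical sketch of one DP-SGD step (not SPRINT's code): per-example
# gradients are clipped to an L2 bound clip_norm, summed, perturbed with
# Gaussian noise calibrated to clip_norm and noise_mult, then averaged.
def dp_sgd_step(model, loss_fn, batch, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    # `batch` is assumed to be a list of (x, y) tensor pairs.
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in batch:  # naive per-example loop, kept simple for exposition
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach() for p in params]
        # Clip the per-example gradient to L2 norm at most clip_norm.
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (clip_norm / (total_norm + 1e-6)).clamp(max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)

    with torch.no_grad():
        for p, s in zip(params, summed):
            # Noise scale is proportional to the clipping bound (sensitivity).
            noise = torch.normal(0.0, noise_mult * clip_norm, size=tuple(s.shape))
            p.add_(-(lr / len(batch)) * (s + noise))
```

Likewise, "efficient approximations of non-linear functions" in MPC commonly means replacing activations with low-degree polynomials, since additions and multiplications are cheap to evaluate on secret-shared values. The snippet below is a generic illustration of fitting such a polynomial to GELU on a bounded interval; the interval, degree, and fitting method are assumptions, and this is not the specific approximation used by SPRINT or SHAFT.

```python
import numpy as np
from scipy.stats import norm

# Fit a degree-6 polynomial to GELU on [-4, 4] by least squares; a polynomial
# can be evaluated in MPC with only multiplications and additions.
xs = np.linspace(-4.0, 4.0, 2001)
gelu = xs * norm.cdf(xs)               # exact GELU: x * Phi(x)
coeffs = np.polyfit(xs, gelu, deg=6)   # least-squares polynomial fit
approx = np.polyval(coeffs, xs)
print(f"max |error| on [-4, 4]: {np.abs(approx - gelu).max():.4f}")
```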
Keywords: differential privacy, multi-party computation, secure inference, transformer fine-tuning
Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution 4.0 license.