Deletion inference, reconstruction, and compliance in machine (un)learning

Authors: Ji Gao (University of Virginia), Sanjam Garg (University of California, Berkeley and NTT Research), Mohammad Mahmoody (University of Virginia), Prashant Nalini Vasudevan (National University of Singapore)

Volume: 2022
Issue: 3
Pages: 415–436
DOI: https://doi.org/10.56553/popets-2022-0079

artifact

Download PDF

Abstract: Privacy attacks on machine learning models aim to identify the data that is used to train such models. Such attacks, traditionally, are studied on static models that are trained once and are accessible by the adversary. Motivated to meet new legal requirements, many machine learning methods are recently extended to support machine unlearning, i.e., updating models as if certain examples are removed from their training sets, and meet new legal requirements. However, privacy attacks could potentially become more devastating in this new setting, since an attacker could now access both the original model before deletion and the new model after the deletion. In fact, the very act of deletion might make the deleted record more vulnerable to privacy attacks. Inspired by cryptographic definitions and the differential privacy framework, we formally study privacy implications of machine unlearning. We formalize (various forms of) deletion inference and deletion reconstruction attacks, in which the adversary aims to either identify which record is deleted or to reconstruct (perhaps part of) the deleted records. We then present successful deletion inference and reconstruction attacks for a variety of machine learning models and tasks such as classification, regression, and language models. Finally, we show that our attacks would provably be precluded if the schemes satisfy (variants of) deletion compliance (Garg, Goldwasser, and Vasudevan, Eurocrypt’20).

Keywords: data leakage, privacy, machine learning, machine unlearning

Copyright in PoPETs articles are held by their authors. This article is published under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 license.