dX-Privacy for Text and the Curse of Dimensionality
Authors: Hassan Asghar (Macquarie University), Robin Carpentier (Macquarie University), Benjamin Zhao (Macquarie University), Dali Kaafar (Macquarie University)
Volume: 2026
Issue: 1
Pages: 224–241
DOI: https://doi.org/10.56553/popets-2026-0012
Abstract: A widely used method to ensure privacy of unstructured text data is the multidimensional Laplace mechanism for dX-privacy, which is a relaxation of differential privacy for metric spaces. We identify an intriguing peculiarity of this mechanism. When applied on a word-by-word basis, the mechanism either outputs the original word or completely dissimilar words, and very rarely outputs semantically similar words. We investigate this observation in detail, and tie it to the fact that the distance to the nearest neighbor of a word in any (high-dimensional) word embedding model is much larger than the relative difference in distances between any two of its consecutive neighbors. We also show that the dot product of the multidimensional Laplace noise vector with any word embedding plays a crucial role in designating the nearest neighbor. We derive the distribution, moments, and tail bounds of this dot product. We further propose a fix as a post-processing step, which satisfactorily resolves the issue.
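The word-by-word mechanism the abstract refers to can be sketched as follows: sample multidimensional Laplace noise (density proportional to exp(-epsilon * ||z||), obtained as a uniform direction scaled by a Gamma-distributed magnitude), add it to the word's embedding, and snap the noisy vector to the nearest word in the vocabulary. This is an illustrative sketch, not the authors' code; the toy vocabulary and random embeddings are assumptions for demonstration.

```python
import numpy as np

def sample_multivariate_laplace(d, epsilon, rng):
    # Density proportional to exp(-epsilon * ||z||): a direction drawn
    # uniformly from the unit sphere, scaled by a Gamma(d, 1/epsilon) magnitude.
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    magnitude = rng.gamma(shape=d, scale=1.0 / epsilon)
    return magnitude * direction

def dx_private_word(word, embeddings, epsilon, rng):
    # Perturb the word's embedding, then snap to the nearest vocabulary word
    # in Euclidean distance.
    vec = embeddings[word]
    noisy = vec + sample_multivariate_laplace(len(vec), epsilon, rng)
    vocab = list(embeddings)
    dists = [np.linalg.norm(embeddings[w] - noisy) for w in vocab]
    return vocab[int(np.argmin(dists))]

# Toy vocabulary with random 300-dimensional embeddings (illustrative only;
# a real deployment would use a pretrained embedding model).
rng = np.random.default_rng(0)
words = ["cat", "dog", "car", "tree"]
embeddings = {w: rng.normal(size=300) for w in words}
print(dx_private_word("cat", embeddings, epsilon=10.0, rng=rng))
```

With noise of this shape, the paper's observation is that the output is almost always either the original word or a semantically unrelated one, since the noise magnitude in high dimensions dwarfs the gaps between consecutive neighbors.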
Keywords: differential privacy, word embeddings, multidimensional Laplace mechanism
Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution 4.0 license.