Word-level Annotation of GDPR Transparency Compliance in Privacy Policies using Large Language Models

Authors: Thomas Cory (Technische Universität Berlin), Wolf Rieder (Technische Universität Berlin), Julia Krämer (Erasmus University Rotterdam), Philip Raschke (Technische Universität Berlin), Patrick Herbke (Technische Universität Berlin), Axel Küpper (Technische Universität Berlin)

Volume: 2026
Issue: 1
Pages: 509–528
DOI: https://doi.org/10.56553/popets-2026-0026


Abstract: Ensuring transparency of data practices related to personal information is a core requirement of the General Data Protection Regulation (GDPR). However, large-scale compliance assessment remains challenging due to the complexity and diversity of privacy policy language. Manual audits are labour-intensive and inconsistent, while current automated methods often lack the granularity required to capture nuanced transparency disclosures. In this paper, we present a modular large language model (LLM)-based pipeline for fine-grained word-level annotation of privacy policies with respect to GDPR transparency requirements. Our approach integrates LLM-driven annotation with passage-level classification, retrieval-augmented generation, and a self-correction mechanism to deliver scalable, context-aware annotations across 21 GDPR-derived transparency requirements. To support empirical evaluation, we compile a corpus of 703,791 English-language privacy policies and generate a ground-truth sample of 200 manually annotated policies based on a comprehensive, GDPR-aligned annotation scheme. We propose a two-tiered evaluation methodology capturing both passage-level classification and span-level annotation quality and conduct a comparative analysis of seven state-of-the-art LLMs on two annotation schemes, including the widely used OPP-115 dataset. The results of our evaluation show that decomposing the annotation task and integrating targeted retrieval and classification components significantly improve annotation accuracy, particularly for well-structured requirements. Our work provides new empirical resources and methodological foundations for advancing automated transparency compliance assessment at scale.
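The abstract names a two-tiered evaluation (passage-level classification and span-level annotation quality) but does not state the exact metrics. The snippet below is a minimal sketch that assumes set-based precision, recall and F1 at both tiers. Passage labels are expanded into (passage, requirement) pairs, and annotated spans are expanded into (word index, requirement) pairs. The requirement names and the half-open `(requirement, start_word, end_word)` span format are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical span format: (requirement, start_word, end_word_exclusive).
# The paper's 21 GDPR-derived requirements are not named in the abstract.
Spans = List[Tuple[str, int, int]]


def prf1(predicted: set, gold: set) -> Tuple[float, float, float]:
    """Precision, recall and F1 over two sets of labelled items."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def passage_items(labels: Dict[int, Set[str]]) -> Set[Tuple[int, str]]:
    """Tier 1 (assumed): expand passage labels into (passage_id, requirement) pairs."""
    return {(pid, req) for pid, reqs in labels.items() for req in reqs}


def word_items(spans: Spans) -> Set[Tuple[int, str]]:
    """Tier 2 (assumed): expand annotated spans into (word_index, requirement) pairs."""
    return {(i, req) for req, start, end in spans for i in range(start, end)}


if __name__ == "__main__":
    # Hypothetical toy data for illustration only.
    gold_passages = {0: {"controller_identity"}, 1: {"retention_period"}}
    pred_passages = {0: {"controller_identity"}, 1: {"processing_purposes"}}
    print("passage-level P/R/F1:",
          prf1(passage_items(pred_passages), passage_items(gold_passages)))

    gold_spans = [("retention_period", 10, 20)]
    pred_spans = [("retention_period", 12, 22)]
    print("word-level P/R/F1:",
          prf1(word_items(pred_spans), word_items(gold_spans)))
```

In this toy example the passage tier scores 0.5 on all three measures because one of the two passage labels matches. The word tier scores about 0.8 because 8 of 10 words overlap. Word-level scoring gives credit for partially overlapping spans, which an exact span-match criterion would count as a miss.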

Keywords: privacy, privacy policy, annotation, LLM, RAG, self-correction, GDPR

Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution 4.0 license.