'AI is from the devil.' Behaviors and Concerns Toward Personal Data Sharing with LLM-based Conversational Agents
Authors: Noé Zufferey (ETH Zurich), Sarah Abdelwahab Gaballah (Ruhr University of Bochum), Karola Marky (Ruhr University of Bochum), Verena Zimmermann (ETH Zurich)
Volume: 2025
Issue: 3
Pages: 5–28
DOI: https://doi.org/10.56553/popets-2025-0086
Abstract: With the increased performance of large language models (LLMs), conversational agents (CAs), such as ChatGPT, are now available to any individual with little technical knowledge or skill. Initial studies investigating the related privacy risks primarily focused either on technical aspects and misuse of these tools, or on capturing the overall perceptions of CA users in small-scale qualitative evaluations. Complementing and extending previous work, we used a quantitative user-centered approach to analyze and compare the behaviors and concerns of users and non-users. We conducted a survey study (N=422) with (1) service users, i.e., users of CA services, (2) local users, i.e., users of a local CA instance (partially local or fully local users), and (3) non-users. We collected self-reported usage patterns and personal data-sharing behavior, as well as privacy concerns related to different types of personal data (e.g., health data, demographics, or opinions). Furthermore, we analyzed individuals' intention to use CA services in multiple scenarios. Our findings show that users of CA services generally have fewer privacy concerns than non-users. While users rarely share data related to personal identifiers and account credentials, they often share data related to lifestyle, health, standard of living, and opinions. Surprisingly, partially local users tend to share more data with CA services, as they also generally use CA services more often and for more diverse purposes. Moreover, while the majority of CA service users reported being unwilling to prioritize CA services as an information source in the described scenarios, such as seeking legal advice, between roughly one-quarter and one-third of partially local users would use CA services in all scenarios. Furthermore, half of the users were willing to stop using CAs for privacy reasons (e.g., in case of data leaks), whereas a large majority of non-users reported not using CAs simply because they lack the need or the opportunity. Our work highlights the high privacy risks for CA service users, as CA services greatly expand the amount and variety of personal information that companies can collect.
Keywords: LLM, Conversational Agent, Privacy, Chatbot, HCI
Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution 4.0 license.
