To enable personalized and context-aware interactions, conversational AI systems have introduced a new mechanism: Memory. Memory creates what we refer to as the Algorithmic Self-portrait: a new form of personalization derived from information that users disclose about themselves in private conversations. While memory enables more coherent exchanges, the underlying processes of memory creation remain opaque, raising critical questions about data sensitivity, user agency, and the fidelity of the resulting portrait. To bridge this research gap, we analyze 2,050 memory entries from 80 real-world ChatGPT users. Our analyses reveal three key findings: (1) A striking 96% of memories in our dataset are created unilaterally by the conversational system, potentially shifting agency away from the user; (2) Memories in our dataset contain a rich mix of GDPR-defined personal data (in 28% of memories) along with psychological insights about participants (in 52% of memories); and (3) A significant majority of the memories (84%) are directly grounded in user context, indicating faithful representation of the conversations. Finally, we introduce a framework, Attribution Shield, that anticipates these inferences, alerts users to potentially sensitive memory inferences, and suggests query reformulations to protect personal information without sacrificing utility.