Conversational AI has now reached billions of users, yet existing datasets capture only what people say, not what they think. We introduce ThoughtTrace, the first large-scale dataset that pairs real-world multi-turn human-AI conversations with users' self-reported thoughts: their reasons for sending prompts and reactions to assistant responses. ThoughtTrace comprises 1,058 users, 2,155 conversations, 17,058 turns, and 10,174 thought annotations collected across 20 language models. Our analysis shows that ThoughtTrace captures long-horizon, topically diverse interactions, and that thoughts are semantically distinct from messages, difficult for frontier LLMs to infer from context, diverse in content, and tied to conversation stages. We further demonstrate the utility of thoughts for downstream modeling. First, thoughts improve user-behavior prediction as inference-time context. Second, thought-guided rewrites provide fine-grained alignment signals for training personalized assistants. Together, ThoughtTrace establishes user thoughts as a new data modality for studying the cognitive dynamics behind human-AI interaction and provides a foundation for building assistants that better understand and adapt to users' latent goals, preferences, and needs.