Personalization in social robotics is critical for fostering effective human-robot interaction, yet systems often face the cold-start problem, in which a user's initial preferences and characteristics are unavailable. This paper proposes USER-LLM R1, a novel framework for a user-aware conversational agent that addresses this challenge through dynamic user profiling and model initialization. Our approach integrates chain-of-thought (CoT) reasoning models to iteratively infer user preferences and vision-language models (VLMs) to initialize user profiles from multimodal inputs, enabling personalized interactions from the first encounter. Leveraging a Retrieval-Augmented Generation (RAG) architecture, the system dynamically refines user representations within an inherent CoT process, ensuring contextually relevant and adaptive responses. Evaluations on the ElderlyTech-VQA Bench demonstrate significant improvements in ROUGE-1 (+23.2%), ROUGE-2 (+0.6%), and ROUGE-L (+8%) F1 scores over state-of-the-art baselines, and ablation studies underscore the impact of reasoning model size on performance. Human evaluations further validate the framework's efficacy, particularly for elderly users, for whom tailored responses enhance engagement and trust. Ethical considerations, including privacy preservation and bias mitigation, are discussed and addressed to ensure responsible deployment.