Factored Reasoning with Inner Speech and Persistent Memory for Evidence-Grounded Human-Robot Interaction

Dialogue-based human-robot interaction requires robot cognitive assistants to maintain persistent user context, recover from underspecified requests, and ground responses in external evidence, all while keeping intermediate decisions verifiable. In this paper, we introduce JANUS, a cognitive architecture for assistive robots that models interaction as a partially observable Markov decision process and realizes control as a factored controller with typed interfaces. To this end, JANUS (i) decomposes the overall behavior into specialized modules for scope detection, intent recognition, memory, inner speech, query generation, and outer speech, and (ii) exposes explicit policies for information sufficiency, execution readiness, and tool grounding. A dedicated memory agent maintains a bounded recent-history buffer, a compact core memory, and an archival store with semantic retrieval, coupled through controlled consolidation and revision policies. Models inspired by the notion of inner speech in cognitive theories provide a control-oriented internal textual flow that validates parameter completeness and triggers clarification before grounding, while a faithfulness constraint ties robot-to-human claims to an evidence bundle that combines working context and retrieved tool outputs. We evaluate JANUS through module-level unit tests in a dietary assistance domain grounded in a knowledge graph, reporting high agreement with curated references and practical latency profiles. These results support factored reasoning as a promising path to scalable, auditable, and evidence-grounded robot assistance over extended interaction horizons.
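
As an illustration of the three-tier memory described in the abstract, the following is a minimal sketch, not the JANUS implementation. The class name `MemoryAgent`, its methods, the buffer bound, and the eviction-based consolidation rule are all hypothetical, and a toy bag-of-words cosine similarity stands in for whatever semantic retrieval the system actually uses.

```python
from collections import Counter, deque
from dataclasses import dataclass, field
import math


def _bow(text: str) -> Counter:
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class MemoryAgent:
    """Hypothetical three-tier memory: recent buffer, core memory, archive."""

    max_recent: int = 3                              # assumed buffer bound
    recent: deque = field(default_factory=deque)     # bounded recent history
    core: dict = field(default_factory=dict)         # compact core memory
    archive: list = field(default_factory=list)      # archival store

    def observe(self, utterance: str) -> None:
        """Append a turn; when the buffer is full, move the oldest turn to the archive."""
        if len(self.recent) == self.max_recent:
            self.archive.append(self.recent.popleft())
        self.recent.append(utterance)

    def revise_core(self, key: str, value: str) -> None:
        """Overwrite a core-memory fact (an assumed, simplistic revision policy)."""
        self.core[key] = value

    def retrieve(self, query: str, k: int = 1) -> list:
        """Return the k archived turns most similar to the query."""
        q = _bow(query)
        ranked = sorted(
            self.archive, key=lambda doc: _cosine(q, _bow(doc)), reverse=True
        )
        return ranked[:k]
```

In this sketch, a turn that falls out of the recent-history buffer is consolidated into the archive rather than discarded, so a later query can still recover it through retrieval, while stable user facts are kept separately in core memory.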
