Agents in partially observable environments require persistent memory to integrate observations over time. While knowledge graphs (KGs) provide a natural representation for such evolving state, existing benchmarks rarely expose agents to environments where both the world dynamics and the agent's memory are explicitly graph-shaped. We introduce Room Environment v3, a configurable environment whose hidden state is an RDF KG and whose observations are RDF triples. The agent may extend these observations into a temporal knowledge graph (TKG) when storing them in long-term memory. Grid size, number of rooms, inner walls, and moving objects are all adjustable. We define a lightweight TKG memory for agents, based on RDF-star-style qualifiers (time_added, last_accessed, num_recalled), and evaluate several symbolic baselines that maintain and query this memory under different capacity constraints. Two neural sequence models (LSTM and Transformer) serve as contrasting baselines without explicit KG structure. Agents train on one layout and are evaluated on a held-out layout with the same dynamics but a different query order, which exposes train-test generalization gaps. In this setting, temporal qualifiers lead to more stable performance, and the symbolic TKG agent achieves roughly fourfold higher test question-answering (QA) accuracy than the neural baselines under the same environment and query conditions. The environment, agent implementations, and experimental scripts are released for reproducible research at https://github.com/humemai/agent-room-env-v3 and https://github.com/humemai/room-env.