Modernising Reinforcement Learning-Based Navigation for Embodied Semantic Scene Graph Generation

Semantic world models enable embodied agents to reason about objects, relations, and spatial context beyond purely geometric representations. In Organic Computing, such models are a key enabler for objective-driven self-adaptation under uncertainty and resource constraints. The core challenge is to acquire observations that maximise model quality and downstream usefulness within a limited action budget. Semantic scene graphs (SSGs) provide a structured and compact representation for this purpose. However, constructing them within a finite action horizon requires exploration strategies that trade off information gain against navigation cost and decide when additional actions yield diminishing returns. This work presents a modular navigation component for Embodied Semantic Scene Graph Generation and modernises its decision-making by replacing the policy-optimisation method and revisiting the discrete action formulation. We study a compact discrete motion set alongside a larger, finer-grained one, and compare a single-head policy over atomic actions with a factorised multi-head policy over action components. We also evaluate curriculum learning and optional depth-based collision supervision, and assess SSG completeness, execution safety, and navigation behaviour. Results show that replacing the optimisation algorithm alone improves SSG completeness by 21% relative to the baseline under identical reward shaping. Depth-based collision supervision mainly affects execution safety (collision-free motion), while completeness remains largely unchanged. Combining modern optimisation with a finer-grained, factorised action representation yields the strongest overall completeness-efficiency trade-off.
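The difference between a single-head policy over atomic actions and a factorised multi-head policy can be illustrated with a minimal sketch. The action components below (direction, rotation, step) and their options are hypothetical, since the abstract does not list the actual motion set, and the logits are random stand-ins for network outputs. The sketch only shows the general idea: each head samples one component, the joint log-probability is the sum of the per-head log-probabilities, and the number of output logits grows with the sum rather than the product of the component sizes.

```python
import math
import random

# Hypothetical action components; the abstract does not specify them.
COMPONENTS = {
    "direction": ["forward", "backward", "left", "right"],
    "rotation": ["-30", "0", "+30"],
    "step": ["short", "long"],
}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_factorised(head_logits, rng):
    """Sample one option per head; the joint log-prob is the sum of head log-probs."""
    action, log_prob = {}, 0.0
    for name, logits in head_logits.items():
        probs = softmax(logits)
        idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        action[name] = COMPONENTS[name][idx]
        log_prob += math.log(probs[idx])
    return action, log_prob

def output_sizes(components):
    """Return (single-head logit count, multi-head logit count)."""
    sizes = [len(options) for options in components.values()]
    return math.prod(sizes), sum(sizes)

rng = random.Random(0)
# Random logits stand in for the outputs of a policy network (assumption).
head_logits = {
    name: [rng.gauss(0.0, 1.0) for _ in options]
    for name, options in COMPONENTS.items()
}
action, log_prob = sample_factorised(head_logits, rng)
single, multi = output_sizes(COMPONENTS)
print(action, log_prob)
print(single, multi)
```

With these assumed components, a single-head policy would need one logit for each of the 24 atomic combinations, whereas the factorised policy needs only 9 logits across its three heads. The factorisation treats the components as independent given the observation, which is a modelling assumption of this sketch and not a claim about the paper's architecture.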
