Modernising Reinforcement Learning-Based Navigation for Embodied Semantic Scene Graph Generation

Semantic world models enable embodied agents to reason about objects, relations, and spatial context beyond purely geometric representations. In Organic Computing, such models are a key enabler for objective-driven self-adaptation under uncertainty and resource constraints. The core challenge is to acquire observations that maximise model quality and downstream usefulness within a limited action budget. Semantic scene graphs (SSGs) provide a structured and compact representation for this purpose. However, constructing them within a finite action horizon requires exploration strategies that trade off information gain against navigation cost and decide when additional actions yield diminishing returns. This work presents a modular navigation component for Embodied Semantic Scene Graph Generation and modernises its decision-making by replacing the policy-optimisation method and revisiting the discrete action formulation. We study both a compact discrete motion set and a larger, finer-grained one, and compare a single-head policy over atomic actions with a factorised multi-head policy over action components. We also evaluate curriculum learning and optional depth-based collision supervision, assessing SSG completeness, execution safety, and navigation behaviour. Results show that replacing the optimisation algorithm alone improves SSG completeness by 21% relative to the baseline under identical reward shaping. Depth-based collision supervision mainly affects execution safety (collision-free motion), while completeness remains largely unchanged. Combining modern optimisation with a finer-grained, factorised action representation yields the strongest overall completeness-efficiency trade-off.
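
The difference between a single-head policy over atomic actions and a factorised multi-head policy over action components can be illustrated with a minimal sketch. The component sets below (`ROTATIONS_DEG`, `STEP_LENGTHS_M`), the uniform logits, and all function names are assumptions chosen for illustration; the abstract does not specify the actual action sets or network outputs.

```python
import math
import random
from itertools import product

# Hypothetical action components (assumed, not taken from the paper).
ROTATIONS_DEG = [-90, -45, 0, 45, 90]
STEP_LENGTHS_M = [0.0, 0.25, 0.5]

# Single-head view: every (rotation, step) pair is one atomic action.
ATOMIC_ACTIONS = list(product(ROTATIONS_DEG, STEP_LENGTHS_M))


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, rng):
    """Draw one index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def single_head_act(logits, rng):
    """One categorical distribution over all atomic actions."""
    probs = softmax(logits)
    idx = sample_index(probs, rng)
    return ATOMIC_ACTIONS[idx], math.log(probs[idx])


def factorised_act(rot_logits, step_logits, rng):
    """One categorical per component; the joint log-probability is the sum."""
    rot_probs = softmax(rot_logits)
    step_probs = softmax(step_logits)
    r = sample_index(rot_probs, rng)
    s = sample_index(step_probs, rng)
    log_prob = math.log(rot_probs[r]) + math.log(step_probs[s])
    return (ROTATIONS_DEG[r], STEP_LENGTHS_M[s]), log_prob


rng = random.Random(0)

# Placeholder logits standing in for policy-network outputs.
single_logits = [0.0] * len(ATOMIC_ACTIONS)
rot_logits = [0.0] * len(ROTATIONS_DEG)
step_logits = [0.0] * len(STEP_LENGTHS_M)

action_a, log_prob_a = single_head_act(single_logits, rng)
action_b, log_prob_b = factorised_act(rot_logits, step_logits, rng)

# Output sizes: product of component sizes vs. their sum.
n_single_outputs = len(ATOMIC_ACTIONS)
n_factorised_outputs = len(ROTATIONS_DEG) + len(STEP_LENGTHS_M)
print(n_single_outputs, n_factorised_outputs)
```

In this sketch the single head needs one output per atomic action, so its size grows with the product of the component sizes, whereas the factorised heads grow with their sum. That is one plausible reason a factorised policy can accommodate a finer-grained motion set, at the cost of treating the components as independent given the observation.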
