Atomic-Probe Governance for Skill Updates in Compositional Robot Policies

Skill libraries in deployed robotic systems are continually updated through fine-tuning, fresh demonstrations, or domain adaptation. Existing typed-composition methods (BLADE, SymSkill, Generative Skill Chaining) nonetheless treat the library as frozen at test time and do not analyze how composition outcomes change when a skill is replaced. We introduce a paired-sampling cross-version swap protocol on robosuite manipulation tasks to characterize this dimension of compositional skill learning. On a dual-arm peg-in-hole task we find a dominant-skill effect: one ECM achieves an 86.7% atomic success rate while every other ECM is at or below 26.7%, and whether this dominant ECM enters a composition shifts the success rate of that composition by up to +50pp. We characterize the boundary of this effect on a simpler pick task, where all atomic policies saturate at 100% and the effect is undefined. Across three tasks we further find that off-policy behavioral distance metrics fail to identify the dominant ECM, which rules out the natural cheap predictor. We propose an atomic-quality probe and a Hybrid Selector that combines per-skill probes (zero per-decision cost) with selective composition revalidation (full cost), and we characterize its Pareto frontier on 144 skill-update decisions. On T6, the atomic-only probe sits 23pp below full revalidation (64.6% vs 87.5% oracle match) at zero per-decision cost; a Hybrid Selector with m=10 narrows that gap to ~12pp at 46% of full-revalidation cost. Averaged across tasks over all 144 events, atomic-only is within 3pp of full revalidation, subject to a mixed-oracle caveat. The atomic-quality probe is, to our knowledge, the first principled, deployment-ready primitive for skill-update governance in compositional robot policies.
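The trade-off between a zero-cost atomic probe and full composition revalidation can be illustrated with a minimal sketch. The abstract does not specify the selector's decision rule or the meaning of m, so everything below is an assumption: the names `UpdateEvent`, `hybrid_select`, and `run`, the `margin` threshold (which is not the paper's m), and the rule "trust the atomic probe unless the two versions' atomic success rates are close" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class UpdateEvent:
    """One skill-update decision: keep the old version or adopt the new one."""
    old_atomic_sr: float  # cached atomic success rate of the current version
    new_atomic_sr: float  # cached atomic success rate of the candidate version


def hybrid_select(
    event: UpdateEvent,
    revalidate: Callable[[UpdateEvent], bool],
    margin: float,
) -> Tuple[bool, bool]:
    """Return (adopt_new, revalidated).

    Assumed rule: if the atomic gap is at least `margin`, decide from the
    cached probe alone; otherwise fall back to composition revalidation.
    """
    gap = event.new_atomic_sr - event.old_atomic_sr
    if abs(gap) >= margin:
        return gap > 0, False          # atomic probe decides, no extra cost
    return revalidate(event), True     # ambiguous case: pay for revalidation


def run(
    events: List[UpdateEvent],
    revalidate: Callable[[UpdateEvent], bool],
    margin: float,
) -> Tuple[List[bool], float]:
    """Apply the selector to all events; also report the revalidated fraction."""
    results = [hybrid_select(e, revalidate, margin) for e in events]
    choices = [adopt for adopt, _ in results]
    cost_fraction = sum(rev for _, rev in results) / len(results)
    return choices, cost_fraction


# Illustrative inputs: 0.867 and 0.267 echo the atomic rates quoted above,
# 0.200 is made up. The lambda is a stand-in for real rollouts.
events = [UpdateEvent(0.267, 0.867), UpdateEvent(0.267, 0.200)]
choices, cost_fraction = run(events, revalidate=lambda e: False, margin=0.10)
```

Under this assumed rule, sweeping `margin` from 0 upward moves the selector from atomic-only (no revalidation) toward full revalidation, which is one way a cost-versus-oracle-match frontier could be traced.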
