Bridging Semantics and Physical Execution: A Neuro-Symbolic Framework for Multi-Pair Robotic Assembly

Multi-pair robotic assembly in unstructured environments must contend with spatial interference and contact uncertainty. Existing paradigms fail to bridge cognitive decision-making and physical execution: they either suffer from state-space explosion and knowledge bottlenecks, or produce logical hallucinations and topological conflicts. We propose an end-to-end neuro-symbolic framework that addresses the problem hierarchically: it generates an optimal subgraph for each pair, decouples general behavior from edge cases, and then resolves cross-pair interference. Given an eye-in-hand RGB-D view of the assembly scene, the framework extracts the semantic identity and state of each instance and quantifies the scene so that its divergence from a baseline can be computed. For each pair, an LLM generates an optimal subgraph using only basic actions, which mitigates hallucinations. A lightweight discriminator then reasons about supportive actions for edge cases and inserts them into the subgraph. Because this step is driven by the divergence between the quantified baseline and the current scene, it can be extended at low cost. The augmented subgraphs are topologically coordinated into a global sequence while preserving the internal behavioral coherence of each subgraph. Dynamic behavior trees that embed atomic skills close the force-aware execution loop. Offline evaluation on 100 real-world scenes achieves 97.00% global executability, outperforming classical and state-of-the-art planners. Real-robot deployment on a UR3 arm attains a 90% success rate at 0.5 mm tolerance under strong interference, demonstrating a unified and verifiable solution for complex autonomous assembly.
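The abstract does not specify how subgraphs are coordinated, so the following is only a minimal sketch of one way the idea could work, not the authors' method. It assumes each per-pair subgraph is a linear list of uniquely named actions and that cross-pair interference is expressed as hypothetical "must happen before" constraints; the function name `coordinate`, the pair names, and the action names are all illustrative. A standard topological sort (Kahn's algorithm) then yields a global sequence that keeps every pair's internal order intact.

```python
from collections import deque


def coordinate(subgraphs, cross_constraints):
    """Merge per-pair action lists into one global sequence.

    subgraphs: dict mapping pair id -> ordered list of action names
               (assumed unique within a pair).
    cross_constraints: list of (before, after) tuples, where each element
               is a (pair id, action name) node. Hypothetical encoding of
               cross-pair interference.
    """
    nodes = [(pair, act) for pair, acts in subgraphs.items() for act in acts]
    successors = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}

    def add_edge(u, v):
        successors[u].append(v)
        indegree[v] += 1

    # Internal coherence: each action precedes the next one in its own pair.
    for pair, acts in subgraphs.items():
        for a, b in zip(acts, acts[1:]):
            add_edge((pair, a), (pair, b))

    # Cross-pair interference: extra precedence edges between pairs.
    for before, after in cross_constraints:
        add_edge(before, after)

    # Kahn's algorithm: repeatedly emit nodes with no unmet predecessors.
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(nodes):
        raise ValueError("cyclic constraints: no executable global sequence")
    return order


if __name__ == "__main__":
    plan = coordinate(
        {"peg_A": ["pick", "align", "insert"],
         "peg_B": ["pick", "align", "insert"]},
        # Assumed interference: peg_B must be inserted before peg_A is picked.
        [(("peg_B", "insert"), ("peg_A", "pick"))],
    )
    for pair, action in plan:
        print(pair, action)
```

In this sketch a cycle among the constraints raises an error instead of returning a partial plan, which mirrors the notion of checking global executability before anything is sent to the robot.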
