PAct: Part-Decomposed Single-View Articulated Object Generation

Articulated objects are central to interactive 3D applications, including embodied AI, robotics, and VR/AR, where functional part decomposition and kinematic motion are essential. Yet producing high-fidelity articulated assets remains difficult to scale because it requires reliable part decomposition and kinematic rigging. Existing approaches largely fall into two paradigms: optimization-based reconstruction or distillation, which can be accurate but often takes tens of minutes to hours per instance, and inference-time methods that rely on template or part retrieval, producing plausible results that may not match the specific structure and appearance in the input observation. We introduce a part-centric generative framework for articulated object creation that synthesizes part geometry, composition, and articulation under explicit part-aware conditioning. Our representation models an object as a set of movable parts, each encoded by latent tokens augmented with part identity and articulation cues. Conditioned on a single image, the model generates articulated 3D assets that preserve instance-level correspondence while maintaining valid part structure and motion. The resulting approach avoids per-instance optimization, enables fast feed-forward inference, and supports controllable assembly and articulation, which are important for embodied interaction. Experiments on common articulated categories (e.g., drawers and doors) show improved input consistency, part accuracy, and articulation plausibility over optimization-based and retrieval-driven baselines, while substantially reducing inference time.
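The part-centric representation described above (a set of movable parts, each carrying latent tokens augmented with part identity and articulation cues) can be illustrated with a small data structure. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `Part`, `augment_tokens`, and `move_point`, the one-hot encoding of identity and joint type, and the choice of fixed, revolute, and prismatic joints are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
JOINT_TYPES = ("fixed", "revolute", "prismatic")  # assumed joint vocabulary


@dataclass
class Part:
    """One movable part (hypothetical layout, not taken from the paper)."""
    part_id: int
    joint_type: str                          # one of JOINT_TYPES
    axis: Vec3 = (0.0, 0.0, 1.0)             # unit joint axis
    origin: Vec3 = (0.0, 0.0, 0.0)           # a point on the joint axis
    limits: Tuple[float, float] = (0.0, 0.0) # allowed joint range
    tokens: List[List[float]] = field(default_factory=list)  # latent tokens


def augment_tokens(part: Part, max_parts: int) -> List[List[float]]:
    """Append part-identity and articulation cues to every latent token.

    Assumption: identity and joint type are one-hot vectors and the joint
    axis is appended as raw coordinates. The real model may differ.
    """
    id_cue = [1.0 if i == part.part_id else 0.0 for i in range(max_parts)]
    joint_cue = [1.0 if t == part.joint_type else 0.0 for t in JOINT_TYPES]
    return [tok + id_cue + joint_cue + list(part.axis) for tok in part.tokens]


def move_point(part: Part, p: Vec3, q: float) -> Vec3:
    """Move a point on the part to joint state q (clamped to the limits)."""
    if part.joint_type == "fixed":
        return p
    q = max(part.limits[0], min(part.limits[1], q))
    ax = part.axis
    if part.joint_type == "prismatic":
        # Slide along the joint axis by distance q.
        return tuple(p[i] + q * ax[i] for i in range(3))
    # Revolute: rotate about the axis through `origin` (Rodrigues' formula).
    v = tuple(p[i] - part.origin[i] for i in range(3))
    c, s = math.cos(q), math.sin(q)
    dot = sum(ax[i] * v[i] for i in range(3))
    cross = (ax[1] * v[2] - ax[2] * v[1],
             ax[2] * v[0] - ax[0] * v[2],
             ax[0] * v[1] - ax[1] * v[0])
    return tuple(v[i] * c + cross[i] * s + ax[i] * dot * (1.0 - c) + part.origin[i]
                 for i in range(3))


# Usage: a drawer that slides along +x and a door hinged on the z axis.
drawer = Part(part_id=1, joint_type="prismatic", axis=(1.0, 0.0, 0.0),
              limits=(0.0, 0.4), tokens=[[0.1, 0.2]])
door = Part(part_id=2, joint_type="revolute", axis=(0.0, 0.0, 1.0),
            limits=(0.0, math.pi / 2))
print(move_point(drawer, (0.0, 0.0, 0.0), 0.25))  # (0.25, 0.0, 0.0)
```

Keeping identity and joint parameters explicit per part is what makes assembly and articulation controllable in a sketch like this: changing `q` for one part moves only that part's geometry, and the joint limits keep the motion within a valid range.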
