Humans naturally develop preferences for how manipulation tasks should be performed, and these preferences are often subtle, personal, and difficult to articulate. Although accounting for such preferences is important for increasing personalization and user satisfaction, they remain largely underexplored in robotic manipulation, particularly in the context of deformable objects like garments and fabrics. In this work, we study how to adapt pretrained visuomotor diffusion policies to reflect preferred behaviors using limited demonstrations. We introduce RKO, a novel preference-alignment method that combines the benefits of two recent frameworks: RPO and KTO. We evaluate RKO against common preference learning frameworks, including these two, as well as a baseline vanilla diffusion policy, on real-world cloth-folding tasks spanning multiple garments and preference settings. We show that preference-aligned policies (particularly RKO) achieve superior performance and sample efficiency compared to standard diffusion-policy fine-tuning. These results highlight the importance and feasibility of structured preference learning for scaling personalized robot behavior in complex deformable-object manipulation tasks.