We study the problem of learning online packing skills for irregular 3D shapes, arguably the most challenging setting of the bin packing problem. The goal is to move a sequence of arbitrarily shaped 3D objects into a designated container one at a time, with only partial observation of the object sequence. We also account for physical realizability, which involves the physics dynamics and constraints of a placement. The packing policy must understand the 3D geometry of the object to be packed and decide how to accommodate it in the container in a physically realizable way. We propose a Reinforcement Learning (RL) pipeline to learn this policy. Complex irregular geometry and imperfect object placement together lead to a huge solution space, and training directly in such a space is prohibitively data intensive. We instead propose a theoretically provable method for candidate action generation that reduces the RL action space and the learning burden. A parameterized policy is then learned to select the best placement from the candidates. Equipped with an efficient asynchronous RL acceleration method and a data preparation process for simulation-ready training sequences, a mature packing policy can be trained in a physics-based environment within 48 hours. Through extensive evaluation on a variety of real-life shape datasets and comparisons with state-of-the-art baselines, we demonstrate that our method outperforms the best-performing baseline on all datasets by at least 12.8% in terms of packing utility.
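The two-stage decision described in the abstract (generate a reduced set of candidate placements, then let a learned policy choose among them) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `(x, y, rotation)` placement tuple, `select_placement`, and the hand-written `toy_score` that stands in for the learned parameterized policy. The abstract does not specify the action parameterization, the candidate generator, or the network, so this shows only the general pattern of selecting from a discrete candidate set and is not the authors' implementation.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical placement encoding (x, y, rotation in degrees); the abstract
# does not state how placements are actually parameterized.
Placement = Tuple[float, float, float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over candidate scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_placement(
    candidates: Sequence[Placement],
    score_fn: Callable[[Placement], float],
    rng: random.Random,
    greedy: bool = False,
) -> Tuple[int, List[float]]:
    """Choose one candidate index and return it with the action distribution.

    The policy only ever ranks the given candidates, so the action space is
    the candidate list rather than the full continuous placement space.
    """
    if not candidates:
        raise ValueError("no feasible candidate placements")
    probs = softmax([score_fn(c) for c in candidates])
    if greedy:
        # Evaluation mode: take the highest-probability candidate.
        index = max(range(len(probs)), key=probs.__getitem__)
    else:
        # Training mode: sample so that the policy keeps exploring.
        index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, probs


def toy_score(p: Placement) -> float:
    """Assumed stand-in for a learned policy: prefer the origin corner."""
    return -(p[0] + p[1])


# Assumed candidate set; in practice a candidate generator would supply it.
candidates = [(0.4, 0.1, 0.0), (0.0, 0.0, 90.0), (0.2, 0.3, 180.0)]
best, probs = select_placement(candidates, toy_score, random.Random(0), greedy=True)
sampled, _ = select_placement(candidates, toy_score, random.Random(0))
```

In this sketch the greedy branch would be used at evaluation time and the sampling branch during training; replacing `toy_score` with a trained network leaves the selection logic unchanged.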