Robotics and automation are increasingly influential in logistics but remain largely confined to traditional warehouses. In grocery retail, advancements such as cashier-less supermarkets exist, yet customers still pick and pack their groceries by hand. While robotics research has focused substantially on the bin picking problem, the task of packing objects and groceries has remained largely untouched. Packing grocery items in the right order is nonetheless crucial for preventing product damage; for example, heavy objects should not be placed on top of fragile ones. The exact criteria for the right packing order are hard to define, particularly given the huge variety of objects typically found in stores. In this paper, we introduce LLM-Pack, a novel approach for grocery packing. LLM-Pack leverages language and vision foundation models to identify groceries and to generate a packing sequence that mimics human packing strategies. LLM-Pack does not require dedicated training to handle new grocery items, and its modularity allows easy upgrades of the underlying foundation models. We extensively evaluate our approach to demonstrate its performance. We will make the source code of LLM-Pack publicly available upon publication of this manuscript.
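The modular structure described above (one component identifies the groceries, a second component produces the packing sequence) can be illustrated with a minimal sketch. This is not the LLM-Pack implementation: the names `Item`, `pack`, `fake_detector`, and `heuristic_planner` are hypothetical, the detector is a stub standing in for a vision foundation model, and the planner is a hand-written rule (sturdy and heavy items first) standing in for the language model. The sketch only shows how the two stages could be swapped independently.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Item:
    name: str
    weight_kg: float  # hypothetical attribute, used only by the stand-in planner
    fragile: bool     # hypothetical attribute, used only by the stand-in planner


# A detector maps a raw scene to the items found in it.
Detector = Callable[[object], List[Item]]
# A planner maps items to a packing order (first element is packed first,
# i.e. ends up at the bottom of the bag).
Planner = Callable[[Sequence[Item]], List[Item]]


def fake_detector(scene: object) -> List[Item]:
    """Stub for a vision model: the 'scene' is already a list of items."""
    return list(scene)


def heuristic_planner(items: Sequence[Item]) -> List[Item]:
    """Stand-in for a language model: sturdy before fragile, heavy before light."""
    return sorted(items, key=lambda it: (it.fragile, -it.weight_kg))


def pack(scene: object, detect: Detector, plan: Planner) -> List[str]:
    """Run the two stages in sequence and return item names in packing order."""
    return [item.name for item in plan(detect(scene))]


if __name__ == "__main__":
    scene = [
        Item("eggs", 0.6, True),
        Item("canned beans", 0.4, False),
        Item("milk", 1.0, False),
        Item("chips", 0.2, True),
    ]
    print(pack(scene, fake_detector, heuristic_planner))
```

Because `pack` depends only on the two callables, either stage could in principle be replaced (for instance by a call to a different foundation model) without touching the other, which is the kind of upgrade path the modularity claim refers to.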