We train a language model to generate LEGO-brick build sequences. While prior work has been restricted to discrete, voxel-like towers, we consider a much broader set of pieces, encompassing thousands of part types with diverse connection semantics. To enable this, we first collect a large-scale dataset of over 100,000 human-designed LDraw brick objects and scenes. The complexity of our setting makes it challenging to autoregressively assemble structures that satisfy physical constraints. When block poses are predicted directly, build sequences become invalid after only a small number of steps. Although pieces are placed in 3D space, it is the spatial relationships between the parts that define the whole. With this in mind, we design a graph-based program representation that parametrizes structure through connectivity, improving the physical grounding of generated sequences. To enable future applications, we make our dataset and models available for research purposes at https://kulits.github.io/BrickNet
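To illustrate the idea of parametrizing structure through connectivity rather than absolute pose, the following is a minimal toy sketch and not the representation used in this work. It assumes translation-only attachment on an integer grid, and the `Attach` type, the `resolve_positions` helper, the part names, and the connector offsets are all hypothetical. Each step names an already placed part and a pair of connectors to join; absolute positions are then derived from those relations instead of being predicted directly.

```python
from dataclasses import dataclass

# Hypothetical toy types for illustration; not the paper's actual format.
Vec = tuple[int, int, int]


@dataclass(frozen=True)
class Attach:
    """One build step: connect a new part to an already placed part."""
    part: str         # part type of the new piece (made-up names below)
    parent: int       # index of the placed piece to attach to (-1 for the root)
    parent_conn: Vec  # connector offset on the parent, in the parent's frame
    child_conn: Vec   # connector offset on the new piece, in its own frame


def resolve_positions(steps: list[Attach]) -> list[Vec]:
    """Derive absolute positions from connectivity (translation only)."""
    positions: list[Vec] = []
    for i, step in enumerate(steps):
        if step.parent == -1:
            # The root piece anchors the coordinate system.
            positions.append((0, 0, 0))
            continue
        if not 0 <= step.parent < i:
            raise ValueError(f"step {i} attaches to an unplaced part")
        px, py, pz = positions[step.parent]
        ax, ay, az = step.parent_conn
        cx, cy, cz = step.child_conn
        # The two connectors must coincide in world space.
        positions.append((px + ax - cx, py + ay - cy, pz + az - cz))
    return positions


steps = [
    Attach("brick_2x4", -1, (0, 0, 0), (0, 0, 0)),
    Attach("brick_2x2", 0, (1, 0, 3), (0, 0, 0)),
    Attach("plate_1x2", 1, (0, 1, 3), (0, 0, 0)),
]
print(resolve_positions(steps))  # [(0, 0, 0), (1, 0, 3), (1, 1, 6)]
```

In this toy setting, a step can only refer to a part that already exists, so every new piece is connected to the structure by construction. A sequence of free-form poses offers no such guarantee.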