With the emergence of large-scale models trained on diverse datasets, in-context learning has become a promising paradigm for multitasking, notably in natural language processing and image processing. However, its application to 3D point cloud tasks remains largely unexplored. In this work, we introduce Point-In-Context (PIC), a novel framework for 3D point cloud understanding via in-context learning. We address the technical challenge of effectively extending masked point modeling to 3D point clouds by introducing a Joint Sampling module, and we propose a vanilla version of PIC called Point-In-Context-Generalist (PIC-G). PIC-G is designed as a generalist model for various 3D point cloud tasks, with both inputs and outputs modeled as coordinates. In this paradigm, the challenging segmentation task is handled by assigning each category a label point with XYZ coordinates; the final prediction for each point is the category whose label point lies closest to the predicted coordinates. Because a fixed label-coordinate assignment generalizes poorly to novel classes, we propose two novel training strategies, In-Context Labeling and In-Context Enhancing, which together form an extended version of PIC named Point-In-Context-Segmenter (PIC-S), aimed at improving dynamic in-context labeling and model training. By utilizing dynamic in-context labels and extra in-context pairs, PIC-S achieves enhanced performance and generalization both within and across part segmentation datasets. PIC is a general framework: other tasks or datasets can be seamlessly introduced into it through a unified data format. We conduct extensive experiments to validate the versatility and adaptability of our proposed methods in handling a wide range of tasks and segmenting multiple datasets. PIC-S is capable of generalizing to unseen datasets and of performing novel part segmentation with customized prompts.
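The coordinate-based segmentation scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`fixed_label_points`, `dynamic_label_points`, `encode_labels`, `decode_labels`) are hypothetical; the fixed assignment places label points evenly along the diagonal of the cube [-1, 1]^3 purely for illustration; and the dynamic variant samples label points uniformly at random per in-context episode as a loose stand-in for In-Context Labeling. Decoding follows the abstract: each predicted point takes the category of its nearest label point.

```python
import numpy as np


def fixed_label_points(num_classes):
    """Hypothetical fixed assignment: one XYZ label point per category,
    spaced evenly along the diagonal of [-1, 1]^3 (illustrative choice only)."""
    t = np.linspace(-1.0, 1.0, num_classes)
    return np.stack([t, t, t], axis=1)  # shape (C, 3)


def dynamic_label_points(num_classes, rng):
    """Hypothetical dynamic assignment: label points resampled at random for
    each episode, shared by the prompt pair and the query in that episode.
    Note: random points may land close together; a real system would likely
    enforce a minimum separation."""
    return rng.uniform(-1.0, 1.0, size=(num_classes, 3))  # shape (C, 3)


def encode_labels(labels, label_points):
    """Turn per-point integer labels (N,) into target coordinates (N, 3)."""
    return label_points[labels]


def decode_labels(pred_xyz, label_points):
    """Assign each predicted coordinate (N, 3) to the category whose label
    point is nearest in Euclidean distance; returns integer labels (N,)."""
    diff = pred_xyz[:, None, :] - label_points[None, :, :]  # (N, C, 3)
    dist2 = (diff ** 2).sum(axis=-1)                        # (N, C)
    return dist2.argmin(axis=1)


if __name__ == "__main__":
    labels = np.array([0, 2, 1, 1, 0])
    lp = fixed_label_points(3)
    targets = encode_labels(labels, lp)
    # A slightly perturbed "prediction" still decodes to the original labels.
    print(decode_labels(targets + 0.1, lp))
```

Because dynamic label points are resampled per episode, the same part can receive different coordinates in different prompts; as we read the abstract, the intent is to keep the model from relying on a fixed class-to-coordinate mapping, which is what limits generalization to novel classes.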