Unlike the image and text domains, which benefit from abundant large-scale datasets, point cloud learning is often limited by the scarcity of large datasets. To overcome this limitation, we present Symmetria, a formula-driven dataset that can be generated at arbitrary scale. By construction, it guarantees precise ground truth, promotes data-efficient experimentation by requiring fewer samples, enables broad generalization across diverse geometric settings, and is easy to extend to new tasks and modalities. Using the concept of symmetry, we create shapes with known structure and high variability, enabling neural networks to learn point cloud features effectively. Our results demonstrate that this dataset is highly effective for self-supervised pre-training on point clouds: the resulting models perform strongly on downstream tasks such as classification and segmentation and also show good few-shot learning capabilities. Additionally, our dataset can support fine-tuning models to classify real-world objects, highlighting the practical utility of our approach. We also introduce a challenging symmetry detection task and provide a benchmark for baseline comparisons. A significant advantage of our approach is that the dataset and accompanying code are publicly available and that very large collections can be generated, promoting further research and innovation in point cloud learning.
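To make the idea of a formula-driven shape with ground truth known by construction concrete, the following is a minimal sketch and not the Symmetria generator itself. It assumes a single reflective symmetry plane, uniformly sampled points, and a random rigid transform. The helper names `random_rotation`, `make_symmetric_cloud`, and `reflect` are hypothetical and chosen only for illustration.

```python
import numpy as np


def random_rotation(rng):
    """Draw a random 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so the factorization is unique
    if np.linalg.det(q) < 0:     # flip one column to get a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q


def make_symmetric_cloud(n_half=512, seed=0):
    """Return (points, normal, offset) for a cloud that is symmetric about
    the plane {x : normal . x = offset}. Illustrative assumption only."""
    rng = np.random.default_rng(seed)
    # Sample one half of the shape on the non-negative side of the plane x = 0.
    half = rng.uniform(-1.0, 1.0, size=(n_half, 3))
    half[:, 0] = np.abs(half[:, 0])
    # Mirror across x = 0 to obtain the other half.
    mirrored = half * np.array([-1.0, 1.0, 1.0])
    points = np.vstack([half, mirrored])
    # Apply a random rigid transform; the symmetry plane moves with the shape.
    rot = random_rotation(rng)
    shift = rng.uniform(-0.5, 0.5, size=3)
    points = points @ rot.T + shift
    normal = rot @ np.array([1.0, 0.0, 0.0])
    offset = float(normal @ shift)
    return points, normal, offset


def reflect(points, normal, offset):
    """Reflect points across the plane normal . x = offset (unit-length normal)."""
    dist = points @ normal - offset
    return points - 2.0 * dist[:, None] * normal[None, :]
```

Because the plane parameters come from the same formula that produced the points, the label is exact: reflecting the first half of the cloud across the returned plane reproduces the second half up to floating-point error. This is the sense in which a generator of this kind can supply labels for a symmetry detection task without manual annotation.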