Given large data sets and sufficient compute, is it beneficial to design neural architectures for the structure and symmetries of each problem? Or is it more efficient to learn them from data? We study empirically how equivariant and non-equivariant networks scale with compute and training samples. Focusing on a benchmark problem of rigid-body interactions and on general-purpose transformer architectures, we perform a series of experiments, varying the model size, number of training steps, and dataset size. We find evidence for three conclusions. First, equivariance improves data efficiency, but training non-equivariant models with data augmentation can close this gap given sufficient epochs. Second, scaling with compute follows a power law, with equivariant models outperforming non-equivariant ones at each tested compute budget. Finally, the optimal allocation of a compute budget between model size and training duration differs for equivariant and non-equivariant models.
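As a rough illustration of the power-law compute scaling referred to above, the sketch below fits L(C) = a·C^(−b) to a handful of (compute, loss) pairs in log-log space. The numbers, variable names, and the choice of a simple least-squares fit are assumptions made for illustration; they are not the paper's data or methodology.

```python
import numpy as np

# Hypothetical (compute, test loss) points; illustrative only, not the paper's data.
compute = np.array([1e15, 1e16, 1e17, 1e18, 1e19])  # training FLOPs
loss = np.array([0.52, 0.31, 0.19, 0.12, 0.075])    # test loss

# A power law L(C) = a * C^(-b) is linear in log-log space:
# log L = log a - b * log C, so a straight-line fit recovers (a, b).
slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
b = -slope
a = np.exp(intercept)
print(f"exponent b ≈ {b:.3f}, prefactor a ≈ {a:.2f}")
```

Comparing the fitted exponent and prefactor for equivariant versus non-equivariant runs is one simple way to quantify the kind of compute-scaling gap the abstract describes.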