Although deep learning models have taken on commercial and political relevance, key aspects of their training and operation remain poorly understood. This has sparked interest in science-of-deep-learning projects, many of which require large amounts of time, money, and electricity. But how much of this research really needs to occur at scale? In this paper, we introduce MNIST-1D: a minimalist, procedurally generated, low-memory, and low-compute alternative to classic deep learning benchmarks. Although the dimensionality of MNIST-1D is only 40 and its default training set size only 4000, MNIST-1D can be used to study inductive biases of different deep architectures, find lottery tickets, observe deep double descent, metalearn an activation function, and demonstrate guillotine regularization in self-supervised learning. All these experiments can be conducted on a GPU, or often even on a CPU, within minutes, allowing for fast prototyping, educational use cases, and cutting-edge research on a low budget.