DataPerf: Benchmarks for Data-Centric AI Development

Mark Mazumder, Colby Banbury, Xiaozhe Yao, Bojan Karlaš, William Gaviria Rojas, Sudnya Diamos, Greg Diamos, Lynn He, Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Douwe Kiela, David Jurado, David Kanter, Rafael Mosquera, Juan Ciro, Lora Aroyo, Bilge Acun, Lingjiao Chen, Mehul Smriti Raje, Max Bartolo, Sabri Eyuboglu, Amirata Ghorbani, Emmett Goodman, Oana Inel, Tariq Kane, Christine R. Kirkpatrick, Tzu-Sheng Kuo, Jonas Mueller, Tristan Thrush, Joaquin Vanschoren, Margaret Warren, Adina Williams, Serena Yeung, Newsha Ardalani, Praveen Paritosh, Lilith Bat-Leah, Ce Zhang, James Zou, Carole-Jean Wu, Cody Coleman, Andrew Ng, Peter Mattson, Vijay Janapa Reddi

From arXiv, NeurIPS 2023 Datasets and Benchmarks Track

Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the fundamental importance of data has given rise to inaccuracy, bias, and fragility in real-world applications, and research is hindered by saturation across existing dataset benchmarks. In response, we present DataPerf, a community-led benchmark suite for evaluating ML datasets and data-centric algorithms. We aim to foster innovation in data-centric AI through competition, comparability, and reproducibility. We enable the ML community to iterate on datasets, instead of just architectures, and we provide an open, online platform with multiple rounds of challenges to support this iterative development. The first iteration of DataPerf contains five benchmarks covering a wide spectrum of data-centric techniques, tasks, and modalities in vision, speech, acquisition, debugging, and diffusion prompting, and we support hosting new contributed benchmarks from the community. The benchmarks, online evaluation platform, and baseline implementations are open source, and the MLCommons Association will maintain DataPerf to ensure long-term benefits to academia and industry.
