The role of data in building AI systems has recently been magnified by the emerging concept of data-centric AI (DCAI), which advocates a fundamental shift from advancing models to ensuring data quality and reliability. Although our community has continuously invested in enhancing data in different aspects, these efforts are often isolated initiatives on specific tasks. To facilitate a collective initiative in our community and push DCAI forward, we draw a big picture and bring together three general missions: training data development, inference data development, and data maintenance. We provide a top-level discussion of representative DCAI tasks and share our perspectives. Finally, we list open challenges. More resources are summarized at https://github.com/daochenzha/data-centric-AI