There are pronounced differences in the extent to which industrial and academic AI labs use computing resources. We provide a data-driven survey of the role of the compute divide in shaping machine learning research. We show that the compute divide has coincided with a reduced representation of academic-only research teams in compute-intensive research topics, especially foundation models. We argue that academia will likely play a smaller role in advancing the associated techniques, in providing critical evaluation and scrutiny, and in the diffusion of such models. Concurrent with this change in research focus, there is a noticeable shift in academic research towards embracing open-source, pre-trained models developed within industry. To address the challenges arising from this trend, especially reduced scrutiny of influential models, we recommend approaches aimed at thoughtfully expanding academic insights. Nationally sponsored computing infrastructure coupled with open science initiatives could judiciously boost academic compute access, prioritizing research on interpretability, safety, and security. Structured access programs and third-party auditing may also allow measured external evaluation of industry systems.