There are pronounced differences in the extent to which industrial and academic AI labs use computing resources. We provide a data-driven survey of the role of the compute divide in shaping machine learning research. We show that a compute divide has coincided with a reduced representation of academic-only research teams in compute-intensive research topics, especially foundation models. We argue that academia will likely play a smaller role in advancing the associated techniques, in providing critical evaluation and scrutiny, and in the diffusion of such models. Concurrent with this change in research focus, there is a noticeable shift in academic research towards embracing open-source, pre-trained models developed within industry. To address the challenges arising from this trend, especially reduced scrutiny of influential models, we recommend approaches aimed at thoughtfully expanding academic insights. Nationally sponsored computing infrastructure, coupled with open science initiatives, could judiciously boost academic compute access, prioritizing research on interpretability, safety, and security. Structured access programs and third-party auditing may also allow measured external evaluation of industry systems.