Many recent improvements in NLP stem from the development and use of large pre-trained language models (PLMs) with billions of parameters. Large model sizes make computational cost one of the main limiting factors for training and evaluating such models, and have raised severe concerns about the sustainability, reproducibility, and inclusiveness of PLM research. These concerns are often based on personal experiences and observations. However, there have not been any large-scale surveys that investigate them. In this work, we provide a first attempt to quantify these concerns regarding three topics, namely, environmental impact, equity, and impact on peer reviewing. By conducting a survey with 312 participants from the NLP community, we capture existing (dis)parities between and within groups with respect to seniority, academia, and industry, as well as their impact on the peer-reviewing process. For each topic, we provide an analysis and devise recommendations to mitigate the disparities we found, some of which have already been successfully implemented. Finally, we discuss additional concerns raised by many participants in free-text responses.