We present the ACL OCL, a scholarly corpus derived from the ACL Anthology to support open scientific research in the computational linguistics domain. Integrating and enhancing previous versions of the ACL Anthology, the ACL OCL contributes metadata, PDF files, citation graphs, and additional structured full texts with sections, figures, and links to a large knowledge resource (Semantic Scholar). The ACL OCL spans seven decades and contains 73K papers and 210K figures. We show how the ACL OCL can be used to observe trends in computational linguistics. By detecting paper topics with a supervised neural model, we note that interest in "Syntax: Tagging, Chunking and Parsing" is waning and "Natural Language Generation" is resurging. Our dataset is available from HuggingFace (https://huggingface.co/datasets/WINGNUS/ACL-OCL).
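As an illustration of the kind of trend observation described above, the following minimal sketch computes each topic's share of papers per decade. It assumes every paper has already been assigned a publication year and a single topic label (for example, by a topic classifier). The function name `topic_share_by_decade` and the toy records are hypothetical and are not taken from the ACL OCL itself; the real corpus schema may differ.

```python
from collections import Counter, defaultdict


def topic_share_by_decade(papers):
    """Return {decade: {topic: share}} from an iterable of (year, topic) pairs.

    Hypothetical helper for illustration; not part of the ACL OCL release.
    """
    counts = defaultdict(Counter)
    for year, topic in papers:
        decade = (year // 10) * 10  # e.g. 1998 -> 1990
        counts[decade][topic] += 1

    shares = {}
    for decade, counter in sorted(counts.items()):
        total = sum(counter.values())
        # Normalise counts so shares within a decade sum to 1.
        shares[decade] = {topic: n / total for topic, n in counter.items()}
    return shares


# Toy records, invented purely to exercise the function (not real corpus data).
toy_papers = [
    (1995, "Syntax: Tagging, Chunking and Parsing"),
    (1998, "Syntax: Tagging, Chunking and Parsing"),
    (1999, "Natural Language Generation"),
    (2021, "Natural Language Generation"),
    (2022, "Natural Language Generation"),
    (2022, "Syntax: Tagging, Chunking and Parsing"),
    (2023, "Natural Language Generation"),
]

shares = topic_share_by_decade(toy_papers)
for decade, by_topic in shares.items():
    print(decade, by_topic)
```

Normalising within each decade, rather than comparing raw counts, keeps the comparison meaningful when the number of papers published per decade grows over time.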