The ARR Responsible NLP Research checklist website states that the "checklist is designed to encourage best practices for responsible research, addressing issues of research ethics, societal impact and reproducibility." Answering the questions is an opportunity for authors to reflect on their work and to ensure that any shared scientific assets follow best practices. Ideally, considering a checklist before submission can favorably impact the writing of a research paper. However, previous research has shown that self-reported checklist responses do not always accurately represent papers. In this work, we introduce ConfReady, a retrieval-augmented generation (RAG) application that empowers authors to reflect on their work and assists them with conference checklists. To evaluate checklist assistants, we curate a dataset of 1,975 ACL checklist responses, analyze problems in human answers, and benchmark RAG and large language model (LLM) based systems on an evaluation subset. Our code is released under the AGPL-3.0 license on GitHub, with documentation covering the user interface and PyPI package.