Lay summarisation aims to jointly summarise and simplify a given text, thus making its content more comprehensible to non-experts. Automatic approaches for lay summarisation can provide significant value in broadening access to scientific literature, enabling greater interdisciplinary knowledge sharing and public understanding of research findings. However, current corpora for this task are limited in size and scope, hindering the development of broadly applicable data-driven approaches. To address these issues, we present two novel lay summarisation datasets, PLOS (large-scale) and eLife (medium-scale), each of which contains biomedical journal articles alongside expert-written lay summaries. We provide a thorough characterisation of our lay summaries, highlighting differing levels of readability and abstractiveness between datasets that can be leveraged to support the needs of different applications. Finally, we benchmark our datasets using mainstream summarisation approaches and perform a manual evaluation with domain experts, demonstrating the datasets' utility and casting light on the key challenges of this task.