To broaden the dissemination of scientific knowledge to diverse audiences, scientific document summarization must simultaneously control multiple attributes such as length and empirical focus. However, existing research typically focuses on controlling single attributes, leaving the compositional control of multiple attributes underexplored. To address this gap, we introduce CCSBench, a benchmark for compositional controllable summarization in the scientific domain. Our benchmark enables fine-grained control over both explicit attributes (e.g., length), which are objective and straightforward, and implicit attributes (e.g., empirical focus), which are more subjective and conceptual. We conduct extensive experiments on GPT-4, LLaMA2, and other popular LLMs under various settings. Our findings reveal significant limitations in large language models' ability to balance trade-offs between control attributes, especially implicit ones that require deeper understanding and abstract reasoning.