Large Language Models (LLMs) can enhance their credibility and verifiability by generating text with citations. However, existing research on citation generation is largely limited to sentence-level statements, neglecting positional fine-grained citations that can appear anywhere within a sentence. To facilitate further exploration of positional fine-grained citation generation, we propose ALiiCE, the first automatic evaluation framework for this task. Our method employs a dependency-tree-based approach to parse a sentence-level claim into atomic claims. ALiiCE then evaluates citation quality with three metrics: positional fine-grained citation recall, positional fine-grained citation precision, and the coefficient of variation of citation positions. We evaluate the positional fine-grained citation generation performance of several LLMs on long-form QA datasets. Our experiments and analyses demonstrate the effectiveness and soundness of ALiiCE. We also offer insights into current advancements and future directions for the positional fine-grained citation generation task.
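To make the third metric concrete: the coefficient of variation (standard deviation divided by mean) of citation positions measures how dispersed citations are within generated sentences. The sketch below is a minimal illustration of that statistic, not the authors' implementation; the assumption that positions are token offsets of citation markers within a sentence is ours.

```python
import statistics


def citation_position_cv(positions):
    """Coefficient of variation of citation positions within a sentence.

    `positions` is assumed to be a list of token offsets where citation
    markers occur (an illustrative convention, not necessarily ALiiCE's).
    A lower CV means citations cluster at similar offsets; a higher CV
    means they are spread across the sentence.
    """
    if not positions:
        raise ValueError("need at least one citation position")
    mean = statistics.mean(positions)
    if mean == 0:
        return 0.0
    # Population standard deviation over the observed positions.
    return statistics.pstdev(positions) / mean


# Identical positions -> zero dispersion.
print(citation_position_cv([10, 10, 10]))  # 0.0
# Spread-out positions -> nonzero CV.
print(round(citation_position_cv([2, 4, 6]), 2))  # 0.41
```

For example, a model that always appends citations at sentence ends yields near-identical positions and a CV near zero, whereas one that cites mid-sentence where each atomic claim occurs produces a larger CV.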