Document-level Information Extraction (DocIE) aims to produce an output template containing the entities and relations of interest that occur in a given document. Standard practice is to prompt decoder-only LLMs with greedy decoding to avoid output variability. Rather than treating this variability as a limitation, we show that sampling can yield substantially better solutions than greedy decoding, especially with reasoning models. We therefore propose ThinkTwice, a sampling-and-selection framework in which the LLM generates multiple candidate templates for a given document and a selection module chooses the most suitable one. We introduce both an unsupervised selection method that exploits agreement across generated outputs and a supervised method based on reward models trained on labeled DocIE data. To address the scarcity of gold reasoning trajectories for DocIE, we propose a rejection-sampling-based method that generates silver training data pairing output templates with reasoning traces. Our experiments demonstrate the effectiveness of both unsupervised and supervised ThinkTwice, which consistently outperform greedy baselines and the state of the art.
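The unsupervised selection step can be illustrated with a minimal sketch. Assuming each candidate template is represented as a set of (head, relation, tail) triples, one simple agreement criterion is to score each candidate by its mean triple-level F1 against all other sampled candidates and keep the highest-scoring one. The function names `triple_f1` and `select_by_agreement` are hypothetical; the abstract does not specify the exact agreement metric.

```python
def triple_f1(a: set, b: set) -> float:
    """F1 overlap between two sets of (head, relation, tail) triples."""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    tp = len(a & b)          # triples both candidates agree on
    p, r = tp / len(a), tp / len(b)
    return 2 * p * r / (p + r) if p + r else 0.0

def select_by_agreement(candidates: list) -> int:
    """Return the index of the candidate whose mean F1 against all
    other candidates is highest (self-consistency-style selection)."""
    scores = []
    for i, ci in enumerate(candidates):
        others = [triple_f1(ci, cj) for j, cj in enumerate(candidates) if j != i]
        scores.append(sum(others) / len(others))
    return max(range(len(candidates)), key=scores.__getitem__)

# Hypothetical example: three sampled templates for one document.
candidates = [
    {("Marie Curie", "born_in", "Warsaw"), ("Marie Curie", "field", "physics")},
    {("Marie Curie", "born_in", "Warsaw"), ("Marie Curie", "field", "chemistry")},
    {("Albert Einstein", "born_in", "Ulm")},  # an outlier sample
]
best = select_by_agreement(candidates)  # one of the two agreeing candidates
```

The outlier candidate shares no triples with the others and is never selected; a reward model trained on labeled DocIE data would replace the mean-F1 score in the supervised variant.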