Summarizing multiple disaster-relevant data streams simultaneously is particularly challenging because existing retrieve-and-re-rank strategies suffer from the inherent redundancy of multi-stream data and scale poorly in a multi-query setting. This work proposes an online approach to crisis timeline generation based on weak annotation with Deep Q-Networks. It selects relevant pieces of text on the fly, requiring neither human annotations nor content re-ranking, which makes the inference time independent of the number of input queries. The proposed approach also incorporates a redundancy filter into the reward function to effectively handle cross-stream content overlaps. On the CrisisFACTS 2022 benchmark, the achieved ROUGE and BERTScore results are superior to those of the best-performing models.
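To make the idea of a redundancy-aware reward concrete, the following is a minimal sketch and not the method of this work. The abstract does not specify the form of the reward, so everything here is an assumption made for illustration: the names `reward`, `select_online`, `redundancy_weight`, and `threshold` are hypothetical, Jaccard overlap on word sets stands in for a learned text representation, and a greedy threshold rule stands in for the policy that a trained Deep Q-Network would provide.

```python
import re


def tokens(text):
    """Lowercased word set; a crude stand-in for a learned text encoder."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reward(candidate, weak_label, selected, redundancy_weight=1.0):
    """Hypothetical reward: weak-label relevance minus a redundancy penalty.

    relevance  = overlap between the candidate and a weak annotation
    redundancy = highest overlap with any text already on the timeline
    """
    cand = tokens(candidate)
    relevance = jaccard(cand, tokens(weak_label))
    redundancy = max((jaccard(cand, tokens(s)) for s in selected), default=0.0)
    return relevance - redundancy_weight * redundancy


def select_online(stream, weak_label, threshold=0.1):
    """Single pass over the stream; keeps a text if its reward is high enough.

    Assumption: this greedy rule replaces the learned Q-function, so the
    sketch only illustrates how the redundancy term shapes the selection.
    """
    selected = []
    for text in stream:
        if reward(text, weak_label, selected) > threshold:
            selected.append(text)
    return selected


stream = [
    "Wildfire forces evacuation of 500 homes",
    "wildfire forces evacuation of 500 homes",  # same report from a second stream
    "Local bakery wins award",
]
timeline = select_online(stream, "wildfire evacuation homes")
```

In this toy run the duplicated report from the second stream is rejected because its redundancy penalty outweighs its relevance, and the off-topic text is rejected because its relevance is zero. The selection loop makes one pass over the stream with no re-ranking step.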