Question Under Discussion (QUD) is a discourse framework that uses implicit questions to reveal discourse relationships between sentences. In QUD parsing, each sentence is viewed as an answer to a question triggered by an anchor sentence in the prior context. The resulting QUD structure is required to conform to several theoretical criteria, such as answer compatibility (how well the question is answered), making QUD parsing a challenging task. Previous works construct QUD parsers in a pipelined manner (i.e., first detecting the anchor sentence in context and then generating the question). However, these parsers lack a holistic view of the task and struggle to satisfy all the criteria. In this work, we introduce QUDSELECT, a joint-training framework that selectively decodes QUD dependency structures while taking the QUD criteria into account. Using instruction tuning, we train models to simultaneously predict the anchor sentence and generate the associated question. To explicitly incorporate the criteria, we adopt a selective decoding strategy: sampling multiple QUD candidates during inference, then selecting the best one with criteria scorers. Our method outperforms the state-of-the-art baseline models by 9% in human evaluation and 4% in automatic evaluation, demonstrating the effectiveness of our framework.