Several recent works have proposed representing semantic relations with questions and answers, decomposing textual information into separate interrogative natural language statements. In this paper, we consider three QA-based semantic tasks (QA-SRL, QANom, and QADiscourse), each targeting a certain type of predication, and propose to regard them as jointly providing a comprehensive representation of textual information. To promote this goal, we investigate how best to utilize the power of sequence-to-sequence (seq2seq) pre-trained language models within the unique setup of semi-structured outputs, which consist of an unordered set of question-answer pairs. We examine different input and output linearization strategies, and assess the effect of multitask learning and of simple data augmentation techniques in the setting of imbalanced training data. Consequently, we release the first unified QASem parsing tool, practical for downstream applications that can benefit from an explicit, QA-based account of information units in a text.
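To make the notion of input and output linearization concrete, the following is a minimal sketch of one possible scheme, not the format used in this work. The separator tokens `<ans>` and `<pair>`, the `<predicate>` marker, the function names, and the choice to order pairs by where the answer appears in the sentence are all assumptions made for illustration. Because the target is an unordered set, some canonical order has to be imposed before a seq2seq model can be trained on it, and the decoded string has to be mapped back to a set for evaluation.

```python
from typing import List, Set, Tuple

QAPair = Tuple[str, str]  # (question, answer)

# Hypothetical separator tokens; the actual format is not specified in the abstract.
QA_SEP = " <ans> "
PAIR_SEP = " <pair> "


def linearize_input(tokens: List[str], predicate_index: int,
                    marker: str = "<predicate>") -> str:
    """Mark the target predicate by inserting a marker token before it."""
    marked = tokens[:predicate_index] + [marker] + tokens[predicate_index:]
    return " ".join(marked)


def linearize_output(pairs: List[QAPair], sentence: str) -> str:
    """Turn an unordered set of QA pairs into a single target string.

    Pairs are sorted by the character offset of the answer in the sentence
    (one possible canonical order); answers not found are placed last.
    """
    def sort_key(pair: QAPair) -> Tuple[int, str]:
        position = sentence.find(pair[1])
        return (position if position >= 0 else len(sentence), pair[0])

    ordered = sorted(pairs, key=sort_key)
    return PAIR_SEP.join(question + QA_SEP + answer for question, answer in ordered)


def delinearize_output(text: str) -> Set[QAPair]:
    """Parse a generated string back into a set of QA pairs.

    Chunks without an answer separator are treated as malformed and skipped.
    """
    pairs: Set[QAPair] = set()
    for chunk in text.split(PAIR_SEP):
        if QA_SEP not in chunk:
            continue
        question, answer = chunk.split(QA_SEP, 1)
        pairs.add((question.strip(), answer.strip()))
    return pairs


sentence = "The doctor treated the patient yesterday"
gold = [("Who was treated?", "the patient"), ("Who treated someone?", "The doctor")]

source = linearize_input(sentence.split(), predicate_index=2)
target = linearize_output(gold, sentence)
print(source)  # The doctor <predicate> treated the patient yesterday
print(target)  # Who treated someone? <ans> The doctor <pair> Who was treated? <ans> the patient
```

Ordering by answer position is only one of several plausible canonical orders; comparing predictions as sets after `delinearize_output` keeps the evaluation independent of whichever order is chosen.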