Semantic answer type prediction (SMART) is known to be a useful step towards effective question answering (QA) systems. The SMART task involves predicting the top-$k$ knowledge graph (KG) types for a given natural language question, which is challenging because KGs contain a large number of types. In this paper, we propose applying extreme multi-label classification with Transformer models (XBERT) to this task, clustering KG types using structural and semantic features based on question text. Specifically, we improve the clustering stage of the XBERT pipeline using textual and structural features derived from KGs. We show that these features improve end-to-end performance on the SMART task and yield state-of-the-art results.
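To make the idea of clustering KG types on combined textual and structural features concrete, the following is a minimal sketch and not the authors' implementation. The toy type hierarchy, the feature prefixes `txt:` and `kg:`, the function names, and the threshold are all hypothetical. A simple greedy Jaccard grouping stands in here for the label clustering stage of the XBERT pipeline, whose actual algorithm and features are not specified in this abstract.

```python
import re

# Hypothetical toy type hierarchy (child -> parent); not taken from the paper.
PARENT = {
    "dbo:Place": None,
    "dbo:City": "dbo:Place",
    "dbo:Country": "dbo:Place",
    "dbo:Person": None,
    "dbo:Scientist": "dbo:Person",
    "dbo:Politician": "dbo:Person",
}


def textual_features(kg_type):
    """Lower-cased tokens of the type's local name (assumed textual feature)."""
    local_name = kg_type.split(":")[-1]
    return {"txt:" + tok.lower() for tok in re.findall(r"[A-Z][a-z]*", local_name)}


def structural_features(kg_type):
    """The type itself plus all of its ancestors in the hierarchy."""
    feats = set()
    node = kg_type
    while node is not None:
        feats.add("kg:" + node)
        node = PARENT.get(node)
    return feats


def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_types(types, threshold=0.15):
    """Greedy grouping: join the first cluster whose seed is similar enough,
    otherwise start a new cluster. A stand-in for a real clustering stage."""
    clusters = []  # list of (seed_features, members)
    for t in sorted(types):
        feats = textual_features(t) | structural_features(t)
        for seed_feats, members in clusters:
            if jaccard(feats, seed_feats) >= threshold:
                members.append(t)
                break
        else:
            clusters.append((feats, [t]))
    return [members for _, members in clusters]


clusters = cluster_types(PARENT)
```

In this toy setting the place-like and person-like types end up in separate clusters because they share ancestors in the hierarchy. In an extreme multi-label pipeline, a classifier would then first match a question to a cluster and rank only the types inside the matched clusters, which is what keeps top-$k$ prediction tractable when the number of types is large.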