With the increasing demand for intelligent systems capable of operating in different contexts (e.g., users on the move), correctly interpreting user needs has become crucial for such systems to give consistent answers to user questions. The most effective approaches to this task come from natural language processing and the semantic expansion of terms. These techniques estimate the goal of an input query by reformulating it as an intent, commonly relying on textual resources built by exploiting semantic relations such as \emph{synonymy}, \emph{antonymy} and many others. The aim of this paper is to generate such resources using the labels of a given taxonomy as the source of information. The obtained resources are integrated into a plain classifier that reformulates a set of input queries as intents, and the effect of each relation is tracked in order to quantify its impact on the classification. As an extension, the best trade-off between improvement and introduced noise when combining such relations is evaluated. The assessment is made by generating the resources and their combinations and using them to tune the classifier that reformulates the user questions as labels. The evaluation uses a wide and varied taxonomy as a use case, exploiting its labels as the basis for the semantic expansion and producing several corpora with the purpose of improving pseudo-query estimation.