Most Natural Language Processing systems rely on entity-based processing for tasks such as Information Extraction, Question Answering, and Text Summarization. A new challenge arises when entities play roles determined by their actions or attributes in a particular context. Entity Role Detection is the task of assigning such roles to entities. Real-world entities are usually of types such as person, location, and organization; roles can be considered domain-dependent subtypes of these types. When a subset of entities must be retrieved based on their roles, the problem becomes one of defining the roles and identifying the entities that hold them. This paper presents a study of solving the Entity Role Detection problem by modeling it as a Named Entity Recognition (NER) task and as an Entity Retrieval/Ranking task. In NER, the roles can be treated as mutually exclusive classes, and standard NER methods such as sequence tagging can be used. For Entity Retrieval, roles can be formulated as queries and entities as the collection against which the queries are executed. Entity Retrieval differs from document retrieval in that both the entities and the roles against which they are retrieved are described only indirectly. We have formulated automated ways of learning representative words and phrases and of building representations of roles and entities from them. We have also explored different contexts, such as the sentence and the document. Because roles depend on context, large domain-specific datasets or knowledge bases are not always available for learning, so we have tried to exploit the information in small datasets in a domain-agnostic way.
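As a rough illustration of the retrieval formulation described above, the sketch below treats a role as a query built from representative words and ranks entities by the similarity of their surrounding context to that query. It is a minimal sketch under stated assumptions, not the method used in the paper: the bag-of-words representation, the cosine scoring, the function names, and the toy role ("suspect") and entities are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def bag_of_words(texts):
    """Build a term-frequency vector from a list of strings (assumed representation)."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_entities(role_terms, entity_contexts):
    """Rank entities (the collection) against a role (the query).

    role_terms: representative words for the role (hypothetical, hand-picked here).
    entity_contexts: maps an entity name to the sentences that mention it.
    """
    query = bag_of_words(role_terms)
    scored = [
        (name, cosine(query, bag_of_words(contexts)))
        for name, contexts in entity_contexts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for illustration only.
role_terms = ["arrested", "charged", "accused"]
entity_contexts = {
    "Alice": ["Alice was arrested and charged on Monday"],
    "Bob": ["Bob reported the theft to police"],
}
ranking = rank_entities(role_terms, entity_contexts)
```

Here the entity whose context shares words with the role's representative terms ranks first. Sentence-level versus document-level context would correspond to what is placed in each entity's context list.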