We aim to align how people think about objects with what machines perceive: object recognition, as performed by a machine, should follow a process resembling the one humans follow when thinking of an object associated with a certain concept. The ultimate goal is to build systems that can interact meaningfully with their users, describing what they perceive in the users' own terms. According to Lexical Semantics, humans organize the meaning of words in hierarchies, where the meaning of, e.g., a noun is defined in terms of a more general noun, its genus, and of one or more differentiating properties, its differentia. The main tenet of this paper is that object recognition should implement a hierarchical process that follows the hierarchical semantic structure used to define the meaning of words. We achieve this by implementing an algorithm which, for any object, recursively recognizes its visual genus and its visual differentia. In other words, the recognition of an object is decomposed into a sequence of steps, each of which recognizes the locally relevant visual features. This paper presents the algorithm and a first evaluation.
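The recursive genus/differentia idea can be illustrated with a minimal sketch. It is not the paper's algorithm: the `Concept` class, the `recognize` function, the toy hierarchy, and the feature names are all hypothetical, and the visual differentia are stood in for by simple boolean tests over an already-extracted feature dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for visual features extracted from an image.
Features = Dict[str, object]


@dataclass
class Concept:
    """A node in a genus/differentia hierarchy (illustrative only)."""
    name: str
    # Differentia: the test that separates this concept from its
    # siblings under the same genus.
    differentia: Callable[[Features], bool]
    # More specific concepts for which this concept is the genus.
    species: List["Concept"] = field(default_factory=list)


def recognize(genus: Concept, features: Features) -> List[str]:
    """Descend from a recognized genus to the most specific concept.

    At each step only the differentiae of the current genus's direct
    species are checked, i.e. only the locally relevant features.
    """
    path = [genus.name]
    for child in genus.species:
        if child.differentia(features):
            # The child becomes the new genus; recurse one level down.
            return path + recognize(child, features)
    # No species matched: the current genus is the best answer.
    return path


# Toy hierarchy with assumed feature names.
root = Concept("object", lambda f: True, [
    Concept("stringed instrument", lambda f: bool(f.get("has_strings")), [
        Concept("guitar", lambda f: f.get("played_by") == "plucking"),
        Concept("violin", lambda f: f.get("played_by") == "bowing"),
    ]),
    Concept("wind instrument", lambda f: bool(f.get("has_mouthpiece"))),
])

result = recognize(root, {"has_strings": True, "played_by": "bowing"})
print(result)  # ['object', 'stringed instrument', 'violin']
```

When no differentia holds at some level, the sketch stops at the last recognized genus, so an unfamiliar stringed instrument is still reported as a "stringed instrument" rather than being forced into a leaf class.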