Categorization is one of the basic tasks in machine learning and data analysis. Building on formal concept analysis (FCA), the present work starts from the observation that a given set of objects can be categorized in different ways, depending on the set of features used to classify them, and that different feature sets may yield better or worse categorizations relative to the task at hand. In turn, the (a priori) choice of one set of features over another might be subjective and express a certain epistemic stance (e.g. interests, relevance, preferences) of an agent or a group of agents, namely, their interrogative agenda. In this paper, we represent interrogative agendas as sets of features, and we explore and compare different ways to categorize objects with respect to different sets of features (agendas). We first develop a simple unsupervised FCA-based algorithm for outlier detection which uses the categorizations arising from different agendas. We then present a supervised meta-learning algorithm to learn suitable (fuzzy) agendas for categorization as sets of features with different weights or masses. Combining this meta-learning algorithm with the unsupervised outlier detection algorithm yields a supervised outlier detection algorithm. We show that these algorithms perform on par with commonly used outlier detection algorithms on commonly used outlier detection datasets. These algorithms provide both local and global explanations of their results.
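The idea of scoring outliers through categorizations induced by different agendas can be illustrated with a minimal sketch. It is not the algorithm of the paper: it assumes a binary formal context, takes the size of the smallest category (object concept extent) containing an object under each agenda as the signal, and combines agendas with fixed weights standing in for learned masses. The names `closure_size`, `outlier_scores`, and the toy data are hypothetical.

```python
def closure_size(context, obj, agenda):
    """Number of objects in the smallest category containing `obj`
    when only the features in `agenda` are taken into account."""
    # Intent of the object concept, restricted to the agenda.
    target = context[obj] & agenda
    # Extent: every object that has all of those features.
    return sum(1 for feats in context.values() if target <= feats)


def outlier_scores(context, agendas, weights=None):
    """Weighted average, over agendas, of 1 - |smallest category| / |objects|.
    Assumption: a small category under many agendas suggests an outlier."""
    n = len(context)
    if weights is None:
        # Uniform weights; a supervised step could learn these instead.
        weights = [1.0 / len(agendas)] * len(agendas)
    scores = {}
    for obj in context:
        scores[obj] = sum(
            w * (1.0 - closure_size(context, obj, frozenset(agenda)) / n)
            for agenda, w in zip(agendas, weights)
        )
    return scores


# Toy formal context (hypothetical data): object -> set of features it has.
context = {
    "a": frozenset({"small", "round"}),
    "b": frozenset({"small", "round"}),
    "c": frozenset({"small", "round"}),
    "d": frozenset({"large", "square"}),
}

# Three agendas: size only, shape only, and all features.
agendas = [
    {"small", "large"},
    {"round", "square"},
    {"small", "large", "round", "square"},
]

scores = outlier_scores(context, agendas)
most_anomalous = max(scores, key=scores.get)
print(most_anomalous, scores)
```

In this toy context the object `d` falls in a category of its own under every agenda, so it receives the highest score. The per-agenda terms also hint at how explanations could be read off: each term shows which agenda made an object look isolated.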