Archetypal analysis is a matrix factorization method with convexity constraints. Because its objective has local minima, a good initialization is essential. Frequently used initialization methods either yield sub-optimal starting points or are prone to getting stuck in poor local minima. In this paper, we propose archetypal analysis++ (AA++), a probabilistic initialization strategy for archetypal analysis that sequentially samples points based on their influence on the objective, similar to $k$-means++. In fact, we argue that $k$-means++ already approximates the proposed initialization method. Furthermore, we suggest adapting an efficient Monte Carlo approximation of $k$-means++ to AA++. In an extensive empirical evaluation on 13 real-world data sets of varying sizes and dimensionalities, considering two pre-processing strategies, we show that AA++ almost always outperforms all baselines, including the most frequently used ones.
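To make the sequential sampling idea concrete, the following is a minimal sketch of standard $k$-means++ seeding, which the abstract names as an approximation of AA++. It is not the AA++ procedure itself: the abstract does not state AA++'s scoring rule, so the sketch assumes the $k$-means++ score, namely the squared distance of each point to its nearest already-chosen point. The names `sq_dist` and `d2_seeding` are hypothetical and chosen for illustration only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def d2_seeding(points, k, seed=None):
    """Return indices of k points chosen by k-means++-style seeding.

    Assumption: the sampling score is the squared distance to the nearest
    chosen point (the k-means++ rule), used here as a stand-in for the
    objective-based score that AA++ would use.
    """
    rng = random.Random(seed)
    n = len(points)

    # The first point is drawn uniformly at random.
    chosen = [rng.randrange(n)]

    # d2[i] holds the squared distance from point i to its nearest chosen point.
    d2 = [sq_dist(p, points[chosen[0]]) for p in points]

    while len(chosen) < k:
        if sum(d2) > 0:
            # Sample the next point with probability proportional to d2.
            idx = rng.choices(range(n), weights=d2, k=1)[0]
        else:
            # Every point coincides with a chosen one; fall back to uniform.
            idx = rng.randrange(n)
        chosen.append(idx)

        # Update the nearest-chosen-point distances with the new point.
        d2 = [min(d, sq_dist(p, points[idx])) for d, p in zip(d2, points)]

    return chosen
```

A call such as `d2_seeding(points, k=5, seed=0)` returns five row indices that could serve as initial archetypes. Points already chosen have a score of zero and are therefore not drawn again while any other point has a positive score.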