Archetypal analysis is a matrix factorization method with convexity constraints. Because the objective has local minima, a good initialization is essential, but frequently used initialization methods either yield sub-optimal starting points or are prone to getting stuck in poor local minima. In this paper, we propose archetypal analysis++ (AA++), a probabilistic initialization strategy for archetypal analysis that sequentially samples points based on their influence on the objective, similar to $k$-means++. In fact, we argue that $k$-means++ already approximates the proposed initialization method. Furthermore, we suggest adapting an efficient Monte Carlo approximation of $k$-means++ to AA++. In an extensive empirical evaluation on 13 real-world data sets of varying sizes and dimensionalities, considering two pre-processing strategies, we show that AA++ nearly always outperforms all baselines, including the most frequently used ones.
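To make the comparison with $k$-means++ concrete, the following is a minimal sketch of standard $k$-means++ seeding ($D^2$ sampling), the procedure the abstract says already approximates AA++. It is not the AA++ procedure itself: AA++ samples points according to their influence on the archetypal analysis objective, which is not spelled out here. The function names `squared_distance` and `kmeanspp_seeding` and the `rng` parameter are illustrative assumptions, not names from the paper.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeding(points, k, rng=None):
    """Return indices of k seed points chosen by D^2 sampling (k-means++).

    Illustrative sketch only: this is plain k-means++ seeding,
    not the AA++ initialization proposed in the paper.
    """
    rng = rng or random.Random()
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(points)")

    # First seed: chosen uniformly at random.
    chosen = [rng.randrange(n)]
    # min_sq[i] is the squared distance from point i to its closest seed.
    min_sq = [squared_distance(p, points[chosen[0]]) for p in points]

    while len(chosen) < k:
        if sum(min_sq) == 0.0:
            # Every point coincides with a seed; fall back to a uniform
            # choice among the indices not selected yet.
            nxt = rng.choice([i for i in range(n) if i not in chosen])
        else:
            # Sample proportionally to the squared distance to the
            # closest seed; already chosen points have weight zero.
            nxt = rng.choices(range(n), weights=min_sq, k=1)[0]
        chosen.append(nxt)
        # Update the closest-seed distances with the new seed.
        min_sq = [
            min(d, squared_distance(p, points[nxt]))
            for d, p in zip(min_sq, points)
        ]
    return chosen
```

For example, `kmeanspp_seeding([(0, 0), (0, 1), (10, 10), (10, 11), (-5, 7)], 3)` returns three distinct indices, and points far from the seeds chosen so far are the most likely to be picked next. Each round recomputes distances over the full data set, which is the cost that a Monte Carlo approximation of the kind mentioned above aims to avoid.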