The volume of image repositories continues to grow. Despite the availability of content-based addressing, we still lack a lightweight tool for discovering images with distinctive characteristics in a large collection. In this paper, we propose a fast and training-free algorithm for novel image discovery. The key to our algorithm is to formulate a collection of images as a perceptual distance-weighted graph, within which our task is to locate the K-densest subgraph that corresponds to a subset of the most unique images. Solving this problem is not only NP-hard but also requires a full computation of the potentially huge distance matrix, so we relax it into a K-sparse eigenvector problem that we can solve efficiently using stochastic gradient descent (SGD) without explicitly computing the distance matrix. We compare our algorithm against state-of-the-art methods on both synthetic and real datasets, showing that it runs considerably faster and with a smaller memory footprint while mining novel images more accurately.
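To make the relaxation concrete, the following is a minimal sketch of a K-sparse eigenvector search by stochastic gradient ascent. It is not the authors' implementation. It assumes that each image is already represented by a feature vector and uses the squared Euclidean distance between features as a stand-in for the perceptual distance. The function name `sparse_novelty_scores`, its parameters, and the hard-thresholding projection are illustrative choices. Each iteration evaluates only the distance-matrix columns of a sampled mini-batch, so the full matrix is never formed.

```python
import numpy as np


def sparse_novelty_scores(X, k, n_iters=200, batch_size=32, step=0.1, seed=0):
    """Approximate a k-sparse leading eigenvector of the distance matrix W.

    X is an (n, d) array of image features (assumed given). W[i, j] is the
    squared Euclidean distance between rows i and j, but only the columns
    of a sampled mini-batch are computed in each iteration.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sq_norms = np.einsum("ij,ij->i", X, X)   # ||x_i||^2 for every row
    v = np.full(n, 1.0 / np.sqrt(n))         # dense, unit-norm start

    for _ in range(n_iters):
        batch = rng.choice(n, size=min(batch_size, n), replace=False)
        if not np.any(v[batch]):
            continue                         # batch missed the support: zero gradient
        # Distance columns for the batch only, shape (n, |batch|).
        d_cols = sq_norms[:, None] + sq_norms[batch][None, :] - 2.0 * X @ X[batch].T
        np.maximum(d_cols, 0.0, out=d_cols)  # guard against round-off negatives
        # Stochastic estimate of W @ v, up to a positive scale factor.
        grad = d_cols @ v[batch]
        grad_norm = np.linalg.norm(grad)
        if grad_norm == 0.0:
            continue
        v = v + step * grad / grad_norm      # normalized ascent step
        # Project onto k-sparse unit vectors: keep the k largest entries.
        keep = np.argpartition(v, -k)[-k:]
        sparse = np.zeros(n)
        sparse[keep] = v[keep]
        v = sparse / np.linalg.norm(sparse)
    return v


# Synthetic usage: 200 clustered points plus 5 mutually distant outliers.
rng = np.random.default_rng(1)
inliers = rng.normal(size=(200, 16))
outliers = 30.0 * np.eye(16)[:5] + rng.normal(size=(5, 16))
X = np.vstack([inliers, outliers])
v = sparse_novelty_scores(X, k=5)
print(np.flatnonzero(v))                     # indices of the selected images
```

The nonzero entries of the returned vector mark the selected subset. Because the objective rewards large pairwise distances inside the subset, this sketch favors points that are far from each other as well as from the bulk of the collection.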