Cluster analysis is the task of assigning objects to groups that ideally exhibit some desirable characteristics. When the cluster structure is confined to a subset of the feature space, traditional clustering techniques face considerable challenges. We present an information-theoretic framework that overcomes the problems associated with sparse data, allowing for joint feature weighting and clustering. Simulations on synthetic data show that our proposal is a competitive alternative to existing clustering algorithms for sparse data. We further demonstrate the effectiveness of our method through an application to a real-world genomics data set.