Biclustering is widely used in fields such as gene information analysis, text mining, and recommender systems because it effectively discovers local correlations between samples and features. However, many biclustering algorithms break down on heavy-tailed data. In this paper, we propose a robust version of the convex biclustering algorithm based on the Huber loss. The newly introduced robustification parameter, however, adds to the burden of parameter selection. We therefore propose an efficient, tuning-free method that selects the robustification parameter automatically. A simulation study shows that the proposed method outperforms traditional biclustering methods under heavy-tailed noise. A real-life biomedical application is also presented. The R package RcvxBiclustr is available at https://github.com/YifanChen3/RcvxBiclustr.
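For intuition about why the Huber loss helps with heavy-tailed noise, the following minimal sketch compares it with the squared loss on a single residual. It uses the standard textbook definition of the Huber loss: quadratic for residuals no larger than the robustification parameter in absolute value, and linear beyond it. The names `huber_loss`, `squared_loss`, and `tau` are illustrative assumptions and are not taken from the RcvxBiclustr package, whose exact scaling and parameterization may differ.

```python
def squared_loss(residual: float) -> float:
    """Half squared error; grows quadratically, so outliers dominate."""
    return 0.5 * residual * residual


def huber_loss(residual: float, tau: float) -> float:
    """Standard Huber loss with robustification parameter tau (assumed name).

    Quadratic when |residual| <= tau, linear otherwise, so a single
    large residual contributes far less than under the squared loss.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    magnitude = abs(residual)
    if magnitude <= tau:
        return 0.5 * residual * residual
    return tau * magnitude - 0.5 * tau * tau


if __name__ == "__main__":
    # Compare the two losses on a small residual and on an outlier.
    for r in (0.5, 10.0):
        print(r, squared_loss(r), huber_loss(r, tau=1.0))
```

With `tau = 1.0`, the two losses agree on the small residual, while the outlier's contribution grows only linearly under the Huber loss. A smaller `tau` gives more robustness at the cost of more bias, which is why the choice of this parameter matters.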