Existing biclustering algorithms for finding feature-relation-based biclusters often depend on assumptions such as monotonicity or linearity. Although a few algorithms overcome this limitation by using density-based methods, they tend to miss many biclusters because they use global criteria to identify dense regions. The proposed method, RelDenClu, uses the local variations in the marginal and joint densities of each pair of features to find the subset of observations that forms the basis of the relation between them. It then finds the set of features connected by a common set of observations, which yields a bicluster. To show the effectiveness of the proposed methodology, experiments have been carried out on fifteen types of simulated datasets. The method has also been applied to six real-life datasets: for three of them it is used for unsupervised learning, and for the other three it is used as an aid to supervised learning. On all datasets, the performance of the proposed method is compared with that of seven state-of-the-art algorithms, and the proposed algorithm is seen to produce better results. The efficacy of the proposed algorithm is further demonstrated on a COVID-19 dataset, where it identifies some features (genetic, demographic and others) that are likely to affect the spread of COVID-19.
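The abstract describes the method only at a high level. The following Python sketch is a toy illustration of the two stages it names, not the RelDenClu algorithm itself. For each feature pair, it compares an equal-width histogram estimate of the joint density with the product of the marginals, cell by cell, and keeps the observations in cells that are denser than independence would predict. It then greedily grows a set of features that still share enough common observations. The histogram estimator, the thresholds `ratio`, `min_count` and `min_obs`, the greedy growth step, and all function names are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def grid_cells(values, bins):
    """Assign each value to one of `bins` equal-width histogram cells."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    return [min(int((v - lo) / width), bins - 1) for v in values]


def pair_support(x, y, bins=2, ratio=1.5, min_count=2):
    """Observations lying in cells where the joint count exceeds the
    count expected from the two marginals (independence) by `ratio`.
    Hypothetical density-comparison rule, for illustration only."""
    n = len(x)
    bx, by = grid_cells(x, bins), grid_cells(y, bins)
    joint = Counter(zip(bx, by))       # joint histogram
    mx, my = Counter(bx), Counter(by)  # marginal histograms
    dense = {
        cell for cell, count in joint.items()
        if count >= min_count
        and count / (mx[cell[0]] * my[cell[1]] / n) >= ratio
    }
    return {i for i in range(n) if (bx[i], by[i]) in dense}


def grow_bicluster(supports, n_features, min_obs):
    """Start from the best-supported pair, then greedily add features
    that keep at least `min_obs` observations shared with all members."""
    seed = max(supports, key=lambda pair: len(supports[pair]))
    features, rows = set(seed), set(supports[seed])
    for k in range(n_features):
        if k in features:
            continue
        candidate = set(rows)
        for f in features:
            candidate &= supports[(min(f, k), max(f, k))]
        if len(candidate) >= min_obs:
            features.add(k)
            rows = candidate
    return sorted(features), sorted(rows)


def find_bicluster(data, bins=2, ratio=1.5, min_count=2, min_obs=4):
    """`data` holds one list of n observations per feature."""
    supports = {
        (i, j): pair_support(data[i], data[j], bins, ratio, min_count)
        for i, j in combinations(range(len(data)), 2)
    }
    return grow_bicluster(supports, len(data), min_obs)


# Toy data: features 0, 1 and 3 agree on most observations; feature 2 is noise.
DATA = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
]

features, rows = find_bicluster(DATA)
print(features, rows)  # [0, 1, 3] [0, 1, 2, 4, 5, 6]
```

On the toy data, observations 3 and 7 break the relation between the related features and feature 2 is unrelated noise, so both are left out of the bicluster. The greedy growth step keeps the sketch short; it is not guaranteed to find the largest bicluster.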