We propose a homogeneity test closely related to the concept of linear separability between two samples. The test can be used to decide whether a linear classifier is merely ``random'' or effectively captures differences between two classes. We focus on establishing upper bounds on the test's \emph{p}-value for two-dimensional samples. For normally distributed samples, we demonstrate experimentally that the upper bound is highly accurate. Using this bound, we evaluate classifiers designed to detect ER-positive breast cancer recurrence from gene pair expression. Our findings confirm the significance of the IGFBP6 and ELOVL5 genes in this process.
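To make the idea concrete, here is a minimal Python sketch of one test in this spirit. It rests on two assumptions that the abstract does not confirm. First, the test statistic is taken to be the smallest number of points that any line can misclassify in the pooled two-dimensional sample. Second, the p-value is estimated by random label permutations rather than by the analytic upper bound studied here. The function names `min_linear_errors` and `separability_pvalue` are hypothetical.

```python
import numpy as np


def min_linear_errors(X, y):
    """Fewest points of a 2-D sample misclassified by any half-plane rule.

    X: (n, 2) array of distinct points; y: length-n array of 0/1 labels.
    The order of the projections X @ w changes only when w is orthogonal to
    some difference X[i] - X[j], so one direction strictly inside each arc
    between consecutive critical angles covers every possible ordering.
    """
    n = len(y)
    i, j = np.triu_indices(n, k=1)
    d = X[i] - X[j]
    # Critical directions folded into [0, pi); w and -w are both handled
    # by trying the two labelings (rules A and B) below.
    crit = np.unique(np.mod(np.arctan2(d[:, 1], d[:, 0]) + np.pi / 2, np.pi))
    nxt = np.append(crit[1:], crit[0] + np.pi)  # includes the wrap-around arc
    pos_total = int(y.sum())
    k = np.arange(n + 1)
    best = n
    for t in (crit + nxt) / 2:
        w = np.array([np.cos(t), np.sin(t)])
        ys = y[np.argsort(X @ w)]
        # ones_left[k] = number of class-1 points among the k lowest projections
        ones_left = np.concatenate(([0], np.cumsum(ys)))
        # Rule A: k lowest projections -> class 0, the rest -> class 1.
        err_a = ones_left + (n - k) - (pos_total - ones_left)
        # Rule B: k lowest projections -> class 1, the rest -> class 0.
        err_b = (k - ones_left) + (pos_total - ones_left)
        best = min(best, int(err_a.min()), int(err_b.min()))
    return best


def separability_pvalue(X0, X1, n_perm=999, seed=0):
    """Monte Carlo permutation p-value (hypothetical helper): the fraction of
    random relabelings of the pooled sample that are at least as linearly
    separable (no more misclassified points) as the observed split."""
    X = np.vstack([X0, X1])
    y = np.r_[np.zeros(len(X0), dtype=int), np.ones(len(X1), dtype=int)]
    observed = min_linear_errors(X, y)
    rng = np.random.default_rng(seed)
    hits = sum(min_linear_errors(X, rng.permutation(y)) <= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Illustrative synthetic data only: two Gaussian clouds with shifted means.
    rng = np.random.default_rng(1)
    X0 = rng.normal(size=(15, 2))
    X1 = rng.normal(size=(15, 2)) + [1.5, 0.0]
    print(separability_pvalue(X0, X1, n_perm=199))
```

Exact enumeration over critical directions costs roughly O(n^3 log n) per statistic, and the permutation loop multiplies that by `n_perm`. This brute-force sketch is therefore practical only for small samples. That cost is one motivation for a closed-form bound on the p-value.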