Background: Deriving feature rankings is essential in bioinformatics studies because the ordered features guide subsequent research. Feature rankings may be distorted by influential points (IPs), but such effects are rarely discussed in previous studies. This study aimed to investigate the impact of IPs on feature rankings and to propose a new method for detecting them.

Method: This study used a case-deletion (i.e., leave-one-out) approach to assess the impact of individual cases. The influence of a case was measured by comparing the feature rankings obtained before and after deleting that case. We proposed a rank comparison method using adaptive top-prioritized weights that emphasize rank changes among the top-ranked features. The weights adapt to the distribution of rank changes.

Results: Potential IPs were observed in several datasets. The presence of IPs could significantly alter the results of downstream analyses (e.g., enriched pathways), suggesting that IP detection is necessary when deriving feature rankings. Compared with existing methods, the new rank comparison method could identify rank changes of important (top-ranked) features because its adaptive weights are adjusted to the distribution of rank changes.

Conclusions: IP detection should be routinely performed when deriving feature rankings. The new method for IP detection exhibited favorable properties compared with existing methods.
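The case-deletion idea described in the Method section can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the ranking statistic (absolute difference in group means), the function names `rank_features`, `weighted_rank_change`, and `leave_one_out_influence`, the toy data, and in particular the weight `1 / min(rank before, rank after)`. That weight is a fixed top-prioritized stand-in; the adaptive weights proposed in the study adjust to the distribution of rank changes, and their form is not given in the abstract.

```python
from statistics import mean


def rank_features(samples, labels):
    """Rank features by |mean(group 1) - mean(group 0)|; rank 1 = largest.

    Hypothetical ranking statistic, chosen only to keep the sketch short.
    """
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        g0 = [s[j] for s, y in zip(samples, labels) if y == 0]
        g1 = [s[j] for s, y in zip(samples, labels) if y == 1]
        scores.append(abs(mean(g1) - mean(g0)))
    # Sort feature indices by descending score (ties keep index order).
    order = sorted(range(n_features), key=lambda j: -scores[j])
    ranks = [0] * n_features
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def weighted_rank_change(before, after):
    """Sum of absolute rank changes, weighted toward top-ranked features.

    Assumed fixed weight 1 / min(rank before, rank after); NOT the
    adaptive weights of the study.
    """
    return sum(abs(b - a) / min(b, a) for b, a in zip(before, after))


def leave_one_out_influence(samples, labels):
    """Influence of each case: weighted rank change after deleting it."""
    full_ranks = rank_features(samples, labels)
    influence = []
    for i in range(len(samples)):
        kept_samples = samples[:i] + samples[i + 1:]
        kept_labels = labels[:i] + labels[i + 1:]
        reduced_ranks = rank_features(kept_samples, kept_labels)
        influence.append(weighted_rank_change(full_ranks, reduced_ranks))
    return influence


# Toy data (made up): 6 cases x 3 features; the last case carries an
# outlying value in feature 1 that pushes that feature to the top rank.
samples = [
    [0, 0, 0], [0, 0, 0], [0, 0, 0],  # group 0
    [2, 0, 1], [2, 0, 1], [2, 9, 1],  # group 1
]
labels = [0, 0, 0, 1, 1, 1]

influence = leave_one_out_influence(samples, labels)
most_influential = influence.index(max(influence))
print(influence, most_influential)
```

In this toy example only the deletion of the last case changes the ranking, so it receives the largest influence score. Dividing by the smaller of the two ranks makes a move into or out of the top positions count more than the same move near the bottom, which mirrors the top-prioritized intent described above without reproducing the adaptive scheme.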