$\textit{LogAUC}$, the standard quantitative metric for evaluating enrichment capacity, depends on a cutoff parameter that sets the minimum value of the log-scaled x-axis. Unless this parameter is chosen carefully for a given ROC curve, one of two problems occurs: either (1) some of the first inter-decoy intervals of the ROC curve are discarded and do not contribute to the metric at all, or (2) the very first inter-decoy interval contributes too much to the metric at the expense of all subsequent inter-decoy intervals. We fix this problem by showing a simple way to choose the cutoff parameter based on the number of decoys, which forces the first inter-decoy interval to always make a stable, sensible contribution to the total value. Moreover, we introduce a normalized version of LogAUC called $\textit{enrichment score}$, which (1) enforces stability by selecting the cutoff parameter in the manner described, (2) yields scores that are more intuitively meaningful, and (3) allows reliable, accurate comparison of the enrichment capacities exhibited by different ROC curves, even those produced using different numbers of decoys. Finally, we demonstrate the advantage of enrichment score over unbalanced metrics using data from a real retrospective docking study performed with the program $\textit{DOCK 3.7}$ on the target receptor TRYB1 from the $\textit{DUDE-Z}$ benchmark.
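The two failure modes of the cutoff can be illustrated with a minimal sketch. It assumes a step-function ROC curve (each decoy advances the x-axis by one over the number of decoys) and a LogAUC normalized by the full log-scaled width; the function names `log_auc` and `first_interval_share` are hypothetical, and the specific decoy-based cutoff rule is not stated in this abstract, so the cutoff is left as a free parameter here.

```python
import math


def log_auc(ranked_is_active, cutoff):
    """Area under a step ROC curve with a log10-scaled x-axis on [cutoff, 1].

    ranked_is_active: booleans ordered from best to worst docking score,
    True for an active and False for a decoy (assumed input format).
    The result is divided by the total log width, so it lies in [0, 1].
    """
    n_actives = sum(ranked_is_active)
    n_decoys = len(ranked_is_active) - n_actives
    if n_actives == 0 or n_decoys == 0:
        raise ValueError("need at least one active and one decoy")
    if not 0.0 < cutoff < 1.0:
        raise ValueError("cutoff must lie strictly between 0 and 1")

    area = 0.0
    actives_seen = 0
    decoys_seen = 0
    for is_active in ranked_is_active:
        if is_active:
            actives_seen += 1
            continue
        # Each decoy closes one inter-decoy interval on the x-axis.
        lo = max(decoys_seen / n_decoys, cutoff)
        hi = (decoys_seen + 1) / n_decoys
        if hi > lo:  # intervals entirely below the cutoff are discarded
            height = actives_seen / n_actives
            area += height * (math.log10(hi) - math.log10(lo))
        decoys_seen += 1
    return area / -math.log10(cutoff)


def first_interval_share(n_decoys, cutoff):
    """Fraction of the log-scaled x-axis taken by the first inter-decoy interval."""
    hi = 1.0 / n_decoys
    if hi <= cutoff:
        return 0.0  # problem (1): the first interval is thrown away
    return (math.log10(hi) - math.log10(cutoff)) / -math.log10(cutoff)


if __name__ == "__main__":
    # With 100 decoys, sweep the cutoff to see both problems.
    for cutoff in (0.05, 1e-3, 1e-6):
        share = first_interval_share(100, cutoff)
        print(f"cutoff={cutoff:g}: first interval share = {share:.3f}")
```

Under these assumptions, a cutoff above the reciprocal of the decoy count gives the first interval a share of zero, so an active ranked first and an active ranked after a few decoys become indistinguishable. A very small cutoff lets the first interval occupy most of the log-scaled axis.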