Pairwise comparison labeling is gaining traction because it yields higher inter-rater reliability than conventional classification labeling, but exhaustive comparison incurs a cost that is quadratic in the number of items. We propose Dodgersort, which combines CLIP-based hierarchical pre-ordering, a neural ranking head and a probabilistic ensemble (Elo, BTL, GP), epistemic--aleatoric uncertainty decomposition, and information-theoretic pair selection. Together, these components reduce the number of human comparisons needed while improving the reliability of the resulting rankings. On visual ranking tasks in medical imaging, historical dating, and aesthetics, Dodgersort achieves an 11--16\% annotation reduction while improving inter-rater reliability. Cross-domain ablations on four datasets show that neural adaptation and ensemble uncertainty are key to this gain. On FG-NET, which provides ground-truth ages, the framework extracts 5--20$\times$ more ranking information per comparison than baselines, yielding Pareto-optimal accuracy--efficiency trade-offs.
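To make the quadratic cost and the idea of uncertainty-driven pair selection concrete, the following is a minimal sketch, not Dodgersort itself. It assumes a plain Elo model as the only ranker and uses the binary entropy of the predicted outcome as a simple stand-in for an information-theoretic selection criterion. The function names, the `k` and `scale` parameters, and the example ratings are all hypothetical and chosen for illustration.

```python
import itertools
import math


def num_exhaustive_pairs(n):
    """Number of comparisons needed to compare every pair of n items."""
    return n * (n - 1) // 2


def elo_win_prob(r_a, r_b, scale=400.0):
    """Standard Elo probability that the item rated r_a beats the item rated r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


def elo_update(ratings, winner, loser, k=32.0):
    """Update ratings in place after one human comparison (assumed k-factor)."""
    p_win = elo_win_prob(ratings[winner], ratings[loser])
    delta = k * (1.0 - p_win)
    ratings[winner] += delta
    ratings[loser] -= delta


def outcome_entropy(p):
    """Binary entropy in bits of a predicted comparison outcome."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_pair(ratings, asked):
    """Pick the not-yet-asked pair whose predicted outcome is most uncertain.

    Assumption: outcome entropy is used as a proxy for expected information;
    the paper's actual selection criterion may differ.
    """
    best_pair, best_h = None, -1.0
    for a, b in itertools.combinations(sorted(ratings), 2):
        if (a, b) in asked:
            continue
        h = outcome_entropy(elo_win_prob(ratings[a], ratings[b]))
        if h > best_h:
            best_pair, best_h = (a, b), h
    return best_pair


if __name__ == "__main__":
    print(num_exhaustive_pairs(100))  # exhaustive labeling of 100 items
    ratings = {"a": 1500.0, "b": 1510.0, "c": 1900.0}  # hypothetical ratings
    print(select_pair(ratings, asked=set()))
```

In this sketch, pairs with nearly equal ratings are queried first because their outcome is hardest to predict, while pairs the model is already confident about are deferred or skipped, which is where the saving over exhaustive comparison would come from.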