Language models contain ranking-based knowledge and are powerful solvers of in-context ranking tasks. For instance, they may have parametric knowledge about the ordering of countries by size or may be able to rank product reviews by sentiment. We compare pairwise, pointwise and listwise prompting techniques to elicit a language model's ranking knowledge. However, we find that even with careful calibration and constrained decoding, prompting-based techniques are not always self-consistent in the rankings they produce. This motivates us to explore an alternative approach inspired by an unsupervised probing method called Contrast-Consistent Search (CCS). The idea is to train a probe guided by a logical constraint: a language model's representations of a statement and its negation must be mapped to contrastive true-false poles consistently across multiple statements. We hypothesize that similar constraints apply to ranking tasks, where all items are related via consistent pairwise or listwise comparisons. To this end, we extend the binary CCS method to Contrast-Consistent Ranking (CCR) by adapting existing ranking methods such as the Max-Margin Loss, Triplet Loss and an Ordinal Regression objective. Across different models and datasets, our results confirm that CCR probing performs better than, or at least on a par with, prompting.
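To make the constraint described in the abstract concrete, the following is a minimal sketch of the binary CCS objective for a linear probe, together with a generic max-margin (hinge) ranking loss of the kind the abstract mentions adapting. The function names, the linear probe with parameters `w` and `b`, and the NumPy formulation are assumptions made for illustration; in particular, `max_margin_loss` is the standard textbook hinge loss on score pairs and is not claimed to be the exact CCR objective.

```python
import numpy as np


def sigmoid(x):
    """Map real-valued probe outputs to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def ccs_loss(h_pos, h_neg, w, b):
    """CCS objective for a linear probe (illustrative names and shapes).

    h_pos, h_neg: arrays of shape (n, d) holding representations of each
    statement and of its negation. w has shape (d,), b is a scalar.
    """
    p_pos = sigmoid(h_pos @ w + b)
    p_neg = sigmoid(h_neg @ w + b)
    # Consistency: p(statement) should equal 1 - p(negation).
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: penalise the degenerate solution p_pos = p_neg = 0.5.
    confidence = np.minimum(p_pos, p_neg) ** 2
    return float(np.mean(consistency + confidence))


def max_margin_loss(score_hi, score_lo, margin=1.0):
    """Generic hinge ranking loss: the item that should rank higher is
    pushed to score at least `margin` above the lower-ranked item."""
    return float(np.mean(np.maximum(0.0, margin - (score_hi - score_lo))))
```

An untrained probe (all-zero weights) outputs 0.5 for every input, so it satisfies the consistency term but pays the confidence penalty; a probe that cleanly separates statements from their negations drives both terms toward zero.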