Language models contain ranking-based knowledge and are powerful solvers of in-context ranking tasks. For instance, they may have parametric knowledge about the ordering of countries by size or may be able to rank reviews by sentiment. Recent work focuses on pairwise, pointwise, and listwise prompting techniques to elicit a language model's ranking knowledge. However, we find that even with careful calibration and constrained decoding, prompting-based techniques may not always be self-consistent in the rankings they produce. This motivates us to explore an alternative approach that is inspired by an unsupervised probing method called Contrast-Consistent Search (CCS). The idea is to train a probing model guided by a logical constraint: a model's representation of a statement and its negation must be mapped to contrastive true-false poles consistently across multiple statements. We hypothesize that similar constraints apply to ranking tasks where all items are related via consistent pairwise or listwise comparisons. To this end, we extend the binary CCS method to Contrast-Consistent Ranking (CCR) by adapting existing ranking methods such as the Max-Margin Loss, Triplet Loss, and Ordinal Regression objective. Our results confirm that, for the same language model, CCR probing outperforms prompting and even performs on a par with prompting much larger language models.
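To make the constraint above concrete, the following is a minimal sketch and not the authors' implementation. `ccs_loss` follows the usual statement of the CCS objective: a consistency term that pushes the probe's scores for a statement and its negation to sum to one, plus a confidence term that discourages the degenerate solution where both scores are 0.5. `margin_pair_loss` is a hypothetical pairwise variant, included only to illustrate how a margin-based ranking loss could stand in for the binary one when the true direction of the ordering is unknown. The function names, the linear probe, and the margin of 0.5 are assumptions made for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def probe(embedding, weights, bias: float) -> float:
    """Hypothetical linear probe: dot product plus bias, squashed to (0, 1)."""
    return sigmoid(sum(w * x for w, x in zip(weights, embedding)) + bias)


def ccs_loss(p_pos: float, p_neg: float) -> float:
    """CCS objective for one statement (p_pos) and its negation (p_neg)."""
    # Consistency: the two scores should behave like p and 1 - p.
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: penalize the trivial solution p_pos = p_neg = 0.5.
    confidence = min(p_pos, p_neg) ** 2
    return consistency + confidence


def margin_pair_loss(s_a: float, s_b: float, margin: float = 0.5) -> float:
    """Illustrative unsupervised margin loss for a pair of item scores.

    Assumption: the correct order is unknown, so the loss is the smaller
    of the two hinge losses, one for each possible ordering. It is zero
    once the scores are at least `margin` apart in either direction.
    """
    a_below_b = max(0.0, s_a - s_b + margin)
    b_below_a = max(0.0, s_b - s_a + margin)
    return min(a_below_b, b_below_a)


if __name__ == "__main__":
    # A confident, consistent pair incurs no CCS loss.
    print(ccs_loss(1.0, 0.0))
    # An undecided pair is consistent but penalized by the confidence term.
    print(ccs_loss(0.5, 0.5))
    # Well-separated scores satisfy the margin; tied scores do not.
    print(margin_pair_loss(0.9, 0.1))
    print(margin_pair_loss(0.5, 0.5))
```

In practice the scores would come from a probe trained on a language model's hidden representations, with the loss averaged over many statements or item pairs. The scalar version here only shows the shape of the objectives.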