In recent years, large language models (LLMs) have achieved strong performance on benchmark tasks, especially in zero- or few-shot settings. However, these benchmarks often fail to capture challenges that arise in the real world, such as hierarchical classification. To address this gap, we propose refactoring conventional tasks on hierarchical datasets into a more indicative long-tail prediction task, and we observe that LLMs are more prone to failure in these cases. To address these limitations, we propose combining entailment-contradiction prediction with LLMs, which allows for strong performance in a strict zero-shot setting. Importantly, our method does not require any parameter updates, which are resource-intensive, and it achieves strong performance across multiple datasets.