In this work, we conceptualize the learning process as information compression. We seek to equip generative pre-trained models with human-like learning capabilities that enable data compression during inference. We present a novel approach that uses the Generative Pre-trained Transformer (GPT) to approximate Kolmogorov complexity, with the aim of estimating the optimal information distance for few-shot learning. We first propose using GPT as a prior for lossless text compression, which achieves a noteworthy compression ratio: experiments with a LLAMA2-7B backbone reach a compression ratio of 15.5 on enwik9. We justify the pre-training objective of GPT models by showing that it is equivalent to the compression length and, consequently, that it can approximate the information distance between texts. Building on this approximated information distance, our method applies GPT models directly to quantitative text similarity measurement. Experimental results show that our method achieves better overall performance than embedding-based and prompt-based baselines on challenging NLP tasks, including semantic similarity, zero-shot and one-shot text classification, and zero-shot text ranking.
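The two links the abstract relies on can be illustrated with a small sketch: the training loss (mean negative log-likelihood) and the ideal lossless code length are the same quantity up to units, and conditional code lengths give a computable proxy for information distance. This sketch rests on several assumptions. It uses a hypothetical toy adaptive byte-frequency model in place of GPT; a real setup would take p(x_i | x_<i) from the LM's softmax. It applies the standard normalized information distance, max{C(x|y), C(y|x)} / max{C(x), C(y)}, which may differ from the paper's exact formulation. All function names are illustrative.

```python
import math
from collections import Counter

ALPHABET_SIZE = 256  # byte-level symbols (assumption of this toy model)


def next_symbol_probs(text: str, context: str = "") -> list[float]:
    """Return p(x_i | context, x_<i) for each byte of `text`.

    Hypothetical stand-in for a GPT next-token distribution: an adaptive,
    Laplace-smoothed byte-frequency model. `context` is observed first
    (it updates the counts) but is not itself coded.
    """
    counts = Counter(context.encode("utf-8"))
    total = sum(counts.values())
    probs = []
    for b in text.encode("utf-8"):
        probs.append((counts[b] + 1) / (total + ALPHABET_SIZE))
        counts[b] += 1  # adapt after coding, as a decoder could
        total += 1
    return probs


def mean_nll_nats(text: str, context: str = "") -> float:
    """Average cross-entropy per symbol in nats (the LM training loss)."""
    probs = next_symbol_probs(text, context)
    if not probs:
        return 0.0
    return -sum(math.log(p) for p in probs) / len(probs)


def code_length_bits(text: str, context: str = "") -> float:
    """Ideal lossless code length in bits, sum of -log2 p.

    An arithmetic coder driven by the same probabilities gets within a
    few bits of this total.
    """
    return -sum(math.log2(p) for p in next_symbol_probs(text, context))


def normalized_information_distance(x: str, y: str) -> float:
    """max{C(x|y), C(y|x)} / max{C(x), C(y)} with code lengths as C."""
    cx, cy = code_length_bits(x), code_length_bits(y)
    cx_given_y = code_length_bits(x, context=y)
    cy_given_x = code_length_bits(y, context=x)
    return max(cx_given_y, cy_given_x) / max(cx, cy)


if __name__ == "__main__":
    a = "the cat sat on the mat"
    b = "the cat sat on a mat"
    c = "QWERTY 12345 !!??"
    n = len(a.encode("utf-8"))
    # Loss (nats/symbol) * length / ln 2 equals the code length in bits.
    print(code_length_bits(a), mean_nll_nats(a) * n / math.log(2))
    print("ratio:", 8 * n / code_length_bits(a))
    print("d(a, b):", normalized_information_distance(a, b))
    print("d(a, c):", normalized_information_distance(a, c))
```

Because the toy model only tracks byte frequencies, its distance reflects character overlap rather than meaning. Substituting an LM's conditional probabilities into the same formulas is what lets the distance capture semantic similarity.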