This paper describes Asterisk, a compact GPT-based model for generating text embeddings. The model uses a minimalist architecture with two layers, two attention heads, and 256 embedding dimensions. By applying knowledge distillation from larger pretrained models, we explore the trade-offs between model size and performance while minimizing computational and memory requirements. The model is primarily evaluated and optimized for classification tasks, with experimental results showing moderate performance in zero-shot classification across various downstream applications. With additional configuration, the model's performance can approach, or even surpass, that of larger architectures on specific classification tasks.
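The stated architecture (two layers, two attention heads, 256-dimensional embeddings, causal GPT-style attention, distillation from a larger teacher) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `AsteriskEncoder`, the vocabulary size, the mean-pooling readout, and the 768-dimensional teacher are all assumptions made for the example.

```python
# Minimal sketch of an Asterisk-style compact embedding model.
# Assumed, not from the paper: vocab size, max length, mean pooling,
# teacher dimension, and the cosine-based distillation objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AsteriskEncoder(nn.Module):
    def __init__(self, vocab_size=8192, d_model=256, n_heads=2,
                 n_layers=2, max_len=128):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, ids):
        seq_len = ids.size(1)
        pos = torch.arange(seq_len, device=ids.device)
        # Causal mask keeps the attention pattern GPT-like.
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        h = self.blocks(self.tok(ids) + self.pos(pos), mask=mask)
        # Mean-pool token states into one embedding per input text.
        return self.norm(h.mean(dim=1))

def distill_loss(student_emb, teacher_emb, proj):
    """Align student embeddings with a linearly projected teacher embedding
    via cosine distance (one plausible distillation objective)."""
    return 1.0 - F.cosine_similarity(student_emb, proj(teacher_emb)).mean()
```

During training, `proj` would map the teacher's embedding space (e.g. 768 dimensions for a BERT-base-sized teacher) down to the student's 256 dimensions, and the cosine term pulls the two representations together; at inference only the ~1.7M-parameter student is needed.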