In this study, we introduce FEET, a standardized protocol designed to guide the development and benchmarking of foundation models. While numerous benchmark datasets exist for evaluating these models, we propose a structured evaluation protocol spanning three distinct scenarios to gain a comprehensive picture of their practical performance. We define three primary use cases: frozen embeddings, few-shot embeddings, and fully fine-tuned embeddings. Each scenario is detailed and illustrated through two case studies, one in sentiment analysis and one in the medical domain, demonstrating how these evaluations yield a thorough assessment of a foundation model's effectiveness in research applications. We recommend this protocol as a standard for future research aimed at advancing representation learning models.
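To make the three regimes concrete, the sketch below illustrates them on a toy model. This is not the paper's implementation: the encoder, task, and training loop are all hypothetical stand-ins, chosen only to show which parameters are trained in each regime (head only on all data, head only on few labels, or head plus encoder jointly).

```python
import numpy as np

# Toy sketch of the three FEET evaluation regimes.
# All names and choices here are illustrative, not from the paper's code.

rng = np.random.default_rng(0)

# Stand-in "foundation model": a fixed nonlinear encoder.
W_enc = rng.normal(size=(16, 8))           # pretrained encoder weights

def encode(X, W):
    return np.tanh(X @ W)                  # embeddings

def train_head(Z, y, steps=200, lr=0.5):
    """Logistic-regression head trained on fixed embeddings Z."""
    w = np.zeros(Z.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(Z @ w)))
        w -= lr * Z.T @ (p - y) / len(y)
    return w

# Synthetic binary classification task.
X = rng.normal(size=(200, 16))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# 1) Frozen embeddings: encoder untouched, only the head is trained.
w_frozen = train_head(encode(X, W_enc), y)

# 2) Few-shot embeddings: same frozen encoder, but only k labeled examples.
k = 16
w_fewshot = train_head(encode(X[:k], W_enc), y[:k])

# 3) Fully fine-tuned: encoder weights are updated jointly with the head
#    (a crude full-batch gradient loop, for illustration only).
W_ft, w_ft = W_enc.copy(), np.zeros(8)
for _ in range(200):
    Z = np.tanh(X @ W_ft)
    p = 1 / (1 + np.exp(-(Z @ w_ft)))
    g = (p - y) / len(y)                   # dLoss/dlogits
    w_ft -= 0.5 * Z.T @ g
    W_ft -= 0.5 * X.T @ (np.outer(g, w_ft) * (1 - Z**2))

def acc(w, W):
    return np.mean((encode(X, W) @ w > 0) == (y > 0.5))

print(acc(w_frozen, W_enc), acc(w_fewshot, W_enc), acc(w_ft, W_ft))
```

The point of the sketch is the contrast in what gets optimized: regimes 1 and 2 treat the encoder as a fixed feature extractor and differ only in label budget, while regime 3 backpropagates into the encoder itself.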