Evaluating machine learning models on human-labeled validation data can be expensive and time-consuming. Autoevaluation reduces the number of human annotations required by supplementing them with synthetic data labeled by an AI model. We propose efficient, statistically principled algorithms for autoevaluation that improve sample efficiency while remaining unbiased. In experiments with GPT-4, these algorithms increase the effective human-labeled sample size by up to 50%.
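To make the "more sample-efficient yet unbiased" claim concrete, here is a minimal sketch of one standard estimator of this kind: a prediction-powered (PPI-style) mean estimate with an optional weight `lam` on the AI labels. It assumes the metric is a per-example score (e.g. 1 if the evaluated model answered correctly), that an AI judge scores both the human-labeled examples and a larger unlabeled pool, and that the two samples are independent. All function names and the toy data generator are hypothetical, and this is an illustration of the general technique, not necessarily the exact algorithm used in this work.

```python
import random
from math import sqrt
from statistics import fmean, variance


def ppi_mean(y_labeled, f_labeled, f_unlabeled, lam=1.0):
    """Prediction-powered (PPI-style) estimate of a mean metric (illustrative).

    y_labeled:   human scores on n examples (e.g. 1 if the model was correct)
    f_labeled:   AI-judge scores on the same n examples
    f_unlabeled: AI-judge scores on N further examples with no human label
    lam:         weight on the AI scores; lam=0 gives the plain human mean
    """
    if len(y_labeled) != len(f_labeled):
        raise ValueError("y_labeled and f_labeled must be paired")
    # AI-only term, computed on the large unlabeled pool.
    ai_term = lam * fmean(f_unlabeled)
    # Rectifier: the AI judge's average error, measured where humans labeled.
    rectifier = fmean([y - lam * f for y, f in zip(y_labeled, f_labeled)])
    # For a fixed lam, E[ai_term] cancels the lam * E[f] part of E[rectifier],
    # so the sum is an unbiased estimate of E[y].
    return ai_term + rectifier


def ppi_standard_error(y_labeled, f_labeled, f_unlabeled, lam=1.0):
    """Standard error of ppi_mean, treating lam as fixed and samples as independent."""
    residuals = [y - lam * f for y, f in zip(y_labeled, f_labeled)]
    return sqrt(lam ** 2 * variance(f_unlabeled) / len(f_unlabeled)
                + variance(residuals) / len(y_labeled))


def tune_lambda(y_labeled, f_labeled, n_unlabeled):
    """Variance-minimizing weight: Cov(y, f) / ((1 + n/N) * Var(f))."""
    n = len(y_labeled)
    my, mf = fmean(y_labeled), fmean(f_labeled)
    cov = sum((y - my) * (f - mf) for y, f in zip(y_labeled, f_labeled)) / (n - 1)
    var_f = variance(f_labeled)
    if var_f == 0:
        return 0.0  # a constant AI judge carries no information
    return cov / ((1 + n / n_unlabeled) * var_f)


def simulate(n, big_n, p_correct=0.7, judge_agreement=0.85, rng=random):
    """Hypothetical toy data: binary correctness plus a noisy AI judge."""
    def draw():
        y = 1 if rng.random() < p_correct else 0
        f = y if rng.random() < judge_agreement else 1 - y
        return y, f
    labeled = [draw() for _ in range(n)]
    unlabeled = [draw()[1] for _ in range(big_n)]  # human labels discarded
    return [y for y, _ in labeled], [f for _, f in labeled], unlabeled


if __name__ == "__main__":
    rng = random.Random(0)
    y, f, f_unl = simulate(n=100, big_n=1000, rng=rng)
    lam = tune_lambda(y, f, len(f_unl))
    est = ppi_mean(y, f, f_unl, lam)
    se = ppi_standard_error(y, f, f_unl, lam)
    print(f"lambda={lam:.2f}  estimate={est:.3f}  "
          f"95% CI=({est - 1.96 * se:.3f}, {est + 1.96 * se:.3f})")
    print(f"human-only SE={ppi_standard_error(y, f, f_unl, 0.0):.3f}  PPI SE={se:.3f}")
```

Setting `lam=0` recovers the classical human-only estimate, so comparing the two printed standard errors shows how much the AI labels help. With `lam` fixed in advance the estimate is exactly unbiased; plugging in a `lam` estimated from the same labeled data, as in the usage example, is a common heuristic that is only asymptotically unbiased.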