Benchmark hacking refers to tuning a machine learning model to score highly on certain evaluation criteria without improving true generalization or faithfully solving the intended problem. We study this phenomenon in a generic machine learning contest, where each contestant chooses two types of effort: creative effort, which improves model capability as desired by the contest host, and mechanistic effort, which only improves the model's fit to the particular contest task without contributing to true generalization. We establish the existence of a symmetric monotone pure-strategy equilibrium in this competition game. This equilibrium also yields a natural definition of benchmark hacking in the strategic context: we compare a player's equilibrium effort allocation to that of a single-agent baseline scenario. Under our definition, contestants with types below a certain threshold (low types) always engage in benchmark hacking, whereas those above the threshold do not. Furthermore, we show that more skewed reward structures (those favoring top-ranked contestants) can elicit more desirable contest outcomes. We also provide empirical evidence to support our theoretical predictions.
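To make the comparison behind this definition concrete, the following is a minimal illustrative sketch, not the model analyzed in the paper. It assumes, purely for illustration, that an allocation is a pair of non-negative effort levels and that "hacking" means a larger mechanistic share in the contest than in the single-agent baseline. The names `Allocation`, `mechanistic_share`, `is_benchmark_hacking`, and `rank_prizes` are hypothetical, and the direction of the comparison is an assumption rather than a statement of the paper's formal definition.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Allocation:
    """Hypothetical effort pair for one contestant."""
    creative: float     # effort that improves true model capability
    mechanistic: float  # effort that only improves fit to the contest task


def mechanistic_share(a: Allocation) -> float:
    """Fraction of total effort that is mechanistic (0.0 if no effort)."""
    total = a.creative + a.mechanistic
    return a.mechanistic / total if total > 0 else 0.0


def is_benchmark_hacking(equilibrium: Allocation,
                         baseline: Allocation,
                         tol: float = 1e-9) -> bool:
    """Assumed criterion: the contest allocation is more mechanistic
    than the single-agent baseline allocation."""
    return mechanistic_share(equilibrium) > mechanistic_share(baseline) + tol


def rank_prizes(scores: Sequence[float], prizes: Sequence[float]) -> List[float]:
    """Assign prizes by rank: the highest score receives prizes[0], and so on.
    Ties are broken by contestant index; unlisted ranks receive 0."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    payout = [0.0] * len(scores)
    for rank, i in enumerate(order):
        if rank < len(prizes):
            payout[i] = float(prizes[rank])
    return payout


if __name__ == "__main__":
    baseline = Allocation(creative=2.0, mechanistic=2.0)
    contest = Allocation(creative=1.0, mechanistic=3.0)
    print(is_benchmark_hacking(contest, baseline))
    # A winner-take-all vector is one example of a highly skewed reward structure.
    print(rank_prizes([0.2, 0.9, 0.5], [10.0, 0.0, 0.0]))
```

In this sketch, a more skewed reward structure corresponds to a prize vector that concentrates mass on the first entries. The equilibrium allocations themselves would have to come from the analysis, which the sketch does not attempt to reproduce.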