In many scenarios, one uses a large training set to train a model with the goal of performing well on a smaller test set with a different distribution. Learning a weight for each training data point is an appealing solution, as it ideally allows one to automatically learn the importance of each training point for generalization on the test set. This task is usually formalized as a bilevel optimization problem. Classical bilevel solvers are based on a warm-start strategy in which the model parameters and the data weights are learned at the same time. We show that these joint dynamics may lead to sub-optimal solutions, for which the final data weights are very sparse. This finding illustrates the difficulty of data reweighting and offers a clue as to why this method is rarely used in practice.
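
The bilevel formalization and the warm-start strategy can be sketched as follows. The abstract does not fix any notation, so every symbol here is an assumption made for illustration: $n$ training points $(x_i, y_i)$, $m$ test points $(x'_j, y'_j)$, a per-sample loss $\ell$, model parameters $\theta$, data weights $w$ constrained to the probability simplex $\Delta_n$, step sizes $\rho, \gamma$, and a hypergradient estimate $d_t$. This is one common way to write the problem, not necessarily the exact setting analyzed in the paper.

```latex
% Assumed notation (not from the source): see the lead-in paragraph.
% Outer problem: choose weights w that minimize the test loss.
% Inner problem: fit the model on the reweighted training set.
\begin{align*}
\min_{w \in \Delta_n} \; & F(w) = \frac{1}{m} \sum_{j=1}^{m} \ell\bigl(\theta^*(w);\, x'_j, y'_j\bigr) \\
\text{s.t.} \; & \theta^*(w) \in \operatorname*{arg\,min}_{\theta} \; G(\theta, w) = \sum_{i=1}^{n} w_i \, \ell(\theta;\, x_i, y_i)
\end{align*}
% Warm-start (joint) updates: one step on theta and one step on w per iteration.
% d_t is an approximation of the hypergradient \nabla F(w_t) computed at the
% current iterate \theta_t; \Pi_{\Delta_n} is the projection onto the simplex.
\begin{align*}
\theta_{t+1} &= \theta_t - \rho \, \nabla_\theta G(\theta_t, w_t) \\
w_{t+1} &= \Pi_{\Delta_n}\bigl(w_t - \gamma \, d_t\bigr)
\end{align*}
```

In this sketch, the weights are updated while $\theta_t$ is generally not yet the inner minimizer $\theta^*(w_t)$, so $d_t$ is computed from a partially trained model. This is the joint evolution of $(\theta_t, w_t)$ that the abstract refers to.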