As machine learning tasks evolve, the trend has been to gather larger datasets and train increasingly large models. While this has improved accuracy, it has also driven computational costs to unsustainable levels. Our work addresses the persistent challenge of balancing computational efficiency against model accuracy. We introduce a method that applies core subset (coreset) selection to sample reweighting: weights are computed on a strategically selected coreset rather than on the full dataset, which reduces computation time while preserving model performance. Because the coreset serves as a robust representation of the data, the approach limits the influence of outliers. The recalibrated weights are then mapped back to and propagated across the entire dataset. Our experimental results support the effectiveness of this approach and indicate its potential as a scalable and precise solution for model training.
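The three stages named above (select a coreset, reweight on it, propagate the weights to the full dataset) can be illustrated with a minimal sketch. The abstract does not specify how any stage is implemented, so every concrete choice here is an assumption made for illustration only: uniform random sampling stands in for the coreset selector, an exponential down-weighting of high-loss samples stands in for the reweighting rule, and nearest-neighbor assignment in feature space stands in for the propagation step. The function names `select_coreset`, `reweight_coreset`, and `propagate_weights` are hypothetical.

```python
import numpy as np


def select_coreset(n_samples, k, seed=0):
    """Pick k distinct indices uniformly at random.

    ASSUMPTION: a placeholder selector; the abstract does not say how
    the coreset is actually chosen.
    """
    rng = np.random.default_rng(seed)
    return rng.choice(n_samples, size=k, replace=False)


def reweight_coreset(losses, temperature=1.0):
    """Turn per-sample losses on the coreset into weights with mean 1.

    ASSUMPTION: high-loss samples are treated as likely outliers and
    down-weighted exponentially; this rule is illustrative only.
    """
    losses = np.asarray(losses, dtype=float)
    w = np.exp(-(losses - losses.min()) / temperature)
    return w / w.mean()


def propagate_weights(X, coreset_idx, coreset_w):
    """Give every sample the weight of its nearest coreset member.

    ASSUMPTION: nearest-neighbor propagation in feature space is one
    possible reading of "mapped back to and propagated".
    """
    centers = X[coreset_idx]                                  # (k, d)
    # Squared Euclidean distance from each sample to each coreset point.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, k)
    nearest = d2.argmin(axis=1)                               # (n,)
    return coreset_w[nearest]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 8))           # synthetic features
    idx = select_coreset(len(X), k=50)
    coreset_losses = rng.random(50)          # stand-in for real model losses
    w_core = reweight_coreset(coreset_losses)
    w_full = propagate_weights(X, idx, w_core)
    print(w_full.shape)
```

In this sketch the expensive step (computing losses and weights) touches only the `k` coreset samples, while the propagation step costs one distance computation per sample-coreset pair; the resulting per-sample weights could then be passed to a weighted training loss.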