As machine learning tasks continue to evolve, the prevailing trend has been to collect ever-larger datasets and train ever-larger models. Although this has improved accuracy, it has also driven computational costs to unsustainable levels. Our work addresses a persistent challenge in the field: balancing computational efficiency against model accuracy. We introduce a novel method that selects a core subset (coreset) of the training data and uses it to reweight samples, improving both computational time and model performance. Because the weights are computed on a carefully selected coreset, the approach yields a robust representation of the data that limits the influence of outliers. The recalibrated weights are then mapped back to, and propagated across, the entire dataset. Our experimental results support the effectiveness of this approach and indicate its potential as a scalable and accurate solution for model training.
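The abstract does not specify the coreset selection criterion or the reweighting rule. The sketch below is only a toy illustration of the general pipeline: select a coreset, compute weights on the coreset alone, then propagate those weights to every sample. It makes three assumptions of its own. Selection uses greedy k-center. Weighting is a Huber-style score based on distance to the coreset median, with an arbitrary threshold `c=3.0`. Propagation assigns each sample the weight of its nearest coreset point. The function names `k_center_greedy`, `coreset_weights`, and `propagate` are hypothetical, and this is not the method evaluated in this work.

```python
import math
from statistics import median


def k_center_greedy(points, k, seed_index=0):
    """Greedy farthest-point (k-center) selection; returns coreset indices."""
    chosen = [seed_index]
    # Distance from each point to its closest already-chosen center.
    nearest = [math.dist(p, points[seed_index]) for p in points]
    while len(chosen) < min(k, len(points)):
        nxt = max(range(len(points)), key=nearest.__getitem__)
        if nearest[nxt] == 0.0:  # every remaining point duplicates a center
            break
        chosen.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return chosen


def coreset_weights(points, core_idx, c=3.0):
    """Hypothetical robust weights computed on the coreset only.

    Each coreset point is scored by its distance to the coordinate-wise
    median of the coreset. Points farther than c times the median distance
    are down-weighted Huber-style as c / r; the rest keep weight 1.
    """
    core = [points[i] for i in core_idx]
    center = [median(p[d] for p in core) for d in range(len(core[0]))]
    dists = [math.dist(p, center) for p in core]
    scale = median(dists) or 1.0  # avoid dividing by zero
    weights = []
    for d in dists:
        r = d / scale
        weights.append(1.0 if r <= c else c / r)
    return weights


def propagate(points, core_idx, core_w):
    """Give every point its nearest coreset point's weight, rescaled to mean 1."""
    out = []
    for p in points:
        j = min(range(len(core_idx)), key=lambda t: math.dist(p, points[core_idx[t]]))
        out.append(core_w[j])
    scale = len(points) / sum(out)
    return [w * scale for w in out]


# Toy data: a tight cluster near the origin plus one outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (-0.1, 0.0), (0.0, -0.1), (10.0, 10.0)]
core_idx = k_center_greedy(points, k=3)
weights = propagate(points, core_idx, coreset_weights(points, core_idx))
print(core_idx)  # [0, 6, 3]
print([round(w, 3) for w in weights])
```

Only the k coreset points are scored. Propagation then costs one nearest-coreset lookup per sample (O(nk) distance evaluations in this naive version). The outlier that k-center happens to select ends up with a small weight, and that weight carries over to the samples it represents.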