As machine learning tasks continue to evolve, the prevailing trend has been to gather larger datasets and train ever-larger models. While this has improved accuracy, it has also driven computational costs to unsustainable levels. To address this, our work seeks a balance between computational efficiency and model accuracy, a persistent challenge in the field. We introduce a novel method that uses coreset (core subset) selection for reweighting, jointly optimizing computation time and model performance. Because the weights are computed on a strategically selected coreset, the approach provides a robust representation of the data while limiting the influence of outliers. The recalibrated weights are then mapped back and propagated to the entire dataset. Our experimental results support the effectiveness of this approach and indicate its potential as a scalable and accurate solution for model training.
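The abstract does not say how the coreset is chosen, how its weights are computed, or how they are propagated. The following is only a minimal sketch of that select, reweight, propagate pipeline, assuming greedy k-center (farthest-point) selection, a hypothetical loss-based rule w = 1 / (1 + loss), and nearest-coreset-point propagation. The functions `select_coreset`, `coreset_weights`, `propagate`, and `reweight` are illustrative names, not the method described here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def select_coreset(points: Sequence[Point], k: int) -> List[int]:
    """Greedy k-center (farthest-point) selection; returns indices into points.

    Assumption: this selection rule is illustrative only.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must satisfy 1 <= k <= len(points)")
    chosen = [0]  # start deterministically from the first point
    # Distance from every point to its nearest already-chosen center.
    nearest = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        # Add the point that is currently worst covered by the coreset.
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(nearest[i], math.dist(points[i], points[nxt]))
                   for i in range(len(points))]
    return chosen


def coreset_weights(losses: Sequence[float]) -> List[float]:
    """Hypothetical rule: w = 1 / (1 + loss), rescaled so the weights average 1.

    High-loss points (e.g. outliers) receive smaller weights.
    """
    raw = [1.0 / (1.0 + loss) for loss in losses]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def propagate(points: Sequence[Point], core_idx: Sequence[int],
              core_w: Sequence[float]) -> List[float]:
    """Each point inherits the weight of its nearest coreset point."""
    out = []
    for p in points:
        j = min(range(len(core_idx)),
                key=lambda c: math.dist(p, points[core_idx[c]]))
        out.append(core_w[j])
    return out


def reweight(points: Sequence[Point], loss_fn: Callable[[Point], float],
             k: int) -> List[float]:
    """Select a coreset, weight it, then map the weights back to all points."""
    core = select_coreset(points, k)
    w_core = coreset_weights([loss_fn(points[i]) for i in core])
    return propagate(points, core, w_core)
```

For example, with `pts = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (50.0,)]` and a toy loss `lambda p: abs(p[0]) / 10.0`, `reweight(pts, loss_fn, k=3)` gives the three points near 0 one shared weight, the two points near 5 a smaller shared weight, and the distant point at 50 the smallest weight. Only k losses are evaluated; every other point's weight is a nearest-neighbour lookup. That lookup is where the computational saving would come from in this sketch.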