Label quality issues, such as noisy labels and imbalanced class distributions, degrade model performance. Automatic reweighting methods identify training samples with label quality issues by their negative effects on validation samples and assign them lower weights. However, these methods perform poorly when the validation samples themselves are of low quality. To tackle this, we develop Reweighter, a visual analysis tool for sample reweighting. Reweighter models the reweighting relationships between validation samples and training samples as a bipartite graph. Based on this graph, a validation sample improvement method raises the quality of the validation samples. Because automatic improvement is not always perfect, a co-cluster-based bipartite graph visualization illustrates the reweighting relationships and supports interactive adjustments to the validation samples and the reweighting results. These adjustments are converted into constraints of the validation sample improvement method, which further improves the validation samples. A quantitative evaluation and two case studies demonstrate the effectiveness of Reweighter in improving reweighting results.
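To make the bipartite-graph view of reweighting concrete, the following is a minimal sketch and not the method implemented in Reweighter. It assumes, as in common gradient-based reweighting schemes, that the edge between a training sample and a validation sample is the inner product of their loss gradients, and that a training sample's weight is the clipped, normalized sum of its edges. The function name `reweighting_graph` and the toy gradients are hypothetical.

```python
import numpy as np

def reweighting_graph(train_grads, val_grads):
    """Build a toy bipartite reweighting graph (illustrative assumption only).

    train_grads: (n_train, d) per-sample loss gradients of training samples.
    val_grads:   (n_val, d) per-sample loss gradients of validation samples.
    Returns the (n_train, n_val) edge matrix and the (n_train,) sample weights.
    """
    # Edge (i, j): how much training sample i agrees with validation sample j,
    # measured here by a gradient inner product (an assumption, not from the source).
    edges = train_grads @ val_grads.T
    # A training sample that hurts the validation set overall gets weight zero.
    raw = np.clip(edges.sum(axis=1), 0.0, None)
    total = raw.sum()
    # Normalize so the weights sum to one, unless every weight is zero.
    weights = raw / total if total > 0 else raw
    return edges, weights

# Toy data: the second training sample opposes the first validation sample,
# which is what a noisily labeled sample might look like.
train_grads = np.array([[1.0, 0.0], [-1.0, 0.0], [0.5, 0.5]])
val_grads = np.array([[1.0, 0.0], [0.0, 1.0]])
edges, weights = reweighting_graph(train_grads, val_grads)
print(edges)
print(weights)
```

In this sketch a low-quality validation sample would distort an entire column of `edges`, and with it the weights of every training sample connected to it. That dependence is why the quality of the validation samples matters.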