This study explores training dynamics as an automated alternative to human annotation for assessing the quality of training data. We use the Data Maps framework, which classifies training examples as easy-to-learn, hard-to-learn, or ambiguous (Swayamdipta et al., 2020). Swayamdipta et al. (2020) report that hard-to-learn examples often contain annotation errors, while ambiguous examples contribute most to model training. To test the reliability of these findings, we replicate the experiments on a challenging dataset drawn from medical question answering, a domain that requires detailed medical knowledge in addition to text comprehension, further complicating the task. A comprehensive evaluation of the feasibility and transferability of Data Maps to the medical domain indicates that the framework is unsuitable for the unique challenges posed by medical question-answering datasets.
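The Data Maps categorization described above can be sketched roughly as follows: for each training example, the model's probability of the gold label is tracked across epochs, then summarized by its mean (confidence) and standard deviation (variability). The thresholds and function names below are illustrative assumptions, not values from Swayamdipta et al. (2020).

```python
# Minimal sketch of the Data Maps idea (Swayamdipta et al., 2020).
# Thresholds (0.2, 0.5) are illustrative, not taken from the paper.
import numpy as np

def categorize(gold_probs_per_epoch):
    """gold_probs_per_epoch: array of shape (n_epochs, n_examples),
    each entry the model's probability of the gold label at that epoch."""
    confidence = gold_probs_per_epoch.mean(axis=0)   # mean over epochs
    variability = gold_probs_per_epoch.std(axis=0)   # std over epochs
    labels = []
    for c, v in zip(confidence, variability):
        if v >= 0.2:            # high variability -> ambiguous
            labels.append("ambiguous")
        elif c >= 0.5:          # confident and stable -> easy-to-learn
            labels.append("easy-to-learn")
        else:                   # low confidence, stable -> hard-to-learn
            labels.append("hard-to-learn")
    return confidence, variability, labels

# Three hypothetical examples observed over three epochs.
probs = np.array([[0.90, 0.20, 0.20],
                  [0.95, 0.10, 0.80],
                  [0.92, 0.15, 0.50]])
conf, var, cats = categorize(probs)
```

In this toy run, the first example is learned confidently and stably (easy-to-learn), the second stays at low confidence (hard-to-learn, a typical signature of a labeling error), and the third oscillates across epochs (ambiguous).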