Machine learning models need to be continually updated or corrected to keep their prediction accuracy consistently high. In this study, we consider scenarios in which developers must be careful about how a model correction changes the prediction results, for example when the model is part of a complex system or software. In such scenarios, developers want to control the specification of each correction. To do so, they need to understand which subpopulations of the inputs receive inaccurate predictions from the model. We therefore propose correction rule mining, which produces a comprehensive list of rules that describe inaccurate subpopulations and how to correct them. We also develop an efficient correction rule mining algorithm that combines frequent itemset mining with a pruning technique specific to correction rules. We observed that the proposed algorithm found various rules that help to collect data the model has not learned sufficiently, to correct model outputs directly, and to analyze concept drift.
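To make the idea of a correction rule concrete, here is a minimal sketch under stated assumptions. It is not the authors' algorithm. Each input is treated as a set of `(attribute, value)` items. A rule pairs an itemset (the condition that selects a subpopulation) with a replacement label, and its "gain" is assumed here to be the number of additional correct predictions obtained by outputting that label for every matching input. The abstract does not specify the paper's pruning technique, so this sketch substitutes plain Apriori-style support pruning. The function name `mine_correction_rules` and the parameters `min_support` and `min_gain` are hypothetical.

```python
from collections.abc import Hashable
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[str, Hashable]            # an (attribute, value) pair
Record = Dict[str, Hashable]           # one categorical input
Rule = Tuple[FrozenSet[Item], Hashable, int, int]


def mine_correction_rules(records: List[Record], y_true: List[Hashable],
                          y_pred: List[Hashable], min_support: int,
                          min_gain: int) -> List[Rule]:
    """Hypothetical sketch: return (condition, corrected_label, support, gain).

    Uses Apriori-style support pruning only; this is an assumption and not
    the correction-rule-specific pruning described in the paper.
    """
    transactions = [frozenset(r.items()) for r in records]
    labels = set(y_true)

    def cover(itemset: FrozenSet[Item]) -> List[int]:
        # Indices of inputs that satisfy every item in the condition.
        return [i for i, t in enumerate(transactions) if itemset <= t]

    # Level 1: single items that meet the support threshold.
    items = {item for t in transactions for item in t}
    level = [frozenset([it]) for it in items
             if len(cover(frozenset([it]))) >= min_support]
    rules: List[Rule] = []
    k = 1
    while level:
        for itemset in level:
            idx = cover(itemset)
            correct_now = sum(y_pred[i] == y_true[i] for i in idx)
            # Assumed correction: replace every prediction in the
            # subpopulation with its most frequent true label.
            best = max(labels, key=lambda c: sum(y_true[i] == c for i in idx))
            gain = sum(y_true[i] == best for i in idx) - correct_now
            if gain >= min_gain:
                rules.append((itemset, best, len(idx), gain))
        # Join frequent k-itemsets into (k+1)-itemset candidates, keeping only
        # candidates whose k-subsets are all frequent and that are supported.
        k += 1
        frequent = set(level)
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = [c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                 and len(cover(c)) >= min_support]
    return rules
```

For example, if the model mispredicts label `"A"` as `"B"` for most inputs with `region == "north"`, the sketch reports a rule such as `({("region", "north")}, "A", support, gain)`. A developer could then either collect more data for that subpopulation or apply the rule directly as an output override. The paper's own pruning technique would presumably replace or complement the support-only pruning used here.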