Machine learning models need to be continually updated or corrected to keep their prediction accuracy consistently high. In this study, we consider scenarios in which developers must be careful about how a model correction changes the prediction results, for example when the model is part of a complex system or software. In such scenarios, developers want to control the specification of the corrections, that is, which predictions a correction changes and how. To do so, they need to understand which subpopulations of the inputs receive inaccurate predictions from the model. We therefore propose correction rule mining, which produces a comprehensive list of rules that describe inaccurate subpopulations and how to correct them. We also develop an efficient correction rule mining algorithm that combines frequent itemset mining with a pruning technique designed specifically for correction rules. We observed that the proposed algorithm found various rules that help to collect insufficiently learned data, directly correct model outputs, and analyze concept drift.
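The abstract does not describe the algorithm's internals, so the following is only a minimal sketch of the general idea under several assumptions. Inputs are categorical attributes. A rule is a conjunction of attribute=value conditions (which may include the model's own prediction as an item) paired with a replacement label. A rule's gain is the net number of additional correct predictions obtained by outputting that label for every input in the subpopulation. Only standard Apriori-style support pruning is used here; the paper's correction-rule-specific pruning is not reproduced. The function `mine_correction_rules` and its parameters `min_support`, `min_gain`, and `max_len` are hypothetical names chosen for illustration.

```python
from collections import Counter


def mine_correction_rules(records, labels, predictions,
                          min_support=2, min_gain=1, max_len=2):
    """Hypothetical sketch: find rules 'if condition holds, output new_label'.

    records: list of dicts mapping attribute name -> categorical value.
    The model's prediction is added as an extra ("pred", value) item so that
    a rule can also condition on what the model currently outputs.
    Returns (condition, new_label, gain, support) tuples, largest gain first.
    """
    transactions = [
        frozenset(rec.items()) | {("pred", pred)}
        for rec, pred in zip(records, predictions)
    ]
    # Fixed item order so each itemset is generated exactly once (DFS over prefixes).
    items = sorted({it for t in transactions for it in t}, key=str)
    rules = []

    def expand(itemset, start, covered):
        for k in range(start, len(items)):
            new_set = itemset | {items[k]}
            new_cov = [i for i in covered if items[k] in transactions[i]]
            # Support is anti-monotone: if this itemset is infrequent, every
            # extension in this branch is too, so the branch is skipped.
            if len(new_cov) < min_support:
                continue
            # Best replacement label = most common true label in the subpopulation.
            new_label, correct_after = Counter(
                labels[i] for i in new_cov).most_common(1)[0]
            correct_before = sum(labels[i] == predictions[i] for i in new_cov)
            gain = correct_after - correct_before  # net change in correct outputs
            if gain >= min_gain:
                rules.append((new_set, new_label, gain, len(new_cov)))
            if len(new_set) < max_len:
                expand(new_set, k + 1, new_cov)

    expand(frozenset(), 0, list(range(len(transactions))))
    rules.sort(key=lambda r: (-r[2], -r[3], str(sorted(r[0], key=str))))
    return rules


# Toy data (made up for illustration): the model always predicts "A",
# but it is wrong on the two north/mobile inputs.
records = [
    {"region": "north", "device": "mobile"},
    {"region": "north", "device": "mobile"},
    {"region": "north", "device": "desktop"},
    {"region": "south", "device": "mobile"},
    {"region": "south", "device": "desktop"},
    {"region": "south", "device": "desktop"},
]
labels = ["B", "B", "A", "A", "A", "A"]
predictions = ["A", "A", "A", "A", "A", "A"]

for cond, new_label, gain, support in mine_correction_rules(records, labels, predictions):
    print(sorted(cond), "->", new_label, "gain", gain, "support", support)
# first line: [('device', 'mobile'), ('region', 'north')] -> B gain 2 support 2
```

Each rule names an inaccurate subpopulation and a concrete correction, which is the kind of controllable, inspectable output the abstract motivates. The sketch keeps every qualifying itemset, so it also returns overlapping rules (for example, both `device=mobile` and `device=mobile, region=north`); it makes no attempt to prune such redundancy.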