The deployment of pre-trained perception models in novel environments often leads to performance degradation due to distributional shifts. Although recent artificial intelligence approaches for metacognition use logical rules to characterize and filter model errors, improving precision often comes at the cost of reduced recall. This paper addresses the hypothesis that leveraging multiple pre-trained models can mitigate this recall reduction. We formulate the challenge of identifying and managing conflicting predictions from various models as a consistency-based abduction problem, building on the idea of abductive learning (ABL) but applying it at test time rather than during training. The input predictions and the learned error-detection rules derived from each model are encoded in a logic program. We then seek an abductive explanation (a subset of model predictions) that maximizes prediction coverage while ensuring that the rate of logical inconsistencies, derived from domain constraints, remains below a specified threshold. We propose two algorithms for this knowledge representation task: an exact method based on Integer Programming (IP) and an efficient Heuristic Search (HS). Through extensive experiments on a simulated aerial imagery dataset featuring controlled, complex distributional shifts, we demonstrate that our abduction-based framework outperforms individual models and standard ensemble baselines, achieving, for instance, average relative improvements of approximately 13.6\% in F1-score and 16.6\% in accuracy across 15 diverse test datasets when compared to the best individual model. Our results validate consistency-based abduction as an effective mechanism for robustly integrating knowledge from multiple imperfect models in challenging, novel scenarios.
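The selection problem described above (choose a subset of model predictions that maximizes coverage while keeping the rate of constraint violations below a threshold) can be illustrated with a minimal greedy sketch. This is not the paper's HS algorithm; the data structures, the pairwise representation of domain-constraint conflicts, and the rate definition here are simplifying assumptions for illustration only.

```python
def inconsistency_rate(selected, conflicts):
    """Fraction of selected predictions involved in a violated domain constraint.

    `conflicts` is a simplified, hypothetical encoding: each entry is a pair of
    prediction labels that a domain constraint forbids from co-occurring.
    """
    if not selected:
        return 0.0
    chosen = set(selected)
    violated = {p for a, b in conflicts
                if a in chosen and b in chosen
                for p in (a, b)}
    return len(violated) / len(selected)

def abduce_greedy(predictions, conflicts, tau):
    """Greedy stand-in for a heuristic search: scan predictions in order,
    keeping each one only if the running inconsistency rate stays <= tau.
    Maximizes coverage (number of retained predictions) heuristically."""
    chosen = []
    for p in predictions:
        if inconsistency_rate(chosen + [p], conflicts) <= tau:
            chosen.append(p)
    return chosen

# Hypothetical predictions from two models ("m1:", "m2:" prefixes are illustrative).
preds = ["m1:road", "m1:water", "m2:building", "m2:road"]
# One domain constraint: the same region cannot be both water and building.
conflicts = [("m1:water", "m2:building")]
print(abduce_greedy(preds, conflicts, tau=0.0))
```

With `tau=0.0`, "m2:building" is dropped because keeping it alongside "m1:water" would violate the constraint; the other three predictions are retained. The exact IP variant would instead encode the same objective and threshold as linear constraints and solve for a provably maximal subset.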