Deploying pre-trained perception models in novel environments often degrades performance because of distributional shift. Recent metacognitive AI approaches use logical rules to characterize and filter model errors, but the resulting gain in precision often comes at the cost of reduced recall. This paper examines the hypothesis that leveraging multiple pre-trained models can mitigate this loss of recall. We formulate the challenge of identifying and managing conflicting predictions from multiple models as a consistency-based abduction problem, building on the idea of abductive learning (ABL) but applying it at test time rather than during training. The input predictions and the learned error detection rules derived from each model are encoded in a logic program. We then seek an abductive explanation, that is, a subset of model predictions, that maximizes prediction coverage while keeping the rate of logical inconsistencies (derived from domain constraints) below a specified threshold. We propose two algorithms for this knowledge representation task: an exact method based on Integer Programming (IP) and an efficient Heuristic Search (HS). In extensive experiments on a simulated aerial imagery dataset featuring controlled, complex distributional shifts, our abduction-based framework outperforms both individual models and standard ensemble baselines. For instance, compared with the best individual model, it achieves average relative improvements of approximately 13.6% in F1-score and 16.6% in accuracy across 15 diverse test datasets. These results support consistency-based abduction as an effective mechanism for robustly integrating knowledge from multiple imperfect models in challenging, novel scenarios.
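To make the selection objective concrete, the following is a minimal sketch of the optimization described above: choose the largest subset of predictions whose inconsistency rate stays within a threshold. It is not the paper's IP or HS algorithm; it is a brute-force search that is only practical for toy inputs. The data layout, the single domain constraint (one label per object), the definition of the inconsistency rate, the use of subset size as coverage, and the names `inconsistency_rate`, `best_subset`, and `epsilon` are all assumptions introduced for illustration.

```python
from itertools import combinations

# Hypothetical toy input: (model, object_id, label) triples from two models.
predictions = [
    ("A", 1, "car"),
    ("B", 1, "truck"),  # conflicts with model A on object 1
    ("A", 2, "tree"),
    ("B", 2, "tree"),
    ("A", 3, "car"),
]


def inconsistency_rate(selected):
    """Fraction of selected predictions that clash with another selected one.

    Assumed domain constraint: an object may carry only one label.
    """
    if not selected:
        return 0.0
    labels_by_object = {}
    for _, obj, label in selected:
        labels_by_object.setdefault(obj, set()).add(label)
    clashing = sum(1 for _, obj, _ in selected if len(labels_by_object[obj]) > 1)
    return clashing / len(selected)


def best_subset(preds, epsilon):
    """Exhaustively find a largest subset with inconsistency rate <= epsilon.

    Coverage is approximated here by the number of predictions kept.
    Subsets are tried from largest to smallest, so the first feasible
    subset found has maximal size.
    """
    for size in range(len(preds), -1, -1):
        for subset in combinations(preds, size):
            if inconsistency_rate(subset) <= epsilon:
                return list(subset)
    return []


strict = best_subset(predictions, epsilon=0.0)    # no inconsistency tolerated
relaxed = best_subset(predictions, epsilon=0.4)   # some inconsistency tolerated
print(len(strict), len(relaxed))
```

Under these assumptions, tightening `epsilon` forces the search to drop one of the two conflicting predictions on object 1, while a looser threshold lets every prediction through. The exhaustive loop grows exponentially with the number of predictions, which is the kind of cost that motivates an integer-programming formulation or a heuristic search at realistic scale.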