We study a classification problem with three key challenges: pervasive informative missingness, the integration of partial prior expert knowledge into the learning process, and the need for interpretable decision rules. We propose a framework that encodes prior knowledge through an expert-guided class-conditional model for one or more classes, and use this model to construct a small set of interpretable goodness-of-fit features. The features quantify how well the observed data agree with the expert model, isolating the contributions of different aspects of the data, including both observed and missing components. These features are combined with a few transparent auxiliary summaries in a simple discriminative classifier, resulting in a decision rule that is easy to inspect and justify. We develop and apply the framework in the context of seismic monitoring used to assess compliance with the Comprehensive Nuclear-Test-Ban Treaty. We show that the method has strong potential as a transparent screening tool, reducing workload for expert analysts. A simulation designed to isolate the contribution of the proposed framework shows that this interpretable expert-guided method can even outperform strong standard machine-learning classifiers, particularly when training samples are small.
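The feature construction described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the expert model is taken to be an independent Gaussian per measurement plus a per-measurement probability of being missing, the parameter values are invented, and the names `gof_features` and `fit_logistic` are hypothetical. The sketch computes one goodness-of-fit feature from the observed values and one from the missingness pattern, then combines them in a plain logistic regression.

```python
import numpy as np

# Hypothetical expert model for one class (illustrative values only):
# a Gaussian for each measurement when it is observed, plus the
# probability that the measurement is missing under this class.
EXPERT_MEAN = np.array([0.0, 1.0, -1.0])
EXPERT_STD = np.array([1.0, 0.5, 2.0])
EXPERT_P_MISSING = np.array([0.1, 0.5, 0.8])


def gof_features(x):
    """Return two goodness-of-fit features for one record x (NaN = missing).

    f_obs:  mean Gaussian log-density of the observed values.
    f_miss: mean Bernoulli log-probability of the missingness pattern.
    """
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    obs = ~missing
    z = (x[obs] - EXPERT_MEAN[obs]) / EXPERT_STD[obs]
    log_dens = -0.5 * z**2 - np.log(EXPERT_STD[obs]) - 0.5 * np.log(2 * np.pi)
    # Assumption: a record with nothing observed contributes 0 to f_obs.
    f_obs = log_dens.mean() if log_dens.size else 0.0
    log_p = np.where(missing,
                     np.log(EXPERT_P_MISSING),
                     np.log1p(-EXPERT_P_MISSING))
    f_miss = log_p.mean()
    return np.array([f_obs, f_miss])


def fit_logistic(F, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by gradient descent; returns [bias, w_1, ...]."""
    F = np.asarray(F, dtype=float)
    y = np.asarray(y, dtype=float)
    F1 = np.column_stack([np.ones(len(F)), F])
    w = np.zeros(F1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(F1 @ w)))
        w -= lr * F1.T @ (p - y) / len(y)
    return w


# A record that matches the expert model versus one that does not.
consistent = gof_features([0.0, np.nan, np.nan])
inconsistent = gof_features([5.0, 1.0, -1.0])
```

Because each feature isolates one aspect of the data (observed values versus missingness pattern), the fitted coefficients can be read directly as the weight the decision rule places on each kind of agreement with the expert model. Under these assumptions, `consistent` scores higher than `inconsistent` on both features.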