Variational Information Pursuit for Interpretable Predictions

There is growing interest in the machine learning community in developing predictive algorithms that are "interpretable by design". Toward this end, recent work proposes making interpretable decisions by sequentially asking interpretable queries about the data until a prediction can be made with high confidence from the answers obtained so far (the history). To promote short query-answer chains, a greedy procedure called Information Pursuit (IP) is used, which adaptively chooses queries in order of information gain. Generative models are employed to learn the joint distribution of query answers and labels, which is in turn used to estimate the most informative query. However, learning and inference with a full generative model of the data is often intractable for complex tasks. In this work, we propose Variational Information Pursuit (V-IP), a variational characterization of IP that bypasses the need to learn generative models. V-IP is based on finding a query selection strategy and a classifier that minimize the expected cross-entropy between true and predicted labels. We show that the IP strategy is the optimal solution to this problem. Therefore, instead of learning generative models, we can use our optimal strategy to directly pick the most informative query given any history. We then develop a practical algorithm by defining a finite-dimensional parameterization of our strategy and classifier using deep networks, and we train them end-to-end using our objective. Empirically, V-IP is 10-100x faster than IP on different vision and NLP tasks, with competitive performance. Moreover, V-IP finds much shorter query chains than reinforcement learning, which is typically used in sequential decision-making problems. Finally, we demonstrate the utility of V-IP on challenging tasks like medical diagnosis, where its performance is far superior to that of the generative modelling approach.
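To make the greedy IP step concrete, here is a minimal toy sketch. It assumes a small, fully known discrete joint distribution (a hypothetical stand-in for the generative model that IP would otherwise have to learn), binary features, and queries of the form "what is feature j?". At each step it picks the unasked query with the largest information gain about the label given the history, I(Y; Q_j | history) = H(Y | history) - E_a[H(Y | history, Q_j = a)], and stops once the most likely label reaches a confidence threshold. This is not V-IP: V-IP avoids computing these posteriors from a generative model by learning the querier and classifier directly. The function names, the toy distribution, and the 0.99 threshold are illustrative assumptions.

```python
import itertools
import math
from collections import defaultdict

# Hypothetical toy joint P(Y, X): three uniform binary features, label = x0 AND x1.
# Feature 2 is pure noise, so an informative querier should never need it.
JOINT = {(x[0] & x[1], x): 1 / 8 for x in itertools.product((0, 1), repeat=3)}


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def consistent(x, history):
    """True if feature vector x agrees with every answer recorded in history."""
    return all(x[j] == a for j, a in history.items())


def posterior_label(joint, history):
    """P(Y | history), where history maps query index -> observed answer."""
    weights = defaultdict(float)
    for (y, x), p in joint.items():
        if consistent(x, history):
            weights[y] += p
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}


def information_gain(joint, history, j):
    """I(Y; Q_j | history) = H(Y | history) - E_a[H(Y | history, Q_j = a)]."""
    answer_mass = defaultdict(float)  # P(Q_j = a, history) for each answer a
    for (_, x), p in joint.items():
        if consistent(x, history):
            answer_mass[x[j]] += p
    total = sum(answer_mass.values())
    h_before = entropy(posterior_label(joint, history).values())
    h_after = sum(
        (m / total) * entropy(posterior_label(joint, {**history, j: a}).values())
        for a, m in answer_mass.items()
    )
    return h_before - h_after


def information_pursuit(joint, x_obs, num_queries, max_steps, threshold=0.99):
    """Greedily ask the most informative query until the label is confident."""
    history = {}
    for _ in range(max_steps):
        post = posterior_label(joint, history)
        if max(post.values()) >= threshold:
            break  # confident enough: stop asking
        remaining = [j for j in range(num_queries) if j not in history]
        if not remaining:
            break
        # max() returns the first maximal query, so ties go to the lowest index.
        best = max(remaining, key=lambda j: information_gain(joint, history, j))
        history[best] = x_obs[best]  # record the answer for the observed input
    post = posterior_label(joint, history)
    return history, max(post, key=post.get)


if __name__ == "__main__":
    print(*information_pursuit(JOINT, (1, 1, 0), num_queries=3, max_steps=3))  # → {0: 1, 1: 1} 1
    print(*information_pursuit(JOINT, (0, 1, 1), num_queries=3, max_steps=3))  # → {0: 0} 0
```

For x_obs = (1, 1, 0) the sketch asks about features 0 and 1 before it is certain, while for x_obs = (0, 1, 1) the single answer x0 = 0 already settles the label, illustrating how adaptive selection can shorten the query chain. The exact posterior computation inside `information_gain` is the step that becomes intractable for realistic data, which is what motivates replacing it with learned networks.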

翻译：机器学习社区对开发“本质可解释”的预测算法兴趣日益增长。为此，近期工作提出通过顺序询问关于数据的可解释查询，直至基于所获答案（历史记录）能做出高置信度预测，从而实现可解释决策。为促进简短查询-答案链，采用了一种名为信息追求（IP）的贪心过程，该过程根据信息增益自适应选择查询。生成模型被用于学习查询-答案与标签的联合分布，进而估计最具信息量的查询。然而，对于复杂任务，学习并推断完整数据生成模型往往难以实现。本文提出变分信息追求（V-IP），这是一种IP的变分表征，无需学习生成模型。V-IP基于寻找查询选择策略和分类器，以最小化真实标签与预测标签之间的期望交叉熵。我们随后证明IP策略是该问题的最优解。因此，无需学习生成模型，即可通过最优策略直接选取给定历史下最具信息量的查询。我们通过使用深度网络对策略和分类器进行有限维参数化，并基于目标函数进行端到端训练，从而开发出实用算法。实验表明，在视觉与自然语言处理任务中，V-IP比IP快10-100倍且性能相当。此外，与常用于序列决策问题的强化学习相比，V-IP能发现更短的查询链。最终，我们在医学诊断等具有挑战性的任务中展示了V-IP的实用性——其性能远优于生成建模方法。