Variational Information Pursuit for Interpretable Predictions

There is growing interest in the machine learning community in developing predictive algorithms that are "interpretable by design". To this end, recent work proposes making interpretable decisions by sequentially asking interpretable queries about the data until a prediction can be made with high confidence based on the answers obtained (the history). To promote short query-answer chains, a greedy procedure called Information Pursuit (IP) is used, which adaptively chooses queries in order of information gain. Generative models are employed to learn the distribution of query-answers and labels, which is in turn used to estimate the most informative query. However, learning and inference with a full generative model of the data is often intractable for complex tasks. In this work, we propose Variational Information Pursuit (V-IP), a variational characterization of IP that bypasses the need to learn generative models. V-IP is based on finding a query selection strategy and a classifier that minimize the expected cross-entropy between true and predicted labels. We then show that the IP strategy is the optimal solution to this problem. Therefore, instead of learning generative models, we can use the optimal strategy to directly pick the most informative query given any history. We then develop a practical algorithm by defining a finite-dimensional parameterization of the strategy and classifier using deep networks and training them end-to-end with our objective. Empirically, V-IP is 10-100x faster than IP on different vision and NLP tasks, with competitive performance. Moreover, V-IP finds much shorter query chains than reinforcement learning, which is typically used in sequential decision-making problems. Finally, we demonstrate the utility of V-IP on challenging tasks such as medical diagnosis, where its performance is far superior to that of the generative modeling approach.
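To make the greedy IP procedure described above concrete, the following is a minimal sketch of exact Information Pursuit on a tiny, fully enumerated dataset. It is not the V-IP algorithm and not the authors' implementation: the function names (`entropy`, `info_gain`, `information_pursuit`), the toy data, and the `confidence` stopping threshold are all assumptions made for illustration. Here the information gain is computed directly from an empirical table, which is only feasible because the toy problem is small; V-IP replaces this step with a learned query selection strategy.

```python
import itertools
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(samples, q):
    """Empirical mutual information between the answer to query q and the label."""
    n = len(samples)
    by_answer = {}
    for answers, label in samples:
        by_answer.setdefault(answers[q], []).append(label)
    conditional = sum(len(ls) / n * entropy(ls) for ls in by_answer.values())
    return entropy([label for _, label in samples]) - conditional


def information_pursuit(samples, x_answers, confidence=0.99):
    """Greedily ask the most informative query until the posterior is confident.

    samples:   list of (answers_tuple, label) pairs, treated as equally likely.
    x_answers: the answers of the input being classified, one per query.
    Returns (history, predicted_label), where history is a list of
    (query_index, answer) pairs in the order they were asked.
    """
    remaining = set(range(len(x_answers)))
    consistent = list(samples)  # samples that agree with the history so far
    history = []
    while True:
        counts = Counter(label for _, label in consistent)
        label, top = counts.most_common(1)[0]
        if top / len(consistent) >= confidence or not remaining:
            return history, label
        # Pick the query with the largest information gain given the history.
        q = max(sorted(remaining), key=lambda j: info_gain(consistent, j))
        remaining.remove(q)
        history.append((q, x_answers[q]))
        narrowed = [(a, y) for a, y in consistent if a[q] == x_answers[q]]
        if not narrowed:  # answer never seen in the table: keep last posterior
            return history, label
        consistent = narrowed


# Toy data (hypothetical): three binary queries; the label equals the answer
# to query 2, so queries 0 and 1 carry no information about the label.
samples = [(a, a[2]) for a in itertools.product([0, 1], repeat=3)]

print(information_pursuit(samples, (0, 1, 1)))  # → ([(2, 1)], 1)
```

In this toy case a single query suffices, because query 2 alone determines the label. On realistic data the conditional distributions cannot be tabulated this way, which is the intractability that motivates the variational formulation.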
