We investigate explainability via short Boolean formulas in the data model based on unary relations. As an explanation of length k, we take a Boolean formula of length k that minimizes the error with respect to the target attribute to be explained. We first provide novel quantitative bounds for the expected error in this scenario. We then demonstrate how the setting works in practice by studying three concrete data sets. In each case, we calculate explanation formulas of different lengths using an encoding in Answer Set Programming. The most accurate formulas we obtain achieve errors similar to those of other methods on the same data sets. However, due to overfitting, these formulas are not necessarily ideal explanations, so we use cross-validation to identify a suitable length for explanations. By restricting to shorter formulas, we obtain explanations that avoid overfitting but are still reasonably accurate and also, importantly, human-interpretable.
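To make the notion of an explanation of length k concrete, the following minimal sketch searches for an error-minimizing formula by brute force. It is not the Answer Set Programming encoding used in this work. It assumes that formula length counts every attribute occurrence and every connective (`~`, `&`, `|`) as one symbol, and that error is the fraction of rows on which the formula disagrees with the target. The function names and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import product


def formulas_by_length(columns, max_len):
    """Return {n: {truth_vector: formula}} for every length n <= max_len.

    columns maps an attribute name to its list of 0/1 values, one per row.
    Assumption: length counts attribute occurrences plus connectives.
    Only one representative formula is kept per truth vector and length.
    """
    atoms = {tuple(bool(v) for v in col): name for name, col in columns.items()}
    by_len = {1: atoms}
    for n in range(2, max_len + 1):
        level = {}
        # Negation adds one symbol to a formula of length n - 1.
        for vec, f in by_len[n - 1].items():
            level.setdefault(tuple(not v for v in vec), f"~{f}")
        # A binary connective joins sub-formulas of lengths i and n - 1 - i.
        for i in range(1, n - 1):
            left, right = by_len[i], by_len[n - 1 - i]
            for (va, fa), (vb, fb) in product(left.items(), right.items()):
                conj = tuple(a and b for a, b in zip(va, vb))
                disj = tuple(a or b for a, b in zip(va, vb))
                level.setdefault(conj, f"({fa} & {fb})")
                level.setdefault(disj, f"({fa} | {fb})")
        by_len[n] = level
    return by_len


def error(vec, target):
    """Fraction of rows where the formula's value differs from the target."""
    return sum(bool(v) != bool(t) for v, t in zip(vec, target)) / len(target)


def best_explanation(columns, target, max_len):
    """Return (error, length, formula) with minimal error among lengths <= max_len."""
    best = None
    for n, level in formulas_by_length(columns, max_len).items():
        for vec, f in level.items():
            e = error(vec, target)
            if best is None or e < best[0]:
                best = (e, n, f)
    return best


# Toy data (illustrative only): the target equals p AND NOT q on every row.
columns = {
    "p": [1, 1, 1, 1, 0, 0, 0, 0],
    "q": [1, 1, 0, 0, 1, 1, 0, 0],
    "r": [1, 0, 1, 0, 1, 0, 1, 0],
}
target = [0, 0, 1, 1, 0, 0, 0, 0]

for k in (1, 4):
    print(k, best_explanation(columns, target, k))
```

On this toy data, the best formula of length 1 is `p`, which is wrong on two of the eight rows. Allowing length 4 admits a formula with zero error. The search space grows quickly with the length bound, so this enumeration is only practical for small bounds and few attributes.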