We investigate explainability via short Boolean formulas in the data model based on unary relations. As an explanation of length k, we take a Boolean formula of length k that minimizes the error with respect to the target attribute to be explained. We first provide novel quantitative bounds for the expected error in this scenario. We then demonstrate how the setting works in practice by studying three concrete data sets. In each case, we calculate explanation formulas of different lengths using an encoding in Answer Set Programming. The most accurate formulas we obtain achieve errors similar to those of other methods on the same data sets. However, due to overfitting, these formulas are not necessarily ideal explanations, so we use cross-validation to identify a suitable length for explanations. By restricting attention to shorter formulas, we obtain explanations that avoid overfitting but are still reasonably accurate and also, importantly, human-interpretable.
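To make the notion of a length-k explanation concrete, the following is a minimal brute-force sketch in Python. It is not the Answer Set Programming encoding used in the paper; it simply enumerates formulas and keeps the one with the lowest empirical error. The names `formulas` and `best_formula`, the toy data, and the length measure (each attribute occurrence and each connective counts as 1) are assumptions made for illustration only.

```python
from itertools import product


def formulas(num_attrs, length):
    """Yield (text, truth_fn) for every formula of exactly `length`.

    Assumed length measure (hypothetical): every attribute occurrence
    and every connective (~, &, |) contributes 1.
    """
    if length < 1:
        return
    if length == 1:
        for i in range(num_attrs):
            yield f"p{i}", (lambda row, i=i: row[i])
        return
    # Negation: one connective plus a subformula of length - 1.
    for text, fn in formulas(num_attrs, length - 1):
        yield f"~{text}", (lambda row, fn=fn: not fn(row))
    # Binary connectives: one connective plus two subformulas.
    for left_len in range(1, length - 1):
        right_len = length - 1 - left_len
        for lt, lf in formulas(num_attrs, left_len):
            for rt, rf in formulas(num_attrs, right_len):
                yield f"({lt} & {rt})", (
                    lambda row, lf=lf, rf=rf: lf(row) and rf(row))
                yield f"({lt} | {rt})", (
                    lambda row, lf=lf, rf=rf: lf(row) or rf(row))


def best_formula(rows, target, max_length):
    """Return (error, text) of a formula of length <= max_length that
    minimizes the fraction of rows on which it disagrees with target."""
    num_attrs = len(rows[0])
    best = (float("inf"), None)
    for length in range(1, max_length + 1):
        for text, fn in formulas(num_attrs, length):
            wrong = sum(fn(row) != t for row, t in zip(rows, target))
            error = wrong / len(rows)
            if error < best[0]:  # keep the first (shortest) minimizer
                best = (error, text)
    return best


# Toy data: three unary attributes; the target is "p0 and not p1".
rows = list(product([False, True], repeat=3))
target = [r[0] and not r[1] for r in rows]

for k in range(1, 5):
    print(k, best_formula(rows, target, k))
```

The search is exponential in the formula length, so it is only usable for very short formulas and few attributes. In practice one would compute the error on held-out data for each candidate length, in the spirit of the cross-validation described above, and pick the length beyond which the held-out error stops improving.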