Recent progress towards theoretical interpretability guarantees for AI has been made with classifiers based on interactive proof systems. A prover selects a certificate from the datapoint and sends it to a verifier, who decides the class. In the context of machine learning, such a certificate can be a feature that is informative of the class. For a setup with high soundness and completeness, the exchanged certificates must have high mutual information with the true class of the datapoint. However, this guarantee relies on a bound on the Asymmetric Feature Correlation (AFC) of the dataset, a property that has so far been difficult to estimate for high-dimensional data. W\"aldchen et al. conjectured that it is computationally hard to exploit the AFC, which is what we prove here. We consider a malicious prover-verifier duo that aims to exploit the AFC to achieve high completeness and soundness while using uninformative certificates. We show that this task is $\mathsf{NP}$-hard and, under the Dense-vs-Random conjecture, cannot be approximated better than $\mathcal{O}(m^{1/8 - \epsilon})$ for $\epsilon>0$, where $m$ is the number of possible certificates. This provides evidence that the AFC should not prevent the use of interactive classification for real-world tasks, as it is computationally hard to exploit.