Verifiable learning advocates training machine learning models that are amenable to efficient security verification. Prior research demonstrated that specific classes of decision tree ensembles, called large-spread ensembles, allow for robustness verification in polynomial time against any norm-based attacker. This study extends prior work on verifiable learning from basic ensemble methods (i.e., hard majority voting) to advanced boosted tree ensembles, such as those trained using XGBoost or LightGBM. Our formal results show that robustness verification is achievable in polynomial time for attackers based on the $L_\infty$-norm, but remains NP-hard for other norm-based attackers. Nevertheless, we present a pseudo-polynomial time algorithm to verify robustness against attackers based on the $L_p$-norm for any $p \in \mathbb{N} \cup \{0\}$, which performs very well in practice. Our experimental evaluation shows that large-spread boosted ensembles are accurate enough for practical adoption while remaining amenable to efficient security verification.