The Lipschitz constant of the map between the input and output space represented by a neural network is a natural metric for assessing the robustness of the model. We present a new method to constrain the Lipschitz constant of dense deep learning models that can also be generalized to other architectures. The method relies on a simple weight normalization scheme applied during training, which ensures that the Lipschitz constant of every layer stays below an upper limit specified by the analyst. A simple monotonic residual connection can then be used to make the model monotonic in any subset of its inputs, which is useful in scenarios where domain knowledge dictates such dependence. Examples can be found in algorithmic fairness requirements or, as presented here, in the classification of the decays of subatomic particles produced at the CERN Large Hadron Collider. Our normalization is minimally constraining and allows the underlying architecture to retain greater expressiveness than other techniques that aim either to control the Lipschitz constant of the model or to ensure its monotonicity. We show how the algorithm was used to train a powerful, robust, and interpretable discriminator for heavy-flavor-quark decays, which has been adopted as the primary data-selection algorithm in the LHCb real-time data-processing system for the current LHC data-taking period, known as Run 3. Our algorithm has also achieved state-of-the-art performance on benchmarks in medicine, finance, and other applications.
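To make the two ingredients concrete, the following is a minimal sketch of per-layer weight normalization followed by a monotonic residual connection. It is an illustration under stated assumptions, not the implementation described in the paper: it assumes the operator norm induced by the L-infinity vector norm (maximum absolute row sum), an equal split of the Lipschitz budget `lam` across layers, and ReLU activations. The names `inf_norm`, `normalize`, `lipschitz_net`, and `monotonic_net` are hypothetical. Under these assumptions the network g has Lipschitz constant at most `lam`, so each partial derivative of g is bounded in magnitude by `lam`, and adding `lam * x_i` for every chosen input i makes the sum non-decreasing in those inputs.

```python
import random


def inf_norm(W):
    """Operator norm induced by the L-infinity vector norm: max absolute row sum."""
    return max(sum(abs(w) for w in row) for row in W)


def normalize(W, limit):
    """Rescale W only when its norm exceeds `limit`; otherwise return it unchanged."""
    scale = max(1.0, inf_norm(W) / limit)
    return [[w / scale for w in row] for row in W]


def dense(W, b, x):
    """Affine map W x + b on plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    """ReLU is 1-Lipschitz, so it does not increase the overall bound."""
    return [max(0.0, a) for a in v]


def lipschitz_net(layers, x, lam):
    """Scalar network g(x) with Lipschitz constant at most `lam` (L-infinity norm)."""
    per_layer = lam ** (1.0 / len(layers))  # assumption: equal budget per layer
    h = x
    for k, (W, b) in enumerate(layers):
        h = dense(normalize(W, per_layer), b, h)
        if k < len(layers) - 1:
            h = relu(h)
    return h[0]


def monotonic_net(layers, x, lam, monotone_idx):
    """g(x) plus a linear residual term, non-decreasing in the inputs `monotone_idx`."""
    return lipschitz_net(layers, x, lam) + lam * sum(x[i] for i in monotone_idx)


# Illustrative usage with random, untrained weights (3 inputs, two hidden layers).
rng = random.Random(0)
sizes = [3, 8, 8, 1]
layers = [
    (
        [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)],
        [rng.gauss(0, 1) for _ in range(n_out)],
    )
    for n_in, n_out in zip(sizes, sizes[1:])
]
lam = 1.0
x_lo = [0.2, -1.0, 0.5]
x_hi = [0.9, -1.0, 0.5]  # only input 0 is increased
print(monotonic_net(layers, x_lo, lam, [0]), monotonic_net(layers, x_hi, lam, [0]))
```

Because `normalize` rescales a weight matrix only when its norm exceeds the per-layer limit, matrices already within the limit are left untouched, which is one way to read "minimally constraining". In a training loop the normalization would be applied to the weights at every forward pass so that the bound holds throughout optimization.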