Gradient-based optimization is the workhorse of deep learning, offering efficient and scalable training via backpropagation. However, exposing gradients during training can leak sensitive information about the underlying data, raising privacy and security concerns such as susceptibility to data poisoning attacks. In contrast, black-box optimization methods, which treat the model as an opaque function and rely solely on function evaluations to guide optimization, offer a promising alternative in scenarios where data access is restricted, adversarial risks are high, or overfitting is a concern. This paper introduces BBoxER, an evolutionary black-box method for LLM post-training that induces an information bottleneck via implicit compression of the training data. Leveraging the tractability of the resulting information flow, we provide non-vacuous generalization bounds and strong theoretical guarantees on privacy, robustness to data poisoning, and resistance to extraction attacks. In experiments with LLMs, we demonstrate empirically that black-box optimization methods are able to learn despite the scalability and computational challenges inherent to black-box approaches: a few iterations of BBoxER improve performance, generalize well on a benchmark of reasoning datasets, and are robust to membership inference attacks. This positions BBoxER as an attractive add-on to gradient-based optimization, suitable for deployment in restricted or privacy-sensitive environments while also providing non-vacuous generalization guarantees.
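To make concrete what "relying solely on function evaluations" means here, the following is a minimal sketch of an evaluation-only evolutionary loop (a simple (1+1) evolution strategy). It is not the authors' BBoxER implementation; `evaluate`, `theta`, the step size, and the toy objective are illustrative assumptions standing in for an LLM's tunable parameters and a benchmark score.

```python
# Minimal sketch (assumed, not the BBoxER algorithm): a (1+1) evolution strategy
# that updates parameters using only scalar fitness evaluations, never gradients.
import numpy as np

def evaluate(theta: np.ndarray) -> float:
    # Hypothetical black-box objective; in post-training this would be the
    # benchmark score of the model parameterized by `theta`.
    return -float(np.sum((theta - 1.0) ** 2))

rng = np.random.default_rng(0)
theta = np.zeros(16)      # current parameters
best = evaluate(theta)    # only function values are observed
sigma = 0.1               # mutation step size

for step in range(200):
    candidate = theta + sigma * rng.standard_normal(theta.shape)
    score = evaluate(candidate)
    if score >= best:     # keep the mutation only if it does not hurt
        theta, best = candidate, score

print(f"best score after 200 evaluations: {best:.4f}")
```

Because the optimizer only ever sees a handful of accepted/rejected comparisons, the information it can absorb from the training data is bounded, which is the intuition behind the compression-based generalization and privacy arguments in the abstract.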