Gradient-based optimization is the workhorse of deep learning, offering efficient and scalable training via backpropagation. However, exposing gradients during training can leak sensitive information about the underlying data, raising privacy and security concerns such as susceptibility to data poisoning attacks. In contrast, black-box optimization methods, which treat the model as an opaque function and rely solely on function evaluations to guide optimization, offer a promising alternative in scenarios where data access is restricted, adversarial risks are high, or overfitting is a concern. This paper introduces BBoxER, an evolutionary black-box method for LLM post-training that induces an information bottleneck via implicit compression of the training data. Leveraging the tractability of the information flow, we provide non-vacuous generalization bounds and strong theoretical guarantees for differential privacy, robustness to data poisoning attacks, and robustness to extraction attacks. In experiments with LLMs, we demonstrate empirically that black-box optimization methods are able to learn, despite the scalability and computational challenges inherent to black-box approaches: a few iterations of BBoxER improve performance, generalize well on a benchmark of reasoning datasets, and are robust to membership inference attacks. This positions BBoxER as an attractive add-on to gradient-based optimization, suitable for deployment in restricted or privacy-sensitive environments while also providing non-vacuous generalization guarantees.
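To make the core mechanism concrete, the following is a minimal sketch of an evaluation-only evolutionary loop of the kind the abstract describes: a (1+1)-style evolution strategy over a low-dimensional modifier of frozen model weights. This is an illustrative assumption, not the paper's exact algorithm; in particular, the modifier dimension, the Gaussian proposal, and the toy `evaluate` function are hypothetical placeholders standing in for scoring the modified LLM on training data.

```python
import numpy as np

# Toy stand-in for scoring a modified model on training data; in the
# paper's setting this would run the LLM with modifier z applied and
# return, e.g., accuracy on a reasoning benchmark. TARGET is a purely
# illustrative "good" modifier used to make this sketch self-contained.
TARGET = np.full(8, 0.5)

def evaluate(z: np.ndarray) -> float:
    return -float(np.linalg.norm(z - TARGET))

def bbox_post_train(dim: int, budget: int, sigma: float = 0.1,
                    seed: int = 0) -> np.ndarray:
    """(1+1)-style evolution strategy: the model is a black box and the
    only training signal is the scalar returned by evaluate(), so no
    gradients ever leave the model."""
    rng = np.random.default_rng(seed)
    best = np.zeros(dim)            # start from the unmodified model
    best_score = evaluate(best)
    for _ in range(budget):         # one function evaluation per iteration
        candidate = best + sigma * rng.standard_normal(dim)
        score = evaluate(candidate)
        if score >= best_score:     # keep the candidate only if it improves
            best, best_score = candidate, score
    # The run is fully determined by the seed plus one accept/reject bit
    # per iteration, so the optimizer transmits at most ~budget bits about
    # the data -- loosely mirroring the implicit-compression / information-
    # bottleneck argument in the abstract.
    return best

if __name__ == "__main__":
    best = bbox_post_train(dim=8, budget=200)
    print("final score:", evaluate(best))
```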