We study mathematical programs with equilibrium constraints in which a leader knows their own cost function but lacks a model of the followers' response, and can only query this response at specific points. This setting precludes the use of gradient-based methods, and existing zeroth-order approaches treat the composed objective entirely as a black box, applying zeroth-order tools to both the leader's cost and the followers' response. Such approaches are inefficient because they discard information the leader already possesses about their own cost function. In this work, we instead propose to deploy zeroth-order tools only where they are truly needed: to handle the unknown, non-smooth followers' response. Specifically, we first propose PZOS, an algorithm that combines exact partial gradients of the leader's cost with zeroth-order Jacobian estimates of the followers' response in a chain-rule-inspired manner, and establish that it achieves a strictly lower variance bound than the black-box baseline. Second, we introduce the partial Goldstein subdifferential, a stationarity notion tailored to this composite structure, and prove convergence of our algorithm to both standard and partial Goldstein stationary points. Finally, we validate our method on two application domains -- toll optimization in routing games and defense-attack investment in security games -- demonstrating consistent improvements over black-box baselines in convergence speed, objective value, and estimator variance, with robust performance even when only a few queries are made per iteration.
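To make the chain-rule-inspired construction concrete, the following is a minimal sketch of one way such a partial zeroth-order gradient estimate could be formed. It is not the PZOS algorithm as specified in the paper. It assumes a Gaussian-smoothing finite-difference estimator for the Jacobian of the followers' response, and all names (`partial_zo_gradient`, `respond`, `grad_x_f`, `grad_y_f`, `delta`, `n_queries`) are hypothetical. The leader's cost is written as f(x, y), with y = respond(x) available only through queries.

```python
import numpy as np


def partial_zo_gradient(x, respond, grad_x_f, grad_y_f,
                        delta=1e-2, n_queries=8, rng=None):
    """Estimate the gradient of F(x) = f(x, respond(x)).

    Illustrative assumptions (not taken from the paper):
      - respond(x) is a queryable black box returning the followers' response y
      - grad_x_f(x, y) and grad_y_f(x, y) are the leader's exact partial gradients
      - the Jacobian of respond is estimated with Gaussian random directions
    """
    rng = np.random.default_rng() if rng is None else rng
    y0 = respond(x)                       # one query at the current point
    g_y = grad_y_f(x, y0)                 # exact partial gradient w.r.t. y
    jt_gy = np.zeros_like(x, dtype=float)
    for _ in range(n_queries):
        u = rng.standard_normal(x.shape)  # random search direction
        dy = respond(x + delta * u) - y0  # change in the followers' response
        # Contribution of u * (dy / delta)^T g_y, i.e. one sample of J^T g_y
        jt_gy += u * (dy @ g_y) / delta
    jt_gy /= n_queries
    # Chain rule: exact leader term plus estimated follower term
    return grad_x_f(x, y0) + jt_gy
```

In this sketch, zeroth-order estimation touches only the term involving the followers' response, while the leader's partial gradients enter exactly. A fully black-box baseline would instead apply finite differences to the scalar value f(x, respond(x)) as a whole.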