This paper considers stochastic optimization with a stochastic constraint, which requires the expectation of a random function to stay below a given threshold. In particular, we study the setting in which data samples are drawn from a Markov chain and are therefore not independent and identically distributed (i.i.d.). We generalize the drift-plus-penalty framework, a primal-dual stochastic gradient method developed for the i.i.d. case, to the Markov chain sampling setting. We propose two variants of drift-plus-penalty: one for the case in which the mixing time of the underlying Markov chain is known, and the other for the case of unknown mixing time. In fact, our algorithms apply to a more general setting of constrained online convex optimization in which the sequence of constraint functions follows a Markov chain. Both algorithms are adaptive: the first works without knowledge of the time horizon, while the second uses AdaGrad-style algorithm parameters, which is of independent interest. We demonstrate the effectiveness of the proposed methods through numerical experiments on classification with fairness constraints.
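To make the primal-dual structure concrete, the following is a minimal sketch of a drift-plus-penalty style iteration on a one-dimensional toy problem whose samples come from a two-state Markov chain. Everything specific here is an assumption made for illustration and is not taken from the paper: the objective f(x, xi) = 0.5 (x - (2 + xi))^2, the constraint g(x, xi) = x - 1 + 0.5 xi, the chain on {-1, +1}, the function names, and the fixed parameters V = sqrt(T) and alpha = T (a common choice in the i.i.d. setting). The paper's variants choose their parameters differently, using the mixing time or AdaGrad-style rules, and that is not reproduced here.

```python
import math
import random


def sample_next_state(state, stay_prob, rng):
    """Two-state Markov chain on {-1, +1}: keep the state with prob. stay_prob."""
    return state if rng.random() < stay_prob else -state


def run_drift_plus_penalty(T=20000, stay_prob=0.9, radius=5.0, seed=0):
    """Toy drift-plus-penalty loop; returns (average iterate, final virtual queue).

    Assumed toy problem (not from the paper):
        minimize   E[0.5 * (x - (2 + xi))^2]
        subject to E[x - 1 + 0.5 * xi] <= 0,   x in [-radius, radius].
    The chain is symmetric, so E[xi] = 0 under its stationary distribution
    and the constrained optimum is x = 1.
    """
    rng = random.Random(seed)
    V = math.sqrt(T)   # weight on the objective gradient (assumed i.i.d.-style choice)
    alpha = float(T)   # weight of the proximal term (assumed i.i.d.-style choice)
    x, Q, xi = 0.0, 0.0, 1
    x_sum = 0.0
    for _ in range(T):
        xi = sample_next_state(xi, stay_prob, rng)  # dependent, non-i.i.d. sample
        grad_f = x - (2.0 + xi)       # derivative of f(., xi) at x
        g_val = x - 1.0 + 0.5 * xi    # sampled constraint value at x
        grad_g = 1.0                  # derivative of g(., xi) at x
        # Primal step: minimize V*grad_f*y + Q*grad_g*y + alpha*(y - x)^2
        # over the interval; in one dimension this is a clipped gradient step.
        x_new = x - (V * grad_f + Q * grad_g) / (2.0 * alpha)
        x_new = max(-radius, min(radius, x_new))
        # Dual step: the virtual queue accumulates the linearized constraint.
        Q = max(Q + g_val + grad_g * (x_new - x), 0.0)
        x_sum += x
        x = x_new
    return x_sum / T, Q


x_bar, q_final = run_drift_plus_penalty()
print(f"average iterate: {x_bar:.3f}, final queue: {q_final:.1f}")
```

In this sketch the virtual queue Q plays the role of a dual variable: it grows while the sampled constraint is violated and pushes later iterates back toward feasibility, so the average iterate should settle near the constrained optimum x = 1 even though consecutive samples are correlated.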