M${}^{\natural}$-concave functions, also known as gross substitutes valuation functions, play a fundamental role in many fields, including discrete mathematics and economics. In practice, perfect knowledge of M${}^{\natural}$-concave functions is often unavailable a priori, and we can only optimize them interactively based on feedback. Motivated by such situations, we study online M${}^{\natural}$-concave function maximization problems, which are interactive versions of the problem studied by Murota and Shioura (1999). For the stochastic bandit setting, we present $O(T^{-1/2})$-simple regret and $O(T^{2/3})$-regret algorithms given $T$ accesses to unbiased noisy value oracles of M${}^{\natural}$-concave functions. A key to proving these results is the robustness of the greedy algorithm to local errors in M${}^{\natural}$-concave function maximization, which is one of our main technical results. While we obtain these positive results for the stochastic setting, another main result of our work is an impossibility result for the adversarial setting. We prove that, even with full-information feedback, no algorithm that runs in polynomial time per round can achieve $O(T^{1-c})$ regret for any constant $c > 0$ unless $\mathsf{P} = \mathsf{NP}$. Our proof is based on a reduction from the matroid intersection problem for three matroids, which appears to be a novel idea in the context of online learning.
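To make the role of the greedy algorithm concrete, here is a minimal sketch restricted to the set-function case (M${}^{\natural}$-concave functions on $\{0,1\}^n$, i.e., gross substitutes valuations) with an exact value oracle. It starts from the empty set and repeatedly adds the element with the largest strictly positive marginal gain. This is the textbook greedy procedure, not the authors' noisy-oracle algorithm; the names `greedy_maximize`, `brute_force_max`, `example_f`, and `weights` are hypothetical. The example function is a linear term plus a concave function of $|X|$, which is M${}^{\natural}$-concave.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Optional, Sequence

# A set function is modeled as a value oracle on frozensets of element indices.
SetFunction = Callable[[FrozenSet[int]], float]


def greedy_maximize(f: SetFunction, ground: Sequence[int]) -> FrozenSet[int]:
    """Greedy ascent for an M-natural-concave set function (assumed, exact oracle).

    Repeatedly add the element with the largest strictly positive marginal
    gain; stop as soon as no single addition improves f.
    """
    current: FrozenSet[int] = frozenset()
    value = f(current)
    while True:
        best_gain, best_elem = 0.0, None  # only strictly positive gains qualify
        for e in ground:
            if e in current:
                continue
            gain = f(current | {e}) - value
            if gain > best_gain:
                best_gain, best_elem = gain, e
        if best_elem is None:
            return current
        current = current | {best_elem}
        value = f(current)  # re-query instead of accumulating gains


def brute_force_max(f: SetFunction, ground: Sequence[int]) -> float:
    """Exhaustive maximum over all subsets, used only to check the greedy result."""
    items = list(ground)
    return max(
        f(frozenset(s)) for r in range(len(items) + 1) for s in combinations(items, r)
    )


# Hypothetical example: linear weights plus the concave term -|X|(|X|-1).
weights = [3.0, -1.0, 2.0, 0.5]


def example_f(X: FrozenSet[int]) -> float:
    k = len(X)
    return sum(weights[i] for i in X) - k * (k - 1)


best = greedy_maximize(example_f, range(len(weights)))
print(sorted(best), example_f(best), brute_force_max(example_f, range(len(weights))))
```

On this instance the greedy set attains the same value as exhaustive search. The paper's technical result concerns what happens when each marginal-gain comparison is made with local errors, which this exact-oracle sketch does not model.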