We study online inverse linear optimization, also known as contextual recommendation, in which a learner sequentially infers an agent's hidden objective vector from the agent's optimal actions, observed over feasible sets that change over time. The learner aims to recommend actions that perform well under the agent's true objective. Performance is measured by regret, defined as the cumulative gap between the agent's optimal values and the values achieved by the learner's recommended actions. Prior work has established a regret bound of $O(d\log T)$ and a finite but exponentially large bound of $\exp(O(d\log d))$, where $d$ is the dimension of the optimization problem and $T$ is the time horizon; a regret lower bound of $\Omega(d)$ is also known (Gollapudi et al. 2021; Sakaue et al. 2025). Whether a finite regret bound polynomial in $d$ is achievable has remained an open question. We partially resolve it by showing that a finite regret bound of $O(d\log d)$ is achievable when the feasible sets are M-convex, a broad class that includes matroids. The proof combines a structural characterization of optimal solutions on M-convex sets with a geometric volume argument. We also extend the approach to the setting where feedback is adversarially corrupted in up to $C$ rounds, obtaining a regret bound of $O((C+1)d\log d)$ without prior knowledge of $C$ by monitoring directed graphs induced by the observed feedback to detect corruptions adaptively.
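For concreteness, the regret described above can be written as follows. This is a sketch under assumed notation that does not appear in the abstract: $c^*$ denotes the agent's hidden objective vector, $X_t$ the feasible set in round $t$, $x_t$ the agent's optimal action, and $\hat{x}_t$ the learner's recommendation. It also assumes the agent maximizes a linear objective.

\[
R_T \;=\; \sum_{t=1}^{T} \left( \langle c^*, x_t \rangle - \langle c^*, \hat{x}_t \rangle \right),
\qquad x_t \in \operatorname*{arg\,max}_{x \in X_t} \langle c^*, x \rangle, \quad \hat{x}_t \in X_t .
\]

Each summand is nonnegative because $x_t$ is optimal over $X_t$ and $\hat{x}_t$ is feasible, so $R_T$ is nondecreasing in $T$.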