A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit - 专知论文

会员服务 ·

0

易处理的 · 对数几率 · 情境 · 效用 · 属性 ·

2023 年 3 月 27 日

A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit

翻译：一个可解的多项式罗吉特情境赌博机在线学习算法

Priyank Agrawal,Theja Tulabandhula,Vashist Avadhanula

from arxiv, Accepted to be published at Elsevier European Journal of Operational Research (EJOR)

In this paper, we consider the contextual variant of the MNL-Bandit problem. More specifically, we consider a dynamic set optimization problem, where a decision-maker offers a subset (assortment) of products to a consumer and observes the response in every round. Consumers purchase products to maximize their utility. We assume that a set of attributes describe the products, and the mean utility of a product is linear in the values of these attributes. We model consumer choice behavior using the widely used Multinomial Logit (MNL) model and consider the decision maker problem of dynamically learning the model parameters while optimizing cumulative revenue over the selling horizon $T$. Though this problem has attracted considerable attention in recent times, many existing methods often involve solving an intractable non-convex optimization problem. Their theoretical performance guarantees depend on a problem-dependent parameter which could be prohibitively large. In particular, existing algorithms for this problem have regret bounded by $O(\sqrt{\kappa d T})$, where $\kappa$ is a problem-dependent constant that can have an exponential dependency on the number of attributes. In this paper, we propose an optimistic algorithm and show that the regret is bounded by $O(\sqrt{dT} + \kappa)$, significantly improving the performance over existing methods. Further, we propose a convex relaxation of the optimization step, which allows for tractable decision-making while retaining the favourable regret guarantee.

翻译：本文考虑MNL-赌博机问题的情境变体。具体而言，我们研究一个动态集合优化问题：决策者在每一轮向消费者提供一个产品子集（组合），并观察消费者的响应。消费者通过购买产品以最大化其效用。我们假设产品由一组属性描述，且产品的平均效用与这些属性的取值呈线性关系。我们采用广泛使用的多项式罗吉特（MNL）模型对消费者选择行为进行建模，并考虑决策者在销售周期$T$内动态学习模型参数同时优化累积收益的问题。尽管该问题近年来备受关注，但现有方法往往需要求解一个棘手的非凸优化问题。其理论性能保证依赖于一个可能过大的问题相关参数。具体而言，现有算法的遗憾界为$O(\sqrt{\kappa d T})$，其中$\kappa$是可能随属性数量呈指数增长的问题相关常数。本文提出一种乐观算法，证明其遗憾界为$O(\sqrt{dT} + \kappa)$，显著优于现有方法。此外，我们提出优化步骤的凸松弛方法，在保持有利遗憾保证的同时实现可解的决策过程。

0

相关内容

易处理的

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

专知会员服务

28+阅读 · 2022年12月26日

【ICDM2022教程】多目标优化与推荐，173页ppt

【ICDM2022教程】多目标优化与推荐，173页ppt

专知会员服务

47+阅读 · 2022年12月24日

宾夕法尼亚大学最新《不确定性估计》课程笔记，134页pdf，附Slides

宾夕法尼亚大学最新《不确定性估计》课程笔记，134页pdf，附Slides

专知会员服务

49+阅读 · 2022年11月13日

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

128+阅读 · 2022年4月21日

【普林斯顿干货书】强化学习与随机优化，728页pdf阐述序列决策统一框架

【普林斯顿干货书】强化学习与随机优化，728页pdf阐述序列决策统一框架

专知会员服务

132+阅读 · 2021年4月25日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

52+阅读 · 2020年12月14日

【KDD2020】具有条件公平性的算法决策，Algorithmic Decision Making with Conditional Fairness

【KDD2020】具有条件公平性的算法决策，Algorithmic Decision Making with Conditional Fairness

专知会员服务

22+阅读 · 2020年6月19日

【SIGMOD2020】知识图谱补全方法的现实再评价，Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study

【SIGMOD2020】知识图谱补全方法的现实再评价，Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study

专知会员服务

33+阅读 · 2020年3月23日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

浅聊对比学习（Contrastive Learning）

浅聊对比学习（Contrastive Learning）

极市平台

3+阅读 · 2022年7月26日

浅聊对比学习（Contrastive Learning）第一弹

浅聊对比学习（Contrastive Learning）第一弹

PaperWeekly

1+阅读 · 2022年6月10日

量化金融强化学习论文集合

量化金融强化学习论文集合

专知

14+阅读 · 2019年12月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

不完全信息下的投资组合选择模型研究：一个时间一致性的视角

国家自然科学基金

5+阅读 · 2015年12月31日

有理 Krylov 子空间算法的最优参数选取

国家自然科学基金

0+阅读 · 2015年12月31日

农村地区2型糖尿病Markov模型构建及相关干预策略经济学评价

国家自然科学基金

0+阅读 · 2013年12月31日

克服库存不精确的鲁棒集成补货、生产控制及分销策略

国家自然科学基金

0+阅读 · 2012年12月31日

幂零李群上热核估计的几个问题

国家自然科学基金

0+阅读 · 2012年12月31日

基于正则Vine copula的相依建模及软件开发

国家自然科学基金

0+阅读 · 2012年12月31日

高维数据的图模型学习与统计推断

国家自然科学基金

8+阅读 · 2012年12月31日

受限制策略下多臂Bandit过程的理论与应用研究

国家自然科学基金

0+阅读 · 2012年12月31日

高维数据的假设检验

国家自然科学基金

0+阅读 · 2012年12月31日

非线性协整模型的有效估计、检验及其应用

国家自然科学基金

0+阅读 · 2012年12月31日

Domain Decomposition Learning Methods for Solving Elliptic Problems

Arxiv

0+阅读 · 2023年5月17日

A hybrid deep-learning-metaheuristic framework for discrete road network design problems

Arxiv

0+阅读 · 2023年5月16日

Improving the Data Efficiency of Multi-Objective Quality-Diversity through Gradient Assistance and Crowding Exploration

Arxiv

0+阅读 · 2023年5月16日

Evaluation Strategy of Time-series Anomaly Detection with Decay Function

Arxiv

0+阅读 · 2023年5月15日

Fairness in Forecasting of Observations of Linear Dynamical Systems

Arxiv

0+阅读 · 2023年5月15日

Probabilistic forecast of nonlinear dynamical systems with uncertainty quantification

Arxiv

0+阅读 · 2023年5月15日

Enhancing Datalog Reasoning with Hypertree Decompositions

Arxiv

0+阅读 · 2023年5月15日

Generalized Kernel Two-Sample Tests

Arxiv

0+阅读 · 2023年5月14日

Sequential model correction for nonlinear inverse problems

Arxiv

0+阅读 · 2023年5月12日

Copula-based Sensitivity Analysis for Multi-Treatment Causal Inference with Unobserved Confounding

Arxiv

0+阅读 · 2023年5月11日

VIP会员

文章信息

相关主题

最新内容

《无人机对海面作战影响评估》

《无人机对海面作战影响评估》

专知会员服务

9+阅读 · 7月21日

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

专知会员服务

8+阅读 · 7月21日

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

专知会员服务

3+阅读 · 7月21日

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

专知会员服务

5+阅读 · 7月21日

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

专知会员服务

6+阅读 · 7月21日

印度精确打击与指挥架构的断层

印度精确打击与指挥架构的断层

专知会员服务

5+阅读 · 7月20日

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

专知会员服务

7+阅读 · 7月20日

美空军AI完成F-16战斗机自主空战历史性试飞

美空军AI完成F-16战斗机自主空战历史性试飞

专知会员服务

6+阅读 · 7月20日

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

专知会员服务

8+阅读 · 7月20日

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

专知会员服务

7+阅读 · 7月20日

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

专知会员服务

9+阅读 · 7月20日

综述 | 终身视觉表征：持续自监督学习CSSL系统综述

综述 | 终身视觉表征：持续自监督学习CSSL系统综述

专知会员服务

9+阅读 · 7月20日

深入Project Maven：为何人工智能在战场上依然失灵

深入Project Maven：为何人工智能在战场上依然失灵

专知会员服务

15+阅读 · 7月19日

锻造未来士兵：外骨骼、基因工程与赛博格

锻造未来士兵：外骨骼、基因工程与赛博格

专知会员服务

8+阅读 · 7月19日

《无人机系统（UAS）通信网状网络试验性部署》50页报告

《无人机系统（UAS）通信网状网络试验性部署》50页报告

专知会员服务

10+阅读 · 7月19日

相关VIP内容

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

专知会员服务

28+阅读 · 2022年12月26日

【ICDM2022教程】多目标优化与推荐，173页ppt

【ICDM2022教程】多目标优化与推荐，173页ppt

专知会员服务

47+阅读 · 2022年12月24日

宾夕法尼亚大学最新《不确定性估计》课程笔记，134页pdf，附Slides

宾夕法尼亚大学最新《不确定性估计》课程笔记，134页pdf，附Slides

专知会员服务

49+阅读 · 2022年11月13日

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

128+阅读 · 2022年4月21日

【普林斯顿干货书】强化学习与随机优化，728页pdf阐述序列决策统一框架

【普林斯顿干货书】强化学习与随机优化，728页pdf阐述序列决策统一框架

专知会员服务

132+阅读 · 2021年4月25日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

52+阅读 · 2020年12月14日

【KDD2020】具有条件公平性的算法决策，Algorithmic Decision Making with Conditional Fairness

【KDD2020】具有条件公平性的算法决策，Algorithmic Decision Making with Conditional Fairness

专知会员服务

22+阅读 · 2020年6月19日

【SIGMOD2020】知识图谱补全方法的现实再评价，Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study

【SIGMOD2020】知识图谱补全方法的现实再评价，Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study

专知会员服务

33+阅读 · 2020年3月23日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

《无人机对海面作战影响评估》

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

相关资讯

浅聊对比学习（Contrastive Learning）

浅聊对比学习（Contrastive Learning）

极市平台

3+阅读 · 2022年7月26日

浅聊对比学习（Contrastive Learning）第一弹

浅聊对比学习（Contrastive Learning）第一弹

PaperWeekly

1+阅读 · 2022年6月10日

量化金融强化学习论文集合

量化金融强化学习论文集合

专知

14+阅读 · 2019年12月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

相关论文

Domain Decomposition Learning Methods for Solving Elliptic Problems

Arxiv

0+阅读 · 2023年5月17日

A hybrid deep-learning-metaheuristic framework for discrete road network design problems

Arxiv

0+阅读 · 2023年5月16日

Improving the Data Efficiency of Multi-Objective Quality-Diversity through Gradient Assistance and Crowding Exploration

Arxiv

0+阅读 · 2023年5月16日

Evaluation Strategy of Time-series Anomaly Detection with Decay Function

Arxiv

0+阅读 · 2023年5月15日

Fairness in Forecasting of Observations of Linear Dynamical Systems

Arxiv

0+阅读 · 2023年5月15日

Probabilistic forecast of nonlinear dynamical systems with uncertainty quantification

Arxiv

0+阅读 · 2023年5月15日

Enhancing Datalog Reasoning with Hypertree Decompositions

Arxiv

0+阅读 · 2023年5月15日

Generalized Kernel Two-Sample Tests

Arxiv

0+阅读 · 2023年5月14日

Sequential model correction for nonlinear inverse problems

Arxiv

0+阅读 · 2023年5月12日

Copula-based Sensitivity Analysis for Multi-Treatment Causal Inference with Unobserved Confounding

Arxiv

0+阅读 · 2023年5月11日

相关基金

不完全信息下的投资组合选择模型研究：一个时间一致性的视角

国家自然科学基金

5+阅读 · 2015年12月31日

有理 Krylov 子空间算法的最优参数选取

国家自然科学基金

0+阅读 · 2015年12月31日

农村地区2型糖尿病Markov模型构建及相关干预策略经济学评价

国家自然科学基金

0+阅读 · 2013年12月31日

克服库存不精确的鲁棒集成补货、生产控制及分销策略

国家自然科学基金

0+阅读 · 2012年12月31日

幂零李群上热核估计的几个问题

国家自然科学基金

0+阅读 · 2012年12月31日

基于正则Vine copula的相依建模及软件开发

国家自然科学基金

0+阅读 · 2012年12月31日

高维数据的图模型学习与统计推断

国家自然科学基金

8+阅读 · 2012年12月31日

受限制策略下多臂Bandit过程的理论与应用研究

国家自然科学基金

0+阅读 · 2012年12月31日

高维数据的假设检验

国家自然科学基金

0+阅读 · 2012年12月31日

非线性协整模型的有效估计、检验及其应用

国家自然科学基金

0+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员