Optimal Exploration of New Products under Assortment Decisions

We study online learning for new products on a platform that makes capacity-constrained assortment decisions about which products to offer. The quality of a newly listed product is initially unknown, and quality information propagates through social learning: when a customer purchases a new product and leaves a review, its quality is revealed to both the platform and future customers. Since reviews require purchases, the platform must feature new products in the assortment ("explore") to generate the reviews needed to learn their quality. Such exploration is costly because customer demand for new products is lower than for incumbent products. We characterize the regret-minimizing assortments for exploration, addressing two questions. (1) Should the platform offer a new product alone or alongside incumbent products? The former maximizes the new product's purchase probability but yields lower short-term revenue. Although pairing lowers the new product's purchase probability, we show that it is always optimal to pair the new product with the top incumbent products. (2) With multiple new products, should the platform explore them simultaneously or one at a time? We show that the optimal number of new products to explore simultaneously has a simple threshold structure: it increases with the "potential" of the new products and, surprisingly, does not depend on their individual purchase probabilities. We also show that two canonical bandit algorithms, UCB and Thompson Sampling, both fail in this setting for opposite reasons: UCB over-explores, while Thompson Sampling under-explores. Our results provide structural insights into how platforms should learn about new products through assortment decisions.
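The trade-off behind question (1) can be made concrete with a small numerical sketch. The abstract does not state the customer choice model, so the code below *assumes* a multinomial logit (MNL) model with the no-purchase weight normalized to 1, and treats the new product's attraction weight as a fixed, hypothetical number. The function names `mnl_probs` and `evaluate` and all parameter values are illustrative only. The sketch shows the single-period tension (adding top incumbents lowers the new product's purchase probability but raises expected revenue); it does not reproduce the paper's regret analysis or learning dynamics.

```python
from __future__ import annotations

from typing import Sequence


def mnl_probs(weights: Sequence[float]) -> list[float]:
    """Purchase probability of each offered item under an assumed MNL model
    whose no-purchase option has weight 1."""
    denom = 1.0 + sum(weights)
    return [w / denom for w in weights]


def evaluate(
    v_new: float,
    r_new: float,
    incumbents: Sequence[tuple[float, float]],
) -> tuple[float, float]:
    """Return (P(new product is purchased), expected one-period revenue)
    for the assortment {new product} plus the given incumbents.

    `incumbents` holds hypothetical (attraction weight, price) pairs.
    """
    weights = [v_new] + [v for v, _ in incumbents]
    prices = [r_new] + [r for _, r in incumbents]
    probs = mnl_probs(weights)
    revenue = sum(p * r for p, r in zip(probs, prices))
    return probs[0], revenue


# Hypothetical incumbents, sorted from most to least attractive.
INCUMBENTS = [(2.0, 1.0), (1.5, 1.0), (1.0, 1.0)]

if __name__ == "__main__":
    # Pair the new product with the top-k incumbents for k = 0, 1, 2, 3.
    for k in range(len(INCUMBENTS) + 1):
        p_new, rev = evaluate(0.5, 1.0, INCUMBENTS[:k])
        print(f"top-{k} incumbents: P(new)={p_new:.3f}, revenue={rev:.3f}")
```

With these made-up numbers, offering the new product alone gives it the highest purchase probability (1/3) but the lowest revenue, while each added top incumbent lowers that probability and raises revenue. Whether the revenue gain outweighs the slower learning over a full horizon is exactly what the paper's regret analysis addresses.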

翻译：本文研究平台在容量受限的产品组合决策中，对新产品进行在线学习的问题。对于新上架产品，其质量初始未知，质量信息通过社会学习传播：当消费者购买新产品并留下评价后，平台与未来消费者均能获知其质量。由于评价依赖于购买行为，平台必须将新产品纳入产品组合（“探索”）以生成评价，从而了解新产品。这类探索具有成本，因为消费者对新产品的需求低于对已有产品的需求。我们刻画了最小化遗憾的探索最优产品组合，并回答两个问题：（1）平台应单独推出新产品，还是将其与已有产品共同展示？前者能最大化新产品的购买概率，但导致短期收益降低。尽管购买概率较低，但我们证明将新产品与最优质的已有产品搭配始终是最优策略。（2）面对多个新产品时，平台应同时探索还是逐一探索？我们证明同时探索的最优新产品数量具有简单的阈值结构：该阈值随新产品的“潜力”递增，且令人惊讶地独立于其个体购买概率。我们还表明，两种经典赌博机算法——上置信界算法与汤普森采样——在此场景下因相反原因失效：上置信界算法过度探索，而汤普森采样探索不足。本研究为平台如何通过产品组合决策学习新产品提供了结构性启示。