Learning robust policies that generalize over large discrete action spaces remains an open challenge for intelligent systems, especially in noisy environments subject to the curse of dimensionality. In this paper, we present a novel framework for efficiently learning action embeddings that allow us both to reconstruct the original action and to predict the expected future state. We describe an encoder-decoder architecture for action embeddings with a dual-channel loss that balances action reconstruction against state prediction accuracy. We use the trained decoder in conjunction with a standard reinforcement learning algorithm that produces actions in the embedding space. Our architecture outperforms two competitive baselines in two diverse environments: a 2D maze environment with more than 4000 discrete noisy actions, and a product recommendation task that uses real-world e-commerce transaction data. Empirical results show that the model yields cleaner action embeddings, and that the improved representations help learn better policies with earlier convergence.
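The dual-channel loss described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the loss, so everything below is an assumption made for illustration: the reconstruction channel is taken to be a cross-entropy over the discrete actions, the state prediction channel is taken to be a mean squared error, and the two are combined with a hypothetical scalar `weight`. The function names are likewise hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruction_loss(action_logits, true_action):
    """Assumed channel 1: cross-entropy between the decoder's distribution
    over discrete actions and the index of the action that was taken."""
    probs = softmax(action_logits)
    return -math.log(probs[true_action])


def state_prediction_loss(predicted_state, next_state):
    """Assumed channel 2: mean squared error between the predicted and
    the observed next state."""
    squared = [(p - s) ** 2 for p, s in zip(predicted_state, next_state)]
    return sum(squared) / len(squared)


def dual_channel_loss(action_logits, true_action,
                      predicted_state, next_state, weight=0.5):
    """Weighted sum of the two channels. `weight` is a hypothetical
    balancing coefficient in [0, 1]; the paper may balance them differently."""
    rec = reconstruction_loss(action_logits, true_action)
    pred = state_prediction_loss(predicted_state, next_state)
    return weight * rec + (1.0 - weight) * pred


# Toy usage: 4 discrete actions, 2-dimensional state.
loss = dual_channel_loss(
    action_logits=[2.0, 0.1, -1.0, 0.3],
    true_action=0,
    predicted_state=[0.9, 0.1],
    next_state=[1.0, 0.0],
)
print(f"dual-channel loss: {loss:.4f}")
```

Setting `weight` closer to 1 would favor faithful action reconstruction, while values closer to 0 would favor embeddings that capture the effect of an action on the next state; how the trade-off is actually tuned is not stated in the abstract.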