Reinforcement Learning with Maskable Stock Representation for Portfolio Management in Customizable Stock Pools

Portfolio management (PM) is a fundamental financial trading task that seeks the optimal periodic reallocation of capital across different stocks in pursuit of long-term profit. Reinforcement learning (RL) has recently shown its potential to train profitable agents for PM through interaction with financial markets. However, existing work mostly focuses on fixed stock pools, which is inconsistent with investors' practical demands. Specifically, the target stock pools of different investors vary dramatically owing to their differing views of market states, and individual investors may temporarily adjust the stocks they wish to trade (e.g., adding a popular stock), which leads to customizable stock pools (CSPs). Existing RL methods require retraining the agent after even a tiny change to the stock pool, which leads to high computational cost and unstable performance. To tackle this challenge, we propose EarnMore, a rEinforcement leARNing framework with Maskable stOck REpresentation that handles PM with CSPs through one-shot training in a global stock pool (GSP). First, we introduce a mechanism to mask out the representations of stocks outside the target pool. Second, we learn meaningful stock representations through a self-supervised masking and reconstruction process. Third, we design a re-weighting mechanism that makes the portfolio concentrate on favorable stocks and neglect stocks outside the target pool. Through extensive experiments on 8 subset stock pools of the US stock market, we demonstrate that EarnMore significantly outperforms 14 state-of-the-art baselines on 6 popular financial metrics, with an improvement of over 40% in profit.
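
To make the idea of neglecting stocks outside the target pool concrete, the following is a minimal sketch, not the EarnMore implementation. It assumes the agent emits one real-valued score per stock in the GSP and that re-weighting can be approximated by a masked softmax with a temperature; the function name `masked_portfolio_weights`, the `temperature` parameter, and the example scores are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def masked_portfolio_weights(
    scores: Sequence[float],
    in_pool: Sequence[bool],
    temperature: float = 1.0,
) -> List[float]:
    """Turn per-stock scores over the global pool into portfolio weights.

    Assumption (not from the paper): re-weighting is modelled as a softmax
    restricted to the stocks flagged in `in_pool`. Stocks outside the
    target pool always receive a weight of exactly zero.
    """
    if len(scores) != len(in_pool):
        raise ValueError("scores and in_pool must have the same length")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    kept = [s / temperature for s, keep in zip(scores, in_pool) if keep]
    if not kept:
        raise ValueError("the target pool is empty")

    # Subtract the largest kept score before exponentiating for stability.
    peak = max(kept)
    exps = [
        math.exp(s / temperature - peak) if keep else 0.0
        for s, keep in zip(scores, in_pool)
    ]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical global pool of five stocks; the investor trades only 0, 2, 3.
scores = [0.2, 1.5, 0.9, -0.3, 2.0]
in_pool = [True, False, True, True, False]

weights = masked_portfolio_weights(scores, in_pool)
sharper = masked_portfolio_weights(scores, in_pool, temperature=0.1)
```

Under these assumptions, changing the target pool only changes the `in_pool` mask, so no retraining step appears in the sketch. A lower `temperature` pushes more weight onto the highest-scoring in-pool stock, which is one simple way to read "concentrate on favorable stocks".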
