Reinforcement Learning with Maskable Stock Representation for Portfolio Management in Customizable Stock Pools

Portfolio management (PM) is a fundamental financial trading task that seeks the optimal periodic reallocation of capital across different stocks to pursue long-term profits. Reinforcement learning (RL) has recently shown its potential to train profitable agents for PM through interaction with financial markets. However, existing work mostly focuses on fixed stock pools, which is inconsistent with investors' practical demands. Specifically, the target stock pools of different investors vary dramatically because of their differing views on market states, and individual investors may temporarily adjust the stocks they wish to trade (e.g., adding a popular stock), which leads to customizable stock pools (CSPs). Existing RL methods require retraining the agent after even a tiny change to the stock pool, which leads to high computational cost and unstable performance. To tackle this challenge, we propose EarnMore, a rEinforcement leARNing framework with Maskable stOck REpresentation to handle PM with CSPs through one-shot training in a global stock pool (GSP). Specifically, we first introduce a mechanism to mask out the representations of stocks outside the target pool. Second, we learn meaningful stock representations through a self-supervised masking and reconstruction process. Third, we design a re-weighting mechanism that makes the portfolio concentrate on favorable stocks and neglect the stocks outside the target pool. Through extensive experiments on 8 subset stock pools of the US stock market, we demonstrate that EarnMore significantly outperforms 14 state-of-the-art baselines in terms of 6 popular financial metrics, with over 40% improvement in profit.
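
To make the idea of neglecting stocks outside the target pool concrete, the following is a minimal sketch of a masked softmax over per-stock scores in a global pool. It is an illustration only, not EarnMore's actual masking or re-weighting mechanism, which the abstract does not specify; the function name `masked_portfolio_weights`, the score values, and the pool mask are all hypothetical.

```python
import numpy as np

def masked_portfolio_weights(scores, in_pool):
    """Softmax over per-stock scores, restricted to stocks in the target pool.

    Hypothetical helper for illustration; not the paper's re-weighting method.
    """
    scores = np.asarray(scores, dtype=float)
    in_pool = np.asarray(in_pool, dtype=bool)
    if not in_pool.any():
        raise ValueError("target pool must contain at least one stock")
    # Stocks outside the pool get -inf, which exp() maps to exactly zero.
    masked = np.where(in_pool, scores, -np.inf)
    # Subtract the largest in-pool score for numerical stability.
    shifted = masked - masked[in_pool].max()
    exp = np.exp(shifted)
    return exp / exp.sum()

# Made-up scores for a global pool of five stocks.
global_scores = [0.2, 1.5, -0.3, 0.9, 0.0]
# One investor's customizable pool: stocks 0, 1, and 3 only.
pool_a = [True, True, False, True, False]
weights = masked_portfolio_weights(global_scores, pool_a)
print(weights)
```

Under these assumptions, changing the pool only changes the mask, so the same scoring model can serve different pools without retraining; the weights always sum to one and are exactly zero for masked-out stocks.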
