We introduce Onflow, a reinforcement learning technique that enables online optimization of portfolio allocation policies based on gradient flows. We devise dynamic allocations of an investment portfolio to maximize its expected log return while taking transaction fees into account. The portfolio allocation is parameterized through a softmax function, and at each time step the gradient flow method leads to an ordinary differential equation whose solutions correspond to the updated allocations. This algorithm belongs to the large class of stochastic optimization procedures; we measure its efficiency by comparing our results to the theoretical values in a log-normal framework and to standard benchmarks on the 'old NYSE' dataset. For log-normal assets, the strategy learned by Onflow with zero transaction costs mimics Markowitz's optimal portfolio and thus the best possible asset allocation strategy. Numerical experiments on the 'old NYSE' dataset show that Onflow leads to dynamic asset allocation strategies whose performance is: a) comparable to benchmark strategies such as Cover's Universal Portfolio or the "multiplicative updates" approach of Helmbold et al. when transaction costs are zero, and b) better than previous procedures when transaction costs are high. Onflow can even remain efficient in regimes where other dynamic allocation techniques no longer work. Therefore, as far as tested, Onflow appears to be a promising dynamic portfolio management strategy that relies on observed prices only, without any assumption on the distribution of the underlying assets' returns. In particular, it could avoid model risk when building a trading strategy.
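The softmax parameterization and the per-step gradient flow described above can be illustrated with a minimal sketch. It is not the Onflow update itself: the one-step objective (log growth minus a proportional fee on the change in weights), the explicit Euler integration, and the names `objective_grad`, `gradient_flow_step`, `fee_rate`, `horizon`, and `n_euler` are all assumptions made for illustration.

```python
import math


def softmax(theta):
    """Map unconstrained parameters to portfolio weights that sum to 1."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def sign(x):
    return (x > 0) - (x < 0)


def objective_grad(theta, price_relatives, prev_weights, fee_rate):
    """Gradient in theta of an ASSUMED one-step objective:
    log(w . r) - fee_rate * sum_i |w_i - prev_w_i|, with w = softmax(theta)."""
    w = softmax(theta)
    growth = sum(wi * ri for wi, ri in zip(w, price_relatives))
    # Gradient with respect to the weights (subgradient for the fee term).
    g_w = [
        ri / growth - fee_rate * sign(wi - pi)
        for wi, ri, pi in zip(w, price_relatives, prev_weights)
    ]
    # Chain rule through softmax: dL/dtheta_j = w_j * (g_j - sum_i w_i g_i).
    avg = sum(wi * gi for wi, gi in zip(w, g_w))
    return [wi * (gi - avg) for wi, gi in zip(w, g_w)]


def gradient_flow_step(theta, price_relatives, prev_weights, fee_rate,
                       horizon=1.0, n_euler=100):
    """Follow d(theta)/dt = grad objective(theta) over [0, horizon]
    using explicit Euler steps (an assumed discretization)."""
    dt = horizon / n_euler
    theta = list(theta)
    for _ in range(n_euler):
        grad = objective_grad(theta, price_relatives, prev_weights, fee_rate)
        theta = [t + dt * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    theta = [0.0, 0.0, 0.0]              # start from equal weights
    price_relatives = [1.02, 0.99, 1.00]  # hypothetical observed price ratios
    prev_weights = softmax(theta)
    theta = gradient_flow_step(theta, price_relatives, prev_weights, fee_rate=0.0)
    print(softmax(theta))
```

Because the weights are always obtained through `softmax`, they stay positive and sum to one without any projection step, which is the practical appeal of this parameterization. Setting `fee_rate` above zero penalizes moves away from `prev_weights` in this sketch.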