Reinforcement learning (RL) has shown promise for trading, yet most open-source backtesting environments assume negligible or fixed transaction costs, so agents learn trading behaviors that fail under realistic execution. We introduce three Gymnasium-compatible trading environments: MACE (Market-Adjusted Cost Execution) stock trading, margin trading, and portfolio optimization. All three integrate nonlinear market impact models grounded in the Almgren-Chriss (AC) framework and the empirically validated square-root impact law. Each environment provides pluggable cost models, permanent impact tracking with exponential decay, and comprehensive trade-level logging. We evaluate five deep reinforcement learning (DRL) algorithms (A2C, PPO, DDPG, SAC, TD3) on the NASDAQ-100, comparing a fixed 10 bps baseline against the AC model with Optuna-tuned hyperparameters. Our results show that (i) the cost model materially changes both absolute performance and the relative ranking of algorithms across all three environments; (ii) the AC model produces markedly different trading behavior, e.g., daily costs drop from $200k to $8k and turnover falls from 19% to 1%; (iii) hyperparameter optimization is essential for constraining pathological trading, with costs dropping by up to 82%; and (iv) interactions between algorithm and cost model are strongly environment-specific, e.g., in margin trading DDPG's out-of-sample (OOS) Sharpe ratio rises from -2.1 to 0.3 under AC while SAC's falls from -0.5 to -1.2. We release the full suite as an open-source extension to FinRL-Meta.
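To make the contrast between the two cost regimes concrete, the following is a minimal sketch, not the released implementation. It compares a flat basis-point cost with a square-root impact cost and tracks a price displacement that decays over time. All names (`fixed_bps_cost`, `sqrt_impact_cost`, `DecayingImpact`), the impact coefficient `y`, and the half-life parameterization of the decay are illustrative assumptions; the abstract does not specify the environments' actual interfaces or parameter values.

```python
import math
from dataclasses import dataclass


def fixed_bps_cost(notional: float, bps: float = 10.0) -> float:
    """Flat proportional cost: `bps` basis points of the traded notional."""
    return abs(notional) * bps / 1e4


def sqrt_impact_cost(shares: float, price: float, adv: float,
                     sigma: float, y: float = 1.0) -> float:
    """Square-root law (assumed form): fractional impact is
    y * sigma * sqrt(|shares| / adv), where adv is average daily volume
    in shares and sigma is daily volatility. Cost = notional * impact."""
    if shares == 0:
        return 0.0
    participation = abs(shares) / adv
    impact_frac = y * sigma * math.sqrt(participation)
    return abs(shares) * price * impact_frac


@dataclass
class DecayingImpact:
    """Running price displacement that decays geometrically each step.
    The half-life parameterization is an assumption for illustration."""
    half_life_steps: float = 5.0
    level: float = 0.0

    def step(self, new_impact: float = 0.0) -> float:
        # Decay the accumulated displacement, then add this step's impact.
        decay = 0.5 ** (1.0 / self.half_life_steps)
        self.level = self.level * decay + new_impact
        return self.level


if __name__ == "__main__":
    price, adv, sigma = 100.0, 1_000_000.0, 0.02  # hypothetical inputs
    for shares in (1_000, 10_000, 100_000):
        flat = fixed_bps_cost(shares * price)
        impact = sqrt_impact_cost(shares, price, adv, sigma)
        print(f"{shares:>7} shares  flat={flat:10.2f}  sqrt-impact={impact:10.2f}")
```

Under the flat model, cost scales linearly with order size, so an agent pays the same rate whether it trades 0.1% or 10% of daily volume. Under the square-root form, cost grows with order size to the power 1.5, which penalizes large, frequent rebalancing and is consistent with the lower turnover reported under the AC model.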