Large language models (LLMs) and agentic systems are increasingly proposed for financial trading, yet their reported performance remains difficult to compare because studies vary in data provenance, temporal split discipline, execution timing, turnover treatment, and transaction-cost modeling. This article presents a targeted topical review and reproducibility audit of execution realism in LLM-based trading research. A coded evidence matrix covering 30 trade-relevant primary studies is used to assess point-in-time controls, split transparency, held-out evaluation, cost and turnover treatment, execution semantics, universe definition, and artifact release. Across the audited sample, architecture reporting is generally clearer than the evaluation assumptions needed to judge whether a trading result is economically interpretable or reproducible. A 10-equity worked example is included only as a methodological scaffold to illustrate how explicit friction and timing choices can materially compress active-strategy results. The main conclusion is that the next useful step for LLM trading research is not only better agent design, but also clearer reporting standards for execution realism, reproducibility, and evaluation comparability.
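To make the point about friction and timing concrete, the following is a minimal sketch of how an execution lag and a proportional turnover cost enter a backtest return calculation. It is not the article's 10-equity worked example: the function name `net_returns`, the parameters `cost_bps` and `lag`, the 10 basis point cost, and the toy data are all assumptions chosen for illustration.

```python
def net_returns(signals, asset_returns, cost_bps=10.0, lag=1):
    """Per-period strategy returns net of proportional trading costs.

    signals[t]       target position decided using information through period t
    asset_returns[t] simple return of the asset over period t
    lag              periods between deciding a signal and holding the position
                     (lag=0 lets the position earn a return it already observed)
    cost_bps         cost per unit of turnover, in basis points (assumed value)
    """
    cost = cost_bps / 1e4
    prev_pos = 0.0  # start flat
    out = []
    for t in range(len(asset_returns)):
        # Position actually held during period t, after the execution lag.
        pos = signals[t - lag] if t - lag >= 0 else 0.0
        turnover = abs(pos - prev_pos)
        out.append(pos * asset_returns[t] - cost * turnover)
        prev_pos = pos
    return out


# Hypothetical toy data: the signal is long exactly when the same-period
# return is positive, which is only achievable with look-ahead.
returns = [0.01, -0.02, 0.015, -0.005]
signals = [1, 0, 1, 0]

frictionless_same_bar = sum(net_returns(signals, returns, cost_bps=0.0, lag=0))
lagged_no_cost = sum(net_returns(signals, returns, cost_bps=0.0, lag=1))
lagged_with_cost = sum(net_returns(signals, returns, cost_bps=10.0, lag=1))
print(frictionless_same_bar, lagged_no_cost, lagged_with_cost)
```

On this toy series the same signal moves from a gain under same-bar, zero-cost execution to a loss once a one-period lag is imposed, and the turnover charge lowers it further. The numbers carry no empirical weight; the sketch only shows why execution timing and cost assumptions need to be reported alongside the headline result.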