Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, but their effectiveness in financial decision-making remains inadequately evaluated. Current benchmarks primarily assess LLMs' understanding of financial documents rather than their ability to manage assets or uncover trading opportunities in dynamic market conditions. Despite the release of new benchmarks for evaluating diversified tasks in the financial domain, we identify four major problems in these benchmarks: data leakage, navel-gazing, over-intervention, and difficulty of maintenance. To bridge this research gap, we introduce DeepFund, a comprehensive arena platform for evaluating LLM-based trading strategies in a live environment. Our approach implements a multi-agent framework in which LLMs serve as key roles that mirror real-world investment decision processes. Moreover, we provide a web interface that visualizes LLMs' performance with fund investment metrics across different market conditions, enabling detailed comparative analysis. Through DeepFund, we aim to provide a more realistic and fair assessment of LLMs' capabilities in fund investment, offering diversified insights and revealing their potential applications in real-world financial markets. Our code is publicly available at https://github.com/HKUSTDial/DeepFund.