IPO Finance Agent: Evaluation of LLM Financial Analysts beyond Finance Agent v2, with Automated Rubric Generation -- the Case of the SpaceX (SPCX) IPO


Mostapha Benhenda

Finance Agent v2 (by Vals AI) has emerged as the reference benchmark for evaluating frontier language models from both Anthropic (Claude) and OpenAI (ChatGPT) on financial tasks. However, it covers only periodic reporting by publicly traded companies (SEC 10-K and 10-Q filings), and its agentic harness relies on naive, unenriched chunk retrieval. Neither the task design nor the retrieval approach addresses the distinct challenges of IPO due diligence. SEC S-1 filings combine historical financial statements, governance structures, pro forma and common-control accounting treatments, capital-formation narratives, and underwriting-sensitive risk disclosures within documents that are substantially longer than typical periodic filings. We therefore introduce IPO Finance Agent, which extends the Finance Agent v2 framework in two directions: task domain and retrieval architecture. In our experiments, the original Finance Agent v2 harness produced essentially no output related to the SpaceX S-1 filing because of the document's length. We therefore upgraded the agentic harness with contextual retrieval, a more realistic and industry-standard approach for long documents. We also built a dataset of 1,000 IPO-diligence questions and publicly release 70 questions on the SpaceX (SPCX) S-1 filing to support reproducibility; the remainder are held private to guard against benchmark contamination. In addition, we introduce an evaluator-optimizer pipeline that automatically generates evaluation rubrics for the benchmark: candidate facts are extracted from an ensemble of independently generated model answers to each question, consolidated into draft criteria, then automatically audited for omissions, hallucinations, mistiered items, and redundancy, with LLM feedback driving iterative repair, targeted enrichment, and deduplication. Human experts review only the final rubrics before deployment.
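The evaluator-optimizer loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the LLM auditor is replaced by a toy comparison against two hand-written sets (`supported` and `required`), tier auditing and targeted enrichment are left out, and every name here (`Audit`, `draft_rubric`, `audit`, `repair`, `build_rubric`) is hypothetical. It only shows the control flow of draft, audit, repair, and repeat until the audit comes back clean.

```python
from dataclasses import dataclass, field


@dataclass
class Audit:
    """Findings from one audit pass over a draft rubric (hypothetical structure)."""
    omissions: list[str] = field(default_factory=list)
    hallucinations: list[str] = field(default_factory=list)
    redundant: list[str] = field(default_factory=list)

    def clean(self) -> bool:
        return not (self.omissions or self.hallucinations or self.redundant)


def normalize(fact: str) -> str:
    """Crude canonical form, used only to detect duplicate criteria."""
    return " ".join(fact.lower().split())


def draft_rubric(answers: list[list[str]]) -> list[str]:
    """Pool candidate facts from every model answer, in order of appearance."""
    return [fact for answer in answers for fact in answer]


def audit(rubric: list[str], supported: set[str], required: set[str]) -> Audit:
    """Stand-in for the LLM auditor: compare the draft against toy reference sets."""
    seen: set[str] = set()
    report = Audit()
    for fact in rubric:
        key = normalize(fact)
        if key in seen:
            report.redundant.append(fact)        # duplicate of an earlier criterion
        elif key not in supported:
            report.hallucinations.append(fact)   # not backed by the source document
        seen.add(key)
    report.omissions = sorted(required - seen)   # expected facts that never appeared
    return report


def repair(rubric: list[str], report: Audit) -> list[str]:
    """Drop hallucinated and duplicate criteria, then append the omitted ones."""
    drop = {normalize(f) for f in report.hallucinations}
    kept: list[str] = []
    seen: set[str] = set()
    for fact in rubric:
        key = normalize(fact)
        if key in drop or key in seen:
            continue
        seen.add(key)
        kept.append(fact)
    return kept + report.omissions


def build_rubric(answers, supported, required, max_rounds: int = 3) -> list[str]:
    """Audit and repair the draft until it is clean or the round budget runs out."""
    rubric = draft_rubric(answers)
    for _ in range(max_rounds):
        report = audit(rubric, supported, required)
        if report.clean():
            break
        rubric = repair(rubric, report)
    return rubric


# Placeholder facts only; nothing here is taken from a real filing.
answers = [["Fact A", "Fact B"], ["fact  a", "Fact X"]]
supported = {"fact a", "fact b", "fact c"}
required = {"fact a", "fact c"}
print(build_rubric(answers, supported, required))
```

In this toy run the duplicate "fact  a" and the unsupported "Fact X" are dropped in the first repair round and the missing "fact c" is appended, so the second audit is clean and the loop stops.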
Results show that the best-performing evaluated model, Alibaba Qwen 3.7 Max, reaches 79.4% accuracy at $0.30 per query, and the most cost-efficient model on the resulting Pareto frontier, Xiaomi MiMo-2.5 Pro, reaches slightly lower accuracy (76.8%) at $0.05 per query. Both exceed the current Finance Agent v2 leaderboard ceiling (Google Gemini 3.5 Flash at 57.9% for $2.51 per query) while undercutting even Finance Agent v2's cheapest entry (MiniMax M3: 48.3% at $0.32) on cost-efficiency. Code and data are released on GitHub: https://github.com/benstaf/ipoagent
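The accuracy-versus-cost comparison above can be checked with a short Pareto-frontier computation. The sketch below uses only the four figures quoted in this abstract; the paper's own frontier presumably covers more evaluated models, and the two Finance Agent v2 entries come from a different benchmark, so pooling them here is purely illustrative. The names `Result`, `dominates`, and `pareto_frontier` are hypothetical.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    accuracy: float  # percent
    cost: float      # USD per query


# Figures quoted in the abstract; the last two are Finance Agent v2 leaderboard entries.
RESULTS = [
    Result("Alibaba Qwen 3.7 Max", 79.4, 0.30),
    Result("Xiaomi MiMo-2.5 Pro", 76.8, 0.05),
    Result("Google Gemini 3.5 Flash", 57.9, 2.51),
    Result("MiniMax M3", 48.3, 0.32),
]


def dominates(a: Result, b: Result) -> bool:
    """a dominates b if it is no worse on both axes and strictly better on one."""
    return (
        a.accuracy >= b.accuracy
        and a.cost <= b.cost
        and (a.accuracy > b.accuracy or a.cost < b.cost)
    )


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Keep every result that no other result dominates, cheapest first."""
    frontier = [r for r in results if not any(dominates(o, r) for o in results)]
    return sorted(frontier, key=lambda r: r.cost)


for r in pareto_frontier(RESULTS):
    print(f"{r.model}: {r.accuracy}% at ${r.cost:.2f} per query")
```

Among these four points, only Xiaomi MiMo-2.5 Pro and Alibaba Qwen 3.7 Max are non-dominated: each of the two leaderboard entries is both less accurate and more expensive than Qwen 3.7 Max.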
