Building upon FutureX, which established a live benchmark for general-purpose future prediction, this report introduces FutureX-Pro, which comprises FutureX-Finance, FutureX-Retail, FutureX-PublicHealth, FutureX-NaturalDisaster, and FutureX-Search. Together, these form a specialized framework that extends agentic future prediction to high-value vertical domains. While generalist agents demonstrate proficiency in open-domain search, their reliability in capital-intensive and safety-critical sectors remains under-explored. FutureX-Pro targets four economically and socially pivotal verticals: Finance, Retail, Public Health, and Natural Disaster. We benchmark agentic Large Language Models (LLMs) on entry-level yet foundational prediction tasks, ranging from forecasting market indicators and supply chain demand to tracking epidemic trends and natural disasters. By adapting the contamination-free, live-evaluation pipeline of FutureX, we assess whether current State-of-the-Art (SOTA) agentic LLMs possess the domain grounding necessary for industrial deployment. Our findings reveal a performance gap between generalist reasoning and the precision required for high-value vertical applications.