Hunt Globally: Deep Research AI Agents for Drug Asset Scouting in Investing, Business Development, and Search & Evaluation

Alisa Vinogradova, Vlad Vinogradov, Luba Greenwood, Ilya Yasny, Dmitry Kobyzev, Shoman Kasbekar, Kong Nguyen, Dmitrii Radkevich, Roman Doronin, Andrey Doronichev

Biopharmaceutical innovation has shifted: many new drug assets now originate outside the United States and are disclosed primarily through regional, non-English channels. Recent data suggest that more than 85% of patent filings originate outside the U.S., with China accounting for nearly half of the global total; a growing share of scholarly output is also non-U.S. Industry estimates put China at roughly 30% of global drug development, spanning more than 1,200 novel candidates. In this high-stakes environment, failing to surface "under-the-radar" assets creates multi-billion-dollar risk for investors and business development (BD) teams, making asset scouting a coverage-critical competition in which speed and completeness drive value. Yet today's Deep Research AI agents still lag human experts in achieving high-recall discovery across heterogeneous, multilingual sources without hallucinations. We propose a benchmarking methodology for drug asset scouting and a tuned, tree-based, self-learning Bioptic Agent aimed at complete, non-hallucinated scouting. We construct a challenging completeness benchmark using a multilingual multi-agent pipeline: complex user queries are paired with ground-truth assets that lie largely outside the U.S.-centric radar. To reflect real deal complexity, we collected screening queries from expert investors, BD professionals, and venture capital (VC) professionals and used them as priors to conditionally generate benchmark queries. For grading, we use LLM-as-judge evaluation calibrated to expert opinions. We compare Bioptic Agent against Claude Opus 4.6, OpenAI GPT-5.2 Pro, Perplexity Deep Research, Gemini 3 Pro + Deep Research, and Exa Websets. Bioptic Agent achieves 79.7% F1 versus 56.2% (Claude Opus 4.6), 50.6% (Gemini 3 Pro + Deep Research), 46.6% (GPT-5.2 Pro), 44.2% (Perplexity Deep Research), and 26.9% (Exa Websets). Bioptic Agent's performance improves steeply with additional compute, supporting the view that more compute yields better results.
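The F1 figures above combine precision (how many returned assets are correct) and recall (how many ground-truth assets were found). As a rough illustration only, the sketch below scores a single query by comparing a returned asset list against a ground-truth list. It assumes exact matching on normalized asset names, which is a simplification: the abstract states that grading uses an LLM-as-judge calibrated to expert opinions, and it does not say how per-query scores are aggregated. The names `normalize`, `Scores`, and `score_query`, and the asset codes in the usage example, are hypothetical.

```python
from dataclasses import dataclass


def normalize(name: str) -> str:
    """Lowercase and drop non-alphanumerics so 'ABC-123' matches 'abc 123'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def score_query(predicted: list[str], ground_truth: list[str]) -> Scores:
    """Score one query with set-based precision, recall, and F1.

    Assumption: string matching stands in for the paper's LLM-as-judge,
    which is not reproduced here.
    """
    pred = {normalize(p) for p in predicted if normalize(p)}
    gold = {normalize(g) for g in ground_truth if normalize(g)}

    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0

    # F1 is the harmonic mean of precision and recall; define it as 0
    # when both are 0 to avoid dividing by zero.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f1)


if __name__ == "__main__":
    # Made-up asset codes, for illustration only.
    returned = ["ABC-123", "xyz 9", "FAKE-1"]
    expected = ["abc123", "XYZ-9", "QRS-7", "LMN-2"]
    print(score_query(returned, expected))
```

In this made-up example, two of three returned assets are correct (precision 2/3) and two of four ground-truth assets are found (recall 1/2), so F1 is 4/7. An agent that returns many hallucinated assets lowers precision, and one that misses regional, non-English disclosures lowers recall; F1 penalizes both.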
