From Paper to Program: Knowledge Externalization for AI-Assisted Quantum Many-Body Code Generation

Large language models can write scientific code, but direct paper-to-program translation remains fragile when correctness depends on tacit conventions in the literature. We identify the bottleneck as one of \textbf{knowledge externalization}: converting implicit computational assumptions (index conventions, gauge choices, fermionic signs, contraction order, and memory constraints) into an explicit technical specification before implementation. We evaluate a multi-stage, human-in-the-loop workflow that inserts such a specification, together with validation and stop gates, between theory extraction and code generation. The workflow is tested on two algorithmically distinct quantum many-body tasks: (i) variational, sweep-based Density-Matrix Renormalization Group (DMRG), taken from a pedagogical review, and (ii) constructive Pfaffian conversion of Hartree--Fock--Bogoliubov states to matrix product states, taken from the five-page Letter by Jin et al., Phys. Rev. B 105, L081101 (2022), for which no public code is available. For DMRG, all 16 specification-guided model pairings in a $4\times4$ grid satisfy the physics-validation criteria, compared with 6 of 13 direct attempts. A prose-specification ablation indicates that the externalized content, not \LaTeX{} formatting, is the essential ingredient. For Pfaffian-MPS, the workflow succeeds in 11 of 26 archived attempts, whereas direct prompting yields no audited passes. Cross-specification transfer is asymmetric: non-GPT specifications implemented by GPT~5.5 pass in all 4 cases, while GPT~5.5 specifications implemented by weaker models fail in all 4 cases, indicating a residual implementation-model bottleneck. The resulting \emph{Paper-to-Program Many-Body} skill provides an auditable protocol for AI-assisted implementation of many-body algorithms and for diagnosing where externalization succeeds or fails.
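As a hypothetical illustration of what externalizing a single tacit convention can look like, the following pure-Python sketch fixes a fermionic sign convention explicitly (modes ordered $0<1<\dots<N-1$, with a Jordan--Wigner sign $(-1)^{\sum_{k<j} n_k}$ attached to $c_j^\dagger$ and $c_j$) and pairs it with a stop gate that checks the canonical anticommutation relations on every basis state of a small system. The function names (`make_ops`, `apply`, `anticommutator`, `sign_convention_gate`) and the gate itself are assumptions made for illustration; they are not taken from the \emph{Paper-to-Program Many-Body} skill.

```python
from itertools import product

# Hypothetical externalized convention (an assumption for illustration):
# modes are ordered 0 < 1 < ... < N-1, a basis state is a tuple of occupations
# (n_0, ..., n_{N-1}), and c_j^dagger, c_j carry the sign (-1)**sum(n_k, k < j).


def make_ops(jordan_wigner=True):
    """Return (create, annihilate); each maps (j, state) -> (sign, new_state) or None."""

    def flip(j, state, required_occupation):
        if state[j] != required_occupation:
            return None  # Pauli exclusion or empty mode: the result is the zero vector
        sign = (-1) ** sum(state[:j]) if jordan_wigner else 1  # string over modes k < j
        new_state = list(state)
        new_state[j] = 1 - state[j]
        return sign, tuple(new_state)

    def create(j, state):
        return flip(j, state, 0)

    def annihilate(j, state):
        return flip(j, state, 1)

    return create, annihilate


def apply(op, j, vec):
    """Apply a single-mode operator to a sparse vector {basis_state: amplitude}."""
    out = {}
    for state, amp in vec.items():
        result = op(j, state)
        if result is not None:
            sign, new_state = result
            out[new_state] = out.get(new_state, 0) + sign * amp
    return {s: a for s, a in out.items() if a != 0}


def anticommutator(op_a, i, op_b, j, state):
    """Return {A_i, B_j}|state> as a sparse vector."""
    vec = {state: 1}
    total = apply(op_a, i, apply(op_b, j, vec))
    for s, a in apply(op_b, j, apply(op_a, i, vec)).items():
        total[s] = total.get(s, 0) + a
    return {s: a for s, a in total.items() if a != 0}


def sign_convention_gate(n_modes, create, annihilate):
    """Stop gate: True only if the canonical anticommutation relations hold."""
    for state in product((0, 1), repeat=n_modes):
        for i in range(n_modes):
            for j in range(n_modes):
                # {c_i, c_j^dagger} must equal delta_ij times the identity
                expected = {state: 1} if i == j else {}
                if anticommutator(annihilate, i, create, j, state) != expected:
                    return False
                # {c_i^dagger, c_j^dagger} must vanish
                if anticommutator(create, i, create, j, state):
                    return False
    return True


print(sign_convention_gate(3, *make_ops(jordan_wigner=True)))   # True
print(sign_convention_gate(3, *make_ops(jordan_wigner=False)))  # False
```

Dropping the Jordan--Wigner string (`jordan_wigner=False`) still passes for a single mode but fails for two or more. This is the kind of silent sign error that an explicit convention plus an automated gate is meant to surface before code generation proceeds.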

翻译：大语言模型可以编写科学代码，但当正确性依赖于文献中的隐性约定时，从论文到程序的直接翻译仍然脆弱。我们将这一瓶颈识别为**知识外化**：在实现之前，将隐式计算假设（索引约定、规范选择、费米子符号、收缩顺序和内存约束）转化为明确的技术规范。我们评估了一个多阶段、人在环中的工作流程，该流程在理论提取和代码生成之间插入这样的规范，并带有验证和停止门控。该工作流程在两个算法上不同的量子多体任务上进行了测试：基于变分扫描的密度矩阵重正化群（DMRG，来自一篇教学综述），以及Jin等人《Phys. Rev. B 105, L081101 (2022)》五页信件中描述的从Hartree-Fock-Bogoliubov态到矩阵乘积态的构造性Pfaffian转换（该任务无公开代码）。对于DMRG，在$4\times4$网格中，所有16个规范引导的模型配对均满足物理验证标准，而直接尝试的通过率为6/13。一项文本规范消融实验表明，外化内容（而非LaTeX格式）是关键要素。对于Pfaffian-MPS，工作流程在26次存档尝试中成功11次，而直接提示的审计通过率为零。跨规范迁移具有不对称性：由GPT 5.5实现的非GPT规范通过率为4/4，而由较弱模型实现的GPT 5.5规范全部失败（4/4），表明存在残余的实现模型瓶颈。由此产生的*论文到程序多体*技能为AI辅助实现多体算法以及诊断外化成功或失败提供了可审计的协议。