Foundation models are deep neural networks (such as GPT-5, Gemini~3, and Opus~4) that are trained on large datasets and can perform diverse downstream tasks -- text and code generation, question answering, summarization, image classification, and so on. The philosophy of foundation models is to invest effort in a single, large (${\sim}10^{12}$-parameter) general-purpose model that can be adapted to many downstream tasks with little or no additional training. We argue that the rise of foundation models presents an opportunity for hardware engineers: in contrast to the era when different models were used for different tasks, it now makes sense to build special-purpose, fixed hardware implementations of neural networks, manufactured and released at the roughly one-year cadence of major new foundation-model versions. Beyond conventional digital-electronic inference hardware with read-only weight memory, we advocate a more radical rethinking: hardware in which the neural network is realized directly at the level of the physical design and operates via the hardware's natural physical dynamics -- \textit{Physical Foundation Models} (PFMs). PFMs could enable orders-of-magnitude advantages in energy efficiency, speed, and parameter density. For ${\sim}10^{12}$-parameter models, this would both reduce the high energy burden of AI in datacenters and enable AI in edge devices that today are power-constrained to far smaller models. PFMs could also enable inference hardware for models much larger than current ones: $10^{15}$- or even $10^{18}$-parameter PFMs seem plausible by some measures. We present back-of-the-envelope calculations illustrating PFM scaling using an optical example -- a 3D nanostructured glass medium -- and discuss prospects in nanoelectronics and other physical platforms. We conclude with the major research challenges that must be resolved for trillion-parameter PFMs and beyond to become reality.