We present Incisor, a cloud HPC job submission system for the ex ante instance selection problem: choosing suitable hardware in the challenging but common setting where only the executable, inputs, and invocation commands are available at submission time. In practice, this task is manual and expertise-intensive. Users must combine incomplete knowledge of rapidly evolving cloud offerings with workload-specific intuition, static analysis, and systems reasoning to infer hardware constraints and select an instance type for each job. Incisor automates this process by pairing widely available program analysis tools with LLM-guided reasoning to infer hardware requirements and choose cloud instances. Running atop frontier coding LLMs and using submission artifacts alone, Incisor selects a working AWS EC2 instance ex ante for 100% of first-time runs of source-compiled (C, C++, Fortran) and Python applications. Against a strong baseline that combines expert-derived constraints with SkyPilot's instance selection, Incisor cuts job runtime by 54% and instance costs by 44%.