What makes a harness a harness: necessary and sufficient conditions for an agent harness

The term "agent harness" now circulates widely in software engineering with generative artificial intelligence. It names the layer that wraps a language model and turns it into a coding agent able to act on a repository. Usage, however, is loose and polysemous. Sometimes the term denotes the whole product (Claude Code, Codex CLI); sometimes it denotes the evaluation scaffold that runs an agent against tasks (the SWE-bench harness); sometimes it is conflated with an agent framework, an SDK, an IDE plugin, or an orchestrator. What is missing is a reference definition that works as an instrument, one that includes and excludes cases consistently. We build that definition through a conceptual analysis that draws on published works carrying persistent identifiers and on primary grey-literature sources such as official documentation, glossaries, and engineering reports. We reconstruct the genealogy of the term, from the horse's tack to the classic test harness, to the machine-learning evaluation harness, and finally to the agent harness. We then propose a constitutive definition that states the necessary and sufficient conditions for a system to be an agent harness, operationalize it as an inclusion and exclusion test, and draw the boundary of the concept against the agent framework, the agent SDK, the IDE plugin, the eval harness, and the orchestrator. We apply the definition to six real harnesses (Claude Code, Codex CLI, Aider, Cline, OpenHands, and SWE-agent) and to deliberate edge cases; the test includes and excludes them consistently. We close with a research agenda organized along axes of design tension. The contribution is an operational definition of the agent harness, together with a shared vocabulary, that can guide engineering practice and the scientific comparison of agentic systems.
