AI-native software development is often evaluated at the level of individual models, prompts, or generated artifacts. This framing is insufficient for production environments where software must be continuously produced, verified, deployed, maintained, and adapted across many operational contexts and long time horizons. We present a meta-engineering harness: a software-production architecture that transforms operational and product feature requirements into explicit contracts, routes work through role-specialized AI agents, performs independent and adversarial verification, and continuously improves itself through structured failure classification and outer-loop calibration. The harness is designed for settings in which software delivery is not a one-time project but an ongoing operating function. In our motivating application, CTO-as-a-service for small service firms, the system manages websites, booking flows, payment systems, backoffice workflow automations, and AI-agent interfaces as continuously evolving technical infrastructure rather than one-off deliverables. We describe the layered architecture, including two-pass contract compilation, persistent markdown memory with specialization records, attention-based and independence-based verification, a four-way failure arbiter, and outer-loop calibration. We report results from an early production deployment spanning 17 features over several weeks, including a detailed in-app payments case study that revealed contract incompleteness and verification-boundary issues. These observations directly drove targeted improvements to the harness. The contribution is an implemented, measurable, and extensible verification architecture for making AI-native service-as-software production reliable, auditable, and improvable over time.