When multiple LLM coding agents share a rate-limited API endpoint, they exhibit resource contention patterns analogous to unscheduled OS processes competing for CPU, memory, and I/O. In a motivating incident, 3 of 11 parallel agents died from connection resets and HTTP 502 errors (a 27% failure rate), even though the API had enough aggregate capacity to serve all 11 sequentially. We present HIVEMIND, a transparent HTTP proxy that applies five OS-inspired scheduling primitives (admission control, rate-limit tracking, AIMD backpressure with circuit breaking, token budget management, and priority queuing) to eliminate the failure modes caused by uncoordinated parallel execution. The proxy requires zero modifications to existing agent code and supports Anthropic, OpenAI, and local model APIs through auto-detected provider profiles. Our evaluation across seven scenarios (5-50 concurrent agents) shows that uncoordinated agents fail at rates of 72-100% under contention, while HIVEMIND reduces failures to 0-18% and eliminates 48-100% of wasted compute. An ablation study reveals that transparent retry, not admission control, is the single most critical primitive, although the primitives are most effective in combination. Real-world validation against Ollama confirms that HIVEMIND adds under 3 ms of proxy overhead per request. The system is open-source under the MIT license.
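To make the "AIMD backpressure with circuit breaking" primitive concrete, here is a minimal Python sketch of an additive-increase/multiplicative-decrease concurrency window combined with a circuit breaker. It is not HIVEMIND's implementation. The class name `AimdLimiter` and every parameter value are illustrative assumptions: a starting window of 4, an additive step of 1 per success, halving on overload, tripping after 5 consecutive overloads, a 30 s cool-down, and a half-open state that admits one probe request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AimdLimiter:
    """Hypothetical AIMD concurrency window plus circuit breaker.

    All defaults are illustrative assumptions, not values from HIVEMIND.
    """
    limit: float = 4.0            # current concurrency window
    min_limit: float = 1.0
    max_limit: float = 50.0
    increase: float = 1.0         # additive increase per successful request
    decrease_factor: float = 0.5  # multiplicative decrease on overload
    failure_threshold: int = 5    # consecutive overloads that trip the breaker
    cooldown_s: float = 30.0      # how long the breaker stays open
    consecutive_failures: int = 0
    opened_at: Optional[float] = None  # time the breaker opened, or None if closed

    def allow(self, in_flight: int, now: float) -> bool:
        """Return True if a new upstream request may be forwarded now."""
        if self.opened_at is not None:
            if now - self.opened_at < self.cooldown_s:
                return False  # breaker open: hold the request instead of sending it
            # Cool-down elapsed: half-open, admit a single probe request.
            return in_flight == 0
        return in_flight < int(self.limit)

    def on_success(self) -> None:
        """Record a successful upstream response."""
        self.consecutive_failures = 0
        self.opened_at = None  # a successful probe closes the breaker
        self.limit = min(self.max_limit, self.limit + self.increase)

    def on_overload(self, now: float) -> None:
        """Record an overload signal (e.g. HTTP 429/502 or a connection reset)."""
        self.consecutive_failures += 1
        self.limit = max(self.min_limit, self.limit * self.decrease_factor)
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = now  # trip, or re-trip after a failed probe
```

In a proxy built this way, requests refused by `allow` would wait in a queue and be retried later, so the agent never sees the overload error. That is the behavior the abstract attributes to transparent retry. The multiplicative cut reacts quickly to overload, while the additive step probes for spare capacity gradually.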