SemaTune: Semantic-Aware Online OS Tuning with Large Language Models

Online OS tuning can improve long-running services, but existing controllers are poorly matched to live hosts. They treat scheduler, power, memory, and I/O controls as black-box variables and optimize a scalar reward. This view ignores cross-knob policy structure, breaks down when application metrics are unavailable, and can send a running service into degraded regions that persist after the bad setting is removed. We present SemaTune, a host-side framework for steady-state OS tuning with bounded language-model guidance. SemaTune turns knob schemas, telemetry, current configuration, recent action--response history, and retrieved prior runs into a compact decision context. A fast loop proposes low-latency updates, a slower loop periodically revises the search strategy, and every proposed change passes through typed validation before reaching kernel or sysctl interfaces. This lets the controller reason about OS-control meaning and indirect performance signals while keeping model cost, latency, and authority constrained. We evaluate SemaTune on 13 live workloads from five benchmark suites while tuning up to 41 Linux parameters. Across the suite, SemaTune improves stable-phase performance by 72.5\% over default settings and by 153.3\% relative to the strongest non-LLM baseline. A 30-window session costs about \$0.20 in model calls. With only host-level metrics, SemaTune still outperforms baselines given direct application objectives by 93.7 percentage points, while avoiding severe degraded regions reached by structure-blind exploration.
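The typed-validation step described above can be illustrated with a minimal sketch. It is not SemaTune's implementation: the `KnobSpec` type, the `validate_proposal` helper, the example schema, and its bounds are all hypothetical, chosen only to show how a proposed change might be checked against a knob schema before anything is written to a kernel or sysctl interface.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union

Value = Union[int, str]


@dataclass(frozen=True)
class KnobSpec:
    """Hypothetical schema entry for one tunable (not SemaTune's actual type)."""
    name: str
    kind: str                      # "int" or "enum"
    lo: Optional[int] = None       # inclusive lower bound for "int" knobs
    hi: Optional[int] = None       # inclusive upper bound for "int" knobs
    choices: Tuple[str, ...] = ()  # allowed values for "enum" knobs


def check(spec: KnobSpec, value: object) -> Optional[str]:
    """Return None if the value is acceptable, otherwise a rejection reason."""
    if spec.kind == "int":
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            return "expected an integer"
        if spec.lo is None or spec.hi is None:
            return "schema is missing bounds"
        if not (spec.lo <= value <= spec.hi):
            return f"outside [{spec.lo}, {spec.hi}]"
        return None
    if spec.kind == "enum":
        if value not in spec.choices:
            return "not an allowed choice"
        return None
    return "unknown knob kind"


def validate_proposal(
    schema: Dict[str, KnobSpec], proposal: Dict[str, object]
) -> Tuple[Dict[str, Value], Dict[str, str]]:
    """Split a proposal into accepted settings and rejected ones with reasons."""
    accepted: Dict[str, Value] = {}
    rejected: Dict[str, str] = {}
    for name, value in proposal.items():
        spec = schema.get(name)
        reason = "unknown knob" if spec is None else check(spec, value)
        if reason is None:
            accepted[name] = value  # type: ignore[assignment]
        else:
            rejected[name] = reason
    return accepted, rejected


# Illustrative schema; the knob names and bounds are assumptions for this example.
SCHEMA = {
    "vm.swappiness": KnobSpec("vm.swappiness", "int", lo=0, hi=100),
    "cpufreq.governor": KnobSpec(
        "cpufreq.governor", "enum", choices=("performance", "powersave", "schedutil")
    ),
}

if __name__ == "__main__":
    proposal = {"vm.swappiness": 10, "cpufreq.governor": "turbo", "made.up.knob": 1}
    accepted, rejected = validate_proposal(SCHEMA, proposal)
    print("apply:", accepted)
    print("reject:", rejected)
```

In this sketch only the accepted settings would be forwarded to the host; rejected entries carry a reason that a controller could feed back into the next decision context.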
