Real-time schedulers must reason about tight deadlines under strict compute budgets. We present TempoNet, a reinforcement learning scheduler that pairs a permutation-invariant Transformer with a deep Q-approximation. An Urgency Tokenizer discretizes temporal slack into learnable embeddings, stabilizing value learning and capturing deadline proximity. A latency-aware sparse attention stack with blockwise top-k selection and locality-sensitive chunking enables global reasoning over unordered task sets with near-linear scaling and sub-millisecond inference. A multicore mapping layer converts contextualized Q-scores into processor assignments through masked-greedy selection or differentiable matching. Extensive evaluations on industrial mixed-criticality traces and large multiprocessor settings show consistent gains in deadline fulfillment over analytic schedulers and neural baselines, together with improved optimization stability. Diagnostics include sensitivity analyses for slack quantization, attention-driven policy interpretation, hardware-in-the-loop and kernel micro-benchmarks, and robustness under stress with simple runtime mitigations; we also report sample-efficiency benefits from behavioral-cloning pretraining and compatibility with an actor-critic variant without altering the inference pipeline. These results establish a practical framework for Transformer-based decision making in high-throughput real-time scheduling.
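As a rough illustration of two components named in the abstract, the sketch below shows one way slack could be discretized into token ids and one way Q-scores could be mapped to cores by masked-greedy selection. It is a minimal sketch under stated assumptions, not the authors' implementation: the bin edges, the slack definition (deadline minus current time minus remaining execution time), and the names `SLACK_EDGES_MS`, `urgency_token`, and `masked_greedy_assign` are all hypothetical. The learnable embedding table and the sparse attention stack are omitted.

```python
from bisect import bisect_right

# Hypothetical slack bin edges in milliseconds (assumption, not from the paper).
SLACK_EDGES_MS = [0.0, 1.0, 2.0, 5.0, 10.0, 50.0]


def urgency_token(deadline_ms, now_ms, remaining_exec_ms):
    """Map a task's temporal slack to a discrete token id.

    The id would index a learnable embedding table in a full model.
    Token 0 collects negative slack (the deadline can no longer be met).
    """
    slack = deadline_ms - now_ms - remaining_exec_ms
    return bisect_right(SLACK_EDGES_MS, slack)


def masked_greedy_assign(q_scores, ready_mask, num_cores):
    """Assign the highest-scoring ready tasks to cores, one task per core.

    Tasks whose mask entry is False are never selected. Ties are broken
    by the lower task index so the result is deterministic.
    """
    candidates = [i for i, ready in enumerate(ready_mask) if ready]
    candidates.sort(key=lambda i: (-q_scores[i], i))
    return {core: task for core, task in enumerate(candidates[:num_cores])}


if __name__ == "__main__":
    print(urgency_token(deadline_ms=10.0, now_ms=0.0, remaining_exec_ms=4.0))
    print(masked_greedy_assign([0.2, 0.9, 0.5, 0.7], [True, True, False, True], 2))
```

Sorting the masked candidates once keeps the mapping step at O(n log n) in the number of tasks. The differentiable-matching alternative mentioned in the abstract would replace this step and is not sketched here.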