Real-time schedulers must reason about tight deadlines under strict compute budgets. We present TempoNet, a reinforcement learning scheduler that pairs a permutation-invariant Transformer with a deep Q-function approximator. An Urgency Tokenizer discretizes temporal slack into learnable embeddings, which stabilizes value learning and captures deadline proximity. A latency-aware sparse attention stack, built on blockwise top-k selection and locality-sensitive chunking, enables global reasoning over unordered task sets with near-linear scaling and sub-millisecond inference. A multicore mapping layer converts the contextualized Q-scores into processor assignments through either masked-greedy selection or differentiable matching. Extensive evaluations on industrial mixed-criticality traces and large multiprocessor settings show consistent gains in deadline satisfaction over analytic schedulers and neural baselines, together with improved optimization stability. Diagnostics include sensitivity analyses for slack quantization, attention-based policy interpretation, hardware-in-the-loop tests and kernel micro-benchmarks, and robustness tests under stress with simple runtime mitigations. We also report sample-efficiency benefits from behavioral-cloning pretraining and show that an actor-critic variant is compatible with TempoNet without altering the inference pipeline. These results establish a practical framework for Transformer-based decision making in high-throughput real-time scheduling.
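Two of the steps named above are discrete operations that prose describes only loosely: mapping slack to urgency tokens, and turning per-task, per-processor Q-scores into an assignment by masked-greedy selection. The following plain-Python sketch is a hypothetical illustration of both, not the TempoNet implementation. The function names `urgency_tokens` and `masked_greedy_assign`, the bin edges, the eligibility mask, and the tie-breaking rule are all assumptions. The learnable embedding table and the differentiable-matching alternative are omitted.

```python
"""Hypothetical sketch of slack tokenization and masked-greedy assignment.

Names, bin edges, and tie-breaking are illustrative assumptions only.
"""
from bisect import bisect_right
from typing import List, Optional, Sequence


def urgency_tokens(slack: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Map each task's slack to a discrete urgency token id.

    Token 0 covers slack < edges[0] (e.g. already-late tasks); token i covers
    [edges[i-1], edges[i]); the last token covers slack >= edges[-1].
    In a learned model each id would index an embedding table (omitted here).
    """
    return [bisect_right(edges, s) for s in slack]


def masked_greedy_assign(
    q: Sequence[Sequence[float]], mask: Sequence[Sequence[bool]]
) -> List[Optional[int]]:
    """Greedily assign tasks to processors by descending Q-score.

    q[t][p] is the score for running task t on processor p; mask[t][p] is
    False when that pairing is not allowed. Returns, for each processor,
    the assigned task index or None if it stays idle.
    """
    n_tasks = len(q)
    n_procs = len(q[0]) if n_tasks else 0
    assignment: List[Optional[int]] = [None] * n_procs
    # Collect only allowed pairs, highest score first; ties go to the
    # lower task index, then the lower processor index (an assumption).
    pairs = sorted(
        ((q[t][p], t, p) for t in range(n_tasks) for p in range(n_procs) if mask[t][p]),
        key=lambda x: (-x[0], x[1], x[2]),
    )
    used_tasks = set()
    for _, t, p in pairs:
        if t in used_tasks or assignment[p] is not None:
            continue  # task already placed or processor already taken
        assignment[p] = t
        used_tasks.add(t)
    return assignment


if __name__ == "__main__":
    print(urgency_tokens([-1.0, 0.5, 3.0, 12.0], [0.0, 2.0, 5.0, 10.0]))  # [0, 1, 2, 4]
    q = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.3]]
    mask = [[True, True], [True, True], [True, False]]
    print(masked_greedy_assign(q, mask))  # [0, 1]
```

In the example, task 0 takes processor 0 with the highest score (0.9). Task 1 then falls back to processor 1 (0.7) because processor 0 is already occupied. Task 2 stays unscheduled in this step. Sorting all allowed pairs once and skipping conflicts is equivalent to repeatedly picking the global maximum and masking its row and column. Its cost is dominated by the sort over at most tasks × processors pairs.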