Semantic communication promises task-aligned transmission but must reconcile semantic fidelity with stringent latency guarantees in immersive and safety-critical services. This paper introduces a time-constrained human-in-the-loop reinforcement learning (TC-HITL-RL) framework that embeds human feedback, semantic utility, and latency control within a semantic-aware Open Radio Access Network (O-RAN) architecture. We formulate human-feedback-driven semantic adaptation as a constrained Markov decision process (CMDP) whose state captures semantic quality, human preferences, queue slack, and channel dynamics, and solve it via a primal-dual proximal policy optimization (PPO) algorithm with action shielding and latency-aware reward shaping. The resulting policy preserves PPO-level semantic rewards while tightening the variability of both the air-interface and near-real-time RAN Intelligent Controller (RIC) processing budgets. Simulations over point-to-multipoint links with heterogeneous deadlines show that TC-HITL-RL consistently meets per-user timing constraints, outperforms baseline schedulers in reward, and stabilizes resource consumption, providing a practical blueprint for latency-aware semantic adaptation.