The emergence of sixth-generation (6G) networks heralds an intelligent communication ecosystem, driven by the rapid proliferation of intelligent services and increasingly complex communication scenarios. However, current physical-layer designs, which typically follow modular and isolated optimization paradigms, fail to achieve global end-to-end optimality because they neglect inter-module dependencies. Although large language models (LLMs) have recently been applied to communication tasks such as beam prediction and resource allocation, existing studies remain limited to single-task or single-modality scenarios and cannot jointly reason over communication states and user intents to adapt strategies to individual users. To address these limitations, this paper proposes a novel multimodal communication decision-making model for link construction that applies reinforcement learning to pretrained LLMs. The proposed model semantically aligns channel state information (CSI) with textual user instructions, enabling comprehensive understanding of both physical-layer conditions and communication intents. It then generates physically realizable, user-customized link construction strategies that adapt dynamically to changing environments and user preferences. A two-stage reinforcement learning framework is employed: the first stage expands the experience pool via heuristic exploration and behavior cloning to obtain a near-optimal initialization, while the second stage fine-tunes the model through multi-objective reinforcement learning that considers bit error rate (BER), throughput, and power consumption. Experimental results demonstrate that the proposed model significantly outperforms conventional planning-based algorithms under challenging channel conditions, yielding robust, efficient, and personalized end-to-end communication strategies.
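To make the idea of multi-objective fine-tuning with user preferences concrete, the sketch below shows one common way to combine BER, throughput, and power into a single scalar reward: a preference-weighted sum of normalized terms. This is an illustrative assumption, not the reward defined in this paper. The names `LinkMetrics`, `Preference`, `scalarized_reward`, the log-scale BER normalization, and the normalization constants `tp_max` and `p_max` are all hypothetical. The weights stand in for the preference tendency that the model would infer from a textual user instruction.

```python
import math
from dataclasses import dataclass


@dataclass
class LinkMetrics:
    """Measured outcome of one link configuration (hypothetical container)."""
    ber: float         # bit error rate, expected in (0, 1]
    throughput: float  # e.g. Mbit/s
    power: float       # e.g. W


@dataclass
class Preference:
    """Hypothetical user-preference weights, e.g. inferred from an instruction."""
    w_ber: float
    w_tp: float
    w_pow: float


def scalarized_reward(m: LinkMetrics, pref: Preference,
                      tp_max: float, p_max: float) -> float:
    """Weighted-sum scalarization of three objectives (an assumed design).

    Each objective is mapped to roughly [0, 1] so the weights are comparable:
    - BER: -log10(BER) / 12, so BER = 1e-6 maps to 0.5 (higher is better)
    - throughput: fraction of tp_max, clipped at 1 (higher is better)
    - power: fraction of p_max, clipped at 1 (lower is better, so subtracted)
    """
    ber_term = -math.log10(max(m.ber, 1e-12)) / 12.0
    tp_term = min(m.throughput / tp_max, 1.0)
    pow_term = min(m.power / p_max, 1.0)
    return pref.w_ber * ber_term + pref.w_tp * tp_term - pref.w_pow * pow_term


if __name__ == "__main__":
    # Two candidate links: A is fast but power-hungry, B is slower but frugal.
    link_a = LinkMetrics(ber=1e-6, throughput=90.0, power=1.8)
    link_b = LinkMetrics(ber=1e-6, throughput=40.0, power=0.4)
    throughput_user = Preference(w_ber=0.2, w_tp=0.7, w_pow=0.1)
    energy_user = Preference(w_ber=0.2, w_tp=0.1, w_pow=0.7)
    for name, pref in [("throughput", throughput_user), ("energy", energy_user)]:
        ra = scalarized_reward(link_a, pref, tp_max=100.0, p_max=2.0)
        rb = scalarized_reward(link_b, pref, tp_max=100.0, p_max=2.0)
        print(f"{name}-oriented user: A={ra:.2f}, B={rb:.2f}")
```

With the same channel outcome, the throughput-oriented weights favor link A while the energy-oriented weights favor link B. Personalization in this sketch comes entirely from the weights, which is why the preference inferred from the instruction matters for the learned policy.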