Large Language Models Can Perform Automatic Modulation Classification via Discretized Self-supervised Candidate Retrieval

Identifying wireless modulation schemes is essential for cognitive radio, but standard supervised models often degrade under distribution shift, and training domain-specific wireless foundation models from scratch is computationally prohibitive. Large Language Models (LLMs) offer a promising training-free alternative via in-context learning, yet feeding raw floating-point signal statistics into LLMs overwhelms the models with numerical noise and exhausts token budgets. We introduce DiSC-AMC, a framework that reformulates Automatic Modulation Classification (AMC) as an LLM reasoning task by combining aggressive feature discretization with nearest-neighbor retrieval over self-supervised embeddings. By mapping continuous features to coarse symbolic tokens, DiSC-AMC aligns abstract signal patterns with LLM reasoning capabilities and reduces prompt length by over $50$\%. In addition, using a DINOv2 visual encoder to retrieve the $k_\text{NN}$ most similar labeled exemplars provides highly relevant, query-specific context rather than generic class averages. On a 10-class benchmark, a fine-tuned 7B-parameter LLM using DiSC-AMC achieves $83.0$\% in-distribution accuracy ($-10$\,to\,$+10$\,dB) and $82.50$\% out-of-distribution (OOD) accuracy ($-11$\,to\,$-15$\,dB), outperforming supervised baselines. Comprehensive ablations on vanilla LLMs demonstrate the token efficiency of DiSC-AMC: a training-free $7$B LLM achieves $71$\% accuracy using only a $0.5$K-token prompt, surpassing a $200$B-parameter baseline that relies on a $2.9$K-token prompt. Furthermore, similarity-based exemplar retrieval outperforms naive class-average selection by over $20$\%. Finally, we identify a fundamental limitation of this pipeline: at extreme OOD noise levels ($-30$\,dB), the underlying self-supervised representations collapse, degrading retrieval quality and reducing classification accuracy to chance level.
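
The retrieve-then-discretize idea above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The token alphabet `TOKENS`, the bin edges `EDGES`, cosine similarity as the nearest-neighbor metric, the `features -> label` prompt layout, and every function name are hypothetical. DINOv2 embeddings are assumed to be precomputed and are replaced here by toy 2-D vectors.

```python
import bisect
import math
from typing import List, Sequence, Tuple

# Hypothetical coarse alphabet and bin edges: the abstract says continuous
# features are mapped to coarse symbolic tokens but does not give the bins.
TOKENS = ("A", "B", "C", "D", "E")
EDGES = (-1.0, -0.25, 0.25, 1.0)  # len(EDGES) == len(TOKENS) - 1

# A labeled exemplar: (precomputed embedding, raw feature vector, class label).
Exemplar = Tuple[Sequence[float], Sequence[float], str]


def discretize(value: float, edges: Sequence[float] = EDGES) -> str:
    """Map one continuous feature to a coarse token by bin lookup."""
    return TOKENS[bisect.bisect_right(edges, value)]


def encode(features: Sequence[float]) -> str:
    """Render a feature vector as a short space-separated token string."""
    return " ".join(discretize(f) for f in features)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_exemplars(query_emb: Sequence[float], bank: List[Exemplar], k: int) -> List[Exemplar]:
    """Return the k exemplars whose embeddings are most similar to the query."""
    return sorted(bank, key=lambda ex: cosine(query_emb, ex[0]), reverse=True)[:k]


def build_prompt(query_feats: Sequence[float], query_emb: Sequence[float],
                 bank: List[Exemplar], k: int) -> str:
    """Few-shot context: k retrieved, discretized exemplars, then the query row."""
    lines = [f"{encode(feats)} -> {label}"
             for _, feats, label in knn_exemplars(query_emb, bank, k)]
    lines.append(f"{encode(query_feats)} -> ?")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy 2-D vectors stand in for DINOv2 embeddings of the signal images.
    bank = [
        ([1.0, 0.0], [0.9, -0.1], "BPSK"),
        ([0.0, 1.0], [-1.5, 2.0], "QPSK"),
        ([0.9, 0.1], [0.1, 0.5], "BPSK"),
    ]
    print(build_prompt([0.3, -0.6], [1.0, 0.0], bank, k=2))
```

Running the script prints the two retrieved BPSK exemplars as `D C -> BPSK` and `C D -> BPSK`, followed by the query row `D B -> ?`, which would then be passed to the LLM. Because exemplars are chosen by embedding similarity rather than as one averaged prototype per class, the context changes with each query, and each feature costs a single short token instead of a multi-digit float.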

翻译：识别无线调制方案对于认知无线电至关重要，但标准监督模型在分布偏移下性能会退化，而从零训练领域特定的无线基础模型计算代价高昂。大型语言模型（LLM）通过上下文学习提供了一种有前景的免训练替代方案，然而将原始浮点信号统计量直接输入LLM会因数值噪声使模型不堪重负，并耗尽令牌预算。我们提出DiSC-AMC框架，通过将激进的特征离散化与自监督嵌入上的最近邻检索相结合，将自动调制分类（AMC）重塑为LLM推理任务。通过将连续特征映射为粗粒度符号令牌，DiSC-AMC将抽象信号模式与LLM推理能力对齐，并将提示长度减少超过50%。同时，利用DINOv2视觉编码器检索k个最近邻相似标注样本，提供高度相关、查询特定的上下文，而非通用类别平均值。在10类基准测试中，采用DiSC-AMC微调的7B参数LLM在分布内准确率（-10至+10 dB）达83.0%，分布外（OOD）准确率（-11至-15 dB）达82.50%，优于监督基线。针对原始LLM的全面消融实验证明了DiSC-AMC的令牌效率：免训练的7B LLM仅使用0.5K令牌提示即可达到71%准确率，超越依赖2.9K令牌提示的200B参数基线。此外，基于相似度的样本检索性能比朴素类别平均选择高出20%以上。最后，我们识别出该流程的根本局限性：在极端OOD噪声水平（-30 dB）下，底层自监督表示会崩溃，导致检索质量下降并使分类退化为随机猜测。