Neural audio codecs were initially introduced to compress audio data into compact codes and thereby reduce transmission latency. Researchers have recently recognized their potential as tokenizers that convert continuous audio into discrete codes, which can then be used to build audio language models (LMs). Numerous high-performance neural audio codecs and codec-based LMs have since been developed. This paper aims to provide a thorough and systematic overview of neural audio codec models and codec-based LMs.