This paper provides an in-depth analysis of Token2Wave, a novel token representation method derived from the Wave Network, designed to capture both global and local semantics of input text through wave-inspired complex vectors. In Token2Wave, each token is represented with a magnitude component, capturing the global semantics of the entire input text, and a phase component, encoding the relationships between individual tokens and the global semantics. Building on prior research that demonstrated the effectiveness of wave-like operations, such as interference and modulation, during forward propagation, this study investigates the convergence behavior, backpropagation characteristics, and embedding independence within the Token2Wave framework. A detailed computational complexity analysis shows that Token2Wave can significantly reduce video memory usage and training time compared to BERT. Gradient comparisons for the [CLS] token, total input text, and classifier parameters further highlight Token2Wave's unique characteristics. This research offers new insights into wave-based token representations, demonstrating their potential to enable efficient and computationally friendly language model architectures.
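The magnitude/phase representation described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the paper's exact formulation: the per-dimension norm over all tokens stands in for the global-semantics magnitude, the phase encodes each token's relation to that shared magnitude, and interference/modulation are modeled as complex addition and element-wise complex multiplication.

```python
import numpy as np

def to_wave(token_embeddings):
    """Map real token embeddings (T, D) to complex wave vectors.

    Magnitude: per-dimension L2 norm over all tokens, shared by every
    token (a stand-in for the global semantics of the input text).
    Phase: chosen so the real part recovers each token's embedding,
    encoding the token's relation to the global magnitude.
    Illustrative sketch; the paper's formulation may differ in detail.
    """
    G = np.linalg.norm(token_embeddings, axis=0, keepdims=True)      # (1, D)
    ratio = token_embeddings / (G + 1e-12)                           # in [-1, 1]
    phase = np.arctan2(np.sqrt(np.clip(1 - ratio**2, 0, 1)), ratio)  # (T, D)
    return G * np.exp(1j * phase)                                    # (T, D) complex

def interference(z1, z2):
    # Wave interference modeled as complex addition.
    return z1 + z2

def modulation(z1, z2):
    # Wave modulation modeled as element-wise complex multiplication.
    return z1 * z2

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))   # 4 tokens, 8-dim embeddings
Z = to_wave(tokens)
print(Z.shape, Z.dtype)            # (4, 8) complex128
```

Under this construction every token in a sequence shares the same magnitude vector (`np.abs(Z)` is identical across rows), so the global semantics are carried by the magnitude while the per-token information lives entirely in the phase, matching the division of labor the abstract describes.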