Bit Cipher -- A Simple yet Powerful Word Representation System that Integrates Efficiently with Language Models

While Large Language Models (LLMs) become ever more dominant, classic pre-trained word embeddings remain relevant through their computational efficiency and nuanced linguistic interpretation. Drawing on recent studies showing that GloVe and word2vec optimizations both converge towards variants of the log-co-occurrence matrix, we construct a novel word representation system called Bit-cipher. It eliminates the need for backpropagation while leveraging contextual information and hyper-efficient dimensionality reduction techniques based on unigram frequency, providing strong interpretability alongside efficiency. We use the bit-cipher algorithm to train word vectors via a two-step process that critically relies on a hyperparameter, bits, which controls the vector dimension. The first step trains the bit-cipher; the second applies it under one of two aggregation modes, summation or concatenation, to produce contextually rich representations from word co-occurrences. We then assess bit-cipher's efficacy with probing experiments on part-of-speech (POS) tagging and named entity recognition (NER), comparing it against classic embeddings such as word2vec and GloVe. Additionally, we explore its applicability in LM training and fine-tuning. When embedding layers are replaced with cipher embeddings, our experiments show that training accelerates and reaches better optima than under conventional training paradigms. Experiments integrating bit-cipher embedding layers with RoBERTa, T5, and OPT, prior to or as a substitute for fine-tuning, show a promising enhancement to transfer learning, allowing rapid model convergence while preserving competitive performance.
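To make the two-step process and the two aggregation modes more concrete, the following is a toy sketch only, not the authors' algorithm. It assumes a hypothetical coding scheme (the binary digits of each word's unigram-frequency rank) and a hypothetical context window of size `radius`; the function names `build_codes` and `embed` are likewise invented for illustration. It shows how fixed bit codes can be aggregated over co-occurring neighbours without backpropagation, and how `bits` sets the output dimension under summation versus concatenation.

```python
from collections import Counter


def build_codes(tokens, bits):
    """Step 1 (toy): give each word a distinct bit code from its frequency rank."""
    counts = Counter(tokens)
    # Most frequent word first; ties are broken alphabetically for determinism.
    vocab = sorted(counts, key=lambda w: (-counts[w], w))
    if len(vocab) >= 2 ** bits:
        raise ValueError("bits is too small for distinct nonzero codes")
    codes = {}
    for rank, word in enumerate(vocab, start=1):
        # ASSUMPTION: binary digits of the rank, least-significant bit first.
        codes[word] = [(rank >> i) & 1 for i in range(bits)]
    return codes


def embed(tokens, codes, bits, radius=1, mode="sum"):
    """Step 2 (toy): aggregate neighbour codes by summation or concatenation."""
    if mode not in ("sum", "concat"):
        raise ValueError("mode must be 'sum' or 'concat'")
    offsets = [o for o in range(-radius, radius + 1) if o != 0]
    # Summation keeps `bits` dimensions; concatenation uses one block per offset.
    width = bits if mode == "sum" else bits * len(offsets)
    vectors = {word: [0] * width for word in codes}
    for i, word in enumerate(tokens):
        for slot, offset in enumerate(offsets):
            j = i + offset
            if not 0 <= j < len(tokens):
                continue  # neighbour falls outside the text
            start = 0 if mode == "sum" else slot * bits
            for k, bit in enumerate(codes[tokens[j]]):
                vectors[word][start + k] += bit
    return vectors


tokens = "the cat sat on the mat".split()
codes = build_codes(tokens, bits=3)
summed = embed(tokens, codes, bits=3, mode="sum")
concatenated = embed(tokens, codes, bits=3, mode="concat")
```

In this sketch, summation yields vectors of length `bits`, while concatenation yields `bits * 2 * radius` and keeps the relative position of each neighbour. Everything reduces to counting, which is the sense in which no gradient-based training is involved.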
