We frame embedding inversion as conditional masked diffusion, recovering all tokens in parallel through iterative denoising rather than sequential autoregressive generation. A masked diffusion language model is conditioned on the target embedding via adaptive layer normalization, requiring only 8 forward passes through a 78M-parameter model and no access to the target encoder. On 32-token sequences across three embedding models, the method achieves 81.3% token accuracy and 0.87 cosine similarity.
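The conditioning and sampling loop described above can be sketched as follows. This is a minimal illustration, not the paper's 78M-parameter model: the tiny denoiser, the dimensions, and the confidence-based unmasking schedule are all assumptions made for the sake of a runnable example. The key ideas it shows are (1) adaptive layer normalization, where the target embedding predicts a per-channel scale and shift, and (2) parallel iterative denoising, where the sequence starts fully masked and the most confident positions are committed at each of the 8 steps.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (the paper uses a 78M-parameter model).
VOCAB, DIM, SEQ_LEN = 1000, 64, 32
MASK_ID = VOCAB  # extra embedding slot for the [MASK] token

class AdaLN(nn.Module):
    """Adaptive layer norm: scale/shift are predicted from the conditioning
    embedding, so every position is modulated by the target embedding."""
    def __init__(self, dim, cond_dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.proj = nn.Linear(cond_dim, 2 * dim)

    def forward(self, x, cond):
        # cond: (batch, cond_dim) -> scale, shift: (batch, 1, dim)
        scale, shift = self.proj(cond).unsqueeze(1).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale) + shift

class Denoiser(nn.Module):
    """Toy stand-in for the masked diffusion language model."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB + 1, DIM)  # +1 for [MASK]
        self.adaln = AdaLN(DIM, DIM)
        self.ff = nn.Linear(DIM, DIM)
        self.head = nn.Linear(DIM, VOCAB)  # predicts real tokens only

    def forward(self, tokens, target_emb):
        h = self.adaln(self.embed(tokens), target_emb)
        return self.head(torch.relu(self.ff(h)))  # (batch, seq, vocab) logits

@torch.no_grad()
def invert(model, target_emb, steps=8):
    """Parallel iterative denoising: start fully masked, and at each step
    unmask the positions where the model is most confident (an assumed
    confidence-based schedule that empties the mask set in `steps` steps)."""
    tokens = torch.full((1, SEQ_LEN), MASK_ID, dtype=torch.long)
    for step in range(steps):
        logits = model(tokens, target_emb)
        conf, pred = logits.softmax(-1).max(-1)       # per-position confidence
        masked = tokens == MASK_ID
        conf = conf.masked_fill(~masked, -1.0)        # only compete over masks
        n_unmask = max(1, int(masked.sum() / (steps - step)))
        idx = conf.topk(n_unmask, dim=-1).indices
        tokens.scatter_(1, idx, pred.gather(1, idx))  # commit top-k positions
    return tokens

model = Denoiser()
recovered = invert(model, torch.randn(1, DIM))  # no target-encoder access
```

Note the design choice the abstract implies: because every masked position is predicted simultaneously at each step, the number of forward passes (8) is fixed regardless of sequence length, unlike autoregressive generation, which needs one pass per token.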