Sentence embeddings can be decoded to yield approximations of the original texts used to create them. We explore this effect in the context of text simplification, demonstrating that texts reconstructed from embeddings preserve their complexity levels. We experiment with a small feed-forward neural network that effectively learns a transformation between sentence embeddings representing high-complexity and low-complexity texts. We compare against Seq2Seq and LLM-based approaches, showing encouraging results despite our much smaller learning setting. Finally, we demonstrate the applicability of our transformation to an unseen simplification dataset (MedEASI), as well as to datasets in languages outside the training data (ES, DE). We conclude that learning transformations in sentence embedding space is a promising direction for future research, with the potential to unlock small but powerful models for text simplification and other natural language generation tasks.
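To make the embedding-to-embedding approach concrete, the sketch below shows one plausible instantiation in PyTorch: a small feed-forward network trained to map complex-sentence embeddings onto their simple-sentence counterparts. The embedding dimension, hidden width, loss, and training loop are illustrative assumptions, not the paper's exact configuration, and the random tensors stand in for encoder outputs over aligned sentence pairs.

```python
import torch
import torch.nn as nn

EMB_DIM = 768  # typical sentence-embedding width (assumption)

class SimplificationMapper(nn.Module):
    """Small feed-forward network mapping complex-text embeddings
    to simple-text embeddings in a shared embedding space."""
    def __init__(self, dim: int = EMB_DIM, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Placeholder batch of paired embeddings; in practice these would come
# from a sentence encoder run over aligned complex/simple sentence pairs.
complex_emb = torch.randn(32, EMB_DIM)
simple_emb = torch.randn(32, EMB_DIM)

model = SimplificationMapper()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()  # regression onto the target embedding (assumption)

for step in range(100):
    optimizer.zero_grad()
    pred = model(complex_emb)
    loss = loss_fn(pred, simple_emb)
    loss.backward()
    optimizer.step()

# The predicted embeddings would then be passed to an embedding-to-text
# decoder (e.g. a vec2text-style inverter) to produce simplified sentences.
```

Because the mapper operates purely in embedding space, it stays far smaller than a full generation model; the heavy lifting of producing text is delegated to a separate embedding inverter at decoding time.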