The introduction of embedding techniques has significantly advanced the field of Natural Language Processing. Many of the proposed solutions target word-level encoding; in recent years, however, new mechanisms have emerged for handling information at higher levels of aggregation, such as the sentence and document level. In this work we specifically address the problem of sentence embeddings, presenting the Static Fuzzy Bag-of-Words (SFBoW) model. Our model is a refinement of the Fuzzy Bag-of-Words approach that produces sentence embeddings of a predefined dimension. SFBoW achieves competitive performance on Semantic Textual Similarity benchmarks while requiring few computational resources.
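To make the idea of a fixed-dimension fuzzy bag-of-words embedding concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes that a sentence is represented by max-pooling the similarities between its word vectors and the rows of a fixed "universe" matrix, and that two sentences are compared with a fuzzy Jaccard index. The names `sentence_embedding`, `fuzzy_jaccard`, and `universe`, the clipping of negative memberships to zero, and the toy vectors are all illustrative assumptions.

```python
import numpy as np


def sentence_embedding(word_vectors: np.ndarray, universe: np.ndarray) -> np.ndarray:
    """Map a sentence (n_words x d) to a vector of length k (rows of universe).

    Assumption: membership of each word in each universe row is their dot
    product; the sentence keeps the maximum membership per row.
    """
    memberships = word_vectors @ universe.T          # shape (n_words, k)
    pooled = memberships.max(axis=0)                 # shape (k,)
    # Assumption: negative memberships are clipped so values behave like
    # fuzzy set membership degrees.
    return np.maximum(pooled, 0.0)


def fuzzy_jaccard(a: np.ndarray, b: np.ndarray) -> float:
    """Fuzzy Jaccard index: sum of element-wise minima over sum of maxima."""
    denominator = np.maximum(a, b).sum()
    if denominator == 0.0:
        return 0.0
    return float(np.minimum(a, b).sum() / denominator)


# Toy data: a 3-row universe in a 3-dimensional word-vector space.
universe = np.eye(3)
sentence_a = np.array([[1.0, 0.0, 0.0], [0.0, 0.5, 0.0]])  # two words
sentence_b = np.array([[1.0, 0.0, 0.0]])                   # one word

emb_a = sentence_embedding(sentence_a, universe)
emb_b = sentence_embedding(sentence_b, universe)
similarity = fuzzy_jaccard(emb_a, emb_b)
```

Because the universe matrix is fixed in this sketch, both embeddings have the same length regardless of how many words each sentence contains, which is the property the abstract refers to as a predefined dimension.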