Dense retrieval methods have mostly focused on unstructured text, and less attention has been paid to structured data with multiple aspects, e.g., products with aspects such as category and brand. Recent work has proposed two approaches that incorporate aspect information into item representations for effective retrieval by predicting the values associated with the item aspects. Despite their efficacy, they treat the values as isolated classes (e.g., "Smart Homes", "Home, Garden & Tools", and "Beauty & Health") and ignore the fine-grained semantic relations among them. Furthermore, they either force aspect learning into the CLS token, which could divert it from its designated role of representing the semantics of the entire content, or learn extra aspect embeddings with only the value prediction objective, which could be insufficient, especially when an item aspect has no annotated values. To address these limitations, we propose a MUlti-granulaRity-aware Aspect Learning model (MURAL) for multi-aspect dense retrieval. It leverages aspect information across various granularities to capture both coarse and fine-grained semantic relations between values. Moreover, MURAL incorporates separate aspect embeddings as input to transformer encoders so that the masked language model objective can assist implicit aspect learning even without aspect-value annotations. Extensive experiments on two real-world datasets of products and mini-programs show that MURAL outperforms state-of-the-art baselines significantly.
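The idea of feeding separate aspect embeddings to the encoder, rather than overloading the CLS token, can be illustrated with a minimal input-layout sketch. This is not the authors' implementation: the slot token names (`[ASPECT_...]`), the helper functions, and the sample item are all hypothetical, and the sketch only shows where the aspect slots would sit and how missing annotations might be represented. Because the slots share a sequence with the content tokens, a masked language model objective over that sequence could still influence them when no value label exists.

```python
from typing import Dict, List, Optional

CLS, SEP = "[CLS]", "[SEP]"


def build_input(content_tokens: List[str], aspects: List[str]) -> List[str]:
    """Lay out [CLS], one dedicated slot token per aspect, then the content.

    The slot token format is a hypothetical convention for this sketch.
    """
    aspect_slots = [f"[ASPECT_{name.upper()}]" for name in aspects]
    return [CLS] + aspect_slots + content_tokens + [SEP]


def value_targets(
    aspects: List[str], annotations: Dict[str, str]
) -> List[Optional[str]]:
    """Return the annotated value for each aspect slot, or None if missing.

    A None target means the value prediction objective is skipped for that
    slot, so only objectives over the whole sequence would shape it.
    """
    return [annotations.get(name) for name in aspects]


# Hypothetical item: the category is annotated, the brand is not.
aspects = ["category", "brand"]
tokens = build_input(["wifi", "smart", "plug"], aspects)
targets = value_targets(aspects, {"category": "Smart Homes"})
```

Here `tokens` keeps the CLS position free for whole-content semantics, while `targets` marks the brand slot as unannotated instead of dropping it.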