Retrieving temporal event sequences from textual descriptions is essential for applications such as analyzing e-commerce behavior, monitoring social media activities, and tracking criminal incidents. In this paper, we introduce TPP-LLM-Embedding, a unified model for efficiently embedding and retrieving event sequences based on natural language descriptions. Built on the TPP-LLM framework, which integrates large language models with temporal point processes, our model encodes both event types and times, generating a sequence-level representation through pooling. Textual descriptions are embedded using the same architecture, ensuring a shared embedding space for both sequences and descriptions. We optimize a contrastive loss based on similarity between these embeddings, bringing matching pairs closer and separating non-matching ones. TPP-LLM-Embedding enables efficient retrieval and demonstrates superior performance compared to baseline models across diverse datasets.
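The contrastive objective described above can be sketched as a symmetric InfoNCE-style loss over cosine similarities between sequence and description embeddings. This is an illustrative assumption: the paper's exact formulation, similarity function, and temperature are not specified here, and the function name `contrastive_loss` is hypothetical.

```python
import numpy as np

def contrastive_loss(seq_emb, desc_emb, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss (illustrative sketch).

    seq_emb, desc_emb: (B, D) arrays; row i of each is a matching pair.
    Matching pairs are pulled together, non-matching pairs pushed apart.
    """
    # L2-normalize so dot products are cosine similarities
    seq = seq_emb / np.linalg.norm(seq_emb, axis=1, keepdims=True)
    desc = desc_emb / np.linalg.norm(desc_emb, axis=1, keepdims=True)
    logits = seq @ desc.T / temperature          # (B, B) similarity matrix
    labels = np.arange(len(seq))                 # diagonal = matching pairs

    def cross_entropy(l):
        # numerically stable log-softmax over each row
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the two retrieval directions:
    # sequence -> description and description -> sequence
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

With perfectly aligned pairs the loss approaches zero, while anti-aligned pairs drive it well above the uniform-retrieval baseline of log B for a batch of size B.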