This study investigates self-supervised learning techniques for obtaining representations of event sequences, a key data modality in applications such as banking, e-commerce, and healthcare. We perform a comprehensive study of generative and contrastive self-supervised approaches, applying each independently, and find that no single method is best across all tasks. Because generative and contrastive approaches are often treated as mutually exclusive, their combination remains largely unexplored. We therefore investigate the potential benefits of combining them. To this end, we introduce a novel method that aligns generative and contrastive embeddings as distinct modalities, drawing inspiration from contemporary multimodal research. Our results show that this aligned model performs at least on par with existing methods, surpasses them in most cases, and is more universal across a variety of tasks. Furthermore, we show that self-supervised methods consistently outperform the supervised approach on the datasets we consider.
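The abstract does not state which objective is used to align the two embedding types. Purely as an illustration of "aligning embeddings as distinct modalities", the sketch below assumes a CLIP-style symmetric InfoNCE loss: row i of the generative embeddings and row i of the contrastive embeddings describe the same event sequence and form a positive pair, while every other row in the batch acts as a negative. The function names, the temperature of 0.1, and the use of NumPy are assumptions for this example, not the paper's implementation.

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def symmetric_infonce(z_gen: np.ndarray, z_con: np.ndarray,
                      temperature: float = 0.1) -> float:
    """Hypothetical CLIP-style alignment loss between two embedding 'modalities'.

    z_gen, z_con: (batch, dim) embeddings of the same batch of sequences,
    e.g. from a generative and a contrastive encoder (assumed setup).
    """
    a = l2_normalize(z_gen)
    b = l2_normalize(z_con)
    logits = a @ b.T / temperature  # (batch, batch) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    # Cross-entropy with the matching row as target, in both directions.
    loss_gen_to_con = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_con_to_gen = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float((loss_gen_to_con + loss_con_to_gen) / 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))
    # Nearly identical embeddings should give a much lower loss than unrelated ones.
    print(symmetric_infonce(z, z + 0.01 * rng.normal(size=z.shape)))
    print(symmetric_infonce(z, rng.normal(size=(8, 16))))
```

Minimizing such a loss pulls the two embeddings of each sequence together while pushing apart embeddings of different sequences, which is one common way multimodal models align representations from separate encoders.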