Representation learning has become a crucial area of machine learning research: it seeks efficient ways of representing raw data through useful features, increasing the effectiveness, scope and applicability of downstream tasks such as classification and prediction. In this paper, we propose a novel method for generating representations of time-series data. The method draws on ideas from theoretical physics to construct a compact representation in a data-driven way, capturing both the underlying structure of the data and task-specific information while remaining intuitive, interpretable and verifiable. It aims to identify linear laws that effectively capture a characteristic shared by the samples of a given class. These laws are then used to generate a classifier-agnostic representation in a forward manner, which makes them applicable in a generalized setting. We demonstrate the effectiveness of our approach on ECG signal classification, where it achieves state-of-the-art performance.
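To make the idea of a class-specific linear law concrete, here is a minimal sketch. The abstract does not say how the laws are found, so everything below is an assumption made for illustration: the series are time-delay embedded, a law is taken to be the unit vector `w` that minimizes the squared norm of `Y w` over the embedded samples of one class, and the features of a new series are the mean squared residuals of each class law. The helper names (`delay_embed`, `fit_linear_law`, `make_features`) and the sinusoidal toy data are hypothetical and do not come from the paper.

```python
import numpy as np

def delay_embed(x, dim):
    """Stack all overlapping windows of length `dim` as rows (time-delay embedding)."""
    x = np.asarray(x, dtype=float)
    return np.lib.stride_tricks.sliding_window_view(x, dim)

def fit_linear_law(series_list, dim):
    """Assumed definition of a law: the unit vector w minimizing ||Y w||^2,
    where Y stacks the embedded windows of every training series of one class."""
    Y = np.vstack([delay_embed(x, dim) for x in series_list])
    # eigh returns eigenvalues in ascending order, so column 0 belongs
    # to the smallest eigenvalue of the symmetric matrix Y^T Y.
    _, vecs = np.linalg.eigh(Y.T @ Y)
    return vecs[:, 0]

def make_features(x, laws):
    """One feature per class law: the mean squared residual of that law on x."""
    return np.array(
        [np.mean((delay_embed(x, len(w)) @ w) ** 2) for w in laws]
    )

# Toy data (hypothetical): two classes of sinusoids with different frequencies.
t = np.arange(200)
class_a = [np.sin(0.2 * t + phase) for phase in (0.0, 1.0)]
class_b = [np.sin(0.5 * t + phase) for phase in (0.3, 2.0)]

laws = [fit_linear_law(class_a, dim=3), fit_linear_law(class_b, dim=3)]

# A new series from class A: its residual under law A should be near zero.
features = make_features(np.sin(0.2 * t + 0.7), laws)
print(features)
```

The feature vector is computed without reference to any particular classifier, so in this sketch it could be passed to any standard model; whether this matches the paper's forward transformation in detail is not established by the abstract.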