Many fields collect large-scale temporal data through repeated measurements (trials), where each trial is labeled with a set of metadata variables spanning several categories. For example, a trial in a neuroscience study may be labeled with a value from category (a), task difficulty, and a value from category (b), the animal's choice. A critical challenge in time-series analysis is to understand how these labels are encoded within multi-trial observations and to disentangle the distinct effect of each label value across categories. Here, we present MILCCI, a novel data-driven method that i) identifies the interpretable components underlying the data, ii) captures cross-trial variability, and iii) integrates label information to reveal how each category is represented within the data. MILCCI extends a sparse per-trial decomposition, leveraging label similarities within each category to enable subtle, label-driven cross-trial adjustments in component composition and to distinguish the contribution of each category. MILCCI also learns a temporal trace for each component, which evolves over time within each trial and varies flexibly across trials. We demonstrate MILCCI's performance on both synthetic and real-world data, including voting patterns, online page-view trends, and neuronal recordings.
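To make the data setting concrete, the sketch below generates toy labeled multi-trial data of the kind described above. It is a hypothetical illustration, not MILCCI's model or code. It assumes that each trial k is a channels-by-time matrix Y_k ≈ A_k Φ_k, that the component composition A_k is a shared sparse base plus one sparse offset per category selected by the trial's label in that category (an additive form chosen here only for illustration), and that the temporal traces Φ_k are sinusoids with trial-specific phases. The names `make_toy_trials` and `fit_loadings` are hypothetical. `fit_loadings` shows only the simplest per-trial step, recovering A_k by least squares when Φ_k is already known. MILCCI, by contrast, learns both the components and their traces and imposes sparsity, which this sketch does not attempt.

```python
import numpy as np


def make_toy_trials(n_channels=20, n_time=50, n_components=3,
                    labels=((0, 0), (0, 1), (1, 0), (1, 1)),
                    n_levels=(2, 2), noise=0.01, seed=0):
    """Toy multi-trial data Y_k = A_k @ Phi_k + noise (hypothetical model).

    A_k = shared sparse base + one sparse offset per category, selected by
    trial k's label in that category. Phi_k holds one temporal trace per
    component; traces share frequencies but get trial-specific phases.
    """
    rng = np.random.default_rng(seed)

    def sparse(shape, density, scale):
        # Nonnegative sparse matrix: most entries are exactly zero.
        mask = rng.random(shape) < density
        return scale * mask * np.abs(rng.normal(size=shape))

    base = sparse((n_channels, n_components), 0.3, 1.0)
    # offsets[c][v]: assumed adjustment applied when category c takes value v.
    offsets = [[sparse((n_channels, n_components), 0.1, 0.3)
                for _ in range(n_levels[c])] for c in range(len(n_levels))]

    t = np.linspace(0.0, 1.0, n_time)
    freqs = np.arange(1, n_components + 1)[:, None]
    trials, comps, traces = [], [], []
    for lab in labels:
        # Label-driven composition: sum the offsets for this trial's labels.
        A = base + sum(offsets[c][v] for c, v in enumerate(lab))
        # Trial-specific phases let each trace vary across trials.
        phase = rng.uniform(0.0, 2 * np.pi, size=(n_components, 1))
        Phi = np.sin(2 * np.pi * freqs * t[None, :] + phase)
        Y = A @ Phi + noise * rng.normal(size=(n_channels, n_time))
        trials.append(Y)
        comps.append(A)
        traces.append(Phi)
    return trials, comps, traces, offsets


def fit_loadings(Y, Phi):
    """Least-squares estimate of A in Y ≈ A @ Phi when Phi is known."""
    return Y @ np.linalg.pinv(Phi)
```

For example, `trials, comps, traces, offsets = make_toy_trials()` yields four trials covering every label combination. Trials that share the category-(b) label differ in composition only by their category-(a) offsets, which is the kind of per-category contribution the method aims to disentangle. `fit_loadings(trials[0], traces[0])` then recovers `comps[0]` up to the small added noise.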