Time-Dependent VAE for Building Latent Representations from Visual Neural Activity with Complex Dynamics

Using latent variable models (LVMs) to learn high-quality representations that reveal the intrinsic correlation between neural activity and behavior or sensory stimuli has attracted much interest. Most existing work has focused on motor neural activity, which controls clear behavioral traces, and has modeled temporal relationships in neural activity in ways that do not conform to how that activity naturally unfolds. In visual brain regions, naturalistic visual stimuli are high-dimensional and time-dependent, so the neural activity they evoke exhibits intricate dynamics. To address this, we propose Time-Dependent Split VAE (TiDeSPL-VAE), a sequential LVM that decomposes visual neural activity into two latent representations while accounting for time dependence. Content latent representations correspond to the component of neural activity driven by the current visual stimulus, and style latent representations correspond to the neural dynamics influenced by the organism's internal state. To generate the two latent representations progressively over time, we introduce state factors to construct time-dependent conditional distributions and apply self-supervised contrastive learning to shape them. In this way, TiDeSPL-VAE can effectively analyze complex visual neural activity and model temporal relationships in a natural way. We compare our model with alternative approaches on synthetic data and on neural data from the mouse visual cortex. The results show that our model not only yields the best decoding performance on naturalistic scenes and movies but also extracts explicit neural dynamics, demonstrating that it builds latent representations more relevant to visual stimuli.
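The abstract does not give the model's equations, so the following is only a minimal sketch under our own assumptions, not the TiDeSPL-VAE implementation. It illustrates two generic ingredients the abstract names: a per-time-step KL term between a Gaussian posterior and a conditional prior that depends on a recurrent state factor, and an InfoNCE-style contrastive loss. The state update (`update_state`, a leaky running average), the prior (`prior_from_state`, mean equal to the state with unit variance), the temperature, and every other name here are hypothetical choices made for illustration. Pure Python is used so the sketch runs without a deep-learning library.

```python
import math
import random


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for diagonal Gaussians, summed over dims."""
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        kl += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return kl


def update_state(state, z, decay=0.9):
    """Hypothetical state-factor update: a leaky running summary of past latents."""
    return [decay * s + (1.0 - decay) * zi for s, zi in zip(state, z)]


def prior_from_state(state):
    """Hypothetical conditional prior p(z_t | s_{t-1}): mean follows the state, unit variance."""
    return list(state), [0.0] * len(state)


def sample(mu, logvar, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def sequence_kl(posteriors, dim, seed=0):
    """Sum per-step KL terms against a prior conditioned on the previous state factor.

    posteriors: list of (mu, logvar) pairs, one per time step, for one latent stream.
    """
    rng = random.Random(seed)
    state = [0.0] * dim  # initial state factor (assumed zero)
    total = 0.0
    for mu_q, logvar_q in posteriors:
        mu_p, logvar_p = prior_from_state(state)
        total += gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
        z = sample(mu_q, logvar_q, rng)
        state = update_state(state, z)  # time dependence: next prior depends on this z
    return total


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE: anchors[i] should match positives[i]; other positives act as negatives."""
    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))  # log-sum-exp
        loss += log_denom - logits[i]  # equals -log softmax(logits)[i]
    return loss / len(anchors)
```

In a split model of this kind, one would presumably run `sequence_kl` separately for the content and style latent streams, each with its own state factor, and add a contrastive term such as `info_nce`, for example on the content latents with latents from matching time bins as positive pairs. How TiDeSPL-VAE actually parameterizes its priors and forms positive pairs is not specified in this abstract.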

翻译：利用潜变量模型（LVMs）寻求高质量表征以揭示神经活动与行为或感官刺激之间的内在关联，已引起广泛关注。现有研究大多集中于分析控制明确行为轨迹的运动神经活动，且对神经时间关系的建模方式与自然现实不符。对于视觉脑区的研究而言，自然视觉刺激具有高维且时间依赖的特性，导致神经活动呈现出复杂的动态特性。为应对此类情况，我们提出时间依赖分裂变分自编码器（TiDeSPL-VAE），这是一种序列潜变量模型，可在考虑时间依赖性的同时将视觉神经活动分解为两种潜在表征。我们定义了与当前视觉刺激驱动的神经活动成分相对应的内容潜在表征，以及与受生物体内部状态影响的神经动态特性相对应的风格潜在表征。为随时间逐步生成这两种潜在表征，我们引入状态因子来构建具有时间依赖性的条件分布，并应用自监督对比学习对其进行塑造。通过这种方式，TiDeSPL-VAE能够有效分析复杂的视觉神经活动，并以自然的方式建模时间关系。我们在合成数据及小鼠视觉皮层神经数据上，将本模型与其他方法进行了比较。结果表明，我们的模型不仅在自然场景/电影的解码性能上表现最佳，还能提取出明确的神经动态特性，证明其构建的潜在表征与视觉刺激具有更强的相关性。