We study generative modeling of \emph{variable-length trajectories} -- sequences of visited locations/items with associated timestamps -- for downstream simulation and counterfactual analysis. A recurring practical issue is that standard mini-batch training can be unstable when trajectory lengths are highly heterogeneous, which in turn degrades \emph{distribution matching} for trajectory-derived statistics. We propose \textbf{length-aware sampling (LAS)}, a simple batching strategy that groups trajectories by length and samples each batch from a single length bucket, reducing within-batch length heterogeneity (and making updates more consistent) without changing the model class. We integrate LAS into a conditional trajectory GAN with auxiliary time-alignment losses and provide (i) a distribution-level guarantee for derived variables under mild boundedness assumptions, and (ii) an integral probability metric (IPM)/Wasserstein mechanism explaining why LAS improves distribution matching: it removes length-only shortcut critics and targets within-bucket discrepancies. Empirically, LAS consistently improves matching of derived-variable distributions on a multi-mall dataset of shopper trajectories and on diverse public sequence datasets (GPS, education, e-commerce, and movies), outperforming random sampling across dataset-specific metrics.
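The batching idea described in the abstract (group trajectories by length, then draw each batch from a single length bucket) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `length_aware_batches`, the fixed-width bucketing rule controlled by `bucket_width`, and the epoch-style pass that shuffles within and across buckets are all assumptions made for illustration. The abstract does not specify how buckets are defined or with what probability a bucket is chosen.

```python
import random
from collections import defaultdict


def length_aware_batches(trajectories, batch_size, bucket_width=1, seed=0):
    """Return a shuffled list of batches, each drawn from one length bucket.

    Hypothetical helper: a trajectory of length n is assigned to bucket
    n // bucket_width (an assumed bucketing rule, not taken from the paper).
    """
    rng = random.Random(seed)

    # Group trajectories by length bucket.
    buckets = defaultdict(list)
    for traj in trajectories:
        buckets[len(traj) // bucket_width].append(traj)

    # Cut every bucket into batches; a batch never mixes buckets.
    batches = []
    for members in buckets.values():
        rng.shuffle(members)
        for start in range(0, len(members), batch_size):
            batches.append(members[start:start + batch_size])

    # Shuffle the batch order so training does not sweep lengths in sequence.
    rng.shuffle(batches)
    return batches


if __name__ == "__main__":
    # Toy trajectories: lists of (location_id, timestamp) pairs of varying length.
    toy = [[(i, t) for t in range(n)] for i, n in enumerate([2, 2, 3, 3, 3, 7, 7, 8])]
    for batch in length_aware_batches(toy, batch_size=2):
        print(sorted(len(traj) for traj in batch))
```

With `bucket_width=1` every batch contains trajectories of exactly one length; a larger width trades some within-batch heterogeneity for fuller batches when individual lengths are rare. A random-sampling baseline would instead shuffle all trajectories together before cutting batches.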