Given a pair of multivariate time series of the same length and dimension, an approach is proposed for selecting the variables and time intervals in which the two series differ significantly. In applications where one of the time series is the output of a computationally expensive simulator, the approach can be used to validate the simulator against real data, to compare the outputs of two simulators, and to validate a machine-learning-based emulator against the simulator. In the proposed approach, the entire time interval is split into multiple subintervals; on each subinterval, the two sample sets are compared to select the variables that distinguish their distributions, and a two-sample test is performed. The validity and limitations of the approach are investigated in experiments with synthetic data. Its usefulness is demonstrated in two applications: one with a particle-based fluid simulator, in which a deep neural network model is compared against the simulator, and one with a microscopic traffic simulator, in which the effects of changing the simulator's parameters on traffic flow are analysed.
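The per-subinterval procedure can be illustrated with a minimal sketch. The abstract does not specify how variables are selected or which two-sample test is used, so this example makes several assumptions. First, it treats the time points inside each subinterval as samples. Second, it scores each variable by a one-dimensional Gaussian-kernel MMD and keeps the top `n_select` variables. Third, it runs an MMD permutation test on the selected variables. Selection and testing use disjoint halves of each subinterval, with even time indices used for selection and odd indices for testing, so that the test is not run on the same data used to choose the variables. All function names (`gaussian_mmd2`, `permutation_pvalue`, `compare_by_interval`) and parameters are hypothetical and are not the authors' method.

```python
import numpy as np


def gaussian_mmd2(a, b, bandwidth=1.0):
    """Biased estimate of squared MMD between samples a (n, d) and b (m, d).

    Assumption: a fixed Gaussian-kernel bandwidth, suitable for standardized data.
    """
    def k(u, v):
        sq = np.sum((u[:, None, :] - v[None, :, :]) ** 2, axis=-1)
        return np.exp(-sq / (2.0 * bandwidth ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2.0 * k(a, b).mean()


def permutation_pvalue(a, b, n_perm=200, rng=None):
    """p-value of an MMD permutation test of H0: a and b share one distribution."""
    rng = np.random.default_rng(rng)
    observed = gaussian_mmd2(a, b)
    pooled = np.vstack([a, b])
    n = len(a)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if gaussian_mmd2(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


def compare_by_interval(x, y, n_intervals=4, n_select=1, alpha=0.05, seed=0):
    """Hypothetical helper: x and y are arrays of shape (T, D).

    Returns a list of (start, end, selected_variables, p_value, significant),
    with one entry per contiguous subinterval [start, end).
    """
    assert x.shape == y.shape
    rng = np.random.default_rng(seed)
    results = []
    for idx in np.array_split(np.arange(x.shape[0]), n_intervals):
        # Disjoint halves: even time indices for selection, odd ones for testing.
        sel_idx, test_idx = idx[0::2], idx[1::2]
        # Score each variable separately by its one-dimensional MMD.
        scores = [gaussian_mmd2(x[sel_idx][:, [j]], y[sel_idx][:, [j]])
                  for j in range(x.shape[1])]
        selected = sorted(np.argsort(scores)[::-1][:n_select].tolist())
        # Two-sample test restricted to the selected variables.
        p = permutation_pvalue(x[test_idx][:, selected],
                               y[test_idx][:, selected], rng=rng)
        results.append((int(idx[0]), int(idx[-1]) + 1, selected, p, p <= alpha))
    return results


if __name__ == "__main__":
    gen = np.random.default_rng(1)
    x = gen.standard_normal((200, 3))
    y = gen.standard_normal((200, 3))
    y[100:, 2] += 3.0  # variable 2 differs only in the second half
    for row in compare_by_interval(x, y):
        print(row)
```

In the usage example, only variable 2 differs, and only from time step 100 onward. The last two subintervals should therefore select variable 2 and report a small p-value. The sketch applies no correction for testing several subintervals; a real analysis would need one, for example a Bonferroni adjustment of `alpha`.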