Conditional Cauchy-Schwarz Divergence for Time Series Analysis: Kernelized Estimation and Applications in Clustering and Fraud Detection

We study the conditional Cauchy-Schwarz divergence (C-CSD) as a symmetric, density-free measure for time series analysis. We derive a practical kernel-based estimator that uses radial basis function (RBF) kernels on both the condition and output spaces, together with numerical stabilizations including a symmetric logarithmic form with an epsilon ridge and a robust bandwidth selection rule based on the interquartile range. Median-heuristic bandwidths are applied to window vectors, and effective-rank filtering is used to avoid degenerate kernels. We demonstrate the framework in two applications. In time series clustering, conditioning on the time index and comparing scalar series values yields a pairwise C-CSD dissimilarity. Bandwidths are selected on the training split; k-medoids clustering is then run on the precomputed distance matrix of the test split and evaluated using normalized mutual information. In fraud detection, we condition on sliding transaction windows and compare the magnitude of value changes together with categorical and merchant change indicators. Each query window is scored by contrasting a global normal reference mixture against a same-account local history mixture with recency decay and change-flag weighting. Account-level decisions are obtained by taking the maximum window score. Experiments on benchmark time series datasets and a transactional fraud detection dataset demonstrate stable estimation and effective performance under a strictly leak-free evaluation protocol.

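The abstract gives no formulas, so the following is only a minimal sketch of how a kernelized C-CSD estimate between two sample sets might be computed. It assumes one common plug-in form built from RBF Gram matrices on the condition and output spaces. The function names (`rbf_gram`, `median_bandwidth`, `ccsd`), the placement of the ridge `eps`, and the fixed bandwidths in the usage note are illustrative assumptions, not the authors' implementation. The interquartile-range rule and effective-rank filtering are not reproduced here.

```python
import numpy as np


def rbf_gram(a, b, sigma):
    """RBF Gram matrix between the rows of a and b (1-D inputs become columns)."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def median_bandwidth(z):
    """Median heuristic: median of the positive pairwise Euclidean distances."""
    z = np.asarray(z, dtype=float).reshape(len(z), -1)
    d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(-1))
    off_diag = d[np.triu_indices(len(z), k=1)]
    pos = off_diag[off_diag > 0]
    return float(np.median(pos)) if pos.size else 1.0  # fallback is an assumption


def ccsd(xp, yp, xq, yq, sigma_x, sigma_y, eps=1e-8):
    """Plug-in conditional CS divergence between samples (xp, yp) and (xq, yq).

    x is the condition variable, y the output variable. eps is a small ridge
    added to denominators and inside the logarithms (an assumed stabilization).
    """
    Kp, Lp = rbf_gram(xp, xp, sigma_x), rbf_gram(yp, yp, sigma_y)
    Kq, Lq = rbf_gram(xq, xq, sigma_x), rbf_gram(yq, yq, sigma_y)
    Kpq, Lpq = rbf_gram(xp, xq, sigma_x), rbf_gram(yp, yq, sigma_y)
    Kqp, Lqp = Kpq.T, Lpq.T

    # Within-sample terms.
    self_p = np.sum((Kp * Lp).sum(1) / (Kp.sum(1) ** 2 + eps))
    self_q = np.sum((Kq * Lq).sum(1) / (Kq.sum(1) ** 2 + eps))
    # Cross terms, one anchored on each sample, which keeps the form symmetric.
    cross_p = np.sum((Kpq * Lpq).sum(1) / (Kp.sum(1) * Kpq.sum(1) + eps))
    cross_q = np.sum((Kqp * Lqp).sum(1) / (Kq.sum(1) * Kqp.sum(1) + eps))

    return (np.log(self_p + eps) + np.log(self_q + eps)
            - np.log(cross_p + eps) - np.log(cross_q + eps))
```

For the clustering setting described in the abstract, the condition variable would be the time index and the output the scalar series value, for example `ccsd(t, series_a, t, series_b, sigma_x, sigma_y)`, with the bandwidths chosen on a training split (for instance via `median_bandwidth`). Evaluating this for every pair of series gives a dissimilarity matrix that can be passed to a k-medoids routine that accepts precomputed distances.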