It is well known in industrial data science that the large values of real-life time series tend to be structured and often follow concrete, visible patterns. In this paper, we use ideas from additive combinatorics and discrete Fourier analysis to give this heuristic a mathematical foundation. Our main tools are the Fourier ratio, a complexity measure previously used in compressed sensing, and a generalized version of Chang's lemma from additive combinatorics. Together, they yield a precise prediction: when the Fourier ratio of a time series is small, the set of its largest values can be additively generated by a very small set using only $\{-1,0,1\}$ coefficients. We test this prediction on US inflation data and Delhi climate data, both in their original form and after mean-centering. The numerical results confirm the predicted structure: a generating set of size $4$--$7$ suffices to span large spectra containing dozens of points, even when the Fourier ratio is large enough that our theoretical bounds become loose. These findings provide a rigorous explanation for why extreme values in real-world data are information-rich and structurally significant.
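To make the quantities above concrete, the following is a minimal sketch of the kind of check described: compute a Fourier ratio, extract a large spectrum, and test whether it lies in the $\{-1,0,1\}$-span of a small generating set modulo $N$. The definitions are assumptions for illustration, not necessarily the paper's exact normalizations. The Fourier ratio is taken as $\|\hat f\|_1/\|\hat f\|_2$ of the unnormalized DFT, which always lies in $[1,\sqrt{N}]$. The large spectrum is taken as the frequencies whose DFT magnitude is at least a fraction `alpha` of the maximum, which is a hypothetical threshold rule. All function names are hypothetical.

```python
import cmath
import itertools
import math


def dft(x):
    """Unnormalized DFT: X[m] = sum_t x[t] * exp(-2*pi*i*t*m/N), O(N^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * t * m / n) for t in range(n))
            for m in range(n)]


def fourier_ratio(x):
    """Assumed definition: ||X||_1 / ||X||_2 of the unnormalized DFT.

    By Cauchy-Schwarz this lies in [1, sqrt(N)]; small values mean the
    signal's energy is concentrated on few frequencies.
    """
    X = dft(x)
    l2 = math.sqrt(sum(abs(v) ** 2 for v in X))
    if l2 == 0:
        raise ValueError("Fourier ratio is undefined for the zero signal")
    return sum(abs(v) for v in X) / l2


def large_spectrum(x, alpha):
    """Frequencies m with |X[m]| >= alpha * max|X| (hypothetical threshold rule)."""
    X = dft(x)
    top = max(abs(v) for v in X)
    return {m for m, v in enumerate(X) if abs(v) >= alpha * top}


def signed_span(gens, n):
    """All sums sum(eps_i * d_i) mod n with every eps_i in {-1, 0, 1}."""
    return {sum(e * d for e, d in zip(eps, gens)) % n
            for eps in itertools.product((-1, 0, 1), repeat=len(gens))}


# Usage: two cosines on Z_32 give a four-point large spectrum {3, 5, 27, 29},
# which is spanned by the two generators {3, 5} with coefficients in {-1, 0, 1}.
N = 32
x = [math.cos(2 * math.pi * 3 * t / N) + 0.5 * math.cos(2 * math.pi * 5 * t / N)
     for t in range(N)]
spec = large_spectrum(x, alpha=0.4)
print(round(fourier_ratio(x), 3), sorted(spec), spec <= signed_span([3, 5], N))
```

The brute-force span over $3^k$ coefficient vectors is only practical for small generating sets, which matches the regime of interest here (sizes around $4$--$7$).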