Signal detection is one of the main challenges of data science. As often happens in data analysis, the signal in the data may be corrupted by noise. A wide range of techniques aims at extracting the relevant degrees of freedom from data, but some problems remain difficult. This is notably the case for signal detection in almost continuous spectra when the signal-to-noise ratio is small. This paper follows a recent line of work that tackles this issue with field-theoretical methods. Previous analyses focused on equilibrium Boltzmann distributions for an effective field representing the degrees of freedom of the data, and established a relation between signal detection and $\mathbb{Z}_2$-symmetry breaking. In this paper, we consider a stochastic field framework inspired by the so-called "Model A", and show that whether the system can reach an equilibrium state is correlated with the shape of the dataset. In particular, by studying the renormalization group of the model, we show that the weak ergodicity prescription is always broken for sufficiently small signals when the data distribution is close to the Marchenko-Pastur (MP) law. This enables the definition of a detection threshold in the regime where the signal-to-noise ratio is small.
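For readers unfamiliar with the Marchenko-Pastur law mentioned above, the following is a minimal sketch of its density and support edges for a pure-noise covariance spectrum. It is not the field-theoretical method of the paper. The function names (`mp_edges`, `mp_density`, `integrate`) and the parameters `q` (ratio of variables to samples, assumed to satisfy `q <= 1`) and `sigma2` (noise variance) are illustrative assumptions, not notation taken from the text.

```python
import math


def mp_edges(q, sigma2=1.0):
    """Lower and upper edges of the MP bulk for aspect ratio q <= 1 (assumed)."""
    root = math.sqrt(q)
    return sigma2 * (1.0 - root) ** 2, sigma2 * (1.0 + root) ** 2


def mp_density(lam, q, sigma2=1.0):
    """MP density at eigenvalue lam; zero outside the bulk [lo, hi]."""
    lo, hi = mp_edges(q, sigma2)
    if lam <= lo or lam >= hi:
        return 0.0
    return math.sqrt((hi - lam) * (lam - lo)) / (2.0 * math.pi * sigma2 * q * lam)


def integrate(f, a, b, n=20000):
    """Plain midpoint rule, enough to sanity-check the density numerically."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    q = 0.25  # hypothetical aspect ratio chosen for illustration
    lo, hi = mp_edges(q)
    total = integrate(lambda x: mp_density(x, q), lo, hi)
    mean = integrate(lambda x: x * mp_density(x, q), lo, hi)
    print("bulk edges:", lo, hi)
    print("total mass:", total)
    print("mean eigenvalue:", mean)
```

In this picture, a spectrum is "close to the MP law" when its eigenvalues fill the interval between the two edges with approximately this density. A weak signal then perturbs the bulk only slightly rather than producing a clearly separated eigenvalue, which is the regime the abstract is concerned with.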