Maximum subarray is a classical problem in computer science: given an array of numbers, find the contiguous subarray with the largest sum. We focus on its use in a noisy statistical problem, namely localizing an interval whose mean differs from the background. While a naive application of maximum subarray fails at this task, both a penalized and a constrained version can succeed. We show that the penalized version can be derived for common exponential-family distributions, in a manner similar to the change-point detection literature, and we interpret the resulting optimal penalty value. We then explain the failure of the naive formulation through an analysis of the estimated interval boundaries. Experiments further quantify the effect of deviating from the optimal penalty. Finally, we relate the penalized and constrained formulations and show that the solutions to the former lie on the convex hull of the solutions to the latter.
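To make the distinction between the naive and penalized formulations concrete, here is a minimal sketch of Kadane's linear-time maximum subarray algorithm run on background-centered data, with an optional constant penalty subtracted from every element. The per-element penalty form, the function name `penalized_max_subarray`, and its `background` and `penalty` parameters are illustrative assumptions, not the derivation or optimal penalty value from the work summarized above. With `penalty=0.0` the function reduces to the naive formulation.

```python
from typing import List, Tuple


def penalized_max_subarray(x: List[float], background: float = 0.0,
                           penalty: float = 0.0) -> Tuple[int, int, float]:
    """Kadane's algorithm on the shifted scores x_i - background - penalty.

    Hypothetical helper for illustration. Returns (start, end, best_sum) with
    `end` exclusive. penalty=0.0 gives the naive maximum subarray on
    background-centered data; penalty > 0 is an assumed per-element penalty.
    """
    if not x:
        raise ValueError("input array must be non-empty")
    best_sum = float("-inf")
    best_start, best_end = 0, 0
    run_sum = 0.0
    run_start = 0
    for i, xi in enumerate(x):
        score = xi - background - penalty
        # Start a new interval at i when the running sum cannot help,
        # otherwise extend the current interval by one element.
        if run_sum <= 0.0:
            run_sum = score
            run_start = i
        else:
            run_sum += score
        # Keep the best interval seen so far.
        if run_sum > best_sum:
            best_sum = run_sum
            best_start, best_end = run_start, i + 1
    return best_start, best_end, best_sum


if __name__ == "__main__":
    data = [0.0, 0.0, 3.0, 3.0, 3.0, 0.0, 0.0]
    print(penalized_max_subarray(data, background=0.0, penalty=1.5))
```

On the noiseless toy input the call prints `(2, 5, 4.5)`, recovering the elevated block at indices 2 to 4. The penalty matters with noise: under pure background, centered scores have zero mean, so the naive running sum tends to absorb chance positive fluctuations beyond the true boundaries, whereas a positive penalty makes background elements cost something on average.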