Maximum subarray is a classical problem in computer science: given an array of numbers, find a contiguous subarray with the largest sum. We focus on its use in a noisy statistical problem, namely localizing an interval whose mean differs from that of the background. While a naive application of maximum subarray fails at this task, both a penalized and a constrained version can succeed. We show that the penalized version can be derived for common exponential family distributions, in a manner similar to the change-point detection literature, and we interpret the resulting optimal penalty value. The failure of the naive formulation is then explained by an analysis of the estimated interval boundaries. Experiments further quantify the effect of deviating from the optimal penalty. We also relate the penalized and constrained formulations and show that the solutions to the former lie on the convex hull of the solutions to the latter.
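To make the contrast between the naive and penalized formulations concrete, the following is a minimal sketch based on Kadane's algorithm. It assumes that the penalty is a constant subtracted from every element before summation; this form, the function name `max_subarray`, the toy data, and the penalty value 0.5 are all illustrative assumptions, not the formulation or the optimal penalty derived in the paper.

```python
from typing import Sequence, Tuple


def max_subarray(xs: Sequence[float], penalty: float = 0.0) -> Tuple[float, int, int]:
    """Kadane's algorithm applied to xs[i] - penalty.

    Returns (best_sum, start, end) for the best non-empty interval,
    with `end` exclusive. penalty=0.0 gives the naive formulation.
    The per-element penalty is an assumption made for illustration.
    """
    if not xs:
        raise ValueError("xs must be non-empty")

    cur_sum = xs[0] - penalty
    cur_start = 0
    best_sum, best_start, best_end = cur_sum, 0, 1

    for i in range(1, len(xs)):
        v = xs[i] - penalty
        if cur_sum <= 0:
            # A non-positive running sum cannot help: restart at i.
            cur_sum, cur_start = v, i
        else:
            cur_sum += v
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i + 1

    return best_sum, best_start, best_end


# Toy data (hypothetical): background near 0, elevated values at indices 3..5.
xs = [0.1, -0.2, 0.3, 1.1, 0.9, 1.2, -0.1, 0.2, -0.3, 0.1]

naive = max_subarray(xs)                     # no penalty
penalized = max_subarray(xs, penalty=0.5)    # illustrative penalty only

print("naive interval:    ", naive[1:], "sum", round(naive[0], 3))
print("penalized interval:", penalized[1:], "sum", round(penalized[0], 3))
```

On this toy input the unpenalized call returns an interval that extends into the background on both sides of the elevated block, because small positive noise still increases the sum, whereas the penalized call stops at the block's boundaries. This is only a single hand-picked example and says nothing about the statistical guarantees analyzed in the paper.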