A compression function is a map that reduces a set of observations to a smaller subset while preserving its informational content. In many applications, the event that a new observation changes the compressed set is interpreted as meaning that this observation brings in extra information; in learning theory, this corresponds to misclassification or misprediction. In this paper, we lay the foundations of a new theory that allows one to keep control of the probability of change of compression (called the "risk"). We identify conditions under which the cardinality of the compressed set is a consistent estimator of the risk (without any upper limit on the size of the compressed set), and we prove unprecedentedly tight bounds for evaluating the risk under a generally applicable condition of preference. All results are usable in a fully agnostic setup, without requiring any a priori knowledge of the probability distribution of the observations. Not only do these results offer valid support for developing trust in observation-driven methodologies, they also play a fundamental role in learning techniques as a tool for hyper-parameter tuning.
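To make the notions of "compression" and "risk" concrete, the following is a minimal sketch and not the construction used in the paper. It assumes a toy compression function that keeps only the smallest and largest values of a one-dimensional sample; the names `compress`, `changes_compression`, and `estimate_risk`, as well as the uniform distribution, are illustrative assumptions. The risk is then the probability that a fresh observation falls outside the current range and therefore changes the compressed set.

```python
import random


def compress(points):
    """Toy compression (an assumption for illustration): keep only the extremes."""
    return {min(points), max(points)}


def changes_compression(points, new_point):
    """Return True if adding new_point alters the compressed set."""
    return compress(points + [new_point]) != compress(points)


def estimate_risk(points, draw, trials=5000):
    """Monte Carlo estimate of the probability of change of compression."""
    hits = sum(changes_compression(points, draw()) for _ in range(trials))
    return hits / trials


rng = random.Random(0)  # fixed seed so the run is reproducible
sample = [rng.random() for _ in range(200)]  # observations, uniform on [0, 1)
risk = estimate_risk(sample, rng.random)

# Size of the compressed set and the estimated risk for this sample.
print(len(compress(sample)), risk)
```

In this toy case the risk can also be computed directly as `min(sample) + 1 - max(sample)`, because the observations are uniform on the unit interval. The simulation reaches the same quantity through sampling alone, in the spirit of the agnostic setup described above, although here the sampler itself is known.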