Conformal prediction makes it possible to define reliable and robust learning algorithms, but it is essentially a method for evaluating whether an algorithm is good enough to be used in practice. To build a reliable learning framework for classification from the very beginning of its design, the concept of the scalable classifier was introduced, generalizing the classical classifier by linking it to statistical order theory and probabilistic learning theory. In this paper, we analyze the similarities between scalable classifiers and conformal prediction by introducing a new definition of a score function and defining a special set of input variables, the conformal safety set. This set identifies patterns in the input space that satisfy the error coverage guarantee, i.e., the probability of observing the wrong (possibly unsafe) label for points belonging to the set is bounded by a predefined error level $\varepsilon$. We demonstrate the practical implications of this framework through a cybersecurity application: identifying DNS tunneling attacks. Our work contributes to the development of probabilistically robust and reliable machine learning models.
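To make the error coverage guarantee concrete, the following is a minimal sketch of generic split conformal prediction for a two-label problem. It is not the score function or the conformal safety set construction proposed in this paper. The nonconformity score (here assumed to be one minus the predicted probability of a label), the label names `safe` and `unsafe`, and the helper names `conformal_threshold`, `prediction_set`, and `in_illustrative_safety_region` are all hypothetical choices made for illustration.

```python
import math


def conformal_threshold(cal_scores, epsilon):
    """Calibration threshold for split conformal prediction.

    cal_scores: nonconformity scores of the TRUE label on held-out
                calibration points (assumed exchangeable with test points).
    epsilon:    target error level in (0, 1).
    Returns the ceil((n + 1) * (1 - epsilon))-th smallest score, or
    infinity when the calibration set is too small for that level.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - epsilon))
    if k > n:
        return math.inf
    return sorted(cal_scores)[k - 1]


def prediction_set(scores_by_label, threshold):
    """Keep every candidate label whose score does not exceed the threshold."""
    return {label for label, s in scores_by_label.items() if s <= threshold}


def in_illustrative_safety_region(scores_by_label, threshold, safe_label="safe"):
    """Assumption for illustration only: treat a point as 'safe' when its
    conformal prediction set contains the safe label and nothing else."""
    return prediction_set(scores_by_label, threshold) == {safe_label}


if __name__ == "__main__":
    # Hypothetical calibration scores, e.g. 1 - p_model(true label | x).
    cal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    tau = conformal_threshold(cal, epsilon=0.2)

    # Hypothetical scores of one test point for each candidate label.
    test_point = {"safe": 0.3, "unsafe": 0.95}
    print(tau, prediction_set(test_point, tau))
    print(in_illustrative_safety_region(test_point, tau))
```

Under exchangeability of calibration and test points, the prediction set built this way misses the true label with probability at most $\varepsilon$. That marginal guarantee is a standard property of split conformal prediction; the conditional guarantee on the safety set discussed in the abstract is a separate statement and is not established by this sketch.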