Deployed machine learning systems must continuously evolve as data, architectures, and regulations change, often without access to the original training data or model internals. In such settings, black-box copying provides a practical refactoring mechanism: upgrading legacy models by learning replicas from input-output queries alone. When restricted to hard-label outputs, copying becomes a discontinuous surface-reconstruction problem from pointwise queries, which severely limits how efficiently boundary geometry can be recovered. We propose a distance-based copying (distillation) framework that replaces hard-label supervision with signed distances to the teacher's decision boundary, converting copying into a smooth regression problem that exploits local geometry. We develop an $\alpha$-governed smoothing and regularization scheme with Hölder/Lipschitz control over the induced target surface, and introduce two model-agnostic algorithms for estimating signed distances under label-only access. Experiments on synthetic problems and UCI benchmarks show consistent improvements in fidelity and generalization accuracy over hard-label baselines, while the distance outputs additionally serve as uncertainty-related signals for black-box replicas.
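The abstract does not specify the two signed-distance estimators; as a minimal illustration of what label-only distance estimation can look like, the sketch below brackets the boundary along random rays and refines each bracket by bisection, querying only hard labels. The `teacher` classifier, the sign convention, and all parameters (`n_directions`, `r_max`, `tol`) are hypothetical choices for this example, not the paper's algorithms.

```python
import math
import random

# Hypothetical hard-label teacher used only for illustration:
# a linear classifier whose boundary is the line x0 + x1 = 1.
def teacher(x):
    return int(x[0] + x[1] > 1.0)  # labels in {0, 1}

def signed_distance(x, teacher, n_directions=32, r_max=4.0, tol=1e-4, seed=0):
    """Label-only estimate of the signed distance from x to the teacher's
    decision boundary: probe random rays for a label flip, then bisect the
    bracket. Sign convention (assumed): positive for class-1 points,
    negative for class-0 points."""
    rng = random.Random(seed)
    y0 = teacher(x)
    best = r_max  # fallback if no flip is found within r_max
    for _ in range(n_directions):
        # Random unit direction (isotropic via normalized Gaussian draws).
        u = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]
        probe = lambda r: teacher([xi + r * ui for xi, ui in zip(x, u)])
        # Geometric expansion along the ray until the label flips.
        lo, hi, r = 0.0, None, tol
        while r <= r_max:
            if probe(r) != y0:
                hi = r
                break
            lo, r = r, 2.0 * r
        if hi is None:
            continue  # no flip along this ray within r_max
        # Bisection narrows the bracket [lo, hi] around the crossing.
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if probe(mid) != y0:
                hi = mid
            else:
                lo = mid
        best = min(best, hi)  # smallest crossing over all probed rays
    return best if y0 == 1 else -best
```

The bisection output along each ray upper-bounds the true perpendicular distance (only a ray normal to the boundary attains it), so taking the minimum over many random directions tightens the estimate at the cost of more label queries.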