The Good, the Bad, and the Sampled: a No-Regret Approach to Safe Online Classification

We study the problem of sequentially testing individuals for a binary disease outcome whose true risk is governed by an unknown logistic model. At each round, a patient arrives with feature vector $x_t$, and the decision maker may either pay to administer a noiseless diagnostic test, which reveals the true label, or skip testing and predict the patient's disease status from their feature vector and the history so far. Our goal is to minimize the total number of costly tests while guaranteeing, with probability at least $1-\delta$, that the fraction of misclassifications does not exceed a prespecified error tolerance $\alpha$. To this end, we develop a novel algorithm that interleaves label collection with distribution estimation to learn both the model parameter $\theta^{*}$ and the context distribution $P$, and that computes a conservative, data-driven threshold $\tau_t$ on the logistic score $|x_t^\top\theta|$ to decide when testing is necessary. We prove that, with probability at least $1-\delta$, our procedure does not exceed the target misclassification rate and requires only $O(\sqrt{T})$ excess tests compared to the oracle baseline that knows both $\theta^{*}$ and the patient feature distribution $P$. This establishes the first no-regret guarantees for error-constrained logistic testing, with direct applications to cost-sensitive medical screening. Simulations corroborate our theoretical guarantees: in practice, our procedure estimates $\theta^{*}$ efficiently, retains its safety guarantee, and keeps the number of excess tests small.

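The thresholding idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it is a plug-in version that assumes a parameter estimate `theta` and a sample of contexts `X` are already available, that the misclassification fraction is counted over all rounds (tested rounds contribute zero error), and it omits the conservative confidence margin the abstract describes. The function names `skip_error`, `plug_in_threshold`, and `decide` are hypothetical and chosen for illustration only.

```python
import numpy as np


def skip_error(scores):
    """Error probability of predicting sign(score) under a logistic model.

    If P(y = 1 | x) = sigmoid(s), predicting 1{s > 0} is wrong with
    probability sigmoid(-|s|) = 1 / (1 + exp(|s|)).
    """
    return 1.0 / (1.0 + np.exp(np.abs(scores)))


def plug_in_threshold(X, theta, alpha):
    """Plug-in threshold (assumption: no conservative margin is added).

    Returns the smallest tau such that skipping the test whenever
    |x^T theta| >= tau keeps the average expected error over the rows
    of X at or below alpha (assuming no ties among the scores).
    """
    abs_scores = np.abs(X @ theta)
    order = np.argsort(-abs_scores)            # most confident contexts first
    sorted_scores = abs_scores[order]
    cum_err = np.cumsum(skip_error(sorted_scores)) / len(abs_scores)
    # Number of most-confident contexts that can be skipped within budget.
    k = int(np.searchsorted(cum_err, alpha, side="right"))
    if k == 0:
        return np.inf                          # budget too small: always test
    return float(sorted_scores[k - 1])


def decide(x, theta, tau):
    """Test when the score is too close to zero, otherwise predict its sign."""
    score = float(x @ theta)
    if abs(score) < tau:
        return "test", None
    return "predict", int(score > 0)
```

A call such as `tau = plug_in_threshold(X, theta, alpha=0.05)` followed by `decide(x_t, theta, tau)` at each round mimics the oracle-style rule. An online procedure of the kind described above would additionally have to re-estimate `theta` and the context distribution as labels arrive and inflate `tau` to account for estimation error.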