Rejecting inputs outside the defined in-distribution (IND) service scope is critical for large language model (LLM) services, where unsupported requests should be filtered before full generation. Existing out-of-distribution (OOD) detectors often rely on final outputs or final-layer representations, leaving unclear where service-boundary signals are most clearly encoded inside the model; they also lack a theoretical guarantee for held-out inputs. In this paper, we introduce SCOPE (Sequential Conformal OOD Probing and Evaluation), a framework that selects a readable hidden layer, constructs a conformal gate with IND calibration, and uses a supermartingale e-process to certify persistent service-boundary evidence. Experiments across multiple LLM backbones and six carefully designed boundary conditions show that SCOPE improves gate-level rejection over standard final-layer detectors, while revealing how different OOD boundaries take different geometric forms in hidden space.
翻译:暂无翻译