Large Language Models (LLMs) can now solve entire exams directly from uploaded PDF assessments, raising urgent concerns about academic integrity and the reliability of grades and credentials. Existing watermarking techniques either operate at the token level or assume control over the model's decoding process, making them ineffective when students query proprietary black-box systems with instructor-provided documents. We present Integrity Shield, a document-layer watermarking system that embeds schema-aware, item-level watermarks into assessment PDFs while leaving their human-visible appearance unchanged. These watermarks consistently prevent multimodal large language models (MLLMs) from answering shielded exam PDFs and encode stable, item-level signatures that can be reliably recovered from model or student responses. Across 30 exams spanning STEM, humanities, and medical reasoning, Integrity Shield achieves high prevention rates (91-94% exam-level blocking) and strong detection reliability (89-93% signature retrieval) across four commercial MLLMs. Our demo showcases an interactive interface where instructors upload an exam, preview watermark behavior, and inspect AI performance before and after shielding, together with authorship evidence.
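For concreteness, the sketch below shows one plausible way a document-layer, item-level watermark could be realized: a per-item signature string inserted as invisible PDF text (PDF text rendering mode 3) next to each question, using PyMuPDF. The abstract does not specify Integrity Shield's actual embedding mechanism; the signature format, the "Q<n>." item-locator heuristic, and the `embed_item_signatures` helper are illustrative assumptions, not the authors' method.

```python
import hashlib

import fitz  # PyMuPDF


def embed_item_signatures(in_pdf: str, out_pdf: str, exam_id: str) -> None:
    """Insert an invisible, per-item signature next to each question label.

    Hypothetical sketch: a schema-aware system would locate items from the
    exam's structure rather than by scanning for "Q1.", "Q2.", ... labels.
    """
    doc = fitz.open(in_pdf)
    for page in doc:
        for n in range(1, 101):
            for rect in page.search_for(f"Q{n}."):
                # Derive a stable signature for this (exam, item) pair.
                sig = hashlib.sha256(f"{exam_id}:{n}".encode()).hexdigest()[:12]
                # render_mode=3 is the PDF "invisible" text mode: the string
                # never appears visually, but it survives in the extractable
                # text layer that PDF parsers (and MLLM pipelines) read.
                page.insert_text(
                    rect.bl + (0, 4),
                    f"[shield:{exam_id}:{sig}]",
                    fontsize=1,
                    render_mode=3,
                )
    doc.save(out_pdf)


# Example usage (hypothetical file names and exam identifier):
# embed_item_signatures("midterm.pdf", "midterm_shielded.pdf", "CS101-MT1")
```

On the detection side, one would correspondingly search model or student responses for surfaced `[shield:...]` tags and match the recovered signatures against the per-item values derived from the exam identifier; again, this mirrors the recoverable-signature behavior described above rather than the system's documented implementation.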