Model distillation -- using outputs from a large teacher model to teach a small student model -- is a practical means of creating efficient models for a particular task. We ask: Can we identify a student's teacher based on its outputs? Such "footprints" left by teacher LLMs would be interesting artifacts. Beyond this, reliable teacher inference may have practical implications as actors seek to distill specific capabilities of massive proprietary LLMs into deployed smaller LMs, potentially violating terms of service. We consider practical task distillation targets including summarization, question answering, and instruction-following. We assume a finite set of candidate teacher models, which we treat as black boxes. We design discriminative models that operate over lexical features. We find that $n$-gram similarity alone is unreliable for identifying teachers, but part-of-speech (PoS) templates preferred by student models mimic those of their teachers.
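As a toy illustration of the $n$-gram similarity baseline discussed above (a minimal sketch, not the paper's actual discriminator; the teacher names and example sentences are hypothetical), one can score a student output against each candidate teacher's output via Jaccard overlap of word bigrams and predict the highest-scoring teacher:

```python
def ngrams(tokens, n):
    """Return the list of word n-grams in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_jaccard(a, b, n=2):
    """Jaccard similarity between the n-gram sets of two strings."""
    A, B = set(ngrams(a.split(), n)), set(ngrams(b.split(), n))
    if not A and not B:
        return 0.0
    return len(A & B) / len(A | B)

# Hypothetical outputs from two candidate (black-box) teachers
teacher_outputs = {
    "teacher_A": "the quick brown fox jumps over the lazy dog",
    "teacher_B": "a fast auburn fox leaps across a sleepy hound",
}
# Hypothetical output from the distilled student
student_output = "the quick brown fox leaps over the lazy dog"

# Attribute the student to the teacher with the highest bigram overlap
scores = {name: ngram_jaccard(student_output, text)
          for name, text in teacher_outputs.items()}
predicted = max(scores, key=scores.get)
print(predicted)  # → teacher_A
```

As the abstract notes, this kind of surface $n$-gram overlap alone is unreliable in practice; the paper's stronger signal comes from PoS templates, which would require a PoS tagger on top of a comparison like this one.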