Recent proposals advocate using keystroke timing signals, specifically the coefficient of variation ($\delta$) of inter-keystroke intervals, to distinguish human-composed text from AI-generated content. We demonstrate that this class of defenses is insecure against two practical attack classes: the copy-type attack, in which a human transcribes LLM-generated text and thereby produces authentic motor signals, and timing-forgery attacks, in which automated agents sample inter-keystroke intervals from empirical human distributions. Using 13,000 sessions from the SBU corpus and three timing-forgery variants (histogram sampling, statistical impersonation, and generative LSTM), we show that all attacks achieve $\ge$99.8% evasion rates against five classifiers. While the detectors achieve AUC=1.000 against fully automated injection, they classify $\ge$99.8% of attack samples as human with mean confidence $\ge$0.993. We formalize a non-identifiability result: when the detector observes only timing, the mutual information between timing features and content provenance is zero for copy-type attacks. Although composition and transcription produce statistically distinguishable motor patterns (Cohen's d=1.28), both yield $\delta$ values 2--4$\times$ above detection thresholds, rendering the distinction security-irrelevant. These systems confirm that a human operated the keyboard, not that the human originated the text. Securing provenance requires architectures that bind the writing process to semantic content.
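The histogram-sampling forgery described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the lognormal "human" intervals, function names, and bin count are all assumptions standing in for real corpus data such as the SBU sessions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical empirical human inter-keystroke intervals (IKIs, in ms).
# A real attack would draw these from recorded human typing sessions.
human_ikis = rng.lognormal(mean=5.0, sigma=0.5, size=10_000)

def coefficient_of_variation(ikis):
    """delta = sigma / mu of the inter-keystroke intervals."""
    return np.std(ikis) / np.mean(ikis)

def forge_timings(empirical_ikis, n_keystrokes, bins=50):
    """Histogram-sampling forgery: draw synthetic IKIs from the empirical
    human distribution, so the forged delta matches human statistics."""
    counts, edges = np.histogram(empirical_ikis, bins=bins)
    probs = counts / counts.sum()
    idx = rng.choice(len(probs), size=n_keystrokes, p=probs)
    # Sample uniformly within each chosen histogram bin.
    return rng.uniform(edges[idx], edges[idx + 1])

forged = forge_timings(human_ikis, n_keystrokes=500)
print(coefficient_of_variation(human_ikis), coefficient_of_variation(forged))
```

Because the forged intervals follow the empirical distribution, their $\delta$ is statistically indistinguishable from the human baseline, which is exactly why a timing-only detector accepts them.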