When using a public communication channel -- whether formal or informal, such as commenting or posting on social media -- end users have no expectation of privacy: they compose a message and broadcast it for the world to see. Even if an end user takes the utmost precautions to anonymize their online presence -- using an alias or pseudonym; masking their IP address; spoofing their geolocation; concealing their operating system and user agent; deploying encryption; registering with a disposable phone number or email; disabling non-essential settings; revoking permissions; and blocking cookies and fingerprinting -- one obvious element still lingers: the message itself. Assuming they avoid lapses in judgment or accidental self-exposure, there should be little evidence to establish their actual identity, right? Wrong. The content of their message -- necessarily open for public consumption -- exposes an attack vector: stylometric analysis, or author profiling. In this paper, we dissect the technique of stylometry, discuss its counter-strategy, adversarial stylometry, and devise enhancements through Unicode steganography.