Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits

In this study, we more rigorously evaluated our attack script $\textit{TraceTarnish}$, which leverages adversarial stylometry principles to anonymize the authorship of text-based messages. To assess the efficacy and utility of the attack, we sourced, processed, and analyzed Reddit comments, which were then transformed into $\textit{TraceTarnish}$ data. The transformed $\textit{TraceTarnish}$ data was further augmented by $\textit{StyloMetrix}$ to produce stylometric features, which were then culled using the Information Gain criterion, leaving only the most informative, predictive, and discriminative ones. We found that function words and function word types ($L\_FUNC\_A$ $\&$ $L\_FUNC\_T$); content words and content word types ($L\_CONT\_A$ $\&$ $L\_CONT\_T$); and the Type-Token Ratio ($ST\_TYPE\_TOKEN\_RATIO\_LEMMAS$) yielded significant Information Gain readings. These stylometric cues (function-word frequencies, content-word distributions, and the Type-Token Ratio) serve as reliable indicators of compromise (IoCs), revealing when a text has been deliberately altered to mask its true author. The same features could also function as forensic beacons, alerting defenders to the presence of an adversarial stylometry attack; granted, in the absence of the original message this signal may go largely unnoticed, as it appears to depend on a pre- and post-transformation comparison. "In trying to erase a trace, you often imprint a larger one." Armed with this understanding, we framed $\textit{TraceTarnish}$'s operations and outputs around these five isolated features, using them to conceptualize and implement enhancements that further strengthen the attack.
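The Information Gain criterion mentioned above can be illustrated with a minimal sketch. It is not the study's actual pipeline: the equal-width binning, the class labels (original versus transformed text), the `NOISE` column, and every numeric value below are hypothetical and chosen only to show how a feature that separates the classes outranks one that does not.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, bins=2):
    """IG = H(labels) - H(labels | binned feature), using equal-width bins.

    Equal-width binning is an assumption made for this sketch; the source
    does not say how continuous features were discretized.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant features
    groups = {}
    for value, label in zip(values, labels):
        index = min(int((value - lo) / width), bins - 1)
        groups.setdefault(index, []).append(label)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data (hypothetical): two original and two transformed messages.
labels = ["original", "original", "transformed", "transformed"]
features = {
    "L_FUNC_A": [0.1, 0.2, 0.8, 0.9],  # made-up values that separate the classes
    "NOISE": [0.1, 0.9, 0.1, 0.9],     # made-up values unrelated to the classes
}

# Rank features from most to least informative.
ranked = sorted(
    features,
    key=lambda name: information_gain(features[name], labels),
    reverse=True,
)
print(ranked)
```

Features whose gain falls below a chosen threshold would be discarded; the threshold itself is a design choice that the text above does not specify.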
