We investigate whether large language models (LLMs) can disentangle text variables, that is, remove the textual traces of a forbidden variable, a task sometimes known as text distillation and closely related to the literatures on fairness in AI and causal inference. We employ a range of LLM approaches that attempt to identify and remove information about a target variable while preserving other relevant signals. We show that in the strong test of removing sentiment, the statistical association between the processed text and sentiment remains detectable by machine learning classifiers after LLM disentanglement. Furthermore, we find that human annotators also struggle to disentangle sentiment while preserving other semantic content. This suggests that concept variables may have limited separability in some text contexts, which highlights the limitations of methods that rely on text-level transformations and raises questions about the robustness of disentanglement methods that achieve statistical independence in representation space.