Greenwashing refers to practices by which corporations or governments intentionally mislead the public about their environmental impact. This paper provides a comprehensive, methodologically grounded survey of natural language processing (NLP) approaches for detecting greenwashing in textual data, with a focus on corporate climate communication. Rather than treating greenwashing as a single, monolithic task, we examine the set of NLP problems, collectively referred to as climate NLP tasks, that researchers have used to approximate it, ranging from climate topic detection to the identification of deceptive communication patterns. We focus on the methodological foundations of these approaches: how tasks are formulated, how datasets are constructed, and how model evaluation affects the reliability of reported results. Our review reveals a fragmented landscape: several subtasks now achieve near-perfect performance under controlled settings, yet tasks involving ambiguity, subjectivity, or reasoning remain challenging. Crucially, no dataset of verified greenwashing cases currently exists. We argue that advancing automated greenwashing detection requires principled NLP methodologies that combine reliable data annotations with interpretable model design. Future work should leverage third-party judgments, such as verified media reports or regulatory records, to mitigate annotation subjectivity and legal risk, and should adopt decomposed pipelines that support human oversight, traceable reasoning, and efficient model design.