Towards Context-Aware Image Anonymization with Multi-Agent Reasoning

Street-level imagery contains personally identifiable information (PII), some of which is context-dependent. Existing anonymization methods either over-process images or miss subtle identifiers, while API-based solutions compromise data sovereignty. We present CAIAMAR (\underline{C}ontext-\underline{A}ware \underline{I}mage \underline{A}nonymization with \underline{M}ulti-\underline{A}gent \underline{R}easoning), an agentic framework that pairs context-aware PII segmentation with diffusion-based anonymization, combining pre-defined processing for high-confidence cases with multi-agent reasoning for indirect identifiers. Three specialized agents coordinate via round-robin speaker selection in a Plan-Do-Check-Act (PDCA) cycle, enabling large vision-language models to classify PII based on spatial context (private vs. public property) rather than rigid category rules. The agents implement spatially filtered coarse-to-fine detection: a scout-and-zoom strategy identifies candidates, open-vocabulary segmentation processes localized crops, and IoU-based deduplication ($30\%$ threshold) prevents redundant processing. Modality-specific diffusion guidance with appearance decorrelation substantially reduces re-identification (Re-ID) risk. On CUHK03-NP, our method reduces person Re-ID risk by $73\%$ (Rank-1 accuracy, $R1$: $16.9\%$ vs. $62.4\%$ baseline). For image quality preservation on CityScapes, we achieve a KID of $0.001$ and an FID of $9.1$, significantly outperforming existing anonymization methods. The agentic workflow detects indirect PII instances across object categories while preserving downstream semantic segmentation performance. Operating entirely on-premises with open-source models, the framework generates human-interpretable audit trails that support the EU's GDPR transparency requirements and flags failed cases for human review.
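To make the deduplication step concrete, the following is a minimal sketch of IoU-based filtering with a $30\%$ threshold. It is an illustration under stated assumptions, not the framework's actual implementation: the abstract does not say whether IoU is computed on boxes or masks, so axis-aligned boxes `(x1, y1, x2, y2)` are assumed here, and the names `iou` and `deduplicate`, as well as the choice to treat an overlap at or above the threshold as a duplicate, are hypothetical.

```python
from typing import List, Tuple

# Assumption: candidates are axis-aligned boxes (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def deduplicate(
    candidates: List[Box],
    processed: List[Box],
    threshold: float = 0.30,  # the 30% threshold stated in the abstract
) -> List[Box]:
    """Keep candidates that do not overlap an already seen region.

    Assumption: a candidate counts as a duplicate when its IoU with any
    processed or previously kept box is at or above the threshold.
    """
    kept: List[Box] = []
    for box in candidates:
        seen = processed + kept
        if all(iou(box, other) < threshold for other in seen):
            kept.append(box)
    return kept
```

For example, with one processed region `(0, 0, 10, 10)`, a candidate `(5, 0, 15, 10)` has an IoU of $50/150 \approx 0.33$ and is dropped, whereas `(8, 0, 18, 10)` has an IoU of $20/180 \approx 0.11$ and is kept for further processing.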
