Reliable, scalable detection of informal, small-scale environmental-health hazards remains an unsolved problem. Examples include used lead-acid battery (ULAB) recycling, household-scale e-waste burning, indoor mercury amalgamation, brick kilns, and small tanneries. These operations are invisible to satellites and absent from formal registries, yet they disproportionately harm low-income populations in low- and middle-income countries. This paper articulates the problem class and explores one possible response: contextual geospatial features, with case-specific feature design informed by domain expertise. We use ULAB recycling as a demonstration case, drawing on 164 verified sites in Bangladesh and India from Pure Earth's Toxic Sites Identification Programme. At this sample size, five-fold cross-validation on the training set cannot statistically distinguish the engineered contextual features from a simple two-feature socio-demographic baseline. The added value becomes visible only when we evaluate outside the training set. On 172 held-out informal-recycling sites in Bangladesh and in India outside the National Capital Region (NCR), the model assigns scores several times higher than it assigns to matched random urban controls. On an independent set of 131 regulatory-confirmed formal recyclers, informal sites score materially higher than formal ones in non-NCR India, which indicates that the model is picking up structure specific to informal recyclers rather than a generic industrial signal. We frame these results as exploratory rather than confirmatory: label sparsity, gaps in point-of-interest coverage, and untested transfer beyond South Asia all remain open issues. We close with seven open problems and invite the environmental-health and geospatial machine-learning communities to engage with informal-hazard detection as a class of problems worth solving.