Reliable spatial analysis in GIScience requires preserving coordinate semantics, topology, units, and geographic plausibility. Current LLM-based GIS systems generate fluent scripts but rarely enforce these geographic rules at scale. We present GeoContra, a verification and repair framework for LLM-driven Python GIS workflows. It represents each task as an executable geospatial contract comprising the natural-language question, schemas, CRS metadata, expected outputs, spatial predicates, topology, metrics, required operations, and forbidden shortcuts. Generated programs undergo static rule inspection, runtime validation, and semantic verification, with violations fed back into a bounded repair loop. Evaluated on 7,079 real geospatial tasks across 15 Boston-area zones, 9 task families, and 11 open-source models (600 runs each), GeoContra improves spatial correctness on closed models from 47.6% to 77.5% for DeepSeek-V4 and from 57.7% to 81.5% for Kimi-K2.5. Across 11 open models, average correctness rises by 26.6%. GeoContra turns fluent code production into verifiable spatial analysis, catching negative travel times, CRS/field-schema violations, missing predicates, and brittle output casts that otherwise yield executable but geographically invalid results.
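The contract-then-verify-then-repair flow described above can be illustrated with a minimal sketch. This is not GeoContra's implementation: the names `GeoContract`, `static_check`, `runtime_check`, and `verify_and_repair`, the result dictionary layout, the `travel_time_min` field, and the example CRS `EPSG:26986` are all assumptions made for illustration, and the contract is reduced to a handful of the fields the abstract lists. The semantic stage is folded into the runtime check here as a single plausibility rule (no negative travel times).

```python
import ast
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class GeoContract:
    """Hypothetical, heavily reduced stand-in for a geospatial contract."""
    question: str
    expected_crs: str            # assumed example: "EPSG:26986"
    required_fields: List[str]   # output schema fields
    required_calls: List[str]    # operations the program must call
    forbidden_calls: List[str]   # shortcuts the program must not call


def static_check(code: str, contract: GeoContract) -> List[str]:
    """Inspect the program's AST without running it."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    called = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):
                called.add(func.attr)
            elif isinstance(func, ast.Name):
                called.add(func.id)
    violations = [f"missing required operation: {op}"
                  for op in contract.required_calls if op not in called]
    violations += [f"forbidden shortcut used: {op}"
                   for op in contract.forbidden_calls if op in called]
    return violations


def runtime_check(result: Dict, contract: GeoContract) -> List[str]:
    """Validate an execution result (assumed to be a plain dict)."""
    violations = []
    if result.get("crs") != contract.expected_crs:
        violations.append(f"CRS is {result.get('crs')}, "
                          f"expected {contract.expected_crs}")
    for name in contract.required_fields:
        if name not in result.get("fields", []):
            violations.append(f"missing output field: {name}")
    # Plausibility rule standing in for semantic verification.
    for row in result.get("rows", []):
        value = row.get("travel_time_min")
        if isinstance(value, (int, float)) and value < 0:
            violations.append(f"negative travel time: {value}")
    return violations


def verify_and_repair(
    generate: Callable[[GeoContract, List[str]], str],
    execute: Callable[[str], Dict],
    contract: GeoContract,
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Bounded loop: violations from one round become feedback for the next."""
    feedback: List[str] = []
    for attempt in range(1, max_rounds + 1):
        code = generate(contract, feedback)
        feedback = static_check(code, contract)
        if not feedback:
            feedback = runtime_check(execute(code), contract)
        if not feedback:
            return code, attempt
    return None, max_rounds
```

In this sketch `generate` stands for the LLM call and `execute` for a sandboxed runner; both are injected as plain callables so the loop can be exercised with stubs. The loop stops after `max_rounds` attempts, returning `None` rather than an unverified program.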