Advances in sensor networks have enabled real-time stream discharge monitoring, yet persistent sensor malfunctions limit data utility. Manual quality control by expert hydrologists cannot scale with networks generating millions of measurements annually. We introduce HydroGEM, a foundation model for continental-scale streamflow quality control designed to support human expertise. HydroGEM uses self-supervised pretraining on 6.03 million clean sequences from 3,724 USGS stations to learn general hydrological representations, followed by fine-tuning with synthetic anomalies for detection and reconstruction. A hybrid TCN-Transformer architecture (14.2M parameters) captures both local and long-range temporal dependencies, while hierarchical normalization handles six orders of magnitude in discharge. On held-out observations from 799 stations with 18 synthetic anomaly types grounded in USGS standards, HydroGEM achieves F1=0.792 for detection and 68.7% reconstruction error reduction, outperforming the strongest baseline by 36.3%. For cross-national validation on 100 Environment and Climate Change Canada stations using tolerant evaluation with a ±24-hour buffer, HydroGEM achieves Tolerant F1=0.70 with 90.1% segment-level event detection, demonstrating cross-national generalization. The model maintains consistent detection across correction magnitudes and aligns with operational seasonal patterns, with peak flagging during winter ice-affected periods matching hydrologists' correction behavior. Architectural separation between simplified training anomalies and complex test anomalies confirms that performance reflects learned hydrometric principles rather than pattern memorization.
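As an illustration of the tolerant evaluation mentioned above, the following minimal sketch computes a buffer-tolerant F1 score over per-timestep anomaly flags. It is not the authors' implementation: the function names `dilate` and `tolerant_f1`, the point-wise matching rule, and the assumption of evenly spaced samples (so that a 24-hour buffer becomes `buffer=24` steps on hourly data) are all hypothetical choices made for this example.

```python
def dilate(mask, buffer):
    """Return a mask that is True within `buffer` steps of any flagged index."""
    n = len(mask)
    out = [False] * n
    for i, flagged in enumerate(mask):
        if flagged:
            for j in range(max(0, i - buffer), min(n, i + buffer + 1)):
                out[j] = True
    return out


def tolerant_f1(y_true, y_pred, buffer):
    """Buffer-tolerant F1 (assumed definition, not taken from the paper).

    A predicted flag counts as correct if a true anomaly lies within
    `buffer` steps of it; a true anomaly counts as detected if a
    predicted flag lies within `buffer` steps of it.
    """
    near_true = dilate(y_true, buffer)
    near_pred = dilate(y_pred, buffer)

    n_pred = sum(1 for p in y_pred if p)
    n_true = sum(1 for t in y_true if t)
    matched_pred = sum(1 for p, t in zip(y_pred, near_true) if p and t)
    matched_true = sum(1 for t, p in zip(y_true, near_pred) if t and p)

    precision = matched_pred / n_pred if n_pred else 0.0
    recall = matched_true / n_true if n_true else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy series: true anomaly at steps 2-3, predicted flags at steps 4 and 7.
y_true = [0, 0, 1, 1, 0, 0, 0, 0]
y_pred = [0, 0, 0, 0, 1, 0, 0, 1]
strict = tolerant_f1(y_true, y_pred, buffer=0)
tolerant = tolerant_f1(y_true, y_pred, buffer=1)
```

With no buffer, none of the toy predictions overlap the true anomaly, so the score is zero; a one-step buffer credits the flag at step 4 and the true point at step 3. A buffer of this kind rewards flags that land near an expert correction even when the exact onset and end times differ.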