Intelligent systems powered by large-scale sensor networks are shifting from predefined monitoring to intent-driven operation, exposing a critical Semantic-to-Physical Mapping Gap. While large language models (LLMs) excel at semantic understanding, existing perception-centric pipelines operate retrospectively and overlook the fundamental decision of what to sense and when. We formalize this proactive decision as Semantic-Spatial Sensor Scheduling (S3) and demonstrate that direct LLM planning is unreliable because of inherent gaps in representation, reasoning, and optimization. To bridge these gaps, we introduce the Spatial Trajectory Graph (STG), a neuro-symbolic paradigm governed by a verify-before-commit discipline that transforms open-ended planning into a verifiable graph optimization problem. Building on STG, we implement IoT-Brain, a concrete system embodiment, and construct TopoSense-Bench, a campus-scale benchmark with 5,250 natural-language queries across 2,510 cameras. Evaluations show that IoT-Brain improves task success rate by 37.6% over the strongest search-intensive methods while running nearly 2 times faster and using 6.6 times fewer prompt tokens. In real-world deployment, it approaches the reliability upper bound while reducing network bandwidth by a factor of 4.1, providing a foundational framework for LLMs to interact with the physical world reliably and efficiently.