Unmanned Aerial Vehicles (UAVs) in disaster response require complex, queryable intelligence that on-board CNNs cannot provide. While Vision-Language Models (VLMs) offer this semantic reasoning, their high resource demands make on-device deployment infeasible, and naive cloud offloading fails under the low-bandwidth networks common in disaster zones. We present AVERY, a framework that enables VLM deployment through adaptive split computing. We advance the split computing paradigm beyond traditional depth-wise partitioning by introducing a functional, cognitive-inspired dual-stream split that separates the VLM into a high-frequency, low-resolution "context stream" for real-time awareness and a low-frequency, high-fidelity "insight stream" for deep analysis. A lightweight, self-aware on-board controller manages this architecture, monitoring network conditions and operator intent to dynamically select from pre-trained compression models, navigating the fundamental accuracy-throughput trade-off. Evaluated with the LISA-7B VLM in an edge-cloud scenario under fluctuating network conditions, AVERY consistently outperforms static configurations, achieving 11.2% higher accuracy than raw image compression and 93.98% lower energy consumption than full-edge execution. It thereby enhances mission efficiency and enables real-time, queryable intelligence on resource-constrained platforms in dynamic environments.
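To make the controller's role concrete, the selection step described above can be sketched as a simple bandwidth-aware policy: given the measured link rate and a per-frame deadline, pick the highest-accuracy pre-trained compression model whose transmission time still fits. This is a minimal illustration only; the model names, operating points, and accuracy numbers below are invented for the sketch, not taken from AVERY.

```python
# Hypothetical sketch of an AVERY-style on-board controller policy.
# All model names and numbers are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressionModel:
    name: str
    bits_per_pixel: float  # effective compressed size per pixel
    accuracy: float        # expected downstream VLM task accuracy (made up)


# Illustrative pre-trained compression operating points.
MODELS = [
    CompressionModel("heavy",  0.05, 0.71),
    CompressionModel("medium", 0.20, 0.78),
    CompressionModel("light",  0.80, 0.84),
]


def select_model(bandwidth_mbps: float, frame_pixels: int,
                 deadline_s: float) -> CompressionModel:
    """Pick the highest-accuracy model whose transmit time meets the deadline."""
    feasible = []
    for m in MODELS:
        tx_bits = m.bits_per_pixel * frame_pixels
        tx_time_s = tx_bits / (bandwidth_mbps * 1e6)
        if tx_time_s <= deadline_s:
            feasible.append(m)
    if not feasible:
        # Degrade gracefully: fall back to the smallest representation.
        return min(MODELS, key=lambda m: m.bits_per_pixel)
    return max(feasible, key=lambda m: m.accuracy)
```

For a 1080p frame, a 2 Mbps link with a 1 s deadline admits the "light" model, while dropping to 0.5 Mbps forces the policy down to "medium"; this is the accuracy-throughput trade-off the paper's controller navigates, here reduced to a one-line feasibility check.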