Extreme Edge Computing (XEC) distributes streaming workloads across consumer-owned devices, exploiting their proximity to users and ubiquitous availability. Many such workloads are AI-driven, requiring continuous neural network inference for tasks like object detection and video analytics. Distributed Inference (DI), which partitions model execution across multiple edge devices, enables these streaming services to meet strict throughput and latency requirements. Yet consumer devices exhibit volatile computational availability due to competing applications and unpredictable usage patterns. This volatility poses a fundamental challenge: how can we quantify the probability that a device, or an ensemble of devices, will maintain the processing rate required by a streaming service? This paper presents an analytical framework for computational reliability in XEC, defined as the probability that instantaneous capacity meets demand at a specified Quality of Service (QoS) threshold. We derive closed-form reliability expressions under two information regimes: Minimal Information (MI), which requires only declared operational bounds, and Historical Data, which refines the estimates via Maximum Likelihood Estimation on past observations. The framework extends to multi-device deployments, providing reliability expressions for series, parallel, and partitioned workload configurations. We derive optimal workload allocation rules and analytical bounds for device selection, equipping orchestrators with tractable tools to evaluate deployment feasibility and configure distributed streaming systems. We validate the framework using real-time object detection with the YOLO11m model as a representative DI streaming workload; experiments on emulated XED environments demonstrate close agreement between analytical predictions, Monte Carlo sampling, and empirical measurements across diverse capacity and demand configurations.
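To make the reliability definition concrete, the following minimal sketch computes the probability that capacity meets demand for a single device and cross-checks it by Monte Carlo sampling. It is illustrative only: modelling capacity as uniform between declared bounds is an assumption made here, not necessarily the distribution the paper uses for the MI regime. The series and parallel combinations are the textbook formulas for independent components, and all function names and numeric values (capacities in frames per second) are hypothetical.

```python
import random


def uniform_reliability(c_min, c_max, demand):
    """P(C >= demand) when capacity C is assumed Uniform(c_min, c_max)."""
    if demand <= c_min:
        return 1.0
    if demand >= c_max:
        return 0.0
    return (c_max - demand) / (c_max - c_min)


def monte_carlo_reliability(c_min, c_max, demand, n=100_000, seed=0):
    """Estimate P(C >= demand) by sampling capacity n times."""
    rng = random.Random(seed)
    hits = sum(rng.uniform(c_min, c_max) >= demand for _ in range(n))
    return hits / n


def series_reliability(reliabilities):
    """All devices must meet demand (independence assumed)."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result


def parallel_reliability(reliabilities):
    """At least one device must meet demand (independence assumed)."""
    all_fail = 1.0
    for r in reliabilities:
        all_fail *= 1.0 - r
    return 1.0 - all_fail


if __name__ == "__main__":
    # Hypothetical device: capacity between 10 and 30 fps, demand of 25 fps.
    analytic = uniform_reliability(10.0, 30.0, 25.0)
    sampled = monte_carlo_reliability(10.0, 30.0, 25.0)
    print(f"analytic={analytic:.3f} sampled={sampled:.3f}")
    print(f"series={series_reliability([0.9, 0.8]):.2f}")
    print(f"parallel={parallel_reliability([0.9, 0.8]):.2f}")
```

Under these assumptions the sampled estimate should land close to the analytic value, which mirrors on a small scale the agreement between analytical predictions and Monte Carlo sampling that the abstract reports.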