Emerging deployments of generative AI increasingly execute inference across decentralized, heterogeneous edge devices rather than on a single trusted server. In such environments, the failure or misbehavior of a single device can disrupt the entire inference process, making traditional best-effort peer-to-peer routing insufficient. Coordinating distributed generative inference therefore requires mechanisms that explicitly account for reliability, performance variability, and trust among participating peers. In this paper, we present G-TRAC, a trust-aware coordination framework that integrates algorithmic path selection with system-level protocol design to ensure robust distributed inference. First, we formulate the routing problem as a \textit{Risk-Bounded Shortest Path} computation and introduce a polynomial-time solution that combines trust-floor pruning with Dijkstra's search, achieving sub-millisecond median routing latency at practical edge scales and remaining below 10 ms at larger scales. Second, to support the routing logic operationally in dynamic environments, the framework employs a \textit{Hybrid Trust Architecture} that maintains global reputation state at stable anchors while disseminating lightweight updates to edge peers via background synchronization. Experimental evaluation on a heterogeneous testbed of commodity devices demonstrates that G-TRAC significantly improves inference completion rates, effectively isolates unreliable peers, and sustains robust execution even under node failures and network partitions.
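To make the routing idea concrete, the following is a minimal sketch of trust-floor pruning followed by Dijkstra's search. It is an illustration under stated assumptions, not G-TRAC's actual implementation: the abstract does not specify the graph model, so we assume each peer carries a scalar trust score in $[0, 1]$, each directed link carries a latency cost, and the risk bound is enforced by discarding every peer below a threshold. The names `risk_bounded_shortest_path`, `trust`, `edges`, and `trust_floor` are hypothetical.

```python
import heapq


def risk_bounded_shortest_path(edges, trust, source, target, trust_floor):
    """Return (total_latency, path), or None if no path clears the trust floor.

    Assumed model (not taken from the paper): `edges` is a list of
    (u, v, latency) directed links and `trust` maps each peer to a score.
    """
    # Trust-floor pruning: keep only peers whose trust meets the floor.
    allowed = {node for node, score in trust.items() if score >= trust_floor}
    if source not in allowed or target not in allowed:
        return None

    # Build the adjacency list over the surviving peers only.
    adjacency = {node: [] for node in allowed}
    for u, v, latency in edges:
        if u in allowed and v in allowed:
            adjacency[u].append((v, latency))

    # Standard Dijkstra search over the pruned graph.
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u == target:
            break
        for v, latency in adjacency[u]:
            candidate = d + latency
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))

    if target not in dist:
        return None

    # Walk the predecessor links back from the target to recover the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


# Hypothetical four-peer example: B is fast but poorly trusted.
trust = {"A": 0.9, "B": 0.4, "C": 0.8, "D": 0.95}
edges = [("A", "B", 1.0), ("B", "D", 1.0), ("A", "C", 2.0), ("C", "D", 2.0)]

print(risk_bounded_shortest_path(edges, trust, "A", "D", trust_floor=0.5))
print(risk_bounded_shortest_path(edges, trust, "A", "D", trust_floor=0.3))
```

With a floor of 0.5 the low-trust peer `B` is pruned and the route falls back to the slower path through `C`; lowering the floor to 0.3 readmits `B` and the faster route is selected. Pruning is a single pass over peers and links, so under these assumptions the overall cost remains that of Dijkstra's search on the pruned graph, which is consistent with the polynomial-time claim.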