Adaptive and Robust Cost-Aware Proof of Quality for Decentralized LLM Inference Networks

Decentralized large language model (LLM) inference networks require lightweight mechanisms to reward high-quality outputs under heterogeneous latency and cost. Proof of Quality provides scalable verification by sampling evaluator nodes that score candidate outputs, then aggregating their scores into a consensus signal that determines rewards. However, evaluator heterogeneity and malicious score manipulation can distort consensus and inflate payouts, which weakens incentive alignment in open-participation settings. This paper extends a cost-aware Proof of Quality mechanism with adversary-resilient consensus formation. We study robust aggregation rules, including median and trimmed mean, and an adaptive trust-weighted consensus that updates evaluator weights from deviation signals. Using question answering and summarization workloads with a ground-truth proxy for offline analysis, we quantify evaluator reliability and find substantial variation across evaluators, including task-dependent misalignment that can invert correlations. We then evaluate robustness under four adversarial strategies (noise injection, boosting, sabotage, and intermittent manipulation) across a sweep of malicious ratios and evaluator sample sizes. Our results show that robust aggregation improves consensus alignment with the ground-truth proxy and reduces sensitivity to noisy and strategic attacks compared with simple averaging. We further characterize the operational trade-off introduced by evaluator sampling: larger evaluator sets reduce evaluator rewards and increase payoff variance, while inference rewards remain relatively stable in our configuration. These findings motivate robust consensus as a default component of cost-aware Proof of Quality and provide practical guidance for selecting evaluator sampling parameters under adversarial risk and resource constraints.
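
To make the aggregation rules named above concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes evaluator scores lie in [0, 1]. The function names, the trim fraction, and the multiplicative weight update (with its `learning_rate` and `floor` parameters) are illustrative assumptions; the abstract states only that weights are updated from deviation signals and does not specify the rule.

```python
from statistics import mean, median


def trimmed_mean(scores, trim_fraction=0.2):
    """Mean after dropping the lowest and highest trim_fraction of scores."""
    ordered = sorted(scores)
    k = int(len(ordered) * trim_fraction)
    # Fall back to the plain mean if trimming would remove every score.
    kept = ordered[k:len(ordered) - k] if len(ordered) > 2 * k else ordered
    return mean(kept)


def trust_weighted_consensus(scores, weights):
    """Weighted average of scores using per-evaluator trust weights."""
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total


def update_weights(scores, weights, reference, learning_rate=0.5, floor=0.01):
    """Shrink each weight in proportion to the evaluator's deviation.

    Hypothetical update rule: the deviation is |score - reference|, capped
    at 1.0, and weights are renormalized to sum to 1. The floor keeps any
    evaluator from being silenced permanently.
    """
    updated = [
        max(floor, w * (1.0 - learning_rate * min(1.0, abs(s - reference))))
        for s, w in zip(scores, weights)
    ]
    total = sum(updated)
    return [w / total for w in updated]


# Four honest evaluators and one "boosting" adversary reporting 1.0.
scores = [0.70, 0.72, 0.68, 0.71, 1.00]
weights = [1.0 / len(scores)] * len(scores)

simple_average = mean(scores)
robust_median = median(scores)
robust_trimmed = trimmed_mean(scores, trim_fraction=0.2)

# Use the median as the reference for the deviation signal (an assumption).
weights = update_weights(scores, weights, reference=robust_median)
trust_consensus = trust_weighted_consensus(scores, weights)
```

In this toy round the boosted score pulls the simple average upward, while the median and trimmed mean stay near the honest scores. The trust update gives the adversary the smallest weight, so its influence would shrink further over repeated rounds.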
