A Trust-Aware and Cost-Optimized Blockchain Oracle Selection Model with Deep Reinforcement Learning

The rapid development of blockchain technology has driven the widespread adoption of decentralized applications (DApps) across many fields. However, DApps cannot access external data directly and must rely on oracles to interact with off-chain data. As the bridge between the blockchain and external data sources, oracles may behave maliciously and inject incorrect or harmful data, which creates trust and security problems. In addition, as data requests surge, the disparity among oracles in trustworthiness and cost has widened, making the dynamic selection of the most suitable oracle for each request a critical challenge. To address these issues, this paper proposes a Trust-Aware and Cost-Optimized Blockchain Oracle Selection Model with Deep Reinforcement Learning (TCO-DRL). The model incorporates a comprehensive trust management mechanism that evaluates oracle reputation along multiple dimensions and employs an improved sliding time window to monitor reputation changes in real time, which strengthens resistance to malicious attacks. TCO-DRL also uses deep reinforcement learning to adapt dynamically to fluctuations in oracle reputation, selecting high-reputation oracles while optimizing node selection and thereby reducing costs without compromising data quality. We implemented and validated TCO-DRL on Ethereum. Experimental results show that, compared to existing methods, TCO-DRL reduces the allocation rate to malicious oracles by more than 39.10% and saves over 12.00% in costs. Simulated experiments on various malicious attacks further validate the robustness and effectiveness of TCO-DRL.
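To make the idea of sliding-window reputation tracking combined with cost-aware selection more concrete, the following is a minimal Python sketch. It is not the TCO-DRL implementation: the abstract does not specify the reputation formula, the window design, or the learning algorithm, so every name here (`Oracle`, `record`, `reputation`, `select_oracle`, `min_reputation`) and the linear recency weighting are assumptions made for illustration. In particular, a simple threshold-then-cheapest rule stands in for the deep reinforcement learning policy.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Oracle:
    """Hypothetical oracle record; fields are illustrative, not from the paper."""
    name: str
    cost: float                      # fee per request, in an arbitrary unit
    window: int = 5                  # number of recent outcomes retained
    outcomes: deque = field(default_factory=deque)

    def record(self, ok: bool) -> None:
        """Store one request outcome and evict outcomes older than the window."""
        self.outcomes.append(1.0 if ok else 0.0)
        while len(self.outcomes) > self.window:
            self.outcomes.popleft()

    def reputation(self) -> float:
        """Recency-weighted success rate in [0, 1]; 0.5 when there is no history."""
        if not self.outcomes:
            return 0.5
        # Assumed weighting: the newest outcome gets the largest linear weight.
        weights = range(1, len(self.outcomes) + 1)
        total = sum(w * o for w, o in zip(weights, self.outcomes))
        return total / sum(weights)


def select_oracle(oracles, min_reputation=0.6):
    """Greedy stand-in for a learned policy.

    Returns the cheapest oracle whose reputation meets the threshold,
    or the highest-reputation oracle if none qualifies.
    """
    trusted = [o for o in oracles if o.reputation() >= min_reputation]
    if trusted:
        return min(trusted, key=lambda o: o.cost)
    return max(oracles, key=lambda o: o.reputation())


if __name__ == "__main__":
    cheap = Oracle("cheap", cost=1.0)
    steady = Oracle("steady", cost=2.0)
    for ok in [True, True, False, False, False]:   # recent failures
        cheap.record(ok)
    for ok in [True] * 5:
        steady.record(ok)
    print(select_oracle([cheap, steady]).name)
```

Because recent outcomes carry more weight, an oracle that starts misbehaving loses reputation quickly even if its earlier history was clean, which is the kind of behavior a sliding window is meant to capture. A learned policy would replace the fixed threshold with a decision that adapts to observed reputation and cost.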
