In data-driven decision-making across marketing, healthcare, and education, it is desirable to leverage the abundant data from existing ventures to navigate high-dimensional feature spaces and mitigate data scarcity in new ventures. We study knowledge transfer in dynamic decision-making, focusing on batch stationary environments and formally defining task discrepancies through the lens of Markov decision processes (MDPs). We propose a Transferred Fitted $Q$-Iteration framework with general function approximation, which directly estimates the optimal state-action value function $Q^*$ from both target and source data. Under sieve approximation, we characterize the relationship between statistical performance and MDP task discrepancy, shedding light on how the source and target sample sizes and the task discrepancy affect the effectiveness of knowledge transfer. We show, both theoretically and empirically, that the final learning error of $Q^*$ improves significantly over the single-task rate.
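To make the core idea concrete, the following is a minimal sketch of fitted $Q$-iteration that naively pools source and target transitions before each regression step; it is not the paper's algorithm (which handles task discrepancy and general function approximation), and all names, the linear feature map, and the pooling rule are illustrative assumptions.

```python
import numpy as np

def transferred_fqi(source, target, n_actions, gamma=0.9, n_iters=50):
    """Illustrative sketch: fitted Q-iteration on pooled source + target data.

    Each transition is (state_features, action, reward, next_state_features).
    Q(s, a) is approximated linearly as [s, a] @ theta and refit by least
    squares against Bellman targets each round. The plain concatenation below
    is a stand-in for a principled transfer scheme that accounts for the
    discrepancy between the source and target MDPs.
    """
    data = list(source) + list(target)  # naive pooling (assumption)
    X = np.array([np.append(s, a) for s, a, _, _ in data], dtype=float)
    theta = np.zeros(X.shape[1])

    def q(s, a, th):
        return float(np.append(s, a) @ th)

    for _ in range(n_iters):
        # Bellman targets: r + gamma * max_a' Q(s', a') under the current fit.
        y = np.array([r + gamma * max(q(s2, b, theta) for b in range(n_actions))
                      for _, _, r, s2 in data])
        theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta
```

A usage example on a toy one-dimensional problem: `transferred_fqi(source, target, n_actions=2)` returns the fitted weight vector, whose dimension equals the state-feature dimension plus one (for the action coordinate).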