Estimating the complete distribution of a random variable is a useful primitive for both manual and automated decision-making. This problem has received extensive attention in the i.i.d. setting, but the setting with arbitrary data dependence remains largely unaddressed. Consistent with known impossibility results, we present computationally felicitous time-uniform and value-uniform bounds on the CDF of the running averaged conditional distribution of a real-valued random variable; these bounds are always valid and sometimes trivial. We also provide an instance-dependent convergence guarantee. An importance-weighted extension of the bounds is appropriate for estimating complete counterfactual distributions of rewards from the data exhaust of controlled experimentation, e.g., an A/B test or a contextual bandit.
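To make the estimated quantity concrete, the following sketch computes the plug-in point estimate at each time t and each grid value v. It assumes the running averaged conditional CDF is estimated by (1/t) * sum over s <= t of w_s * 1{x_s <= v}. With all weights equal to 1 (on-policy data), this is the ordinary running empirical CDF. In the importance-weighted case, w_s is assumed to be a caller-supplied ratio of target to logging probabilities for the action taken at step s. This is only the center that confidence bounds would surround. It does not reproduce the always-valid, time-uniform bound construction described above. The function name `running_weighted_cdf` and its parameters are hypothetical.

```python
import numpy as np


def running_weighted_cdf(values, grid, weights=None):
    """Hypothetical helper: running (importance-weighted) empirical CDF.

    Row t-1 of the returned array holds
        (1/t) * sum_{s<=t} w_s * 1{x_s <= v}
    for every v in `grid`. With weights=None every w_s is 1 (on-policy case).
    Importance-weighted estimates are unbiased but not constrained to [0, 1].
    """
    x = np.asarray(values, dtype=float)
    v = np.asarray(grid, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    # ind[s, j] = 1 if observation s is at most grid value j, else 0
    ind = (x[:, None] <= v[None, :]).astype(float)
    # cumulative weighted counts over time, one column per grid value
    cum = np.cumsum(w[:, None] * ind, axis=0)
    # divide row t-1 by t to get the running average
    t = np.arange(1, len(x) + 1)[:, None]
    return cum / t


# Usage: three rewards, CDF evaluated at v = 0.5 and v = 1.0.
est = running_weighted_cdf([0.2, 0.8, 0.5], [0.5, 1.0])
# est[-1] is approximately [0.667, 1.0]
```

With non-uniform weights, individual entries can exceed 1 in finite samples. This is one reason that bounds on this quantity, rather than the raw estimate, are what a decision procedure would consume.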