Although existing proposals address worker uncertainty with the standard Combinatorial Multi-Armed Bandit (CMAB) framework through a trade-off between exploration and exploitation, the budget may be insufficient to carry out that trade-off over individual workers, especially when the number of workers is huge and the budget is limited. Moreover, standard CMAB usually assumes that workers always stay in the system, whereas in practice workers may join or leave over time, so that what has been learnt about an individual worker can no longer be applied once that worker leaves. To address these challenges, we first propose an off-line Context-Aware CMAB-based Incentive (CACI) mechanism. Our innovation is to apply the exploration-exploitation trade-off over an elaborately partitioned context space rather than over individual workers, so as to effectively incentivize massive numbers of unknown workers with a very limited budget. We then extend this basic idea to the on-line setting, where unknown workers may join or leave the system dynamically, and propose an on-line version of the CACI mechanism. Specifically, through the exploration-exploitation trade-off in the context space, we learn to estimate the sensing ability of any unknown worker (even one that has never appeared in the system before) from its context information. We perform rigorous theoretical analysis to derive upper bounds on the regret of our CACI mechanisms and to prove their truthfulness and individual rationality. Extensive experiments on both synthetic and real datasets are also conducted to verify the efficacy of our mechanisms.
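The core idea of learning over a partitioned context space rather than over individual workers can be illustrated with a minimal sketch. This is not the CACI mechanism itself: the uniform grid partition, the UCB1-style confidence bonus, the greedy index-per-cost selection, and all names (`ContextCellUCB`, `select_workers`, `cells_per_dim`) are assumptions made for illustration, and the sketch omits payment computation and therefore says nothing about truthfulness or individual rationality.

```python
import math
from collections import defaultdict


class ContextCellUCB:
    """Keeps UCB statistics per context cell rather than per worker (illustrative)."""

    def __init__(self, cells_per_dim):
        self.cells_per_dim = cells_per_dim  # assumed uniform grid resolution
        self.counts = defaultdict(int)      # observations per cell
        self.means = defaultdict(float)     # running mean quality per cell
        self.total = 0                      # observations across all cells

    def cell_of(self, context):
        """Map a context vector in [0, 1]^d to the index of its grid cell."""
        m = self.cells_per_dim
        return tuple(min(int(x * m), m - 1) for x in context)

    def index(self, context):
        """Optimistic quality estimate for any worker with this context."""
        cell = self.cell_of(context)
        n = self.counts[cell]
        if n == 0:
            return float("inf")  # unexplored cell: explore it first
        bonus = math.sqrt(2.0 * math.log(max(self.total, 2)) / n)
        return self.means[cell] + bonus

    def update(self, context, quality):
        """Fold one observed sensing quality into the worker's cell."""
        cell = self.cell_of(context)
        self.counts[cell] += 1
        self.total += 1
        self.means[cell] += (quality - self.means[cell]) / self.counts[cell]


def select_workers(learner, workers, budget):
    """Greedily pick workers by index per unit cost until the budget runs out.

    workers: list of (worker_id, context, cost) tuples with cost > 0.
    """
    ranked = sorted(workers, key=lambda w: learner.index(w[1]) / w[2], reverse=True)
    chosen, spent = [], 0.0
    for worker_id, _context, cost in ranked:
        if spent + cost <= budget:
            chosen.append(worker_id)
            spent += cost
    return chosen
```

Because statistics are attached to cells, a worker who has never appeared before immediately inherits the estimate of the cell its context falls into, and the number of quantities to explore is the number of cells rather than the number of workers.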