Predictive modeling has the potential to enhance human decision-making. However, many predictive models fail in practice because the problem is poorly formulated: when the prediction target is an abstract concept or construct, practitioners must define a target variable that serves as a proxy to operationalize the construct of interest. The choice of an appropriate proxy target variable is rarely self-evident and requires both domain knowledge and iterative data modeling. This process is inherently collaborative, involving both domain experts and data scientists. In this work, we explore how human-machine teaming can support this process by accelerating iterations while preserving human judgment. We study the impact of two human-machine teaming strategies on proxy construction: 1) relevance-first, in which humans lead the process by selecting relevant proxies, and 2) performance-first, in which machines lead the process by recommending proxies based on predictive performance. In a controlled user study of a proxy construction task (N = 20), we show that the performance-first strategy facilitated faster iterations and decision-making, but also biased users towards well-performing proxies that are misaligned with the application goal. Our study highlights the opportunities and risks of human-machine teaming in operationalizing machine learning target variables and yields insights for future research on realizing the former and mitigating the latter.