Addressing bias in online selection with limited budget of comparisons

Consider a hiring process with candidates coming from different universities. It is easy to order candidates who have the exact same background, yet it can be challenging to compare candidates otherwise. The latter case requires additional assessments, leading to a potentially high total cost for the hiring organization. Given an assigned budget, what is the optimal strategy to select the most qualified candidate? In the absence of additional information, we model this problem by introducing a new variant of the secretary problem. Totally ordered candidates, belonging to distinct groups, arrive sequentially. The decision-maker has access to the partial order of the candidates within their own group and can request access to the total order of the observed candidates by paying some price. Given a bounded budget of comparisons, the goal of the decision-maker is to maximize the probability of selecting the best candidate. We consider the special case of two groups with stochastic i.i.d.\ group membership. We introduce and analyze a particular family of algorithms, which we call the Dynamic Double Threshold (DDT) family, and derive its asymptotic success probability which, given an optimal choice of parameters, converges rapidly to the theoretical upper bound of $1/e$ as the comparison budget grows. We provide an optimal non-asymptotic memoryless algorithm for the above problem and give numerical evidence that it belongs to the DDT family when the number of candidates is large. We compare, theoretically and numerically, the optimal algorithm with a more naive approach that is directly inspired by the standard single-threshold secretary algorithm. Our analysis reveals several appealing properties of the optimal algorithm. It provides a step towards a fairer online selection process in the presence of unidentifiable biases.
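
For reference, the standard single-threshold secretary rule that the naive approach is inspired by can be sketched as follows. This is only the classical single-group baseline with no groups and no comparison budget; it is not the DDT family or the optimal algorithm of this work. The function names, the number of candidates `n = 100`, and the threshold `r = 37` (roughly `n/e`) are illustrative assumptions.

```python
import math
import random


def classical_success_probability(n: int, r: int) -> float:
    """Exact success probability of the classical rule: skip the first r
    candidates, then accept the first one better than everything seen so far."""
    if r == 0:
        # With no observation phase, the rule accepts the first candidate.
        return 1.0 / n
    # The best candidate sits at position i with probability 1/n, and is reached
    # only if the best of the first i-1 candidates lies among the first r.
    return (r / n) * sum(1.0 / (i - 1) for i in range(r + 1, n + 1))


def simulate_single_threshold(n: int, r: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same success probability."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        values = list(range(n))  # higher is better; n - 1 is the best candidate
        rng.shuffle(values)      # uniformly random arrival order
        best_seen = max(values[:r], default=-1)
        chosen = None
        for v in values[r:]:
            if v > best_seen:    # first candidate beating the observation phase
                chosen = v
                break
        wins += chosen == n - 1
    return wins / trials


if __name__ == "__main__":
    n, r = 100, 37
    print("exact:    ", classical_success_probability(n, r))
    print("simulated:", simulate_single_threshold(n, r, trials=20000))
    print("1/e:      ", 1 / math.e)
```

For `n = 100` and `r = 37`, both the exact value and the simulated estimate should land close to $1/e \approx 0.368$, the upper bound that the DDT success probability approaches as the comparison budget grows.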
