To tweak or not to tweak. How exploiting flexibilities in gene set analysis leads to over-optimism

Gene set analysis, a popular approach for analysing high-throughput gene expression data, aims to identify sets of genes that show enriched expression patterns between two conditions. In addition to the multitude of methods available for this task, users are typically left with many options when creating the required input and specifying the internal parameters of the chosen method. This flexibility can lead to uncertainty about the 'right' choice, further reinforced by a lack of evidence-based guidance. Especially when users have little statistical experience, this uncertainty might entice them to produce preferable results using a 'trial-and-error' approach. While it may seem unproblematic at first glance, this practice can be viewed as a form of 'cherry-picking' and can cause an optimistic bias, rendering the results non-replicable on independent data. This problem has attracted a lot of attention in the context of classical hypothesis testing; we now aim to raise awareness of such over-optimism in the different and more complex context of gene set analysis. We mimic a hypothetical researcher who systematically selects the analysis variants yielding their preferred results, considering three distinct goals they might pursue. Using a selection of popular gene set analysis methods, we tweak the results in this way for two frequently used benchmark gene expression data sets. Our study indicates that the potential for over-optimism is particularly high for a group of methods that are frequently used despite being commonly criticised. We conclude by providing practical recommendations to counter over-optimism in research findings in gene set analysis and beyond.
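The selection mechanism described above can be illustrated with a minimal sketch. It is a toy simulation under strong assumptions and not the procedure used in the study: a single gene set is simulated with no true difference between the two conditions, the six "analysis variants" (two per-sample summaries combined with three gene filters) are hypothetical stand-ins for real input and parameter choices, and a simple normal-approximation test replaces an actual gene set analysis method. All function and variable names are illustrative.

```python
import random
import statistics
from math import erfc, sqrt


def z_test_p(x, y):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    z = (statistics.mean(x) - statistics.mean(y)) / se
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Hypothetical analysis variants: how to summarise a gene set per sample,
# and which genes of the set to keep. These are assumptions for illustration.
SUMMARIES = {"mean": statistics.mean, "median": statistics.median}
GENE_FILTERS = {
    "all": lambda genes: genes,
    "first_half": lambda genes: genes[: len(genes) // 2],
    "second_half": lambda genes: genes[len(genes) // 2:],
}


def variant_p_values(group_a, group_b):
    """Return one p-value per variant; index 0 is the 'default' (mean, all genes)."""
    p_values = []
    for summarise in SUMMARIES.values():
        for keep in GENE_FILTERS.values():
            scores_a = [summarise(keep(sample)) for sample in group_a]
            scores_b = [summarise(keep(sample)) for sample in group_b]
            p_values.append(z_test_p(scores_a, scores_b))
    return p_values


def false_positive_rates(n_sim=300, n_per_group=20, n_genes=30, alpha=0.05, seed=1):
    """Compare a fixed default analysis with picking the smallest p-value."""
    rng = random.Random(seed)
    default_hits = 0
    tweaked_hits = 0
    for _ in range(n_sim):
        # Null data: both conditions are drawn from the same distribution.
        group_a = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_per_group)]
        group_b = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_per_group)]
        p_values = variant_p_values(group_a, group_b)
        default_hits += p_values[0] < alpha   # pre-specified variant
        tweaked_hits += min(p_values) < alpha  # 'trial-and-error' selection
    return default_hits / n_sim, tweaked_hits / n_sim


if __name__ == "__main__":
    default_rate, tweaked_rate = false_positive_rates()
    print(f"default variant: {default_rate:.3f}  best-of-six: {tweaked_rate:.3f}")
```

Because the default variant is always among the candidates, the selected p-value can never be larger than the default one, so the rate of spurious findings under selection is at least as high as under a pre-specified analysis. How much higher it is depends on how strongly the variants are correlated.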
