The log-transform is a common tool in statistical analysis. It is used to reduce the impact of extreme values, to compress the range of reported values for improved visualization, to enable parametric statistical tests that require normally distributed data, or to fit linear models to non-linear data. Practitioners are rarely aware that the log-transform can reverse findings: a hypothesis test on the untransformed data can show a negative trend while the same test on the log-transformed data shows a positive trend, with both results statistically significant. We derive necessary and sufficient conditions for this paradoxical pattern reversal using finite difference notation. We show that biomedical image quantification is highly susceptible to these conditions. Using a novel heuristic that maximizes the reversal, we show that a statistically significant paradoxical pattern reversal can be induced by changing as little as 5% of a dataset. We illustrate how quantifying the sizes of objects in proportional data, especially where object sizes capture underlying creation and destruction dynamics, satisfies the precondition for the paradox. We give recommendations on the proper use of the log-transform, discuss methods to explore the underlying patterns robustly, and emphasize that any transformed result should always be accompanied by its non-transformed equivalent to rule out accidentally confounded findings.
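The reversal described above can be illustrated with a minimal sketch. It compares two groups by their difference in means rather than by a trend test, and it does not compute statistical significance. The data, the names `group_a` and `group_b`, and the helper `mean_difference` are hypothetical and are not taken from the paper. The sketch only shows that the sign of a comparison can flip once the data are log-transformed.

```python
import math
from statistics import mean

# Hypothetical toy data (an assumption for illustration, not from the paper).
group_a = [1.0, 1.0, 1.0, 100.0]    # skewed: three small values, one extreme value
group_b = [10.0, 10.0, 10.0, 10.0]  # constant


def mean_difference(a, b, transform=lambda x: x):
    """Return mean(transform(a)) - mean(transform(b))."""
    return mean([transform(x) for x in a]) - mean([transform(x) for x in b])


# On the raw scale, the single extreme value pulls group A's mean above group B's.
raw_diff = mean_difference(group_a, group_b)

# On the log scale, the extreme value is compressed and the ordering flips.
log_diff = mean_difference(group_a, group_b, transform=math.log)

# True when the two scales disagree on which group is larger.
reversed_sign = (raw_diff > 0) != (log_diff > 0)

print(f"raw difference in means: {raw_diff:+.3f}")
print(f"log difference in means: {log_diff:+.3f}")
print(f"sign reversed: {reversed_sign}")
```

Here the raw difference is positive and the log difference is negative, so reporting only the transformed comparison would suggest the opposite ordering. This is why the untransformed result should be reported alongside it.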