The number of modes in a probability density function reflects the model's complexity and can also be interpreted as the number of underlying subpopulations. Despite its relevance, little research has been devoted to its estimation. Focusing on the univariate setting, we propose a novel approach that targets prediction accuracy and is inspired by some overlooked aspects of the problem. We argue for the need for structure in the solutions, emphasise the subjective and uncertain nature of modes, and advocate a holistic view that blends global and local density properties. Our method builds upon a combination of flexible kernel estimators and parsimonious compositional splines. Feature exploration, model selection and mode testing are implemented within the Bayesian inference paradigm, which provides soft solutions and allows expert judgement to be incorporated into the process. The usefulness of our proposal is illustrated through a case study in sports analytics, showcasing multiple companion visualisation tools. A thorough simulation study demonstrates that traditional modality-driven approaches paradoxically struggle to provide accurate results. In this context, our method emerges as a top-tier alternative offering innovative solutions for analysts.
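Why mode counts are uncertain can be shown with a small example. The count depends on how much smoothing the density estimate applies. The sketch below is not the method proposed here, which combines flexible kernel estimators, compositional splines and Bayesian inference. It is a plain fixed-bandwidth Gaussian kernel density estimate evaluated on a grid, with modes counted as strict local maxima. The function names `gaussian_kde_grid` and `count_modes`, the simulated two-component normal sample and the bandwidth values are all hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_kde_grid(data, bandwidth, grid):
    """Evaluate a fixed-bandwidth Gaussian kernel density estimate on a grid (illustrative only)."""
    data = np.asarray(data, dtype=float)
    # Standardised distances between every grid point and every observation
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels over observations and rescale by the bandwidth
    return kernel.mean(axis=1) / bandwidth

def count_modes(density):
    """Count strict interior local maxima of a density evaluated on a grid."""
    higher_than_left = density[1:-1] > density[:-2]
    higher_than_right = density[1:-1] > density[2:]
    return int(np.sum(higher_than_left & higher_than_right))

# Hypothetical sample: equal mixture of N(-2, 1) and N(2, 1), i.e. two true subpopulations
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(2.0, 1.0, 200)])
grid = np.linspace(-7.0, 7.0, 1001)

# The estimated number of modes changes with the (hypothetical) bandwidth choice
mode_counts = {h: count_modes(gaussian_kde_grid(sample, h, grid)) for h in (0.1, 0.5, 3.0)}
for h, k in mode_counts.items():
    print(f"bandwidth={h}: {k} mode(s)")
```

A very small bandwidth produces many spurious bumps. A very large one merges the two subpopulations into a single mode. For Gaussian kernels the number of modes does not increase as the bandwidth grows. This is one reason a single "correct" mode count is hard to pin down without additional structure or prior judgement.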