We introduce the QuadratiK package, which incorporates innovative data analysis methodologies. The software, implemented in both R and Python, offers a comprehensive set of goodness-of-fit tests and clustering techniques based on kernel-based quadratic distances, thereby bridging the gap between the statistical and machine learning literatures. It implements one-, two-, and k-sample goodness-of-fit tests, providing an efficient and mathematically sound way to assess the fit of probability distributions. The software also supports tests for uniformity on the d-dimensional sphere based on Poisson kernel densities. Particularly noteworthy is a clustering algorithm tailored specifically to spherical data, which uses a mixture of Poisson kernel-based densities on the sphere. In addition, the software includes graphical functions that help users validate, visualize, and represent clustering results, enhancing the interpretability and usability of the analysis. In summary, our R and Python packages form a powerful suite of tools that enables researchers and practitioners to explore their data in depth, draw robust inferences, and conduct potentially impactful analyses across a wide range of disciplines.
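To make the idea of a Poisson kernel-based uniformity test more concrete, here is a minimal from-scratch NumPy sketch. It is not the QuadratiK API. The function names `poisson_kernel`, `uniformity_u_statistic`, and `sample_uniform_sphere` are hypothetical, and the choice of rho is arbitrary.

The sketch assumes the following setup:
- Data are unit vectors in R^d, that is, points on S^{d-1}.
- The Poisson kernel is taken as K_rho(x, y) = (1 - rho^2) / (1 + rho^2 - 2 rho <x, y>)^(d/2) with 0 < rho < 1. This kernel integrates to 1 against the normalized uniform measure.
- The centered kernel is therefore K_rho(x, y) - 1.

The sketch computes only the U-statistic of the centered kernel, whose expectation is 0 under uniformity. Turning it into an actual test would also require a critical value, which is not computed here.

```python
import numpy as np


def poisson_kernel(X, Y, rho):
    """Poisson kernel (1 - rho^2) / (1 + rho^2 - 2 rho <x, y>)^(d/2) for all
    pairs of rows of X and Y, assumed to be unit vectors in R^d, 0 < rho < 1."""
    d = X.shape[1]
    inner = X @ Y.T  # matrix of inner products <x_i, y_j>
    return (1.0 - rho**2) / (1.0 + rho**2 - 2.0 * rho * inner) ** (d / 2.0)


def uniformity_u_statistic(X, rho):
    """U-statistic of the centered kernel K_rho(x, y) - 1 over pairs i != j.
    Its expectation is 0 when the rows of X are uniform on the sphere."""
    n = X.shape[0]
    K_cen = poisson_kernel(X, X, rho) - 1.0
    np.fill_diagonal(K_cen, 0.0)  # exclude the i == j terms
    return K_cen.sum() / (n * (n - 1))


def sample_uniform_sphere(n, d, rng):
    """Draw n points uniformly on S^{d-1} by normalizing Gaussian vectors."""
    Z = rng.standard_normal((n, d))
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, rho = 3, 0.7

    # Uniform sample: the statistic should fluctuate around 0
    X_unif = sample_uniform_sphere(500, d, rng)

    # Sample concentrated around the north pole: clearly non-uniform
    mu = np.array([0.0, 0.0, 1.0])
    X_conc = mu + 0.3 * rng.standard_normal((500, d))
    X_conc /= np.linalg.norm(X_conc, axis=1, keepdims=True)

    print("uniform:     ", uniformity_u_statistic(X_unif, rho))
    print("concentrated:", uniformity_u_statistic(X_conc, rho))
```

Data clustered around a direction produce many pairs with large inner products, where the kernel is much larger than 1. This drives the statistic well above 0. For uniform data, positive and negative centered terms roughly cancel.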