Robust estimation provides essential tools for analyzing data that contain outliers, ensuring that statistical models remain reliable even in the presence of some anomalous data. While robust methods have long been available in R, users of Python have lacked a comprehensive package that offers these methods in a cohesive framework. RobPy addresses this gap by offering a wide range of robust methods in Python, built upon established libraries including NumPy, SciPy, and scikit-learn. This package includes tools for robust preprocessing, univariate estimation, covariance matrices, regression, and principal component analysis, which are able to detect outliers and to mitigate their effect. In addition, RobPy provides specialized diagnostic plots for visualizing casewise and cellwise outliers. This paper presents the structure of the RobPy package, demonstrates its functionality through examples, and compares its features to existing implementations in other statistical software. By bringing robust methods to Python, RobPy enables more users to perform robust data analysis in a modern and versatile programming language.