When it comes to the safety of cosmetic products, compliance with regulatory standards is crucial to guarantee consumer protection against the risks of skin irritation. Toxicologists must therefore be fully conversant with all risks. This applies not only to their day-to-day work but also to every algorithm they integrate into their routines. Consequently, ensuring the reproducibility of these algorithms becomes one of the most crucial aspects to address.

However, how can we prove the robustness of an algorithm such as the random forest, which relies heavily on randomness? In this report, we will discuss a strategy for integrating the random forest into ocular tolerance assessment for toxicologists. We will compare four packages: the R packages randomForest and ranger, ranger's Python adaptation skranger, and the widely used scikit-learn, through its RandomForestClassifier class. Our goal is to identify the parameters and sources of randomness that affect the outcomes of random forest algorithms.

By setting comparable parameters and using the same pseudo-random number generator (PRNG), we expect to reproduce results consistently across the available implementations of the random forest algorithm. Nevertheless, this exploration will unveil hidden layers of randomness and clarify which parameters are critical to ensure reproducibility across all four implementations.
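Whether a shared PRNG and seed are enough can be probed with a small simulation. The sketch below is a hypothetical illustration, not code taken from randomForest, ranger, skranger or scikit-learn: the function `draw_tree_randomness` and its `bootstrap_first` flag are invented for this example. It mimics the two random draws a random forest typically makes for each tree (a bootstrap sample of rows, and a random subset of `mtry` candidate features at each split) using Python's built-in Mersenne Twister (`random.Random`). With the same seed and the same order of draws, the results match exactly; changing only the order in which the PRNG stream is consumed generally yields a different tree, even though the generator and the seed are unchanged.

```python
import random


def draw_tree_randomness(seed, n_samples, n_features, mtry, n_splits,
                         bootstrap_first=True):
    """Hypothetical simulation of the random draws behind one random-forest tree.

    Returns (rows, feats): a bootstrap sample of row indices and, for each of
    `n_splits` splits, a sorted subset of `mtry` candidate feature indices.
    `bootstrap_first` is an illustrative flag standing in for the order in
    which an implementation consumes the PRNG stream.
    """
    rng = random.Random(seed)  # Python's built-in Mersenne Twister PRNG

    def bootstrap():
        # Draw n_samples row indices with replacement (bootstrap sample).
        return [rng.randrange(n_samples) for _ in range(n_samples)]

    def feature_subsets():
        # At each split, draw mtry distinct features without replacement.
        return [sorted(rng.sample(range(n_features), mtry))
                for _ in range(n_splits)]

    if bootstrap_first:
        rows = bootstrap()
        feats = feature_subsets()
    else:
        feats = feature_subsets()
        rows = bootstrap()
    return rows, feats


params = dict(n_samples=10, n_features=8, mtry=3, n_splits=4)
a = draw_tree_randomness(42, **params)                         # reference draw
b = draw_tree_randomness(42, **params)                         # same seed, same order
c = draw_tree_randomness(42, bootstrap_first=False, **params)  # same seed, other order

print("same seed, same order:     ", a == b)
print("same seed, different order:", a == c)
```

This suggests why matching hyperparameters and seeds may not suffice on their own: the sampling routines each implementation uses, and the order in which it calls them, also shape the resulting forest.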