Open source machine learning (ML) libraries enable developers to integrate advanced ML functionality into their own applications. However, popular ML libraries, such as TensorFlow, are not available natively in all programming languages and software package ecosystems. Hence, developers who wish to use an ML library that is not available in their programming language or ecosystem of choice may need to resort to a so-called binding library (or binding). Bindings make a host library reusable across programming languages and package ecosystems. For example, the Keras .NET binding provides support for the Keras library in the NuGet (.NET) ecosystem even though the Keras library was written in Python. In this paper, we collect 2,436 cross-ecosystem bindings for 546 ML libraries across 13 software package ecosystems by using an approach called BindFind, which can automatically identify bindings and link them to their host libraries. Furthermore, we conduct an in-depth study of 133 cross-ecosystem bindings and their development for 40 popular open source ML libraries. Our findings reveal that the majority of ML library bindings are maintained by the community, with npm being the most popular ecosystem for these bindings. Our study also indicates that most bindings cover only a limited range of the host library's releases, often experience considerable delays in supporting new releases, and exhibit widespread technical lag. Our findings highlight key factors for developers to consider when integrating bindings for ML libraries and open avenues for researchers to further investigate bindings in software package ecosystems.