With the massive surge in ML models on platforms like Hugging Face, users often struggle to navigate the available options and choose the best model for their downstream tasks, frequently relying on popularity signals such as download counts, likes, or recency. We investigate whether this popularity aligns with actual model performance, and how the comprehensiveness of model documentation correlates with both popularity and performance. In our study, we evaluated a comprehensive set of 500 Sentiment Analysis models on Hugging Face. This evaluation involved a massive annotation effort, with human annotators completing nearly 80,000 annotations, alongside extensive model training and evaluation. Our findings reveal that model popularity does not necessarily correlate with performance. Additionally, we identify critical inconsistencies in model card reporting: approximately 80\% of the models analyzed lack detailed information about the model, training, and evaluation processes. Furthermore, about 88\% of model authors overstate their models' performance in the model cards. Based on our findings, we provide a checklist of guidelines to help users choose good models for downstream tasks.