Large language models (LLMs) have enhanced our ability to rapidly analyze and classify unstructured natural language data. However, concerns regarding cost, network limitations, and security constraints have posed challenges to their integration into work processes. In this study, we adopt a systems design approach to employing LLMs as imperfect data annotators for downstream supervised learning tasks, introducing novel system intervention measures aimed at improving classification performance. Models trained with our methodology outperform the raw LLM-generated labels in seven of eight tests, demonstrating an effective strategy for incorporating LLMs into the design and deployment of the specialized, supervised learning models common in many industry use cases.