Venture capital (VC) investments in early-stage startups that end up being successful can yield high returns. However, predicting early-stage startup success remains challenging due to data scarcity (e.g., many VC firms have information about only a few dozen early-stage startups and whether they were successful). This limits the effectiveness of traditional machine learning methods that rely on large labeled datasets for model training. To address this challenge, we propose an in-context learning framework for startup success prediction using large language models (LLMs) that requires no model training and leverages only a small set of labeled startups as demonstration examples. Specifically, we propose a novel k-nearest-neighbor-based in-context learning framework, called kNN-ICL, which selects the most relevant past startups as demonstration examples based on similarity. Using real-world company profiles from Crunchbase, we find that the kNN-ICL approach achieves higher prediction accuracy than both supervised machine learning baselines and vanilla in-context learning. Further, we study how performance varies with the number of in-context examples and find that a high balanced accuracy can be achieved with as few as 50 examples. Overall, we demonstrate that in-context learning can serve as a decision-making tool for VC firms operating in data-scarce environments.
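The core retrieval step of kNN-ICL can be sketched as follows. This is a minimal illustration, assuming startup profiles have already been embedded as vectors; the embedding model, similarity measure (cosine here), and prompt format are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def select_knn_examples(query_vec, example_vecs, labels, k=3):
    """Select the k labeled startups most similar to the query startup
    (by cosine similarity) to serve as in-context demonstrations."""
    # Normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    E = example_vecs / np.linalg.norm(example_vecs, axis=1, keepdims=True)
    sims = E @ q
    top = np.argsort(-sims)[:k]  # indices of the k highest similarities
    return [(int(i), labels[int(i)], float(sims[int(i)])) for i in top]

def build_prompt(query_profile, demonstrations):
    """Assemble an in-context prompt: k labeled demonstrations
    followed by the unlabeled query startup."""
    parts = [f"Startup: {profile}\nSuccessful: {label}"
             for profile, label in demonstrations]
    parts.append(f"Startup: {query_profile}\nSuccessful:")
    return "\n\n".join(parts)
```

The resulting prompt would then be sent to an LLM, whose completion ("yes"/"no") serves as the success prediction; because only retrieval and prompting are involved, no model parameters are updated.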