Venture capital (VC) investments in early-stage startups that end up being successful can yield high returns. However, predicting early-stage startup success remains challenging due to data scarcity (e.g., many VC firms have information about only a few dozen early-stage startups and whether they were successful). This limits the effectiveness of traditional machine learning methods that rely on large labeled datasets for model training. To address this challenge, we propose an in-context learning framework for startup success prediction using large language models (LLMs) that requires no model training and leverages only a small set of labeled startups as demonstration examples. Specifically, we propose a novel k-nearest-neighbor-based in-context learning framework, called kNN-ICL, which selects the most relevant past startups as demonstration examples based on similarity. Using real-world profiles from Crunchbase, we find that the kNN-ICL approach achieves higher prediction accuracy than supervised machine learning baselines and vanilla in-context learning. Further, we study how performance varies with the number of in-context examples and find that a high balanced accuracy can be achieved with as few as 50 examples. Overall, our results demonstrate that in-context learning can serve as a decision-making tool for VC firms operating in data-scarce environments.
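The core of the kNN-ICL idea can be illustrated with a minimal sketch: embed each startup profile, rank the labeled startups by similarity to the query startup, and assemble the top-k as demonstration examples in the prompt. This is an assumption-laden toy (bag-of-words vectors and cosine similarity stand in for an LLM embedding model; `knn_icl_prompt` and the field names `profile`/`label` are hypothetical), not the authors' implementation.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; a real system would use an LLM embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_icl_prompt(query_profile, labeled_startups, k=3):
    """Select the k most similar labeled startups and build an ICL prompt."""
    q = embed(query_profile)
    ranked = sorted(labeled_startups,
                    key=lambda s: cosine(q, embed(s["profile"])),
                    reverse=True)
    lines = [f"Profile: {s['profile']}\nSuccessful: {'yes' if s['label'] else 'no'}"
             for s in ranked[:k]]
    lines.append(f"Profile: {query_profile}\nSuccessful:")
    return "\n\n".join(lines)
```

The resulting string would be sent to an LLM, whose completion ("yes"/"no") serves as the success prediction; vanilla ICL differs only in that demonstrations are chosen without regard to similarity.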