We introduce Docling, an easy-to-use, self-contained, MIT-licensed, open-source toolkit for document conversion that can parse several popular document formats into a unified, richly structured representation. It is powered by state-of-the-art specialized AI models for layout analysis (DocLayNet) and table structure recognition (TableFormer), and runs efficiently on commodity hardware within a small resource budget. Docling is released as a Python package and can be used through a Python API or as a CLI tool. Docling's modular architecture and efficient document representation, known as DoclingDocument, make it easy to implement extensions, new features, models, and customizations. Docling has already been integrated into other popular open-source frameworks (e.g., LlamaIndex, LangChain, spaCy), making it a natural fit for document processing and the development of high-end applications. The open-source community has fully engaged in using, promoting, and developing for Docling, which gathered 10k stars on GitHub in less than a month and was reported as the No. 1 trending repository on GitHub worldwide in November 2024.