Many modern applications must process data in a streaming fashion. Doing so at large scale requires a parallel system that can handle straggling workers and various kinds of failures. YT is the main driver behind distributed systems at Yandex and is home to its distributed file system, lock service, key-value storage, and internal MapReduce platform. We implement a new component of this system designed for streaming MapReduce operations. It builds on several core YT solutions to achieve fault tolerance and exactly-once semantics while maintaining efficiency and a low write amplification factor.