Cross-fitting is a key ingredient in many semiparametric estimation procedures, such as double/debiased machine learning (DML). By enforcing out-of-sample use of nuisance predictions, it enables valid estimation of low-dimensional targets in the presence of high-dimensional nuisance functions. crossfit is an R package that provides a general-purpose, estimator-agnostic cross-fitting engine. Users specify (i) a target functional and (ii) a directed acyclic graph (DAG) of nuisance models, with node-specific training fold widths and target-specific evaluation windows. The engine executes a reproducible schedule over folds, panels, and repetitions, returning either a scalar estimate (mode="estimate") or a cross-fitted predictor function for application to new data (mode="predict"). Beyond standard cross-fitting, crossfit implements fold-allocation modes that control how training data are shared across nuisance components, including disjoint and independence-enforcing allocations that duplicate reused nodes to reduce dependence between nuisance branches. The implementation targets simulation-heavy benchmarking and method development, with explicit and auditable schedules, defensive validation of specifications and nuisance dependencies, reuse-aware caching to avoid redundant refits, and failure-isolation policies for large experiment grids. The crossfit package is available on CRAN, is openly developed on GitHub under GPL-3, and is intended as a lightweight, tested foundation for prototyping and empirically evaluating cross-fitted estimators with explicit control over fold geometry, dependence, and computation.
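To make the out-of-sample principle concrete, the following is a minimal sketch of standard K-fold cross-fitting for a partially linear model, y = theta*d + g(x) + e. It is written in plain Python and does not use or reproduce the crossfit API: the function names (make_folds, fit_ols, crossfit_plm, simulate), the choice of simple least-squares nuisance learners, and the simulated data are all assumptions made for illustration. Each nuisance model is fitted on the training folds and evaluated only on the held-out fold, and the target is then estimated from the resulting residuals.

```python
import random
from statistics import fmean


def make_folds(n, k, seed=0):
    """Randomly partition indices 0..n-1 into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_ols(x, y):
    """Fit y ~ a + b*x by least squares and return a predictor function."""
    xm, ym = fmean(x), fmean(y)
    sxx = sum((xi - xm) ** 2 for xi in x)
    b = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
    a = ym - b * xm
    return lambda x_new: a + b * x_new


def crossfit_plm(x, d, y, k=5, seed=0):
    """Cross-fitted residual-on-residual estimate of theta (hypothetical helper)."""
    n = len(y)
    res_d, res_y = [0.0] * n, [0.0] * n
    for held_out in make_folds(n, k, seed):
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        # Nuisance models E[d|x] and E[y|x] are fitted on the training folds only.
        m_hat = fit_ols([x[i] for i in train], [d[i] for i in train])
        l_hat = fit_ols([x[i] for i in train], [y[i] for i in train])
        # Predictions are used only on the held-out fold (out-of-sample).
        for i in held_out:
            res_d[i] = d[i] - m_hat(x[i])
            res_y[i] = y[i] - l_hat(x[i])
    num = sum(rd * ry for rd, ry in zip(res_d, res_y))
    den = sum(rd * rd for rd in res_d)
    return num / den


def simulate(n=2000, theta=2.0, seed=1):
    """Simulated data (an assumption for this sketch): linear g(x) and E[d|x]."""
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(n)]
    d = [0.5 * xi + rng.gauss(0, 1) for xi in x]
    y = [theta * di + xi + rng.gauss(0, 0.5) for xi, di in zip(x, d)]
    return x, d, y


x, d, y = simulate()
theta_hat = crossfit_plm(x, d, y)
print(round(theta_hat, 3))  # should be close to the true theta = 2.0
```

In this sketch both nuisance models share the same training folds for every held-out fold. The fold-allocation modes described above concern exactly this choice, for example giving each nuisance branch its own disjoint training data, which the sketch does not attempt.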