Nanopores are versatile single-molecule sensors, but their utility is fundamentally constrained by stochastic translocation dynamics, which distort any encoded information. We address this limitation by shifting from time-domain analysis to a learned latent-space mapping, using a contrastive encoder trained exclusively on simulated signals from a physics-informed model. The encoder maps solid-state nanopore signals of engineered DNA barcodes into an interpretable molecular coordinate system. The learned representation responds to structural barcode parameters while remaining invariant to acquisition conditions and translocation conformation, which allows data to be pooled across devices. Molecule identification requires a single pass through the encoder, reducing computational cost by three orders of magnitude relative to alignment-based methods. We validate the approach experimentally through mixture quantification, rare-variant detection, consensus barcode reconstruction, and real-time signal acquisition. This shift from temporal analysis to a latent-space mapping of structural coordinates changes the paradigm for analyzing stochastic sensor signals by linking classification to interpretable, encoded molecular information.
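The contrastive training objective is not specified in this abstract. As an illustration only, the sketch below assumes an InfoNCE-style loss in which two simulated signals of the same barcode, generated under different acquisition conditions or translocation conformations, form a positive pair. The function name `info_nce_loss`, the `temperature` value, and the embedding shapes are hypothetical and are not taken from the work itself.

```python
import numpy as np


def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed here, not confirmed by the source).

    z_a, z_b: arrays of shape (n_barcodes, latent_dim). Row i of z_a and row i
    of z_b are embeddings of two simulated signals of the same barcode; all
    other rows of z_b act as negatives for row i of z_a.
    """
    # L2-normalise so that the dot product is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    # Pairwise similarities, scaled by the temperature.
    logits = (z_a @ z_b.T) / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive pair for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_prob)))
```

Minimising a loss of this kind pulls embeddings of the same barcode together regardless of how the simulated signal was distorted, which is one plausible route to the invariance described above.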