Big data applications are on the rise, and so is the number of data centers. The ever-growing data pool must be periodically backed up in a secure environment. Moreover, a massive amount of securely backed-up data is required for training binary convolutional neural networks for image classification. XOR and XNOR operations are essential for large-scale data copy verification, encryption, and classification algorithms. The speed mismatch between existing compute and memory units makes the von Neumann architecture inefficient at performing these Boolean operations. Compute-in-memory (CiM) has proved to be an optimum approach for such bulk computations. However, existing CiM-based XOR/XNOR techniques either require multiple cycles per computation or add complexity to the fabrication process. Here, we propose a CMOS-based hardware topology for single-cycle in-memory XOR/XNOR operations. Our design provides at least a 2x improvement in latency compared with other existing CMOS-compatible solutions. We verify the proposed system through circuit/system-level simulations and evaluate its robustness using a 5000-point Monte Carlo variation analysis. This all-CMOS design paves the way for practical implementation of CiM XOR/XNOR at scaled technology nodes.