Training and inference on edge devices often require an efficient setup because of limited computational resources. Pre-computing data representations and caching them on a server can reduce the computation required on the edge device, but this introduces two challenges. First, the storage required on the server scales linearly with the number of instances. Second, transmitting such large amounts of data to an edge device requires substantial bandwidth. To reduce the memory footprint of pre-computed data representations, we propose a simple yet effective approach based on randomly initialized hyperplane projections. To further reduce their size by up to 98.96%, we quantize the resulting floating-point representations into binary vectors. Despite the greatly reduced size, we show that the embeddings remain effective for training models on various English and German sentence classification tasks, retaining 94% to 99% of the performance of their floating-point counterparts.
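The projection-then-binarization idea can be illustrated with a minimal, standard-library-only sketch. It assumes hyperplane normals sampled from a standard Gaussian and a sign threshold at zero (the SimHash-style locality-sensitive hashing recipe); the abstract does not specify the exact initialization or threshold, so these choices, the function names, and the 768-dimensional float32 input and 256-bit output are all hypothetical. Under that assumed configuration, the arithmetic happens to reproduce a 98.96% size reduction, since 256 / (768 × 32) ≈ 1.04%.

```python
import random


def random_hyperplanes(in_dim, out_dim, seed=0):
    """Hypothetical helper: draw out_dim random hyperplane normals from N(0, 1)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def project(embedding, hyperplanes):
    """Project a float embedding onto each hyperplane normal (one dot product each)."""
    return [sum(w * x for w, x in zip(normal, embedding)) for normal in hyperplanes]


def binarize(projected):
    """Quantize each projection to one bit: 1 on the positive side, else 0."""
    return [1 if value > 0 else 0 for value in projected]


def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, 8 bits per byte (LSB first)."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def size_reduction(in_dim, out_bits, float_bits=32):
    """Percentage of storage saved by replacing in_dim floats with out_bits bits."""
    return 100.0 * (1 - out_bits / (in_dim * float_bits))


# Usage with an assumed 768-dim float32 embedding reduced to 256 bits.
planes = random_hyperplanes(768, 256, seed=42)
rng = random.Random(1)
emb = [rng.uniform(-1.0, 1.0) for _ in range(768)]  # stand-in for a real embedding
code = pack_bits(binarize(project(emb, planes)))

print(len(code))                            # 32 bytes instead of 3072
print(round(size_reduction(768, 256), 2))   # 98.96
```

Because each bit only records which side of a hyperplane the embedding falls on, the code is invariant to positive rescaling of the input, and nearby embeddings tend to share most bits, which is what lets the compact binary vectors remain usable as training features.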