We present S+t-SNE, an adaptation of the t-SNE algorithm designed to handle infinite data streams. The core idea behind S+t-SNE is to update the t-SNE embedding incrementally as new data arrive, so that the method both scales to and adapts within streaming scenarios. By selecting the most important points at each step, the algorithm remains scalable while still producing informative visualisations. A blind drift-management method adjusts the embedding space over time, which facilitates the visualisation of evolving data dynamics. Our experimental evaluation demonstrates the effectiveness and efficiency of S+t-SNE and highlights its ability to capture patterns in streaming scenarios. We hope our approach offers researchers and practitioners a real-time tool for understanding and interpreting high-dimensional data.