Revealing the Black Box of Device Search Engine: Scanning Assets, Strategies, and Ethical Consideration

In the digital age, device search engines such as Censys and Shodan play a crucial role by scanning the internet to catalog online devices, which aids in understanding and mitigating network security risks. While previous research has used these tools to detect devices and assess vulnerabilities, it remains unclear which assets they scan, which strategies they employ, and whether they adhere to ethical guidelines. This study presents the first comprehensive examination of these engines' operational and ethical dimensions. We developed a novel framework to trace the IP addresses used by these engines and collected 1,407 scanner IPs. Uncovering these IPs gives us, for the first time, deep insight into the behavior of device search engines. By deploying 28 honeypots to monitor their scanning activities extensively over one year, we demonstrate that users can hardly evade scans by blocklisting scanner IPs or migrating service ports. Our findings reveal significant ethical concerns regarding transparency, harmlessness, and anonymity. Notably, these engines often fail to be transparent about their scanning and do not allow users to opt out of scans. Further, the engines send malformed requests, attempt to access excessive details without authorization, and even publish personally identifiable information (PII) and screenshots in search results. These practices compromise user privacy and expose devices to further risks by potentially aiding malicious entities. This paper emphasizes the urgent need for stricter ethical standards and greater transparency in the operation of device search engines, and offers insights into safeguarding against invasive scanning practices and protecting digital infrastructure.
