Web measurements are a well-established methodology for assessing the security and privacy landscape of the Internet. However, existing top lists of popular websites are unlabeled and carry no semantic information about the nature of the websites they include. This makes targeted web measurements challenging, and researchers often resort to ad-hoc techniques to bias datasets toward specific website classes of interest. In this paper, we investigate the use of Large Language Models (LLMs) to enable targeted web measurement studies. Building on prior literature, we identify key website classification tasks relevant to web measurements and highlight limitations in state-of-the-art classification approaches. We construct carefully curated datasets to evaluate different LLMs on these tasks. Our results show that LLMs can achieve strong performance across multiple classification scenarios, but that the choice of model and configuration plays a significant role. Motivated by the observed trade-off between classification accuracy and computational efficiency, we propose a practical two-step methodology for scalable targeted web measurements starting from the Tranco list. Finally, using our methodology, we conduct LLM-assisted web measurement studies inspired by prior work and assess the validity of the resulting research inferences, showing that LLMs can effectively enable targeted measurements of security and privacy trends on the Web.