We propose a lightweight hybrid approach to clickbait detection that combines OpenAI semantic embeddings with six compact heuristic features capturing stylistic and informational cues. To improve efficiency, the embeddings are reduced with PCA, and the resulting features are evaluated with XGBoost, GraphSAGE, and GCN classifiers. Although the simplified feature design yields slightly lower F1-scores, the graph-based models achieve competitive performance with substantially reduced inference time. High ROC--AUC values further indicate strong discrimination capability, supporting reliable detection of clickbait headlines under varying decision thresholds.
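To make the distinction between the two reported metrics concrete, the following minimal sketch contrasts ROC-AUC, which depends only on how the classifier ranks headlines, with the F1-score, which depends on a chosen decision threshold. It is an illustration under stated assumptions and not the evaluation code used in this work: the function names `roc_auc` and `f1_at_threshold` and the toy labels and scores are hypothetical, and only the Python standard library is used.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def f1_at_threshold(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> float:
    """F1-score after labelling every score >= threshold as clickbait (1)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = clickbait, 0 = not clickbait.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

auc = roc_auc(toy_labels, toy_scores)  # one value, no threshold involved
f1_by_threshold = {
    t: f1_at_threshold(toy_labels, toy_scores, t) for t in (0.35, 0.5, 0.7)
}  # a different F1 for each threshold
```

On this toy data the ROC-AUC is a single number, whereas the F1-score changes as the threshold moves. This is why a high ROC-AUC can be read as evidence of good separation across thresholds even when the F1-score at one fixed threshold is slightly lower.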