Understanding the geographic reach and community structure of one's scholarly citations is increasingly valuable for career development, grant applications, and collaboration discovery, yet accessible tools for answering these questions remain scarce. Existing bibliometric platforms either require costly institutional subscriptions or expose only aggregate citation counts without granular per-author metadata. We present CiteRadar, an open-source system that accepts a single Google Scholar user identifier and, from a single command-line invocation, automatically produces a structured output folder containing the author's complete publication list, all retrieved citing papers with enriched author metadata, two ranked author tables (by citation frequency and by h-index), a plain-text statistical summary, and a self-contained interactive HTML world map. CiteRadar integrates five heterogeneous data sources (Google Scholar, OpenAlex, CrossRef, Semantic Scholar, and OpenStreetMap Nominatim) through a five-stage pipeline. Key technical contributions include: (1) a Scholar meta-string parser resilient to Unicode non-breaking-space separators, a pervasive but undocumented quirk in Scholar's HTML that silently corrupts venue and year fields when unhandled; (2) a two-stage author disambiguation system that uses stop-word-filtered institution name similarity to guard against the well-known same-name entity-merging failure mode in bibliometric databases, demonstrated to eliminate h-index attribution errors of up to 9x the correct value; (3) an OpenAlex web-URL to API-URL conversion fix that raises the fraction of author records with city-level location data from 0% to ~60%; and (4) a logarithmically scaled interactive Folium world map with per-city researcher popups, rendered as a fully self-contained HTML file.
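The four contributions listed above can be illustrated with a minimal Python sketch. This is not CiteRadar's actual code: the function names, the sample meta-string layout ("authors - venue, year - host"), the stop-word list, the Jaccard token overlap used as the similarity measure, the prefix-to-endpoint table, and the radius formula are all assumptions chosen for illustration. The one external fact relied on is that OpenAlex entity IDs of the form `https://openalex.org/I123` are served by the API under paths such as `https://api.openalex.org/institutions/I123`.

```python
import math
import re

# (1) Meta-string parsing that tolerates non-breaking spaces.
# Assumed layout: "authors - venue, year - host"; the separators may
# contain U+00A0 instead of an ordinary space.
def parse_scholar_meta(meta: str) -> dict:
    cleaned = meta.replace("\u00a0", " ").replace("\u202f", " ")
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    parts = [p.strip() for p in cleaned.split(" - ")]
    authors = parts[0] if parts else ""
    venue_year = parts[1] if len(parts) > 1 else ""
    match = re.search(r"\b(?:19|20)\d{2}\b", venue_year)
    year = int(match.group(0)) if match else None
    venue = venue_year[: match.start()].rstrip(" ,") if match else venue_year
    return {"authors": authors, "venue": venue or None, "year": year}

# (2) Stop-word-filtered institution similarity (hypothetical measure:
# Jaccard overlap of the remaining tokens).
_STOP_WORDS = {"university", "of", "the", "institute", "and", "for", "at",
               "college", "school", "department"}

def institution_similarity(a: str, b: str) -> float:
    def tokens(name: str) -> set:
        return {t for t in re.findall(r"[a-z0-9]+", name.lower())
                if t not in _STOP_WORDS}
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

# (3) OpenAlex web URL -> API URL. The mapping covers only a few
# entity types and is an assumption about what the pipeline needs.
_OPENALEX_ENDPOINTS = {"A": "authors", "I": "institutions",
                       "W": "works", "S": "sources"}

def openalex_api_url(web_url: str) -> str:
    match = re.fullmatch(r"https?://openalex\.org/([A-Za-z])(\d+)/?",
                         web_url.strip())
    if not match:
        raise ValueError(f"not an OpenAlex entity URL: {web_url!r}")
    prefix = match.group(1).upper()
    endpoint = _OPENALEX_ENDPOINTS.get(prefix)
    if endpoint is None:
        raise ValueError(f"unsupported OpenAlex entity type: {prefix}")
    return f"https://api.openalex.org/{endpoint}/{prefix}{match.group(2)}"

# (4) Logarithmic marker scaling, so a city with 1000 researchers does
# not dwarf one with 10. Base and scale values are arbitrary.
def marker_radius(count: int, base: float = 2.0, scale: float = 4.0) -> float:
    return base + scale * math.log1p(max(count, 0))
```

For example, `parse_scholar_meta("J Smith, A Doe\u00a0- Nature Physics, 2020\u00a0- nature.com")` recovers the venue and year even though both separators carry a non-breaking space; a parser that split on an ordinary `" - "` without normalising first would leave the whole string in the author field. A radius computed this way could be passed to a Folium circle marker, though the map-building step itself is omitted here.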