Reflections from the 2024 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry

Yoel Zimmermann, Adib Bazgir, Zartashia Afzal, Fariha Agbere, Qianxiang Ai, Nawaf Alampara, Alexander Al-Feghali, Mehrad Ansari, Dmytro Antypov, Amro Aswad, Jiaru Bai, Viktoriia Baibakova, Devi Dutta Biswajeet, Erik Bitzek, Joshua D. Bocarsly, Anna Borisova, Andres M Bran, L. Catherine Brinson, Marcel Moran Calderon, Alessandro Canalicchio, Victor Chen, Yuan Chiang, Defne Circi, Benjamin Charmes, Vikrant Chaudhary, Zizhang Chen, Min-Hsueh Chiu, Judith Clymo, Kedar Dabhadkar, Nathan Daelman, Archit Datar, Wibe A. de Jong, Matthew L. Evans, Maryam Ghazizade Fard, Giuseppe Fisicaro, Abhijeet Sadashiv Gangan, Janine George, Jose D. Cojal Gonzalez, Michael Götte, Ankur K. Gupta, Hassan Harb, Pengyu Hong, Abdelrahman Ibrahim, Ahmed Ilyas, Alishba Imran, Kevin Ishimwe, Ramsey Issa, Kevin Maik Jablonka, Colin Jones, Tyler R. Josephson, Greg Juhasz, Sarthak Kapoor, Rongda Kang, Ghazal Khalighinejad, Sartaaj Khan, Sascha Klawohn, Suneel Kuman, Alvin Noe Ladines, Sarom Leang, Magdalena Lederbauer, Sheng-Lun Liao, Hao Liu, Xuefeng Liu, Stanley Lo, Sandeep Madireddy, Piyush Ranjan Maharana, Shagun Maheshwari, Soroush Mahjoubi, José A. Márquez, Rob Mills, Trupti Mohanty, Bernadette Mohr, Seyed Mohamad Moosavi, Alexander Moßhammer, Amirhossein D. Naghdi, Aakash Naik, Oleksandr Narykov, Hampus Näsström, Xuan Vu Nguyen, Xinyi Ni, Dana O'Connor, Teslim Olayiwola, Federico Ottomano, Aleyna Beste Ozhan, Sebastian Pagel, Chiku Parida, Jaehee Park, Vraj Patel, Elena Patyukova, Martin Hoffmann Petersen, Luis Pinto, José M. Pizarro, Dieter Plessers, Tapashree Pradhan, Utkarsh Pratiush, Charishma Puli, Andrew Qin, Mahyar Rajabi, Francesco Ricci, Elliot Risch, Martiño Ríos-García, Aritra Roy, Tehseen Rug, Hasan M Sayeed, Markus Scheidgen, Mara Schilling-Wilhelmi, Marcel Schloz, Fabian Schöppach, Julia Schumann, Philippe Schwaller, Marcus Schwarting, Samiha Sharlin, Kevin Shen, Jiale Shi, Pradip Si, Jennifer D'Souza, Taylor Sparks, Suraj Sudhakar, Leopold Talirz, Dandan Tang, Olga Taran, Carla Terboven, Mark Tropin, Anastasiia Tsymbal, Katharina Ueltzen, Pablo Andres Unzueta, Archit Vasan, Tirtha Vinchurkar, Trung Vo, Gabriel Vogel, Christoph Völker, Jan Weinreich, Faradawn Yang, Mohd Zaki, Chi Zhang, Sylvester Zhang, Weijie Zhang, Ruijie Zhu, Shang Zhu, Jan Janssen, Calvin Li, Ian Foster, Ben Blaiszik

Comments (from arXiv): Updated author information; the submission remains largely unchanged. 98 pages total.

Here, we present the outcomes from the second Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry, which engaged participants at hybrid locations worldwide and resulted in 34 team submissions. The submissions spanned seven key application areas and demonstrated the diverse utility of LLMs for (1) molecular and material property prediction; (2) molecular and material design; (3) automation and novel interfaces; (4) scientific communication and education; (5) research data management and automation; (6) hypothesis generation and evaluation; and (7) knowledge extraction and reasoning from scientific literature. Each team submission is presented in a summary table with links to the code and as a brief paper in the appendix. Beyond team results, we discuss the hackathon event and its hybrid format, which included physical hubs in Toronto, Montreal, San Francisco, Berlin, Lausanne, and Tokyo, alongside a global online hub to enable local and virtual collaboration. Overall, the event highlighted significant improvements in LLM capabilities since the previous year's hackathon, suggesting continued expansion of LLMs for applications in materials science and chemistry research. These outcomes demonstrate the dual utility of LLMs as both multipurpose models for diverse machine learning tasks and platforms for rapidly prototyping custom applications in scientific research.
