This paper investigates the performance of 16 Large Language Models (LLMs) in automating LoRaWAN-related engineering tasks involving optimal placement of drones and received power calculation under progressively complex zero-shot, natural-language prompts. The primary research question is whether lightweight, locally executed LLMs can generate correct Python code for these tasks. To assess this, we compared locally run models against state-of-the-art alternatives, such as GPT-4 and DeepSeek-V3, which served as reference points. By extracting and executing the Python functions generated by each model, we evaluated their outputs on a zero-to-five scale. Results show that while DeepSeek-V3 and GPT-4 consistently provided accurate solutions, certain smaller models, particularly Phi-4 and LLaMA-3.3, also demonstrated strong performance, underscoring the viability of lightweight alternatives. Other models exhibited errors stemming from incomplete understanding or syntactic issues. These findings illustrate the potential of LLM-based approaches for specialized engineering applications while highlighting the need for careful model selection, rigorous prompt design, and targeted domain fine-tuning to achieve reliable outcomes.
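To make the received-power task concrete, a minimal sketch of the kind of Python function such prompts might elicit is shown below. This is an illustrative assumption, not the paper's actual prompt or reference solution: it uses the free-space path loss (Friis) model, a common baseline for drone-to-gateway LoRaWAN link budgets; the function name and parameters are hypothetical.

```python
import math

def received_power_dbm(p_tx_dbm: float, g_tx_dbi: float, g_rx_dbi: float,
                       distance_m: float, freq_hz: float) -> float:
    """Received power (dBm) under free-space path loss.

    Illustrative only: assumes line-of-sight propagation with no
    fading margin, which is a simplification of real LoRaWAN links.
    """
    c = 3e8  # speed of light, m/s
    # Free-space path loss in dB: 20*log10(d) + 20*log10(f) + 20*log10(4*pi/c)
    fspl_db = (20 * math.log10(distance_m)
               + 20 * math.log10(freq_hz)
               + 20 * math.log10(4 * math.pi / c))
    return p_tx_dbm + g_tx_dbi + g_rx_dbi - fspl_db

# Example: 14 dBm transmit power, 2 dBi antennas on each end,
# 1 km link at the EU 868 MHz LoRaWAN band.
p_rx = received_power_dbm(14.0, 2.0, 2.0, 1000.0, 868e6)
```

For the example parameters above, the free-space loss is about 91.2 dB, giving a received power near -73.2 dBm, comfortably above typical LoRa receiver sensitivities.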