This paper investigates the performance of 16 Large Language Models (LLMs) in automating LoRaWAN-related engineering tasks involving optimal placement of drones and received power calculation under progressively complex zero-shot, natural-language prompts. The primary research question is whether lightweight, locally executed LLMs can generate correct Python code for these tasks. To assess this, we compared locally run models against state-of-the-art alternatives, such as GPT-4 and DeepSeek-V3, which served as reference points. By extracting and executing the Python functions generated by each model, we evaluated their outputs on a zero-to-five scale. Results show that while DeepSeek-V3 and GPT-4 consistently provided accurate solutions, certain smaller models, particularly Phi-4 and LLaMA-3.3, also demonstrated strong performance, underscoring the viability of lightweight alternatives. Other models exhibited errors stemming from incomplete understanding or syntactic issues. These findings illustrate the potential of LLM-based approaches for specialized engineering applications while highlighting the need for careful model selection, rigorous prompt design, and targeted domain fine-tuning to achieve reliable outcomes.
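To make the received-power task concrete, a minimal sketch of the kind of Python function such prompts might elicit is shown below. This is an illustrative assumption, not the paper's actual prompt or reference solution: it uses the free-space path loss (Friis) model, a common baseline for drone-to-gateway LoRaWAN link budgets; the function name and parameters are hypothetical.

```python
import math

def received_power_dbm(p_tx_dbm: float, g_tx_dbi: float, g_rx_dbi: float,
                       distance_m: float, freq_hz: float) -> float:
    """Received power (dBm) under free-space path loss.

    Illustrative only: assumes line-of-sight propagation with no
    fading margin, which is a simplification of real LoRaWAN links.
    """
    c = 3e8  # speed of light, m/s
    # Free-space path loss in dB: 20*log10(d) + 20*log10(f) + 20*log10(4*pi/c)
    fspl_db = (20 * math.log10(distance_m)
               + 20 * math.log10(freq_hz)
               + 20 * math.log10(4 * math.pi / c))
    return p_tx_dbm + g_tx_dbi + g_rx_dbi - fspl_db

# Example: 14 dBm transmit power, 2 dBi antennas on each end,
# 1 km link at the EU 868 MHz LoRaWAN band.
p_rx = received_power_dbm(14.0, 2.0, 2.0, 1000.0, 868e6)
```

For the example parameters above, the free-space loss is about 91.2 dB, giving a received power near -73.2 dBm, comfortably above typical LoRa receiver sensitivities.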