Safe-DS: A Domain Specific Language to Make Data Science Safe

April 7, 2023

Lars Reimann, Günter Kniesel-Wünsche

From arXiv. Accepted for the NIER Track of the 45th International Conference on Software Engineering (ICSE 2023).

Due to the long runtime of Data Science (DS) pipelines, even small programming mistakes can be very costly if they are not detected statically. However, even basic static type checking of DS pipelines is difficult because most are written in Python. Static typing is available in Python only via external linters. These require static type annotations for parameters or results of functions, which many DS libraries do not provide. In this paper, we show how the wealth of Python DS libraries can be used in a statically safe way via Safe-DS, a domain-specific language (DSL) for DS. Safe-DS catches conventional type errors plus errors related to range restrictions, data manipulation, and call order of functions, going well beyond the abilities of current Python linters. Python libraries are integrated into Safe-DS via a stub language for specifying the interface of their declarations, and an API-Editor that is able to extract type information from the code and documentation of Python libraries and automatically generate suitable stubs. Moreover, Safe-DS complements textual DS pipelines with a graphical representation that eases safe development by preventing syntax errors. The seamless synchronization of the textual and graphical views lets developers always choose the one best suited for their skills and current task. We think that Safe-DS can make DS development easier, faster, and more reliable, significantly reducing development costs.
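To make the error classes named in the abstract concrete, here is a minimal Python sketch. It is an illustration only, not Safe-DS code and not taken from the paper: `ToyClassifier`, its `learning_rate` parameter, and `run_checks` are hypothetical names invented for this example. The class is fully annotated, yet a conventional type checker would be expected to accept both faulty calls, because a `float` annotation cannot express the range 0 < x <= 1 and method signatures alone do not encode that `fit()` must precede `predict()`. Both mistakes therefore surface only at runtime.

```python
from __future__ import annotations


class ToyClassifier:
    """Hypothetical stand-in for a DS library estimator (not a real API)."""

    def __init__(self, learning_rate: float) -> None:
        # Range restriction: the annotation `float` cannot express
        # 0 < learning_rate <= 1, so the check happens at runtime.
        if not 0.0 < learning_rate <= 1.0:
            raise ValueError("learning_rate must be in (0, 1]")
        self.learning_rate = learning_rate
        self._fitted = False
        self._majority = 0

    def fit(self, rows: list[list[float]], labels: list[int]) -> ToyClassifier:
        # "Training" here just memorizes the most frequent label.
        self._majority = max(set(labels), key=labels.count)
        self._fitted = True
        return self

    def predict(self, rows: list[list[float]]) -> list[int]:
        # Call-order restriction: fit() must have been called first.
        if not self._fitted:
            raise RuntimeError("call fit() before predict()")
        return [self._majority for _ in rows]


def run_checks() -> list[str]:
    """Trigger both mistakes and collect the runtime error messages."""
    errors: list[str] = []
    try:
        ToyClassifier(learning_rate=1.5)  # well-typed, but out of range
    except ValueError as exc:
        errors.append(f"range: {exc}")
    try:
        ToyClassifier(learning_rate=0.1).predict([[1.0]])  # predict before fit
    except RuntimeError as exc:
        errors.append(f"call order: {exc}")
    return errors


if __name__ == "__main__":
    for message in run_checks():
        print(message)
```

In a long-running pipeline, a check like the one in `predict()` might fire only after earlier, expensive steps have already completed, which is the cost the abstract argues static detection would avoid.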
