Improved Naive Bayes with Mislabeled Data - Zhuanzhi Papers

Naive Bayes · Naive Bayes algorithm · Bayes · Naive Bayes method · Bayesian methods

April 13, 2023

Qianhan Zeng, Yingqiu Zhu, Xuening Zhu, Feifei Wang, Weichen Zhao, Shuning Sun, Meng Su, Hansheng Wang

Labeling mistakes are frequently encountered in real-world applications. If not handled properly, they can seriously degrade a model's classification performance. To address this issue, we propose an improved Naive Bayes method for text classification. It is analytically simple and requires no subjective judgments about which labels are correct and which are incorrect. By specifying the generating mechanism of incorrect labels, we optimize the corresponding log-likelihood function iteratively using an EM algorithm. Our simulation and experimental results show that, on mislabeled data, the improved method greatly improves on the performance of the standard Naive Bayes method.
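To make the EM idea more concrete, here is a minimal sketch under stated assumptions. The abstract does not spell out the authors' generating mechanism for incorrect labels, so this sketch assumes a common one: each document has a latent true label z, its words follow a multinomial Naive Bayes model for z, and the observed label is drawn from an unknown noise matrix T[z][y]. EM then alternates between re-estimating the class priors, word probabilities and T from soft counts (M-step) and computing each training document's posterior over its true label (E-step). The function names `fit_noisy_nb_em` and `predict_noisy_nb`, the smoothing constants, and the initialization from the observed labels are hypothetical choices for illustration, not the paper's method.

```python
import math
from collections import defaultdict


def fit_noisy_nb_em(docs, noisy_labels, n_iter=20, alpha=1.0, beta=1.0):
    """EM for multinomial Naive Bayes with a latent true label (hypothetical sketch).

    Assumed model, NOT necessarily the paper's: true label z ~ prior,
    words ~ Multinomial(theta_z), observed label y ~ T[z][y] (unknown noise matrix).
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    classes = sorted(set(noisy_labels))
    vocab = sorted({w for d in docs for w in d})
    n_docs, k = len(docs), len(classes)
    # Assumption: initialize P(true z | doc, observed y) from the observed labels.
    resp = [{z: 1.0 if z == y else 0.0 for z in classes} for y in noisy_labels]

    for _ in range(n_iter):
        # M-step: smoothed estimates from the soft (responsibility-weighted) counts.
        n_z = {z: sum(r[z] for r in resp) for z in classes}
        log_prior = {z: math.log((n_z[z] + 1.0) / (n_docs + k)) for z in classes}

        log_theta = {}
        for z in classes:
            counts = defaultdict(float)
            for d, r in zip(docs, resp):
                for w in d:
                    counts[w] += r[z]
            total = sum(counts.values())
            log_theta[z] = {w: math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
                            for w in vocab}

        log_T = {}
        for z in classes:
            # Row z of the noise matrix: how often true class z shows up as each observed label.
            row = {y: beta for y in classes}
            for y_obs, r in zip(noisy_labels, resp):
                row[y_obs] += r[z]
            row_total = sum(row.values())
            log_T[z] = {y: math.log(row[y] / row_total) for y in classes}

        # E-step: posterior over the true label, normalized with log-sum-exp for stability.
        new_resp = []
        for d, y_obs in zip(docs, noisy_labels):
            scores = {z: log_prior[z] + log_T[z][y_obs] + sum(log_theta[z][w] for w in d)
                      for z in classes}
            m = max(scores.values())
            norm = sum(math.exp(s - m) for s in scores.values())
            new_resp.append({z: math.exp(scores[z] - m) / norm for z in classes})
        resp = new_resp

    return classes, log_prior, log_theta, log_T, resp


def predict_noisy_nb(model, doc):
    """Classify a new document; no label is observed, so the noise matrix drops out."""
    classes, log_prior, log_theta, _log_T, _resp = model
    return max(classes, key=lambda z: log_prior[z]
               + sum(log_theta[z][w] for w in doc if w in log_theta[z]))
```

For example, if one clearly positive document (say `["good", "great", "fun"]`) is labeled `"neg"` among otherwise clean short token lists, its responsibility in the returned `resp` should move toward `"pos"`, while `predict_noisy_nb` classifies new documents from the priors and word probabilities alone. Starting from the observed labels keeps T near the identity early on, which helps EM avoid the label-swapped solution that this kind of latent-label model also admits.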

Related content

Naive Bayes

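Naive Bayes is a classification method based on Bayes' theorem and the assumption that features are conditionally independent given the class. Given a training dataset, it first learns the joint probability distribution of inputs and outputs under this conditional-independence assumption. Then, for a given input x, it uses the learned model and Bayes' theorem to find the output y with the largest posterior probability. Naive Bayes is simple to implement and efficient in both training and prediction, which makes it a commonly used method.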
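For text, the description above is usually instantiated as multinomial Naive Bayes. A minimal pure-Python sketch, assuming tokenized documents and add-one (Laplace) smoothing; `train_nb` and `predict_nb` are hypothetical names, not a library API:

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on tokenized documents."""
    classes = sorted(set(labels))
    vocab = sorted({w for d in docs for w in d})
    log_prior, log_lik = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        # Class prior P(y = c), estimated by relative frequency.
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values())
        # Laplace smoothing keeps words unseen in class c from getting zero probability.
        log_lik[c] = {w: math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
                      for w in vocab}
    return classes, log_prior, log_lik


def predict_nb(model, doc):
    """Return the class with the largest posterior, computed in log space."""
    classes, log_prior, log_lik = model

    def unnormalized_log_posterior(c):
        # log P(c) + sum_j log P(x_j | c); the evidence P(x) is shared by all classes.
        # Words not seen during training are skipped.
        return log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])

    return max(classes, key=unnormalized_log_posterior)
```

For example, trained on `[["good", "great", "fun"], ["great", "plot"], ["bad", "boring"], ["boring", "awful", "plot"]]` with labels `["pos", "pos", "neg", "neg"]`, `predict_nb(model, ["great", "fun"])` returns `"pos"`. Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on long documents.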
