Interpreting Contrastive Embeddings in Specific Domains with Fuzzy Rules - Zhuanzhi Papers

Embedding · CLIP · Representation · Samples · Law

Javier Fumanal-Idocin, Mohammadreza Jamalifard, Javier Andreu-Perez

Free-style text is still a common way of recording data in real environments, such as legal procedures and medical records. For this reason, there has been significant effort in natural language processing to convert these texts into a structured format that standard machine learning methods can then exploit. One of the most popular methods for embedding text into a vector representation is the Contrastive Language-Image Pre-training (CLIP) model, which was trained on both images and text. Although the representations computed by CLIP have been very successful in zero-shot and few-shot learning problems, they still have shortcomings when applied to a specific domain. In this work, we use a fuzzy rule-based classification system, along with some standard text processing techniques, to map some of our features of interest to the space created by a CLIP model. We then discuss the rules and associations obtained and the importance of each feature considered. We apply this approach to two different data domains, clinical reports and film reviews, and compare the results obtained for each domain individually and for both combined. Finally, we discuss the limitations of this approach and how it could be further improved.
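
The abstract does not specify how the fuzzy rule-based classifier is configured. As a rough illustration only, the following Python sketch shows what a generic fuzzy rule-based classifier over embedding coordinates can look like. Everything in it is an assumption rather than the authors' system: the triangular low/medium/high partitions, the product t-norm, single-winner inference, the hand-written rules, the hypothetical class names `clinical` and `review`, the `dimension_usage` importance proxy, and the toy 2-D vectors that stand in for real CLIP embeddings.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical linguistic labels: three triangular fuzzy sets over [0, 1].
# Inputs are assumed to be embedding coordinates already min-max scaled.
LABELS: Dict[str, Tuple[float, float, float]] = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def minmax_scale(vectors: List[List[float]]) -> List[List[float]]:
    """Rescale every coordinate to [0, 1] across the dataset; constant columns map to 0."""
    cols = list(zip(*vectors))
    lows = [min(c) for c in cols]
    spans = [(max(c) - lo) or 1.0 for c, lo in zip(cols, lows)]
    return [[(v - lo) / s for v, lo, s in zip(vec, lows, spans)] for vec in vectors]


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


@dataclass
class FuzzyRule:
    antecedents: Dict[int, str]  # embedding dimension -> linguistic label
    consequent: str              # class predicted when the rule fires
    weight: float = 1.0

    def firing_strength(self, x: Sequence[float]) -> float:
        # Product t-norm: multiply the memberships of all antecedents.
        strength = self.weight
        for dim, label in self.antecedents.items():
            strength *= triangular(x[dim], *LABELS[label])
        return strength


def classify(rules: List[FuzzyRule], x: Sequence[float]) -> str:
    """Single-winner inference; ties (including all-zero firing) go to the first rule."""
    return max(rules, key=lambda r: r.firing_strength(x)).consequent


def dimension_usage(rules: List[FuzzyRule], n_dims: int) -> List[float]:
    """Crude importance proxy: share of all antecedents that test each dimension."""
    counts = [0] * n_dims
    for rule in rules:
        for dim in rule.antecedents:
            counts[dim] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


if __name__ == "__main__":
    # Toy 2-D vectors standing in for (already scaled) CLIP text embeddings.
    rules = [
        FuzzyRule({0: "high", 1: "low"}, "clinical"),
        FuzzyRule({0: "low"}, "review"),
        FuzzyRule({0: "medium", 1: "high"}, "review", weight=0.8),
    ]
    print(classify(rules, [0.9, 0.1]))  # clinical
    print(classify(rules, [0.2, 0.7]))  # review
    print(dimension_usage(rules, 2))    # [0.6, 0.4]
```

Because each rule reads as a plain statement such as "IF dimension 0 is high AND dimension 1 is low THEN clinical", a rule base of this kind can be inspected directly, which is the interpretability property the abstract relies on. In practice the rules would be learned from data rather than written by hand.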

Related content

[ICLR2025] Narrowing the Information Bottleneck Theory for Interpretability of Multimodal Image-Text Representations
Zhuanzhi Member Services · Feb 24, 2025

How can large models unify generation and embedding? A detailed look at the recent paper "Generative Representational Instruction Tuning"
Zhuanzhi Member Services · Feb 18, 2024

How can knowledge graphs be combined with large models? [Stanford PhD thesis] Using structured data for robust and adaptive natural language representations, 141-page PDF
Zhuanzhi Member Services · Apr 3, 2023

[ICML2022] Interventional Contrastive Learning with Meta Semantic Regularizer
Zhuanzhi Member Services · Jul 1, 2022

How should the geometry of word embeddings be understood? [Edinburgh PhD thesis] A theoretical understanding of word and relation representations, 97-page PDF
Zhuanzhi Member Services · Feb 6, 2022

SECNLP: A survey of embeddings in clinical natural language processing
Zhuanzhi Member Services · Mar 23, 2020

[Oxford-DeepMind] A Survey on Contextual Embeddings
Zhuanzhi Member Services · Mar 17, 2020

[AAAI2020, Tsinghua-Baidu] Learning Conceptual-Contextual Embeddings for Medical Text
Zhuanzhi Member Services · Mar 14, 2020

[AAAI 2019 Tutorial] Neural Vector Representations beyond Words: Sentence and Document Embeddings, Gerard de Melo
Zhuanzhi Member Services · Nov 18, 2019

[AAAI2020 paper] Learning Conceptual-Contextual Embeddings for Medical Text
Zhuanzhi Member Services · Nov 15, 2019

How can knowledge be embedded in deep learning? A walkthrough of the 196-page University of Vermont PhD thesis "Leveraging Domain Knowledge in Deep Learning Systems"
Zhuanzhi · Apr 28, 2022

Contrastive self-supervised learning
Deep Learning NLP · Jul 15, 2020

[New book] Embeddings in Natural Language Processing: Theory and Advances in Vector Representations of Meaning, from Word2Vec to BERT, 163-page PDF
Zhuanzhi · Apr 4, 2020

[MIT-Berkeley, ICLR2020] Contrastive Representation Distillation
Zhuanzhi · Mar 12, 2020

A survey of graph embedding
AI Frontier Lectures · Apr 30, 2019

Natural language processing in R: text vectorization with word embeddings
R Language Chinese Community · Apr 6, 2019

[Practical] Comparing bag-of-words and word embedding models in NLP (with code)
Zhuanzhi · Aug 4, 2018

Word2Vec and GloVe: the motivation and intuition behind word embedding methods
Lunzhi · Jun 23, 2018

Practical: when deep learning meets automatic text summarization, seq2seq + attention
Machine Learning Algorithms and Python Learning · May 28, 2018

Text clustering: quickly extracting insights from unstructured data
Datartisan · Oct 12, 2017

Theory and algorithms of low-rank and sparse graph embedding for feature extraction
National Natural Science Foundation of China · Dec 31, 2015

Universal steganalysis methods for the CELP speech compression domain
National Natural Science Foundation of China · Dec 31, 2015

Personalized image collection summarization based on complex semantics
National Natural Science Foundation of China · Dec 31, 2015

Cognitive processes and neural mechanisms of how emphasis and contrast affect discourse comprehension
National Natural Science Foundation of China · Dec 31, 2015

Stochastic grammars as an extension of general statistical models
National Natural Science Foundation of China · Dec 31, 2015

Fuzzy clustering for entity categorization in heterogeneous information networks
National Natural Science Foundation of China · Dec 31, 2015

Construction and application of co-occurrence latent semantic vector space models and their semantic kernels
National Natural Science Foundation of China · Dec 31, 2015

Non-aligned sparse representation recognition combining weighted joint clustering of image patches with hybrid classifiers
National Natural Science Foundation of China · Dec 31, 2015

Lexical-function-oriented semantic recognition of academic texts and knowledge graph construction
National Natural Science Foundation of China · Dec 31, 2014

Semantic computing methods for Chinese text understanding
National Natural Science Foundation of China · Dec 31, 2014

LLM2Vec-Gen: Generative Embeddings from Large Language Models
Arxiv · Mar 11

Uncertainty-driven Embedding Convolution
Arxiv · Feb 12

Diffusion-Pretrained Dense and Contextual Embeddings
Arxiv · Feb 11

Non-Contrastive Vision-Language Learning with Predictive Embedding Alignment
Arxiv · Feb 11

Bagging-Based Model Merging for Robust General Text Embeddings
Arxiv · Feb 9

Efficient Table Retrieval and Understanding with Multimodal Large Language Models
Arxiv · Feb 7

Evaluating the impact of word embeddings on similarity scoring in practical information retrieval
Arxiv · Feb 5

CASE -- Condition-Aware Sentence Embeddings for Conditional Semantic Textual Similarity Measurement
Arxiv · Feb 2

Causally Disentangled Contrastive Learning for Multilingual Speaker Embeddings
Arxiv · Feb 1

From Prompt to Graph: Comparing LLM-Based Information Extraction Strategies in Domain-Specific Ontology Development
Arxiv · Jan 31
