To leverage machine learning in any decision-making process, one must convert the available knowledge (for example, natural language or other unstructured text) into representation vectors that machine learning models can understand and process in a compatible data format. A frequently encountered difficulty, however, is that the available knowledge is not rich or reliable enough in the first place. In such cases, one seeks to fuse side information from a separate domain to bridge the gap between the scarce knowledge in the domain of interest and what good representation learning requires. This approach is called Cross-Domain Knowledge Transfer. The problem merits study because scarce knowledge is common in many scenarios, from online healthcare platform analysis to financial market risk quantification, and it stands in the way of our benefiting from automated decision making. From the machine learning perspective, the semi-supervised learning paradigm exploits large amounts of data without ground truth (unlabeled data) and has achieved impressive improvements in learning performance. This dissertation adopts semi-supervised learning for cross-domain knowledge transfer.