Noise May Contain Transferable Knowledge: Understanding Semi-supervised Heterogeneous Domain Adaptation from an Empirical Perspective

February 19, 2025
Authors: Yuan Yao, Xiaopu Zhang, Yu Zhang, Jian Jin, Qiang Yang
cs.AI

Abstract

Semi-supervised heterogeneous domain adaptation (SHDA) addresses learning across domains with distinct feature representations and distributions, where source samples are labeled while only a small fraction of target samples are labeled and the rest are unlabeled. Moreover, there is no one-to-one correspondence between source and target samples. Although various SHDA methods have been developed to tackle this problem, the nature of the knowledge transferred across heterogeneous domains remains unclear. This paper examines this question from an empirical perspective. We conduct extensive experiments on about 330 SHDA tasks, employing two supervised learning methods and seven representative SHDA methods. Surprisingly, our observations indicate that neither the category information nor the feature information of source samples significantly affects performance on the target domain. Additionally, noise drawn from simple distributions, when used as source samples, may contain transferable knowledge. Based on this insight, we perform a series of experiments to uncover the underlying principles of transferable knowledge in SHDA. Specifically, we design a unified Knowledge Transfer Framework (KTF) for SHDA. Based on the KTF, we find that the transferable knowledge in SHDA primarily stems from the transferability and discriminability of the source domain. Consequently, ensuring those properties in source samples, regardless of their origin (e.g., image, text, noise), can enhance the effectiveness of knowledge transfer in SHDA tasks. The code and datasets are available at https://github.com/yyyaoyuan/SHDA.
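
To make the idea of "noise drawn from simple distributions, used as source samples" concrete, the following is a minimal sketch and not the authors' procedure. It assumes one plausible construction: class-conditional Gaussian noise, where each class gets its own random mean so the samples are discriminable. The function name `make_noise_source`, its parameters, and the chosen sizes (10 classes, 300 dimensions) are all hypothetical; the actual generation scheme is in the linked repository.

```python
import numpy as np


def make_noise_source(n_classes, n_per_class, dim, separation=3.0, seed=0):
    """Draw labeled 'source' samples from class-conditional Gaussians.

    Illustrative assumption only: the paper's abstract does not specify
    how its noise source domains are built.
    """
    rng = np.random.default_rng(seed)
    # One random mean per class; a larger `separation` spreads the class
    # means further apart, making the noise classes easier to discriminate.
    means = rng.normal(0.0, separation, size=(n_classes, dim))
    # Labels 0..n_classes-1, each repeated n_per_class times.
    labels = np.repeat(np.arange(n_classes), n_per_class)
    # Each sample is its class mean plus unit-variance Gaussian noise.
    features = means[labels] + rng.normal(0.0, 1.0, size=(labels.size, dim))
    return features, labels


# Hypothetical sizes: 10 classes, 50 samples per class, 300-dimensional features.
X_src, y_src = make_noise_source(n_classes=10, n_per_class=50, dim=300)
print(X_src.shape, y_src.shape)
```

Under these assumptions, `X_src` and `y_src` could stand in for a labeled source domain when feeding an SHDA method, with `separation` acting as a rough knob for how discriminable the synthetic source is.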
