Article

Unsupervised Domain Adaptation Method Based on Relative Entropy Regularization and Measure Propagation

1 Department of Computer Science, Arizona State University, Tempe, AZ 85281, USA
2 Department of Language Science, University of California, Irvine, CA 92697, USA
3 Department of Computer Science, Georgia Institute of Technology, Atlanta, GA 30332, USA
4 Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
5 Department of Computer Science, New York University, Brooklyn, NY 10003, USA
6 Department of Computer Science, University of California, San Diego, CA 92093, USA
* Author to whom correspondence should be addressed.
Entropy 2025, 27(4), 426; https://doi.org/10.3390/e27040426
Submission received: 9 March 2025 / Revised: 10 April 2025 / Accepted: 13 April 2025 / Published: 14 April 2025
(This article belongs to the Section Multidisciplinary Applications)

Abstract

This paper presents a novel unsupervised domain adaptation (UDA) framework that integrates information-theoretic principles to mitigate distributional discrepancies between source and target domains. The proposed method incorporates two key components: (1) relative entropy regularization, which leverages Kullback–Leibler (KL) divergence to align the predicted label distribution of the target domain with a reference distribution derived from the source domain, thereby reducing prediction uncertainty; and (2) measure propagation, a technique that transfers probability mass from the source domain to generate pseudo-measures—estimated probabilistic representations—for the unlabeled target domain. This dual mechanism enhances both global feature alignment and semantic consistency across domains. Extensive experiments on benchmark datasets (OfficeHome and DomainNet) demonstrate that the proposed approach consistently outperforms State-of-the-Art methods, particularly in scenarios with significant domain shifts. These results confirm the robustness, scalability, and theoretical grounding of our framework, offering a new perspective on the fusion of information theory and domain adaptation.

1. Introduction

Unsupervised domain adaptation (UDA) is an important research direction in machine learning, aimed at addressing the problem of distributional differences between source and target domains [1]. In many real-world applications, such as image classification, speech recognition, and natural language processing, acquiring labeled data is costly, and the target domain often lacks labels [2,3]; furthermore, data from the source domain usually cannot be used directly for the target task. UDA improves model performance in the target domain by leveraging labeled information from the source domain and therefore has broad practical applications [4]. For example, in medical image analysis, UDA can help adapt models across hospitals, and in autonomous driving it can adapt to different lighting conditions or viewpoints. Therefore, the study of efficient UDA methods is of significant theoretical and practical importance for advancing cross-domain generalization capabilities.
Although traditional unsupervised domain adaptation (UDA) methods, such as feature alignment and adversarial training, have alleviated distribution discrepancies to some extent, they still face notable limitations. A key challenge lies in the uncertainty of the predicted distribution in the target domain, which often leads to overfitting or reduced accuracy. Moreover, many existing approaches rely on shallow metrics, such as Maximum Mean Discrepancy (MMD) or adversarial loss, to align source and target domains, without incorporating deeper insights from information theory. As a result, these methods may overlook the intrinsic structural relationships between domains, causing critical information to be lost or redundant features to be introduced, ultimately impairing generalization capability in complex scenarios [5].
To address these limitations, there is a growing interest in exploring more principled approaches that can provide deeper theoretical guarantees while capturing both global and semantic structures in cross-domain tasks. Information theory, with its ability to model uncertainty and quantify distributional differences, offers a compelling foundation for designing such methods. However, its integration into domain adaptation frameworks remains underexplored.
In light of this, we propose a novel information-theoretic framework for UDA that explicitly addresses distribution discrepancy, defined as the difference in data distributions between the labeled source domain and the unlabeled target domain. Our method integrates relative entropy regularization and measure propagation to achieve robust domain alignment from an information-theoretic perspective. Specifically, relative entropy regularization employs Kullback–Leibler (KL) divergence to constrain the target domain’s predicted distribution, encouraging consistency with the source domain’s reference distribution and reducing information loss. Meanwhile, measure propagation transfers probability measures from the source to the target domain, constructing pseudo-measures that ensure global consistency in feature space representation [6]. These two components are jointly optimized with a feature extractor and classifier, resulting in improved performance and generalization across diverse target domains [7,8].
The innovations of this paper are summarized as follows:
(1) An unsupervised domain adaptation method based on relative entropy regularization is proposed. KL divergence is used to accurately constrain the distributional difference between the source and target domains from an information-theoretic perspective, overcoming the limitations of traditional distance metrics.
(2) A measure propagation mechanism is introduced, which generates a pseudo-measure for the target domain by propagating the probability measure of the source domain, deepening the application of information theory to global structure modeling of the feature space.
(3) Combining the above components, we construct an information-theory-driven joint optimization framework, which significantly improves the generalization ability of the model in the target domain and offers new ideas for the application of information theory in deep learning.

2. Related Work

2.1. Unsupervised Domain Adaptation

Unsupervised domain adaptation (UDA) aims to address the issue of distribution discrepancies between source and target domains, which is critical in fields like cross-domain image classification and semantic segmentation. Due to the lack of labeled data in the target domain, UDA leverages supervised information from the source domain and unsupervised data from the target domain to train models that generalize well in the target domain [9]. Traditional methods often reduce the distribution gap through feature alignment or adversarial training. In recent years, self-supervised learning and pseudo-labeling techniques have been widely explored in UDA. However, existing methods still face limitations when dealing with out-of-distribution samples or global structure alignment, providing room for further improvement [10].
Unsupervised domain adaptation has gained significant attention from both academia and industry in recent years, becoming one of the key technologies in cross-domain knowledge transfer. For instance, Xu et al. [11] proposed a Transformer-based UDA method, CDTrans, which learns source/target features and aligns domains using a shared-weight three-branch Transformer framework. To generate accurate pseudo-labels, the paper also introduced a bidirectional center-aware labeling algorithm, optimizing the pseudo-label generation process for target domain samples. Mirza et al. [12] introduced a dynamic unsupervised adaptation method, DUA, which continuously adapts the statistics of the batch normalization layer to adjust feature representations. This approach achieves significant performance improvements with minimal computational overhead, making it suitable for any network architecture using batch normalization. Huang et al. [13] proposed a novel category contrast technique, CaCo, which introduces semantic priors into instance contrast learning to improve UDA performance. CaCo constructs a semantic-aware dictionary and assigns pseudo-category labels to target domain samples based on source domain category priors, facilitating category-discriminative and domain-invariant feature representations.
In summary, recent work on unsupervised domain adaptation (UDA) has focused on challenges such as the lack of labeled data in the target domain, feature alignment, and category discrimination. While existing approaches, such as adversarial learning and self-supervised strategies, have demonstrated effectiveness, they often fall short in modeling distributional uncertainty and preserving global structural information. In contrast, the proposed method introduces relative entropy regularization and measure propagation, which directly address these limitations by leveraging information-theoretic principles. Relative entropy regularization constrains the predictive uncertainty in the target domain, while measure propagation facilitates global feature alignment through probability-based modeling. These innovations fill critical gaps in current UDA techniques, offering a theoretically grounded and practically effective solution for improving cross-domain generalization across diverse tasks.

2.2. Information Theory in Deep Learning

In recent years, the application of information theory in deep learning has garnered significant attention, especially in understanding and optimizing deep neural networks [14,15]. Classical concepts of information theory, such as mutual information, KL divergence, and entropy, have been incorporated into the training processes of deep learning models to design more effective loss functions and regularization methods [16,17]. For instance, the information bottleneck method optimizes the flow of information within deep networks by minimizing the loss of information between the input and output, thereby improving the model’s generalization ability [18,19]. Moreover, information theory has been applied to network structure design and adjustments by quantifying information transfer and compression within networks, further enhancing model efficiency and expressive power.
At the same time, the application of information theory has achieved notable progress in fields such as unsupervised learning and Generative Adversarial Networks (GANs). In unsupervised learning, information theory is used for analyzing the stability of adversarial learning and structuring the latent space, particularly in Variational Autoencoders (VAEs). Here, KL divergence is widely used as part of the loss function to optimize the distribution of the latent representation, enabling the generative model to approximate the true data distribution effectively. For example, Wu et al. [20] proposed a robust dynamic semi-supervised symmetric regularization image blur clustering algorithm, which incorporates KL divergence and spatial information constraints. By introducing weighted squared Euclidean distance and maximum entropy fuzzy clustering, this method improves the robustness and performance of image segmentation in high-noise environments. Sanokowski et al. [21] proposed a new method that uses an upper bound of the reverse Kullback–Leibler divergence loss function. This method breaks through the traditional generative model’s requirement for exact sample likelihoods, enabling the use of highly expressive latent variable models like diffusion models. The method does not require training data and allows for data-independent sample learning in areas like combinatorial optimization.
In the context of GANs, information-theoretic methods help clarify the adversarial game between the generator and discriminator. By introducing information gain, these methods improve the generative ability and stability of the model. Lee et al. [22] introduced a new framework that simultaneously alleviates catastrophic forgetting in the discriminator and mode collapse in the generator through contrastive learning and mutual information maximization in adversarial learning. Li et al. [23] applied information theory and GANs to recommendation algorithms, proposing a fairness-aware learning algorithm, FairGAN. FairGAN maps exposure fairness issues to negative bias in implicit feedback data. It dynamically generates fairness signals using a novel fairness-aware learning strategy, optimizing the ranking search direction while ensuring fairness in exposure distribution across items, without compromising user utility.

3. Method

In this study, we propose a novel unsupervised domain adaptation method that addresses the distributional gap between a labeled source domain and an unlabeled target domain by integrating two complementary information-theoretic components: relative entropy regularization and measure propagation. The model architecture is shown in Figure 1.
The first component, relative entropy regularization, is formulated using Kullback–Leibler (KL) divergence to minimize the discrepancy between the predicted label distribution in the target domain and a reference distribution constructed from the source domain. This regularization encourages the model to generate predictions in the target domain that are statistically consistent with the knowledge obtained from the labeled source domain, thereby reducing uncertainty and enhancing model robustness.
To further address the structural misalignment in the feature space, we incorporate a second component: measure propagation. This mechanism aims to build an estimated distribution, or pseudo-measure, for the target domain by transferring probability information from the source domain. Specifically, the probability distribution of source features is adjusted using a learned transformation function to approximate the distribution of target features. This transformation is modeled as a density ratio, which quantifies how the source distribution should be reshaped to resemble the target one. The density ratio is implemented via a neural network that outputs a scaling factor for each target feature, enabling flexible and data-driven alignment.
The optimization process is composed of three loss functions: (1) a classification loss on the labeled source data, (2) a KL divergence loss enforcing consistency between the predicted target distribution and the source-derived reference distribution, and (3) an adversarial loss that promotes similarity between the propagated pseudo-measure and the actual distribution of the target domain. These components are jointly optimized during training to update the feature extractor, classifier, and auxiliary networks. This unified framework results in more stable cross-domain generalization and improved alignment at both the distributional and structural levels [24,25].

3.1. Information-Theoretic Constraint via Relative Entropy Regularization

In unsupervised domain adaptation (UDA), the distributional difference between the source domain $D_s = \{(x_s^i, y_s^i)\}_{i=1}^{N_s}$ and the target domain $D_t = \{x_t^j\}_{j=1}^{N_t}$, i.e., $P_s(x, y) \neq P_t(x, y)$, is a central challenge. Traditional methods usually address the problem through feature alignment or adversarial training, but they struggle to accurately control the prediction behavior of the target domain because the target domain has no labels. Therefore, we propose a regularization strategy based on relative entropy (i.e., KL divergence) that achieves domain adaptation by constraining the consistency between the target domain distribution and a source-domain-derived reference distribution.
We define a feature extractor $G: X \to Z$ with parameters $\theta_G$, which maps an input $x \in X$ to the feature space $Z$, yielding $z = G(x)$. The classifier $C: Z \to \mathbb{R}^K$ with parameters $\theta_C$ takes $z$ as input and outputs a probability distribution over $K$ categories, $P(y \mid z) = C(z)$. The source domain has supervised data, so the feature distribution $P_Z^s(z)$ and the conditional distribution $P_s(y \mid z)$ can be defined directly. For the target domain, only $P_Z^t(z)$ can be obtained through $G(x_t)$, while $P_t(y \mid z)$ is unknown because the target domain has no labels. Our goals are as follows: (1) to train $G$ and $C$ on the source domain to minimize the classification error; (2) to ensure that the joint distribution $P_t(z, y) = P_Z^t(z) P_t(y \mid z)$ of the target domain is aligned with the source domain knowledge. To this end, we introduce a reference distribution $Q(z, y)$ and regularize $P_t(z, y)$ with relative entropy to make it close to $Q(z, y)$. The model architecture of this section is shown in Figure 2.
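To make the notation concrete, the following minimal PyTorch sketch instantiates $G$ and $C$. The ResNet50 backbone matches the experimental setup in Section 5.1; the 256-dimensional feature space, the projection layer, and the use of OfficeHome's 65 classes as the default head size are illustrative assumptions rather than details given in the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

class FeatureExtractor(nn.Module):
    """G: X -> Z. Here (as an assumption) an ImageNet-pretrained ResNet50 plus a projection."""
    def __init__(self, feat_dim=256):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # drop the final fc layer
        self.proj = nn.Linear(backbone.fc.in_features, feat_dim)       # z = G(x)

    def forward(self, x):
        h = self.encoder(x).flatten(1)
        return self.proj(h)

class Classifier(nn.Module):
    """C: Z -> probability simplex over K classes, i.e. P(y|z) = C(z)."""
    def __init__(self, feat_dim=256, num_classes=65):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, z):
        return torch.softmax(self.fc(z), dim=1)
```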
The reference distribution $Q(z, y)$ is constructed directly from the source domain and is defined as follows:
$$Q(z, y) = P_Z^s(z)\, P_s(y \mid z)$$
where $P_Z^s(z)$ is the empirical distribution of the source domain features:
$$P_Z^s(z) = \frac{1}{N_s} \sum_{i=1}^{N_s} \delta\big(z - G(x_s^i)\big)$$
Here $\delta$ is the Dirac delta function, and $P_s(y \mid z) = C(G(x_s))$ is the classifier's prediction on the source domain features. This form is the simplest and most direct choice, making full use of the empirical distribution of the source domain and the classifier output while avoiding additional complexity. The joint distribution of the target domain is as follows:
$$P_t(z, y) = P_Z^t(z)\, P_t(y \mid z)$$
where $P_Z^t(z) = \frac{1}{N_t} \sum_{j=1}^{N_t} \delta\big(z - G(x_t^j)\big)$ and $P_t(y \mid z) = C(G(x_t))$. We want to minimize the relative entropy between $P_t(z, y)$ and $Q(z, y)$:
$$L_{re} = D_{KL}\big(P_t(z, y) \,\|\, Q(z, y)\big)$$
which expands to the following:
$$L_{re} = \int P_t(z, y) \log \frac{P_t(z, y)}{Q(z, y)} \, dz \, dy$$
Substituting into the joint distribution, we can see the following:
$$L_{re} = \int P_Z^t(z)\, P_t(y \mid z) \log \frac{P_Z^t(z)\, P_t(y \mid z)}{P_Z^s(z)\, P_s(y \mid z)} \, dz \, dy$$
According to the properties of logarithms and the expected decomposition, we can obtain the following:
$$L_{re} = \int P_Z^t(z) \log \frac{P_Z^t(z)}{P_Z^s(z)} \, dz + \int P_Z^t(z) \left[ \int P_t(y \mid z) \log \frac{P_t(y \mid z)}{P_s(y \mid z)} \, dy \right] dz$$
which can be equivalently written as follows:
$$L_{re} = D_{KL}\big(P_Z^t(z) \,\|\, P_Z^s(z)\big) + \mathbb{E}_{z \sim P_Z^t}\big[ D_{KL}\big(P_t(y \mid z) \,\|\, P_s(y \mid z)\big) \big]$$
The first term $D_{KL}(P_Z^t(z) \,\|\, P_Z^s(z))$ measures the difference in feature distributions, and the second term $\mathbb{E}_{z \sim P_Z^t}[D_{KL}(P_t(y \mid z) \,\|\, P_s(y \mid z))]$ ensures that the conditional distributions are consistent. Since directly computing the continuous integral is not feasible, we approximate it with empirical samples. The first term is as follows:
$$D_{KL}\big(P_Z^t(z) \,\|\, P_Z^s(z)\big) \approx \frac{1}{N_t} \sum_{j=1}^{N_t} \log \frac{P_Z^t\big(G(x_t^j)\big)}{P_Z^s\big(G(x_t^j)\big)}$$
However, accurately estimating $P_Z^s(z)$ and $P_Z^t(z)$ requires density estimation, which increases the computational burden. For simplicity, we assume that $P_Z^s$ and $P_Z^t$ have been aligned by $G$ through the mechanism introduced later (see Section 3.2), making the first term less influential. Therefore, we focus on optimizing the second term:
$$L_{re} \approx \mathbb{E}_{z \sim P_Z^t}\big[ D_{KL}\big(P_t(y \mid z) \,\|\, P_s(y \mid z)\big) \big]$$
The empirical approximation is as follows:
$$L_{re} \approx \frac{1}{N_t} \sum_{j=1}^{N_t} D_{KL}\Big( C\big(G(x_t^j)\big) \,\Big\|\, P_s\big(y \mid G(x_t^j)\big) \Big)$$
Since a target domain sample $G(x_t^j)$ has no corresponding $P_s(y \mid G(x_t^j))$, we use a source-domain nearest-neighbor approximation: for each $x_t^j$, we find the nearest source domain sample $x_s^{k(j)} = \arg\min_{x_s^i} \big\| G(x_t^j) - G(x_s^i) \big\|_2$ and define:
$$P_s\big(y \mid G(x_t^j)\big) \approx C\big(G(x_s^{k(j)})\big)$$
So, we can obtain the following:
$$L_{re} \approx \frac{1}{N_t} \sum_{j=1}^{N_t} \sum_{k=1}^{K} C\big(G(x_t^j)\big)_k \log \frac{C\big(G(x_t^j)\big)_k}{C\big(G(x_s^{k(j)})\big)_k}$$
In summary, relative entropy regularization introduces KL divergence as an information-theory tool to constrain the consistency of the predicted distribution of the target domain and the reference distribution of the source domain, effectively reducing the distributional difference problem in unsupervised domain adaptation [25]. This method uses the supervised information of the source domain and the unlabeled data of the target domain, and through nearest neighbor approximation and empirical sample optimization, it significantly improves the robustness and prediction accuracy of the model in the target domain, providing a solid foundation for information-theory-driven domain adaptation.
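As an illustration of the empirical objective above, the following sketch computes $L_{re}$ with the nearest-source-neighbor reference. The batched computation, the detached reference predictions, and the small constant `eps` are implementation assumptions rather than details given in the paper.

```python
import torch

def relative_entropy_loss(G, C, x_s, x_t, eps=1e-8):
    """Empirical L_re: KL(C(G(x_t)) || C(G(x_s_nn))), where x_s_nn is the nearest
    source sample in feature space (Euclidean distance), as in Section 3.1."""
    with torch.no_grad():
        z_s = G(x_s)                  # source features (reference is not back-propagated, an assumption)
        p_s = C(z_s)                  # P_s(y|z) evaluated on source samples
    z_t = G(x_t)
    p_t = C(z_t)                      # target predictions C(G(x_t))

    # nearest source neighbor for every target feature: argmin_i ||z_t - z_s||_2
    dists = torch.cdist(z_t, z_s)     # (N_t, N_s) pairwise Euclidean distances
    nn_idx = dists.argmin(dim=1)
    p_ref = p_s[nn_idx]               # P_s(y | G(x_t^j)) approximated by C(G(x_s^{k(j)}))

    # KL(p_t || p_ref), averaged over the target batch
    kl = (p_t * (torch.log(p_t + eps) - torch.log(p_ref + eps))).sum(dim=1)
    return kl.mean()
```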

3.2. Measure Propagation

The goal of measure propagation is to derive a pseudo-measure $\tilde{P}_Z^t(z)$ for the target domain from the source measure $P_Z^s(z)$ and to make the actual target feature distribution $P_Z^t(z)$ close to $\tilde{P}_Z^t(z)$ by optimizing $G$, thereby achieving knowledge transfer between domains. The model architecture is shown in Figure 3.
The core idea of measure propagation is to take the probability measure of the source domain and propagate it to the target domain through a transformation, generating a reference distribution consistent with the target domain samples. We define a measure propagator $T: \mathcal{P}(Z) \times Z \to \mathcal{P}(Z)$, which takes the source domain measure $P_Z^s(z)$ and the target domain features $Z_t = \{G(x_t^j)\}_{j=1}^{N_t}$ as inputs and outputs a pseudo-measure:
$$\tilde{P}_Z^t(z) = T\big(P_Z^s(z), Z_t\big)$$
To simplify the implementation, we choose the simplest form based on a density ratio. Assume that the target domain distribution $P_Z^t(z)$ is related to the source domain distribution $P_Z^s(z)$ through a density ratio function $r(z)$:
$$P_Z^t(z) = r(z)\, P_Z^s(z)$$
Then, the pseudo-measure can be defined as follows:
$$\tilde{P}_Z^t(z) = r(z)\, P_Z^s(z)$$
where $r(z)$ represents the distribution change ratio from the source domain to the target domain. There are many possible ways to estimate $r(z)$ directly, such as parametric models or non-parametric estimation. To keep it simple, we choose a neural network $R: Z \to \mathbb{R}^+$ with parameters $\theta_R$ to directly predict $r(z) = R(z)$, regarding it as a scalar function of $z$. The source domain feature distribution is the empirical distribution:
$$P_Z^s(z) = \frac{1}{N_s} \sum_{i=1}^{N_s} \delta\big(z - G(x_s^i)\big)$$
So, the pseudo-measure is as follows:
$$\tilde{P}_Z^t(z) = R(z) \cdot \frac{1}{N_s} \sum_{i=1}^{N_s} \delta\big(z - G(x_s^i)\big)$$
The actual feature distribution of the target domain is as follows:
$$P_Z^t(z) = \frac{1}{N_t} \sum_{j=1}^{N_t} \delta\big(z - G(x_t^j)\big)$$
Our goal is to make $P_Z^t(z)$ as close to $\tilde{P}_Z^t(z)$ as possible by optimizing $G$ and $R$. The optimization objective uses KL divergence to measure the difference between the two:
$$L_{mp} = D_{KL}\big(P_Z^t(z) \,\|\, \tilde{P}_Z^t(z)\big)$$
Expanding and inserting the definition provides the following:
$$L_{mp} = \int P_Z^t(z) \log \frac{P_Z^t(z)}{R(z)\, P_Z^s(z)} \, dz$$
Then, split the formula into three terms to obtain the following:
$$L_{mp} = \int P_Z^t(z) \log P_Z^t(z) \, dz - \int P_Z^t(z) \log R(z) \, dz - \int P_Z^t(z) \log P_Z^s(z) \, dz$$
The first term is the (negative) entropy of $P_Z^t(z)$, which has no direct contribution to the optimization and can be ignored. The second and third terms are as follows:
$$L_{mp} = -\,\mathbb{E}_{z \sim P_Z^t}\big[\log R(z)\big] - \mathbb{E}_{z \sim P_Z^t}\big[\log P_Z^s(z)\big]$$
The empirical approximation is as follows:
$$L_{mp} \approx -\frac{1}{N_t} \sum_{j=1}^{N_t} \log R\big(G(x_t^j)\big) - \frac{1}{N_t} \sum_{j=1}^{N_t} \log \left( \frac{1}{N_s} \sum_{i=1}^{N_s} \delta\big(G(x_t^j) - G(x_s^i)\big) \right)$$
Since the $\delta$ function is zero whenever $G(x_t^j) \neq G(x_s^i)$, the third term (involving $\log P_Z^s(z)$) is difficult to compute directly. Moreover, the feature distributions are already aligned by the regularization in Section 3.1, so we focus on optimizing $R(z)$ and $G$:
$$L_{mp} \approx -\frac{1}{N_t} \sum_{j=1}^{N_t} \log R\big(G(x_t^j)\big)$$
However, this alone is not enough to constrain $R(z)$. We therefore introduce a discriminator $D: Z \to [0, 1]$ with parameters $\theta_D$ to distinguish $z \sim P_Z^s$ from $z \sim \tilde{P}_Z^t$. The optimization objective is as follows:
$$L_D = \mathbb{E}_{z \sim P_Z^s}\big[\log D(z)\big] + \mathbb{E}_{z \sim \tilde{P}_Z^t}\big[\log\big(1 - D(z)\big)\big]$$
The empirical form is as follows:
$$L_D = \frac{1}{N_s} \sum_{i=1}^{N_s} \log D\big(G(x_s^i)\big) + \frac{1}{N_t} \sum_{j=1}^{N_t} R\big(G(x_t^j)\big) \log\Big(1 - D\big(G(x_t^j)\big)\Big)$$
The goal of G and R is to deceive D, that is, to minimize the following:
$$L_{mp} = \frac{1}{N_t} \sum_{j=1}^{N_t} R\big(G(x_t^j)\big) \log\Big(1 - D\big(G(x_t^j)\big)\Big)$$
Measure propagation generates pseudo-measures of the target domain by propagating the probability measure of the source domain, achieving global alignment of the feature space and enhancing the distribution consistency of unsupervised domain adaptation. This method uses a simple form of density ratio and adversarial training to optimize the feature extractor and discriminator, significantly improving the generalization ability of the model in complex cross-domain tasks and deepening the application of information theory in global structure modeling.
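A minimal sketch of the measure-propagation components is given below, assuming the same 256-dimensional feature space as in the earlier sketch; the hidden layer sizes, the softplus output of $R$, and detaching $R$ during the discriminator update are illustrative choices not specified in the text.

```python
import torch
import torch.nn as nn

class DensityRatio(nn.Module):
    """R: Z -> R^+, predicts the density ratio r(z); softplus keeps the output positive."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, z):
        return nn.functional.softplus(self.net(z)).squeeze(1)

class Discriminator(nn.Module):
    """D: Z -> [0, 1], separates source features from the propagated pseudo-measure."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, z):
        return torch.sigmoid(self.net(z)).squeeze(1)

def discriminator_loss(D, R, z_s, z_t, eps=1e-8):
    """L_D: source features scored as 'real', R-weighted target features as 'pseudo'.
    D maximizes L_D, so we return its negative for gradient descent."""
    real = torch.log(D(z_s) + eps).mean()
    fake = (R(z_t).detach() * torch.log(1.0 - D(z_t) + eps)).mean()
    return -(real + fake)

def measure_propagation_loss(D, R, z_t, eps=1e-8):
    """L_mp for G and R: minimize R(z) * log(1 - D(z)) on target features to deceive D."""
    return (R(z_t) * torch.log(1.0 - D(z_t) + eps)).mean()
```

Following the empirical term derived earlier in this subsection, the quantity $-\frac{1}{N_t}\sum_{j}\log R(G(x_t^j))$ can also be added to the objective if an additional constraint on the density ratio is desired.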

3.3. Overall Loss Function

In our unsupervised domain adaptation framework, the overall loss function combines multiple information-theory-driven components to optimize the distribution alignment of the source and target domains. The core part is the source domain cross-entropy loss $L_{cls}$, which minimizes the classification error and is defined as follows:
$$L_{cls} = -\frac{1}{N_s} \sum_{i=1}^{N_s} \log C\big(G(x_s^i)\big)_{y_s^i}$$
where $N_s$ is the number of source domain samples, $x_s^i$ and $y_s^i$ are the source domain samples and labels, respectively, and $C(G(x_s^i))$ is the classifier's predicted probability. Combined with the relative entropy regularization loss $L_{re}$, the consistency between the target domain prediction distribution and the source domain reference distribution is constrained by the KL divergence, and its empirical approximation is as follows:
$$L_{re} \approx \frac{1}{N_t} \sum_{j=1}^{N_t} \sum_{k=1}^{K} C\big(G(x_t^j)\big)_k \log \frac{C\big(G(x_t^j)\big)_k}{C\big(G(x_s^{k(j)})\big)_k}$$
where $N_t$ is the number of target domain samples, $x_t^j$ is a target domain sample, and $x_s^{k(j)}$ is its nearest-neighbor source domain sample. In addition, the measure propagation loss $L_{mp}$ optimizes the alignment of the target domain pseudo-measure with the source domain measure through adversarial training and is defined as follows:
$$L_{mp} = \frac{1}{N_t} \sum_{j=1}^{N_t} R\big(G(x_t^j)\big) \log\Big(1 - D\big(G(x_t^j)\big)\Big)$$
where D is the discriminator and R is the density ratio network. The final overall loss function is as follows:
$$L = L_{cls} + \lambda_{re} L_{re} + \lambda_{mp} L_{mp}$$
By balancing the contributions of the three terms through the hyperparameters $\lambda_{re}$ and $\lambda_{mp}$, the feature extractor $G$ and classifier $C$ are jointly optimized to achieve information-theory-driven cross-domain generalization.
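Putting the pieces together, a sketch of one joint training step under the overall loss is shown below. It reuses the `relative_entropy_loss`, `discriminator_loss`, and `measure_propagation_loss` sketches from Sections 3.1 and 3.2; the alternating discriminator/main update and the default weights $\lambda_{re} = \lambda_{mp} = 0.5$ (taken from Section 5.4) are assumptions about the training loop, which the paper does not spell out.

```python
import torch
import torch.nn.functional as F

def train_step(G, C, R, D, opt_main, opt_disc, x_s, y_s, x_t,
               lambda_re=0.5, lambda_mp=0.5, eps=1e-8):
    """One joint update following L = L_cls + lambda_re * L_re + lambda_mp * L_mp."""
    # 1) update the discriminator D on detached features
    with torch.no_grad():
        z_s_d, z_t_d = G(x_s), G(x_t)
    opt_disc.zero_grad()
    loss_d = discriminator_loss(D, R, z_s_d, z_t_d)
    loss_d.backward()
    opt_disc.step()

    # 2) update G, C, and R with the combined objective
    opt_main.zero_grad()
    z_s, z_t = G(x_s), G(x_t)
    log_p_s = torch.log(C(z_s) + eps)
    loss_cls = F.nll_loss(log_p_s, y_s)                 # L_cls: cross-entropy on labeled source data
    loss_re = relative_entropy_loss(G, C, x_s, x_t)     # L_re: nearest-neighbor KL term
    loss_mp = measure_propagation_loss(D, R, z_t)       # L_mp: adversarial measure propagation term
    loss = loss_cls + lambda_re * loss_re + lambda_mp * loss_mp
    loss.backward()
    opt_main.step()
    return loss.item()
```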

4. Datasets and Evaluation Metrics

4.1. Dataset Introduction

In this section, we provide an overview of the datasets used in this study. Specifically, two datasets are utilized: OfficeHome and DomainNet. These datasets are widely recognized in the domain adaptation research community and serve as benchmarks for evaluating various algorithms in the context of unsupervised domain adaptation. The OfficeHome dataset consists of a collection of images from different office-related domains, making it ideal for evaluating the performance of domain adaptation methods across diverse visual domains. DomainNet, on the other hand, is a more challenging dataset, comprising a large number of categories across six different domains, including real-world images, clipart, and sketches, among others. Both datasets offer unique challenges in terms of domain shift and data distribution, making them well-suited for testing the generalization ability of domain adaptation models.
(A) OfficeHome
The OfficeHome dataset is a widely used benchmark in the field of domain adaptation, consisting of images from four distinct domains: Art, Clipart, Product, and Real World. Each domain contains images from 65 object categories, with a total of 15,500 images. The diversity of the domains makes it particularly suitable for evaluating algorithms in terms of their ability to handle different types of visual data, ranging from digital artworks to real-world photos. This dataset is specifically designed to assess the effectiveness of domain adaptation methods in addressing the challenge of domain shift, as the images across these domains exhibit significant differences in style, background, and content. The complexity of this dataset has made it a popular choice for testing models that aim to generalize across various domains while maintaining high performance. An example of its dataset is shown in Figure 4.
(B) DomainNet
DomainNet is a large-scale dataset designed to evaluate domain adaptation techniques, comprising over 600,000 images spread across six distinct domains: Real, Clipart, Infograph, Painting, QuickDraw, and Sketch. These domains cover a wide range of visual data types, from realistic photos to hand-drawn sketches and digital drawings, providing a diverse set of challenges for domain adaptation algorithms. DomainNet contains 345 object categories, making it one of the most comprehensive datasets for testing models in the context of cross-domain generalization. The significant variations in image style, content, and visual representation between these domains present a formidable challenge for domain adaptation, making DomainNet an ideal benchmark for assessing the robustness and versatility of domain adaptation methods. An example of its dataset is shown in Figure 5.

4.2. Evaluation Metrics

In this study, the performance of the domain adaptation model is evaluated using accuracy (ACC) as the primary metric. Accuracy measures the proportion of correctly classified samples in the target domain, reflecting the model’s ability to generalize to the target domain after adapting from the source domain. Specifically, accuracy is calculated by comparing the predicted labels with the true labels in the target domain. The accuracy is formulated as follows:
$$Acc = \frac{1}{N_T} \sum_{i=1}^{N_T} \mathbb{I}\big(\hat{y}_i^T = y_i^T\big)$$
where $N_T$ is the number of samples in the target domain, $y_i^T$ is the true label of the $i$-th target domain sample, $\hat{y}_i^T$ is the predicted label, and $\mathbb{I}$ is the indicator function, equal to 1 if the prediction is correct ($\hat{y}_i^T = y_i^T$) and 0 otherwise.
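For completeness, a small sketch of this metric evaluated over a target-domain loader is given below; the loader and device handling are assumptions.

```python
import torch

def target_accuracy(G, C, loader, device="cuda"):
    """Acc = (1/N_T) * sum of 1[pred == true] over the target test set."""
    correct, total = 0, 0
    G.eval(); C.eval()
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            pred = C(G(x)).argmax(dim=1)          # predicted label for each target sample
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total
```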

5. Experiment

To comprehensively evaluate the proposed unsupervised domain adaptation method based on relative entropy regularization and measure propagation, a series of experiments are designed in this chapter. These include comparison experiments to validate the model’s performance on the target domain, hyperparameter sensitivity experiments to analyze the impact of key parameters on the results, ablation experiments to explore the role of each component, and visualization experiments to visually demonstrate the feature distribution and distribution alignment effects. These experiments are conducted on multiple benchmark datasets, aiming to systematically verify the method’s effectiveness, robustness, and theoretical advantages.

5.1. Experimental Details

In the experiments, the model is trained for 300 epochs using a ResNet50 pre-trained on ImageNet as the backbone to extract deep features. The Adam optimizer is used with an initial learning rate of 0.001, which is dynamically adjusted via cosine annealing. The batch size is set to 32, and all experiments are conducted on a computing platform equipped with an NVIDIA 4090D GPU to ensure computational efficiency and consistency. The newly added layers are randomly initialized and fine-tuned together with the backbone, and hyperparameters, such as the relative entropy regularization weight and the measure propagation weight, are tuned on the validation set to ensure the fairness and reproducibility of the experimental results.
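A sketch of this optimization setup is shown below. The Adam optimizer, the 0.001 initial learning rate, cosine annealing, 300 epochs, and batch size 32 follow the description above; the separate discriminator optimizer, its learning rate, and the loader names are assumptions.

```python
import torch

# Models from the earlier sketches (256-dimensional features, 65 OfficeHome classes assumed).
G, C = FeatureExtractor(256).cuda(), Classifier(256, 65).cuda()
R, D = DensityRatio(256).cuda(), Discriminator(256).cuda()

opt_main = torch.optim.Adam(
    list(G.parameters()) + list(C.parameters()) + list(R.parameters()), lr=1e-3)
opt_disc = torch.optim.Adam(D.parameters(), lr=1e-3)   # discriminator lr is an assumption
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(opt_main, T_max=300)

for epoch in range(300):
    # source_loader / target_loader are assumed DataLoaders with batch size 32
    for (x_s, y_s), (x_t, _) in zip(source_loader, target_loader):
        x_s, y_s, x_t = x_s.cuda(), y_s.cuda(), x_t.cuda()
        train_step(G, C, R, D, opt_main, opt_disc, x_s, y_s, x_t)
    scheduler.step()
```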

5.2. Performance Comparison Experiment

The evaluation protocol of the comparison experiment follows the standard unsupervised domain adaptation task. For the OfficeHome dataset, each of its four domains (A, C, P, and R) is adapted to each of the other three domains. Each result is averaged over three runs with different random seeds, and the average accuracy over the 12 transfer tasks is used as the overall evaluation on the dataset.
The DomainNet dataset contains six domains. Because the Infograph and QuickDraw domains differ substantially from the others and it is difficult to obtain good feature representations for them with a ResNet50 backbone, only four of the six domains (Real, Painting, Sketch, Clipart) are selected and tested with the same protocol as OfficeHome. The overall evaluation indicator is again the average accuracy over the 12 pairs of tasks. The experimental results are shown in Table 1.
The experimental results demonstrate that our proposed unsupervised domain adaptation method based on relative entropy regularization and measure propagation achieves an average accuracy of 72.3% on the OfficeHome dataset, outperforming all baseline models, including ToAlign (71.8%). This result confirms the overall effectiveness of our method in enhancing cross-domain generalization.
From an information-theoretic standpoint, relative entropy regularization effectively constrains the predicted label distribution of the target domain to align with a reference distribution derived from the source domain via KL divergence. This helps reduce prediction uncertainty in the target domain, especially in domain pairs where category-level alignment is critical (e.g., A2P and C2P). On the other hand, measure propagation promotes global consistency in feature representation by transferring probability mass from the source to the target domain, which is particularly beneficial in domain pairs such as R2P and P2R, where structural differences in the feature space are more pronounced.
Notably, our method excels on several domain pairs with relatively moderate distribution shifts, including A2P (77.5%), C2P (76.5%), and R2P (85.2%). These results suggest that our method effectively captures both local label consistency and global feature structure in such settings. However, it does not achieve the best results on more challenging pairs like A2C (59.0%), C2A (66.5%), and P2C (57.5%), where the domain shift is more complex and localized. In these cases, the current formulation of KL divergence and the pseudo-measure approximation may lack the granularity needed for fine-grained adaptation, limiting performance.
Despite these limitations, the joint optimization of relative entropy regularization and measure propagation provides a strong balance between alignment accuracy and structural generalization. This synergy results in consistently competitive performance across a wide range of domain pairs, validating the robustness and theoretical soundness of the proposed method.
This paper also gives the Grad-CAM image of the OfficeHome dataset, as shown in Figure 6.
The Grad-CAM [40,41] visualizations illustrate the behavior of our proposed unsupervised domain adaptation method, based on relative entropy regularization and measure propagation, on the OfficeHome target domain for six examples. Each example includes the original image, the Grad-CAM heat map, and the superimposed image. The heat maps focus on the key regions of the target object, and the superimposed images clearly show the parts the model attends to. This indicates that relative entropy regularization constrains the consistency of the target domain distribution with the source domain through KL divergence, while measure propagation enhances the discriminative ability of the features through probability measure alignment, enabling accurate category recognition in the target domain and verifying the effectiveness of the method. This paper also reports the experimental results on the DomainNet dataset, as shown in Table 2.
The experimental results show that our proposed unsupervised domain adaptation method based on relative entropy regularization and measure propagation achieves an average performance of 46.2% on the DomainNet dataset, significantly outperforming other baseline models (such as ToAlign with 45.4%). This verifies the effectiveness of our method in handling large-scale, cross-domain tasks with strong distribution heterogeneity. The complexity of DomainNet, with larger distributional differences between domains, imposes higher demands on information-theoretic methods. Relative entropy regularization precisely constrains the consistency between the predicted distribution of the target domain and the reference distribution of the source domain using KL divergence, reducing information loss. Measure propagation, by propagating the source domain’s probability measures to generate pseudo-measures for the target domain, enhances the global alignment of the feature space. The method performs exceptionally well on challenging domain pairs such as A2C (51.2%) and P2A (57.7%), contributing to the overall improvement in average performance.
Similarly, our method does not perform the best on all domain pairs in the DomainNet dataset. For example, it does not perform as well as some models (such as PAN and FixBi) on A2P (49.6%), C2P (34.2%), and P2R (36.4%). This may be due to the larger distribution heterogeneity and noise in the DomainNet dataset, where relative entropy regularization and measure propagation may require stronger local adaptation capabilities in some extreme scenarios. Nevertheless, the overall robustness of the method benefits from the information-theory-driven joint optimization. The use of KL divergence to quantify distributional differences and measure propagation to model global structure allows the method to maintain significant performance advantages on large-scale, complex datasets, highlighting the applicability and scalability of the theoretical design.

5.3. Ablation Experiment

To comprehensively evaluate the contribution of each component in our proposed unsupervised domain adaptation method, based on relative entropy regularization and measure propagation, this section presents ablation experiments on the OfficeHome dataset. The focus is on analyzing the performance of four representative domain pairs: A2P, C2P, P2A, and R2C. By systematically removing key modules (such as relative entropy regularization or measure propagation), we verify the contribution of each part to the overall model performance. The experimental results are shown in Table 3.
The ablation experiment results show that our proposed unsupervised domain adaptation method, based on relative entropy regularization and measure propagation, performs excellently on the A2P, C2P, P2A, and R2C domain pairs of the OfficeHome dataset. The complete model (ours) achieves the best performance across all domain pairs, with A2P reaching 77.5%, C2P 76.5%, P2A 68.0%, and R2C 61.8%. After removing relative entropy regularization, A2P drops to 75.2%, C2P to 74.3%, P2A to 65.7%, and R2C to 59.4%. This indicates that KL divergence plays a critical role in constraining the consistency between the target domain’s predicted distribution and the source domain’s reference distribution. The impact is especially noticeable in A2P and C2P, where there are large distributional differences, validating the necessity of information-theoretic alignment.
After removing measure propagation, A2P drops to 76.0%, C2P to 75.1%, P2A to 66.5%, and R2C to 60.3%. This shows that measure propagation, by propagating the source domain’s probability measures to generate pseudo-measures for the target domain, makes a significant contribution to the global alignment of the feature space, especially in domain pairs with higher heterogeneity, such as P2A and R2C. Although performance decreases when either of the two modules is removed, the collaborative effect of the complete model significantly improves performance, fully demonstrating the complementary advantages of relative entropy regularization and measure propagation in information-theory-driven domain adaptation.
Secondly, for the A2P task, this paper selected 10 categories, with 50 samples in each category, and used T-SNE [46] to visualize the results of three different ablation experiments. The experimental results are shown in Figure 7.
The T-SNE visualization in the experimental results shows the feature distribution of our proposed unsupervised domain adaptation method, based on relative entropy regularization and measure propagation, on the A2P task of the OfficeHome dataset, involving 10 categories (from Alarm Clock to Mug). The complete model (ours, with an accuracy of 77.5%) exhibits the clearest category clustering in its feature distribution (Figure 7c), with compact clusters and clear boundaries for the 10 categories. This indicates that relative entropy regularization constrains the consistency between the target domain and the source domain distributions using KL divergence, while measure propagation achieves global feature space alignment by propagating probability measures, significantly enhancing the separability between categories and reflecting the method’s optimal performance.
After removing relative entropy regularization (Figure 7a, accuracy 75.2%) or measure propagation (Figure 7b, accuracy 76.0%), the clustering effect of the feature distribution significantly worsens. In Figure 7a, the category clusters become more scattered and partially overlap, indicating that the absence of the KL divergence constraint increases the deviation between the target domain’s predicted distribution and the source domain’s reference distribution, exacerbating information loss. In Figure 7b, although the category distribution is more concentrated compared to Figure 7a, it is still more dispersed than in the complete model, indicating the indispensable role of measure propagation in global feature alignment. The performance degradation in these two figures validates the complementary role of relative entropy regularization and measure propagation in information-theory-driven domain adaptation.

5.4. Hyperparameter Sensitivity Experiments

In order to deeply evaluate the robustness of our unsupervised domain adaptation method based on relative entropy regularization and measure propagation, this section conducts hyperparameter sensitivity experiments on the OfficeHome dataset, focusing on analyzing the impact of key hyperparameters on model performance. First, this paper experiments on conventional hyperparameters, such as the optimizer. Secondly, we pay special attention to changes in relative entropy regularization weight λ r e and measure propagation weight λ m p . By testing different value ranges on the four representative domain pairs of A2C, C2A, P2R, and R2P, we systematically explore the impact of hyperparameter settings on accuracy, thereby ensuring the stability and optimal performance of the method in practical applications. The experimental results are shown in Table 4.
The hyperparameter sensitivity experiments show that our proposed unsupervised domain adaptation method, based on relative entropy regularization and measure propagation, performs best with the Adam optimizer on the A2C, C2A, P2R, and R2P domain pairs of the OfficeHome dataset, reaching 59.0%, 66.5%, 82.2%, and 85.2%, respectively. This verifies the efficiency of Adam for the complex optimization that combines KL divergence constraints with probabilistic measure propagation. In contrast, RMSprop performs slightly worse, with 56.7% on A2C, 64.2% on C2A, 80.4% on P2R, and 83.6% on R2P, indicating slower convergence for information-theory-driven distribution alignment. The performance of Adagrad and SGD lies between the two: Adagrad (A2C 57.3%, C2A 63.8%, P2R 81.0%) slightly outperforms SGD (A2C 55.9%, C2A 65.0%, P2R 79.8%) on most pairs but remains lower than Adam overall, reflecting the differences between optimizers in dynamic learning rate adjustment and non-convex optimization.
These results further illustrate that the Adam optimizer can better adapt to the complex requirements of relative entropy regularization to minimize distributional differences through KL divergence and measure propagation to align feature distributions through probability measures, especially in domain pairs with high distribution heterogeneity, such as A2C and R2P. The performance degradation of RMSprop, Adagrad, and SGD indicates that they may be limited by the dynamic adjustment of learning rate or convergence efficiency in information-theory-driven joint optimization, which verifies the robustness and applicability of Adam as the default optimizer and provides an important reference for performance optimization of models on different domain pairs. In order to intuitively display the experimental results, an image of the hyperparameter sensitivity experiment is given, as shown in Figure 8.
The weight sensitivity results show that the proposed unsupervised domain adaptation method based on relative entropy regularization and measure propagation performs best on the A2C, C2A, P2R, and R2P domain pairs of the OfficeHome dataset when the relative entropy weight is $\lambda_{re} = 0.5$ and the measure propagation weight is $\lambda_{mp} = 0.5$, reaching 59.0%, 66.5%, 82.2%, and 85.2%, respectively. This verifies the synergistic effect of balanced weights on the KL divergence constraint and the probability measure alignment. When $\lambda_{re} = 0.75$ and $\lambda_{mp} = 0.25$, or $\lambda_{re} = 0.25$ and $\lambda_{mp} = 0.75$, the performance decreases slightly (for example, A2C drops to 57.8% and 58.2%, respectively), indicating that weight imbalance may weaken the information-theory-driven distribution alignment, especially on domain pairs with high distribution heterogeneity such as A2C and C2A.

5.5. Performance on Text Data for Sentiment Classification

In order to further verify the generalization ability of our proposed unsupervised domain adaptation method based on relative entropy regularization and measure propagation on different types of data, we designed an experiment for text data, focusing on the cross-domain sentiment classification task. Sentiment classification is an important natural language processing task widely used in social media analysis, user comment mining, and other fields. However, due to differences in the distribution of language styles and expressions in different text domains, directly applying the model trained in the source domain to the target domain usually leads to performance degradation. Therefore, this experiment aims to verify the effectiveness of this method in text domain adaptation.
We selected the Amazon Reviews Dataset as the experimental dataset, which is a benchmark dataset for cross-domain sentiment classification. It contains user reviews of product categories such as Books (source domain, 10,000 labeled reviews, 5000 positive and 5000 negative) and Electronics (target domain, 10,000 unlabeled reviews), reflecting the changes in language style and vocabulary distribution due to category differences. In the experiment, we used the pre-trained BERT-base-uncased model as the feature extractor, fine-tuned it with Books domain data, and added a fully connected classifier on BERT, combined with relative entropy regularization and measure propagation. The experiment was run on the NVIDIA 4090D GPU to ensure efficiency. The experimental results are shown in Table 5.
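For the text experiment, a sketch of the BERT-based feature extractor and classification head is given below; the [CLS]-token pooling, the maximum sequence length, and the head size are assumptions, while the relative entropy regularization and measure propagation losses from Section 3 are then applied to these features in the same way as in the image experiments.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertFeatureExtractor(nn.Module):
    """G for text: BERT-base-uncased [CLS] representation used as the feature z."""
    def __init__(self, model_name="bert-base-uncased"):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.bert = AutoModel.from_pretrained(model_name)

    def forward(self, texts, device="cuda"):
        batch = self.tokenizer(texts, padding=True, truncation=True,
                               max_length=256, return_tensors="pt").to(device)
        return self.bert(**batch).last_hidden_state[:, 0]   # [CLS] token, shape (B, 768)

class SentimentClassifier(nn.Module):
    """C for text: fully connected head over the 768-d BERT feature (2 sentiment classes)."""
    def __init__(self, feat_dim=768, num_classes=2):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, z):
        return torch.softmax(self.fc(z), dim=1)
```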
Table 5 shows the experimental results of sentiment classification of our unsupervised domain adaptation method on the Amazon Reviews dataset (Books → Electronics), where our method achieved an accuracy of 78.4%, significantly better than the baseline model: BERT (Source Only) only achieved an accuracy of 65.2%, indicating that the performance is poor without domain adaptation; DANN and CDAN achieved 71.3% and 73.5%, respectively, showing that adversarial training can alleviate distributional differences, but the effect is limited; ToAlign reached 76.1%, which is close to our results but still inferior. This shows that relative entropy regularization and measure propagation effectively improve the performance of cross-domain sentiment classification by constraining distribution consistency and global semantic alignment through KL divergence, especially in text tasks with large differences in language style, verifying the robustness and superiority of the method on text data.

6. Conclusions

This paper proposes an unsupervised domain adaptation method based on relative entropy regularization and measure propagation. By incorporating KL divergence as an information-theoretic constraint and propagating probability measures for structural alignment, the method significantly improves generalization on target domains. Experimental results on the OfficeHome and DomainNet datasets demonstrate strong performance, with average accuracies of 72.3% and 46.2%, respectively, outperforming several competitive baselines. The proposed framework offers a principled and scalable solution for reducing distribution discrepancies and enhancing model robustness under moderate domain shifts.
While accuracy is a widely used evaluation metric and effectively demonstrates the performance of our approach, we acknowledge its limitations in capturing model behavior under class imbalance or skewed label distributions—common issues in real-world domain adaptation tasks. Relying solely on accuracy may overlook cases where the model performs poorly on minority classes. In future work, we plan to incorporate additional evaluation metrics such as F1-score or balanced accuracy and perform stratified analysis to better understand model behavior across different categories.
Despite its advantages, the proposed method still exhibits limitations when facing highly heterogeneous domain pairs, where global alignment alone may be insufficient. To address this, future research will focus on integrating local structure-aware adaptation strategies and enhancing the flexibility of pseudo-measure modeling to handle fine-grained distribution shifts. Additionally, scaling the framework to large-scale datasets and real-time applications remains an important challenge. Potential deployment scenarios include cross-institutional medical image analysis, multi-sensor perception in autonomous driving, and cross-platform sentiment classification, where distributional gaps and label scarcity are prevalent. These future directions will further improve the adaptability, interpretability, and real-world applicability of our method in complex cross-domain scenarios.

Author Contributions

Conceptualization, Y.S. and X.L.; Methodology, L.T. and W.W.; Software, L.T., Z.P., Y.S., H.J. and Z.X.; Validation, Z.P. and Z.X.; Formal analysis, Y.S. and Z.X.; Investigation, X.L. and S.L.; Resources, L.T. and S.L.; Data curation, L.T., Y.S. and S.L.; Writing—original draft, L.T. and W.W.; Writing—review & editing, L.T. and W.W.; Visualization, X.L.; Supervision, Z.P.; Project administration, H.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Liu, X.; Yoo, C.; Xing, F.; Oh, H.; El Fakhri, G.; Kang, J.W.; Woo, J. Deep unsupervised domain adaptation: A review of recent advances and perspectives. APSIPA Trans. Signal Inf. Process. 2022, 11, e25. [Google Scholar] [CrossRef]
  2. Park, G.Y.; Lee, S.W. Information-theoretic regularization for multi-source domain adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 9214–9223. [Google Scholar]
  3. Deng, W.; Zhao, L.; Liao, Q.; Guo, D.; Kuang, G.; Hu, D.; Pietikainen, M.; Liu, L. Informative feature disentanglement for unsupervised domain adaptation. IEEE Trans. Multimed. 2021, 24, 2407–2421. [Google Scholar] [CrossRef]
  4. Ahmed, S.M.; Raychaudhuri, D.S.; Paul, S.; Oymak, S.; Roy-Chowdhury, A.K. Unsupervised multi-source domain adaptation without access to source data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10103–10112. [Google Scholar]
  5. Guan, D.; Huang, J.; Xiao, A.; Lu, S.; Cao, Y. Uncertainty-aware unsupervised domain adaptation in object detection. IEEE Trans. Multimed. 2021, 24, 2502–2514. [Google Scholar] [CrossRef]
  6. Fang, Z.; Lu, J.; Liu, F.; Zhang, G. Semi-supervised heterogeneous domain adaptation: Theory and algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1087–1105. [Google Scholar] [CrossRef] [PubMed]
  7. Ma, A.; Li, J.; Lu, K.; Zhu, L.; Shen, H.T. Adversarial entropy optimization for unsupervised domain adaptation. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6263–6274. [Google Scholar] [CrossRef]
  8. Shi, Y.; Ying, X.; Yang, J. Deep unsupervised domain adaptation with time series sensor data: A survey. Sensors 2022, 22, 5507. [Google Scholar] [CrossRef]
  9. Oza, P.; Sindagi, V.A.; Vs, V.; Patel, V.M.; Sharmini, V.V. Unsupervised domain adaptation of object detectors: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 4018–4040. [Google Scholar] [CrossRef] [PubMed]
  10. Xu, T.; Chen, W.; Wang, P.; Wang, F.; Li, H.; Jin, R. Cdtrans: Cross-domain transformer for unsupervised domain adaptation. arXiv 2021, arXiv:2109.06165. [Google Scholar]
  11. Mirza, M.J.; Micorek, J.; Possegger, H.; Bischof, H. The norm must go on: Dynamic unsupervised domain adaptation by normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 14765–14775. [Google Scholar]
  12. Huang, J.; Guan, D.; Xiao, A.; Lu, S.; Shao, L. Category contrast for unsupervised domain adaptation in visual tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1203–1214. [Google Scholar]
  13. Yu, W.; Sohrabi, F.; Jiang, T. Role of deep learning in wireless communications. IEEE BITS Inf. Theory Mag. 2022, 2, 56–72. [Google Scholar] [CrossRef]
  14. Ali, A.; Anam, S.; Ahmed, M.M. Shannon entropy in artificial intelligence and its applications based on information theory. J. Appl. Emerg. Sci. 2023, 13, 9–17. [Google Scholar]
  15. Chung, W.; Zhang, Y.; Pan, J. A theory-based deep-learning approach to detecting disinformation in financial social media. Inf. Syst. Front. 2023, 25, 473–492. [Google Scholar] [CrossRef] [PubMed]
  16. Walunj, V.; Gharibi, G.; Alanazi, R.; Lee, Y. Defect prediction using deep learning with Network Portrait Divergence for software evolution. Empir. Softw. Eng. 2022, 27, 118. [Google Scholar] [CrossRef]
  17. Cui, S.; Wang, S.; Zhuo, J.; Li, L.; Huang, Q.; Tian, Q. Fast batch nuclear-norm maximization and minimization for robust domain adaptation. arXiv 2021, arXiv:2107.06154. [Google Scholar]
  18. Hamidi, S.M. The Interplay of Information Theory and Deep Learning: Frameworks to Improve Deep Learning Efficiency and Accuracy; University of Waterloo: Waterloo, ON, Canada, 2024. [Google Scholar]
  19. Shwartz Ziv, R.; LeCun, Y. To compress or not to compress—Self-supervised learning and information theory: A review. Entropy 2024, 26, 252. [Google Scholar] [CrossRef]
  20. Wu, C.; Zhang, J. Robust semi-supervised spatial picture fuzzy clustering with local membership and KL-divergence for image segmentation. Int. J. Mach. Learn. Cybern. 2022, 13, 963–987. [Google Scholar] [CrossRef]
  21. Sanokowski, S.; Hochreiter, S.; Lehner, S. A diffusion model framework for unsupervised neural combinatorial optimization. arXiv 2024, arXiv:2406.01661. [Google Scholar]
  22. Lee, K.S.; Tran, N.T.; Cheung, N.M. Infomax-gan: Improved adversarial image generation via information maximization and contrastive learning. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 3942–3952. [Google Scholar]
  23. Li, J.; Ren, Y.; Deng, K. FairGAN: GANs-based fairness-aware learning for recommendations with implicit feedback. In Proceedings of the ACM Web Conference 2022, Lyon, France, 25–29 April 2022; pp. 297–307. [Google Scholar]
  24. Ge, P.; Ren, C.X.; Xu, X.L.; Yan, H. Unsupervised domain adaptation via deep conditional adaptation network. Pattern Recognit. 2023, 134, 109088. [Google Scholar] [CrossRef]
  25. Zhang, Y.; Wang, Z.; He, W. Class relationship embedded learning for source-free unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7619–7629. [Google Scholar]
  26. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  27. Saito, K.; Watanabe, K.; Ushiku, Y.; Harada, T. Maximum classifier discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  28. Long, M.; Cao, Z.; Wang, J.; Jordan, M.I. Conditional adversarial domain adaptation. Adv. Neural Inf. Process. Syst. 2018, 31. [Google Scholar]
  29. Chen, M.; Zhao, S.; Liu, H.; Cai, D. Adversarial-learned loss for domain adaptation. Proc. AAAI Conf. Artif. Intell. 2020, 34, 3521–3528. [Google Scholar] [CrossRef]
  30. Hung, C.C.; Lange, L.; Strötgen, J. TADA: Efficient task-agnostic domain adaptation for transformers. arXiv 2023, arXiv:2305.12717. [Google Scholar]
  31. Zhang, Y.; Liu, T.; Long, M.; Jordan, M. Bridging theory and algorithm for domain adaptation. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019. [Google Scholar]
  32. Hu, L.; Kan, M.; Shan, S.; Chen, X. Unsupervised domain adaptation with hierarchical gradient synchronization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  33. Cui, S.; Wang, S.; Zhuo, J.; Su, C.; Huang, Q.; Tian, Q. Gradually vanishing bridge for adversarial domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 12455–12464. [Google Scholar]
  34. Zhong, L.; Fang, Z.; Liu, F.; Lu, J.; Yuan, B.; Zhang, G. How does the combined risk affect the performance of unsupervised domain adaptation approaches? Proc. AAAI Conf. Artif. Intell. 2021, 35, 11079–11087. [Google Scholar] [CrossRef]
  35. Cui, S.; Jin, X.; Wang, S.; He, Y.; Huang, Q. Heuristic domain adaptation. Adv. Neural Inf. Process. Syst. 2020, 33, 7571–7583. [Google Scholar]
  36. Wei, G.; Lan, C.; Zeng, W.; Chen, Z. Metaalign: Coordinating domain alignment and classification for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 16643–16653. [Google Scholar]
  37. Wei, G.; Lan, C.; Zeng, W.; Zhang, Z.; Chen, Z. Toalign: Task-oriented alignment for unsupervised domain adaptation. Adv. Neural Inf. Process. Syst. 2021, 34, 13834–13846. [Google Scholar]
  38. Wang, Q.; Meng, F.; Breckon, T.P. Data augmentation with norm-AE and selective pseudo-labelling for unsupervised domain adaptation. Neural Netw. 2023, 161, 614–625. [Google Scholar] [CrossRef] [PubMed]
  39. Li, Z.; Wang, B.; Chen, Y. A contrastive deep learning approach to cryptocurrency portfolio with us treasuries. J. Comput. Technol. Appl. Math. 2024, 1, 1–10. [Google Scholar]
  40. Chang, W.G.; You, T.; Seo, S.; Kwak, S.; Han, B. Domain-specific batch normalization for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7354–7362. [Google Scholar]
  41. Deng, W.; Liao, Q.; Zhao, L.; Guo, D.; Kuang, G.; Hu, D.; Liu, L. Joint clustering and discriminative feature alignment for unsupervised domain adaptation. IEEE Trans. Image Process. 2021, 30, 7842–7855. [Google Scholar] [CrossRef]
  42. Lee, S.; Kim, D.; Kim, N.; Jeong, S.G. Drop to adapt: Learning discriminative features for unsupervised domain adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 91–100. [Google Scholar]
  43. Cao, Z.; Ma, L.; Long, M.; Wang, J. Partial adversarial domain adaptation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 135–150. [Google Scholar]
  44. Zhang, L.; Xu, L.; Motamed, S.; Chakraborty, S.; De la Torre, F. D3GU: Multi-target Active Domain Adaptation via Enhancing Domain Alignment. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 1–6 January 2024; pp. 2577–2586. [Google Scholar]
  45. Cheng, Y.; Yao, P.; Xu, L.; Chen, M.; Liu, P.; Shao, P.; Shen, S.; Xu, R.X. DCST: Dual Cross-Supervision for Transformer-based Unsupervised Domain Adaptation. Neural Netw. 2025, 181, 106749. [Google Scholar] [CrossRef]
  46. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
Figure 1. The network architecture used in this article.
Figure 2. Relationship between source- and target-domain samples in the proposed framework. By introducing a reference distribution Q(z, y) and regularizing the joint distribution P_t(z, y) of the target domain, the network transfers source-domain knowledge to the target domain.
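For concreteness, the relative entropy regularization sketched in Figure 2 can be written as a KL penalty between the batch-level target prediction distribution P_t and the source-derived reference distribution Q. The snippet below is a minimal PyTorch sketch under our own simplifying assumptions (batch averaging, a uniform reference over the 65 OfficeHome classes); the helper name relative_entropy_loss is illustrative and not the authors' exact code.

```python
import torch
import torch.nn.functional as F

def relative_entropy_loss(target_logits: torch.Tensor,
                          ref_dist: torch.Tensor) -> torch.Tensor:
    """KL(P_t || Q): penalize target predictions that drift away from the
    source-derived reference label distribution Q (illustrative sketch)."""
    # Batch-level predicted class distribution on the target domain
    p_t = F.softmax(target_logits, dim=1).mean(dim=0)   # shape: [num_classes]
    return torch.sum(p_t * (torch.log(p_t + 1e-8) - torch.log(ref_dist + 1e-8)))

# Example with a uniform reference over the 65 OfficeHome classes
num_classes = 65
ref_dist = torch.full((num_classes,), 1.0 / num_classes)
logits = torch.randn(32, num_classes)   # stand-in for target classifier outputs
loss_re = relative_entropy_loss(logits, ref_dist)
```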
Figure 3. Knowledge transfer between the source and target domains, covering the propagation of both the source-domain and target-domain measures. The function R(z) computes the mapping between domains, and the network is trained and adjusted through the parameters W and b.
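As a rough illustration of the measure propagation in Figure 3, the sketch below builds pseudo-measures for unlabeled target features by propagating source class-probability mass through a softmax over feature similarities. The helper name propagate_measures, the cosine-similarity weighting, and the temperature value are assumptions made here for illustration, not the paper's exact formulation of R(z).

```python
import torch
import torch.nn.functional as F

def propagate_measures(src_feats: torch.Tensor,
                       src_probs: torch.Tensor,
                       tgt_feats: torch.Tensor,
                       temperature: float = 0.1) -> torch.Tensor:
    """Propagate probability mass from source samples to target samples.
    src_feats: [Ns, d], src_probs: [Ns, C], tgt_feats: [Nt, d] -> [Nt, C]."""
    src = F.normalize(src_feats, dim=1)
    tgt = F.normalize(tgt_feats, dim=1)
    sim = tgt @ src.t() / temperature      # cosine similarities, [Nt, Ns]
    weights = F.softmax(sim, dim=1)        # each target row sums to 1
    # Each pseudo-measure is a convex combination of source measures
    return weights @ src_probs             # [Nt, C]

# Example: propagate one-hot source labels onto 16 unlabeled target samples
src_feats, tgt_feats = torch.randn(64, 256), torch.randn(16, 256)
src_probs = F.one_hot(torch.randint(0, 65, (64,)), num_classes=65).float()
pseudo_measures = propagate_measures(src_feats, src_probs, tgt_feats)
```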
Figure 4. OfficeHome dataset example.
Figure 5. DomainNet dataset example image.
Figure 6. Grad-CAM results on target-domain images from the OfficeHome dataset.
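The heatmaps in Figure 6 follow the standard Grad-CAM recipe; the snippet below is a hand-rolled sketch that hooks the last convolutional stage of a torchvision ResNet-50 (the backbone listed in Table 1). It is meant only to show the mechanics, not to reproduce the exact visualization code behind the figure.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights=None).eval()   # in practice: the adapted target-domain model
acts, grads = {}, {}

def fwd_hook(module, inputs, output):
    acts["value"] = output.detach()

def bwd_hook(module, grad_input, grad_output):
    grads["value"] = grad_output[0].detach()

# Hook the last convolutional stage (layer4) of the ResNet-50 backbone
model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

def grad_cam(image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """image: [1, 3, H, W] -> coarse heatmap [h, w] scaled to [0, 1]."""
    logits = model(image)
    model.zero_grad()
    logits[0, class_idx].backward()
    a, g = acts["value"], grads["value"]          # both [1, C, h, w]
    weights = g.mean(dim=(2, 3), keepdim=True)    # channel importance from gradients
    cam = F.relu((weights * a).sum(dim=1)).squeeze(0)
    return cam / (cam.max() + 1e-8)

heatmap = grad_cam(torch.randn(1, 3, 224, 224), class_idx=0)
```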
Figure 7. t-SNE visualizations under different ablation conditions: (a) w/o Relative Entropy Regularization; (b) w/o Measure Propagation; (c) Ours (Full Model).
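The plots in Figure 7 can be produced with the standard scikit-learn t-SNE implementation of [46]; the parameter choices below (perplexity 30, PCA initialization) are our own defaults, and the random arrays stand in for the penultimate-layer features that would actually be visualized.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Stand-ins for extracted features and domain labels (0 = source, 1 = target)
feats = np.random.randn(500, 256)
domains = np.random.randint(0, 2, size=500)

emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(feats)

plt.scatter(emb[domains == 0, 0], emb[domains == 0, 1], s=5, label="source")
plt.scatter(emb[domains == 1, 0], emb[domains == 1, 1], s=5, label="target")
plt.legend()
plt.savefig("tsne_ablation.png", dpi=200)
```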
Figure 8. Hyperparameter sensitivity experiments of λ_re and λ_mp.
Table 1. Experimental results on the OfficeHome dataset.
Model | A2C | A2P | A2R | C2A | C2P | C2R | P2A | P2C | P2R | R2A | R2C | R2P | Avg
ResNet50 [26] | 34.9 | 50.0 | 58.0 | 37.4 | 41.9 | 46.2 | 38.5 | 31.2 | 60.4 | 53.9 | 41.2 | 59.9 | 46.1
MCD [27] | 48.9 | 68.3 | 74.6 | 61.3 | 67.6 | 68.8 | 57.0 | 47.1 | 75.1 | 69.1 | 52.2 | 79.6 | 64.1
CDANs [28] | 50.7 | 70.6 | 76.0 | 57.6 | 70.0 | 70.0 | 57.4 | 50.9 | 77.3 | 70.9 | 56.7 | 81.6 | 65.8
ALDA [29] | 53.7 | 70.1 | 76.4 | 60.2 | 72.6 | 71.5 | 56.8 | 51.9 | 77.1 | 70.2 | 56.3 | 82.1 | 66.6
TADA [30] | 53.1 | 72.3 | 77.2 | 59.1 | 71.2 | 72.1 | 59.7 | 53.1 | 78.4 | 72.4 | 60.0 | 82.9 | 67.6
MDD [31] | 54.9 | 73.7 | 77.8 | 60.0 | 71.4 | 71.8 | 61.2 | 53.6 | 78.1 | 72.5 | 60.2 | 82.3 | 68.1
BNM | 56.2 | 73.7 | 79.0 | 63.1 | 73.6 | 74.0 | 62.4 | 54.8 | 80.7 | 72.4 | 58.9 | 83.5 | 69.4
GSDA [32] | 61.3 | 76.1 | 79.4 | 65.4 | 73.3 | 74.3 | 65.0 | 53.2 | 80.0 | 72.2 | 60.6 | 83.1 | 70.3
GVB [33] | 57.0 | 74.7 | 79.8 | 64.6 | 74.1 | 74.6 | 65.2 | 55.1 | 81.0 | 74.6 | 59.7 | 84.3 | 70.4
E-Mix [34] | 57.7 | 76.0 | 79.8 | 63.6 | 74.1 | 75.0 | 63.4 | 56.4 | 79.7 | 72.8 | 62.4 | 85.5 | 70.6
HDA [35] | 56.8 | 75.2 | 79.8 | 65.1 | 73.9 | 75.2 | 66.3 | 56.7 | 81.8 | 75.4 | 59.7 | 84.7 | 70.9
Meta-Align [36] | 59.3 | 76.0 | 80.2 | 65.7 | 74.7 | 75.1 | 65.7 | 56.5 | 81.6 | 74.1 | 61.1 | 85.2 | 71.3
ToAlign [37] | 57.9 | 76.9 | 80.8 | 66.7 | 75.6 | 77.0 | 67.8 | 57.0 | 82.5 | 75.1 | 60.0 | 84.9 | 71.8
Norm-AE-SPL [38] | 51.6 | 76.0 | 80.6 | 63.0 | 77.0 | 78.4 | 62.9 | 50.7 | 81.2 | 66.3 | 52.8 | 82.9 | 68.6
SACAEM [39] | 55.2 | 70.7 | 75.7 | 60.7 | 68.3 | 68.9 | 62.3 | 55.9 | 78.5 | 72.8 | 82.1 | 82.8 | 69.5
Ours | 59.0 | 77.5 | 81.0 | 66.5 | 76.5 | 77.2 | 68.0 | 57.5 | 82.2 | 75.0 | 61.8 | 85.2 | 72.3
The best result is in bold.
Table 2. Experimental results on the DomainNet dataset.
Model | R2C | R2P | R2S | C2R | C2P | C2S | P2R | P2C | P2S | S2R | S2C | S2P | Avg
ResNet50 | 41.6 | 42.7 | 29.6 | 42.4 | 27.2 | 32.1 | 49.5 | 32.5 | 26.7 | 38.7 | 40.8 | 27.5 | 35.9
MSTN [40] | 27.2 | 32.9 | 24.3 | 28.1 | 21.1 | 24.1 | 30.7 | 19.8 | 22.5 | 24.3 | 26.2 | 23.5 | 25.4
RSDA [41] | 27.2 | 35.8 | 24.3 | 36.9 | 24.9 | 31.1 | 41.3 | 26.1 | 24.7 | 29.4 | 26.2 | 27.7 | 29.6
MCD | 36.3 | 36.5 | 24.9 | 40.3 | 25.8 | 32.1 | 43.6 | 29.6 | 25.7 | 34.1 | 39.1 | 26.8 | 32.9
DANN | 45.9 | 44.5 | 35.4 | 46.8 | 30.5 | 36.7 | 48.0 | 34.7 | 32.1 | 47.1 | 46.4 | 38.4 | 40.5
CAN [42] | 40.7 | 37.7 | 33.7 | 54.9 | 31.4 | 37.3 | 51.0 | 33.6 | 30.9 | 52.1 | 42.1 | 32.0 | 39.8
PAN [43] | 49.2 | 48.1 | 36.4 | 49.6 | 33.2 | 38.7 | 51.8 | 36.0 | 32.9 | 49.1 | 50.9 | 39.8 | 43.0
CDANs | 50.1 | 48.3 | 39.0 | 50.0 | 33.3 | 39.3 | 52.2 | 36.4 | 33.6 | 48.4 | 49.2 | 38.6 | 43.2
HDA | 46.3 | 47.5 | 34.3 | 49.9 | 33.9 | 37.9 | 55.2 | 40.8 | 32.7 | 49.0 | 49.7 | 40.0 | 43.1
FixBi | 51.1 | 49.1 | 39.6 | 50.0 | 34.5 | 41.1 | 52.2 | 36.4 | 33.6 | 50.8 | 53.5 | 41.6 | 44.5
ToAlign | 50.8 | 50.7 | 35.1 | 49.5 | 33.8 | 41.4 | 57.9 | 43.5 | 36.2 | 47.9 | 55.5 | 41.6 | 45.4
D3GU [44] | 51.0 | 50.3 | 35.6 | 49.8 | 34.0 | 41.5 | 57.5 | 43.1 | 36.3 | 48.1 | 55.3 | 41.6 | 45.7
DCST [45] | 51.1 | 50.1 | 35.9 | 50.0 | 34.1 | 41.6 | 57.5 | 43.1 | 36.5 | 48.5 | 55.0 | 41.3 | 45.3
Ours | 51.2 | 49.6 | 36.1 | 50.3 | 34.2 | 41.7 | 57.7 | 43.3 | 36.4 | 48.4 | 55.2 | 41.7 | 46.2
The best result is in bold.
Table 3. Ablation experiment (OfficeHome dataset).
Model | A2P | C2P | P2A | R2C
Ours (Full Model) | 77.5 | 76.5 | 68.0 | 61.8
w/o Relative Entropy Regularization | 75.2 | 74.3 | 65.7 | 59.4
w/o Measure Propagation | 76.0 | 75.1 | 66.5 | 60.3
Table 4. Hyperparameter sensitivity experiments (OfficeHome dataset).
Optimizer | A2C | C2A | P2R | R2P
RMSprop | 56.7 | 64.2 | 80.4 | 83.6
Adagrad | 57.3 | 63.8 | 81.0 | 84.1
SGD | 55.9 | 65.0 | 79.8 | 83.2
Adam | 59.0 | 66.5 | 82.2 | 85.2
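Table 4 compares optimizers and Figure 8 sweeps the loss weights λ_re and λ_mp; the training step below sketches how a supervised classification loss, the relative-entropy regularizer, and the measure-propagation term would typically be combined. The stub network, the weights 1.0/0.5, and the learning rate are placeholders rather than the reported best configuration, and the helpers relative_entropy_loss and propagate_measures refer to the earlier sketches.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Stand-in backbone + classifier returning (logits, features)."""
    def __init__(self, in_dim=256, feat_dim=128, num_classes=65):
        super().__init__()
        self.backbone = nn.Linear(in_dim, feat_dim)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        feats = F.relu(self.backbone(x))
        return self.head(feats), feats

model = TinyNet()
lambda_re, lambda_mp = 1.0, 0.5                   # placeholder weights; sweep as in Figure 8
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # Adam performed best in Table 4
ref_dist = torch.full((65,), 1.0 / 65)            # reference label distribution (see earlier sketch)

def train_step(src_x, src_y, tgt_x):
    optimizer.zero_grad()
    src_logits, src_feats = model(src_x)
    tgt_logits, tgt_feats = model(tgt_x)

    loss_cls = F.cross_entropy(src_logits, src_y)                 # supervised source loss
    loss_re = relative_entropy_loss(tgt_logits, ref_dist)         # KL regularizer (sketched above)
    pseudo = propagate_measures(src_feats, F.one_hot(src_y, 65).float(), tgt_feats)
    loss_mp = F.kl_div(F.log_softmax(tgt_logits, dim=1), pseudo,
                       reduction="batchmean")                     # match the propagated measures

    loss = loss_cls + lambda_re * loss_re + lambda_mp * loss_mp
    loss.backward()
    optimizer.step()
    return loss.item()

# One illustrative step on random tensors
loss_value = train_step(torch.randn(32, 256), torch.randint(0, 65, (32,)), torch.randn(32, 256))
```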
Table 5. Experimental results on text sentiment classification.
Method | ACC
BERT (Source Only) | 65.2%
DANN | 71.3%
CDAN | 73.5%
ToAlign | 76.1%
Ours | 78.4%