1. Introduction
Unsupervised domain adaptation (UDA) is an important research direction in machine learning, aimed at addressing the problem of distributional differences between source and target domains [1]. In many real-world applications, such as image classification, speech recognition, and natural language processing, acquiring labeled data is costly, and the target domain often lacks labels [2,3]. Furthermore, data from the source domain cannot be directly used for the target task. UDA enhances the performance of models in the target domain by leveraging labeled information from the source domain and has broad practical applications [4]. For example, in medical image analysis, UDA can help adapt models across data from different hospitals; in autonomous driving, it can adapt to different lighting conditions or viewpoints. Therefore, the study of efficient UDA methods is of significant theoretical and practical importance in advancing cross-domain generalization capabilities.
Although traditional unsupervised domain adaptation (UDA) methods, such as feature alignment and adversarial training, have alleviated distribution discrepancies to some extent, they still face notable limitations. A key challenge lies in the uncertainty of the predicted distribution in the target domain, which often leads to overfitting or reduced accuracy. Moreover, many existing approaches rely on shallow metrics, such as Maximum Mean Discrepancy (MMD) or adversarial loss, to align source and target domains, without incorporating deeper insights from information theory. As a result, these methods may overlook the intrinsic structural relationships between domains, causing critical information to be lost or redundant features to be introduced, ultimately impairing generalization capability in complex scenarios [5].
To address these limitations, there is a growing interest in exploring more principled approaches that can provide deeper theoretical guarantees while capturing both global and semantic structures in cross-domain tasks. Information theory, with its ability to model uncertainty and quantify distributional differences, offers a compelling foundation for designing such methods. However, its integration into domain adaptation frameworks remains underexplored.
In light of this, we propose a novel information-theoretic framework for UDA that explicitly addresses distribution discrepancy, defined as the difference in data distributions between the labeled source domain and the unlabeled target domain. Our method integrates relative entropy regularization and measure propagation to achieve robust domain alignment from an information-theoretic perspective. Specifically, relative entropy regularization employs Kullback–Leibler (KL) divergence to constrain the target domain’s predicted distribution, encouraging consistency with the source domain’s reference distribution and reducing information loss. Meanwhile, measure propagation transfers probability measures from the source to the target domain, constructing pseudo-measures that ensure global consistency in feature space representation [6]. These two components are jointly optimized with a feature extractor and classifier, resulting in improved performance and generalization across diverse target domains [7,8].
The innovations of this paper are summarized as follows:
- (1) An unsupervised domain adaptation method based on relative entropy regularization is proposed. KL divergence is used to accurately constrain the distributional difference between the source and target domains from the perspective of information theory, thus overcoming the limitations of traditional distance metrics.
- (2) A measure propagation mechanism is introduced, which generates a target-domain pseudo-measure by propagating the probability measure of the source domain and deepens the application of information theory in modeling the global structure of the feature space.
- (3) Combining the above methods, we construct an information-theory-driven joint optimization framework, which significantly improves the generalization ability of the model in the target domain and provides new ideas for the application of information theory in deep learning.
3. Method
In this study, we propose a novel unsupervised domain adaptation method that addresses the distributional gap between a labeled source domain and an unlabeled target domain by integrating two complementary information-theoretic components: relative entropy regularization and measure propagation. The model architecture is shown in Figure 1.
The first component, relative entropy regularization, is formulated using Kullback–Leibler (KL) divergence to minimize the discrepancy between the predicted label distribution in the target domain and a reference distribution constructed from the source domain. This regularization encourages the model to generate predictions in the target domain that are statistically consistent with the knowledge obtained from the labeled source domain, thereby reducing uncertainty and enhancing model robustness.
To further address the structural misalignment in the feature space, we incorporate a second component: measure propagation. This mechanism aims to build an estimated distribution, or pseudo-measure, for the target domain by transferring probability information from the source domain. Specifically, the probability distribution of source features is adjusted using a learned transformation function to approximate the distribution of target features. This transformation is modeled as a density ratio, which quantifies how the source distribution should be reshaped to resemble the target one. The density ratio is implemented via a neural network that outputs a scaling factor for each target feature, enabling flexible and data-driven alignment.
The optimization process is composed of three loss functions: (1) a classification loss on the labeled source data, (2) a KL divergence loss enforcing consistency between the predicted target distribution and the source-derived reference distribution, and (3) an adversarial loss that promotes similarity between the propagated pseudo-measure and the actual distribution of the target domain. These components are jointly optimized during training to update the feature extractor, classifier, and auxiliary networks. This unified framework results in more stable cross-domain generalization and improved alignment at both the distributional and structural levels [24,25].
3.1. Information-Theoretic Constraint via Relative Entropy Regularization
In unsupervised domain adaptation (UDA), the distributional difference between the source domain and target domain is a central challenge. Traditional methods usually solve the problem through feature alignment or adversarial training, but these methods struggle to accurately control the prediction behavior of the target domain due to the lack of labels in the target domain. Therefore, we propose a regularization strategy based on relative entropy (i.e., KL divergence) to achieve domain adaptation by restricting the consistency between the target domain distribution and the source-domain-derived reference distribution.
We define a feature extractor $G$ with parameters $\theta_G$, which maps an input $x$ to the feature space $\mathcal{Z}$ and obtains $z = G(x)$. The classifier $C$ with parameters $\theta_C$ takes $z$ as input and outputs a probability distribution $C(z)$ over $K$ categories. The source domain has supervised data, so its feature distribution $p_s(z)$ and conditional distribution $p_s(y \mid z)$ can be directly defined. For the target domain, only the feature distribution $p_t(z)$ can be obtained through $G$, while $p_t(y \mid z)$ is unknown because the target domain has no labels. Our goals are as follows: (1) to train $G$ and $C$ on the source domain to minimize the classification error; (2) to ensure that the joint distribution $p_t(z, y)$ of the target domain is aligned with the source domain knowledge. To this end, we introduce a reference distribution $q(z, y)$ and regularize $p_t(z, y)$ with relative entropy to make it close to $q(z, y)$. The model architecture of this section is shown in Figure 2.
The reference distribution $q(z, y)$ is constructed directly from the source domain and is defined as follows:

$$q(z, y) = \hat{p}_s(z)\,\delta\big(y - C(z)\big),$$

where $\hat{p}_s(z)$ is the empirical distribution of source domain features, $\delta(\cdot)$ is the Dirac delta function, and $C(z)$ is the classifier’s prediction for the source domain features. This form is the simplest and most direct choice, making full use of the empirical distribution of the source domain and the classifier output to avoid introducing additional complexity. The joint distribution of the target domain is as follows:

$$p_t(z, y) = p_t(z)\,C(y \mid z),$$

where $C(y \mid z)$ denotes the probability that the classifier assigns to class $y$ given feature $z$. We want to minimize the relative entropy between $p_t(z, y)$ and $q(z, y)$:

$$D_{\mathrm{KL}}\big(p_t(z, y)\,\|\,q(z, y)\big),$$

which expands to:

$$D_{\mathrm{KL}}\big(p_t(z, y)\,\|\,q(z, y)\big) = \iint p_t(z, y)\,\log\frac{p_t(z, y)}{q(z, y)}\,dz\,dy.$$

Substituting into the joint distribution, we can see the following:

$$D_{\mathrm{KL}} = \iint p_t(z)\,C(y \mid z)\,\log\frac{p_t(z)\,C(y \mid z)}{\hat{p}_s(z)\,q(y \mid z)}\,dz\,dy.$$

According to the properties of logarithms and the decomposition of the expectation, we can obtain the following:

$$D_{\mathrm{KL}} = \int p_t(z)\,\log\frac{p_t(z)}{\hat{p}_s(z)}\,dz + \iint p_t(z)\,C(y \mid z)\,\log\frac{C(y \mid z)}{q(y \mid z)}\,dz\,dy,$$

which can be equivalently written as follows:

$$D_{\mathrm{KL}} = D_{\mathrm{KL}}\big(p_t(z)\,\|\,\hat{p}_s(z)\big) + \mathbb{E}_{z \sim p_t(z)}\Big[D_{\mathrm{KL}}\big(C(y \mid z)\,\|\,q(y \mid z)\big)\Big].$$

The first term $D_{\mathrm{KL}}\big(p_t(z)\,\|\,\hat{p}_s(z)\big)$ measures the difference in feature distributions, and the second term $\mathbb{E}_{z \sim p_t(z)}\big[D_{\mathrm{KL}}\big(C(y \mid z)\,\|\,q(y \mid z)\big)\big]$ ensures that the conditional distributions are consistent. Since it is not feasible to directly calculate the continuous integrals, we use empirical samples to approximate them. The first term is as follows:

$$D_{\mathrm{KL}}\big(p_t(z)\,\|\,\hat{p}_s(z)\big) \approx \frac{1}{n_t}\sum_{j=1}^{n_t}\log\frac{p_t(z_j^t)}{\hat{p}_s(z_j^t)}.$$

However, accurate estimation of $p_t(z)$ and $\hat{p}_s(z)$ requires density estimation, which increases the computational burden. For simplicity, we assume that $G$ has already been aligned with respect to $p_t(z)$ and $\hat{p}_s(z)$ through the subsequent method (see Section 3.2), making the first term less influential. Therefore, we focus on optimizing the second term:

$$\mathcal{L}_{\mathrm{KL}} = \mathbb{E}_{z \sim p_t(z)}\Big[D_{\mathrm{KL}}\big(C(y \mid z)\,\|\,q(y \mid z)\big)\Big].$$

The empirical approximation is as follows:

$$\mathcal{L}_{\mathrm{KL}} \approx \frac{1}{n_t}\sum_{j=1}^{n_t}\sum_{k=1}^{K} C_k\big(z_j^t\big)\,\log\frac{C_k\big(z_j^t\big)}{q_k\big(z_j^t\big)}.$$

Since a target domain sample $z_j^t$ has no corresponding reference $q(y \mid z_j^t)$, we use a source-domain nearest-neighbor approximation: for each $z_j^t$, we find the nearest source domain sample $z_{i^*}^s$ and define:

$$q\big(y \mid z_j^t\big) := C\big(y \mid z_{i^*}^s\big), \qquad i^* = \arg\min_i \big\|z_j^t - z_i^s\big\|_2.$$

So, we can obtain the following:

$$\mathcal{L}_{\mathrm{KL}} \approx \frac{1}{n_t}\sum_{j=1}^{n_t}\sum_{k=1}^{K} C_k\big(z_j^t\big)\,\log\frac{C_k\big(z_j^t\big)}{C_k\big(z_{i^*}^s\big)}.$$
In summary, relative entropy regularization introduces KL divergence as an information-theoretic tool to constrain the consistency between the predicted distribution of the target domain and the reference distribution of the source domain, effectively reducing the distributional difference problem in unsupervised domain adaptation [25]. This method uses the supervised information of the source domain and the unlabeled data of the target domain, and through nearest-neighbor approximation and empirical sample optimization, it significantly improves the robustness and prediction accuracy of the model in the target domain, providing a solid foundation for information-theory-driven domain adaptation.
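To make the derivation above concrete, the following is a minimal PyTorch-style sketch of the empirical regularizer, assuming features have already been extracted by $G$. The function and variable names are illustrative, and treating the nearest-neighbor reference as a fixed (non-differentiable) target is a design choice of this sketch rather than a detail reported above.

```python
import torch
import torch.nn.functional as F

def relative_entropy_loss(z_t, z_s, classifier):
    """Empirical KL regularizer: for each target feature, use the prediction on its
    nearest source feature as the reference distribution q(y | z).
    z_t: (n_t, d) target-domain features; z_s: (n_s, d) source-domain features."""
    with torch.no_grad():
        nn_idx = torch.cdist(z_t, z_s).argmin(dim=1)            # nearest source sample per target sample
        q_log = F.log_softmax(classifier(z_s[nn_idx]), dim=1)   # reference log-probabilities (kept fixed)
    p_log = F.log_softmax(classifier(z_t), dim=1)               # target-domain predictions
    # KL(p || q) = sum_k p_k (log p_k - log q_k), averaged over target samples
    return (p_log.exp() * (p_log - q_log)).sum(dim=1).mean()
```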
3.2. Measure Propagation
As outlined above, the goal of measure propagation is to derive a pseudo-measure $\hat{\mu}_t$ for the target domain from the source domain measure $\mu_s$, and to make $\hat{\mu}_t$ close to the actual target measure $\mu_t$ by optimizing $G$, thereby achieving knowledge transfer between domains. The model architecture is shown in Figure 3.
The core idea of measure propagation is to use the probability measure of the source domain and propagate it to the target domain through some transformation to generate a reference distribution consistent with the target domain samples. We define a measure propagator $T$, which takes the source domain measure $\mu_s$ and the target domain features as input and outputs a pseudo-measure:

$$\hat{\mu}_t = T\big(\mu_s, \{z_j^t\}\big).$$

To simplify the implementation, we choose the simplest form based on a density ratio. Assume that the target domain distribution $p_t(z)$ is associated with the source domain distribution $p_s(z)$ through a density ratio function $r(z)$:

$$p_t(z) = r(z)\,p_s(z).$$

Then, the pseudo-measure can be defined as follows:

$$\hat{\mu}_t(z) = r(z)\,\hat{p}_s(z),$$

where $r(z)$ represents the ratio of distribution change from the source domain to the target domain. There are many possible ways to estimate $r(z)$ directly, such as parametric models or non-parametric estimation. To keep it simple, we choose a neural network $R$ with parameters $\theta_R$ to predict $r(z)$ directly, regarding it as a scalar function of $z$. The source domain feature distribution is the empirical distribution:

$$\hat{p}_s(z) = \frac{1}{n_s}\sum_{i=1}^{n_s}\delta\big(z - z_i^s\big).$$

So, the pseudo-measure is as follows:

$$\hat{\mu}_t(z) = \frac{1}{n_s}\sum_{i=1}^{n_s} R\big(z_i^s\big)\,\delta\big(z - z_i^s\big).$$

The actual feature distribution of the target domain is as follows:

$$\mu_t(z) = \frac{1}{n_t}\sum_{j=1}^{n_t}\delta\big(z - z_j^t\big).$$

Our goal is to make $\hat{\mu}_t$ as close to $\mu_t$ as possible by optimizing $G$ and $R$. The optimization objective uses the KL divergence to measure the difference between the two:

$$D_{\mathrm{KL}}\big(\mu_t\,\|\,\hat{\mu}_t\big) = \int \mu_t(z)\,\log\frac{\mu_t(z)}{\hat{\mu}_t(z)}\,dz.$$

Expanding and inserting the definition of the pseudo-measure provides the following:

$$D_{\mathrm{KL}}\big(\mu_t\,\|\,\hat{\mu}_t\big) = \int \mu_t(z)\,\log\mu_t(z)\,dz - \int \mu_t(z)\,\log\big(R(z)\,\hat{p}_s(z)\big)\,dz.$$

Then, splitting the formula into three terms, we obtain the following:

$$D_{\mathrm{KL}}\big(\mu_t\,\|\,\hat{\mu}_t\big) = \int \mu_t(z)\,\log\mu_t(z)\,dz - \int \mu_t(z)\,\log R(z)\,dz - \int \mu_t(z)\,\log\hat{p}_s(z)\,dz.$$

The first term is the (negative) entropy of $\mu_t$, which makes no direct contribution to the optimization and can be ignored. The second and third terms are as follows:

$$-\int \mu_t(z)\,\log R(z)\,dz - \int \mu_t(z)\,\log\hat{p}_s(z)\,dz.$$

The empirical approximation is as follows:

$$-\frac{1}{n_t}\sum_{j=1}^{n_t}\log R\big(z_j^t\big) - \frac{1}{n_t}\sum_{j=1}^{n_t}\log\hat{p}_s\big(z_j^t\big).$$

Since the $\delta$ function is zero at $z_j^t \neq z_i^s$, the third term is difficult to calculate directly; moreover, the feature distributions are already constrained by the regularization in Section 3.1, so this article focuses on optimizing $R$ and $G$ through:

$$-\frac{1}{n_t}\sum_{j=1}^{n_t}\log R\big(z_j^t\big).$$

But this alone is not enough to constrain $R$. We introduce a discriminator $D$ with parameters $\theta_D$ to distinguish $\hat{\mu}_t$ from $\mu_t$. The optimization goal is as follows:

$$\min_{G, R}\;\max_{D}\;\; \mathbb{E}_{z \sim \hat{\mu}_t}\big[\log D(z)\big] + \mathbb{E}_{z \sim \mu_t}\big[\log\big(1 - D(z)\big)\big].$$

The empirical form is as follows:

$$\mathcal{L}_{\mathrm{MP}} = \frac{1}{n_s}\sum_{i=1}^{n_s} R\big(z_i^s\big)\,\log D\big(z_i^s\big) + \frac{1}{n_t}\sum_{j=1}^{n_t}\log\big(1 - D\big(z_j^t\big)\big).$$

The goal of $G$ and $R$ is to deceive $D$, that is, to minimize $\mathcal{L}_{\mathrm{MP}}$ with respect to $G$ and $R$ while $D$ is trained to maximize it.
Measure propagation generates pseudo-measures of the target domain by propagating the probability measure of the source domain, achieving global alignment of the feature space and enhancing the distribution consistency of unsupervised domain adaptation. This method uses a simple form of density ratio and adversarial training to optimize the feature extractor and discriminator, significantly improving the generalization ability of the model in complex cross-domain tasks and deepening the application of information theory in global structure modeling.
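A minimal sketch of how the density-ratio network $R$ and the discriminator $D$ could be realized in PyTorch is given below. The network sizes, the softplus parameterization of the ratio, and the binary cross-entropy form of the adversarial losses are illustrative assumptions, not prescriptions of our exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DensityRatio(nn.Module):
    """R(z): positive scalar reweighting the source measure toward the target."""
    def __init__(self, feat_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, z):
        return F.softplus(self.net(z)) + 1e-6   # keep the ratio strictly positive

class Discriminator(nn.Module):
    """D(z): logit that a feature was drawn from the propagated pseudo-measure."""
    def __init__(self, feat_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, z):
        return self.net(z)   # raw logits; paired with BCE-with-logits below

def measure_propagation_losses(z_s, z_t, R, D):
    """Adversarial losses aligning the R-weighted source measure with the target features."""
    w = R(z_s).squeeze(1)                      # density ratio r(z) on source features
    w = w / (w.mean() + 1e-8)                  # normalise so the pseudo-measure sums to ~1
    logit_s, logit_t = D(z_s).squeeze(1), D(z_t).squeeze(1)
    bce = F.binary_cross_entropy_with_logits
    # Discriminator: label the R-weighted source features (pseudo-measure) as 1, target features as 0.
    d_loss = (w.detach() * bce(logit_s, torch.ones_like(logit_s), reduction="none")).mean() \
             + bce(logit_t, torch.zeros_like(logit_t))
    # G and R side: flip the labels so the two measures become indistinguishable to D.
    g_loss = (w * bce(logit_s, torch.zeros_like(logit_s), reduction="none")).mean() \
             + bce(logit_t, torch.ones_like(logit_t))
    return d_loss, g_loss
```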
3.3. Overall Loss Function
In our unsupervised domain adaptation framework, the overall loss function combines multiple information-theory-driven components to optimize the distribution alignment of the source and target domains. The core part is the cross-entropy loss $\mathcal{L}_{\mathrm{cls}}$ on the source domain, which minimizes the classification error and is defined as follows:

$$\mathcal{L}_{\mathrm{cls}} = -\frac{1}{n_s}\sum_{i=1}^{n_s}\log C_{y_i^s}\big(G(x_i^s)\big),$$

where $n_s$ is the number of source domain samples, $x_i^s$ and $y_i^s$ are the source domain samples and labels, respectively, and $C_{y_i^s}(G(x_i^s))$ is the classifier’s predicted probability for the true label. This is combined with the relative entropy regularization loss $\mathcal{L}_{\mathrm{KL}}$, which constrains the consistency between the target domain prediction distribution and the source domain reference distribution through the KL divergence; its empirical approximation is as follows:

$$\mathcal{L}_{\mathrm{KL}} = \frac{1}{n_t}\sum_{j=1}^{n_t}\sum_{k=1}^{K} C_k\big(z_j^t\big)\,\log\frac{C_k\big(z_j^t\big)}{C_k\big(z_{i^*}^s\big)},$$

where $n_t$ is the number of target domain samples, $z_j^t$ is a target domain sample, and $z_{i^*}^s$ is its nearest-neighbor source domain sample. In addition, the measure propagation loss $\mathcal{L}_{\mathrm{MP}}$ aligns the target domain pseudo-measure with the actual target domain distribution through adversarial training and is defined as follows:

$$\mathcal{L}_{\mathrm{MP}} = \frac{1}{n_s}\sum_{i=1}^{n_s} R\big(z_i^s\big)\,\log D\big(z_i^s\big) + \frac{1}{n_t}\sum_{j=1}^{n_t}\log\big(1 - D\big(z_j^t\big)\big),$$

where $D$ is the discriminator and $R$ is the density ratio network. The final overall loss function is as follows:

$$\mathcal{L} = \mathcal{L}_{\mathrm{cls}} + \lambda_1\,\mathcal{L}_{\mathrm{KL}} + \lambda_2\,\mathcal{L}_{\mathrm{MP}}.$$

By balancing the contributions of the three terms through the hyperparameters $\lambda_1$ and $\lambda_2$, the feature extractor $G$ and classifier $C$ are jointly optimized to achieve information-theory-driven cross-domain generalization.
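Putting the three terms together, one joint training iteration could look like the sketch below. It reuses the relative_entropy_loss and measure_propagation_losses helpers sketched in the previous subsections, and the default weight values are placeholders rather than recommended settings; the alternating discriminator/generator update order is an assumption of this sketch.

```python
import torch.nn.functional as F

def joint_update(x_s, y_s, x_t, G, C, R, D, opt_gcr, opt_d, lam_kl=0.5, lam_mp=0.5):
    """One training iteration: update D first, then update G, C and R on the combined objective."""
    # --- discriminator step (features detached so only D is updated) ---
    z_s, z_t = G(x_s).detach(), G(x_t).detach()
    d_loss, _ = measure_propagation_losses(z_s, z_t, R, D)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- feature extractor / classifier / ratio-network step ---
    z_s, z_t = G(x_s), G(x_t)
    loss_cls = F.cross_entropy(C(z_s), y_s)                    # source cross-entropy, L_cls
    loss_kl = relative_entropy_loss(z_t, z_s, C)               # relative entropy term, L_KL (Section 3.1)
    _, loss_mp = measure_propagation_losses(z_s, z_t, R, D)    # measure propagation term, L_MP (Section 3.2)
    total = loss_cls + lam_kl * loss_kl + lam_mp * loss_mp
    opt_gcr.zero_grad(); total.backward(); opt_gcr.step()
    return total.item()
```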
5. Experiment
To comprehensively evaluate the proposed unsupervised domain adaptation method based on relative entropy regularization and measure propagation, a series of experiments is designed in this section. These include comparison experiments to validate the model’s performance on the target domain, hyperparameter sensitivity experiments to analyze the impact of key parameters on the results, ablation experiments to explore the role of each component, and visualization experiments to visually demonstrate the feature distributions and distribution alignment effects. These experiments are conducted on multiple benchmark datasets, aiming to systematically verify the method’s effectiveness, robustness, and theoretical advantages.
5.1. Experimental Details
In the experiments, the model is trained for 300 epochs using a ResNet50 pre-trained on ImageNet as the backbone to extract deep features. The Adam optimizer is used with an initial learning rate of 0.001, which is dynamically adjusted via cosine annealing. The batch size is set to 32, and all experiments are conducted on a computing platform equipped with an NVIDIA 4090D GPU to ensure computational efficiency and consistency. Newly added layers are randomly initialized and fine-tuned together with the backbone, and hyperparameters, such as the relative entropy regularization weight and the measure propagation weight, are tuned on the validation set to ensure the fairness and reproducibility of the experimental results.
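For reference, the training configuration described above could be instantiated roughly as follows. The ResNet50 backbone, Adam with learning rate 0.001, cosine annealing, 300 epochs, and batch size 32 come from the text; the head dimensions, the number of classes, and the torchvision weight-loading syntax are assumptions that depend on the library version.

```python
import torch
import torchvision

num_classes = 65   # OfficeHome has 65 categories; a different head size would be used for DomainNet
backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")  # ImageNet pre-trained backbone (feature extractor G)
backbone.fc = torch.nn.Identity()                                # expose the 2048-d pooled features
classifier = torch.nn.Linear(2048, num_classes)                  # randomly initialised classification head C

params = list(backbone.parameters()) + list(classifier.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)

batch_size, epochs = 32, 300
for epoch in range(epochs):
    # ... iterate over source/target batches and call the joint update sketched in Section 3.3 ...
    scheduler.step()
```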
5.2. Performance Comparison Experiment
The evaluation protocol of the comparative experiments follows the standard unsupervised domain adaptation setting. For the OfficeHome dataset, each of its four domains, Art (A), Clipart (C), Product (P), and Real-World (R), is adapted to each of the other three domains. Each result is averaged over three runs with different random seeds, and the average accuracy over the 12 transfer tasks is used as the overall evaluation on the dataset.
The DomainNet dataset contains six domains. Given that the Infograph and QuickDraw domains differ too strongly from the others, making it difficult to obtain good feature representations with a ResNet50 backbone, only four of the six domains (Real, Painting, Sketch, Clipart) are selected and evaluated with the same protocol as OfficeHome. The overall evaluation indicator is again the average accuracy over the 12 pairs of tasks. The experimental results are shown in Table 1.
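The 12-task evaluation protocol can be summarized by the following sketch, where train_and_evaluate is a hypothetical routine standing in for the full training pipeline and the seed values are illustrative.

```python
import itertools
import numpy as np

officehome = ["Art", "Clipart", "Product", "RealWorld"]
domainnet = ["Real", "Painting", "Sketch", "Clipart"]   # Infograph and QuickDraw are excluded

def dataset_score(domains, seeds=(0, 1, 2)):
    """Mean accuracy over the 12 ordered source->target pairs, each averaged over three seeds."""
    per_task = {}
    for src, tgt in itertools.permutations(domains, 2):          # 4 * 3 = 12 transfer tasks
        accs = [train_and_evaluate(src, tgt, seed=s) for s in seeds]   # hypothetical training routine
        per_task[f"{src[0]}2{tgt[0]}"] = float(np.mean(accs))
    return float(np.mean(list(per_task.values()))), per_task
```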
The experimental results demonstrate that our proposed unsupervised domain adaptation method based on relative entropy regularization and measure propagation achieves an average accuracy of 72.3% on the OfficeHome dataset, outperforming all baseline models, including ToAlign (71.8%). This result confirms the overall effectiveness of our method in enhancing cross-domain generalization.
From an information-theoretic standpoint, relative entropy regularization effectively constrains the predicted label distribution of the target domain to align with a reference distribution derived from the source domain via KL divergence. This helps reduce prediction uncertainty in the target domain, especially in domain pairs where category-level alignment is critical (e.g., A2P and C2P). On the other hand, measure propagation promotes global consistency in feature representation by transferring probability mass from the source to the target domain, which is particularly beneficial in domain pairs such as R2P and P2R, where structural differences in the feature space are more pronounced.
Notably, our method excels on several domain pairs with relatively moderate distribution shifts, including A2P (77.5%), C2P (76.5%), and R2P (85.2%). These results suggest that our method effectively captures both local label consistency and global feature structure in such settings. However, it does not achieve the best results on more challenging pairs like A2C (59.0%), C2A (66.5%), and P2C (57.5%), where the domain shift is more complex and localized. In these cases, the current formulation of KL divergence and the pseudo-measure approximation may lack the granularity needed for fine-grained adaptation, limiting performance.
Despite these limitations, the joint optimization of relative entropy regularization and measure propagation provides a strong balance between alignment accuracy and structural generalization. This synergy results in consistently competitive performance across a wide range of domain pairs, validating the robustness and theoretical soundness of the proposed method.
This paper also gives Grad-CAM images for the OfficeHome dataset, as shown in Figure 6.
The Grad-CAM [40,41] visualization results show the performance of our proposed unsupervised domain adaptation method, based on relative entropy regularization and measure propagation, on the OfficeHome target domain for six examples. Each example includes the original image, the Grad-CAM heat map, and the superimposed image. The heat maps focus on the key areas of the target objects, and the superimposed images clearly show the parts that the model attends to. This indicates that relative entropy regularization constrains the consistency of the target domain distribution with the source domain through KL divergence, while measure propagation enhances the discriminability of features through probability-measure alignment, enabling accurate category recognition in the target domain and verifying the effectiveness of the method.
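The heat maps in Figure 6 follow the standard Grad-CAM recipe [40,41]; a generic sketch of that computation on the last convolutional block of the backbone is shown below. The hook-based implementation and the choice of target layer are assumptions of this sketch, not a description of our exact code.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx=None):
    """Compute a Grad-CAM heat map for a single image tensor of shape (1, 3, H, W)."""
    activations, gradients = [], []

    def fwd_hook(module, inp, out):
        activations.append(out)

    def bwd_hook(module, grad_in, grad_out):
        gradients.append(grad_out[0])

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)

    logits = model(image)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()     # explain the predicted class by default
    model.zero_grad()
    logits[0, class_idx].backward()
    h1.remove(); h2.remove()

    act, grad = activations[0], gradients[0]         # both (1, C, h, w)
    weights = grad.mean(dim=(2, 3), keepdim=True)    # global-average-pooled gradients per channel
    cam = F.relu((weights * act).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam[0, 0]                                 # (H, W) heat map normalised to [0, 1]
```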
This paper also gives the experimental results on the DomainNet dataset, as shown in Table 2.
The experimental results show that our proposed unsupervised domain adaptation method based on relative entropy regularization and measure propagation achieves an average performance of 46.2% on the DomainNet dataset, significantly outperforming other baseline models (such as ToAlign with 45.4%). This verifies the effectiveness of our method in handling large-scale, cross-domain tasks with strong distribution heterogeneity. The complexity of DomainNet, with larger distributional differences between domains, imposes higher demands on information-theoretic methods. Relative entropy regularization precisely constrains the consistency between the predicted distribution of the target domain and the reference distribution of the source domain using KL divergence, reducing information loss. Measure propagation, by propagating the source domain’s probability measures to generate pseudo-measures for the target domain, enhances the global alignment of the feature space. The method performs exceptionally well on challenging domain pairs such as A2C (51.2%) and P2A (57.7%), contributing to the overall improvement in average performance.
Similarly, our method does not perform the best on all domain pairs of the DomainNet dataset. For example, it does not perform as well as some models (such as PAN and FixBi) on A2P (49.6%), C2P (34.2%), and P2R (36.4%). This may be due to the larger distribution heterogeneity and noise in the DomainNet dataset, where relative entropy regularization and measure propagation may require stronger local adaptation capabilities in some extreme scenarios. Nevertheless, the overall robustness of the method benefits from the information-theory-driven joint optimization. The use of KL divergence to quantify distributional differences and measure propagation to model global structure allows the method to maintain significant performance advantages on large-scale, complex datasets, highlighting the applicability and scalability of the theoretical design.
5.3. Ablation Experiment
To comprehensively evaluate the contribution of each component in our proposed unsupervised domain adaptation method based on relative entropy regularization and measure propagation, this section presents ablation experiments on the OfficeHome dataset. The focus is on analyzing the performance of four representative domain pairs: A2P, C2P, P2A, and R2C. By systematically removing key modules (such as relative entropy regularization or measure propagation), we verify the contribution of each part to the overall model performance. The experimental results are shown in Table 3.
The ablation experiment results show that our proposed unsupervised domain adaptation method, based on relative entropy regularization and measure propagation, performs excellently on the A2P, C2P, P2A, and R2C domain pairs of the OfficeHome dataset. The complete model (ours) achieves the best performance across all domain pairs, with A2P reaching 77.5%, C2P 76.5%, P2A 68.0%, and R2C 61.8%. After removing relative entropy regularization, A2P drops to 75.2%, C2P to 74.3%, P2A to 65.7%, and R2C to 59.4%. This indicates that KL divergence plays a critical role in constraining the consistency between the target domain’s predicted distribution and the source domain’s reference distribution. The impact is especially noticeable in A2P and C2P, where there are large distributional differences, validating the necessity of information-theoretic alignment.
After removing measure propagation, A2P drops to 76.0%, C2P to 75.1%, P2A to 66.5%, and R2C to 60.3%. This shows that measure propagation, by propagating the source domain’s probability measures to generate pseudo-measures for the target domain, makes a significant contribution to the global alignment of the feature space, especially in domain pairs with higher heterogeneity, such as P2A and R2C. Although performance decreases when either of the two modules is removed, the collaborative effect of the complete model significantly improves performance, fully demonstrating the complementary advantages of relative entropy regularization and measure propagation in information-theory-driven domain adaptation.
Secondly, for the A2P task, this paper selected 10 categories with 50 samples per category and used T-SNE [46] to visualize the features obtained under the three ablation settings. The experimental results are shown in Figure 7.
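Figure 7 was produced with a standard t-SNE projection; a minimal sketch of that visualization step is given below, assuming the 500 target-domain features (10 classes × 50 samples) have already been extracted, with the perplexity and output path being illustrative choices.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def plot_tsne(features, labels, n_classes=10, out_path="tsne_a2p.png"):
    """Project target-domain features to 2-D with t-SNE and colour points by class."""
    emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(features)
    for c in range(n_classes):
        idx = labels == c
        plt.scatter(emb[idx, 0], emb[idx, 1], s=8, label=str(c))
    plt.legend(markerscale=2, fontsize=6)
    plt.savefig(out_path, dpi=300)
    plt.close()
```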
The T-SNE visualization in the experimental results shows the feature distribution of our proposed unsupervised domain adaptation method, based on relative entropy regularization and measure propagation, on the A2P task of the OfficeHome dataset, involving 10 categories (from Alarm Clock to Mug). The complete model (ours, with an accuracy of 77.5%) exhibits the clearest category clustering in its feature distribution (Figure 7c), with compact clusters and clear boundaries for the 10 categories. This indicates that relative entropy regularization constrains the consistency between the target and source domain distributions using KL divergence, while measure propagation achieves global feature-space alignment by propagating probability measures, significantly enhancing the separability between categories and reflecting the method’s optimal performance.
After removing relative entropy regularization (Figure 7a, accuracy 75.2%) or measure propagation (Figure 7b, accuracy 76.0%), the clustering quality of the feature distribution significantly worsens. In Figure 7a, the category clusters become more scattered and partially overlap, indicating that the absence of the KL divergence constraint increases the deviation between the target domain’s predicted distribution and the source domain’s reference distribution, exacerbating information loss. In Figure 7b, although the category distribution is more concentrated than in Figure 7a, it is still more dispersed than in the complete model, indicating the indispensable role of measure propagation in global feature alignment. The performance degradation in these two figures validates the complementary roles of relative entropy regularization and measure propagation in information-theory-driven domain adaptation.
5.4. Hyperparameter Sensitivity Experiments
In order to deeply evaluate the robustness of our unsupervised domain adaptation method based on relative entropy regularization and measure propagation, this section conducts hyperparameter sensitivity experiments on the OfficeHome dataset, focusing on the impact of key hyperparameters on model performance. First, we experiment with conventional hyperparameters such as the choice of optimizer. Second, we pay special attention to changes in the relative entropy regularization weight $\lambda_1$ and the measure propagation weight $\lambda_2$. By testing different value ranges on the four representative domain pairs A2C, C2A, P2R, and R2P, we systematically explore the impact of hyperparameter settings on accuracy, thereby ensuring the stability and optimal performance of the method in practical applications. The experimental results are shown in Table 4.
The results of the hyperparameter sensitivity experiments show that, with the Adam optimizer, our proposed method achieves its best performance on the A2C, C2A, P2R, and R2P domain pairs of the OfficeHome dataset, reaching 59.0%, 66.5%, 82.2%, and 85.2%, respectively. This verifies the efficiency of Adam for the complex optimization that combines the KL divergence constraint with probability-measure propagation. In contrast, RMSprop performs slightly worse, with 56.7% on A2C, 64.2% on C2A, 80.4% on P2R, and 83.6% on R2P, indicating slower convergence when handling information-theory-driven distribution alignment. The performance of Adagrad and SGD lies in between, with Adagrad (A2C 57.3%, C2A 63.8%, P2R 81.0%) slightly ahead of SGD (A2C 55.9%, C2A 65.0%, P2R 79.8%) on most pairs but still lower than Adam overall, reflecting the differences among optimizers in dynamic learning-rate adjustment and non-convex optimization.
These results further illustrate that the Adam optimizer adapts better to the complex requirements of relative entropy regularization, which minimizes distributional differences through KL divergence, and of measure propagation, which aligns feature distributions through probability measures, especially on domain pairs with high distribution heterogeneity such as A2C and R2P. The performance degradation of RMSprop, Adagrad, and SGD indicates that they may be limited by dynamic learning-rate adjustment or convergence efficiency in the information-theory-driven joint optimization, which supports the choice of Adam as the default optimizer and provides a reference for tuning the model on different domain pairs. To display the results intuitively, the hyperparameter sensitivity experiment is also visualized in Figure 8.
The hyperparameter sensitivity results further show that the proposed method performs best on the A2C, C2A, P2R, and R2P domain pairs of the OfficeHome dataset when the relative entropy weight $\lambda_1$ and the measure propagation weight $\lambda_2$ are both set to 0.5, reaching 59.0%, 66.5%, 82.2%, and 85.2%, respectively. This verifies the synergistic effect of a balanced weighting of the KL divergence constraint and the probability-measure alignment. When $\lambda_1 = 0.75$ and $\lambda_2 = 0.25$, or $\lambda_1 = 0.25$ and $\lambda_2 = 0.75$, the performance decreases slightly (for example, A2C drops to 57.8% and 58.2%, respectively), indicating that an imbalanced weighting may weaken the information-theory-driven distribution alignment, especially on domain pairs with high distribution heterogeneity such as A2C and C2A.
5.5. Performance on Text Data for Sentiment Classification
In order to further verify the generalization ability of our proposed unsupervised domain adaptation method based on relative entropy regularization and measure propagation on different types of data, we designed an experiment for text data, focusing on the cross-domain sentiment classification task. Sentiment classification is an important natural language processing task widely used in social media analysis, user comment mining, and other fields. However, due to differences in the distribution of language styles and expressions in different text domains, directly applying the model trained in the source domain to the target domain usually leads to performance degradation. Therefore, this experiment aims to verify the effectiveness of this method in text domain adaptation.
We selected the Amazon Reviews dataset, a benchmark for cross-domain sentiment classification, as the experimental dataset. It contains user reviews of product categories such as Books (source domain, 10,000 labeled reviews, 5000 positive and 5000 negative) and Electronics (target domain, 10,000 unlabeled reviews), reflecting the changes in language style and vocabulary distribution across categories. In the experiment, we used the pre-trained BERT-base-uncased model as the feature extractor, fine-tuned it with Books domain data, and added a fully connected classifier on top of BERT, combined with relative entropy regularization and measure propagation. The experiment was run on an NVIDIA 4090D GPU to ensure efficiency. The experimental results are shown in Table 5.
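For the text variant, the ResNet50 backbone is replaced by BERT; a minimal sketch is shown below. The use of the [CLS] embedding as the feature $z = G(x)$ and the maximum sequence length are assumptions of this sketch rather than details reported above.

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")        # feature extractor G for text
classifier = torch.nn.Linear(bert.config.hidden_size, 2)     # positive / negative head C

def extract_features(texts, max_length=256):
    """Encode a batch of reviews; the [CLS] vector plays the role of z = G(x)."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=max_length, return_tensors="pt")
    out = bert(**batch)
    return out.last_hidden_state[:, 0]    # (batch, hidden) CLS embeddings
```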
Table 5 shows the sentiment classification results of our unsupervised domain adaptation method on the Amazon Reviews dataset (Books → Electronics). Our method achieves an accuracy of 78.4%, significantly better than the baseline models: BERT (Source Only) reaches only 65.2%, indicating poor performance without domain adaptation; DANN and CDAN achieve 71.3% and 73.5%, respectively, showing that adversarial training can alleviate distributional differences but with limited effect; and ToAlign reaches 76.1%, which is close to our result but still inferior. This shows that relative entropy regularization and measure propagation, by enforcing distribution consistency through KL divergence and global semantic alignment, effectively improve cross-domain sentiment classification, especially for text tasks with large differences in language style, verifying the robustness and superiority of the method on text data.
6. Conclusions
This paper proposes an unsupervised domain adaptation method based on relative entropy regularization and measure propagation. By incorporating KL divergence as an information-theoretic constraint and propagating probability measures for structural alignment, the method significantly improves generalization on target domains. Experimental results on the OfficeHome and DomainNet datasets demonstrate strong performance, with average accuracies of 72.3% and 46.2%, respectively, outperforming several competitive baselines. The proposed framework offers a principled and scalable solution for reducing distribution discrepancies and enhancing model robustness under moderate domain shifts.
While accuracy is a widely used evaluation metric and effectively demonstrates the performance of our approach, we acknowledge its limitations in capturing model behavior under class imbalance or skewed label distributions—common issues in real-world domain adaptation tasks. Relying solely on accuracy may overlook cases where the model performs poorly on minority classes. In future work, we plan to incorporate additional evaluation metrics such as F1-score or balanced accuracy and perform stratified analysis to better understand model behavior across different categories.
Despite its advantages, the proposed method still exhibits limitations when facing highly heterogeneous domain pairs, where global alignment alone may be insufficient. To address this, future research will focus on integrating local structure-aware adaptation strategies and enhancing the flexibility of pseudo-measure modeling to handle fine-grained distribution shifts. Additionally, scaling the framework to large-scale datasets and real-time applications remains an important challenge. Potential deployment scenarios include cross-institutional medical image analysis, multi-sensor perception in autonomous driving, and cross-platform sentiment classification, where distributional gaps and label scarcity are prevalent. These future directions will further improve the adaptability, interpretability, and real-world applicability of our method in complex cross-domain scenarios.