Article

Towards Discriminability with Distribution Discrepancy Constrains for Multisource Domain Adaptation

School of Software, South China Normal University, Foshan 528225, China
*
Author to whom correspondence should be addressed.
Mathematics 2024, 12(16), 2564; https://doi.org/10.3390/math12162564
Submission received: 10 July 2024 / Revised: 18 August 2024 / Accepted: 18 August 2024 / Published: 20 August 2024
(This article belongs to the Special Issue Mathematics Methods in Image Processing and Computer Vision)

Abstract

Multisource domain adaptation (MDA) is committed to mining and extracting data concerning target tasks from several source domains. Many recent studies have focused on extracting domain-invariant features to eliminate domain distribution differences. However, there are three aspects that require further consideration. (1) Efforts should be made to ensure the maximum correlation in the potential subspace between the source and target domains. (2) While aligning the marginal distribution, the conditional distribution must also be considered. (3) Merely aligning the source distribution and target distribution cannot guarantee sufficient differentiation for classification tasks. To address these problems, we propose a novel approach named towards discriminability with distribution discrepancy constrains for multisource domain adaptation (TD-DDC). Specifically, TD-DDC first mines features of maximal relations learned from all domains while constructing domain data distribution mean distance metrics for interdomain distribution adaptation. Simultaneously, we integrate discriminability into domain alignment, which means increasing the distance among labels that are distinct from one another while reducing the distance among labels that are the same. Our proposed method not only reduces the interdomain distributional differences but also takes into account the preservation of interdomain correlation and inter-category discrimination. Numerous experiments have shown that TD-DDC performs much better than its competitors on three visual benchmark test databases.

1. Introduction

The massive amount of data in this information age has made it possible for machine learning technology to be implemented successfully in many fields. Traditional machine learning performs effectively when the distributions of the source and target domains are the same; domain adaptation (DA), a branch of machine learning that relaxes this assumption, has made significant contributions in practical applications such as image classification [1,2,3,4,5]. In complex environments, the domain shift caused by different data distributions in the source and target domains (e.g., due to the diversity of illumination and viewing angles), together with typical machine learning models’ reliance on large amounts of labeled data, greatly limits the development of traditional machine learning. To address these issues, one approach is to utilize labeled training data from a source domain to obtain pseudolabels for unlabeled test data and apply them in target tasks, which is often referred to as unsupervised domain adaptation (UDA).
UDA has gained much popularity as a result of its practical usefulness [6,7,8,9,10,11]. Currently, most UDA methodologies are centered around single-source domain adaptation (SDA) [6,7,8], which transfers knowledge obtained from a single-source domain to the target domain. However, because of the complexity and unpredictability of real-world scenarios, the samples that are obtained frequently contain varied data distributions along with multiple angles, rays, colors, etc. Additionally, there are frequently several sets of data samples from various sources. Multisource domain adaptation (MDA) [9,10,11], which adapts a model from numerous rich source domains with labels to a target domain without labels, was therefore created to address the needs of complicated and different data-gathering sites.
MDA takes multiple source domains as input, which brings rich information but also hides much redundant information. Firstly, redundant information in the input data increases the computational cost. In addition, redundant data may introduce noise, as shown in Figure 1a, where some useless and obstructive noisy source samples are mapped into the subspace, which may affect the correct matching of domain samples. Therefore, it is noteworthy that domain alignment is hindered by the presence of redundant data. To ensure that the extracted features are effective and the transferred knowledge is positive and useful for the target task, we need to consider correlation information for knowledge transfer.
The primary goal of MDA is to reduce the distributional differences between the source and target domains, which is often achieved by feature transformations. One of the widely employed tactics uses the MMD criterion to measure the distribution distance among all domains. In transfer component analysis (TCA) [12], MMD is used to align the marginal distributions between the source and target domains. Joint distribution adaptation (JDA) [13] further demonstrated that domain distribution alignment should align both the conditional distribution and the marginal distribution. Under a domain distribution metric, the domain distribution discrepancy can be effectively mitigated. In fact, most previous MMD-based methods align different domains as a whole in a data-centric way. However, a feature transformation process based on global constraints often neglects the preservation of the local category structure of the samples. Samples from different categories are easily mixed if they are not separated locally, which leads to a mismatch among the samples of different classes in the source and target domains and negatively affects image classification. If intra- and inter-class discriminability can be improved, it is possible to counteract the negative effects of category mixing and further improve the accuracy of image classification.
In this article, we mainly focus on fully considering discrimination and correlation information to guide knowledge transfer. We propose a novel DA method, named TD-DDC. An overview of the method is depicted in Figure 2. First, the strongest correlation between each pair of source and target domains in the potential subspace is extracted by TD-DDC, as shown in Figure 1b, eliminating redundant information. Then, a domain discrepancy metric is constructed to minimize the marginal and conditional distribution discrepancies between the domains. Simultaneously, we construct discriminative learning matrices for intra- and inter-class features to maintain the category discriminant structure, minimizing the distance between samples with the same label and maximizing the distance between samples with distinct labels, so as to form tight intra-class structures and clear class boundaries. Finally, based on the above metric-optimized feature transformations, we integrate these three processes, iteratively refine the target pseudolabels, and project the domain data into a q-dimensional latent feature space to perform the classification task. In summary, the main contributions of this article are as follows:
(1)
To address the problem that DA algorithms based on global distribution adaptation easily lead to the loss of feature structure, a novel DA method, TD-DDC, is proposed that utilizes correlation extraction, distribution difference constraints, and discriminant analysis to preserve global, local, and discriminative structure.
(2)
The proposed correlation learning scheme draws the most relevant features among different domains. Our strategy is expanded to multisource domain scenarios to address increasingly challenging actual circumstances.
(3)
On three multisource domain adaptive datasets, our tests demonstrate that the proposed TD-DDC approach produces good results. Additional experiments also support the usefulness of our proposed TD-DDC method from various perspectives.
The remainder of this article is organized as follows. A review of related work is provided in Section 2. Details of the proposed TD-DDC method are presented in Section 3. In Section 4, we assess TD-DDC’s effectiveness using three datasets in a variety of ways. Section 5 concludes this article.

2. Related Work

2.1. Multisource Domain Adaptation

To reduce the probability of target mistakes, MDA approaches explore information from various sources and target domain data [14,15,16]. The two major categories of MDA techniques now in use are sample adaptation-based and feature-level adaptation-based approaches.
Sample adaptation-based methods work by resampling the samples and increasing the corresponding weights for the desired samples before adding them to the model training. For instance, Wen et al. [17] introduced the DARN model to efficiently alter each source domain’s weight during the training phase to guarantee the adaptation of relevant domains. To match the distribution between the source and target domain data more accurately, Huang et al. [18] suggested a method to determine the weights of the source samples directly. For each defined objective function, Mansour et al. [16] designed an algorithm with a distribution-weighted combination, which reflects the target hypothesis by a weighted combination of the source hypotheses. Feature-level adaptation-based approaches align distributions to produce domain-invariant representations in multiple source domains by projecting source and target domain samples onto a shared feature subspace. MFSAN [19] aligns the classifier’s output and the distribution of the source and target domains in many distinct feature spaces while accounting for the effects of decision boundaries among classes. PTMDA [20] builds a succession of pseudolabel domains that successfully align the pseudolabels and remaining source domains after assigning each pair of corresponding source and target domains to a particular subspace.
TD-DDC proposes a feature-based method for MDA, which uses the interpretable data structure to construct the global maximum correlation feature extraction, distribution discrepancy, and local inter- and intra-category discriminant feature extraction. This is conducive to a better understanding of the relationship among cross-domain samples in MUDA scenarios.

2.2. Maximum Correlation Feature Extraction Mechanism

In the era of information explosion, it is a challenge to discover the potential relationship among objects and to mine and measure some nonintuitive related factors. Furthermore, most of the existing MDA methods neglect the latent correlation information among multiple domains, which helps extract useful features for target tasks. Canonical correlation analysis (CCA) is a subspace approach that focuses on finding the most correlative relationship between two datasets. Many variations of CCA have been proposed [21,22,23]. For example, by studying the inherent link among various modalities, Chen et al. [21] introduced kernel CCA to increase the recognition rate of target samples. Xiu et al. [22] proposed a new robust sparse formula to enhance dependability. To extract dynamic associations, Zhu et al. [23] concentrated on eliminating the impact of lagged historical data and expanded CCA to the dynamic weighting case. This method maximizes the correlation between the weighted representations of current and previous data.
In contrast to other domain adaptation approaches, our method constructs the greatest correlation among domains to increase categorization accuracy while absorbing the benefits of CCA to extract connections among all domains. Additionally, traditional CCA methods are sensitive to noisy data and require a consistent number of samples across domains, which leads to poor generalization performance; TD-DDC therefore proposes a new global correlation extraction scheme that addresses this limitation and is devoted to extracting the global maximum correlation of multiple domains to ensure that effective data are transferred.

2.3. Category Discriminative Feature Extraction Mechanism

For cross-domain classification, the intra- and inter-category discriminative information of all domain data is important. Thus, many related discriminative methods have been proposed for discrimination learning.
MSFAN [24] learns class-specific and domain-invariant features to improve feature consistency and domain invariance. To address the issue of class center shift, DIA [25] integrates the discriminative information from the source domain with the local data structure details of the target domain. To improve target label estimation and feature representation, the CAN [26] model was built to achieve cross-domain similarity and sparsity of various classes as well as interdomain microstructure alignment. Li et al. [27] proposed DICD, which utilizes class discriminability to reduce distributional disparities among domains, enlarge the distance among different classes, and improve the model’s separability.
In contrast to the existing methods, TD-DDC emphasizes exploring the most challenging data pairs across domains, minimizing the gap between sample pairs that have the same label, and maximizing the gap between sample pairs that have distinct labels, with continuous optimization iterations in the algorithm to improve recognition accuracy.

3. Method

Here, we introduce our method in detail. First, the problem is defined and the motivation for the method is provided. Then, the different learning steps of our method are described. A list of all the notations used in this article is provided in Table 1.

3.1. Problem Definition and Motivation

In the case of multiple source domains, there are $M$ labeled source domains $\mathcal{D}_s = \{\mathcal{D}_{s_m}\}_{m=1}^{M}$ and one unlabeled target domain $\mathcal{D}_t$. In addition, we write $\mathcal{D}_{s_m} = \{(x_{s_m}^{i}, y_{s_m}^{i})\}_{i=1}^{n_{s_m}} = \{X_{s_m}, Y_{s_m}\}$. Among them, $X_{s_m} = \{x_{s_m}^{i}\}_{i=1}^{n_{s_m}}$ denotes the samples of source domain $m$ ($m = 1, \ldots, M$), and $Y_{s_m} = \{y_{s_m}^{i}\}_{i=1}^{n_{s_m}}$ denotes the corresponding ground-truth labels. $n_{s_m}$ is the number of samples in the $m$-th source domain, and $n_t$ is the number of target domain samples. Furthermore, we assume that $x_{s_m}, x_t \in \mathcal{X}$, where $\mathcal{X}$ is the shared feature space, and correspondingly $y_{s_m} \in \mathcal{Y}$, where $\mathcal{Y}$ is the shared label space. At the same time, the conditional and marginal distributions are unequal because of domain differences, that is, $P(x_{s_m}) \neq P(x_t)$ and $Q_{s_m}(y_{s_m} \mid x_{s_m}) \neq Q(y_t \mid x_t)$.
In our proposed TD-DDC method, we aim to design a novel DA method including the following three components: (1) a maximum correlation multisource measure, (2) a cross-domain distribution discrepancy measure, and (3) a category discriminative measure.
The aim of the maximum correlation multisource measure is to find the degree of correlation between the data of different source and target domains, uncover hidden relationships between them, extract useful information, and then incorporate this relationship into a model that forecasts the target domain data and thus performs better at image classification. The present research bottleneck for domain adaptation is the inconsistent domain distribution caused by numerous real-world factors, such as illumination, time, and angle, when there are several source and target domains. Our proposed TD-DDC method aims to reduce the domain distribution discrepancy to achieve domain alignment in the prospective subspace, which reduces domain shift and promotes domain transfer more effectively. If we strive only to mitigate the effects of domain bias, the output features will only decrease the distance among domain distributions while ignoring the alignment of class samples, which could result in the misalignment of class samples across domains. We must therefore pay attention to the role of class discriminability in classification: samples of the identical class should be closely clustered together and samples of distinct classes kept as far apart as possible, so that identical classes in different domains can be aligned more effectively.

3.2. Maximum Correlation Multisource Measure

Canonical correlation analysis (CCA) [28,29] aims to mine the shared information between two or more data domains and then maximize the correlation between the data domains through linear combination. To leverage the data fields already available while avoiding the disadvantage of limited data, we apply the CCA methodology to DA to enhance the positive information of the existing samples and increase the accuracy of image categorization. Therefore, the maximum correlation measure between each source and target domain can be written as follows:
$$\min_{P} \sum_{m=1}^{M} \left\| P^{T} X_{s_m} - P^{T} X_{t} \right\|_{F}^{2}, \quad \mathrm{s.t.}\ P^{T} X_{s_m} X_{s_m}^{T} P = I,\ P^{T} X_{t} X_{t}^{T} P = I,\ m = 1, 2, \ldots, M. \quad (1)$$
where $I$ denotes the identity matrix, $X_{s_m}$ denotes the $m$-th source domain, and $X_t$ denotes the target domain. Equation (1) requires that each pair of domains contain the same number of training samples. In multisource domain scenarios, the source domains tend to contain more training samples than the target domain, a limitation that degrades the performance of the algorithm in multisource model training and restricts its range of application.
Therefore, to address this problem, we devised a novel correlation algorithm that introduces matrix K s m , K t as follows:
$$\min_{P} \sum_{m=1}^{M} \left\| P^{T} X_{s_m} K_{s_m} - P^{T} X_{t} K_{t} \right\|_{F}^{2}, \quad \mathrm{s.t.}\ P^{T} X_{s_m} X_{s_m}^{T} P = I,\ P^{T} X_{t} X_{t}^{T} P = I,\ m = 1, 2, \ldots, M. \quad (2)$$
where $K_{s_m} = [\,I_{n_{s_m}},\ \mathbf{0}_{n_{s_m} \times n_t}\,]$ and $K_t = [\,I_{n_t},\ \mathbf{0}_{n_t \times n_{s_m}}\,]$, which cleverly makes the correlation term independent of the number of samples in the source and target domains.
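To make the role of $K_{s_m}$ and $K_t$ concrete, the following minimal NumPy sketch (the function name and toy shapes are illustrative and not part of any released implementation) pads the projected source and target matrices to a common width, so the Frobenius-norm term in Equation (2) remains well defined even when $n_{s_m} \neq n_t$:

```python
import numpy as np

def pad_projections(Z_s, Z_t):
    """Zero-pad two projected sample matrices (features x samples) to a common
    number of columns, mimicking K_sm = [I, 0] and K_t = [I, 0] in Eq. (2)."""
    n_s, n_t = Z_s.shape[1], Z_t.shape[1]
    # Right-multiplying by K_sm appends n_t zero columns to the source block;
    # right-multiplying by K_t appends n_sm zero columns to the target block.
    Z_s_pad = np.hstack([Z_s, np.zeros((Z_s.shape[0], n_t))])
    Z_t_pad = np.hstack([Z_t, np.zeros((Z_t.shape[0], n_s))])
    return Z_s_pad, Z_t_pad

# Toy check: domains with different sample counts now yield a well-defined
# Frobenius-norm discrepancy, as required by Equation (2).
rng = np.random.default_rng(0)
Zs, Zt = rng.normal(size=(5, 8)), rng.normal(size=(5, 3))
Zs_pad, Zt_pad = pad_projections(Zs, Zt)
print(Zs_pad.shape, Zt_pad.shape,
      np.linalg.norm(Zs_pad - Zt_pad, ord="fro") ** 2)  # (5, 11) (5, 11) ...
```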

3.3. Cross-Domain Distribution Discrepancy Measure

Excessive differences in the domain distributions of multisource domains can degrade model performance, so in addition to mining the correlation of data from different domains, aligning the domain distributions is also very important. Different domain metrics (e.g., MMD and JMMD) have been used to achieve domain distribution alignment in current domain adaptation research with excellent results. JDA [13] performs data reconstruction based on PCA [30] to find an orthogonal transformation matrix P for marginal distribution adaptation and conditional distribution adaptation, achieving distribution alignment across domains.
Similarly, the TD-DDC method obtains an optimal adaptive matrix P as a domain-invariant projection matrix for cross domains to realize domain and class adaptation.
First, we utilize the classical cross-domain distribution discrepancy measure approach to measure the marginal and conditional distributions as follows:
$$\mathrm{MMD}_{0}^{m}(X_{s_m}, X_t) = \left\| \frac{1}{n_{s_m}} \sum_{i=1}^{n_{s_m}} P^{T} x_{s_m}^{i} - \frac{1}{n_t} \sum_{i=1}^{n_t} P^{T} x_{t}^{i} \right\|^{2} = \mathrm{tr}\left(P^{T} X_{s_m} M_{0}^{m} X_{s_m}^{T} P\right), \quad (3)$$
$$\mathrm{MMD}_{c}^{m}(X_{s_m}, X_t) = \sum_{k=1}^{C} \left\| \frac{1}{n_{s_m}^{(k)}} \sum_{x_{s_m}^{i} \in D_{s_m}^{(k)}} P^{T} x_{s_m}^{i} - \frac{1}{n_{t}^{(k)}} \sum_{x_{t}^{i} \in \hat{D}_{t}^{(k)}} P^{T} x_{t}^{i} \right\|^{2} = \sum_{k=1}^{C} \mathrm{tr}\left(P^{T} X_{c_m} M_{c_m}^{k} X_{c_m}^{T} P\right) = \mathrm{tr}\left(P^{T} X_{c_m} M_{c_m} X_{c_m}^{T} P\right), \quad (4)$$
where Equations (3) and (4) measure the interdomain discrepancy from a global perspective. In these formulas, we seek the optimal adaptive transformation $P$, where $D_{s_m}^{(k)}$ denotes all samples belonging to category $k$ in the $m$-th source domain, $n_{s_m}^{(k)}$ represents the number of source domain samples of class $k$, $\hat{D}_{t}^{(k)}$ denotes the target domain samples assigned pseudolabel $k$, $n_{t}^{(k)}$ is the number of target samples with pseudolabel $k$, and $M_{0}^{m}$ and $M_{c_m}^{k}$ are the marginal and conditional MMD matrices, respectively, represented as:
$$\left(M_{0}^{m}\right)_{ij} = \begin{cases} \dfrac{1}{n_{s_m} n_{s_m}}, & x_i, x_j \in D_{s_m}; \\ \dfrac{1}{n_t n_t}, & x_i, x_j \in D_t; \\ \dfrac{-1}{n_{s_m} n_t}, & \text{otherwise}. \end{cases} \quad (5)$$
$$\left(M_{c_m}^{k}\right)_{ij} = \begin{cases} \dfrac{1}{n_{s_m}^{(k)} n_{s_m}^{(k)}}, & x_i, x_j \in D_{s_m}^{(k)}; \\ \dfrac{-1}{n_{s_m}^{(k)} n_{t}^{(k)}}, & x_i \in D_{s_m}^{(k)}, x_j \in \hat{D}_{t}^{(k)} \ \text{or}\ x_j \in D_{s_m}^{(k)}, x_i \in \hat{D}_{t}^{(k)}; \\ \dfrac{1}{n_{t}^{(k)} n_{t}^{(k)}}, & x_i, x_j \in \hat{D}_{t}^{(k)}; \\ 0, & \text{otherwise}. \end{cases} \quad (6)$$
Extending Equations (3) and (4) to multiple source domains, the discrepancy metrics are as follows:
$$\mathrm{MMD}_{0}(X_s, X_t) = \mathrm{tr}\left(P^{T} X \sum_{m=1}^{M} M_{0}^{m} X^{T} P\right) = \mathrm{tr}\left(P^{T} X M_{0} X^{T} P\right), \quad (7)$$
$$\mathrm{MMD}_{c}(X_s, X_t) = \mathrm{tr}\left(P^{T} X \sum_{m=1}^{M} M_{c}^{m} X^{T} P\right) = \mathrm{tr}\left(P^{T} X M_{c} X^{T} P\right), \quad (8)$$
where $X_m = [X_{s_1}, X_{s_2}, \ldots, X_{s_M}]$ and $X = [X_m, X_t]$. We define $M = \sum_{c=0}^{C} M_c$ to integrate the marginal and conditional distributions, so the entire cross-domain distribution discrepancy loss can be written as follows:
$$L_{\mathrm{MMD}} = \mathrm{MMD}_{0}(X_s, X_t) + \mathrm{MMD}_{c}(X_s, X_t) = \mathrm{tr}\left(P^{T} X (M_{0} + M_{c}) X^{T} P\right) = \mathrm{tr}\left(P^{T} X M X^{T} P\right). \quad (9)$$
It is obvious that if we minimize Equation (9), we can efficiently learn domain-invariant representations among domains and reduce the distributional differences across domains.
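For illustration, the sketch below builds the marginal matrix of Equation (5) and the summed conditional matrix of Equation (6) for a single source–target pair in the standard JDA style; the zero-based class labels, the [source | target] sample ordering, and the handling of classes with no pseudolabeled target samples are our own assumptions rather than details taken from the paper:

```python
import numpy as np

def mmd_matrices(n_s, n_t, y_s, y_t_pseudo, num_classes):
    """Return (M0, Mc) for one source-target pair with samples ordered
    [source | target]; y_t_pseudo holds the current target pseudolabels."""
    n = n_s + n_t
    e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, -1.0 / n_t)])
    M0 = np.outer(e, e)                         # marginal term, Eq. (5)

    Mc = np.zeros((n, n))
    for k in range(num_classes):                # conditional terms, Eq. (6)
        src_k = np.where(y_s == k)[0]
        tgt_k = n_s + np.where(y_t_pseudo == k)[0]
        if len(src_k) == 0 or len(tgt_k) == 0:
            continue                            # skip classes without matches
        e_k = np.zeros(n)
        e_k[src_k] = 1.0 / len(src_k)
        e_k[tgt_k] = -1.0 / len(tgt_k)
        Mc += np.outer(e_k, e_k)
    return M0, Mc

# Toy usage: 4 source and 3 pseudolabeled target samples over 2 classes.
M0, Mc = mmd_matrices(4, 3, np.array([0, 0, 1, 1]), np.array([0, 1, 1]), 2)
```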

3.4. Category Discriminative Measure

During the process of interdomain distribution alignment, feature transformation may distort the category structure and data structure, thereby affecting the target classifier’s discriminative characterization. Therefore, in addition to maximizing correlation extraction and minimizing the differences in the marginal and conditional distributions among all domains, TD-DDC also aims to learn category-discriminative features to achieve class-aware unsupervised domain adaptive learning, i.e., maximally preserving the data structure of the original dataset, minimizing the cross-domain differences within each class, and maximizing the differences among different classes.
First, to reduce variation in a class, the distance among samples belonging to the same class is minimized. For each source domain, we minimize the distance among samples with the same label and maximize the distance among samples with different labels, following the formula:
$$L_{s_m} = \sum_{c=1}^{C} \sum_{x_{s_m}^{p} \in D_{s_m}^{c}} \left( \max_{x_{s_m}^{q} \in D_{s_m}^{c}} \left\| P^{T} x_{s_m}^{p} - P^{T} x_{s_m}^{q} \right\|^{2} - \min_{x_{s_m}^{k} \notin D_{s_m}^{c}} \left\| P^{T} x_{s_m}^{p} - P^{T} x_{s_m}^{k} \right\|^{2} \right) = \sum_{c=1}^{C} \mathrm{tr}\left(P^{T} X_{s_m} \left(W_{s_m,\mathrm{same}}^{c} - W_{s_m,\mathrm{diff}}^{c}\right) X_{s_m}^{T} P\right), \quad (10)$$
where $W_{s_m,\mathrm{same}}^{c}$ and $W_{s_m,\mathrm{diff}}^{c}$ can be expressed as:
$$\left(W_{s_m,\mathrm{same}}^{c}\right)_{pq} = \begin{cases} I\left(x_{s_m}^{p} \in D_{s_m}^{c}\right) + \sum_{x_{s_m}^{a} \in D_{s_m}^{c}} I\left(x_{s_m}^{p} = \arg\max_{x_{s_m}^{k} \in D_{s_m}^{c}} \left\| P^{T} x_{s_m}^{a} - P^{T} x_{s_m}^{k} \right\|^{2}\right), & p = q; \\ -I\left(x_{s_m}^{p} \in D_{s_m}^{c},\ x_{s_m}^{q} = \arg\max_{x_{s_m}^{k} \in D_{s_m}^{c}} \left\| P^{T} x_{s_m}^{p} - P^{T} x_{s_m}^{k} \right\|^{2}\right) - I\left(x_{s_m}^{q} \in D_{s_m}^{c},\ x_{s_m}^{p} = \arg\max_{x_{s_m}^{k} \in D_{s_m}^{c}} \left\| P^{T} x_{s_m}^{q} - P^{T} x_{s_m}^{k} \right\|^{2}\right), & p \neq q. \end{cases} \quad (11)$$
$$\left(W_{s_m,\mathrm{diff}}^{c}\right)_{pq} = \begin{cases} I\left(x_{s_m}^{p} \in D_{s_m}^{c}\right) + \sum_{x_{s_m}^{a} \in D_{s_m}^{c}} I\left(x_{s_m}^{p} = \arg\min_{x_{s_m}^{k} \notin D_{s_m}^{c}} \left\| P^{T} x_{s_m}^{a} - P^{T} x_{s_m}^{k} \right\|^{2}\right), & p = q; \\ -I\left(x_{s_m}^{p} \in D_{s_m}^{c},\ x_{s_m}^{q} = \arg\min_{x_{s_m}^{k} \notin D_{s_m}^{c}} \left\| P^{T} x_{s_m}^{p} - P^{T} x_{s_m}^{k} \right\|^{2}\right) - I\left(x_{s_m}^{q} \in D_{s_m}^{c},\ x_{s_m}^{p} = \arg\min_{x_{s_m}^{k} \notin D_{s_m}^{c}} \left\| P^{T} x_{s_m}^{q} - P^{T} x_{s_m}^{k} \right\|^{2}\right), & p \neq q. \end{cases} \quad (12)$$
where $I(\cdot)$ is an indicator function. We define $W_{\mathrm{same}}^{s_m} = \sum_{c=1}^{C} W_{s_m,\mathrm{same}}^{c}$ and $W_{\mathrm{diff}}^{s_m} = \sum_{c=1}^{C} W_{s_m,\mathrm{diff}}^{c}$; then, Equation (10) can be transformed to:
$$L_{s_m} = \mathrm{tr}\left(P^{T} X_{s_m} \left(W_{\mathrm{same}}^{s_m} - W_{\mathrm{diff}}^{s_m}\right) X_{s_m}^{T} P\right). \quad (13)$$
Therefore, extending Equation (13) to multisource domains, we have:
$$L_{s} = \sum_{m=1}^{M} L_{s_m} = \mathrm{tr}\left(P^{T} X_{s} \sum_{m=1}^{M} \left(W_{\mathrm{same}}^{s_m} - W_{\mathrm{diff}}^{s_m}\right) X_{s}^{T} P\right) = \mathrm{tr}\left(P^{T} X_{s} \left(W_{\mathrm{same}}^{s} - W_{\mathrm{diff}}^{s}\right) X_{s}^{T} P\right), \quad (14)$$
where $X_s = [X_{s_1}, X_{s_2}, \ldots, X_{s_M}]$, $W_{\mathrm{same}}^{s} = \sum_{m=1}^{M} W_{\mathrm{same}}^{s_m}$, and $W_{\mathrm{diff}}^{s} = \sum_{m=1}^{M} W_{\mathrm{diff}}^{s_m}$. Likewise, noting that the target domain labels are unknown, we assign pseudolabels to the target domain samples as their initial label values. Pseudolabels can be generated by KNN, and it has been demonstrated that pseudolabels can be refined and tuned in an iterative manner to ultimately achieve the desired results. Therefore, using the pseudolabeled target domain samples, we construct the target domain’s discriminative loss function as:
$$L_{t} = \mathrm{tr}\left(P^{T} X_{t} \left(W_{\mathrm{same}}^{t} - W_{\mathrm{diff}}^{t}\right) X_{t}^{T} P\right). \quad (15)$$
Combining Equations (14) and (15) yields the entire discriminative loss term for all domains:
$$L_{\mathrm{distance}} = L_{s} + L_{t} = \mathrm{tr}\left(P^{T} X W X^{T} P\right), \quad (16)$$
where $W = W_{\mathrm{same}} - W_{\mathrm{diff}}$, $W_{\mathrm{same}} = \mathrm{diag}\left(W_{\mathrm{same}}^{s}, W_{\mathrm{same}}^{t}\right)$, $W_{\mathrm{diff}} = \mathrm{diag}\left(W_{\mathrm{diff}}^{s}, W_{\mathrm{diff}}^{t}\right)$, and $X = [X_s, X_t]$.
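A simplified sketch of this construction is given below: for every sample it picks the farthest sample with the same label and the nearest sample with a different label, and accumulates the corresponding $\pm 1$ weights so that $\mathrm{tr}(P^{T} X W_{\mathrm{same}} X^{T} P)$ and $\mathrm{tr}(P^{T} X W_{\mathrm{diff}} X^{T} P)$ encode the squared distances used in Equation (10). Unlike Equations (11) and (12), the pairs here are selected once in the input space rather than refreshed in the projected space at every iteration, and the function name and shapes are illustrative:

```python
import numpy as np

def discriminative_weights(X, y):
    """Build simplified W_same and W_diff for one domain.
    X: (d, n) samples as columns; y: (n,) integer labels."""
    n = X.shape[1]
    # Pairwise squared Euclidean distances between all samples.
    D = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0) ** 2
    W_same, W_diff = np.zeros((n, n)), np.zeros((n, n))

    def add_pair(W, p, q):
        # ||z_p - z_q||^2 = tr(Z W Z^T) with +1 on the diagonal, -1 off it.
        W[p, p] += 1.0; W[q, q] += 1.0; W[p, q] -= 1.0; W[q, p] -= 1.0

    for p in range(n):
        same = np.where((y == y[p]) & (np.arange(n) != p))[0]
        diff = np.where(y != y[p])[0]
        if len(same):   # farthest sample sharing the label of sample p
            add_pair(W_same, p, same[np.argmax(D[p, same])])
        if len(diff):   # nearest sample with a different label
            add_pair(W_diff, p, diff[np.argmin(D[p, diff])])
    return W_same, W_diff
```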

3.5. Optimization

The logic behind our proposed TD-DDC method is based on extracting the maximum interdomain correlation, learning domain invariance, and class-discriminative feature representation for each source and target domain. Therefore, we combine Equations (2), (9) and (16) into one function as follows:
$$L_{\mathrm{overall}} = \sum_{m=1}^{M} \left\| P^{T} X_{s_m} K_{s_m} - P^{T} X_{t} K_{t} \right\|_{F}^{2} + \alpha L_{\mathrm{distance}} + \beta L_{\mathrm{MMD}} + \lambda \left\| P \right\|_{F}^{2}, \quad (17)$$
where α constrains the domain discriminativeness of the learned features, β balances the learned features’ domain invariance, and the role of λ is to maintain statistical stability. Notably, regarding the size of α and β , if the value is too small, the domain invariance and class discrimination obtained by extraction may be lost; in contrast, if the value is too large, it may affect the feature learning process. Regarding the value of λ , the model’s capacity for generalization can be enhanced by selecting an appropriate size of λ . Therefore, it is necessary to select the appropriate coefficient size.
The equation after the optimization of P based on Equation (17) is expressed as:
$$\min_{P} \sum_{m=1}^{M} \left\| P^{T} X_{s_m} K_{s_m} - P^{T} X_{t} K_{t} \right\|_{F}^{2} + \alpha\, \mathrm{tr}\left(P^{T} X W X^{T} P\right) + \beta\, \mathrm{tr}\left(P^{T} X M X^{T} P\right) + \lambda \left\| P \right\|_{F}^{2}, \quad \mathrm{s.t.}\ \sum_{m=1}^{M} P^{T} X_{s_m} X_{s_m}^{T} P = I,\ P^{T} X_{t} X_{t}^{T} P = I,\ P^{T} X H H^{T} X^{T} P = I. \quad (18)$$
where $H$ is the centering matrix used to centralize the data, $X = [X_m, X_t]$, and $X_m = [X_{s_1}, X_{s_2}, \ldots, X_{s_M}]$; the constraints in Equation (18) maximize the variance of the data in each source and target domain. Clearly, this constrained nonlinear optimization problem can be addressed by using the Lagrange multiplier method to solve a generalized eigen-decomposition problem, and the corresponding Lagrangian can be written as follows:
$$L(P, \Phi) = \mathrm{tr}\left(P^{T} \sum_{m=1}^{M} \left(X_{s_m} K_{s_m} K_{s_m}^{T} X_{s_m}^{T} - X_{s_m} K_{s_m} K_{t}^{T} X_{t}^{T} - X_{t} K_{t} K_{s_m}^{T} X_{s_m}^{T} + X_{t} K_{t} K_{t}^{T} X_{t}^{T}\right) P + \alpha P^{T} X W X^{T} P + \beta P^{T} X M X^{T} P + \lambda P^{T} I P\right) + \mathrm{tr}\left(\left(\sum_{m=1}^{M} \left(I - P^{T} X_{s_m} X_{s_m}^{T} P\right) + I - P^{T} X_{t} X_{t}^{T} P + I - P^{T} X H H^{T} X^{T} P\right) \Phi\right). \quad (19)$$
By calculating Equation (19), we can obtain:
$$\left(\sum_{m=1}^{M} \left(X_{s_m} K_{s_m} K_{s_m}^{T} X_{s_m}^{T} - X_{s_m} K_{s_m} K_{t}^{T} X_{t}^{T} - X_{t} K_{t} K_{s_m}^{T} X_{s_m}^{T} + X_{t} K_{t} K_{t}^{T} X_{t}^{T}\right) + \alpha X W X^{T} + \beta X M X^{T} + \lambda I\right) P = \left(\sum_{m=1}^{M} X_{s_m} X_{s_m}^{T} + X_{t} X_{t}^{T} + X H H^{T} X^{T}\right) P \Phi, \quad (20)$$
where $\Phi$ is the diagonal matrix of Lagrange multipliers. The optimal solution is given by the $q$ eigenvectors corresponding to the $q$ smallest eigenvalues. The complete TD-DDC algorithm procedure is shown in Algorithm 1.
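In practice, Equation (20) is a generalized symmetric eigenproblem and can be handed to a standard solver. The sketch below uses SciPy's eigh, assuming the left-hand and right-hand matrices of Equation (20) have already been assembled (a small ridge is typically added to the right-hand matrix to keep it positive definite); it is an illustration, not the authors' released code:

```python
import numpy as np
from scipy.linalg import eigh

def solve_projection(A, B, q, ridge=1e-6):
    """Solve A p = phi * B p (Equation (20)) and return the q eigenvectors
    associated with the smallest eigenvalues as the columns of P."""
    B = B + ridge * np.eye(B.shape[0])      # keep B positive definite
    eigvals, eigvecs = eigh(A, B)           # eigenvalues in ascending order
    return eigvecs[:, :q]
```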
Algorithm 1: TD-DDC algorithm
Input: Labeled source samples {Xsm, Ysm}, m ∈ {1, …, M}; unlabeled target samples {Xt}; tradeoff parameters: α, β, λ; number of iterations: N; dimension of the latent subspace: q.
1. Construct the matrix Ω = βM + αW, where $W_{\mathrm{same}} = \mathrm{diag}\left(\sum_{m=1}^{M} W_{\mathrm{same}}^{s_m}, W_{\mathrm{same}}^{t}\right)$ and $W_{\mathrm{diff}} = \mathrm{diag}\left(\sum_{m=1}^{M} W_{\mathrm{diff}}^{s_m}, W_{\mathrm{diff}}^{t}\right)$.
While not converge or t ≤ N do
2. Derive the projection matrix P by computing the q eigenvectors associated with the smallest eigenvalues of Equation (20).
3. For each domain, let [Zsm, Zt] = PT[Xsm, Xt], and train the standard classifier fm (m ∈ {1, …, M}) on {Zsm, ysm} to predict the target pseudolabels $\hat{y}_t$; then compare the accuracy of the predicted target pseudolabels and take the maximum value.
4. Update matrix Ω = βM + αW.
5. t = t + 1.
End while
Output:
 Projected matrix P and the final classifiers $f_m$, $m \in \{1, \ldots, M\}$.
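The skeleton below mirrors the loop of Algorithm 1. The caller-supplied helpers build_A and build_B are hypothetical placeholders for assembling the two sides of Equation (20) from the correlation, MMD, and discriminative terms (i.e., Ω = βM + αW plus the scatter constraints), and the final pseudolabel selection is simplified relative to the per-source comparison in step 3:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import KNeighborsClassifier

def td_ddc_loop(Xs_list, ys_list, Xt, build_A, build_B,
                q=100, n_iter=10, alpha=1.0, beta=1.0, lam=1.0):
    """Iterative skeleton of Algorithm 1 (illustrative only).
    build_A / build_B assemble the left- and right-hand matrices of Eq. (20);
    they are placeholders supplied by the caller, not TD-DDC code."""
    yt_pseudo, P = None, None
    for _ in range(n_iter):
        # Steps 1/4: assemble Omega = beta*M + alpha*W inside build_A; on the
        # first pass yt_pseudo is None and the conditional MMD term is skipped.
        A = build_A(Xs_list, ys_list, Xt, yt_pseudo, alpha, beta, lam)
        B = build_B(Xs_list, Xt)
        # Step 2: q eigenvectors with the smallest generalized eigenvalues.
        _, eigvecs = eigh(A, B)
        P = eigvecs[:, :q]
        # Step 3: project each domain and predict target pseudolabels with 1-NN;
        # the per-source accuracy comparison of Algorithm 1 is simplified here.
        preds = [KNeighborsClassifier(n_neighbors=1)
                 .fit((P.T @ Xs).T, ys).predict((P.T @ Xt).T)
                 for Xs, ys in zip(Xs_list, ys_list)]
        yt_pseudo = preds[0]
    return P, yt_pseudo
```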

3.6. Time Complexity

Here, we analyze the computational complexity of Algorithm 1 in detail. In steps 1 and 4, constructing the cross-domain distribution matrix $M$ and the category discriminative matrix $W$ costs approximately $O\!\left(NC\left(\sum_{m=1}^{M} n_{s_m} + n_t\right)^2 + N\left(\sum_{m=1}^{M} n_{s_m} + n_t\right)^2\right)$. In step 2, solving the generalized eigen-decomposition for the transformation matrix $P$ takes approximately $O(N q d^2)$. Step 3 and all remaining steps cost approximately $O\!\left(N q d \left(\sum_{m=1}^{M} n_{s_m} + n_t\right)\right)$. In total, the overall computational complexity of TD-DDC is $O\!\left(NC\left(\sum_{m=1}^{M} n_{s_m} + n_t\right)^2 + N q d^2 + N q d \left(\sum_{m=1}^{M} n_{s_m} + n_t\right)\right)$.

4. Experiment

In this section, we evaluate our novel domain-adaptive technique using widely recognized visual cross-domain benchmarks, including CMU-PIE, Office-31, and Office-Caltech10. Some selected sample images from these three datasets are shown in Figure 3.

4.1. Data Description

The CMU-PIE [31] dataset is a collection of more than 40,000 facial images of 68 individuals captured under various poses, illumination conditions, and expressions. The dataset is divided into five parts, C05, C07, C09, C27, and C29, which represent the left, upward, downward, frontal, and right poses, respectively. Groups of two subsets, such as “C05, C07→C09”, “C05, C07→C27”, …, “C27, C29→C09”, were chosen as the source domains, and the remaining subset was used as the target domain in our experiment, for a total of 30 cross-domain classification tasks.
Office-31 is a standard adaptation benchmark for image classification, containing 31 categories and more than 4000 images across three subsets, Amazon, Dslr, and Webcam, which represent images downloaded from Amazon and photos taken by DSLR cameras and webcams, respectively. In our experiment, we selected any two subsets as the source domains and the remaining one as the target domain, such as “A, W→D”, “A, D→W”, and “W, D→A”, indicating “source domains→target domain”.
Caltech256, a widely recognized adaptation benchmark for image classification collected by the California Institute of Technology, contains 256 categories and more than 30,000 samples; two types of features are provided: 800-dimensional SURF features and 4096-dimensional DeCAF6 features. In addition, we selected the 10 categories common to Office-31 and Caltech256 as the classification objects, which together with the Office-31 dataset form Office-Caltech10, comprising the four parts Amazon (A), Caltech (C), Dslr (D), and Webcam (W). We obtained 12 separate cross-domain tasks by randomly selecting three domains as the source domains and one as the target domain, such as “A, C, D→W”, “A, C, W→D”, “C, D, W→A”, and “A, D, W→C” (using the notation “source domains→target domain”).

4.2. Experimental Setup

We compared the TD-DDC methodology with the following seven single-source domain adaptation approaches to evaluate our strategy more thoroughly: the 1-nearest neighbor classifier (1-NN) [32], joint distribution adaptation (JDA) [13], TCA [12], joint geometrical and statistical alignment (JGSA) [33], DICD [27], discriminative invariant alignment (DIA) [25], and SGA-MDA [34]. We also evaluated TD-DDC against the following multisource domain adaptation strategies to further demonstrate its superiority: MDA [35], IMTL [36], WBT [37], FADA [38], MHDA [39], CWAN [40], and HyMOS [41].
Since there have been few studies applying MDA to real-life benchmarks and some of the compared methods were designed for single-source setups, we adopted two different evaluation criteria for MDA. (1) Single-source best: the final result is the best single-source image classification result. (2) Multiple source domains: the MUDA approach. The first criterion examines whether adding more source domains is necessary, while the second shows how well our proposed TD-DDC method performs.
Our experiments are based on unsupervised domain adaptation: labeled source domain data and unlabeled target domain data are used, and we employed 1-NN as the baseline without performing domain adaptation in all experiments. We chose the subspace dimension d = 100 and T = 10 iterations for the parameterization of TD-DDC. β weights the cross-domain distribution discrepancy term and is set within [0.01, 50]; α is the parameter of the discriminative information term, also within [0.01, 50]; and the regularization parameter λ is tuned within the range [0.01, 50].

4.3. Results

We conducted broad-based experimental comparisons on the three datasets, and Table 2, Table 3 and Table 4 present the results.
(1) Experimental results using the CMU-PIE dataset: The performance comparison of classification accuracy on the CMU-PIE dataset is shown in Table 2, where the optimal classification accuracies are denoted in bold. The results show that TD-DDC has a clear advantage in classification performance on 23 of the cross-domain tasks, with an average classification accuracy of 89.7%, which is 10.8% higher than that of the best single-source method (DIA) and 4.2% higher than that of the previous multisource domain method (MDA). The multisource technique outperforms the single-source strategy, which demonstrates that adding more source domains enriches the data and can enhance the performance of the majority of classification tasks.
(2) Experimental results using the Office-31 dataset: Table 3 shows the performance comparison of classification accuracy on the Office-31 (DeCAF7 features) dataset. The table shows that TD-DDC surpasses the other comparable methods and improves by 1.6% over the MDA method. Positive results highlight the significance of learning correlation data across domains and accounting for both intradomain and interdomain boundaries.
(3) Experimental results using Office-Caltech10 with DeCAF6 features: The mean classification accuracies of TD-DDC are exhibited in Table 4. Compared with the single-source setting, TD-DDC is superior on three of the four subtasks. Among the previous multisource domain methods, the WBT method obtains the best results; in terms of average accuracy, TD-DDC is 1.4% higher than WBT, which further demonstrates that TD-DDC outperforms the other comparison methods in most tasks.

4.4. Analysis

Feature Analysis: In Figure 4, we visualize the tasks of C07→C29 for JDA (single source) and C07→C29 for DICD (single source). Figure 4 shows that the results in Figure 4b are better than those in Figure 4a, which indicates the importance of extracting domain invariance and improving the intradomain interdomain discriminativeness. In addition, the results in Figure 4c are better than those in Figure 4a,b, which shows that there are many benefits from considering more source domain data. In the figure, different colors indicate different domains and the same color indicates the same domain, where red and green represent source domains and blue represents the target domain.
Parameter Sensitivity: Using the control variables method, we fix any two values of α , β , and λ . In Figure 5a, the effect of changing the value of α from [0.01, 50] on the change in classification accuracy is demonstrated when the values of β and λ are fixed, and the experimental results show that the effect of the α value on the classification results is relatively small. In Figure 5b, the value of β is shown when the values of α and λ are fixed and the value of β varies in [0.01, 50]. We found that as β increases, its corresponding classification accuracy varies less. In Figure 5c, the value of λ is shown when the values of α and β are fixed and the value of λ varies in [0.01, 50]. On subtask A, W→D in the Office31 dataset and subtask A, C, D→W in the Office-Caltech10 dataset, the corresponding classification accuracies increase with λ .
Subspace Base Quantity Analysis: The impact of changing the number of subspace bases in the three subtasks across the three datasets on categorization accuracy is shown in Figure 6a. We concluded from the data that the accuracy of the classification task increases as the subspace base increases, so we set the subspace base to K = 100 in our classification task.
Algorithm Convergence: We observed the impact of multiple methodologies on the classification performance for the PIE dataset against various numbers of iterations to evaluate the algorithmic convergence performance of our algorithm, as shown in Figure 6b. The figure shows that the TD-DDC method is more effective than the MDA and JDA methods, which further suggests that an increase in the number of source domains significantly improves classification performance. Additionally, after many iterations, the classification performance of the model gradually reaches the convergence state and gradually tends to stabilize, so in our experiments, we set the number of iterations to T = 10 to account for this.
Ablation Study: As shown in Figure 7, in order to further explore the role played by each of the components in TD-DDC, we conducted ablation experiments on TD-DDC. Because of space constraints, we selected only one subtask from each of the three datasets, and we evaluated the model on TD-DDC, TD-DDC_wo_mmd, TD-DDC_wo_cor, and TD-DDC_wo_dis. Among them, TD-DDC represents our method. TD-DDC_wo_mmd represents the method without cross-domain distribution discrepancy measure. TD-DDC_wo_cor represents the method without maximum correlation multisource measure. TD-DDC_wo_dis represents the method without category discriminative measure. The worst overall performance was TD-DDC_wo_cor, which performed especially terribly on the CMU-PIE dataset. This may be because the CMU-PIE dataset has too many categories, and ignoring the global correlation measure is more likely to lead to feature distortion. This further indicates that focusing only on inter-class shift and domain shift is not enough and that we must also concentrate on the correlation between samples. TD-DDC_wo_mmd and TD-DDC_wo_dis behaved similarly overall. TD-DDC performed best, which shows that each item proposed in TD-DDC plays an indispensable role.
Time Comparison: In this section, we conducted a time cost comparison experiment on the Office-Caltech10 and CMU-PIE datasets, utilizing the single-source domain method JDA and the multisource domain approach MDA for comparison. We compared the running time of each method using ten iterations as a benchmark. The experimental results are illustrated in Figure 8.
Regarding the Office-Caltech10 dataset, the increased time consumption of TD-DDC may be due to the computation of inter-category and intra-category distances in the discriminative feature extraction module. It is noteworthy that although TD-DDC takes 75 s longer than MDA, it achieves significant advances on the Office-Caltech10 dataset. As the number of classes increases, transfer learning becomes more difficult because the feature distortion caused by aligning the feature space becomes more severe. We can observe that all three methods have a large time consumption on the CMU-PIE dataset, which contains more than 60 categories; our method not only takes less time than MDA but also performs better than MDA and JDA, indicating that it is able to perform effective discriminative feature extraction and reduce the impact of feature distortion.

5. Conclusions

The majority of previous multisource domain approaches concentrate on identifying domain invariance across all sources, frequently omitting the impact of within-domain class spacing on the classification result. In addition, to achieve the maximum correlation among all domains with CCA, the number of training samples in the source and target domains must be the same. This article addresses these issues by introducing a mediation matrix K in our proposed TD-DDC method, which also introduces the cross-domain distribution discrepancy measure matrix to align the marginal and conditional distributions. Meanwhile, our model also accounts for class spacing, increasing the inter-class gap and decreasing the intra-class gap to improve the discriminative capacity of the learned representation. We evaluated the TD-DDC model on three image datasets. The results demonstrate that TD-DDC is quite effective, outperforming the compared domain adaptation approaches. Our approach addresses applications with a single target domain; however, in real-world application scenarios, there are usually important scenarios with multiple target domains. In future work, we recommend extending the method to multisource and multitarget domain scenarios that are better adapted to realistic settings.

Author Contributions

Writing—original draft, W.H.; Writing—review & editing, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 62176162) and the Guangdong Basic and Applied Basic Research Foundation (2023A1515012875, 2022A1515140099).

Data Availability Statement

The authors confirm that the data supporting the findings of this study are available within the article and upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Liang, J.; He, R.; Sun, Z.; Tan, T. Aggregating randomized clustering promoting invariant projections for domain adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 1027–1042. [Google Scholar] [CrossRef]
  2. Ye, Y.; Fu, S.; Chen, J. Learning cross-domain representations by vision transformer for unsupervised domain adaptation. Neural Comput. Appl. 2023, 35, 10847–10860. [Google Scholar] [CrossRef]
  3. Chen, S.; Chen, L. Joint-product representation learning for domain generalization in classification and regression. Neural Comput. Appl. 2023, 35, 16509–16526. [Google Scholar] [CrossRef]
  4. Wang, Y.; Zhang, Z.; Hao, W.; Song, C. Attention Guided Multiple Source and Target Domain Adaptation. IEEE Trans. Image Process. 2021, 30, 892–906. [Google Scholar] [CrossRef] [PubMed]
  5. Wang, W.; Shen, Z.; Li, D.; Zhong, P.; Chen, Y. Probability-Based Graph Embedding Cross-Domain and Class Discriminative Feature Learning for Domain Adaptation. IEEE Trans. Image Process. 2023, 32, 72–87. [Google Scholar] [CrossRef]
  6. Li, S.; Liu, C.H.; Su, L.; Xie, B.; Ding, Z.; Chen, C.L.P.; Wu, D. Discriminative Transfer Feature and Label Consistency for Cross-Domain Image Classification. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 4842–4856. [Google Scholar] [CrossRef] [PubMed]
  7. Zhou, Q.; Zhou, W.; Wang, S.; Xing, Y. Unsupervised domain adaptation with adversarial distribution adaptation network. Neural Comput. Appl. 2021, 33, 7709–7721. [Google Scholar] [CrossRef]
  8. Li, Y.; Liu, Y.; Zheng, D.; Huang, Y.; Tang, Y. Discriminable feature enhancement for unsupervised domain adaptation. Image Vis. Comput. 2023, 137, 104755. [Google Scholar] [CrossRef]
  9. Li, Y.; Yuan, L.; Chen, Y.; Wang, P.; Vasconcelos, N. Dynamic Transfer for Multi-Source Domain Adaptation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
  10. Zuo, Y.; Yao, H.; Xu, C. Attention-Based Multi-Source Domain Adaptation. IEEE Trans. Image Process. 2021, 30, 3793–3803. [Google Scholar] [CrossRef]
  11. Fu, Y.; Zhang, M.; Xu, X.; Cao, Z.; Ma, C.; Ji, Y.; Zuo, K.; Lu, H. Partial Feature Selection and Alignment for Multi-Source Domain Adaptation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 16649–16658. [Google Scholar]
  12. Pan, S.J.; Tsang, I.W.; Kwok, J.T.; Yang, Q. Domain Adaptation via Transfer Component Analysis. IEEE Trans. Neural Netw. 2011, 22, 199–210. [Google Scholar] [CrossRef]
  13. Long, M.S.; Wang, J.M.; Ding, G.G.; Sun, J.G.; Yu, P.S. Transfer feature learning with joint distribution adaptation. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1–8 December 2013; pp. 2200–2207. [Google Scholar]
  14. Shai, B.; John, B.; Crammer, K.; Kulesza, A.; Pereira, F.; Vaughan, J. A theory of learning from different domains. Mach. Learn. 2010, 79, 151–175. [Google Scholar]
  15. Crammer, K.; Kearns, M.; Wortman, J. Learning from Multiple Sources. J. Mach. Learn. Res. 2006, 9, 1757–1774. [Google Scholar]
  16. Mansour, Y.; Mohri, M.; Rostamizadeh, A. Domain Adaptation with Multiple Sources. In Proceedings of the 21st International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–11 December 2008; pp. 1041–1048. [Google Scholar]
  17. Wen, J.; Greiner, R.; Schuurmans, D. Domain Aggregation Networks for Multi-Source Domain Adaptation. Int. Conf. Mach. Learn. 2019, 946, 10214–10224. [Google Scholar]
  18. Huang, J.; Smola, A.J.; Gretton, A.; Borgwardt, K.M.; Scholkopf, B. Correcting Sample Selection Bias by Unlabeled Data. In Proceedings of the Advances in Neural Information Processing Systems 19: Proceedings of the 2006 Conference, Vancouver, BC, Canada, 4–7 December 2006; pp. 601–608. [Google Scholar]
  19. Zhu, Y.; Zhuang, F.; Wang, D. Aligning Domain-specific Distribution and Classifier for Cross-domain Classification from Multiple Sources. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; pp. 5989–5996. [Google Scholar]
  20. Ren, C.X.; Liu, Y.H.; Zhang, X.W.; Huang, K.K. Multi-Source Unsupervised Domain Adaptation via Pseudo Target Domain. IEEE Trans. Image Process. 2022, 31, 2122–2135. [Google Scholar] [CrossRef]
  21. Chen, L.F.; Wang, K.; Wu, M.; Pedrycz, W.; Hirota, K. K-Means Clustering-based Kernel Canonical Correlation Analysis for Multimodal Emotion Recognition. Int. Fed. Autom. Control 2020, 53, 10250–10254. [Google Scholar] [CrossRef]
  22. Xiu, X.; Miao, Z. Robust Sparse Canonical Correlation Analysis: New Formulation and Application to Fault Detection. IEEE Sens. Lett. 2022, 6, 7002804. [Google Scholar] [CrossRef]
  23. Zhu, Q.; Liu, Q.; Qin, S.J. Dynamic Weighted Canonical Correlation Analysis for Auto-Regressive Modeling. Int. Fed. Autom. Control 2020, 53, 200–205. [Google Scholar]
  24. Yue, X.; Zheng, Z.; Reed, C.; Das, H.P.; Keutzer, K.; Vincentelli, A.S. Multi-source Few-shot Domain Adaptation. IEEE Workshop Appli. Comput. Vis. 2021; under view. [Google Scholar]
  25. Lu, Y.; Li, D.; Wang, W.; Lai, Z.; Zhou, J.; Li, X. Discriminative Invariant Alignment for Unsupervised Domain Adaptation. IEEE Trans. Multimed. 2022, 24, 1871–1882. [Google Scholar] [CrossRef]
  26. Kang, G.; Jiang, L.; Wei, Y.; Hauptmann, A. Contrastive Adaptation Network for Single- and Multi-Source Domain Adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 1793–1804. [Google Scholar] [CrossRef]
  27. Li, S.; Song, S.; Huang, G.; Ding, Z.M.; Wu, C. Domain Invariant and Class Discriminative Feature Learning for Visual Domain Adaptation. IEEE Trans. Image Process. 2018, 27, 4260–4272. [Google Scholar] [CrossRef]
  28. Hotelling, H. The most predictable criterion. J. Educ. Psychol. 1935, 26, 139–142. [Google Scholar] [CrossRef]
  29. Hotelling, H. Relations between two sets of variates. Biometrika 1936, 28, 321–377. [Google Scholar] [CrossRef]
  30. Wold, S.; Esbensen, K.; Geladi, P. Principal component analysis. Chemom. Intell. Lab. Syst. 1987, 2, 37–52. [Google Scholar] [CrossRef]
  31. Sim, T.; Baker, S.; Bsat, M. The CMU pose, illumination, and expression database. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 1615–1618. [Google Scholar]
  32. FuKunaga, K.; Narendra, P.M. A branch and bound algorithm for computing k-nearest neighbors. IEEE Trans. Comput. 1975, 24, 750–753. [Google Scholar] [CrossRef]
  33. Zhang, J.; Li, W.; Ogunbona, P. Joint geometrical and statistical alignment for visual domain adaptation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5150–5158. [Google Scholar]
  34. Sanodiya, R.K.; Mathew, A.; Mathew, J.; Khushi, M. Statistical and Geometrical Alignment using Metric Learning in Domain Adaptation. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar]
  35. Karimpour, M.; Saray, S.N.; Tahmoresnezhad, J. Multisource domain adaptation for image classification. Mach. Vis. Appl. 2020, 31, 1–19. [Google Scholar] [CrossRef]
  36. Ding, Z.; Shao, M.; Fu, Y. Incomplete Multisource Transfer Learning. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 310–323. [Google Scholar] [CrossRef] [PubMed]
  37. Montesuma, E.F.; Mboula, F.M.N. Wasserstein barycenter for multi-source domain adaptation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 16785–16793. [Google Scholar]
  38. Peng, X.; Huang, Z.; Zhu, Y.; Saenko, K. Federated Adversarial Domain Adaptation. arXiv 2020, arXiv:1911.02054. [Google Scholar]
  39. Zhan, J.; Zhang, T.; Yu, Y. Multi-source Heterogeneous Data Aggregation Method Based on Adversarial Domain Adaptation. In Proceedings of the 2021 China Automation Congress (CAC), Beijing, China, 22–24 October 2021. [Google Scholar]
  40. Yao, Y.; Li, X.; Zhang, Y.; Ye, Y. Multisource Heterogeneous Domain Adaptation with Conditional Weighting Adversarial Network. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 2079–2092. [Google Scholar] [CrossRef]
  41. Bucci, S.; Borlino, F.C.; Caputo, B.; Tommasi, T. Distance-based Hyperspherical Classification for Multi-source open-set domain adaptation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 1030–1039. [Google Scholar]
Figure 1. (a): Existing DA tasks have much redundant information, which may affect domain distribution alignment. (b): Our method is dedicated to extracting the key features among domains and eliminating redundant information.
Figure 2. Illustration of the TD-DDC method. TD-DDC focuses on learning the maximum interdomain correlation and domain invariance and mitigating interdomain differences, which engage in domain-discriminative learning to expand interclass gaps and narrow intraclass gaps to improve the performance of image classification.
Figure 3. Image samples from the CMU-PIE, Office-31, and Office-Caltech10 datasets.
Figure 4. Visualization of different adaptation methods using t-SNE, where (a,b) denote the single-source domain adaptation of JDA and DICD on the task C07→C29, respectively, and (c) denotes our proposed method on the task C07, C27→C29.
Figure 5. Different performances of the parameters on the three datasets, where (ac) are the changes in the relevant parameters on A, W→D, C05, C07→C09, and A, C, W→D.
Figure 6. (a) The effect of variations in the subspace dimension parameter K. (b) The algorithm convergence of the different methods performed on C05, C07→C09.
Figure 7. Accuracy (%) of different variants of TD-DDC.
Figure 8. Time comparison on the Office-Caltech10 and CMU-PIE datasets. (a) C→W of the JDA method and A, C, D→W of the MDA and TD-DDC methods. (b) C05-C09 of the JDA method and C05, C07→C09 of the MDA and TD-DDC methods.
Table 1. Symbols used in this paper.
Notation | Description | Notation | Description
$D_{s_m}$ / $D_t$ | Source/target domain | $X_{s_m}$ / $X_t$ | Source/target matrix
$n_{s_m}$ / $n_t$ | Source/target examples | $P$ | Adaptation matrix
$C$ | Shared classes | $E$ | Embedding matrix
$N$ | Iterations | $H$ | Centering matrix
$\alpha, \beta$ | Tradeoff parameters | $M$ | MMD matrix
$\lambda$ | Regularization parameters | $W$ | Discriminative matrix
$q$ | Dimensionality of subspace | $d$ | Dimensionality of original space
Table 2. Comparison of classification accuracy (%) performance on CMU-PIE.
Datasets | Single Best | Multisource | Ours
 | 1NN | JDA | TCA | JGSA | DICD | MDA | DIA | MDA |
C05, C07→C09 | 46.6 | 60.2 | 51.5 | 57.2 | 72.0 | 79.8 | 79.8 | 78.3 | 97.7
C05, C07→C27 | 54.1 | 64.7 | 64.7 | 69.2 | 92.2 | 93.1 | 93.0 | 93.5 | 99.0
C05, C07→C29 | 37.2 | 33.7 | 33.7 | 49.8 | 66.9 | 67.7 | 73.1 | 69.9 | 96.8
C05, C09→C07 | 41.0 | 47.7 | 47.7 | 58.7 | 73.0 | 81.3 | 78.2 | 81.8 | 95.0
C05, C09→C27 | 46.5 | 59.6 | 59.6 | 69.5 | 92.2 | 93.1 | 93.0 | 96.1 | 99.1
C05, C09→C29 | 26.2 | 29.4 | 33.2 | 52.2 | 66.9 | 73.4 | 73.1 | 71.8 | 97.0
C05, C27→C07 | 62.7 | 82.8 | 67.8 | 65.7 | 90.1 | 91.3 | 94.7 | 94.1 | 82.1
C05, C27→C09 | 73.2 | 87.2 | 75.9 | 62.6 | 89.0 | 88.5 | 88.8 | 95.1 | 79.3
C05, C27→C29 | 37.2 | 49.9 | 37.4 | 57.0 | 75.6 | 74.4 | 85.6 | 77.4 | 65.0
C05, C29→C07 | 26.1 | 58.6 | 40.8 | 54.7 | 73.0 | 71.1 | 78.2 | 81.1 | 94.0
C05, C29→C09 | 28.3 | 52.0 | 41.8 | 56.4 | 72.0 | 75.3 | 74.3 | 74.1 | 97.0
C05, C29→C27 | 31.2 | 83.7 | 59.6 | 66.0 | 92.2 | 93.1 | 93.0 | 92.6 | 99.0
C07, C09→C05 | 24.5 | 60.6 | 41.8 | 57.5 | 69.9 | 84.0 | 72.9 | 83.1 | 52.3
C07, C09→C27 | 54.1 | 75.4 | 56.2 | 69.5 | 83.4 | 92.4 | 89.7 | 91.9 | 74.3
C07, C09→C29 | 26.2 | 40.9 | 33.7 | 52.2 | 61.4 | 73.4 | 71.1 | 72.4 | 96.6
C07, C27→C05 | 33.0 | 81.0 | 55.6 | 63.2 | 93.1 | 84 | 93.7 | 91.4 | 99.4
C07, C27→C09 | 73.2 | 87.2 | 75.9 | 69.2 | 89.0 | 88.5 | 88.8 | 93.4 | 97.4
C07, C27→C29 | 37.2 | 49.9 | 40.3 | 57.0 | 75.6 | 74.4 | 85.6 | 76.2 | 97.3
C07, C29→C05 | 24.5 | 60.6 | 41.8 | 57.5 | 69.9 | 84 | 64.7 | 84.7 | 51.8
C07, C29→C09 | 46.6 | 60.2 | 51.5 | 57.2 | 65.9 | 79.8 | 79.8 | 81.0 | 96.9
C07, C29→C27 | 54.1 | 75.4 | 64.7 | 69.2 | 85.3 | 91.0 | 89.5 | 92.2 | 74.8
C09, C27→C05 | 33.0 | 81.0 | 55.6 | 63.2 | 93.1 | 91.5 | 93.7 | 90.6 | 99.5
C09, C27→C07 | 62.7 | 82.8 | 67.8 | 62.6 | 90.1 | 91.3 | 94.7 | 94.0 | 95.2
C09, C27→C29 | 37.2 | 49.9 | 40.3 | 57.0 | 75.6 | 74.4 | 85.6 | 77.9 | 97.5
C09, C29→C05 | 21.4 | 50.9 | 34.7 | 56.3 | 69.4 | 75.3 | 72.9 | 79.9 | 99.5
C09, C29→C07 | 41.0 | 56.1 | 47.7 | 58.7 | 65.4 | 81.3 | 74.2 | 80.7 | 93.1
C09, C29→C27 | 46.5 | 68.0 | 56.2 | 69.5 | 83.4 | 92.4 | 89.7 | 91.2 | 72.4
C27, C29→C05 | 33.0 | 81.0 | 55.6 | 63.2 | 93.1 | 91.5 | 93.7 | 91.4 | 99.5
C27, C29→C07 | 62.7 | 82.8 | 67.8 | 65.7 | 90.1 | 91.3 | 94.7 | 93.7 | 94.7
C27, C29→C09 | 73.2 | 87.2 | 75.9 | 62.6 | 89.0 | 88.5 | 88.8 | 94.6 | 96.7
Avg | 43.1 | 64.7 | 52.6 | 61.0 | 73.2 | 83.7 | 78.9 | 85.5 | 89.7
The best results are bold.
Table 3. Comparison of classification accuracy (%) performance on Office-31.
Dataset | Single Best | Multisource | Ours
 | 1NN | TCA | JDA | MDA | MDA | CWAN | MHDA | HyMOS |
A, D→W | 90.9 | 93.8 | 94.7 | 94.3 | 92.7 | 82.5 | 80.3 | 90.2 | 95.0
A, W→D | 97.8 | 98.6 | 97.6 | 96.4 | 94.2 | 80.7 | 88.1 | 89.9 | 98.8
D, W→A | 42.4 | 46.3 | 46.2 | 51.9 | 52.6 | 58.4 | 60.3 | 60.8 | 50.3
avg | 77.0 | 79.6 | 79.5 | 80.9 | 79.7 | 73.9 | 76.7 | 80.3 | 81.4
The best results are bold.
Table 4. Comparison of classification accuracy (%) performance on Office-Caltech10.
Dataset | Single Best | Multisource | Ours
 | 1NN | TCA | JDA | SGA-MDA | IMTL | MDA | FADA | WBT |
A, C, D→W | 89.2 | 98.9 | 98.9 | 99.3 | 74.7 | 71.2 | 88.1 | 96.6 | 100
A, C, W→D | 98.7 | 100 | 100 | 100 | 68.3 | 79.7 | 84.2 | 95.9 | 100
A, D, W→C | 70.4 | 79.6 | 85.1 | 82.5 | 58.1 | 50.1 | 88.7 | 85.0 | 85.6
C, D, W→A | 85.7 | 89.5 | 91.4 | 88.8 | 67.4 | 52.3 | 87.1 | 92.7 | 89.9
avg | 86.0 | 92.0 | 93.8 | 92.1 | 67.1 | 84.4 | 87.0 | 92.5 | 93.9
The best results are bold.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
