An Empirical Study of Self-Supervised Learning with Wasserstein Distance
Abstract
1. Introduction
- We propose, for the first time, using the tree-Wasserstein distance (TWD) in self-supervised learning frameworks, including SimCLR and SimSiam (a minimal code sketch of the resulting objective follows this list).
- We investigate combinations of probability models with two TWD variants (total variation and ClusterTree), and find that the ArcFace model with prior information suits the total variation variant, while SEM [10] suits the ClusterTree variant.
- We propose a robust variant of TWD (RTWD) and show that it is equivalent to the total variation distance.
- We propose Jeffrey divergence regularization for TWD minimization and find that it significantly stabilizes training.
- We demonstrate in SimCLR experiments that combining TWD with probability models outperforms the cosine-similarity baseline on CIFAR10, STL10, and SVHN, whereas performance on CIFAR100 still trails the baseline and is left as future work.
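To make the proposal concrete, here is a minimal sketch, not the authors' implementation, of the closed-form TWD and of a SimCLR-style InfoNCE loss that uses the negative TWD in place of cosine similarity. The tree is encoded by an assumed leaf-to-subtree incidence matrix `B` and edge weights `w`; the temperature `tau` is likewise an illustrative placeholder.

```python
import torch
import torch.nn.functional as F

def tree_wasserstein(p, q, B, w):
    """Closed-form TWD: sum_e w_e * |(p - q) mass in the subtree below edge e|.
    B: (E, d) 0/1 matrix with B[e, j] = 1 iff leaf j lies below edge e;
    w: (E,) non-negative edge weights; p, q: (..., d) distributions on the leaves."""
    return (w * ((p - q) @ B.T).abs()).sum(dim=-1)

def simclr_twd_loss(p1, p2, B, w, tau=0.5):
    """InfoNCE where similarity = -TWD; p1, p2 are (n, d) simplex embeddings
    of the two augmented views of the same n images."""
    z = torch.cat([p1, p2], dim=0)                  # (2n, d)
    diff = z.unsqueeze(1) - z.unsqueeze(0)          # (2n, 2n, d) pairwise differences
    dist = (w * (diff @ B.T).abs()).sum(dim=-1)     # pairwise TWD via the closed form
    logits = -dist / tau
    logits.fill_diagonal_(float("-inf"))            # exclude self-pairs
    n = p1.shape[0]
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])  # positives: (i, i+n)
    return F.cross_entropy(logits, targets)
```

Setting `B` to the identity and `w` to 1/2 everywhere (a star tree) makes `tree_wasserstein` the total variation distance, 0.5 * ||p - q||_1, which corresponds to the TWD (TV) configuration in the experiments.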
2. Related Work
3. Background
3.1. Self-Supervised Learning Methods
3.2. p-Wasserstein Distance
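For reference, the standard definition underlying this subsection (notation ours):

```latex
% p-Wasserstein distance between probability measures \mu, \nu on a
% metric space (X, d), where \Pi(\mu, \nu) is the set of couplings
% with marginals \mu and \nu:
W_p(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu, \nu)}
    \int_{X \times X} d(x, y)^p \, \mathrm{d}\pi(x, y) \right)^{1/p}
```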
3.3. 1-Wasserstein Distance with Tree Metric (Tree-Wasserstein Distance)
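The key computational fact in the tree-metric case, established in the tree-Wasserstein literature cited in the references (Le et al.; Yamada et al.), is a closed form that avoids solving a transport problem:

```latex
% For measures \mu, \nu supported on the leaves of a tree T with edge
% weights w_e, where \Gamma_e is the set of leaves below edge e:
W_T(\mu, \nu) = \sum_{e \in T} w_e \, \bigl| \mu(\Gamma_e) - \nu(\Gamma_e) \bigr|
% A star tree with all w_e = 1/2 recovers the total variation distance
% \tfrac{1}{2} \lVert \mu - \nu \rVert_1.
```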
4. SSL with 1-Wasserstein Distance
4.1. SimCLR with Tree Wasserstein Distance
4.2. SimSiam with Tree Wasserstein Distance
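As with SimCLR above, a hedged sketch of how TWD might replace the negative cosine similarity in SimSiam's symmetric stop-gradient loss; `tree_wasserstein` is the helper defined earlier, and the simplex-valued projector/predictor outputs `p1, p2, h1, h2` are interface assumptions, not the authors' code.

```python
def simsiam_twd_loss(p1, p2, h1, h2, B, w):
    # Symmetric SimSiam-style loss with TWD instead of negative cosine similarity.
    # p1, p2: projector outputs for the two views (used as stop-gradient targets);
    # h1, h2: predictor outputs; all assumed to lie on the probability simplex.
    d1 = tree_wasserstein(h1, p2.detach(), B, w).mean()
    d2 = tree_wasserstein(h2, p1.detach(), B, w).mean()
    return 0.5 * (d1 + d2)
```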
4.3. Robust Variant of Tree Wasserstein Distance
4.4. Probability Models
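The three probability models compared in the experiments map encoder outputs to the probability simplex. A rough sketch under standard formulations follows; the ArcFace angular margin and the PE/DCT prior-information variants are omitted, and `tau`, `num_groups`, `centers`, and `scale` are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def softmax_model(z, tau=1.0):
    """Plain softmax over the feature dimension."""
    return F.softmax(z / tau, dim=-1)

def sem_model(z, num_groups=16):
    """Simplicial embedding (SEM, Lavoie et al.): softmax within each of
    num_groups feature groups, rescaled so each row sums to one."""
    n, d = z.shape
    p = F.softmax(z.view(n, num_groups, d // num_groups), dim=-1)
    return p.view(n, d) / num_groups

def arcface_model(z, centers, scale=16.0):
    """ArcFace-style model (margin omitted): softmax over scaled cosine
    similarities to learned center vectors; centers is (K, d)."""
    cos = F.normalize(z, dim=-1) @ F.normalize(centers, dim=-1).T
    return F.softmax(scale * cos, dim=-1)
```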
4.5. Jeffrey Divergence Regularization
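A minimal sketch of the regularizer, assuming the Jeffrey divergence is the standard symmetrized KL divergence between the two views' simplex embeddings; `lambda_reg` is a hypothetical hyperparameter name.

```python
import torch

def jeffrey_divergence(p, q, eps=1e-8):
    """Symmetrized KL: J(p, q) = KL(p || q) + KL(q || p)
    = sum_j (p_j - q_j) * (log p_j - log q_j)."""
    p = p.clamp_min(eps)  # clamp to avoid log(0)
    q = q.clamp_min(eps)
    return ((p - q) * (p.log() - q.log())).sum(dim=-1)

# Hypothetical usage: add the regularizer to the TWD-based SSL objective.
# loss = simclr_twd_loss(p1, p2, B, w) + lambda_reg * jeffrey_divergence(p1, p2).mean()
```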
5. Experiments
5.1. Performance Comparison for SimCLR
5.2. Performance Comparison for SimSiam
5.3. Effect of Number of Nearest Neighbors
5.4. Effect of the Regularization Parameter for Jeffrey Divergence
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Kramer, M.A. Nonlinear principal component analysis using autoassociative neural networks. AIChE J. 1991, 37, 233–243.
- Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. arXiv 2013, arXiv:1312.6114.
- Chen, X.; Fan, H.; Girshick, R.; He, K. Improved baselines with momentum contrastive learning. arXiv 2020, arXiv:2003.04297.
- Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.; Buchatskaya, E.; Doersch, C.; Avila Pires, B.; Guo, Z.; Gheshlaghi Azar, M.; et al. Bootstrap your own latent—A new approach to self-supervised learning. In Proceedings of the NeurIPS, Virtual, 6–12 December 2020; pp. 21271–21284.
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the CVPR, Virtual, 14–19 June 2020; pp. 9729–9738.
- Caron, M.; Misra, I.; Mairal, J.; Goyal, P.; Bojanowski, P.; Joulin, A. Unsupervised learning of visual features by contrasting cluster assignments. In Proceedings of the NeurIPS, Virtual, 6–12 December 2020; pp. 9912–9924.
- Chen, X.; He, K. Exploring simple Siamese representation learning. In Proceedings of the CVPR, Virtual, 19–25 June 2021; pp. 15750–15758.
- Caron, M.; Touvron, H.; Misra, I.; Jégou, H.; Mairal, J.; Bojanowski, P.; Joulin, A. Emerging properties in self-supervised vision transformers. In Proceedings of the ICCV, Virtual, 11–17 October 2021; pp. 9650–9660.
- Jiang, Q.; Chen, C.; Zhao, H.; Chen, L.; Ping, Q.; Tran, S.D.; Xu, Y.; Zeng, B.; Chilimbi, T. Understanding and constructing latent modality structures in multi-modal representation learning. In Proceedings of the CVPR, Vancouver, BC, Canada, 18–22 June 2023; pp. 7661–7671.
- Lavoie, S.; Tsirigotis, C.; Schwarzer, M.; Vani, A.; Noukhovitch, M.; Kawaguchi, K.; Courville, A. Simplicial embeddings in self-supervised learning and downstream classification. In Proceedings of the ICLR, Kigali, Rwanda, 1–5 May 2023.
- Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the ICML, Sydney, NSW, Australia, 6–11 August 2017; pp. 214–223.
- Kusner, M.; Sun, Y.; Kolkin, N.; Weinberger, K. From word embeddings to document distances. In Proceedings of the ICML, Lille, France, 6–11 July 2015; pp. 957–966.
- Sato, R.; Yamada, M.; Kashima, H. Re-evaluating Word Mover’s Distance. In Proceedings of the ICML, Baltimore, MD, USA, 17–23 July 2022; pp. 19231–19249.
- Sarlin, P.E.; DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperGlue: Learning feature matching with graph neural networks. In Proceedings of the CVPR, Virtual, 14–19 June 2020; pp. 4938–4947.
- Xian, R.; Yin, L.; Zhao, H. Fair and Optimal Classification via Post-Processing. In Proceedings of the ICML, Honolulu, HI, USA, 23–29 July 2023; pp. 37977–38012.
- Zhao, H. Costs and Benefits of Fair Regression. TMLR 2022, 1–22.
- Indyk, P.; Thaper, N. Fast image retrieval via embeddings. In Proceedings of the 3rd International Workshop on Statistical and Computational Theories of Vision, Nice, France, 12 October 2003; Volume 2, p. 5.
- Le, T.; Yamada, M.; Fukumizu, K.; Cuturi, M. Tree-sliced variants of Wasserstein distances. In Proceedings of the NeurIPS, Vancouver, BC, Canada, 8–14 December 2019; pp. 12283–12294.
- Rabin, J.; Peyré, G.; Delon, J.; Bernot, M. Wasserstein Barycenter and Its Application to Texture Mixing. In Proceedings of the International Conference on Scale Space and Variational Methods in Computer Vision, Ein-Gedi, Israel, 29 May–2 June 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 435–446.
- Kolouri, S.; Zou, Y.; Rohde, G.K. Sliced Wasserstein kernels for probability distributions. In Proceedings of the CVPR, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 5258–5267.
- Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. ArcFace: Additive angular margin loss for deep face recognition. In Proceedings of the CVPR, Long Beach, CA, USA, 16–20 June 2019; pp. 4690–4699.
- Becker, S.; Hinton, G.E. Self-organizing neural network that discovers surfaces in random-dot stereograms. Nature 1992, 355, 161–163.
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the ICML, Vienna, Austria, 12–18 July 2020; pp. 1597–1607.
- Oord, A.v.d.; Li, Y.; Vinyals, O. Representation learning with contrastive predictive coding. arXiv 2018, arXiv:1807.03748.
- Zbontar, J.; Jing, L.; Misra, I.; LeCun, Y.; Deny, S. Barlow Twins: Self-supervised learning via redundancy reduction. In Proceedings of the ICML, Virtual, 18–24 July 2021; pp. 12310–12320.
- Gretton, A.; Bousquet, O.; Smola, A.; Schölkopf, B. Measuring statistical dependence with Hilbert-Schmidt norms. In Proceedings of the ALT, Singapore, 8–11 October 2005; pp. 63–77.
- Tsai, Y.H.H.; Bai, S.; Morency, L.P.; Salakhutdinov, R. A note on connecting Barlow Twins with negative-sample-free contrastive learning. arXiv 2021, arXiv:2104.13712.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 2012.
- Zhao, W.; Peyrard, M.; Liu, F.; Gao, Y.; Meyer, C.M.; Eger, S. MoverScore: Text generation evaluating with contextualized embeddings and earth mover distance. In Proceedings of the EMNLP-IJCNLP, Hong Kong, China, 3–7 November 2019; pp. 563–578.
- Yokoi, S.; Takahashi, R.; Akama, R.; Suzuki, J.; Inui, K. Word Rotator’s Distance. In Proceedings of the EMNLP, Virtual, 16–20 November 2020; pp. 2944–2960.
- Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. In Proceedings of the NIPS, Lake Tahoe, NV, USA, 5–10 December 2013; pp. 2292–2300.
- Kolouri, S.; Nadjahi, K.; Simsekli, U.; Badeau, R.; Rohde, G. Generalized sliced Wasserstein distances. In Proceedings of the NeurIPS, Vancouver, BC, Canada, 8–14 December 2019; pp. 261–272.
- Mueller, J.W.; Jaakkola, T. Principal differences analysis: Interpretable characterization of differences between distributions. In Proceedings of the NIPS, Montreal, QC, Canada, 7–12 December 2015; pp. 1702–1710.
- Deshpande, I.; Hu, Y.T.; Sun, R.; Pyrros, A.; Siddiqui, N.; Koyejo, S.; Zhao, Z.; Forsyth, D.; Schwing, A.G. Max-Sliced Wasserstein distance and its use for GANs. In Proceedings of the CVPR, Long Beach, CA, USA, 16–20 June 2019; pp. 10648–10656.
- Paty, F.P.; Cuturi, M. Subspace Robust Wasserstein Distances. In Proceedings of the ICML, Long Beach, CA, USA, 9–15 June 2019; pp. 5072–5081.
- Evans, S.N.; Matsen, F.A. The phylogenetic Kantorovich–Rubinstein metric for environmental sequence samples. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2012, 74, 569–592.
- Lozupone, C.; Knight, R. UniFrac: A new phylogenetic method for comparing microbial communities. Appl. Environ. Microbiol. 2005, 71, 8228–8235.
- Sato, R.; Yamada, M.; Kashima, H. Fast Unbalanced Optimal Transport on Tree. In Proceedings of the NeurIPS, Virtual, 6–12 December 2020.
- Le, T.; Nguyen, T. Entropy partial transport with tree metrics: Theory and practice. In Proceedings of the AISTATS, Virtual, 13–15 April 2021; pp. 3835–3843.
- Takezawa, Y.; Sato, R.; Yamada, M. Supervised tree-Wasserstein distance. In Proceedings of the ICML, Virtual, 18–24 July 2021; pp. 10086–10095.
- Takezawa, Y.; Sato, R.; Kozareva, Z.; Ravi, S.; Yamada, M. Fixed Support Tree-Sliced Wasserstein Barycenter. In Proceedings of the AISTATS, Virtual, 28–30 March 2022; pp. 1120–1137.
- Le, T.; Nguyen, T.; Fukumizu, K. Optimal transport for measures with noisy tree metric. In Proceedings of the AISTATS, Valencia, Spain, 2–4 May 2024; pp. 3115–3123.
- Chen, S.; Tabaghi, P.; Wang, Y. Learning ultrametric trees for optimal transport regression. In Proceedings of the AAAI, Vancouver, BC, Canada, 20–27 February 2024; pp. 20657–20665.
- Houry, G.; Bao, H.; Zhao, H.; Yamada, M. Fast 1-Wasserstein distance approximations using greedy strategies. In Proceedings of the AISTATS, Valencia, Spain, 2–4 May 2024; pp. 325–333.
- Tong, A.Y.; Huguet, G.; Natik, A.; MacDonald, K.; Kuchroo, M.; Coifman, R.; Wolf, G.; Krishnaswamy, S. Diffusion earth mover’s distance and distribution embeddings. In Proceedings of the ICML, Virtual, 18–24 July 2021; pp. 10336–10346.
- Le, T.; Nguyen, T.; Phung, D.; Nguyen, V.A. Sobolev transport: A scalable metric for probability measures with graph metrics. In Proceedings of the AISTATS, Virtual, 28–30 March 2022; pp. 9844–9868.
- Otao, S.; Yamada, M. A linear time approximation of Wasserstein distance with word embedding selection. In Proceedings of the EMNLP, Singapore, 6–10 December 2023; pp. 15121–15134.
- Laouar, C.; Takezawa, Y.; Yamada, M. Large-scale similarity search with Optimal Transport. In Proceedings of the EMNLP, Singapore, 6–10 December 2023; pp. 11920–11930.
- Zapatero, M.R.; Tong, A.; Opzoomer, J.W.; O’Sullivan, R.; Rodriguez, F.C.; Sufi, J.; Vlckova, P.; Nattress, C.; Qin, X.; Claus, J.; et al. Trellis tree-based analysis reveals stromal regulation of patient-derived organoid drug responses. Cell 2023, 186, 5606–5619.
- Backurs, A.; Dong, Y.; Indyk, P.; Razenshteyn, I.; Wagner, T. Scalable nearest neighbor search for optimal transport. In Proceedings of the ICML, Vienna, Austria, 12–18 July 2020; pp. 497–506.
- Dey, T.K.; Zhang, S. Approximating 1-Wasserstein Distance between Persistence Diagrams by Graph Sparsification. In Proceedings of the ALENEX, Alexandria, VA, USA, 9–10 January 2022; pp. 169–183.
- Yamada, M.; Takezawa, Y.; Sato, R.; Bao, H.; Kozareva, Z.; Ravi, S. Approximating 1-Wasserstein Distance with Trees. TMLR 2022, 1–9.
- Frogner, C.; Zhang, C.; Mobahi, H.; Araya, M.; Poggio, T.A. Learning with a Wasserstein loss. In Proceedings of the NIPS, Montreal, QC, Canada, 7–12 December 2015; pp. 2053–2061.
- Toyokuni, A.; Yokoi, S.; Kashima, H.; Yamada, M. Computationally Efficient Wasserstein Loss for Structured Labels. In Proceedings of the EACL: Student Research Workshop, Virtual, 19–23 April 2021; pp. 1–7.
- Raginsky, M.; Sason, I. Concentration of measure inequalities in information theory, communications, and coding. Found. Trends Commun. Inf. Theory 2013, 10, 1–246.
- Neumann, J.V. Zur Theorie der Gesellschaftsspiele. Math. Ann. 1928, 100, 295–320.
- Fan, K. Minimax theorems. Proc. Natl. Acad. Sci. USA 1953, 39, 42–47.
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the NIPS, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
- Ahmed, N.; Natarajan, T.; Rao, K.R. Discrete cosine transform. IEEE Trans. Comput. 1974, 100, 90–93.
Performance comparison for SimCLR (Section 5.1).

| Similarity | Probability Model | STL10 | CIFAR10 | CIFAR100 | SVHN |
|---|---|---|---|---|---|
| Cosine | N/A | 75.77 ± 0.47 | 67.39 ± 0.46 | 32.06 ± 0.06 | 76.35 ± 0.39 |
| | Softmax | 70.12 ± 0.04 | 63.20 ± 0.23 | 26.88 ± 0.26 | 74.46 ± 0.62 |
| | SEM | 71.33 ± 0.45 | 61.13 ± 0.56 | 26.08 ± 0.07 | 74.28 ± 1.13 |
| | AF (DCT) | 72.95 ± 0.31 | 65.92 ± 0.65 | 25.96 ± 0.13 | 76.51 ± 0.24 |
| TWD (TV) | Softmax | 65.54 ± 0.47 | 59.72 ± 0.39 | 26.07 ± 0.19 | 72.67 ± 0.33 |
| | SEM | 65.35 ± 0.31 | 56.56 ± 0.46 | 24.31 ± 0.43 | 73.36 ± 1.19 |
| | AF | 65.61 ± 0.56 | 60.92 ± 0.42 | 26.33 ± 0.42 | 75.01 ± 0.32 |
| | AF (PE) | 71.71 ± 0.17 | 64.68 ± 0.33 | 26.38 ± 0.37 | 76.44 ± 0.45 |
| | AF (DCT) | 73.28 ± 0.27 | 67.03 ± 0.24 | 25.85 ± 0.39 | 77.62 ± 0.40 |
| | Softmax + JD | 72.64 ± 0.27 | 67.08 ± 0.14 | 27.82 ± 0.22 | 77.69 ± 0.46 |
| | SEM + JD | 71.79 ± 0.92 | 63.60 ± 0.50 | 26.14 ± 0.40 | 75.64 ± 0.44 |
| | AF + JD | 72.64 ± 0.37 | 67.15 ± 0.27 | 27.45 ± 0.37 | 78.00 ± 0.15 |
| | AF (PE) + JD | 74.47 ± 0.10 | 67.28 ± 0.65 | 27.01 ± 0.39 | 78.12 ± 0.48 |
| | AF (DCT) + JD | 76.28 ± 0.07 | 68.60 ± 0.36 | 26.49 ± 0.24 | 79.70 ± 0.23 |
| TWD (ClusterTree) | Softmax | 69.15 ± 0.45 | 62.33 ± 0.40 | 24.47 ± 0.40 | 74.87 ± 0.13 |
| | SEM | 72.88 ± 0.12 | 63.82 ± 0.32 | 22.55 ± 0.28 | 77.47 ± 0.92 |
| | AF | 70.40 ± 0.40 | 63.28 ± 0.57 | 24.28 ± 0.15 | 75.24 ± 0.52 |
| | AF (PE) | 72.37 ± 0.28 | 65.08 ± 0.74 | 23.33 ± 0.35 | 76.67 ± 0.26 |
| | AF (DCT) | 71.95 ± 0.46 | 65.89 ± 0.11 | 21.87 ± 0.19 | 77.92 ± 0.24 |
| | Softmax + JD | 73.52 ± 0.16 | 66.76 ± 0.29 | 24.96 ± 0.07 | 77.65 ± 0.53 |
| | SEM + JD | 75.93 ± 0.14 | 67.68 ± 0.46 | 22.96 ± 0.28 | 79.19 ± 0.53 |
| | AF + JD | 73.66 ± 0.23 | 66.61 ± 0.32 | 24.55 ± 0.14 | 77.64 ± 0.19 |
| | AF (PE) + JD | 73.92 ± 0.57 | 67.00 ± 0.13 | 23.83 ± 0.42 | 77.87 ± 0.29 |
| | AF (DCT) + JD | 74.29 ± 0.30 | 67.50 ± 0.49 | 22.89 ± 0.12 | 78.31 ± 0.72 |
| Similarity | Probability Model | Linear Classifier |
|---|---|---|
| Cosine | N/A | 91.13 ± 0.14 |
| TWD (TV) | Softmax + JD | 9.99 ± 0.00 |
| | AF (DCT) + JD | 90.60 ± 0.02 |
Performance comparison for SimSiam (Section 5.2).

| Similarity | Probability Model | STL10 | CIFAR10 | CIFAR100 | SVHN |
|---|---|---|---|---|---|
| Cosine | N/A | 75.44 ± 0.21 | 66.96 ± 0.45 | 31.63 ± 0.25 | 74.71 ± 0.31 |
| | Softmax | 71.25 ± 0.30 | 63.80 ± 0.48 | 26.18 ± 0.36 | 73.06 ± 0.47 |
| | SEM | 71.34 ± 0.31 | 61.26 ± 0.42 | 25.40 ± 0.06 | 73.41 ± 0.95 |
| | AF (DCT) | 72.15 ± 0.53 | 65.52 ± 0.45 | 24.93 ± 0.24 | 75.68 ± 0.13 |
| TWD (TV) | Softmax | 63.42 ± 0.24 | 59.03 ± 0.58 | 24.95 ± 0.31 | 70.87 ± 0.29 |
| | SEM | 63.72 ± 0.17 | 55.57 ± 0.35 | 23.40 ± 0.36 | 71.69 ± 0.75 |
| | AF | 63.97 ± 0.05 | 59.96 ± 0.44 | 25.29 ± 0.17 | 73.44 ± 0.35 |
| | AF (PE) | 71.04 ± 0.37 | 64.28 ± 0.14 | 25.71 ± 0.20 | 75.70 ± 0.42 |
| | AF (DCT) | 72.75 ± 0.11 | 67.01 ± 0.03 | 24.95 ± 0.17 | 76.98 ± 0.44 |
| | Softmax + JD | 72.05 ± 0.30 | 66.61 ± 0.20 | 26.91 ± 0.19 | 76.65 ± 0.56 |
| | SEM + JD | 70.73 ± 0.89 | 62.75 ± 0.61 | 24.83 ± 0.27 | 74.71 ± 0.43 |
| | AF + JD | 71.74 ± 0.19 | 66.74 ± 0.20 | 26.68 ± 0.35 | 77.10 ± 0.04 |
| | AF (PE) + JD | 74.10 ± 0.20 | 66.82 ± 0.36 | 26.17 ± 0.00 | 77.55 ± 0.50 |
| | AF (DCT) + JD | 76.24 ± 0.22 | 68.62 ± 0.40 | 25.70 ± 0.14 | 79.28 ± 0.22 |
| TWD (ClusterTree) | Softmax | 67.95 ± 0.42 | 61.59 ± 0.29 | 23.34 ± 0.26 | 73.88 ± 0.05 |
| | SEM | 72.43 ± 0.11 | 63.63 ± 0.42 | 21.29 ± 0.28 | 77.04 ± 0.77 |
| | AF | 69.09 ± 0.05 | 62.49 ± 0.45 | 22.56 ± 0.25 | 74.31 ± 0.40 |
| | AF (PE) | 72.08 ± 0.07 | 64.56 ± 0.31 | 22.51 ± 0.29 | 75.98 ± 0.23 |
| | AF (DCT) | 71.64 ± 0.15 | 65.51 ± 0.36 | 21.04 ± 0.10 | 77.59 ± 0.25 |
| | Softmax + JD | 73.07 ± 0.13 | 66.38 ± 0.27 | 23.97 ± 0.11 | 76.82 ± 0.50 |
| | SEM + JD | 75.50 ± 0.15 | 67.44 ± 0.10 | 21.90 ± 0.19 | 78.91 ± 0.30 |
| | AF + JD | 72.70 ± 0.08 | 66.12 ± 0.26 | 23.50 ± 0.21 | 76.92 ± 0.06 |
| | AF (PE) + JD | 73.66 ± 0.47 | 66.58 ± 0.01 | 22.86 ± 0.02 | 77.44 ± 0.30 |
| | AF (DCT) + JD | 73.79 ± 0.12 | 67.34 ± 0.38 | 21.96 ± 0.34 | 78.00 ± 0.60 |
Effect of the number of nearest neighbors K (Section 5.3).

| Similarity | K | STL10 | CIFAR10 | CIFAR100 | SVHN |
|---|---|---|---|---|---|
| TWD (TV) | 10 | 76.24 ± 0.22 | 68.62 ± 0.40 | 25.70 ± 0.14 | 79.28 ± 0.22 |
| | 50 | 76.28 ± 0.07 | 68.60 ± 0.36 | 26.49 ± 0.24 | 79.70 ± 0.23 |
Effect of the regularization parameter for Jeffrey divergence (Section 5.4).

| Similarity Function | STL10 | CIFAR10 | CIFAR100 | SVHN |
|---|---|---|---|---|
| TWD (TV) | 73.28 ± 0.27 | 67.03 ± 0.24 | 25.85 ± 0.39 | 77.62 ± 0.40 |
| | 76.28 ± 0.07 | 68.60 ± 0.36 | 26.49 ± 0.24 | 79.70 ± 0.23 |
| | 77.40 ± 0.17 | 68.48 ± 0.11 | 25.59 ± 0.16 | 79.67 ± 0.26 |
| | 77.67 ± 0.06 | 68.26 ± 0.51 | 24.21 ± 0.35 | 79.91 ± 0.42 |