An Optimal-Transport-Based Multimodal Big Data Clustering
Abstract
1. Introduction
- A method based on a weak-notion distance metric is designed to measure differences between the manifold structures of data collected from diverse devices, which ensures the full fusion of complementary information from data with heterogeneous distributions.
- Multimodal clustering is modeled as an optimal transport (OT) problem in the proposed OT-based multimodal clustering method (OTMC), which captures fused information with clear discriminative structures from heterogeneous modalities for mining intrinsic patterns.
- A variational solution is derived to solve OT based on a generative transport plan, which can precisely match the transport map for transporting the multimodal data to clustering prototypes in OTMC.
- Extensive experiments on four real-world benchmark datasets verify that OTMC outperforms other multimodal clustering methods, and that it does so without relying on the manifolds of heterogeneous modalities intersecting. In particular, OTMC obtains 92.15% ACC, 84.96% NMI, and 83.35% ARI on Handwritten, improvements of 2.25, 2.82, and 3.28 percentage points, respectively, over the best competing method.
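
To make the idea of transporting data to clustering prototypes concrete, the following is a minimal sketch that uses entropic-regularized OT (Sinkhorn iterations) on a toy dataset. It is not the variational generative solution derived in this paper; the function name `sinkhorn_assign`, the parameters `eps` and `iters`, the uniform masses, and the toy points are all assumptions made for illustration.

```python
import math

def sinkhorn_assign(points, prototypes, eps=0.1, iters=200):
    """Entropic OT between uniform mass on points and uniform mass on prototypes.

    Illustrative only: Sinkhorn is a stand-in solver, not the paper's method.
    """
    n, k = len(points), len(prototypes)
    # Squared Euclidean cost between every point and every prototype.
    cost = [[sum((a - b) ** 2 for a, b in zip(x, c)) for c in prototypes]
            for x in points]
    kernel = [[math.exp(-c_ij / eps) for c_ij in row] for row in cost]
    a = [1.0 / n] * n  # mass carried by each data point (assumed uniform)
    b = [1.0 / k] * k  # mass received by each prototype (assumed uniform)
    u, v = [1.0] * n, [1.0] * k
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(k)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(k)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]
    # Row-normalizing the plan gives a soft cluster assignment per point.
    soft = [[p / sum(row) for p in row] for row in plan]
    return plan, soft

# Toy data: two points near (0, 0) and two near (1, 1).
points = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9)]
prototypes = [(0.0, 0.0), (1.0, 1.0)]
plan, soft = sinkhorn_assign(points, prototypes)
hard = [max(range(len(row)), key=row.__getitem__) for row in soft]
print(hard)  # [0, 0, 1, 1]
```

Because the cost is defined between any pair of points, the plan is well defined even when the data and the prototypes have non-overlapping supports, which is the property the first contribution relies on.
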
2. Related Work
3. OT for Multimodal Clustering
3.1. Modeling Multimodal Clustering Based on OT
3.2. Decomposition of OTMC
3.3. The Consistent Constraint
4. The Variational Generative Solution Network Implementation
4.1. Variational Generative Solution
4.2. The Variational Generative Solution to OTMC
4.3. Multimodal OT Clustering Network
4.4. The Overall Loss
Algorithm 1. Multimodal OT clustering network.

Input: Multimodal dataset and convergence criterion thr.

Output: Soft-clustering-assignment matrix and the parameters of the clustering network, the generative network, and the categorical distribution.

Initialize: modality-specific categorical distributions, modality-common categorical distribution, parameters of modality-specific clustering networks, parameters of modality-common clustering network, parameters of modality-specific generative networks, and parameters of modality-common generative network.

while the convergence criterion thr is not reached by Equation (19) do

- for each modality do
  - Sample data in that modality.
  - Generate the soft-assignment matrix of that modality according to Equation (13).
  - Update the modality-specific parameters by minimizing Equation (14).
  - Sample modality-specific clustering prototypes.
  - Generate the reconstructed data of that modality according to Equation (16).
  - Update the modality-specific parameters by minimizing Equation (17).
- end
- Sample multimodal data.
- Generate the modality-common soft-assignment matrix.
- Update the modality-common clustering network by minimizing Equation (15).
- Sample modality-common clustering prototypes.
- Generate multimodal reconstructed data.
- Update the modality-common generative network by minimizing Equation (18).
- Update all network parameters by minimizing Equation (19) according to Equation (20).

end
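
The update order in Algorithm 1 can be sketched as a plain control-flow skeleton. This is only a schematic under stated assumptions: the step functions are hypothetical stubs supplied by the caller rather than the networks of this paper, and the stopping rule (change in the Equation (19) loss falling below `thr`) is one possible reading of the convergence criterion.

```python
def train_otmc(modalities, steps, thr=1e-4, max_iters=100):
    """Run the update schedule of Algorithm 1 with caller-supplied steps.

    `steps` is a hypothetical dict of callables. Only `joint_update` must
    return a value: the overall loss of Equation (19).
    """
    prev = float("inf")
    for it in range(max_iters):
        for m, batch in enumerate(modalities):
            steps["specific_clustering"](m, batch)  # Equations (13)-(14)
            steps["specific_generative"](m, batch)  # Equations (16)-(17)
        steps["common_clustering"](modalities)      # Equation (15)
        steps["common_generative"](modalities)      # Equation (18)
        loss = steps["joint_update"](modalities)    # Equations (19)-(20)
        # Assumed stopping rule: loss change smaller than thr.
        if abs(prev - loss) < thr:
            return it + 1, loss
        prev = loss
    return max_iters, prev

# Stub steps that only record the call order; the loss halves on each pass.
log = []
state = {"loss": 1.0}

def joint_update(_modalities):
    state["loss"] *= 0.5
    log.append("joint")
    return state["loss"]

steps = {
    "specific_clustering": lambda m, batch: log.append(f"cluster[{m}]"),
    "specific_generative": lambda m, batch: log.append(f"generate[{m}]"),
    "common_clustering": lambda mods: log.append("common_cluster"),
    "common_generative": lambda mods: log.append("common_generate"),
    "joint_update": joint_update,
}
iterations, final_loss = train_otmc([["x1"], ["x2"]], steps, thr=0.1)
print(iterations, final_loss)  # 4 0.0625
```

In a real implementation each stub would be replaced by a gradient step on the corresponding loss; the skeleton only fixes the order in which the modality-specific, modality-common, and joint updates are applied.
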
5. Experiments
5.1. Experimental Setup
5.2. Clustering Performance Evaluation
5.3. Further Evaluation
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Kiaei, I.; Lotfifard, S. A two-stage fault location identification method in multiarea power grids using heterogeneous types of data. IEEE Trans. Ind. Inform. 2019, 15, 4010–4020.
- Li, Y.; Yang, M.; Zhang, Z. A survey of multi-view representation learning. IEEE Trans. Knowl. Data Eng. 2019, 31, 1863–1883.
- Fu, L.; Lin, P.; Vasilakos, A.V.; Wang, S. An overview of recent multi-view clustering. Neurocomputing 2020, 402, 148–161.
- Gao, J.; Liu, M.; Li, P.; Laghari, A.A.; Javed, A.R.; Victor, N.; Gadekallu, T.R. Deep incomplete multi-view clustering via information bottleneck for pattern mining of data in extreme-environment IoT. IEEE Internet Things J. 2023, 11, 26700–26712.
- Gao, J.; Liu, M.; Li, P.; Zhang, J.; Chen, Z. Deep multiview adaptive clustering with semantic invariance. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 12965–12978.
- Wang, H.; Yang, Y.; Liu, B. GMC: Graph-based multi-view clustering. IEEE Trans. Knowl. Data Eng. 2020, 32, 1116–1129.
- Caron, M.; Bojanowski, P.; Joulin, A.; Douze, M. Deep clustering for unsupervised learning of visual features. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 139–156.
- Gao, J.; Li, P.; Laghari, A.A.; Srivastava, G.; Gadekallu, T.R.; Abbas, S.; Zhang, J. Incomplete multiview clustering via semidiscrete optimal transport for multimedia data mining in IoT. ACM Trans. Multim. Comput. Commun. Appl. 2024, 20, 1–20.
- Zhang, C.; Liu, Y.; Fu, H. AE2-Nets: Autoencoder in autoencoder networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 2577–2585.
- Wan, Z.; Zhang, C.; Zhu, P.; Hu, Q. Multi-view information-bottleneck representation learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 10085–10092.
- Yang, Y.; Wang, H. Multi-view clustering: A survey. Big Data Min. Anal. 2018, 1, 83–107.
- Chao, G.; Sun, S.; Bi, J. A survey on multi-view clustering. arXiv preprint 2017, arXiv:1712.06246.
- Li, P.; Laghari, A.A.; Rashid, M.; Gao, J.; Gadekallu, T.R.; Javed, A.R.; Yin, S. A deep multimodal adversarial cycle-consistent network for smart enterprise system. IEEE Trans. Ind. Inform. 2023, 19, 693–702.
- Xiao, Q.; Dai, J.; Luo, J.; Fujita, H. Multi-view manifold regularized learning-based method for prioritizing candidate disease miRNAs. Knowl. Based Syst. 2019, 175, 118–129.
- Neema, I.; Ardecani, F.B.; Shoghli, O. Cluster-based deterioration prediction of composite pavements with incorporation of flooding. In Proceedings of the 39th International Symposium on Automation and Robotics in Construction, Bogota, Colombia, 13–15 July 2022; pp. 99–106.
- Mirghaderi, H.; Hassanizadeh, B. k-most suitable locations problem: Greedy search approach. Int. J. Ind. Syst. Eng. 2022, 42, 80–95.
- Rahiminasab, A.; Tirandazi, P.; Ebadi, M.J.; Ahmadian, A.; Salimi, M. An energy-aware method for selecting cluster heads in wireless sensor networks. Appl. Sci. 2020, 10, 7886.
- Andrew, G.; Arora, R.; Bilmes, J.A.; Livescu, K. Deep canonical correlation analysis. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1247–1255.
- Wang, W.; Arora, R.; Livescu, K.; Bilmes, J.A. On deep multi-view representation learning. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1083–1092.
- Zhao, H.; Ding, Z.; Fu, Y. Multi-view clustering via deep matrix factorization. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 2921–2927.
- Bodaghi, M.; Hosseini, M.; Gottumukkala, R. A multimodal intermediate fusion network with manifold learning for stress detection. arXiv preprint 2024, arXiv:2403.08077.
- Ma, Z.; Yu, J.; Wang, L.; Chen, H.; Zhao, Y.; He, X.; Wang, Y.; Song, Y. Multi-view clustering based on view-attention driven. Int. J. Mach. Learn. Cybern. 2023, 14, 2621–2631.
- Liu, S.; Zhu, C.; Li, Z.; Yang, Z.; Gu, W. View-driven multi-view clustering via contrastive double-learning. Entropy 2024, 26, 470.
- Dornaika, F.; Hajjar, S.E.; Charafeddine, J.; Barrena, N. Unified multi-view data clustering: Simultaneous learning of consensus coefficient matrix and similarity graph. Cogn. Comput. 2025, 17, 38.
- Yin, M.; Huang, W.; Gao, J. Shared generative latent representation learning for multi-view clustering. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 6688–6695.
- Trosten, D.J.; Lokse, S.; Jenssen, R.; Kampffmeyer, M. Reconsidering representation alignment for multi-view clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 1255–1265.
- Xu, J.; Ren, Y.; Tang, H.; Yang, Z.; Pan, L.; Yang, Y.; Pu, X.; Yu, P.S.; He, L. Self-supervised discriminative feature learning for multi-view clustering. IEEE Trans. Knowl. Data Eng. 2023, 35, 7470–7482.
- Gao, Q.; Lian, H.; Wang, Q.; Sun, G. Cross-modal subspace clustering via deep canonical correlation analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 3938–3945.
- Mao, Y.; Yan, X.; Guo, Q.; Ye, Y. Deep mutual information maximin for cross-modal clustering. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 8893–8901.
- Yang, H.; Deng, Z.; Zhang, W.; Wu, Q.; Choi, K.; Wang, S. End-to-end multiview fuzzy clustering with double representation learning and visible-hidden view cooperation. IEEE Trans. Fuzzy Syst. 2024, 32, 483–497.
- Multiple Features – UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu/ml/datasets/Multiple+Features (accessed on 28 January 2025).
- The Database of Faces. Available online: https://cam-orl.co.uk/facedatabase.html (accessed on 28 January 2025).
- UC Merced Land Use Dataset. Available online: http://weegee.vision.ucmerced.edu/datasets/landuse.html (accessed on 28 January 2025).
- 15-Scene Image Dataset. Available online: https://figshare.com/articles/dataset/15-Scene_Image_Dataset/7007177 (accessed on 28 January 2025).
- Fränti, P.; Sieranoja, S. Clustering accuracy. Appl. Comput. Intell. 2024, 4, 24–44.
- Kvalseth, T.O. Entropy and correlation: Some comments. IEEE Trans. Syst. Man Cybern. 1987, 17, 517–519.
Description | Notation |
---|---|
Multimodal data/reconstructed data space. | |
Data/reconstructed data space of a single modality. | |
Number of modalities. | |
Number of clusters. | |
Multimodal data measure. | |
Modality-specific data measure. | |
Clustering prototype set. | |
A single clustering prototype. | |
Modality-specific clustering prototype set. | |
A single clustering prototype of a single modality. | |
Clustering prototype measure over all modalities. | |
Modality-specific clustering prototype measure. | |
Clustering transport map from data to prototypes. | |
Modality-specific transport map. | |
Common transport map over all modalities. | |
Multimodal data partition scheme. | |
Set of data split into a single cluster. | |
Coupled measure of two measures. | |
Generative transport plan from prototypes to data. | |
Modality-specific generative transport plan. | |
Categorical prior distribution. | |
Modality-specific categorical prior distribution. | |
Parameterized categorical distribution. | |
Parameterized clustering network. | |
Modality-specific soft-clustering-assignment matrix. | |
Parameterized generative network. | |
OT-based topological reconstruction loss. | |
Dataset | Number of samples | Modalities | Classes |
---|---|---|---|
Handwritten | 2000 | 2 (pixel/profile correlation) | 10 (handwritten numbers 0–9) |
ORL | 400 | 2 (intensity/Gabor) | 10 (face image shooting conditions) |
LandUse | 2100 | 2 (LBP/PHOG) | 21 (satellite image scene categories) |
Scene | 4485 | 2 (PHOG/GIST) | 15 (natural scene categories) |
Method | Handwritten ACC | Handwritten NMI | Handwritten ARI | ORL ACC | ORL NMI | ORL ARI | LandUse ACC | LandUse NMI | LandUse ARI | Scene ACC | Scene NMI | Scene ARI |
---|---|---|---|---|---|---|---|---|---|---|---|---|
FeatConcate | 0.6104 | 0.6070 | 0.5532 | 0.5710 | 0.7528 | 0.4770 | 0.1232 | 0.1608 | 0.0365 | 0.3076 | 0.3540 | 0.1862 |
DCCA [18] | 0.6626 | 0.6601 | 0.6136 | 0.5968 | 0.7784 | 0.5020 | 0.1551 | 0.2315 | 0.0443 | 0.3618 | 0.3892 | 0.2087 |
DCCAE [19] | 0.6917 | 0.6696 | 0.6327 | 0.5940 | 0.7752 | 0.4993 | 0.1562 | 0.2441 | 0.0442 | 0.3644 | 0.3978 | 0.2147 |
DMF [20] | 0.6962 | 0.7160 | 0.5993 | 0.6733 | 0.8164 | 0.5407 | 0.1450 | 0.1543 | 0.0360 | 0.3393 | 0.3576 | 0.1862 |
AE2-Nets [9] | 0.8152 | 0.7139 | 0.6667 | 0.6885 | 0.8573 | 0.5637 | 0.2479 | 0.3036 | 0.1035 | 0.3610 | 0.4039 | 0.2208 |
MVaDE [25] | 0.8875 | 0.8076 | 0.7765 | 0.6950 | 0.8356 | 0.5643 | 0.2248 | 0.2848 | 0.0936 | 0.3782 | 0.3992 | 0.2178 |
CoMVC [26] | 0.8205 | 0.8142 | 0.7559 | 0.7063 | 0.8652 | 0.5353 | 0.2624 | 0.3083 | 0.1085 | 0.3859 | 0.4117 | 0.2231 |
SiMVC [26] | 0.8295 | 0.7608 | 0.6985 | 0.6921 | 0.8560 | 0.5256 | 0.2448 | 0.2581 | 0.0957 | 0.3775 | 0.3935 | 0.2259 |
SDMVC [27] | 0.8990 | 0.8214 | 0.8007 | 0.7104 | 0.8557 | 0.5942 | 0.2681 | 0.2986 | 0.1201 | 0.3857 | 0.4132 | 0.2259 |
CMIB [10] | 0.8972 | 0.8178 | 0.7988 | 0.7207 | 0.8823 | 0.6004 | 0.2716 | 0.3016 | 0.1242 | 0.3954 | 0.4177 | 0.2383 |
OTMC | 0.9215 | 0.8496 | 0.8335 | 0.7650 | 0.8837 | 0.6496 | 0.2814 | 0.3254 | 0.1312 | 0.4181 | 0.4373 | 0.2621 |
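
The improvements quoted for Handwritten can be checked directly from the table above. The short script below is a sketch of that arithmetic; the scores are copied from the Handwritten columns, while the helper name `margins_over_best` is an assumption introduced here.

```python
# (ACC, NMI, ARI) on Handwritten, copied from the results table above.
baselines = {
    "FeatConcate": (0.6104, 0.6070, 0.5532),
    "DCCA": (0.6626, 0.6601, 0.6136),
    "DCCAE": (0.6917, 0.6696, 0.6327),
    "DMF": (0.6962, 0.7160, 0.5993),
    "AE2-Nets": (0.8152, 0.7139, 0.6667),
    "MVaDE": (0.8875, 0.8076, 0.7765),
    "CoMVC": (0.8205, 0.8142, 0.7559),
    "SiMVC": (0.8295, 0.7608, 0.6985),
    "SDMVC": (0.8990, 0.8214, 0.8007),
    "CMIB": (0.8972, 0.8178, 0.7988),
}
otmc = (0.9215, 0.8496, 0.8335)

def margins_over_best(baseline_scores, ours):
    """Return, per metric, the strongest baseline and our gain in percentage points."""
    out = {}
    for idx, metric in enumerate(("ACC", "NMI", "ARI")):
        best = max(baseline_scores, key=lambda name: baseline_scores[name][idx])
        gain = round((ours[idx] - baseline_scores[best][idx]) * 100, 2)
        out[metric] = (best, gain)
    return out

margins = margins_over_best(baselines, otmc)
print(margins)
# {'ACC': ('SDMVC', 2.25), 'NMI': ('SDMVC', 2.82), 'ARI': ('SDMVC', 3.28)}
```

The gains are differences in percentage points relative to the strongest baseline on each metric, which on Handwritten is SDMVC in all three cases.
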
Methods | ACC | NMI | ARI |
---|---|---|---|
modality-specific OT 1 | 0.7215 | 0.7276 | 0.6212 |
modality-specific OT 2 | 0.7255 | 0.7092 | 0.6141 |
modality-common | 0.9090 | 0.8306 | 0.8098 |
OTMC | 0.9215 | 0.8496 | 0.8335 |
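
Because cluster indices are arbitrary, an ACC score is only meaningful after cluster indices have been matched to class labels. The sketch below assumes the common definition of clustering accuracy (the best one-to-one mapping from clusters to classes) and finds it by brute force, which is only practical for a handful of clusters; the function name `clustering_accuracy` and the toy labels are illustrative and do not come from this paper's evaluation code.

```python
from itertools import permutations

def clustering_accuracy(true_labels, cluster_ids):
    """Fraction of samples labelled correctly under the best cluster-to-class mapping.

    Brute force over injective mappings; assumes no more clusters than classes.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    best_hits = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for c, t in zip(cluster_ids, true_labels))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)

# Toy example: cluster ids are a relabelling of the classes with one mistake.
truth = [0, 0, 1, 1, 2, 2]
predicted = [2, 2, 0, 0, 1, 0]
acc = clustering_accuracy(truth, predicted)
print(round(acc, 4))  # 0.8333
```

For the cluster counts in the benchmark datasets (10 to 21), enumerating permutations is infeasible, and an assignment solver such as the Hungarian algorithm would be the usual replacement for the brute-force loop.
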
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yang, Z.; Shi, C.; Guan, Y. An Optimal-Transport-Based Multimodal Big Data Clustering. Electronics 2025, 14, 666. https://doi.org/10.3390/electronics14040666