**6. Conclusions**

We proposed a framework for multimodal joint representation learning of pins on CCSNs. The learned representation captures user interests, which makes it useful for recommender systems and user modeling. We modeled boards and users with the FV and proposed a series of recommendation methods for different recommendation tasks, including a novel board thumbnail recommendation task that we defined and built on our pin recommendation. The experimental results show that the obtained representations interpret pin-level interests better than unimodal representations with lower dimensions, and that the recommendation methods based on our multimodal representation are effective for recommending pins, board thumbnails, board categories, and boards.

**Author Contributions:** Conceptualization, H.L. and S.D.; methodology, H.L. and S.D.; software, B.Y. and D.Z.; validation, B.Y. and D.Z.; formal analysis, H.L. and S.D.; investigation, B.Y. and D.Z.; resources, B.Y.; data curation, H.L. and D.Z.; writing—original draft preparation, H.L., S.D. and B.Y.; writing—review and editing, L.W. and M.J.; visualization, H.L.; supervision, L.W.; project administration, M.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.
