This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Score Images as a Modality: Enhancing Symbolic Music Understanding through Large-Scale Multimodal Pre-Training

1
School of Artificial Intelligence, Guangxi Colleges and Universities Key Laboratory of AI Algorithm Engineering, Guilin University of Electronic Technology, Guilin 541004, China
2
Engineering Comprehensive Training Center, Guilin University of Aerospace Technology, Guilin 541004, China
3
Cloud Computing & Big Data Center, Gongcheng Management Consulting Co., Ltd., Guangzhou 510630, China
*
Author to whom correspondence should be addressed.
Sensors 2024, 24(15), 5017; https://doi.org/10.3390/s24155017
Submission received: 17 June 2024 / Revised: 23 July 2024 / Accepted: 1 August 2024 / Published: 2 August 2024
(This article belongs to the Section Sensing and Imaging)

Abstract

Symbolic music understanding is a critical challenge in artificial intelligence. Traditional symbolic representations such as MIDI capture essential musical elements, but they often lack the nuanced expressive information found in music scores. Building on advances in multimodal pre-training, particularly visual-language pre-training, we propose the Score Images as a Modality (SIM) model, which integrates music score images with MIDI data for enhanced symbolic music understanding. We also introduce novel pre-training tasks, including masked bar-attribute modeling and score-MIDI matching, which enable the SIM model to capture musical structure and to align visual and symbolic representations effectively. Additionally, we present a curated dataset of matched score images and MIDI representations, optimized for training the SIM model. Experimental validation demonstrates the efficacy of our approach in advancing symbolic music understanding.
Keywords: music understanding; transformer; score images; large-scale pre-training

Share and Cite

MDPI and ACS Style

Qin, Y.; Xie, H.; Ding, S.; Li, Y.; Tan, B.; Ye, M. Score Images as a Modality: Enhancing Symbolic Music Understanding through Large-Scale Multimodal Pre-Training. Sensors 2024, 24, 5017. https://doi.org/10.3390/s24155017

AMA Style

Qin Y, Xie H, Ding S, Li Y, Tan B, Ye M. Score Images as a Modality: Enhancing Symbolic Music Understanding through Large-Scale Multimodal Pre-Training. Sensors. 2024; 24(15):5017. https://doi.org/10.3390/s24155017

Chicago/Turabian Style

Qin, Yang, Huiming Xie, Shuxue Ding, Yujie Li, Benying Tan, and Mingchuan Ye. 2024. "Score Images as a Modality: Enhancing Symbolic Music Understanding through Large-Scale Multimodal Pre-Training" Sensors 24, no. 15: 5017. https://doi.org/10.3390/s24155017

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.