Article

Digital Technology in Cultural Heritage: Construction and Evaluation Methods of AI-Based Ethnic Music Dataset

1 Academy of Music, Beihua University, Jilin City 132013, China
2 Department of Music Education, Kyungnam University, Changwon-si 51767, Republic of Korea
3 College of Computer Science and Technology, Beihua University, Jilin City 132013, China
4 Department of IT Convergence Engineering, Kyungnam University, Changwon-si 51767, Republic of Korea
5 Department of Computer Engineering, Kyungnam University, Changwon-si 51767, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(23), 10811; https://doi.org/10.3390/app142310811
Submission received: 1 November 2024 / Revised: 18 November 2024 / Accepted: 18 November 2024 / Published: 22 November 2024
(This article belongs to the Special Issue Application of Digital Technology in Cultural Heritage)

Abstract: This study focuses on the construction and evaluation of a high-quality Chinese Manchu music dataset designed to facilitate Artificial Intelligence (AI) research and applications within cultural heritage and ethnomusicology. Through a systematic collection and organization of diverse Manchu music resources, including folk songs, dance music, and ceremonial pieces, this dataset effectively represents the cultural breadth of Manchu music. The dataset includes digitized and preprocessed audio data, with comprehensive metadata annotations, such as essential information, musical features, and cultural context, creating a robust foundation for AI-based analysis. Experimental evaluations highlight the dataset’s utility across various AI-driven applications: in music classification, using a CNN model, an accuracy of 90% was achieved in the “folk ensemble” category, with an overall accuracy of 85.7% and a precision of 82.3%. For music generation, a Generative Adversarial Network (GAN) model yielded a quality score of 7.8/10 and a Fréchet Audio Distance (FAD) of 0.32. In emotion recognition, the Random Forest model achieved 87% accuracy in identifying the emotion “joy”. These results underscore the dataset’s potential in supporting digital preservation and expanding AI applications in ethnic music classification, generation, and emotional analysis, contributing to both cultural heritage preservation and AI advancement in ethnomusicology.

1. Introduction

1.1. Background and Motivation for Preserving Cultural Heritage with AI

The preservation of intangible cultural heritage, such as traditional music, is a pressing issue in the digital age. Music, as a culturally rich yet vulnerable medium, presents unique challenges for systematic cataloging and preservation. Recent advancements in Artificial Intelligence (AI) offer innovative solutions to these challenges, providing tools for efficient documentation, analysis, and even generation of traditional music forms.
This study focuses on the development of an AI-based framework for preserving Manchu music, a vital component of China’s ethnic musical heritage. By leveraging AI, we aim to systematically catalog and analyze traditional music recordings, enabling both preservation and further cultural development.
In recent decades, the preservation of intangible cultural heritage has become an essential focus for communities and institutions globally, as cultural identity forms a vital part of human history and societal development [1,2,3]. As part of this heritage, music embodies the values, stories, and artistic expressions unique to each culture. It serves as a powerful vehicle for passing down traditions, social customs, and communal beliefs across generations. This role is particularly pronounced among ethnic minority groups, where music often holds ceremonial, social, and educational functions that are integral to community cohesion and identity [4,5,6].
For the Manchu people—one of China’s ethnic minorities with a rich and complex cultural legacy—traditional music represents an invaluable aspect of their heritage [7]. Historically, Manchu music was central to rituals, community gatherings, and even royal ceremonies, especially during the Qing dynasty (1644–1912) when the Manchu people held political power. The roots of Manchu culture can be traced back to the early Jurchen tribes of the 10th century, who later unified under the leadership of Nurhaci in the early 17th century to establish the Later Jin Dynasty, which eventually became the Qing Dynasty. However, with the spread of modern cultural influences and shifts in social dynamics, traditional Manchu music faces a significant risk of decline [8,9]. As younger generations become increasingly distanced from their cultural roots, the practice and transmission of this music are diminishing, making preservation efforts both urgent and complex. The limited availability of resources, recorded materials, and professional documentation further complicates efforts to preserve this unique musical tradition. The diminishing influence of Manchu music is attributed to urbanization, the dominance of mainstream Han culture, and a general shift towards more contemporary musical forms [10,11,12].
To provide additional context, Figure 1 [13] below presents a geographical map of Northeast China, illustrating the traditional settlement areas of the Manchu people, including key regions such as Jilin, Liaoning, and Heilongjiang provinces, which historically formed the heartland of Manchu culture. The Manchus, who number about 4.2 million in China today, offer a cautionary example of the need for China’s ethnic minorities to protect their cultures and languages, as only about 50 people still speak the native Manchu language.
The advancement of AI technologies provides a promising solution to these challenges by creating systematic, scalable methods for documenting, analyzing, and preserving ethnic music [14,15,16]. AI applications in ethnomusicology allow for the detailed capture of audio data, the automatic extraction of musical features, and the generation of new musical compositions that reflect traditional styles. For culturally rich but endangered music forms, AI enables large-scale digitization and cataloging of musical materials, making preservation efforts more sustainable and accessible. Additionally, AI can support ethnomusicologists and cultural institutions in analyzing and interpreting complex musical characteristics, such as rhythm patterns, tonal structures, and thematic motifs, which are often unique to specific ethnic music traditions.
Several online platforms serve as hubs for such efforts, providing resources for the digitization and analysis of ethnic music. Examples include the following:
Smithsonian Folkways Recordings (https://folkways.si.edu/, accessed on 1 November 2024): A comprehensive collection of world music, with a focus on preserving traditional and indigenous music.
Europeana Music (https://www.europeana.eu/en/collections/topic/44-music, accessed on 1 November 2024): A digital repository that offers access to diverse musical heritage across Europe, including traditional and folk music.
Internet Archive—Ethnomusicology (https://archive.org/details/ethnomusicology, accessed on 1 November 2024): A large digital library featuring a collection of audio materials, including field recordings of traditional music from various cultures.
The Global Jukebox (https://theglobaljukebox.org/, accessed on 1 November 2024): An interactive platform that allows users to explore and compare the traditional music and dance of cultures worldwide.
These platforms not only facilitate the preservation and dissemination of ethnic music but also provide valuable data for AI-driven research in ethnomusicology.
Creating datasets specifically tailored to ethnic music like that of the Manchu allows AI models to recognize, classify, and replicate musical patterns that are culturally significant, yet often underrepresented in mainstream datasets [17]. These datasets provide a foundation for both academic research and public engagement, helping ensure that such musical traditions are not only preserved in digital archives but also made accessible for study, reinterpretation, and transmission to future generations. Through AI-based dataset construction, the essence of Manchu music—its distinctive melodies, rhythms, and cultural contexts—can be accurately represented and studied, fostering deeper cultural appreciation and promoting global awareness of the Manchu musical heritage.
In this way, the intersection of AI and cultural preservation opens up new possibilities for protecting intangible heritage. It allows researchers, educators, and cultural advocates to leverage advanced technology to support traditional knowledge systems, ensuring that invaluable cultural expressions like Manchu music are documented, celebrated, and made accessible for generations to come.

1.2. Significance of Manchu Music in Chinese Culture

Manchu music holds a unique place within the rich tapestry of Chinese ethnic music, embodying the cultural identity, history, and artistic heritage of the Manchu people. As one of China’s most influential ethnic minorities, the Manchu community has contributed a distinctive musical tradition that reflects their way of life, spiritual beliefs, and social customs. Manchu music is more than an art form; it is a historical archive that offers insight into the worldview and social structure of the Manchu people, preserved and passed down through generations.
What sets Manchu music apart is its intricate blend of musical and cultural elements that are deeply intertwined with the daily lives and rituals of the Manchu people. One of the most notable features is the use of unique instruments, such as the bili (a type of reed pipe) and sheng (a mouth-blown free reed instrument), which produce distinctive, resonant tones not commonly found in other Chinese musical traditions. These instruments often accompany complex rhythmic patterns and layered melodies, reflecting the dynamic nature of Manchu life, from ceremonial grandeur to communal festivities.
Another hallmark of Manchu music is its vocal techniques and lyrical content. Songs often incorporate vibrant call-and-response structures, creating an interactive and communal performance experience. The lyrics frequently draw from oral histories and folklore, offering a poetic narrative of the Manchu people’s connection to their environment, ancestral legends, and spiritual beliefs. For example, many songs celebrate the changing seasons, paying homage to the natural cycles that governed their traditional agricultural lifestyle.
Furthermore, Manchu music plays a crucial role in shamanistic ceremonies, where it serves as a bridge between the physical and spiritual realms. Specific songs and instrumental pieces are performed to invoke ancestral spirits, guiding rituals that are central to the community’s spiritual practices. The deeply symbolic association of music with life events—births, marriages, and funerals—emphasizes its integral role in marking the passage of time and reinforcing cultural identity.
Together, these elements highlight the distinctiveness of Manchu music, not just as a form of entertainment but as a profound expression of the Manchu people’s spirituality, history, and social cohesion.
The origins of Manchu music can be traced back over a millennium, with roots in both nomadic and agrarian lifestyles, which are reflected in its melodies, rhythms, and instrumentation. As early as the 10th century, the Jurchen tribes—ancestors of the Manchu—developed musical traditions that accompanied their daily activities, rituals, and celebrations. These early forms evolved as the Jurchens transitioned from nomadic herders to settled agriculturalists, influencing the musical styles and instruments they used. Over time, Manchu music developed a wide variety of forms, each suited to different cultural functions and social occasions [18,19,20,21,22].
Folk songs, for instance, capture the daily lives and personal emotions of the Manchu people, with themes that range from love and work to celebration and community bonding. Dance music, with its distinctive rhythmic patterns and vibrant energy, often accompanies Manchu folk dances during community events and festivals. Ceremonial music, on the other hand, is deeply intertwined with the religious and ritual practices of the Manchu, particularly shamanistic ceremonies that date back centuries, aiming to honor ancestors and connect with the spiritual world. Each of these forms contributes to a complex musical identity that is both varied and cohesive, expressing the diverse yet unified character of Manchu culture.
The Qing dynasty (1644–1912) marked a period of significant development and influence for Manchu music, as the Manchu people rose to power and established themselves as the ruling class in China. During this time, Manchu music became an integral part of court life, blending with Han music traditions to create a refined and ceremonial style that was distinct yet accessible to the broader Chinese society. The Qing court adopted many Manchu musical pieces for state ceremonies, and instruments like the Manchu drum and flute became central to imperial performances. This integration of Manchu music into the royal courts and public ceremonies not only elevated its status but also facilitated cultural exchange between the Manchu and other ethnic groups in China, particularly the Han. The resulting fusion of musical elements helped to shape a unique artistic legacy that continues to influence Chinese music and culture today.
In the modern era, traditional Manchu music serves as a living record of the Manchu people’s history and cultural evolution. Its preservation is vital not only for maintaining the musical practices of an ethnic group but also for understanding the broader narrative of Chinese cultural development. As an artistic expression that captures the Manchu community’s beliefs, lifestyles, and interactions with the natural and spiritual worlds, Manchu music is invaluable for its historical, cultural, and anthropological significance. Through its melodies, rhythms, and lyrics, it allows contemporary listeners to connect with the heritage of the Manchu people, gaining a deeper appreciation for their contributions to China’s multicultural identity.
As efforts to preserve and document Manchu music continue, this musical tradition remains a crucial medium through which the Manchu people’s distinct identity and historical legacy are recognized and celebrated. In this context, creating a comprehensive and accessible dataset of Manchu music can provide invaluable support for both cultural preservation and academic research, allowing future generations to explore, understand, and appreciate the Manchu musical heritage within the broader framework of Chinese culture.

1.3. Need for Dedicated AI Datasets in Ethnic Music Preservation

Despite the cultural richness and historical significance of Manchu music, this unique musical tradition is notably underrepresented in today’s digital archives and databases. Existing music datasets, which are predominantly designed around Western classical or popular genres, fall short in capturing the diversity and specificity required for the study of ethnic and traditional music forms like those of the Manchu. This lack of representation in digital resources poses a significant barrier to research and preservation efforts, limiting our ability to document, analyze, and promote the legacy of Manchu music using modern technological tools [23].
In the context of AI, datasets are foundational to model development and application. AI technologies such as machine learning and deep learning rely heavily on large, high-quality datasets for tasks such as classification, generation, and emotional analysis. However, without datasets that accurately represent the distinct features of Manchu music—including its tonalities, rhythms, instrumentation, and cultural context—AI models cannot adequately process or interpret this type of music. The absence of comprehensive datasets tailored to ethnic music restricts the scope of AI research in ethnomusicology, preventing the development of models capable of analyzing and generating ethnic music forms with authenticity and cultural sensitivity.
This study aims to address these challenges by constructing a dedicated, high-quality dataset specifically for Manchu music. The dataset is carefully curated to include a wide range of Manchu music forms, from folk songs and dance music to ceremonial pieces, each with detailed annotations that capture both musical characteristics and cultural significance. Each piece in the dataset is documented with metadata on historical context, performance style, and traditional instrumentation, ensuring that the AI models trained on these data can understand and replicate the nuances unique to Manchu music. By providing comprehensive annotations, this dataset facilitates a deeper AI-driven analysis, enabling applications that go beyond simple music categorization to more complex tasks like music generation and emotion recognition, where context and cultural background are essential for accuracy.
The creation of this dataset represents a significant step toward bridging the gap between cultural heritage preservation and technological innovation. By offering an AI-ready resource that encompasses the unique qualities of Manchu music, this dataset contributes to the long-term digital preservation of Manchu cultural heritage. It allows researchers to apply AI techniques to explore Manchu music in new ways, such as identifying distinctive musical patterns, generating music in traditional Manchu styles, and analyzing emotional expressions embedded within the music. Furthermore, this resource opens new possibilities for interactive and educational applications, allowing a global audience to experience and learn about Manchu music within its cultural framework.
Ultimately, this study contributes not only to the preservation and accessibility of Manchu music but also to the advancement of AI applications in ethnomusicology. It establishes a framework for how ethnic music can be systematically documented and studied using AI, demonstrating the potential of technology to support and revitalize intangible cultural heritage. This initiative thus sets a precedent for future efforts in preserving and promoting ethnic music traditions, fostering a richer, more diverse understanding of global musical heritage in the digital age.

1.4. Research Objectives

This study aims to create a high-quality, comprehensive dataset of Manchu music to advance AI research in the field of ethnic music and cultural heritage preservation. By focusing on this unique musical tradition, the research seeks to address critical gaps in digital archives and datasets, which often overlook ethnic and traditional music forms. The dataset is intended to serve as a foundational resource for both academic research and practical applications in AI-driven music analysis and generation.
  • Systematic Collection and Digitization: This involves gathering a diverse range of Manchu music recordings, including folk songs, ceremonial pieces, and dance music, from historical archives and field recordings. The digitization process ensures high-quality digital formats, with preprocessing tasks like noise reduction and the removal of duplicates to maintain dataset integrity.
  • Comprehensive Metadata Annotation: Detailed metadata will be annotated for each recording, capturing basic information (e.g., title, performer, date) and complex musical features (e.g., rhythm, tonal structure). Cultural and historical contexts will also be documented to enrich the dataset and support AI analysis and ethnomusicological research.
  • Validation Through AI Experiments: The dataset’s utility will be tested through AI experiments in music classification, generation, and emotion analysis. These tasks will demonstrate the dataset’s versatility and its potential to advance both AI and ethnomusicology research.
By achieving these objectives, this study not only contributes to the preservation and analysis of Manchu music but also establishes a framework for integrating AI technology into the broader effort of safeguarding intangible cultural heritage. The dataset and associated methodologies will provide a model for future initiatives in documenting and analyzing the musical traditions of other ethnic groups.

2. Related Works

2.1. Existing Music Datasets and Their Applications

With advancements in AI technology, music datasets have become fundamental resources for tasks in music information retrieval, recommendation systems, automated composition, and emotion analysis [24,25,26,27,28,29,30,31,32]. Prominent datasets, including the Million Song Dataset (MSD) [33], GTZAN Genre Collection [19], MagnaTagATune [34], Nsynth [35], and the Free Music Archive (FMA) Dataset [36], have underpinned various applications such as music classification, recommendation, and automated generation. These datasets provide essential frameworks for building AI models capable of recognizing musical features, predicting user preferences, and generating compositions, which collectively support the broader field of music informatics.
Music recommendation systems, for instance, utilize collaborative filtering, content-based approaches, and hybrid algorithms to deliver tailored song suggestions based on users’ listening patterns and preferences. Music information retrieval (MIR) [37] relies on audio feature extraction techniques to enable efficient search and categorization by analyzing attributes like melody, rhythm, and timbre. In automated music composition, deep learning models, including Generative Adversarial Networks (GANs) [38] and Recurrent Neural Networks (RNNs) [39], are trained on datasets to capture compositional structures and patterns, allowing them to generate new music pieces with coherent style and flow [40]. Additionally, emotion analysis combines musical features such as pitch, rhythm, and harmony with labeled emotional data to train AI models that can classify and generate music according to specific emotional tones, enhancing user engagement and experience [41,42].
Despite their utility, existing music datasets present limitations, particularly in the context of ethnomusicology and the preservation of cultural heritage. Major datasets are often biased towards Western popular music, leading to gaps in the representation of traditional and ethnic music forms. These datasets frequently lack diversity, with limited audio and metadata annotations that fail to capture the cultural context or performance nuances unique to traditional music. Issues such as data duplication, erroneous tagging, and inconsistent audio quality further hinder their effectiveness, especially in applications requiring rich, culturally diverse datasets. Moreover, the scale and depth of these datasets are often insufficient for training AI models intended for the comprehensive analysis of non-Western and traditional music genres.
To address these gaps, the construction of specialized datasets, such as an AI-ready collection focused on ethnic music like Manchu music, is essential. By incorporating detailed annotations that highlight cultural context, musical characteristics, and historical background, such datasets can enhance AI’s ability to process and interpret ethnic music forms accurately. This focus on diversity and representation not only broadens the scope of AI applications in music analysis but also plays a crucial role in preserving intangible cultural heritage by providing a structured, digital record of traditional music for future generations.

2.2. Research on Traditional Chinese Music Datasets

Traditional Chinese music, a significant part of world music culture, has a long history and diverse forms [43]. While modern technology has made great strides in developing datasets for Western popular music, research on traditional Chinese music datasets remains relatively limited. Existing datasets primarily focus on three categories: Chinese classical music, Chinese folk music, and Chinese opera.
Chinese classical music datasets feature recordings of traditional instruments such as guqin, guzheng, and pipa [44,45]. These high-quality recordings capture intricate performance techniques and are widely used in tasks like instrument recognition, music style classification, and music generation. Chinese folk music datasets cover the music of various ethnic groups, such as Han, Zhuang, and Miao, collected from field recordings and cultural heritage practitioners. They are instrumental for studies in ethnic music research, cultural preservation, and emotion analysis [46,47]. Chinese opera music datasets, which include recordings of Peking Opera, Yue Opera, and Huangmei Opera, document vocal techniques, performance movements, and stage settings, supporting opera performance analysis and music classification [48,49].
Significant progress has been made in applying these datasets to areas like instrument recognition, style classification, music generation, and emotion analysis. For instance, deep learning models such as CNNs have achieved high-accuracy instrument recognition, while GANs and VAEs have been used to generate new compositions in traditional Chinese styles, including guqin and folk songs [50,51]. Emotion analysis models, leveraging audio feature extraction, have been developed to classify and generate music with different emotional states.
However, these datasets face challenges such as limited coverage, incomplete annotations, variable data quality, and insufficient interdisciplinary collaboration. Many datasets focus on well-known instruments and genres, neglecting less common instruments and regional music. The lack of detailed metadata—covering aspects like instrument usage, performance techniques, and emotion tags—reduces their usability for AI training. Furthermore, variable audio quality and a lack of cultural context in annotations hinder comprehensive research and interdisciplinary applications.
To address these issues, this study proposes constructing a dedicated Manchu music dataset. This dataset will systematically collect and annotate Manchu music resources to ensure comprehensive coverage, detailed metadata, and high-quality audio processing. Interdisciplinary collaboration with experts in musicology, ethnology, and cultural heritage preservation will enrich its cultural content, providing a solid foundation for AI research in ethnic music.
The Manchu music dataset will not only advance intelligent music analysis and generation technologies but also contribute to the digital preservation and dissemination of Manchu music, promoting cultural heritage preservation and transmission. This effort offers significant academic and societal value, fostering a deeper understanding and appreciation of traditional Chinese music.

2.3. Previous Research on Manchu Music and Its Characteristics

Manchu music, a crucial part of Chinese ethnic music culture, has a rich historical background and unique artistic features [52]. Its origins can be traced back to the ancient Jurchen people. Over time and through cultural integration during the Ming and Qing dynasties, Manchu music developed into a distinctive musical system [53,54]. Manchu music not only retains its traditional characteristics but also incorporates musical elements from other ethnic groups, such as Mongolian and Han, reflecting cultural diversity and inclusiveness.
Key Characteristics of Manchu Music: Manchu music is renowned for its unique instruments and diverse musical forms [55,56]. Chinese traditional folk musical instruments are shown in Figure 2 [57]. Traditional folk instruments such as drums, suona, bili, and transverse flutes play vital roles in various musical forms, while court instruments such as the guqin, pipa, and zheng hold significant positions in courtly music.
Manchu music features diverse styles, from the vigorous drum music to the melodious folk songs, characterized by simple, bright melodies and strong rhythmic patterns, embodying a rich ethnic flavor. Manchu musical instruments and performance scenes are shown in Figure 3.
Summary of Previous Research: Existing studies have categorized and analyzed Manchu music’s characteristics, exploring its historical evolution, cultural background, and presence in both court and folk music. Research has delved into the classification of Manchu music, tracing its development from early tribal traditions to its incorporation into imperial court ceremonies. Scholars have also investigated the interplay between Manchu court music and the broader musical landscape of the Qing Dynasty, highlighting its influence on court rituals and ceremonial practices. Additionally, traditional Manchu music has been studied through field investigations, which document and analyze local musical forms, performance techniques, and their connection to cultural customs and community life. These studies collectively provide a comprehensive understanding of Manchu music’s unique role in preserving cultural identity and history.
Research Limitations: Despite the breadth of previous studies, several limitations remain, including limited data, incomplete annotations, and insufficient technological methods [58,59,60]. Existing research relies heavily on field investigations and historical documents and lacks systematic, large-scale datasets, which restricts the depth and breadth of analysis. Much of the available Manchu music data lacks detailed annotations, especially regarding instrument usage, performance techniques, and cultural background, restricting AI model training and application [61,62]. Furthermore, traditional music research methods depend on manual analysis and rarely employ advanced technological approaches such as machine learning and deep learning, limiting research innovation and efficiency.
Research Significance: This study aims to address these limitations by constructing a Manchu music dataset. Through systematic music data collection, detailed metadata annotation, and high-quality data processing, we aim to create a rich, comprehensive, and accurate Manchu music dataset, providing a solid foundation for AI research in ethnic music. The study will integrate modern technological methods, such as machine learning and deep learning, to enhance the depth and breadth of Manchu music research, promoting the protection and transmission of ethnic music.

2.4. Challenges in Constructing Music Datasets for AI Research

Constructing music datasets for AI research involves several challenges, including data collection, annotation, quality, and legal and ethical considerations [63,64].
Music data comes from diverse and dispersed sources, such as studio recordings, live performances, and field collections, making it difficult to centralize. Partnerships with institutions and the use of online platforms can streamline this process. Navigating copyright permissions is complex and time-consuming, requiring collaboration with copyright holders to clarify data usage and obtain the necessary permissions.
Annotating music data involves capturing details like pitch, rhythm, harmony, and instrumentation, which is both labor-intensive and time-consuming. Semi-automated annotation, supported by machine learning and expert review, can improve efficiency and accuracy. Ensuring consistency across annotators is another challenge, which can be mitigated with detailed guidelines, training, and cross-validation. Rich metadata, including cultural context and performance techniques, is crucial but often hard to acquire, necessitating collaboration with subject-matter experts.
Audio quality varies due to differences in recording equipment and environments, which can negatively impact AI model performance. Advanced audio processing techniques can enhance audio quality. Data integrity issues, such as missing or incomplete data, affect dataset usability and can be addressed through regular integrity checks and systematic data management.
Music datasets often involve copyright-protected works, requiring adherence to legal frameworks and permissions to avoid disputes. Privacy concerns regarding performers and creators can be addressed through data anonymization. Cultural sensitivity is essential to respect traditions and customs, ensuring ethical data usage.
Constructing music datasets involves complex tasks, such as audio feature extraction and data cleaning, requiring advanced technologies and efficient tools. Interdisciplinary collaboration among musicologists, computer scientists, and ethnologists is vital for enhancing dataset diversity and research depth. Establishing collaborative platforms can facilitate this process and ensure comprehensive dataset development.
Addressing these challenges is crucial for developing high-quality music datasets that can effectively support AI research in music analysis, generation, and emotion recognition. Ensuring robust data collection, comprehensive annotation, and consistent quality enhances the datasets’ reliability and utility. Moreover, respecting legal, ethical, and cultural considerations is vital to safeguard the rights of creators, protect privacy, and honor the cultural significance of the music. This not only ensures compliance with legal standards but also fosters trust and collaboration among stakeholders, including researchers, cultural institutions, and communities. By tackling these issues, the resulting datasets can serve as valuable resources for advancing AI applications in ethnomusicology, contributing to the preservation and dissemination of cultural heritage on a global scale.

3. Method

3.1. Data Collection

Data collection is a critical step in constructing a high-quality dataset of Manchu music. To ensure the comprehensiveness and accuracy of the dataset, data sources include historical documents and archives, audio recordings, video materials, and field research. Historical documents and archives are mainly sourced from libraries, archival institutions, and museums, encompassing ancient musical scores, historical records of Manchu music, and archives related to royal court music. Audio recordings are obtained from music academies, cultural research institutions, and Manchu communities, and include high-quality recordings of traditional Manchu instrumental performances, folk songs, and music associated with rituals and celebrations. Video materials are primarily sourced from Manchu cultural festivals, musical performances, and educational videos, covering aspects such as Manchu music performances, the integration of dance and music, and traditional music teaching methods. Field research involves visiting Manchu settlements and cultural heritage sites to collect on-site recordings and videos, oral histories, and performances and explanations from folk artists and heritage bearers. The data collection framework is shown in Figure 4.
The data collection process includes preliminary planning, information gathering, data organization, metadata annotation, and data review. In the preliminary planning phase, the objectives and scope of the dataset are defined; the types, quantities, and quality standards of music to be collected are determined; and collaborative relationships are established with music academies, cultural research institutions, and Manchu communities to secure support and resources. During the information gathering stage, relevant literature and archives are systematically reviewed in libraries and archival institutions, digitized, and preserved in electronic formats; high-quality audio and video materials are collected through partnerships and field research using professional recording and filming equipment to ensure data quality.
In the data organization phase, the collected data are categorized according to music type, performance form, instrument type, and other classifications, and all audio and video materials are digitized and stored in a uniform format for subsequent processing and analysis. The metadata annotation phase involves adding detailed metadata for each data entry, including performer information, types of instruments, performance techniques, music genres, and cultural contexts, ensuring consistency and accuracy in the data annotation. Finally, the data review phase entails a comprehensive quality check on the organized and annotated data to ensure its completeness and accuracy, inviting experts from musicology, ethnology, and related fields to review the data to confirm its academic value and cultural authenticity.
Various advanced tools and technologies are employed during the data collection process, including high-fidelity recording equipment, high-definition video cameras, multi-angle shooting techniques, and professional data management software and metadata annotation tools, ensuring high quality and efficient management of the data. However, data collection also faces several challenges, including difficulties in data acquisition, issues with audio and video quality, the large workload associated with data annotation, and concerns regarding copyright and ethics. To address these challenges, the research team has strengthened collaborations with relevant institutions, utilized digital platforms and online resources to broadly collect and integrate data, employed advanced recording and filming equipment, and performed professional audio and video processing in post-production. Furthermore, semi-automated annotation tools may be used in conjunction with expert reviews to ensure both the efficiency and accuracy of annotations, while clearly defining the scope and purpose of data usage to obtain necessary permissions, thereby protecting the rights of performers and creators.

3.2. Data Processing

Data processing is a critical step in ensuring the high quality and practicality of the Manchu music dataset, as shown in Table 1.
Audio data processing includes audio cleaning, segmentation, feature extraction, and format conversion. Audio cleaning involves the use of noise reduction algorithms to remove background noise from recordings and standardize volume levels, thereby ensuring consistency and clarity in the audio. Audio segmentation employs audio segmentation algorithms for automatic clipping, supplemented by manual corrections based on professional auditory assessments to ensure the accuracy of the segments. In the domain of feature extraction, techniques such as Fast Fourier Transform (FFT) are utilized for spectral analysis to extract spectral features, as well as time-domain features and higher-level musical characteristics. All audio files are uniformly converted to formats such as WAV or FLAC, with various sampling rates provided to meet different application needs.
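As a rough illustration of this preprocessing pipeline, the sketch below uses the Librosa and SoundFile libraries to resample a recording, normalize its level, trim and split it at silent regions, and write the resulting segments as lossless WAV files. The file paths, silence threshold, and sampling rate are placeholders rather than the project’s actual configuration.

```python
import librosa
import soundfile as sf

def preprocess_recording(src_path, out_prefix, sr=44100, top_db=30):
    """Resample, normalize, trim silence, and split one recording into segments (illustrative)."""
    # Load and resample to a uniform rate
    y, _ = librosa.load(src_path, sr=sr, mono=True)
    # Peak-normalize to standardize volume levels across recordings
    y = y / max(abs(y).max(), 1e-9)
    # Trim leading/trailing silence, then split at quiet regions
    y, _ = librosa.effects.trim(y, top_db=top_db)
    intervals = librosa.effects.split(y, top_db=top_db)
    paths = []
    for i, (start, end) in enumerate(intervals):
        seg_path = f"{out_prefix}_seg{i:03d}.wav"
        sf.write(seg_path, y[start:end], sr)  # lossless WAV output
        paths.append(seg_path)
    return paths

# Example call on a hypothetical source file:
# segments = preprocess_recording("raw/TS_Audio_Example_20240101.flac",
#                                 "clean/TS_Audio_Example_20240101")
```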
Video data processing involves video editing, enhancement, and audio-video synchronization. Automated editing software is employed to identify and extract key segments of music performances; these clips are subsequently refined by professionals to ensure the video’s conciseness and focus. Video enhancement is achieved through image processing algorithms that improve clarity and color, as well as repair and optimize low-quality videos, with frame rates adjusted to ensure smooth playback. Audio-video synchronization utilizes audio recognition and alignment algorithms, accompanied by manual validation from professionals to ensure high precision in the synchronization of audio and video.
Metadata processing is another crucial aspect of data processing, involving the standardization, automatic annotation, and manual review of metadata. Metadata is formatted uniformly in structures such as JSON or XML, clearly defining each field and encompassing performer information, types of instruments, and performance techniques. Key data extracted include the title of the music piece, composer details, recording date and location, duration, genre, and specific cultural or historical context. Machine learning and natural language processing (NLP) techniques are employed for the preliminary extraction and annotation of metadata, improving annotation efficiency, and dedicated metadata annotation tools are developed to support bulk annotation and automatic corrections. The manual review phase engages experts in musicology and ethnology to ensure the accuracy and authority of the annotations, with consistency-checking algorithms employed to guarantee the uniformity and completeness of all metadata annotations.
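To make the metadata structure concrete, the short sketch below serializes a hypothetical record to JSON; the field names and values illustrate the kinds of information described above and are not the dataset’s actual schema.

```python
import json

# Hypothetical metadata record; field names are illustrative, not the project's schema.
entry = {
    "id": "TS-0001",
    "title": "Example Manchu folk song",
    "performer": {"name": "Anonymous folk artist", "background": "Field recording, Jilin"},
    "music_type": "Traditional_Songs",
    "instruments": ["octagonal drum", "bili"],
    "performance_form": "solo voice with drum accompaniment",
    "recording": {"date": "2024-01-01", "location": "Jilin Province", "duration_sec": 215},
    "technical": {"audio_format": "WAV", "sample_rate_hz": 44100},
    "cultural_context": "Harvest celebration song with call-and-response structure",
    "copyright": {"holder": "Project archive", "usage": "research only"},
}

with open("metadata/TS-0001.json", "w", encoding="utf-8") as f:
    json.dump(entry, f, ensure_ascii=False, indent=2)
```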
Data storage and management utilize relational databases and file storage systems. Structured metadata and associated audio and video files are stored in relational databases such as MySQL or PostgreSQL, while large-scale audio and video files utilize distributed file storage systems like HDFS, ensuring the reliability and scalability of data storage. A multi-tier backup strategy and off-site backup mechanisms are implemented to ensure data security and recovery capabilities in disaster scenarios. In terms of data access, RESTful API interfaces are developed to provide standardized data access and query services, along with a comprehensive user permissions management system to ensure the secure access and utilization of data.
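As one possible shape for the data-access layer, the minimal sketch below exposes a metadata lookup endpoint using FastAPI; the route, fields, and in-memory storage are assumptions for illustration only, standing in for the relational database and permission system described above.

```python
from fastapi import FastAPI, HTTPException

app = FastAPI(title="Manchu Music Dataset API (illustrative)")

# In a real deployment the records would come from MySQL/PostgreSQL;
# here a small in-memory dict stands in for the database.
RECORDS = {"TS-0001": {"title": "Example Manchu folk song", "music_type": "Traditional_Songs"}}

@app.get("/recordings/{rec_id}")
def get_recording(rec_id: str):
    """Return the metadata for one recording, or 404 if it is unknown."""
    record = RECORDS.get(rec_id)
    if record is None:
        raise HTTPException(status_code=404, detail="recording not found")
    return record
```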
The data processing phase confronts challenges related to large data volumes, variable data quality, and processing complexity. To address these challenges, distributed computing and storage technologies are harnessed to enhance data processing efficiency and storage capacity, while cloud computing platforms are utilized for elastic scalability. Strict data processing standards are established, and advanced processing algorithms and tools are deployed to ensure consistency and quality in data processing. A multidisciplinary team is assembled to combine expertise from musicology and computer science to collaboratively address the technical challenges encountered during data processing.

3.3. Dataset Structure

The design of the dataset structure directly influences the usability, scalability, and maintainability of the dataset. To construct an efficient dataset of Manchu music, this section provides a detailed description of the dataset’s hierarchical structure, organization of data files, metadata design, and data storage and access methods. The dataset structure is shown in Table 2.
First and foremost, the Manchu music dataset employs a hierarchical structure divided into multiple levels to ensure organized and manageable data. The top-level directory is named Manchu_Music_Dataset, encompassing all data files and related resources for Manchu music. This main directory is further categorized according to music types, instrument types, and performance forms, such as Traditional_Songs, Instrumental_Music, Ceremonial_Music, and Folk_Dances. Each category folder is further divided into subdirectories for audio files, video files, and metadata files. For example, the Traditional_Songs directory contains subdirectories named Audio_Files, Video_Files, and Metadata. A standardized file naming convention is employed, incorporating data type, performer, instrument, and recording date, such as TS_Audio_SingerName_Instrument_YYYYMMDD.wav.
By combining lossless audio formats with systematic naming practices, the dataset maintains an organized and high-quality audio library. Video files are stored in high-definition formats such as MP4 and MOV to maintain video clarity, following the same naming conventions. Metadata files use structured data formats like JSON and XML for ease of parsing and processing, again conforming to similar naming conventions that include detailed information.
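A small helper like the one below could assemble file names following the convention just described (category, data type, performer, instrument, recording date); the abbreviations and example values are illustrative.

```python
from datetime import date

def make_filename(category_code, data_type, performer, instrument, rec_date, ext="wav"):
    """Build a name like TS_Audio_SingerName_Instrument_YYYYMMDD.wav (illustrative convention)."""
    stamp = rec_date.strftime("%Y%m%d")
    return f"{category_code}_{data_type}_{performer}_{instrument}_{stamp}.{ext}"

# Example: a traditional song, audio recording
print(make_filename("TS", "Audio", "SingerName", "Bili", date(2024, 1, 1)))
# -> TS_Audio_SingerName_Bili_20240101.wav
```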
Metadata is a crucial component of the dataset, providing detailed information about each music entry. The metadata design encompasses key fields such as basic information, performer information, music details, technical data, cultural context, and copyright information. Basic information includes a unique identifier, music title, and description. Performer information covers the performer’s name, gender, age, and background. Music details encompass the type of music, instruments used, performance format, recording date, and location. Technical data involves audio and video formats, as well as duration. Cultural context describes the cultural background, historical significance, and usage scenarios of the music. Copyright information includes details about the copyright holder and usage rights.
In terms of data storage and access methods, local storage is implemented on local servers or storage devices, utilizing a layered storage structure to ensure data security and availability. Cloud storage services such as AWS S3 or Google Cloud Storage are leveraged for elastic scalability and remote access of data. Data access is facilitated through a RESTful API interface, supporting standardized data retrieval and queries, with functionalities for data searching, downloading, and updating. User permission management employs Role-Based Access Control (RBAC), defining different user roles and their associated permissions to ensure secure data utilization, while logging data access and operations for ease of tracking and auditing.

3.4. Quality Control

To ensure the high quality and reliability of the Manchu music dataset, quality control is implemented throughout data collection, processing, storage, and publication. This section outlines the specific methods and steps involved in maintaining data quality.
During data collection, source verification is crucial. Authoritative music academies, cultural research institutions, and Manchu communities are selected to ensure data reliability. Similar data types are cross-verified from multiple independent sources. High-fidelity recording equipment and real-time monitoring by professionals ensure optimal audio and video quality. In data cleaning and preprocessing, noise filtering removes background noise, and segmented audio and video files are manually corrected for accuracy.
In the data processing phase, consistency checks and quality assessments are key. Uniform processing standards are maintained through periodic checks, while quality assessment tools ensure clarity and sound quality. Multiple algorithms are used for feature extraction, with results cross-validated and manually reviewed for accuracy. Metadata annotations are completed using automated tools and manual corrections, followed by expert reviews to ensure consistency and completeness.
For data storage, multiple backup strategies prevent data loss, with real-time system monitoring to resolve faults promptly. Regular consistency checks and version control ensure data traceability and reliability.
Before data publication, a data release review is conducted by experts in musicology and related fields to verify academic value and cultural accuracy. User feedback further ensures usability. Data access is managed securely, and privacy and copyright protections are rigorously enforced to ensure legal compliance.
Challenges such as data diversity, the large workload of manual reviews, and integrating technology with professional knowledge are addressed through standardized workflows and automated tools, improving efficiency and ensuring high-quality results.
In summary, the Manchu Music Dataset serves as a comprehensive resource for preserving and analyzing traditional Manchu music. To further illustrate its scope and diversity, we present a summary of the dataset’s key characteristics in Table 3. This table provides a consolidated view of the dataset’s composition, offering insights into its potential for supporting a wide range of research applications.
Building on this dataset, the next chapter delves into its application across various experimental scenarios, including music classification, generation, and emotion analysis. These experiments demonstrate the practical value of the dataset and highlight its versatility in advancing both ethnomusicology and AI-driven music research.

4. Experimental Setup

4.1. AI Models and Algorithms for Testing the Dataset

To validate and evaluate the effectiveness and utility of the constructed Manchu music dataset, this study employs various AI models and algorithms for experimental testing. These models span audio processing, music information retrieval, music generation, and classification, providing a comprehensive assessment of the dataset’s performance and potential in different application scenarios.
1. Audio Processing and Feature Extraction:
Mel-frequency Cepstral Coefficients (MFCC): MFCC is a commonly used audio feature extraction technique that captures the main characteristics of an audio signal by calculating its spectral features. In this experiment, MFCC is used to extract the primary frequency components from Manchu music audio, analyzing the characteristic differences across different music genres and performance styles.
Linear Predictive Coding (LPC): LPC models the generation process of an audio signal to extract parameters reflecting its structural characteristics. It is applied to analyze and compare the audio structural features of different Manchu music segments, providing a deeper understanding of the inherent properties of the audio signals.
Short-Time Fourier Transform (STFT): STFT divides the audio signal into small segments and performs Fourier Transform on each segment to obtain time-frequency features. In this study, STFT is used to analyze the time-frequency characteristics of Manchu music, identifying rhythm and melody changes in the music.
The feature extraction framework is shown in Figure 5.
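The snippet below sketches how these three feature types might be extracted with Librosa; parameters such as the number of MFCCs, the LPC order, and the STFT window size are illustrative choices, not values prescribed by the study.

```python
import librosa
import numpy as np

def extract_features(path, sr=22050, n_mfcc=13, lpc_order=12):
    """Compute MFCC, LPC, and STFT magnitude features for one audio file (illustrative)."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)          # shape: (n_mfcc, frames)
    lpc = librosa.lpc(y, order=lpc_order)                           # shape: (lpc_order + 1,)
    stft_mag = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))  # shape: (1025, frames)
    return mfcc, lpc, stft_mag
```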
2. Music Information Retrieval (MIR):
Convolutional Neural Network (CNN): CNN is a deep learning model proficient in the feature extraction and classification of images and audio signals. In the experiment, CNN is used for the classification and retrieval of Manchu music segments, automatically identifying music genres and performers based on audio features.
Recurrent Neural Network (RNN): RNN is suitable for handling sequential data, capturing temporal dependencies within sequences. This study uses RNN to analyze the temporal features of Manchu music, recognizing repetitive patterns and rhythmic structures.
Autoencoder: An autoencoder is an unsupervised learning model that learns a low-dimensional representation of data through encoding and decoding. It is applied in this study for feature dimensionality reduction and compression, enhancing the efficiency of music retrieval and classification.
The music information retrieval algorithm framework is shown in Figure 6.
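A compact PyTorch CNN of the kind used for such classification might look like the sketch below, taking fixed-size MFCC “images” as input; the layer sizes and the number of genre classes are placeholders rather than the study’s actual architecture.

```python
import torch
import torch.nn as nn

class MfccCNN(nn.Module):
    """Small CNN over MFCC feature maps for genre classification (illustrative)."""
    def __init__(self, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes)
        )

    def forward(self, x):  # x: (batch, 1, n_mfcc, frames)
        return self.head(self.features(x))

# Example forward pass on a dummy batch of 13x128 MFCC patches
logits = MfccCNN()(torch.randn(8, 1, 13, 128))
print(logits.shape)  # torch.Size([8, 6])
```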
3. Music Generation and Synthesis:
Generative Adversarial Network (GAN): GAN generates new data resembling real data distribution through adversarial training between a generator and a discriminator. In this experiment, GAN is used to generate new Manchu music segments, demonstrating the dataset’s application potential in music generation.
Variational Autoencoder (VAE): VAE learns the latent representation of data through variational inference and generates new data. In this experiment, VAE is used for style transfer and variation in Manchu music, generating diverse music segments.
Transformer: Based on the attention mechanism, the Transformer model is widely used in natural language processing and sequence generation tasks. This study utilizes the Transformer to generate complex melodies and harmonies in Manchu music.
The framework of the music generation and synthesis algorithm is shown in Figure 7.
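As a highly simplified view of the adversarial setup described above, the sketch below defines a generator and discriminator over fixed-length spectrogram-like feature frames; real training on the dataset would involve considerably more engineering (audio representations, losses, training schedules) than shown here, and the dimensions are assumed for illustration.

```python
import torch
import torch.nn as nn

FEAT_DIM, NOISE_DIM = 128, 64  # illustrative sizes for one spectrogram-like frame

generator = nn.Sequential(      # noise vector -> synthetic feature frame
    nn.Linear(NOISE_DIM, 256), nn.ReLU(),
    nn.Linear(256, FEAT_DIM), nn.Tanh(),
)
discriminator = nn.Sequential(  # feature frame -> real/fake score
    nn.Linear(FEAT_DIM, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)

noise = torch.randn(16, NOISE_DIM)
fake_frames = generator(noise)
scores = discriminator(fake_frames)
print(fake_frames.shape, scores.shape)  # torch.Size([16, 128]) torch.Size([16, 1])
```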
4. Music Classification and Emotion Analysis:
Support Vector Machine (SVM): SVM is a classical machine learning algorithm suitable for classification and regression tasks. In this experiment, SVM is used for emotion classification of Manchu music segments, identifying emotional attributes of the music through audio features.
Random Forest: Random Forest enhances classification and regression accuracy through the combination of multiple decision trees. In this study, it is employed for the classification and emotion analysis of Manchu music, improving model robustness and accuracy.
k-Nearest Neighbors (k-NN): k-NN is a simple yet effective classification algorithm that classifies data points based on similarity. In the experiment, k-NN is used for the quick classification and retrieval of Manchu music segments, recommending music based on audio feature similarity.
The framework of the music classification and emotion analysis algorithm is shown in Figure 8.
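For the classical models listed above, scikit-learn provides ready implementations. The sketch below trains the three classifiers on pre-extracted feature vectors, with random synthetic data standing in for real Manchu music features and emotion labels.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for per-segment audio features and emotion labels
X = np.random.rand(300, 40)
y = np.random.randint(0, 4, size=300)  # e.g., four emotion categories

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("SVM", SVC()),
                    ("Random Forest", RandomForestClassifier(n_estimators=200)),
                    ("k-NN", KNeighborsClassifier(n_neighbors=5))]:
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```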

4.2. Baseline Experiment Description

The baseline experiment is a critical step in evaluating the effectiveness and practical value of the Manchu music dataset. By establishing baseline experiments, we can define the fundamental performance metrics of various models and algorithms when processing this dataset, providing a reference for subsequent optimization. This section details the design and implementation of the baseline experiments, including the objectives, data partitioning, experimental setup, performance metrics, and result analysis.
1. The primary objectives of the baseline experiments are as follows:
To validate the quality of the dataset: Assessing the quality, integrity, and usability of the audio and video files in the dataset.
To evaluate model performance: Assessing the performance of different AI models and algorithms on the Manchu music dataset, providing a reference for further optimization and improvement.
To explore the dataset’s application potential: Investigating the potential applications of the dataset in tasks such as music classification, feature extraction, and emotion analysis.
2. To ensure the reliability and reproducibility of the experimental results, the dataset is divided according to the following principles:
Training Set: Used for model training, comprising 70% of the total dataset.
Validation Set: Used for tuning and selecting models, comprising 15% of the total dataset.
Test Set: Used for the final evaluation of model performance, comprising 15% of the total dataset.
Random Partitioning: Data are randomly partitioned based on the number and types of music segments, ensuring diversity and representativeness of music genres and performance styles across all subsets.
Stratified Sampling: In addition to random partitioning, stratified sampling is employed to ensure that the data distribution in each subset is consistent with the original dataset.
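One conventional way to realize this 70/15/15 split with stratification is two successive calls to scikit-learn’s train_test_split, as sketched below on placeholder items and labels.

```python
from sklearn.model_selection import train_test_split

def stratified_70_15_15(items, labels, seed=42):
    """Split items into train/validation/test (70/15/15) with label stratification."""
    # First peel off 30% of the data for validation + test
    train_x, rest_x, train_y, rest_y = train_test_split(
        items, labels, test_size=0.30, stratify=labels, random_state=seed)
    # Then split that 30% evenly into validation and test (15% each overall)
    val_x, test_x, val_y, test_y = train_test_split(
        rest_x, rest_y, test_size=0.50, stratify=rest_y, random_state=seed)
    return (train_x, train_y), (val_x, val_y), (test_x, test_y)
```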
3. The specific setup for the baseline experiments includes the following:
Hardware Environment: Experiments are conducted on a high-performance computing cluster equipped with NVIDIA GPUs and large storage capacity.
Software Environment: The experiments use the Python programming language, deep learning frameworks (PyTorch), audio processing libraries (Librosa), and data analysis tools (Pandas).
Model Selection:
Feature Extraction Models: MFCC, LPC, STFT
Classification Models: CNN, RNN, SVM
Generative Models: GAN, VAE, Transformer
Emotion Analysis Models: Random Forest, k-NN
Experimental Procedures:
Data Preprocessing: Includes denoising, segmentation, and feature extraction from audio and video files.
Model Training: Training models using the training set, adjusting hyperparameters, and optimizing model performance.
Model Validation: Validating models using the validation set, selecting the best model and parameter settings.
Model Testing: Conducting the final evaluation using the test set and recording the model’s performance metrics across different tasks.
4. To comprehensively evaluate model performance, the baseline experiments use the following metrics:
Accuracy (A): The accuracy of classification models on the test set, reflecting the overall classification capability of the model.
Precision (P): The precision of the model in specific categories, indicating the accuracy of the model’s recognition in those categories.
Recall (R): The recall of the model in specific categories, indicating the model’s coverage in recognizing those categories.
F1 Score (F1): The harmonic mean of precision and recall, providing a comprehensive measure of the model’s performance in specific categories.
Generation Quality (GQ): The quality of music segments generated by generative models, evaluated through subjective assessments and objective metrics such as audio similarity.
Processing Time (PT): The time taken by the model during training, validation, and testing phases, indicating the computational efficiency of the model.
5. The results of the baseline experiments will be analyzed in the following aspects:
Model Performance Comparison: Comparing the performance of different models and algorithms across various tasks, analyzing their strengths, weaknesses, and applicable scenarios.
Dataset Quality Assessment: Evaluating the quality and suitability of the dataset based on baseline experiment results, identifying potential issues and areas for improvement.
Result Visualization: Utilizing charts and visualization tools to present experimental results, facilitating intuitive analysis and understanding.
Exploration of Application Potential: Exploring the practical applications of the dataset based on baseline experiment results, providing insights for future research.

4.3. Performance Metrics and Evaluation Standards

To comprehensively evaluate the performance of AI models on the Manchu music dataset, this study employs a variety of performance metrics and evaluation standards. These metrics cover aspects such as model accuracy, efficiency, and generation quality, ensuring a holistic assessment of model performance. The following are the detailed performance metrics and evaluation standards.
The performance metrics for classification models are used to evaluate the overall effectiveness of these models in tasks such as music genre and emotion classification, as shown in Formulas (1)–(4):
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
The confusion matrix is used to analyze the model's classification performance across different classes in detail and to identify common error patterns.
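A minimal sketch of how these classification metrics and the confusion matrix could be computed with scikit-learn is shown below; the label lists are placeholder examples only.

```python
# Sketch: computing the metrics in Formulas (1)-(4) plus a confusion matrix
# with scikit-learn. `y_true` and `y_pred` are placeholder labels.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = ["folk ensemble", "folk song", "festival music", "folk ensemble"]
y_pred = ["folk ensemble", "festival music", "festival music", "folk ensemble"]

print("Accuracy:", accuracy_score(y_true, y_pred))
# Macro averaging treats every category equally, which suits class-level analysis.
print("Precision:", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("Recall:", recall_score(y_true, y_pred, average="macro", zero_division=0))
print("F1:", f1_score(y_true, y_pred, average="macro", zero_division=0))
print(confusion_matrix(y_true, y_pred))
```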
Performance metrics for regression models include the Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE). MSE evaluates the prediction accuracy of regression models, such as those predicting audio features; RMSE expresses the same error in the original units, facilitating comparison with actual values; and MAE reflects the model's average absolute prediction bias. These are defined in Formulas (5)–(7):
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{5}$$
$$\text{RMSE} = \sqrt{\text{MSE}} \tag{6}$$
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{7}$$
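These regression metrics can be computed directly; the following NumPy sketch uses placeholder values purely for illustration.

```python
# Sketch: Formulas (5)-(7) computed with NumPy on placeholder values.
import numpy as np

y_true = np.array([0.52, 0.61, 0.48])   # example target audio-feature values
y_pred = np.array([0.50, 0.65, 0.45])   # example model predictions

mse = np.mean((y_true - y_pred) ** 2)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(y_true - y_pred))
print(mse, rmse, mae)
```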
Performance metrics for generative models include the following:
Generation Quality Score: This metric assesses the quality of generated music through both subjective and objective methods.
Fréchet Audio Distance (FAD): FAD is used to evaluate the distance between generated and real audio in feature space, as shown in Formula (8).
$$\text{FAD} = \left\lVert \mu_r - \mu_g \right\rVert^2 + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right) \tag{8}$$
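The following sketch illustrates how Formula (8) could be evaluated with NumPy and SciPy, assuming that feature embeddings of real and generated audio have already been extracted (here replaced by random placeholders); it is not the exact FAD implementation used in the experiments.

```python
# Sketch of Formula (8): Fréchet distance between Gaussian fits of real and
# generated audio embeddings. In practice, FAD uses embeddings from a
# pretrained audio model; random placeholders are used here for illustration.
import numpy as np
from scipy.linalg import sqrtm

def frechet_audio_distance(real_emb: np.ndarray, gen_emb: np.ndarray) -> float:
    mu_r, mu_g = real_emb.mean(axis=0), gen_emb.mean(axis=0)
    sigma_r = np.cov(real_emb, rowvar=False)
    sigma_g = np.cov(gen_emb, rowvar=False)
    covmean = sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):        # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))

# Example with random placeholder embeddings (128-dimensional).
rng = np.random.default_rng(0)
print(frechet_audio_distance(rng.normal(size=(200, 128)), rng.normal(size=(200, 128))))
```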
Efficiency Metrics:
Training Time: This metric is used to evaluate the efficiency of model training, particularly significant for large-scale datasets and complex models.
Inference Time: This metric assesses the real-time performance and computational efficiency of the model, which is especially important for real-time music applications and online services.
Other Evaluation Criteria:
Dataset Coverage: This criterion evaluates the representativeness of the dataset and the generalization ability of the model, ensuring that the model can handle a diverse range of music types.
User Satisfaction: This metric assesses the model’s performance in real-world applications and user experience, gathered through surveys and user feedback.

5. Experimental Results

5.1. An Overview of the Dataset

This section provides an overview of the Manchu music dataset, including detailed statistics and analyses of the number of recordings, duration, and diversity. These details not only offer fundamental data for the experiments but also provide important references for evaluating the dataset's quality and exploring its application potential. The details are summarized in Table 4.
The Manchu music dataset includes audio recordings from multiple categories, covering various forms such as traditional instrumental solos, folk ensembles, and singing.
The dataset contains a total of 5000 recordings. The recordings are classified into five main categories: traditional instrumental solos; folk ensembles; folk songs; festival music; and religious ritual music. These recordings encompass the main types of Manchu music, ensuring the dataset’s representativeness and diversity.
To fully assess the dataset’s scale and coverage, we have compiled the total and average duration of recordings in each category. The total duration of the recordings in the dataset is 500 h (average duration of 6 min per recording). These data indicate a relatively even distribution of recording durations, providing sufficient sample sizes for model training and evaluation.
The diversity of the Manchu music dataset is reflected in various aspects, including music types, performance styles, and regional distribution. The diversity of the dataset not only enhances the generalization ability of the model, but also provides rich materials for the digital protection and research of Manchu music.

5.2. The Performance of AI Models Using the Dataset

This section provides a detailed report on the performance of various AI models when applied to the Manchu music dataset. We evaluated the models across different tasks, including classification, generation, regression, and sentiment analysis, assessing their accuracy, efficiency, and generation quality.
In the music classification task, we tested the following models: Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Support Vector Machines (SVM). The classification tasks included both music genre classification and emotion classification.
CNN: CNNs demonstrated excellent performance in extracting audio features and classifying music genres. They achieved high accuracy and F1 scores, particularly when handling complex audio features and multi-class problems.
RNN: RNNs performed well in processing time-series data, such as music sequences. However, their accuracy and F1 scores were slightly lower than those of CNNs, possibly due to limitations in learning long sequences.
SVM: SVMs performed adequately on small-scale datasets, but their performance decreased when handling larger datasets. They were less effective in dealing with complex music features.
As shown in Table 5, the comparative performance metrics highlight the strengths and weaknesses of each model in different aspects of music classification.
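To make the classification setup concrete, the following is a minimal PyTorch sketch of a CNN classifier operating on log-mel spectrogram inputs; the layer sizes, input shape, and five-class output are illustrative assumptions rather than the exact architecture evaluated here.

```python
# Minimal PyTorch sketch of a CNN genre classifier over spectrogram "images"
# (1 x 128 x 431, roughly 10 s of audio). Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class MusicCNN(nn.Module):
    def __init__(self, n_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

# Forward pass on a random batch standing in for spectrogram inputs.
model = MusicCNN(n_classes=5)
logits = model(torch.randn(8, 1, 128, 431))   # (batch, channel, mel bins, frames)
print(logits.shape)                            # torch.Size([8, 5])
```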
In the music generation task, we utilized Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Transformer models. The tasks included generating new Manchu music segments and style transfer, as shown in Table 6.
GANs were capable of producing high-quality audio when generating new music segments, achieving a high similarity to real music. However, some generated music pieces may exhibit minor differences from real music in certain details.
VAEs performed well in generating diverse music, although the quality was slightly lower compared to GANs. VAEs are suitable for generating music segments with some variation.
Transformer models excelled at generating music segments with complex structures, receiving the highest generation quality scores. They were able to produce high-quality audio that closely resembles real music.
In the feature extraction and regression tasks, we tested feature extraction methods such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC), and Short-Time Fourier Transform (STFT), and evaluated their performance in audio feature regression tasks.
MFCC performed exceptionally well in both audio feature extraction and regression tasks, effectively capturing the spectral characteristics of music. It is useful for subsequent music analysis and generation.
LPC exhibited stable performance in extracting audio features, but its error rate was slightly higher compared to MFCC. This may be due to LPC’s limitations in handling complex audio signals.
STFT effectively extracted the time-frequency characteristics of audio, with performance close to that of MFCC. However, its longer processing time may impact the efficiency of handling large-scale datasets. The results are presented in Table 7.
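As an illustration of these three feature extractors, the following Librosa sketch computes MFCC, LPC, and STFT features from a synthetic test tone; the parameter values are assumptions for demonstration only, not the settings used to produce Table 7.

```python
# Sketch of the three feature extractors compared in Table 7, using Librosa.
# A synthetic 440 Hz tone stands in for a real recording.
import librosa
import numpy as np

sr = 22050
y = librosa.tone(440, sr=sr, duration=2.0)

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)              # spectral-envelope features
lpc_coeffs = librosa.lpc(y, order=16)                           # linear-prediction coefficients
stft_mag = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))  # time-frequency magnitudes

print(mfcc.shape, lpc_coeffs.shape, stft_mag.shape)
```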
In the emotion analysis tasks, we tested the performance of the Random Forest and k-Nearest Neighbors (k-NN) models in music emotion classification. The results are presented in Table 8.
The Random Forest model performed well in the emotion analysis task, effectively classifying music emotions with high accuracy and F1 score.
The k-NN model demonstrated stable performance in emotion classification tasks, but its accuracy and F1 score were slightly lower. This may be due to the model’s relatively poor performance in high-dimensional feature spaces.
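A minimal scikit-learn sketch of such an emotion-classification baseline is given below; the feature matrix and emotion labels are random placeholders standing in for per-clip features from the dataset.

```python
# Sketch of the emotion-classification baseline: a Random Forest over per-clip
# feature vectors (e.g., mean MFCCs). Features and labels are random placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))                        # stand-in for per-clip MFCC means
y = rng.choice(["joy", "sadness", "calm"], size=300)  # stand-in emotion labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te), zero_division=0))
```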

5.3. Comparison with Existing Datasets

To assess the strengths and weaknesses of the Manchu music dataset, we compared it with several major existing music datasets. The comparison covered dimensions such as dataset size, audio quality, diversity, and application effectiveness. This analysis positions the Manchu music dataset within the context of existing datasets, highlighting its unique value not only for the preservation and transmission of cultural heritage but also as a valuable resource for digital and intelligent research in cultural heritage preservation.
In terms of dataset size, although the Manchu music dataset is smaller than the Million Song Dataset, its focus on a specific ethnic music genre and its total duration of 500 h provide a sufficient amount of samples. Compared to the GTZAN and MAESTRO datasets, it offers more detailed support for ethnic music data. Regarding audio quality, the Manchu music dataset’s quality surpasses that of the GTZAN and Million Song Dataset and is comparable to MAESTRO, which provides a solid foundation for high-quality audio analysis and model training.
In terms of dataset diversity, the Manchu music dataset stands out for its variety in music types and regional distribution. Unlike the Million Song Dataset, which covers global popular music, the Manchu music dataset focuses on a specific ethnic music category, offering a unique resource for studying the characteristics of ethnic music. The specific contents are shown in Table 9.
The Manchu music dataset demonstrates superior performance in specific fields compared to other datasets, particularly in the research and application of ethnic music. Unlike the more general-purpose datasets like the Million Song Dataset and GTZAN, the Manchu music dataset offers more targeted research materials. This study reveals the unique value and advantages of the Manchu music dataset by comparing it with existing datasets.
Although it may not match some large-scale datasets in terms of size, its high quality, diversity, and focus on a specific domain make it significantly important for the study of ethnic music. The dataset’s application results indicate that it provides strong support for existing research in tasks such as music classification, generation, and analysis. This foundation paves the way for future research in related fields. The application effect is shown in Table 10.

5.4. Case Studies on the Effectiveness of the Dataset

To validate the effectiveness and application potential of the Manchu music dataset, this section presents several specific case studies. These studies demonstrate the dataset’s performance in practical applications, including music classification, music generation, and emotion analysis.
In one case study, we used the Manchu music dataset for a music classification task. The goal was to automatically classify different types of Manchu music using a deep learning model, specifically a Convolutional Neural Network (CNN).
The CNN model successfully categorized different types of Manchu music segments into their respective classes. Its performance was particularly notable in the "folk ensemble" category, where it achieved an accuracy of 90%. These results indicate that the rich samples provided by the Manchu music dataset effectively support the automatic classification of music genres, as shown in Table 11.
We employed a Generative Adversarial Network (GAN) for the task of generating new music pieces with characteristics of Manchu music. The generation quality received a score of 7.8 out of 10, and the Fréchet Audio Distance (FAD) value was 0.32. The generated music segments displayed key features of Manchu music, such as the use of specific instruments and melodic styles. Although there were slight differences from real music in certain details, the overall style and structure were consistent with the original music. This indicates that the GAN model can effectively utilize the dataset to generate music segments with ethnic characteristics, demonstrating practical application value.
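For readers unfamiliar with the approach, the following heavily simplified PyTorch skeleton illustrates the adversarial training loop of a GAN over flattened spectrogram patches; all dimensions and data are placeholder assumptions, and real audio GANs are substantially more elaborate than this sketch.

```python
# Highly simplified GAN skeleton for illustration only: a generator and a
# discriminator over flattened spectrogram patches (here 128*64 values).
import torch
import torch.nn as nn

LATENT, PATCH = 100, 128 * 64

generator = nn.Sequential(
    nn.Linear(LATENT, 512), nn.ReLU(),
    nn.Linear(512, PATCH), nn.Tanh(),      # outputs a normalized spectrogram patch
)
discriminator = nn.Sequential(
    nn.Linear(PATCH, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 1),                     # real/fake logit
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real_batch = torch.rand(16, PATCH) * 2 - 1          # placeholder "real" patches
# Discriminator step: real patches labeled 1, generated patches labeled 0.
fake_batch = generator(torch.randn(16, LATENT)).detach()
d_loss = bce(discriminator(real_batch), torch.ones(16, 1)) + \
         bce(discriminator(fake_batch), torch.zeros(16, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()
# Generator step: try to make the discriminator label generated patches as real.
fake_batch = generator(torch.randn(16, LATENT))
g_loss = bce(discriminator(fake_batch), torch.ones(16, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
print(float(d_loss), float(g_loss))
```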
We used a Random Forest model to perform emotion analysis on the Manchu music dataset, aiming to identify the emotional types in music segments, such as “joy”, “sadness”, and “calm”. We trained and tested the model on 3000 audio clips from the dataset. The emotion analysis model successfully identified the primary emotions present in the music segments. The model achieved an accuracy of 87% in recognizing the “joy” emotion and 79% for the “sadness” emotion. These results indicate that the Manchu music dataset is highly effective for emotion analysis tasks, providing valuable insights into the emotional expression in music, as shown in Table 12.
We employed a Transformer model for style transfer in Manchu music, aiming to convert music segments from one style to another. For example, converting music in the “traditional instrument solo” style to the “festive music” style. We trained and tested the model using 2000 audio clips from the Manchu music dataset. The style consistency score was 8.1 out of 10.
The style transfer model successfully transformed music from the traditional instrument solo style to the festive music style. The generated music segments closely resembled the target style in their stylistic expression. The high style consistency score and positive subjective evaluations indicate that the Manchu music dataset holds significant potential for style transfer tasks.

6. Discussion

6.1. Insights Gained from Experiments

The insights gained from the experiments indicate that the Manchu music dataset performs exceptionally well across various AI tasks, particularly in music classification, generation, and emotion analysis. The dataset’s high quality, diversity, and specificity provide a solid foundation for model training and application, while also offering valuable references for future research and optimization.
During the experiments, we found that the high audio quality of the dataset significantly improved the classification and generation performance of the models. The diversity of the dataset also played a crucial role in enhancing the models’ generalization capabilities and robustness. In the music generation tasks, both the Generative Adversarial Network (GAN) and Transformer models showed excellent performance, though further optimization is needed for generating finer details and maintaining style consistency. The Convolutional Neural Network (CNN) demonstrated high accuracy in classification tasks, while the Transformer model outperformed other models in music generation tasks, capturing complex audio features and long-term dependencies.
Compared to large-scale music datasets like the Million Song Dataset, the Manchu music dataset, despite being smaller in scale, offers unique value in the study of ethnic music due to its specific focus. The high audio quality of the dataset enhances model performance in analysis tasks, providing more accurate feature extraction and higher quality analysis results. This finding underscores the importance of high-quality data for model training and analysis outcomes. With high-quality audio data and rich musical samples, researchers can better understand and analyze the characteristics, styles, and emotional expressions of Manchu music, thereby advancing ethnic music research.
Future research can leverage this dataset for more in-depth studies on ethnic music, exploring topics such as the evolution of musical styles and regional differences. The experimental results also reveal the strengths and weaknesses of different models in handling specific tasks, offering directions for future model optimization. For instance, improvements can be made in handling the details of music generation with GAN models, or further enhancing the accuracy of models in emotion analysis. These insights help us better understand the practical application potential of the dataset and provide direction for future research efforts.

6.2. Contributions to the Fields of AI and Ethnomusicology

The construction and application of the Manchu music dataset have had a profound impact on the fields of AI and ethnomusicology. Firstly, the dataset provides a high-quality, specialized music data resource for AI research, particularly suitable for training and testing models in tasks such as music classification, generation, emotion analysis, and style transfer. The high-quality audio data and detailed annotation information help improve the performance of models in practical applications, advancing music generation technologies and emotion analysis techniques, and showcasing the deep application potential of AI technologies in specific domains.
In the field of ethnomusicology, the Manchu music dataset has facilitated the digital preservation and study of Manchu music. The high-quality audio recordings and detailed annotation information provide rich resources for the preservation and research of ethnic music, aiding researchers in gaining a deeper understanding of the styles, characteristics, and emotional expressions of Manchu music. The dataset’s application in music generation and style transfer tasks highlights the innovative potential of Manchu music, promoting the modernization of traditional music and the innovative application of ethnic music. Furthermore, the dataset has played a significant role in ethnic music education and promotion, providing educators and music promotion activities with abundant materials, thereby enhancing public awareness and interest in Manchu music.
By providing high-quality data resources, advancing music generation and emotion analysis technologies, and promoting the digital preservation and study of ethnic music, the Manchu music dataset has made significant contributions to the progress of related fields. Future research can continue to explore the potential of the dataset in more application scenarios, supporting the further development of AI technologies and ethnomusicology.

6.3. Limitations of the Current Dataset and Research

Despite the strong performance of the Manchu music dataset in various tasks, there are certain limitations regarding its scale, audio quality, annotation accuracy, model applicability, and data privacy. These limitations may affect the comprehensive application of the dataset and the generalizability of research results. Specifically, the dataset’s relatively small scale may limit its effectiveness in training large-scale deep learning models, and uneven geographical coverage might lead to an incomplete capture of the overall characteristics of Manchu music. Additionally, variations in audio quality and recording environments could affect the uniformity and consistency of the audio, while subjectivity and accuracy issues in the annotation process may pose challenges to model training and result interpretation.
Regarding research methods, while good results have been achieved in various tasks, some models still face challenges in generating detailed content and maintaining style consistency, especially when handling complex musical features. Data privacy and ethical issues also need to be addressed to ensure that the rights of music creators and performers are protected and that relevant ethical standards are followed.
Future research can address these limitations by expanding the dataset’s scale, improving audio quality, optimizing the annotation process, enhancing model algorithms, and ensuring ethical data usage. These measures will help further increase the dataset’s research value and application effectiveness, providing strong support for the continued development of AI and ethnomusicology.

6.4. Recommendations for Future Improvements and Research Directions

Future improvements and research directions should focus on several aspects to enhance the quality and applicability of the Manchu music dataset. First, expanding the scale and diversity of the dataset is crucial. This includes increasing the number of samples, covering more music genres and regions, and recording rare musical forms to ensure the dataset’s richness and representativeness. Second, improving audio quality and recording conditions is essential. This can be achieved by upgrading recording equipment, standardizing recording environments, and refining recording processes to ensure high-quality and consistent audio data.
In the annotation process, implementing multi-annotation and expert review mechanisms can enhance annotation accuracy and consistency. Optimizing existing models and exploring new technologies are also important. This includes improving music generation models, enhancing classification and emotion analysis performance, and exploring new applications of music style transfer and deep learning in music analysis, thus further improving model performance and adaptability.
Additionally, attention must be paid to data privacy and ethical issues. Ensuring the protection of music copyright and privacy, respecting cultural backgrounds, and establishing and adhering to ethical guidelines for data use are essential for supporting the preservation and transmission of Manchu music. These improvements will help enhance the research value and application effectiveness of the dataset, providing strong support for the further development of AI technologies and ethnomusicology. Through continuous optimization and exploration, we can better leverage the dataset’s potential and advance progress in related fields.

7. Conclusions

In this study, we successfully constructed a specialized Manchu music dataset and evaluated its application across various music-related tasks. Our findings indicate that the dataset’s high quality and diversity significantly enhance model performance. Additionally, the dataset demonstrates positive potential in the areas of ethnic music preservation, cultural heritage, and technological innovation. However, some limitations remain, providing direction for future research and improvements. With further work, we can optimize the dataset’s quality and scale, advancing the fields of AI technology and ethnomusicology.
The construction and application of the Manchu music dataset have had a positive impact on both AI research and cultural preservation. The dataset has advanced music data analysis technologies, supported cross-disciplinary research and applications, and promoted data sharing and openness. It also provides crucial support for the digital preservation, educational promotion, and innovative application of Manchu music, thereby fostering the preservation and transmission of ethnic music.
Future work can build on these achievements by continuing to optimize the dataset, expand its range of applications, and further leverage its potential to contribute more significantly to the fields of AI technology and cultural preservation. The successful construction and application of the Manchu music dataset have not only propelled technological advancement but also provided important support for the preservation and transmission of ethnic music. Future research should continue to focus on expanding the dataset, optimizing technologies, and protecting cultural heritage to maximize the dataset’s potential and have a profound impact on a broader research community. Through cross-disciplinary collaboration and technological innovation, we can better promote scientific research and cultural development, achieving harmonious integration and mutual benefit between technology and the humanities.

Author Contributions

Conceptualization, D.C., C.Z. and W.-S.J.; Methodology, D.C. and N.S.; Software, C.Z. and W.-S.J.; Validation, C.Z., D.C. and N.S.; Investigation, N.S. and W.-S.J.; Resources, J.-H.L.; Data Curation, C.Z. and W.-S.J.; Writing—Original Draft Preparation, D.C.; Writing—Review and Editing, D.C. and N.S.; Visualization, C.Z. and W.-S.J.; Supervision, J.-H.L.; Project Administration, N.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by several key and general research projects from the Jilin Provincial Department of Education and other institutions. The achievements represent the interim research outcomes of the following projects: the key project of social science research, “Research on the Forms and Approaches of University Music Education Resources Serving Cultural Care for the Elderly under New Circumstances” (Project No.: JJKH20210081SK); the general project of social science research, “Research on the Transformation of Vocal Music Teaching Models in Universities in the ‘Digital’ Era—A Case Study of Vocal Music Courses in Jilin Province Universities” (Project No.: JJKH20230091SK); the educational and teaching reform project of Beihua University, “Research and Practice on the Teaching of the ‘Vocal Music’ Course Based on a Blended Teaching Model” (Project No.: XJZD2021035); the Beihua University postgraduate education and teaching reform research project, “Research on the application of artificial intelligence technology in improving the evaluation of music teaching in colleges and universities” (Project No.: JG [2024] 014); the Jilin Province Social Science Foundation project supporting doctors and young scholars, “Research on Digital Empowerment for the Development of Cultural Tourism and the Inheritance and Protection of Cultural Heritage in Jilin Province” (Project No.: 2024C47); the Beihua University Education and Teaching Reform Research Project, “Research on Innovation of College Music Education in the Multimedia Era” (Project No.: XJYB20220018); the Jilin Province Education Science “14th Five-Year Plan” Project, “Research on the Training Model of Interdisciplinary Innovative Talents in Higher Education in the Era of Artificial Intelligence” (Project No.: GH24443); and the Ministry of Education’s Industry-University Collaborative Education Project, “Research on the Transformation of the ‘Hybrid’ Teaching Model of Vocal Courses in Local Undergraduate Colleges” (Project No.: 231107632021014).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in the article.

Acknowledgments

The authors thank the co-author C.Z. for his great help and the supervisor J.-H.L. for his careful guidance.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lin, Q.; Lian, Z. On Protection of Intangible Cultural Heritage in China from the Intellectual Property Rights Perspective. Sustainability 2018, 10, 4369. [Google Scholar] [CrossRef]
  2. Liu, Y. Application of Digital Technology in Intangible Cultural Heritage Protection. Mob. Inf. Syst. 2022, 2022, 7471121. [Google Scholar] [CrossRef]
  3. Isa, W.M.W.; Zin, N.A.M.; Rosdi, F.; Sarim, H.M. Digital Preservation of Intangible Cultural Heritage. Indones. J. Electr. Eng. Comput. Sci. 2018, 12, 1373–1379. [Google Scholar] [CrossRef]
  4. Zhang, J. Traditional music protection from the perspective of intangible cultural heritage. Learn. Educ. 2021, 9, 107–108. [Google Scholar] [CrossRef]
  5. Kang, L. National Music Promotion and Inheritance Strategies Based on the Perspective of Intangible Cultural Heritage. Arts Stud. Crit. 2021, 2, 197–200. [Google Scholar] [CrossRef]
  6. Gao, Y. Research on Regional Characteristics and Cultural Value in the Inheritance of Intangible Cultural Heritage Music. J. Educ. Educ. Res. 2023, 6, 169–171. [Google Scholar] [CrossRef]
  7. Zhou, Y. Relevant Conceptions on the Inheritance and Protection of Manchu Music in Liaoning Province. In Proceedings of the 2017 International Conference on Art Studies: Science, Experience, Education (ICASSEE 2017), Moscow, Russia, 9–11 November 2017; Atlantis Press: Dordrecht, The Netherlands, 2017. [Google Scholar] [CrossRef]
  8. Xiaojiao, F. Musical Scholarship of the “Golden Age” of the Qing Dynasty Based on 17th and 18th Century Books and Treatises. Russ. Music. 2024, 3, 101–108. [Google Scholar] [CrossRef]
  9. Tian, Y.; Meng, M.; Mei, L.; Dong, S. Research on Development Mechanism and Strategy of Folk Art Industry in Northeast of China. In Proceedings of the International Conference on Environmental and Engineering Management (EEM 2021), Changsha, China, 23–25 April 2021; Volume 253, p. 02064. [Google Scholar] [CrossRef]
  10. Wen, H.-Q. National Belonging Needs and Natural Historical Cultural Analysis of Chinese Manchu. In Proceedings of the 3rd Annual International Conference on Management, Economics and Social Development (ICMESD 17), Guangzhou, China, 26–28 May 2017. [Google Scholar] [CrossRef]
  11. Chen, S.; Chen, X.; Lu, Z.; Huang, Y. “My Culture, My People, My Hometown”: Chinese Ethnic Minorities Seeking Cultural Sustainability by Video Blogging. Proc. ACM Hum.-Comput. Interact. 2023, 7, 76. [Google Scholar] [CrossRef]
  12. Dai, J.; Wang, K.; Sun, Y. Analysis on the Inheritance and Development of Manchu Intangible Cultural Heritage in Changbai Mountain by the Creation of Animation Short Films in the New Media Era. In Proceedings of the 7th International Conference on Education, Language, Art and Intercultural Communication (ICELAIC 2020), Moscow, Russia, 8–9 December 2020; Atlantis Press: Dordrecht, The Netherlands, 2020; pp. 376–379. [Google Scholar]
  13. Exploring the Ethnic Groups of China. Available online: https://www.cusef.org.hk/en/cusef-blog/exploring-the-ethnic-groups-of-china (accessed on 20 February 2023).
  14. Miranda, E.R. (Ed.) Handbook of Artificial Intelligence for Music; Springer: Cham, Switzerland, 2021. [Google Scholar] [CrossRef]
  15. Li, P.-P.; Wang, B. Artificial Intelligence in Music Education. Int. J. Hum.–Comput. Interact. 2023, 40, 4183–4192. [Google Scholar] [CrossRef]
  16. Civit, M.; Civit-Masot, J.; Cuadrado, F.; Escalona, M.J. A systematic review of artificial intelligence-based music generation: Scope, applications, and future trends. Expert Syst. Appl. 2022, 209, 118190. [Google Scholar] [CrossRef]
  17. Kaliakatsos-Papakostas, M.; Floros, A.; Vrahatis, M.N. Artificial intelligence methods for music generation: A review and future perspectives. In Nature-Inspired Computation and Swarm Intelligence; Academic Press: Cambridge, MA, USA, 2020; pp. 217–245. [Google Scholar] [CrossRef]
  18. Chen, M. Analysis on Industrialization Development Path of Intangible Cultural Heritages of Jilin Province. In the Proceedings of the 3rd International Conference on Culture, Education and Economic Development of Modern Society (ICCESE 2019), Moscow, Russia, 1–3 March 2019; Atlantis Press: Dordrecht, The Netherlands, 2019. [Google Scholar]
  19. Tzanetakis, G.; Cook, P. Musical Genre Classification of Audio Signals. IEEE Trans. Speech Audio Process. 2002, 10, 293. Available online: https://github.com/Manishankar9977/Music-genre-classification (accessed on 1 November 2024). [CrossRef]
  20. Zheng, B. Experimenting with the National Language: Use of Manchu in Bannermen Poetry and Songs in the Nineteenth Century. CHINOPERL: J. Chin. Oral Perform. Lit. 2020, 39, 90–110. [Google Scholar] [CrossRef]
  21. Chiu, E.S.-Y. Bannermen Tales (Zidishu): Manchu Storytelling and Cultural Hybridity in the Qing Dynasty; Brill: Leiden, The Netherlands, 2018; Volume 105. [Google Scholar] [CrossRef]
  22. Huang, R.S.; Holzapfel, A.; Sturm, B.L.T.; Kaila, A.-K. Beyond Diverse Datasets: Responsible MIR, Interdisciplinarity, and the Fractured Worlds of Music. Trans. Int. Soc. Music. Inf. Retr. 2023, 6, 43–59. [Google Scholar] [CrossRef]
  23. Vatolkin, I.; Ginsel, P.; Rudolph, G. Advancements in the Music Information Retrieval Framework AMUSE over the Last Decade. In SIGIR ’21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Montréal, Canada, 11–15 July 2021. [CrossRef]
  24. Fessahaye, F.; Perez, L.; Zhan, T.; Zhang, R.; Fossier, C.; Markarian, R.; Chiu, C.; Zhan, J.; Gewali, L.; Oh, P. T-RECSYS: A Novel Music Recommendation System Using Deep Learning. In Proceedings of the 2019 IEEE International Conference on Consumer Electronics (ICCE), Berlin, Germany, 8–11 September 2019; IEEE: New Jersey, NJ, USA, 2019; pp. 1–6. [Google Scholar]
  25. Wang, Z.; Chen, K.; Jiang, J.; Zhang, Y.; Xu, M.; Dai, S.; Gu, X.; Xia, G. Pop909: A pop-song dataset for music arrangement generation. arXiv 2020. [Google Scholar] [CrossRef]
  26. Aljanaki, A.; Yang, Y.H.; Soleymani, M. Developing a benchmark for emotional analysis of music. PLoS ONE 2017, 12, e0173392. [Google Scholar] [CrossRef]
  27. Shahriar, S. GAN computers generate arts? A survey on visual arts, music, and literary text generation using generative adversarial network. Displays 2022, 73, 102237. [Google Scholar] [CrossRef]
  28. Choi, K.; Fazekas, G.; Sandler, M.; Cho, K. Convolutional recurrent neural networks for music classification. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; IEEE: New Jersey, NJ, USA, 2017; pp. 2392–2396. [Google Scholar] [CrossRef]
  29. Huang, A.; Wu, R. Deep learning for music. arXiv 2016. [Google Scholar] [CrossRef]
  30. Briot, J.P.; Hadjeres, G.; Pachet, F.D. Deep learning techniques for music generation—A survey. arXiv 2017. [Google Scholar] [CrossRef]
  31. Hernandez-Olivan, C.; Beltran, J.R. Music composition with deep learning: A review. In Advances in Speech and Music Technology: Computational Aspects and Applications; Springer: Cham, Switzerland, 2022; pp. 25–50. [Google Scholar] [CrossRef]
  32. Zhang, C.; Wang, S. Research on Cultural Confidence Leading the Inheritance and Development of Liaoning Manchu Folk Dance. Int. J. New Dev. Educ. 2023, 5, 117–121. [Google Scholar] [CrossRef]
  33. Bertin-Mahieux, T.; Ellis, D.P.W.; Whitman, B.; Lamere, P. The Million Song Dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR), Miami, FL, USA, 24–28 October 2011; Available online: http://millionsongdataset.com (accessed on 1 November 2024).
  34. Law, E.; Von Ahn, L. Input-Agreement: A New Mechanism for Collecting Data Using Human Computation Games. In Proceedings of the 27th International Conference on Human Factors in Computing Systems (CHI), Boston, MA, USA, 4–9 April 2009; Available online: http://mirg.city.ac.uk/codeapps/the-magnatagatune-dataset (accessed on 1 November 2024).
  35. Engel, J.; Agrawal, K.; Chen, S.; Gulrajani, I.; Donahue, C.; Roberts, A. Nsynth: A Large-Scale Dataset of Annotated Musical Notes. In Proceedings of the 2017 Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; Available online: https://magenta.tensorflow.org/datasets/nsynth (accessed on 1 November 2024).
  36. Defferrard, M.; Benzi, K.; Vandergheynst, P.; Bresson, X. FMA: A Dataset for Music Analysis. In Proceedings of the 18th Inter-national Society for Music Information Retrieval Conference (ISMIR), Suzhou, China, 23–27 October 2017; Available online: https://freemusicarchive.org/static (accessed on 1 November 2024).
  37. Schedl, M.; Gómez, E.; Urbano, J. Music Information Retrieval: Recent Developments and Applications. Found. Trends® Inf. Retr. 2014, 8, 127–261. [Google Scholar] [CrossRef]
  38. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  39. Gunawan, A.A.S.; Iman, A.P.; Suhartono, D. Automatic Music Generator Using Recurrent Neural Network. Int. J. Comput. Intell. Syst. 2020, 13, 645–654. [Google Scholar] [CrossRef]
  40. Sturm, B.L.; Ben-Tal, O.; Monaghan, Ú.; Collins, N.; Herremans, D.; Chew, E.; Hadjeres, G.; Deruty, E.; Pachet, F. Machine learning research that matters for music creation: A case study. J. New Music. Res. 2019, 48, 36–55. [Google Scholar] [CrossRef]
  41. Bahuleyan, H. Music genre classification using machine learning techniques. arXiv 2018. [Google Scholar] [CrossRef]
  42. Ndou, N.; Ajoodha, R.; Jadhav, A. Music Genre Classification: A Review of Deep-Learning and Traditional Machine-Learning Approaches. In Proceedings of the 2021 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS), Toronto, ON, Canada, 21–24 April 2021; IEEE: New Jersey, NJ, USA, 2021; p. 1. [Google Scholar]
  43. Tang, K. Singing a Chinese Nation: Heritage Preservation, the Yuanshengtai Movement, and New Trends in Chinese Folk Music in the Twenty-First Century. Ethnomusicology 2021, 65, 1–31. [Google Scholar] [CrossRef]
  44. Fan, J.; Yang, Y.-H.; Dong, K.; Pasquier, P.A. Comparative Study of Western and Chinese Classical Music Based on Soundscape Models. In Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; IEEE: New Jersey, NJ, USA, 2020; pp. 521–525. [Google Scholar]
  45. Jiang, F.; Zhang, L.; Wang, K.; Deng, X.; Yang, W. BoYaTCN: Research on Music Generation of Traditional Chinese Pentatonic Scale Based on Bidirectional Octave Your Attention Temporal Convolutional Network. Appl. Sci. 2022, 12, 9309. [Google Scholar] [CrossRef]
  46. Li, J.; Luo, J.; Ding, J.; Zhao, X.; Yang, X. Regional classification of Chinese folk songs based on CRF model. Multimed. Tools Appl. 2018, 78, 11563–11584. [Google Scholar] [CrossRef]
  47. Luo, J.; Yang, X.; Ji, S.; Li, J. MG-VAE: Deep Chinese folk songs generation with specific regional styles. In Proceedings of the 7th Conference on Sound and Music Technology (CSMT): Revised Selected Papers, Harbin, China, 26–29 December 2019; Springer: Singapore, 2020; pp. 93–106. [Google Scholar] [CrossRef]
  48. Chen, Q.; Zhao, W.; Wang, Q.; Zhao, Y. The Sustainable Development of Intangible Cultural Heritage with AI: Cantonese Opera Singing Genre Classification Based on CoGCNet Model in China. Sustainability 2022, 14, 2923. [Google Scholar] [CrossRef]
  49. Li, Q.; Hu, B. Joint Time and Frequency Transformer for Chinese Opera Classification. In Proceedings of the INTERSPEECH 2023, Dublin, Ireland, 20–24 August 2023; pp. 3919–3923. [Google Scholar] [CrossRef]
  50. Roche, F.; Hueber, T.; Limier, S.; Girin, L. Autoencoders for music sound modeling: A comparison of linear, shallow, deep, recurrent and variational models. arXiv 2018. [Google Scholar] [CrossRef]
  51. Grekow, J.; Dimitrova-Grekow, T. Monophonic Music Generation with a Given Emotion Using Conditional Variational Autoencoder. IEEE Access 2021, 9, 129088–129101. [Google Scholar] [CrossRef]
  52. Pei, X. Analysis of Manchu Shaman Music Cultural Form from the Perspective of Art. In Proceedings of the 2019 5th International Conference on Economics, Management and Humanities Science (ECOMHS 2019), Bangkok, Thailand, 16–17 March 2019. [Google Scholar] [CrossRef]
  53. Howard, K. Sacred and profane: Music in Korean shaman rituals. In Indigenous Religious Musics; Routledge: Milton, UK, 2017; pp. 56–83. [Google Scholar] [CrossRef]
  54. Seo, M.K. Hanyang Kut: Korean Shaman Ritual Music from Seoul; Routledge: Milton, UK, 2020; ISBN 13:9780367252717. [Google Scholar]
  55. Howard, K. Shamanism, Music, and the Soul Train. In Music as Medicine; Routledge: London, UK, 2017; pp. 353–374. [Google Scholar] [CrossRef]
  56. Li, X. A General History of Chinese Art; Walter de Gruyter GmbH & Co KG: Berlin, Germany, 2022. [Google Scholar] [CrossRef]
  57. The Musical Bridge–China: Perennial Music and Arts, “Intro to Chinese Music”. Available online: https://www.perennialmusicandarts.com/post/intro-to-chinese-music (accessed on 6 March 2019).
  58. Dai, J.; Zhang, L. Manchu Intangible Cultural Heritage Protection Research Based on Digital Multimedia Technology. In Proceedings of the International Conference on Education, Language, Art and Intercultural Communication (ICELAIC-14), Zhengzhou, China, 5–7 May 2014; Atlantis Press: Dordrecht, The Netherlands, 2014. [Google Scholar]
  59. Zhang, C. Research on the Technology of Virtual Reality Empowering Manchu Dance Cultural Communication. Acad. J. Humanit. Soc. Sci. 2024, 7, 254–259. [Google Scholar] [CrossRef]
  60. Fu, Z.; Lu, G.; Ting, K.M.; Zhang, D. A Survey of Audio-Based Music Classification and Annotation. IEEE Trans. Multimed. 2010, 13, 303–319. [Google Scholar] [CrossRef]
  61. Han, Y.; Kim, J.; Lee, K. Deep Convolutional Neural Networks for Predominant Instrument Recognition in Polyphonic Music. IEEE/ACM Trans. Audio Speech Lang. Process. 2016, 25, 208–221. [Google Scholar] [CrossRef]
  62. Dieleman, S.; Schrauwen, B. End-to-end learning for music audio. In Proceedings of the ICASSP 2014–2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014; IEEE: New Jersey, NJ, USA, 2014; pp. 6964–6968. [Google Scholar]
  63. Sturm, B.L.T.; Iglesias, M.; Ben-Tal, O.; Miron, M.; Gómez, E. Artificial Intelligence and Music: Open Questions of Copyright Law and Engineering Praxis. Arts 2019, 8, 115. [Google Scholar] [CrossRef]
  64. Lyons, F.; Sun, H.; Collopy, D.P.; Curran, K.; Ohagan, P. Music 2025—The Music Data Dilemma: Issues Facing the Music Industry in Improving Data Management. Intellectual Property Office Research Paper. Published 18 June 2019. Available online: http://hdl.handle.net/2299/21408 (accessed on 6 March 2024).
Figure 1. Population proportion and distribution of the Manchu ethnic group in China.
Figure 2. Examples of Chinese traditional folk musical instruments.
Figure 3. Manchu musical instruments and performance scenes.
Figure 4. Data collection framework diagram.
Figure 5. Audio processing and feature extraction framework diagram.
Figure 6. Music information retrieval algorithm framework.
Figure 7. Music generation and synthesis algorithm framework diagram.
Figure 8. Music classification and sentiment analysis network framework diagram.
Table 1. Data processing task list.
Category | Operation | Method
Audio data processing | Audio cleaning | Noise removal; Volume standardization
Audio data processing | Audio segmentation | Automatic segmentation; Manual correction
Audio data processing | Feature extraction | Spectrum analysis; Time-domain features; High-level features
Audio data processing | Format conversion | Uniform format; Multiple sampling rates
Video data processing | Video editing | Automatic editing; Manual editing
Video data processing | Video enhancement | Image optimization; Frame rate adjustment
Video data processing | Audio and video synchronization | Automatic synchronization; Manual proofreading
Metadata processing | Standardization | Metadata format; Field definition
Metadata processing | Automatic labeling | Preliminary annotation; Annotation tool
Metadata processing | Manual review | Expert review; Consistency check
Data storage and management | Database design | Relational database; File storage system
Data storage and management | Data backup | Multiple backups; Off-site backup
Data storage and management | Data access | API interface; User permission management
Data processing challenges and solutions | Large data volume | High cost; Improve data processing efficiency and storage capacity
Data processing challenges and solutions | Varied data quality | Diverse data sources; Formulate strict data processing standards
Data processing challenges and solutions | Complex processing | Technical complexity; Set up interdisciplinary teams and collaborate
Table 2. Dataset structure storage example and description.
No. | Hierarchy | Example | Description
1 | Top-level directory | Mongol_Music_Dataset | The top-level directory of the dataset contains all Manchu music data files and related resources.
2 | Category directory | Traditional_Songs/; Instrumental_Music/; Ceremonial_Music/; Folk_Dances/ | Classification is based on music type, instrument type, performance form, etc.
3 | Data file directory | Traditional_Songs/: Audio_Files/; Video_Files/; Metadata/ | Under each category directory, data are further divided into audio files, video files, and metadata files according to the data type.
4 | File naming rules | Audio: TS_Audio_SingerName_Instrument_YYYYMMDD.wav; Video: TS_Video_SingerName_Instrument_YYYYMMDD.mp4; Metadata: TS_Metadata_SingerName_Instrument_YYYYMMDD.json | A unified file naming rule is adopted to facilitate file management and retrieval. The naming rule includes information such as data type, performer, instrument, and recording date.
Table 3. Manchu music dataset key characteristics.
Feature | Description
Music Types | Folk Songs, Dance Music, Ceremonial Music
Number of Tracks | 500+
Average Duration | 3–7 min
Instrumentation | Bili, Sheng, Erhu, Pipa, etc.
Recording Locations | Various regions in Northeast China
Cultural Context | Rituals, Festivals, Daily Life
Table 4. Basic information table of the Manchu music dataset.
No. | Name | Recordings | Hours | Diversity in Music Types
1 | Traditional instrumental solos | 1000 | 100 | The dataset includes recordings of various traditional Manchu instruments, such as the bili, sheng, and morin khuur, with a balanced representation of each instrument type, ensuring diversity in instrumental types.
2 | Folk ensembles | 1200 | 120 | This category includes different forms of ensembles, such as those performed at weddings, celebrations, and gatherings, showcasing the richness of Manchu music.
3 | Folk songs | 800 | 80 | This includes various Manchu folk songs, such as work songs, love songs, and ritual songs, illustrating the diversity of Manchu vocal art.
4 | Festival music | 1000 | 100 | Recordings from traditional Manchu festivals (e.g., Spring Festival, Dragon Boat Festival, Nadam Fair) highlight the importance of music in festival activities.
5 | Religious ritual music | 1000 | 100 | This includes music from Manchu religious rituals, such as shaman dances and ancestral worship ceremonies, highlighting the unique style of Manchu music in religious contexts.
Total number of recordings | 5000 | 500 | The diversity of the Manchu music dataset is reflected in various aspects, including music types, performance styles, and regional distribution.
Table 5. Music classification performance comparison of different models.
Model | A | P | R | F1
CNN | 85.7% | 82.3% | 84.1% | 83.2%
RNN | 80.5% | 78.0% | 79.8% | 78.9%
SVM | 75.2% | 72.0% | 73.5% | 72.7%
Table 6. Comparison of Manchu music generation performance of different models.
Model | Generation Quality Score (out of 10) | Fréchet Audio Distance (FAD)
GAN | 7.8 | 0.32
VAE | 7.2 | 0.38
Transformer | 8.1 | 0.29
Table 7. Comparison of experimental results on feature extraction and regression task performance.
Method | MSE | RMSE | MAE
MFCC | 0.025 | 0.158 | 0.120
LPC | 0.030 | 0.173 | 0.125
STFT | 0.028 | 0.167 | 0.122
Table 8. Comparison of experimental results on sentiment analysis performance.
Method | A | P | R | F1
Random Forest | 83.4% | 81.0% | 82.5% | 81.7%
k-NN | 78.9% | 76.5% | 77.8% | 77.1%
Table 9. Multi-dimensional comparison results of datasets.
Dataset | Performances | Duration (Hours) | Quality | Format | Frequency | Music Type | Performing Style | Regional Distribution
Manchu music dataset | 5000 | 500 | High | WAV | 44.10 kHz | 5 types | Diverse | China
Million Song Dataset | 1,000,000 | 11,000 | Low | MP3 | 22.05 kHz | Various types | Extensive | Global
GTZAN Genre Collection | 1000 | 100 | Medium | MP3 | 22.05 kHz | 10 types | Fixed | Unknown
MAESTRO Dataset | 1200 | 200 | High | WAV | 44.10 kHz | 1 type | Classical | Western world
Table 10. Comparison results of application effects on different datasets.
Dataset | Application Effect
Manchu music dataset | It performs well in tasks such as music classification, generation, and sentiment analysis, and can provide in-depth analysis and understanding of the characteristics of Manchu music.
Million Song Dataset | It is widely used in research on popular music classification, recommendation systems, etc., but its support for ethnic music is weak.
GTZAN Genre Collection | It is often used in music genre classification research, but the small scale of the dataset and low sound quality limit its application scope.
MAESTRO Dataset | It performs well in the generation and analysis of classical music, but its support for other types of music is limited.
Table 11. Classification results of the test dataset under the CNN model.
Classification | A | P | R | F1
CNN (overall) | 85.7% | 82.3% | 84.1% | 83.2%
Traditional instrument solo | 86.9% | 83.4% | 83.2% | 83.3%
Folk ensemble | 90.0% | 85.9% | 80.5% | 83.1%
Folk song singing | 83.6% | 80.3% | 86.7% | 83.5%
Festival music | 82.3% | 80.1% | 85.2% | 82.7%
Religious ceremony music | 85.7% | 81.8% | 84.9% | 83.4%
Table 12. Emotional type results of the test dataset under the Random Forest model.
Emotional Type | A | P | R | F1
Random Forest (overall) | 83.4% | 81.0% | 82.5% | 81.7%
Joy | 87.0% | 83.4% | 80.2% | 81.8%
Sadness | 79.0% | 78.4% | 86.5% | 82.3%
Calm | 84.2% | 81.2% | 80.8% | 81.0%