Sound and Music Computing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Acoustics and Vibrations".

Deadline for manuscript submissions: closed (3 November 2017) | Viewed by 200822

Printed Edition Available!
A printed edition of this Special Issue is available.

Special Issue Editors


Prof. Dr. Tapio Lokki
Guest Editor
Department of Computer Science, Aalto University, Espoo 02150, Finland
Interests: virtual acoustics; spatial sound; psychoacoustics

Prof. Dr. Stefania Serafin
Co-Guest Editor
Multisensory Experience Lab, Department of Architecture, Design and Media Technology, Aalborg University, 2450 Copenhagen SV, Denmark
Interests: sonic interaction design; sound for virtual and augmented reality; audio-haptic interaction; sound synthesis by physical models; multimodal interfaces; multimodal perception and cognition; virtual and augmented reality

Prof. Dr. Meinard Müller
Co-Guest Editor
International Audio Laboratories Erlangen, Friedrich-Alexander Universität Erlangen-Nürnberg, 91058 Erlangen, Germany
Interests: music information retrieval; music processing; audio signal processing

Prof. Dr. Vesa Välimäki
Co-Guest Editor
Department of Signal Processing and Acoustics, School of Electrical Engineering, Aalto University, P.O. Box 13000 FI-00076 Aalto, Espoo, Finland
Interests: acoustic signal processing; audio signal processing; audio systems; music technology

Special Issue Information

Dear Colleagues,

Sound and music computing is a young and highly multidisciplinary research field. It combines scientific, technological, and artistic methods to produce, model, and understand audio and sonic arts with the help of computers. Sound and music computing borrows methods, for example, from computer science, electrical engineering, mathematics, musicology, and psychology.

In this Special Issue, we want to address recent advances in the following topics:

• Analysis, synthesis, and modification of sound
• Automatic composition, accompaniment, and improvisation
• Computational musicology and mathematical music theory
• Computer-based music analysis
• Computer music languages and software
• High-performance computing for audio
• Interactive performance systems and new interfaces
• Multi-modal perception and emotion
• Music information retrieval
• Music games and educational tools
• Music performance analysis and rendering
• Robotics and music
• Room acoustics modeling and auralization
• Social interaction in sound and music computing
• Sonic interaction design
• Sonification
• Soundscapes and environmental arts
• Spatial sound
• Virtual reality applications and technologies for sound and music

Submissions are invited for both original research and review articles. Additionally, invited papers based on excellent contributions to recent conferences in this field will be included in this Special Issue, for example, from the 2017 Sound and Music Computing Conference (SMC-17). We hope that this collection of papers will serve as an inspiration for those interested in sound and music computing.

Prof. Dr. Tapio Lokki,
Prof. Dr. Stefania Serafin,
Prof. Dr. Meinard Müller,
Prof. Dr. Vesa Välimäki
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

• audio signal processing
• computer interfaces
• computer music
• multimedia
• music cognition
• music control and performance
• music information retrieval
• music technology
• sonic interaction design
• virtual reality

Published Papers (30 papers)


Editorial


5 pages, 197 KiB  
Editorial
Special Issue on “Sound and Music Computing”
by Tapio Lokki, Meinard Müller, Stefania Serafin and Vesa Välimäki
Appl. Sci. 2018, 8(4), 518; https://doi.org/10.3390/app8040518 - 28 Mar 2018
Viewed by 3505
Abstract
Sound and music computing is a young and highly multidisciplinary research field. [...]
Full article
(This article belongs to the Special Issue Sound and Music Computing)

Research


15 pages, 40278 KiB  
Article
Stay True to the Sound of History: Philology, Phylogenetics and Information Engineering in Musicology
by Sebastiano Verde, Niccolò Pretto, Simone Milani and Sergio Canazza
Appl. Sci. 2018, 8(2), 226; https://doi.org/10.3390/app8020226 - 01 Feb 2018
Cited by 16 | Viewed by 4646
Abstract
This work investigates computational musicology for the study of tape music works, tackling problems concerning stemmatics. These philological problems have been analyzed with an innovative approach that considers the peculiarities of audio tape recordings. The paper presents a phylogenetic reconstruction strategy that relies on digitizing the analyzed tapes and then converting each audio track into a two-dimensional spectrogram. This conversion makes it possible to adopt a set of computer vision tools to align and equalize different tracks in order to infer the most likely transformation that converts one track into another. In the presented approach, the phylogenetic analysis estimates the main editing techniques, intentional and unintentional alterations, and the different configurations of the tape recorder. The proposed solution shows satisfactory robustness to the adoption of a wrong reading setup, together with good reconstruction accuracy of the phylogenetic tree. The reconstructed dependencies proved to be correct or plausible in 90% of the experimental cases. Full article
(This article belongs to the Special Issue Sound and Music Computing)

14 pages, 14998 KiB  
Article
SampleCNN: End-to-End Deep Convolutional Neural Networks Using Very Small Filters for Music Classification
by Jongpil Lee, Jiyoung Park, Keunhyoung Luke Kim and Juhan Nam
Appl. Sci. 2018, 8(1), 150; https://doi.org/10.3390/app8010150 - 22 Jan 2018
Cited by 101 | Viewed by 14042
Abstract
Convolutional Neural Networks (CNN) have been applied to diverse machine learning tasks for different modalities of raw data in an end-to-end fashion. In the audio domain, a raw waveform-based approach has been explored to directly learn hierarchical characteristics of audio. However, the majority of previous studies have limited their model capacity by taking a frame-level structure similar to short-time Fourier transforms. We previously proposed a CNN architecture which learns representations using sample-level filters beyond typical frame-level input representations. The architecture showed comparable performance to the spectrogram-based CNN model in music auto-tagging. In this paper, we extend the previous work in three ways. First, considering the sample-level model requires much longer training time, we progressively downsample the input signals and examine how it affects the performance. Second, we extend the model using multi-level and multi-scale feature aggregation technique and subsequently conduct transfer learning for several music classification tasks. Finally, we visualize filters learned by the sample-level CNN in each layer to identify hierarchically learned features and show that they are sensitive to log-scaled frequency. Full article
(This article belongs to the Special Issue Sound and Music Computing)
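
To make the sample-level idea described in the abstract above concrete, here is a minimal sketch (not the authors' implementation) of a raw-waveform classifier built from length-3, stride-3 one-dimensional convolutions, written with PyTorch; the layer count, channel width, input length of 59,049 samples, and number of output classes are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class SampleLevelCNN(nn.Module):
    """Raw-waveform classifier built from very small (length-3) 1-D filters."""

    def __init__(self, n_blocks=9, channels=128, n_classes=50):
        super().__init__()
        # Front end: length-3, stride-3 convolution applied directly to raw samples.
        layers = [nn.Conv1d(1, channels, kernel_size=3, stride=3),
                  nn.BatchNorm1d(channels), nn.ReLU()]
        # Sample-level blocks: length-3 convolutions followed by 3x max-pooling.
        for _ in range(n_blocks - 1):
            layers += [nn.Conv1d(channels, channels, kernel_size=3, padding=1),
                       nn.BatchNorm1d(channels), nn.ReLU(),
                       nn.MaxPool1d(3)]
        self.features = nn.Sequential(*layers)
        self.head = nn.Linear(channels, n_classes)

    def forward(self, waveform):              # waveform: (batch, 1, n_samples)
        h = self.features(waveform)
        h = torch.mean(h, dim=-1)             # global temporal pooling
        return self.head(h)                   # tag/class logits

if __name__ == "__main__":
    model = SampleLevelCNN()
    x = torch.randn(2, 1, 3 ** 10)            # 59,049 raw samples per clip (illustrative length)
    print(model(x).shape)                     # torch.Size([2, 50])
```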

21 pages, 2858 KiB  
Article
Analyzing Free-Hand Sound-Tracings of Melodic Phrases
by Tejaswinee Kelkar and Alexander Refsum Jensenius
Appl. Sci. 2018, 8(1), 135; https://doi.org/10.3390/app8010135 - 18 Jan 2018
Cited by 18 | Viewed by 8790
Abstract
In this paper, we report on a free-hand motion capture study in which 32 participants ‘traced’ 16 melodic vocal phrases with their hands in the air in two experimental conditions. Melodic contours are often thought of as correlated with vertical movement (up and down) in time, and this was also our initial expectation. We did find an arch shape for most of the tracings, although this did not correspond directly to the melodic contours. Furthermore, representation of pitch in the vertical dimension was but one of a diverse range of movement strategies used to trace the melodies. Six different mapping strategies were observed, and these strategies have been quantified and statistically tested. The conclusion is that metaphorical representation is much more common than a ‘graph-like’ rendering for such a melodic sound-tracing task. Other findings include a clear gender difference for some of the tracing strategies and an unexpected representation of melodies in terms of a small object for some of the Hindustani music examples. The data also show a tendency of participants moving within a shared ‘social box’. Full article
(This article belongs to the Special Issue Sound and Music Computing)

15 pages, 17748 KiB  
Article
Desert and Sonic Ecosystems: Incorporating Environmental Factors within Site-Responsive Sonic Art
by Lauren Hayes and Julian Stein
Appl. Sci. 2018, 8(1), 111; https://doi.org/10.3390/app8010111 - 14 Jan 2018
Cited by 6 | Viewed by 6211
Abstract
Advancements in embedded computer platforms have allowed data to be collected and shared between objects—or smart devices—in a network. While this has resulted in highly functional outcomes in fields such as automation and monitoring, there are also implications for artistic and expressive systems. In this paper we present a pluralistic approach to incorporating environmental factors within the field of site-responsive sonic art using embedded audio and data processing techniques. In particular, we focus on the role of such systems within an ecosystemic framework, both in terms of incorporating systems of living organisms, as well as sonic interaction design. We describe the implementation of such a system within a large-scale site-responsive sonic art installation that took place in the subtropical desert climate of Arizona in 2017. Full article
(This article belongs to the Special Issue Sound and Music Computing)

19 pages, 6703 KiB  
Article
Application of Machine Learning for the Spatial Analysis of Binaural Room Impulse Responses
by Michael Lovedee-Turner and Damian Murphy
Appl. Sci. 2018, 8(1), 105; https://doi.org/10.3390/app8010105 - 12 Jan 2018
Cited by 10 | Viewed by 5814
Abstract
Spatial impulse response analysis techniques are commonly used in the field of acoustics, as they help to characterise the interaction of sound with an enclosed environment. This paper presents a novel approach for spatial analyses of binaural impulse responses, using a binaural model fronted neural network. The proposed method uses binaural cues utilised by the human auditory system, which are mapped by the neural network to the azimuth direction of arrival classes. A cascade-correlation neural network was trained using a multi-conditional training dataset of head-related impulse responses with added noise. The neural network is tested using a set of binaural impulse responses captured using two dummy head microphones in an anechoic chamber, with a reflective boundary positioned to produce a reflection with a known direction of arrival. Results showed that the neural network was generalisable for the direct sound of the binaural room impulse responses for both dummy head microphones. However, it was found to be less accurate at predicting the direction of arrival of the reflections. The work indicates the potential of using such an algorithm for the spatial analysis of binaural impulse responses, while indicating where the method applied needs to be made more robust for more general application. Full article
(This article belongs to the Special Issue Sound and Music Computing)
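
The paper's front end feeds binaural cues to a cascade-correlation network; as a hedged illustration of what such cues look like, the sketch below estimates the interaural time difference (via the cross-correlation peak) and the interaural level difference (via an energy ratio) from one binaural segment using NumPy. The function name and synthetic test signal are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def interaural_cues(left, right, fs, max_itd_s=1e-3):
    """Estimate ITD (s) and ILD (dB) from one windowed binaural segment.

    Positive ITD/ILD here mean the left ear leads / is louder, i.e. the source
    is towards the left.
    """
    eps = 1e-12
    ild_db = 10.0 * np.log10((np.sum(left ** 2) + eps) / (np.sum(right ** 2) + eps))

    # ITD from the cross-correlation peak, restricted to physically plausible lags.
    corr = np.correlate(right, left, mode="full")
    lags = np.arange(-len(left) + 1, len(right))
    valid = np.abs(lags) <= int(round(max_itd_s * fs))
    itd_s = lags[valid][np.argmax(corr[valid])] / fs
    return itd_s, ild_db

if __name__ == "__main__":
    fs = 48000
    noise = np.random.default_rng(0).standard_normal(fs // 10)
    delay = 12                                               # 0.25 ms interaural delay
    left = np.concatenate([noise, np.zeros(delay)])          # left ear: direct
    right = 0.7 * np.concatenate([np.zeros(delay), noise])   # right ear: delayed and quieter
    itd, ild = interaural_cues(left, right, fs)
    print(f"ITD = {itd * 1e3:.2f} ms, ILD = {ild:.1f} dB")
```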

29 pages, 1815 KiB  
Article
Live Convolution with Time-Varying Filters
by Øyvind Brandtsegg, Sigurd Saue and Victor Lazzarini
Appl. Sci. 2018, 8(1), 103; https://doi.org/10.3390/app8010103 - 12 Jan 2018
Cited by 8 | Viewed by 7845
Abstract
The paper presents two new approaches to artefact-free real-time updates of the impulse response in convolution. Both approaches are based on incremental updates of the filter. This can be useful for several applications within digital audio processing: parametrisation of convolution reverbs, dynamic filters, and live convolution. The development of these techniques has been done within the framework of a research project on crossadaptive audio processing methods for live performance. Our main motivation has thus been live convolution, where the signals from two music performers are convolved with each other, allowing the musicians to “play through each other’s sound”. Full article
(This article belongs to the Special Issue Sound and Music Computing)
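
The paper's contribution is incremental, artefact-free filter updates suitable for real-time use; the sketch below shows a much simpler (offline) alternative that avoids switching clicks by crossfading the outputs of the old and new impulse responses, only to illustrate why naive coefficient swapping is problematic. All names and signal choices are illustrative assumptions, not the authors' method.

```python
import numpy as np
from scipy.signal import fftconvolve

def timevarying_convolution(x, ir_old, ir_new, switch_start, fade_len):
    """Convolve x with an impulse response that changes from ir_old to ir_new.

    Instead of swapping filter coefficients abruptly (which clicks), the outputs
    of both filters are computed and crossfaded over fade_len samples starting
    at switch_start.
    """
    y_old = fftconvolve(x, ir_old)
    y_new = fftconvolve(x, ir_new)
    n = max(len(y_old), len(y_new))
    y_old = np.pad(y_old, (0, n - len(y_old)))
    y_new = np.pad(y_new, (0, n - len(y_new)))

    gain_new = np.zeros(n)
    gain_new[switch_start:switch_start + fade_len] = np.linspace(0.0, 1.0, fade_len)
    gain_new[switch_start + fade_len:] = 1.0
    return (1.0 - gain_new) * y_old + gain_new * y_new

if __name__ == "__main__":
    fs = 44100
    x = np.random.default_rng(1).standard_normal(fs)    # one second of noise input
    ir_old = np.exp(-np.arange(2000) / 300.0)            # short decaying "reverb"
    ir_new = np.exp(-np.arange(4000) / 900.0)            # longer, darker tail
    y = timevarying_convolution(x, ir_old, ir_new, switch_start=fs // 2, fade_len=2048)
    print(len(y))
```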

21 pages, 586 KiB  
Article
Audlet Filter Banks: A Versatile Analysis/Synthesis Framework Using Auditory Frequency Scales
by Thibaud Necciari, Nicki Holighaus, Peter Balazs, Zdeněk Průša, Piotr Majdak and Olivier Derrien
Appl. Sci. 2018, 8(1), 96; https://doi.org/10.3390/app8010096 - 11 Jan 2018
Cited by 22 | Viewed by 5886
Abstract
Many audio applications rely on filter banks (FBs) to analyze, process, and re-synthesize sounds. For these applications, an important property of the analysis–synthesis system is the reconstruction error; it has to be minimized to avoid audible artifacts. Other advantageous properties include stability and low redundancy. To exploit some aspects of auditory perception in the signal chain, some applications rely on FBs that approximate the frequency analysis performed in the auditory periphery, the gammatone FB being a popular example. However, current gammatone FBs only allow partial reconstruction and stability at high redundancies. In this article, we construct an analysis–synthesis system for audio applications. The proposed system, referred to as Audlet, is an oversampled FB with filters distributed on auditory frequency scales. It allows perfect reconstruction for a wide range of FB settings (e.g., the shape and density of filters), efficient FB design, and adaptable redundancy. In particular, we show how to construct a gammatone FB with perfect reconstruction. Experiments demonstrate performance improvements of the proposed gammatone FB when compared to current gammatone FBs in terms of reconstruction error and stability, especially at low redundancies. An application of the framework to audio source separation illustrates its utility for audio processing. Full article
(This article belongs to the Special Issue Sound and Music Computing)
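
As background for the auditory frequency scales that the Audlet framework distributes its filters on, here is a small sketch computing ERB-rate values, ERB bandwidths, and ERB-spaced centre frequencies using the Glasberg–Moore formulas. This is not the authors' filter bank construction, only the scale on which such filters are typically placed; frequency range and density are illustrative assumptions.

```python
import numpy as np

def erb_rate(f_hz):
    """ERB-rate scale (Glasberg & Moore): maps frequency in Hz to ERB numbers."""
    return 21.4 * np.log10(1.0 + 0.00437 * f_hz)

def erb_rate_inv(erb):
    """Inverse of erb_rate: ERB number back to frequency in Hz."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth (Hz) at centre frequency f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def erb_center_frequencies(f_min, f_max, filters_per_erb=1.0):
    """Centre frequencies spaced uniformly on the ERB scale between f_min and f_max."""
    lo, hi = erb_rate(f_min), erb_rate(f_max)
    n = int(np.floor((hi - lo) * filters_per_erb)) + 1
    return erb_rate_inv(np.linspace(lo, hi, n))

if __name__ == "__main__":
    fc = erb_center_frequencies(50.0, 8000.0, filters_per_erb=1.0)
    print(len(fc), "filters; first/last:", round(fc[0], 1), round(fc[-1], 1))
    print("bandwidth at 1 kHz:", round(erb_bandwidth(1000.0), 1), "Hz")
```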

4206 KiB  
Article
A Real-Time Sound Field Rendering Processor
by Tan Yiyu, Yasushi Inoguchi, Makoto Otani, Yukio Iwaya and Takao Tsuchiya
Appl. Sci. 2018, 8(1), 35; https://doi.org/10.3390/app8010035 - 28 Dec 2017
Cited by 5 | Viewed by 4036
Abstract
Real-time sound field renderings are computationally intensive and memory-intensive. Traditional rendering systems based on computer simulations are limited by memory bandwidth and arithmetic units. The computation is time-consuming, and the sample rate of the output sound is low because of the long computation time at each time step. In this work, a processor with a hybrid architecture is proposed to speed up computation and improve the sample rate of the output sound, and an interface is developed for system scalability by simply cascading many chips to enlarge the simulated area. To render a three-minute Beethoven wave sound in a small shoe-box room with dimensions of 1.28 m × 1.28 m × 0.64 m, the field-programmable gate array (FPGA)-based prototype machine with the proposed architecture carries out the sound rendering at run-time, while the software simulation with OpenMP parallelization takes about 12.70 min on a personal computer (PC) with 32 GB of random access memory (RAM) and an Intel i7-6800K six-core processor running at 3.4 GHz. The throughput of the software simulation is about 194 M grids/s, while that of the prototype machine is 51.2 G grids/s, even though the clock frequency of the prototype machine is much lower than that of the PC. The rendering processor with a processing element (PE) and interfaces consumes about 238,515 gates when fabricated with the 0.18 µm process technology from ROHM Semiconductor Co., Ltd. (Kyoto, Japan), and the power consumption is about 143.8 mW. Full article
(This article belongs to the Special Issue Sound and Music Computing)
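
To give a feel for the grid computation such a processor accelerates, below is a minimal two-dimensional leapfrog FDTD update of the wave equation in NumPy (the hardware operates on three-dimensional grids; grid size, boundary handling, and excitation here are illustrative assumptions). Every grid point is touched at every time step, which is why throughput in grids per second is the figure of merit quoted in the abstract.

```python
import numpy as np

def fdtd_2d(n_steps, nx=128, ny=128, c=343.0, dx=0.01):
    """Minimal 2-D leapfrog FDTD simulation of the wave equation."""
    dt = dx / (c * np.sqrt(2.0))                  # Courant stability limit in 2-D
    lam2 = (c * dt / dx) ** 2
    p_prev = np.zeros((nx, ny))
    p = np.zeros((nx, ny))
    p[nx // 2, ny // 2] = 1.0                     # impulse excitation in the middle
    for _ in range(n_steps):
        lap = (np.roll(p, 1, 0) + np.roll(p, -1, 0) +
               np.roll(p, 1, 1) + np.roll(p, -1, 1) - 4.0 * p)
        p_next = 2.0 * p - p_prev + lam2 * lap
        # Simple pressure-release boundary: force the edges to zero each step.
        p_next[0, :] = p_next[-1, :] = p_next[:, 0] = p_next[:, -1] = 0.0
        p_prev, p = p, p_next
    return p

if __name__ == "__main__":
    steps = 200
    field = fdtd_2d(steps)
    print("grid updates performed:", steps * field.size)
```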

1375 KiB  
Article
Populating the Mix Space: Parametric Methods for Generating Multitrack Audio Mixtures
by Alex Wilson and Bruno M. Fazenda
Appl. Sci. 2017, 7(12), 1329; https://doi.org/10.3390/app7121329 - 20 Dec 2017
Cited by 4 | Viewed by 5251
Abstract
The creation of multitrack mixes by audio engineers is a time-consuming activity and creating high-quality mixes requires a great deal of knowledge and experience. Previous studies on the perception of music mixes have been limited by the relatively small number of human-made mixes analysed. This paper describes a novel “mix-space”, a parameter space which contains all possible mixes using a finite set of tools, as well as methods for the parametric generation of artificial mixes in this space. Mixes that use track gain, panning and equalisation are considered. This allows statistical methods to be used in the study of music mixing practice, such as Monte Carlo simulations or population-based optimisation methods. Two applications are described: an investigation into the robustness and accuracy of tempo-estimation algorithms and an experiment to estimate distributions of spectral centroid values within sets of mixes. The potential for further work is also described. Full article
(This article belongs to the Special Issue Sound and Music Computing)
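
As a toy illustration of populating a mix space (and not the authors' gain/pan/EQ parameterisation), the sketch below samples random gain-only mixes with equal overall energy, i.e., gain vectors drawn uniformly from the positive part of the unit sphere; the function names and dummy stems are assumptions.

```python
import numpy as np

def random_gain_mixes(n_mixes, n_tracks, seed=None):
    """Sample gain vectors uniformly on the positive part of the unit sphere.

    Each row is one 'mix': per-track gains with equal overall energy, so mixes
    differ only in balance, not loudness.
    """
    rng = np.random.default_rng(seed)
    g = np.abs(rng.standard_normal((n_mixes, n_tracks)))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def render_mix(tracks, gains):
    """tracks: (n_tracks, n_samples) array; gains: (n_tracks,) vector."""
    return gains @ tracks

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    tracks = rng.standard_normal((4, 44100))          # four 1 s dummy stems
    mixes = random_gain_mixes(1000, 4, seed=0)
    rms = [np.sqrt(np.mean(render_mix(tracks, g) ** 2)) for g in mixes[:5]]
    print(np.round(rms, 3))
```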

11615 KiB  
Article
Virtual Analog Models of the Lockhart and Serge Wavefolders
by Fabián Esqueda, Henri Pöntynen, Julian D. Parker and Stefan Bilbao
Appl. Sci. 2017, 7(12), 1328; https://doi.org/10.3390/app7121328 - 20 Dec 2017
Cited by 21 | Viewed by 8384
Abstract
Wavefolders are a particular class of nonlinear waveshaping circuits, and a staple of the “West Coast” tradition of analog sound synthesis. In this paper, we present analyses of two popular wavefolding circuits—the Lockhart and Serge wavefolders—and show that they achieve a very similar audio effect. We digitally model the input–output relationship of both circuits using the Lambert-W function, and examine their time- and frequency-domain behavior. To ameliorate the issue of aliasing distortion introduced by the nonlinear nature of wavefolding, we propose the use of the first-order antiderivative method. This method allows us to implement the proposed digital models in real-time without having to resort to high oversampling factors. The practical synthesis usage of both circuits is discussed by considering the case of multiple wavefolder stages arranged in series. Full article
(This article belongs to the Special Issue Sound and Music Computing)
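
The abstract mentions the first-order antiderivative method for reducing aliasing; the sketch below applies that method to a simplified sinusoidal folding function rather than to the Lockhart or Serge circuit models analysed in the paper, so the nonlinearity, drive constant, and test signal are illustrative assumptions.

```python
import numpy as np

A = 2.5 * np.pi   # folding "drive": higher values fold the waveform more times

def folder(x):
    """Idealised sinusoidal wavefolder (not the Lockhart/Serge circuit model)."""
    return np.sin(A * x)

def folder_antiderivative(x):
    """First antiderivative of folder(), needed by the antiderivative method."""
    return -np.cos(A * x) / A

def adaa_first_order(x, eps=1e-6):
    """First-order antiderivative anti-aliasing of the wavefolder.

    y[n] = (F(x[n]) - F(x[n-1])) / (x[n] - x[n-1]); when successive samples are
    nearly equal, fall back to evaluating the nonlinearity at their midpoint.
    """
    x_prev = np.concatenate(([x[0]], x[:-1]))
    diff = x - x_prev
    safe = np.where(np.abs(diff) > eps, diff, 1.0)
    y_div = (folder_antiderivative(x) - folder_antiderivative(x_prev)) / safe
    y_mid = folder(0.5 * (x + x_prev))
    return np.where(np.abs(diff) > eps, y_div, y_mid)

if __name__ == "__main__":
    fs = 44100
    t = np.arange(fs) / fs
    x = 0.9 * np.sin(2 * np.pi * 440.0 * t)   # sine input into the folder
    y_naive = folder(x)                       # aliases heavily at this drive
    y_adaa = adaa_first_order(x)              # aliasing strongly reduced
    print(float(np.max(np.abs(y_naive - y_adaa))))
```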

25421 KiB  
Article
Playing for a Virtual Audience: The Impact of a Social Factor on Gestures, Sounds and Expressive Intents
by Simon Schaerlaeken, Didier Grandjean and Donald Glowinski
Appl. Sci. 2017, 7(12), 1321; https://doi.org/10.3390/app7121321 - 19 Dec 2017
Cited by 7 | Viewed by 4988
Abstract
Can we measure the impact of the presence of an audience on musicians’ performances? By exploring both acoustic and motion features for performances in Immersive Virtual Environments (IVEs), this study highlights the impact of the presence of a virtual audience on both the performance and the perception of authenticity and emotional intensity by listeners. Gestures and sounds produced were impacted differently when musicians performed at different expressive intents. The social factor made features converge towards values related to a habitual way of playing regardless of the expressive intent. This could be due to musicians’ habits to perform in a certain way in front of a crowd. On the listeners’ side, when comparing different expressive conditions, only one congruent condition (projected expressive intent in front of an audience) boosted the participants’ ratings for both authenticity and emotional intensity. At different values for kinetic energy and metrical centroid, stimuli recorded with an audience showed a different distribution of ratings, challenging the ecological validity of artificially created expressive intents. Finally, this study highlights the use of IVEs as a research tool and a training assistant for musicians who are eager to learn how to cope with their anxiety in front of an audience. Full article
(This article belongs to the Special Issue Sound and Music Computing)

1524 KiB  
Article
Mobile Music, Sensors, Physical Modeling, and Digital Fabrication: Articulating the Augmented Mobile Instrument
by Romain Michon, Julius Orion Smith, Matthew Wright, Chris Chafe, John Granzow and Ge Wang
Appl. Sci. 2017, 7(12), 1311; https://doi.org/10.3390/app7121311 - 19 Dec 2017
Cited by 14 | Viewed by 6089
Abstract
Two concepts are presented, extended, and unified in this paper: mobile device augmentation for musical instrument design and the concept of hybrid instruments. The first consists of using mobile devices at the heart of novel musical instruments. Smartphones and tablets are augmented with passive and active elements that can take part in the production of sound (e.g., resonators, exciters), add new affordances to the device, or change its global aesthetics and shape. Hybrid instruments combine physical/acoustical and “physically informed” virtual/digital elements. Recent progress in physical modeling of musical instruments and digital fabrication is exploited to treat instrument parts in a multidimensional way, allowing any physical element to be substituted with a virtual one and vice versa (as long as it is physically possible). A wide range of tools for designing mobile hybrid instruments is introduced and evaluated. Aesthetic and design considerations when making such instruments are also presented through a series of examples. Full article
(This article belongs to the Special Issue Sound and Music Computing)

2125 KiB  
Article
A Neural Parametric Singing Synthesizer Modeling Timbre and Expression from Natural Songs
by Merlijn Blaauw and Jordi Bonada
Appl. Sci. 2017, 7(12), 1313; https://doi.org/10.3390/app7121313 - 18 Dec 2017
Cited by 71 | Viewed by 18160
Abstract
We recently presented a new model for singing synthesis based on a modified version of the WaveNet architecture. Instead of modeling raw waveform, we model features produced by a parametric vocoder that separates the influence of pitch and timbre. This allows conveniently modifying pitch to match any target melody, facilitates training on more modest dataset sizes, and significantly reduces training and generation times. Nonetheless, compared to modeling waveform directly, ways of effectively handling higher-dimensional outputs, multiple feature streams and regularization become more important with our approach. In this work, we extend our proposed system to include additional components for predicting F0 and phonetic timings from a musical score with lyrics. These expression-related features are learned together with timbrical features from a single set of natural songs. We compare our method to existing statistical parametric, concatenative, and neural network-based approaches using quantitative metrics as well as listening tests. Full article
(This article belongs to the Special Issue Sound and Music Computing)

2320 KiB  
Article
A Psychoacoustic-Based Multiple Audio Object Coding Approach via Intra-Object Sparsity
by Maoshen Jia, Jiaming Zhang, Changchun Bao and Xiguang Zheng
Appl. Sci. 2017, 7(12), 1301; https://doi.org/10.3390/app7121301 - 14 Dec 2017
Cited by 8 | Viewed by 4486
Abstract
Rendering spatial sound scenes via audio objects has become popular in recent years, since it can provide more flexibility for different auditory scenarios, such as 3D movies, spatial audio communication and virtual classrooms. To facilitate high-quality bitrate-efficient distribution for spatial audio objects, an encoding scheme based on intra-object sparsity (approximate k-sparsity of the audio object itself) is proposed in this paper. The statistical analysis is presented to validate the notion that the audio object has a stronger sparseness in the Modified Discrete Cosine Transform (MDCT) domain than in the Short Time Fourier Transform (STFT) domain. By exploiting intra-object sparsity in the MDCT domain, multiple simultaneously occurring audio objects are compressed into a mono downmix signal with side information. To ensure a balanced perception quality of audio objects, a Psychoacoustic-based time-frequency instants sorting algorithm and an energy equalized Number of Preserved Time-Frequency Bins (NPTF) allocation strategy are proposed, which are employed in the underlying compression framework. The downmix signal can be further encoded via Scalar Quantized Vector Huffman Coding (SQVH) technique at a desirable bitrate, and the side information is transmitted in a lossless manner. Both objective and subjective evaluations show that the proposed encoding scheme outperforms the Sparsity Analysis (SPA) approach and Spatial Audio Object Coding (SAOC) in cases where eight objects were jointly encoded. Full article
(This article belongs to the Special Issue Sound and Music Computing)

1683 KiB  
Article
Wearable Vibration Based Computer Interaction and Communication System for Deaf
by Mete Yağanoğlu and Cemal Köse
Appl. Sci. 2017, 7(12), 1296; https://doi.org/10.3390/app7121296 - 13 Dec 2017
Cited by 9 | Viewed by 6993
Abstract
In individuals with impaired hearing, determining the direction of sound is a significant problem. The direction of sound was determined in this study, which allowed hearing-impaired individuals to perceive where sounds originated. This study also determined whether something was being spoken loudly near the hearing-impaired individual. In this manner, it was intended that they should be able to recognize panic conditions more quickly. The developed wearable system has four microphone inlets, two vibration motor outlets, and four Light Emitting Diode (LED) outlets. The vibration of motors placed on the right and left fingertips permits the indication of the direction of sound through specific vibration frequencies. This study applies the ReliefF feature selection method to evaluate every feature in comparison with the others and determine which features are most effective in the classification phase. The best feature extraction and classification methods were first selected, and the prototype device was then tested using these selected methods. With ReliefF feature selection, K-nearest neighbor (KNN) classification achieved a 93% success rate and Support Vector Machine (SVM) classification achieved a 94% success rate. At close range, SVM combined with the two best feature methods returned a 98% success rate. When testing our wearable devices on users in real time, we used a classification technique to detect the direction, and our wearable devices responded in 0.68 s; this saves power in comparison to traditional direction detection methods. Meanwhile, if there was an echo in an indoor environment, the success rate increased; the echo canceller was disabled in environments without an echo to save power. We also compared our system with a localization algorithm based on a microphone array; the wearable device that we developed had a high success rate and produced faster results at lower cost than other methods. This study provides a new approach for the benefit of deaf individuals that is preferable to a computer-based setup. Full article
(This article belongs to the Special Issue Sound and Music Computing)

1630 KiB  
Article
Audio Time Stretching Using Fuzzy Classification of Spectral Bins
by Eero-Pekka Damskägg and Vesa Välimäki
Appl. Sci. 2017, 7(12), 1293; https://doi.org/10.3390/app7121293 - 12 Dec 2017
Cited by 28 | Viewed by 6490
Abstract
A novel method for audio time stretching has been developed. In time stretching, the audio signal’s duration is expanded, whereas its frequency content remains unchanged. The proposed time stretching method employs the new concept of fuzzy classification of time-frequency points, or bins, in the spectrogram of the signal. Each time-frequency bin is assigned, using a continuous membership function, to three signal classes: tonalness, noisiness, and transientness. The method does not require the signal to be explicitly decomposed into different components, but instead, the computing of phase propagation, which is required for time stretching, is handled differently in each time-frequency point according to the fuzzy membership values. The new method is compared with three previous time-stretching methods by means of a listening test. The test results show that the proposed method yields slightly better sound quality for large stretching factors as compared to a state-of-the-art algorithm, and practically the same quality as a commercial algorithm. The sound quality of all tested methods is dependent on the audio signal type. According to this study, the proposed method performs well on music signals consisting of mixed tonal, noisy, and transient components, such as singing, techno music, and a jazz recording containing vocals. It performs less well on music containing only noisy and transient sounds, such as a drum solo. The proposed method is applicable to the high-quality time stretching of a wide variety of music signals. Full article
(This article belongs to the Special Issue Sound and Music Computing)
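
As a hedged sketch of the general idea of fuzzy time-frequency bin classification (not the paper's exact membership functions), the code below derives per-bin tonalness and transientness from horizontally and vertically median-filtered magnitude spectrograms, with noisiness taking up the remainder; the filter lengths and soft-assignment rule are assumptions.

```python
import numpy as np
from scipy.signal import stft
from scipy.ndimage import median_filter

def fuzzy_bin_classes(x, fs, n_fft=2048, hop=512):
    """Continuous tonalness/noisiness/transientness memberships per spectrogram bin."""
    _, _, X = stft(x, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
    mag = np.abs(X)
    harm = median_filter(mag, size=(1, 17))   # smooth across frames -> tonal estimate
    perc = median_filter(mag, size=(17, 1))   # smooth across bins   -> transient estimate
    ratio = harm / (harm + perc + 1e-12)      # 1 = clearly tonal, 0 = clearly transient
    tonalness = np.clip(2.0 * ratio - 1.0, 0.0, 1.0)
    transientness = np.clip(1.0 - 2.0 * ratio, 0.0, 1.0)
    noisiness = 1.0 - tonalness - transientness   # ambiguous bins read as noise
    return tonalness, noisiness, transientness

if __name__ == "__main__":
    fs = 22050
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.default_rng(0).standard_normal(fs)
    x[fs // 2:fs // 2 + 64] += 1.0            # add a click (transient)
    ton, noi, tra = fuzzy_bin_classes(x, fs)
    print(ton.shape, float(ton.mean()), float(tra.mean()))
```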

647 KiB  
Article
Automatic Transcription of Polyphonic Vocal Music
by Andrew McLeod, Rodrigo Schramm, Mark Steedman and Emmanouil Benetos
Appl. Sci. 2017, 7(12), 1285; https://doi.org/10.3390/app7121285 - 11 Dec 2017
Cited by 16 | Viewed by 5777
Abstract
This paper presents a method for automatic music transcription applied to audio recordings of a cappella performances with multiple singers. We propose a system for multi-pitch detection and voice assignment that integrates an acoustic and a music language model. The acoustic model performs spectrogram decomposition, extending probabilistic latent component analysis (PLCA) using a six-dimensional dictionary with pre-extracted log-spectral templates. The music language model performs voice separation and assignment using hidden Markov models that apply musicological assumptions. By integrating the two models, the system is able to detect multiple concurrent pitches in polyphonic vocal music and assign each detected pitch to a specific voice type such as soprano, alto, tenor or bass (SATB). We compare our system against multiple baselines, achieving state-of-the-art results for both multi-pitch detection and voice assignment on a dataset of Bach chorales and another of barbershop quartets. We also present an additional evaluation of our system using varied pitch tolerance levels to investigate its performance at 20-cent pitch resolution. Full article
(This article belongs to the Special Issue Sound and Music Computing)

4665 KiB  
Article
The Effects of Musical Experience and Hearing Loss on Solving an Audio-Based Gaming Task
by Kjetil Falkenberg Hansen and Rumi Hiraga
Appl. Sci. 2017, 7(12), 1278; https://doi.org/10.3390/app7121278 - 10 Dec 2017
Cited by 7 | Viewed by 4670
Abstract
We conducted an experiment using a purposefully designed audio-based game called the Music Puzzle with Japanese university students with different levels of hearing acuity and experience with music in order to determine the effects of these factors on solving such games. A group of hearing-impaired students (n = 12) was compared with two hearing control groups with the additional characteristic of having high (n = 12) or low (n = 12) engagement in musical activities. The game was played with three sound sets or modes; speech, music, and a mix of the two. The results showed that people with hearing loss had longer processing times for sounds when playing the game. Solving the game task in the speech mode was found particularly difficult for the group with hearing loss, and while they found the game difficult in general, they expressed a fondness for the game and a preference for music. Participants with less musical experience showed difficulties in playing the game with musical material. We were able to explain the impacts of hearing acuity and musical experience; furthermore, we can promote this kind of tool as a viable way to train hearing by focused listening to sound, particularly with music. Full article
(This article belongs to the Special Issue Sound and Music Computing)

579 KiB  
Article
Optimization of Virtual Loudspeakers for Spatial Room Acoustics Reproduction with Headphones
by Otto Puomio, Jukka Pätynen and Tapio Lokki
Appl. Sci. 2017, 7(12), 1282; https://doi.org/10.3390/app7121282 - 09 Dec 2017
Cited by 8 | Viewed by 4764
Abstract
The use of headphones in reproducing spatial sound is becoming more and more popular. For instance, virtual reality applications often use head-tracking to keep the binaurally reproduced auditory environment stable and to improve externalization. Here, we study one spatial sound reproduction method over headphones, in particular the positioning of the virtual loudspeakers. The paper presents an algorithm that optimizes the positioning of virtual reproduction loudspeakers to reduce the computational cost in head-tracked real-time rendering. The listening test results suggest that listeners could discriminate the optimized loudspeaker arrays for renderings that reproduced relatively simple acoustic conditions, but the optimized array was not significantly different from an equally spaced array in the reproduction of a more complex case. Moreover, the optimization seems to change the perceived openness and timbre, according to the verbal feedback of the test subjects. Full article
(This article belongs to the Special Issue Sound and Music Computing)

926 KiB  
Article
Melodic Similarity and Applications Using Biologically-Inspired Techniques
by Dimitrios Bountouridis, Daniel G. Brown, Frans Wiering and Remco C. Veltkamp
Appl. Sci. 2017, 7(12), 1242; https://doi.org/10.3390/app7121242 - 01 Dec 2017
Cited by 7 | Viewed by 4719
Abstract
Music similarity is a complex concept that manifests itself in areas such as Music Information Retrieval (MIR), musicological analysis and music cognition. Modelling the similarity of two music items is key for a number of music-related applications, such as cover song detection and query-by-humming. Typically, similarity models are based on intuition, heuristics or small-scale cognitive experiments; thus, applicability to broader contexts cannot be guaranteed. We argue that data-driven tools and analysis methods, applied to songs known to be related, can potentially provide us with information regarding the fine-grained nature of music similarity. Interestingly, music and biological sequences share a number of parallel concepts; from the natural sequence-representation, to their mechanisms of generating variations, i.e., oral transmission and evolution respectively. As such, there is a great potential for applying scientific methods and tools from bioinformatics to music. Stripped-down from biological heuristics, certain bioinformatics approaches can be generalized to any type of sequence. Consequently, reliable and unbiased data-driven solutions to problems such as biological sequence similarity and conservation analysis can be applied to music similarity and stability analysis. Our paper relies on such an approach to tackle a number of tasks and more notably to model global melodic similarity. Full article
(This article belongs to the Special Issue Sound and Music Computing)
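
To illustrate the bioinformatics-style sequence alignment the paper builds on, here is a minimal Needleman-Wunsch global alignment of melodies represented as pitch-interval sequences; the scoring values and example melodies are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score between two symbol sequences."""
    n, m = len(a), len(b)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = gap * np.arange(n + 1)
    D[0, :] = gap * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            D[i, j] = max(D[i - 1, j - 1] + s,   # match / substitution
                          D[i - 1, j] + gap,     # gap in b
                          D[i, j - 1] + gap)     # gap in a
    return D[n, m]

def intervals(midi_pitches):
    """Transposition-invariant representation: successive pitch intervals."""
    return tuple(int(q - p) for p, q in zip(midi_pitches[:-1], midi_pitches[1:]))

if __name__ == "__main__":
    tune = [60, 62, 64, 65, 67, 67]          # C D E F G G
    variant = [62, 64, 66, 67, 69, 69, 69]   # transposed up a tone, last note repeated
    unrelated = [60, 67, 58, 71, 61, 66]
    print(global_alignment_score(intervals(tune), intervals(variant)))
    print(global_alignment_score(intervals(tune), intervals(unrelated)))
```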

2044 KiB  
Article
Exploring the Effects of Pitch Layout on Learning a New Musical Instrument
by Jennifer MacRitchie and Andrew J. Milne
Appl. Sci. 2017, 7(12), 1218; https://doi.org/10.3390/app7121218 - 24 Nov 2017
Cited by 9 | Viewed by 4484
Abstract
Although isomorphic pitch layouts are proposed to afford various advantages for musicians playing new musical instruments, this paper details the first substantive set of empirical tests on how two fundamental aspects of isomorphic pitch layouts affect motor learning: shear, which makes the pitch axis vertical, and the adjacency (or nonadjacency) of pitches a major second apart. After receiving audio-visual training tasks for a scale and arpeggios, performance accuracies of 24 experienced musicians were assessed in immediate retention tasks (same as the training tasks, but without the audio-visual guidance) and in a transfer task (performance of a previously untrained nursery rhyme). Each participant performed the same tasks with three different pitch layouts and, in total, four different layouts were tested. Results show that, so long as the performance ceiling has not already been reached (due to ease of the task or repeated practice), adjacency strongly improves performance accuracy in the training and retention tasks. They also show that shearing the layout, to make the pitch axis vertical, worsens performance accuracy for the training tasks but, crucially, it strongly improves performance accuracy in the transfer task when the participant needs to perform a new, but related, task. These results can inform the design of pitch layouts in new musical instruments. Full article
(This article belongs to the Special Issue Sound and Music Computing)

3045 KiB  
Article
EigenScape: A Database of Spatial Acoustic Scene Recordings
by Marc Ciufo Green and Damian Murphy
Appl. Sci. 2017, 7(11), 1204; https://doi.org/10.3390/app7111204 - 22 Nov 2017
Cited by 19 | Viewed by 6753
Abstract
The classification of acoustic scenes and events is an emerging area of research in the field of machine listening. Most of the research conducted so far uses spectral features extracted from monaural or stereophonic audio rather than spatial features extracted from multichannel recordings. This is partly due to the lack thus far of a substantial body of spatial recordings of acoustic scenes. This paper formally introduces EigenScape, a new database of fourth-order Ambisonic recordings of eight different acoustic scene classes. The potential applications of a spatial machine listening system are discussed before detailed information on the recording process and dataset is provided. A baseline spatial classification system using directional audio coding (DirAC) techniques is detailed and results from this classifier are presented. The classifier is shown to give good overall scene classification accuracy across the dataset, with 7 of the 8 scenes classified with an accuracy greater than 60%, and an 11% improvement in overall accuracy compared to the use of Mel-frequency cepstral coefficient (MFCC) features. Further analysis of the results shows potential improvements to the classifier. It is concluded that the results validate the new database and show that spatial features can characterise acoustic scenes and, as such, are worthy of further investigation. Full article
(This article belongs to the Special Issue Sound and Music Computing)
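
As a hedged sketch of the kind of spatial feature used by a DirAC-style analysis, the code below estimates per-bin direction of arrival and a crude diffuseness measure from first-order (B-format) signals via the active intensity vector; the database itself is fourth-order Ambisonics, and the encoding convention, normalisation constants, and test signal here are assumptions rather than the authors' exact pipeline.

```python
import numpy as np
from scipy.signal import stft

def dirac_features(w, x, y, z, fs, n_fft=1024):
    """Per-bin direction of arrival and diffuseness from first-order B-format audio.

    The active intensity vector is proportional to Re{conj(W) * [X, Y, Z]}; its
    direction gives azimuth/elevation, and its magnitude relative to the total
    energy gives a crude diffuseness estimate (physical constants omitted).
    """
    W, X, Y, Z = (stft(ch, fs=fs, nperseg=n_fft)[2] for ch in (w, x, y, z))
    Ix = np.real(np.conj(W) * X)
    Iy = np.real(np.conj(W) * Y)
    Iz = np.real(np.conj(W) * Z)
    azimuth = np.arctan2(Iy, Ix)
    elevation = np.arctan2(Iz, np.hypot(Ix, Iy))
    energy = 0.5 * (np.abs(W) ** 2 + (np.abs(X) ** 2 + np.abs(Y) ** 2 + np.abs(Z) ** 2) / 3)
    diffuseness = 1.0 - np.clip(np.sqrt(Ix**2 + Iy**2 + Iz**2) / (energy + 1e-12), 0.0, 1.0)
    return azimuth, elevation, diffuseness

if __name__ == "__main__":
    fs = 48000
    s = np.random.default_rng(0).standard_normal(fs)
    az = np.deg2rad(45.0)                      # simplified plane-wave encoding from 45 degrees
    w, x, y, z = s, s * np.cos(az), s * np.sin(az), np.zeros_like(s)
    a, _, d = dirac_features(w, x, y, z, fs)
    print(np.rad2deg(np.median(a)), float(np.median(d)))
```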

3513 KiB  
Article
Identifying Single Trial Event-Related Potentials in an Earphone-Based Auditory Brain-Computer Interface
by Eduardo Carabez, Miho Sugi, Isao Nambu and Yasuhiro Wada
Appl. Sci. 2017, 7(11), 1197; https://doi.org/10.3390/app7111197 - 21 Nov 2017
Cited by 10 | Viewed by 4602
Abstract
As brain-computer interfaces (BCI) must provide reliable ways for end users to accomplish a specific task, methods to secure the best possible translation of the intention of the users are constantly being explored. In this paper, we propose and test a number of convolutional neural network (CNN) structures to identify and classify single-trial P300 in electroencephalogram (EEG) readings of an auditory BCI. The recorded data correspond to nine subjects in a series of experiment sessions in which auditory stimuli following the oddball paradigm were presented via earphones from six different virtual directions at time intervals of 200, 300, 400 and 500 ms. Using three different approaches for the pooling process, we report the average accuracy for 18 CNN structures. The results obtained for most of the CNN models show clear improvement over past studies in similar contexts, as well as over other commonly-used classifiers. We found that the models that consider data from the time and space domains and those that overlap in the pooling process usually offer better results regardless of the number of layers. Additionally, patterns of improvement with single-layered CNN models can be observed. Full article
(This article belongs to the Special Issue Sound and Music Computing)

953 KiB  
Article
Sound Synthesis of Objects Swinging through Air Using Physical Models
by Rod Selfridge, David Moffat and Joshua D. Reiss
Appl. Sci. 2017, 7(11), 1177; https://doi.org/10.3390/app7111177 - 16 Nov 2017
Cited by 13 | Viewed by 6547
Abstract
A real-time physically-derived sound synthesis model is presented that replicates the sounds generated as an object swings through the air. Equations obtained from fluid dynamics are used to determine the sounds generated while exposing practical parameters for a user or game engine to vary. Listening tests reveal that for the majority of objects modelled, participants rated the sounds from our model as plausible as actual recordings. The sword sound effect performed worse than others, and it is speculated that one cause may be linked to the difference between expectations of a sound and the actual sound for a given object. Full article
(This article belongs to the Special Issue Sound and Music Computing)
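
The physical model draws on fluid-dynamics relations such as the Aeolian tone; as a small worked illustration (not the paper's full model), the sketch below evaluates the vortex-shedding frequency f = St·v/d with a Strouhal number of about 0.2 for a thin rod swung through the air. The rod diameter and speed profile are assumptions.

```python
import numpy as np

STROUHAL = 0.2   # approximately constant for cylinders over a wide Reynolds-number range

def aeolian_tone_hz(speed_ms, diameter_m, strouhal=STROUHAL):
    """Fundamental frequency of vortex shedding behind a cylinder moving through air."""
    return strouhal * speed_ms / diameter_m

if __name__ == "__main__":
    # A 10 mm rod swung so that its tip accelerates from 1 m/s to 20 m/s and back.
    t = np.linspace(0.0, 0.5, 200)
    tip_speed = 1.0 + 19.0 * np.sin(np.pi * t / 0.5)
    f = aeolian_tone_hz(tip_speed, 0.01)
    print(f"frequency sweep: {f.min():.0f} Hz to {f.max():.0f} Hz")
```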

795 KiB  
Article
SymCHM—An Unsupervised Approach for Pattern Discovery in Symbolic Music with a Compositional Hierarchical Model
by Matevž Pesek, Aleš Leonardis and Matija Marolt
Appl. Sci. 2017, 7(11), 1135; https://doi.org/10.3390/app7111135 - 04 Nov 2017
Cited by 6 | Viewed by 4509
Abstract
This paper presents a compositional hierarchical model for pattern discovery in symbolic music. The model can be regarded as a deep architecture with a transparent structure. It can learn a set of repeated patterns within individual works or larger corpora in an unsupervised manner, relying on statistics of pattern occurrences, and robustly infer the learned patterns in new, unknown works. A learned model contains representations of patterns on different layers, from the simple short structures on lower layers to the longer and more complex music structures on higher layers. A pattern selection procedure can be used to extract the most frequent patterns from the model. We evaluate the model on the publicly available JKU Patterns Dataset and compare the results to other approaches. Full article
(This article belongs to the Special Issue Sound and Music Computing)

1163 KiB  
Article
Supporting an Object-Oriented Approach to Unit Generator Development: The Csound Plugin Opcode Framework
by Victor Lazzarini
Appl. Sci. 2017, 7(10), 970; https://doi.org/10.3390/app7100970 - 21 Sep 2017
Cited by 5 | Viewed by 4711
Abstract
This article presents a new framework for unit generator development for Csound, supporting a full object-oriented programming approach. It introduces the concept of unit generators and opcodes, and their centrality with regard to music programming languages in general, and Csound in particular. The layout of an opcode from the perspective of the Csound C-language API is presented, with some outline code examples. This is followed by a discussion that places the unit generator within the object-oriented paradigm and the motivation for full C++ programming support, which is provided by the Csound Plugin Opcode Framework (CPOF). The design of CPOF is then explored in detail, supported by several opcode examples. The article concludes by discussing two key applications of object-orientation and their respective instances in the Csound code base. Full article
(This article belongs to the Special Issue Sound and Music Computing)

12209 KiB  
Article
A Two-Stage Approach to Note-Level Transcription of a Specific Piano
by Qi Wang, Ruohua Zhou and Yonghong Yan
Appl. Sci. 2017, 7(9), 901; https://doi.org/10.3390/app7090901 - 02 Sep 2017
Cited by 15 | Viewed by 6613
Abstract
This paper presents a two-stage transcription framework for a specific piano, which combines deep learning and spectrogram factorization techniques. In the first stage, two convolutional neural networks (CNNs) are adopted to recognize the notes of the piano preliminarily, and note verification for the specific instrument is conducted in the second stage. The note recognition stage is independent of the individual piano: one CNN is used to detect onsets and another is used to estimate the probabilities of pitches at each detected onset. Hence, candidate pitches at candidate onsets are obtained in the first stage. During note verification, templates for the specific piano are generated to model the attack of each pitch. Then, the spectrogram of the segment around each candidate onset is factorized using the attack templates of the candidate pitches. In this way, not only are the pitches picked up from the note activations, but the onsets are also revised. Experiments show that the CNN outperforms other types of neural networks in both onset detection and pitch estimation, and the combination of two CNNs yields better performance than a single CNN in note recognition. We also observe that note verification further improves the performance of transcription. In the transcription of a specific piano, the proposed system achieves 82% on note-wise F-measure, which outperforms the state-of-the-art. Full article
(This article belongs to the Special Issue Sound and Music Computing)

1629 KiB  
Article
A Low Cost Wireless Acoustic Sensor for Ambient Assisted Living Systems
by Miguel A. Quintana-Suárez, David Sánchez-Rodríguez, Itziar Alonso-González and Jesús B. Alonso-Hernández
Appl. Sci. 2017, 7(9), 877; https://doi.org/10.3390/app7090877 - 27 Aug 2017
Cited by 21 | Viewed by 6031
Abstract
Ambient Assisted Living (AAL) has become an attractive research topic due to growing interest in the remote monitoring of older people. Developments in sensor technologies and advances in wireless communications allow smart assistance to be offered remotely and those people to be monitored at their own home, increasing their quality of life. In this context, Wireless Acoustic Sensor Networks (WASN) provide a suitable way to implement AAL systems that can infer hazardous situations by identifying environmental sounds. Nevertheless, satisfactory sensor solutions combining both low cost and high performance have not yet been found. In this paper, we report the design and implementation of a wireless acoustic sensor to be located at the edge of a WASN for recording and processing environmental sounds, which can be applied to AAL systems for personal healthcare because it offers the following significant advantages: low cost, small size, and audio sampling and computation capabilities for audio processing. The proposed wireless acoustic sensor is able to record audio samples at a sampling frequency of at least 10 kHz with 12-bit resolution. It is also capable of performing audio signal processing without compromising the sample rate or the energy consumption, by using a new microcontroller released in the last quarter of 2016. The proposed low-cost wireless acoustic sensor has been verified using four randomness tests for statistical analysis and a classification system for the recorded sounds based on audio fingerprints. Full article
(This article belongs to the Special Issue Sound and Music Computing)

Review


5709 KiB  
Review
Room Response Equalization—A Review
by Stefania Cecchi, Alberto Carini and Sascha Spors
Appl. Sci. 2018, 8(1), 16; https://doi.org/10.3390/app8010016 - 23 Dec 2017
Cited by 69 | Viewed by 11608
Abstract
Room response equalization aims at improving the sound reproduction in rooms by applying advanced digital signal processing techniques to design an equalizer on the basis of one or more measurements of the room response. This topic has been intensively studied in the last 40 years, resulting in a number of effective techniques facing different aspects of the problem. This review paper aims at giving an overview of the existing methods following their historical evolution, and discussing pros and cons of each approach with relation to the room characteristics, as well as instrumental and perceptual measures. The review is concluded by a discussion on emerging topics and new trends. Full article
(This article belongs to the Special Issue Sound and Music Computing)
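
One classical family of approaches covered by such reviews designs the equalizer as a regularized least-squares inverse of a measured room impulse response; the sketch below shows that idea in NumPy, with the toy impulse response, filter length, target delay, and regularization constant all being illustrative assumptions rather than values from the paper.

```python
import numpy as np
from scipy.linalg import toeplitz

def ls_inverse_filter(rir, filt_len, delay, reg=1e-3):
    """Regularized least-squares FIR inverse of a room impulse response.

    Finds h minimising ||rir * h - delta_delay||^2 + reg * ||h||^2, i.e. an
    equalizer that makes the equalized response approximate a pure delay.
    """
    n = len(rir) + filt_len - 1
    col = np.concatenate([rir, np.zeros(filt_len - 1)])
    row = np.zeros(filt_len)
    row[0] = rir[0]
    C = toeplitz(col, row)                 # (n, filt_len) convolution matrix
    d = np.zeros(n)
    d[delay] = 1.0                         # target: unit impulse delayed by `delay`
    A = C.T @ C + reg * np.eye(filt_len)
    return np.linalg.solve(A, C.T @ d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "room": a direct sound followed by an exponentially decaying noise tail.
    rir = np.zeros(400)
    rir[0] = 1.0
    rir[1:] += 0.3 * rng.standard_normal(399) * np.exp(-np.arange(399) / 80.0)
    h = ls_inverse_filter(rir, filt_len=600, delay=200)
    eq = np.convolve(rir, h)
    print("peak of equalized response at sample", int(np.argmax(np.abs(eq))))
```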
