1. Introduction
Network audio technology for the real-time transmission and distribution of high-fidelity, low-latency audio signals over the internet is essential when multiple performers or large ensembles in different geographical locations aim to play acoustic or electroacoustic instruments or sing synchronously. It has revolutionized multi-user collaborative experiences. The primary goal is to achieve the lowest possible latency and high-fidelity audio with minimal glitches, ensuring that musicians in different locations can perform together remotely without significant delays in audio transmission. This approach enables seamless remote group performance and synchronization, overcoming geographical barriers and allowing musicians to create music collectively as if they were physically present in the same space. OSC [
1], MIDI [
2], and other communication protocols and their data transfer are beyond the scope of this discussion.
While early experiments with audio transmission and networking predate this era, it was in the late 20th century and early 2000s that the network music era surged. This transformative period saw the integration of the internet into daily life, coupled with the rapid evolution of computational power. On the cusp of reshaping our world, technology played a significant role in transforming the performing arts. In the following pages, we briefly journey through two dynamic decades marked by innovation, collaboration, and artistic exploration in network audio technology, a journey that changed how the performing arts embrace fresh forms and expressions, ushering in a new era of creative possibilities.
There are many precursors to network music in the history of telecommunications. Back in the early 1900s, the Telharmonium made a significant mark as the first electronic synthesizer and one of the earliest forms of “semi”-networked audio technology. This innovation had a distinct way of sharing its sounds: it used telephone receivers connected to large paper cones, effectively acting as an early version of a loudspeaker. The Telharmonium’s performances were distributed to the public through the telephone system [
3].
One of the earliest telematic group performances involved the Black jazz musician Paul Robeson, who remotely performed at a choral festival in Wales in 1957 [
4]. Despite the challenges of low audio quality and the inability to play synchronously, Robeson utilized a telephone line to transmit his powerful voice advocating against segregation and racial discrimination from New York to Wales. In this historic event, Robeson and his Welsh collaborators made a meaningful connection and striking social impact, responding creatively to the constraints of their era.
While John Cage’s Imaginary Landscape No. 4 (March No. 2), scored for 12 radios, is widely acknowledged by scholars as one of the earliest compositions for an inter-dependent network and a potential example of Network Music Performance (NMP) [
5], it is essential to acknowledge the diverse and intriguing early telematic explorations of the 1960s. For instance, the “Dial-A-Poem” project, initiated by John Giorno in the late 1960s, utilized telephone lines to distribute pre-recorded poems to callers, offering an early form of telecommunication-based artistic expression. Additionally, early group telematic collaborations emerged during the Experiments in Art and Technology (E.A.T.) series in the same era, reflecting the intersection of art, technology, and telematic consciousness [
6].
Digital network audio technology has evolved gradually since the late 1980s, and its development involved contributions from various institutions and researchers. During this time, much research has been done on high-resolution digital audio and network architecture. On 23 September 2000, a pioneering experiment transpired at McGill University, where a jazz group performed in a concert hall. Notably, the 12 channels of uncompressed Pulse Code Modulation (PCM) audio were recorded and sent to an audio engineer for real-time mixing in a theater at the University of Southern California in Los Angeles. This instance marked a significant milestone, representing the first known occurrence of streaming high-quality live audio over the internet [
7]. However, to the best of the author’s knowledge, the significant evolution of network audio that put the technology into practical use began at Stanford University’s Center for Computer Research in Music and Acoustics (CCRMA), a renowned hub for music technology innovation.
In the late 1990s and early 2000s, CCRMA played a pivotal role in pioneering open-source network audio technology: the JACK Audio Connection Kit (JACK) and JackTrip [
8]. JACK enables low-latency audio and MIDI data transfer between applications and hardware within a single computer. It is commonly used for creating complex audio setups, with a focus on local audio routing. It is worth mentioning that many other network audio technologies, such as Netty McNetface and Sonobus, also use JACK as their audio routing system.
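As a concrete illustration of the local routing JACK provides, the following minimal sketch uses the third-party Python package "jack" (JACK-Client), which is an assumption of this example rather than a tool named in this research; the client and port names are hypothetical and depend on the applications registered with the running JACK server.

import jack

# Register a small utility client with the running JACK server.
client = jack.Client("routing_sketch")
client.activate()

# List every audio output port currently registered with JACK.
for port in client.get_ports(is_audio=True, is_output=True):
    print(port.name)

# Patch a (hypothetical) network-audio output into a (hypothetical) DAW input.
client.connect("netty_mcnetface:out_1", "ardour:audio_in_1")

client.deactivate()
client.close()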
Historically, musicians were limited by physical proximity: collaborative performances and rehearsals required all participants to be in the same location. JackTrip revolutionized this by enabling remote musical collaboration with near-instantaneous audio transmission. It provided a model for addressing challenges such as latency and jitter, which are critical for applications requiring real-time audio transmission. It has inspired and influenced subsequent developments in network audio technology and has fostered a community of musicians, educators, and technologists who use and contribute to it.
Although early experiments in audio transmission and networking have been around for decades, the recent surge in networked immersive systems, such as Virtual Reality (VR), Mixed Reality (MR), and Augmented Reality (AR), has pushed real-time networked audio into the spotlight. Networked audio technology adds a rich, immersive layer to these environments, making the experience feel more alive. When you integrate spatialized audio and real-time signal transmission, networked music systems do not just complement VR/MR/AR—they become a core part of the immersive experience. Sound is not just background anymore; it is a key player in how we feel and interact in these virtual worlds [
9].
A key challenge across networked immersive systems and networked audio lies in managing latency, synchronization, and accessibility. Just as VR applications demand precise visual and haptic synchronization to maintain immersion, networked audio must minimize signal delay to ensure real-time remote musical collaboration. Similarly, accessibility concerns—ranging from computational power requirements to user-friendly software user interfaces—impact both fields, necessitating solutions that prioritize usability without compromising performance.
This study investigates the applications of networked audio technology within performing arts and media collaborations, focusing on its transformative potential in virtual environments. Through a practice-based research methodology leveraging open-source software and communication protocols, it examines cultural and social dynamics, creative workflows, and technical frameworks that support remote recording and virtual production. By analyzing real-world implementations, this research highlights how networked audio technology can enhance multi-user digital experiences, paralleling and enriching the evolution of immersive virtual platforms.
In this work, four case studies show what is possible with affordable or free open-source network audio technology, telling the stories of professional musicians creating collaborative albums and multimedia productions. These productions have high artistic value and have been well received in the field of electronic music around the world. Although many challenges confront internet-based virtual music studio production, especially online, real-time, multichannel network recording, mixing, and production, the great potential of this state-of-the-art technology brings a game-changing impact to the music industry and music education. We focus on providing solutions to the following challenges: how does a music producer/audio engineer implement multiple audio processing networks and record each location’s audio input on separate tracks over the internet, using low-speed home-quality internet connections, prosumer hardware, and the diverse quality of performing and recording spaces available in each unique home? What musical style or approach to music performance and recording would be both effective and satisfying? This chapter provides insightful reflections and a technical framework.
2. Strategic Objectives
Network music practices during the COVID-19 pandemic necessitated the use of software that treats all incoming signals from different geographical locations as independent audio channels, each of which can be routed to separate hardware outputs and tracks in a DAW. Software that establishes multiple peer-to-peer connections was especially well suited to this requirement. While the initial motivation was to realize a particular approach to real-time interactive electroacoustic music performance, it naturally led to multitracking in a DAW.
The decision of which open-source applications to choose was based on the following criteria: (1) the highest quality of internet audio transmission; (2) low latency; (3) autonomy of choosing cloud servers by the users; (4) approachable and timely technical support from the technology developer; (5) ease of installation and use; (6) stability; and (7) discrete, multichannel routing of each node’s signals. Commercial applications are beyond the scope of this research.
Initial experiments were guided by performances, recordings of graphic scores, and structured improvisations with real-time visual production. The collection of graphic scores used, Spring 2020 [
10], focuses on aspects of sound, indicated by the titles, leaving other dimensions open, such as instrumentation, timbre, tempo, and pitch. This provided a range of data regarding the impact of the network on the musical element targeted in each score. The nature of the scores devalues rigid synchronization. Early on, it was recognized that latency induced by physical distance, the vagaries of the network, and local DSP routing could be a frustrating obstacle to overcome; the solution pursued was to embrace the latency as a feature of the telematic experience and embed it in the aesthetic of the music being created.
The free and open-source software packages that became the focus of the exploration were JackTrip, Soundjack, and QuackTrip/Netty McNetface. Sonobus and Jamulus were also explored; due to the lack of freedom to use private servers, the inability to route multichannel audio, and/or the lower-fidelity/compressed audio transmission, these two options were not pursued further. Experiments also showed Soundjack to be a promising option. It offers an intuitive web-based interface, routable audio, and a variety of excellent compression algorithms in addition to uncompressed audio. Over the course of experimenting, however, some rehearsals could not be run because of issues with the Soundjack server. It became clear that relying on a third party to maintain and offer a server would be extremely risky. Additionally, a series of driver updates during this time confused performer users and led to interruptions of the rehearsal schedule.
JackTrip also met many of the criteria, and at the time of this writing it was being used in conjunction with Netty McNetface. JackTrip’s long history, wide user base, the QJackTrip application [
11], which provides an easy-to-use Graphical User Interface (GUI), its cross-platform stability, and its improved installation experience since 2021 make JackTrip an essential part of the toolkit going forward. Furthermore, the timely launch of JackTrip’s “hub mode” during the pandemic makes it possible to stand up a JackTrip server on a cloud-based virtual machine (VM), reducing bandwidth demands on users and enabling mixing and recording directly on the VM. Finally, the growth of third-party DIY and preconfigured kits makes it easier than ever for musicians to work with JackTrip, regardless of their technical comfort level or access to the necessary technology.
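The sketch below shows, under stated assumptions, how such a hub-mode session could be launched programmatically: -S starts a hub server on the cloud VM and -C joins it from a performer's machine, with -n setting the channel count and -q the receive queue length in packets. The server address is a placeholder, and the exact flags and defaults may vary between JackTrip versions.

import subprocess

SERVER_IP = "203.0.113.10"  # placeholder address of the cloud VM

def start_hub_server():
    # Run on the cloud VM: wait for hub clients to connect.
    return subprocess.Popen(["jacktrip", "-S"])

def join_hub(channels=2, queue_packets=8):
    # Run on a performer's machine: send/receive the given number of channels.
    return subprocess.Popen([
        "jacktrip", "-C", SERVER_IP,
        "-n", str(channels),
        "-q", str(queue_packets),
    ])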
In the spring and summer of 2020, when the production project was undertaken, however, JackTrip presented obstacles to achieving the goals of multichannel recording and professional music production with musicians who had limited computer skills. For example, installation was challenging for new users because the documentation on the latest drivers and required software was not entirely up to date. Implementation on Windows computers was also confoundingly complex. The requirement for command-line operation, and the use of other software with a challenging user interface, made it difficult for musicians with a low tolerance for frustration with computer technology to commit to using the software.
3. Method: A User-Friendly Network Music Production Framework
3.1. Core Framework Overview, Network Configuration, and Audio Routing
This networked music production framework supports remote collaboration, recording, and live performance using Netty McNetface [
12], Pure Data (Pd) [
13], and JackTrip. It is optimized for multichannel recording, real-time DSP, and networked monitoring, serving as the foundation for case studies including
rarescale, Dilate Ensemble, and
Heart Sutra, with an additional framework using JackTrip for comparison.
An alternative framework that utilizes only JackTrip is also presented with real-life examples. These feasible and accessible frameworks met all the musical and technical criteria, particularly the ability to easily route different incoming signals to different audio channels for multichannel recording through a private server at any time the musicians wanted. It is worth mentioning that Netty McNetface can establish an efficient peer-to-peer high-fidelity audio network of as many as 12 users. The private Netty McNetface server is hosted on a cloud-based VM and has successfully networked and recorded as many as seven musicians using home-based Internet Service Providers (ISPs) located in Europe, North America, and Asia.
Routing software assigns each incoming network audio signal to an independent channel on a multichannel audio interface; each channel is then routed to Pro Tools [
14] (or any compatible DAW), where it is recorded on a unique track. Each channel is also sent via the interface’s hardware outputs to a creative real-time DSP system, Kyma [
15], for real-time audio sampling, analysis, and manipulation. Up to eight channels of audio output from Kyma are returned via hardware aux returns on the audio interface to the DAW and recorded to separate tracks.
In both live performance and recording sessions, two mixes are produced. The first is a cue mix for performer monitoring of Kyma’s output, returned to the performers via Netty McNetface. Each remote performer hears the results of Kyma processing on the studio Netty McNetface channel. Netty McNetface allows each participant to manage a mix of all participants and a loopback of their own input, albeit with noticeable latency, since this is the result of a peer-to-peer connection between themselves and the server.
The second mix is for broadcast to the performance platform, typically via OBS [
16]. In a recording session, this is the rough mix monitored in the studio. This mix includes all dry incoming signals and Kyma signals, subjected to channel processing, spatialization (both stereo and 3D), and an analog summing bus. If broadcast, it is synced with video in OBS, which is then streamed to the platform of choice.
Figure 1 shows the core framework.
The most significant post-production editing required due to internet disruption is correcting discontinuous waveforms resulting from packet loss. This can be corrected by manual editing of individual discontinuities and/or the use of audio repair software, such as iZotope RX [
17]. Increasing buffer size—and therefore, latency—mitigates but does not typically eliminate the need for this editing. There are simply too many variables in the multiple peer-to-peer connections that can be encountered. The mixing and editing procedures otherwise required are those typical of any recording session. The same issues having to do with mic placement, gain staging, ambient noise floor, etc., all apply.
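For illustration, the following sketch shows one simple way such discontinuities could be located and smoothed automatically: abrupt sample-to-sample jumps are detected and bridged with a short linear ramp. This is a minimal example under arbitrary threshold and fade-length assumptions, not the algorithm used by iZotope RX or in these productions.

import numpy as np

def repair_discontinuities(x, threshold=0.5, fade=64):
    """x: mono float signal in [-1, 1]; returns a repaired copy."""
    y = x.copy()
    # Indices where the waveform jumps abruptly (a likely packet-loss click).
    jumps = np.flatnonzero(np.abs(np.diff(y)) > threshold)
    for i in jumps:
        lo, hi = max(0, i - fade), min(len(y) - 1, i + fade)
        # Bridge the region around the click with a linear ramp between
        # its boundary samples.
        y[lo:hi + 1] = np.linspace(y[lo], y[hi], hi - lo + 1)
    return y

# Buffer size versus latency: one network buffer of 256 frames at 48 kHz adds
# 256 / 48000 ≈ 5.3 ms per hop, so larger buffers reduce dropouts at the cost
# of added delay.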
The peripheral technologies of each performer varied greatly in these projects. Operating systems, the type and quality of microphones, digital audio interfaces, and headphones for monitoring were unique to each participant, as were their proficiency, experience, and competence in using the technology. The speeds of their ISPs varied across the spectrum but were typically lower than those of the studio.
For all of the first three case studies, the studio’s computer used for networking and recording audio is a Mac Mini (2018) running macOS 11, Pro Tools Ultimate DAW, Antelope OrionIII routing and mixing software, Pure Data 0.52 and the Netty McNetface 0.92 patch, with an Antelope Orion 32+ Gen. 3 audio interface, Universal Audio UAD-2 Octo DSP, and an analog summing bus (Roll Music Folcrom, A Designs Audio Pacifica mic pre-amplifier, TK BC1 compressor, and Neve 8803 EQ). The Kyma sound design workstation is run on a MacBook Pro (2016) running macOS 11, with Kyma 7++, a Pacarana, and a MOTU Traveler-mk3 audio interface, connected via eight balanced line-level channels to the Antelope Orion interface. The studio ISP offers 400 Mbps download and 20 Mbps upload speeds.
The hardware and software on hand in the studio were used for convenience. However, open-source and shareware equivalents can minimize reliance on commercial software. The following open-source and shareware packages are a sampling of software functionally equivalent to that used in this research’s four case studies: (1) Ardour (DAW) [
18]; (2) Blackhole (audio routing) [
19]; (3) VB-Audio Cable (audio routing) [
20]; and (4) Pure Data (creative real-time DSP).
The production can also be done entirely in the box, eliminating the need for extensive professional hardware. The number of audio channels available on the audio interface or the number of virtual channels does, however, impact the number of channels available for independent routing in Netty McNetface. This will impact the number of simultaneous channels that can be processed and recorded.
Based on the strategic objectives and the framework described above, the following sections elaborate on three case studies of network music recording arts and virtual studio engineering solutions. Since JackTrip has complemented the use of Netty McNetface, and recent developments suggest it may eventually become a preferred alternative to reliance on one specific technology, a fourth case study, the NowNet Arts Lab Ensemble using the JackTrip hub solution, is also documented as a comparative analysis.
3.2. Case Study 1: “Rarescale”
Rarescale [
21] is a contemporary chamber ensemble based in the UK, founded in 2003 by Artistic Director and flutist Carla Rees. She was an early collaborator in this project, which ultimately led to the recording, production, and commercial release of the album 05 IX [
22]. This project was guided by the interpretation of the graphic score collection and the full range of flutes—from piccolo to contrabass—to learn about the impact of the network music performance on different musical dimensions (register, dynamics, duration, density of activity, and texture) in an interactive electroacoustic context. Rarescale founding member Sarah Watts joined the project in the late summer of 2020. Her instruments of choice were the alto clarinet, bass clarinet, and basset horn, which broadened the range of timbres, texture, and spectrum in general that the musicians worked with in the interpretation of the scores.
Figure 2 shows the signal flow of this virtual music production solution.
Performer satisfaction with the network rehearsals and music production has been high. To quote Carla Rees, “There’s something quite magical about performing with people in different locations and creating a shared musical experience at a distance.” [
23] Following several months of experiments, rehearsals, and recording sessions, rarescale began a series of online concerts and editing of the recordings that would become the album 05 IX. It was recorded between October 2020 and April 2021 and released on rarescale records on 20 June 2021.
The album was met with excellent reviews. David McDade of MusicWeb International wrote a review focused entirely on the music, without evidence of the quality of the recording being any less than expected of a commercial recording. He writes: “The performers bounce off each other’s ideas with real glee…This is one of the notable aspects of this recording: for all that, it can be abrasive and angular, often it is very beautiful in ways that are surprising and new. The most powerful impression made upon me by this music is the simple joy of creation. Its style will not be to all tastes but what this case study has are three musicians engaged in pushing at the boundaries of music and, by the sounds of it, having a whale of a time doing so. I certainly did, listening to it [
24].” Notably, the reviewer’s text engages purely with the performances and performer relationships, and with neither the technology nor the circumstances of the album’s creation and production. This evidence indicates that our framework is effective and achieves our high studio engineering standards.
3.3. Case Study 2: “Dilate Ensemble”
Another ensemble engaged in this virtual music production study is Dilate Ensemble [
25]. Whereas
rarescale was a music ensemble with a long pre-pandemic history, Dilate Ensemble is an audiovisual collaborative group formed at the beginning of the pandemic by visual artist Carole Kim. The ensemble has actively performed and recorded using network audio technology since its inception, and its focus is on live performance and simultaneous audiovisual tracking.
Dilate Ensemble’s approach to working on this project is distinct from
rarescale’s. Rather than interpret and realize graphic scores, the ensemble develops and performs each work as a collective structured improvisation unified around an audio-reactive visual component. The musicians’ response to the visual component is essential to the coherence of the work, while at the same time, the visuals respond algorithmically to a variety of sonic parameters generated by the ensemble. The ensemble’s performance practice could not exist without the Netty McNetface network or an equivalent.
Figure 3 shows the Dilate Ensemble audiovisual signal flow and the network music production solution.
During the project, Carole Kim (Southern California) built micro-installations under her kitchen table. These installations were projected onto and filmed, and then processed in real time using Isadora [
26] video processing software. Netty McNetface was routed to Isadora, where spectral analysis generated data mapped to various video processing parameters. Jon Raskin (Northern California) performed saxophones, electronics, concertina, jaw harp, and found objects. Vocal artist Luisa Muhr (New York City) performed jaw harp in addition to spoken word and singing, and Gloria Damijan (Vienna) performed toy piano and percussion, as well as Kyma. The ensemble’s composition and approach provide a complementary test of the network recording framework as compared to
rarescale. It has larger and more diverse instrumentation, with a strong representation of percussive sounds and a dependence on a simultaneous visual component. The personnel are also more widely distributed, in this case across nine time zones. Furthermore, Dilate Ensemble’s completed works are far longer than
rarescale’s, typically 30 minutes or more, whereas
rarescale’s tracks average just under five minutes in duration.
The long duration of Dilate pieces means there are fewer opportunities for repeated takes. The stakes are higher because of the incorporation of audio-reactive video. This is all compounded by the increased likelihood of a network error or interruption occurring simply because the ensemble is actively networked for longer periods and contains more nodes. The proposed framework has proven effective, operating through many sessions lasting between half an hour and an hour of continuous virtual network recording.
These sessions are frequently simultaneous live-streamed performances. This makes additional demands on the host computer, which is syncing audio to video in OBS while also tracking the recording from the network. As an alternative to this, Dilate Ensemble has used JackTrip to transmit the finished mix to a remote location for video synchronization and streaming. This dual use of Netty McNetface and JackTrip has proved to be a successful and effective method of distributing the tasks and demands on computing resources.
Dilate’s completed post-productions are long-form, audiovisual pieces hosted on Vimeo and screened at prestigious film and multimedia festivals such as Kunstbetriebe, CURRENTS New Media Festival, New Music Gathering, and the 2021 Australasian Computer Music Conference. Their discography includes Dilate (2020) [
27], Parasol (2020) [
28], On Memory II: Scaling the Palace (2020, with Paul Chavez) [
29], Aerial (2021) [
30], and CATENA: sound/image through a hybrid network (2022) [
31].
Overall, the artists in Dilate Ensemble are pleased with the recorded results of this framework and the experience of working with it in a live-streamed performance. An ensemble survey following the development of CATENA as part of their Improvising the Net(work) Residency with the CounterPulse Combustible Program and Thoughtworks Arts provided the following qualitative data: “… Diving into a world on its own. Exploring this special universe. Merging our different nodes into one organism of sound… I’m very pleased with the quality of (the) recording on our documentation of CATENA… You pulled a rabbit out (of) the hat, so to speak, with the extreme difficulties we had with the internet which shows the strength of the concept and the hard work we had put into putting it together”.
3.4. Case Study 3: “Heart Sutra”
Heart Sutra is an immersive audiovisual composition, the collective effort of interdisciplinary scholars in fine art, music, computer science, mathematics, and religious studies. Heart Sutra’s music composition was realized through nine virtual network recording sessions. The author applied the proposed framework to develop, rehearse, and record the composition. In the fall of 2021, the result of this audiovisual collaboration was precisely video mapped onto the Western sacred art in mosaic and stained glass at Stanford University’s historical monument, Memorial Church [
32]. It was then documented by three high-definition camera recordings for the final postproduction.
Figure 4 demonstrates the Heart Sutra network music production solution.
The foundation of the music is a field recording of chanting by monks in the ancestral seat temple of the Jogye Order of Korean Buddhism, Tongdosa Monastery [
33], in South Korea. In virtual recording sessions 1–3, this field recording was subjected to real-time processing in Kyma, accompanying improvised performances on flute, electronics, voice, and percussion. In recording sessions 4–6, the unprocessed field recording provided the structure for a synchronous performance by the flutist and the vocalists. Latency (on the order of hundreds of milliseconds) was problematic with this approach and was adjusted in the mix. Recording sessions 7–9 used a peer-to-peer JackTrip connection to overdub the celletto [
34] performance in the mix.
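As an illustration of how such an offset might be measured before nudging a late track in the DAW, the sketch below cross-correlates a remote performer's recording against the reference field recording to estimate its delay. This is a hypothetical example, not necessarily the adjustment workflow used in Heart Sutra.

import numpy as np

def estimate_offset(reference, delayed, sample_rate=48000):
    """Return the lag (in samples) of the delayed track relative to the reference."""
    n = min(len(reference), len(delayed))
    corr = np.correlate(delayed[:n], reference[:n], mode="full")
    lag = int(np.argmax(corr)) - (n - 1)  # positive lag: the track arrives late
    print(f"estimated offset ≈ {1000 * lag / sample_rate:.1f} ms")
    return lag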
Noticeably, the audio imperfections caused by transmission latency in multi-channel synchronous recording across geographical locations, by buffer underruns, and by unstable internet speeds and other network issues are perceptible and were intended to become part of the composition itself. We used iZotope RX [
17] to eliminate most of the unpleasant audio glitches and left the slightly mismatched vocal recitals of the chants, which differ by milliseconds, to acknowledge the current technical state of network music practices. Through this work, we have embraced a new aesthetic that includes noises, glitchy sounds, and “micro-rhythmic” timing mismatches. This unique sonic phenomenon defines technology-driven music collaboration in performance and recording arts during the COVID-19 era.
The audio-driven computer graphic algorithms create a series of media art representations that can be manipulated in real time. Audio spectra and frequencies are extracted from the network music composition by feeding the audio into a Fast Fourier Transform (FFT) analyzer. The resulting audio data drive different parameters in the custom-designed algorithms and visual filters to manipulate all the visual elements that reflect the piece’s sonic narrative, as detailed in
Section 2. The results of the algorithm-driven audiovisual processing are output as raw footage for editing and rendering in DaVinci Resolve [
35]. The subsequent iteration is then mapped onto the Western sacred art wall at Stanford University’s Memorial Church.
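A minimal sketch of this kind of audio-to-visual mapping is given below: per-frame spectral features are extracted with an FFT and scaled into normalized visual parameters. The feature choices, scaling constants, and parameter names are illustrative assumptions rather than the actual Heart Sutra algorithms.

import numpy as np

def visual_params(frame, sample_rate=48000):
    """frame: one block of mono samples; returns normalized visual parameters."""
    window = np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    energy = float(np.sum(spectrum ** 2))  # overall level
    centroid = float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))

    return {
        "brightness": min(1.0, energy / 1e4),           # arbitrary scaling
        "hue": min(1.0, centroid / (sample_rate / 2)),  # spectral centroid
        "size": min(1.0, float(np.max(spectrum)) / 100.0),
    }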
On 10 October 2021, at the Sankofa World Music Concert,
Heart Sutra’s immersive audiovisual installation was presented in front of an audience of 200 at Memorial Church. The installation used three 6000-lumen projectors, each with 1080p output. At the installation site, we ported the video art into MadMapper [
36] and used the software to calibrate and fine-tune the computer graphic elements and colors so that they were precisely mapped onto Memorial Church’s architectural structures and existing Western sacred art. This final alignment was done with care and respect, allowing our multicultural Visual Music to make a positive impact at Memorial Church’s grand scale.
The immersive installation harnesses the three-dimensionality of the neoclassical monument at Stanford University. The reflective optical properties of Memorial Church’s mosaic art and stained-glass windows added a complexity that required us to consider the compositional process deliberately. This multicultural and multifaith artistic expression creates depth and textural richness that would otherwise be impossible to achieve without the monumental space itself and the open-minded spiritual leadership at Stanford University. It also inspired the spiritual leaders at Stanford to reimagine their sacred space after their first viewing of our installation.
For an immersive audio experience, the church’s characteristic room acoustic reverberation [
30] was also recorded by a Zoom H3-VR 360-degree ambisonics field recorder. This spatial reverberation was then mixed into the final Visual Music piece’s audio postproduction to truthfully recreate the church’s sacred soundscape.
Figure 5 shows the system architecture of the entire immersive installation process, and video documentation of this installation’s premiere can be reviewed at
https://youtu.be/vSGVRu3mAiU (accessed on 2 March 2025).
Overall, the impact of Heart Sutra as a networked music piece and its immersive installation at Stanford’s Memorial Church is multifold. At the time of this writing, the completed audiovisual production had been officially selected by prestigious international music/film festivals and research venues such as the Berlin International Art Film Festival, Toronto International Women Film Festival, DAVAMOT Film Festival, New York City Electroacoustic Music Festival (NYCEMF), MOXsonic Festival, the International Computer Music Conference (ICMC), the Audio Engineering Society’s 152nd Convention, and the Seventh Computer Art Congress. An additional version of Heart Sutra’s music production was realized for live concert performances with video projection and live electronics at the 2022 National Conference of the Society for Electro-Acoustic Music in the United States (SEAMUS) and the College Music Society (CMS/ATMI) National Conference. It also won the “Best New Media” Golden Diamond Award at the LA International Asian Film Festival, and the “Best Experimental Short Film” and “Best Director(s)” Awards at the Cuckoo International Film Festival India.
Powered by cutting-edge technology in computer music, network audio, augmented reality, and algorithmic visual art, Heart Sutra aimed to mobilize the global imagination, through media arts and technology, to collectively envision and build a future enriched by the time-tested contemplative wisdom and healing practices found in sacred traditions throughout the world. It employs new media, digital, and multidisciplinary art techniques and respectfully expands on the symbolism found in traditional sacred expressions to give them renewed relevance in the contemporary world. It engages and enlightens while critically examining and subverting the existing constructs of race, ethnicity, and gender, and the related social limitations that pervade global culture.
3.5. Case Study 4: “The NowNet Arts Lab Ensemble”
The NowNet Arts Lab Ensemble (NNAL) was established in the spring of 2020 to explore the possibilities of telematic arts created by a large ensemble of audio and visual artist participants using residential-quality ISPs. The author has performed with the ensemble periodically since May 2020; this experience with the group introduced the author to JackTrip and to approaches for working with musicians in this project. NNAL changed its name to NowNet Art Hub Ensemble in the fall of 2021.
In spring 2021, NNAL transitioned to the new JackTrip “hub mode”. This mode establishes a hub and spoke network between the server and clients, rather than multiple peer-to-peer connections with each node. The server can be located on a cloud-based VM, selected for its location relative to the ensemble personnel to further minimize latency. The processing power of the VM can be scaled up or down as needed, depending on the size of the ensemble and the required number of audio channels. Clients connect to the server using either a command line or GUI, whatever they are comfortable with.
Mixing and recording the client signals can be done by installing an open-source DAW (e.g., Ardour) on the server VM and routing the incoming client signals to the DAW. Individual client cue mixes can be routed back via JackTrip. Experiments with recording using this method are now underway. This system can accommodate up to 25 different clients, and Lab Ensemble participants have performed from Asia, North America, South America, and Europe. A detailed description of this process is available at the Pretty Good JackTrip Toolkit website [
37].
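The sketch below illustrates, under stated assumptions, how the incoming hub-client signals might be patched to individual DAW tracks on the server VM using the Python "jack" package; the JackTrip and Ardour port names are placeholders, and the Pretty Good JackTrip Toolkit remains the authoritative description of this process.

import jack

client = jack.Client("hub_patchbay")
client.activate()

# Each connected performer is assumed to expose a receive port; the pattern
# and the Ardour input names below are hypothetical.
receive_ports = client.get_ports("receive_1", is_audio=True, is_output=True)
for i, port in enumerate(receive_ports, start=1):
    client.connect(port, f"ardour:Track {i}/audio_in 1")

client.deactivate()
client.close()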
As a lab for experimentation, recordings of NNAL’s work are generally not available to the public, and much work is ahead for this approach to recording arts. It is promising and may offer an efficient means of tracking raw audio, which is then downloaded for editing and mixing.
Figure 6 demonstrates the audio signal flow of one of the NNAL’s network music solutions.
4. Discussion
Noticeably, network music technology not only has become a crucial vehicle for music ensembles, bands, and institutions to remotely rehearse, record, perform, and produce concerts and albums during the pandemic, but it also fosters a distinctive type of collaboration and online music communities that are cross-cultural, serving as an alternative venue to live performance and/or traditional studio recording, not as a replacement [
38]. Its broad applications in real-life scenarios, such as online concerts, remote recording and audio production, and online music instruction, have been extended to reach diverse cultural and socioeconomic populations. In other words, this music technology provides opportunities to transcend geographical distance and deepen human connections between cultures and communities.
Many musicians and ensembles have recorded new albums using the proposed framework with success, such as [
39,
40]. These albums were released in 2022 and 2023 with New Focus Recordings and Scarp Records. The positive reception of work recorded in this manner suggests that this framework will continue to be viable as a network audio multitrack recording method, at least for some styles of music.
On the other hand, the potential of NNAL’s network audio solution lies in connecting large ensembles and/or more than 12 nodes of performers. The JackTrip development community’s work on improving the installation and operation experience has been exceptional since 2020, and the JackTrip user experience will likely continue to improve in the near future. Currently, the advantage of the framework using Netty McNetface is that most first-time participants enjoy relative ease in implementing the necessary technology and high musical satisfaction from the beginning. As Internet 2.0 technology rapidly develops, evolves, and improves, some of these barriers and challenges will undoubtedly be overcome in the next few years.
5. Conclusions
This research introduces an open-source, multi-channel network audio recording framework designed to facilitate remote collaboration, recording, and live performance for musicians, composers, and audio engineers. Through four detailed case studies, including a comparative analysis with an alternative framework, this study demonstrates the framework’s effectiveness and its potential to reshape virtual music production and performance.
The framework offers an accessible and efficient solution for multichannel recording, real-time DSP, and networked monitoring, addressing key technical and logistical challenges faced by music ensembles, bands, and institutions during and beyond the pandemic. Its success in real-world applications—such as remote album production and virtual ensemble performances—underscores the viability of network music technology as a multitrack recording method, particularly for specific musical genres and performance styles. Albums recorded with this framework, released on New Focus Recordings and Scarp Records, have been well received, highlighting the creative and technical quality achievable through this approach.
In addition to supporting professional production, the framework fosters a distinctive type of online music community and cross-cultural collaboration, bridging geographical divides and enabling meaningful connections among diverse cultural and socioeconomic groups. This technology provides an alternative venue for live performance and traditional studio recording, expanding access to artistic opportunities and community engagement.
Beyond auditory communication, integrating networked audio with visuals, multimedia, and/or haptic feedback mechanisms could significantly enhance multi-sensory immersion. For example, incorporating audio-driven haptic responses—such as vibrations triggered by specific frequencies—could deepen user engagement in interactive virtual environments. Similarly, synchronizing networked audio with real-time visual rendering in mixed-reality applications could lead to innovative artistic and educational experiences that bridge the gap between physical and virtual spaces.
Moving forward, further research is needed to optimize networked audio for integration with immersive digital platforms. Future developments could unlock new possibilities for artistic collaboration, remote learning, and interactive media by addressing technical constraints such as bandwidth limitations and real-time processing efficiency. As networked immersive systems advance, the convergence of audio, visual, and haptic modalities will be crucial in shaping the next generation of fully interactive and immersive experiences.
In summary, this study not only proves the efficacy of the current framework but also emphasizes the transformative impact of network music technology. As these technologies continue to develop, they will likely empower a broader range of musicians, educators, and institutions to embrace virtual collaboration, offering new ways to connect across distances and pushing the boundaries of musical expression and cultural exchange.