Article

Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations
Laurent Benaroya, Nicolas Obin and Axel Roebel *

Analysis/Synthesis Team—STMS, IRCAM, Sorbonne University, CNRS, French Ministry of Culture, 75004 Paris, France
* Author to whom correspondence should be addressed.
Entropy 2023, 25(2), 375; https://doi.org/10.3390/e25020375
Submission received: 1 January 2023 / Revised: 13 February 2023 / Accepted: 17 February 2023 / Published: 18 February 2023
(This article belongs to the Special Issue Information-Theoretic Approaches in Speech Processing and Recognition)

Abstract

Voice conversion (VC) consists of digitally altering the voice of an individual to manipulate part of its content, primarily its identity, while leaving the rest unchanged. Research in neural VC has achieved considerable breakthroughs, making it possible to falsify a voice identity from a small amount of data with highly realistic rendering. This paper goes beyond voice identity manipulation and presents an original neural architecture for manipulating voice attributes (e.g., gender and age). The proposed architecture is inspired by the fader network, transferring the same ideas to voice manipulation. The information conveyed by the speech signal is disentangled into interpretable voice attributes by minimizing an adversarial loss that makes the encoded information mutually independent while preserving the capacity to generate a speech signal from the disentangled codes. During inference, the disentangled voice attributes can be manipulated and the speech signal generated accordingly. For experimental evaluation, the proposed method is applied to the task of voice gender conversion on the freely available VCTK dataset. Quantitative measurements of the mutual information between the speaker identity and speaker gender variables show that the proposed architecture learns a gender-independent representation of speakers. Additional speaker recognition measurements indicate that speaker identity can still be recognized accurately from this gender-independent representation. Finally, a subjective experiment on voice gender manipulation shows that the proposed architecture converts voice gender with very high efficiency and good naturalness.
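The gender-independence claim above rests on estimating the mutual information between the discrete speaker-identity and speaker-gender variables. As a minimal, self-contained sketch of that kind of measurement (a plug-in estimator over paired label samples; the speaker and gender labels below are illustrative toy data, not drawn from the paper's VCTK experiments):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    px = Counter(xs)            # empirical marginal of X
    py = Counter(ys)            # empirical marginal of Y
    pxy = Counter(zip(xs, ys))  # empirical joint of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# Toy example: each hypothetical speaker has a fixed gender, so gender is
# fully determined by speaker identity and the MI equals H(gender) = 1 bit.
speakers = ["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"]
genders  = ["F",  "F",  "M",  "M",  "F",  "F",  "M",  "M"]
print(mutual_information(speakers, genders))  # → 1.0
```

For fully dependent labels the estimate equals the gender entropy (1 bit here), while for a gender-independent speaker code it approaches zero, which is the direction the paper's quantitative measurements verify for the learned representation.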
Keywords: voice conversion; attribute manipulation; representation learning; information disentanglement; adversarial learning; cross-entropy

Share and Cite

MDPI and ACS Style

Benaroya, L.; Obin, N.; Roebel, A. Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations. Entropy 2023, 25, 375. https://doi.org/10.3390/e25020375


