Acoustic space learning for sound-source separation and localization on binaural manifolds

Antoine Deleforge; Florence Forbes; Radu Horaud

doi:10.1142/S0129065714400036

Int J Neural Syst. 2015 Feb;25(1):1440003. doi: 10.1142/S0129065714400036.

Authors

Antoine Deleforge¹, Florence Forbes, Radu Horaud

Affiliation

¹ INRIA Grenoble Rhône-Alpes, 655 Avenue de l'Europe, Saint-Ismier, 38334, France.

PMID: 25164245

Abstract

In this paper, we address the problems of modeling the acoustic space generated by a full-spectrum sound source and using the learned model for the localization and separation of multiple sources that simultaneously emit sparse-spectrum sounds. We lay theoretical and methodological grounds in order to introduce the binaural manifold paradigm. We perform an in-depth study of the latent low-dimensional structure of the high-dimensional interaural spectral data, based on a corpus recorded with a human-like audiomotor robot head. A nonlinear dimensionality reduction technique is used to show that these data lie on a two-dimensional (2D) smooth manifold parameterized by the motor states of the listener, or equivalently, the sound-source directions. We propose a probabilistic piecewise affine mapping model (PPAM) specifically designed to deal with high-dimensional data exhibiting an intrinsic piecewise linear structure. We derive a closed-form expectation-maximization (EM) procedure for estimating the model parameters, followed by Bayes inversion for obtaining the full posterior density function of a sound-source direction. We extend this solution to deal with missing data and redundancy in real-world spectrograms, and hence for 2D localization of natural sound sources such as speech. We further generalize the model to the challenging case of multiple sound sources and we propose a variational EM framework. The associated algorithm, referred to as variational EM for source separation and localization (VESSL), yields a Bayesian estimation of the 2D locations and time-frequency masks of all the sources. Comparisons of the proposed approach with several existing methods reveal that the combination of acoustic-space learning with Bayesian inference enables our method to outperform state-of-the-art methods.
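
The Bayes inversion step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' PPAM implementation. It assumes a generic mixture of affine regressors in which, for component k, the low-dimensional direction x is Gaussian with mean c[k] and covariance Gamma[k], and the high-dimensional observation y given x is Gaussian with mean A[k] x + b[k] and isotropic variance sigma2[k]. All function and parameter names are hypothetical, the parameters are taken as already estimated (for example by EM), and the demo data are synthetic.

```python
import numpy as np

def invert_piecewise_affine(y, pi, c, Gamma, A, b, sigma2):
    """Posterior mean of x given y, plus the posterior component weights.

    Assumed model (illustrative, not taken from the paper's code):
      x | k     ~ N(c[k], Gamma[k])
      y | x, k  ~ N(A[k] @ x + b[k], sigma2[k] * I)
    """
    K, D, L = A.shape
    log_w = np.empty(K)
    means = np.empty((K, L))
    for k in range(K):
        # Marginal density of y under component k (constant terms dropped).
        mu_y = A[k] @ c[k] + b[k]
        cov_y = sigma2[k] * np.eye(D) + A[k] @ Gamma[k] @ A[k].T
        r = y - mu_y
        _, logdet = np.linalg.slogdet(cov_y)
        log_w[k] = np.log(pi[k]) - 0.5 * (logdet + r @ np.linalg.solve(cov_y, r))
        # Gaussian posterior of x under component k (linear-Gaussian update).
        Gamma_inv = np.linalg.inv(Gamma[k])
        S = np.linalg.inv(Gamma_inv + A[k].T @ A[k] / sigma2[k])
        means[k] = S @ (A[k].T @ (y - b[k]) / sigma2[k] + Gamma_inv @ c[k])
    # Normalize the weights in a numerically stable way.
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return w @ means, w

# Synthetic demo: two affine pieces mapping a 2D direction to 10D features.
rng = np.random.default_rng(0)
K, D, L = 2, 10, 2
A = rng.normal(size=(K, D, L))
b = rng.normal(size=(K, D))
c = np.array([[-1.0, 0.0], [1.0, 0.0]])
Gamma = np.stack([np.eye(L)] * K)
pi = np.full(K, 0.5)
sigma2 = np.full(K, 1e-4)

x_true = np.array([-0.8, 0.3])
y = A[0] @ x_true + b[0]  # noiseless observation generated by the first piece
x_hat, weights = invert_piecewise_affine(y, pi, c, Gamma, A, b, sigma2)
```

The posterior over x is a Gaussian mixture; the sketch returns only its mean. The full density described in the abstract would also keep the per-component covariances.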

Keywords: Binaural hearing; EM inference; manifold learning; mixture of regressors; sound localization; sound-source separation.

MeSH terms

Acoustics*
Bayes Theorem
Cues
Humans
Learning / physiology*
Models, Theoretical*
Principal Component Analysis
Signal Processing, Computer-Assisted
Sound Localization*
Spectrum Analysis