Mel frequency spectral domain defenses against adversarial attacks on speech recognition systems

JASA Express Lett. 2023 Mar;3(3):035208. doi: 10.1121/10.0017680.

Abstract

Automatic speech recognition (ASR) systems are vulnerable to adversarial attacks because they rely on machine learning models. Many defenses proposed for ASR systems simply adapt approaches developed for the image domain. This paper explores speech-specific defenses in the feature domain and introduces a defense method called mel domain noise flooding (MDNF). MDNF adds noise to the mel spectrogram representation of the speech before re-synthesizing the audio signal that is input to the ASR system. The defense is evaluated against strong white-box threat models and shows competitive robustness.
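
The MDNF pipeline described in the abstract (mel spectrogram, additive noise, re-synthesis) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the noise distribution, the noise level, or the re-synthesis method. The Gaussian noise, the `snr_db` parameter, the pseudo-inverse mel inversion, the reuse of the original STFT phase, and all function names below are assumptions made for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def stft(x, n_fft, hop):
    """Hann-windowed STFT without padding, shape (bins, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def istft(spec, n_fft, hop):
    """Weighted overlap-add inverse of stft() above."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec.T, n=n_fft, axis=1) * window
    out = np.zeros(n_fft + hop * (frames.shape[0] - 1))
    norm = np.zeros_like(out)
    for t in range(frames.shape[0]):
        out[t * hop : t * hop + n_fft] += frames[t]
        norm[t * hop : t * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def mel_domain_noise_flooding(x, sr=16000, n_fft=512, hop=128,
                              n_mels=40, snr_db=20.0, seed=0):
    """Add noise in the mel domain, then re-synthesize a waveform."""
    rng = np.random.default_rng(seed)
    spec = stft(x, n_fft, hop)
    mag, phase = np.abs(spec), np.angle(spec)
    fb = mel_filterbank(sr, n_fft, n_mels)
    mel = fb @ mag                                   # mel magnitude spectrogram
    # Assumption: zero-mean Gaussian noise scaled to a target SNR in the mel domain.
    noise_scale = np.sqrt(np.mean(mel ** 2) / 10.0 ** (snr_db / 10.0))
    noisy_mel = np.maximum(mel + noise_scale * rng.standard_normal(mel.shape), 0.0)
    # Assumption: approximate mel inversion via the filterbank pseudo-inverse,
    # combined with the phase of the original signal.
    mag_hat = np.maximum(np.linalg.pinv(fb) @ noisy_mel, 0.0)
    return istft(mag_hat * np.exp(1j * phase), n_fft, hop)

if __name__ == "__main__":
    t = np.arange(16000) / 16000.0
    clean = np.sin(2 * np.pi * 440.0 * t)            # stand-in for a speech signal
    defended = mel_domain_noise_flooding(clean, snr_db=20.0)
    print(defended.shape)
```

In a defense setting, the re-synthesized waveform returned by `mel_domain_noise_flooding` would be passed to the ASR system in place of the raw input. A lower `snr_db` floods the representation with more noise, which in this sketch trades reconstruction fidelity for a stronger perturbation of the input.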

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Machine Learning
  • Noise / adverse effects
  • Speech Perception*
  • Speech Recognition Software
  • Speech*