Robust Voice Activity Detection for Interview Speech in NIST Speaker Recognition Evaluation
The introduction of interview speech in recent NIST Speaker Recognition Evaluations (SREs) has necessitated the development of robust voice activity detectors (VADs) that can operate at very low signal-to-noise ratios. This paper highlights the characteristics of interview speech files in NIST SREs and discusses the difficulties of detecting speech/non-speech segments in these files. To alleviate […]
Energy-Based Multi-Speaker Voice Activity Detection with an Ad Hoc Microphone Array
In this paper, we propose an energy-based technique to track the power of multiple simultaneous speakers using an ad hoc microphone array with unknown microphone positions. By considering the short-term power of the microphone signals, the problem can be converted into a non-negative blind source separation (NBSS) problem. By exploiting the prior knowledge that the […]
Voice Activity Detection Based on Statistical Models and Machine Learning Approaches
Voice activity detectors (VADs) based on statistical models have shown impressive performance, especially when fairly precise statistical models are employed. Moreover, the accuracy of a VAD that uses statistical models can be significantly improved when machine-learning techniques are adopted to provide prior knowledge of speech characteristics. In the first part of this paper, we introduce […]