The past decade has witnessed substantial growth of data-driven speech
enhancement (SE) techniques thanks to deep learning. While existing approaches
have shown impressive performance on common datasets, most of them are
designed for only a single condition (e.g., single-channel, multi-channel, or a
fixed sampling frequency) or consider only a single task (e.g., denoising or
dereverberation). Currently, there is no universal SE approach that can
effectively handle diverse input conditions with a single model. In this paper,
we make the first attempt to investigate this line of research. First, we
devise a single SE model that is independent of the number of microphone
channels, the signal length, and the sampling frequency. Second, we design a
universal SE benchmark
by combining existing public corpora with multiple conditions. Our experiments
on a wide range of datasets show that the proposed single model can
successfully handle diverse conditions with strong performance.