Spatially Informed Neural Beamforming with Distributed Microphone Arrays

Robust real-time audio signal enhancement increasingly relies on multichannel microphone arrays for signal acquisition, and sophisticated beamforming algorithms have been developed to maximize the benefit of multiple microphones. Despite the recent success of deep learning models for audio signal processing, Neural Beamforming remains an open research topic. This project investigates a Neural Beamformer architecture capable of performing spatial beamforming with microphones randomly distributed over very large areas, even in environments with negative signal-to-noise ratios, multiple noise sources, and reverberation. The proposed method combines adaptive, nonlinear filtering and the computation of spatial relations with state-of-the-art mask estimation networks. The resulting end-to-end network architecture is fully differentiable and provides excellent signal separation performance. Built from a small number of principal building blocks, the method is capable of low-latency, domain-specific beamforming even in challenging environments.

Audio Examples


This collection of listening material concentrates on the evaluation of the sound characteristics produced by the Neural Beamformer. For a range of scenes generated from previously unseen speech and noise data, all input channels and processed outputs are provided. Additionally, a mixture of reference input and output is given. This mixture provides some insight into the phase coherence between input and model output.
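Why such a mixture hints at phase coherence can be illustrated with a weighted sum. The following is a minimal sketch, not the project's actual processing: it assumes the 3:1 ratio stated for the 75% mix signal, and the function name `mix_with_reference` as well as the toy signals are hypothetical. If the model output is phase-aligned with the reference input, the target component adds constructively in the mixture; if it is not, the target partially cancels, which is audible as coloration or level loss.

```python
import numpy as np

def mix_with_reference(model_output, reference, output_share=0.75):
    """Blend a model output with the reference input channel.

    output_share=0.75 corresponds to a 3:1 ratio (assumed from the
    description of the 75% mix signal).
    """
    model_output = np.asarray(model_output, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n = min(len(model_output), len(reference))  # guard against length mismatch
    return output_share * model_output[:n] + (1.0 - output_share) * reference[:n]

# Toy scene (hypothetical): a 440 Hz tone stands in for the target speech,
# and the reference channel is that tone plus white noise.
fs = 16000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)
noise = 0.5 * rng.standard_normal(fs)
reference = target + noise

# A phase-coherent "model output" versus a polarity-inverted one.
coherent_mix = mix_with_reference(target, reference)
inverted_mix = mix_with_reference(-target, reference)

def rms(x):
    return float(np.sqrt(np.mean(x ** 2)))

print("RMS of coherent mix:", rms(coherent_mix))
print("RMS of inverted mix:", rms(inverted_mix))
```

In the coherent case the mixture equals the target plus a quarter of the noise, while the inverted case retains only half of the target amplitude, with flipped sign.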

The supplied signals are listed below (not all signals are present in every scenario):

Aux channel x: all input channels to the model
Reference: channel input to the model defined as Ref
Single-channel DPRNN: output of a Dual-Path RNN applied to the reference channel
NBF: Neural Beamformer output
NBF-DPRNN: output of the combination model containing the Neural Beamformer and DPRNN
GCC-PHAT DS BF: output of a GCC-PHAT-based delay-and-sum beamforming algorithm
75% mix: model output, mixed at a 3:1 ratio with the reference input channel
Target: target output, used for SDRi (signal-to-distortion ratio improvement) computation
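
For orientation, the GCC-PHAT-based delay-and-sum baseline named in the list can be sketched as follows. This is a generic textbook version under simplifying assumptions, not the implementation used to produce the audio examples: delays are rounded to whole samples, alignment uses a circular shift, and the names `gcc_phat_delay`, `delay_and_sum`, and `max_lag` are hypothetical.

```python
import numpy as np

def gcc_phat_delay(sig, ref, max_lag):
    """Estimate the delay of `sig` relative to `ref` in samples (max_lag >= 1)."""
    n = len(sig) + len(ref)                  # zero-pad to avoid circular wrap
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)               # cross-power spectrum
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that the lags run from -max_lag to +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag

def delay_and_sum(channels, ref_index=0, max_lag=64):
    """Align every channel to the reference channel and average them."""
    channels = np.asarray(channels, dtype=float)
    ref = channels[ref_index]
    out = np.zeros_like(ref)
    for ch in channels:
        delay = gcc_phat_delay(ch, ref, max_lag)
        out += np.roll(ch, -delay)           # circular shift: sketch only
    return out / len(channels)

# Toy example (hypothetical): three channels holding the same noise-like
# source with delays of 0, +5 and -3 samples relative to the reference.
rng = np.random.default_rng(1)
source = rng.standard_normal(4096)
channels = np.stack([source, np.roll(source, 5), np.roll(source, -3)])

estimated = [gcc_phat_delay(ch, channels[0], 64) for ch in channels]
beamformed = delay_and_sum(channels)
print("estimated delays:", estimated)
```

With microphones spread over large areas, the inter-channel delays can span many samples, so `max_lag` would have to be chosen according to the largest expected distance and the sampling rate.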