Mar 28, 2022, 1:55 PM PST
In this session, we will look at a low-power, real-time, embedded mask-based beamformer for voice UI systems. Our solution is designed to improve wake word and voice command trigger rates in real-life noisy scenarios and does not require any cloud interaction. The voice UI system is built from a denoising audio front-end, a wake word engine, and a voice command engine. Such a system must satisfy both low-power and high-performance requirements; in particular, real-time processing and noise robustness are the most challenging issues. To meet these challenges, our solution is designed for embedded systems and is hybrid: a neural network feeds a processing algorithm based on the multichannel Wiener filter (MWF). The 18k-parameter network is quantized to 16 bits and runs efficiently at 12 MHz on an RT1060 MCU. In a 3-microphone configuration, the complete speech enhancement solution runs on average at 160 MHz on the Arm Cortex-M7 device and yields a 40% hit-rate improvement.
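The abstract does not specify how the network's output drives the MWF. One common mask-based formulation, which the sketch below assumes, works per frequency bin. The network's speech-presence mask weights each frame when estimating the speech spatial covariance Φ_ss, and the complementary weight (1 − mask) is used to estimate the noise covariance Φ_nn. The filter for a reference microphone is then w = (Φ_ss + Φ_nn)⁻¹ Φ_ss e_ref, and the enhanced output is ŝ_t = wᴴ y_t. Everything here is hypothetical and for illustration only: the function names, the diagonal-loading term, and the choice of Φ_ss + Φ_nn as the mixture covariance. It is a floating-point, single-bin sketch in pure Python, not the embedded RT1060 implementation.

```python
"""Hypothetical sketch of a mask-based multichannel Wiener filter (MWF) for a
single frequency bin. The mask is assumed to come from a neural network; here
it is simply passed in as per-frame speech-presence values in [0, 1]."""

from typing import List

Vector = List[complex]
Matrix = List[List[complex]]


def weighted_covariance(frames: List[Vector], weights: List[float]) -> Matrix:
    """Spatial covariance estimate: sum_t w_t * y_t y_t^H / sum_t w_t."""
    m = len(frames[0])
    cov = [[0j] * m for _ in range(m)]
    for y, w in zip(frames, weights):
        for i in range(m):
            for j in range(m):
                cov[i][j] += w * y[i] * y[j].conjugate()
    norm = max(sum(weights), 1e-12)  # guard against an empty class
    return [[c / norm for c in row] for row in cov]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def mwf_weights(frames: List[Vector], mask: List[float],
                ref: int = 0, loading: float = 1e-6) -> Vector:
    """w = (Phi_ss + Phi_nn + loading * I)^{-1} Phi_ss e_ref (assumed form)."""
    phi_ss = weighted_covariance(frames, mask)
    phi_nn = weighted_covariance(frames, [1.0 - m for m in mask])
    m = len(frames[0])
    # Diagonal loading keeps the system well conditioned.
    phi_yy = [[phi_ss[i][j] + phi_nn[i][j] + (loading if i == j else 0.0)
               for j in range(m)] for i in range(m)]
    rhs = [phi_ss[i][ref] for i in range(m)]  # column ref of Phi_ss
    return solve(phi_yy, rhs)


def apply_beamformer(w: Vector, frames: List[Vector]) -> Vector:
    """Enhanced single-channel output s_hat_t = w^H y_t."""
    return [sum(wi.conjugate() * yi for wi, yi in zip(w, y)) for y in frames]
```

As a usage example, consider two microphones where speech arrives along [1, 1] and noise along the orthogonal direction [1, −1]. With a mask that marks the speech frames as 1 and the noise frames as 0, the resulting filter passes the reference-microphone speech almost unchanged and cancels the noise frames. In practice the mask is soft and the two covariances overlap, so the suppression is partial rather than exact.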