Enabling Software Technologies for Voice Command Processing on NXP Arm® Cortex® M-Based MCUs

Overview

Features

SLN-ALEXA-IOT

MCU Minutes | Alango Voice Enhancement Package Running on i.MX RT600 Crossover MCUs

MCU Tech Minutes: Voice Controlled Audio Player Demo Using NXP’s Voice Intelligent Library (VIT) on i.MX RT600 MCUs

Libraries and examples provided through MCUXpresso SDK

The NXP EdgeReady MCU-based solution for Amazon's Alexa Voice Service (AVS) integration for AWS IoT Core leverages the i.MX RT crossover MCU, enabling developers to quickly and easily add Alexa voice assistant capabilities to their products. This ultra-small form-factor, turnkey hardware design comes fully integrated with Amazon-qualified software for an out-of-the-box AVS experience, enabling voice control with the fastest time to market and no prior knowledge of voice control required. Follow the link above for more information.

NXP has developed Voice Intelligent Technology (VIT) to enable voice command recognition at no cost and without the need for lengthy and expensive training. Available partner solutions can combine voice recognition with audio playback, are compatible with MCUXpresso SDK, and come with demonstrations that enable in-depth evaluation for your design.

Turnkey AVS solution

  • i.MX RT106A based – includes license to use all third-party software
  • Amazon AVS qualified
  • Supports up to 3 microphones
  • Wi-Fi® and Bluetooth® LE enabled
  • Machine learning far-field audio front end
  • No voice or audio expertise required

Turnkey Local Voice Solution

  • i.MX RT106L based – includes license to use all third-party software
  • Supports up to 3 microphones
  • Wi-Fi® and Bluetooth® LE enabled
  • Machine learning far-field audio front end
  • Industry-leading phrase-spotting speech recognition engine that supports up to 30 custom commands in more than 60 languages
  • No voice or audio expertise required

VIT

  • Custom commands in English created via the Text2Model tool
  • Custom trigger word option
  • Low-latency detection (<200 ms)
  • Wake Word + Voice Command ready on i.MX RT1060 and i.MX RT600

Partner solutions

  • Advanced audio front ends and system design tools for high-performance pre-processing of far-field voice ahead of voice recognition (DSP Concepts, Alango)
  • Solutions for human-to-human and human-to-machine communications (DSP Concepts, Alango)
  • Industry-leading phrase-spotting speech recognition engine that supports over 12 custom commands in many languages, with the option to create them via an online text2model tool (Sensory)

Voice Intelligent Technology (VIT)

VIT is based on state-of-the-art deep learning and speech recognition technologies and has been developed by NXP as a complete Wake Word / Voice Commands solution. VIT is available royalty-free on supported NXP devices in MCUXpresso SDK, and currently supports the English language. VIT features include:

  • Wake Word Engine (WWE), which is trained on recorded trigger word files. Data augmentation techniques are used during the training phase to introduce variability into the dataset. A neural network classifier determines whether the extracted sequence of phonemes corresponds to the targeted keyword.
  • Voice Command Engine (VCE), which does not require an audio dataset. The targeted voice commands are converted into word symbol sequences in an offline process, and at runtime the VCE determines the likelihood that the extracted sequence of phonemes corresponds to a particular word symbol sequence (and hence a command). One model supports up to ~30 voice commands from a large vocabulary choice.
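
The idea of scoring an extracted phoneme sequence against stored command sequences can be illustrated with a toy example. The sketch below is not VIT's algorithm or API; it assumes a hypothetical command table with illustrative phoneme spellings and substitutes a simple edit-distance similarity for the likelihood that a real engine would compute. The names `COMMANDS`, `similarity`, and `best_command`, and the 0.6 threshold, are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical command table: each command is stored as a phoneme sequence
# that would be produced offline. The phoneme spellings are illustrative only.
COMMANDS: Dict[str, List[str]] = {
    "PLAY": ["P", "L", "EY"],
    "STOP": ["S", "T", "AA", "P"],
    "NEXT": ["N", "EH", "K", "S", "T"],
}


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # drop a phoneme from a
                            curr[j - 1] + 1,      # insert a phoneme from b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(heard: List[str], target: List[str]) -> float:
    """Stand-in for a likelihood: 1.0 is an exact match, 0.0 is no overlap."""
    longest = max(len(heard), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(heard, target) / longest


def best_command(heard: List[str],
                 threshold: float = 0.6) -> Optional[Tuple[str, float]]:
    """Return the best-scoring command, or None if nothing is close enough."""
    name, phonemes = max(COMMANDS.items(),
                         key=lambda item: similarity(heard, item[1]))
    score = similarity(heard, phonemes)
    return (name, score) if score >= threshold else None
```

Calling `best_command(["S", "T", "AO", "P"])` selects "STOP" despite one substituted phoneme, while an unrelated sequence falls below the threshold and yields `None`. The threshold plays the role of a rejection criterion so that out-of-vocabulary speech does not trigger a command.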

Platforms supported by VIT are shown below:

Device Family | Core | Co-Processor | Recommended Evaluation Board(s)
i.MX RT600 | Cortex-M33 | Cadence® Tensilica® HiFi 4 audio DSP | MIMXRT685-EVK
i.MX RT1060 | Cortex-M7 | - | MIMXRT1060-EVK

Partner voice processing solutions

NXP has partnered with leaders in the areas of speech recognition (e.g. Sensory) and far-field audio front ends (e.g. DSP Concepts and Alango) to enable a range of specialist, high-performance solutions on our range of Arm Cortex-M based microcontrollers.

Alango

Alango’s DSP/MCU sound processing software technologies improve the quality of voice communication and enhance the audio experience in Automotive hands-free systems, Bluetooth communication headsets, Smart speakers, Mobile and Cordless phones, High-end audio/video conferencing systems, Intercom systems, Laptops, Office speakerphones, Tablets, Assistive listening and Hearing enhancement devices.

Voice products available from Alango for NXP Arm Cortex-M microcontrollers are shown below. Alango products also support several of NXP’s i.MX processors.

Alango product | Overview | i.MX RT600 (Arm® Cortex®-M33 plus Cadence Xtensa HiFi 4 DSP) | i.MX RT1xxx (Arm Cortex-M7)
OnlyVoice | OnlyVoice is Alango’s advanced voice acquisition technology for true wireless (TWS) earphones, Bluetooth headphones and earbuds, and high-performance headsets.
Voice Activity Detection (VAD) | Alango’s Voice Activity Detection (VAD) technology reliably detects human speech in an acoustic signal. The technology is based on a proprietary, high-resolution spectral noise estimation algorithm operating in real time.
Voice Communication Package (VCP) | Voice Communication Package (VCP) is a universal software package of digital signal processing technologies for voice applications enabling high quality, full duplex, and noise free communication from various environments.
Voice Enhancement Package (VEP) | Voice Enhancement Package (VEP) is a suite of real-time software DSP technologies designed for improving speech recognition performance in voice-controlled multimedia devices.
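
For readers unfamiliar with voice activity detection, the general principle of comparing a signal against a running noise estimate can be sketched in a few lines. This is not Alango's proprietary spectral algorithm; it is a much simpler broadband energy detector, shown only to illustrate the concept. The function names, the 10 dB margin, the smoothing factor, and the assumption that the first frame contains only background noise are all hypothetical.

```python
import math
from typing import List, Sequence


def frame_energy_db(frame: Sequence[float]) -> float:
    """Mean-square energy of one audio frame, in dB (floor avoids log(0))."""
    if not frame:
        return -120.0
    mean_square = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(mean_square + 1e-12)


def detect_speech(frames: Sequence[Sequence[float]],
                  margin_db: float = 10.0,
                  alpha: float = 0.95) -> List[bool]:
    """Flag frames whose energy exceeds the tracked noise floor by margin_db.

    Assumption: the first frame contains only background noise and is used
    to seed the noise-floor estimate.
    """
    decisions: List[bool] = []
    noise_db = None
    for frame in frames:
        energy_db = frame_energy_db(frame)
        if noise_db is None:
            noise_db = energy_db
        is_speech = energy_db > noise_db + margin_db
        if not is_speech:
            # Update the noise floor only on frames judged to be noise,
            # so that speech does not pull the estimate upwards.
            noise_db = alpha * noise_db + (1.0 - alpha) * energy_db
        decisions.append(is_speech)
    return decisions
```

With 160-sample frames (10 ms at an assumed 16 kHz sample rate), a sequence of quiet, quiet, loud, loud, quiet frames is classified as noise, noise, speech, speech, noise. A spectral approach such as the one described above would track noise per frequency band rather than over the whole signal, which is what makes it more robust to non-stationary noise.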

DSP Concepts

Create, tune, and productize audio features with Audio Weaver Designer, a low-code, drag-and-drop real-time interface with live module inspectors and over 400 different audio building blocks. Deploy highly optimized audio to NXP's most popular embedded processors with AWE Core. Customize your own playback sound and combine it with TalkTo, an Audio Front End, to achieve the highest performing voice control system on the market. TalkTo detects and extracts faint voice commands in extremely noisy environments and passes AVS 2.1 premium and Google ART.

Products available from DSP Concepts for NXP Arm Cortex-M microcontrollers are shown below. DSP Concepts products also support several of NXP’s i.MX processors.

DSP Concepts product | Overview | i.MX RT600 (Arm Cortex-M33 plus Cadence Xtensa HiFi 4 DSP)
Audio Weaver Designer | A low-code, drag-and-drop real-time interface with live module inspectors and over 400 different audio building blocks to integrate or create advanced audio features quickly.
TalkTo | TalkTo, an Audio Front End, is the highest performing voice control system on the market. TalkTo detects and extracts faint voice commands in extremely noisy environments and passes AVS 2.1 premium and Google ART.

Sensory

Sensory’s TrulyHandsfree wake word and phrase-spotting technology is known for fast response, low power consumption, and excellent performance from a distance or in noisy environments. This technology is an integral component for fully featured voice control of devices in the home, in the car, and anywhere else voice user interfaces could be deployed. Sensory’s technology is complementary to front-end processing solutions from other partners, such as DSP Concepts. It is available for Arm Cortex-M4/M33/M7 cores and also for Cadence Xtensa DSP cores. TrulyHandsfree is compatible with the Sensory VoiceHub, enabling developers to quickly build models for custom commands and wake words from text input.

Sensory also supports several of NXP’s other products, including i.MX processors.