Enabling Software Technologies for Voice Command Processing on NXP Arm® Cortex® M-Based MCUs

SLN-ALEXA-IOT

MCU Minutes | Alango Voice Enhancement Package Running on i.MX RT600 Crossover MCUs

MCU Tech Minutes: Voice Controlled Audio Player Demo Using NXP’s Voice Intelligent Library (VIT) on i.MX RT600 MCUs

Voice Intelligent Technology (VIT)

Libraries and examples provided through MCUXpresso SDK

The NXP EdgeReady MCU-based solution for Amazon's Alexa Voice Service (AVS) integration for AWS IoT Core leverages the i.MX RT crossover MCU, enabling developers to quickly and easily add Alexa voice assistant capabilities to their products. This ultra-small form-factor, turnkey hardware design comes fully integrated with Amazon-qualified software for an out-of-the-box AVS experience, enabling voice control with a fast time to market and no prior knowledge of voice control required. Please follow the link above for more information.

NXP has developed Voice Intelligent Technology (VIT) to enable voice command recognition at no cost and without the need for lengthy and expensive training. Available partner solutions can combine voice recognition with audio playback, are compatible with MCUXpresso SDK, and come with demonstrations that enable in-depth evaluation for your design.

Turnkey AVS Solution

  • i.MX RT106A based – includes license to use all 3rd party software
  • Amazon AVS qualified
  • Supports up to 3 microphones
  • Wi-Fi® and Bluetooth® LE enabled
  • Machine Learning far field audio front end
  • No voice or audio expertise required

Turnkey Local Voice Solution

  • i.MX RT106L based – includes license to use all 3rd party software
  • Supports up to 3 microphones
  • Wi-Fi® and Bluetooth® LE enabled
  • Machine Learning far field audio front end
  • Industry leading phrase spotting speech recognition engine, supports up to 30 custom commands in more than 60 languages.
  • No voice or audio expertise required

Voice Intelligent Technology (VIT)

  • Always-on technology
  • Wake word model creation options:
    • Text2Model: no audio database required
    • Audio2Model: trained from database with audio files (higher performance, NRE required)
  • Custom commands using Text2Model
  • Large vocabulary available for Text2Model
  • Far-field audio front end supporting different microphone topologies (no tuning required), with up to 3 microphones supported
  • VAD (voice activity detection) to help minimize processing load during silent, non-speech periods
  • English and Chinese language support
  • i.MX RT500, i.MX RT600, i.MX RT1060 and i.MX RT1170 supported
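
The VAD point above can be illustrated with a small host-side sketch. This is not VIT's algorithm or API. It is a generic energy-threshold gate, and the frame length, the threshold, and the function names (`frame_energy_db`, `gate_frames`) are all hypothetical values chosen for illustration. It only shows the general idea that an inexpensive per-frame check can keep a costlier recognition stage idle during silence.

```python
import math

FRAME_LEN = 160        # hypothetical: 10 ms frames at a 16 kHz sample rate
THRESHOLD_DB = -40.0   # hypothetical energy threshold, in dB relative to full scale


def frame_energy_db(frame):
    """Return the mean energy of a frame of floats in [-1, 1], in dBFS."""
    mean_square = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(mean_square + 1e-12)  # epsilon avoids log10(0)


def gate_frames(samples, recognizer):
    """Call `recognizer` only on frames whose energy exceeds the threshold.

    Returns (frames_seen, frames_processed) so the saving is visible.
    """
    seen = processed = 0
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        seen += 1
        if frame_energy_db(frame) > THRESHOLD_DB:
            recognizer(frame)  # the expensive stage runs only on active frames
            processed += 1
    return seen, processed


# Example input: 3 silent frames followed by 2 frames of a 440 Hz tone.
silence = [0.0] * (3 * FRAME_LEN)
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(2 * FRAME_LEN)]
calls = []
print(gate_frames(silence + tone, calls.append))  # prints (5, 2)
```

In this toy input, only two of the five frames reach the stand-in recognizer. A production VAD would typically track the noise floor adaptively rather than use a fixed threshold.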

Partner Solutions

  • Advanced audio front ends and system design tools for high-performance pre-processing of far-field voice for voice recognition (DSP Concepts, Alango)
  • Solutions for human-to-human and human-to-machine communications (DSP Concepts, Alango)
  • Industry-leading phrase-spotting speech recognition engine that supports over 12 custom commands in many languages, with the option to create them via an online text2model tool (Sensory)

Partner voice processing solutions

NXP has partnered with leaders in the areas of speech recognition (e.g. Sensory) and far-field audio front ends (e.g. DSP Concepts and Alango) to enable a range of specialist, high-performance solutions on our range of Arm Cortex-M based microcontrollers.

Alango

Alango’s DSP/MCU sound processing software technologies improve the quality of voice communication and enhance the audio experience in automotive hands-free systems, Bluetooth communication headsets, smart speakers, mobile and cordless phones, high-end audio/video conferencing systems, intercom systems, laptops, office speakerphones, tablets, and assistive listening and hearing enhancement devices.

Voice products available from Alango for NXP Arm Cortex-M microcontrollers are shown below.

Alango partner profile

Alango product overview (platforms: i.MX RT600 with Arm® Cortex® M33 plus Cadence Xtensa HiFi4 DSP; i.MX RT 1xxx with Arm Cortex M7)

  • OnlyVoice: Alango’s advanced voice acquisition technology for true wireless (TWS) earphones, Bluetooth headphones and earbuds, and high-performance headsets.
  • Voice Activity Detection (VAD): Alango’s VAD technology reliably detects human speech in an acoustic signal. The technology is based on a proprietary, high-resolution spectral noise estimation algorithm operating in real time.
  • Voice Communication Package (VCP): a universal software package of digital signal processing technologies for voice applications, enabling high-quality, full-duplex, noise-free communication from various environments.
  • Voice Enhancement Package (VEP): a suite of real-time software DSP technologies designed for improving speech recognition performance in voice-controlled multimedia devices.

DSP Concepts

Create, tune, and productize audio features with Audio Weaver Designer, a low code, drag and drop real-time interface with live module inspectors and over 400 different audio building blocks. Deploy highly optimized audio to NXP's most popular embedded processors with AWE Core. Customize your own playback sound and combine it with TalkTo, an Audio Front End, to achieve the highest performing voice control system on the market. TalkTo detects and extracts faint voice commands in extremely noisy environments and passes AVS 2.1 premium and Google ART.

Products available from DSP Concepts for NXP Arm Cortex-M microcontrollers are shown below.

DSP Concepts partner profile

DSP Concepts product overview (platform: i.MX RT600 with Arm Cortex M33 plus Cadence Xtensa HiFi4 DSP)

  • Audio Weaver Designer: a low-code, drag-and-drop real-time interface with live module inspectors and over 400 different audio building blocks to integrate or create advanced audio features quickly.
  • TalkTo: an Audio Front End described as the highest performing voice control system on the market. TalkTo detects and extracts faint voice commands in extremely noisy environments and passes AVS 2.1 premium and Google ART.

Sensory

Sensory’s TrulyHandsfree wake word and phrase spotting technology is known for fast response, low power consumption, and excellent performance from a distance or in noisy environments. This technology is an integral component for fully featured voice control of devices in the home, car, and anywhere voice user interfaces could be deployed. Sensory’s technology is complementary to front-end processing solutions from other partners, such as DSP Concepts. It is available for Arm Cortex M4/M33/M7 cores and also for Cadence Xtensa DSP cores. TrulyHandsfree is compatible with the Sensory VoiceHub, enabling developers to quickly build models for custom commands and wake words with text input.

Sensory partner profile