Enabling Software Technologies for Voice Command Processing on NXP Arm® Cortex® M-Based MCUs





  • Complete, turn-key reference designs
  • Fully functional example applications make evaluation fast and easy
  • Solutions from leading front end and voice recognition specialist partners
  • ML enablement for customers wishing to develop in-house solutions
  • Support for several i.MX RT processors

Libraries and examples provided through MCUXpresso SDK

The NXP EdgeReady MCU-based solution for Amazon's Alexa Voice Service (AVS) integration for AWS IoT Core uses the i.MX RT crossover MCU, letting developers quickly and easily add Alexa voice assistant capabilities to their products. This ultra-small form-factor, turnkey hardware design comes fully integrated with Amazon-qualified software for an out-of-the-box AVS experience, giving the fastest time to market for voice control with no prior voice control knowledge required.

NXP has developed Voice Intelligent Technology (VIT) to enable voice command recognition at no cost and without the need for lengthy and expensive training. Premium or partner solutions are also available: they can combine voice recognition with audio playback, are compatible with the MCUXpresso SDK, and come with demonstrations that enable in-depth evaluation for your design.

Turnkey AVS Solution

  • i.MX RT106A-based, includes license to use all third-party software
  • Amazon AVS qualified
  • Supports up to 3 microphones
  • Wi-Fi® and Bluetooth® LE enabled
  • Machine learning far-field audio front end
  • No voice or audio expertise required

Turnkey Local Voice Solution

  • i.MX RT106L-based, includes license to use all third-party software
  • Supports up to 3 microphones
  • Wi-Fi® and Bluetooth® LE enabled
  • Machine learning far-field audio front end
  • Industry-leading phrase-spotting speech recognition engine supporting up to 30 custom commands in more than 60 languages
  • No voice or audio expertise required

Voice Intelligent Technology (VIT)

  • Always-on technology
  • Wake word model creation with Text to Model (no audio database required)
  • Up to 3 wake words supported in parallel
  • Custom commands using Text to Model
  • Large vocabulary available for Text to Model
  • Far-field audio front end supporting different microphone topologies (no tuning required), with up to 3 microphones supported
  • VAD (voice activity detection) to minimize processing load during silent, non-speech periods
  • English and Mandarin language support
  • German, Spanish and Japanese available on request (local-commands@nxp.com)
  • VIT model generation tool available online at vit.nxp.com
  • i.MX RT500, i.MX RT600, i.MX RT1050, i.MX RT1060, i.MX RT1160 and i.MX RT1170 supported
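
The VAD item in the list above can be illustrated with a generic sketch. The source does not describe how VIT implements voice activity detection, so the code below is not VIT's algorithm or API. It is a minimal energy-threshold detector, assuming 16-bit PCM frames, and every name and constant in it (`vad_state_t`, `frame_energy`, `vad_is_speech`, `VAD_ENERGY_THRESHOLD`, `VAD_HANGOVER_FRAMES`) is hypothetical. It only shows the general idea of skipping heavier processing while the input is silent.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning values, chosen for illustration only. */
#define VAD_ENERGY_THRESHOLD 1000000ULL /* mean-square energy that counts as speech */
#define VAD_HANGOVER_FRAMES  5u         /* frames still reported as speech after energy drops */

/* Hypothetical detector state; not part of any NXP library. */
typedef struct {
    uint32_t hangover; /* remaining hangover frames */
} vad_state_t;

/* Mean-square energy of one frame of 16-bit PCM samples. */
static uint64_t frame_energy(const int16_t *samples, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t s = samples[i];
        sum += (uint64_t)(s * s); /* |s| <= 32768, so s * s fits in int32_t */
    }
    return n ? sum / n : 0;
}

/* Returns true while the frame (or a recent frame) looks like speech.
 * A caller would run its recognizer only when this returns true. */
static bool vad_is_speech(vad_state_t *st, const int16_t *samples, size_t n)
{
    if (frame_energy(samples, n) >= VAD_ENERGY_THRESHOLD) {
        st->hangover = VAD_HANGOVER_FRAMES; /* re-arm the hangover counter */
        return true;
    }
    if (st->hangover > 0) {
        st->hangover--; /* bridge short pauses between words */
        return true;
    }
    return false;
}
```

In a design like this, the audio loop would call `vad_is_speech` on each captured frame and hand the frame to the recognizer only on a `true` result. The hangover counter keeps short pauses inside a phrase from cutting the recognizer off mid-command.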

Partner Audio Solutions

NXP has partnered with leaders in audio technology to enable specialized, high-performance solutions on a range of NXP MCUs based on Arm® Cortex®-M cores.

DSP Concepts

Create, tune, and productize audio features with Audio Weaver Designer, a low-code, drag-and-drop real-time interface with live module inspectors and over 400 different audio building blocks. Deploy highly optimized audio with AWE Core.

Products available from DSP Concepts for NXP Arm® Cortex®-M microcontrollers are shown below. DSP Concepts products also support several of NXP's i.MX applications processors.

Product overview. Platforms listed: i.MX RT600 (Arm Cortex-M33 plus Cadence Xtensa HiFi 4 DSP); i.MX RT 1xxx (Arm Cortex-M7)

  • Voice Enhancement Package (VEP): a suite of real-time software DSP technologies designed to improve speech recognition performance in voice-controlled multimedia devices.
  • Audio Weaver Designer: a low-code, drag-and-drop real-time interface with live module inspectors and over 400 different audio building blocks to integrate or create advanced audio features quickly.
  • TalkTo: an audio front end and the highest-performing voice control system on the market. TalkTo detects and extracts faint voice commands in extremely noisy environments and passes AVS 2.1 premium and Google ART.

