Generative AI is no longer confined to the cloud. With NXP’s eIQ® GenAI Flow, developers can now run large language models (LLMs) such as Llama and Qwen directly on embedded edge devices: securely, efficiently and close to the data. This paradigm shift unlocks new opportunities for real-time intelligence across industries, from automotive to industrial automation.
Built as a complete software deployment pipeline, eIQ GenAI Flow simplifies the once-daunting task of deploying generative AI models on power- and compute-constrained systems. It combines the latest model optimization techniques, such as quantization, with hardware acceleration from NXP’s eIQ Neutron NPU to make GenAI practical and performant right at the edge.
Smarter AI, Locally Deployed
At its core, GenAI Flow helps overcome the traditional barriers to running advanced models in embedded environments. The pipeline already supports today’s most powerful open language models, with support for multimodal and vision-language models (VLMs) coming soon. GenAI Flow provides the necessary optimizations out of the box for real-time execution on application processors like the i.MX 95, delivering the kind of performance needed for conversational AI, physical AI and more.
GenAI is moving from the cloud to the edge, so what does that mean for embedded developers?
Learn more by listening to our EdgeVerse Techcast episode on Apple Podcasts, Spotify or YouTube.
By using accuracy-preserving quantization techniques such as 8-bit and 4-bit integer (INT8 and INT4) precision, we can fully leverage the neural processing unit (NPU) for inference acceleration. Using GenAI Flow dramatically improves response speed and power efficiency on-device. For example, time to first token (TTFT), a key metric for any GenAI application, can be reduced from 9.6 seconds on an Arm Cortex CPU (float32 precision) to less than 1 second on the Neutron NPU with INT8 quantization. This enables captivating, real-time AI experiences without requiring power-hungry servers or cloud infrastructure.
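The basic idea behind INT8 quantization is easy to sketch. The example below shows symmetric per-tensor quantization in plain NumPy; it is purely illustrative and is not NXP’s implementation, which applies more sophisticated accuracy-preserving calibration:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map float32 weights
    onto the integer range [-127, 127] with a single scale factor."""
    scale = float(np.max(np.abs(weights))) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original float32 values.
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_approx = dequantize(q, scale)
# Rounding error per weight is bounded by about scale / 2.
print(float(np.max(np.abs(w - w_approx))))
```

Storing each weight in one byte instead of four is what lets the NPU keep the whole model closer to its compute units, which is where the TTFT improvement comes from.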
Generative AI is driving innovations at the edge. GenAI Flow, included with NXP's
eIQ Toolkit, makes enabling GenAI at the edge simple and secure.
GenAI Flow also supports small language models (SLMs), which are lighter yet still capable of delivering high-quality results. The pipeline offers flexible execution across the central processing unit (CPU), the NPU or a hybrid configuration, allowing developers to tune performance to their specific product needs.
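To illustrate what that tuning decision might look like in practice, here is a hypothetical sketch of a backend-selection policy. The `Backend` enum, `RuntimeConfig` dataclass and `select_config` function are invented for illustration and are not part of the eIQ API:

```python
from dataclasses import dataclass
from enum import Enum

class Backend(Enum):
    CPU = "cpu"
    NPU = "npu"
    HYBRID = "hybrid"  # e.g. heavy matmuls on the NPU, control logic on the CPU

@dataclass
class RuntimeConfig:
    backend: Backend
    quantization: str  # "int8", "int4" or "fp32"

def select_config(latency_budget_ms: float, npu_available: bool) -> RuntimeConfig:
    """Pick an execution target from product constraints (illustrative policy only)."""
    if npu_available and latency_budget_ms < 1000:
        # Tight latency budget: run fully quantized on the NPU.
        return RuntimeConfig(Backend.NPU, "int8")
    if npu_available:
        # Relaxed budget: split work between NPU and CPU.
        return RuntimeConfig(Backend.HYBRID, "int8")
    # No NPU on this part: fall back to full-precision CPU inference.
    return RuntimeConfig(Backend.CPU, "fp32")

print(select_config(latency_budget_ms=500, npu_available=True))
```

The point is simply that backend and precision are product-level knobs, not fixed properties of the model.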
Adding Context with RAG
A defining feature of GenAI Flow is its built-in support for retrieval-augmented generation (RAG). Rather than fine-tuning, RAG allows LLMs to access domain-specific or private data sources, such as device and service manuals, internal PDFs and equipment maintenance logs, without retraining the original model. RAG retrieves relevant external knowledge from a vector database stored on the edge device, enabling highly contextual, grounded responses that mitigate hallucinations and prevent certain errors in judgment.
RAG is particularly powerful for edge use cases because all data processing happens locally. This protects sensitive information while delivering dynamic, on-demand AI responses. Developers can simply turn a new document into a highly compact, LLM-friendly database and the model immediately adopts the additional context, with no retraining required. This efficiency alone can save the millions of dollars and the energy otherwise spent on repeated rounds of GenAI fine-tuning in data centers.
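The whole ingest-retrieve-prompt loop can be sketched in a few lines. The toy example below stands in for a real pipeline: it replaces the neural embedding model with a simple bag-of-words similarity so it runs anywhere, whereas an actual GenAI Flow deployment would use a proper embedding model and an on-device vector database. The sample manual chunks are invented:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words vector. Real RAG pipelines use a
    # neural sentence-embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. "Ingest" a manual by chunking it and storing per-chunk embeddings.
manual_chunks = [
    "To reset the controller, hold the power button for ten seconds.",
    "Replace the hydraulic filter every 500 operating hours.",
    "Error code E42 indicates a blocked coolant line.",
]
index = [(chunk, embed(chunk)) for chunk in manual_chunks]

# 2. At query time, retrieve the most similar chunk ...
query = "What does error code E42 mean?"
best_chunk, _ = max(index, key=lambda item: cosine(embed(query), item[1]))

# 3. ... and inject it into the LLM prompt as grounding context.
prompt = f"Context: {best_chunk}\nQuestion: {query}"
print(best_chunk)  # → "Error code E42 indicates a blocked coolant line."
```

Note that only the compact index and the retrieved text ever feed the model, which is why adding a new document requires no retraining.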
Real-World Impact: From Cars to Robots
GenAI Flow is already being used across multiple industries where low-latency performance and data privacy are
critical.
In automotive, AI-powered infotainment systems can respond to natural voice commands by referencing service manuals embedded in the vehicle, creating a seamless, hands-free experience without the typical connectivity requirements.
In healthcare, touchless AI interfaces let clinicians securely access procedure or patient data using voice prompts, an ideal approach for reducing physical contact and contamination risk in sensitive environments. AICHI, the AI controller for health insights, securely collects and analyzes multimodal health and other sensor data in real time, detecting early anomalies and enabling proactive, personalized care.
In mobile robotics, generative AI models interpret written instructions and visual inputs—using optical character
recognition (OCR) and RAG—to take context-aware actions. These systems move beyond basic automation and into
intelligent interaction between humans and environments.
This 3D perception sensor fusion demo showcases trusted spatial perception at the
edge, operating in dynamic and uncertain environments.
In industrial automation, AI assistants help technicians troubleshoot machine issues using real-time sensor data and maintenance documentation, all processed locally, even in remote or low-bandwidth settings.
Across these scenarios, GenAI Flow offers developers a powerful and privacy-conscious framework for building
intelligent edge solutions.
What’s Next for GenAI at the Edge?
The next evolution of GenAI at the edge is multimodal and agentic. Future systems will blend voice, vision and language inputs to create richer, more intuitive user experiences. With GenAI Flow, this convergence is already underway, enabling unified edge pipelines that can reason and act on a combination of input types.
There’s also a strong focus on continuing to optimize edge AI performance—both in scaling up support for larger
models and by making smaller models even faster. This includes advancements in quantization, execution flexibility
and support for increasingly compact LLM architectures.
As AI systems become more adaptive and locally responsive, access to the best tooling becomes ever more critical.
GenAI Flow is designed with scalability in mind, helping developers integrate today’s rapidly evolving AI
capabilities into products across microprocessor unit (MPU) platforms and potentially even into future
microcontroller unit (MCU)-class devices.