Generative AI is no longer confined to the cloud. With NXP’s eIQ® GenAI Flow, developers can now run large language models (LLMs) such as Llama and Qwen directly on embedded edge devices: securely, efficiently and close to the data. This paradigm shift unlocks new opportunities for real-time intelligence across industries, from automotive to industrial automation.
Built as a complete software deployment pipeline, eIQ GenAI Flow simplifies the once-daunting task of deploying generative AI models on power- and compute-constrained systems. It combines the latest model optimization techniques, such as quantization, with hardware acceleration from NXP’s eIQ Neutron NPU to make GenAI practical and performant right at the edge.
Smarter AI, Locally Deployed
At its core, GenAI Flow helps overcome the traditional barriers to running advanced models in embedded environments. The pipeline already supports today’s most powerful open language models, with support for multimodal and vision-language models (VLMs) coming soon. GenAI Flow provides the necessary optimizations out of the box for real-time execution on application processors like the i.MX 95, delivering the kind of performance needed for conversational AI, physical AI and more.
GenAI is moving from the cloud to the edge, so what does that mean for embedded developers?
Learn more by listening to our EdgeVerse Techcast episode on Apple Podcasts, Spotify or YouTube.
By using accuracy-preserving quantization techniques such as 8-bit and 4-bit integer (INT8 and INT4) precision, we can fully leverage the neural processing unit (NPU) for inference acceleration. Using GenAI Flow dramatically improves response speed and power efficiency on-device. For example, time to first token (TTFT), a key metric for any GenAI application, can be reduced from 9.6 seconds on an Arm Cortex CPU (float32 precision) to less than 1 second on the Neutron NPU with INT8 quantization. This enables captivating, real-time AI experiences without requiring power-hungry servers or cloud infrastructure.
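The basic idea behind INT8 quantization is easy to sketch. The example below shows symmetric per-tensor quantization in plain NumPy; it is purely illustrative and is not NXP’s implementation, which applies more sophisticated accuracy-preserving calibration:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map float32 weights
    onto the integer range [-127, 127] with a single scale factor."""
    scale = float(np.max(np.abs(weights))) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original float32 values.
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_approx = dequantize(q, scale)
# Rounding error per weight is bounded by about scale / 2.
print(float(np.max(np.abs(w - w_approx))))
```

Storing each weight in one byte instead of four is what lets the NPU keep the whole model closer to its compute units, which is where the TTFT improvement comes from.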
Generative AI is driving innovations at the edge. GenAI Flow, included with NXP's
eIQ Toolkit, makes enabling GenAI at the edge simple and secure.
GenAI Flow also supports small language models (SLMs), which are lighter yet still capable of delivering high-quality results. The pipeline offers flexible execution across the central processing unit (CPU), the NPU or a hybrid configuration, allowing developers to tune performance to their specific product needs.
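To illustrate what that tuning decision might look like in practice, here is a hypothetical sketch of a backend-selection policy. The `Backend` enum, `RuntimeConfig` dataclass and `select_config` function are invented for illustration and are not part of the eIQ API:

```python
from dataclasses import dataclass
from enum import Enum

class Backend(Enum):
    CPU = "cpu"
    NPU = "npu"
    HYBRID = "hybrid"  # e.g. heavy matmuls on the NPU, control logic on the CPU

@dataclass
class RuntimeConfig:
    backend: Backend
    quantization: str  # "int8", "int4" or "fp32"

def select_config(latency_budget_ms: float, npu_available: bool) -> RuntimeConfig:
    """Pick an execution target from product constraints (illustrative policy only)."""
    if npu_available and latency_budget_ms < 1000:
        # Tight latency budget: run fully quantized on the NPU.
        return RuntimeConfig(Backend.NPU, "int8")
    if npu_available:
        # Relaxed budget: split work between NPU and CPU.
        return RuntimeConfig(Backend.HYBRID, "int8")
    # No NPU on this part: fall back to full-precision CPU inference.
    return RuntimeConfig(Backend.CPU, "fp32")

print(select_config(latency_budget_ms=500, npu_available=True))
```

The point is simply that backend and precision are product-level knobs, not fixed properties of the model.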
Adding Context with RAG
A defining feature of GenAI Flow is its built-in support for retrieval-augmented generation (RAG). Rather than fine-tuning, RAG allows LLMs to access domain-specific or private data sources, such as device and service manuals, internal PDFs and equipment maintenance logs, without retraining the original model. RAG retrieves relevant external knowledge from a vector database stored on the edge device, enabling highly contextual, grounded responses that mitigate hallucinations and prevent certain errors in judgment.
RAG is particularly powerful for edge use cases because all data processing happens locally. This protects sensitive information while delivering dynamic, on-demand AI responses. Developers can simply turn a new document into a highly compact, LLM-friendly database and the model immediately adopts the additional context, with no retraining required. This efficiency alone can save the millions of dollars and the energy otherwise spent on repeated rounds of GenAI fine-tuning in data centers.
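The whole ingest-retrieve-prompt loop can be sketched in a few lines. The toy example below stands in for a real pipeline: it replaces the neural embedding model with a simple bag-of-words similarity so it runs anywhere, whereas an actual GenAI Flow deployment would use a proper embedding model and an on-device vector database. The sample manual chunks are invented:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words vector. Real RAG pipelines use a
    # neural sentence-embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. "Ingest" a manual by chunking it and storing per-chunk embeddings.
manual_chunks = [
    "To reset the controller, hold the power button for ten seconds.",
    "Replace the hydraulic filter every 500 operating hours.",
    "Error code E42 indicates a blocked coolant line.",
]
index = [(chunk, embed(chunk)) for chunk in manual_chunks]

# 2. At query time, retrieve the most similar chunk ...
query = "What does error code E42 mean?"
best_chunk, _ = max(index, key=lambda item: cosine(embed(query), item[1]))

# 3. ... and inject it into the LLM prompt as grounding context.
prompt = f"Context: {best_chunk}\nQuestion: {query}"
print(best_chunk)  # → "Error code E42 indicates a blocked coolant line."
```

Note that only the compact index and the retrieved text ever feed the model, which is why adding a new document requires no retraining.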
Real-World Impact: From Cars to Robots
GenAI Flow is already being used across multiple industries where low-latency performance and data privacy are
critical.
In automotive, AI-powered infotainment systems can respond to natural voice commands by referencing service manuals embedded in the vehicle, creating a seamless, hands-free experience without the typical connectivity requirements.
In healthcare, touchless AI interfaces let clinicians securely access procedure or patient data using voice prompts, an ideal approach for reducing physical contact and contamination risk in sensitive environments. AICHI, the AI controller for health insights, securely collects and analyzes multimodal health and other sensor data in real time, detecting early anomalies and enabling proactive, personalized care.
In mobile robotics, generative AI models interpret written instructions and visual inputs—using optical character
recognition (OCR) and RAG—to take context-aware actions. These systems move beyond basic automation and into
intelligent interaction between humans and environments.
This 3D perception sensor fusion demo showcases trusted spatial perception at the
edge, operating in dynamic and uncertain environments.
In industrial automation, AI assistants help technicians troubleshoot machine issues using real-time sensor data and maintenance documentation, all processed locally, even in remote or low-bandwidth settings.
Across these scenarios, GenAI Flow offers developers a powerful and privacy-conscious framework for building
intelligent edge solutions.
What’s Next for GenAI at the Edge?
The next evolution of GenAI at the edge is multimodal and agentic. Future systems will blend voice, vision and language inputs to create richer, more intuitive user experiences. With GenAI Flow, this convergence is already underway, enabling unified edge pipelines that can reason and act on a combination of input types.
There’s also a strong focus on continuing to optimize edge AI performance—both in scaling up support for larger
models and by making smaller models even faster. This includes advancements in quantization, execution flexibility
and support for increasingly compact LLM architectures.
As AI systems become more adaptive and locally responsive, access to the best tooling becomes ever more critical.
GenAI Flow is designed with scalability in mind, helping developers integrate today’s rapidly evolving AI
capabilities into products across microprocessor unit (MPU) platforms and potentially even into future
microcontroller unit (MCU)-class devices.