This video demonstrates how NXP's i.MX 95 and i.MX 8M Plus applications processors work with LLaVA—a large language and vision assistant—and the Kinara Ara-2 neural processing unit (NPU) to analyze images and video using AI. LLaVA is an open-source multimodal model that combines language and vision capabilities, built on OpenAI's CLIP vision encoder and Meta's Llama 3.1 8B LLM. You will see the model describe scenes from live video feeds and stored images as the demo walks through the complete process of image and video analysis at the edge.
Use cases include static images, live camera feeds and short video clips. These capabilities lay the foundation for smarter autonomous systems and open new possibilities for generative AI at the edge, such as enabling devices to act on what they “see”.
Explore NXP's artificial intelligence (AI) solutions and use cases across automotive, industrial and consumer applications.