This video demonstrates how NXP's i.MX 95 and i.MX 8M Plus applications processors work with LLaVA—a large language and vision assistant—and the Kinara Ara-2 neural processing unit (NPU) to analyze images and video using AI. LLaVA is an open-source multimodal model that combines language and vision capabilities, built on OpenAI's CLIP vision encoder and Meta's Llama 3.1 8B LLM. You will see the model describe scenes from live video feeds and stored images as the demo walks through the complete process of image and video analysis at the edge.
Use cases include static images, live camera feeds and short video clips. These capabilities lay the foundation for smarter autonomous systems and open new possibilities for generative AI at the edge, such as enabling devices to act on what they “see”.
Explore NXP's artificial intelligence (AI) solutions and use cases across automotive, industrial and consumer applications.