I recently got into a discussion of which graphics card to pair with NXP’s Layerscape LX2160A processor. You can imagine how odd that felt. Like most other Layerscape processors, the LX2160A processor targets headless embedded systems. It’s not for computers, with their brawny CPUs and slick user interfaces. True, it scores highly on general benchmarks, but its distinguishing attributes are prodigious network acceleration and I/O performance. It’s made for designs like wireless transport cards and industrial controllers. Why hook up an LX2 chip to a GPU?
The answer is that LX2 is strong enough for a computer but made for demanding embedded applications. The LX2 excels at general-purpose computation by dint of its 16 CPUs. Licensed from Arm®, these Cortex®-A72 CPUs are used in other Layerscape processors and in other companies’ ASICs. The latter include Tesla’s “FSD computer”, which has 12 of these CPUs, and Amazon Web Services’ Graviton, which has 16 like the LX2. Our analysis shows the 16-core LX2 performing comparably to a 16-thread/8-core processor from a competing architecture typically targeted at PCs and servers.
Applications demanding this kind of performance span a wide gamut. Some are data-plane functions in the types of communication equipment that Layerscape and its Power Architecture® predecessors have long targeted: transport cards in base stations, smart NICs for data-center servers and Layer-2/Layer-3 accelerators for routers. These applications benefit from the LX2’s multitude of cores and the connectivity and crypto acceleration built into every Layerscape processor.
NXP has also long targeted functions within communication equipment that benefit especially from high single-thread performance. Here, too, LX2 shines, but in this case for its speedy CPUs that carve through layers of software. Network function virtualization (NFV), for example, takes software that previously ran on standalone physical hardware and encapsulates it inside a virtual machine. Although virtualization originated with servers and workstations, it operates efficiently on NXP processors thanks to the virtualization capabilities that we have been designing into our CPU cores and SoC chassis since the QorIQ P Series in 2008.
Various factors, nonetheless, conspire to absorb CPU cycles in typical NFV systems. The old physical systems were purpose-built hardware with lean system software. Virtualized network functions (VNFs), however, execute on general-purpose, computer-like hardware and system software, which is intrinsically less efficient. These platforms, moreover, host multiple VNFs that communicate with each other via virtual switches–Ethernet switches implemented in software. Libraries such as the Data Plane Development Kit (DPDK) mitigate operating-system overhead but rely on speedy cores to run tasks to completion. The LX2 has the requisite cores and NXP invests in optimizing networking and virtualization software, such as DPDK and the Linux Kernel Virtual Machine (KVM). NXP also invests in containerization, an alternative to virtualization with lower overhead.
The most important software, however, is Linux. Yes, NXP still collaborates with embedded OS specialists, such as Wind River. For customers wanting an open-source embedded experience, NXP offers a Yocto embedded Linux development environment. However, many developers, especially, but not exclusively, those with an IT background, prefer an environment like that on their computers. A popular choice is NXP’s Layerscape Development Kit (LSDK). It is a reference integration, an example of how a customer can meld numerous components into a system image or personalized Linux distribution.
To construct the LSDK, NXP starts by modifying numerous open-source boot loaders, kernels, libraries and tools to work with a particular device, such as LX2. We submit these modifications as patches to the associated open-source projects. Once upstreamed, the LSDK components are available from public repositories, such as kernel.org and GitHub. NXP keeps these patches updated, periodically releasing new versions and supporting two recent long-term-support (LTS) kernels. Developers can select the components they need for their own integration. There is no massive ISO file or tarball to download.
The LSDK reference integration conveniently includes a user land—the files and folder hierarchy—derived from the popular Ubuntu distribution. This provides the developer a familiar operating environment. Importantly, it also provides easy access to a vast library of binary software. These applications run without a hitch because LX2 and other Layerscape processors use standard Arm cores, maximizing compatibility. Other popular distributions also support 64-bit Arm processors and developers building their own distribution with LSDK can adapt these user lands instead of the Ubuntu example we provide. Figure 1 is a screenshot of a Linux desktop on LX2.
Figure 1: Screenshot of a Linux Desktop Running on the Layerscape LX2160A Processor
The performance and software ecosystem of LX2 are pulling it into markets beyond communication. High-end advanced driver-assistance systems (ADAS) have often thrown server hardware at the problem of integrating the information extracted from cameras, lidar and other sensors by artificial intelligence (AI) hardware. Such hardware is bulky, expensive and hot. Offering comparable performance and better integration, and coming from a company with a track record of meeting automakers’ demands for supply longevity, safety and tolerance of harsh environments, LX2 is an appealing alternative. We’re engaged in ADAS and self-driving efforts with major companies worldwide. Similarly, customers are using LX2 in industrial machine vision, aerospace and numerically controlled cutting machines.
Even within communication, LX2 can address workloads heretofore not run on a general-purpose processor. For example, 5G cellular standards allow for various functional splits, including implementing upper PHY functions in a system called a distributed unit (DU) instead of the channel card of a traditional macro base station. These functions, which include channel coding, bit manipulation, channel estimation, equalization and precoding, can run in software on a general-purpose processor. That processor, however, must be able to crunch a lot of data. Our analysis shows that the LX2 is up to the task.
To understand how LX2 delivers such competitive performance, let’s look under the hood at some of the blocks that contribute to its performance on general computational workloads. It has two ECC-protected 64-bit interfaces to DDR4 memory, as the block diagram in Figure 2 shows. Although this is fewer than server processors have, the LX2 operates its DDR4 interfaces at up to 3200 MT/s. This is about 50% faster than other processors on the market. The result is a balance of good memory throughput and the lower cost associated with fewer wide DDR ports.
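To put a number on that memory throughput, here is a back-of-the-envelope sketch in Python. It reads the DDR4 rate as 3200 megatransfers per second (DDR4-3200) and counts only the 64 data bits of each interface, not the ECC bits. The function name is ours, chosen for illustration, and the result is a theoretical peak rather than a measured figure.

```python
def peak_ddr_bandwidth_gb_per_s(interfaces: int,
                                bus_width_bits: int,
                                mega_transfers_per_s: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    Illustrative helper, not an NXP API. It ignores refresh, bank
    conflicts and other real-world overheads.
    """
    bytes_per_transfer = bus_width_bits / 8          # 64 data bits -> 8 bytes
    bytes_per_second = interfaces * bytes_per_transfer * mega_transfers_per_s * 1e6
    return bytes_per_second / 1e9


# Two 64-bit interfaces at 3200 MT/s, as described in the text.
lx2_peak = peak_ddr_bandwidth_gb_per_s(interfaces=2,
                                       bus_width_bits=64,
                                       mega_transfers_per_s=3200)
print(f"Theoretical peak: {lx2_peak:.1f} GB/s")
```

Sustained bandwidth in a real system will be lower than this ceiling and depends on the access pattern.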
Figure 2: NXP LX2160A Processor Block Diagram
Backing this up is 8MB of platform cache, which buffers off-chip memory from accesses by CPU cores, network accelerators and I/O controllers. A 2MB packet express buffer is also on chip to minimize DRAM access by the integrated Ethernet switch. Adjacent to each pair of CPUs is 1MB of Level 2 cache. This is the same amount per core as Graviton, which has no L3 cache, and more than many compute processors offer. Being quicker to access than off-chip memory, caches are essential to feeding instructions and data to CPUs.
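The cache figures above add up as follows. This is a small arithmetic sketch using only the numbers in the text (16 CPUs, 1MB of L2 per pair of CPUs, 8MB of platform cache); the function and key names are ours, for illustration.

```python
def cache_summary(cores: int = 16,
                  cores_per_cluster: int = 2,
                  l2_per_cluster_mb: float = 1.0,
                  platform_cache_mb: float = 8.0) -> dict:
    """Summarize on-chip cache totals from the figures quoted in the text."""
    clusters = cores // cores_per_cluster
    total_l2_mb = clusters * l2_per_cluster_mb
    return {
        "clusters": clusters,
        "total_l2_mb": total_l2_mb,
        "l2_per_core_mb": total_l2_mb / cores,
        # L2 plus platform cache; excludes the 2MB packet express buffer,
        # which serves the Ethernet switch rather than the CPUs.
        "l2_plus_platform_mb": total_l2_mb + platform_cache_mb,
    }


for name, value in cache_summary().items():
    print(f"{name}: {value}")
```

In other words, each core has half a megabyte of L2 to itself on average, with the shared platform cache sitting behind it.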
As noted above, the LX2 uses Arm Cortex-A72 CPUs. They’re among the company’s “big” A-series cores. Some Layerscape processors use the “little” Arm Cortex-A53 CPUs. The two are compatible, implementing the same version of Arm’s 64-bit instruction set. At the same clock rate, the A72 is about twice as fast. It can decode three instructions per cycle and can execute them out of order in one of eight pipelines. The A53, by contrast, decodes and executes only two instructions at a time, a design choice that reduces power and cost. The performance-oriented A72 also has faster floating-point units and wider Neon SIMD execution units, enhancing its ability to carve up math-intensive computational workloads, such as upper PHY functions in a wireless DU. Each of the A72’s two Neon units can compute two complex 16-bit multiply-accumulate operations per cycle.
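Multiplying out the Neon figures gives a feel for the chip-level signal-processing ceiling. The sketch below uses the per-unit rate and core count stated above. The clock frequency is not given in this article, so the 2.0 GHz value is purely an illustrative assumption, and the function names are ours.

```python
def complex_macs_per_cycle(cores: int = 16,
                           neon_units_per_core: int = 2,
                           complex_macs_per_unit: int = 2) -> int:
    """Peak complex 16-bit MACs per clock cycle across the whole chip."""
    return cores * neon_units_per_core * complex_macs_per_unit


def complex_macs_per_second(clock_hz: float, **kwargs) -> float:
    """Peak complex MAC rate at a given core clock (theoretical ceiling)."""
    return complex_macs_per_cycle(**kwargs) * clock_hz


ASSUMED_CLOCK_HZ = 2.0e9  # assumption for illustration; not stated in the article

per_cycle = complex_macs_per_cycle()
per_second = complex_macs_per_second(ASSUMED_CLOCK_HZ)
print(f"{per_cycle} complex MACs per cycle")
print(f"{per_second / 1e9:.0f} G complex MACs per second at the assumed clock")
```

This is an upper bound that assumes every Neon unit on every core is kept busy each cycle. Real upper PHY code will fall short of it because of loads, stores and control overhead.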
In summary, the NXP Layerscape LX2160A processor excels at computing. We had a lead with a customer looking to replace a PC processor in their embedded system. They wanted comparable performance and didn’t want to give up their software ecosystem. The LX2 fit the bill. An engineer had a standard GPU card lying around and plugged it into a system’s PCIe slot. Thanks to the open-source community and LSDK, software installation was also a snap. Designed for the environmental challenges of embedded systems and optimized for communications applications, the LX2 also shines at computational workloads. Perhaps NXP has found its new advertising slogan: Strong enough for computing but made to be embedded.
For more information about the LX2160A and related products, please visit www.nxp.com/LX2160A