INTRODUCTION TO QorIQ LS2088A AND LS1088A PROCESSORS

DAVID ROSADO, ANJU BHARTIYA, CHUN CHANG

FTF-NET-N1881

MAY 18, 2016

### **AGENDA**

- The 1-2 Punch!
- LS2088A at a Glance
- LS1088A at a Glance
- Applications
- Summary

### **Expanding ARM-based Portfolio**

- Comprehensive portfolio of 32-bit/64-bit ARM processors
- Pin-compatible two- to eight-core devices
- A53 and A72 solutions for scalable performance/watt
- Integrated pixel processing and network/data processing
- Video transcoding
- Advanced I/O Processor (AIOP) for programmable packet forwarding
- Single-core 64-bit ARM
- Less than 1W to support battery-powered applications
- PCIe, USB3.0, SATA3.0, 2.5GbE
- Enables low-cost 4-layer PCB
- Small footprint 10mmx10mm

#### **NXP Provides the 1-2 Punch!**

## SEGMENTATION

### Segments: Branch Office/Service Provider Router

- Four to eight ARMv8 A53/A72 cores
- Support for Secure and Protected Deployment of Services
- Unified standard API for hardware abstraction simplifies software development

### **Segments: PCIe SR-IOV End Point**

Use Case: LS1088 as services card, Converged Network Adapter, "smart NIC".

A single management physical or virtual machine on the host handles end-point configuration. Each virtual machine running on the host sees what appears to be its own private instance of the services card. A translation agent (in the host or chipset) performs PAMU-like address translation on behalf of the virtual functions (VFs).
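The address-translation idea in the SR-IOV use case above can be illustrated with a toy model. This is only a sketch, not NXP or PCIe software: the class `TranslationAgent`, its methods, and the window addresses and sizes are hypothetical names and values chosen for illustration. It shows how a translation agent could map each VF's private address view onto a disjoint host-physical window, which is what lets every virtual machine behave as if it owned the card.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationAgent:
    """Toy stand-in for a host/chipset translation agent (hypothetical model)."""
    # vf_id -> (host_base, window_size); this layout is an assumption
    windows: dict = field(default_factory=dict)

    def assign_window(self, vf_id: int, host_base: int, size: int) -> None:
        """Called by the single management entity that configures the end point."""
        for other_base, other_size in self.windows.values():
            # Two half-open ranges overlap when each starts before the other ends.
            if host_base < other_base + other_size and other_base < host_base + size:
                raise ValueError("windows of different VFs must not overlap")
        self.windows[vf_id] = (host_base, size)

    def translate(self, vf_id: int, vf_addr: int) -> int:
        """Map an address issued on behalf of a VF into host-physical space."""
        host_base, size = self.windows[vf_id]
        if not 0 <= vf_addr < size:
            raise PermissionError(f"VF {vf_id} accessed outside its window")
        return host_base + vf_addr


agent = TranslationAgent()
agent.assign_window(vf_id=0, host_base=0x1000_0000, size=0x10_0000)
agent.assign_window(vf_id=1, host_base=0x2000_0000, size=0x10_0000)

# Both VFs use the same private address, yet land in different host memory.
addr_vf0 = agent.translate(0, 0x40)
addr_vf1 = agent.translate(1, 0x40)
print(hex(addr_vf0), hex(addr_vf1))
```

An access outside a VF's window raises `PermissionError`, which mirrors the isolation role a real translation agent plays between virtual machines.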
(Diagram label: the end-point device is an LS2088A or LS1088A.)

### **Segments: I/O Virtualization and Resource Management**

### Segments: SDN White Box Switch Solution Platforms Planning

**Classical Ethernet Switch** (established legacy embedded switch)

- ToR/Edge switch box
- ONIE support
- Traditional management stack
- Ethernet switch: 48-port switch (L2-3+ switch)
- PCIe link to QorIQ P2041 management CPU

Upgrade path 2015 to 2016: upgrade 40G to 100G.

**Network Service Switch (NSS) with L4/L7 SDN Support** (NXP SDN white box application switch; SDN solution switch platform upgrade path 2016)

- ToR/Edge switch box with extended SDN services
- Data plane offload via 10GE/HiGig+
- Hybrid traditional L2/L3 and SDN support
- ONIE support
- 64-bit ARMv8 with dedicated AIOP data path processing
- QorIQ LS2088A: ARMv8 (OF-Agent / L4-7 NSS software) plus dedicated AIOP packet processor
- 2-4x10G and PCIe connections; AIOP packet engine as SDN switch supplement
- 100G Ethernet switch: 48-port Tomahawk switch (L2-3+ switch)

## Segments: LS2088A based AS7700-32X High-performance Data Center Switch

#### System LEDs

- Diagnostic, Locator, PSU & Fan Status
- Link Status, Activity, Rate

#### Thirty-two QSFP28 ports

- Capable of operating at 32x 100G/50G/40G/10G Ethernet with standard QSFP28/QSFP+ modules and/or appropriate breakout cables

#### **RJ45 RS232 UART** Management port

- Supports asynchronous mode with the default being eight data bits, one stop bit, no parity

#### RJ45 10/100/1000 Ethernet management port

- Connected directly to the system LS2088 host CPU

| Dimensions | Inches |
|------------|--------|
| Length | 20.27 |
| Width | 17.26 |
| Height | 1.71 |

Accton AS7700-32X is a bare-metal hardware switch pre-loaded with diagnostics and with the Open Network Install Environment (ONIE) for automated loading of compatible independent networking and switch-OS software.

# PRODUCT PERFORMANCE

### LS2088A Rev 2 Has Taped Out!
### Core Complex upgraded to ARM A72

- July 2016 for Revision 2 samples
- CPU Maximum Frequency 2000MHz
- DDR Maximum Frequency 2133MHz
- Platform Frequency 800MHz (AIOP frequency)
- Adopting ARM's A72 (Maia) cores without schedule impact
- Production 3Q16
- Given the core change, revision 2 part numbers will change as follows:
  - LS2088A and LS2048A feature AIOP, L2 switch and 8/4 A72 cores respectively
  - LS2084A and LS2044A feature A72 cores (no AIOP, no L2 switch)

### Performance Improvement A72 relative to A57

- 20% performance improvement!
- 10% less power!

### **Highest Performance Cortex-A72 ARMv8 Processor**

- Highest single-threaded performance
  - Lower power enabling maximum performance in thermal limit
  - Large performance increase across integer, memory-streaming, float
- Significant advancements in power efficiency
  - 17.4% power reduction from Cortex-A57
  - 6% to 10% cluster area reduction lowers static power
- Enhanced multi-core scalability
  - Larger L2 for optimizing SDN/NFV applications

### Performance Improvement A72 relative to A57

- L2 cache in LS2088A set to 512KB
- Improved performance in floating point, branch prediction and data pre-fetch

### LS2088A Performance: GPP and Peripherals

| GPP - General | LS2085A @ 2GHz | Improvements in LS2088A | LS2088A @ 2GHz |
|---|---|---|---|
| CoreMark / MHz<br>(gcc 4.9, -O3, vanilla) | 3.97 | A57 to A72 core micro-architecture improvements | **4.46** (up 12%) |
| Composite CoreMark<br>(gcc 4.9, -O3, vanilla) | 69,920 | | **77,760** (up 11%) |
| SpecINT2006<br>(gcc 4.9, -O3, optimized) | 12.2 | A72 core micro-architecture, better hardware pre-fetch | **14** (up 16%) |
| SpecINT2006-Rate<br>(gcc 4.9, -O3, optimized) | 74.4 | | **82** (up 10%) |
| GPP Latency (depload) | 107 ns | | **101.5 ns** (better by 5%) |
| DDR BW seq. reads (rd64) | 69% | Larger ARM FEQ buffer<br>DDR hashing fix<br>ARM write-stream fix | **78%** (up 13%) |
| Bare-Metal LMBench<br>(8 cores active) | **1** (normalized) | | **1.22** (up 22%) |
| PCIe ↔ DDR | ~4GB/s copy | DPAA-NoC bottleneck fix<br>QDMA multi-thread fix<br>QDMA IP enhancements | 8 GB/s copy |
| QDMA ↔ DDR | | | |

**LS2088A: A Better General Purpose Processor and Data Mover**

### LS2088A Performance: GPP Application SW

| GPP - Packet Processing | LS2085A @ 2GHz | Improvements LS2085A to LS2088A | LS2088A @ 2GHz |
|---|---|---|---|
| GPP Packet Reflector PP @ 64B | 24 Gbps | ARM / platform configuration optimization | Better than 30Gbps |
| GPP Simple IPFwd PP @ 64B | **22.5 Gbps** (28 flows) | | |
| GPP IPSec App @ 390B | 13.3 Gbps (600MHz platform) | Increased SoC BW throughput capability<br>BW smoothing & performance improvements to SEC engine | Better than 15.7 Gbps + |
| GPP IPSec App @ 1420B | 18.1 Gbps (600MHz platform) | | Better than 18.GBps |
| GPP IPSec App @ 64B | **6.8 Mpps** (1.8GHz core) | | Better than 7.5 Mpps |

**GPP Complex exhibits better Application Software Performance in LS2088A**

### LS2088A Performance Improvements: Offload (AIOP)

| AIOP - Packet Processing | LS2085A | Improvements LS2085A to LS2088A | LS2088A |
|---|---|---|---|
| AIOP Packet Reflector PP @ 64B | 15 Gbps | Bottleneck fixes | **21.2Gbps** (up 40%) |
| AIOP Simple IPFwd PP @ 64B | 15 Gbps | System hardware enhancements in packet express buffer | **20Gbps** (up 33%) |
| AIOP Complex Fwd PP @ 128B | 17 Gbps (L3-cache required) | Bottleneck fixes<br>SSRAM latency improvements | **23Gbps** (up 35%) (no L3-cache) |
| AIOP Netflow PP @ 128B | 17Mpps | Bottleneck fix | **24.5Mpps** @ 64B (up >45%) |
| AIOP Simple IPSec PP @ 390B | 17.6Gbps (without scatter gather) | Bottleneck fixes<br>Added scatter gather<br>Increased SoC BW throughput capability<br>Improvements to SEC bandwidth | Better than 18Gbps (and with scatter gather) |
| AIOP Simple IPSec PP @ 1420B | 23.7Gbps | | Better than 24Gbps |
| AIOP Simple IPSec PP @ 64B | 8.5Mpps (without scatter gather) | | Better than 9Mpps (and with scatter gather) |

**Significant Improvements in Offload Capabilities for the LS2088A**

# SOFTWARE ENABLEMENT

### **Software Development Kit**

- Rich set of drivers and middleware.
- Easy to use
  - NW object model abstracts data-path HW programming.
  - Centralized resource management.
- Provides choice of 3 programming environments
  - Linux kernel space
  - Linux user-space
  - AIOP
- Complete AIOP programmability
  - Drivers, debug tools, sample apps, libraries.
- Standard API
  - Linux sys-call, sockets
  - ODP API
- Linaro active member
  - aarch64 support
  - ODP definition
- Best performance out-of-the-box
  - Simple/Complex IP-Fwd
  - IPSec, NAPT, Firewall.
  - AIOP & GPP

### **Application Development Kits**

- Designed for Use-case/Applications support
  - SMB/Enterprise: IPSec, NAPT/FW, IPS
  - SDN: OF-Switch + Controller
  - Switch supplement: NetFlow, E-OAM offloads
  - Data-center: SSL+TCP termination, Intelligent-NIC
  - WLAN: CAPWAP/DTLS
  - NFV (virtual edge): vAccess, vCPE
  - More ....
- A component of a complete solution
  - Rather, an optimized toolkit
  - Customers expected to integrate with own solution.
  - NXP offers customization services
- General strategy
  - Separate data-path & control-path components
  - Different offerings for customers with/without own control-path.
  - Can have data-path in GPP or AIOP or hybrid
  - Standard, consistent API to allow customers to use their own applications.
- **Business Model**
  - See ADK Business Model slide for details.

\*\* e.g. SMB/Enterprise

### LS1088A/84A/48A Block Diagram

#### **Device**

- 28HPM Process
- FCBGA, 0.8mm pitch

#### Power target

- <20W

#### Schedule

- Samples: August '16
- Production: 4Q16

#### Security

- Hardware Encryption (IPSec)
- Secure Boot
- TrustZone & Trust Architecture
- MACSec support

#### **Performance**

- IPSec: 10 Gbps (IMIX)
- IPv4: 10 Gbps (IMIX)

#### **General Purpose Processing Layer**

- 4 or 8 x ARM A53 CPUs, 64b, 1.5GHz
- 1MB L2 cache / cluster
- HW L1 & L2 Prefetch Engines
- Neon SIMD in all CPUs

#### **Memory Subsystem**

- 64b DDR4 up to 2.1GT/s

#### **CCI-400 Switch Fabric**

- Advanced VM hardware support

#### Advanced I/O Processor

- Programmable packet handling

#### High Speed I/O

- 3x PCIe Gen3 controllers
- SATA 3.0, 2 x USB 3.0 with PHYs

#### Network I/O

- 2x1/10GbE + 8x1G
- XFI/KR and SGMII/KX
- MACSec on up to 4x 1GbE
- uQE for HDLC, T1/E1 support

#### Industrial connectivity

- Ethernet, Serial (RS485/422), uQE (for additional serial fieldbus apps)

### At a Glance: QorIQ LS2088A Platform

- General Purpose Processing
  - 8x ARM A72 CPUs, 64b, 2.0GHz
  - 1MB L2 cache
  - HW L1 & L2 Prefetch Engines
  - Neon SIMD in all CPUs
  - 1MB L3 platform cache w/ECC
  - 4MB Coherent Cache
  - 2x64b DDR4 up to 2.1GT/s
- Accelerated Packet Processing
  - 20Gbps SEC crypto acceleration
  - 10Gbps Pattern Match/RegEx
  - 20Gbps Data Compression Engine
- Express Packet IO
  - Supports 1x8, 4x4, 4x2, 4x1 PCIe Gen3 controllers
  - SR-IOV support, End Point
  - 2 x SATA 3.0, 2 x USB 3.0 with PHY
- Network IO
  - Wire Rate IO Processor:
    - 8x1/10GbE + 8x1G
    - XAUI/XFI/KR and SGMII
    - MACSec on up to 4x 1/10GbE
  - Layer 2 Switch Assist

#### **Datapath Acceleration**

- SEC: crypto acceleration
- DCE: Data Compression Engine
- PME: Pattern Matching Engine
- L2 Switching via Datapath Acceleration Hardware
- Management Complex: Configuration Abstraction

#### Other Parametrics

- 37.5x37.5 Flipchip
- 1mm Pitch
- 1292 pins

**Full Featured Highly Flexible Platform**

4-8 A72 Cores

### Summary

- Leading value for 64-bit ARM
  - ARM A57, ARM A72
- Readiness
  - Hardware, Software Available Today
- Production Target
  - LS2080A in production
  - LS2088A on target

## SECURE CONNECTIONS FOR A SMARTER WORLD

#### ATTRIBUTION STATEMENT

NXP, the NXP logo, NXP SECURE CONNECTIONS FOR A SMARTER WORLD, CoolFlux, EMBRACE, GREENCHIP, HITAG, I2C BUS, ICODE, JCOP, LIFE VIBES, MIFARE, MIFARE Classic, MIFARE DESFire, MIFARE Plus, MIFARE Flex, MANTIS, MIFARE ULTRALIGHT, MIFARE4MOBILE, MIGLO, NTAG, ROADLINK, SMARTLX, SMARTMX, STARPLUG, TOPFET, TrenchMOS, UCODE, Freescale, the Freescale logo, AltiVec, C-5, CodeTEST, CodeWarrior, ColdFire+, C-Ware, the Energy Efficient Solutions logo, Kinetis, Layerscape, MagniV, mobileGT, PEG, PowerQUICC, Processor Expert, QorIQ, QorIQ Qonverge, Ready Play, SafeAssure, the SafeAssure logo, StarCore, Symphony, VortiQa, Vybrid, Airfast, BeeKit, BeeStack, CoreNet, Flexis, MXC, Platform in a Package, QUICC Engine, SMARTMOS, Tower, TurboLink, and UMEMS are trademarks of NXP B.V. All other product or service names are the property of their respective owners.

ARM, AMBA, ARM Powered, Artisan, Cortex, Jazelle, Keil, SecurCore, Thumb, TrustZone, and µVision are registered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. ARM7, ARM9, ARM11, big.LITTLE, CoreLink, CoreSight, DesignStart, Mali, mbed, NEON, POP, Sensinode, Socrates, ULINK and Versatile are trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere.
All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. The Power Architecture and Power.org word marks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org. © 2015–2016 NXP B.V.
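
As a cross-check on the GPP and AIOP performance tables earlier in this deck, the quoted percentage gains can be recomputed from the before/after figures. The Python sketch below does this for a few rows; the helper name `percent_gain` and the choice of rows are illustrative assumptions, and the slides appear to round their percentages, so small differences from the printed values are to be expected.

```python
def percent_gain(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (metric, LS2085A value, LS2088A value) copied from the tables in this deck
rows = [
    ("CoreMark/MHz", 3.97, 4.46),
    ("Composite CoreMark", 69_920, 77_760),
    ("SpecINT2006-Rate", 74.4, 82.0),
    ("AIOP Simple IPFwd @64B (Gbps)", 15.0, 20.0),
    ("AIOP Complex Fwd @128B (Gbps)", 17.0, 23.0),
]

gains = {name: percent_gain(old, new) for name, old, new in rows}
for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```

The same helper can be applied to any other row that lists two numeric values in the same unit.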