Software ISP Application Note

1. Introduction

This document describes the software-based image signal processing application (SW-ISP) and its implementation on the i.MX8 family of System-on-Chip [1] (SoC) processors. The application implements a pipelined image processing engine that is accelerated by the on-chip GPU. It provides Bad Pixel Correction, White Balance, Histogram Equalization, High-Quality Demosaicing, and High-Quality Noise Reduction. Its purpose is to use the on-chip GPU to perform ISP functions at high speed.

2. Software theory

From the software point of view, the full SW-ISP application consists of these parts:

- Linux OS: the SD card image is created by using the Yocto Project [1].
- SW-ISP pipeline optimized by OpenCL 1.2 [2].
- SW-ISP pipeline optimized by OpenVX 1.1 [2].

The SW-ISP pipeline includes five image processing functions. The Bad Pixel Correction [3], White Balance [4], Histogram Equalization [5], and High-Quality Demosaicing [6] algorithms are applied to the Bayer image [1], while the High-Quality Noise Reduction algorithm [7] is applied to the Y channel alone. The High-Quality Noise Reduction process consists of the following steps:

1) The RGBA image is converted to a YUV image [1].
2) Bilateral filtering [7] is applied to the Y channel to reduce image noise.
3) The filtered Y channel is merged with the UV channels and converted back to an RGBA image.
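
The three steps above can be sketched in plain Python as follows. This is only an illustration and not the GPU kernel shipped with the application: the BT.601 conversion weights, the filter radius, and the two sigma values are assumptions, and all function names are hypothetical.

```python
import math

def rgba_to_yuv(pixel):
    """Convert one (R, G, B, A) pixel to (Y, U, V, A). BT.601 weights are an assumption."""
    r, g, b, a = pixel
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, 0.492 * (b - y), 0.877 * (r - y), a

def yuv_to_rgba(pixel):
    """Inverse of rgba_to_yuv."""
    y, u, v, a = pixel
    b = y + u / 0.492
    r = y + v / 0.877
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b, a

def bilateral_filter(plane, radius=1, sigma_space=1.0, sigma_range=25.0):
    """Edge-preserving smoothing of a 2-D list of floats (parameters are assumptions)."""
    height, width = len(plane), len(plane[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            centre = plane[y][x]
            weight_sum = 0.0
            value_sum = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny = min(max(y + dy, 0), height - 1)  # clamp at the borders
                    nx = min(max(x + dx, 0), width - 1)
                    neighbour = plane[ny][nx]
                    # Weight falls off with spatial distance and with intensity difference.
                    w = math.exp(-(dx * dx + dy * dy) / (2 * sigma_space ** 2)
                                 - (neighbour - centre) ** 2 / (2 * sigma_range ** 2))
                    weight_sum += w
                    value_sum += w * neighbour
            out[y][x] = value_sum / weight_sum
    return out

def denoise_rgba(image):
    """Step 1: RGBA to YUV. Step 2: filter Y only. Step 3: merge with UV, back to RGBA."""
    yuv = [[rgba_to_yuv(p) for p in row] for row in image]
    y_filtered = bilateral_filter([[p[0] for p in row] for row in yuv])
    return [[yuv_to_rgba((y_filtered[r][c], p[1], p[2], p[3]))
             for c, p in enumerate(row)]
            for r, row in enumerate(yuv)]
```

Filtering only the Y channel leaves the chroma untouched, so the smoothing acts on brightness noise without blurring color information.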

In both the OpenCL and OpenVX pipelines, the input raw images captured by a Bayer camera are read from a file (the color filter array of the Bayer camera is shown in Figure 2). The raw values are uploaded to the GPU as a grayscale texture.

Every function in the SW-ISP pipeline is independent, so the functions can be combined freely. All functions except High-Quality Noise Reduction are enabled in both the SW-ISP OpenCL pipeline and the SW-ISP OpenVX pipeline. This is because the kernels that run on the GPU are compiled and built online, and the High-Quality Noise Reduction kernel is so complex that compiling it takes a long time in the OpenCL/OpenVX pipeline. This function can be enabled with the command-line option “--Enable”.
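
The idea of independent, freely combinable functions can be illustrated with a minimal Python sketch. The stage names, their order, and the `build_pipeline` helper are hypothetical and are not taken from the SoftISP sources; the stages here only record their names instead of processing an image.

```python
# Stage names follow the functions listed in this document; their order is an assumption.
DEFAULT_STAGES = ["bad_pixel_correction", "white_balance",
                  "histogram_equalization", "demosaicing"]
OPTIONAL_STAGES = ["noise_reduction"]  # slow to compile, so not enabled by default

def make_stage(name):
    """Return a placeholder stage that appends its name to a trace list."""
    def stage(trace):
        return trace + [name]
    return stage

def build_pipeline(enable_noise_reduction=False, disabled=()):
    """Chain the selected stages into one callable."""
    names = [n for n in DEFAULT_STAGES if n not in disabled]
    if enable_noise_reduction:
        names += OPTIONAL_STAGES
    stages = [make_stage(n) for n in names]

    def run(frame):
        for stage in stages:
            frame = stage(frame)
        return frame
    return run

print(build_pipeline()([]))                             # default stages only
print(build_pipeline(enable_noise_reduction=True)([]))  # with noise reduction appended
```

Because each stage takes a frame and returns a frame, any subset can be chained without changing the others.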

![Bayer color filter array](image)

**Figure 2. Bayer color filter array**

**NOTE**

The color filter array is blue-green-green-red (BG/GR). It is 50% green, 25% blue, and 25% red.
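
The BG/GR layout can be expressed as a small lookup on the pixel coordinates. The sketch below assumes that the pattern starts with a blue pixel at the top-left corner (row 0, column 0); the function name is hypothetical.

```python
from collections import Counter

def bayer_channel(row, col):
    """Return the color sampled at (row, col) of a BG/GR mosaic."""
    if row % 2 == 0:
        return "B" if col % 2 == 0 else "G"   # even rows: B G B G ...
    return "G" if col % 2 == 0 else "R"       # odd rows:  G R G R ...

# Count the channels over an 8 x 8 tile to confirm the 50/25/25 split.
counts = Counter(bayer_channel(r, c) for r in range(8) for c in range(8))
shares = {channel: counts[channel] / 64 for channel in "BGR"}
print(shares)
```

Every 2 x 2 block contains one blue, two green, and one red sample, which gives the proportions stated in the note.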

*Figure 3* shows effects of each algorithm:

![Comparison of algorithms applied to an image](image)

**Figure 3. Comparison of algorithms applied to an image.**

**NOTE**

Because the currently released software ISP works in demo mode only, the Bayer array of the input image is fixed to BG/GR.

2.1. SW-ISP Pipeline Optimized by OpenCL

In the OpenCL pipeline, the different SW-ISP functions are combined by setting kernel arguments. The detailed directory structure of the SW-ISP OpenCL application is shown in Figure 4:

```
SoftISP
    | build
    |   Yocto
    | Content
    |     bayer.data
    |     isp_kernel.cl
    |     License.json
    | Example.jpg
    | Fsl.gen
    | GNUmakefile_Yocto
    | README.md
    | SoftISP
        | source
        |     SoftISP.cpp
        |     SoftISP.hpp
        |     SoftISP_Register.cpp
```

Figure 4. Directory structure of the SW-ISP of OpenCL application

The root directory is called SoftISP under OpenCL. There are three subdirectories and an executable binary file in the root directory:

- The executable binary “SoftISP” is in the SoftISP folder.
- Build - contains the OBJ files generated when the SW-ISP is built.
- Content - contains the prepared raw image (“bayer.data”) in the *.data format. The raw image was captured by a Bayer camera (resolution: 1920 x 1080, Bayer array: BG/GR). The application reads the raw data by using the file name “bayer.data”.
- Source - contains the source code that runs on the CPU and GPU. “isp_kernel.cl” is the C code of the kernels.

2.2. SW-ISP Pipeline Optimized by OpenVX

In the OpenVX pipeline, the combination of different functions is controlled by OpenVX graphs. The detailed directory structure of the SW-ISP OpenVX application is shown in Figure 5:

Figure 5. Directory structure of the SW-ISP of OpenVX application

The root directory is called SoftISP under OpenVX. There are three subdirectories and an executable binary file in the root directory:

- The executable binary “SoftISP” is in the SoftISP folder.
- Build - contains the OBJ files generated when the SW-ISP is built.
- Content - contains the prepared raw image (“bayer.data”) in the *.data format. The raw image was captured by a Bayer camera (resolution: 1920 x 1080, Bayer array: BG/GR). The application reads the raw data by using the file name “bayer.data”. “cl_viv_vx_ext.h” is the header of the OpenVX API used on i.MX8.
- Source - contains the source code that runs on the CPU and GPU. “Kernels_VXC.hpp” is the code of the kernels.

3. Performance measurement for i.MX8 Series applications processors

This section describes the profiling results of the SW-ISP application. Two implementations are available: the first is based on OpenCL and the second on OpenVX.

The measurements were performed on all families of the i.MX8 Series Applications Processors with OpenCL support. The boards used for the measurements are i.MX8QuadMax, i.MX8MQuad, and i.MX8QuadXPlus. i.MX8QuadMax supports both OpenCL and OpenVX, while i.MX8MQuad and i.MX8QuadXPlus support only OpenCL.

Two main use cases are compared:
- OpenCL vs OpenVX results on i.MX8QuadMax
- Results obtained with OpenCL on i.MX8QuadMax, i.MX8MQuad and i.MX8QuadXPlus

The profiling was performed in the following environment:

Table 1. Common SW-ISP profiling conditions

<table>
<thead>
<tr>
<th>Feature</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Input method</td>
<td>Image (1920 x 1080, Bayer)</td>
</tr>
<tr>
<td>Graphical backend</td>
<td>Wayland</td>
</tr>
<tr>
<td>Platform</td>
<td>OpenCL, OpenVX</td>
</tr>
<tr>
<td>BSP</td>
<td>4.9.88</td>
</tr>
<tr>
<td>GPU driver</td>
<td>6.2.4</td>
</tr>
<tr>
<td>OpenVX (i.MX8QuadMax only)</td>
<td>1.1</td>
</tr>
<tr>
<td>OpenCL</td>
<td>1.2 FP</td>
</tr>
<tr>
<td>Used HDMI resolution</td>
<td>1920 x 1080 (1080p)</td>
</tr>
<tr>
<td>Doc</td>
<td>Software ISP Application</td>
</tr>
</tbody>
</table>

A relevant selection of the hardware/software capabilities of the three boards is shown below:

Table 2. Hardware/Software capabilities of the boards

<table>
<thead>
<tr>
<th>FEATURE</th>
<th>i.MX8QM</th>
<th>i.MX8MQ</th>
<th>i.MX8QXP</th>
</tr>
</thead>
<tbody>
<tr>
<td>CPU</td>
<td>2 x A72 4 x A53</td>
<td>4 x A53</td>
<td>4 x A35</td>
</tr>
<tr>
<td>GPU</td>
<td>2 x GC7000XSVX</td>
<td>1 x GC7000Lite</td>
<td>1 x GC7000Lite</td>
</tr>
<tr>
<td>DDR</td>
<td>LPDDR4 @ 1600 MHz 2 x 32b 6 GB</td>
<td>LPDDR4 @ 1600 MHz 1 x 32b 4 GB</td>
<td>LPDDR4 @ 1200 MHz (no ECC) 1 x 32b 3 GB</td>
</tr>
<tr>
<td>OpenCL</td>
<td>1.2 FP</td>
<td>1.2 FP</td>
<td>1.2 FP</td>
</tr>
<tr>
<td>OpenVX</td>
<td>1.1</td>
<td>n/a</td>
<td>n/a</td>
</tr>
</tbody>
</table>

NOTE
- SoftISP runs on one CPU when GPU is running.
- SoftISP runs on one GPU (even if 8QuadMax has two available).
- The CPU is used to load the data. The GPU is used to run the pipeline algorithms.
- The CPUs can run at certain frequencies. There is no common frequency between all three processors, so the measurements were performed in similar but not identical conditions:
  - i.MX8QuadMax: A72@1.5 GHz; A53@1.2 GHz

3.1. SW-ISP Pipeline Optimized by OpenCL/OpenVX for i.MX8QuadMax

Table 3 and Table 4 show the system memory profiling results obtained with Linux tools.
To obtain the results in this section, the application was run on an A72 core @ 1.6 GHz. It was modified to load the data and run the algorithms in a loop so that average results could be computed.

• CPU and memory usage

Table 3. Memory and CPU usage

<table>
<thead>
<tr>
<th></th>
<th>VIRTUAL (MB)</th>
<th>PHYSICAL (MB)</th>
<th>SHARED (MB)</th>
<th>%CPU</th>
<th>%PHYSICAL</th>
</tr>
</thead>
<tbody>
<tr>
<td>OpenCL SoftISP</td>
<td>414</td>
<td>58</td>
<td>11</td>
<td>11</td>
<td>1</td>
</tr>
<tr>
<td>OpenVX SoftISP</td>
<td>320</td>
<td>37</td>
<td>10</td>
<td>25</td>
<td>0.6</td>
</tr>
</tbody>
</table>

• DDR bandwidth

Table 4. DDR bandwidth

<table>
<thead>
<tr>
<th></th>
<th>Read-Cycles (/s)</th>
<th>Read (MB/s)</th>
<th>Write-Cycles (/s)</th>
<th>Write (MB/s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>OpenCL SoftISP</td>
<td>56,621,872</td>
<td>863</td>
<td>23,245,245</td>
<td>355</td>
</tr>
<tr>
<td>OpenVX SoftISP</td>
<td>86,753,842</td>
<td>1,324</td>
<td>60,161,806</td>
<td>918</td>
</tr>
</tbody>
</table>
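
The cycle counts and MB/s figures in the DDR bandwidth table above are consistent with about 16 bytes transferred per counted cycle, with MB taken as 2^20 bytes. This factor is inferred from the tabulated numbers only and is not a documented property of the DDR performance counters, so the sketch below treats it as an assumption.

```python
BYTES_PER_CYCLE = 16   # assumption inferred from the table, not from NXP documentation
MIB = 1024 * 1024      # the table values match MB = 2**20 bytes

def cycles_to_mb_per_s(cycles_per_second, bytes_per_cycle=BYTES_PER_CYCLE):
    """Convert a read or write cycle rate into an approximate bandwidth in MB/s."""
    return cycles_per_second * bytes_per_cycle / MIB

samples = {
    "OpenCL read":  56_621_872,
    "OpenCL write": 23_245_245,
    "OpenVX read":  86_753_842,
    "OpenVX write": 60_161_806,
}
for label, cycles in samples.items():
    print(f"{label}: {cycles_to_mb_per_s(cycles):.1f} MB/s")
```

The computed values land within about 1 MB/s of the tabulated ones. The same factor does not reproduce the figures in the board comparison table later in this document, so it should not be applied there without checking.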

Table 5 shows the time consumed by each function. The default pipeline includes the bad pixel correction, white balance, histogram equalization, and high-quality demosaicing nodes.

<table>
<thead>
<tr>
<th>SW-ISP Functions</th>
<th>Consumed time of OpenCL (ms)</th>
<th>Consumed time of OpenVX (ms)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Bad Pixel Correction Node</td>
<td>34</td>
<td>5</td>
</tr>
<tr>
<td>White Balance Node</td>
<td>4</td>
<td>7</td>
</tr>
<tr>
<td>Histogram Equalization Node</td>
<td>5</td>
<td>3</td>
</tr>
<tr>
<td>Demosaicing Node</td>
<td>9</td>
<td>2</td>
</tr>
<tr>
<td>RGBA2YUV Node</td>
<td>7</td>
<td>4</td>
</tr>
<tr>
<td>High-quality Noise Reduction Node</td>
<td>10,923</td>
<td>1,673</td>
</tr>
<tr>
<td>YUV2RGBA Node</td>
<td>9</td>
<td>3</td>
</tr>
<tr>
<td>Pipeline(default)</td>
<td>52</td>
<td>20</td>
</tr>
</tbody>
</table>
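
As a rough worked example, the default pipeline times in the table above can be turned into an upper bound on the frame rate. The sketch assumes that frames are processed back to back and that data loading and online kernel compilation are not counted; the function name is hypothetical.

```python
def frames_per_second(pipeline_ms):
    """Upper bound on frame rate when one frame takes pipeline_ms milliseconds."""
    return 1000.0 / pipeline_ms

OPENCL_PIPELINE_MS = 52   # Pipeline(default), OpenCL column
OPENVX_PIPELINE_MS = 20   # Pipeline(default), OpenVX column

print(f"OpenCL: {frames_per_second(OPENCL_PIPELINE_MS):.1f} frames/s")
print(f"OpenVX: {frames_per_second(OPENVX_PIPELINE_MS):.1f} frames/s")
print(f"OpenVX speed-up over OpenCL: {OPENCL_PIPELINE_MS / OPENVX_PIPELINE_MS:.1f}x")
```

Under these assumptions, the OpenVX default pipeline is about 2.6 times faster than the OpenCL one on i.MX8QuadMax.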

3.2. SW-ISP Performance comparison between i.MX8 Series Application Processors for OpenCL 1.2

There are only a few fixed frequencies that can be set for the CPUs for each board. There is no common frequency between all three boards, so the measurements were performed in similar but not identical conditions:

- i.MX8QuadMax: A53@1.2 GHz
- i.MX8MQuad: A53@1 GHz
- i.MX8QuadXPlus: A35@1 GHz

For i.MX8QuadMax, an A53 core was chosen to run the application because it can be set to a frequency closer to the ones available on the other two boards (1.2 GHz vs 1 GHz).

• CPU and memory usage

Table 6. Memory and CPU usage

<table>
<thead>
<tr>
<th></th>
<th>VIRTUAL (MB)</th>
<th>PHYSICAL (MB)</th>
<th>SHARED (MB)</th>
<th>%CPU</th>
<th>%PHYSICAL</th>
</tr>
</thead>
<tbody>
<tr>
<td>i.MX8QuadMax A53@1.2GHz</td>
<td>414</td>
<td>58</td>
<td>11</td>
<td>12</td>
<td>1</td>
</tr>
<tr>
<td>i.MX8MQuad A53@1GHz</td>
<td>414</td>
<td>59</td>
<td>11</td>
<td>8</td>
<td>2</td>
</tr>
<tr>
<td>i.MX8QuadXPlus A35@1GHz</td>
<td>413</td>
<td>58</td>
<td>11</td>
<td>6.7</td>
<td>2</td>
</tr>
</tbody>
</table>

The algorithm uses the CPU to load the input data. Only one of the available CPUs is used to perform this operation.

• DDR bandwidth

Table 7. DDR bandwidth

<table>
<thead>
<tr>
<th></th>
<th>Read-Cycles (/s)</th>
<th>Read (MB/s)</th>
<th>Write-Cycles (/s)</th>
<th>Write (MB/s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>i.MX8QuadMax LPDDR4 @ 1600 MHz 2 x 32b 6 GB</td>
<td>1,632,089,432</td>
<td>844</td>
<td>695,038,942</td>
<td>359</td>
</tr>
<tr>
<td>i.MX8MQuad LPDDR4 @ 1600 MHz 1 x 32b 4 GB</td>
<td>1,744,614,872</td>
<td>467</td>
<td>680,330,891</td>
<td>182</td>
</tr>
<tr>
<td>i.MX8QuadXPlus LPDDR4 @ 1200 MHz (no ECC) 1 x 32b 3 GB</td>
<td>1,657,446,637</td>
<td>460</td>
<td>1,013,582,601</td>
<td>281</td>
</tr>
</tbody>
</table>

• Pipeline stages

Table 8. Pipeline stages (consumed time in ms)

<table>
<thead>
<tr>
<th></th>
<th>i.MX8QuadMax GC7000XSVX</th>
<th>i.MX8MQuad GC7000L</th>
<th>i.MX8QuadXPlus GC7000Lite</th>
</tr>
</thead>
<tbody>
<tr>
<td>Bad Pixel Correction Node</td>
<td>34</td>
<td>68</td>
<td>68</td>
</tr>
<tr>
<td>White balance Node</td>
<td>4</td>
<td>3</td>
<td>4</td>
</tr>
<tr>
<td>Histogram Equalization Node</td>
<td>7</td>
<td>11</td>
<td>10</td>
</tr>
<tr>
<td>Demosaicing Node</td>
<td>9</td>
<td>21</td>
<td>20</td>
</tr>
<tr>
<td>RGBA2YUV Node</td>
<td>7</td>
<td>15</td>
<td>15</td>
</tr>
<tr>
<td>High-quality Noise Reduction Node</td>
<td>7,360</td>
<td>21,674</td>
<td>21,159</td>
</tr>
<tr>
<td>YUV2RGBA Node</td>
<td>9</td>
<td>21</td>
<td>21</td>
</tr>
<tr>
<td>Pipeline</td>
<td>54</td>
<td>104</td>
<td>102</td>
</tr>
</tbody>
</table>

NOTE

• The Pipeline (default) value is the sum of the Bad Pixel Correction, White Balance, Histogram Equalization, and Demosaicing times.
• SoftISP runs on only one GPU (even if 8QM has two available).
• Different GPUs are used on each board: GC7000XSVX for i.MX8QuadMax and GC7000L for i.MX8MQuad.
• The GFLOPS of the GC7000XSVX is twice that of the GC7000L. This explains why most of the algorithms take about twice as long on i.MX8MQuad as on i.MX8QuadMax.
• The performance of an application can be influenced by several factors, including GFLOPS and the caching method.
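
The first note can be checked with a few lines of Python. The per-stage times are copied from Table 8; the dictionary keys and the helper name are hypothetical.

```python
# Per-stage times in ms for the four default stages, copied from Table 8.
TIMES_MS = {
    "i.MX8QuadMax":   {"bad_pixel": 34, "white_balance": 4, "hist_eq": 7,  "demosaic": 9},
    "i.MX8MQuad":     {"bad_pixel": 68, "white_balance": 3, "hist_eq": 11, "demosaic": 21},
    "i.MX8QuadXPlus": {"bad_pixel": 68, "white_balance": 4, "hist_eq": 10, "demosaic": 20},
}
# "Pipeline" row of the same table.
REPORTED_PIPELINE_MS = {"i.MX8QuadMax": 54, "i.MX8MQuad": 104, "i.MX8QuadXPlus": 102}

def default_pipeline_ms(board):
    """Sum the times of the four default stages for one board."""
    return sum(TIMES_MS[board].values())

for board in TIMES_MS:
    print(f"{board}: sum = {default_pipeline_ms(board)} ms, "
          f"reported = {REPORTED_PIPELINE_MS[board]} ms")
```

The sums match the reported values for i.MX8QuadMax and i.MX8QuadXPlus. For i.MX8MQuad, the sum is 103 ms against a reported 104 ms, which may be due to rounding of the individual stage times.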

3.3. Profiling guidelines

• CPU and memory usage were measured using the Linux ‘top’ command.
• DDR bandwidth was measured using the Linux ‘perf stat’ command.

```
# perf stat -I 1000 -a -e ddr0/read-cycles/,ddr0/write-cycles/
```

NOTE

i.MX8QuadMax has 2 DDR controllers that must be taken into consideration:

```
# perf stat -I 1000 -a -e ddr0/read-cycles/,ddr0/write-cycles/,ddr1/read-cycles/,ddr1/write-cycles/
```

• To run an application on a specific CPU, use the taskset command. It takes a bitmask in which the lowest-order bit corresponds to the first logical CPU and the highest-order bit corresponds to the last logical CPU. For example, to run SoftISP on core 3:

```
root@imx8qmmek:~# taskset 0x8 ./SoftISP
```
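
The mask value follows directly from the bit positions: CPU n corresponds to bit n. A minimal Python sketch of the calculation (the helper name is hypothetical):

```python
def cpu_mask(*cpus):
    """Build a taskset-style bitmask from logical CPU indices (CPU 0 is bit 0)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask

print(hex(cpu_mask(3)))      # 0x8, the mask used in the command above
print(hex(cpu_mask(0, 1)))   # 0x3, allows CPUs 0 and 1
```

A mask with several bits set lets the scheduler choose among those CPUs, so a single-bit mask is the appropriate choice when measurements must stay on one core.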

To obtain the index of each CPU:

```
root@imx8qmmek:~# cat /proc/cpuinfo
```

To check that the process is really running on the specified core, check the PSR column:

```
root@imx8qmmek:~# ps -aF
```

4. Conclusion

This application represents a software ISP solution.

Its performance is influenced by the hardware capabilities: GPU type, CPU frequency, DDR features.

The SW-ISP application has these main advantages:

• High application performance.
• ISP algorithms deeply optimized by OpenCL/OpenVX.
• Support for different combinations of functions.
• The SW-ISP optimized by OpenCL supports most OpenCL platforms.

5. References

2. OpenCL 1.2 and OpenVX 1.1 Reference Pages, available at https://www.khronos.org/.
8. i.MX8 Series Applications Processors
9. Vivante GC7000 GPUs

6. Revision history

<table>
<thead>
<tr>
<th>Revision number</th>
<th>Date</th>
<th>Substantive changes</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>10/2017</td>
<td>Initial release</td>
</tr>
<tr>
<td>1</td>
<td>09/2018</td>
<td>Source release</td>
</tr>
<tr>
<td>2</td>
<td>12/2018</td>
<td>Chapter 3 &amp; 4 updated</td>
</tr>
</tbody>
</table>

© 2019 NXP B.V.