Benchmarks comparison for Jetson Nano, TX2 and AGX Xavier¶
NVIDIA® Jetson is the world's leading embedded platform for image processing and DL/AI tasks.
Its high performance, compact size, versatility, and low-power computing for deep learning make it an ideal component for mobile, compute-intensive projects.
NVIDIA has released a series of Jetson single-board computer (SBC) hardware modules aimed at embedded vision systems and applications.
XIMEA has developed a carrier board for Jetson TX2 and offers a wide portfolio of cameras that work with Jetson Nano and AGX Xavier.
Hardware features for Jetson Nano, TX2, AGX Xavier¶
The following table briefly compares the hardware features of the Jetson modules, which cover a range of setup options for different markets.
Feature | Nano | TX2 / TX2i | Xavier |
CPU (ARM) | 4-core ARM Cortex-A57 @ 1.43 GHz | 4-core ARM Cortex-A57, 2-core Denver2 @ 2 GHz | 8-core ARM Carmel v8.2 @ 2.26 GHz |
GPU | 128-core Maxwell @ 921 MHz | 256-core Pascal @ 1.3 GHz | 512-core Volta @ 1.37 GHz |
Memory | 4 GB LPDDR4, 25.6 GB/s | 8 GB 128-bit LPDDR4, 58.3 GB/s | 16 GB 256-bit LPDDR4, 137 GB/s |
Storage | MicroSD | 32 GB eMMC 5.1 | 32 GB eMMC 5.1 |
Tensor cores | NA | NA | 64 |
Video encoding | (1x) 4Kp30, (2x) 1080p60, (4x) 1080p30 | (1x) 4Kp60, (3x) 4Kp30, (4x) 1080p60, (8x) 1080p30 | (4x) 4Kp60, (8x) 4Kp30, (32x) 1080p30 |
Video decoding | (1x) 4Kp60, (2x) 4Kp30, (4x) 1080p60, (8x) 1080p30 | (2x) 4Kp60, (4x) 4Kp30, (7x) 1080p60 | (2x) 8Kp30, (6x) 4Kp60, (12x) 4Kp30 |
USB | (4x) USB 3.0 + Micro-USB 2.0 | USB 3.0 + USB 2.0 | (3x) USB 3.1 + (4x) USB 2.0 |
PCIe | 4 lanes PCIe Gen 2 | 5 lanes PCIe Gen 2 | 16 lanes PCIe Gen 4 |
Power | 5W / 10W | 7.5W / 15W | 10W / 15W / 30W |
Size | 70 x 45 mm | 90 x 50 mm | 100 x 87 mm |
In camera applications, host-to-device transfers can usually be hidden by using GPU zero-copy memory or by overlapping GPU copies with computation.
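As a rough illustration of why overlapping hides transfer time, the sketch below models the steady-state time per frame with and without overlap. It is an idealized model rather than a measurement: the function name `frame_period_ms` is hypothetical, and the model assumes that copies and kernels can run fully concurrently while both transfer directions share a single copy engine. The 0.2 ms and 0.1 ms transfer figures are the Nano values from the kernel-time table later in this article; the 9.8 ms compute time is an illustrative value.

```python
def frame_period_ms(h2d_ms, compute_ms, d2h_ms, overlap=False):
    """Idealized steady-state time per frame, in milliseconds.

    Assumption: with overlap, transfers for neighboring frames run
    concurrently with the kernels of the current frame, and both
    transfer directions share one copy engine.
    """
    transfers_ms = h2d_ms + d2h_ms
    if overlap:
        # The slower of the two concurrent stages sets the frame period.
        return max(transfers_ms, compute_ms)
    # Without overlap, copies and kernels run back to back.
    return transfers_ms + compute_ms


# Nano transfer times from the benchmark table; compute time is illustrative.
serial = frame_period_ms(0.2, 9.8, 0.1)
overlapped = frame_period_ms(0.2, 9.8, 0.1, overlap=True)
print(f"serial: {serial:.1f} ms, overlapped: {overlapped:.1f} ms")
```

In this model the transfers disappear from the frame period whenever the compute stage is the longer one, which is the typical case for the pipelines benchmarked here.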
Performance Comparison: Jetson Nano vs TX1 vs TX2 vs AGX Xavier¶
To compare the performance of the modules fairly, the benchmark uses basic image processing tasks that are typical of camera applications: white balance, demosaicing (debayering), color correction, optional resize, JPEG encoding, and so on.
Hardware and software for benchmarking¶
- CPU/GPU: NVIDIA Jetson Nano, TX1, TX2/TX2i, AGX Xavier
- OS: L4T (Ubuntu 18.04)
- CUDA Toolkit: 10.0 for Jetson Nano, TX2/TX2i, AGX Xavier
- Fastvideo SDK: 0.14.2
GPU kernel times for 2K image processing (1920×1080, 8/16 bits per channel, milliseconds)¶
Algorithm and parameters | Nano | TX2 / TX2i | Xavier |
Host to Device | 0.2 | 0.2 | 0.05 |
White Balance | 0.6 | 0.24 | 0.08 |
HQLI Debayer | 1.8 | 0.47 | 0.36 |
DFPD Debayer | 4.7 | 2.06 | 0.95 |
MG Debayer | 12.7 | 5.9 | 2.2 |
Color Correction with 3×4 matrix | 1.7 | 0.81 | 0.25 |
Resize from 2K to 960×540 | 10 | 4.3 | 1.5 |
Resize from 2K to 1919×1079 | 19.8 | 8.2 | 2.4 |
Gamma (1920×1080) | 1.4 | 0.84 | 0.2 |
JPEG Encoding (1920×1080, 90%, 4:2:0) | 4.3 | 1.7 | 0.62 |
JPEG Encoding (1920×1080, 90%, 4:4:4) | 6.8 | 2.6 | 0.75 |
JPEG2000 Encoding (lossy, 32×32, single mode) | 81 | 63 | 11.1 |
JPEG2000 Encoding (lossless, 32×32, single mode) | 190 | 163 | 23.3 |
Device to Host | 0.1 | 0.1 | 0.02 |
An image processing pipeline can be defined by choosing a particular debayer algorithm and an output compression format (JPEG or JPEG2000) from the table.
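As a sketch of how the table can be used, the snippet below sums the kernel times of one possible pipeline and converts the total into an upper bound on frame rate. The stage selection, the dictionary layout, and the function names `pipeline_ms` and `max_fps` are illustrative assumptions; the numbers are copied from the table above. The estimate assumes the stages run strictly one after another and ignores camera acquisition, kernel launch overhead, and any copy/compute overlap, so real throughput may differ.

```python
# Kernel times in milliseconds for one 1920x1080 frame, copied from the table.
KERNEL_MS = {
    "Host to Device":   {"Nano": 0.2,  "TX2": 0.2,  "Xavier": 0.05},
    "White Balance":    {"Nano": 0.6,  "TX2": 0.24, "Xavier": 0.08},
    "HQLI Debayer":     {"Nano": 1.8,  "TX2": 0.47, "Xavier": 0.36},
    "DFPD Debayer":     {"Nano": 4.7,  "TX2": 2.06, "Xavier": 0.95},
    "MG Debayer":       {"Nano": 12.7, "TX2": 5.9,  "Xavier": 2.2},
    "Color Correction": {"Nano": 1.7,  "TX2": 0.81, "Xavier": 0.25},
    "Gamma":            {"Nano": 1.4,  "TX2": 0.84, "Xavier": 0.2},
    "JPEG 4:2:0":       {"Nano": 4.3,  "TX2": 1.7,  "Xavier": 0.62},
    "JPEG 4:4:4":       {"Nano": 6.8,  "TX2": 2.6,  "Xavier": 0.75},
    "Device to Host":   {"Nano": 0.1,  "TX2": 0.1,  "Xavier": 0.02},
}

# One example pipeline (an illustrative choice, not a recommendation).
PIPELINE = [
    "Host to Device", "White Balance", "HQLI Debayer",
    "Color Correction", "Gamma", "JPEG 4:2:0", "Device to Host",
]


def pipeline_ms(stages, module):
    """Total time per frame if the stages run strictly in sequence."""
    return sum(KERNEL_MS[stage][module] for stage in stages)


def max_fps(total_ms):
    """Upper bound on frames per second for a given per-frame time."""
    return 1000.0 / total_ms


for module in ("Nano", "TX2", "Xavier"):
    total = pipeline_ms(PIPELINE, module)
    print(f"{module}: {total:.2f} ms per frame, up to {max_fps(total):.0f} fps")
```

For this pipeline the totals come to about 10.1 ms on Nano, 4.36 ms on TX2, and 1.58 ms on Xavier. Swapping "HQLI Debayer" for "MG Debayer" in `PIPELINE` shows how strongly the debayer choice affects the total.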
Fastvideo has also made the same kernel time measurements for NVIDIA GeForce and Quadro GPUs.
You can get that document HERE