Realtime image processing on NVIDIA GeForce RTX 2080ti¶
High-resolution cameras are becoming increasingly popular, and resolutions keep growing thanks to advances in sensor technology.
The newest models offer 50 megapixels or more, pushing data bandwidth to the point where it can become a bottleneck.
In the past, high resolution meant slow transfer speed (low fps), which was far from ideal for a smooth video stream on a monitor, where you would expect at least 20-30 fps with low latency rather than a sequence of separate frames.
Even the simplest task of getting the full RAW stream from the image sensor to the computer at maximum bit depth and full image resolution can be complicated.
Fig.1. The high resolution 48 MP camera from XIMEA with active EF-mount
Camera¶
Camera vendor XIMEA offers a portfolio of various types of industrial cameras including the xiB series.
The xiB camera line includes a model called CB500 with a 48 Mpix global shutter CMOS image sensor providing 22 fps (frames per second) at 12-bit readout or 30 fps at 8-bit.
This results in a substantial data stream, which is why the camera is equipped with a PCI Express interface that provides 20 Gbit/s of throughput.
The real data stream of this 8K camera model can reach 1550 MB/s.
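As a rough cross-check of these figures, the raw data rate can be estimated from resolution, bit depth and frame rate. The sketch below assumes tightly packed pixels with no protocol overhead and uses the full resolution of 7920 × 6004 pixels quoted in the benchmarks; the function name is illustrative only.

```python
# Rough estimate of the raw CB500 data rate.
# Assumptions: tightly packed pixels, no protocol overhead, 1 MB = 10**6 bytes.
WIDTH, HEIGHT = 7920, 6004  # full resolution quoted in the benchmarks


def raw_rate_mb_per_s(bits_per_pixel: int, fps: float) -> float:
    """Return the raw stream rate in MB/s for the given bit depth and frame rate."""
    bytes_per_frame = WIDTH * HEIGHT * bits_per_pixel / 8
    return bytes_per_frame * fps / 1e6


rate_12bit = raw_rate_mb_per_s(12, 22)  # 12-bit readout at 22 fps
rate_8bit = raw_rate_mb_per_s(8, 30)    # 8-bit readout at 30 fps
link_mb_per_s = 20e9 / 8 / 1e6          # nominal 20 Gbit/s link, before overhead

print(f"12-bit @ 22 fps: {rate_12bit:.0f} MB/s")
print(f" 8-bit @ 30 fps: {rate_8bit:.0f} MB/s")
print(f"Nominal link   : {link_mb_per_s:.0f} MB/s")
```

Both estimates land in the same range as the figure quoted above and stay below the nominal link rate; the exact value depends on the real pixel packing and the achieved frame rate.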
GPU¶
One option to handle all this RAW data is to store it on a high-end SSD, yet there may be a more effective solution.
It is possible to process the data on a GPU and save the compressed color frames to a conventional SSD, with the final data rate being less than 500 MB/s.
This approach covers the key tasks of a real-time application: image acquisition, RAW image processing, image compression and storage.
One more option is to send the data from the XIMEA CB500 camera directly to system memory and then copy it to NVIDIA GPU memory, so that all image processing is done on the GPU.
Below are two pipelines with corresponding benchmarks for the CB500 camera on an NVIDIA GeForce RTX 2080ti.
The first is an example of a common pipeline and the second represents a Preview mode.
Fig.2. Fast CinemaDNG Processor on CUDA
Realtime image processing on GPU for 8K camera CB500¶
This is not a full image processing pipeline, but an example of a common one.
It includes camera calibration data (dark frame, flat field, DCP profile, LCP profile) and has an option for JPEG compression to bring the output bandwidth down to around 400-450 MB/s, which a conventional SSD should be able to sustain.
• Acquisition software gets RAW data from the camera and makes a copy to GPU
• Unpacking module transforming from 12-bit to 16-bit
• Dark image subtraction
• Flat-Field Correction
• White Balance
• 1D LUT for RAW data
• Image demosaicing
• Base Color Correction
• Curves and Levels with 1D LUT
• DCP profile (DNG specification)
• Remap with LCP profile
• Gamma
• Transform from 16-bit to 8-bit per channel
• Resize and OpenGL output to monitor
• JPEG compression with quality 90%
• Asynchronous writing of JPEG images to SSD
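The first steps of the list above can be illustrated on the CPU with a few lines of plain Python. This is only a sketch of the arithmetic, not the GPU implementation: the 12-bit packing layout (two pixels in three bytes) is an assumption, since the actual camera format is not stated here, and the flat-field formula is the textbook gain correction. The function names are hypothetical.

```python
from typing import List


def unpack12(packed: bytes) -> List[int]:
    """Unpack 12-bit pixels (two pixels per three bytes) into 16-bit-sized integers.

    ASSUMED layout: byte0 = low 8 bits of pixel A, low nibble of byte1 = high
    4 bits of A, high nibble of byte1 = low 4 bits of B, byte2 = high 8 bits of B.
    """
    if len(packed) % 3:
        raise ValueError("packed length must be a multiple of 3")
    pixels = []
    for i in range(0, len(packed), 3):
        b0, b1, b2 = packed[i], packed[i + 1], packed[i + 2]
        pixels.append(b0 | ((b1 & 0x0F) << 8))
        pixels.append((b1 >> 4) | (b2 << 4))
    return pixels


def dark_flat_correct(raw, dark, flat, max_value=4095):
    """Subtract the dark frame, then apply a per-pixel flat-field gain.

    gain = mean(flat - dark) / (flat - dark); results are clamped to 12 bits.
    """
    flat_signal = [max(f - d, 1) for f, d in zip(flat, dark)]  # avoid divide by 0
    mean_flat = sum(flat_signal) / len(flat_signal)
    out = []
    for r, d, fs in zip(raw, dark, flat_signal):
        value = max(r - d, 0) * mean_flat / fs
        out.append(min(int(round(value)), max_value))
    return out


packed = bytes([0xBC, 0x3A, 0x12])  # two made-up pixels: 0xABC and 0x123
raw = unpack12(packed)
dark = [48, 51]                     # made-up calibration values
flat = [2048, 1651]
corrected = dark_flat_correct(raw, dark, flat)
print(raw, corrected)
```

On the GPU each of these steps would run as a per-pixel kernel over the whole frame; the pure-Python loops here only show what is computed per pixel.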
Time measurements from NVIDIA GeForce RTX 2080ti with CB500¶
These benchmarks are for the camera application at full resolution and 12-bit output.
The pipeline can be tested offline with Fast CinemaDNG Processor software to tune the parameters and to check image quality and performance before implementing it in real time.
• Input RAW image: 7920 × 6004 pixels, 12 bits per pixel
• Host-to-device transfer = 7.83 ms
• Dark frame subtraction and flat-field correction = 0.88 ms
• Linearization LUT = 0.37 ms
• White balance = 0.36 ms
• MG Debayer = 4.70 ms
• ProPhoto space transform = 1.28 ms
• RGB LUT = 1.52 ms
• Output color space transform = 1.39 ms
• Geometry transform (undistortion) = 7.24 ms
• Crop time (no crop after undistortion) = 0.00 ms
• 16 to 8 bit transform = 0.80 ms
• JPEG encoder time (quality 90%, subsampling 4:2:0) = 2.69 ms
• Viewport crop = 0.02 ms
• Viewport resize (no viewport resize) = 0.00 ms
• Total GPU = 29.07 ms
• Total GPU + CPU = 33.92 ms
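To see where the GPU time goes, the per-stage timings listed above can be summed and ranked. The snippet below simply copies the published numbers into a dictionary (the zero-time stages are omitted); the variable names are illustrative.

```python
# Per-stage GPU timings in milliseconds, copied from the list above.
stages_ms = {
    "Host-to-device transfer": 7.83,
    "Dark frame subtraction and flat-field correction": 0.88,
    "Linearization LUT": 0.37,
    "White balance": 0.36,
    "MG Debayer": 4.70,
    "ProPhoto space transform": 1.28,
    "RGB LUT": 1.52,
    "Output color space transform": 1.39,
    "Geometry transform (undistortion)": 7.24,
    "16 to 8 bit transform": 0.80,
    "JPEG encoder": 2.69,
    "Viewport crop": 0.02,
}

total_ms = sum(stages_ms.values())
ranked = sorted(stages_ms.items(), key=lambda kv: kv[1], reverse=True)

# Show the three most expensive stages and their share of the total.
for name, ms in ranked[:3]:
    print(f"{name}: {ms:.2f} ms ({100 * ms / total_ms:.0f}% of GPU time)")
print(f"Sum of stages: {total_ms:.2f} ms")
```

The sum agrees with the reported GPU total up to rounding of the individual entries. The host-to-device transfer and the geometry transform together account for roughly half of the GPU time, so these are the stages to look at first when optimizing.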
Fig.3. Fastvideo SDK for Jetson
GPU processing time on the NVIDIA GeForce RTX 2080ti could be around 30-40 ms per frame, which is shorter than the roughly 45 ms frame interval at the camera's maximum 12-bit frame rate of 22 fps.
For a more complicated image processing pipeline, which can include bad pixel removal, denoising, intermediate color space transforms, defringing, resizing, rotation, cropping, sharpening, histogram, parade, image and video compression, etc., a second GPU could help to accomplish these tasks in real time.
If there is only one GPU, the total time to process one frame (GPU + CPU) with such a pipeline could reach 60-70 ms, making it important to optimize both software and hardware to get the maximum performance from the system. To create a fast multithreaded solution with the XIMEA CB500 camera, both high-end software and hardware (CPU, GPU, SSD) are essential.
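The relation between per-frame processing time and sustainable frame rate can be checked with a short calculation. The sketch assumes strictly serial processing, one frame at a time with no overlap between frames, and uses the 22 fps figure for 12-bit readout; the helper names are made up for illustration.

```python
CAMERA_FPS = 22.0                        # 12-bit readout, as stated for the CB500
frame_budget_ms = 1000.0 / CAMERA_FPS    # time available per frame


def max_fps(frame_time_ms: float) -> float:
    """Frame rate sustainable when frames are processed strictly one by one."""
    return 1000.0 / frame_time_ms


def keeps_up(frame_time_ms: float) -> bool:
    """True if processing one frame fits into the camera frame interval."""
    return frame_time_ms <= frame_budget_ms


print(f"Frame budget at {CAMERA_FPS:.0f} fps: {frame_budget_ms:.1f} ms")
for t in (33.92, 60.0, 70.0):
    status = "keeps up" if keeps_up(t) else "falls behind"
    print(f"{t:5.2f} ms per frame -> {max_fps(t):4.1f} fps ({status})")
```

Under this serial assumption, the measured 33.92 ms fits into the frame budget, while 60-70 ms per frame would not, which is why splitting the work across threads or a second GPU matters for the heavier pipeline.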
Parallel¶
For workflows such as these, JPEG storage to a fast SSD or NVMe drive is implemented in a separate CPU thread, which makes it asynchronous, so the time to store the JPEG files is not added to the total time.
Video output is also implemented in a separate thread. The main idea is to divide the whole task into parts and process them in parallel on both CPU and GPU to get the maximum performance.
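The idea of moving storage into its own thread can be sketched with a bounded queue and a writer thread. This is a generic producer/consumer pattern in Python, not the actual Fastvideo implementation; the function names, the file naming scheme and the dummy "encoded" payload are all placeholders.

```python
import queue
import tempfile
import threading
from pathlib import Path


def writer_loop(jobs: queue.Queue, out_dir: Path) -> None:
    """Write (index, data) jobs to disk until the None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        index, data = job
        (out_dir / f"frame_{index:06d}.jpg").write_bytes(data)


def run_pipeline(num_frames: int, out_dir: Path) -> int:
    """Produce frames in the main thread while a second thread stores them."""
    jobs = queue.Queue(maxsize=8)  # bounded: a slow disk applies back-pressure
    writer = threading.Thread(target=writer_loop, args=(jobs, out_dir))
    writer.start()
    for i in range(num_frames):
        # Placeholder for the encoder output of frame i (not a valid JPEG).
        encoded = b"\xff\xd8" + bytes([i % 256]) * 16 + b"\xff\xd9"
        jobs.put((i, encoded))     # returns immediately unless the queue is full
    jobs.put(None)                 # tell the writer to stop
    writer.join()
    return len(list(out_dir.glob("frame_*.jpg")))


with tempfile.TemporaryDirectory() as tmp:
    written = run_pipeline(5, Path(tmp))
print(f"frames written: {written}")
```

Because the processing loop only enqueues the encoded frame, disk latency stays off the critical path as long as the writer keeps up on average; the bounded queue limits memory use if it does not.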
Preview mode for XIMEA CB500¶
Here you can review the results of time measurements when running in preview mode.
This mode is used when it is not necessary to compress and store the processed frames; in that case the image processing pipeline is much simpler and performance is higher.
• Input RAW image: 7920 × 6004 pixels, 12 bits per pixel
• Host-to-device transfer = 8.49 ms
• Linearization LUT = 0.38 ms
• White balance = 0.36 ms
• MG Debayer = 4.59 ms
• ProPhoto space transform = 1.29 ms
• RGB LUT = 1.58 ms
• Output color space transform = 1.38 ms
• 16 to 8 bit transform = 0.81 ms
• Viewport crop = 0.02 ms
• Viewport resize (no viewport resize) = 0.00 ms
• Total GPU = 18.90 ms
• Total GPU + CPU = 21.86 ms
It is possible to get even better results by overlapping the host-to-device transfer with computation, which hides the transfer time behind the processing of the previous frame.
The benchmarks above still include this time.
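The potential gain from such overlap can be estimated with a simple idealized model: without overlap the time per frame is transfer plus compute, while with perfect overlap the steady-state interval between frames is the larger of the two. The sketch below applies this to the preview-mode numbers; it assumes that the copy of the next frame and the kernels of the current frame run fully concurrently without contention, which in CUDA typically requires pinned host memory and separate streams. The function names are illustrative.

```python
def serial_period_ms(transfer_ms: float, compute_ms: float) -> float:
    """Time per frame when the transfer and the kernels run back to back."""
    return transfer_ms + compute_ms


def overlapped_period_ms(transfer_ms: float, compute_ms: float) -> float:
    """Idealized steady-state interval when the next transfer hides behind compute."""
    return max(transfer_ms, compute_ms)


transfer_ms = 8.49                        # host-to-device transfer, preview mode
total_gpu_ms = 18.90                      # reported GPU total, preview mode
compute_ms = total_gpu_ms - transfer_ms   # everything except the transfer

serial = serial_period_ms(transfer_ms, compute_ms)
overlapped = overlapped_period_ms(transfer_ms, compute_ms)
print(f"serial:     {serial:.2f} ms per frame")
print(f"overlapped: {overlapped:.2f} ms per frame (idealized)")
```

Note that overlap improves throughput only; the latency of an individual frame still includes its own transfer.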
Fig.4. Application example: Printed circuit board (PCB) inspection
Custom software design for applications with GPU-based processing¶
There are many tasks that require a camera with high image resolution, 12-bit output and high speed.
For example, the CB500 camera model is successfully used in applications such as:
Aerial mapping, 3D scanning, flat panel display (FPD) inspection, solar panel analysis, printed circuit board (PCB) examination, wide area surveillance, persistent stadium and border security, cinematography, sports and entertainment, 360 panorama, UAVs and autonomous or unmanned vehicles, etc.
The Fastvideo company offers high performance software solutions for such applications. Most of them are based on the GPU image processing pipeline from the PRO version of Fast CinemaDNG Processor software, which is highly optimized and includes a digital cinema workflow for excellent image quality.
Embedded vision system options¶
Additionally, the CB500 camera comes with a flat ribbon flex cable (MX500 model), making it a perfect fit for embedded vision systems or multiple camera setups.
Software for this kind of complex, integrated solution is based on Fastvideo SDK for Jetson and is available for NVIDIA TK1, TX1, TX2, TX2i and AGX Xavier hardware.
With the help of Fastvideo it is also possible to implement the desired pipeline for any specific task using the CB500 camera and similar models in a single or multi-camera system.
For example, you can take advantage of 12-bit per channel JPEG compression on the GPU at the end of the image processing pipeline to compress and store more information in every frame.
Credits
Fastvideo Blog:
https://www.fastcompression.com/blog/48-mpix-ximea-camera-gpu-processing.htm