xiAPI CUDA support¶
The XIMEA API can write image data directly into memory allocated by the NVIDIA CUDA runtime library on supported configurations.
This saves one cudaMemcpy operation per acquired frame, which would otherwise copy the data from CPU to device memory.
This is especially useful on systems with physically unified memory, such as Jetson TX1, TX2 or Xavier.
The XIMEA API also supports GPUDirect technology, which is described on a separate page.
Requirements¶
- 64-bit Linux or Windows system with an NVIDIA GPU
- Camera models from the following lines: xiB, xiB-64, xiC, xiX, xiJ, xiT
- XIMEA API package version 4.17.01 (4.17.14 for Windows) or later
- Proprietary NVIDIA video drivers
- CUDA toolkit version 6 or later (tested with versions 8.0 and 9.2)
How to enable¶
Set the relevant xiAPI parameters in your code:
xiSetParamInt(handle, XI_PRM_BUFFER_POLICY, XI_BP_UNSAFE);
xiSetParamInt(handle, XI_PRM_IMAGE_DATA_FORMAT, XI_FRM_TRANSPORT_DATA);
xiSetParamInt(handle, XI_PRM_TRANSPORT_DATA_TARGET, XI_TRANSPORT_DATA_TARGET_ZEROCOPY); // or XI_TRANSPORT_DATA_TARGET_UNIFIED
Note that these targets work only with the unsafe buffer policy and the transport data image format; the safe buffer policy and other image formats cannot be used.
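The settings above can be placed into a complete acquisition sequence. The following is a minimal sketch, not a reference implementation: it assumes the Linux include path `m3api/xiApi.h`, a single camera at device index 0, and a 5000 ms timeout, and the `XI_CHECK` macro is a hypothetical helper introduced only for this example.

```cuda
#include <cstdio>
#include <m3api/xiApi.h>  // assumed Linux include path; adjust for your installation

// Hypothetical helper: report the failing call and exit. A real application
// would also stop acquisition and close the device before returning.
#define XI_CHECK(call)                                                \
    do {                                                              \
        XI_RETURN rc = (call);                                        \
        if (rc != XI_OK) {                                            \
            std::fprintf(stderr, "%s failed (%d)\n", #call, (int)rc); \
            return 1;                                                 \
        }                                                             \
    } while (0)

int main() {
    HANDLE handle = NULL;
    XI_CHECK(xiOpenDevice(0, &handle));  // device index 0 is an assumption

    // Parameters from this page, in the same order.
    XI_CHECK(xiSetParamInt(handle, XI_PRM_BUFFER_POLICY, XI_BP_UNSAFE));
    XI_CHECK(xiSetParamInt(handle, XI_PRM_IMAGE_DATA_FORMAT, XI_FRM_TRANSPORT_DATA));
    XI_CHECK(xiSetParamInt(handle, XI_PRM_TRANSPORT_DATA_TARGET,
                           XI_TRANSPORT_DATA_TARGET_ZEROCOPY));

    XI_CHECK(xiStartAcquisition(handle));

    XI_IMG image = {};
    image.size = sizeof(XI_IMG);                 // must be set before xiGetImage
    XI_CHECK(xiGetImage(handle, 5000, &image));  // timeout in milliseconds
    std::printf("frame %ux%u, buffer at %p\n",
                (unsigned)image.width, (unsigned)image.height, image.bp);

    XI_CHECK(xiStopAcquisition(handle));
    XI_CHECK(xiCloseDevice(handle));
    return 0;
}
```

To try the unified target instead, replace XI_TRANSPORT_DATA_TARGET_ZEROCOPY with XI_TRANSPORT_DATA_TARGET_UNIFIED.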
Unified memory¶
XIMEA API allocates unified CUDA memory using the cudaMallocManaged function and then pins it to CPU RAM with a cudaMemAdvise call that sets cudaMemAdviseSetPreferredLocation to cudaCpuDeviceId.
The CUDA device must have the cudaDevAttrConcurrentManagedAccess attribute set to true for this feature to work.
The pointer returned in the bp field of the XI_IMG structure by the xiGetImage call can therefore be used in both CPU and GPU code.
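As an illustration of the unified target, the sketch below checks the required device attribute and then passes the image pointer straight to a kernel. It is an example under stated assumptions: `count_above`, `supports_unified_target` and `count_bright_bytes` are hypothetical names, the buffer is treated as raw bytes regardless of the camera's transport packing, and the byte count `n` is assumed to be supplied by the caller.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Reads the buffer as raw bytes and counts those above a threshold.
__global__ void count_above(const unsigned char* data, size_t n,
                            unsigned char threshold, unsigned int* count) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n && data[i] > threshold) {
        atomicAdd(count, 1u);
    }
}

// True if the device reports cudaDevAttrConcurrentManagedAccess.
bool supports_unified_target(int device) {
    int value = 0;
    cudaError_t err = cudaDeviceGetAttribute(
        &value, cudaDevAttrConcurrentManagedAccess, device);
    return err == cudaSuccess && value != 0;
}

// bp and n are hypothetical parameters standing in for the image buffer
// pointer and its size in bytes. Returns false on any CUDA error.
bool count_bright_bytes(const void* bp, size_t n, unsigned char threshold,
                        unsigned int* result) {
    unsigned int* d_count = nullptr;
    if (cudaMalloc(&d_count, sizeof(*d_count)) != cudaSuccess) return false;
    bool ok = cudaMemset(d_count, 0, sizeof(*d_count)) == cudaSuccess;
    if (ok && n > 0) {
        const unsigned int threads = 256;
        const unsigned int blocks = (unsigned int)((n + threads - 1) / threads);
        // The same pointer the CPU reads is passed straight to the kernel.
        count_above<<<blocks, threads>>>(
            static_cast<const unsigned char*>(bp), n, threshold, d_count);
        ok = cudaGetLastError() == cudaSuccess;
    }
    // cudaMemcpy waits for the kernel on the default stream to finish.
    ok = ok && cudaMemcpy(result, d_count, sizeof(*d_count),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_count);
    return ok;
}
```

A caller would typically check `supports_unified_target(0)` before selecting XI_TRANSPORT_DATA_TARGET_UNIFIED, and after xiGetImage call `count_bright_bytes(image.bp, n, 200, &count)` with no intermediate cudaMemcpy of the image data.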
Zerocopy memory¶
XIMEA API allocates zerocopy CUDA memory using the cudaHostAlloc function with the cudaHostAllocMapped flag set.
The pointer returned in the bp field of the XI_IMG structure by the xiGetImage call is for access from the CPU.
For use in CUDA kernels, it must be converted into a device pointer with the cudaHostGetDevicePointer function. The exception is when the cudaDevAttrCanUseHostPointerForRegisteredMem device attribute is true, in which case the pointer can be used as is.
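The pointer conversion for the zerocopy target could look roughly like this. The function name `device_pointer_for` is hypothetical, and the sketch assumes a CUDA toolkit recent enough to define cudaDevAttrCanUseHostPointerForRegisteredMem; with older toolkits the attribute check can be dropped and cudaHostGetDevicePointer called unconditionally.

```cuda
#include <cuda_runtime.h>

// Returns a pointer usable in kernels for a host pointer that refers to
// mapped (zerocopy) memory, or nullptr if the conversion fails.
void* device_pointer_for(void* host_ptr, int device) {
    int can_use_host_ptr = 0;
    cudaError_t err = cudaDeviceGetAttribute(
        &can_use_host_ptr, cudaDevAttrCanUseHostPointerForRegisteredMem, device);
    if (err == cudaSuccess && can_use_host_ptr != 0) {
        return host_ptr;  // host and device addresses match, use as is
    }

    void* dev_ptr = nullptr;
    // The flags argument of cudaHostGetDevicePointer must be 0.
    if (cudaHostGetDevicePointer(&dev_ptr, host_ptr, 0) != cudaSuccess) {
        return nullptr;
    }
    return dev_ptr;
}
```

After xiGetImage, the result of `device_pointer_for(image.bp, 0)` would be the argument passed to a kernel, while `image.bp` itself remains the pointer for CPU access.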
Sample application¶
Attached is an example demonstrating the features discussed: xiCUDASample.tar.bz2.
It is based on 3_Imaging/histogram from the CUDA samples, which computes a 64-bin histogram on the GPU.
Results are displayed in the terminal as ASCII art.
The application also prints time measurements, so you can compare the time needed to run the computation with different options.
Please refer to the readme.txt included in the tarball for build instructions.