https://www.ximea.com/support/wiki/vision-libraries/artificial_intelligence

Neural Networks

Preparing datasets for deep learning using embedded vision and a multi camera setup

Preparing datasets is an important step in any deep learning program. Embedded vision and/or a multi camera setup offer the chance to gather high quality data for applications in almost all areas of life.

How to prepare a dataset for AI applications and deep learning?

AI (Artificial Intelligence) based applications are becoming more and more popular in various fields nowadays.
They already solve tasks with different levels of complexity, often being faster and more reliable than humans.

This article focuses on applications based on image and video processing, such as UAV control, self-driving cars, driverless trains, boats and mobile robots.

Data is key in deep learning projects

Any of the automated systems mentioned above needs a large amount of training data to learn how to behave.
Obtaining high quality data for a particular set of situations is a crucial starting point, which raises an important question: where and how can such a dataset be obtained?

Using a standard training set is possible, but such a set usually does not correspond to the real situation the system has to handle.
Feeding the neural network with such material will not provide enough confidence that the system will behave correctly.
Most such datasets, especially the complex ones, are therefore built from a collection of real life examples.
For autonomous cars, this means installing the necessary number of specific embedded camera models on the vehicle, thus creating a multi camera setup, and making a large number of recordings.

Important parts of a typical multi camera setup

A dataset gathered this way depends on the particular camera and image processing algorithms used, and any such camera system introduces its own artifacts.
This is why the following aspects need to be considered when assembling a camera setup for data gathering:

Camera (image sensor, bit depth, resolution, FPS, S/N, firmware, mode of operation, etc.)
Lens, its control and settings
Camera and lens calibration and testing
Software for image/video processing
Different illumination conditions
How to handle multicamera or embedded solutions
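
The camera parameters in the list above (resolution, bit depth, FPS, number of cameras) together determine how much raw data a recording rig produces. The arithmetic can be sketched as follows. The function name and the example rig (four 1920x1200 cameras, 12-bit, 60 FPS) are assumptions chosen for illustration, and the figure assumes tightly packed pixels with no container overhead.

```python
def raw_data_rate_mb_s(width, height, bit_depth, fps, cameras=1):
    """Uncompressed sensor data rate in megabytes per second (1 MB = 10**6 bytes)."""
    bits_per_second = width * height * bit_depth * fps * cameras
    return bits_per_second / 8 / 1e6


# Hypothetical rig: four 1920x1200 cameras, 12-bit RAW, 60 FPS.
rate = raw_data_rate_mb_s(1920, 1200, 12, 60, cameras=4)
terabytes_per_hour = rate * 3600 / 1e6
print(f"{rate:.2f} MB/s, {terabytes_per_hour:.2f} TB per hour")
```

An estimate like this helps to size storage and interface bandwidth before the first recording session.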

How to train a neural network?

For example, in the NVIDIA DALI project the workflow starts from a standard image database.
JPEG images are decoded and several image processing transforms are then applied, so that the network is trained on modified images derived from the original set via the following operations:

JPEG decoding
exposure change
resize
rotation
color correction (augment)
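
The operations listed above can be sketched in plain Python. This is a minimal illustration, not the DALI API: all function names are hypothetical, the image is assumed to be already decoded into rows of 8-bit (R, G, B) tuples, and the random ranges are arbitrary.

```python
import random


def clamp(value):
    """Round to an integer and clamp to the 8-bit range."""
    return max(0, min(255, int(round(value))))


def change_exposure(img, gain):
    """Scale every channel by a gain factor (gain > 1 brightens)."""
    return [[tuple(clamp(c * gain) for c in px) for px in row] for row in img]


def resize_nearest(img, new_w, new_h):
    """Nearest-neighbour resize to new_w x new_h pixels."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def color_correct(img, gains):
    """Apply per-channel gains, a crude white-balance style colour shift."""
    return [[tuple(clamp(c * g) for c, g in zip(px, gains)) for px in row]
            for row in img]


def augment(img, rng):
    """Produce one randomly altered 4x4 variant of a decoded image."""
    out = change_exposure(img, rng.uniform(0.7, 1.3))
    out = color_correct(out, [rng.uniform(0.9, 1.1) for _ in range(3)])
    if rng.random() < 0.5:
        out = rotate90(out)
    return resize_nearest(out, 4, 4)


# A tiny 2x3 stand-in for a decoded JPEG.
image = [[(10, 20, 30), (40, 50, 60), (70, 80, 90)],
         [(100, 110, 120), (130, 140, 150), (160, 170, 180)]]
variants = [augment(image, random.Random(seed)) for seed in range(5)]
```

Each seed yields a different variant of the same source image, which is how one stored picture becomes several training samples.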

This is an artificial way to significantly increase the number of images in the database.
The increase is virtual, but the resulting images are not identical to the originals, so such an approach can be useful.

The same can be done for video: record it in RAW, then run GPU-based RAW processing with different parameter sets to produce multiple new image series.
Provided the original RAW video is of high enough quality, many more different videos can be prepared for use in neural network training. Such GPU-based RAW processing takes very little time.

List of transforms applied to RAW data

Combining XIMEA embedded cameras for video recording with the Fastvideo SDK for RAW image/video processing makes the following transforms available:

exposure correction
denoising
color correction
color space transforms
1D and 3D LUT in RGB/HSV
crop and resize
rotation
geometric transforms
lens distortion/undistortion
sharpening
any image processing filter
gamma
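
Two entries from the list, gamma and the 1D LUT, are closely related: a gamma curve is commonly applied as a precomputed lookup table. The sketch below illustrates the idea in plain Python for 8-bit values. It is not the Fastvideo SDK API; the function names and the gamma value of 2.2 are assumptions for the example.

```python
def build_gamma_lut(gamma, max_value=255):
    """Precompute the gamma-encoded output for every possible input level."""
    return [round(max_value * (v / max_value) ** (1.0 / gamma))
            for v in range(max_value + 1)]


def apply_lut(pixels, lut):
    """Map each pixel value through the table: one lookup per pixel."""
    return [lut[p] for p in pixels]


lut = build_gamma_lut(2.2)
row = [0, 16, 64, 128, 255]
corrected = apply_lut(row, lut)
print(corrected)
```

Because the table is built once and then only indexed, changing the curve for a new variant of the data costs almost nothing per pixel.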

This approach also makes it possible to simulate different lighting conditions in software, in terms of both exposure and the spectral characteristics of the illumination.
Various lenses and orientations can be simulated as well, so the total number of new videos and pictures for training can be increased substantially.
There is no need to save these processed videos; they can be generated on the fly by real-time RAW processing on the GPU.
These are the basics of how to prepare a dataset for deep learning and what type of equipment is needed for a multi camera setup.
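
The on-the-fly idea can be sketched as a Python generator that combines every RAW frame with every parameter set and never stores the results. This is only an illustration under stated assumptions: `process_raw` is a hypothetical CPU stand-in for a GPU RAW pipeline, frames are lists of normalised pixel values, and the parameter values are arbitrary.

```python
import itertools


def parameter_sets(exposure_gains, gammas, rotations_deg):
    """Yield every combination of processing parameters as a dict."""
    for gain, gamma, rot in itertools.product(exposure_gains, gammas, rotations_deg):
        yield {"exposure_gain": gain, "gamma": gamma, "rotation_deg": rot}


def process_raw(frame, params):
    """Hypothetical stand-in for a GPU RAW pipeline.

    Applies exposure gain and gamma to normalised pixel values;
    rotation is carried in params but not applied in this sketch.
    """
    gain, gamma = params["exposure_gain"], params["gamma"]
    return [min(1.0, px * gain) ** (1.0 / gamma) for px in frame]


def training_stream(raw_frames, param_list):
    """Lazily yield (processed_frame, params) pairs; nothing is written to disk."""
    for frame in raw_frames:
        for params in param_list:
            yield process_raw(frame, params), params


params = list(parameter_sets([0.5, 1.0, 2.0], [1.8, 2.2], [0, 90, 180, 270]))
raw_frames = [[0.1, 0.4, 0.8], [0.2, 0.3, 0.6]]
stream = training_stream(raw_frames, params)
first_frame, first_params = next(stream)
print(len(params), "variants per RAW frame")
```

Only the original RAW frames need to be kept; each processed variant exists just long enough to be fed to the training step.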

Credits
Fastvideo Blog:
https://www.fastcompression.com/blog/ai-video-training.htm