Using synthetic data for machine learning training

July 31, 2020 | Image Annotation | Karina Grosheva

The data problem

Today, many artificial intelligence projects face the same problem: they lack sufficiently large annotated datasets. New neural network architectures and learning algorithms keep emerging, but they cannot solve the data problem by themselves.

Modern methods for training deep neural networks have already proven effective in settings where huge annotated datasets are available. The classic computer vision dataset, ImageNet, contains more than 14 million images divided into almost 22 thousand classes. In other words, every photo in ImageNet carries a label, checked by people, that says what the photo shows. The labels are very detailed: for example, ImageNet has 120 different dog breeds.

However, the annotation problem remains unsolved. We don’t just need a large dataset (for many applications it’s not that difficult to collect lots of images or texts); we need an annotated one. That means at some point people had to annotate, or at least personally check and correct, the annotation of every single piece of data: every photo, every text, every snapshot…

What should be done? Today we are talking about a method that gives new hope to many artificial intelligence applications — synthetic data.

What is synthetic data?

The basic idea is very simple: if we can’t collect and annotate data, maybe we can create it ourselves. Synthetic data is most commonly used in computer vision tasks. We can build 3D models of the objects of interest (by hand), place them in the desired virtual environment, and then generate images by rendering these 3D scenes. The result looks like frames from a “cartoon”, and we can “shoot” the objects from every side, in any combination and from any angle. Crucially, the annotation is also produced automatically and is perfectly accurate: since we built the 3D scene and placed the camera ourselves, we know which object every pixel of the rendered image belongs to.
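To make the “annotation for free” point concrete, here is a minimal sketch that uses a toy 2D stand-in for a real 3D renderer. The names `Box` and `render_scene` are hypothetical, and objects are reduced to flat rectangles at different depths, drawn with the painter’s algorithm (far to near). Alongside the image, the renderer fills a label map with the ID of whichever object wrote each pixel, so occlusions are resolved correctly without any human labeling.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical flat 'object': an axis-aligned rectangle at some depth."""
    obj_id: int    # the label the annotation should carry (0 is reserved for background)
    x0: int
    y0: int
    x1: int        # exclusive
    y1: int        # exclusive
    depth: float   # larger = farther from the camera
    color: tuple   # stands in for texture and shading


def render_scene(boxes, width, height, background=(0, 0, 0)):
    """Rasterize boxes far-to-near (painter's algorithm).

    Returns (image, labels): image[y][x] is an RGB tuple, and labels[y][x]
    is the obj_id of the object that produced that pixel (0 = background).
    The label map is exact because we know which object wrote each pixel.
    """
    image = [[background] * width for _ in range(height)]
    labels = [[0] * width for _ in range(height)]
    # Draw the farthest objects first so nearer ones overwrite (occlude) them.
    for box in sorted(boxes, key=lambda b: b.depth, reverse=True):
        for y in range(max(0, box.y0), min(height, box.y1)):
            for x in range(max(0, box.x0), min(width, box.x1)):
                image[y][x] = box.color
                labels[y][x] = box.obj_id
    return image, labels


# Usage: two overlapping objects; the nearer green box occludes the red one.
boxes = [
    Box(1, 0, 0, 4, 4, depth=2.0, color=(255, 0, 0)),  # farther red box
    Box(2, 2, 2, 6, 6, depth=1.0, color=(0, 255, 0)),  # nearer green box
]
image, labels = render_scene(boxes, width=8, height=8)
```

In the overlap region, `labels` reports object 2, because it is closer to the camera. A production 3D renderer would use a depth buffer rather than sorting whole objects, and many renderers offer an equivalent object-ID or segmentation output, but the principle is the same.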

For example, Neuromation is creating synthetic data for a project that recognizes products on supermarket shelves. This is exactly the situation described above: the product catalog of Russian retail alone contains about 170 thousand different items, and manually annotating a set of photos covering that many products is beyond human capacity.

What remains is the synthetic data approach: create 3D models of product packages (this is where you save effort, since there are far fewer package types than 170 thousand), overlay them with textures of different labels, place these models on a virtual shelf in a virtual supermarket (randomly, with noise and possible errors), and get an unlimited stream of perfectly annotated synthetic data on which you can then train a neural network.
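The recipe above can be sketched as a randomized scene generator. This is a minimal illustration under stated assumptions, not Neuromation’s actual pipeline. The catalog, package sizes, shelf dimensions, and the names `PACKAGE_SHAPES`, `CATALOG`, `Placement`, `random_shelf`, and `synthetic_stream` are all hypothetical. The output is a 2D shelf layout that a renderer would then turn into images. Many products (SKUs) share a few package shapes, which is where the savings come from. Each placement records which product label sits where, so the layout itself serves as the annotation.

```python
import random
from dataclasses import dataclass
from itertools import islice

# Hypothetical package models: (width, height) in cm. Many SKUs reuse one shape.
PACKAGE_SHAPES = {
    "carton_1l": (7.0, 20.0),
    "can_330ml": (6.0, 12.0),
    "box_cereal": (19.0, 28.0),
}

# Hypothetical catalog: (sku, package shape). The SKU selects the label texture.
CATALOG = [
    ("milk_brand_a", "carton_1l"),
    ("milk_brand_b", "carton_1l"),
    ("cola", "can_330ml"),
    ("muesli", "box_cereal"),
]


@dataclass
class Placement:
    sku: str     # product label (texture) applied to the package model
    shape: str   # which package model is used
    x: float     # left edge on the shelf, cm
    y: float     # bottom edge (height of the shelf row), cm
    w: float
    h: float


def random_shelf(catalog, shelf_width=120.0, row_heights=(0.0, 35.0, 70.0), rng=random):
    """Fill each shelf row left to right with randomly chosen products."""
    placements = []
    for y in row_heights:
        x = rng.uniform(0.0, 3.0)              # random offset from the shelf edge
        while True:
            sku, shape = rng.choice(catalog)
            w, h = PACKAGE_SHAPES[shape]
            scale = rng.uniform(0.95, 1.05)    # noise in model size
            w, h = w * scale, h * scale
            if x + w > shelf_width:            # row is full
                break
            placements.append(Placement(sku, shape, x, y, w, h))
            x += w + rng.uniform(0.0, 2.0)     # random gap between items
    return placements


def synthetic_stream(catalog, seed=0):
    """Endless stream of randomized scenes; each placement doubles as a label."""
    rng = random.Random(seed)
    while True:
        yield random_shelf(catalog, rng=rng)


# Usage: take the first three randomized shelves from the stream.
scenes = list(islice(synthetic_stream(CATALOG, seed=42), 3))
```

Seeding the generator makes the stream reproducible, which helps when debugging a training run. A real pipeline would also randomize rotations, lighting, and camera pose before rendering.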

The quality and realism of the models deserve attention here: they really are needed to recognize the subtle differences between tens of thousands of products. For other tasks, however, very beautiful, photorealistic images are not so necessary. It is often enough for an image to resemble reality roughly and remain recognizable. We must keep in mind that the ultimate goal here is not a good picture, but a well-trained model.