Active Learning for Machine Learning: Basic Strategy

Basic strategy

Pool-based active learning. There is a certain selection, and the algorithm uses objects from it as requests to the Oracle. In this strategy, each object is assigned a degree of informativeness — how much benefit information about the true object label will bring, and the most informative objects are sent to the Oracle. The methods for selecting objects described below are relevant to this strategy.

Selective sampling. The algorithm uses a data stream rather than a static sample, and for each object in the stream, a decision is made whether to request an Oracle on this object or not. If the decision is made to request an Oracle, the object and its label are used in further training of the model, otherwise the object is simply discarded. Unlike selecting objects from a selection, selecting from a stream does not make any assumptions about the density of the distribution of objects, does not store the objects themselves, and works much faster.

Query synthesis. Instead of using pre-defined objects, the algorithm constructs the objects itself and submits them to the Oracle for input. For example, if the objects are vectors in n-dimensional space separated by a hyperplane and the binary classification problem is solved, it makes sense to give the Oracle input synthesized vectors close to the boundary.