A new active labeling method for deep learning

Authors

Dan Wang

Published

July 1, 2014

Abstract

Deep learning has been shown to achieve outstanding performance in a number of challenging real-world applications. However, most existing work assumes a fixed set of labeled data, an assumption that often does not hold in practice, where obtaining labeled data is usually expensive and time-consuming. Active labeling in deep learning aims to achieve the best learning result with a limited labeled data set, i.e., to choose the most appropriate unlabeled data to be labeled. This paper presents a new active labeling method, AL-DL, for the cost-effective selection of data to be labeled. AL-DL uses one of three metrics for data selection: least confidence, margin sampling, and entropy. The method is applied to deep learning networks built from stacked restricted Boltzmann machines as well as from stacked autoencoders. In experiments on the MNIST benchmark dataset, the method consistently outperforms random labeling by a significant margin.
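The three selection metrics named above are standard uncertainty-sampling scores computed from a classifier's predicted class probabilities. The Python sketch below assumes the network produces a softmax distribution over classes for each unlabeled sample. The function names (`least_confidence`, `margin`, `entropy`, `select_for_labeling`) and the top-k batch selection are illustrative assumptions, not the AL-DL implementation described in the paper.

```python
import math
from typing import List, Sequence

# Hypothetical helpers: each maps one sample's predicted class distribution
# to an uncertainty score where HIGHER means "more worth labeling".

def least_confidence(p: Sequence[float]) -> float:
    """1 - P(most likely class): high when the top prediction is weak."""
    return 1.0 - max(p)

def margin(p: Sequence[float]) -> float:
    """Negated gap between the two most likely classes.

    Margin sampling conventionally picks the SMALLEST top1 - top2 gap;
    negating it lets every metric be ranked in the same direction.
    """
    top1, top2 = sorted(p, reverse=True)[:2]
    return -(top1 - top2)

def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (natural log) of the predicted distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

METRICS = {"least_confidence": least_confidence, "margin": margin, "entropy": entropy}

def select_for_labeling(probs: Sequence[Sequence[float]], k: int,
                        metric: str = "entropy") -> List[int]:
    """Return indices of the k unlabeled samples with the highest uncertainty."""
    score = METRICS[metric]
    ranked = sorted(range(len(probs)), key=lambda i: score(probs[i]), reverse=True)
    return ranked[:k]
```

For example, with predicted distributions `[[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.6, 0.39, 0.01]]`, all three metrics rank sample 1 as most uncertain and sample 0 as least, so `select_for_labeling(probs, 2, metric)` returns `[1, 2]` for each metric. On other inputs the metrics can disagree, which is why the choice of metric matters. In an active labeling loop, the selected samples would be sent for labeling, added to the labeled set, and the network retrained before the next selection round.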