The present disclosure describes a nearest-neighbor-based manifold expansion technique integrated into an active learner that selects examples for human review. The active learner first takes an unlabeled dataset of unlabeled examples as input and divides it into seed datasets (i.e., a positive seed dataset and a negative seed dataset) and a test dataset. The positive seed dataset contains positive seeds, the negative seed dataset contains negative seeds, and the test dataset contains test examples. In a voting process, each positive seed and each negative seed casts a vote for every test example that lies in its neighborhood. The votes accumulate into an overall score for each test example, and the test examples are ranked by that score. The top-k examples in the ranked list are sent to annotators, who label each one as positive or negative. Because each score is built up from individual seed votes, the annotators can see why a particular example received its score and how much the positive seeds and the negative seeds each contributed to it. The examples labeled by the annotators are added to the seed datasets, and the voting process is executed again on the updated seed datasets. In this way, the voting process runs repeatedly and the ranked list is updated incrementally in real time.
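
The voting loop described above can be sketched roughly as follows. This is a minimal illustration rather than the disclosed implementation: the disclosure does not specify the distance metric, how a neighborhood is defined, or how votes are weighted, so the sketch assumes Euclidean distance, a fixed-radius neighborhood, and unit votes (+1 from a positive seed, -1 from a negative seed). The names `vote_scores`, `top_k`, `apply_labels`, and `radius` are hypothetical.

```python
import math


def vote_scores(positive_seeds, negative_seeds, test_examples, radius):
    """Accumulate seed votes for each test example.

    Assumption: a seed votes for a test example when the Euclidean
    distance between them is at most `radius`.
    """
    scores = []
    for x in test_examples:
        pos = sum(1 for s in positive_seeds if math.dist(s, x) <= radius)
        neg = sum(1 for s in negative_seeds if math.dist(s, x) <= radius)
        # Keep both vote counts so an annotator can see each side's contribution.
        scores.append({"pos": pos, "neg": neg, "score": pos - neg})
    return scores


def top_k(scores, k):
    """Return indices of the k highest-scoring test examples."""
    order = sorted(range(len(scores)), key=lambda i: scores[i]["score"], reverse=True)
    return order[:k]


def apply_labels(positive_seeds, negative_seeds, test_examples, labels):
    """Move annotated examples from the test set into the seed sets.

    `labels` maps a test-example index to True (positive) or False (negative).
    """
    # Pop from the highest index down so earlier indices stay valid.
    for i in sorted(labels, reverse=True):
        example = test_examples.pop(i)
        (positive_seeds if labels[i] else negative_seeds).append(example)


# One round of the loop on toy 2-D points.
positive_seeds = [(0.0, 0.0), (1.0, 0.0)]
negative_seeds = [(5.0, 5.0)]
test_examples = [(0.5, 0.0), (5.0, 4.5), (10.0, 10.0)]

scores = vote_scores(positive_seeds, negative_seeds, test_examples, radius=1.0)
to_review = top_k(scores, k=2)

# Stand-in for the annotators' answers on the reviewed examples.
apply_labels(positive_seeds, negative_seeds, test_examples, {0: True, 2: False})
rescored = vote_scores(positive_seeds, negative_seeds, test_examples, radius=1.0)
```

Storing the positive and negative counts separately, instead of only their difference, is one simple way to support the interpretability property the disclosure mentions. A production system would likely replace the brute-force distance loop with a nearest-neighbor index.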

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.