Abstract

In online search or content selection systems, significant computational resources are expended to classify electronic documents into topics, concepts, or entities drawn from a taxonomy. A classifier processes, parses, or otherwise analyzes a document and assigns it one or more labels from the taxonomy. The classifier generates a score for each label and provides the labels and scores to other components or modules for downstream processing. To keep downstream processing efficient, the classifier may return only a subset of the labels, for example those whose scores exceed a threshold. However, because threshold-based filtering considers each label's score in isolation, it may not account for the tree structure of the taxonomy, and it may also fail to take into account the likelihood dependencies between parent nodes and their child nodes. The proposed technique addresses this by (1) selecting the set of labels returned by the classifier that optimizes chosen metrics, such as precision and recall; and (2) using a greedy multi-label selection algorithm to carry out the optimization in step (1). Using these techniques, the system can select a subset of labels to return or provide for further processing.
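The abstract does not spell out the objective or the greedy procedure, so the following is only a minimal sketch of one way such a selection could work, not the authors' algorithm. It assumes that classifier scores can be read as per-label probabilities, that every taxonomy node has a score, that selecting a label also selects its ancestors, and that the quantity to optimize is an approximate expected F1 (a single number combining precision and recall). The names `ancestors`, `expected_f1`, and `greedy_select`, and the example taxonomy, are hypothetical.

```python
from typing import Dict, Optional, Set


def ancestors(label: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the label together with all of its ancestors in the tree."""
    chain: Set[str] = set()
    node: Optional[str] = label
    while node is not None:
        chain.add(node)
        node = parent.get(node)  # roots map to None (or are absent)
    return chain


def expected_f1(selected: Set[str], scores: Dict[str, float]) -> float:
    """Approximate expected F1, reading scores as probabilities (assumption)."""
    if not selected:
        return 0.0
    expected_hits = sum(scores[label] for label in selected)
    expected_relevant = sum(scores.values())
    return 2.0 * expected_hits / (len(selected) + expected_relevant)


def greedy_select(scores: Dict[str, float],
                  parent: Dict[str, Optional[str]]) -> Set[str]:
    """Greedily add the label (plus ancestors) that most improves the metric.

    Stops when no single addition improves the approximate expected F1,
    so no score threshold is involved.
    """
    selected: Set[str] = set()
    best = 0.0
    while True:
        best_trial = None
        for label in sorted(scores):  # sorted for deterministic tie-breaking
            if label in selected:
                continue
            trial = selected | ancestors(label, parent)
            value = expected_f1(trial, scores)
            if value > best:
                best, best_trial = value, trial
        if best_trial is None:
            return selected
        selected = best_trial


# Hypothetical taxonomy: child -> parent (None marks a root).
parent = {"Sports": None, "Soccer": "Sports", "World Cup": "Soccer",
          "Tennis": "Sports", "Arts": None}
# Hypothetical classifier scores for one document.
scores = {"Sports": 0.9, "Soccer": 0.8, "World Cup": 0.3,
          "Tennis": 0.1, "Arts": 0.05}

print(sorted(greedy_select(scores, parent)))
```

In this sketch the stopping point falls out of the metric itself: a label is added only while it raises the estimated balance of precision and recall, and the ancestor closure is one simple way to respect the parent-child structure of the taxonomy.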

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Lin, Hsin-yi; Milch, Brian; Adar, Michel; Fang, Scot; and van de Veerdonk, Rene, "Threshold-free Selection of Taxonomic Multilabels", Technical Disclosure Commons, (November 04, 2016)
https://www.tdcommons.org/dpubs_series/312

Technical Disclosure Commons

Defensive Publications Series

Threshold-free Selection of Taxonomic Multilabels
