User-generated hashtags are widely used on social media platforms, blogs, and community sites to annotate content and make it easily discoverable by both people and algorithms. Hashtags capture the semantics of the content from the users’ point of view, in their own vocabulary. Detecting relationships between hashtags, such as which hashtags denote broader topics than others, is often challenging. This disclosure describes techniques that leverage the historical distribution of topics in social media posts or other content to accurately detect candidates for higher-level topics in a topical graph. A generalization score is calculated based on the observation that higher-level topics typically have a long history with a nearly even distribution over time. Topics that have a longer history and a consistent presence across time periods receive a higher generalization score and can be accurately identified as having coarser granularity.
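The abstract does not specify a formula for the generalization score. The Python sketch below shows one possible form, under stated assumptions: each topic's usage is given as post counts per consecutive time period, "history" is measured from the first period in which the topic appears, and "evenness" is the normalized Shannon entropy of the counts over that history. The score is the assumed product `log(1 + history_length) * evenness`. The function names `generalization_score` and `rank_candidates` and the example hashtags are hypothetical and chosen only for illustration.

```python
import math
from typing import Mapping, Sequence


def generalization_score(period_counts: Sequence[int]) -> float:
    """Hypothetical generalization score for a single topic or hashtag.

    period_counts: number of posts using the topic in each consecutive
    time period (e.g., months), oldest first.
    Assumed form: log(1 + history_length) * normalized_entropy.
    """
    # Measure history from the first period in which the topic appears.
    first = next((i for i, c in enumerate(period_counts) if c > 0), None)
    if first is None:
        return 0.0  # Topic never used: no evidence of generality.
    history = list(period_counts[first:])
    n = len(history)
    if n < 2:
        return 0.0  # A single period says nothing about evenness.

    total = sum(history)
    # Shannon entropy of the topic's usage distribution over time.
    entropy = -sum((c / total) * math.log(c / total) for c in history if c > 0)
    # Normalize to [0, 1]: 1.0 means perfectly even usage across periods.
    evenness = entropy / math.log(n)
    # Longer histories and more even usage both raise the score.
    return math.log1p(n) * evenness


def rank_candidates(topic_counts: Mapping[str, Sequence[int]]) -> list[tuple[str, float]]:
    """Rank topics by the hypothetical score, most general first."""
    scores = [(topic, generalization_score(counts)) for topic, counts in topic_counts.items()]
    return sorted(scores, key=lambda ts: ts[1], reverse=True)


if __name__ == "__main__":
    topics = {
        # Steady, long-lived usage: a likely higher-level topic.
        "#travel": [12, 9, 15, 11, 10, 14, 13, 12],
        # Short burst around one event: a likely fine-grained topic.
        "#eclipse2024": [0, 0, 0, 0, 0, 0, 900, 40],
    }
    for topic, score in rank_candidates(topics):
        print(f"{topic}: {score:.3f}")  # #travel ranks first
```

Topics at the top of this ranking would be candidates for coarser, higher-level nodes in the topical graph. The logarithm on history length is a design assumption that keeps very old topics from dominating purely by age; a linear weight or a fixed threshold on evenness would be equally plausible readings of the abstract.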

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.