Abstract
CaRRIE 2.0 extracts key insights and summaries from unstructured data such as papers, news
documents, notes, blogs, scraped websites, and RFP (Request for Proposal) data, so that the
user can understand the material faster and make informed decisions. The engine groups and ranks
collections of documents based on keywords of interest that appear within the documents. The
user can then focus on a handful of documents aligned with their interests, reducing the time
needed to read each document and draw insights. They can also understand the larger data
trends without reading the entire stack of documents one by one. We accomplish these
tasks by stacking and modifying natural language processing (NLP) algorithms, as well as
creating a couple of new NLP algorithms. At a high level, the approach is very similar to
how a search engine algorithm works. Our solution is divided into three main parts: (i) topic
modeling, (ii) scoring and ranking, and (iii) insight extraction.
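The keyword-based ranking idea described above can be illustrated with a minimal sketch. This is not the CaRRIE 2.0 algorithm, which the abstract does not specify. It assumes a simple scoring rule (the share of a document's tokens that match the keywords of interest), and the names `tokenize`, `keyword_score`, and `rank_documents` are hypothetical, introduced only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(text, keywords):
    """Assumed scoring rule: fraction of tokens that match any keyword."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    hits = sum(counts[k.lower()] for k in keywords)
    return hits / total


def rank_documents(docs, keywords):
    """Return (name, score) pairs sorted from most to least relevant."""
    scored = [(name, keyword_score(text, keywords)) for name, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy corpus; the document names and contents are made up for the example.
docs = {
    "a": "printer ink printer",
    "b": "weather report today",
    "c": "ink supply chain news",
}
ranking = rank_documents(docs, ["printer", "ink"])
print(ranking)
```

With a ranking like this, a reader could start from the top-scoring documents and skip those that score zero. A real system would likely replace the raw frequency score with topic modeling and a more robust scoring scheme, as the three-part structure above suggests.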
Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 4.0 License.
Recommended Citation
INC, HP, "CARRIE 2.0: INSIGHT EXTRACTION FROM UNSTRUCTURED DATA USING PROXIMITY CLUSTERING ENGINE", Technical Disclosure Commons, (February 04, 2021)
https://www.tdcommons.org/dpubs_series/4055