Abstract
Feedback from the stakeholders of a software product is crucial for product development. It can serve a variety of purposes, such as continuous improvement, rapid prototyping, experimentation, resource allocation, and improving user experience. However, for products with many features and a large volume of feedback (which may be freeform text), analyzing the feedback to generate useful insights is difficult. This disclosure describes the use of a vector database built by applying unsupervised learning to cluster a set of seed feedback data; the clustering is tested by partitioning the seed data into training and test datasets. The vector database stores cluster IDs and embeddings and can classify, and generate embeddings for, new feedback data as it is received. Contents of the vector database can be provided as input to a large language model (LLM) to obtain insights about the product.
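The pipeline summarized above can be illustrated with a minimal, standard-library-only sketch. It is not the disclosure's implementation, and everything in it is an assumption made for illustration: the bag-of-words `embed` function stands in for a real embedding model, plain k-means stands in for whichever unsupervised method is actually used, the `store` list stands in for a vector database, and the feedback strings, function names, and prompt wording are hypothetical.

```python
import math
from collections import Counter


def embed(text, vocab):
    """Toy stand-in for an embedding model: L2-normalized word counts."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centroids):
    """Index of the closest centroid, used here as the cluster ID."""
    return min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))


def kmeans(vectors, k, iters=10):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    for _ in range(iters):
        labels = [nearest(v, centroids) for v in vectors]
        for i in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(v, centroids) for v in vectors]


# Hypothetical seed feedback, partitioned into training and test sets.
train = [
    "app crashes on login",
    "login screen crashes app",
    "crashes after login attempt",
    "price is too high",
    "subscription price too high",
    "too high price for plan",
]
test = ["app crashes at login", "price too high"]

# Cluster the training partition without labels.
vocab = sorted({w for t in train for w in t.lower().split()})
train_vecs = [embed(t, vocab) for t in train]
centroids, labels = kmeans(train_vecs, k=2)

# In-memory stand-in for the vector database: one record per feedback item.
store = [
    {"text": t, "cluster_id": c, "embedding": v}
    for t, c, v in zip(train, labels, train_vecs)
]


def add_feedback(text):
    """Embed new feedback, classify it by nearest centroid, and store it."""
    vec = embed(text, vocab)
    record = {"text": text, "cluster_id": nearest(vec, centroids), "embedding": vec}
    store.append(record)
    return record


# Held-out items are classified the same way as newly received feedback.
test_ids = [add_feedback(t)["cluster_id"] for t in test]


def build_prompt(records):
    """Group stored feedback by cluster ID into text that could be sent to an LLM."""
    groups = {}
    for r in records:
        groups.setdefault(r["cluster_id"], []).append(r["text"])
    lines = ["Summarize the main issue in each feedback cluster:"]
    for cid in sorted(groups):
        lines.append(f"Cluster {cid}: " + "; ".join(groups[cid]))
    return "\n".join(lines)


prompt = build_prompt(store)
```

In this sketch the held-out items play the role of the test partition: checking that each lands in the same cluster as similar training items is one simple way to sanity-check the clustering. The `prompt` string is only assembled, not sent; which LLM receives it and how is left open.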
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Han, Zhaoying and Siruguppa, Sathya Anurag, "Automated Text Clustering and Classification of Feedback Data Using LLMs", Technical Disclosure Commons, (July 16, 2024)
https://www.tdcommons.org/dpubs_series/7199