HP INC


Telemetry captures rich information, in tabular format, about what is happening in the customer environment. To exploit this data effectively, we need a representation that is amenable to machine learning algorithms. Processing tabular data is challenging because it mixes several feature types, such as categorical, ordinal, and numerical. Traditional representation techniques such as label encoding and one-hot encoding fail to capture the inherent order, if any, among the values of a categorical feature: one-hot encoding places every category at the same distance from every other, and label encoding assigns integers that need not follow the true order. This order matters for downstream tasks such as classification and clustering. We propose a methodology that uses large language model embeddings, informed by domain knowledge, to obtain a continuous-space representation of the tabular data. We show how such a representation can be used in clustering to generate insights.
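A minimal sketch of the kind of pipeline the abstract describes, under stated assumptions: each telemetry row is serialized into a sentence enriched with domain notes, the sentences are embedded, and the embeddings are clustered. The abstract does not specify the serialization template, the embedding model, or the clustering algorithm, so everything here is illustrative. The column names, `ROWS`, `DOMAIN_NOTES`, `serialize_row`, `toy_embed`, and `cluster_rows` are hypothetical. `toy_embed` is a hashed bag-of-words placeholder, not a large language model, and exists only so the sketch runs offline. k-means is used only as one common clustering choice.

```python
import hashlib

import numpy as np

# Hypothetical telemetry rows; column names and values are illustrative only.
ROWS = [
    {"os_version": "10", "battery_health": "good", "crash_count": 0},
    {"os_version": "11", "battery_health": "good", "crash_count": 1},
    {"os_version": "10", "battery_health": "poor", "crash_count": 7},
    {"os_version": "11", "battery_health": "poor", "crash_count": 9},
]

# Hypothetical domain knowledge attached to columns, so the text carries
# information (such as category order) that label/one-hot encoding loses.
DOMAIN_NOTES = {
    "battery_health": "ordered good > fair > poor; poor batteries shorten runtime",
    "crash_count": "application crashes in the last 30 days; higher is worse",
}


def serialize_row(row: dict, notes: dict) -> str:
    """Turn one tabular row into a sentence, appending any domain note per column."""
    parts = []
    for column, value in row.items():
        phrase = f"{column.replace('_', ' ')} is {value}"
        if column in notes:
            phrase += f" ({notes[column]})"
        parts.append(phrase)
    return "; ".join(parts) + "."


def toy_embed(texts: list, dim: int = 64) -> np.ndarray:
    """PLACEHOLDER for an LLM embedding model: hashed bag of words, L2-normalized.

    It does not understand meaning or order; swap in a real embedding model.
    """
    vectors = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for token in text.lower().split():
            bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % dim
            vectors[i, bucket] += 1.0
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.where(norms == 0, 1.0, norms)


def kmeans(x: np.ndarray, k: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns one cluster label per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every center, shape (n, k).
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep an empty cluster's previous center instead of producing NaN.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_rows(rows, notes, k, embed=toy_embed):
    """Serialize rows with domain notes, embed them, and cluster the embeddings."""
    texts = [serialize_row(r, notes) for r in rows]
    return texts, kmeans(embed(texts), k)


if __name__ == "__main__":
    texts, labels = cluster_rows(ROWS, DOMAIN_NOTES, k=2)
    for text, label in zip(texts, labels):
        print(label, text)
```

Because the embedding function is passed in as a parameter, replacing `toy_embed` with a call to a real LLM embedding model would leave the serialization and clustering steps unchanged; the domain notes are where knowledge such as category order enters the representation.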

Creative Commons License

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 License.