Abstract
The present disclosure relates to the field of Artificial Intelligence (AI), and in particular to AI methods and systems that analyze transactional data using Natural Language Processing (NLP) techniques to identify latent relationships and proximity within networks. The disclosed system uses text fields within datasets to assign input data to appropriate classes. The input is a dataset containing text data that provides contextual information about transactions. Feature embeddings are generated with a sentence-based Large Language Model (LLM), which captures the semantic meaning of sentences and converts input text of varying length into fixed-dimension embeddings. The sentence-based LLM is fine-tuned with a distance function that maximizes the embedding distance between categories and minimizes the embedding distance within categories. A clustering method is then used to reduce dimensionality, and the resulting distance features are fed into a classification model for prediction. Entities with smaller distance values to a class are identified as closely resembling that class.
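A minimal sketch of the pipeline outlined above follows. It uses Python with only the standard library. Several pieces are assumptions for illustration and do not come from the disclosure:

- A normalized token-count vector stands in for the sentence-based LLM embedding. It keeps the key property that text of any length maps to a fixed dimension.
- Classic contrastive loss is shown as one possible distance function for the fine-tuning step. No training loop is included.
- Per-category centroids stand in for the clustering step.
- A nearest-centroid rule replaces the trained classification model.

All helper names (`build_vocab`, `embed`, `category_centroids`, `distance_features`, `predict`) are hypothetical.

```python
import math
from collections import defaultdict

Vector = list[float]


def build_vocab(texts: list[str]) -> dict[str, int]:
    """Hypothetical helper: map each training token to a fixed vector position."""
    tokens = sorted({tok for t in texts for tok in t.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def embed(text: str, vocab: dict[str, int]) -> Vector:
    """Stand-in for the sentence-based LLM (assumption): text of any length
    becomes len(vocab) L2-normalised floats. Unknown tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a: Vector, b: Vector, same: bool, margin: float = 1.0) -> float:
    """Classic contrastive loss, one assumed choice of distance function:
    same-category pairs are pulled together (loss d^2); pairs from different
    categories are penalised until they are at least `margin` apart."""
    d = euclidean(a, b)
    return d ** 2 if same else max(0.0, margin - d) ** 2


def category_centroids(texts, labels, vocab) -> dict[str, Vector]:
    """Mean embedding per category, a simple stand-in for the clustering step."""
    sums: dict[str, Vector] = {}
    counts: dict[str, int] = defaultdict(int)
    for text, label in zip(texts, labels):
        e = embed(text, vocab)
        prev = sums.get(label, [0.0] * len(vocab))
        sums[label] = [s + v for s, v in zip(prev, e)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def distance_features(text, centroids, vocab) -> dict[str, float]:
    """Reduce a len(vocab)-dimensional embedding to one distance per category."""
    e = embed(text, vocab)
    return {lab: euclidean(e, c) for lab, c in centroids.items()}


def predict(text, centroids, vocab) -> str:
    """Nearest-centroid rule in place of a trained classifier: the category
    with the smallest distance is the one the entity most closely resembles."""
    feats = distance_features(text, centroids, vocab)
    return min(feats, key=feats.get)


if __name__ == "__main__":
    train = ["rent payment to landlord", "apartment rent",
             "grocery store purchase", "supermarket grocery"]
    labels = ["housing", "housing", "food", "food"]
    vocab = build_vocab(train)
    cents = category_centroids(train, labels, vocab)
    print(predict("rent for apartment", cents, vocab))    # housing
    print(predict("supermarket purchase", cents, vocab))  # food
```

Each entity ends up described by one distance per category rather than by a full embedding. This mirrors the dimensionality reduction the abstract describes. In a fuller implementation, the distance dictionary could feed any trained classifier. scikit-learn's `KMeans.transform`, which returns each sample's distances to the fitted cluster centers, could replace the hand-written centroid step.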
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Zhang, Shuhan; Tangri, Anurag; Sonar, Chinmay Narendra; Meng, Lin; Liu, Can; and Chetia, Chiranjeet, "A DISTANCE BASED EMBEDDING METHOD USING NLP METHODS TO IDENTIFY PROXIMITY TO A NETWORK", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/8990