The present disclosure relates to a system and method for identifying disambiguated merchants by distilling a Large Language Model (LLM). The disclosure proposes generating synthetic merchant data using commercial LLMs and specific prompts. The method then fine-tunes an open-source model using a specific instruction, resulting in a distilled merchant model. Next, the method generates embeddings for all merchant data obtained from merchant transaction data, based on a specified instruction and/or dimensions, and stores the generated embeddings in a vector database. Finally, the method executes the prompts together with the generated embeddings using the distilled merchant model to identify disambiguated merchants.
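
The embedding and retrieval steps described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the character-trigram `embed` function is a toy stand-in for embeddings produced by a distilled model, `VectorStore` is a hypothetical in-memory substitute for a vector database, and the merchant names and transaction descriptors are invented for illustration. In the disclosed method, the retrieved candidates would presumably be passed to the distilled merchant model along with the prompt; here the nearest match is simply returned.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): sparse counts of character trigrams."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    padded = f"  {cleaned} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse trigram-count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self._items = []  # list of (merchant_name, vector) pairs

    def add(self, name, vector):
        self._items.append((name, vector))

    def nearest(self, query, k=1):
        # Brute-force search: score every stored vector, keep the top k.
        scored = [(cosine(query, vec), name) for name, vec in self._items]
        return sorted(scored, reverse=True)[:k]


# Index a few illustrative canonical merchant names.
store = VectorStore()
for canonical in ["Starbucks", "Walmart", "Amazon"]:
    store.add(canonical, embed(canonical))


def disambiguate(raw_descriptor):
    """Map a raw transaction descriptor to its closest stored merchant."""
    score, name = store.nearest(embed(raw_descriptor), k=1)[0]
    return name


print(disambiguate("STARBUCKS #1234 SEATTLE WA"))
print(disambiguate("WAL-MART SUPERCENTER 0042"))
```

A production system would replace the brute-force scan with an approximate nearest-neighbour index and would likely apply a similarity threshold before accepting a match.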

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.