The present disclosure relates to a method and a system (102) for smart data selection for continuous training of a machine learning model (104). The system (102) is configured to build a mini-batch representation by calculating a gradient vector for each data point in a mini-batch and to assess mini-batch-level similarity. The system (102) may be configured to measure how well a given data instance describes a current task at each training step of the machine learning model (104) and, using a data diversity module (206), to minimize redundancy among one or more data samples of the current task so that the selected coreset covers a diverse range of categories. The system (102) may be further configured to combine mini-batch similarity and sample diversity to identify the most relevant instances for training the machine learning model (104) on the current task.
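
The selection idea described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the function names `cosine` and `select_coreset`, the `diversity_weight` parameter, the use of cosine similarity, and the additive combination of the two scores are all assumptions made for illustration. The per-sample gradient vectors are assumed to have been computed already and are passed in as plain lists of floats.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def select_coreset(grads, k, diversity_weight=1.0):
    """Pick k sample indices from one mini-batch (hypothetical scoring rule).

    grads: list of per-sample gradient vectors (lists of floats).
    A sample scores high when its gradient aligns with the mini-batch mean
    gradient (similarity) and low when it resembles the other samples
    (redundancy), so the selection favors representative yet diverse samples.
    """
    n = len(grads)
    dim = len(grads[0])
    # Mini-batch representation: the mean of the per-sample gradients.
    mean_grad = [sum(g[d] for g in grads) / n for d in range(dim)]

    scores = []
    for i, g in enumerate(grads):
        similarity = cosine(g, mean_grad)
        # Redundancy: average similarity to every other sample in the batch.
        others = [cosine(g, grads[j]) for j in range(n) if j != i]
        redundancy = sum(others) / len(others) if others else 0.0
        scores.append(similarity - diversity_weight * redundancy)

    # Highest combined score first; ties keep their original order.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

Calling `select_coreset(grads, k)` returns the sorted indices of the chosen samples. Setting `diversity_weight` to 0 reduces the rule to similarity alone, which makes it easy to compare the two criteria on the same mini-batch.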

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.