The present disclosure provides a method and a system for identifying stale datasets in a data lake for deprecation. The system comprises a metadata hub, having a processor and a memory, which may act as a central repository. The metadata hub may also include a deprecator system (or deprecator) that uses an artificial intelligence (AI) model to deprecate unused datasets. The metadata hub may be communicatively connected to a large-scale data repository, also known as a data lake, and may be configured to control the flow of data ingested from any relational database into the data lake. The data lake may include any data storage technology that offers storage for all data types. The metadata hub may be configured to send meta pulse queries to the data lake; these queries pull usage statistics for the datasets, which includes identifying one or more actions performed on the datasets stored in the data lake. In response to the meta pulse queries, the data lake is configured to provide response pulses (also called metadata pulses) indicating the one or more actions performed on those datasets. Upon receiving the response pulses, the metadata hub may be configured to perform one or more response actions on the datasets based on the response pulses. The one or more response actions may include deprecation of unused datasets by the deprecator system in the metadata hub, using a decision tree model.
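By way of illustration only, the flow from response pulses to response actions may be sketched as follows. The sketch assumes a hypothetical pulse schema (`days_since_last_read`, `days_since_last_write`, `reads_last_90_days`) and hand-written thresholds standing in for a trained decision tree; none of these names or values are specified in the disclosure.

```python
from dataclasses import dataclass

# Hypothetical schema: the disclosure does not define the fields of a
# response pulse. These usage statistics are assumptions for illustration.
@dataclass(frozen=True)
class ResponsePulse:
    dataset: str
    days_since_last_read: int
    days_since_last_write: int
    reads_last_90_days: int


def decide(pulse: ResponsePulse) -> str:
    """Walk a small hand-written decision tree over the usage statistics.

    The thresholds (30 and 180 days) are illustrative assumptions; a real
    deprecator would presumably learn its splits from labeled usage data.
    """
    if pulse.reads_last_90_days > 0:
        return "keep"        # dataset is still being read
    if pulse.days_since_last_write <= 30:
        return "review"      # unread but recently written
    if pulse.days_since_last_read > 180:
        return "deprecate"   # neither read nor written for a long time
    return "review"


def response_actions(pulses):
    """Map each dataset name to the response action chosen for it."""
    return {p.dataset: decide(p) for p in pulses}


# Example pulses, as the metadata hub might receive them from the data lake.
pulses = [
    ResponsePulse("sales_daily", 2, 1, 140),
    ResponsePulse("tmp_export_2021", 400, 390, 0),
    ResponsePulse("new_feed", 95, 5, 0),
]
actions = response_actions(pulses)
```

In this sketch only `tmp_export_2021` is marked for deprecation; `new_feed` is routed to review because it was written recently even though nothing has read it yet.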
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Kaur, Harleen and Kala, Durga, "METHOD AND SYSTEM FOR METADATA MAINTENANCE IN A DATA LAKE", Technical Disclosure Commons, (October 23, 2023)