Abstract

Present disclosure provides a method and system for cognitive optimization of distributed engines framework. The present disclosure discloses a telemetry-driven optimization system designed to elevate the performance and efficiency of distributed data pipelines operating on Hadoop YARN. The system harnesses the analytical power of Large Language Models (LLMs) and a standardized Model Context Protocol (MCP). Further, the system identifies and remediates inefficiencies in application code, resource allocation, and cluster configurations. The system seamlessly integrates with developer workflows, enabling automated diagnostics and corrective actions while embedding feedback loops for continuous learning and improvement. Thus, present disclosure discloses the architecture and implementation of system (i.e., CODE), using Apache Spark as a representative use case, and demonstrates its impact on operational efficiency, developer productivity, and system throughput in large-scale data environments

Creative Commons License

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.

Share

COinS