Abstract

The present disclosure relates to a system and method for dynamically managing memory allocation in a distributed data processing platform. The system includes a plurality of workers that execute tasks with varying memory requirements, each worker equipped with a memory sentinel that monitors runtime parameters. A risk engine computes a risk score from telemetry data received from the memory sentinels to anticipate out-of-memory conditions and adjust resource profiles accordingly. When a high risk is detected, the system requests a larger resource profile from a resource manager, provisions new, larger workers, and reallocates memory-intensive tasks to them while preserving task state through a state transfer plane. This structure enables proactive management of memory resources and helps ensure efficient task execution in data processing environments subject to input data skew.
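The flow from sentinel telemetry to a reallocation decision can be illustrated with a minimal sketch. The abstract does not specify how the risk score is computed, so everything here is an assumption made for illustration: the `Telemetry` fields, the linear projection of memory growth, the 30-second horizon, the 0.9 threshold, and the names `risk_score` and `decide` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Telemetry:
    """Hypothetical snapshot reported by one worker's memory sentinel."""
    worker_id: str
    used_bytes: int            # memory currently in use
    limit_bytes: int           # memory limit of the worker's resource profile
    growth_bytes_per_s: float  # recent rate of memory growth


def risk_score(t: Telemetry, horizon_s: float = 30.0) -> float:
    """Assumed scoring rule: projected usage over the limit, clamped to [0, 1].

    Usage is projected linearly over `horizon_s` seconds; shrinking usage
    is treated as zero growth.
    """
    projected = t.used_bytes + max(t.growth_bytes_per_s, 0.0) * horizon_s
    return min(max(projected / t.limit_bytes, 0.0), 1.0)


def decide(t: Telemetry, threshold: float = 0.9) -> str:
    """Return "migrate" when the score reaches the assumed threshold.

    "migrate" stands in for the sequence in the abstract: request a larger
    resource profile, provision a larger worker, and move the task with its
    state. Otherwise the task stays where it is.
    """
    return "migrate" if risk_score(t) >= threshold else "stay"


if __name__ == "__main__":
    skewed = Telemetry("w1", used_bytes=600, limit_bytes=1000, growth_bytes_per_s=20.0)
    steady = Telemetry("w2", used_bytes=200, limit_bytes=1000, growth_bytes_per_s=5.0)
    for t in (skewed, steady):
        print(t.worker_id, risk_score(t), decide(t))
```

In this sketch a worker processing a skewed partition is flagged before it reaches its limit, because the score reflects projected rather than current usage. A real risk engine could weigh other runtime parameters reported by the sentinels.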

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
