Defensive Publications Series

SYSTEM AND METHOD FOR CHECKPOINTING AND STATE SYNCHRONIZATION FOR FAULT TOLERANCE IN LONG-RUNNING MAP REDUCE JOBS

ALOK ROY, VISA

Abstract

The present disclosure relates to a method and system for checkpointing and state synchronization to provide fault tolerance in long-running MapReduce jobs. The method integrates a checkpointing mechanism into both the Map and Reduce phases of a MapReduce job, capturing and storing critical data and metadata at key intervals. The method also replicates these checkpoints across all clusters in an active-active setup, so that any cluster can access the most recent checkpoint and resume the MapReduce job after a failure or re-routing. Finally, the method ensures that the checkpoints are synchronized across clusters before the job proceeds, providing a consistent and reliable recovery point. The present disclosure improves fault tolerance and enables more efficient processing of MapReduce jobs in distributed environments.
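The three steps in the abstract (checkpoint at intervals, replicate to every cluster, confirm synchronization before proceeding) can be illustrated with a small in-memory sketch. This is not the disclosed implementation: the names `Checkpoint`, `Cluster`, `replicate`, `synchronized`, and `run_word_count` are hypothetical, a word count stands in for the Map phase, and plain dictionaries stand in for replicated checkpoint storage.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Checkpoint:
    job_id: str
    phase: str             # "map" or "reduce"
    next_index: int        # first input record not yet processed
    state: Dict[str, int]  # partial result captured at this point


@dataclass
class Cluster:
    # Hypothetical stand-in for one cluster's checkpoint store.
    name: str
    store: Dict[str, Checkpoint] = field(default_factory=dict)

    def latest(self, job_id: str) -> Optional[Checkpoint]:
        return self.store.get(job_id)


def replicate(cp: Checkpoint, clusters: List[Cluster]) -> None:
    """Copy the checkpoint to every cluster (active-active replication)."""
    for c in clusters:
        c.store[cp.job_id] = Checkpoint(
            cp.job_id, cp.phase, cp.next_index, dict(cp.state)
        )


def synchronized(cp: Checkpoint, clusters: List[Cluster]) -> bool:
    """Barrier check: every cluster holds an identical copy of cp."""
    return all(c.latest(cp.job_id) == cp for c in clusters)


def run_word_count(job_id, records, runner, clusters, interval=2, fail_at=None):
    """Count words on `runner`, checkpointing every `interval` records.

    Resumes from the latest checkpoint stored on `runner`. Raises
    RuntimeError at record index `fail_at` to simulate a cluster failure.
    """
    cp = runner.latest(job_id)
    start = cp.next_index if cp else 0
    counts = dict(cp.state) if cp else {}
    for i in range(start, len(records)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated failure at record {i}")
        for word in records[i].split():
            counts[word] = counts.get(word, 0) + 1
        if (i + 1) % interval == 0:
            new_cp = Checkpoint(job_id, "map", i + 1, dict(counts))
            replicate(new_cp, clusters)
            # The job proceeds only once all clusters agree on the checkpoint.
            if not synchronized(new_cp, clusters):
                raise RuntimeError("checkpoint not synchronized")
    return counts


if __name__ == "__main__":
    a, b = Cluster("cluster-a"), Cluster("cluster-b")
    records = ["a b", "b c", "c a", "a a"]
    try:
        run_word_count("job-1", records, a, [a, b], fail_at=3)
    except RuntimeError:
        pass  # cluster-a "fails"; the job is re-routed to cluster-b
    print(run_word_count("job-1", records, b, [a, b]))
```

In this sketch the second run starts from record index 2, the position saved in the last replicated checkpoint, rather than from the beginning. Work done after the most recent checkpoint is repeated, which is the usual trade-off governed by the checkpoint interval.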

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

ROY, ALOK, "SYSTEM AND METHOD FOR CHECKPOINTING AND STATE SYNCHRONIZATION FOR FAULT TOLERANCE IN LONG-RUNNING MAP REDUCE JOBS", Technical Disclosure Commons, (December 03, 2024)
https://www.tdcommons.org/dpubs_series/7610
