Defensive Publications Series

DEBUGGING LARGE-SCALE DATA PIPELINES WITH CONSISTENT HASHING

Abstract

A mechanism is provided for debugging large-scale data pipelines by sampling the pipeline's inputs and outputs with consistent hashing. Given the same input set and the same machine learning model, the mechanism can ensure the same output set across different pipelines. The mechanism computes a consistent hash of each input and uses it to select a consistent sample (i.e., the same subset) of events in both the input and the output, over which alignment is computed. By tracking the alignment of the input and output sets throughout the pipeline, the mechanism identifies bugs and determines exactly where misalignment is introduced.
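The abstract does not specify a hash function, a sampling rule, or an alignment metric, so the following Python sketch is only one possible reading under stated assumptions. "Consistent hashing" is interpreted here as deterministic hash-based sampling: each event is kept or dropped based solely on a SHA-256 hash of its key. Each stage is assumed to emit exactly one output per input key, so any change in the sampled key set counts as misalignment. Alignment is measured, hypothetically, as the Jaccard similarity of the sampled key sets. The names `key_hash`, `sample`, `alignment`, and `first_misaligned_stage`, and the example stages, are illustrative and not taken from the source.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical event representation: event key -> payload value.
Events = Dict[str, float]
# Hypothetical stage representation: (stage name, transform function).
Stage = Tuple[str, Callable[[Events], Events]]


def key_hash(key: str) -> int:
    """Deterministic 64-bit hash of an event key.

    Python's built-in hash() is salted per process for strings, so a
    SHA-256 digest is used to get the same value on every machine and run.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")  # in [0, 2**64)


def sample(events: Events, rate: float) -> Events:
    """Keep an event iff its key hashes below rate * 2**64.

    The decision depends only on the key, so every pipeline, stage, and
    machine selects the same subset of keys for a given rate.
    """
    threshold = int(rate * 2**64)
    return {k: v for k, v in events.items() if key_hash(k) < threshold}


def alignment(inputs: Events, outputs: Events) -> float:
    """Hypothetical alignment metric: Jaccard similarity of the key sets."""
    a, b = set(inputs), set(outputs)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def first_misaligned_stage(
    events: Events, stages: List[Stage], rate: float
) -> Optional[str]:
    """Run the stages in order and return the name of the first stage whose
    sampled output keys differ from its sampled input keys, or None."""
    current = events
    for name, transform in stages:
        output = transform(current)
        if alignment(sample(current, rate), sample(output, rate)) < 1.0:
            return name
        current = output
    return None


if __name__ == "__main__":
    events = {f"event-{i}": float(i) for i in range(1000)}
    stages = [
        # Keeps every key, so its sampled input and output stay aligned.
        ("score", lambda ev: {k: 0.5 * v for k, v in ev.items()}),
        # Silently drops odd-numbered events: the injected bug.
        ("join", lambda ev: {k: v for k, v in ev.items()
                             if int(k.split("-")[1]) % 2 == 0}),
    ]
    print(first_misaligned_stage(events, stages, rate=0.2))
```

Because the sampling decision depends only on the key, a stage that loses nothing yields identical sampled input and output key sets, so a divergence points at the stage that introduced it while only about a fraction `rate` of events has to be compared. The same `sample` call could be applied to the outputs of two different pipelines run on the same input set and model to check that they agree.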

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Skvortsov, Evgeny and Nguyen, Long, "DEBUGGING LARGE-SCALE DATA PIPELINES WITH CONSISTENT HASHING", Technical Disclosure Commons, (March 22, 2018)
https://www.tdcommons.org/dpubs_series/1107

Technical Disclosure Commons