Inventor(s)

Abstract

Techniques are disclosed for learned selection of a column encoder using column statistics. A feature vector is computed in a single pass over a column and includes base statistics (e.g., sparsity, cardinality, dominant frequency, bit-width percentiles, run-length and delta statistics, entropy, sortedness, and data type) and interaction features (e.g., sparsity clusteredness, bit-width gap, delta-entropy ratio, dominant run length, outlier fraction, and delta uniformity). A benchmark harness evaluates candidate encoders on column samples and labels each sample with the encoder selected by a composite score that combines decode, compression, and encode performance using configurable weights. A compact neural network maps the feature vector to a distribution over encoders, allowing sub-microsecond inference at encode time. Drift is detected using a normalized L2 distance between the recent and training feature means, which enables selective benchmarking and fine-tuning.
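
The labeling and drift-detection steps described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the function names, the metric keys ("decode", "compress", "encode"), the assumption that each metric is pre-normalized to [0, 1] with higher meaning better, and the choice to normalize the L2 distance by per-feature training standard deviation and feature count are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from typing import Dict, List, Sequence


def composite_label(metrics: Dict[str, Dict[str, float]],
                    weights: Dict[str, float]) -> str:
    """Return the encoder name with the highest weighted composite score.

    Assumption: each metric is already scaled to [0, 1], higher is better.
    """
    def score(m: Dict[str, float]) -> float:
        return sum(weights[k] * m[k] for k in weights)

    return max(metrics, key=lambda name: score(metrics[name]))


def feature_means(rows: Sequence[Sequence[float]]) -> List[float]:
    """Per-feature mean over a batch of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def normalized_l2_drift(recent_mean: Sequence[float],
                        train_mean: Sequence[float],
                        train_std: Sequence[float],
                        eps: float = 1e-9) -> float:
    """L2 distance between feature means.

    Assumption: each difference is scaled by the training standard
    deviation, and the squared sum is divided by the feature count so
    the value is comparable across feature-vector sizes.
    """
    d = len(train_mean)
    total = sum(((r - t) / (s + eps)) ** 2
                for r, t, s in zip(recent_mean, train_mean, train_std))
    return math.sqrt(total / d)


# Hypothetical benchmark results for two candidate encoders.
metrics = {
    "rle":     {"decode": 0.9, "compress": 0.5, "encode": 0.8},
    "bitpack": {"decode": 0.7, "compress": 0.9, "encode": 0.6},
}
decode_heavy = {"decode": 0.5, "compress": 0.3, "encode": 0.2}
label = composite_label(metrics, decode_heavy)

# Drift check against a hypothetical threshold.
recent = feature_means([[1.0, 2.0], [1.0, 2.0]])
drift = normalized_l2_drift(recent, [0.0, 0.0], [1.0, 2.0])
needs_refresh = drift > 0.5  # threshold chosen for illustration only
```

Changing the weights changes the label: with the decode-heavy weights above the sketch picks "rle", while weights that favor compression would pick "bitpack". A drift value above the chosen threshold would be the trigger for benchmarking and fine-tuning on the affected columns only.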

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
