Abstract
Quality checks on large data collections can be slow and often yield only coarse "good" or "bad" labels; if a new type of flaw is discovered, re-scanning old data can be very expensive. This system addresses the problem by converting each record, such as a satellite photo, into an embedding: a compact numerical signature of its content. It then measures how close each embedding lies to anchors, reference points in the same vector space that represent specific problems like cloud cover or blur. The result is a detailed, multi-layered quality report for every file that can be easily searched. Because the system works with these small embeddings instead of the original bulky files, it can quickly re-check millions of old records for new types of defects without the high cost of processing the original data again.
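The approach described in the abstract can be illustrated with a minimal sketch. The function names, the cosine-similarity measure, the flagging threshold, and the toy anchor labels below are illustrative assumptions, not details from the disclosure; a real deployment would use embeddings from a trained model with hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    # Proximity of two embedding vectors; 1.0 means identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def quality_report(embedding, anchors, threshold=0.8):
    # Score one record's embedding against each defect anchor.
    # 'threshold' is a hypothetical cutoff for flagging a defect.
    report = {}
    for name, anchor in anchors.items():
        score = cosine_similarity(embedding, anchor)
        report[name] = {"score": round(score, 3), "flagged": score >= threshold}
    return report

# Toy 4-dimensional embeddings (purely illustrative).
rng = np.random.default_rng(0)
anchors = {
    "cloud_cover": rng.normal(size=4),   # anchor representing cloudy imagery
    "motion_blur": rng.normal(size=4),   # anchor representing blurred imagery
}
# A record whose embedding sits near the cloud-cover anchor.
record = anchors["cloud_cover"] + 0.1 * rng.normal(size=4)

rep = quality_report(record, anchors)
print(rep)
```

Because only the stored embeddings are needed, re-checking an archive for a newly discovered defect reduces to adding one new anchor vector and re-running the proximity scoring, rather than reprocessing the original files.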
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Wohltman, Sean, "Data Quality Assessment via Semantic Proximity to Artifact Anchors in a Vector Space", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/9604