In some applications, such as online advertising where user data may be used, the cardinality of data values must be estimated. This disclosure presents a lightweight mechanism for determining data cardinality. For each key, a Bloom filter is updated by setting the bits identified by hashing the key. To determine whether a set contains multiple keys, the number of set bits in the Bloom filter is counted and compared with a threshold value. If the threshold is met, the data set is determined to have at least the corresponding cardinality, e.g., at least two keys, at least three keys, etc.
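
The mechanism described above can be sketched roughly as follows. This is only an illustrative sketch, not the implementation from the disclosure: the class name `BloomCardinality`, the filter size, the number of hash functions, the use of salted SHA-256 as the hash, and the threshold rule are all assumptions. The threshold rule assumed here is that a single key sets at most `num_hashes` bits, so more than `(n - 1) * num_hashes` set bits implies at least `n` distinct keys.

```python
import hashlib


class BloomCardinality:
    """Hypothetical Bloom filter used only to test a cardinality lower bound."""

    def __init__(self, num_bits=256, num_hashes=4):
        # Filter size and hash count are illustrative assumptions.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int serves as the bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        # The disclosure does not name a hash function; this is an assumption.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        # Set every bit identified by hashing the key.
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def set_bit_count(self):
        # Count the bits currently set in the filter.
        return bin(self.bits).count("1")

    def has_at_least(self, n):
        # Assumed threshold: n - 1 keys can set at most (n - 1) * num_hashes
        # bits, so one more set bit than that implies at least n distinct keys.
        threshold = (n - 1) * self.num_hashes + 1
        return self.set_bit_count() >= threshold


if __name__ == "__main__":
    bloom = BloomCardinality()
    for key in ["user-a", "user-a", "user-b"]:
        bloom.add(key)
    print("set bits:", bloom.set_bit_count())
    print("at least two keys:", bloom.has_at_least(2))
```

Under this assumed threshold, repeated insertions of the same key never trigger the "at least two keys" result, while distinct keys whose bit positions happen to collide can go undetected, so the check may under-report cardinality but does not over-report it.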

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.