Sketch of the Day: K-Minimum Values

Usage

  • Step: Clicking the step button will generate a new random value and insert it into the DV sketch;
  • Play: Clicking the play button will start the simulation running. Random values will be generated and inserted into the sketch with a delay of 'n' ms between each value. The play button will change to the "Stop" button;
  • Reset: Clicking the reset button will stop a running simulation and reset the values to their defaults;
  • Step Size: The number of values that will be generated and inserted into the sketch for each step;
  • Delay: The number of miliseconds between generating new values used when the simulation is played. This can be reduced to make the simulation play faster or increased to make it play slower;
  • k: The k-minimum values used for estimating the cardinality. This must be greater than one;

Simulation Elements

  • Value: The last randomly generated value (between 1 and 65,536) that was inserted into the sketch. At most 65,335 values will be generated;
  • Hash Value: The Jenkins hash of the value;
  • Number Line: All hashed values that have been inserted into the sketch. Values less than or equal to k are red and those greater than k are green. A vertical line is placed at the kth value and its normalized value shown;
  • Line Graph: Plots the actual cardinality on the x-axis and the % Error on the y-axis. The blue area is the theoretical KMV error bounds;

Footnotes

  • If there are less than k values in the sketch then the estimated cardinality is the number of values in the sketch (which is commonly the actual cardinality resulting in an error of 0%). Let the simulation play until the cardinality is greater than k.
  • A k of at least 64 will result in an "acceptable" error. If you use a larger k the consider using using a larger step size.
  • Refer to the corresponding blog post for more information about KMV.
  • Some liberties have been taken with the number line so that the simulation remains performant when there are a large number of values in the sketch. Visual artifacts may be seen.
  • We have open sourced this simulation so that you are free to go in and play with its guts to learn more about this amazing algorithm.

Have fun and happy counting!

Rob Grzywinski