Sketch of the Day: Probabilistic Counting with Stochastic Averaging (PCSA)

Usage

  • Step: Clicking the step button will generate a single new random value and update the simulation;
  • Play: Clicking the play button will start the simulation running. Random values will be generated and the simulation updated with a delay of 'n' ms between each value. The play button will change to the "Stop" button;
  • Reset: Clicking the reset button will stop a running simulation and reset the values to their defaults;
  • Auto Scale: If checked then the step size (the number of values that are generated for each step) will be automatically changed to based on the number of values that have already been added to the sketch. This provides for a smoother visualization across the span of values;
  • Step Size: The number of values that will be generated and inserted into the sketch for each step;
  • Delay: The number of miliseconds between generating new values used when the simulation is played. This can be reduced to make the simulation play faster or increased to make it play slower;

Simulation Elements

  • Value: The last randomly generated value (between 1 and 16,777,215) that was inserted into the sketch. At most 65,335 values will be generated;
  • Hash Value: The Jenkins hash of the value;
  • Bit String: The binary representation of the lower 24bits of the hash value. The least-significant 1 will be highlighted;
  • Bitmap: Bit 'i' is set if a run of length 'i' was seen in the bit string. All ones are highlighted to make the 'fringe' easier to see. The actual cardinality is shown below the bitmap with an indicator pointing to the location on the 1/2index scale;
  • Actual Cardinality: The cardinality of the set of all inserted values;

Footnotes

  • Refer to the corresponding blog post for more information about Probabilistic Counting with Stochastic Averaging.
  • We have open sourced this simulation so that you are free to go in and play with its guts to learn more about these amazing algorithms. You can increase the bit string width, maximum cardinality, etc.

Have fun and happy counting!

Rob Grzywinski