Open Source Release: js-hll

Usage

  • Set Table: A list of all sets, each represented by a HyperLogLog structure. Sort values by clicking on the table headers. Select sets by clicking on the checkboxes to the left.
  • log2m: Slide the control to change the value of m and see how this affects the accuracy of the estimates. The value of m is varied by folding the HLL sets to the required value of m. A folded set has exactly the same structure as if a new HLL had been created with that log2m and the entire stream reprocessed.
  • Limit Selection to 3 Sets: Limits the selection to three sets (the maximum number of sets displayable in symmetric Venn form). A limit of 5 sets applies regardless of whether this option is checked.
  • Hide Spurious Results: Hides results where the calculated intersection value is more than 20% of the experimental error bound (1.04/√m * | A ∪ B |). Note that we are not aware of any theoretical error bounds for the intersection.
  • Expression Table: Table of all intersection combinations. Sort values by clicking on the table headers.
  • Hovers: Hover over Venn sets for additional information. Hovering over table elements highlights related information.
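
To make the folding described under log2m concrete, here is a minimal sketch. It is not js-hll's actual code: the function names and the register layout are assumptions made for illustration. It assumes a 32-bit hash, a register index taken from the low log2m bits, and a register value equal to 1 plus the number of trailing zeros in the remaining bits (or the bit width plus 1 when those bits are all zero). Under that layout, the index bits that are dropped by folding become the low bits of the remainder, so the folded value can be recomputed without the original stream.

```javascript
// Hypothetical layout (an assumption, not js-hll internals):
//   index = low `log2m` bits of a 32-bit hash
//   value = 1 + trailing zeros of the remaining `width` bits,
//           or width + 1 if the remaining bits are all zero.
function registerValue(rest, width) {
  if (rest === 0) return width + 1;
  let v = 1;
  while ((rest & 1) === 0) {
    rest >>>= 1;
    v++;
  }
  return v;
}

// Build the register array directly from a list of 32-bit hashes.
function buildRegisters(hashes, log2m) {
  const m = 1 << log2m;
  const regs = new Uint8Array(m); // 0 means "no hash seen yet"
  for (const h of hashes) {
    const idx = h & (m - 1);
    const val = registerValue(h >>> log2m, 32 - log2m);
    if (val > regs[idx]) regs[idx] = val;
  }
  return regs;
}

// Fold registers built with `fromLog2m` down to `toLog2m` (toLog2m <= fromLog2m).
function fold(regs, fromLog2m, toLog2m) {
  const drop = fromLog2m - toLog2m;
  const newM = 1 << toLog2m;
  const out = new Uint8Array(newM);
  for (let i = 0; i < regs.length; i++) {
    if (regs[i] === 0) continue; // empty register contributes nothing
    const newIdx = i & (newM - 1);
    const dropped = i >>> toLog2m; // index bits that now belong to the remainder
    // If any dropped bit is set, the first set bit lies among them;
    // otherwise the old value simply shifts up by `drop` positions.
    const val = dropped !== 0 ? registerValue(dropped, drop) : drop + regs[i];
    if (val > out[newIdx]) out[newIdx] = val;
  }
  return out;
}
```

With these definitions, fold(buildRegisters(hashes, 11), 11, 7) should produce the same array as buildRegisters(hashes, 7), which is the property the log2m slider relies on. A real implementation may cap register values or take its index from different bits, in which case the folding rule changes accordingly.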

Visualization Elements

  • Size: Cardinality estimate of the corresponding HLL set. The cardinality of an intersection is derived from union cardinalities using the inclusion-exclusion principle.
  • Error Bounds of Size: Standard error of the cardinality estimate, as described in the above linked blogs and the paper. The error of an intersection is calculated as the error of the union of the intersecting sets.
  • Venn Diagram: Sets in the Venn diagram are positioned using the algorithm described in 'Drawing Area-Proportional Venn and Euler Diagrams'. Positioning is an estimation; though the distance of each pairwise intersection is guaranteed correct, other regions may not match. The radius of each region is proportional to the cardinality estimation of the set. The annular region surrounding each circle is a radius-proportional visualization of the error bounds.
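
The Size and Error Bounds computations above can be sketched as follows. This is an illustration rather than the demo's actual code: the function names are hypothetical, and unionSize stands for whatever returns the cardinality of a union of sets (for HLLs, the estimate of the merged structure). The exact-set version at the end is only a stand-in so the arithmetic can be checked by hand.

```javascript
// Inclusion-exclusion: the intersection of n sets equals the alternating sum,
// over every non-empty subset, of the cardinality of that subset's union.
// `unionSize(subset)` is a hypothetical callback supplied by the caller.
function intersectionSize(sets, unionSize) {
  const n = sets.length;
  let total = 0;
  for (let mask = 1; mask < (1 << n); mask++) {
    const subset = sets.filter((_, i) => mask & (1 << i));
    const sign = subset.length % 2 === 1 ? 1 : -1; // odd-sized subsets add, even-sized subtract
    total += sign * unionSize(subset);
  }
  return total;
}

// Error attributed to an intersection: the standard error of the union
// of the intersecting sets, 1.04 / sqrt(m) * |A ∪ B ∪ ...|, with m = 2^log2m.
function intersectionError(sets, unionSize, log2m) {
  const m = 1 << log2m;
  return (1.04 / Math.sqrt(m)) * unionSize(sets);
}

// Exact stand-in for an HLL union estimate, using plain JavaScript Sets.
function exactUnionSize(subset) {
  const union = new Set();
  for (const s of subset) {
    for (const x of s) union.add(x);
  }
  return union.size;
}
```

For two sets this reduces to |A ∩ B| = |A| + |B| - |A ∪ B|. Because every term in the sum is itself an estimate, a small intersection can be swamped by the error of the much larger unions, which is why the error is reported relative to the union.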

Footnotes

  • Refer to the corresponding blog post for more information about js-hll.
  • A big shout out to our intern Jacob Cole for cranking out this demo.
  • We have open sourced this simulation so that you are free to go in and play with its guts to learn more about this visualization.