Probabilistic counting algorithms estimate approximately the number of different words in a very large collection of data in a single pass using a very small memory. We consider a modification of the Probabilistic Counting algorithm of Flajolet and Martin [FM85]. This is based on the observation of occurrences of initial sequences of 's in the binary representation of the words. The precision obtained is around for a memory requirement of 10kbits. The analysis involves maxima of geometric random variables and Mellin transform techniques.
Back to the Index