Jérémie Bourdon: Generalized Pattern Matching Statistics

This is a joint work with Brigitte Vallée.

In pattern matching algorithms, a characteristic parameter is the number of occurrences of a given pattern in a random text of length $n$ generated by a source. We consider here a generalization of the pattern matching problem in both ways. First, we deal with a generalized notion of pattern that encompasses both classical patterns as well as ``hidden patterns''. Second, we consider a quite general probabilistic model of sources that may possess a high degree of correlations. Such sources are built with dynamical systems and are called dynamical sources. We determine the mean and the variance of the number of occurrences in this generalized pattern matching problem, and establish a property of concentration of distributions. These results are obtained via combinatorics on words, formal language techniques, and methods of analytic combinatorics based on generating operators and generating functions. These generating operators come from dynamical system framework and generate themselves generating functions. The motivation to study this problem comes from an attempt at finding a reliable threshold for intrusion detections, from textual data processing applications, and from molecular biology.

Back to the Index

Please send comments and corrections to Thomas Klausner.