Louis Abraham's Home Page

This is the solution to this exercise

We consider instead the inverse problem: what is the length of the largest word such that the maximum number of comparisons between two suffices is less than a constant.

Hint

The condition “you can discriminate two suffices in at most $j$ comparisons” is equivalent to “all the subwords of length $j$ are different”.

Suppose the cardinal of the alphabet is $k$.

The result is optimal: if you allow $j$ comparisons, then you can build a string of length $k^j + j-1$ such that the maximum number of comparisons between any two suffixes is at most $j$.

Solution

Upper bound

First of all, let’s prove that in any word of length more than $k^j+j-1$, you can find two identical subwords of length $j$.

Suppose the length of the word is $n ≥ k^j + j$. Then there are $n - j + 1$ suffixes of length at least $j$ (just forget the last $j - 1$ offsets). $n - j + 1 ≥ k^j + 1 > k^j$ so at least one of the $k^j$ words of length $j$ is repeated by the pigeonhole principle.

Lower bound

Conversely, let’s prove that you can find a word of length $k^j+j-1$ with all the different subwords of length $j$.

We are going to use a classical technique: let’s consider the oriented graph of words of length $j-1$. Add all the edges $(a, b)$ such that the last $j-2$ letters of $a$ are the $j-2$ first letters of $b$.

All the vertices have $k$ outgoing edges and $k$ incoming edges, so their degree is $0$. Then there is an Eulerian path.

Note that there is a bijection between the edges and the words of length $j$. In an Eulerian path, two edges are adjacent if they have a node in common, that is the $j-1$ last letters of the first are the same as the first $j-1$ letters of the second.

So an Eulerian path gives a way to concatenate all words of length $j$ by just adding one letter at once, thus constructing a word of total length $k^j+j-1$ and contains all subwords of length $j$.

Around this question

An example is the fact that you can test all 4-digit entry codes by pressing only 10003 keys.

While I investigated this question, I saw this entry about the case $n = 2$: https://oeis.org/A284248. This entry was submitted by Jeffrey Shallit, the author of the amazing book Automatic Sequences, which I warmly recommend if you are interested in such problems.