Computing Runs in Strings Over General Ordered Alphabets

Orateur : Jonas Ellert
05 Octobre 2021 à 14:00

A run in a string is a maximal periodic substring. For example, the string « bananatree » contains exactly the runs « anana » and « ee ». There are less than n runs in any length-n string, and computing all runs in a string over a linearly-sortable alphabet takes O(n) time (Bannai et al., SIAM J. Comput. 2017).

We consider the problem of computing all runs in a string over a general ordered alphabet (i.e. the symbols are totally ordered, and testing the order of any two symbols takes constant time — but linear-time sorting of the symbols is not possible). We show that by exploiting powerful properties of the Lyndon array, we can achieve optimal linear time and space. This improves the previously best known result by Crochemore et al., who solved the problem in O((n)) time (where α is the inverse Ackermann function; SPIRE 2016). Our result positively answers the at least 29-year-old question whether square-freeness can be tested in linear time over general ordered alphabets (Breslauer, PhD thesis, Columbia University 1992).

The presentation will cover a brief introduction to the different alphabet types, as well as selected techniques used for the linear time runs computation. Some of them (like the efficient computation of the Lyndon array) are of independent interest.

Paper: https://doi.org/10.4230/LIPIcs.ICALP.2021.63
Video presentation: https://www.youtube.com/watch?v=bHbXVlb_EJ4

Joint work with Johannes Fischer.