We study indexing data structures for substring compression problems.Given an input text \(T\), a substring compression problem for a compression scheme \(C\) and a query interval \([i..j]\) asks to compute the compressed form of the substring \(T[i..j]\) with respect to \(C\) — that is, to apply \(C\) to \(T[i..j]\), which we write as \(C(T[i\ldots j])\). While \(T\) and \(C\) are given in advance, the interval \([i\ldots j]\) is specified at query time.
This setting allows preprocessing \(T\) to facilitate efficient answers to substring compression queries under \(C\). Space- or time-optimal solutions are straightforward but impractical:
— A space-optimal solution applies \(C(T[i\ldots j])\) at query time, without any preprocessing (selecting the optimal implementation of \(C\) with respect to space).
— A time-optimal solution precomputes \(C(T[i\ldots j])\) for all text positions \(i, j\), but requires at least quadratic space and preprocessing time.
Here, optimal time refers to linearity in the size of the compressed output. Thus, a natural question arises: Can we build an index over \(T\) that supports substring compression queries with near-optimal query time and subquadratic space?
In this talk, we explore approaches for compression schemes \(C\) related to the Lempel–Ziv 78 factorization, and outline promising directions for future research.
Localisation
Salle de séminaire 4B125 (bâtiment Copernic)
5 Boulevard Descartes 77420 Champs-sur-Marne