mreps software homepage

What is mreps

mreps is flexible and efficient software for identifying serial repeats (usually called tandem repeats) in DNA sequences. It was developed between 2000 and 2005 at LORIA in the former Adage group and is currently maintained by Gregory Kucherov.

See the mreps mini-tutorial for a fuller explanation of what mreps computes.

The following paper describes mreps together with some examples of its application. Please cite this paper when referring to mreps.

[1] R. Kolpakov, G. Bana, and G. Kucherov, mreps: efficient and flexible detection of tandem repeats in DNA, Nucleic Acids Research, 31 (13), July 1 2003, pp 3672-3678.

Combinatorial algorithms implemented in mreps have been presented in the following publications.

[2] R. Kolpakov, G. Kucherov, Finding maximal repetitions in a word in linear time, 1999 Symposium on Foundations of Computer Science (FOCS), New York (USA), pp. 596-604, IEEE Computer Society

[3] R. Kolpakov, G. Kucherov, Finding approximate repetitions under Hamming distance, Theoretical Computer Science, 2003, vol 303 (1), pp 135-156. An extended abstract appeared in the 9th European Symposium on Algorithms (ESA 2001), Aarhus, Denmark, 2001

Download, install and use it!

Go to the Help page; all instructions are there.

Run it via a Web interface

You can run mreps via a Web interface. Try it!

Some features of mreps

Mixed combinatorial/heuristic approach
mreps is based on a mixed combinatorial/heuristic paradigm. Its core consists of exhaustive combinatorial algorithms (described in [2,3]) that find all repeats satisfying certain mathematical properties, which guarantees that the search is exhaustive. These repeats are then post-processed heuristically to obtain a more biologically relevant representation. A description of mreps can be found in [1].
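
To make the idea of an exhaustive search for repeats concrete, here is a minimal brute-force sketch in Python. It is not the linear-time algorithm of [2] and not the mreps implementation; the function name maximal_repetitions and the min_copies argument are hypothetical, chosen only for illustration. It reports every maximal stretch in which the sequence repeats with some period p at least min_copies times.

```python
def maximal_repetitions(s, min_copies=2):
    """Naive search for exact tandem repeats (illustrative only).

    Returns a sorted list of (start, end, period) with `end` exclusive,
    such that s[k] == s[k + period] throughout s[start:end] and the
    stretch cannot be extended to the left or right.
    """
    n = len(s)
    seen = set()      # (start, end) pairs already reported with a smaller period
    found = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            start = k
            # Extend while the character p positions ahead still matches.
            while k + p < n and s[k] == s[k + p]:
                k += 1
            end = k + p
            if end - start >= min_copies * p and (start, end) not in seen:
                seen.add((start, end))
                found.append((start, end, p))
    return sorted(found)


if __name__ == "__main__":
    for start, end, period in maximal_repetitions("ACGTATATATGG"):
        print(start, end, period, "ACGTATATATGG"[start:end])
```

On the toy sequence ACGTATATATGG this finds TATATAT (period 2) and GG (period 1). The brute-force scan costs roughly quadratic time in the sequence length, which is why a practical tool relies on the combinatorial algorithms cited above instead.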

Identifying "fuzzy" repeats
mreps has a resolution parameter that makes it possible to compute "fuzzy" repeats. Metaphorically, this parameter acts as a magnifying glass that lets you "zoom out" of the genomic sequence in order to compute looser repeats.
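
The effect of tolerating mismatches can be sketched as follows. This is only an illustration of approximate periodicity under Hamming distance (the setting of [3]); the function extend_with_mismatches and its max_mismatches argument are hypothetical stand-ins and should not be read as the definition of the mreps resolution parameter.

```python
def extend_with_mismatches(s, start, p, max_mismatches):
    """Extend a period-p repeat rightwards from `start` (illustrative only).

    Compares s[k] with s[k + p] and stops just before the mismatch that
    would exceed `max_mismatches`. Returns (start, end, mismatches_used)
    with `end` exclusive.
    """
    mismatches = 0
    k = start
    while k + p < len(s):
        if s[k] != s[k + p]:
            if mismatches == max_mismatches:
                break
            mismatches += 1
        k += 1
    return start, k + p, mismatches


if __name__ == "__main__":
    # ACG repeated five times, with one substitution (C -> T) in the third copy.
    seq = "ACGACGATGACGACG"
    for tolerance in (0, 1, 2):
        print(tolerance, extend_with_mismatches(seq, 0, 3, tolerance))
```

With no tolerance the repeat stops at the substituted copy; with a tolerance of 2 it covers the whole sequence. Note that a single substitution causes two mismatching comparisons here, one against the copy before it and one against the copy after it.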

Efficiency
mreps places no limit on the pattern size (the size of the repeated unit) of computed repeats: repeats of all possible pattern sizes can be computed in a single program run. Moreover, depending on the resolution parameter, this run is very fast: for low resolution values, processing sequences of tens of millions of bases takes only several seconds on a regular PC.

Limitations
The mreps algorithm handles substitutions only, not indels (insertions/deletions of nucleotides). As a result, indels are treated in an indirect way, and certain repeats containing indels may be missed.
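
A small Python experiment shows why indels are harder than substitutions for a method that compares positions one period apart. The helper period_mismatch_positions is hypothetical and is not part of mreps; it simply lists the positions k where s[k] differs from s[k + p].

```python
def period_mismatch_positions(s, p):
    """Positions k where s[k] != s[k + p] (illustrative helper)."""
    return [k for k in range(len(s) - p) if s[k] != s[k + p]]


if __name__ == "__main__":
    intact = "ACGT" * 6
    substituted = intact[:10] + "A" + intact[11:]   # G -> A at position 10
    deleted = intact[:10] + intact[11:]             # G at position 10 removed

    print(period_mismatch_positions(intact, 4))       # no mismatches
    print(period_mismatch_positions(substituted, 4))  # two isolated mismatches
    print(period_mismatch_positions(deleted, 4))      # a burst of four mismatches
```

In this toy case the substitution leaves two isolated mismatches, while the single deletion shifts the phase of the repeat and produces a contiguous burst of mismatches one period long. A substitution-only search therefore tends to see the region as two separate repeats, or needs a looser tolerance to bridge the gap.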

Credits

The following people contributed to mreps: Ghizlane Bana, Mathieu Giraud, Liliana Ibanescu, Roman Kolpakov, Gregory Kucherov, Ralph Rabbat. Special thanks to Laurent Noé for help.

For questions about mreps or for bug reports, please contact