What is mreps
mreps is a flexible and efficient program for identifying serial repeats (usually called tandem repeats) in DNA sequences. It was developed in 2000-2005 at LORIA in the former Adage group and is currently maintained by Gregory Kucherov.
See the mini-tutorial on mreps for more explanation of what mreps computes.
The following paper describes mreps together with some examples of its application. Please cite this paper when referring to mreps.
[1] R. Kolpakov, G. Bana, and G. Kucherov, mreps: efficient and flexible detection of tandem repeats in DNA, Nucleic Acids Research, 31(13), July 1, 2003, pp. 3672-3678.
Combinatorial algorithms implemented in mreps have been presented in the following publications.
[2] R. Kolpakov and G. Kucherov, Finding maximal repetitions in a word in linear time, 1999 Symposium on Foundations of Computer Science (FOCS), New York (USA), IEEE Computer Society, 1999, pp. 596-604.
[3] R. Kolpakov and G. Kucherov, Finding approximate repetitions under Hamming distance, Theoretical Computer Science, 303(1), 2003, pp. 135-156. An extended abstract appeared in the 9th European Symposium on Algorithms (ESA 2001), Aarhus, Denmark, 2001.
Download, install and use it!
Go to the Help page; all instructions are there.
Run it via a Web interface
You can run mreps via a Web interface. Try it!
Some features of mreps
Mixed combinatorial/heuristic approach
mreps is based on a mixed combinatorial/heuristic paradigm. Its core consists of exhaustive combinatorial algorithms (described in [2,3]) that find all repeats satisfying certain mathematical properties, which guarantees that the search is exhaustive. These repeats are then processed heuristically to obtain a more biologically relevant representation. A description of mreps can be found in [1].
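To make "repeats verifying certain mathematical properties" concrete, the sketch below enumerates maximal repetitions (the objects named in the title of [2]): substrings of length at least twice their smallest period that cannot be extended left or right with that period. It is a deliberately naive, quadratic illustration written for this page, not the linear-time algorithm of [2] and not mreps code; the function names are hypothetical.

```python
def smallest_period(t: str) -> int:
    """Smallest period of t, computed from the KMP prefix (failure) function."""
    pi = [0] * len(t)
    for i in range(1, len(t)):
        k = pi[i - 1]
        while k > 0 and t[i] != t[k]:
            k = pi[k - 1]
        if t[i] == t[k]:
            k += 1
        pi[i] = k
    return len(t) - pi[-1]


def naive_maximal_repetitions(s: str) -> list[tuple[int, int, int]]:
    """Return (start, end, period) for every maximal repetition of s.

    Hypothetical helper for illustration only (naive, not linear time).
    Positions are 0-based and `end` is inclusive.
    """
    n = len(s)
    runs = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k < n - p:
            if s[k] != s[k + p]:
                k += 1
                continue
            # Grow a maximal block of positions where s[k] == s[k + p];
            # the block cannot be extended left or right with period p.
            i = k
            while k < n - p and s[k] == s[k + p]:
                k += 1
            length = k - i + p  # s[i : i + length] has period p
            # Keep it only if it holds two full copies and p is its
            # smallest period, so each repetition is reported exactly once.
            if length >= 2 * p and smallest_period(s[i:i + length]) == p:
                runs.append((i, i + length - 1, p))
    return sorted(runs)
```

For example, `naive_maximal_repetitions("abababb")` returns `[(0, 5, 2), (5, 6, 1)]`: the repetition `ababab` with period 2 and `bb` with period 1. Like mreps, the sketch covers every period in one call, but its running time is quadratic rather than linear.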
Identifying "fuzzy" repeats
mreps has a resolution parameter that allows it to compute "fuzzy" repeats. Metaphorically, this parameter acts as a "magnifying glass" that lets you "zoom out" of the genomic sequence in order to find looser repeats.
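Looser repeats are approximate repetitions under Hamming distance, as studied in [3]. The sketch below shows one simple way to measure how far a segment is from an exact repeat: compare it with itself shifted by the period and count the mismatches. It is only an illustration; the exact definition of mreps' resolution parameter is given in [1], and `max_mismatches` here is a hypothetical threshold, not that parameter.

```python
def shift_mismatches(s: str, start: int, end: int, period: int) -> int:
    """Count positions k in [start, end - period] where s[k] != s[k + period].

    This is the Hamming distance between s[start..end] (inclusive) and the
    same segment shifted by `period`; it is 0 for an exact repeat.
    """
    return sum(1 for k in range(start, end - period + 1) if s[k] != s[k + period])


def is_approximate_repeat(s: str, start: int, end: int, period: int,
                          max_mismatches: int) -> bool:
    """Hypothetical criterion: at least two copies and few shift mismatches."""
    long_enough = end - start + 1 >= 2 * period
    return long_enough and shift_mismatches(s, start, end, period) <= max_mismatches
```

For `ACGTACCTACGT` (period 4, with a G replaced by a C in the middle copy), `shift_mismatches(s, 0, 11, 4)` is 2: the substituted base disagrees both with the copy before it and with the copy after it. Raising the tolerance therefore admits repeats whose copies have drifted apart by substitutions.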
Efficiency
mreps places no limit on the pattern size (the size of the repeated unit) of the repeats it computes: repeats of all possible pattern sizes can be found in a single program run. Moreover, depending on the resolution parameter, this run is very fast: for low resolution values, processing sequences of tens of millions of bases takes only a few seconds on a regular PC.
Limitations
The mreps algorithm does not handle indels (insertions/deletions of nucleotides), only substitutions. As a result, indels are treated only indirectly, and some repeats containing indels may be missed.
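A small Hamming-distance comparison shows why a substitution-only model sees an indel only indirectly. The sketch below is an illustration under that assumption, not mreps' scoring: it counts mismatches between a sequence and itself shifted by the period, for an exact repeat of `ACGT`, a copy with one substitution, and a copy with one inserted base. The function name is hypothetical.

```python
def shift_mismatches(s: str, period: int) -> int:
    """Hamming distance between s and s shifted by `period` (overlap only)."""
    return sum(1 for k in range(len(s) - period) if s[k] != s[k + period])


exact = "ACGTACGTACGT"
substitution = "ACGTACCTACGT"  # G -> C in the second copy
insertion = "ACGTACGTTACGT"    # extra T between the second and third copies

for name, seq in [("exact", exact), ("substitution", substitution),
                  ("insertion", insertion)]:
    print(name, shift_mismatches(seq, 4))
```

The single substitution costs 2 mismatches, while the single inserted base costs 4, because it puts a whole period out of phase. Under Hamming distance an indel therefore looks like a burst of substitutions, which is why such repeats may fall below a detection threshold.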
Credits
The following people contributed to mreps: Ghizlane Bana, Mathieu Giraud, Liliana Ibanescu, Roman Kolpakov, Gregory Kucherov, Ralph Rabbat. Special thanks to Laurent Noé for his help.
For questions about mreps or for bug reports, please contact