HPC@LSU | Documentation

REPET

REPET is a package that integrates bioinformatics programs to tackle biological questions at the genomic scale. It consists of two main pipelines, TEdenovo and TEannot, which are dedicated to the detection, annotation and analysis of repeats in genomic sequences. Both pipelines are specifically designed for detecting and annotating transposable elements (TEs).

TEdenovo - this pipeline starts by comparing the genome with itself using BLASTER. It then clusters the matches with GROUPER, RECON and PILER, programs specific to interspersed repeats. For each cluster, it builds a multiple alignment from which a consensus sequence is derived. Finally, these consensus sequences are classified according to TE features and redundant ones are removed. The result is a library of classified, non-redundant consensus sequences.
TEannot - this pipeline mines a genome with a library of TE sequences, for instance the one produced by the TEdenovo pipeline, using BLASTER, RepeatMasker and CENSOR. An empirical statistical filter is applied to discard false-positive matches. Short simple repeats (SSRs) are annotated along the way with TRF, RepeatMasker and MREPS. Then, MATCHER, via dynamic programming, chains TE fragments belonging to the same, disrupted copy. A "long join" procedure is subsequently applied to connect distant fragments. Finally, annotations are exported into GFF3 or gameXML files.
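
To make the fragment-joining and export steps of TEannot more concrete, here is a minimal Python sketch. It is an illustration only and not REPET code: it uses a simple greedy gap threshold rather than the dynamic programming that MATCHER applies, and the names Fragment, join_fragments, chain_to_gff3, the max_gap value, the "toy_join" source label and the TE names are all hypothetical. The output line follows the standard nine-column GFF3 layout.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Fragment:
    """One TE match on the genome (1-based, inclusive coordinates)."""
    seqid: str    # chromosome or scaffold name
    te_name: str  # consensus the fragment matched (hypothetical names)
    start: int
    end: int
    strand: str   # "+" or "-"


def join_fragments(fragments, max_gap=500):
    """Greedily chain fragments of the same TE on the same strand.

    Two neighbouring fragments are put in one chain when at most
    max_gap bases separate them. This is a toy stand-in for the
    chaining idea, not the algorithm used by MATCHER.
    """
    def group_key(f):
        return (f.seqid, f.te_name, f.strand)

    ordered = sorted(fragments, key=lambda f: (group_key(f), f.start))
    chains = []
    for _, group in groupby(ordered, key=group_key):
        current, current_end = [], 0
        for frag in group:
            # Number of bases between the chain so far and this fragment.
            if current and frag.start - current_end - 1 > max_gap:
                chains.append(current)
                current = []
            current.append(frag)
            current_end = max(f.end for f in current)
        chains.append(current)
    return chains


def chain_to_gff3(chain, source="toy_join", copy_id="copy1"):
    """Format one chain as a single GFF3 feature line (nine columns)."""
    first = chain[0]
    columns = [
        first.seqid,
        source,
        "transposable_element",            # feature type
        str(min(f.start for f in chain)),
        str(max(f.end for f in chain)),
        ".",                               # score not computed here
        first.strand,
        ".",                               # phase applies to CDS only
        f"ID={copy_id};Name={first.te_name}",
    ]
    return "\t".join(columns)


if __name__ == "__main__":
    demo = [
        Fragment("chr1", "TE_A", 100, 400, "+"),
        Fragment("chr1", "TE_A", 450, 900, "+"),
        Fragment("chr1", "TE_A", 5000, 5300, "+"),
    ]
    print("##gff-version 3")
    for i, chain in enumerate(join_fragments(demo), start=1):
        print(chain_to_gff3(chain, copy_id=f"copy{i}"))
```

Raising max_gap in this sketch loosely mimics the effect of a "long join" step, in that more distant fragments end up connected into one copy. The real pipeline also takes match scores and the order of fragments along the TE consensus into account, which this sketch ignores.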

For additional information, see the text files in:

/usr/local/packages/bioinformatics/REPET/2.0/doc (as of the time this was written).

Users may direct questions to sys-help@loni.org.

