A simple script that does what it says on the tin. For Illumina high-throughput sequencing projects, it identifies rRNA and contaminants, trims reads in a standard fashion, and so on, while maintaining read pairs. Built for fire-and-forget high-throughput projects (terabytes of data).

Also known as "Alexie’s Illumina preprocessing pipeline". In other words: no warranty, but we use it.

Obtaining it

Dependencies

Uses pbzip2 (optional). On Ubuntu you can install it with:

sudo apt-get install pbzip2

See 3rd_party directory for the others.
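
Before a long run it can help to confirm that the optional tools are actually on your PATH. The following is a minimal sketch, not part of the pipeline: `check_tool` is a hypothetical helper name, and it only reports whether a command can be found.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the pipeline):
# report whether a command is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# pbzip2 is optional, so a "missing" result is not fatal.
check_tool pbzip2
```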

preprocess_illumina.pl <infile> <infile2> etc

or for pairs:

preprocess_illumina.pl -paired <infile1> <infile2>
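
For a directory full of paired files, the paired invocation above can be generated per pair. This is a minimal sketch under stated assumptions: the `_1.fastq` / `_2.fastq` naming scheme and the `pair_commands` function name are hypothetical, so adjust them to your own files. It only prints the commands (a dry run) so you can inspect them before executing anything.

```shell
#!/bin/sh
# Hypothetical helper: for every *_1.fastq in a directory that has a
# matching *_2.fastq, print the paired preprocess_illumina.pl command.
# The _1/_2 suffix convention is an assumption, not a pipeline requirement.
pair_commands() {
    dir=$1
    for r1 in "$dir"/*_1.fastq; do
        # Skip the unexpanded pattern when no files match.
        [ -e "$r1" ] || continue
        # Derive the mate file name by swapping the suffix.
        r2="${r1%_1.fastq}_2.fastq"
        if [ -e "$r2" ]; then
            echo "preprocess_illumina.pl -paired $r1 $r2"
        else
            echo "skipping $r1: no mate file $r2" >&2
        fi
    done
}
```

Once the printed commands look right, something like `pair_commands /path/to/reads | sh` would run them one after another.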

Usage

See

perldoc preprocess_illumina.pl

or read the code directly:

vim preprocess_illumina.pl

Programs used

Trimmomatic

Lohse M, Bolger AM, Nagel A, Fernie AR, Lunn JE, Stitt M, Usadel B. RobiNA: a user-friendly, integrated software solution for RNA-Seq-based transcriptomics. Nucleic Acids Res. 2012 Jul;40(Web Server issue):W622-7.
http://www.usadellab.org/cms/index.php?page=trimmomatic

pbzip2

Jeff Gilchrist
http://compression.ca
Debian/Ubuntu:
apt-get install pbzip2

fastqc

simon.andrews@babraham.ac.uk
http://www.bioinformatics.bbsrc.ac.uk