Tools and Features

The IonGAP workflow consists of three stages or modules. However, the user is allowed to disable any of these modules if desired, in order to speed up the whole process. A test dataset is available for trying the platform.

Genome Assembly

The Genome Assembly module is mainly composed by the MIRA assembler. The determination of the most suitable assembler for Ion Torrent data, as well as the assembler optimization, have entailed a series of exhaustive comparative studies. Moreover, the configuration of the assembler has been greatly simplified by means of a user-friendly Web interface, which allows to start an assembly project just by submitting a FASTQ or BAM file. This file may be compressed in zip or tar.bz2 format (the output compression formats offered by the Torrent Server's FileExporter plugin), in order to reduce the upload time. The user is allowed to configure a variety of relevant assembly parameters, in case the default assembly is not satisfactory, as well as to choose between 11 assembly output formats.

The assembler is set to deal with Ion Torrent single-end reads, which must be contained in a single file and will be also analyzed by FastQC in order to generate an informative quality report. Due to resource limitations, IonGAP allows readsets of up to about 3 million Ion Torrent single-end reads. This corresponds to about 1 GB in FASTQ format (the size will vary largely depending on the input format). We do not recommend preprocessing or trimming the reads before submitting, but users could find random sampling or digital normalization useful in order to reduce the amount of reads. The processes which compose this module and the results obtained from each one are detailed in the table below.

Application Process Result Result examples
MIRA Genome assembly Assembled contigs in various formats, assembly information and statistics Examples
FastQC Reads quality analysis Quality analysis report Examples

The assembly stage may also be disregarded, as the service allows the user to supply a set of assembled contigs in FASTA format. This makes the rest of the pipeline accessible to users of other sequencing platforms.

Comparative Genomics

Following the assembly, if the user provides a reference sequence (or its NCBI accession/GI number), there is a succession of comparative analysis processes performed by different external applications. The result of this stage is a set of graphical and textual reports of the alignment between the contigs and the reference sequence, as well as relevant information derived from it, such as the set of gaps and missing regions found in the assembly. If a set of sequence reads is available (Genome Assembly module enabled), variant calling and annotation of SNPs are also performed. The tools involved in this stage and the results obtained from each one are detailed in the table below.

Application Process Result Result examples
MUMmer, Circos, Circoletto, genoPlotR Genome alignment Linear and circular alignment graphs Examples
Mauve Genome alignment, contig reordering Reordered contigs, alignment summary, information on gaps and missing regions of the assembly Examples
Cortex Variant calling Variant calls (from raw reads) in VCF format Examples
TRAMS SNP annotation Functional annotation of SNP calls (from raw reads) Examples

Bacterial Classification and Annotation

When studying bacterial genomes, there are some vital processes aimed at the identification of the organism and its genomic characteristics, which can influence its pathogenic potential. The present module, composed of classification and functional analysis routines, makes IonGAP ideal for clinical bacteriology procedures involving bacterial genome sequencing, assembly and characterization. The applications which take part in this module and the results obtained from each one are detailed in the table below.

Application Process Result Result examples
BLAST, NCBI's 16S rRNA DB Taxonomic classification Tabular file containing 16S rRNA sequence alignments for each contig Examples
Prokka Genome annotation Annotated contigs in several formats, annotated protein sequences Examples
Torsten Seemann's mlst Multilocus sequence typing Identified ST and allele numbers, allele sequences Examples
BLAST, NCBI's Plamids DB Identification of plamids Tabular file containing plasmid sequence alignments for each contig Examples
BLAST, MvirDB Identification of virulence factors Tabular files containing sequence alignments of antibiotic resistance genes, virulence proteins and pathogenicity islands for each contig Examples

For more information about the pipeline results and how to interpret them, please consult the User Manual.