Program gto_fastq_metagenomics

gto_fastq_metagenomics is an ultra-fast method to infer the metagenomic composition of sequenced reads relative to a database. It measures the similarity between any FASTQ (or FASTA) file, regardless of its size, and any multi-FASTA database, such as the entire set of complete genomes from the NCBI. It supports single reads, paired-end reads, and combinations of both. It has been tested on data from many platforms, such as Illumina MiSeq, HiSeq, NovaSeq, and IonTorrent.

gto_fastq_metagenomics is efficient at detecting the presence of a given species in the FASTQ reads and at authenticating it. The core of the method is based on relative data compression. It uses a variable number of threads without multiplying the memory for each thread, so it can run efficiently on a common laptop.

For help, type:


./gto_fastq_metagenomics -h

In the following subsections, we explain the input and output parameters.

Input parameters

The gto_fastq_metagenomics program needs one or more FASTQ files with the sequenced reads and a multi-FASTA database file.

The arguments are given according to the following help message:


NAME
gto_fastq_metagenomics v3.1: a tool to infer metagenomic composition.

SYNOPSIS
gto_fastq_metagenomics [OPTION]... [FILE1]:[FILE2]:... [FILE]

SAMPLE
gto_fastq_metagenomics -v -F -l 47 -Z -y pro.com reads1.fq:reads2.fq DB.fa

DESCRIPTION
It infers metagenomic sample composition of sequenced reads.
The core of the method uses a cooperation between multiple
context and tolerant context models with several depths.
The reference sequences must be in a multi-FASTA format.
The sequenced reads must be trimmed and in FASTQ format.

Non-mandatory arguments:

-h                   give this help,
-F                   force mode (overwrites top file),
-V                   display version number,
-v                   verbose mode (more information),
-Z                   database local similarity,
-s                   show compression levels,

-l            compression level [1;47],
-p           subsampling (default: 1),
-t              top of similarity (default: 20),
-n         number of threads (default: 2),

-x             similarity top filename,
-y             profile filename (-Z must be on).

Mandatory arguments:

[FILE1]:[FILE2]:...  metagenomic filename (FASTQ),
Use ":" for splitting files.

[FILE]               database filename (Multi-FASTA).

COPYRIGHT
Copyright (C) 2014-2019, IEETA, University of Aveiro.
This is a Free software, under GPLv3. You may redistribute
copies of it under the terms of the GNU - General Public
License v3 .
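The SAMPLE line in the help message above can also be assembled programmatically. The following Python snippet is a minimal sketch under stated assumptions: the helper name build_command and its keyword arguments are hypothetical (they are not part of the tool), and it only uses flags that appear in the help message. It joins several read files with ":" into a single argument, as the help message describes, and adds -Z whenever a profile filename is requested.

```python
import shlex


def build_command(read_files, database, level=None, threads=None,
                  top_file=None, profile_file=None, verbose=False,
                  force=False, executable="gto_fastq_metagenomics"):
    """Assemble an argument list for the tool (hypothetical helper)."""
    if not read_files:
        raise ValueError("at least one FASTQ file is required")
    args = [executable]
    if verbose:
        args.append("-v")
    if force:
        args.append("-F")
    if level is not None:
        # The help message gives the valid range as [1;47].
        if not 1 <= level <= 47:
            raise ValueError("compression level must be in [1;47]")
        args += ["-l", str(level)]
    if threads is not None:
        args += ["-n", str(threads)]
    if top_file is not None:
        args += ["-x", top_file]
    if profile_file is not None:
        # The help message states that -y requires -Z to be on.
        args += ["-Z", "-y", profile_file]
    # Several read files are passed as one argument, split by ":".
    args.append(":".join(read_files))
    args.append(database)
    return args


cmd = build_command(["reads1.fq", "reads2.fq"], "DB.fa", level=47,
                    profile_file="pro.com", verbose=True, force=True)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run without a shell, which avoids quoting problems when filenames contain spaces.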

An example of a FASTQ input file is:


@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=72
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACCAAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=72
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9ICIIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=72
GTTCAGGGATACGACGTTTGTATTTTAAGAATCTGAAGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT
+SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=72
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII6IBIIIIIIIIIIIIIIIIIIIIIIIGII>IIIII-I)8I

Output

The output of the gto_fastq_metagenomics program is a CSV file (top.csv) that lists the database sequences with the highest probability of being contained in the samples. An example of this file is the following:


1  66725   12.263   NC_037703.1_Saccharomycodes_ludwigii_strain_Y-8871_mitochondrion
2  66725   12.263   NC_037703.1_Saccharomycodes_ludwigii_strain_Y-8871_mitochondrion
3  107123  11.492   NC_012621.1_Nakaseomyces_bacillisporus_mitochondrion
4  107123  11.492   NC_012621.1_Nakaseomyces_bacillisporus_mitochondrion
5  16592   11.153   NC_024030.1_Equus_przewalskii_mitochondrial_DNA
6  14583   10.851   NC_021120.1_Bursaphelenchus_mucronatus_mitochondrion
7  162504  10.607   NC_018415.1_Candidatus_Carsonella_ruddii_CS_isolate_Thao2000
8  10315   10.586   NC_016117.1_Mnemiopsis_leidyi_mitochondrion
9  162589  10.550   NC_018414.1_Candidatus_Carsonella_ruddii_CE_isolate_Thao2000
10 166163  10.476   NC_018416.1_Candidatus_Carsonella_ruddii_HC_isolate_Thao2000
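The listing above can be loaded for further filtering or sorting. The following Python snippet is a minimal sketch under assumptions that the text does not confirm: it treats the file as whitespace-separated and interprets the four columns as rank, sequence length, similarity score, and sequence name. The function name parse_top and the dictionary keys are hypothetical.

```python
def parse_top(lines):
    """Parse whitespace-separated lines of the top file into dictionaries."""
    records = []
    for line in lines:
        # Split into at most four fields so the name stays in one piece.
        fields = line.split(None, 3)
        if len(fields) != 4:
            continue  # skip blank or incomplete lines
        rank, length, similarity, name = fields
        records.append({
            "rank": int(rank),              # assumed: position in the top
            "length": int(length),          # assumed: sequence length
            "similarity": float(similarity),  # assumed: similarity score
            "name": name.strip(),
        })
    return records


# Two lines copied from the example output above.
sample = [
    "5  16592   11.153   NC_024030.1_Equus_przewalskii_mitochondrial_DNA",
    "6  14583   10.851   NC_021120.1_Bursaphelenchus_mucronatus_mitochondrion",
]
for record in parse_top(sample):
    print(record["rank"], record["similarity"], record["name"])
```

To read the real file, pass an open file object, for example parse_top(open("top.csv")).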