Program gto_fastq_clust_reads

The gto_fastq_clust_reads program groups reads and creates an index file. It clusters reads in terms of the lexicographic order of their sequence k-mers.

For help type:

./gto_fastq_clust_reads -h


In the following subsections, we explain the input and output parameters.

Input parameters

The gto_fastq_clust_reads program needs two streams for the computation, namely the standard input and the standard output. The input stream is a FASTQ file. The program sorts the FASTQ reads according to the lexicographic order of their genomic sequences.

The parameters are specified according to the help message:

Usage: ./gto_fastq_clust_reads [options] [[--] args]
or: ./gto_fastq_clust_reads [options]

It agroups reads and creates an index file.
It cluster reads in therms of Seq k-mer Lexicographical order


-h, --help Show this help message and exit

Basic options
-c, --ctx=
< input.fastq Input FASTQ file format (stdin)
> output.fastq Output FASTQ file format (stdout)

Example: ./gto_fastq_clust_reads -c < input.fastq > output.fastq


An example of such an input file is:

@SRR001661.1 071112_SLXA-EAS1_s_7:5:1:817:345
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACCAAGTTACCCTTAACAACTTAAGGG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9ICIIIIIIIIIIIIIIIIIIIIDIII
@SRR001661.2 071112_SLXA-EAS1_s_7:5:1:801:338
GTTCAGGGATACGACGTTTGTATTTTAAGAATCTGAAGCAGAAGTCGATGATAATACGCG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII6IBIIIIIIIIIIIIIIIIIIIIIIIGI
@SRR001661.3 071112_SLXA-EAS1_s_7:5:1:821:328
AACGCGTATTCGGAGCTTCTTCGTTGGGTACGTGCGCCTATTATGCGGCGCGATTGCTAT
+
IIIIIII6BBB6BBBBBBBBBBBBBBBBBDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
@SRR001661.4 071112_SLXA-EAS1_s_7:5:1:943:128
ATCGCGCATTCGACTGGTACGTGTACGTGTAGTCGTAGCGTATGTTCGGTCGTATGCGTG
+
II77777LPMMMPPMMMMIIIIIIIIIIIIII777777777BBBBBBBBDDDDDIIIIII
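
The four-line layout of each read in the example above (header, sequence, separator, quality) can be illustrated with a short Python sketch. This is not part of gto_fastq_clust_reads: the names FastqRecord and read_fastq are hypothetical, the two toy reads are made up for illustration, and the sketch assumes every record spans exactly four lines.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str    # line starting with '@'
    sequence: str  # nucleotide sequence
    plus: str      # separator line starting with '+'
    quality: str   # quality string, one symbol per base


def read_fastq(stream: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per read."""
    while True:
        header = stream.readline().rstrip("\n")
        if not header:
            # End of stream, or a blank line, which this sketch treats as the end.
            return
        sequence = stream.readline().rstrip("\n")
        plus = stream.readline().rstrip("\n")
        quality = stream.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header, sequence, plus, quality)


# Toy input (hypothetical reads, not taken from a real data set).
toy_input = io.StringIO("@r1\nGGGT\n+\nIIII\n@r2\nGTTC\n+\nII6I\n")
records = list(read_fastq(toy_input))
for record in records:
    print(record.header, record.sequence)
```

In a pipeline, the same function could read from sys.stdin, which mirrors how the program receives its input through the standard input stream.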


Output

The output of the gto_fastq_clust_reads program is a FASTQ file whose reads are clustered in terms of the lexicographic order of the k-mers of their genomic sequences.

An example of the output, for the input file above, is:

@SRR001661.3 071112_SLXA-EAS1_s_7:5:1:821:328
AACGCGTATTCGGAGCTTCTTCGTTGGGTACGTGCGCCTATTATGCGGCGCGATTGCTAT
+
IIIIIII6BBB6BBBBBBBBBBBBBBBBBDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
@SRR001661.4 071112_SLXA-EAS1_s_7:5:1:943:128
ATCGCGCATTCGACTGGTACGTGTACGTGTAGTCGTAGCGTATGTTCGGTCGTATGCGTG
+
II77777LPMMMPPMMMMIIIIIIIIIIIIII777777777BBBBBBBBDDDDDIIIIII
@SRR001661.1 071112_SLXA-EAS1_s_7:5:1:817:345
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACCAAGTTACCCTTAACAACTTAAGGG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9ICIIIIIIIIIIIIIIIIIIIIDIII
@SRR001661.2 071112_SLXA-EAS1_s_7:5:1:801:338
GTTCAGGGATACGACGTTTGTATTTTAAGAATCTGAAGCAGAAGTCGATGATAATACGCG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII6IBIIIIIIIIIIIIIIIIIIIIIIIGI
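
The ordering seen in this example can be reproduced with a minimal Python sketch. It is an illustration only, not the program's actual implementation: the names parse_fastq and sort_reads are hypothetical, and the optional parameter k (sorting on the first k bases only) is an assumption about how a k-mer key might be used, not the documented meaning of the -c option. The toy reads reuse the first four bases of the four example reads, with made-up headers.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, plus, quality)


def parse_fastq(text: str) -> List[Record]:
    """Split FASTQ text into four-line records (assumes no wrapped lines)."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain a multiple of four lines")
    return [
        (lines[i], lines[i + 1], lines[i + 2], lines[i + 3])
        for i in range(0, len(lines), 4)
    ]


def sort_reads(records: List[Record], k: Optional[int] = None) -> List[Record]:
    """Sort reads lexicographically by sequence.

    If k is given, only the first k bases form the key (an assumed k-mer key).
    The sort is stable, so reads with equal keys keep their input order.
    """
    def key(record: Record) -> str:
        sequence = record[1]
        return sequence if k is None else sequence[:k]

    return sorted(records, key=key)


# Toy reads: hypothetical headers, sequences truncated to four bases.
toy_text = (
    "@r1\nGGGT\n+\nIIII\n"
    "@r2\nGTTC\n+\nIIII\n"
    "@r3\nAACG\n+\nIIII\n"
    "@r4\nATCG\n+\nII77\n"
)
sorted_reads = sort_reads(parse_fastq(toy_text))
sorted_headers = [record[0] for record in sorted_reads]
print(sorted_headers)
```

With these toy reads, the third and fourth reads move ahead of the first and second, which is the same reordering shown in the example output.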