Program gto_fasta_find_n_pos

The gto_fasta_find_n_pos program reports the 'N' regions in a sequence or FASTA (seq) file.

For help, type:

./gto_fasta_find_n_pos -h


In the following subsections, we explain the input and output parameters.

Input parameters

The gto_fasta_find_n_pos program needs two streams for the computation, namely standard input and standard output. The input stream is a FASTA file or a sequence.

The options are given according to the following help message:

Usage: ./gto_fasta_find_n_pos [options] [[--] args]
or: ./gto_fasta_find_n_pos [options]

It reports the 'N' regions in a sequence or FASTA (seq) file.

-h, --help show this help message and exit

Basic options
< input.fasta Input FASTA file format or a sequence (stdin)
> output Output report of 'N' positions (stdout)

Example: ./gto_fasta_find_n_pos < input.fasta > output

The output obeys the following structure:
Begin End Positions


An example of such an input file is:

>AB000264 |acc=AB000264|descr=Homo sapiens mRNA
NCNNNACGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCCTGGAGGGTCCACCGCTGCCCTGCTGCCATTGTCCCC
GNCCCCACCTAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTGGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAA
GTNGTTTGAGTGGACCTCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAGCAGGCCAGTGCC
GCGAATCCGCGCGCCGGGACAGAATCTCCTGCAAAGCCCTGCAGGAACNTCTTCTGGAAGACCTTCTCCACCCCCCCAGC
TAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAN


Output

The output of the gto_fasta_find_n_pos program is a structured report of the 'N' occurrences in the sequence or FASTA file. Each row describes one interval of consecutive 'N' symbols: the first column is the position of the first 'N' in the interval, the second is the position of the last 'N' in the interval, and the third is the number of 'N' symbols in the interval.
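
As an illustration of how such intervals can be computed, the following Python sketch scans a FASTA-like input and collects one (begin, end, count) triple per run of 'N'. It is not the program's actual source code. The function name find_n_runs and the sample record are hypothetical, and several choices are assumptions of this sketch: positions are 1-based, header lines and line breaks do not count toward positions, only uppercase 'N' is matched, and a run that reaches the end of the input is still reported. The real program may differ on any of these points.

```python
def find_n_runs(lines):
    """Yield (begin, end, count) for each run of consecutive 'N' symbols.

    Positions are 1-based and count sequence symbols only.
    """
    position = 0   # 1-based index of the last sequence symbol seen
    begin = None   # start of the current run, or None when outside a run
    for line in lines:
        if line.startswith(">"):
            continue  # assumption: header lines do not count toward positions
        for symbol in line.strip():
            position += 1
            if symbol == "N":
                if begin is None:
                    begin = position  # a new run starts here
            elif begin is not None:
                # the run ended on the previous symbol
                yield begin, position - 1, position - begin
                begin = None
    if begin is not None:
        # assumption: a run that reaches the end of the input is reported
        yield begin, position, position - begin + 1


# Hypothetical two-line record; the second run-free line break is ignored.
example = [">hypothetical record", "NACNNNGT", "ANN"]
runs = list(find_n_runs(example))
for begin, end, count in runs:
    print(begin, end, count)
```

Because the run state is kept across lines, an interval that spans a line break is reported as a single row.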

Using the example input file shown above (AB000264), an example output is the following:

1 1 1
3 5 3
82 82 1
163 163 1
289 289 1
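
A report of this kind is easy to post-process. The Python sketch below reads the rows shown above, checks that each count equals the length of its inclusive interval (end - begin + 1), and sums the counts. The helper name parse_report is hypothetical, and the parser assumes whitespace-separated integer columns; any line that does not consist of three integers is skipped.

```python
report = """\
1 1 1
3 5 3
82 82 1
163 163 1
289 289 1
"""


def parse_report(text):
    """Turn report text into a list of (begin, end, count) integer tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # skip blank lines, headers, or anything that is not three integers
        if len(fields) != 3 or not all(f.isdigit() for f in fields):
            continue
        rows.append(tuple(int(f) for f in fields))
    return rows


rows = parse_report(report)
# Each interval is inclusive, so its length is end - begin + 1.
consistent = all(end - begin + 1 == count for begin, end, count in rows)
total_n = sum(count for _, _, count in rows)
print(consistent, total_n)  # prints: True 7
```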