The gto_fasta_find_n_pos program reports the 'N' regions in a sequence or FASTA (seq) file.
For help type:
./gto_fasta_find_n_pos -h
In the following subsections, we explain the input and output parameters.
The gto_fasta_find_n_pos program needs two streams for the computation, namely the standard input and the standard output. The input stream is a FASTA file or a sequence.
The parameters are given according to the help message:
Usage: ./gto_fasta_find_n_pos [options] [[--] args]
or: ./gto_fasta_find_n_pos [options]
It reports the 'N' regions in a sequence or FASTA (seq) file.
-h, --help show this help message and exit
Basic options
< input.fasta Input FASTA file format or a sequence (stdin)
> output Output report of 'N' positions (stdout)
Example: ./gto_fasta_find_n_pos < input.fasta > output
The output obeys the following structure:
Begin End Positions
An example of such an input file is:
>AB000264 |acc=AB000264|descr=Homo sapiens mRNA
NCNNNACGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCCTGGAGGGTCCACCGCTGCCCTGCTGCCATTGTCCCC
GNCCCCACCTAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTGGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAA
GTNGTTTGAGTGGACCTCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAGCAGGCCAGTGCC
GCGAATCCGCGCGCCGGGACAGAATCTCCTGCAAAGCCCTGCAGGAACNTCTTCTGGAAGACCTTCTCCACCCCCCCAGC
TAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAN
The output of the gto_fasta_find_n_pos program is a structured report of the 'N' occurrences in the sequence or FASTA file. Each row describes one run of consecutive 'N' symbols: the first column is the position of the first 'N' in the run, the second is the position of the last 'N' in the run, and the last column is the number of 'N' symbols in the run. Positions are 1-based and count sequence symbols only, so the header line and line breaks do not contribute.
For the input above, the output is the following:
1 1 1
3 5 3
82 82 1
163 163 1
289 289 1
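
The three-column report described above can be illustrated with a minimal Python sketch. It is not the implementation of gto_fasta_find_n_pos; the function name find_n_runs and the sample input are hypothetical. It assumes that only the uppercase 'N' counts, that header lines start with '>', and that positions keep increasing across lines. It also assumes that a run reaching the end of the input is reported; the example report above lists no row for the trailing 'N' of the example input, so the tool may behave differently in that case.

```python
from typing import Iterable, Iterator, Tuple


def find_n_runs(lines: Iterable[str]) -> Iterator[Tuple[int, int, int]]:
    """Yield (begin, end, count) for each run of consecutive 'N' symbols.

    Positions are 1-based and count sequence symbols only; header lines
    (starting with '>') and line breaks are skipped.
    """
    pos = 0        # 1-based position of the last symbol read
    begin = None   # start of the run in progress, or None if no run is open
    for line in lines:
        if line.startswith(">"):
            continue  # FASTA header: not part of the sequence
        for symbol in line.strip():
            pos += 1
            if symbol == "N":
                if begin is None:
                    begin = pos  # a new run starts here
            elif begin is not None:
                # The run ended at the previous symbol.
                yield begin, pos - 1, pos - begin
                begin = None
    # Assumption: a run that reaches the end of the input is reported too.
    if begin is not None:
        yield begin, pos, pos - begin + 1


# Hypothetical two-line record, shorter than the example input above.
example = [">seq1 hypothetical header", "NCNNNACG", "GNCC"]
for begin, end, count in find_n_runs(example):
    print(begin, end, count, sep="\t")
```

Tracking a single open run keeps memory use constant, so the same loop works on a stream read from standard input line by line.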