What is GTO?

GTO is a toolkit to unify pipelines in genomic and proteomic research.

GTO is a toolkit for genomics and proteomics that works with the FASTQ, FASTA and SEQ formats and includes many complementary tools. It targets Unix-based systems and is built for ultra-fast computation. GTO supports pipes, so its sub-programs can be combined with each other as well as with external tools. Like LEGO bricks, the programs can be assembled into many different pipelines.
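The pipe support described above can be sketched as a two-stage conversion. This is a minimal sketch, assuming that gto_fastq_to_fasta and gto_fasta_to_seq read from standard input and write to standard output without extra flags. The file names sample.fastq and sample.seq are hypothetical, and the script skips the GTO step when the tools are not on the PATH.

```shell
#!/bin/sh
# Write a tiny two-read FASTQ file to use as input (hypothetical sample data).
printf '@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTGGCCAA\n+\nIIIIIIII\n' > sample.fastq

# Chain two GTO programs with a pipe: FASTQ -> FASTA -> SEQ.
# Assumption: both tools read stdin and write stdout without extra flags.
if command -v gto_fastq_to_fasta >/dev/null 2>&1 &&
   command -v gto_fasta_to_seq >/dev/null 2>&1; then
    gto_fastq_to_fasta < sample.fastq | gto_fasta_to_seq > sample.seq
    echo "wrote sample.seq"
else
    echo "GTO tools not found on PATH; skipping pipeline" >&2
fi
```

Because each stage is an ordinary Unix filter under this assumption, an external tool such as gzip or grep could be placed at either end of the same pipe.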

GTO includes tools for information display, randomisation, editing, conversion, extraction, search, calculation, compression, simulation and visualisation. GTO is prepared to deal with very large datasets, typically on the scale of gigabytes or terabytes (but not limited to these sizes). The complete toolkit is an optimised command-line version in which each program name consists of the prefix "gto_" followed by a suffix naming the program. GTO is implemented in C and is available under the MIT license.

Installation

Install GTO directly from the Cobilab channel on Conda:


              conda install -c cobilab gto --yes

Or through the GitHub repository:


              git clone https://github.com/cobilab/gto.git
              cd gto/src/
              make

Note: a precompiled version of GTO for 64-bit Linux is available in the bin/ directory.
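After installing, one way to check which GTO programs are reachable is to scan the PATH for executables that carry the "gto_" prefix. The following is a minimal sketch and not part of GTO itself. The function name list_gto_tools is hypothetical, and it assumes the install location (for example the Conda environment's bin directory, or gto/bin/ for a source build) has already been added to the PATH.

```shell
#!/bin/sh
# List executables on PATH whose names start with the "gto_" prefix.
# list_gto_tools is a hypothetical helper, not a GTO program.
list_gto_tools() {
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # Skip empty or missing PATH entries.
        [ -d "$dir" ] || continue
        for f in "$dir"/gto_*; do
            # Print only regular, executable files; an unmatched glob is skipped.
            [ -f "$f" ] && [ -x "$f" ] && basename "$f"
        done
    done
    IFS=$old_ifs
}

# Print each tool name once, sorted alphabetically.
list_gto_tools | sort -u
```

An empty result suggests that the directory holding the GTO binaries is not on the PATH yet.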

GTO Programs

GTO provides pipe support in the majority of its tools for easy integration. These include programs to shuffle, transform, simulate, compress and visualise data, among others. GTO includes the following tools, divided by genomic data format type.

1. FASTQ tools


            
              1.1 gto_fastq_to_fasta

              1.2 gto_fastq_to_mfasta

              1.3 gto_fastq_exclude_n

              1.4 gto_fastq_extract_quality_scores

              1.5 gto_fastq_info

              1.6 gto_fastq_maximum_read_size

              1.7 gto_fastq_minimum_quality_score

              1.8 gto_fastq_minimum_read_size

              1.9 gto_fastq_rand_extra_chars

              1.10 gto_fastq_from_seq

              1.11 gto_fastq_mutate

              1.12 gto_fastq_split

              1.13 gto_fastq_pack

              1.14 gto_fastq_unpack

              1.15 gto_fastq_quality_score_info

              1.16 gto_fastq_quality_score_max

              1.17 gto_fastq_quality_score_min

              1.18 gto_fastq_cut

              1.19 gto_fastq_minimum_local_quality_score_forward

              1.20 gto_fastq_minimum_local_quality_score_reverse

              1.21 gto_fastq_xs

              1.22 gto_fastq_clust_reads

              1.23 gto_fastq_complement

              1.24 gto_fastq_reverse

              1.25 gto_fastq_variation_map

              1.26 gto_fastq_variation_filter

              1.27 gto_fastq_variation_visual

              1.28 gto_fastq_metagenomics

2. FASTA tools


            
              2.1 gto_fasta_to_seq

              2.2 gto_fasta_from_seq

              2.3 gto_fasta_extract

              2.4 gto_fasta_extract_by_read

              2.5 gto_fasta_info

              2.6 gto_fasta_mutate

              2.7 gto_fasta_rand_extra_chars

              2.8 gto_fasta_extract_read_by_pattern

              2.9 gto_fasta_find_n_pos

              2.10 gto_fasta_split_reads

              2.11 gto_fasta_rename_human_headers

              2.12 gto_fasta_extract_pattern_coords

              2.13 gto_fasta_complement

              2.14 gto_fasta_reverse

              2.15 gto_fasta_variation_map (also an alias to gto_fastq_variation_map)

              2.16 gto_fasta_variation_filter (also an alias to gto_fastq_variation_filter)

              2.17 gto_fasta_variation_visual (also an alias to gto_fastq_variation_visual)

3. Genomic sequence tools


            
              3.1 gto_genomic_gen_random_dna

              3.2 gto_genomic_rand_seq_extra_chars

              3.3 gto_genomic_dna_mutate

              3.4 gto_genomic_extract

              3.5 gto_genomic_period

              3.6 gto_genomic_count_bases

              3.7 gto_genomic_compressor

              3.8 gto_genomic_decompressor

              3.9 gto_genomic_complement

              3.10 gto_genomic_reverse

              3.11 gto_genomic_variation_map (also an alias to gto_fastq_variation_map)

              3.12 gto_genomic_variation_filter (also an alias to gto_fastq_variation_filter)

              3.13 gto_genomic_variation_visual (also an alias to gto_fastq_variation_visual)

4. Amino acid sequence tools


            
              4.1 gto_amino_acid_to_group

              4.2 gto_amino_acid_to_pseudo_dna

              4.3 gto_amino_acid_compressor

              4.4 gto_amino_acid_decompressor

              4.5 gto_amino_acid_from_fastq

              4.6 gto_amino_acid_from_fasta

              4.7 gto_amino_acid_from_seq

5. General purpose tools


            
              5.1 gto_char_to_line

              5.2 gto_new_line_on_new_x

              5.3 gto_upper_bound

              5.4 gto_lower_bound

              5.5 gto_brute_force_string

              5.6 gto_real_to_binary_with_threshold

              5.7 gto_sum

              5.8 gto_filter

              5.9 gto_word_search

              5.10 gto_permute_by_blocks

              5.11 gto_info

              5.12 gto_segment

              5.13 gto_comparative_map

              5.14 gto_max

              5.15 gto_min

Contribute

You can contribute to this project here.

Core team

João R. Almeida
Armando José Pinho
José Luís Oliveira
Olga Fajarda
Diogo Pratas