What is GTO?

GTO is a toolkit to unify pipelines in genomic and proteomic research.

GTO targets the FASTQ, FASTA and SEQ formats and provides many complementary tools. The toolkit runs on Unix-based systems and is built for ultra-fast computation. GTO supports pipes, so its programs can be combined easily with each other and with external tools. Like LEGO bricks, they can be assembled into many different pipelines.
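The pipe-based composition described above can be illustrated with a minimal shell sketch. The two functions, `fastq_to_fasta` and `fasta_to_seq`, are hypothetical awk-based stand-ins written for this example, not GTO programs. They assume unwrapped four-line FASTQ records and treat "SEQ" as the bare sequence with headers and line breaks removed. The point is only that filters which read standard input and write standard output can be chained freely.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for pipe-friendly filters (NOT GTO programs):
# each reads standard input and writes standard output.

# fastq_to_fasta: for each 4-line FASTQ record, turn the '@' header into a
# '>' header and keep the sequence line; drop the '+' and quality lines.
fastq_to_fasta() {
  awk 'NR % 4 == 1 { print ">" substr($0, 2) } NR % 4 == 2 { print }'
}

# fasta_to_seq: drop '>' header lines and concatenate the sequence lines.
fasta_to_seq() {
  awk '!/^>/ { printf "%s", $0 } END { print "" }'
}

# Chain the two filters with a pipe, exactly as one would chain tools.
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' | fastq_to_fasta | fasta_to_seq
# prints ACGTTTGA
```

With GTO installed, the same pattern applies to the gto_ programs and to external tools such as sort or gzip.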

GTO includes tools for information display, randomisation, editing, conversion, extraction, search, calculation, compression, simulation and visualisation. It is designed for very large datasets, typically gigabytes or terabytes in size, although it is not limited to that scale. Every program in the toolkit is an optimised command-line tool whose name is the prefix "gto_" followed by the name of the program. GTO is implemented in C and is available under the MIT license.
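Because every program follows the gto_ naming convention, the installed tools can be discovered by that prefix. The sketch below defines a hypothetical helper, `list_gto_programs`, which is not part of GTO and assumes bash. It scans the directories on `$PATH` for executable files whose names start with gto_.

```shell
#!/usr/bin/env bash
# list_gto_programs (hypothetical helper, not part of GTO): print the name of
# every executable file on $PATH whose name starts with "gto_", sorted and
# with duplicates removed.
list_gto_programs() {
  local dir file
  local IFS=':'            # split $PATH on ':' inside this function only
  for dir in $PATH; do
    [ -d "$dir" ] || continue
    for file in "$dir"/gto_*; do
      # The unexpanded pattern fails -f when a directory has no match.
      [ -f "$file" ] && [ -x "$file" ] && printf '%s\n' "${file##*/}"
    done
  done | sort -u
}

list_gto_programs
```

In an interactive bash session, typing gto_ and pressing Tab gives a similar list through command completion.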


Install GTO directly from the cobilab channel on Conda:

conda install -c cobilab gto --yes

Or through the GitHub repository:

git clone https://github.com/cobilab/gto.git
cd gto/src/

Note that a precompiled version of GTO for 64-bit Linux is available in the bin/ directory.
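On 64-bit Linux, the precompiled binaries can be used without building anything by adding bin/ to PATH. The helper below, `use_prebuilt_gto`, is a hypothetical sketch and not part of GTO. It takes the path of the cloned repository (default: ./gto) and prepends that repository's bin/ directory to PATH for the current shell session.

```shell
#!/usr/bin/env bash
# use_prebuilt_gto (hypothetical helper, not part of GTO): prepend
# <repository>/bin to PATH. The argument is the path of the cloned
# repository; it defaults to ./gto.
use_prebuilt_gto() {
  local bin_dir="${1:-gto}/bin"
  if [ ! -d "$bin_dir" ]; then
    echo "use_prebuilt_gto: directory not found: $bin_dir" >&2
    return 1
  fi
  # Store an absolute path so PATH stays valid after a later cd.
  bin_dir="$(cd "$bin_dir" && pwd)"
  export PATH="$bin_dir:$PATH"
}
```

Define the function in the shell you work in, or source a file that contains it. Then call it as use_prebuilt_gto gto from the directory where you ran git clone, or as use_prebuilt_gto .. from inside gto/src/. The change lasts only for that shell session.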

GTO Programs

GTO provides pipe support, so most of its tools can be chained together. They include programs to shuffle, transform, simulate, compress and visualise data, among others. The tools are grouped by the type of data they handle:

1. FASTQ tools

2. FASTA tools

3. Genomic sequence tools

4. Amino acid sequence tools

5. General purpose tools


You can contribute to this project through its GitHub repository (https://github.com/cobilab/gto).

Core team