GDE Linux Integrated Software

An integrated Linux environment for bioinformatics and evolutionary analysis, based on the Genetic Data Environment (GDE).

If you use GDE, please cite the following paper:
de Oliveira T, Miller R, Tarin M, Cassol S. An Integrated Genetic Data Environment (GDE)-Based LINUX Interface for Analysis of HIV-1 and Other Microbial Sequences. (2003) Bioinformatics, 19(1): 153-4.

Phylogenetic Software: Alignment, Consensus, DNA translation and Assembly Software:
BIRCH & GDE:  Sequence database integration PERL scripts:
  • BIRCH is a resource for molecular biology, consisting of software and databases. Most of the programs and databases of BIRCH have been unified through the Genetic Data Environment (GDE).
  • XYLEM Tools for Manipulation of Genetic Databases. 

PAUP - Official PAUP Web Page
Abstract (from PAUP* web pages):
PAUP* version 4.0 is a major upgrade and new release of the software package for inference of evolutionary trees, for use in Macintosh, Windows, UNIX/VMS, or DOS-based formats. The influence of high-speed computer analysis of molecular, morphological and/or behavioral data to infer phylogenetic relationships has expanded well beyond its central role in evolutionary biology, now encompassing applications in areas as diverse as conservation biology, ecology, and forensic studies. The success of previous versions of PAUP: Phylogenetic Analysis Using Parsimony has made it the most widely used software package for the inference of evolutionary trees. In addition, the PAUP manual has proven to be an essential guide, serving as a comprehensive introduction to phylogenetic analysis for beginning researchers, as well as an important reference for experts in the field. With the inclusion of maximum likelihood and distance methods in PAUP* 4.0, the new version represents a great improvement over its predecessors. In addition, the speed of the branch-and-bound algorithm has been enhanced and a number of new features have been added, from agreement subtrees to tests for combinability of data and permutation tests for nonrandomness of data structure. These, along with many other improvements, will make PAUP* 4.0 an even more indispensable tool in comparative biological analysis than were previous editions of the program and manual. PAUP* 4.0 and MacClade 3 use a common data file format (NEXUS), allowing easy interchange of data between the two programs.

PHYLIP (the PHYLogeny Inference Package) - Official Phylip Web Page
PHYLIP is a freely available package of programs for inferring phylogenies. PHYLIP can perform a large number of analyses, including parsimony, distance matrix and likelihood methods as well as bootstrapping and consensus trees. Data types that can be handled include DNA sequences, gene frequencies, RFLP data, distance matrices and discrete character data (0/1).

PAML - Official PAML Web Page
A short summary of the types of analyses performed by different programs in the package is given below (extracted from documentation).

baseml: ML analysis of nucleotide sequences: estimation of tree topology, branch lengths, and substitution parameters under a variety of nucleotide substitution models; constant or gamma rates for sites; molecular clock (rate constancy among lineages) or no clock, among-gene and within-gene variation of substitution rates; models for combined analyses of multiple sequence data sets; calculation of substitution rates at sites; reconstruction of ancestral nucleotides.

basemlg: ML analysis of nucleotide sequences under the model of gamma rates among sites.

codonml: ML analysis of protein-coding DNA sequences using codon substitution models (e.g., Goldman and Yang 1994); calculation of the codon-usage table; estimation of synonymous and nonsynonymous substitution rates; likelihood ratio test of positive selection or relaxed selective constraints along lineages; identification of amino acid sites or evolutionary lineages potentially under positive selection; reconstruction of ancestral codon sequences.

aaml: ML analysis of amino acid sequences under a number of amino acid substitution models; constant or gamma-distributed rates among sites; molecular clock (rate constancy among lineages) or no clock, among-gene and within-gene variation of substitution rates; models for combined analyses of multiple gene data; calculation of substitution rates at sites; reconstruction of ancestral amino acid sequences.

pamp: Parsimony-based analyses for a given tree topology, estimation of the substitution pattern, the gamma parameter for variable rates among sites, and reconstruction of ancestral character states.

mcmctree: Bayesian estimation of phylogenies using DNA sequence data (Rannala and Yang 1996; Yang and Rannala 1997); Markov chain Monte Carlo calculation of posterior probabilities of trees.

evolver: This program does miscellaneous things, such as listing all rooted and unrooted trees for a given number of species, generating random trees with branch lengths from a birth-death process with species sampling, and calculating tree bipartition distances. It now also simulates nucleotide, codon, or amino acid sequence data sets.
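The number of trees that evolver has to list grows very quickly with the number of species. As a minimal sketch (this is not evolver's own code, and the function names are illustrative), the standard counts for strictly bifurcating labelled trees are (2n - 5)!! unrooted and (2n - 3)!! rooted trees for n species:

```python
def double_factorial(k: int) -> int:
    """Return k!! = k * (k - 2) * (k - 4) * ... down to 1 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_species: int) -> int:
    """Number of unrooted bifurcating labelled trees: (2n - 5)!! for n >= 3."""
    if n_species < 3:
        raise ValueError("need at least 3 species for an unrooted tree")
    return double_factorial(2 * n_species - 5)


def count_rooted_trees(n_species: int) -> int:
    """Number of rooted bifurcating labelled trees: (2n - 3)!! for n >= 2."""
    if n_species < 2:
        raise ValueError("need at least 2 species for a rooted tree")
    return double_factorial(2 * n_species - 3)


if __name__ == "__main__":
    for n in (4, 5, 10):
        print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

With 10 species there are already more than two million unrooted trees, which is why exhaustive listing is only practical for small data sets.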

yn00: This program implements the method of Yang and Nielsen (2000) for estimating synonymous and nonsynonymous substitution rates in pairwise comparison of protein-coding DNA sequences.

Tree-Puzzle - Official Tree Puzzle Web Page
    TREE-PUZZLE is a computer program to reconstruct phylogenetic trees from molecular sequence data by maximum likelihood. It implements a fast tree search algorithm, quartet puzzling, that allows analysis of large data sets and automatically assigns estimations of support to each internal branch. TREE-PUZZLE also computes pairwise maximum likelihood distances as well as branch lengths for user-specified trees. Branch lengths can be calculated under the clock assumption. In addition, TREE-PUZZLE offers a novel method, likelihood mapping, to investigate the support of a hypothesized internal branch without computing an overall tree and to visualize the phylogenetic content of a sequence alignment. TREE-PUZZLE also conducts a number of statistical tests on the data set (chi-square test for homogeneity of base composition, likelihood ratio clock test, Kishino-Hasegawa test). The models of substitution provided by TREE-PUZZLE are TN, HKY, F84, SH for nucleotides, Dayhoff, JTT, mtREV24, VT, WAG, BLOSUM 62 for amino acids, and F81 for two-state data. Rate heterogeneity is modeled by a discrete Gamma distribution and by allowing invariable sites. The corresponding parameters can be inferred from the data set.

LAMARC - Official LAMARC Web Page
    Abstract (from LAMARC webpages):
    LAMARC is a package of programs for computing population parameters, such as population size, population growth rate and migration rates, by using likelihoods for samples of data (sequences, microsatellites, and electrophoretic polymorphisms) from populations. It approximates the summation of likelihood over all possible gene genealogies that could explain the observed sample. The programs are memory-intensive but can run effectively on workstations or modern microcomputers. The package is continually expanding and more executables for different machines will become available soon. LAMARC currently has four core programs: Coalesce estimates the effective population size of a single constant population using nonrecombining sequences; Fluctuate estimates the effective population size and an exponential growth rate of a single growing population using nonrecombining sequences; Migrate estimates the effective population sizes and migration rates of n constant populations using nonrecombining sequences, microsatellite data or enzyme electrophoretic data; and Recombine estimates the effective population size and per-site recombination rate of a single constant-size population.
  • Migrate Manual
  • Coalesce Manual
  • Fluctuate Manual

TREETOOL - Official Treetool Web Page
Treetool is an interactive tool for displaying, editing, and printing phylogenetic trees. The tree is displayed visually on screen, in various formats, and the user is able to modify the format, structure, and characteristics of the tree. Trees may be viewed, compared, formatted for printing, constructed from smaller trees, etc.
Treetool works with Newick format tree files (PAUP and PHYLIP compatible). It handles multifurcating trees, branch lengths (evolutionary distances), rooted/unrooted trees, and multiple trees per file. It can print to a PostScript printer, or output PICT graphics for Macintosh drawing programs (MacDraw).

TreeView - Official TreeView Web Page
TreeView X is a program to display phylogenetic trees on Linux and Unix platforms. It can read and display NEXUS and Newick format tree files (such as those output by PAUP*, ClustalX, TREE-PUZZLE, and other programs). It has a subset of the functionality of the version of TreeView available for Windows and Macintosh (it is roughly equivalent to version 0.95 of TreeView).

ClustalW - Official ClustalW Web Page
Clustal W (Thompson et al., 1994) is a global alignment program for DNA or protein sequence data. It uses a progressive alignment algorithm and a guide tree based on sequence similarity to align DNA or amino acid sequences. It aligns similar sequences first before aligning more distant sequences. The user specifies a gap cost which consists of the cost of opening a gap plus the cost of lengthening a gap. ClustalW v. 1.81 is now more compatible with GCG and has new features which allow biologically meaningful alignment of divergent sequences. For recent reviews comparing multiple alignment algorithms see Hickson et al. (2000), Thompson et al. (1999), and McClure et al. (1994). Morrison and Ellis (1997) discuss the effects of nucleotide sequence alignment on the estimation of phylogenetic hypotheses. You can visit the ClustalX site to see the graphics interface.
    Some Features of CLUSTALW:

    1. Individual weights are assigned to each sequence in a partial alignment in order to down-weight sequences that are nearly identical and up-weight the most divergent ones.

    2. Amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned.

    3. Residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure.

    4. Positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions.

    5. Several series of amino acid substitution matrices are available, and user-defined matrices can be used.

  • Clustalw Manual.
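The gap cost described above (a cost for opening a gap plus a cost for lengthening it) is an affine gap penalty. The following is a minimal sketch of that scoring rule only, not ClustalW's implementation: the function name and the penalty values are illustrative, and it assumes that every gap position, including the first, pays the extension cost. ClustalW additionally adjusts these penalties by residue and position, which is not modelled here.

```python
def affine_gap_cost(gap_length: int, gap_open: float, gap_extend: float) -> float:
    """Cost of one contiguous gap under an affine model.

    Assumption: cost = gap_open + gap_extend * gap_length, so the first
    gap position pays both the opening and one extension cost.
    """
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length == 0:
        return 0.0  # no gap, no penalty
    return gap_open + gap_extend * gap_length


if __name__ == "__main__":
    # Illustrative values only; these are not ClustalW defaults.
    one_long_gap = affine_gap_cost(3, gap_open=10.0, gap_extend=0.5)
    three_short_gaps = 3 * affine_gap_cost(1, gap_open=10.0, gap_extend=0.5)
    print(one_long_gap, three_short_gaps)
```

Because the opening cost is paid once per gap, a single gap of length three is cheaper than three separate gaps of length one, which favours alignments with fewer, longer gaps.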

Phrap - Official Phrap Web Page
    Phrap is a program for shotgun sequence assembly. Key features include:

    --Use of data quality information, both direct (from phred trace analysis) and indirect (from pairwise read comparisons), to delineate the likely accurate base calls in each read; this helps discriminate repeats, permits use of the full reads in assembly, and allows a highly accurate consensus sequence to be generated. A probability of error is computed for each consensus sequence position, which can be used to focus human editing on particular regions, to automate decision-making about where additional data are needed, and to provide users of the final sequence with information about local variations in quality.

  • Phrap Manual.

Translate - Official Translate Web Page
The Translate program translates selected sequences from DNA/RNA to amino acids. It can be used with either a packed FASTA format sequence or a GDE format sequence. The frame number is appended to the sequence name as '.#'.
This program is a modified version of the 'Translate' program distributed with the GDE (Genetic Data Environment) package. The present version was produced by John C. Kelley and Rao Parasa of CIT at NIH.
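The naming convention above can be illustrated with a short sketch. This is not the Translate program's own code: it assumes the standard genetic code, translates only the three forward reading frames, writes unknown codons as 'X' and stop codons as '*', and uses hypothetical function names.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence: str, frame: int = 1) -> str:
    """Translate a DNA/RNA sequence in forward frame 1, 2, or 3."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2, or 3")
    dna = sequence.upper().replace("U", "T")  # treat RNA as DNA
    protein = []
    # Step through complete codons only; trailing bases are ignored.
    for start in range(frame - 1, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[start:start + 3], "X"))
    return "".join(protein)


def translate_frames(name: str, sequence: str) -> dict:
    """Return {name.frame: protein}, with the frame appended as '.#'."""
    return {f"{name}.{frame}": translate(sequence, frame) for frame in (1, 2, 3)}


if __name__ == "__main__":
    for label, protein in translate_frames("seq1", "ATGGCCTAA").items():
        print(f">{label}\n{protein}")
```

For the input name "seq1", the translated sequences are labelled seq1.1, seq1.2, and seq1.3.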

BLAST - Official BLAST Web Page
BLAST (Basic Local Alignment Search Tool) is the heuristic search algorithm employed by the programs blastp, blastn, blastx, tblastn, tblastx and blast3.

NOTE: To differentiate the local BLAST programs from their network counterpart, an 'l' is prepended to the name of each local blast program.

The programs are used for the following purposes:

blastp - to compare an amino acid query sequence vs. a protein sequence database.

blastn - to compare a nucleotide query sequence vs. a nucleotide sequence database.

blastx - to compare a nucleotide query sequence translated in all reading frames vs. a protein sequence database.

tblastn - to compare a protein query sequence vs. a nucleotide sequence database dynamically translated in all reading frames.

tblastx - to compare a nucleotide query sequence translated in all reading frames vs. a nucleotide sequence database translated in all reading frames.

blast3 - protein database search for three-way alignments, using the BLAST pairwise search algorithm.
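The choice of program depends only on the type of the query and the type of the database. As a rough sketch, the list above can be written as a lookup table; the structure and function names are hypothetical, blast3 is left out because it performs three-way alignments rather than a simple query-versus-database comparison, and the 'l' prefix follows the note above.

```python
# program: (query type, database type, query translated?, database translated?)
BLAST_PROGRAMS = {
    "blastp": ("protein", "protein", False, False),
    "blastn": ("nucleotide", "nucleotide", False, False),
    "blastx": ("nucleotide", "protein", True, False),
    "tblastn": ("protein", "nucleotide", False, True),
    "tblastx": ("nucleotide", "nucleotide", True, True),
}


def programs_for(query_type: str, database_type: str) -> list:
    """List the programs that accept this query/database combination."""
    return [
        name
        for name, (query, database, _, _) in BLAST_PROGRAMS.items()
        if query == query_type and database == database_type
    ]


def local_name(program: str) -> str:
    """Name of the local version: an 'l' prepended to the program name."""
    if program not in BLAST_PROGRAMS:
        raise ValueError(f"unknown BLAST program: {program}")
    return "l" + program


if __name__ == "__main__":
    for name in programs_for("nucleotide", "nucleotide"):
        print(name, "->", local_name(name))
```

A nucleotide query against a nucleotide database matches two programs: blastn compares the sequences directly, while tblastx compares their translations.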

FASTA - Official FASTA Web Page
Compares a protein sequence to another protein sequence or to a protein database, or a DNA sequence to another DNA sequence or a DNA library.

VESPA - Official VESPA Web Page
The VESPA program detects signature patterns (atypical amino acid or nucleotide residues) in a set of query sequences relative to a set of reference sequences. VESPA calculates the frequency of each amino acid (or nucleotide) at each position (column) in an alignment for the query and the reference set, and selects the positions for which the most common character in the query set differs from that in the reference (background) set. The frequencies of characters at the distinguishing sites are also calculated. See Korber B and Myers G (1992): Signature pattern analysis: a method for assessing viral sequence relatedness. AIDS Res. Human Retroviruses 8(9): 1549-1560.
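The column-by-column comparison described above can be sketched as follows. This is an illustration of the idea, not VESPA's code: the names are hypothetical, both sets are assumed to be aligned to the same length, positions are reported 1-based, and ties for the most common character are broken by first occurrence.

```python
from collections import Counter
from typing import List, NamedTuple


class SignatureSite(NamedTuple):
    position: int            # 1-based alignment column
    query_char: str          # most common character in the query set
    query_freq: float        # its frequency in the query set
    reference_char: str      # most common character in the reference set
    reference_freq: float    # its frequency in the reference set


def most_common_char(column: str):
    """Return (character, frequency) for the most common character."""
    char, count = Counter(column).most_common(1)[0]
    return char, count / len(column)


def signature_sites(query: List[str], reference: List[str]) -> List[SignatureSite]:
    """Find columns whose most common character differs between the sets."""
    if not query or not reference:
        raise ValueError("both sequence sets must be non-empty")
    length = len(query[0])
    if any(len(seq) != length for seq in query + reference):
        raise ValueError("all sequences must have the same aligned length")
    sites = []
    for col in range(length):
        q_char, q_freq = most_common_char("".join(s[col] for s in query))
        r_char, r_freq = most_common_char("".join(s[col] for s in reference))
        if q_char != r_char:
            sites.append(SignatureSite(col + 1, q_char, q_freq, r_char, r_freq))
    return sites


if __name__ == "__main__":
    query_set = ["ACDE", "ACDE", "ACFE"]
    reference_set = ["ACFE", "ACFE", "GCFE"]
    for site in signature_sites(query_set, reference_set):
        print(site)
```

In this toy example only column 3 is a signature site: D is the most common character in the query set, while F is the most common in the reference set.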

Readseq is a reformatting program for DNA or protein sequence data developed by Dr. Don Gilbert. It accepts single or multiple sequences in a variety of common formats and converts them to a specified format. Additional information, including the list of accepted formats, is available at the Baylor College of Medicine site. A complete history of this program can be found at the University of Manitoba site. It is widely used to convert between sequence data formats so that sequences can be read by a specific application, and it can extract the raw sequence from any supported database format. The program has some limitations in terms of the total number of sequences and compatible input formats.

Page last updated by Tulio de Oliveira.