GDE Linux Integrated Software
An integrated Linux environment for
bioinformatics and evolutionary analysis based on the Genetic
Data Environment (GDE).
If you use GDE, please cite the following paper:
de Oliveira T, Miller R, Tarin M, Cassol S. An Integrated Genetic Data Environment (GDE)-Based LINUX Interface for Analysis of HIV-1 and Other Microbial Sequences. (2003) Bioinformatics, 19(1): 153-4.
PAUP web page
Abstract (from PAUP* web pages):
PAUP* version 4.0 is a major upgrade and new release of the software
package for inference of evolutionary trees, for use in Macintosh,
Windows, UNIX/VMS, or DOS-based formats. The influence of high speed
computer analysis of molecular, morphological and/or behavioral data to
infer phylogenetic relationships
has expanded well beyond its central role in evolutionary biology, now
encompassing applications in areas as diverse as conservation biology,
ecology, and forensic studies. The success of previous versions of
PAUP: Phylogenetic Analysis
Using Parsimony has made it the most widely used software package for
inference of evolutionary trees. In addition, the PAUP manual has proven
to be an essential guide, serving as a comprehensive introduction to
phylogenetic analysis for beginning researchers, as well as an important
reference for experts in the field. With the inclusion of maximum
likelihood and distance methods in PAUP* 4.0, the new version represents
a great improvement over its predecessors. In addition, the speed of the
branch-and-bound algorithm has been enhanced and a number of new features
have been added, from agreement subtrees to tests for combinability of
data and permutation tests for randomness of data structure. These, along
with many other improvements, will make PAUP* 4.0 an even more
indispensable tool in comparative biological analysis than were previous
editions of the program and manual. PAUP* and MacClade 3 use a common
data file format (NEXUS), allowing easy interchange of data between the
two programs.
PHYLIP (the PHYLogeny Inference Package)
Phylip Web Page
PHYLIP is a freely available package of programs for inferring
phylogenies. PHYLIP can perform a large number of analyses, including
parsimony, distance matrix and likelihood methods as well as
bootstrapping and consensus trees. Data types that can be handled
include DNA sequences, gene frequencies, RFLP
data, distance matrices and discrete character data (0/1).
PAML - Official PAML Web Page
A short summary of the types of analyses performed by the
different programs in the package is given below (extracted from
the PAML documentation):
baseml: ML analysis of nucleotide sequences: estimation of
tree topology, branch lengths, and substitution parameters under a
variety of nucleotide substitution models; constant or gamma rates for
sites; molecular clock (rate constancy among lineages) or no clock,
among-gene and within-gene variation of substitution rates; models for
combined analyses of multiple sequence data sets; calculation of
substitution rates at sites; reconstruction of ancestral nucleotides.
basemlg: ML analysis of nucleotide sequences under
gamma rates among sites.
codonml: ML analysis of protein-coding DNA sequences using
codon substitution models (e.g., Goldman and Yang 1994); calculation of
the codon-usage table; estimation of synonymous and nonsynonymous
substitution rates; likelihood ratio test of positive selection or
relaxed selective constraints along
lineages; identification of amino acid sites or evolutionary lineages
under positive selection; reconstruction of ancestral codon sequences.
aaml: ML analysis of amino acid sequences under a variety
of amino acid substitution models; constant or gamma-distributed rates
among sites; molecular clock (rate constancy among lineages) or no
clock, among-gene and
within-gene variation of substitution rates; models for combined
analyses of multiple gene data; calculation of substitution rates at
sites; reconstruction of ancestral amino acid sequences.
pamp: Parsimony-based analyses for a given tree topology:
estimation of the substitution pattern, the gamma parameter for
variable rates among sites, and reconstruction of ancestral character
states.
mcmctree: Bayesian estimation of phylogenies using DNA
sequence data (Rannala and Yang 1996; Yang and Rannala 1997); Markov
chain Monte Carlo
calculation of posterior probabilities of trees.
evolver: This program does miscellaneous things, such as
listing all rooted and unrooted trees for a given number of species,
generating random trees with branch lengths from a birth-death process
with species sampling, and calculating tree bipartition distances. It
now also simulates nucleotide, codon, or amino acid sequence data sets.
yn00: This program implements the method of Yang and Nielsen (2000)
for estimating synonymous and nonsynonymous substitution rates in
pairwise comparison of protein-coding DNA sequences.
Tree-Puzzle - Official Tree Puzzle Web Page
TREE-PUZZLE is a computer program to reconstruct phylogenetic trees
from molecular sequence data by maximum likelihood. It implements a fast
tree search algorithm, quartet puzzling, that allows analysis of large
data sets and automatically assigns estimations of support to each
internal branch. TREE-PUZZLE also computes pairwise maximum likelihood
distances as well as branch lengths for user-specified trees. Branch
lengths can be calculated under the clock assumption. In addition,
TREE-PUZZLE offers a novel method, likelihood mapping, to investigate the
support of a hypothesized internal branch without computing an overall
tree and to visualize the phylogenetic content of a sequence alignment.
TREE-PUZZLE also conducts a number of statistical tests on the data set
(chi-square test for homogeneity of base composition, likelihood ratio
clock test, Kishino-Hasegawa test). The models of substitution provided
by TREE-PUZZLE are TN, HKY, F84, SH for nucleotides, Dayhoff, JTT,
mtREV24, VT, WAG, BLOSUM 62 for amino acids, and F81 for two-state data.
Rate heterogeneity is modeled by a discrete Gamma distribution and by
allowing invariable sites. The corresponding parameters can be inferred
from the data set.
LAMARC - Official LAMARC Web Page
Abstract (from LAMARC webpages):
LAMARC is a package of programs for computing population parameters,
such as population size, population growth rate and migration rates by
using likelihoods for samples of data (sequences, microsatellites, and
electrophoretic polymorphisms) from populations. It approximates the
summation of likelihood over all possible gene genealogies that could
explain the observed sample. The programs are memory-intensive but can
run effectively on workstations or modern microcomputers. The package is
continually expanding and more executables for different machines will
become available soon. LAMARC has four core programs: Coalesce estimates
the effective population size of a single constant population using
nonrecombining sequences; Fluctuate estimates the effective population
size and an exponential growth rate of a single growing population using
nonrecombining sequences; Migrate estimates the effective population
sizes and migration rates of n constant populations using nonrecombining
sequences, microsatellite data or enzyme electrophoretic data; and
Recombine estimates the effective population size and per-site
recombination rate of a single constant-size population.
TREETOOL - Official Treetool Web Page
Treetool is an interactive tool for displaying, editing, and printing
phylogenetic trees. The tree is displayed visually on screen, in
various formats, and
the user is able to modify the format, structure, and characteristics of
the tree. Trees may be viewed, compared, formatted for printing,
constructed from smaller trees, etc.
Treetool works with Newick format tree files (PAUP and PHYLIP
compatible). It handles multifurcating trees, branch lengths
(evolutionary distances), rooted/unrooted trees, and multiple trees per
file. It can print to a PostScript printer, or output PICT graphics for
Macintosh drawing programs (MacDraw).
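For readers unfamiliar with the Newick format mentioned above, the following Python sketch shows what a file with branch lengths, a multifurcation, and more than one tree can look like, and how leaf names might be pulled out of it. It is only an illustration and not part of Treetool: the function name leaf_names and the sample trees are hypothetical, and the regular expression assumes simple unquoted taxon names with no embedded comments.

```python
import re

def leaf_names(newick_text):
    """Return one list of leaf (taxon) names per tree in a Newick string.

    Assumption: names are unquoted and contain no parentheses, commas,
    colons, semicolons, or whitespace.
    """
    # Each tree in a Newick file ends with a semicolon.
    trees = [t.strip() for t in newick_text.split(";") if t.strip()]
    # A leaf name directly follows "(" or "," and stops at a delimiter.
    pattern = re.compile(r"[(,]\s*([^(),:;\s]+)")
    return [pattern.findall(tree) for tree in trees]

# Two hypothetical trees: the first has branch lengths and a
# three-way split at the root, the second has topology only.
sample = """
((A:0.1,B:0.2):0.05,C:0.3,D:0.4);
(A,(B,C),D);
"""

for names in leaf_names(sample):
    print(names)
```

Internal node labels and branch lengths are skipped because they follow ")" or ":" rather than "(" or ",".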
TreeView - Official TreeView Web Page
TreeView X is a program to display phylogenetic trees on Linux and Unix
platforms. It can read and display NEXUS and Newick format tree files
(such as those output by PAUP*, ClustalX, TREE-PUZZLE, and other
programs). It has a subset of the functionality of the version of
TreeView available for Windows and Macintosh (it is roughly equivalent
to version 0.95 of TreeView).
Clustalw - Official Clustalw Web Page
Clustal W (Thompson et al., 1994) is a global alignment program for DNA
or protein sequence data. It uses a progressive alignment algorithm and
a guide tree based on sequence similarity to align DNA or amino acid
sequences. It aligns similar sequences first before aligning more
distant sequences. The user specifies a gap cost which consists of the
cost of opening a gap plus the cost of lengthening a gap. ClustalW v.
1.81 is now more compatible with GCG and has new features which allow
biologically meaningful alignment of divergent sequences. For recent
reviews comparing multiple alignment algorithms see Hickson et al.
(2000), Thompson et al. (1999), and McClure et al. (1994). Morrison and
Ellis (1997) discuss the effects of nucleotide sequence alignment on
the estimation of phylogenetic hypotheses. You can visit the ClustalX
site to see the graphics interface.
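The gap cost described above is commonly called an affine gap penalty. As a rough illustration only (this is not ClustalW's actual implementation, which also adjusts penalties by position and residue), the Python sketch below charges an opening cost once and an extension cost for each further position. The function name gap_cost and the default values are assumptions chosen for the example, not ClustalW defaults.

```python
def gap_cost(length, open_cost=10.0, extend_cost=0.5):
    """Affine gap cost: pay open_cost once, then extend_cost for each
    position beyond the first. A gap of length 0 costs nothing.

    The default costs are arbitrary illustration values.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0.0
    return open_cost + extend_cost * (length - 1)

for n in (1, 2, 5):
    print(n, gap_cost(n))
```

Under this scheme one gap of length 4 costs less than two separate gaps of length 2, which is why affine penalties favor a few long gaps over many short ones.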
Some Features of CLUSTALW:
1. Individual weights are assigned to each sequence in a partial
alignment in order to down-weight sequences that are nearly identical
and up-weight the most divergent ones.
2. Amino acid substitution matrices are varied at different
alignment stages according to the divergence of the sequences to be
aligned.
3. Residue-specific gap penalties and locally reduced gap penalties in
hydrophilic regions encourage new gaps in potential loop regions rather
than regular secondary structure.
4. Positions in early alignments where gaps have been opened
receive locally reduced gap penalties to encourage the opening up of
new gaps at these positions.
5. Several series of amino acid substitution matrices are available,
and user-defined matrices can be used.
Phrap - Official Phrap Web Page
Phrap is a program for shotgun sequence assembly. Key features include:
--Use of data quality information, both direct (from phred trace
analysis) and indirect (from pairwise read comparisons), to delineate
the likely accurate base calls in each read; this helps discriminate
repeats, permits use of the full reads in assembly, and allows a highly
accurate consensus sequence to be generated. A probability of error is
computed for each consensus sequence position, which can be used to focus
human editing on particular regions, to automate decision-making about
where additional data are needed, and to provide users of the final
sequence with information about local variations in quality.
Translate - Official Translate Web Page
The Translate program translates selected sequences from DNA/RNA to
amino acids. It accepts either a packed FASTA format sequence or a GDE
format sequence. The frame number is appended to the sequence name as
'.#'.
This Translate program is a modified version of the 'Translate' program
distributed with the GDE (Genetic Data Environment) package. The present
version is produced by John C. Kelley and Rao Parasa of CIT at NIH.
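To make the frame-naming convention above concrete, here is a minimal Python sketch of translating a nucleotide sequence in each forward reading frame and appending the frame number to the sequence name. It is not the Translate program's own code. The function names, the use of the standard genetic code, the restriction to forward frames 1 to 3, and the use of 'X' for codons containing ambiguous bases are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(seq, frame=1):
    """Translate DNA or RNA in forward frame 1, 2, or 3.

    Stop codons become '*'; codons with unrecognized bases become 'X'.
    Trailing bases that do not fill a codon are ignored.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2, or 3")
    seq = seq.upper().replace("U", "T")  # treat RNA like DNA
    codons = (seq[i:i + 3] for i in range(frame - 1, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def translate_frames(name, seq):
    """Return (name.frame, protein) pairs for the three forward frames."""
    return [(f"{name}.{frame}", translate(seq, frame)) for frame in (1, 2, 3)]

for label, protein in translate_frames("seq1", "ATGGCCTAA"):
    print(label, protein)
```

Each output name carries the frame as a suffix (seq1.1, seq1.2, seq1.3), mirroring the '.#' convention described above.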
BLAST - Official BLAST Web Page
BLAST (Basic Local Alignment Search Tool) is the heuristic search
algorithm employed by the programs blastp, blastn, blastx, tblastn,
tblastx and blast3.
NOTE: To differentiate the local BLAST programs from their
counterpart, an 'l' is prepended to the name of each local blast
program.
The programs are used for the following purposes:
blastp - to compare an amino acid query sequence vs. a
protein sequence database.
blastn - to compare a nucleotide query sequence vs. a
nucleotide sequence database.
blastx - to compare a nucleotide query sequence translated
in all reading frames vs. a protein sequence database.
tblastn - to compare a protein query sequence vs. a
nucleotide sequence database dynamically translated in all reading
frames.
tblastx - to compare a nucleotide query sequence translated
in all reading frames vs. a nucleotide sequence database translated in
all reading frames.
blast3 - protein database search for three-way alignments
using the BLAST pairwise search algorithm.
FASTA - Official FASTA Web Page
Compares a protein sequence to another protein sequence or to a protein
database, or a DNA sequence to another DNA sequence or a DNA library.
VESPA - Official VESPA Web Page
The VESPA program detects signature patterns (atypical amino acid or
nucleotide residues) in a set of query sequences relative to a set of
reference sequences. VESPA calculates the frequency of each amino
acid (or nucleotide) at each position (column) in an alignment for the
query and the reference sets and selects the positions for which the
most common character in the query set differs from that in the
background set. The frequencies of
characters at the distinguishing sites are also calculated. See Korber
B and Myers
G: Signature pattern analysis: a method for assessing viral sequence
relatedness AIDS Res. Human Retroviruses 8(9): 1549-1560 (1992).
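The selection rule described above can be sketched in a few lines of Python. This is a simplified illustration, not VESPA's code: the function names and toy alignments are hypothetical, gap characters are treated like any other character, and ties for the most common character are broken by first occurrence, which may differ from the real program.

```python
from collections import Counter

def column_frequencies(seqs, col):
    """Relative frequency of each character in one alignment column."""
    counts = Counter(s[col] for s in seqs)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def signature_sites(query, reference):
    """Find columns where the most common character differs between sets.

    Returns tuples of (1-based position, query character, its frequency,
    reference character, its frequency). Ties go to the character seen
    first in the column (an arbitrary choice for this sketch).
    """
    length = len(query[0])
    if any(len(s) != length for s in list(query) + list(reference)):
        raise ValueError("all sequences must be aligned to the same length")
    sites = []
    for col in range(length):
        q = column_frequencies(query, col)
        r = column_frequencies(reference, col)
        q_top = max(q, key=q.get)
        r_top = max(r, key=r.get)
        if q_top != r_top:
            sites.append((col + 1, q_top, q[q_top], r_top, r[r_top]))
    return sites

# Toy aligned protein fragments (hypothetical data).
query_set = ["MKTA", "MRTA", "MRSA"]
reference_set = ["MKTA", "MKTV", "MKTV"]

for site in signature_sites(query_set, reference_set):
    print(site)
```

In the toy data, positions 2 and 4 are reported because the majority character in the query set (R and A) differs from the majority character in the reference set (K and V).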
Readseq is a reformatting program for DNA or protein sequence data
developed by Dr. Don Gilbert. It allows the input of single or multiple
sequences in a variety of common formats and converts them to a specified
format. Additional information, including acceptable formats, is
available at the Baylor College of Medicine site. A complete history of
this program can be found at the University of Manitoba site. It is a
widely used tool to convert sequence data formats and is used to make the
sequences palatable for a particular application. It can be used to
extract the raw sequence from any database format. The program has some
limitations in terms of the total number of sequences and compatible
input formats.