

GDE Linux Integrated Software
An integrated Linux environment for
bioinformatics and evolutionary analysis based on the Genetic
Data Environment (GDE).
If you use GDE, please cite the following paper:
de Oliveira T, Miller R, Tarin M, Cassol S. An Integrated Genetic Data Environment (GDE)-Based LINUX Interface for Analysis of HIV-1 and Other Microbial Sequences. (2003) Bioinformatics, 19(1): 153-4.
PAUP - Official PAUP Web Page
Abstract (from PAUP* web pages):
PAUP* version 4.0 is a major upgrade and new release of the software
package for inference of evolutionary trees, for use in Macintosh,
Windows, UNIX/VMS, or DOS-based formats. The influence of high speed
computer analysis of molecular, morphological and/or behavioral data to
infer phylogenetic relationships
has expanded well beyond its central role in evolutionary biology, now
encompassing applications in areas as diverse as conservation biology,
ecology, and forensic studies. The success of previous versions of
PAUP: Phylogenetic Analysis Using Parsimony has made it the most widely
used software package for the inference of evolutionary trees. In
addition, the PAUP manual has proven to be an essential guide, serving
as a comprehensive introduction to phylogenetic analysis for beginning
researchers, as well as an important reference for experts in the
field. With the inclusion of maximum likelihood and distance methods in
PAUP* 4.0, the new version represents a great improvement over its
predecessors. In addition, the speed of the branch-and-bound algorithm
has been enhanced and a number of new features have been added, from
agreement subtrees to tests for combinability of data and permutation
tests for nonrandomness of data structure. These, along with many other
improvements, will make PAUP* 4.0 an even more indispensable tool in
comparative biological analysis than were previous editions of the
program and manual. PAUP* 4.0 and MacClade 3 use a common data file
format (NEXUS), allowing easy interchange of data between the two
programs.
PHYLIP (the PHYLogeny Inference Package) - Official PHYLIP Web Page
Abstract:
PHYLIP is a freely available package of programs for inferring
phylogenies. PHYLIP can perform a large number of analyses, including
parsimony, distance matrix, and likelihood methods, as well as
bootstrapping and consensus trees. Data types that can be handled
include DNA sequences, gene frequencies, RFLP data, distance matrices,
and discrete character data (0/1).
PAML - Official PAML Web Page
A short summary of the types of analyses performed by
different programs in the package is given below (extracted from
documentation).
baseml: ML analysis of nucleotide sequences: estimation of tree
topology, branch lengths, and substitution parameters under a variety
of nucleotide substitution models; constant or gamma rates for sites;
molecular clock (rate constancy among lineages) or no clock; among-gene
and within-gene variation of substitution rates; models for combined
analyses of multiple sequence data sets; calculation of substitution
rates at sites; reconstruction of ancestral nucleotides.
basemlg: ML analysis of nucleotide sequences under the model of gamma
rates among sites.
codonml: ML analysis of protein-coding DNA sequences using codon
substitution models (e.g., Goldman and Yang 1994); calculation of the
codon-usage table; estimation of synonymous and nonsynonymous
substitution rates; likelihood ratio test of positive selection or
relaxed selective constraints along lineages; identification of amino
acid sites or evolutionary lineages potentially under positive
selection; reconstruction of ancestral codon sequences.
aaml: ML analysis of amino acid sequences under a number of amino acid
substitution models; constant or gamma-distributed rates among sites;
molecular clock (rate constancy among lineages) or no clock; among-gene
and within-gene variation of substitution rates; models for combined
analyses of multiple gene data; calculation of substitution rates at
sites; reconstruction of ancestral amino acid sequences.
pamp: Parsimony-based analyses for a given tree topology: estimation of
the substitution pattern, the gamma parameter for variable rates among
sites, and reconstruction of ancestral character states.
mcmctree: Bayesian estimation of phylogenies using DNA sequence data
(Rannala and Yang 1996; Yang and Rannala 1997); Markov chain Monte
Carlo calculation of posterior probabilities of trees.
evolver: This program does miscellaneous things, such as listing all
rooted and unrooted trees for a given number of species, generating
random trees with branch lengths from a birth-death process with
species sampling, and calculating tree bipartition distances. It now
also simulates nucleotide, codon, or amino acid sequence data sets.
yn00: This program implements the method of Yang and Nielsen (2000)
for estimating synonymous and nonsynonymous substitution rates in
pairwise comparisons of protein-coding DNA sequences.
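The tree listing that evolver offers is only practical for small numbers of species, because the number of possible trees grows very quickly. The sketch below is not evolver code; it only counts trees, using the standard double-factorial formulas for labeled, strictly bifurcating trees. The function names are illustrative.

```python
def double_factorial(k):
    """Return k!! = k * (k - 2) * (k - 4) * ... down to 1 or 2."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_species):
    """Number of unrooted bifurcating trees for n labeled species: (2n - 5)!!"""
    if n_species < 3:
        raise ValueError("an unrooted tree needs at least 3 species")
    return double_factorial(2 * n_species - 5)


def count_rooted_trees(n_species):
    """Number of rooted bifurcating trees for n labeled species: (2n - 3)!!"""
    if n_species < 2:
        raise ValueError("a rooted tree needs at least 2 species")
    return double_factorial(2 * n_species - 3)


if __name__ == "__main__":
    for n in (4, 5, 10):
        print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

For 10 species there are already 2,027,025 unrooted and 34,459,425 rooted trees, which is why exhaustive listing is limited to small data sets.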
Tree-Puzzle - Official Tree Puzzle Web Page
Abstract:
TREE-PUZZLE is a computer program to reconstruct phylogenetic trees
from molecular sequence data by maximum likelihood. It implements a
fast tree search algorithm, quartet puzzling, that allows analysis of
large data sets and automatically assigns estimations of support to
each internal branch. TREE-PUZZLE also computes pairwise maximum
likelihood distances as well as branch lengths for user specified
trees. Branch lengths can be calculated under the clock-assumption. In
addition, TREE-PUZZLE offers a novel method, likelihood mapping, to
investigate the support of a hypothesized internal branch without
computing an overall tree and to visualize the phylogenetic content of
a sequence alignment. TREE-PUZZLE also conducts a number of statistical
tests on the data set (chi-square test for homogeneity of base
composition, likelihood ratio clock test, Kishino-Hasegawa test). The
models of substitution provided by TREE-PUZZLE are TN, HKY, F84, SH for
nucleotides, Dayhoff, JTT, mtREV24, VT, WAG, BLOSUM 62 for amino acids,
and F81 for two-state data. Rate heterogeneity is modeled by a discrete
Gamma distribution and by allowing invariable sites. The corresponding
parameters can be inferred from the data set.
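As an illustration of the base composition test mentioned above, the following sketch computes a chi-square statistic for each sequence against the pooled composition of the whole alignment. This is a generic textbook formulation, not TREE-PUZZLE's own code, and the program's exact procedure (for example, which expected frequencies it uses) may differ. The function name and the choice of pooled frequencies are assumptions.

```python
from collections import Counter


def composition_chi_square(seqs, alphabet="ACGT"):
    """Return one chi-square statistic per sequence.

    Expected counts come from the pooled composition of all sequences
    (an assumption made for this sketch). Characters outside `alphabet`,
    such as gaps, are ignored.
    """
    pooled = Counter(ch for s in seqs for ch in s.upper() if ch in alphabet)
    total = sum(pooled.values())
    if total == 0:
        raise ValueError("no characters from the alphabet found")
    expected_freq = {ch: pooled[ch] / total for ch in alphabet}

    stats = []
    for s in seqs:
        observed = Counter(ch for ch in s.upper() if ch in alphabet)
        n = sum(observed.values())
        chi2 = 0.0
        for ch in alphabet:
            expected = n * expected_freq[ch]
            if expected > 0:
                chi2 += (observed[ch] - expected) ** 2 / expected
        stats.append(chi2)
    return stats


if __name__ == "__main__":
    print(composition_chi_square(["AAAA", "CCCC"]))
```

Each statistic would then be compared with a chi-square distribution with (alphabet size - 1) degrees of freedom; that step is omitted here to keep the sketch free of external libraries.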
LAMARC - Official LAMARC Web Page
Abstract (from LAMARC webpages):
LAMARC is a package of programs for computing population parameters,
such as population size, population growth rate and migration rates by
using likelihoods for samples of data (sequences, microsatellites, and
electrophoretic polymorphisms) from populations. It approximates the
summation of likelihood over all possible gene genealogies that could
explain the observed sample. The programs are memory-intensive but can
run effectively on workstations or modern microcomputers. The package
is continually expanding and more executables for different machines
will become available soon. LAMARC currently has four core programs:
Coalesce estimates the effective population size of a single constant
population using nonrecombining sequences; Fluctuate estimates the
effective population size and an exponential growth rate of a single
growing population using nonrecombining sequences; Migrate estimates
the effective population sizes and migration rates of n constant
populations using nonrecombining sequences, microsatellite data, or
enzyme electrophoretic data; and Recombine estimates the effective
population size and per-site recombination rate of a single
constant-size population.
- Migrate Manual
- Coalesce Manual
- Fluctuate Manual
TREETOOL - Official Treetool Web Page
Abstract:
Treetool is an interactive tool for displaying, editing, and printing
phylogenetic trees. The tree is displayed visually on screen, in
various formats, and the user is able to modify the format, structure,
and characteristics of the tree. Trees may be viewed, compared,
formatted for printing, constructed from smaller trees, etc.
Treetool works with Newick format tree files (PAUP and PHYLIP
compatible). It handles multifurcating trees, branch lengths
(evolutionary distances), rooted/unrooted trees, and multiple trees per
file. It can print to a PostScript printer, or output PICT graphics for
Macintosh drawing programs (MacDraw).
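To make the Newick format concrete, the sketch below splits a file's contents into individual trees and lists the leaf names of each one. It is not part of Treetool; the function names are illustrative, and the regular expression assumes unquoted labels without embedded punctuation, which covers simple files but not the full Newick grammar.

```python
import re

# A leaf label directly follows "(" or ","; internal node labels and
# branch lengths follow ")" or ":" and are therefore not matched.
LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)")


def split_trees(text):
    """Split file contents into Newick strings, one per ';'-terminated tree."""
    return [t.strip() + ";" for t in text.split(";") if t.strip()]


def leaf_names(newick):
    """Return leaf labels in the order they appear in the tree string."""
    return LEAF.findall(newick)


if __name__ == "__main__":
    # Two trees in one file: the first has branch lengths and a
    # multifurcating root, the second has topology only.
    contents = "((A:0.1,B:0.2):0.05,C:0.3,D);\n(A,(B,C));"
    for tree in split_trees(contents):
        print(tree, leaf_names(tree))
```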
TreeView - Official TreeView Web Page
Abstract:
TreeView X is a program to display phylogenetic trees on Linux and Unix
platforms. It can read and display NEXUS and Newick format tree files
(such as those output by PAUP*, ClustalX, TREE-PUZZLE, and other
programs). It has a subset of the functionality of the version of
TreeView available for Windows and Macintosh (it is roughly equivalent
to version 0.95 of TreeView).
Clustalw - Official Clustalw Web Page
Abstract:
Clustal W (Thompson et al., 1994) is a global alignment program for DNA
or protein sequence data. It uses a progressive alignment algorithm and
a guide tree based on sequence similarity to align DNA or amino acid
sequences. It aligns similar sequences first before aligning more
distant sequences. The user specifies a gap cost which consists of the
cost of opening a gap plus the cost of lengthening a gap. ClustalW v.
1.81 is now more compatible with GCG and has new features which allow
biologically meaningful alignment of divergent sequences. For recent
reviews comparing multiple alignment algorithms see Hickson et al.
(2000), Thompson et al. (1999), and McClure et al. (1994). Morrison and
Ellis (1997) discuss the effects of nucleotide sequence alignment on
the estimation of phylogenetic hypotheses. You can visit the ClustalX
site to see the graphics interface.
Some Features of CLUSTALW:
1. Individual weights are assigned to each sequence in a partial
alignment in order to down-weight sequences that are nearly identical
and up-weight the most divergent ones.
2. Amino acid substitution matrices are varied at different alignment
stages according to the divergence of the sequences to be aligned.
3. Residue-specific gap penalties and locally reduced gap penalties in
hydrophilic regions encourage new gaps in potential loop regions rather
than in regular secondary structure.
4. Positions in early alignments where gaps have been opened receive
locally reduced gap penalties to encourage the opening up of new gaps
at these positions.
5. Several series of amino acid substitution matrices are available,
and user-defined matrices can be used.
- Clustalw Manual.
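The two-part gap cost described in the abstract (a cost for opening a gap plus a cost for lengthening it) can be sketched as follows. This is a generic affine gap cost, not ClustalW's implementation, which additionally adjusts penalties by position and residue as listed above. The penalty values are arbitrary placeholders, and charging the extension cost for every gap column (including the first) is one of several conventions in use.

```python
from itertools import groupby


def gap_cost(length, open_cost, extend_cost):
    """Cost of one gap: a fixed opening cost plus a per-column extension cost."""
    if length <= 0:
        return 0.0
    return open_cost + extend_cost * length


def row_gap_cost(aligned_row, open_cost=10.0, extend_cost=0.5):
    """Sum the gap costs over every run of '-' in one aligned sequence."""
    total = 0.0
    for is_gap, run in groupby(aligned_row, key=lambda ch: ch == "-"):
        if is_gap:
            total += gap_cost(len(list(run)), open_cost, extend_cost)
    return total


if __name__ == "__main__":
    print(row_gap_cost("AC---GTA"))  # one gap of length 3
    print(row_gap_cost("A-C-G-TA"))  # three gaps of length 1
```

With these placeholder values, one gap of three columns costs 11.5 while three separate single-column gaps cost 31.5, so the scheme favors fewer, longer gaps.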
Phrap - Official Phrap Web Page
Abstract:
Phrap is a program for shotgun sequence assembly. Key features include:
- Use of data quality information, both direct (from phred trace
analysis) and indirect (from pairwise read comparisons), to delineate
the likely accurate base calls in each read; this helps discriminate
repeats, permits use of the full reads in assembly, and allows a highly
accurate consensus sequence to be generated. A probability of error is
computed for each consensus sequence position, which can be used to
focus human editing on particular regions, to automate decision-making
about where additional data are needed, and to provide users of the
final sequence with information about local variations in quality.
- Phrap Manual.
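The link between quality values and error probabilities mentioned above can be illustrated with the standard phred definition, Q = -10 * log10(P). The sketch below applies that definition to per-base qualities of a single read; it does not reproduce how Phrap combines reads into consensus error probabilities. The function names and the 0.01 threshold are illustrative.

```python
def error_probability(quality):
    """Convert a phred quality value Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-quality / 10)


def expected_errors(qualities):
    """Expected number of miscalled bases in a read, given its quality values."""
    return sum(error_probability(q) for q in qualities)


def low_quality_positions(qualities, max_error=0.01):
    """Return 1-based positions whose error probability exceeds max_error."""
    return [i + 1 for i, q in enumerate(qualities)
            if error_probability(q) > max_error]


if __name__ == "__main__":
    read_qualities = [40, 10, 30, 5]  # made-up values for illustration
    print(expected_errors(read_qualities))
    print(low_quality_positions(read_qualities))
```

Positions flagged this way are the kind of regions where, as the abstract notes, editing effort or additional data would be focused.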
Translate - Official Translate Web Page
Abstract:
The Translate program translates selected sequences from DNA/RNA to
amino acids. It can be used with either a packed FASTA format sequence
or a GDE format sequence. The frame number is appended to the sequence
name as '.#'.
This program is a modified version of the 'Translate' program
distributed with the GDE (Genetic Data Environment) package. The
present version was produced by John C. Kelley and Rao Parasa of CIT at
NIH.
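The translation and naming behavior described above can be sketched as follows. This is not the Translate program's code: it assumes the standard genetic code, forward frames numbered 1 to 3, and that '.#' means a period followed by the frame number. Unknown or ambiguous codons are written as 'X', which is also an assumption.

```python
from itertools import product

# Standard genetic code, with codons ordered by T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(name, seq, frame=1):
    """Translate DNA or RNA in forward frame 1, 2, or 3.

    Returns (new_name, protein), where new_name is the original name with
    '.<frame>' appended. Stop codons are written as '*'.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2, or 3")
    dna = seq.upper().replace("U", "T")  # accept RNA input
    codons = (dna[i:i + 3] for i in range(frame - 1, len(dna) - 2, 3))
    protein = "".join(CODON_TABLE.get(codon, "X") for codon in codons)
    return f"{name}.{frame}", protein


if __name__ == "__main__":
    print(translate("seq1", "ATGGCCTAA", frame=1))  # ('seq1.1', 'MA*')
    print(translate("seq1", "ATGGCCTAA", frame=2))  # ('seq1.2', 'WP')
```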
BLAST - Official BLAST Web Page
Abstract:
BLAST (Basic Local Alignment Search Tool) is the heuristic search
algorithm employed by the programs blastp, blastn, blastx, tblastn,
tblastx and blast3.
NOTE: To differentiate the local BLAST programs from their network
counterpart, an 'l' is prepended to the name of each local blast
program.
The programs are used for the following purposes:
blastp - to compare an amino acid query sequence vs. a protein sequence
database.
blastn - to compare a nucleotide query sequence vs. a nucleotide
sequence database.
blastx - to compare a nucleotide query sequence translated in all
reading frames vs. a protein sequence database.
tblastn - to compare a protein query sequence vs. a nucleotide sequence
database dynamically translated in all reading frames.
tblastx - to compare a nucleotide query sequence translated in all
reading frames vs. a nucleotide sequence database translated in all
reading frames.
blast3 - protein database search for three-way alignments, using the
BLAST pairwise search algorithm.
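The phrase "translated in all reading frames" used for blastx, tblastn, and tblastx refers to the six ways a nucleotide sequence can be read as codons: three offsets on the given strand and three on its reverse complement. The sketch below enumerates those frames as codon lists. It is only an illustration of the concept, not BLAST code, and the +1..+3 / -1..-3 labels follow a common convention that is assumed here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frames(dna):
    """Return {frame: list of codons} for frames +1, +2, +3, -1, -2, -3."""
    frames = {}
    for sign, strand in ((1, dna.upper()), (-1, reverse_complement(dna))):
        for offset in range(3):
            # Only complete codons are kept; trailing bases are dropped.
            codons = [strand[i:i + 3]
                      for i in range(offset, len(strand) - 2, 3)]
            frames[sign * (offset + 1)] = codons
    return frames


if __name__ == "__main__":
    for frame, codons in six_frames("ATGGCCTAA").items():
        print(f"{frame:+d}", " ".join(codons))
```

Each codon list would then be translated to amino acids before being compared with protein sequences.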
FASTA - Official FASTA Web Page
Abstract:
Compares a protein sequence to another protein sequence or to a protein
database, or a DNA sequence to another DNA sequence or a DNA library.
VESPA - Official VESPA Web Page
Abstract:
The VESPA program detects signature patterns (atypical amino acid or
nucleotide residues) in a set of query sequences relative to a set of
reference sequences. VESPA calculates the frequency of each amino acid
(or nucleotide) at each position (column) in an alignment for the query
and the reference set, and selects the positions for which the most
common character in the query set differs from that in the reference
(background) set. The frequencies of characters at the distinguishing
sites are also calculated. See Korber B and Myers G: Signature pattern
analysis: a method for assessing viral sequence relatedness. AIDS Res.
Human Retroviruses 8(9): 1549-1560 (1992).
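The procedure described above can be sketched in a few lines. This is an illustration of the idea, not VESPA's code: the function names are hypothetical, both sets are assumed to be aligned to the same length, and ties for the most common character are broken by first occurrence, which may differ from the program's behavior.

```python
from collections import Counter


def column_frequencies(seqs, col):
    """Return {character: frequency} for one alignment column."""
    counts = Counter(s[col] for s in seqs)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def signature_positions(query, reference):
    """Find columns where the most common character differs between sets.

    Returns a list of tuples:
    (1-based position, query char, query freq, reference char, reference freq)
    """
    length = len(query[0])
    if any(len(s) != length for s in list(query) + list(reference)):
        raise ValueError("all sequences must have the same aligned length")

    signatures = []
    for col in range(length):
        q_freq = column_frequencies(query, col)
        r_freq = column_frequencies(reference, col)
        q_top = max(q_freq, key=q_freq.get)
        r_top = max(r_freq, key=r_freq.get)
        if q_top != r_top:
            signatures.append(
                (col + 1, q_top, q_freq[q_top], r_top, r_freq[r_top]))
    return signatures


if __name__ == "__main__":
    query_set = ["ACGT", "ACGA", "ATGA"]
    reference_set = ["ACGT", "ACGT", "AGGT"]
    for row in signature_positions(query_set, reference_set):
        print(row)
```

In this toy alignment only column 4 is reported: 'A' is the most common character in the query set and 'T' in the reference set.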
READSEQ
Abstract:
Readseq is a reformatting program for DNA or protein sequence data
developed by Dr. Don Gilbert. It accepts single or multiple sequences
in a variety of common formats and converts them to a specified format.
Additional information, including a list of acceptable formats, is
available at the Baylor College of Medicine site. A complete history of
this program can be found at the University of Manitoba site. It is a
widely used tool for converting between sequence data formats and is
used to make sequences palatable for a specific application. It can be
used to extract the raw sequence from any supported database format.
The program has some limitations in terms of the total number of
sequences and compatible input formats.
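As a small illustration of the kind of conversion described above, the sketch below reads FASTA-formatted text and extracts the raw sequences. It is not Readseq and covers only this one input format; the function names are illustrative.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_raw(records):
    """Return the sequences only, one per line, without headers."""
    return "\n".join(seq for _, seq in records)


if __name__ == "__main__":
    fasta_text = ">a example\nACG\nT\n>b\nGG\n"
    print(to_raw(parse_fasta(fasta_text)))
```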