Release Notes for HIVdb, HIVseq, HIValg
|last updated Nov 16, 2010|
The presence of HIV-1 drug resistance before starting a new antiretroviral (ARV) drug treatment regimen is an independent predictor of the virological response to that regimen. Several studies have shown that the use of genotypic resistance testing prior to the start of a new treatment regimen increases the likelihood of virological response to that regimen. However, interpreting the results of HIV-1 drug resistance tests is one of the most difficult tasks facing health care providers (TF Liu & RW Shafer, CID 2006). First, there are many different drug resistance mutations (RW Shafer & JM Schapiro, AIDS Rev 2008). Second, these mutations cause varying levels of decreased susceptibility to different ARVs. Third, standard genotypic resistance tests fail to detect drug-resistance mutations that are present at low levels within a patient's virus quasispecies.
More than 50 published studies have been performed to discover rules that correlate pre-therapy drug-resistance mutations with the clinical response to a salvage therapy ARV treatment regimen, including more than 30 studies of protease inhibitors, more than 20 studies of NRTIs, and studies of NNRTIs. However, rules-discovery studies are limited in power because of the large number of drug-resistance mutations, the large number of covariates that influence virologic response, and the different patient populations, optimized background regimens, and virological endpoints used in these studies (F Brun-Vezinet et al. Antivir Ther 2004). As a result, many academic and commercial groups have developed integrated genotypic resistance interpretation systems that supplement the rules discovered in these studies with other forms of data such as the results of in vitro susceptibility testing and of in vitro and in vivo drug selection studies.
The most commonly used publicly available integrated genotypic resistance interpretation systems include the HIVdb system found here, the ANRS (Agence Nationale de Recherches sur le Sida) system, the Rega Institute system, the Antiretroscan (Italian Antiretroviral Resistance Cohort) system, and the Geno2pheno (German National Reference Center) system (Liu & Shafer, CID 2006). The most commonly used proprietary systems include the ViroSeq system, which is associated with Celera's FDA-approved HIV-1 RT and protease sequencing kit; the TrueGene system, which is associated with Siemens' FDA-approved HIV-1 RT and protease sequencing kit; the VircoType system developed by Virco Laboratories; and the GeneSeq system developed by Monogram Biosciences (Liu & Shafer, CID 2006).
Each of these systems performs the same basic function: assess how active an ARV is likely to be against a particular mutant virus compared with the drug's activity against a wildtype virus. When combined with a sound understanding of the principles of antiretroviral therapy, these systems and other Web and printed drug-resistance summaries help health care providers better understand the results of HIV-1 genotypic resistance tests. However, because these systems do not explicitly consider the relative potencies of different ARV drugs and drug combinations or the results of other relevant clinical data such as previous drug-resistance test results, ARV treatment history, plasma HIV-1 RNA levels, CD4 counts, and drug toxicity, they do not have the logical power to instruct clinicians on which ARV drugs should be used when constructing a salvage therapy regimen (Liu & Shafer 2006).
There are three programs in the HIV Drug Resistance Database which share a common code base: HIVseq, HIVdb, and HIValg. HIVseq accepts user-submitted protease, RT, and integrase sequences, compares them to the consensus subtype B reference sequence, and uses the differences as query parameters for interrogating the HIV Drug Resistance database (Shafer, D Jung, & B Betts, Nat Med 2000; Rhee SY et al. AIDS 2006). The query result provides users with the prevalence of protease, RT and integrase mutations according to subtype and PI, nucleoside RT inhibitor (NRTI), non-nucleoside RT inhibitor (NNRTI), and integrase inhibitor (INI) exposure. This allows users to detect unusual sequence results immediately so that the person doing the sequencing can check the primary sequence output while it is still on the desktop. In addition, unexpected associations between sequences or isolates can be discovered by immediately retrieving data on isolates sharing one or more mutations with the sequence.
HIVdb is an expert system that accepts user-submitted HIV-1 pol sequences and returns inferred levels of resistance to 20 FDA-approved ARV drugs including 8 PIs, 7 NRTIs, 4 NNRTIs, and - with this update - one INI. In the HIVdb system, each HIV-1 drug resistance mutation is assigned a drug penalty score and a comment; the total score for a drug is derived by adding the scores of each mutation associated with resistance to that drug. Using the total drug score, the program reports one of the following levels of inferred drug resistance: susceptible, potential low-level resistance, low-level resistance, intermediate resistance, and high-level resistance.
With this update, the PIs have been renamed as follows: atazanavir/r (ATV/r), darunavir/r (DRV/r), fosamprenavir/r (FPV/r), indinavir/r (IDV/r), lopinavir/r (LPV/r), saquinavir/r (SQV/r), and tipranavir/r (TPV/r), where "/r" indicates co-administration with low-dose ritonavir (RTV) for pharmacological "boosting". Nelfinavir (NFV), which cannot be reliably boosted by ritonavir, has not been renamed. This change has been made to indicate unambiguously that the penalty scores and activity estimates for the PIs apply to their boosted form. Indeed, LPV is co-formulated with low-dose RTV; DRV and TPV are approved only with low-dose RTV; and ATV, FPV, SQV, and IDV are usually administered with low-dose RTV. Although ATV and FPV have been approved for administration without RTV boosting, these drugs are generally used in this manner for treating viruses lacking PI-resistance mutations.
HIValg is designed for users interested in comparing the results of different algorithms or in evaluating existing and newly developed algorithms. The ability to develop new algorithms that can be run on the HIV Drug Resistance Database depends on the Algorithm Specification Interface (ASI) compiler (Shafer & Betts JCM 2003).
2.1 User Interface
For each of the three programs, sequences can be entered using either the Sequence Analysis Form or the Mutation List Form. To use the Sequence Analysis Form, paste one or more non-interleaved sequences in fasta format into the textbox or upload a file containing up to 100 non-interleaved fasta sequences. Consistent with the fasta format, each sequence should be preceded by a line containing ">" followed by a sequence name and optionally followed by additional descriptors separated by pipes ("|").
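The input format described above can be checked locally before submission. The following is a minimal sketch, not the parser used by the programs; the function name parse_fasta and the max_sequences parameter are hypothetical, and the limit of 100 is taken from the file-upload limit stated above.

```python
def parse_fasta(text, max_sequences=100):
    """Parse fasta text into (name, descriptors, sequence) tuples.

    Hypothetical helper for illustration only. The header line is split on
    pipes: the first field is the sequence name, the rest are descriptors.
    """
    records = []
    name, descriptors, chunks = None, [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, descriptors, "".join(chunks)))
            fields = [field.strip() for field in line[1:].split("|")]
            name, descriptors, chunks = fields[0], fields[1:], []
        elif name is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, descriptors, "".join(chunks)))
    if len(records) > max_sequences:
        raise ValueError(f"more than {max_sequences} sequences submitted")
    return records
```

For example, parse_fasta(">s1|B|2010\nCCTCAG\n") yields one record named "s1" with the descriptors "B" and "2010".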
To use the Mutation List Form, select mutations using the drop-down boxes or enter the mutations into the textboxes. When using the textboxes, it is essential that amino acid mutations are entered in UPPERCASE whereas insertions and deletions should be entered using lowercase "ins" or "del". If there is a mixture of more than one amino acid at a position, write both amino acids (intervening slashes are optional). The mutations must be separated either by spaces or commas; preceding the amino acid position by the consensus amino acid residue is optional. When using the drop-down menu, choose the amino acid present in the sequence. If the amino acid is not present, then select the asterisk, which will open a text box allowing you to enter an amino acid that is not on the drop-down list.
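The textbox rules above can be expressed as a small validator. This is a sketch under the stated rules only; the function name parse_mutation_list and the regular expression are assumptions made for illustration and are not the code used by the Mutation List Form.

```python
import re

# Optional consensus residue, position, then "ins", "del", or one or more
# uppercase amino acids with optional slashes between them.
_MUTATION = re.compile(r"^([A-Z]?)(\d+)(ins|del|[A-Z](?:/?[A-Z])*)$")


def parse_mutation_list(text):
    """Return (consensus, position, change) tuples from a mutation string.

    Hypothetical helper: tokens are separated by spaces or commas, and the
    consensus residue is None when it was omitted.
    """
    mutations = []
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue
        match = _MUTATION.match(token)
        if match is None:
            raise ValueError(f"unrecognized mutation: {token!r}")
        consensus, position, change = match.groups()
        if change not in ("ins", "del"):
            change = change.replace("/", "")  # slashes in mixtures are optional
        mutations.append((consensus or None, int(position), change))
    return mutations
```

With this sketch, "M184V, 41L K103N/S 69ins" is accepted, whereas the lowercase entry "m184v" is rejected.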
If you are a frequent user and typically enter many sequences at one time, it will be more convenient to use the Web service, Sierra. Sierra allows you to enter 1,000 sequences at one time and returns the results as an XML report that is easy to interpret and parse, making it unnecessary to manually inspect a large number of HTML results. Whichever interface you use (Mutation List Form, Sequence Analysis Form, or the Sierra Web service), your sequences and results are not stored on our servers.
2.2 Sequence Alignment and Amino Acid Translation
Nucleotide sequences are aligned to the consensus B HIV-1 pol amino acid sequence using a nucleotide to amino acid sequence local alignment program (X Huang Genomics 1996). Very short sequences or sequences containing multiple insertions, deletions, and frame-shifts may not be aligned successfully and may yield a warning. The current version of the program should be able to align all HIV-1 sequences. However, HIV-2 sequences usually produce warnings. We are planning to modify the code so that HIV-2 sequences can also be submitted. Amino acid insertions that appear in the region between RT amino acids 65 and 74 are hard-coded to appear at position 69, whereas amino acid deletions in this region are hard-coded to appear at position 67. This is consistent with how these mutations are most frequently described in published papers and with how the drug-resistance mutation penalties have been established.
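The hard-coded placement of RT insertions and deletions can be summarized in a few lines. This is a sketch only; the function name is hypothetical, and treating positions 65 and 74 as included in the region is an assumption.

```python
def reported_rt_indel_position(kind, aligned_position):
    """Map an aligned RT insertion or deletion to its reported position.

    Hypothetical helper: indels aligned within RT positions 65 to 74
    (assumed inclusive) are reported at 69 (insertions) or 67 (deletions);
    indels elsewhere keep their aligned position.
    """
    if kind not in ("ins", "del"):
        raise ValueError("kind must be 'ins' or 'del'")
    if 65 <= aligned_position <= 74:
        return 69 if kind == "ins" else 67
    return aligned_position
```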
Nucleotide triplets containing ambiguities are translated into each of the possible amino acids they encode. However, when the resulting list of possible amino acids is more than four, we replace this list with an 'X'. For example, WMC is translated to NTYS (N for AAC, T for ACC, Y for TAC, S for TCC), but WMS is translated to X instead of NTYSK* (N for AAC, T for ACC, Y for TAC, S for TCC, K for AAG, T for ACG, * for TAG, and S for TCG). All possible translations are explicitly defined in the triplets-table.txt file.
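The translation rule can be reproduced with the standard genetic code and the IUPAC nucleotide codes. The programs themselves read the translations from triplets-table.txt; the sketch below computes them instead, and the function name translate_triplet and its max_amino_acids parameter are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_triplet(triplet, max_amino_acids=4):
    """Translate a possibly ambiguous triplet into its amino acids.

    Returns the distinct amino acids in order of first appearance, or "X"
    when more than max_amino_acids distinct translations are possible.
    """
    if len(triplet) != 3:
        raise ValueError("a triplet must contain exactly three nucleotides")
    choices = [IUPAC[base] for base in triplet.upper()]
    seen = []
    for codon in product(*choices):
        amino_acid = CODON_TABLE["".join(codon)]
        if amino_acid not in seen:
            seen.append(amino_acid)
    return "X" if len(seen) > max_amino_acids else "".join(seen)
```

Under these assumptions, translate_triplet("WMC") gives "NTYS" and translate_triplet("WMS") gives "X", matching the examples in the text.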
3.1 Quality control analysis
The quality control analysis reports four types of problem positions: (i) Positions containing stop codons or frame shifts; (ii) Positions containing highly ambiguous nucleotides: N (cannot distinguish between A, C, G, or T), B (contains a combination of C, G, and T), D (contains a combination of A, G, and T), H (contains a combination of A, C, and T), and V (contains a combination of A, C, and G); (iii) Highly unusual mutations, defined as mutations that are not associated with drug resistance and which are present in HIVDB at a frequency of <0.05% or in only a single reference. Tables containing non-highly-unusual mutations in protease, RT, and integrase can be found at these links: PR variation, RT variation, IN variation; (iv) Mutations strongly suggestive of APOBEC3G-induced G to A hypermutation. A table with such mutations in protease and RT can be found at this link.
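Check (ii) is simple enough to sketch. The function below is illustrative only and is not the quality control code itself; its name and the 1-based position convention are assumptions.

```python
# Codes standing for three or four possible nucleotides, as listed above.
HIGHLY_AMBIGUOUS = set("NBDHV")


def highly_ambiguous_positions(nucleotides):
    """Return (position, code) pairs for highly ambiguous nucleotides.

    Positions are 1-based. Two-base ambiguity codes such as R or Y are not
    flagged by this check.
    """
    return [
        (position, base)
        for position, base in enumerate(nucleotides.upper(), start=1)
        if base in HIGHLY_AMBIGUOUS
    ]
```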
Each sequence is compared to a set of reference sequences from the Main group (group M) of HIV-1 representing subtypes A, B, C, D, F, G, H, J, K, CRF01_AE, and CRF02_AG. The subtype of the closest reference sequence is assigned to the submitted sequence. This method will generally be accurate (Gifford R et al AIDS 2006); however, it will not accurately characterize circulating recombinant forms (CRFs) other than 01 and 02 or other non-CRF recombinants. Moreover, performance on protease sequences alone is often suboptimal because there is often insufficient phylogenetic signal to distinguish between subtypes B and D or between subtype A and CRF01_AE without additional sequence data.
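The closest-reference approach can be illustrated as follows. The text does not state which distance measure is used, so this sketch assumes a simple count of mismatches between pre-aligned sequences of equal length; the function name and the toy reference strings in the usage note are hypothetical.

```python
def assign_subtype(query, references):
    """Return the subtype label of the reference closest to the query.

    references is a list of (subtype, sequence) pairs. Distance is the
    number of mismatched positions (an assumption made for this sketch),
    and all sequences must already be aligned to the same length.
    """
    def mismatches(a, b):
        if len(a) != len(b):
            raise ValueError("sequences must be aligned to the same length")
        return sum(x != y for x, y in zip(a, b))

    subtype, _ = min(references, key=lambda ref: mismatches(query, ref[1]))
    return subtype
```

With the toy references [("B", "ACGTACGT"), ("C", "ACGAACCT")], the query "ACGTACGA" is assigned subtype B because it differs from the first reference at one position and from the second at three.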
Several other programs have increased accuracy for HIV-1 subtyping including (i) the Rega HIV-1 Subtyping Tool (de Oliveira T et al Bioinformatics 2005), which follows the most rigorous approach for HIV-1 subtyping and uses boot-scanning to detect recombinant sequences (it can be found on this website or at the BioAfrica site maintained by Dr. de Oliveira); (ii) the STAR Subtype Analyzer, which uses a position-specific scoring matrix for HIV-1 subtyping (http://www.vgb.ucl.ac.uk/starn.shtml) (Myers RE et al, Bioinformatics 2005); (iii) the Virus subtyping tool at NCBI (http://www.ncbi.nlm.nih.gov/projects/genotyping/formpage.cgi) (Rozanov M et al Nucl Acids Res 2004); and (iv) the Subtyping Distance Tool (SUDI) found at the Los Alamos HIV Sequence Database (http://www.hiv.lanl.gov/content/sequence/SUDI/sudi.html).
3.3 Mutation Categories
Mutations are defined as differences from the consensus B reference sequence (PR, RT, and IN). Mutations are further characterized as follows: (i) RT mutations - NRTI-resistance mutations, NNRTI-resistance mutations, and Other mutations; (ii) PR mutations - Major PI-resistance mutations, Minor PI-resistance mutations, and Other mutations; (iii) IN mutations - Major INI-resistance mutations and Minor INI-resistance mutations. A partial explanation for how mutations are assigned to specific categories can be found in the HIVDB FAQ page. This categorization is occasionally modified as new drug resistance knowledge accrues. The latest categorization of mutations can be found in the tables linked here: PI Major and PI Minor, NRTI and NNRTI, and INI Major and INI Minor.
The "Other mutations" list may often contain mutations that are associated with drug resistance but which are primarily accessory and which are polymorphic (meaning they frequently occur even in untreated persons). The decision to move these mutations from the "Minor" category to the "Other" category was made for the following reasons: (i) these mutations have little effect on drug susceptibility, (ii) these mutations often represent the consensus sequence in non-B subtypes, and (iii) including these mutations in the "Minor" mutation category would complicate the report and make it more difficult to identify mutations indicative of past selective drug pressure.
The "Other mutations" list may also contain rare non-polymorphic mutations that are associated with drug resistance but which have not yet been described in a peer-reviewed paper or received widespread recognition. Fortunately, these mutations are generally uncommon and generally emerge only after multiple "Major" and "Minor" resistance mutations have emerged (explaining why they have not been well studied). The decision to place these mutations in the "Other" category was made to simplify the report. Future versions of the report may indicate which of the mutations in the "Other" category may be indicative of past selective drug pressure but are of uncertain clinical significance.
3.4 Mutation Penalty Scores
Mutation penalty scores are developed based on the following considerations: (i) Published studies and data linking mutations to ARV therapy; (ii) Published studies and data linking mutations to decreased ARV susceptibility; (iii) Published studies linking pre-therapy mutations with the virological response to a new ARV treatment regimen. Mutation penalty scores undergo repeated testing to ensure that the most common mutation patterns receive total scores consistent with the studies described above. Mutation penalty scores are frequently modified based on new papers and scientific presentations, and occasionally based on user feedback.
(i) Published studies and data linking mutations to ARV therapy. Mutations that are polymorphic (i.e. occur in the absence of selective drug pressure) generally do not receive scores or receive low scores even if the prevalence of these mutations increases during ARV therapy. The rationale for this approach is that polymorphic mutations have not been shown to significantly impair the response to a new ARV treatment regimen.
(ii) Mutations that decrease ARV susceptibility in vitro receive significant mutation penalty scores, particularly if these mutations have been shown to decrease ARV susceptibility in the absence of additional major mutations. Occasionally data of this type are available for site-directed mutants containing a single or a small set of mutations thus allowing an assessment of the precise contribution a mutation makes to decreased susceptibility. More often, however, such data are obtained from the statistical analysis of clinical isolates for which both genotypic and phenotypic data are available. Mutations that are associated with increased drug susceptibility generally receive a small negative score unless the mutation occurs in a mixed virus population.
(iii) Published studies linking pre-therapy mutations with virological response to a new ARV treatment regimen. As noted above, there have been more than 50 published studies of this type summarized in the Genotype-Clinical section of this website. These studies are often underpowered because of the large number of drug-resistance mutations, the large number of covariates that influence virologic response, and the different patient populations, optimized background regimens, and virological endpoints used in these studies. Therefore, our approach is to weigh the evidence from these studies carefully. Mutations that are associated with a decreased clinical response in large studies or in more than one small study are given more credence.
Some of the data from these studies such as those for tipranavir (RESIST study), darunavir (POWER studies), and etravirine (DUET study) have become so widely known that we provide a specific comment listing the number of specific mutations reported to be associated with resistance in these studies. Nonetheless, the mutations from these studies do not necessarily receive mutation penalties particularly if they are polymorphic. For example, many of the mutations from the original RESIST study list were highly polymorphic and were not assigned penalties. Two of the mutations in the current etravirine DUET study list (V90I and V106I) are polymorphic and are not assigned mutation penalties.
The most recent scores are available as tab-delimited files or tables sortable by position or drug:
Throughout our website we refer to each drug by its abbreviation; here you can find the different names for each drug.
3.5 ARV Resistance Estimates
The drug resistance estimate for an ARV is obtained by adding together the scores of each of the mutations associated with resistance to that drug. The scores are titrated to fall within the following ranges: (i) 0 to 9: Susceptible. There is no evidence of reduced susceptibility compared with wildtype; (ii) 10 to 14: Potential low-level resistance. The virus is likely to be fully susceptible yet it contains mutations that may be indicative of previous exposure to the ARV class of the drug; (iii) 15 to 29: Low-level resistance. Virus isolates of this type have reduced in vitro drug susceptibility and/or patients with viruses of this genotype may have a suboptimal virologic response to treatment compared with the treatment of a wildtype virus; (iv) 30 to 59: Intermediate resistance. The genotype suggests a degree of drug resistance greater than low-level resistance but lower than high-level resistance; (v) >=60: High-level resistance. The genotype is similar to that of isolates with the highest levels of in vitro drug resistance and/or patients infected with isolates having similar genotypes generally have little or no virologic response to treatment with the drug.
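The additive scoring and the ranges above can be sketched as follows. This is an illustration, not the HIVdb implementation: the function names are hypothetical, the penalty values in the usage note are invented placeholders rather than actual HIVdb scores, and treating a negative total as susceptible is an assumption.

```python
def resistance_level(total_score):
    """Map a total drug score to one of the five reported levels."""
    if total_score >= 60:
        return "high-level resistance"
    if total_score >= 30:
        return "intermediate resistance"
    if total_score >= 15:
        return "low-level resistance"
    if total_score >= 10:
        return "potential low-level resistance"
    # Totals below 10, including negative totals (assumed), are susceptible.
    return "susceptible"


def score_drug(mutations, penalties):
    """Sum the penalties of the mutations present for a single drug.

    penalties maps a mutation name to its penalty score for that drug;
    mutations without an entry contribute nothing.
    """
    total = sum(penalties.get(mutation, 0) for mutation in mutations)
    return total, resistance_level(total)
```

For example, with the placeholder penalties {"mut1": 20, "mut2": 15}, a sequence carrying both mutations receives a total of 35 and is reported as intermediate resistance.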
At the end of every report is a table listing each of the ARV-resistance mutations present, their scores for each of the drugs, and the summary of scores for each of the drugs. This table is important to examine because it contains more information than the five categories listed at the top of the report. It is not uncommon for an isolate to have intermediate resistance to two PIs with one PI having a score of 31 (close to low-level resistance) and another having a score of 59 (close to high-level resistance). The scores themselves are also links to information in the database supporting the level of the mutation penalty.
As noted in the Introduction, the purpose of this program is to assess how active an ARV is likely to be against a particular mutant virus compared with its activity against wildtype virus. The program does little else to help a health care provider choose therapy. First, it is often wiser to use a highly potent drug assigned intermediate resistance than to use a less potent drug assigned low-level resistance. Second, some drugs such as 3TC and FTC continue to provide some degree of virological benefit even in the presence of high-level resistance, possibly because the mutations usually responsible for resistance, M184V/I, increase HIV-1 susceptibility to other NRTIs and because M184V/I are associated with decreased virus replication. Although a program that could select the appropriate treatment regimen for a patient would be desirable, no such program exists, making it necessary for all health care providers to have a sound understanding of the principles of antiretroviral therapy (http://aidsinfo.nih.gov/Guidelines/Default.aspx?MenuItem=Guidelines).
Following the list of ARV Resistance Estimates, the HIVdb report contains a series of comments: (i) The first type of comment includes a listing of the mutations associated with the "GSSs" developed by Boehringer-Ingelheim for tipranavir (Baxter JD et al, J Virol 2006; Scherer J et al 11th European HIV Conf 2007) and by Tibotec for darunavir (De Meyer et al AIDS Res Hum Retrovirus 2008) and etravirine (Vingerhoets J et al HIVDRW 2007; Vingerhoets J et al HIVDRW 2008); (ii) Mutation-specific comments. These are brief 1 to 2 sentence synopses of XX number of protease, YY number of RT, and ZZ number of integrase mutations that have been associated with ARV resistance. (iii) A listing of mutations associated with hypersusceptibility. (iv) Highly unusual mutations defined as mutations that are not associated with drug resistance and which are present in HIVDB at a frequency of <0.05% or in only a single reference (These are also indicated in the Quality control analysis section of the report).
Files with the most recent comments are available in tab-delimited format:
The scoring tables, comments, and programs are frequently updated; these updates are tracked in the Updates page. Below is a listing of our current and previous versions linking to the specific improvements since January 2003.
HIVseq allows users to examine new sequences in the context of previously published sequence data on RT, protease, and integrase (Shafer R, Jung D, and Betts B, Nature Med 2000; Rhee et al AIDS 2006). Like HIVdb, HIVseq can accept either mutations or complete sequences and produces an assessment of quality control.
Detailed description of the tabular output of HIVseq:
Each table contains one row for each mutation and 20 columns. Columns 1 to 4 list the position, the position's consensus amino acid, the submitted nucleotide triplet, and the submitted mutation. Columns 5 to 12 list the frequency of each mutation in subtypes A, B, C, D, F, G, CRF01_AE, and CRF02_AG in drug-class-naive persons. Columns 13 to 20 list the frequency of each mutation in subtypes A, B, C, D, F, G, CRF01_AE, and CRF02_AG in drug-class-experienced persons. Each mutation is also a hyperlink to a separate web page with detailed information on each isolate, including literature references with Medline abstracts, the GenBank accession number, and complete sequence and treatment records.
Note: To minimize reporting bias, the mutation frequency tables contain one sequence per individual. For individuals in whom sequences from multiple isolates were published, the mutation tables include the earliest sequence from untreated persons and the latest sequence (while on therapy) from persons receiving antiretroviral therapy. To exclude technical sequencing errors and cases of circulating virus containing unusual variants, the mutation tables include only mutations present as the predominant form whenever multiple clones from the same isolate were sequenced. Sequences of poor quality and those considered to be possible laboratory contaminants are excluded from the data sets.
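The one-sequence-per-individual rule can be sketched as below. The record layout (the keys "person", "date", and "treated") and the function name are hypothetical, and the handling of a person who has both untreated and on-therapy sequences (the latest on-therapy sequence is kept) is an assumption not stated in the text.

```python
from collections import defaultdict


def one_sequence_per_person(records):
    """Select a single record per person.

    Each record is a dict with the hypothetical keys "person", "date"
    (an ISO date string, so string comparison orders dates correctly),
    and "treated" (True if sampled while on therapy).
    """
    by_person = defaultdict(list)
    for record in records:
        by_person[record["person"]].append(record)

    selected = {}
    for person, person_records in by_person.items():
        treated = [r for r in person_records if r["treated"]]
        if treated:
            # Latest sequence obtained while on therapy.
            selected[person] = max(treated, key=lambda r: r["date"])
        else:
            # Earliest sequence from an untreated person.
            selected[person] = min(person_records, key=lambda r: r["date"])
    return selected
```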
The following table provides a summary of the number of persons used for the HIVseq output.
The objectives of this program are 1) to identify the extent of agreement between three commonly used genotypic drug resistance interpretation systems; and 2) to identify sequences responsible for disagreements between these systems. It is important to note that two of the three algorithms have been simplified from a five-to-six-level output (Rega) or a five-level output (HIVdb) to a three-level output so that all three algorithms can be roughly compared. It is also important to note that discrepancies of one level (e.g. susceptible vs low/intermediate resistance or low/intermediate resistance vs high-level resistance) can frequently occur by chance if the level of resistance is on the borderline between two levels. Only discrepancies between fully susceptible and high-level resistance should be examined closely.
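The rule for deciding which discrepancies deserve attention can be sketched as follows, assuming each algorithm's call for a drug has already been reduced to three levels. The labels "S" (susceptible), "I" (low/intermediate resistance), and "R" (high-level resistance), the function name, and the dictionary keys in the usage note are assumptions made for illustration.

```python
def needs_close_review(calls):
    """Return True when the calls for one drug span susceptible to high-level.

    calls maps an algorithm name to "S", "I", or "R". One-level
    disagreements (S vs I, or I vs R) are not flagged.
    """
    levels = set(calls.values())
    unknown = levels - {"S", "I", "R"}
    if unknown:
        raise ValueError(f"unexpected levels: {sorted(unknown)}")
    return "S" in levels and "R" in levels
```

For example, {"HIVdb": "S", "Rega": "I", "other": "S"} is not flagged, whereas {"HIVdb": "S", "Rega": "R", "other": "I"} is.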
The following algorithms are available online in their XML form on the "Algorithm Specification Interface" page. They are all encoded using the ASI format, which is also described on the same page.
Each of the algorithms reports its results differently. The table below shows how the results of each algorithm are normalized for comparison by the program. Users of HIValg can select whether they prefer to receive output with the original interpretation or with the normalized interpretation ('SIR' option).
Selecting which algorithms appear in the output report can be done in two different ways. The first technique is to select from the list of algorithms made available on our servers. The second technique allows you to upload an algorithm from your machine, assuming that the algorithm is in proper ASI format as described in the Algorithm Specification Interface page (Betts BJ & Shafer RW J Clin Microbiol 2003). These techniques can be used in combination.
Appendix 1. Consensus B Sequences
The subtype B consensus sequence is derived from an alignment of subtype B sequences maintained at the Los Alamos HIV Sequence Database (hiv-web.lanl.gov). The consensus B sequence is therefore a commonly used reference sequence to which new sequences are compared. Files containing the consensus PR, consensus RT, and consensus IN are also available.
Appendix 2. Sample Data Sets
A small data set (N=10) has been compiled to provide users with a sample input for running our programs. To view the results for these sequences, copy and paste them into the input form.
A large data set (N=2055) is also available. We ask users to restrict the number of sequences they process at a time using our programs to 100, so this data set cannot be directly submitted to our programs.
A very large data set (N=5838) is available. Again, we ask users to restrict the number of sequences they process at a time using our programs to 100, so this data set cannot be directly submitted to our programs.