Number of CRFs not included in Rega V3: 16
CRF 15_01B - Exclusion reason: CRF15_01B is similar to CRF01_AE in most genomic regions. It also clusters with CRF01_AE and, if included in Rega V3, prevents the classification of CRF01_AE because it competes with and breaks the CRF01_AE cluster, especially with the Pol sequences normally generated for resistance studies.
CRF 16_A2D - Exclusion reason: It is a rare A2 strain found in Kenya, Argentina and Korea. It also causes clustering problems on a phylogenetic tree with 21_A2D (a recombinant related to 16_A2D) and old D sequences from South Africa. Martine: the recombination fragment is small.
CRF 17_BF - Exclusion reason: CRF17_BF is a B/F recombinant related to CRF12_BF. It causes clustering problems on the phylogenetic tree with CRF12_BF and prevents other CRF12 sequences from being correctly classified. In addition, it makes other B/F recombinant CRFs difficult to classify. My experience suggests that recombination between B and F is still happening in South America and that most of the BF CRFs are not epidemiologically important; rather, they represent a few genomes of common origin that researchers identified and then fully sequenced.
CRF 21_A2D - Exclusion reason: It is a rare A2 recombinant strain found in Kenya, which is probably not epidemiologically significant. It also causes clustering problems on a phylogenetic tree with 16_A2D and old D sequences from South Africa.
CRF 22_01A1 - Exclusion reason: It is a recombinant of CRF01_AE with an A1 strain, sampled only in Cameroon, and its recombinant structure is not confirmed by bootscanning analysis.
CRF 23_BG - Exclusion reason: It is too similar to CRF24_BG and causes clustering problems with CRF24_BG on a phylogenetic tree. When used as a reference strain, it does not classify correctly in GAG and ENV. Needs reference.
CRF 26_AU - Exclusion reason: Sequence recently available; needs to be tested.
CRF 28_BF - Exclusion reason: CRF28_BF reference sequences are quite diverse and their recombination breakpoints are not exactly the same. It clusters together with CRF29_BF in many regions of the genome. Either CRF28 or CRF29 could be used; I used CRF29_BF. However, neither CRF (28 nor 29) seems to be epidemiologically important, and both could be excluded.
CRF 32_06A1 - Exclusion reason: CRF32_06A1 clusters with CRF06_cpx. The tree needs to be displayed and the reference reviewed.
CRF 33_01B - Exclusion reason: CRF33_01B is most similar to CRF34_01B; they cluster together with CRF01_AE in many regions of the genome. This prevents the classification of CRF01_AE complete genomes if CRF34_01B or CRF33_01B is in the reference dataset.
CRF 34_01B - Exclusion reason: CRF34_01B is most similar to CRF33_01B; they cluster together in many regions of the genome. However, if CRF33_01B is excluded, CRF34_01B can be used as a CRF in Rega V3.
CRF 36_cpx - Exclusion reason: CRF36_cpx is too similar to CRF02_AG; they cluster together in many regions of the genome and break the CRF02_AG cluster, so if CRF36_cpx is kept as a reference strain, CRF02_AG sequences cannot be classified correctly. This CRF was also identified in Cameroon; according to the authors of the paper, the ancestral sequences present in CRF36_cpx represent a link to extinct strains and, potentially, provide insight into the evolution of HIV-1.
CRF 44_BF - Exclusion reason: Only one complete genome available. Unpublished manuscript.
CRF 45_cpx - Exclusion reason: This CRF (CRF45_cpx) is supported by clustering in phylotype analysis. However, the bootstrap support in the Pol gene is relatively low (83%). In addition, this CRF is found in West-Central Africa and is probably part of the early diversity of HIV-1.
CRF 46_BF - Exclusion reason: CRF46_BF is classified as F1 (no B recombination is detected in Rega V3). This CRF is also too diverse and does not cluster together with high bootstrap values in the complete-genome and Pol trees; instead, it clusters within subtype F1 in both the complete-genome (CG) and Pol trees.
CRF 48_01B - Exclusion reason: Only one complete genome available. Unpublished manuscript.