Haplogroup R and Subclades

Haplogroup R is defined by rs2032658, also known as M207. The group is believed to have arisen about 19,000 to 34,000 years ago in Central Asia. Today its descendants are common in Europe, South Asia, and Central Asia.

This site's emphasis is on collecting the original BAM raw data whenever possible to construct a phylogenetic tree using the GRCh38 human genome reference. To contribute, please use the Submission Tool.

Supporting data from publicly available Haplogroup R related repositories is integrated as a service to the community. The Kits page contains a cross-reference list to track sample donors between labs. This allows testers to be placed on their closest branch in the Experimental Tree. For help converting the coordinates of variants placed in the tree, consult the Variant Index.

YSEQ customers are encouraged to join Group 223: haplogroup-r.org Public Results. This group is the primary location monitored to collect new sequencing results.



Individual kit information for all R-FGC22501 Subclades has been removed at the request of the haplogroup project. Kit owners who wish to be made visible again must send a request to the contact address in the page footer.


The private variants report has been improved. A small percentage of SNPs were not being presented when the same source had Sanger Sequencing verification performed. The report now also shows INDELs. GRCh38 results are currently showing a larger ratio of INDELs than expected, so advanced users are advised not to attempt to verify these unshared INDELs via Sanger Sequencing. These changes also lay the foundation for redefining "private" relative to the local tree context. Future updates will leverage this capability.

Swapped the ancestral and derived alleles for several basal variants to match observations in men from haplogroups P, Q, and R. Upstream variants that remain negative in the Kit "Known SNPs" report appear to be mixed calls, possible sequencing errors, or full back mutations.

A small percentage of kits with sequences close to the reference, or with little overall SNP testing, remain classified in the wrong group. These will self-correct as additional branches are approved.


Addressed an issue with group assignment reported for R-FGC5494 samples. The problem was caused by recurrent SNPs in branches related within 3000 years, combined with sparse data loads. The fix introduces a new control value for the best fit algorithm, which may require additional training. Please report any issues you notice.

Adjusted the algorithm for the "Private Variants" report. Diploid positions (calls having more than one possible allele value) with low read depths have been removed.
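
For readers who want to apply a similar filter to their own call lists, here is a minimal sketch of the rule described above. It is not the site's actual code: the Call record, the function names, and the depth cutoff of 10 reads are all assumptions made for illustration, since the real threshold is not published here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One variant call (hypothetical record layout, not the site's schema)."""
    position: int      # coordinate on the reference
    alleles: tuple     # allele values observed at this position
    read_depth: int    # number of reads covering the position


# Assumed cutoff for illustration only; the site's real value is unknown.
MIN_DEPTH = 10


def is_mixed(call):
    """True when the call has more than one possible allele value."""
    return len(set(call.alleles)) > 1


def filter_private_variants(calls, min_depth=MIN_DEPTH):
    """Drop calls that are both mixed and covered by fewer than min_depth reads."""
    return [c for c in calls if not (is_mixed(c) and c.read_depth < min_depth)]
```

Note that only calls failing both tests are dropped in this sketch: a single-allele call with low depth is kept, as is a mixed call with adequate depth.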

"Known SNPs" report enhanced to show immediate descendants of the kit's terminal branch. This is intended to further support the placement, and can also be used as a guide to see whether any shared downstream variants remain to be tested.

Removed all terminal tree branch leaves with fewer than two supporting kits. Future updates will remove interior branches without supporting splits.
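
The leaf removal can be pictured with the short sketch below. It is an illustration under stated assumptions rather than the tree software itself: the two dictionaries (branch to child branches, branch to supporting kit IDs), the function name, and the branch labels used when calling it are hypothetical. It makes a single pass, so an interior branch that loses all of its leaves is left in place.

```python
def prune_sparse_leaves(children, kits, min_kits=2):
    """Return a copy of the tree without leaves that have too few kits.

    children: dict mapping each branch name to a list of its child branches
    kits:     dict mapping a branch name to the set of kit IDs supporting it
    Both structures are assumptions made for this example.
    """
    # A leaf is any branch with no children.
    leaves = {branch for branch, kids in children.items() if not kids}

    # Leaves supported by fewer than min_kits kits are marked for removal.
    sparse = {leaf for leaf in leaves if len(kits.get(leaf, ())) < min_kits}

    # Rebuild the mapping, dropping the sparse leaves and any links to them.
    return {
        branch: [kid for kid in kids if kid not in sparse]
        for branch, kids in children.items()
        if branch not in sparse
    }
```

For example, given a root with branches A and B, where A has leaves A1 (two kits) and A2 (one kit) and B is a leaf with no kits, the result keeps root, A, and A1 only.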


Matrix report generation is on hold while the software is being enhanced. Reports are expected to resume in late August or early September.