Haplogroup R and Subclades

Haplogroup R is defined by rs2032658, also known as M207. The group is believed to have developed about 19,000 to 34,000 years ago in Central Asia. In modern times, descendants are common in Europe, South Asia, and Central Asia.

Supporting data from publicly available Haplogroup R-related repositories is integrated as a service to the community. The Kits page contains a cross-reference list to track sample donors between labs. This allows testers to be placed on their closest branch in the Experimental Tree. For help converting the coordinates of variants placed in the tree, consult the Variant Index.

Contribute Y-DNA Data for Analysis

Haplogroup-R.org's emphasis is on collecting the original BAM raw data when possible to construct a phylogenetic tree using the GRCh38 human genome reference. To contribute, please use the BAM Submission Tool.

Direct-to-consumer sequencing data in the form of VCF, variantCompare, masterVar, and other formats can also be contributed to the Y-DNA Warehouse. The warehouse is a private FTP server available to citizen scientists interested in promoting knowledge of the human Y-DNA tree.

YSEQ customers are encouraged to join Group 223: haplogroup-r.org Public Results. This group is the primary location monitored to collect new sequencing results.



A beta-quality histogram plot of coverage area has been added to the Kits page. See the FAQ for information on what is represented in the PNG images.


The site is now directing all HTTP requests to use HTTPS to improve security.


The Kits report has been repurposed. BAM donors can use this to check if their submission has completed the primary quality check gate. Others can use the report to gather some useful metrics on how well each NGS platform sequences the Y chromosome when aligned to GRCh38 using BWA-MEM. The full set of BAMs should be reflected here by end of day on Black Friday 2018.

The Statistics page is now updated in real time from the Kits report data.
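As a rough illustration of the kind of per-kit metric such a report could hold, the sketch below summarizes per-position depth records into mean depth and breadth of coverage. It is not the site's actual workflow. It assumes tab-separated records of chromosome, position, and depth (the three-column layout that `samtools depth` writes for a single BAM), and the function name, the `min_depth` threshold, and the example values are all hypothetical.

```python
from typing import Iterable


def summarize_depth(lines: Iterable[str], region_length: int, min_depth: int = 1) -> dict:
    """Summarize tab-separated (chrom, position, depth) records.

    region_length is the number of bases in the region being assessed.
    Positions absent from the input are treated as having zero depth.
    """
    total_depth = 0
    covered = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _chrom, _pos, depth_text = line.split("\t")
        depth = int(depth_text)
        total_depth += depth
        if depth >= min_depth:
            covered += 1
    return {
        # Average number of reads per base across the whole region.
        "mean_depth": total_depth / region_length,
        # Fraction of the region covered by at least min_depth reads.
        "breadth": covered / region_length,
    }


# Hypothetical records for a 5-base region; position 102 is missing
# from the input, so it is counted as uncovered.
example = ["chrY\t100\t12", "chrY\t101\t8", "chrY\t103\t0", "chrY\t104\t20"]
stats = summarize_depth(example, region_length=5)
print(stats)
```

Dividing by the region length rather than by the number of records keeps kits comparable when the input omits uncovered positions.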


All FTDNA project data has been scrubbed from the database. The Sources page now reflects the academic sources used in data analysis and lists the off-the-shelf software used in the analysis workflow.


The experimental tree has been partially republished with samples from the 13 Million Workbook. The workset is currently restricted to regions identified in Poznik et al. (2013). Future updates will reintroduce the remaining BAM files.


The Data Use Policy has been amended to facilitate the changes needed for the Shared Data repository. The repository is a collection of variant calls sourced from the testing vendor by the data owners. The Data Use Policy lists all individuals who currently have access to retrieve the data. Project administrators who would like to gain access to their members' data should apply via email.

Raw BAM & FASTQ format submissions are not included in this private repository to protect privacy. Called formats produced by haplogroup-r.org may be added in the future with the data owners' consent.