Haplogroup R and Subclades


Haplogroup R is defined by rs2032658, also known as M207. The group is believed to have arisen about 19,000 to 34,000 years ago in Central Asia. Today it is common in Europe, South Asia, and Central Asia. With the increasing popularity of genetic genealogy, large amounts of data have been collected through direct-to-consumer testing. This site attempts to collect and interpret as much of this data as possible.


The underlying tree structure is based on ISOGG's Y-DNA Haplogroup Tree 2015. Several important methodological differences prompted this project to branch from it:

  1. ISOGG's tree has been updated manually, and as a result its form no longer conforms to YCC's original principle that the tree be "drawn as asymmetrically as possible by sorting the descendants of each interior node so that the bottom most descendant had the greatest number of immediate descendants" (YCC, 2002).
  2. ISOGG's submission requirements for Next Generation Sequencing data are not applied in the same way. Rather than excluding variants because of read counts, alignment quality, proximity, or location in more active regions of the Y chromosome, these markers are placed on the tree and coded accordingly. The ultimate goal is an accurate picture of the evolutionary tree, with a willingness to restructure it when nodes prove unstable in newer data sets.
  3. All new branches must be evidence-based, traceable to the samples in which they were originally found. Information about each kit's surnames, origins, and testing platform is presented with the terminal branch markers. To be more useful to genetic genealogists, terminal branches are not restricted by arbitrary diversity criteria: as long as two men share a mutation, it forms a potentially interesting branch.
  4. Keeping the original sequencing results in the database allows age estimation using a variation of the method presented by Adamov et al. in "Defining a New Rate Constant for Y-Chromosome SNPs based on Full Sequencing Data." The main deviation is that the estimate for a branch is calculated recursively over all samples under it, rather than by averaging the estimates of its child branches.
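The YCC drawing rule quoted in item 1 can be applied mechanically. Below is a minimal Python sketch, assuming a hypothetical `Branch` type holding a name and a list of child branches (not ISOGG's or this project's actual data model). Children are sorted recursively in ascending order of immediate-descendant count, so the bottom-most child drawn has the most. Ties are broken by name only to make the output deterministic; that tie-break is an assumption, not part of the YCC rule.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Branch:
    """Hypothetical tree node: a branch name and its immediate child branches."""
    name: str
    children: list[Branch] = field(default_factory=list)


def sort_asymmetric(node: Branch) -> None:
    """Reorder children in place, recursively, so that when drawn top to bottom
    the bottom-most child has the greatest number of immediate descendants."""
    for child in node.children:
        sort_asymmetric(child)
    # Ascending by immediate-descendant count; name is an assumed tie-breaker.
    node.children.sort(key=lambda c: (len(c.children), c.name))


# Toy tree with placeholder labels (not real haplogroup data).
root = Branch("root", [
    Branch("A", [Branch("A1"), Branch("A2")]),
    Branch("B"),
    Branch("C", [Branch("C1")]),
])
sort_asymmetric(root)
print([c.name for c in root.children])  # ['B', 'C', 'A']
```

Because the sort is applied at every interior node, the whole tree ends up in the asymmetric form regardless of the order in which branches were originally added.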
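The "two men share a mutation" rule in item 3 amounts to grouping kits by the variants they carry. The sketch below is one way to express it, assuming each kit's calls are available as a set of variant names; the kit IDs and variant names are hypothetical. Variants carried by exactly the same set of kits are grouped together as equivalent markers of one candidate branch. It ignores no-calls and coverage gaps, which a real pipeline would have to account for before concluding that a kit lacks a variant.

```python
from __future__ import annotations

from collections import defaultdict


def shared_variants(calls: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each variant to the kits carrying it, keeping only variants
    carried by at least two kits (the 'two men share a mutation' rule)."""
    carriers: dict[str, set[str]] = defaultdict(set)
    for kit, variants in calls.items():
        for variant in variants:
            carriers[variant].add(kit)
    return {v: kits for v, kits in carriers.items() if len(kits) >= 2}


def candidate_branches(calls: dict[str, set[str]]) -> dict[frozenset[str], list[str]]:
    """Group shared variants by identical carrier sets: variants carried by
    exactly the same kits are treated as equivalent markers of one branch."""
    branches: dict[frozenset[str], list[str]] = defaultdict(list)
    for variant, kits in shared_variants(calls).items():
        branches[frozenset(kits)].append(variant)
    return {kits: sorted(vs) for kits, vs in branches.items()}


# Hypothetical kit IDs and variant names, for illustration only.
calls = {
    "kit1": {"v1", "v2", "v3"},
    "kit2": {"v1", "v2"},
    "kit3": {"v3", "v4"},
}
print(candidate_branches(calls))
```

Here `v1` and `v2` form one candidate branch shared by `kit1` and `kit2`, `v3` forms another shared by `kit1` and `kit3`, and `v4` stays private to `kit3`.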
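The difference described in item 4, pooling all samples under a branch versus averaging child-branch estimates, can be shown on a toy tree. This is a simplified sketch, not the project's implementation: it assumes a hypothetical `Node` whose `snps` field counts variants on the edge into that node, and a single placeholder `years_per_snp` constant. It omits the per-sample coverage normalization that the Adamov et al. method relies on, and the value 100 used below is arbitrary, not their published rate.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node. `snps` counts the variants on the edge leading
    into this node; for a leaf (a tested sample) these are its private variants."""
    name: str
    snps: int = 0
    children: list[Node] = field(default_factory=list)


def snps_to_samples(node: Node) -> list[int]:
    """Recursively collect, for every sample under `node`, the number of SNPs
    accumulated between `node` and that sample."""
    if not node.children:
        return [0]
    counts = []
    for child in node.children:
        counts.extend(child.snps + c for c in snps_to_samples(child))
    return counts


def pooled_age(node: Node, years_per_snp: float) -> float:
    """Age from the mean SNP count over ALL samples under the branch."""
    counts = snps_to_samples(node)
    return years_per_snp * sum(counts) / len(counts)


def child_average_age(node: Node, years_per_snp: float) -> float:
    """Contrast: age as the average of each child's (edge + subtree) estimate."""
    if not node.children:
        return 0.0
    estimates = [child.snps * years_per_snp + child_average_age(child, years_per_snp)
                 for child in node.children]
    return sum(estimates) / len(estimates)


# Toy tree; years_per_snp=100 is a placeholder, NOT the Adamov et al. constant.
root = Node("root", children=[
    Node("A", snps=2, children=[Node("a1", snps=3), Node("a2", snps=5)]),
    Node("b1", snps=4),
])
print(pooled_age(root, 100))         # ≈ 533.33 (mean of 5, 7, 4 SNPs)
print(child_average_age(root, 100))  # 500.0
```

Pooling weights every sample equally, so a well-sampled child branch contributes more to its parent's estimate; child averaging weights every child branch equally regardless of how many samples it contains.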

Data Policy

  1. The project recognizes that direct-to-consumer genetic test results may, with low probability, contain data outside the vendor's targeted regions. Only Y chromosome data is considered in the analysis.
  2. Filtered analysis may be shared with cooperating projects to facilitate research comparisons. Raw data will only be shared with a member’s explicit permission.
  3. FTDNA kit numbers are displayed in all reports for the convenience of related surname and haplogroup projects. Project members may request that the tree and matrix reports use an internal project ID instead.
  4. Project members have the right to request that their raw data be removed from reporting at any time, but shared variants in the tree will be retained.