Haplogroup R is defined by rs2032658
also known as
The group is believed to have developed about 19,000 to 34,000 years ago in Central Asia. In modern times,
the group is common in Europe, South Asia, and Central Asia. With the increasing popularity of genetic
genealogy, large amounts of data have been collected using direct-to-consumer testing. This site attempts
to collect and interpret as much of this data as possible.
The underlying tree structure was based on ISOGG's Y-DNA Haplogroup Tree 2015.
There are important methodological differences which have prompted the branch:
- ISOGG's tree form has suffered from a manual update process. As a result, it does not conform to YCC's
original principle of being "drawn as asymmetrically as possible by sorting the descendants of each
interior node so that the bottom most descendant had the greatest number of immediate descendants"
(YCC, 2002).
- Requirements regarding Next Generation Sequencing are not observed in the same manner. Rather than
excluding variants due to numbers of reads, alignment quality, proximity to or inclusion in more active
regions of the Y chromosome, these markers are coded on the tree. The ultimate goal is to create an
accurate picture of the evolutionary tree, with a willingness to restructure when nodes are found to be
unstable in newer sets of data.
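
The YCC ordering principle quoted in the first point above can be illustrated with a short sketch. This is
not the code used by this site; the `TreeNode` class and the `sort_asymmetrically` function are
hypothetical names, and a full ascending sort is only one way to make the bottom-most child the one with
the most immediate descendants.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    """Hypothetical tree node: a branch name plus its immediate descendants."""
    name: str
    children: List["TreeNode"] = field(default_factory=list)


def sort_asymmetrically(node: TreeNode) -> None:
    """Order children so the last (bottom-most) one has the most immediate descendants."""
    for child in node.children:
        sort_asymmetrically(child)
    # list.sort is stable, so children with equal counts keep their original order.
    node.children.sort(key=lambda child: len(child.children))


# Illustrative data only; the branch names are placeholders, not real tree labels.
root = TreeNode("root", [
    TreeNode("a", [TreeNode("a1"), TreeNode("a2")]),
    TreeNode("b"),
    TreeNode("c", [TreeNode("c1"), TreeNode("c2"), TreeNode("c3")]),
])
sort_asymmetrically(root)
print([child.name for child in root.children])  # ['b', 'a', 'c']
```

Applying the sort at every interior node after each update would keep the drawn tree in this form without
manual reordering.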
All new branches must be evidence-based and traceable to the samples in which they were originally found.
Information about the kit, surnames, origins, and testing platforms is presented with the terminal branch
markers. In an effort to be more useful to genetic genealogists, terminal branches are not restricted to
arbitrary diversity criteria. As long as two men share a mutation, it forms a potentially interesting
branch.
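
As a rough illustration of the "two men share a mutation" rule, the sketch below groups variants by the
set of kits that carry them and keeps every group with at least two kits. The function name
`candidate_branches`, the `min_samples` parameter, and the kit and variant labels are all assumptions made
for this example; the real pipeline also has to weigh call quality, which is not modelled here.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def candidate_branches(
    derived_calls: Dict[str, Set[str]], min_samples: int = 2
) -> Dict[FrozenSet[str], List[str]]:
    """Map each set of kits to the variants shared by exactly those kits.

    derived_calls maps a kit identifier to the variants it carries in the derived state.
    Only groups with at least min_samples kits are returned.
    """
    carriers: Dict[str, Set[str]] = defaultdict(set)
    for kit, variants in derived_calls.items():
        for variant in variants:
            carriers[variant].add(kit)

    branches: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for variant, kits in carriers.items():
        if len(kits) >= min_samples:
            branches[frozenset(kits)].append(variant)
    return {kits: sorted(variants) for kits, variants in branches.items()}


# Placeholder kits and variants, not real data.
calls = {
    "kit1": {"v1", "v2", "v3"},
    "kit2": {"v1", "v2"},
    "kit3": {"v1", "v4"},
}
for kits, variants in candidate_branches(calls).items():
    print(sorted(kits), variants)
```

Here `v3` and `v4` are each seen in a single kit, so they stay private to that kit rather than forming a
branch.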
Having the original sequencing results in the database allows age estimations using a variation of the
methodology presented in Adamov et al., "Defining a New Rate Constant for Y-Chromosome SNPs based on Full
Sequencing Data." The major deviation in the estimation method is the use of a recursive calculation over
all samples under a given branch, rather than an average of the child branches.
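
The difference between the two approaches can be sketched as follows. This is one possible reading of the
description above, not the site's actual implementation: the `Branch` class, both function names, and the
`years_per_snp` value are assumptions for illustration. A real estimate would also account for the number
of base pairs each test covers, which this simplified sketch ignores.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Branch:
    """Hypothetical node: SNPs on the edge leading to it; a node without children is a sample."""
    name: str
    snps_on_branch: int
    children: List["Branch"] = field(default_factory=list)


def snp_counts_below(node: Branch) -> List[int]:
    """Recursively list, for every sample under node, the SNP count from node down to it."""
    if not node.children:
        return [0]
    counts: List[int] = []
    for child in node.children:
        counts.extend(child.snps_on_branch + c for c in snp_counts_below(child))
    return counts


def age_all_samples(node: Branch, years_per_snp: float) -> float:
    """Estimate age from the mean SNP count over all samples under the branch."""
    counts = snp_counts_below(node)
    return years_per_snp * sum(counts) / len(counts)


def age_child_average(node: Branch, years_per_snp: float) -> float:
    """For comparison: average the estimates of the immediate child branches."""
    if not node.children:
        return 0.0
    per_child = [
        child.snps_on_branch * years_per_snp + age_child_average(child, years_per_snp)
        for child in node.children
    ]
    return sum(per_child) / len(per_child)


# Placeholder tree: one sub-branch with three samples, plus one sample directly under the root.
sub = Branch("sub", 2, [Branch("s1", 3), Branch("s2", 5), Branch("s3", 4)])
root = Branch("root", 0, [sub, Branch("s4", 10)])

YEARS_PER_SNP = 100.0  # assumed placeholder, not a published rate
print(age_all_samples(root, YEARS_PER_SNP))    # 700.0
print(age_child_average(root, YEARS_PER_SNP))  # 800.0
```

In this toy tree the single sample `s4` counts as half of the child average but only a quarter of the
all-samples calculation, which is why the two estimates differ.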
This policy was created to balance the rights and privacy of individuals with the benefit to the whole
community of gathering information for their research projects. We have tried to ensure the safety and
privacy of any personal data, including data likely to have significant medical relevance, or which can
identify a specific person. At the same time, we have tried to retain enough information that test results
are useful for research, meaningful for close matches and can be cross-referenced against information on
The following is the Policy agreed to on upload of data, between Submitters of that data (genetic
testers or their designated proxies) and the Project. The Project is defined as those persons with
administrative access to the data archive, or successors thereof.
- Submitters give the Project free license to analyse the genetic and ancestral data they submit, and
publicly release semi-anonymized, filtered analyses of that data, and any associated meta-data found
in the public domain. Released genetic data is to be limited to calls assigned to the Y-chromosome.
- Raw DNA sequencing data (e.g. BAM or FASTQ datasets) will only be shared with a member's explicit
written consent. However, reduced sets of Y-chromosome data (including calls in VCF/gVCF format, test
coverage information in BED format, and submitted meta-data) may be shared with co-operating projects.
- Tests are publicly identified by the meta-data supplied on submission, i.e. kit numbers and most-distant
known paternal ancestor information. Project members may request that public reports anonymize all or
part of this information to an internal project identifier instead. Such requests should be made by
e-mail before submission to prevent public release of information.
- Submitters or legal data owners have the right to request that their raw data is removed from the data
analysis at any time. However, since we release a reduced set of data into the public domain, we cannot
guarantee these data are removed from external sites once the kit has been analyzed.
The Project may contact Submitters about specific queries regarding their data, using the e-mail address
supplied on submission. Sharing of e-mail addresses with any third parties will only be done with
Minor updates to this agreement may be necessary, e.g. to modify or make explicit the names of people
and parties; to include new data formats; to clarify specific points of ambiguity; or to ensure
compliance with national and international law, existing privacy agreements with testing companies,
or community guidelines. Such changes may be made by the Project without notification, provided they
don't constitute material infringements of the rights and/or privacy granted to Submitters, as described
in the version of the Policy they initially accept.
As of 19 October 2017, the list of project administrators is:
James Kane (www.haplogroup-r.org),
Alex Williamson (www.ytree.net),
Iain McDonald (www.jb.man.ac.uk/~mcdonald/genetics.html),
Mike Walsh (for the R-P312 project groups) and Jef Treece (data analyst).