The original YCC tree was meant to be presented with the branches having the most descendants at the bottom. This means R1 and R2 need to swap positions. Given the makeup of direct-to-consumer testing, the U106 and P312 branches will dominate anything likely to be found under R-M479.
The experimental tree will be moving away from long-form haplogroup names in the near future. This will help to reduce the confusion caused by the change.
The HR prefix is simply an abbreviation for haplogroup-r.org. These variants have been placed into the experimental tree without a known name. Since the (chromosome) position [ancestral]->[derived] format tends to get very long when it includes long insertion or deletion events, a placeholder name is assigned to all shared variants in the tree. Information about the coordinates can be found by searching the variant index.
When a formal research study or direct-to-consumer lab assigns a name to an HR variant and submits the information to ybrowse.org, the HR name is retired. Therefore, using these names outside of discussions of this experimental tree is not recommended.
The coverage area histograms show the regions in the BAM that have four or more reads between chrY:2,500,000 and chrY:27,100,000. This block represents the bulk of the known Y chromosome on GRCh38. Each horizontal pixel represents 3,000 bases.
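To make the scale of the histogram concrete, here is a minimal Python sketch of how a chrY position could map to a horizontal pixel. The region bounds, the 3,000-base bin size and the four-read minimum come from the description above; the function names, the 0-based pixel numbering and the half-open treatment of the end coordinate are assumptions made for illustration, not the site's actual implementation.

```python
# Values taken from the description above; the names are illustrative.
REGION_START = 2_500_000    # first plotted chrY position (GRCh38)
REGION_END = 27_100_000     # end of the plotted block (assumed exclusive)
BASES_PER_PIXEL = 3_000     # bases represented by one horizontal pixel
MIN_READS = 4               # minimum depth for a position to be drawn


def histogram_width() -> int:
    """Number of horizontal pixels needed to cover the plotted block."""
    span = REGION_END - REGION_START
    # Ceiling division so a partial final bin would still get a pixel.
    return -(-span // BASES_PER_PIXEL)


def pixel_for_position(pos: int) -> int:
    """Return the 0-based pixel column that covers a chrY position."""
    if not REGION_START <= pos < REGION_END:
        raise ValueError(f"chrY:{pos} lies outside the plotted block")
    return (pos - REGION_START) // BASES_PER_PIXEL


def is_covered(read_depth: int) -> bool:
    """A position counts as covered when it has four or more reads."""
    return read_depth >= MIN_READS
```

Under these assumptions the plotted block spans 24,600,000 bases, so the histogram is 8,200 pixels wide.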
The bands are color coded in green or red. Green means that no more than 10% of the reads have a PHRED-scaled alignment quality below 10. Roughly speaking, with four reads at that quality we can assume a 0.01% chance that all of the reads have been assigned to the wrong location. Red represents all other locations with more than three reads. We may find phylogenetically important markers in these regions, but we cannot be sure of their actual location on the chromosome.
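The arithmetic behind the 0.01% figure and the green/red rule can be sketched in Python as follows. The PHRED relationship (error probability = 10^(-Q/10)) is standard. The function names, the treatment of reads as independent, and the exact handling of the 10% boundary are assumptions made for illustration rather than the site's actual code.

```python
from typing import Optional, Sequence

MIN_READS = 4            # locations with fewer reads are not drawn
LOW_MAPQ = 10            # PHRED-scaled alignment quality threshold
MAX_LOW_FRACTION = 0.10  # green allows at most 10% low-quality reads


def mapq_error_probability(mapq: float) -> float:
    """Probability that a read with this PHRED-scaled MAPQ is misplaced."""
    return 10 ** (-mapq / 10)


def all_misplaced_probability(mapq: float, reads: int) -> float:
    """Chance that every read is misplaced, assuming independent reads."""
    return mapq_error_probability(mapq) ** reads


def band_color(mapqs: Sequence[int]) -> Optional[str]:
    """Classify one location from the MAPQ values of its reads."""
    if len(mapqs) < MIN_READS:
        return None  # too few reads to appear in the histogram
    low = sum(1 for q in mapqs if q < LOW_MAPQ)
    return "green" if low / len(mapqs) <= MAX_LOW_FRACTION else "red"
```

For four reads at MAPQ 10, each read has a 10% chance of being misplaced, so the chance that all four are wrong is 0.1 to the fourth power, or 0.0001 (0.01%).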
The preferred data formats for submission are BAM, CRAM, or FASTQ files from Next Generation Sequencing tests. Having the raw data available allows the reads to be aligned to the most recent human genome reference. The current reference version is GRCh38, dated 2013-12-17, available from the 1000 Genomes Project. This allows us to compare with over 200 samples in the repository without realigning their data. As future reference builds are released, we plan to continue realigning the entire library.
Due to the size of the files being uploaded, the files must be hosted on a website that allows retrieval from a URL. Most labs provide this capability for a limited time window. You can also use a file sharing service like Dropbox or Google Drive. Both services offer free basic accounts, which can hold three to four Big Y BAMs. Professional accounts may be required for a 30x WGS file.
Please submit using the Submission Tool. Allow four to six weeks for the free analysis results to be fully integrated.
You can also submit the vendor analysis files provided by your testing company via email. Supported formats include:
Please use the Y-DNA Warehouse to contribute the call files.
Raw data importers have been created for each of these test types, so their results can be included. The Chromo2 and Geno2 tests continue to be beneficial for establishing branching with features lying outside the most popular NGS test, Big Y. The autosomal-oriented tests intended for genetic matching have a much more limited selection of Y-DNA SNPs tested. They can be included but usually will not be displayed in the tree.
Please use the Y-DNA Warehouse to contribute the call files.
The Experimental Tree provides the capability to report several details about a kit. Search for the kit# of interest and click the button. A popup will appear with details on lab IDs, surname, origin, BAM coverage summary, and private mutation details.
The private variants table lists details of position, any known names, and scoring information. The GRCh38 Coord is the position recorded in the haplogroup-r.org database. The GRCh37 Coord is the more common location used by testing laboratories. Use this coordinate when requesting that they add new tests. Names represent any existing matches found in ybrowse.org. The Depth column reports the number of reads with the derived allele present. Variants are only reported when at least four reads carry the derived allele. The Likelihood column is a measure of how certain the genotype caller is that the variant exists. The report requires at least 90% (that is, no more than a 10% chance of being a false positive). The Source column indicates the genotype caller used. Finally, the combBED column indicates if the variant exists in the Big Y target regions. Future updates will report if the variant is contained in the Poznik "Gold Regions" instead.
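The two reporting thresholds described above (at least four derived reads and a likelihood of at least 90%) can be expressed as a simple filter. The following Python sketch is illustrative only: the PrivateVariant class, its field names and the sample coordinates are hypothetical and do not reflect the actual haplogroup-r.org schema or any real variant.

```python
from dataclasses import dataclass
from typing import List

MIN_DERIVED_READS = 4   # Depth threshold stated in the text
MIN_LIKELIHOOD = 0.90   # Likelihood threshold stated in the text


@dataclass
class PrivateVariant:
    """Hypothetical row of the private variants table."""
    grch38_coord: int
    depth: int          # reads carrying the derived allele
    likelihood: float   # genotype caller confidence, 0.0 to 1.0


def is_reportable(variant: PrivateVariant) -> bool:
    """Apply both reporting thresholds to one variant."""
    return (variant.depth >= MIN_DERIVED_READS
            and variant.likelihood >= MIN_LIKELIHOOD)


def reportable(variants: List[PrivateVariant]) -> List[PrivateVariant]:
    """Keep only the variants that would appear in the report."""
    return [v for v in variants if is_reportable(v)]


# Made-up example rows; the coordinates are placeholders, not real variants.
examples = [
    PrivateVariant(grch38_coord=3_000_001, depth=6, likelihood=0.99),
    PrivateVariant(grch38_coord=3_000_002, depth=3, likelihood=0.99),
    PrivateVariant(grch38_coord=3_000_003, depth=8, likelihood=0.85),
]
kept = reportable(examples)
```

Only the first example row passes both thresholds; the second has too few derived reads and the third falls below the 90% likelihood cutoff.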