Next Generation Sequencing Statistics

This page is moving to the Y-DNA Warehouse. Please update your links to the new home.

Mean NGS Callable Coverage by Test Type
Test Type | Callable Loci (Avg, CV) | combBED Loci (Avg, CV) | Poznik Loci (Avg, CV) | Total Loci (Avg, CV) | Samples (n) | Est. Years/SNP | Histogram
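The Avg and CV columns are presumably the mean number of loci per test type and the coefficient of variation (standard deviation divided by the mean) across that test type's samples. The page does not say how they are computed. This sketch assumes the sample standard deviation, and the helper `summarize_coverage`, the test names, and the values are hypothetical toy inputs.

```python
import statistics
from collections import defaultdict


def summarize_coverage(samples):
    """Group (test_type, loci) pairs and return {test_type: (avg, cv, n)}.

    Hypothetical helper: CV is taken as sample stdev / mean, and is NaN when
    it is undefined (fewer than two samples, or a mean of zero).
    """
    by_test = defaultdict(list)
    for test_type, loci in samples:
        by_test[test_type].append(loci)

    summary = {}
    for test_type, values in by_test.items():
        avg = statistics.mean(values)
        if len(values) > 1 and avg != 0:
            cv = statistics.stdev(values) / avg
        else:
            cv = float("nan")
        summary[test_type] = (avg, cv, len(values))
    return summary


# Toy data, not real coverage figures.
toy = [("Test A", 10), ("Test A", 14), ("Test B", 20)]
for name, (avg, cv, n) in summarize_coverage(toy).items():
    print(name, avg, round(cv, 4), n)
```

A high CV for a test type means its samples vary widely in how much of the chromosome they cover, which is the signal discussed for Chromium LR below.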

Explanation of Metrics

The coverage table is produced from BWA-MEM-aligned BAMs on the GRCh38 reference genome. The BAMs are processed by GATK's CallableLoci tool with default settings. For a location to be considered callable, at least four reads must overlap the site, and no more than ten percent of those reads may have a PHRED-scaled mapping quality (MAPQ) below 10. At the minimum depth of four, this means every read must meet that threshold. Because a MAPQ of 10 corresponds to a 10% chance that a read is misaligned, the result is a heuristic combined quality: treating the reads as independent, there is at most a 0.1^4 = 0.01% chance that all of them are aligned incorrectly to the site.
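The callable rule and the combined-quality arithmetic above can be sketched as follows. This mirrors the thresholds as described in the paragraph, not GATK's own implementation (CallableLoci has further options, such as base-quality filtering). The helper names are hypothetical.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a PHRED-scaled quality to an error probability: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def is_callable(mapqs, min_depth=4, low_mapq=10, max_low_fraction=0.10):
    """Simplified callable test (hypothetical helper).

    Requires at least `min_depth` overlapping reads, with no more than
    `max_low_fraction` of them having MAPQ below `low_mapq`.
    """
    depth = len(mapqs)
    if depth < min_depth:
        return False
    low = sum(1 for q in mapqs if q < low_mapq)
    return low / depth <= max_low_fraction


def prob_all_misaligned(mapqs) -> float:
    """Chance that every read is misaligned, assuming independent errors."""
    return math.prod(phred_to_error(q) for q in mapqs)


reads = [10, 10, 10, 10]  # four reads that just meet MAPQ 10
print(is_callable(reads), prob_all_misaligned(reads))
```

With four reads at MAPQ 10 the site passes, and the combined misalignment chance comes out at about 1e-4 (0.01%). Any read with higher MAPQ only lowers that figure, which is why 0.01% acts as an upper bound.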

The combBED regions were originally published in Adamov et al. (2015). The combBED is the intersection of the Poznik et al. (2013) regions and the Big Y White Paper regions. The Poznik regions define sections of the Y chromosome where the aligner reported confident quality scores and depths within expected ranges. The Big Y regions define those targeted by FTDNA's NGS test. The intersection therefore serves as a simple filter: it keeps only sequence that can be aligned with 90% confidence using 75-150 base reads and that is consistently reported in Big Y tests.
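Taking the intersection of two region sets is a two-pointer sweep over sorted intervals. The sketch below assumes both sets are on chrY, sorted, non-overlapping within themselves, and 0-based half-open as in BED files. In practice a dedicated tool such as `bedtools intersect` would normally be used. The function names and coordinates are hypothetical.

```python
from typing import List, Tuple

# (start, end) on chrY, 0-based half-open as in BED files
Interval = Tuple[int, int]


def intersect_regions(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the regions covered by both `a` and `b`.

    Each input must be sorted by start and free of internal overlaps.
    """
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # non-empty overlap
            out.append((start, end))
        # Advance whichever interval ends first; it cannot overlap
        # any later interval in the other list.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def total_bases(regions: List[Interval]) -> int:
    """Sum the lengths of half-open intervals."""
    return sum(end - start for start, end in regions)


# Toy coordinates, not the published region sets.
poznik_like = [(100, 500), (800, 1200)]
bigy_like = [(300, 900), (1000, 1100)]
comb = intersect_regions(poznik_like, bigy_like)
print(comb, total_bases(comb))
```

Only bases present in both inputs survive, which is what makes the intersection a conservative filter.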

Special Note on Chromium LR

This tool has revealed a large coefficient of variation in 10x Genomics Chromium genome sequencing. The variation is due to poor sample quality and highlights the importance of following collection protocols and screening sample quality. While several examples suggest that saliva samples are viable on this technology, just as many show disappointing results. It appears that blood samples are needed to realize the full potential of this test and to mitigate these risks, unless additional quality gates are put in place before sequencing.

The vendor states that it invested in additional sequencing, at no charge, for the customers affected by this issue. However, at least one sample was so badly contaminated that a completely new run would be required. When analyzed with a minimum of two reads for calling instead of four, this sample could be compared with the other men in his cluster, but with lower specificity.
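One way to quantify the cost of relaxing the minimum depth is to reuse the combined-quality heuristic from the explanation of metrics. This sketch assumes every read just meets MAPQ 10 and that alignment errors are independent. The function name is hypothetical.

```python
def all_misaligned_bound(min_depth: int, mapq: int = 10) -> float:
    """Upper bound on the chance that every read at a site is misaligned,
    assuming each read just meets `mapq` and errors are independent."""
    per_read_error = 10 ** (-mapq / 10)  # PHRED: Q10 -> 0.1
    return per_read_error ** min_depth


for depth in (4, 2):
    print(f"minimum depth {depth}: {all_misaligned_bound(depth):.4%}")
```

Dropping the minimum from four reads to two raises this bound from 0.01% to 1%, a hundredfold increase. That is one way to see why calls from the contaminated sample carry less weight.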