Next Generation Sequencing Statistics

Because we prefer GRCh38-aligned results, a large number of raw test results have been collected. This offers a unique opportunity to showcase how each direct-to-consumer Next Generation Sequencing test performs relative to its vendor's marketing claims. All statistics come from GATK's CallableLoci tool, run with default settings after realignment. The tool reports as CALLABLE those locations with a minimum depth of 4 where fewer than 10% of the reads have a PHRED-scaled mapping quality below 10.
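The two criteria described above can be illustrated with a minimal Python sketch. It is not GATK's implementation: it looks only at depth and mapping quality for a single position, the function name and parameters are hypothetical, and the state labels are borrowed from the CallableLoci summary output. The real tool applies further checks (base quality, reference N bases, excessive coverage) that are omitted here.

```python
def classify_locus(mapqs, min_depth=4, min_mapq=10, max_low_mapq_fraction=0.10):
    """Classify one position from the mapping qualities of the reads covering it.

    Simplified assumption: only the depth and low-MAPQ criteria from the
    text are applied; this is not a reimplementation of GATK CallableLoci.
    """
    depth = len(mapqs)
    if depth == 0:
        return "NO_COVERAGE"
    if depth < min_depth:
        return "LOW_COVERAGE"
    # Count the reads whose PHRED-scaled mapping quality is below the cutoff.
    low_quality_reads = sum(1 for q in mapqs if q < min_mapq)
    if low_quality_reads / depth >= max_low_mapq_fraction:
        return "POOR_MAPPING_QUALITY"
    return "CALLABLE"


# Illustrative inputs only: mapping qualities of the reads at one position.
print(classify_locus([60, 60, 60, 60]))      # deep enough, all reads well mapped
print(classify_locus([60, 60, 60]))          # only three reads
print(classify_locus([0, 0, 60, 60, 60]))    # 40% of reads poorly mapped
```

Only positions that come back as CALLABLE contribute to the callable loci totals discussed in this section.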

The "Est. Years/SNP" calculation uses the 8.2E-10 SNP mutation rate from Adamov et al. (2015) and the CALLABLE coverage. The figure represents the average number of years between potential SNP branches; some branches will see more and others fewer. Including simple insertion or deletion events and small multi-nucleotide polymorphisms could reduce this interval for all test types.
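As a rough sketch of that calculation, the estimate is the reciprocal of the mutation rate multiplied by the number of callable loci. The code below assumes the rate is expressed per base pair per year, which the text implies but does not state; the function name and the 10 million loci input are hypothetical and are not taken from the table.

```python
SNP_RATE_PER_BP_PER_YEAR = 8.2e-10  # rate quoted from Adamov et al. (2015)


def est_years_per_snp(callable_loci, rate=SNP_RATE_PER_BP_PER_YEAR):
    """Average number of years between SNPs across the callable region."""
    if callable_loci <= 0:
        raise ValueError("callable_loci must be positive")
    # Expected SNPs per year over the region is rate * loci; invert for years per SNP.
    return 1.0 / (rate * callable_loci)


# Hypothetical test with 10 million callable loci: roughly 122 years per SNP.
print(round(est_years_per_snp(10_000_000)))
```

Because the estimate is inversely proportional to coverage, doubling the callable loci halves the expected interval between SNPs.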

Use this information in choosing the platform most capable of answering your genetic genealogy questions.

Mean NGS Callable Coverage by Test Type
Test Type | Callable Loci (Avg, CV) | combBED Loci (Avg, CV) | Poznik Loci (Avg, CV) | Total Loci (Avg, CV) | Samples (n) | Est. Years/SNP

* Using the vendor-provided CallableLoci report lifted over to GRCh38. The combBED region loci totals are not as accurate as those from the fully realigned versions.

Notes on CALLABLE and combBED Metrics

The combBED regions were originally published in Adamov et al. (2015). The combBED is the intersection of the Poznik et al. (2013) regions and the Big Y White Paper regions. The Poznik regions define sections of the Y chromosome where the aligner reported confident quality scores and depths within expected ranges. The Big Y regions define those targeted by FTDNA's NGS test. The intersection therefore serves as a simple filter for regions where sequences can be aligned with 90% confidence using 75-150 base reads and that are consistently reported in Big Y tests.
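The intersection of two region sets can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used to build the published combBED file: each set is assumed to be a sorted, non-overlapping list of BED-style half-open (start, end) intervals on the Y chromosome, and the coordinates are made up.

```python
def intersect_regions(a, b):
    """Intersect two sorted, non-overlapping lists of half-open (start, end) intervals."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            # The two current intervals overlap; keep the shared part.
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def total_loci(regions):
    """Number of bases covered by a list of half-open intervals."""
    return sum(end - start for start, end in regions)


# Made-up coordinates standing in for the two published region sets.
poznik_like = [(100, 200), (300, 400)]
big_y_like = [(150, 350)]
comb = intersect_regions(poznik_like, big_y_like)
print(comb, total_loci(comb))
```

Applying the same intersection between a sample's CALLABLE regions and the combBED regions would give the kind of per-sample combBED loci total described in this section.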

The combBED statistic is a valuable tool for anticipating how well other Next Generation Sequencing platforms can be compared directly with Big Y. Given the 14% coefficient of variation among Big Y tests themselves, it becomes clear that any direct-to-consumer platform with 6.7 million combBED loci can serve as a direct substitute. Data from the 1000 Genomes Project and the ancient whole genome sequences released over the last year are more difficult to compare because their callable coverage areas do not align as well.
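For reference, the coefficient of variation (CV) is the standard deviation divided by the mean. The sketch below uses the sample standard deviation, which is an assumption: the text does not say whether the sample or population form was used, and the loci counts are invented for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a fraction."""
    return stdev(values) / mean(values)


# Invented per-sample combBED loci counts, not values from the table.
loci_counts = [8_000_000, 10_000_000, 12_000_000]
print(f"CV = {coefficient_of_variation(loci_counts):.0%}")
```

A low CV means samples of the same test type cover a similar number of loci, so the average is a fair summary of what a single test can be expected to deliver.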

The Y DNA coverage metric for VeritasGenetics and FullGenomes Whole Genome Sequencing tests is underreported. At present, converting a full 30x WGS test takes over 96 hours. To add samples more efficiently, only the GRCh37 Y DNA segments are being realigned. Future iterations may correct this potential issue.