Next Generation Sequencing Statistics

Because we prefer GRCh38-aligned results, a large number of raw test results have been collected. This offers a unique opportunity to showcase how each direct-to-consumer Next Generation Sequencing test performs beyond the vendors' marketing claims. All statistics are produced by GATK's CallableLoci tool with default settings after realignment. The tool reports a location as callable when it has a minimum depth of 4 and fewer than 10% of its reads have a PHRED-scaled mapping quality below 10.
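As a rough illustration of the two criteria just described, the Python sketch below classifies a single position from the mapping qualities of the reads that cover it. It simplifies the description in this text and is not GATK's implementation; the function name and its inputs are hypothetical.

```python
def is_callable(mapqs, min_depth=4, min_mapq=10, max_low_mapq_fraction=0.10):
    """Return True if a position meets the two criteria described in the text.

    mapqs: PHRED-scaled mapping qualities of the reads covering the position.
    The thresholds mirror the numbers quoted above (depth 4, 10%, MAPQ 10).
    """
    depth = len(mapqs)
    if depth < min_depth:
        return False  # too few reads cover this position
    low = sum(1 for q in mapqs if q < min_mapq)
    # Callable only if the share of poorly mapped reads stays under the limit.
    return low / depth < max_low_mapq_fraction
```

For example, a position covered by 20 reads of which one has a mapping quality of 0 passes (5% low-quality reads), while a position covered by only three reads fails regardless of their quality.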

The "Est. Years/SNP" calculation uses the SNP mutation rate of 8.2E-10 (per base per year) from Adamov et al. (2015) and the CALLABLE coverage. The figure represents the average number of years between potential SNP branches; some branches will see more and others fewer. Including simple insertion or deletion events and small multi-nucleotide polymorphisms could reduce this interval for all test types.
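The figures in the table below are consistent with taking the reciprocal of the mutation rate multiplied by the mean callable loci. The exact formula is not stated in the text, so the following Python sketch is an inference from the published numbers, and the function name is hypothetical.

```python
# SNP mutation rate quoted from Adamov et al. (2015), assumed per base per year.
SNP_RATE = 8.2e-10

def est_years_per_snp(callable_loci, rate=SNP_RATE):
    """Average years between SNPs, assuming years = 1 / (rate * callable loci)."""
    return 1.0 / (rate * callable_loci)

# Mean callable loci for Big Y and 1KG, taken from the table.
for name, loci in [("Big Y", 9_318_380), ("1KG", 6_349_638)]:
    print(f"{name}: {est_years_per_snp(loci):.2f} years/SNP")
```

Under this assumption, the Big Y and 1KG inputs come out within rounding of the 130.87 and 192.06 years shown in the table.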

Being armed with this data can help you make an informed decision when evaluating your options, if NGS testing is in your future. The ultimate decision is made more complex by how well you can compare your results with others locked in vendors' proprietary databases. The only real certainty here is that the value judgement changes rapidly.

Mean NGS Callable Coverage by Test Type
| Test Type | Callable Loci | combBED Loci | Total Loci | Samples (n) | Est. Years/SNP |
|---|---|---|---|---|---|
| 1KG | 6,349,638 (48.197% CV) | 3,457,028 (50.394% CV) | 21,805,241 (10.526% CV) | 776 | 192.06 |
| Big Y | 9,318,380 (4.768% CV) | 7,684,996 (3.876% CV) | 16,412,164 (4.462% CV) | 586 | 130.87 |
| WGS 15x-FGC | 11,168,783 (0.000% CV) | 5,955,495 (0.000% CV) | 22,744,367 (0.000% CV) | 1 | 109.19 |
| WGS 20x-FGC | 13,811,633 (0.000% CV) | 7,352,261 (0.000% CV) | 23,246,415 (0.000% CV) | 1 | 88.30 |
| WGS 30x-FGC | 13,813,860 (6.043% CV) | 7,325,530 (5.810% CV) | 22,991,641 (0.583% CV) | 8 | 88.28 |
| WGS-Ancient | 4,232,889 (99.059% CV) | 2,708,633 (99.890% CV) | 15,470,848 (51.527% CV) | 2 | 288.10 |
| WGS-LR Chromium | 17,998,203 (0.000% CV) | 7,792,979 (0.000% CV) | 22,940,547 (0.000% CV) | 1 | 67.76 |
| WGS-Other | 14,911,039 (0.121% CV) | 6,654,911 (13.145% CV) | 23,389,332 (0.173% CV) | 2 | 81.79 |
| WGS-Veritas | 14,923,228 (4.953% CV) | 7,986,530 (6.907% CV) | 23,581,587 (0.567% CV) | 61 | 81.72 |
| Y Elite | 15,313,381 (12.918% CV) | 7,731,569 (5.400% CV) | 21,766,426 (7.678% CV) | 3 | 79.64 |
| Y Elite 1.0 | 13,997,228 (4.641% CV) | 7,911,702 (6.175% CV) | 23,344,928 (2.290% CV) | 32 | 87.13 |
| Y Elite 2.0 | 14,270,887 (3.165% CV) | 7,653,058 (6.797% CV) | 22,445,261 (3.381% CV) | 19 | 85.45 |
| Y Elite 2.1 | 13,758,146 (3.458% CV) | 7,774,999 (1.676% CV) | 20,747,691 (2.022% CV) | 4 | 88.64 |
| Y Elite 2.1a | 13,895,496 (0.000% CV) | 7,774,910 (0.000% CV) | 20,965,083 (0.000% CV) | 1 | 87.76 |
| Y Elite 2.1b | 13,238,298 (2.090% CV) | 7,516,391 (1.562% CV) | 20,670,686 (0.775% CV) | 5 | 92.12 |
| Y Elite* | 13,935,448 (3.906% CV) | 6,926,796 (15.546% CV) | 21,194,731 (6.534% CV) | 29 | 87.51 |

* Using the vendor-provided CallableLoci report lifted over to GRCh38. The combBED region loci totals are not as accurate as those from the fully realigned versions.
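Each cell in the table pairs a mean with its coefficient of variation (CV), the standard deviation expressed as a percentage of the mean. The Python sketch below shows one way to derive such a pair from per-sample loci counts. It assumes a population standard deviation, which yields the 0.000% shown for single-sample rows; the text does not state which estimator was actually used, and the function name is hypothetical.

```python
from statistics import mean, pstdev

def mean_and_cv(values):
    """Return (mean, CV in percent) for a list of per-sample loci counts.

    Uses the population standard deviation (an assumption), so a single
    sample gives a CV of 0 rather than an undefined value.
    """
    m = mean(values)
    cv = 100.0 * pstdev(values) / m
    return m, cv
```

With two illustrative counts of 8 and 12, the mean is 10 and the CV is 20%.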

Notes on CALLABLE and combBED Metrics

The combBED regions were originally published in Adamov et al. (2015). The combBED is the intersection of the Poznik et al. (2013) regions and the Big Y White Paper regions. The Poznik regions define sections of the Y chromosome where the aligner reported confident quality scores and depths within expected ranges. The Big Y regions define those targeted by FTDNA's NGS test. The intersection serves as a simple filter for regions where sequences can be aligned with 90% confidence using 75-150 base reads and are consistently reported in Big Y tests.
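The intersection of two region sets can be sketched in Python as follows. This is a minimal illustration, assuming each set is a list of half-open (start, end) intervals on the Y chromosome, as in BED files, with no overlaps within a set. The coordinates in the usage note are made up and are not the actual Poznik or Big Y regions.

```python
def intersect_regions(a, b):
    """Intersect two lists of half-open (start, end) intervals.

    Each input is assumed to contain no overlapping intervals of its own.
    """
    a, b = sorted(a), sorted(b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))  # the two intervals overlap here
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def total_loci(regions):
    """Count the bases covered by a list of half-open intervals."""
    return sum(end - start for start, end in regions)
```

For instance, intersecting the hypothetical sets [(1, 5), (8, 12)] and [(3, 9)] gives [(3, 5), (8, 9)], which covers 3 loci.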

The combBED statistic is a valuable tool for anticipating how well other Next Generation Sequencing platforms can be used for direct comparison with Big Y. Given the 6% coefficient of variation among Big Y tests themselves, it becomes clear that any direct-to-consumer platform with about 7 million combBED loci can serve as a direct substitute. The data from the 1000 Genomes Project and the ancient whole genome sequences released over the last year are more difficult to compare, as the callable coverage areas do not align as well.

The Y DNA coverage metric for VeritasGenetics and FullGenomes Whole Genome Sequencing tests is underreported. At present, converting a full 30x WGS test takes over 96 hours. To add samples more efficiently, only the GRCh37 Y DNA segments are being realigned. Future iterations may correct this potential issue.