How low can you go?

A common project design question is “how much information do I need?” The usual response is “as much as possible!” But that answer is perhaps informed as much by tradition as by actual need. Information also has several dimensions, for example the number of loci assayed and the read depth at each locus.
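To make the read-depth dimension concrete, here is a minimal sketch of how often a heterozygous site in a diploid would actually show both alleles at a given depth. It assumes each read comes from either allele with equal probability and ignores sequencing error and mapping bias; the function name `prob_both_alleles_seen` is made up for illustration.

```python
def prob_both_alleles_seen(depth: int) -> float:
    """Chance that `depth` reads at a heterozygous diploid site include both alleles.

    Assumes each read samples one of the two alleles with probability 0.5
    and that there are no sequencing errors (a simplification).
    """
    if depth < 2:
        return 0.0  # a single read can only ever show one allele
    # All reads match allele 1 OR all match allele 2: 2 * 0.5**depth
    return 1.0 - 2.0 * 0.5 ** depth


if __name__ == "__main__":
    for depth in (1, 2, 4, 8, 16):
        p = prob_both_alleles_seen(depth)
        print(f"depth {depth:>2}: P(both alleles observed) = {p:.4f}")
```

Under these assumptions, very light coverage mostly reports a single allele per site, which is why low-depth designs tend to treat each locus as a sample of one chromosome rather than a full genotype.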

A little history of genotyping by sequencing

The early genotyping-by-sequencing methods tried to replicate the standards of the time as closely as possible. Fixed-content genotyping arrays ruled, and because these delivered high-quality calls for heterozygous genotypes, next-gen sequencing methods tried to emulate that type of data. The GBS method broke away from this by making it standard to sequence many loci lightly and infer the nearby genotypes.

Bears and bears and bears, oh my!

So researchers have become more comfortable with assaying many markers at low depth, essentially getting high-quality data from just one of the two homologous chromosomes (in a diploid). A great example of this, and an interesting read, is:

Genomic Evidence for Island Population Conversion Resolves Conflicting Theories of Polar Bear Evolution

from Beth Shapiro’s group at UCSC. After light sequencing of polar, brown and black bears, the data were downsampled to keep only one allele at each locus, even if two alleles were present. The authors were then able to apply informative population statistics to the downsampled data, such as assessing genetic diversity (spoiler: polar bears aren’t very diverse), quantifying admixture using the D-statistic, and using the data for simulations of gene flow.
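The two ideas in that description, keeping one allele per locus and computing a D-statistic, can be sketched in a few lines. This is a toy illustration and not the paper's pipeline: the function names, the input layout (one list of calls per sample, aligned by site) and the example alleles are all assumptions. It also leaves out significance testing, which in practice is usually done with a block jackknife.

```python
import random


def sample_one_allele(read_alleles, rng):
    """Pick one observed read allele at random (a pseudo-haploid call).

    `read_alleles` is a list of bases seen in reads at one site;
    returns None when the site has no coverage.
    """
    return rng.choice(read_alleles) if read_alleles else None


def d_statistic(p1, p2, p3, outgroup):
    """D = (ABBA - BABA) / (ABBA + BABA) over aligned pseudo-haploid calls.

    The outgroup allele is treated as ancestral (A); P3 must carry the
    derived allele (B) for a site to be informative.
    """
    abba = baba = 0
    for a1, a2, a3, ao in zip(p1, p2, p3, outgroup):
        if None in (a1, a2, a3, ao):
            continue  # skip sites missing in any sample
        if a3 == ao:
            continue  # P3 is ancestral: uninformative
        if a1 == ao and a2 == a3:
            abba += 1  # P2 shares the derived allele with P3
        elif a2 == ao and a1 == a3:
            baba += 1  # P1 shares the derived allele with P3
    total = abba + baba
    return (abba - baba) / total if total else float("nan")


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the toy run is repeatable
    # Hypothetical read alleles at four sites for one lightly sequenced sample
    reads = [["A"], ["A", "G"], [], ["G", "G", "A"]]
    print([sample_one_allele(site, rng) for site in reads])

    # Hypothetical pseudo-haploid calls for four samples at four sites
    print(d_statistic(list("AAGA"), list("GGAA"), list("GGGA"), list("AAAA")))
```

Because only one allele is kept per sample per site, the same counting works whether a site was covered by one read or several, which is what makes this kind of statistic a good fit for low-depth data.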

A lot from a little

The paradigm of getting a little information about a lot of loci is a useful one. Sometimes input DNA amounts are scarce, or the DNA is damaged and low quality. These issues can prevent the creation of a fully complex sequencing library. But “scans” of the genome like the paper above are still possible, and can be incredibly useful for providing new insights into long-studied populations of ecological, environmental and evolutionary importance.