Over the past decade, genome-wide association studies (GWAS) have been pivotal for identifying genetic variants associated with traits and diseases. Genotyping arrays have been the foundation for many of these experiments and have revealed insights into single nucleotide polymorphisms (SNPs) found throughout the genome. However, the cost of genome sequencing has fallen significantly over the same period, which has led to growing use of low-pass whole genome sequencing (LP-WGS) in place of genotyping arrays. Furthermore, with technological advancements and a growing number of human reference genomes, it is now possible to screen millions of SNPs, compared with the hundreds of thousands found on traditional genotyping arrays, and without the downtime needed to develop new arrays for newly identified SNPs. This blog post covers the advantages that SkimSEEK™ brings to human genomics applications and presents cited examples of the benefits of this revolutionary technology.
Neogen®’s SkimSEEK is a cost-effective alternative to traditional high-depth sequencing methods. With SkimSEEK, researchers can generate high-quality genomic data at a fraction of the cost of conventional high-coverage sequencing. SkimSEEK leverages LP-WGS, meaning that the genome is sequenced to a low depth and the data are then imputed to predict genotypes that are not directly observed in a sample. When a sample sequenced at low depth is aligned to a genomic reference assembly, gaps remain between the aligned reads, so some SNPs of interest are not directly observed in the raw sequencing data. Still, we can impute those SNPs with up to 99% accuracy using Gencove’s imputation pipeline. In addition, SkimSEEK delivers adapter-trimmed FASTQ files and a full public VCF of approximately 60 million SNPs for each sample.
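A simple idealized model shows why low-pass data leave gaps that imputation must fill. This model is an assumption for illustration and does not describe SkimSEEK's pipeline. If reads land uniformly at random, the depth at any position is approximately Poisson-distributed, with a mean equal to the average coverage. The expected fraction of positions that receive no reads at all is therefore e^(-depth). The short Python sketch below applies that model; the function name is hypothetical.

```python
import math


def fraction_uncovered(mean_depth: float) -> float:
    """Expected fraction of genome positions with zero reads.

    Assumes the idealized Lander-Waterman / Poisson coverage model, in which
    per-base depth ~ Poisson(mean_depth). Hypothetical helper for illustration.
    """
    return math.exp(-mean_depth)


# Compare ultra-low-pass, low-pass and high-depth sequencing.
for depth in (0.4, 0.5, 1.0, 30.0):
    print(f"{depth:>5}x: {fraction_uncovered(depth):.1%} of positions have no reads")
```

Under this model, roughly two-thirds of positions receive no reads at 0.4x. Low-pass genotypes therefore rely on imputation from a haplotype reference panel rather than on direct observation alone. Real coverage is less uniform than the Poisson model assumes.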
SkimSEEK offers significant advantages over genotyping arrays for large-scale projects, such as population genetics studies, where thousands of genomes may need to be sequenced. Whole genome sequence data make it possible to identify causal mutations, and these variants can be used to improve the reliability of genomic prediction. Array technology generally cannot detect rare and low-frequency variants (RLFVs), which contribute to genetic variance. This matters because functional variants are more likely to be rare than common. While arrays have traditionally been less expensive than sequencing, newer sequencing platforms and innovations continue to make sequencing more affordable.
SkimSEEK excels at capturing genomic diversity for GWAS applications because of the haplotype diversity in its imputation reference panel. This allows our customers to identify genetic variants that are rare in some populations but common in others. An example can be found in the literature. In their 2021 research article, Li et al. compared low-pass sequencing and imputation, defined as sequencing a genome to an average depth of less than 1x, with array genotyping using the Illumina Global Screening Array (GSA). The comparison used 120 DNA samples from African- and European-ancestry individuals in the 1000 Genomes Project. The authors observed that genotypes imputed from sequence data were consistently and considerably more accurate than genotypes imputed from array data, with mean non-reference concordance in African samples 7% higher for sequencing data. They concluded that low-pass whole genome sequencing provides better coverage of the genome than the Illumina GSA array. This allows the detection of rare variants and reduces the impact of genotyping errors, improving the accuracy of GWAS and polygenic risk scores.
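Li et al. report accuracy as non-reference concordance. This metric is typically computed only over sites where at least one of the two genotype sets carries an alternate allele. Counting the many sites that are homozygous-reference in both sets would otherwise inflate the score. The Python sketch below assumes that common definition; the paper's exact filters may differ, and the function name and example genotypes are hypothetical.

```python
from typing import Sequence


def non_reference_concordance(truth: Sequence[int], called: Sequence[int]) -> float:
    """Fraction of matching genotypes among non-reference sites.

    Genotypes are coded as alternate-allele counts (0, 1 or 2). A site is
    included only if the truth or the called genotype is non-reference, so
    sites that are homozygous-reference in both are ignored.
    Hypothetical helper for illustration.
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")
    informative = [(t, c) for t, c in zip(truth, called) if t != 0 or c != 0]
    if not informative:
        raise ValueError("no non-reference sites to compare")
    matches = sum(1 for t, c in informative if t == c)
    return matches / len(informative)


# Hypothetical genotypes at eight sites for one sample.
truth = [0, 0, 1, 2, 1, 0, 2, 0]
called = [0, 0, 1, 2, 0, 1, 2, 0]
print(non_reference_concordance(truth, called))
```

In this example, plain concordance over all eight sites is 6/8 = 0.75. Non-reference concordance over the five informative sites is 3/5 = 0.6, which is why the non-reference version is the stricter measure.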
A similar, recently published article noted that even ultra-low-coverage whole genome sequencing (ulcWGS; <0.5x) generated highly accurate GWAS data. Chat et al. performed whole genome sequencing of 72 European individuals to a target coverage of 0.4x. They compared its performance with the Infinium Global Screening Multi-Disease Array (GSA-MD). The number of low-frequency and common variants captured was similar to that of the imputed GSA-MD, with high imputation R² accuracy (mean of 0.93 for SNPs and 0.86 for indels). Using 30x whole genome sequencing as a “truth” dataset, the authors observed that ulcWGS had higher overall non-reference genotype concordance than imputed GSA-MD for both SNPs and indels. The authors concluded that LP-WGS is an attractive alternative to arrays when planning and designing GWAS experiments.
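Imputation R² is a per-variant accuracy summary. When a truth set such as 30x WGS is available, one common way to compute it is as the squared Pearson correlation between the true genotypes (alternate-allele counts) and the imputed dosages across samples. The per-variant values are then averaged over variants. The sketch below assumes that definition; Chat et al.'s exact procedure may differ. Imputation tools also report their own estimated R² without a truth set. The function names and example values are hypothetical.

```python
from typing import List, Sequence, Tuple


def dosage_r2(true_gt: Sequence[float], dosage: Sequence[float]) -> float:
    """Squared Pearson correlation between true genotypes (0/1/2) and
    imputed dosages for one variant across samples. Hypothetical helper."""
    n = len(true_gt)
    if n != len(dosage) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_t = sum(true_gt) / n
    mean_d = sum(dosage) / n
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(true_gt, dosage))
    s_tt = sum((t - mean_t) ** 2 for t in true_gt)
    s_dd = sum((d - mean_d) ** 2 for d in dosage)
    if s_tt == 0 or s_dd == 0:
        # Correlation is undefined for a monomorphic variant or constant dosages.
        raise ValueError("r^2 is undefined when either input has no variance")
    return (s_td * s_td) / (s_tt * s_dd)


def mean_r2(variants: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average per-variant r^2 over (truth, dosage) pairs. Hypothetical helper."""
    if not variants:
        raise ValueError("no variants supplied")
    scores = [dosage_r2(t, d) for t, d in variants]
    return sum(scores) / len(scores)


# Hypothetical truth genotypes and imputed dosages for two variants, six samples.
variant_a = ([0, 1, 2, 1, 0, 2], [0.1, 0.9, 1.8, 1.2, 0.0, 1.9])
variant_b = ([0, 0, 1, 0, 1, 2], [0.0, 0.2, 0.8, 0.1, 1.1, 1.7])
print(round(mean_r2([variant_a, variant_b]), 3))
```

Because r² ignores the sign of the correlation, it summarizes how well dosages track the truth rather than whether individual calls match exactly. It complements concordance rather than replacing it.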
As next-generation sequencing continues to evolve, Neogen’s goal is to stay at the forefront of technological innovation and provide state-of-the-art products to our customers. While Neogen has a long-standing reputation in the agricultural industry, we are excited to expand our product portfolio to human genomics with SkimSEEK. SkimSEEK is a powerful tool that will empower our research customers to explore deeper into the genome than ever before, with the quality of service Neogen is known for. If you are interested in learning more about SkimSEEK and how it can accelerate your research, please get in touch with us by email or at 877.443.6489.
Li, J. H., Mazur, C. A., Berisa, T. & Pickrell, J. K. "Low-pass sequencing increases the power of GWAS and decreases measurement error of polygenic risk scores compared to genotyping arrays." Genome Res. 31, 529–537 (2021).
Chat, V., Ferguson, R., Morales, L. & Kirchhoff, T. "Ultra Low-Coverage Whole-Genome Sequencing as an Alternative to Genotyping Arrays in Genome-Wide Association Studies." Front. Genet. 12, 790445 (2022).