Models, Inference and Algorithms
Broad Institute of MIT and Harvard
Primer: A deep learning approach to structural variant discovery
Chris Rohlicek
Popic Lab, Broad Institute
Victoria Popic
Broad Institute
Structural variants (SVs) are the greatest source of genetic diversity in the human genome and play a pivotal role in diseases such as Alzheimer’s, autism, autoimmune and cardiovascular disorders, and cancer. Breakthroughs in whole-genome sequencing, especially the advent of long-read technologies, have enabled significant progress in method development for SV detection. Current state-of-the-art approaches extract hand-crafted features from the data and employ expert-driven statistical modeling or heuristics to predict different SV classes. However, manual engineering of SV-informative features and models is challenging given the multi-dimensionality of the sequencing data and the diversity of SV types, sizes, and sequencing platforms. As a result, general SV discovery remains an open problem. In this primer talk, we will describe the problem of SV detection and its current challenges, motivate the need for extensible and generalizable methods to improve SV calling and genotyping, and introduce our formulation of SV detection as a task that can be effectively solved with deep learning. In particular, we show how SV detection can be reduced to a keypoint localization task in images constructed from sequence alignments, and we review model architectures suited for this task.
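To make the keypoint formulation concrete, here is a minimal sketch of the decoding step only: given a score heatmap that a model might produce over an image, pick out local maxima and map them back to genome coordinates. This is not the talk's implementation. The function names, the threshold, and the convention that a pixel's row and column index the bins of the two breakpoints are all assumptions made for illustration.

```python
def find_keypoints(heatmap, threshold=0.5):
    """Return (row, col, score) for strict local maxima scoring >= threshold."""
    n_rows, n_cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(n_rows):
        for c in range(n_cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Collect the values of the up-to-8 neighbouring pixels.
            neighbors = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(n_cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(score > v for v in neighbors):
                peaks.append((r, c, score))
    return peaks


def keypoint_to_interval(row, col, region_start, bin_size):
    """Map a pixel to a genome interval.

    Assumption (hypothetical layout): the row indexes the bin of the left
    breakpoint and the column indexes the bin of the right breakpoint.
    """
    return region_start + row * bin_size, region_start + col * bin_size


# Toy 4x4 heatmap with a single confident keypoint at pixel (1, 2).
heatmap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.2, 0.9, 0.1],
    [0.0, 0.1, 0.2, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
peaks = find_keypoints(heatmap)
intervals = [
    keypoint_to_interval(r, c, region_start=1_000_000, bin_size=100)
    for r, c, _ in peaks
]
print(peaks, intervals)
```

In a real pipeline the heatmap would come from a trained network, and there would typically be one heatmap per SV type and genotype class; the toy array above only stands in for that output.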
Meeting: Cue: A framework for cross-platform structural variant calling and genotyping with deep learning
We introduce Cue, a framework designed to call and genotype structural variants (SVs), including complex and subclonal SVs, using data from a range of sequencing platforms. At a high level, Cue first converts sequence alignments into multi-channel images that capture platform-specific read alignment signals and then detects SV breakpoints (categorized by type and genotype) directly in these images using a stacked hourglass network. In this talk we will provide an overview of the framework and present the latest results in the detection of five common SV types (deletions, tandem duplications, inversions, inverted duplications, and inversions flanked by deletions; the latter two are examples of complex SVs, which have been linked to several genomic disorders) from short-read, linked-read, and long-read data. We will also discuss the challenges of generating training data and benchmarking in the absence of comprehensive ground-truth SV callsets.
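As a rough illustration of the first step, the sketch below bins read-pair alignments into a small two-channel image. It is not Cue's code. The input tuple layout, the choice of channels (all pairs, and pairs flagged as discordant), and the function name are hypothetical, chosen only to show what turning alignments into a multi-channel image can look like.

```python
def alignments_to_image(pairs, region_start, bin_size, n_bins):
    """Build image[channel][row][col] from (left_pos, right_pos, concordant) tuples.

    Hypothetical channels:
      0: count of all read pairs linking bin `row` to bin `col`
      1: count of those pairs flagged as discordant
    """
    image = [[[0] * n_bins for _ in range(n_bins)] for _ in range(2)]
    for left, right, concordant in pairs:
        i = (left - region_start) // bin_size
        j = (right - region_start) // bin_size
        # Skip pairs with either end outside the imaged region.
        if not (0 <= i < n_bins and 0 <= j < n_bins):
            continue
        image[0][i][j] += 1
        if not concordant:
            image[1][i][j] += 1
    return image


# Toy input: one concordant pair, two discordant pairs spanning the same
# pair of bins, and one pair that falls outside the region.
pairs = [
    (1_000_010, 1_000_050, True),
    (1_000_120, 1_000_310, False),
    (1_000_130, 1_000_320, False),
    (1_002_000, 1_002_100, True),
]
image = alignments_to_image(pairs, region_start=1_000_000, bin_size=100, n_bins=4)
print(image[1][1][3])  # discordant pairs linking bin 1 to bin 3
```

The two discordant pairs pile up in a single off-diagonal pixel, which is the kind of localized signal a keypoint detector such as a stacked hourglass network can be trained to find. Platform-specific signals for linked or long reads would need their own channels.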
Chapters:
00:00 MIA Introduction
01:52 MIA Primer
49:32 MIA Meeting
For more information visit: [link]
Copyright Broad Institute, 2022. All rights reserved.