Application of linked-read sequencing to identify germline predisposition variants in families

Billy Lau, Stephanie Greer, John Bell

Cancer is a disease of the genome caused by genetic aberrations such as point mutations and rearrangements. To identify these aberrations, cancer studies rely on DNA sequencing of short reads from highly fragmented DNA molecules that are typically less than 500 bases long. However, genomic structural alterations, and the broader context in which genetic aberrations occur, span much larger distances and cannot be effectively analyzed with such short reads. To characterize these alterations completely, the contiguous relationship of variants on a given chromosome homologue must be determined; this process is referred to as genome phasing. In our research group, we use a new technology called linked-read sequencing, which uses barcoded reads to trace individual sequence reads back to their input high-molecular-weight DNA molecule at single-molecule resolution (Zheng, Lau, et al. Nat. Biotechnol. 2016). We then reconstruct, at megabase scale, complex somatic genomic events on the original paternal or maternal chromosome homologues, gaining new insight into previously undiscovered details of cancer genome alteration.
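Barcoded reads only become long-range information once reads sharing a barcode are grouped back into the molecules they came from. The sketch below illustrates one simple way to do this: reads with the same barcode on the same chromosome are chained together until a large positional gap suggests a different source molecule. The `Read` and `Molecule` types, the 50 kb gap threshold, and the minimum read count are illustrative assumptions, not parameters taken from the cited work.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    barcode: str  # hypothetical: barcode already error-corrected
    chrom: str
    pos: int      # mapped position of the read


class Molecule(NamedTuple):
    barcode: str
    chrom: str
    start: int
    end: int
    n_reads: int


def infer_molecules(reads: Iterable[Read], max_gap: int = 50_000,
                    min_reads: int = 2) -> List[Molecule]:
    """Group reads that share a barcode into putative input molecules.

    Consecutive reads (sorted by position) with the same barcode and
    chromosome are chained while they lie within max_gap bases of each
    other; a larger gap starts a new molecule. Groups with fewer than
    min_reads reads are discarded as too weakly supported.
    """
    by_key = defaultdict(list)
    for r in reads:
        by_key[(r.barcode, r.chrom)].append(r.pos)

    molecules = []
    for (bc, chrom), positions in by_key.items():
        positions.sort()
        start = prev = positions[0]
        count = 1
        for p in positions[1:]:
            if p - prev > max_gap:
                # Gap too large: close the current molecule, start a new one.
                if count >= min_reads:
                    molecules.append(Molecule(bc, chrom, start, prev, count))
                start, count = p, 0
            count += 1
            prev = p
        if count >= min_reads:
            molecules.append(Molecule(bc, chrom, start, prev, count))
    return sorted(molecules, key=lambda m: (m.chrom, m.start, m.barcode))
```

Heterozygous alleles observed on the same inferred molecule can then be treated as coming from the same chromosome homologue, which is the basic signal that phasing builds on.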

Linked-read sequencing enables the detection of structural variants by identifying reads that map to disparate regions of the genome but share the same barcode. Compared with structural variant calls from conventional whole-genome sequencing, linked reads can provide stronger and more specific signals. We can further confirm the presence of structural variants with targeted sequencing methods. To confirm calls made with this technology, we used OS-Seq technology (Myllykangas, Buenrostro, et al. Nat. Biotechnol. 2011, and Hopmans, et al. Nucleic Acids Res. 2014) to program an Illumina flowcell to capture candidate structural variant (deletion) locations in a genome, together with their downstream sequence, via primer extension.

These ribbon plots show the positions of sequencing reads mapped to the breakpoints of a deletion on chromosome 3 of NA12878 that was found by linked-read sequencing. The left plot shows reads mapped to the left breakpoint: red represents probes mapping to the 5’ end of the breakpoint (coordinates at the bottom of the plot), and blue represents probes mapping to the 3’ end of the breakpoint (coordinates at the top of the plot). The right plot shows reads mapped to the right breakpoint. The y-axis indicates the index of each read. Because the deletion is heterozygous, red reads in the left plot come from the wild-type allele, and blue reads in the left plot come from the deleted haplotype. Because one read of each such pair is anchored outside the breakpoint, at a distance much larger than the insert sizes that can typically be sequenced, these reads confirm the presence of a deletion event. (LinkedRead_1-2.png)
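The barcode-sharing signal behind linked-read structural variant detection can be illustrated by counting barcodes observed in two distant genomic windows. This is a minimal sketch assuming reads are given as `(barcode, chrom, position)` tuples; the helper names and the `min_shared` and `min_jaccard` thresholds are hypothetical. A real caller would compare the observed overlap against the background expected from molecule length and barcode collisions rather than rely on fixed cutoffs.

```python
from typing import Iterable, Set, Tuple

# Assumed input format: each read is (barcode, chrom, position).
Read = Tuple[str, str, int]
# A genomic window (chrom, start, end), half-open: start <= pos < end.
Window = Tuple[str, int, int]


def barcodes_in_window(reads: Iterable[Read], window: Window) -> Set[str]:
    """Return the set of barcodes with at least one read inside the window."""
    chrom, start, end = window
    return {bc for bc, c, pos in reads if c == chrom and start <= pos < end}


def barcode_overlap(reads: Iterable[Read], win_a: Window,
                    win_b: Window) -> Tuple[Set[str], float]:
    """Return the barcodes shared by two windows and their Jaccard index."""
    reads = list(reads)  # allow a generator to be scanned twice
    a = barcodes_in_window(reads, win_a)
    b = barcodes_in_window(reads, win_b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


def is_candidate_sv(reads: Iterable[Read], win_a: Window, win_b: Window,
                    min_shared: int = 3, min_jaccard: float = 0.1) -> bool:
    """Flag a window pair whose barcode sharing exceeds hypothetical cutoffs.

    Windows much farther apart than a typical input molecule should share
    few barcodes, so strong sharing suggests the two regions are joined
    on the same molecule by a rearrangement such as a deletion.
    """
    shared, jaccard = barcode_overlap(reads, win_a, win_b)
    return len(shared) >= min_shared and jaccard >= min_jaccard
```

Windows flagged this way are the kind of candidate locations that a targeted assay such as OS-Seq could then be used to confirm.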

We have extended the use of linked-read sequencing data to resolve complex genomic structural rearrangements. We developed a method to identify molecular barcodes that are unique to individual structural variant (SV) events; these barcodes can then be leveraged to identify other cis-related SVs and to assign SVs to haplotypes. This novel application has enabled us to reconstruct complex events with multiple breakpoints in cancer genomes, including events with clinical implications.
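The barcode logic for linking SVs and assigning them to haplotypes can be pictured as set operations. In this hypothetical sketch, the barcodes supporting an SV are taken to be those observed at both of its breakpoints, SVs that share enough supporting barcodes are reported as candidate cis-related events, and an SV is assigned to the haplotype whose barcode set (for example, barcodes carrying phased heterozygous alleles) dominates its support. The function names, thresholds, and the way haplotype barcode sets are obtained are assumptions for illustration, not the method developed by the authors.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple


def sv_support_barcodes(left_bcs: Set[str], right_bcs: Set[str]) -> Set[str]:
    """Simplifying assumption: an SV is supported by barcodes seen at both breakpoints."""
    return left_bcs & right_bcs


def cis_links(sv_barcodes: Dict[str, Set[str]],
              min_shared: int = 2) -> List[Tuple[str, str, int]]:
    """Report SV pairs whose supporting barcodes overlap by at least min_shared.

    Shared barcodes imply the two events were observed on the same input
    molecules, so they are candidates for lying in cis on one homologue.
    """
    links = []
    for a, b in combinations(sorted(sv_barcodes), 2):
        n = len(sv_barcodes[a] & sv_barcodes[b])
        if n >= min_shared:
            links.append((a, b, n))
    return links


def assign_haplotype(sv_bcs: Set[str], hap1: Set[str], hap2: Set[str],
                     min_fraction: float = 0.8) -> Optional[int]:
    """Assign an SV to haplotype 1 or 2, or None if support is absent or mixed."""
    n1 = len(sv_bcs & hap1)
    n2 = len(sv_bcs & hap2)
    total = n1 + n2
    if total == 0:
        return None
    if n1 / total >= min_fraction:
        return 1
    if n2 / total >= min_fraction:
        return 2
    return None
```

Requiring a dominant fraction rather than a simple majority keeps SVs with ambiguous barcode support unassigned instead of forcing them onto a haplotype.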