Software
SVEngine
Allele Specific and Haplotype Aware Structural Variants Simulator
INTRODUCTION
Simulation-based evaluation is an efficient way to benchmark structural variant analysis programs and facilitate their development. However, the choice of structural variant simulators is limited. In particular, tools that allow users to specify locus-specific variant fraction and allelic haplotype have not been generally available. We developed SVEngine, an open source tool, to address this need. It simulates next generation sequencing data embedded with structural variations as well as an assortment of complex sequence features. SVEngine takes as input template haploid sequences (FASTA) together with an explicit variant file, a variant distribution file and/or a clonal phylogeny tree file (NEWICK). It outputs mutated sequence contigs (FASTA), sequence reads (FASTQ) and/or alignment (BAM) files with the desired variants, along with BED files containing the ground truth. SVEngine’s flexible design enables one to specify size, position, and heterogeneity for deletion, insertion, duplication, inversion and translocation variants. Additional features include simulation of multiple sequencing libraries and targeted sequencing. SVEngine is highly parallelized for rapid, high-throughput simulation. We showed the superior versatility and efficiency of SVEngine through feature and runtime comparisons with other available simulators. We demonstrated its utility with an example that simulates locus-specific variant frequencies mimicking the phylogeny of cancer clonal evolution. We validated the correctness of the simulations by examining expected sequence mapping features, such as coverage change, read clipping, insert size shift and neighboring hanging read pairs, for representative variant types.
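To make the link between a clonal phylogeny and a locus-specific variant fraction concrete, here is a minimal sketch. It is not SVEngine's code or API. The tree, the clone prevalences and all function names are hypothetical, and the tree is written as a plain dictionary rather than parsed from a NEWICK file so that the example stays self-contained. The assumption is that a variant arising in one clone is inherited by all of its descendant clones, so the fraction of cells carrying it is the summed prevalence of that subtree.

```python
from typing import Dict, List

# Hypothetical clonal tree: each key is a clone, each value lists its child clones.
TREE: Dict[str, List[str]] = {"normal": ["A"], "A": ["B", "C"], "B": [], "C": []}

# Hypothetical fraction of cells belonging to each clone (values sum to 1).
PREVALENCE: Dict[str, float] = {"normal": 0.2, "A": 0.3, "B": 0.4, "C": 0.1}


def descendants(tree: Dict[str, List[str]], clone: str) -> List[str]:
    """Return the clone itself plus every clone below it in the tree."""
    found = [clone]
    for child in tree[clone]:
        found.extend(descendants(tree, child))
    return found


def cell_fraction(tree: Dict[str, List[str]],
                  prevalence: Dict[str, float],
                  origin: str) -> float:
    """Fraction of cells carrying a variant that first arose in `origin`."""
    return sum(prevalence[c] for c in descendants(tree, origin))


def haplotype_fraction(tree: Dict[str, List[str]],
                       prevalence: Dict[str, float],
                       origin: str,
                       copies: int = 1,
                       ploidy: int = 2) -> float:
    """Fraction of haplotypes carrying the variant.

    Assumes every carrier cell has the variant on `copies` of its
    `ploidy` haplotypes (copies=1, ploidy=2 is a heterozygous variant).
    """
    return cell_fraction(tree, prevalence, origin) * copies / ploidy


if __name__ == "__main__":
    for origin in ("A", "B", "C"):
        print(origin,
              round(cell_fraction(TREE, PREVALENCE, origin), 3),
              round(haplotype_fraction(TREE, PREVALENCE, origin), 3))
```

In this toy tree, a heterozygous variant arising in clone A is carried by clones A, B and C, which is about 80% of cells and about 40% of haplotypes. A variant arising later in clone C reaches only about 5% of haplotypes. Per-locus fractions of this kind are what a simulation mimicking clonal evolution would need for each variant.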
IMPLEMENTATION
SVEngine is implemented as a standard Python package.
AVAILABILITY
SVEngine is freely available for academic use at: https://bitbucket.org/charade/svengine
CONTACTS
Questions and comments should be addressed to lixia@stanford.edu
REFERENCES
Identification of large rearrangements in cancer genomes with barcode linked reads. LC Xia, JM Bell, C Wood-Bouwens, JJ Chen, NR Zhang, HP Ji. Nucleic acids research 46(4), e19 (2018) https://doi.org/10.1093/nar/gkx1193