The KmerVC program requires three inputs: (i) the list of mutations of interest in VCF or BED format, (ii) the reference genome, and (iii) the sequencing data in FASTQ or FASTA format. KmerVC reports whether each mutation in the input variant file is validated, based on relative k-mer frequency counts. The tool validates the mutations of interest in cancer sequencing data in five main steps: (i) pre-processing to determine the uniqueness of k-mers in the human genome, (ii) counting k-mers in the sequencing data, (iii) retrieving the expected k-mers for the mutations of interest, (iv) compiling k-mer counts into mutation events, and (v) validating the mutations.
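The k-mer steps above can be illustrated with a minimal Python sketch. It is not KmerVC's implementation: the function names (`kmers`, `count_kmers`, `expected_kmers`, `mutation_support`) are hypothetical, only single-base substitutions are handled, reads are assumed to be on the forward strand, and the genome-wide uniqueness pre-processing and the final statistical validation are omitted.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count k-mers across all reads (forward strand only, an assumption)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def expected_kmers(reference, pos, alt, k):
    """Return (reference k-mers, mutant k-mers) overlapping a substitution.

    pos is a 0-based index into reference; alt is the substituted base.
    """
    start = max(0, pos - k + 1)
    end = min(len(reference), pos + k)
    ref_window = reference[start:end]
    offset = pos - start
    alt_window = ref_window[:offset] + alt + ref_window[offset + 1:]
    return list(kmers(ref_window, k)), list(kmers(alt_window, k))


def mutation_support(reference, pos, alt, reads, k):
    """Sum read k-mer counts for the reference and mutant alleles."""
    counts = count_kmers(reads, k)
    ref_kmers, alt_kmers = expected_kmers(reference, pos, alt, k)
    ref_total = sum(counts[x] for x in ref_kmers)
    alt_total = sum(counts[x] for x in alt_kmers)
    return ref_total, alt_total


if __name__ == "__main__":
    reference = "GATTACAGCTTGA"
    reads = ["TTACTGCT", "TTACTGCT", "TACAGCT"]
    # Substitution A>T at 0-based position 6, with k = 3.
    print(mutation_support(reference, 6, "T", reads, 3))
```

Comparing the two totals gives a relative k-mer frequency for the mutant allele; a real tool would also need to discount k-mers that occur elsewhere in the genome, which is what the uniqueness pre-processing step is for.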


EXPANDS leverages DNA sequencing of up to tens of thousands of cells to detect the subclones into which these cells apportion and to assign mutations to each subclone. It thereby infers which mutations co-occur in the same subclone and which are mutually exclusive, present only in distinct subclones.


GRAMMy is a computational framework for Genome Relative Abundance estimation based on Mixture Model theory (GRAMMy).


To improve computational efficiency, incorporate new features such as time series data with replicates, and make the analysis technique more accessible to users, we have re-implemented the ELSA algorithm as a C++ extension to Python. We also integrated the new ELSA tool set with the popular Galaxy framework.


Large clinical genomics studies require the ability to select and track samples from a large population of patients through the various experimental steps involved in genome sequencing. To meet these needs, we have developed a web-based laboratory information system (LIMS) referred to as MendeLIMS. MendeLIMS is implemented using open-source tools (e.g., Ruby on Rails) and has a flexible configuration that adapts to continuously evolving experimental protocols and to various user needs. We maintain a publicly available demonstration version of the application for evaluation purposes, which can be found on the Web Resources page.


We developed SVEngine, an open-source tool, to address the need for tools that allow users to specify locus-specific variant fractions and allelic haplotypes. It simulates next-generation sequencing data embedded with structural variations as well as an assortment of complex sequence features.


SWAN is the first tool to introduce a statistically verifiable heterogeneity SV model to the community. In SWAN, the genetic material sampled is no longer viewed as a homogeneous mutant or reference sample but is explicitly modeled as a mixture of mutant and reference sequences whose fractions can be estimated.


Using barcode-linked read sequences, we have developed a new somatic and germline rearrangement caller, ZoomX, that detects large-scale (>200 kb intra- or inter-chromosomal) rearrangements. ZoomX works on linked-read data and is optimized specifically for analyzing somatic variants of varying allelic fractions.


The RVD program takes BAM files of deep sequencing reads as input. Using a Beta-Binomial model, the algorithm estimates the error rate at each base position in the reference sequence. For each sample, the relative difference between the reference and sample error rates is calculated and tested against the null distribution estimated from the model.
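To make the role of the Beta-Binomial model concrete, the following Python sketch computes how surprising an observed number of non-reference reads at one position would be under a Beta-Binomial null. It is a simplified illustration, not RVD's actual test: the function names and the shape parameters `a` and `b` (standing in for a null error-rate distribution estimated from reference data) are assumptions, and RVD's relative-difference statistic is not reproduced here.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(x, n, a, b):
    """P(X = x) for x errors in n reads, with error rate ~ Beta(a, b)."""
    log_choose = (math.lgamma(n + 1)
                  - math.lgamma(x + 1)
                  - math.lgamma(n - x + 1))
    return math.exp(log_choose + log_beta(x + a, n - x + b) - log_beta(a, b))


def upper_tail(x, n, a, b):
    """P(X >= x): chance of at least x errors under the null model."""
    return sum(beta_binomial_pmf(i, n, a, b) for i in range(x, n + 1))


if __name__ == "__main__":
    # Hypothetical numbers: 1000 reads at a position, 30 non-reference,
    # null error rate centred near a / (a + b) = 0.01.
    print(upper_tail(30, 1000, 2.0, 198.0))
```

A small tail probability suggests that the sample's error rate at that position exceeds what the null model explains, which is the kind of evidence a variant call would rest on.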


Gemtools is a collection of tools for the downstream and in-depth analysis of linked-read data (10X Genomics or TELL-Seq). The gemtools package provides the user with the flexibility to perform basic functions on their linked-read sequencing output in order to address even more questions!