EXPANDS: expanding ploidy and allele frequency on nested subpopulations
When analyzing sequencing data from a tumor sample or from genome-edited cell populations, it is important to recognize that the sequences obtained from each sample technically encode a metagenome: the aggregate genomes of all coexisting subclones within the sample. Knowing the structure of this diversity is essential when studying complex phenotypes in heterogeneous cell populations, because such studies require knowing whether two or more mutations co-occur in the same cell or are present in distinct, mutually exclusive cells. EXPANDS is a computational method that provides this knowledge.
The method models the cellular prevalence of each mutation as a copy-number dependent probability distribution. Subsequently, these cellular prevalence distributions are clustered to obtain the set of point mutations and copy number variations that accumulated in ancestral cells prior to each clonal expansion.
Figure 1. The five steps of the EXPANDS algorithm.
Given a set of SNVs and genomic copy number segments, EXPANDS predicts the number of clonal expansions in a tumor, the size and phylogeny of the resulting subpopulations (SPs) in the tumor bulk and which SNVs accumulate in a cell before its clonal expansion.
A) Cell frequency estimation. EXPANDS combines the copy number and allele frequency assigned to a SNV to estimate what fraction of cells harbor the SNV. In this example, the observed allele frequency (0.3) and copy number (2.1) can be explained either by a homozygous SNV present in about 30% of the cells or by a heterozygous SNV present in about 60% of the cells. The cell-frequency probability P(f) is calculated for each mutated locus separately.
B) Clustering. All SNVs are clustered based on their cell-frequency probability distributions. Each cluster (color-coded) is extended by SNV members with similar distributions in an interval around the cluster-maxima.
C) Filtering. Clusters are pruned based on statistics within and outside the core region (interval around the cluster-maxima: highlighted in red). The blue cluster is pruned as peaks within the core region are low and do not significantly exceed peaks observed outside the core region. In contrast, the green cluster is kept as it has high and abundant peaks within and only a few peaks outside the core region. The number of remaining clusters denotes the number of predicted clonal expansions. Cell frequencies at cluster-maxima denote the predicted size of an SP in the tumor bulk.
D) Assignment of SNVs and ploidies to clusters. Each SNV is assigned to one of the predicted clonal expansions, based on the cell frequency estimation calculated in A). SP specific ploidies are assigned to each copy number segment.
E) Phylogenetic tree estimation. Pairwise phylogenetic distances between SPs are calculated from SP specific ploidy profiles and used as input for a neighbor-joining tree estimation algorithm to model the tumor’s phylogeny. In this example the phylogeny of seven coexistent SPs is displayed. The size of each SP is indicated at each branch end. Figure from (Andor et al., 2013) by permission of Oxford University Press.
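The arithmetic behind panel A can be illustrated with a minimal Python sketch. It is not the EXPANDS implementation: it assumes a somatic SNV, a locus with the same total copy number in every cell, and a simplified relation in which the cell fraction equals allele frequency times copy number divided by the number of mutated copies. The function name `candidate_cell_frequencies` and its parameters are hypothetical. EXPANDS itself computes a probability distribution P(f) rather than point estimates.

```python
def candidate_cell_frequencies(allele_freq, copy_number, max_mutated_copies=2):
    """Enumerate cell fractions that could explain an observed allele frequency.

    Simplified, assumed model (not the EXPANDS likelihood):
        allele_freq * copy_number = cell_fraction * mutated_copies
    Returns a dict mapping mutated_copies -> cell_fraction, keeping only
    fractions in the interval (0, 1].
    """
    candidates = {}
    for mutated_copies in range(1, max_mutated_copies + 1):
        cell_fraction = allele_freq * copy_number / mutated_copies
        if 0 < cell_fraction <= 1:
            candidates[mutated_copies] = cell_fraction
    return candidates


# Panel A values, with the copy number rounded from 2.1 to 2 for readability.
# One mutated copy corresponds to the heterozygous case, two to the homozygous case.
for copies, fraction in candidate_cell_frequencies(0.3, 2.0).items():
    print(f"{copies} mutated copies: SNV in {fraction:.0%} of cells")
```

With the copy number rounded to 2, this reproduces the two explanations in the caption: roughly 60% of cells for one mutated copy and roughly 30% for two. Using the unrounded value of 2.1 shifts both estimates up slightly.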
EXPANDS was initially developed at UCSF and further enhanced at Stanford. The development, validation and first application of EXPANDS are described in the original publication in Bioinformatics.
Andor, N., Harness, J. V., Müller, S., Mewes, H. W. & Petritsch, C. EXPANDS: expanding ploidy and allele frequency on nested subpopulations. Bioinforma. Oxf. Engl. 30, 50–60 (2014).
Publications using EXPANDS to measure intra-tumor heterogeneity
- Lee, W. et al. PRC2 is recurrently inactivated through EED or SUZ12 loss in malignant peripheral nerve sheath tumors. Nat. Genet. 46, 1227–1232 (2014).
- Tawana, K. et al. Disease evolution and outcomes in familial AML with germline CEBPA mutations. Blood 126, 1214–1223 (2015).
- Andor, N. et al. Pan-cancer analysis of the extent and consequences of intratumor heterogeneity. Nat. Med. 22, 105–113 (2016).
- Morrissy, A. S. et al. Divergent clonal selection dominates medulloblastoma at recurrence. Nature 529, 351–357 (2016).
Publications comparing EXPANDS to other methods
- Yadav, V. K. & De, S. An assessment of computational methods for estimating purity and clonality using genomic data derived from heterogeneous tumor tissue samples. Brief. Bioinform. bbu002 (2014). doi:10.1093/bib/bbu002
- Li, B. & Li, J. Z. A general framework for analyzing tumor subclonality using SNP array and DNA sequencing data. Genome Biol. 15, 473 (2014).