This dataset includes 10x Genomics Cell Ranger 3.0 output for each individual sample. The Cell Ranger pipeline produces two types of feature-barcode matrices in a sparse matrix format (matrix.mtx.gz file). The unfiltered matrix contains both background and cell-associated barcodes, and the filtered matrix contains only cellular barcodes. Additional files (features.tsv.gz, barcodes.tsv.gz) contain the feature and barcode sequences corresponding to the row and column indices of the matrix, respectively. For more details, see https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/3.0/output/matrices

Each sample is named in the following format: patientID_condition_replicateID

    patientID = 4-digit identifier used throughout the manuscript
    condition = n: normal, t: tumor, pbmc: peripheral blood mononuclear cells. Note that the 6649 metaplasia sample is labelled '6649_t1'.
    replicateID = 1 for samples with a single replicate, 2 for the 2nd replicate from the same biological sample.

For more details, refer to Supplementary Table 1 in the manuscript (https://pubmed.ncbi.nlm.nih.gov/32060101/).

The cell labels file contains the cell lineage annotation for each single cell following subset analysis, as described in the manuscript. Note that mast cells were not subjected to subset analysis and are represented by a single cluster. Column identifiers in the file:

    'cell_barcode': original cell barcode with an appended suffix unique to each sample
    'orig.ident': sample name in the "patientID_condition_replicateID" format described above
    'condition': normal/tumor/metaplasia/pbmc
    'final_celltype': cell lineage assigned following subset analysis with re-clustering
    'cluster_celltype': cell lineage with the cluster number following subset analysis with re-clustering
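
The sample naming scheme described above can be split into its parts programmatically. The following is a minimal Python sketch, not part of the dataset or the manuscript. The function name parse_sample_name and the Sample type are hypothetical. The sketch assumes that the separator between condition and replicateID may be omitted (as in '6649_t1') and that the replicateID itself may be absent. Sample names other than '6649_t1' are made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Condition codes as listed in the description above.
CONDITION_NAMES = {"n": "normal", "t": "tumor", "pbmc": "pbmc"}

# The description states that the 6649 metaplasia sample is labelled '6649_t1'.
METAPLASIA_SAMPLES = {"6649_t1"}

# Assumption: the underscore before the replicate ID is optional,
# and the replicate ID may be missing altogether.
_SAMPLE_RE = re.compile(r"^(?P<patient>\d{4})_(?P<code>pbmc|n|t)_?(?P<rep>\d+)?$")


class Sample(NamedTuple):
    patient_id: str
    condition: str
    replicate: Optional[int]


def parse_sample_name(name: str) -> Sample:
    """Split a sample name into patient ID, condition, and replicate ID."""
    match = _SAMPLE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised sample name: {name!r}")
    if name in METAPLASIA_SAMPLES:
        condition = "metaplasia"
    else:
        condition = CONDITION_NAMES[match["code"]]
    rep = match["rep"]
    return Sample(match["patient"], condition, int(rep) if rep is not None else None)


if __name__ == "__main__":
    # '1234_n_2' is a hypothetical name used only to show the other spelling.
    for example in ("6649_t1", "1234_n_2"):
        print(example, parse_sample_name(example))
```

The parsed condition can be compared against the 'condition' column of the cell labels file as a consistency check. If the actual sample names follow a stricter pattern, the regular expression should be tightened accordingly.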