Oligonucleotide-Selective Sequencing (OS-Seq)

OS-SEQ PRIMER PROBE DESIGN

Oligonucleotide-Selective SEQuencing (OS-Seq) is an integrated, automated targeted sequencing approach for high-depth interrogation of large numbers of genomic regions of interest (ROI). In an automated cBot protocol, an Illumina flow cell is modified with target-specific primer probe oligonucleotides, turning the flow cell surface into a target enrichment platform. A sequencing library is then hybridized to these target-specific primer probes to enrich for the ROI, after which the flow cell is prepared for sequencing (Figure 1). We developed a computational pipeline for the optimized design of target-specific primer probe oligonucleotides. This design pipeline has been shown to improve capture uniformity and capture rate while greatly increasing the number of targeted bases.

Figure 1: Overview of the OS-Seq method.

DIRECTIONS FOR RUNNING THE PIPELINE

Download files:

The OS-Seq_scripts.zip archive contains two required folders (folder names in lowercase):

  • “inputs”
  • “scripts”

The “inputs” folder should contain a tab-delimited text file named “coordinates_to_extract.txt” with 4 columns and no header:

  • Column 1: a unique identifier for the target
  • Column 2: the chromosome number; accepted values are 1 through 24, X, or Y (uppercase, without a “chr” prefix)
  • Column 3: start position of the target region of interest (0-based)
  • Column 4: end position of the target region of interest (1-based)
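
Before launching the pipeline, the input file can be checked against the rules above. The following Python sketch is not part of the OS-Seq scripts, and the names Target, parse_targets and read_targets are hypothetical. It assumes the column layout listed above and reads the 0-based start / 1-based end convention as a half-open interval, so a valid row needs start < end.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List

# Accepted chromosome labels as described above: "1" through "24", "X" or "Y"
# (uppercase, no "chr" prefix).
VALID_CHROMS = {str(i) for i in range(1, 25)} | {"X", "Y"}


@dataclass
class Target:
    target_id: str
    chrom: str
    start: int  # 0-based start
    end: int    # 1-based end, so the region covers end - start bases


def parse_targets(lines: Iterable[str]) -> List[Target]:
    """Parse tab-delimited target rows; raise ValueError on the first bad row."""
    targets: List[Target] = []
    seen_ids = set()
    for line_no, row in enumerate(csv.reader(lines, delimiter="\t"), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(row)}")
        target_id, chrom, start_text, end_text = (field.strip() for field in row)
        if target_id in seen_ids:
            raise ValueError(f"line {line_no}: duplicate identifier {target_id!r}")
        if chrom not in VALID_CHROMS:
            raise ValueError(f"line {line_no}: invalid chromosome {chrom!r}")
        start, end = int(start_text), int(end_text)  # non-integers raise ValueError
        if not 0 <= start < end:
            raise ValueError(f"line {line_no}: expected 0 <= start < end")
        seen_ids.add(target_id)
        targets.append(Target(target_id, chrom, start, end))
    return targets


def read_targets(path: str = "inputs/coordinates_to_extract.txt") -> List[Target]:
    """Read and validate the input file (default path is relative to the unzipped folder)."""
    with open(path, newline="") as handle:
        return parse_targets(handle)
```

A header line, a “chr1” label, or swapped start/end values all stop the check with a line number, which is usually easier to act on than an error from inside a long MATLAB run.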

The target regions are enlarged by 500 bases so that the pipeline can search for optimal primer probe positions.
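
The text does not state whether the 500 bases are added to each side of a target or in total. The hypothetical pad_region helper below is only a sketch of this kind of flanking extension, not code taken from the MATLAB script; it takes the per-side amount as a parameter (500 if each side is extended, 250 if the total is split evenly).

```python
from typing import Optional, Tuple


def pad_region(start: int, end: int, flank: int,
               chrom_length: Optional[int] = None) -> Tuple[int, int]:
    """Widen the half-open interval [start, end) by `flank` bases on each side.

    Hypothetical helper: `flank` is the per-side extension, because the text
    does not say how the 500 bases are distributed.
    """
    if flank < 0:
        raise ValueError("flank must be non-negative")
    padded_start = max(0, start - flank)  # never extend past the chromosome start
    padded_end = end + flank
    if chrom_length is not None:
        padded_end = min(padded_end, chrom_length)  # clip at the chromosome end if known
    return padded_start, padded_end
```

For a target from 1000 to 1200, pad_region(1000, 1200, 500) returns (500, 1700) and pad_region(1000, 1200, 250) returns (750, 1450).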

The “scripts” folder should contain the MATLAB script “os_seq_design_from_coordinates_v3.1.m”:

  • Edit line 23 of the script so that it points to the KMER folder containing the unzipped kmer.txt files.
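
A quick pre-flight check can catch a missing input file or a wrong KMER path before a long MATLAB run. This Python sketch is hypothetical: the function name is invented, the KMER path must be supplied by hand (it is whatever you set on line 23 of the script), and it assumes the unzipped kmer files end in “.txt”.

```python
from pathlib import Path
from typing import List

# File names taken from the directions above, relative to the unzipped folder.
REQUIRED_FILES = (
    Path("inputs") / "coordinates_to_extract.txt",
    Path("scripts") / "os_seq_design_from_coordinates_v3.1.m",
)


def preflight(root: str, kmer_dir: str) -> List[str]:
    """Return a list of problems found; an empty list means the layout looks right."""
    problems: List[str] = []
    base = Path(root)
    for rel in REQUIRED_FILES:
        if not (base / rel).is_file():
            problems.append(f"missing file: {rel}")
    kmer = Path(kmer_dir)
    if not kmer.is_dir():
        problems.append(f"KMER folder not found: {kmer}")
    elif not any(kmer.glob("*.txt")):  # assumption: unzipped kmer files use .txt
        problems.append(f"no unzipped .txt files in KMER folder: {kmer}")
    return problems
```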

From within the “scripts” folder, run the following command to find optimally placed primer probes in the listed target regions:

nohup nice time matlab -nodisplay -nodesktop -nojvm -nosplash < os_seq_design_from_coordinates_v3.1.m > osseq.log 2>&1 &

OUTPUT

The “results_design” folder contains two tab-delimited files:

  • output_by_capseq_os_seq_strict_design_density_dense_recess60_midsize300.txt
    contains the optimal upstream and downstream primer probe sequences for each target in a ready-to-order format, listing the ID, target region, primer probe position, target-specific sequence, complete primer probe sequence, and targeted strand.
  • output_by_pair_os_seq_strict_design_density_dense_recess60_midsize300.txt
    contains the candidate primer probe pair sequences upstream and downstream of each target, together with their scores.
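
To confirm that every target received a design, the ready-to-order file can be grouped by target ID and compared with the IDs in “coordinates_to_extract.txt”. This Python sketch assumes the ID is the first column of each row; whether the file starts with a header row is not stated here, so that is a parameter. The function name is hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List


def group_rows_by_id(lines: Iterable[str], has_header: bool = True) -> Dict[str, List[List[str]]]:
    """Group tab-delimited output rows by their first column (assumed to be the target ID)."""
    reader = csv.reader(lines, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row, if the file has one
    groups: Dict[str, List[List[str]]] = defaultdict(list)
    for row in reader:
        if row:  # ignore blank lines
            groups[row[0]].append(row)
    return dict(groups)
```

Passing an open handle of the output_by_capseq file to group_rows_by_id and listing every input ID that is absent from the returned dictionary gives the targets for which no primer probe was designed.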

The “results_annot” folder contains subsets of the kmer files for each target region, which reduces memory usage. The scores folder in “results_design” lists the score for each of these kmer positions; these scores are summarized in the “output_by*” files.

A log file for error messages, “osseq.log”, is created within the “scripts” folder.
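
Because the pipeline runs in the background, osseq.log is the place to look once the job has finished. The Python sketch below scans the log text for failures; it assumes MATLAB error messages begin with “Error” (as in “Error using ...” or “Error in ...”), and the function name is hypothetical.

```python
from typing import List


def find_errors(log_text: str) -> List[str]:
    """Return log lines that look like MATLAB error messages.

    Assumption: MATLAB error lines start with "Error" (after optional indentation).
    """
    return [line for line in log_text.splitlines() if line.lstrip().startswith("Error")]


# Example usage from the unzipped folder:
# with open("scripts/osseq.log", errors="replace") as handle:
#     for line in find_errors(handle.read()):
#         print(line)
```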