Paired Target Finder Tutorial: Search a genome or promoterome
Paired Target Finder identifies candidate binding sites for TAL Effector nucleases (TALENs).
Each TALEN monomer consists of the TAL effector DNA binding domain with a FokI catalytic domain fused to its C terminus. The monomers are designed to bind to candidate target sites (red on the image above) oriented from 5' to 3' on opposite strands of DNA. The spacer region between the sites must be long enough for the two FokI domains to dimerize and cut the DNA, but not so long that they do not come into contact.
Paired Target Finder takes two RVD sequences (representing the potential TALEN monomers) and identifies paired candidate binding sites on opposite strands of DNA with the proper spacing and orientation between the two sites.
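The geometry described above can be sketched in a few lines of Python. This is only an illustration of the spacing and orientation rules, not the tool's actual implementation: the function names, the 0-based coordinates, and the restriction to an upstream T are assumptions made for the example. The second monomer binds the opposite strand, so its site is read as the reverse complement, and its upstream T appears as an A on the entered strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_windows(seq, len1, len2, spacer_min=15, spacer_max=30):
    """Yield (start1, spacer, site1, site2) for every window layout in
    which both monomer sites are preceded by a T on their own strand.

    Hypothetical helper: site1 is read from the entered (plus) strand,
    site2 from the opposite strand, both 5' to 3'.
    """
    for start1 in range(1, len(seq)):
        if seq[start1 - 1] != "T":  # upstream T for the first monomer
            continue
        end1 = start1 + len1
        for spacer in range(spacer_min, spacer_max + 1):
            start2 = end1 + spacer
            end2 = start2 + len2
            if end2 >= len(seq):  # no room left for the flanking base
                break
            # A T upstream of the minus-strand site is an A on the plus strand.
            if seq[end2] != "A":
                continue
            yield (start1, spacer, seq[start1:end1],
                   reverse_complement(seq[start2:end2]))


example = "T" + "ACGTA" + "C" * 15 + "GGGCC" + "A"
hits = list(paired_windows(example, len1=5, len2=5))
```

Each window returned this way would still need to be scored against the two RVD sequences before it counts as a candidate site.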
Identified monomer binding sites are scored using the scoring function developed by Moscou and Bogdanove in supplementary script S1 of the paper Moscou, M.J. and Bogdanove, A.J. (2009) A simple cipher governs DNA recognition by TAL effectors. Science. 326(5959):1501.
The scoring function is based on RVD-nucleotide association frequencies for known TAL-effector target pairs. Each RVD-nucleotide pair in the TAL effector/target alignment is assigned a probability score based on these association frequencies. Scores for all RVD-nucleotide pairs are summed to score the entire alignment.
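As a rough illustration of the summation described above, the sketch below scores an alignment by adding one per-position score for each RVD-nucleotide pair. The frequency table is invented for the example and is not taken from Moscou and Bogdanove (2009). Using the negative logarithm of the frequency as the per-position score is also an assumption, chosen only because it makes lower totals correspond to better matches.

```python
import math

# Hypothetical RVD-nucleotide association frequencies (illustrative only,
# NOT the published values).
FREQ = {
    "NI": {"A": 0.90, "C": 0.05, "G": 0.03, "T": 0.02},
    "HD": {"A": 0.05, "C": 0.90, "G": 0.02, "T": 0.03},
    "NG": {"A": 0.03, "C": 0.05, "G": 0.02, "T": 0.90},
    "NN": {"A": 0.40, "C": 0.03, "G": 0.55, "T": 0.02},
}


def score_site(rvds, site):
    """Sum the per-position scores of an RVD list aligned to a site.

    Assumption: each position contributes -log(frequency), so a lower
    total means a better match.
    """
    if len(rvds) != len(site):
        raise ValueError("RVD sequence and site must have the same length")
    return sum(-math.log(FREQ[rvd][base]) for rvd, base in zip(rvds, site))


rvds = ["NI", "HD", "NG", "NN"]
perfect = score_site(rvds, "ACTG")     # every RVD on its most frequent base
mismatched = score_site(rvds, "ACTT")  # last RVD on a rarely associated base
```

With these made-up frequencies, the mismatched site receives a higher (worse) score than the perfect one.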
Paired Target Finder has two modes. You can choose to search for sites in a sequence of your choice or in a pre-loaded genome/promoterome.
In this tutorial, we will search a genome for candidate TALEN binding sites.
- Select a DNA sequence and enter RVD sequences that you want to search.
You may choose between entering your own DNA sequence of interest or searching for candidate binding sites in a genome or promoterome. You can switch between the two modes by clicking on the tabs "Provide a Sequence" or "Search a Genome/Promoterome".
To follow along with this tutorial, click on the "Search a Genome/Promoterome" tab.
Available genome/promoterome sequences are:
- Arabidopsis thaliana (TAIR10)
- Caenorhabditis elegans (WS220)
- Danio rerio (Zv9)
- Drosophila melanogaster (BDGP5.25)
- Homo sapiens (GRCh37)
- Mus musculus (NCBIM37)
- Oryza sativa (MSU6)
All genome sequences were downloaded from Ensembl.
- Select a spacer range.
Enter the minimum and maximum spacer lengths you wish to try. Choose a spacer range that is optimal for the TALEN architecture you are using. To follow along with the tutorial, leave the spacer min and max set to their defaults of 15 and 30.
- Select a score cutoff and an upstream base.
Paired Target Finder will identify all sites with the proper orientation and spacing where each monomer's binding site scores better (lower) than a cutoff score. The cutoff score can be set to 3.0, 3.5, or 4.0 times the best possible score for the RVD sequence (see the FAQs page for more information about score cutoffs). The default option is 3.0. This cutoff is sufficient to identify all known TAL effector targets in nature, and should be sufficient for predicting TALEN monomer binding. To follow along with the tutorial, set the cutoff to 3.0 (the default). If you wish to identify more sites, you may raise the cutoff, but your search will be much slower!
Users can also choose which nucleotide(s) will be required to be upstream of returned sites. In nature, the majority of TAL effector targets are preceded by a T. However, at least one example of a TAL effector targeting a site preceded by C has been found in nature (Yu et al., 2011), and custom TAL effectors targeting sites preceded by C have been reported in the literature (Miller et al., 2010).
In our hands, sites preceded by a C were significantly less active than those preceded by a T. Therefore, we recommend that users select the default setting for sites preceded by a T only, unless they have a specific reason to do otherwise.
For the purposes of this tutorial, set the upstream base to T only (the default option). This T will be shown in the output.
- Select additional options.
If you wish to receive email notification when your job completes, enter an email address in the appropriate field. Finally, if you wish your results to remain on the server longer than 1 hour, select a different option from the drop-down menu under "Expires".
- Click the Submit button at the bottom of the page.
After you submit your job, you will be taken to a progress page. You can hide the progress details by clicking on the arrow next to "Process log".
Either stay on this page or bookmark it so you can return and download your results later. Your job may take several minutes to finish.
- Retrieve and interpret your results.
When your job completes, a table with the candidate TALEN sites will appear in your browser. The results table can also be downloaded as a tab-delimited text or gff file. In the example from the tutorial, shown above, only one candidate TALEN site is identified.
Searching a larger genome, or using RVD sequences with more NNs or novel RVDs, will generally result in more potential off-target sites.
TALEN-mediated DNA cleavage can occur wherever two TALEN monomers bind with the proper spacing and orientation. Thus, a search for off-target sites must consider all four possible combinations of TALEN monomer RVD sequences: RVD1+RVD1, RVD1+RVD2, RVD2+RVD1, and RVD2+RVD2. TAL1 indicates which RVD sequence is binding to the plus strand of DNA (the sequence as entered). TAL2 indicates which RVD sequence is binding the opposite strand. TAL1 Score and TAL2 Score give the scores for TAL1 and TAL2 monomer sites, respectively.
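The four pairings listed above can be enumerated mechanically. The sketch below is only an illustration: the function name and the use of the labels "RVD1" and "RVD2" as stand-ins for the two entered RVD sequences are assumptions. The first element of each pair plays the role of TAL1 (plus strand) and the second the role of TAL2 (opposite strand).

```python
from itertools import product


def monomer_pairings(labels=("RVD1", "RVD2")):
    """Return every ordered (TAL1, TAL2) combination of the given labels.

    Order matters: ("RVD1", "RVD2") and ("RVD2", "RVD1") place the two
    monomers on different strands, so both must be searched.
    """
    return list(product(labels, repeat=2))


pairings = monomer_pairings()
for tal1, tal2 in pairings:
    print(f"TAL1={tal1} (plus strand), TAL2={tal2} (opposite strand)")
```

Homodimer pairings such as RVD1+RVD1 are included because two copies of the same monomer can also bind with the proper spacing and orientation.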
Paired Target Finder will return all paired sites where the monomers bind to opposite strands of the DNA with proper orientation and spacing, and where each monomer binding site scores below a cutoff score. The cutoff score is based on the Best Possible Score for each RVD sequence returned above the results table or at the top of the output file.
Best possible score gives the score for the TAL effector on its "perfect" binding site (the site with all RVDs aligned with their most frequently associated nucleotide). If a site's Score is closer to the Best possible score, the TAL effector is more likely to bind to that site.
TAL effectors in nature and their known targets typically have Scores less than 2-3 times the Best Possible Score for the TAL effector. Not all sites scoring below the cutoff are guaranteed to be bound by the TAL effector and some binding sites that score above the cutoff may be missed. Additionally, it is unclear if both monomers must score below a cutoff score in order for the candidate site to be a functional TALEN site. However, the list of sites should contain the majority of TALEN sites in the sequence of interest, with a longer list indicating that a TALEN is more likely to have off-target binding sites.
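The relationship between the Best Possible Score and the cutoff can be written out as a short sketch. The per-position score table below is invented for illustration, and the function names are hypothetical. Only the rule itself (cutoff equals a multiplier times the best possible score, and a site must score below it) comes from the description above.

```python
# Hypothetical per-position scores (lower is better); not the tool's values.
SCORES = {
    "NI": {"A": 0.1, "C": 3.0, "G": 3.5, "T": 3.9},
    "HD": {"A": 3.0, "C": 0.1, "G": 3.9, "T": 3.5},
}


def best_possible_score(rvds):
    """Score of the 'perfect' site: each RVD on its best-scoring base."""
    return sum(min(SCORES[rvd].values()) for rvd in rvds)


def passes_cutoff(rvds, site_score, multiplier=3.0):
    """True if a monomer site scores below multiplier * best possible score."""
    return site_score < multiplier * best_possible_score(rvds)


def pair_passes(rvds1, score1, rvds2, score2, multiplier=3.0):
    """A paired site is kept only if both monomer sites pass the cutoff."""
    return (passes_cutoff(rvds1, score1, multiplier)
            and passes_cutoff(rvds2, score2, multiplier))
```

Raising the multiplier from 3.0 to 3.5 or 4.0 loosens the cutoff, so more sites pass for the same RVD sequence.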
Note that all TALEN monomer binding sites are directly preceded by a T (or C, if that option is selected) at the 5' end. The T or C is now shown in the output, separated from the target sequence by a space.
Promoterome sequences for all of the above species are also available. Promoteromes were downloaded from the UCSC Genome Browser, except for Oryza sativa (downloaded from the MSU Rice Genome Annotation Project) and Arabidopsis thaliana (downloaded from The Arabidopsis Information Resource).
The promoterome is defined as the 1000 bases upstream of all annotated translational start codons.
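To make this definition concrete, the sketch below extracts the region upstream of a single start codon. It is an assumption-laden illustration rather than the procedure used to build the hosted promoteromes: the function name, the 0-based coordinate convention, and the handling of genes on the minus strand are all choices made for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def upstream_region(chrom_seq, start_pos, strand="+", length=1000):
    """Return up to `length` bases upstream of a start codon, read 5' to 3'
    on the gene's own strand.

    start_pos is the 0-based plus-strand index of the base pairing with the
    A of the ATG (hypothetical convention for this example).
    """
    if strand == "+":
        return chrom_seq[max(0, start_pos - length):start_pos]
    if strand == "-":
        # Upstream of a minus-strand gene lies at higher plus-strand indices.
        window = chrom_seq[start_pos + 1:start_pos + 1 + length]
        return window.translate(COMPLEMENT)[::-1]
    raise ValueError("strand must be '+' or '-'")


plus_example = upstream_region("AAACCCATGGGG", 6, "+", length=3)
minus_example = upstream_region("CCCCATGGGTTT", 5, "-", length=3)
```

Near the end of a chromosome the returned region is simply shorter than the requested length.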
Click "Test the tool using sample data" above the tab to automatically select a genome (Arabidopsis thaliana) and to load two RVD sequences in the appropriate text boxes.
To enter different RVD sequences, enter a sequence between 12 and 35 RVDs, separated by spaces, in each box. Use '*' to indicate a missing amino acid (such as N* or H*).
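The input rules above can be checked before submitting a job. The following sketch is a hypothetical pre-check, not part of the tool: the function name is invented, and the assumption that every RVD is exactly two characters (an uppercase letter followed by an uppercase letter or '*') is stated here rather than taken from the tool's own validation.

```python
def parse_rvds(text, min_len=12, max_len=35):
    """Split a space-separated RVD string and check the basic input rules.

    Assumption: each RVD is two characters, with '*' allowed in the
    second position to mark a missing amino acid (for example N* or H*).
    """
    rvds = text.split()
    if not min_len <= len(rvds) <= max_len:
        raise ValueError(
            f"expected {min_len} to {max_len} RVDs, got {len(rvds)}"
        )
    for rvd in rvds:
        ok = (
            len(rvd) == 2
            and rvd[0].isalpha() and rvd[0].isupper()
            and (rvd[1] == "*" or (rvd[1].isalpha() and rvd[1].isupper()))
        )
        if not ok:
            raise ValueError(f"unrecognized RVD: {rvd!r}")
    return rvds


example = parse_rvds("NI HD NG NN N* HD NI NG NG HD NI NN H*")
```

A call with fewer than 12 or more than 35 RVDs raises a ValueError instead of returning a list.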