Target Finder Tutorial: Search a Genome/Promoterome
Target Finder identifies candidate binding sites for a TAL effector in a DNA sequence. Target Finder has two modes. You can choose to search for sites in a sequence of your choice or in a pre-loaded genome/promoterome.
Identified TAL effector binding sites are scored using the scoring function developed by Moscou and Bogdanove in supplementary script S1 of the paper Moscou, M.J. and Bogdanove, A.J. (2009) A simple cipher governs DNA recognition by TAL effectors. Science. 326(5959):1501.
The scoring function is based on RVD-nucleotide association frequencies for known TAL-effector target pairs. Each RVD-nucleotide pair in the TAL effector/target alignment is assigned a probability score based on these association frequencies. Scores for all RVD-nucleotide pairs are summed to score the entire alignment.
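The summation described above can be sketched in Python. This is only an illustration: the association frequencies below are made-up placeholder values, not the published ones, and turning a frequency into a per-position score with a negative logarithm is an assumption chosen so that lower totals mean better matches. The names `FREQS`, `score_alignment`, and `best_possible_score` are hypothetical.

```python
import math

# Placeholder RVD-nucleotide association frequencies (illustrative only;
# NOT the values from Moscou and Bogdanove, 2009).
FREQS = {
    "HD": {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    "NI": {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    "NG": {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    "NN": {"A": 0.40, "C": 0.05, "G": 0.50, "T": 0.05},
}

def score_alignment(rvds, site):
    """Sum per-position scores; each is -log(frequency) (assumed form)."""
    if len(rvds) != len(site):
        raise ValueError("RVD sequence and site must have the same length")
    return sum(-math.log(FREQS[rvd][nt]) for rvd, nt in zip(rvds, site))

def best_possible_score(rvds):
    """Score of the site where every RVD meets its most frequent nucleotide."""
    return sum(-math.log(max(FREQS[rvd].values())) for rvd in rvds)

rvds = "HD NI NG NN".split()
perfect = score_alignment(rvds, "CATG")   # every RVD on its preferred base
mismatch = score_alignment(rvds, "CATT")  # last position is a poor match
```

Under these assumptions the perfect site scores the same as `best_possible_score(rvds)`, and the mismatched site scores higher (worse).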
Target Finder returns a browser table of the best-scoring (lowest-scoring) candidate binding sites in the input DNA sequence for the TAL effector (by default, the top 10 sites are included in the table). All candidate binding sites scoring below a given threshold (discussed in the tutorial) are also available in downloadable output files.
In this tutorial, we will search a genome sequence for candidate binding sites.
- Select the DNA sequence and enter an RVD sequence to search.
You may choose between entering your own DNA sequence of interest or searching for candidate binding sites in a genome or promoterome.
You can switch between the two modes by clicking on the tabs "Provide a Sequence" or "Search a Genome/Promoterome".
To follow along with this tutorial, click on the "Search a Genome/Promoterome" tab.
Available genome/promoterome sequences are:
- Arabidopsis thaliana (TAIR10)
- Caenorhabditis elegans (WS220)
- Danio rerio (Zv9)
- Drosophila melanogaster (BDGP5.25)
- Homo sapiens (GRCh37)
- Mus musculus (NCBIM37)
- Oryza sativa (MSU6)
All genome sequences were downloaded from Ensembl.
- Select a score cutoff and an upstream base.
Target Finder will identify all sites with scores better (lower) than a cutoff score. The cutoff score can be set to 3.0, 3.5, or 4.0 times the best possible score for the RVD sequence (see the FAQs page for more information about score cutoffs). The default is 3.0; this cutoff is sufficient to identify all known TAL effector targets in nature.
To follow along with the tutorial, set the cutoff to 3.0 (the default). If you wish to identify more sites, you may raise the cutoff, but your search will be much slower!
All sites scoring less than the selected cutoff will be included in the downloadable results file; however, only the top 10 sites (or more, if you change this setting) will be displayed in the browser output table.
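As a rough illustration of how the cutoff relates to the best possible score, the arithmetic can be written out as below. The function names and the example score of 5.0 are hypothetical; only the three multipliers come from the tool's options.

```python
# Multipliers offered by the cutoff option.
ALLOWED_MULTIPLIERS = (3.0, 3.5, 4.0)

def cutoff_score(best_possible, multiplier=3.0):
    """Return the cutoff: multiplier times the best possible score."""
    if multiplier not in ALLOWED_MULTIPLIERS:
        raise ValueError("multiplier must be one of %r" % (ALLOWED_MULTIPLIERS,))
    return multiplier * best_possible

def passes_cutoff(score, best_possible, multiplier=3.0):
    """A site is reported when its score is lower than the cutoff."""
    return score < cutoff_score(best_possible, multiplier)

# Hypothetical best possible score for some RVD sequence.
example_best = 5.0
default_cutoff = cutoff_score(example_best)        # 3.0 * 5.0
relaxed_cutoff = cutoff_score(example_best, 3.5)   # 3.5 * 5.0
```

A site scoring 16.0 would be excluded at the default cutoff in this example but included at 3.5, which is why raising the cutoff returns more sites.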
Users can also choose which nucleotide(s) will be required to be upstream of returned sites. In nature, the majority of TAL effector targets are preceded by a T. However, at least one example of a TAL effector targeting a site preceded by C has been found in nature (Yu et al., 2011), and custom TAL effectors targeting sites preceded by C have been reported in the literature (Miller et al., 2010).
In our hands, sites preceded by a C were significantly less active than those preceded by a T. Therefore, we recommend that users select the default setting for sites preceded by a T only, unless they have a specific reason to do otherwise.
For the purposes of this tutorial, set the upstream base to T only (the default option). This T will be shown in the output.
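The upstream-base requirement can be pictured as a filter on candidate windows. The sketch below is not Target Finder's code; `candidate_windows` is a hypothetical helper that simply enumerates windows of a given length whose preceding base is in the allowed set.

```python
def candidate_windows(dna, site_length, upstream_bases=("T",)):
    """Yield (start, upstream_base, site) for windows preceded by an allowed base.

    `start` is the 0-based index of the first base of the site; the
    upstream base sits at `start - 1`.
    """
    dna = dna.upper()
    for start in range(1, len(dna) - site_length + 1):
        upstream = dna[start - 1]
        if upstream in upstream_bases:
            yield start, upstream, dna[start:start + site_length]

dna = "ATCATGCCATG"
t_only = list(candidate_windows(dna, 4))
t_or_c = list(candidate_windows(dna, 4, upstream_bases=("T", "C")))

# Show the upstream base next to the site, separated by a space.
formatted = ["%s %s" % (up, site) for _, up, site in t_only]
```

Allowing C as well as T can only add candidate windows, so the T-only setting yields the shorter list.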
- Select additional options.
By default, Target Finder searches both the sequence as entered and the reverse complement for candidate TAL effector binding sites. To search only the sequence (and not the reverse complement), uncheck the option box.
To follow along with the tutorial, leave this box checked. When searching a promoterome for potential off-target sites resulting in transcriptional activation, it may be more appropriate to uncheck this box and search only the 5' to 3' oriented promoter sequences.
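For readers unfamiliar with what searching the reverse complement means, here is a minimal sketch. The helper names are hypothetical, and the checkbox is modeled as a simple boolean argument.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(_COMPLEMENT)[::-1]

def strands_to_search(dna, include_reverse_complement=True):
    """Return (strand_label, sequence) pairs that a search would scan."""
    strands = [("+", dna)]
    if include_reverse_complement:
        strands.append(("-", reverse_complement(dna)))
    return strands

both = strands_to_search("ATGC")
forward_only = strands_to_search("ATGC", include_reverse_complement=False)
```

With the option unchecked only the sequence as entered is scanned, which matches the promoterome case where sequences are already oriented 5' to 3'.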
If you wish to receive email notification when your job completes, enter an email address in the appropriate field.
By default, Target Finder displays only the 10 best-scoring candidate sites in the browser-generated results table (all sites below the score cutoff will be available in the downloadable results file). If you wish to display more sites, you can enter a different number. For the tutorial, leave the number of sites displayed set to 10.
Finally, if you wish your results to remain on the server longer than 1 hour, select a different option from the drop-down menu under "Expires".
- Click the Submit button at the bottom of the page.
After you submit your job, you will be taken to a progress page.
You can hide the progress details by clicking on the arrow next to "Process log".
Either stay on this page or bookmark it so you can return and download your results later! Your job may take several minutes to finish.
- Retrieve your results.
When your job completes, a table with the top candidate binding sites will appear in your browser. By default, 10 sites are displayed in the table; if you changed this number, a different number of sites will be shown. Note that if not enough sites meet the score cutoff, the table may show fewer sites.
This table contains the top scoring sites; it may not include all sites! The complete list of sites is available in the downloadable output files listed above the table.
All sites scoring below the cutoff are included in the downloadable tab-delimited results file. Additionally, coordinates for all identified sites are available to download in gff3 format.
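The gff3 download can be read with a few lines of Python, since GFF3 is a nine-column tab-separated format. The sketch below relies only on that general format; the sample record (sequence name, source, feature type, score, attributes) is invented for illustration and is not taken from real Target Finder output.

```python
import io

# The nine standard GFF3 columns.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")

def read_gff3(handle):
    """Parse GFF3 feature lines into dictionaries, skipping comments."""
    records = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank line, directive, or comment
        fields = line.split("\t")
        if len(fields) != len(GFF3_COLUMNS):
            continue  # not a well-formed feature line
        record = dict(zip(GFF3_COLUMNS, fields))
        record["start"] = int(record["start"])  # 1-based, inclusive
        record["end"] = int(record["end"])
        records.append(record)
    return records

# Invented example record standing in for a downloaded file.
sample = io.StringIO(
    "##gff-version 3\n"
    "Chr4\texample\tbinding_site\t100\t117\t12.5\t+\t.\tID=site1\n"
)
records = read_gff3(sample)
```

To read a real download, pass an open file instead of the `io.StringIO` object, for example `read_gff3(open(path))` with `path` pointing at the saved file.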
- Interpret your results.
Target Finder will return all sites in the DNA sequence scoring below (better than) a cutoff score. The cutoff score is based on the Best Possible Score returned above the results table or at the top of the output file.
The Best Possible Score is the score for the TAL effector on its "perfect" binding site (the site with all RVDs aligned with their most frequently associated nucleotide). The closer a site's score is to the Best Possible Score, the more likely the TAL effector is to bind to that site.
Naturally occurring TAL effectors and their known targets typically have scores less than 2 to 3 times the Best Possible Score for the TAL effector. Therefore, all sites scoring below 3.0 times the Best Possible Score (or 3.5 or 4.0 times, if you selected one of those options) are returned in the downloadable output file. By default, up to 10 of these sites are displayed in the browser table (a different number will be shown if you changed the default setting).
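The relationship between the full list of sites and the shorter browser table can be sketched as follows. The scores and the Best Possible Score of 5.0 are invented, and `select_sites` is a hypothetical helper, not part of the tool.

```python
def select_sites(sites, best_possible, multiplier=3.0, display_count=10):
    """Split sites into (all sites under the cutoff, subset shown in the table).

    Each site is a dict with a "score" key; lower scores are better.
    """
    cutoff = multiplier * best_possible
    passing = sorted((s for s in sites if s["score"] < cutoff),
                     key=lambda s: s["score"])
    return passing, passing[:display_count]

# Invented scores for four candidate sites.
sites = [{"score": 4.0}, {"score": 20.0}, {"score": 9.0}, {"score": 14.99}]
best = 5.0

in_file, in_table = select_sites(sites, best, display_count=2)

# Ratio of each site's score to the Best Possible Score (1.0 is a perfect site).
ratios = [s["score"] / best for s in in_file]
```

Here three sites fall under the cutoff and would appear in the downloadable file, while only the two best would be shown in a table limited to two rows.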
Not all listed sites are guaranteed to be bound by the TAL effector, and some true binding sites that score above the cutoff may be missed. However, the list should contain the majority of binding sites in the genome of interest; a longer list indicates that the TAL effector is more likely to have off-target binding sites.
You may wish to filter your list in order to address a question of interest. For example, when searching for additional off-target sites in a promoter sequence, you may be interested in sites that will result in gene activation. Therefore, you may want to consider only sites that are near a gene's transcriptional or translational start site.
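One way to apply such a filter to a downloaded list is sketched below. Everything here is an assumption made for illustration: the site and start-site coordinates are invented, the 1000-base window is an arbitrary default, and strand is ignored for simplicity.

```python
def distance_to_nearest_start(site, starts_by_seq):
    """Distance from a site to the closest start site on the same sequence."""
    starts = starts_by_seq.get(site["seqid"], [])
    if not starts:
        return None  # no annotated start sites on this sequence
    return min(abs(site["start"] - s) for s in starts)

def sites_near_starts(sites, starts_by_seq, max_distance=1000):
    """Keep only sites within max_distance bases of some start site."""
    kept = []
    for site in sites:
        distance = distance_to_nearest_start(site, starts_by_seq)
        if distance is not None and distance <= max_distance:
            kept.append(site)
    return kept

# Invented coordinates for illustration.
sites = [
    {"seqid": "Chr1", "start": 5200},
    {"seqid": "Chr1", "start": 90000},
    {"seqid": "Chr2", "start": 300},
]
starts_by_seq = {"Chr1": [5000, 42000]}

near = sites_near_starts(sites, starts_by_seq)
```

A stricter filter would also check that the site lies upstream of the start site on the correct strand; that refinement is left out to keep the sketch short.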
For genome searches, clicking on the genome coordinates for a site in the table (under the Locus column) will take you to a GBrowse view showing the TAL effector binding site in relationship to other gene features.
In the example above, the TAL effector (indicated by the red arrow) binds to an intron of the gene AT4G24160.
Note that all binding sites are directly preceded by a T (or C, if that option is selected) at the 5' end. The T or C is now shown in the output, separated from the target sequence by a space.
Promoterome sequences for all of the above species are also available. Promoteromes were downloaded from the UCSC Genome Browser, except for Oryza sativa (downloaded from the MSU Rice Genome Annotation Project) and Arabidopsis thaliana (downloaded from The Arabidopsis Information Resource).
A promoterome is defined here as the 1000 bases upstream of each annotated translational start codon.
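To make this definition concrete, the sketch below extracts the upstream region for a single start codon. It is not how the promoterome files were built; `upstream_region` is a hypothetical helper, and it assumes the start codon position is given as the 1-based genome coordinate of the first base of the codon, on either strand.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(dna):
    return dna.translate(_COMPLEMENT)[::-1]

def upstream_region(chromosome, codon_start, strand, length=1000):
    """Return up to `length` bases upstream of a start codon, 5' to 3'.

    `codon_start` is the 1-based coordinate of the first base of the
    start codon (assumed convention). Regions are truncated at
    chromosome ends.
    """
    if strand == "+":
        end = codon_start - 1                  # 0-based index of the codon
        return chromosome[max(0, end - length):end]
    if strand == "-":
        begin = codon_start                    # first base past the codon
        return reverse_complement(chromosome[begin:begin + length])
    raise ValueError("strand must be '+' or '-'")

# Tiny invented sequences, using length=4 instead of 1000.
plus_promoter = upstream_region("AAAACCCCATGGGG", 9, "+", length=4)
minus_promoter = upstream_region("GGGCATTGAC", 6, "-", length=4)
```

For a gene on the minus strand the upstream bases lie at higher coordinates, so they are reverse-complemented to keep every promoter in 5' to 3' orientation.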
To follow along with this tutorial, select the Arabidopsis thaliana genome from the drop-down list. Click "Test the tool using sample data" above the tab to load a sample RVD sequence in the appropriate text boxes.
To enter a different RVD sequence, enter a sequence between 12 and 35 RVDs, separated by spaces. Use '*' to indicate a missing amino acid (such as N* or H*).
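Before submitting, an RVD sequence can be checked against these rules with a short script. This is only a sketch: `parse_rvds` is a hypothetical helper, and the pattern assumes each RVD is written as two characters, a letter followed by a letter or '*'.

```python
import re

# Assumed RVD notation: one letter followed by a letter or '*'.
_RVD_PATTERN = re.compile(r"^[A-Z][A-Z*]$")

def parse_rvds(text, min_len=12, max_len=35):
    """Split a space-separated RVD string and check its length and tokens."""
    rvds = [token.upper() for token in text.split()]
    if not min_len <= len(rvds) <= max_len:
        raise ValueError(
            "expected %d to %d RVDs, got %d" % (min_len, max_len, len(rvds)))
    for rvd in rvds:
        if not _RVD_PATTERN.match(rvd):
            raise ValueError("unrecognized RVD: %r" % rvd)
    return rvds

# Made-up 12-RVD sequence including one RVD with a missing residue.
example = parse_rvds("HD NI NG NN N* HD NI NG NN HD NI NG")
```

A sequence with fewer than 12 or more than 35 RVDs, or with a malformed token such as `H**`, raises `ValueError` in this sketch.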