Target Finder Tutorial: Provide a Sequence
Target Finder identifies candidate binding sites for a TAL effector in a DNA sequence. Target Finder has two modes. You can choose to search for sites in a sequence of your choice or in a pre-loaded genome/promoterome.
Identified TAL effector binding sites are scored using the scoring function developed by Moscou and Bogdanove in supplementary script S1 of the paper Moscou, M.J. and Bogdanove, A.J. (2009) A simple cipher governs DNA recognition by TAL effectors. Science. 326(5959):1501.
The scoring function is based on RVD-nucleotide association frequencies for known TAL-effector target pairs. Each RVD-nucleotide pair in the TAL effector/target alignment is assigned a probability score based on these association frequencies. Scores for all RVD-nucleotide pairs are summed to score the entire alignment.
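The summation described above can be sketched as follows. This is a minimal illustration, not the published script: the frequencies in `FREQ` are invented for the example, and treating each position's score as the negative log of its association frequency is an assumption. Supplementary script S1 defines the actual values and formula.

```python
import math

# Hypothetical RVD-nucleotide association frequencies (illustrative only;
# these are NOT the values from Moscou and Bogdanove, 2009).
FREQ = {
    "HD": {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    "NI": {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    "NG": {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    "NN": {"A": 0.40, "C": 0.05, "G": 0.50, "T": 0.05},
}


def score_alignment(rvds, site):
    """Sum the per-position scores of an RVD/site alignment (lower is better)."""
    if len(rvds) != len(site):
        raise ValueError("RVD sequence and site must have the same length")
    # Assumption: each RVD-nucleotide pair contributes -log(frequency).
    return sum(-math.log(FREQ[rvd][base]) for rvd, base in zip(rvds, site))


rvds = "HD NI NG NN".split()
print(score_alignment(rvds, "CATG"))  # every RVD meets its preferred base
print(score_alignment(rvds, "AAAA"))  # mostly mismatched, so a higher score
```

With these made-up numbers, the well-matched site receives the lower (better) score, which is the behavior the rest of this tutorial relies on.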
Target Finder returns a browser table of the best-scoring (lowest-scoring) candidate binding sites in the input DNA sequence for the TAL effector (by default, the top 10 sites are included in the table). All candidate binding sites scoring below a given threshold (discussed in the tutorial) are also available in downloadable output files.
In this tutorial, we will load our own DNA sequence to search for candidate binding sites.
- Enter the DNA sequence and RVD sequence that you want to search.
You may choose between entering your own DNA sequence of interest or searching for candidate binding sites in a genome or promoterome.
You can switch between the two modes by clicking on the tabs "Provide a Sequence" or "Search a Genome/Promoterome".
To follow along with this tutorial, click on the "Provide a Sequence" tab.
Click "Test the tool using sample data" above the tab to load a sample DNA sequence and RVD sequence in the appropriate text boxes.
To load your own DNA sequence, copy and paste FASTA-formatted sequences into the text box or upload a file containing the sequences. Every sequence should include a FASTA identifier line (">long_sequence" in the image above), and sequences are limited to the characters ACGTN. For more details on correct formatting, see the Help page.
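If you want to check your input before pasting it, the formatting rules above can be sketched as a small validator. The function name `parse_fasta` is hypothetical, and accepting lowercase letters (converted to uppercase) is an assumption; the Help page remains the authority on what the server accepts.

```python
def parse_fasta(text):
    """Parse FASTA text into {identifier: sequence}, enforcing the rules above."""
    records = {}
    name = None
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            if not name:
                raise ValueError(f"line {line_no}: empty identifier")
            if name in records:
                raise ValueError(f"line {line_no}: duplicate identifier {name!r}")
            records[name] = ""
        else:
            if name is None:
                raise ValueError(f"line {line_no}: sequence before identifier line")
            seq = line.upper()  # assumption: lowercase input is tolerated
            bad = sorted(set(seq) - set("ACGTN"))
            if bad:
                raise ValueError(f"line {line_no}: invalid characters {bad}")
            records[name] += seq
    return records


print(parse_fasta(">long_sequence\nACGTTACG\nNNACGT\n"))
```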
To enter a different RVD sequence, enter a sequence of 12 to 35 RVDs, separated by spaces. Use '*' to indicate a missing amino acid (such as N* or H*).
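The RVD input rules can be checked the same way. In this sketch, the function name `parse_rvd_sequence` is hypothetical, and the pattern assumes that an RVD is written as two characters: an amino-acid letter followed by a second amino-acid letter or '*'.

```python
import re

# Assumption: an RVD is one amino-acid letter followed by either a second
# amino-acid letter or '*' (a missing residue, as in N* or H*).
AMINO = "ACDEFGHIKLMNPQRSTVWY"
RVD_PATTERN = re.compile(rf"^[{AMINO}][{AMINO}*]$")


def parse_rvd_sequence(text, min_len=12, max_len=35):
    """Split a space-separated RVD string and check its length and tokens."""
    rvds = text.upper().split()
    if not min_len <= len(rvds) <= max_len:
        raise ValueError(
            f"expected {min_len} to {max_len} RVDs, found {len(rvds)}"
        )
    for position, rvd in enumerate(rvds, start=1):
        if not RVD_PATTERN.match(rvd):
            raise ValueError(f"RVD {position} is not valid: {rvd!r}")
    return rvds


print(parse_rvd_sequence("NI HD NG NN N* HD NI NG NN HD NI NG"))
```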
- Select a score cutoff and an upstream base.
Target Finder will identify all sites with scores better (lower) than a cutoff score. The cutoff score can be set to 3.0, 3.5, or 4.0 times the best possible score for the RVD sequence (see the FAQs page for more information about score cutoffs). The default is 3.0; this cutoff is sufficient to identify all known targets of naturally occurring TAL effectors.
To follow along with the tutorial, set the cutoff to 3.0 (the default). If you wish to identify more sites, you may raise the cutoff, but your search will be much slower!
All sites scoring less than the selected cutoff will be included in the downloadable results file; however, only the top 10 sites (or more, if you change this setting) will be displayed in the browser output table.
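The relationship between the cutoff, the downloadable file, and the browser table can be sketched as below. The function `select_sites` and the `(position, score)` tuples are hypothetical; only the multipliers, the strict "less than" comparison, and the default of 10 displayed sites come from the text above.

```python
ALLOWED_MULTIPLIERS = (3.0, 3.5, 4.0)


def select_sites(scored_sites, best_score, multiplier=3.0, display_count=10):
    """Return (sites for the results file, sites for the browser table).

    scored_sites is a list of (position, score) tuples; lower scores are better.
    """
    if multiplier not in ALLOWED_MULTIPLIERS:
        raise ValueError(f"multiplier must be one of {ALLOWED_MULTIPLIERS}")
    cutoff = multiplier * best_score
    # Every site scoring below the cutoff goes to the downloadable file.
    passing = sorted(
        (site for site in scored_sites if site[1] < cutoff),
        key=lambda site: site[1],
    )
    # Only the best few are shown in the browser table.
    return passing, passing[:display_count]


# Made-up example scores for illustration.
sites = [(90, 15.2), (42, 11.9), (77, 12.0), (10, 5.0)]
in_file, in_table = select_sites(sites, best_score=4.0)
print(in_file)
print(in_table)
```

Raising the multiplier widens the cutoff, so more sites pass the filter; the browser table still shows at most `display_count` of them.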
Users can also choose which nucleotide(s) will be required to be upstream of returned sites. In nature, the majority of TAL effector targets are preceded by a T. However, at least one example of a TAL effector targeting a site preceded by C has been found in nature (Yu et al., 2011), and custom TAL effectors targeting sites preceded by C have been reported in the literature (Miller et al., 2010).
In our hands, sites preceded by a C were significantly less active than those preceded by a T. Therefore, we recommend that users select the default setting for sites preceded by a T only, unless they have a specific reason to do otherwise.
For the purposes of this tutorial, set the upstream base to T only (the default option). This T will be shown in the output.
- Select additional options.
By default, Target Finder searches both the sequence as entered and the reverse complement for candidate TAL effector binding sites. To search only the sequence (and not the reverse complement), uncheck the option box.
To follow along with the tutorial, leave this box checked.
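To make the upstream-base and reverse-complement options concrete, here is a minimal sketch of how candidate windows could be enumerated. The function names and the 0-based coordinates (reported relative to the strand being scanned) are assumptions made for illustration, not a description of Target Finder's internals.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence over ACGTN."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_windows(seq, site_len, upstream="T", both_strands=True):
    """Yield (strand, start, upstream_base, site) for each allowed window.

    A window is kept only if the base immediately 5' of it is in `upstream`
    (for example "T" for T only, or "TC" for T or C).
    """
    strands = [("+", seq)]
    if both_strands:
        strands.append(("-", reverse_complement(seq)))
    for strand, s in strands:
        # Start at 1 so that every window has a preceding base.
        for start in range(1, len(s) - site_len + 1):
            if s[start - 1] in upstream:
                yield strand, start, s[start - 1], s[start:start + site_len]


for window in candidate_windows("GTACGA", site_len=3):
    print(window)
```

Each window returned here would then be scored against the RVD sequence; unchecking the reverse-complement option corresponds to `both_strands=False`.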
If you wish to receive email notification when your job completes, enter an email address in the appropriate field.
By default, Target Finder displays only the 10 best-scoring candidate sites in the browser-generated results table (all sites below the score cutoff will be available in the downloadable results file). If you wish to display more sites, you can enter a different number. For the tutorial, leave the number of sites displayed set to 10.
Finally, if you wish your results to remain on the server longer than 1 hour, select a different option from the drop-down menu under "Expires".
- Click the Submit button at the bottom of the page.
After you submit your job, you will be taken to a progress page.
You can hide the progress details by clicking on the arrow next to "Process log".
Either stay on this page or bookmark it so you can return and download your results later! Your job may take several minutes to finish.
- Retrieve your results.
When your job completes, a table with the top candidate binding sites will appear in your browser. This table contains the top scoring sites; it may not include all sites! The complete list of sites is available in the downloadable output files listed above the table. By default, 10 sites are displayed in the table; if you changed this number, a different number of sites will be shown. Note that if not enough sites meet the score cutoff, the table may show fewer sites.
In the example from the tutorial, shown above, only two sites score below the cutoff score (discussed below); therefore, only two candidate sites are shown in the browser table.
All sites scoring below the cutoff are included in the downloadable tab-delimited results file.
- Interpret your results.
Target Finder will return all sites in the DNA sequence scoring below (better than) a cutoff score. The cutoff score is based on the Best Possible Score returned above the results table or at the top of the output file.
The Best Possible Score is the score for the TAL effector on its "perfect" binding site (the site with all RVDs aligned with their most frequently associated nucleotide). The closer a site's Score is to the Best Possible Score, the more likely the TAL effector is to bind to that site.
TAL effectors in nature and their known targets typically have Scores less than 2-3 times the Best Possible Score for the TAL effector. Therefore, all sites scoring below 3*Best Possible Score are returned in the downloadable output file. By default, up to 10 of these sites are displayed in the browser table (a different number of sites will be shown if you changed the default setting).
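A sketch of how the Best Possible Score and a site's ratio to it could be computed is given below. As before, the frequencies in `FREQ` are invented, the negative-log scoring is an assumption, and the function names are hypothetical.

```python
import math

# Hypothetical RVD-nucleotide association frequencies (illustrative only).
FREQ = {
    "HD": {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    "NI": {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    "NG": {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    "NN": {"A": 0.40, "C": 0.05, "G": 0.50, "T": 0.05},
}


def perfect_site(rvds):
    """Site in which every RVD is paired with its most frequent nucleotide."""
    return "".join(max(FREQ[rvd], key=FREQ[rvd].get) for rvd in rvds)


def best_possible_score(rvds):
    """Score of the perfect site (assumes -log(frequency) per position)."""
    return sum(-math.log(max(FREQ[rvd].values())) for rvd in rvds)


def score_ratio(site_score, rvds):
    """How many times the Best Possible Score a site's score is."""
    return site_score / best_possible_score(rvds)


rvds = "HD NI NG NN".split()
best = best_possible_score(rvds)
print(perfect_site(rvds))
print(score_ratio(2.5 * best, rvds))  # a site at 2.5x passes the 3.0 cutoff
```

A ratio of 1.0 corresponds to the perfect site, and a site is returned when its ratio is below the selected cutoff multiplier.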
Not all sites listed are guaranteed to be bound by the TAL effector, and some binding sites that score above the cutoff may be missed. However, the list of sites should contain the majority of binding sites in the genome of interest, with a longer list indicating that a TAL effector is more likely to have off-target binding sites.
You may wish to filter your list in order to address a question of interest. For example, when searching for additional off-target sites in a promoter sequence, you may be interested in sites that will result in gene activation. Therefore, you may want to consider only sites that are near a gene's transcriptional or translational start site.
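One way to apply such a filter to your downloaded results is sketched here. The dictionary keys, the function name `sites_near_start`, and the 300-base window are all hypothetical; adjust them to the columns in your results file and to the distance that makes sense for your question.

```python
def sites_near_start(sites, start_site, max_distance=300):
    """Keep sites whose start lies within max_distance bases of start_site.

    `sites` is a list of dicts with a "start" coordinate; `start_site` is the
    coordinate of the transcriptional or translational start of interest.
    """
    return [
        site for site in sites
        if abs(site["start"] - start_site) <= max_distance
    ]


# Made-up candidate sites for illustration.
candidates = [
    {"start": 120, "score": 6.1},
    {"start": 950, "score": 5.4},
    {"start": 1480, "score": 7.9},
]
print(sites_near_start(candidates, start_site=1000))
```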
Note that all binding sites are directly preceded by a T (or C, if that option is selected) at the 5' end. The T or C is now shown in the output, separated from the target sequence by a space.