Frequently Asked Questions
General Questions
- How should I cite TALE-NT?
-
If you use the website to design TALENs or TAL effectors, or to identify binding sites for a TAL effector, please cite the following papers:
Doyle, E.L., Booher, N.J., Standage, D.S., Voytas, D.F., Brendel, V.P., VanDyk, J.K., and Bogdanove, A.J. (2012) TAL Effector-Nucleotide Targeter (TALE-NT) 2.0: tools for TAL effector design and target prediction. Nucleic Acids Res. doi: 10.1093/nar/gks608. (http://dx.doi.org/10.1093/nar/gks608)
Cermak, T., Doyle, E.L., Christian, M., Wang, L., Zhang, Y., Schmidt, C., Baller, J.A., Somia, N.J., Bogdanove, A.J., and Voytas, D.F. (2011) Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting. Nucleic Acids Res. 39:e82. (http://dx.doi.org/10.1093/nar/gkr739)
- What do the different TALE-NT tools do?
-
TALEN Targeter designs RVD sequences for TAL effector nucleases (TALENs) that target a DNA sequence. Information about building TALENs and custom TAL effectors can be found in Cermak et al. (2011), listed above.
TAL Effector Targeter designs custom single TAL effectors to target a DNA sequence.
Target Finder uses a scoring model to identify potential target sites for a naturally occurring TAL effector in a DNA sequence. Target Finder also identifies sites in a genome or promoterome sequence for a given custom TAL effector.
All tools are based entirely on RVD and DNA sequences. Because they allow users to customize the number of repeats and spacer lengths whenever possible, they can be used to design TALENs/TAL effectors that will work with any architecture or construction method!
For more information about each tool, see its individual help section below. Note that each tool formats its output slightly differently; for help understanding the output, refer to the section for your specific tool.
- How do I view the output?
-
Output for all of the tools will appear in a table on the query progress page when the query is complete.
Additionally, all of the tools supply a link to a downloadable file. The output file is formatted as tab-delimited text and is best viewed by copying the output and pasting it as tab-delimited text into a spreadsheet program such as Excel.
- How do I enter my DNA sequence(s)? I keep getting an error that says my sequences are not in correct FASTA format.
-
The required DNA sequence format is the same whether you choose to cut and paste your sequence into the text box or upload a file containing the sequences.
The first line of each sequence should begin with the '>' character, followed by a description of the sequence/sequence name. The DNA sequence begins on the next line. Even if you only enter one sequence, it must include this first header line! Multiple sequences can be entered as long as each sequence has its own header line. See the examples below.
DNA sequences can be uppercase or lowercase and should contain only the letters A, C, G, T, and N. Note that TALE-NT does not design TALENs/TAL effectors for, or search for TAL effector binding sites in, regions containing N's.
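You can check your input against these rules locally before submitting it. The following is a minimal Python sketch under our own assumptions (for instance, that blank lines are ignored); `parse_fasta` is a hypothetical helper, not part of TALE-NT, and the website's own validation may differ in its details.

```python
import re

# Per the rules above: only A, C, G, T and N, in either case.
SEQUENCE_LINE = re.compile(r"[ACGTNacgtn]*")


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) records, raising ValueError
    for input that breaks the formatting rules described above."""
    records: list[list[str]] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are skipped (our assumption, not a stated rule)
        if line.startswith(">"):
            # A header line starts a new record; the rest of the line is its name.
            records.append([line[1:].strip(), ""])
        elif not records:
            raise ValueError(f"line {lineno}: sequence found before a '>' header line")
        elif not SEQUENCE_LINE.fullmatch(line):
            raise ValueError(f"line {lineno}: only A, C, G, T and N are allowed")
        else:
            records[-1][1] += line.upper()
    if not records:
        raise ValueError("no '>' header line found")
    return [(name, seq) for name, seq in records]
```

A single sequence still needs its own `>` header line; without one, the sketch raises an error, mirroring the rule above.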
Example of entering one DNA sequence:
Example of entering multiple DNA sequences:
TALEN Targeter Questions
- Why are the design guidelines no longer available?
-
Originally, TALEN Targeter designed TALEN pairs that conformed to the design guidelines proposed in Cermak et al. (2011). However, a recent paper by Reyon et al. (2011) provided evidence that following the guidelines had little effect on TALEN efficiency. Therefore, we have removed the guidelines from the newest version of TALEN Targeter.
Users may now choose to search for TALENs conforming to a variety of architectures. Additional options include searching for TALENs targeting a specific site, or returning all TALENs targeting a sequence.
If you wish to use the guidelines, the old version of TALEN Targeter is still available here.
- Should I search for TALENs preceded by a T only, or use the options for C, or T or C?
-
At least one example of a TAL effector targeting a site preceded by C has been found in nature (Yu et al., 2011), and custom TAL effectors targeting sites preceded by C have been reported in the literature (Miller et al., 2010). Therefore, TALEN Targeter now includes options to search for sites preceded by a C.
However, in our hands, sites preceded by a C were significantly less active than those preceded by a T. Therefore, we recommend that users select the default setting for sites preceded by a T only, unless they have a specific reason to do otherwise.
- What is an architecture? What architectures are available? How were the preset ranges for spacer and number of RVDs chosen?
-
Different TALEN construction methods incorporate different portions of the TAL effector on either side of the repeat domain. We refer to these as different "architectures". Each architecture works best with a different spacer size and number of RVDs.
TALEN Targeter allows users to search for TALENs using preset ranges for spacer size and number of repeats. Available architectures are as follows:
Cermak et al., 2011, spacer = 15-24, repeats = 15-20
Miller et al., 2011, spacer = 15-20, repeats = 15-20
Mussolino et al., 2011, spacer = 12-15, repeats = 15-20
Li et al., 2011, spacer = 16-31, repeats = 15-20
Spacer ranges were chosen based on the optimal TALEN activity observed for each architecture in the papers referenced above. The number of RVDs is set at 15-20 because all of the papers above used TALEN pairs with repeat numbers in this range.
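To make the presets concrete, here is a minimal Python sketch that reports which of the listed architectures accept a given TALEN pair. The short architecture labels and the function names are hypothetical and do not come from TALEN Targeter; the ranges are copied from the list above and treated as inclusive.

```python
# Preset ranges from the architecture list above, as inclusive (min, max) pairs.
# The short keys are hypothetical labels, not TALEN Targeter's own identifiers.
PRESETS = {
    "Cermak":    {"spacer": (15, 24), "repeats": (15, 20)},
    "Miller":    {"spacer": (15, 20), "repeats": (15, 20)},
    "Mussolino": {"spacer": (12, 15), "repeats": (15, 20)},
    "Li":        {"spacer": (16, 31), "repeats": (15, 20)},
}


def in_range(value, bounds):
    low, high = bounds
    return low <= value <= high


def matching_architectures(spacer, left_repeats, right_repeats):
    """Return the presets whose spacer and repeat ranges all accept this pair."""
    return [
        name
        for name, preset in PRESETS.items()
        if in_range(spacer, preset["spacer"])
        and in_range(left_repeats, preset["repeats"])
        and in_range(right_repeats, preset["repeats"])
    ]
```

For example, a pair with a 15-base spacer and 17 and 18 repeats falls within the Cermak, Miller, and Mussolino presets but not the Li preset, whose spacer range starts at 16.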
You can select an architecture using the drop-down menu.
- What do the different filter options mean?
-
"Show TALEN pairs (hide redundant TALENs)" returns one TALEN pair targeting each base in the sequence. A base is "targeted" if it is at the center of the spacer. Most bases will be located in the center of the spacer for multiple TALEN pairs. This option returns one TALEN pair per base by choosing the pair with the smallest average number of RVDs and the shortest spacer (within the selected spacer range).
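The "hide redundant TALENs" rule amounts to grouping candidate pairs by the base at the center of their spacer and keeping the best-ranked pair in each group. A minimal Python sketch; the `TalenPair` record, its field names, and the choice of center base for even-length spacers are our assumptions, not TALEN Targeter's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TalenPair:
    """Hypothetical record for one candidate TALEN pair."""
    left_rvds: int     # repeats in the monomer binding the plus strand
    right_rvds: int    # repeats in the monomer binding the minus strand
    spacer_start: int  # 0-based index of the first spacer base
    spacer_len: int

    @property
    def center(self) -> int:
        # Assumption: for an even-length spacer we take the right-hand
        # of the two middle bases; the tool's own convention may differ.
        return self.spacer_start + self.spacer_len // 2


def rank(pair: TalenPair) -> tuple[float, int]:
    # Smaller average number of RVDs first, then shorter spacer.
    return ((pair.left_rvds + pair.right_rvds) / 2, pair.spacer_len)


def one_pair_per_base(pairs):
    """Keep the best-ranked pair for each targeted (center) base, in sequence order."""
    best = {}
    for pair in pairs:
        current = best.get(pair.center)
        if current is None or rank(pair) < rank(current):
            best[pair.center] = pair
    return [best[c] for c in sorted(best)]
```

If two pairs tie on both criteria, this sketch keeps whichever it saw first.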
"Show all TALEN pairs (include redundant TALENs)" returns the complete list of TALEN pairs. Multiple TALEN pairs that target the same site may be returned; redundant pairs are not filtered out of the output. This option will give a very long results list.
"Show TALEN pairs targeting a specific site" allows users to specify a single position that they are interested in targeting. This option will return all TALEN pairs that have this specific position at the center of the spacer.
- How big will my DNA target be? How many repeats will the TALENs have?
-
By default, each TAL effector of the TALEN pair has 15-20 repeats, which corresponds to a DNA target site 15-20 bases long. Each target site is preceded at the 5' end by a T (or C, if this option is selected). The 5' T (or C) is not counted in the number of repeats or in the target size for each TALEN. If you provide custom spacer/RVD lengths and enter a different range for the number of repeats, the size of your TALEN target sequences will reflect this.
The complete TALEN site will consist of a pair of TALEN targets, oriented 5' to 3' on opposite strands of the DNA and separated by a spacer.
- Do the TALEN target sites include a T (or C) at the 5' end?
-
Yes, the target for each TALEN of the TALEN pair is preceded by a T (or C, if this option is selected) at the 5' end. The 5' T or C is now indicated in the output. It is not counted in the size of the target.
- TALEN Targeter doesn't find any TALEN sites in my DNA sequence! Why not?
-
First, make sure your DNA sequences are long enough. With the default settings, the shortest possible site is 45 bases long (two 15-base TALEN sites separated by a 15-base spacer). To have the best chance of finding sites in a DNA sequence, the sequence should be longer than this.
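The 45-base figure is simple arithmetic: two monomer targets plus the spacer between them. A minimal Python sketch, assuming 15-20 repeats and the Cermak et al. spacer range of 15-24 as defaults (our choice of example preset); the optional flag adds the 5' T (or C) before each monomer target, which TALEN Targeter does not count in target sizes. The function name is hypothetical.

```python
def site_length_bounds(repeats=(15, 20), spacer=(15, 24), count_5prime_base=False):
    """Return (shortest, longest) TALEN site length in bases.

    A site is two monomer targets plus the spacer between them. The 5' T (or C)
    before each monomer target is excluded unless count_5prime_base is True.
    """
    extra = 2 if count_5prime_base else 0  # one 5' base per monomer
    shortest = 2 * repeats[0] + spacer[0] + extra
    longest = 2 * repeats[1] + spacer[1] + extra
    return shortest, longest
```

With these defaults the result is (45, 64); counting the two 5' bases gives (47, 66), and the Mussolino et al. spacer range (12-15) gives (42, 55).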
TALEN Targeter will not design TALENs that bind to DNA sequences with unknown bases (N's).
If your sequences are long enough, you may wish to use the option "Provide custom spacer/RVD lengths" to enter a wider range for the number of RVDs or the spacer length.
If you chose the option "Find TALEN Configurations for a Specific Cut Site", you may try specifying a different position a few bases away.
- I want to use TALEN Targeter to target a specific site in a gene. How do I do this?
-
To target a specific region of DNA (whether this means targeting anywhere in an exon sequence or targeting one or two specific bases) you should include the region to be targeted, plus about 45 bases of DNA directly flanking the region on each side. In your output, look for TALEN pairs that have your region of interest centered in the spacer.
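Extracting the region plus its flanks is easy to get off by one, so here is a minimal Python sketch. It assumes 0-based, half-open coordinates for the region of interest and clips the flanks at the ends of the sequence; the function name is hypothetical.

```python
def region_with_flanks(seq, start, end, flank=45):
    """Return seq[start:end] plus up to `flank` bases on each side, together with
    the region's new 0-based, half-open coordinates inside the returned string."""
    low = max(0, start - flank)
    high = min(len(seq), end + flank)
    return seq[low:high], (start - low, end - low)
```

The returned coordinates tell you where your region sits in the submitted sequence, so you can spot TALEN pairs whose spacer is centered on it. Remember to add a `>` header line before pasting the subsequence into TALEN Targeter.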
If you have a single specific base you want to target, choose the option "Find TALEN Configurations for a Specific Cut Site". Enter the position you would like to target in the box. Your output will include all TALEN pairs that have the specified nucleotide at the center of the spacer.
- How are the sites in the TALEN Targeter output ordered? Are they ranked in any way?
-
Sites in the TALEN Targeter output are ordered by their position in the DNA sequence. They are not ranked in any way. All of the sites are designed to meet criteria for a "good" site and should work equally well.
- What are the "Unique RE sites in Spacer" shown in the output?
-
TALEN Targeter identifies restriction enzymes that will cut exactly once in the spacer and nowhere else in the 250 bases upstream and 250 bases downstream of the spacer. These sites can be used to screen for TALEN activity. Enzymes and their binding sites are returned using standard single letter nucleotide symbols. For non-palindromic RE sites, both the target sequence and the reverse complement are returned, to help you identify the site in the spacer (for example AlwI:GGATC|GATCC).
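The uniqueness test described above can be sketched in Python as follows. This is a simplified illustration under our own assumptions: it requires the recognition site (not the cut position, which for enzymes such as AlwI lies outside the site) to fall inside the spacer, matches only exact A/C/G/T sites (no degenerate codes), and uses a tiny illustrative enzyme table. The function names are hypothetical.

```python
def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(haystack, needle):
    """Start indices of every (possibly overlapping) occurrence of needle."""
    hits, i = [], haystack.find(needle)
    while i != -1:
        hits.append(i)
        i = haystack.find(needle, i + 1)
    return hits


def unique_spacer_enzymes(seq, spacer_start, spacer_end, enzymes, flank=250):
    """Enzymes whose site occurs exactly once in the window spanning the spacer
    plus `flank` bases on each side, with that one occurrence inside the spacer.
    Coordinates are 0-based and half-open."""
    low = max(0, spacer_start - flank)
    high = min(len(seq), spacer_end + flank)
    window = seq[low:high]
    found = []
    for name, site in enzymes.items():
        rc = revcomp(site)
        # A set merges forward and reverse hits at the same position (palindromes).
        starts = set(find_all(window, site)) | set(find_all(window, rc))
        if len(starts) != 1:
            continue
        start = low + starts.pop()
        if spacer_start <= start and start + len(site) <= spacer_end:
            # Non-palindromic sites are reported with their reverse complement.
            label = site if rc == site else f"{site}|{rc}"
            found.append(f"{name}:{label}")
    return found


ENZYMES = {"EcoRI": "GAATTC", "AlwI": "GGATC"}  # illustrative subset only
```

For a non-palindromic site such as AlwI's GGATC, the label comes out as `AlwI:GGATC|GATCC`, matching the format shown above.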
- What does the "Cut site" column shown in the output mean?
-
We define the cut site as the middle base of the spacer.
- What settings does TALEN Targeter use when counting off-target sites?
-
TALEN Targeter counts off-target sites that are preceded by a T with a score less than 3.0x the best possible score.
- What do TAL1 and TAL2 mean in the hover text for off-target site counts?
-
TALEN-mediated DNA cleavage can occur wherever two TALEN monomers bind with the proper spacing and orientation. Thus, a search for off-target sites must consider all four possible combinations of TALEN monomer RVD sequences: RVD1+RVD1, RVD1+RVD2, RVD2+RVD1, and RVD2+RVD2. TAL1 indicates which RVD sequence is binding to the plus strand of DNA (the sequence as entered). TAL2 indicates which RVD sequence is binding the opposite strand. The number following a combination of TAL1 and TAL2 is the number of off-target sites for a TALEN composed of those 2 monomers. The number of sites displayed in the result table is the sum of the individual counts.
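The summing step can be sketched in Python. Here `count_sites` is a hypothetical callback standing in for the actual site search (it returns how many properly spaced sites have its first argument on the plus strand and its second on the opposite strand); only the enumeration of the four combinations and the total are shown.

```python
from itertools import product


def off_target_total(count_sites, rvd1, rvd2):
    """Count off-target sites for all four TAL1/TAL2 combinations and sum them.

    count_sites(tal1, tal2) is a hypothetical callback: tal1 binds the plus
    strand, tal2 the opposite strand.
    """
    monomers = {"RVD1": rvd1, "RVD2": rvd2}
    # product yields (RVD1, RVD1), (RVD1, RVD2), (RVD2, RVD1), (RVD2, RVD2).
    per_combination = {
        (name1, name2): count_sites(monomers[name1], monomers[name2])
        for name1, name2 in product(monomers, repeat=2)
    }
    return per_combination, sum(per_combination.values())
```

The per-combination dictionary corresponds to the hover text, and the sum to the number shown in the results table.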
- TALEN Targeter is reporting a single off-target site for my TALEN, but when I use Paired Target Finder to search for it all I get is my target site. What's going on?
-
The off-target counter in TALEN Targeter does not take into account where your input sequence comes from. If it reports a single off-target site for your TALEN, and your input sequence comes from the genome in which off-targets are being counted, the reported off-target site is actually the site you are trying to target. This naming discrepancy is an artifact from when TALEN Targeter designed TALENs using NN (rather than NH) as the default RVD for G, and TALENs typically had thousands of predicted off-target sites.
TAL Effector Targeter Questions
- What do all the checkbox options/design guidelines mean?
-
By default, TAL Effector Targeter designs TAL effectors that obey the positional and composition biases of TAL effectors observed in nature. Although following these rules may not be necessary, it certainly will not reduce the TAL effectors' functionality! If you do not want to require your TAL effectors to obey these rules, or if the default settings do not find enough potential TAL effector binding sites in your DNA sequences, you can uncheck one or more of the boxes to relax the design rules.
TAL Effector Targeter uses the design guidelines proposed in Cermak et al. (2011). A recent paper by Reyon et al. (2011) provided evidence that following the guidelines had little effect on TALEN efficiency. However, the effect of the guidelines on the transcriptional activator activity of TAL effectors has not been determined. Therefore, we have left the guidelines available in TAL Effector Targeter. If you do not wish to follow one or more guidelines, uncheck the appropriate box to turn the guideline off.
- Should I search for TAL effector sites preceded by a T only, or use the options for C, or T or C?
-
At least one example of a TAL effector targeting a site preceded by C has been found in nature (Yu et al., 2011), and custom TAL effectors targeting sites preceded by C have been reported in the literature (Miller et al., 2010). Therefore, TAL Effector Targeter now includes options to search for sites preceded by a C.
However, in our hands, sites preceded by a C were significantly less active than those preceded by a T. Therefore, we recommend that users select the default setting for sites preceded by a T only, unless they have a specific reason to do otherwise.
- How big will my DNA target be? How many repeats will the TAL effector have?
-
By default, each TAL effector has 15-30 repeats, which correspond to a DNA target site 15-30 bases long. Each target site is preceded at the 5' end by a T (or C, if this option has been selected). This T/C is not counted in the number of repeats or the target size for each TAL effector.
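The length and 5' base rules translate into a simple scan. Below is a minimal Python sketch that lists candidate targets on the strand it is given, using the 15-30 default and skipping any window that contains an N. TAL Effector Targeter additionally applies the design guidelines described above, which this sketch does not model; to cover the other strand, scan the reverse complement as well. The function name is hypothetical.

```python
def candidate_sites(seq, min_len=15, max_len=30, allowed_5prime="T"):
    """Yield (start, target) for every target of min_len..max_len bases that is
    immediately preceded by an allowed 5' base and contains no N.

    `start` is the 0-based index of the first target base; the 5' T (or C)
    itself is not part of the target, matching how target size is counted above.
    """
    seq = seq.upper()
    for i, base in enumerate(seq):
        if base not in allowed_5prime:
            continue
        for length in range(min_len, max_len + 1):
            target = seq[i + 1 : i + 1 + length]
            if len(target) < length:
                break  # ran off the end of the sequence
            if "N" in target:
                break  # every longer target from this T/C would contain the same N
            yield i + 1, target
```

Passing `allowed_5prime="TC"` corresponds to searching for sites preceded by either base.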
- Do the TAL effector target sites include a T (or C) at the 5' end?
-
Yes, the target for each TAL effector is preceded by a T (or C, if this option is selected) at the 5' end. This T or C is now shown in the output. It is not counted in the size of the target.
- The TAL Effector Targeter webtool doesn't find any good TAL effector sites in my DNA sequence! Why not?
-
First, make sure your DNA sequences are long enough. By default, sites are 16-31 bases long: a 15-30 base target plus the T (or C) at its 5' end. To have the best chance of finding sites in a DNA sequence, the sequence should be several times longer than this.
TAL Effector Targeter will not design TAL effectors that bind to DNA sequences with unknown bases (N's).
You may also consider unchecking some of the default options below the sequence input box to relax the criteria for identifying TAL effector targets.
Target Finder Questions
- How do I enter a sequence to search?
-
You may cut and paste one or more FASTA-formatted sequences into the textbox, upload a text file containing FASTA-formatted sequence(s), or select a genome or promoterome from the drop-down menu. For more information about uploading your own sequence, see "How do I enter my DNA sequence(s)?" under General Questions, above.
- What genome/promoterome sequences are available to search?
-
Available genome sequences include:
- Arabidopsis thaliana (TAIR10)
- Brachypodium distachyon (v1)
- Caenorhabditis elegans (WS220)
- Danio rerio (Zv9)
- Drosophila melanogaster (BDGP5)
- Homo sapiens (GRCh37)
- Mus musculus (NCBIM37)
- Oryza sativa (IRGSP-1.0)
- Rattus norvegicus (Rn5)
- Solanum lycopersicum (SL2)
All genome sequences were downloaded from Ensembl.
Promoterome sequences for all of the above species are also available. Promoteromes were downloaded from the UCSC Genome Browser, except for Oryza sativa (downloaded from the Rice Annotation Project Database) and Arabidopsis thaliana (downloaded from The Arabidopsis Information Resource).
Promoterome is defined as the 1000 bases upstream of all annotated translational start codons.
- What if my genome/promoterome sequence isn't available on your site?
-
You can download a version of our TAL Effector Site Finder software to run on your own computer. It will work with any FASTA-formatted file as input. Also, contact us and let us know what sequences you would like us to add. We will continue to add new sequences whenever there is sufficient interest.
- What score cutoff should I select?
-
Target Finder outputs a list of the best (lowest) scoring sites in a genome/promoterome or user-entered sequence. All candidate sites that score less than or equal to the score cutoff times the Best Possible Score for the RVD sequence are included in the downloadable output file. The default cutoff of 3.0 times the best possible score captures all known TAL effector target pairs in nature, and should be sufficient for most users. Higher thresholds (such as 3.5 or 4.0) will identify more candidate binding sites, but the additional sites will have worse scores and will be less likely to be functional. Raising the threshold will also make the search much slower.
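The cutoff rule itself takes only a few lines. A minimal Python sketch, assuming each candidate site is a (locus, score) pair and the Best Possible Score is already known; the function name is hypothetical.

```python
def sites_within_cutoff(sites, best_possible, cutoff=3.0):
    """Keep (locus, score) pairs scoring <= cutoff * best_possible,
    sorted best (lowest score) first."""
    limit = cutoff * best_possible
    kept = [(locus, score) for locus, score in sites if score <= limit]
    return sorted(kept, key=lambda site: site[1])
```

Assuming positive scores, dividing a site's score by the Best Possible Score gives the multiple that the cutoff is compared against, which can be a convenient extra column when reviewing the downloaded file in a spreadsheet.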
We recommend that users keep the default score cutoff of 3.0.
- Does Target Finder require target sites to be preceded by a T at the 5' end?
-
Target Finder now gives users the option to search for sites preceded by a T or C.
At least one example of a TAL effector targeting a site preceded by C has been found in nature (Yu et al., 2011), and custom TAL effectors targeting sites preceded by C have been reported in the literature (Miller et al., 2010). Therefore, Target Finder now includes options to search for sites preceded by a C.
However, in our hands, sites preceded by a C were significantly less active than those preceded by a T. Therefore, we recommend that users select the default setting for sites preceded by a T only, unless they have a specific reason to do otherwise.
- What does the Score column in the output mean?
-
The RVD sequence is scored against each possible target site using a model based on a position weight matrix (PWM) for known RVD-nucleotide associations. RVDs with known binding affinities that are considered by the program are NI, HD, NN, NG, NS, N*, HG, HA, ND, NK, HI, HN, NA, IG, H*.
A lower score indicates a higher binding affinity between the RVD and the target sequence.
Since the PWM is based on known TAL effector-target pairs that occur in nature, the model is most accurate for common RVDs. If your RVD sequence contains a lot of unusual RVDs or RVDs that are not observed in nature, the scoring model will be less accurate.
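To show how a PWM-based score of this kind adds up over positions, here is a minimal Python sketch. It assumes the score is the sum of negative log probabilities of each base given its RVD, a common way to turn a PWM into a score where lower is better. The probabilities below are made-up placeholders, not the values Target Finder uses, and the exact model (including any handling of the 5' T) is described in Script S1 of the paper cited below.

```python
import math

# ILLUSTRATIVE placeholder probabilities p(base | RVD). These are NOT the values
# used by Target Finder; only a few common RVDs are included.
PWM = {
    "NI": {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    "HD": {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    "NN": {"A": 0.30, "C": 0.05, "G": 0.60, "T": 0.05},
    "NG": {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
}


def score_site(rvds, target):
    """Sum of -ln p(base | RVD) over aligned positions; lower means a better match."""
    if len(rvds) != len(target):
        raise ValueError("RVD sequence and target must be the same length")
    return sum(-math.log(PWM[rvd][base]) for rvd, base in zip(rvds, target.upper()))
```

Because every position adds a positive term, longer RVD sequences accumulate larger scores, which is one reason scores for different TAL effectors are not directly comparable.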
For more information see supplementary Script S1 of the following paper:
Moscou, M.J. and Bogdanove, A.J. (2009) A simple cipher governs DNA recognition by TAL effectors. Science. 326(5959):1501. (http://dx.doi.org/10.1126/science.1178817)
- What does the Best Possible Score reported above the results table and at the top of the output file mean?
-
The best possible score is the score that the RVD sequence would get if it were aligned with its "perfect" target (all NI's aligned with A's, all HD's with C's, all NN/NH's with G's, and all NG's with T's).
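The "perfect" target follows directly from the pairings listed above. A minimal Python sketch (the helper name is hypothetical); scoring the returned string with the same model used for candidate sites gives the Best Possible Score. RVDs outside this list raise an error here, because this answer does not say which base the tool treats as their best match.

```python
# "Perfect" base for each RVD, as listed above.
PERFECT_BASE = {"NI": "A", "HD": "C", "NN": "G", "NH": "G", "NG": "T"}


def perfect_target(rvds):
    """Return the target sequence that every RVD in `rvds` matches perfectly."""
    missing = [rvd for rvd in rvds if rvd not in PERFECT_BASE]
    if missing:
        raise ValueError(f"no perfect base listed for: {', '.join(missing)}")
    return "".join(PERFECT_BASE[rvd] for rvd in rvds)
```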
The scoring function used is described above.
- Will a TAL effector with my RVD sequence bind to all of the target sites in the output?
-
Probably not. Target Finder outputs a list of the best (lowest) scoring sites in a genome/promoterome or user-entered sequence. The sites with the lowest scores appear in the output table automatically generated on the results page when your query finishes. The sites in this table are sorted by score, and you can choose the number of sites that you want to appear in the table (default is 10). All candidate sites that score less than or equal to 3 times the best possible score for the RVD sequence are included in the downloadable output file. (This option can be changed to include all sites scoring less than 3.5 or 4.0 times the Best Possible Score in the output file.)
See the next questions, "How do I know if my TAL effector will bind to a predicted target?" and "How can I compare predicted TAL effector target sites?", for more information.
- How do I know if my TAL effector will bind to a predicted target?
-
One way is to compare the Score with the Best Possible Score. A TAL effector is more likely to bind to a lower scoring target than to a higher scoring target. The Best Possible Score is the lowest score possible for the TAL effector. TAL effectors are more likely to bind targets that score very close to the best possible score. To be bound by the TAL effector, a target should probably have a score that is no more than 3-4 times bigger than the Best Possible Score. Target Finder outputs all sites with a score less than 3 times the Best Possible Score. (This option can be changed to include all sites scoring less than 3.5 or 4.0 times the Best Possible Score in the output file).
However, this is only a very general guideline. Some targets with higher scores may still be bound by the TAL effector, and some targets with lower scores may not be bound.
See the next question, "How can I compare predicted TAL effector target sites?", for more information.
- How can I compare predicted TAL effector target sites?
-
Targets for the same TAL effector can be compared based on score. Targets with lower scores are more likely to be bound by the TAL effector.
Scores should not be used to compare targets for different TAL effectors because the scores increase as the length of the RVD sequence increases. Targets of longer TAL effectors will typically have higher scores than targets of shorter TAL effectors, even if the longer TAL effector matches its target more closely (has fewer mismatches) than the shorter TAL effector.
- What does plus strand mean in the output for genome and promoterome searches?
-
For genome searches, the plus strand is the plus strand of the chromosome. For promoterome searches, the plus strand is the strand on which the gene occurs.
Paired Target Finder Questions
- What does Paired Target Finder do?
-
Paired Target Finder identifies paired sites for two RVD sequences, oriented 5' to 3' on opposite strands of DNA and separated by a spacer, such that the paired sites may form a functional TALEN binding site. Candidate sites are evaluated using a scoring function. Paired Target Finder output represents a best guess or a starting point to identify potential target (and off-target!) sites for a TALEN.
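The pairing step can be sketched in Python, assuming that monomer hits on each strand have already been found and scored. The coordinate convention (half-open intervals in plus-strand coordinates, 5' T/C excluded), the function name, and the default spacer range (the Cermak et al. preset) are our assumptions. Only the TAL1-on-plus, TAL2-on-minus case is shown; a full search repeats it for each monomer combination.

```python
def pair_sites(plus_hits, minus_hits, spacer_range=(15, 24)):
    """Pair monomer targets on opposite strands that are separated by an allowed spacer.

    plus_hits / minus_hits: (start, end) half-open intervals, in plus-strand
    coordinates, of monomer targets bound on the plus and minus strands.
    Returns (plus_hit, minus_hit, spacer_len) for each acceptable pairing.
    """
    low, high = spacer_range
    pairs = []
    for plus_hit in plus_hits:
        for minus_hit in minus_hits:
            # The plus-strand target must lie upstream of the minus-strand target;
            # the gap between them is the spacer.
            spacer = minus_hit[0] - plus_hit[1]
            if low <= spacer <= high:
                pairs.append((plus_hit, minus_hit, spacer))
    return pairs
```

This nested loop is quadratic in the number of hits, which is fine for a sketch; a genome-scale search would index hits by position instead.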
- How do I enter a sequence to search?
-
You may cut and paste one or more FASTA-formatted sequences into the textbox, upload a text file containing FASTA-formatted sequence(s), or select a genome or promoterome from the drop-down menu. For more information about uploading your own sequence, see "How do I enter my DNA sequence(s)?" under General Questions, above.
- What genome/promoterome sequences are available to search?
-
Available genome sequences include:
- Arabidopsis thaliana (TAIR10)
- Brachypodium distachyon (v1)
- Caenorhabditis elegans (WS220)
- Danio rerio (Zv9)
- Drosophila melanogaster (BDGP5.25)
- Homo sapiens (GRCh37)
- Mus musculus (NCBIM37)
- Oryza sativa (IRGSP-1.0)
- Rattus norvegicus (Rn5)
- Solanum lycopersicum (SL2)
All genome sequences were downloaded from Ensembl.
Promoterome sequences for all of the above species are also available. Promoteromes were downloaded from the UCSC Genome Browser, except for Oryza sativa (downloaded from the Rice Annotation Project Database) and Arabidopsis thaliana (downloaded from The Arabidopsis Information Resource).
Promoterome is defined as the 1000 bases upstream of all annotated translational start codons.
- What if my genome/promoterome sequence isn't available on your site?
-
We are working to make a downloadable version of the Paired Target Finder script available so that you can run searches against other genomes on your own computer. Until then, please contact us and let us know what sequences you would like us to add. We will continue to add new sequences whenever there is sufficient interest.
- What spacer length should I choose?
-
Choose the optimal range of spacer lengths for the TALEN architecture you are using. Common architectures and their optimal spacers include:
Cermak et al., 2011, spacer = 15-24, repeats = 15-20
Miller et al., 2011, spacer = 15-20, repeats = 15-20
Mussolino et al., 2011, spacer = 12-15, repeats = 15-20
Li et al., 2011, spacer = 16-31, repeats = 15-20
- What score cutoff should I select?
-
Paired Target Finder outputs a list of all candidate paired sites in which each RVD sequence has a score less than or equal to the score cutoff times its Best Possible Score. The default cutoff of 3.0 times the best possible score captures all known TAL effector target pairs in nature, and should be sufficient for most users. Higher thresholds (such as 3.5 or 4.0) will identify more candidate binding sites, but the additional sites will have worse scores and will be less likely to be functional. Raising the threshold will also make the search much slower.
We recommend that users keep the default score cutoff of 3.0.
- Does Paired Target Finder require target sites to be preceded by a T at the 5' end?
-
Paired Target Finder now gives users the option to search for TALEN sites with each monomer binding site preceded by a T or C.
At least one example of a TAL effector targeting a site preceded by C has been found in nature (Yu et al., 2011), and custom TAL effectors targeting sites preceded by C have been reported in the literature (Miller et al., 2010). Therefore, Paired Target Finder includes options to search for sites preceded by a C.
However, in our hands, sites preceded by a C were significantly less active than those preceded by a T. Therefore, we recommend that users select the default setting for sites preceded by a T only, unless they have a specific reason to do otherwise.
- What does the Locus column in the output mean?
-
If you searched a genome or promoterome, the Locus column indicates the chromosome or gene promoter in which the site is located, and the first position of the TALEN binding site. If you supplied your own sequence(s), Locus indicates the name of the sequence in which the site is located, and the first position of the binding site.
- What do the columns TAL1 and TAL2 mean?
-
TALEN-mediated DNA cleavage can occur wherever two TALEN monomers bind with the proper spacing and orientation. Thus, a search for off-target sites must consider all four possible combinations of TALEN monomer RVD sequences: RVD1+RVD1, RVD1+RVD2, RVD2+RVD1, and RVD2+RVD2. TAL1 indicates which RVD sequence is binding to the plus strand of DNA (the sequence as entered). TAL2 indicates which RVD sequence is binding the opposite strand.
- What do the TAL1 Score and TAL2 Score columns in the output mean?
-
Each TALEN monomer RVD sequence is scored against each possible target site using a model based on a position weight matrix (PWM) for known RVD-nucleotide associations. RVDs with known binding affinities that are considered by the program are NI, HD, NN, NG, NS, N*, HG, HA, ND, NK, HI, HN, NA, IG, H*.
A lower score indicates a higher binding affinity between the RVD and the target sequence.
Since the PWM is based on known TAL effector-target pairs that occur in nature, the model is most accurate for common RVDs. If your RVD sequence contains a lot of unusual RVDs or RVDs that are not observed in nature, the scoring model will be less accurate.
TAL1 score indicates the score for the RVD sequence in the TAL1 column; TAL2 score is the score for the RVD sequence in the TAL2 column.
For more information see supplementary Script S1 of the following paper:
Moscou, M.J. and Bogdanove, A.J. (2009) A simple cipher governs DNA recognition by TAL effectors. Science. 326(5959):1501. (http://dx.doi.org/10.1126/science.1178817)
- What do the Best Possible Scores reported above the results table and at the top of the output file mean?
-
The best possible score is the score that the RVD sequence would get if it were aligned with its "perfect" target (all NI's aligned with A's, all HD's with C's, all NN/NH's with G's, and all NG's with T's).
The scoring function used is described above.
- Will all of the output sites function as TALEN sites for monomers with my RVD sequences?
-
Probably not. A single TAL effector is more likely to bind to a lower scoring target than to a higher scoring target. The Best Possible Score is the lowest score possible for the TAL effector. To be bound by the TAL effector, a target should probably have a score that is no more than 3-4 times bigger than the Best Possible Score. Paired Target Finder outputs all sites with the proper orientation and spacing where each monomer has a score less than 3 times the Best Possible Score for its RVD sequence. (This option can be changed to include all sites scoring less than 3.5 or 4.0 times the Best Possible Score in the output file).
However, this is only a very general guideline. Some targets with higher scores may still be bound by the TAL effector, and some targets with lower scores may not be bound. Additionally, it is unclear how the scoring function applies to paired TALEN monomers. For example, it is possible that only one monomer with a good scoring site would be sufficient for TALEN activity to occur. Therefore, Paired Target Finder results should be treated as an estimate of the off-target potential of a pair of TALEN monomers.
- What does plus strand mean in the output for genome and promoterome searches?
-
For genome searches, the plus strand is the plus strand of the chromosome. For promoterome searches, the plus strand is the strand on which the gene occurs.