This query searches the described TF binding sites for one or more DNA motifs. The query string may be a simple nucleotide sequence, a sequence containing IUPAC nucleotide codes, or a sequence containing regular expression elements.
The query requires a DNA motif as input, which must be at least four bases long.
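A minimal sketch of the length check, assuming (hypothetically) that each bracket group such as [AG] counts as a single base position and every other character counts as one base. The function names `motif_length` and `meets_minimum_length` are illustrative and are not part of the service.

```python
import re

# Assumption: a bracket group such as [AG] occupies one base position;
# any other single character (A, C, G, T or an IUPAC code) also counts as one.
POSITION = re.compile(r"\[[^\]]*\]|[^\[\]]")


def motif_length(motif: str) -> int:
    """Count the base positions in a query motif (syntax is not validated here)."""
    return len(POSITION.findall(motif))


def meets_minimum_length(motif: str, minimum: int = 4) -> bool:
    """Return True if the motif spans at least `minimum` base positions."""
    return motif_length(motif) >= minimum


print(meets_minimum_length("TGAT"), meets_minimum_length("A[CG]T"))  # True False
```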
The search allows for substitutions; the number allowed (zero, one or two) is selected in the box labelled Substitutions.
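The substitution setting can be read as a Hamming-distance threshold. The sketch below assumes plain A/C/G/T sequences and counts only substitutions (no insertions or deletions); `find_with_substitutions` is a hypothetical helper, not the service's own code.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_with_substitutions(motif: str, sequence: str, max_subs: int = 0) -> list:
    """Return start offsets where `motif` occurs in `sequence` with <= max_subs substitutions."""
    if max_subs not in (0, 1, 2):
        raise ValueError("the Substitutions box offers 0, 1 or 2")
    motif, sequence = motif.upper(), sequence.upper()
    k = len(motif)
    return [i for i in range(len(sequence) - k + 1)
            if hamming(motif, sequence[i:i + k]) <= max_subs]


print(find_with_substitutions("TGAA", "CCTGATCC", 1))  # [2]
print(find_with_substitutions("TGAA", "CCTGATCC", 0))  # []
```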
The input DNA motifs (and their complementary motifs) are compared with the described motifs contained in the database. The output is a list of TF binding sites that contain, are contained in, or precisely match the input motifs (or their complements). This search allows the user to check whether a newly identified DNA motif exactly matches, is contained in, or contains a previously described TF binding site.
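A sketch of the three-way comparison for plain sequences. It assumes that the "complementary motif" is the reverse complement (the motif read on the opposite strand) and it ignores degenerate positions and substitutions; `relation` is a hypothetical name.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a plain A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def relation(motif: str, site: str) -> Optional[str]:
    """Classify how a motif (or its reverse complement) relates to a binding site."""
    site = site.upper()
    for strand in (motif.upper(), reverse_complement(motif)):
        if strand == site:
            return "precise match"
        if strand in site:
            return "site contains motif"
        if site in strand:
            return "motif contains site"
    return None


print(relation("ATCA", "TGAT"))  # precise match (via the reverse complement)
```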
If a TF binding site contains a long stretch of fully degenerate nucleotides (N), many input DNA motifs could align with it. To avoid a long list of irrelevant matches, only the terminal N at each end of a run of more than two consecutive Ns in a TF binding site may be aligned with the input motif; runs of one or two Ns (e.g. NN) can be aligned in full.
For example, a search for the DNA motif ATGAT identifies the binding site of the transcription factor Abf1p, whose consensus sequence is TNNCGTNNNNNNTGAT. The motif aligns with the last five nucleotides of the consensus, which is valid because this region contains only one fourfold degenerate position (N), the terminal N of the six-N run. By contrast, a search for the motif AATGAT does not return this TF binding site. Although the motif aligns with the same region, the aligned subsequence of the binding site contains two fully degenerate positions (NN), and the first of these is not a terminal N of the run.
Now consider a search for the motif TAACGT. This motif aligns perfectly with the beginning of the Abf1p TF binding site, which contains an N run of length 2. Because this run is only two nucleotides long, the alignment is allowed, and the TF binding site is said to contain the motif TAACGT.
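The N-run rule can be sketched as follows, under the reading that runs of one or two Ns are fully usable and that, in longer runs, only the first and last N may be aligned with a query base. The sketch assumes the binding site uses only A, C, G, T and N, the motif is a plain sequence, and no substitutions are allowed; the function names are hypothetical. It reproduces the three Abf1p examples above.

```python
def usable_n_positions(site: str) -> set:
    """Indices of Ns in `site` that a query base may align with."""
    usable = set()
    i = 0
    while i < len(site):
        if site[i] != "N":
            i += 1
            continue
        j = i
        while j < len(site) and site[j] == "N":
            j += 1
        # The run of Ns spans indices i .. j-1.
        if j - i <= 2:
            usable.update(range(i, j))  # short runs: every N is usable
        else:
            usable.update({i, j - 1})   # long runs: only the terminal Ns
        i = j
    return usable


def fits_at(motif: str, site: str, start: int, usable: set) -> bool:
    """Check one alignment of `motif` against `site` starting at `start`."""
    for k, base in enumerate(motif):
        pos = start + k
        if site[pos] == "N":
            if pos not in usable:
                return False
        elif site[pos] != base:
            return False
    return True


def containing_offsets(motif: str, site: str) -> list:
    """Offsets at which `site` contains `motif` under the N-run rule."""
    motif, site = motif.upper(), site.upper()
    usable = usable_n_positions(site)
    return [s for s in range(len(site) - len(motif) + 1)
            if fits_at(motif, site, s, usable)]


ABF1 = "TNNCGTNNNNNNTGAT"
print(containing_offsets("ATGAT", ABF1))   # [11]
print(containing_offsets("AATGAT", ABF1))  # []
print(containing_offsets("TAACGT", ABF1))  # [0]
```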
Simple nucleotide sequences are strings that consist exclusively of the four characters that represent the DNA nucleotides: A, T, G and C. A search for a simple nucleotide sequence returns only sequences that match the query string exactly.
Standard IUPAC Nucleotide code is used to describe ambiguous sites in a given DNA sequence motif, where a single character may represent more than one nucleotide. The code is shown in the table below.
Table adapted from the reference below.
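For reference, the standard IUPAC single-letter codes can be written as a mapping, and a degenerate motif can then be turned into a regular expression that matches plain A/C/G/T sequences. `iupac_to_regex` is a hypothetical helper; it is a sketch that does not cover matching a degenerate query against a degenerate binding site.

```python
import re

# Standard IUPAC single-letter nucleotide codes and the bases each represents.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif into a regex over the plain bases A, C, G and T."""
    parts = []
    for code in motif.upper():
        if code not in IUPAC_CODES:
            raise ValueError(f"not an IUPAC nucleotide code: {code!r}")
        bases = IUPAC_CODES[code]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


pattern = iupac_to_regex("RTGAT")              # '[AG]TGAT'
print(bool(re.search(pattern, "CCGTGATCC")))   # True
```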
A regular expression is a pattern of characters and syntactic elements that matches a set of strings. The characters permitted in regular-expression searches for DNA motifs are those of the IUPAC nucleotide code, plus the following syntactic element:
[ ] – Matches any one of the characters contained in the brackets (for example, [AG] matches A or G).
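A sketch of how a query mixing IUPAC codes and bracket groups could be compiled into an ordinary regular expression. It assumes that IUPAC codes may also appear inside brackets (for example [RT]), in which case the group matches the union of the bases they stand for; that behaviour and the name `query_to_regex` are assumptions, not documented features.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}

# One token is either a bracket group such as [AG] or a single letter.
TOKEN = re.compile(r"\[([A-Z]+)\]|([A-Z])")


def query_to_regex(query: str) -> str:
    """Compile a query with IUPAC codes and [...] groups into a plain-base regex."""
    query = query.upper()
    out, pos = [], 0
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at position {pos}")
        codes = m.group(1) or m.group(2)
        if any(c not in IUPAC for c in codes):
            raise ValueError(f"not an IUPAC code in {codes!r}")
        # Union of the bases represented by every code in the token.
        bases = "".join(sorted(set("".join(IUPAC[c] for c in codes))))
        out.append(bases if len(bases) == 1 else f"[{bases}]")
        pos = m.end()
    return "".join(out)


print(query_to_regex("[RT]GAT"))  # [AGT]GAT
```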
Biochem J. 1985 July 15; 229(2): 281–286.