
Search by DNA Motif in Transcription Factor Binding Sites - Help


Index

  1. Summary
  2. Examples
  3. Input
  4. Options
  5. Output
  6. Notes
    1. Simple Nucleotide Sequences
    2. IUPAC Nucleotide Code
    3. Regular Expressions
  7. References

1. Summary


This query allows the user to search the described TF binding sites for one or more DNA motifs. The query string may be a simple nucleotide sequence, a sequence containing IUPAC nucleotide code or even a sequence containing regular expression elements.




2. Examples


Pattern          Matches
TATATAAG         TATATAAG
TATAWAAM         TATAAAAC, TATAAAAA, TATATAAC, TATATAAA
TATA[GC]AA[AT]   TATAGAAT, TATAGAAA, TATACAAT, TATACAAA


3. Input


The query requires a DNA motif as input, which must be at least four bases long.



4. Options


The search allows for substitutions, whose number (zero, one or two) can be selected using the box labelled Substitutions.
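As an illustration of what a search with substitutions means, the following minimal sketch counts mismatched positions between a motif and each window of a longer sequence. The function names are hypothetical, the sketch handles simple nucleotide sequences only, and it is not a description of how YEASTRACT implements the option.

```python
def count_substitutions(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def find_with_substitutions(motif: str, sequence: str, max_subs: int = 0):
    """Return (offset, substitutions) for every window of `sequence`
    that differs from `motif` at no more than `max_subs` positions."""
    if max_subs not in (0, 1, 2):  # the help page offers zero, one or two
        raise ValueError("max_subs must be 0, 1 or 2")
    hits = []
    for i in range(len(sequence) - len(motif) + 1):
        window = sequence[i:i + len(motif)]
        subs = count_substitutions(motif, window)
        if subs <= max_subs:
            hits.append((i, subs))
    return hits


# TATATAAG is not found exactly, but is found with one substitution.
print(find_with_substitutions("TATATAAG", "GGTATAAAAGCC", max_subs=0))
print(find_with_substitutions("TATATAAG", "GGTATAAAAGCC", max_subs=1))
```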



5. Output


The input DNA motifs (and their complementary motifs) are compared with the described motifs contained in the database. The output is a list of TF binding sites that contain, are contained in, or precisely match the input motifs (or their complements). This search allows the user to check whether a newly identified DNA motif corresponds precisely to, is contained in, or contains a previously described TF binding site.
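The three possible relations between an input motif and a described binding site can be sketched as follows. This is a simplified illustration for simple nucleotide sequences. It assumes that "complementary motif" means the reverse complement (the opposite strand read 5' to 3'), and the function names and result labels are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def relation(motif: str, site: str):
    """Classify how `motif` (or its reverse complement) relates to `site`."""
    queries = (motif.upper(), reverse_complement(motif))
    if any(q == site for q in queries):
        return "exact match"
    if any(q in site for q in queries):
        return "site contains motif"
    if any(site in q for q in queries):
        return "motif contains site"
    return None


print(relation("CTTATATA", "TATATAAG"))    # matches on the opposite strand
print(relation("TATAA", "TATATAAG"))
print(relation("GTATATAAGC", "TATATAAG"))
```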

If a TF binding site contains a long stretch of fully degenerate nucleotides (N), then many input DNA motifs could match these nucleotides. To avoid returning a large list of irrelevant matches, only the terminal Ns of each N repeat region more than two nucleotides long in the TF binding sites are considered.

For example, a search for the DNA motif ATGAT identifies the binding site of the transcription factor Abf1p, which has the consensus sequence TNNCGTNNNNNNTGAT. The DNA motif aligns with the last five nucleotides of the consensus, which is valid since this region contains only one fully degenerate position (N). On the other hand, a search for the DNA motif AATGAT does not return this TF binding site. Although the motif aligns with the TF binding site, the corresponding subsequence of the TF binding site contains two fully degenerate positions (NN).

Now, consider the search for the motif TAACGT. This motif aligns perfectly at the beginning of the Abf1p TF binding site, which contains an N repeat region of length 2. In this case, the alignment is allowed; hence the TF binding site is said to contain the motif TAACGT.
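The three Abf1p examples above can be reproduced with the sketch below. It encodes one reading of the rule that is consistent with those examples: in an N repeat region longer than two nucleotides, only the first and last N may take part in an alignment. This reading, the function names, and the restriction to a forward-strand search with a simple nucleotide motif are all assumptions, not a description of the YEASTRACT implementation.

```python
import re


def usable_positions(site: str) -> list:
    """Flag the consensus positions that may take part in an alignment.

    Assumption: interior Ns of an N run longer than two are excluded.
    """
    usable = [True] * len(site)
    for run in re.finditer(r"N+", site):
        if run.end() - run.start() > 2:
            for i in range(run.start() + 1, run.end() - 1):
                usable[i] = False
    return usable


def site_contains(site: str, motif: str) -> bool:
    """True if `motif` aligns somewhere in `site` using only usable positions.

    Only N is treated as degenerate in `site`; other IUPAC codes are ignored.
    """
    usable = usable_positions(site)
    for start in range(len(site) - len(motif) + 1):
        if all(
            usable[start + k] and site[start + k] in ("N", base)
            for k, base in enumerate(motif)
        ):
            return True
    return False


ABF1 = "TNNCGTNNNNNNTGAT"
for motif in ("ATGAT", "AATGAT", "TAACGT"):
    print(motif, site_contains(ABF1, motif))
```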



6. Notes


Simple Nucleotide Sequences


Simple nucleotide sequences are strings that consist exclusively of the four characters that represent the DNA nucleotides: A, T, G and C. A search for a given simple nucleotide sequence only returns sequences that match the query string exactly.


IUPAC Nucleotide Code


Standard IUPAC Nucleotide code is used to describe ambiguous sites in a given DNA sequence motif, where a single character may represent more than one nucleotide. The code is shown in the table below.



IUPAC Code   Meaning             Origin of Description
G            G                   Guanine
A            A                   Adenine
T            T                   Thymine
C            C                   Cytosine
R            G or A              puRine
Y            T or C              pYrimidine
M            A or C              aMino
K            G or T              Ketone
S            G or C              Strong interaction
W            A or T              Weak interaction
H            A or C or T         not-G, H follows G in the alphabet
B            G or T or C         not-A, B follows A in the alphabet
V            G or C or A         not-T (not-U), V follows U in the alphabet
D            G or A or T         not-C, D follows C in the alphabet
N            G or A or T or C    aNy

Table adapted from [1].
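As a small illustration of how the code expands, the sketch below stores the table as a dictionary and lists every simple nucleotide sequence that an IUPAC motif stands for. The names IUPAC and expand are hypothetical.

```python
from itertools import product

# The IUPAC table above, as code -> nucleotides it represents.
IUPAC = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT",
    "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT",
    "N": "GATC",
}


def expand(motif: str) -> list:
    """List every simple nucleotide sequence represented by an IUPAC motif."""
    choices = [IUPAC[code] for code in motif.upper()]
    return ["".join(bases) for bases in product(*choices)]


print(expand("TATAWAAM"))
```

The number of expansions grows multiplicatively with each ambiguous position, so full expansion is only practical for motifs with few degenerate codes.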

Regular Expressions


A regular expression is a pattern containing characters and syntactic elements that matches a set of strings. The regular expression characters permitted in the searches for DNA motifs are those included in the IUPAC nucleotide code as well as the following syntactic element:

    [] – Matches one of the characters contained in the brackets.
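One way to evaluate such a pattern is to translate it into a regular expression for Python's re module, as in the sketch below. The function name to_regex is hypothetical, and allowing IUPAC codes inside the brackets is an assumption that this page does not confirm.

```python
import re

IUPAC = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT",
    "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT",
    "N": "GATC",
}


def to_regex(pattern: str) -> re.Pattern:
    """Translate IUPAC codes and [] groups into a compiled regular expression."""
    parts, group, in_brackets = [], [], False
    for ch in pattern.upper():
        if ch == "[":
            if in_brackets:
                raise ValueError("nested '[' is not supported")
            in_brackets, group = True, []
        elif ch == "]":
            if not in_brackets or not group:
                raise ValueError("unbalanced or empty brackets")
            parts.append("[" + "".join(sorted(set(group))) + "]")
            in_brackets = False
        elif ch in IUPAC:
            if in_brackets:
                group.extend(IUPAC[ch])  # assumption: codes expand inside []
            else:
                parts.append("[" + IUPAC[ch] + "]")
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    if in_brackets:
        raise ValueError("missing closing ']'")
    return re.compile("".join(parts))


rx = to_regex("TATA[GC]AA[AT]")
print(bool(rx.fullmatch("TATAGAAT")), bool(rx.fullmatch("TATATAAT")))
```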



7. References


[1] Biochem J. 1985 July 15; 229(2): 281–286.

