IUPAC Code Generation for DNA Sequences - Help


Index

  1. Summary
  2. Examples
  3. Input
  4. Output
  5. IUPAC Nucleotide Code
  6. References



1. Summary

The IUPAC Code Generation utility finds the most compressed representation of a set of DNA strings of the same length, using the degenerate code defined by the standard nomenclature of the International Union of Pure and Applied Chemistry (IUPAC). This compact representation is computed with a "multiple-valued logic minimization" approach [1].



2. Examples


Example 1
  Input:   ATAT, TATA
  Output:  ATAT, TATA

Example 2
  Input:   AAAT, AAAA
  Output:  AAAW

Example 3
  Input:   AAAT, TAAA
  Output:  AAAT, TAAA

Example 4
  Input:   TCCGCGGA, TCCGTGGA, TCCACGGA, TCCGCGCA, TCCGCGGG
  Output:  TCCGCGCA, TCCACGGA, TCCGTGGA, TCCGCGGR
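
Examples 1 and 3 are returned unchanged, which may look surprising at first. One way to see why is to merge each column of the input independently and count how many sequences the resulting single string would stand for. The sketch below does that in Python; the function names `columnwise_pattern` and `expansion_size` are illustrative only and are not part of YEASTRACT.

```python
from math import prod

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}
# Reverse lookup: set of bases -> IUPAC character.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def columnwise_pattern(sequences):
    """Merge every column independently into one IUPAC character."""
    return "".join(CODE_FOR[frozenset(column)] for column in zip(*sequences))


def expansion_size(pattern):
    """Number of plain A/C/G/T sequences an IUPAC pattern stands for."""
    return prod(len(IUPAC[code]) for code in pattern)


for seqs in (["ATAT", "TATA"], ["AAAT", "AAAA"], ["AAAT", "TAAA"]):
    pattern = columnwise_pattern(seqs)
    print(seqs, "->", pattern, "stands for", expansion_size(pattern), "sequences")
```

For Example 2 the merged string AAAW stands for exactly the two input sequences, so it is a valid compression. For Example 1 the merged string would be WWWW, which stands for 16 sequences, and for Example 3 it would be WAAW, which stands for 4. Both would describe sequences that were never entered, so the inputs are kept as they are.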
Back to top


3. Input

The utility requires a set of sequences of the same length.
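
A list of sequences can be checked against this requirement before it is submitted. The following Python sketch is only an illustration: the function name `check_input` is hypothetical, and the restriction to the uppercase letters A, C, G and T is an assumption, since this page states only the equal-length requirement.

```python
def check_input(sequences):
    """Return the common length, or raise ValueError if the set is not usable."""
    if not sequences:
        raise ValueError("no sequences given")
    length = len(sequences[0])
    for seq in sequences:
        if len(seq) != length:
            raise ValueError(f"{seq!r} has length {len(seq)}, expected {length}")
        # Assumption: only unambiguous, uppercase bases are accepted as input.
        unexpected = set(seq) - set("ACGT")
        if unexpected:
            raise ValueError(f"{seq!r} contains unexpected characters {sorted(unexpected)}")
    return length


print(check_input(["AAAT", "AAAA"]))  # common length of the two sequences
```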



4. Output

The output is the most compressed representation of the submitted set of DNA sequences. This compaction differs from the usual probabilistic compaction of aligned DNA sequences, which is based on the position-specific frequency of bases. Because the IUPAC code generation implemented in YEASTRACT does not consider the probability of a base occurring at a particular position, it adds no information beyond the given sequences. Decompressing the generated code therefore yields exactly the original set of sequences, so no detail is lost. The result is usually a smaller set of strings with one or more ambiguous bases that describes, concisely but precisely, a set of similar or related DNA strings, such as TF binding sites.
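
The lossless property can be checked by decompressing an output and comparing it with the input. The Python sketch below does this for Example 4 of this page; the helper names `expand` and `expand_all` are illustrative and not part of YEASTRACT.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def expand(pattern):
    """Return the set of plain sequences described by one IUPAC pattern."""
    return {"".join(bases) for bases in product(*(IUPAC[code] for code in pattern))}


def expand_all(patterns):
    """Return the union of the expansions of several IUPAC patterns."""
    sequences = set()
    for pattern in patterns:
        sequences |= expand(pattern)
    return sequences


# Input and output of Example 4.
original = {"TCCGCGGA", "TCCGTGGA", "TCCACGGA", "TCCGCGCA", "TCCGCGGG"}
compressed = ["TCCGCGCA", "TCCACGGA", "TCCGTGGA", "TCCGCGGR"]

# Decompression gives back exactly the input set: nothing lost, nothing added.
print(expand_all(compressed) == original)
```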

The IUPAC code generation is an adaptation of ESPRESSO, the multiple-valued logic minimization tool by Richard Rudell and Alberto Sangiovanni-Vincentelli. The adaptation was developed by Nuno Mendes and David Nunes [2].
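
To give a feel for what such a minimization does, the Python sketch below uses a much simpler greedy heuristic: it repeatedly merges two patterns that differ in exactly one position. This is not ESPRESSO and not the YEASTRACT implementation; it is not guaranteed to find the smallest representation, and when several equally small answers exist it may pick a different one than the utility. The names `try_merge` and `greedy_compress` are hypothetical.

```python
# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}
# Reverse lookup: set of bases -> IUPAC character.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def try_merge(p, q):
    """Merge two patterns that differ in exactly one position, else return None."""
    diff = [i for i, (a, b) in enumerate(zip(p, q)) if a != b]
    if len(diff) != 1:
        return None
    i = diff[0]
    # All other positions are identical, so the merged pattern describes
    # exactly the sequences of p plus the sequences of q.
    merged = CODE_FOR[frozenset(IUPAC[p[i]]) | frozenset(IUPAC[q[i]])]
    return p[:i] + merged + p[i + 1:]


def greedy_compress(sequences):
    """Greedily merge patterns until no pair differs in exactly one position."""
    patterns = sorted(set(sequences))
    merged_something = True
    while merged_something:
        merged_something = False
        for i in range(len(patterns)):
            for j in range(i + 1, len(patterns)):
                merged = try_merge(patterns[i], patterns[j])
                if merged is not None:
                    rest = [p for k, p in enumerate(patterns) if k not in (i, j)]
                    patterns = rest + [merged]
                    merged_something = True
                    break
            if merged_something:
                break
    return sorted(patterns)


print(greedy_compress(["AAAT", "AAAA"]))   # Example 2
print(greedy_compress(["ATAT", "TATA"]))   # Example 1, nothing to merge
print(greedy_compress(["TCCGCGGA", "TCCGTGGA", "TCCACGGA", "TCCGCGCA", "TCCGCGGG"]))
```

On Example 4 this heuristic also ends with four strings, but not necessarily the same four that the utility reports, because the sequence TCCGCGGA can be paired with any one of the other four inputs.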



5. IUPAC Nucleotide Code

The standard IUPAC nucleotide code is used to describe ambiguous sites in a DNA sequence motif, where a single character may represent more than one nucleotide. The code is shown in the table below.


IUPAC Code   Meaning            Origin of Description
G            G                  Guanine
A            A                  Adenine
T            T                  Thymine
C            C                  Cytosine
R            G or A             puRine
Y            T or C             pYrimidine
M            A or C             aMino
K            G or T             Ketone
S            G or C             Strong interaction
W            A or T             Weak interaction
H            A or C or T        not-G, H follows G in the alphabet
B            G or T or C        not-A, B follows A in the alphabet
V            G or C or A        not-T (not-U), V follows U in the alphabet
D            G or A or T        not-C, D follows C in the alphabet
N            G or A or T or C   aNy

Table adapted from [3].
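
For use in scripts, the table can be written as a lookup structure. The Python sketch below transcribes it and adds two small helpers; the names `code_for` and `matches` are illustrative and are not part of YEASTRACT.

```python
# IUPAC nucleotide codes, transcribed from the table above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}
# Reverse lookup: set of bases -> IUPAC character.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def code_for(bases):
    """Return the IUPAC character that stands for exactly this set of bases."""
    return CODE_FOR[frozenset(bases)]


def matches(pattern, sequence):
    """True if a plain A/C/G/T sequence is described by an IUPAC pattern."""
    return len(pattern) == len(sequence) and all(
        base in IUPAC[code] for code, base in zip(pattern, sequence)
    )


print(code_for("AT"))            # W
print(matches("AAAW", "AAAT"))   # True
print(matches("AAAW", "AAAG"))   # False
```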

6. References

[1] R. Rudell and A. L. Sangiovanni-Vincentelli. Multiple-Valued Minimization for PLA Optimization. IEEE Transactions on Computer-Aided Design, CAD-6:727-750, September 1987.

[2] N. Mendes and D. Nunes. Geração de Código IUPAC [IUPAC Code Generation]. INESC-ID Tec. Rep. 16/2004, July 2004.

[3] Biochem J. 229(2):281-286, July 15, 1985.
