The YEASTRACT (Yeast Search for Transcriptional Regulators And Consensus Tracking; www.yeastract.com) database presently contains more than 206000 regulatory associations between yeast genes, based on more than 1300 bibliographic references. Each regulation has been annotated manually, after examination of the relevant references. The database also contains the description of 326 specific DNA binding sites shared among 113 characterized transcription factors (TFs). Further information about each yeast gene was obtained from the Saccharomyces Genome Database (SGD), Regulatory Sequence Analysis Tools (RSAT) and the Gene Ontology (GO) Consortium.
The YEASTRACT database assists with three major tasks: prediction of the transcriptional regulation of genes, DNA motif analysis, and global expression analysis, all based on the yeast transcription networks described in the literature. This tutorial presents three case studies exemplifying the use of different query options and utilities. Many other ways of exploiting the available options and utilities are possible.
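Throughout the YEASTRACT database and this tutorial, regulatory associations are termed "Documented" or "Potential":
Documented: associations supported by experimental evidence reported in the bibliographic references compiled in the database.
Potential: associations inferred from the presence of a TF binding site in the promoter region of the target gene.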
The accuracy and timeliness of the information gathered, curated and inserted in this database are crucial to YEASTRACT users. We therefore value any contribution from the yeast community toward this goal.
The results presented in section 2.3 (Rank by TF) were computed with data from YEASTRACT on June 16, 2013. Because of subsequent database updates, the current ranking may differ from the one presented.
Example 1: Identification of the documented and potential regulatory associations for an ORF/Gene
The functional analysis of an ORF or gene can be guided by the identification of its documented and potential transcription factors (TFs). This example describes one of the possible ways to explore the regulatory associations for ORF YNR070w, which encodes a putative ATP-binding cassette transporter, using various queries and utilities provided by YEASTRACT.
1.1 - Search for Documented Transcription Factors (TFs)
The "Search Transcription Factors" query allows the identification of TFs that are documented and/or potential transcriptional regulators of a given gene. The search for documented transcription factors acting directly upon YNR070w uncovers Nrg1p and Rap1p. The associated bibliographic references may be checked to learn the experimental basis for these regulatory associations.
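Both "Search Transcription Factors" (regulators of a gene) and "Search Regulated Genes" (targets of a TF) are lookups in a table of regulatory associations, indexed in opposite directions. The Python sketch below illustrates this with a hypothetical in-memory list of (TF, target) pairs. The TF and gene names are placeholders, and the code does not reflect YEASTRACT's actual data model or any YEASTRACT API.

```python
from collections import defaultdict

# Hypothetical toy data: documented regulations as (TF, target) pairs.
# These names and pairs are illustrative only, not YEASTRACT records.
DOCUMENTED = [
    ("TF_A", "gene1"),
    ("TF_A", "gene2"),
    ("TF_B", "gene1"),
    ("TF_C", "gene3"),
]


def build_indexes(pairs):
    """Index regulations in both directions: TF -> targets and gene -> TFs."""
    targets_of = defaultdict(set)
    regulators_of = defaultdict(set)
    for tf, gene in pairs:
        targets_of[tf].add(gene)
        regulators_of[gene].add(tf)
    return targets_of, regulators_of


targets_of, regulators_of = build_indexes(DOCUMENTED)

# Regulators of a gene (the direction used by "Search Transcription Factors").
print(sorted(regulators_of["gene1"]))  # ['TF_A', 'TF_B']
# Targets of a TF (the direction used by "Search Regulated Genes").
print(sorted(targets_of["TF_A"]))      # ['gene1', 'gene2']
```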
According to their SGD descriptions, Nrg1p and Rap1p are involved in glucose repression and chromatin silencing, respectively. It may therefore be of interest to examine the possible link of ORF YNR070w to these biological processes.
1.2 - Search for Potential Transcription Factors (TFs)
The "Search Transcription Factors" query can also identify the potential regulators of YNR070w. By default, all potential transcription factors found are displayed in tabular form. The Promoter link can be followed to see the binding sites of each TF in the promoter sequence of YNR070w. The distribution of TF binding sites in the promoter region of YNR070w can be viewed by checking the image option when searching.
The display of potential TFs on the image can be controlled by unchecking their respective boxes in the color palette below the image and pressing the Redisplay button. The color palette shows colors only for those TFs whose binding sites are found in the promoter region of the given gene(s). Close inspection of the image for the TFs that are documented regulators of YNR070w (i.e., Nrg1p and Rap1p) reveals that the binding site for Nrg1p is present, while that for Rap1p is not. The role of Rap1p in YNR070w regulation may therefore be indirect, or may involve a binding site not yet described in the literature or not listed in the database.
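Potential regulators can be approximated by scanning a promoter sequence, on both strands, for each TF's consensus binding site written in IUPAC nucleotide codes. The following Python sketch makes two simplifying assumptions: it uses exact consensus matching with regular expressions, and the promoter, TF names and consensus sequences are hypothetical. It is not YEASTRACT's own scanning procedure.

```python
import re

# IUPAC nucleotide codes as regex character classes.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of an IUPAC string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(promoter, site):
    """0-based start positions where the consensus matches either strand.

    The zero-width lookahead (?=...) lets overlapping matches be reported.
    """
    promoter = promoter.upper()
    positions = set()
    for pattern in {site.upper(), reverse_complement(site)}:
        regex = "(?=" + "".join(IUPAC_REGEX[b] for b in pattern) + ")"
        positions.update(m.start() for m in re.finditer(regex, promoter))
    return sorted(positions)


def potential_regulators(promoter, sites_by_tf):
    """Map each TF with at least one site in the promoter to its hit positions."""
    hits = {}
    for tf, sites in sites_by_tf.items():
        found = sorted({p for s in sites for p in find_sites(promoter, s)})
        if found:
            hits[tf] = found
    return hits


# Hypothetical toy data (not real YEASTRACT consensus sequences).
promoter = "ATAGGGGTACCCCTAT"
sites_by_tf = {"TF_X": ["AGGGG"], "TF_Y": ["CCGGT"]}
print(potential_regulators(promoter, sites_by_tf))  # {'TF_X': [2, 9]}
```

Here TF_X is reported twice: once on the forward strand at position 2 and once through its reverse complement (CCCCT) at position 9. TF_Y has no site and is therefore not a potential regulator of this toy promoter.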
1.3 - Gene Grouping based on shared Gene Ontology (GO) terms
The YEASTRACT utility "Group Genes by GO" groups a list of genes according to the GO terms they share. The list of genes identified in section 1.2 as potential regulators of YNR070w is subjected to GO-based grouping, selecting the Biological Process ontology and level 5.
The output (Table 1) displays the GO terms in the first column, the percentage of genes from the given list associated with each GO term in the second column, and the genes associated with that GO term in the last column. The grouping may differ depending on the chosen ontology and level.
Table 1 - GO associations for genes using Biological Process at level 5
The information in Table 1 reveals that most of the TFs potentially binding to the promoter region of YNR070w are involved in cell cycle, pseudohyphal growth, organic acid metabolism, response to abiotic stimulus and cellular lipid metabolism. The possible involvement of YNR070w in these processes can thus be hypothesized. The association of this ORF with the GO term "response to abiotic stimulus" appears consistent with its previous association with the PDR network (de Risi et al., 2000), as it encodes a putative multidrug transporter (Bauer et al., 1999).
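The grouping behind "Group Genes by GO" can be sketched as follows. For each GO term, the input genes annotated with it are collected, and the report gives the fraction of the input list that the term covers. This Python sketch assumes that annotations have already been mapped to the chosen ontology and level; that mapping through the GO graph is not shown. The gene and term names are hypothetical.

```python
from collections import defaultdict


def group_by_go(genes, go_terms_of):
    """Group genes by the GO terms they share.

    go_terms_of maps each gene to the set of GO terms it is annotated with,
    assumed to be already mapped to the chosen ontology level (e.g. level 5).
    Returns (term, % of input genes, genes) rows, largest groups first.
    """
    genes = list(dict.fromkeys(genes))  # remove duplicates, keep input order
    members = defaultdict(set)
    for gene in genes:
        for term in go_terms_of.get(gene, ()):
            members[term].add(gene)
    rows = [(term, 100.0 * len(group) / len(genes), sorted(group))
            for term, group in members.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))  # by %, then by term name
    return rows


# Hypothetical annotations, for illustration only.
go_terms_of = {
    "g1": {"term_A", "term_B"},
    "g2": {"term_A"},
    "g3": {"term_C"},
    "g4": set(),
}
for term, pct, group in group_by_go(["g1", "g2", "g3", "g4"], go_terms_of):
    print(f"{term}\t{pct:.1f}%\t{', '.join(group)}")
```

A gene with no annotation at the chosen level (g4 here) still counts in the denominator of the percentages. For that reason, changing the ontology or level can change every percentage, not only the list of terms.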
If the ORF/gene under study is predicted to encode a TF, the "Search Regulated Genes" query, with the Documented or Potential option, can be used to retrieve all of its documented or potential targets, respectively. Grouping these target genes by GO may also provide clues to the biological processes or molecular functions controlled by the TF.
1.4 - References
Example 2: Gene expression analysis based on regulatory associations
YEASTRACT provides tools for the classification and grouping of large lists of genes of interest, such as those found up- or down-regulated under a specific environmental or biological situation, as suggested by genome-wide expression data inspection. These analyses are based on known or algorithmically identified potential regulatory associations, deposited in the YEASTRACT database, and on the GO-based schema.
2.1 - Transform an ORF list into a gene list and vice-versa
The utility ORF List<->Gene List converts a list of ORFs into the corresponding list of gene names, and vice versa. In addition, it splits a mixed list into two separate lists of ORFs and genes. This makes gene/ORF lists easier to read.
2.2 - Rank Genes - Rank by Gene Ontology (GO)
The grouping of genes based on the GO terms they share is a feature common to a number of gene expression analysis tools and is also implemented in YEASTRACT. Depending on the chosen ontology and level, the grouping may differ. To exemplify this utility, we used the list of genes up-regulated upon expression of PDR1-3, a point-mutant allele of the PDR1 gene, which encodes a transcription factor involved in Pleiotropic Drug Resistance (PDR) in yeast. This list was retrieved from de Risi et al. (2000). Grouping this gene list based on the Biological Process ontology at level 5 gives the following table:
Table 2 - GO associations for genes using Biological Process at level 5
In agreement with the published analysis of these results (de Risi et al., 2000), the main functional groups include "response to abiotic stimulus" (drugs included), "drug transport" and "cellular lipid metabolism", among others.
2.3 - Rank Genes - Rank by TF
The “Rank by TF” query enables automatic selection and ranking of the transcription factors potentially involved in the regulation of the genes in a list of interest. The TFs and their direct targets are presented in a table in decreasing order of a relevance score calculated for each TF. This score is based on either the regulations or the regulatory paths, deposited in the YEASTRACT database, that target the genes in the list of interest. Different filters can be used to restrict the search to a particular type of regulatory activity. To exemplify the “Rank by TF” utility, we analyse the results obtained for a list of genes found up-regulated upon exposure to quinine (dos Santos et al., 2009), which we term the QN dataset. This relatively simple dataset corresponds to a well-characterized biological response and is thus adequate to illustrate the usefulness of the different ranking methods. The results presented below were obtained using YEASTRACT on June 16, 2013.
2.3.1 - Rank by TF based on regulation enrichment
When ranking by statistical significance of regulations, the TF score is a p-value. It denotes how overrepresented the TF's regulations targeting genes in the list of interest are, relative to its regulations targeting genes in the whole YEASTRACT database. Specifically, the p-value is the probability that the TF would regulate at least as many genes as it does in the list of interest if a set of genes of the same size as that list were sampled at random from all the genes in the YEASTRACT database. This probability is modelled by a hypergeometric distribution, and the p-value is then subjected to a Bonferroni correction for multiple testing.
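The enrichment score described above can be written out directly. Let the database contain N genes, K of which are targets of a given TF, and let the list contain n genes, k of which are targets of that TF. The p-value is P(X >= k) = sum over i from k to min(K, n) of C(K, i) * C(N - K, n - i) / C(N, n). It is then multiplied by the number of tests and capped at 1 (Bonferroni correction). The Python sketch below implements this with the standard library only. Two choices are assumptions here: the population is the supplied universe of genes, and only TFs with at least one target in the list count as tests. The toy data are invented.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes in total, K TF targets, n genes drawn)."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def rank_by_enrichment(gene_list, targets_of, universe):
    """Rank TFs by Bonferroni-corrected hypergeometric p-value (smallest first).

    gene_list: genes of interest; targets_of: TF -> set of target genes;
    universe: all genes in the database (the population sampled from).
    Assumption: the Bonferroni factor is the number of TFs with at least
    one target in the list.
    """
    universe = set(universe)
    genes = set(gene_list) & universe
    N, n = len(universe), len(genes)
    rows = []
    for tf, targets in targets_of.items():
        targets = set(targets) & universe
        hits = targets & genes
        if hits:
            p = hypergeom_sf(len(hits), N, len(targets), n)
            rows.append((tf, len(hits), len(targets), p))
    m = len(rows)
    corrected = [(tf, k, K, min(1.0, p * m)) for tf, k, K, p in rows]
    return sorted(corrected, key=lambda row: row[3])


# Invented toy example: 10 genes, two TFs, a list of 3 genes.
universe = [f"g{i}" for i in range(10)]
targets_of = {"TF_A": {"g0", "g1", "g2", "g3"},
              "TF_B": {"g4", "g5", "g6", "g7", "g8", "g9"}}
ranking = rank_by_enrichment(["g0", "g1", "g4"], targets_of, universe)
for tf, k, K, p in ranking:
    print(tf, k, K, round(p, 3))
```

The computation uses exact integer binomial coefficients, so it stays accurate for realistic sizes (thousands of genes) without a statistics library. A TF that regulates many genes in the list can still rank low if it also regulates a large fraction of the whole database, which is the behaviour this score is meant to capture.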
Below is the output of the utility “Rank by TF” based on regulation enrichment for the QN dataset, using the default filtering options. In Table 3, the first column indicates the name of the TF, the second column the % of genes in the list targeted by the TF, the third column the % resulting from the ratio between the number of genes in the list targeted by the TF and the number of genes targeted by the TF in the whole YEASTRACT database, the fourth column the enrichment p-value, and the fifth and final column the genes from the list of interest targeted by the TF.
Table 3 - Genes grouped by TF, ordered by regulation enrichment p-value for the QN dataset. Only the first 15 rows of the table are shown.
The enrichment-based ranking placed several transcription factors involved in glucose derepression (Mig1, Nrg2 and Adr1) among the top-ranking TFs, suggesting them as potential key regulators of the yeast response to low inhibitory concentrations of quinine. Had the TFs been ranked by the number of genes they regulate in the dataset, these TFs would only appear after more general regulators known to play a role in the yeast response to several environmental stresses. Notably, these results are consistent with the finding that yeast adaptation to quinine involves a glucose limitation response, probably as a consequence of glucose uptake inhibition by the drug (dos Santos et al., 2009).
2.3.2 - Rank by TF using TFRank
The second kind of ranking uses the TFRank method (Gonçalves et al., 2011). This method exploits every regulatory path containing the genes in the list of interest to select the relevant part of the network. It then prioritizes regulators by computing a relevance measure that reflects their contribution within the network under study. Advantages of TFRank include its ability to consider multiple levels of regulation and interactions between transcription factors from an integrated network perspective, rather than analysing each TF in isolation.
The relevance score is obtained using a personalized ranking method related to local clustering on graphs, based on a discrete approximation of the heat kernel. It works by diffusing a signal through the transpose of the network, that is, in reverse, so that TFs are reached from their target genes. The diffusion starts from the genes of interest and accumulates a score in every gene/node of the network. In YEASTRACT, the TFRank method lets the user customize a parameter, termed the heat diffusion coefficient, which controls the range of influence of the regulatory cascade in the network. A low value causes slow diffusion and thus favors more local regulators, while a high value promotes rapid diffusion and favors more global regulatory players.
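The diffusion idea can be illustrated with a generic heat-kernel personalized ranking. This is a sketch of the general technique, not the published TFRank implementation (Gonçalves et al., 2011). A uniform signal starts on the genes of interest, and each step moves signal from a node to its regulators, i.e. along the transposed network. The score of a node accumulates exp(-t) * t^k / k! times its signal after k steps, where t plays the role of the heat diffusion coefficient. Several details are assumptions of this sketch: signal is split evenly among regulators, nodes without regulators keep their signal, and the series is truncated after a fixed number of steps. The network is a hypothetical toy.

```python
from math import exp


def heat_kernel_rank(regulates, seeds, t=0.25, max_steps=30):
    """Heat-kernel-style diffusion from the genes of interest towards their regulators.

    regulates: TF -> iterable of target genes (edges point TF -> target).
    seeds: non-empty collection of genes of interest; the signal starts evenly on them.
    t: heat diffusion coefficient; small t keeps the signal close to the seeds.
    Returns node -> score, the truncated series sum_k exp(-t) * t**k / k! * x_k,
    where x_k is the signal distribution after k reverse steps.
    """
    # Transpose the network: gene -> list of TFs that regulate it.
    regulators = {}
    for tf, targets in regulates.items():
        for gene in targets:
            regulators.setdefault(gene, []).append(tf)

    seeds = set(seeds)
    signal = {g: 1.0 / len(seeds) for g in seeds}
    weight = exp(-t)  # k = 0 term: exp(-t) * t**0 / 0!
    score = {}
    for k in range(max_steps + 1):
        for node, mass in signal.items():
            score[node] = score.get(node, 0.0) + weight * mass
        # One reverse step: split each node's mass evenly among its regulators.
        # Assumption: nodes with no regulators keep their mass (self-loop).
        nxt = {}
        for node, mass in signal.items():
            preds = regulators.get(node)
            if preds:
                for tf in preds:
                    nxt[tf] = nxt.get(tf, 0.0) + mass / len(preds)
            else:
                nxt[node] = nxt.get(node, 0.0) + mass
        signal = nxt
        weight *= t / (k + 1)  # next term: exp(-t) * t**(k+1) / (k+1)!
    return score


# Hypothetical toy network: TF_A regulates g1 and g2; TF_B regulates TF_A.
network = {"TF_A": ["g1", "g2"], "TF_B": ["TF_A"]}
local = heat_kernel_rank(network, ["g1", "g2"], t=0.25)
global_ = heat_kernel_rank(network, ["g1", "g2"], t=5.0)
tfs = set(network)
print(sorted(tfs, key=local.get, reverse=True))    # ['TF_A', 'TF_B']
print(sorted(tfs, key=global_.get, reverse=True))  # ['TF_B', 'TF_A']
```

With t = 0.25 the direct regulator TF_A ranks first. With t = 5 the upstream TF_B overtakes it, which mirrors the local versus global preference described above.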
Below we present the output of the utility “Rank by TF” using TFRank for the QN dataset, combined with the default filtering options. In Table 4, the first column indicates the position of the TF in the ranking, the second column the name of the TF, the third column the number of regulations of the TF in the YEASTRACT database, the fourth column the score given to the TF by TFRank, and the fifth and final column the genes from the list of interest targeted by the TF (note that this list does not necessarily contain all the genes in the regulatory paths flowing from the TF and leading to the genes of interest, as only direct regulations between the TF and the target genes in the list of interest are considered).
Table 4 - Potential regulators of the genes in the QN dataset, selected and ranked using the TFRank method with a heat diffusion coefficient value of 0.25. Only the first 15 rows of the table are shown.
In this analysis, the TFRank diffusion coefficient was set to a low value (0.25) to favor proximal regulators, which are presumably more specific to the biological response under study. In this case, TFRank indicated Adr1, Hap4, Gal4, Mal33, Gis1 and Cat8 as the most relevant mediators of the yeast transcriptional response to quinine. Notably, all these TFs were found up-regulated in response to quinine stress and are known to play a role in yeast adaptation to alternative carbon sources, in agreement with the fact that quinine induces intracellular glucose limitation. In clear contrast, the top TFs obtained from the percentage of documented regulatory associations targeting the up-regulated genes in the quinine dataset are associated with more general cellular responses, and none of them was found up-regulated in response to quinine stress (dos Santos et al., 2009). Even compared with the enrichment-based ranking described above, TFRank highlights a higher number of glucose-limitation TFs and places them within the network context in which they operate. These results underline the value of using the different ranking methods as complementary tools of analysis.
2.4 - Search for Regulatory Associations
The "Find regulatory associations" query may be used to group genes according to their documented and potential co-regulation. This query displays, in a single table, all the information obtained with the different options of the "Group genes by TF" functionality. It therefore allows the potential and documented regulons deduced for a gene list obtained from an array experiment to be compared. To save space, this comparison is exemplified in Table 5 for Pdr1p only, although the whole list of implicated TFs appears when using this functionality.
Table 5 clearly shows a marked discrepancy between the genes considered documented targets and those considered potential targets of Pdr1p. The same is observed for other TFs. The observed differences may be due to the fact that: i) the documented targets of each TF may include indirect targets; ii) the presence of a TF binding site in the promoter region of a gene does not necessarily make it a target of that TF; iii) some gene targets and binding sites for a specific TF may not yet be described in the literature or included in the database. For example, the HXK1, SCW11, MET17, FMP43, FRE4, DSE4, COS10 and REV1 genes, all confirmed targets of Pdr1-3p, lack any Pdr1p binding site in their promoter regions. These genes may be indirect targets of Pdr1p, or their promoter regions may include a binding site for this TF that has not yet been characterized (or included in this database).
Table 5 - Regulatory Associations
Within the "Find regulatory associations" query there are two search options: Any Transcription Factors to Any Gene and All Transcription Factors to Any Gene. The former was used in the previous analysis. The latter retrieves only the input genes that are controlled by all of the input TFs. This enables the identification of groups of genes whose transcription is potentially under the simultaneous control of several transcription factors. For instance, we may search for regulatory associations between the PDR-related TFs Pdr1p, Pdr3p, Pdr8p and Yrr1p and the de Risi gene list using the All Transcription Factors to Any Gene option:
Table 6 - Regulatory associations
The results in Table 6 reveal four genes that are documented targets of all four TFs (Pdr1p, Pdr3p, Yrr1p and Pdr8p), although no gene in the list under examination is a potential target of all four. The possibility that these TFs act together in the up-regulation of their shared targets has been examined to some extent. For instance, Pdr1p and Pdr3p can act as homo- or heterodimers (Mamnun et al., 2002), and the transcriptional regulation of Yrr1p or Pdr8p was found to depend on Pdr1p or Pdr3p (Hickell et al., 2003; Akache et al., 2004).
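The two search options of "Find regulatory associations" differ mainly in how the per-TF target sets are combined. Any Transcription Factors to Any Gene reports each TF's targets separately, while All Transcription Factors to Any Gene keeps only the intersection. The Python sketch below shows this with hypothetical toy regulons. Run separately on documented and on potential associations, it reproduces on invented data the kind of contrast described for Table 6.

```python
def any_tf_to_any_gene(tfs, genes, targets_of):
    """Each input TF with at least one target among the input genes, with those targets."""
    genes = set(genes)
    result = {}
    for tf in tfs:
        hits = set(targets_of.get(tf, ())) & genes
        if hits:
            result[tf] = sorted(hits)
    return result


def all_tfs_to_any_gene(tfs, genes, targets_of):
    """Input genes regulated by every one of the input TFs."""
    shared = set(genes)
    for tf in tfs:
        shared &= set(targets_of.get(tf, ()))
    return sorted(shared)


# Hypothetical toy regulons, for illustration only.
documented = {"TF_A": {"g1", "g2", "g3"}, "TF_B": {"g2", "g3"}, "TF_C": {"g3", "g9"}}
potential = {"TF_A": {"g1"}, "TF_B": {"g2"}, "TF_C": {"g4"}}
tfs, genes = ["TF_A", "TF_B", "TF_C"], ["g1", "g2", "g3", "g4"]

print(any_tf_to_any_gene(tfs, genes, documented))
print(all_tfs_to_any_gene(tfs, genes, documented))  # ['g3']
print(all_tfs_to_any_gene(tfs, genes, potential))   # []
```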
2.5 - References
Example 3: Search for a DNA motif within known TF binding sites and promoter regions
The search for over-represented consensus sequences or DNA motifs in the promoter regions of co-regulated genes, revealed by global expression analysis, may help identify known or new regulatory associations underlying the yeast response under study. YEASTRACT provides the "Search by DNA Motif" option to facilitate this analysis. This is exemplified below for the motif CGGGC, found to be over-represented in the upstream regions of genes up-regulated in yeast cells under glucose- or ethanol-limited growth (Wu et al., 2004).
3.1 Search for a DNA Motif within known TF Binding Sites
This query allows the user to check whether a DNA motif has already been documented as the binding site of a specific TF, that is, whether a newly identified motif exactly matches, is contained in, or contains a previously described TF binding site.
The result of this query shows that the CGGGC motif does not exactly match any of the 284 different TF binding sites described in the literature and compiled in YEASTRACT. It is, however, contained in the Cup2p binding site, within its most degenerate region (HTHNNGCTGD; Beaudoin and Labbé, 2001). This suggests that the examined motif does not correspond to any TF binding site described so far.
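The three outcomes of this query can be sketched with IUPAC-aware string comparison: exact match, motif contained in a site, and site contained in the motif. In this Python sketch, two positions are assumed to match when their IUPAC codes share at least one base, and both strands of the motif are tried. YEASTRACT's exact matching rules may differ. The usage line applies the sketch to CGGGC and the Cup2p consensus HTHNNGCTGD quoted above.

```python
# IUPAC code -> bases it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
    "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
    "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of an IUPAC string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def compatible(a, b):
    """Equal-length IUPAC strings are compatible if every position shares a base."""
    return all(set(IUPAC[x]) & set(IUPAC[y]) for x, y in zip(a, b))


def occurs_in(short, long):
    """True if `short` is compatible with some window of `long`."""
    return any(compatible(short, long[i:i + len(short)])
               for i in range(len(long) - len(short) + 1))


def compare_motif(motif, site):
    """Classify a motif against one TF binding site, trying both strands of the motif."""
    site = site.upper()
    for m in (motif.upper(), reverse_complement(motif)):
        if len(m) == len(site) and compatible(m, site):
            return "exact match"
        if len(m) < len(site) and occurs_in(m, site):
            return "motif contained in site"
        if len(m) > len(site) and occurs_in(site, m):
            return "site contained in motif"
    return "no match"


# Cup2p consensus as quoted in the text (Beaudoin and Labbé, 2001).
print(compare_motif("CGGGC", "HTHNNGCTGD"))  # motif contained in site
```

The match falls on the window HNNGC, where the degenerate H and N positions accept C, G and G. This is why a containment found in a highly degenerate region says little on its own about a real TF binding site.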
3.2 Search for Genes having a DNA Motif in their Promoter Regions
This query searches for the occurrence of a DNA motif in the promoter regions of all genes in the yeast genome. The result shows that the CGGGC motif occurs in the promoter region of 2169 of the approximately 6000 yeast genes, and at least twice in the promoter region of 567 of these genes. This information, together with tests of statistical significance (Wu et al., 2004), may help anticipate the biological significance of a newly proposed consensus.
3.3 References