Using the WebComparator: A tutorial

This tool serves two purposes. Firstly, it allows the expression profiles of (putatively) homologous genes in wheat and barley to be compared across a compendium of tissues and/or developmental stages. Gene expression was measured by hybridization to the Barley1 and wheat Affymetrix GeneChips, as described in the references listed at the bottom of this page. Secondly, it provides a mechanism for assessing, probeset by probeset, the reliability of results obtained in any experiment using one or both of these GeneChips.

Selection of probeset

The selection of the probeset is self-explanatory. Note that a partial probeset name is sufficient. This is handy because on these GeneChips there are often quite a few probesets designed for the same sequence. These all have the same base name (e.g. Ta.XXXX.Y) followed by assorted suffixes (e.g. _at, _s_at, .S1, .A1 etc.). The precise meaning of most of this notation is explained in Appendix B of Affymetrix's document on GeneChip Expression Analysis. Note that a probeset labelled .A1 is designed for the reverse complement of the sequence targeted by the corresponding .S1 probeset, so in general only one of the two will hybridize.
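The naming scheme described above can be illustrated with a short Python sketch. It is not the Comparator's own code: the regular expression is an assumption based only on the example names in this tutorial, and the helper names (split_probeset, find_by_partial_name) are hypothetical. Partial names are treated as simple prefixes here, which may differ from how the tool itself matches them.

```python
import re

# Assumed pattern for wheat probeset names such as "Ta.6644.3.S1_a_at":
# base name, orientation tag (S1 or A1), then a suffix ending in "_at".
WHEAT_NAME = re.compile(
    r"^(?P<base>Ta(?:Affx)?\.\d+\.\d+)\.(?P<tag>[SA]\d)(?P<suffix>(?:_[a-z])?_at)$"
)

def split_probeset(name):
    """Split a wheat probeset name into (base, tag, suffix), or return None."""
    match = WHEAT_NAME.match(name)
    if match is None:
        return None
    return match.group("base"), match.group("tag"), match.group("suffix")

def find_by_partial_name(partial, probesets):
    """Return all probesets whose name starts with the partial name."""
    return [name for name in probesets if name.startswith(partial)]

names = ["Ta.6644.3.S1_a_at", "Ta.6644.3.S1_x_at", "Ta.11482.1.A1_at"]
print(find_by_partial_name("Ta.6644.3", names))
print(split_probeset("Ta.11482.1.A1_at"))
```

Barley names such as Contig11223_at do not follow this pattern, so split_probeset returns None for them.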

The homology graphs and their interpretation

After the required probeset has been entered, the Comparator attempts to find homologs of the sequence the probeset is designed for and, if homologs are found, these can be displayed in an appropriate “homology graph”. The method by which homologs are determined is described in detail in Ref. 2; in essence, it relies on intra- and inter-species BLAST searches. Roughly speaking, homology is assumed to exist if the best BLAST hit has a sufficiently good E-value, or if a BLAST hit is of similar quality to the best BLAST hit.
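One possible reading of this rule is sketched below in Python. The actual criteria are those given in Ref. 2; the function name select_homologs and both thresholds (max_evalue, max_log10_gap) are assumptions chosen purely for illustration. In this reading, the best hit must pass an E-value cutoff, and further hits are accepted if their E-values lie within a fixed number of orders of magnitude of the best one.

```python
import math

def select_homologs(hits, max_evalue=1e-10, max_log10_gap=5.0):
    """Pick homolog candidates from BLAST hits given as (target, evalue) pairs.

    Both thresholds are illustrative assumptions, not the published values.
    """
    if not hits:
        return []
    best_evalue = min(evalue for _, evalue in hits)
    if best_evalue > max_evalue:
        return []  # even the best hit is not good enough

    def log10_e(evalue):
        # BLAST can report an E-value of 0.0, so clamp before taking the log.
        return math.log10(max(evalue, 1e-300))

    ranked = sorted(hits, key=lambda hit: hit[1])
    return [
        target
        for target, evalue in ranked
        if log10_e(evalue) - log10_e(best_evalue) <= max_log10_gap
    ]

print(select_homologs([("A", 1e-50), ("B", 1e-48), ("C", 1e-20)]))
```

With these thresholds, hit C is rejected because it is far weaker than the best hit, even though its E-value on its own would pass the cutoff.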

Fig. 1: A typical homology graph

A typical homology graph is shown in Fig. 1. Each vertex represents a unique consensus sequence. If a vertex has several entries (e.g. Ta.6644.3.S1_a_at and Ta.6644.3.S1_x_at), these correspond to different probesets tiled to the same consensus sequence or, in the case of .A1 probesets, to its reverse complement. Wheat sequences are coloured blue, while barley sequences are coloured red. Arrows, each indicating a sufficiently good BLAST hit, are labelled with the corresponding E-value and point from the query sequence to the target sequence. For example, Fig. 1 shows that a BLAST search of the sequences on the barley chip, using Ta.6644.2.S1_at as the query, results in a good match to the barley Contig11223_at. Note, however, that the reverse BLAST search of the wheat chip using Contig11223_at does not result in a match to Ta.6644.2.S1_at, because a far better match exists for this sequence, namely Ta.11482.1.A1_at.
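The directed nature of these arrows can be made concrete with a small Python sketch. It is only an illustration of the data structure, not the Comparator's implementation (which, per the acknowledgements, uses QuickGraph). Only the two arrows described in the text are included, although the figure contains more, and the E-values are placeholders rather than the values shown in Fig. 1. The helper names are hypothetical.

```python
# Directed edges: (query, target) -> E-value of the BLAST hit.
# Only the two arrows described in the text are listed; the E-values
# are placeholders and are NOT the values shown in Fig. 1.
edges = {
    ("Ta.6644.2.S1_at", "Contig11223_at"): 1e-40,
    ("Contig11223_at", "Ta.11482.1.A1_at"): 1e-90,
}

def hits_from(edges, query):
    """Return the targets hit by a query, best (lowest) E-value first."""
    found = [(evalue, target) for (q, target), evalue in edges.items() if q == query]
    return [target for _, target in sorted(found)]

def is_reciprocal(edges, a, b):
    """True if there is an arrow from a to b and also from b to a."""
    return (a, b) in edges and (b, a) in edges

print(hits_from(edges, "Contig11223_at"))
print(is_reciprocal(edges, "Ta.6644.2.S1_at", "Contig11223_at"))
```

The second call returns False, mirroring the one-way arrow between Ta.6644.2.S1_at and Contig11223_at described above.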

Without a detailed phylogenetic study one cannot conclusively distinguish orthologs, paralogs and homeologs on these diagrams, but they do provide a first indication. For example, while the three wheat sequences in Fig. 1 are all reasonably similar to the barley Contig11223, they are unlikely to correspond to three wheat homeologs (i.e. copies on the A, B and D genomes of wheat) of the same barley gene, because in that case one would also have expected sufficient similarity among the wheat sequences themselves. For comparison, a graph containing three known homeologs is shown in Fig. 2. One of several possible interpretations of Fig. 1 is that Contig11223 and Ta.11482.1.A1 are orthologs, with the latter consensus sequence not differentiating between the three homeologs. The two remaining wheat sequences in that figure could be either duplications of Ta.11482.1 or perhaps related gene family members.


Fig. 2: A homology graph containing three wheat homeologs

A note of warning: a very small number of graphs contain an unmanageably large number of sequences, for good reason (e.g. graphs containing ubiquitin-related sequences or histones). These graphs contain a lot of information and take a long time to produce, so in the unlikely event that they are of interest to you, be prepared to wait!

Sequence quality information

Hybridization efficiency, especially of probesets on the wheat chip, can be quite variable and can lead to inaccurate measurements of gene expression. We have attempted to provide all the information we are aware of that helps in assessing this. This information includes:

  1. The prune set. Affymetrix normally discards large numbers of probesets (the so-called “prune set” or “5' set”) which are, for one reason or another, of lesser quality. Because the wheat community wanted a 'discovery chip', these probesets have been included on the wheat chip; however, the expression signal from them needs to be interpreted with care. We have marked these sequences with a (P) on the homology graphs as well as in the expression plots. In addition, the signal from any probeset known to cross-hybridize (i.e. _s_at, _x_at, _a_at) needs to be interpreted appropriately, as always; see Appendix B of Affymetrix's document on GeneChip Expression Analysis.

  2. Sequences of opposite orientation. For some sequences, possible quality issues become evident from the comparison of the wheat and barley sequences themselves. These are flagged with an appropriate warning message. For example, selecting the homology graph containing the barley Contig11222_at results in the warning: “Probesets of the type Contig11222_... and TaAffx.54625.1.S... are of opposite orientation.” In other words, while a BLAST comparison indicates that the consensus sequences are closely related, the probesets are tiled to reverse-complemented sequences and therefore will in general not both hybridize (in this instance, this can be confirmed by looking at the expression profiles themselves). Note that, as discussed above, the wheat chip quite often contains sequences of both orientations (labelled .A1 and .S1). Had that been the case for TaAffx.54625.1, a comparison of the wheat and barley expression profiles could have been used to determine which of the two is in the correct orientation, as is often possible. An example of this can be seen in the homology graph containing Ta.1135.1, where comparing the expression profiles of Ta.1135.1.A1 and Ta.1135.1.S1 with that of the barley Contig446 shows that Ta.1135.1.S1 is likely to be in the correct orientation and Ta.1135.1.A1 is not.

  3. Chimeric sequences. There are a limited number of sequences for which the comparison of wheat and barley sequences indicates simultaneous significant local alignments in both orientations, suggesting that there may have been a problem with the sequence assembly used for one of the two chips (which can also lead to hybridization problems, depending on the position of the probes in the probesets). These cases have also been flagged with an appropriate warning message; see, for example, the homology graph containing Contig4891.
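The distinction between items 2 and 3 can be summarised in a minimal Python sketch. How the Comparator actually derives its warnings is not described here, so this is an assumed reconstruction: each local alignment between a wheat and a barley sequence is represented as a (strand, E-value) pair, the significance cutoff max_evalue is hypothetical, and so is the function name classify_orientation.

```python
def classify_orientation(alignments, max_evalue=1e-10):
    """Classify a wheat/barley sequence pair from its local alignments.

    alignments: list of (strand, evalue) pairs, strand being "+" or "-".
    The cutoff max_evalue is an illustrative assumption.
    """
    significant = {strand for strand, evalue in alignments if evalue <= max_evalue}
    if significant == {"+", "-"}:
        return "chimeric"   # significant alignments in both orientations
    if significant == {"-"}:
        return "opposite"   # related sequences, reverse-complemented
    if significant == {"+"}:
        return "same"
    return None             # no significant alignment at all

print(classify_orientation([("-", 1e-80)]))
print(classify_orientation([("+", 1e-60), ("-", 1e-45)]))
```

A pair classified as "opposite" would correspond to the warning in item 2, and one classified as "chimeric" to the warning in item 3.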

The expression profiles & annotation

This part of the Comparator is straightforward: by default, the expression profiles and associated annotation of every sequence in the homology graph are shown. Again, wheat sequences in the prune/5' set are marked by a '(P)'.

Some homology graphs, particularly those involving genes from large gene families, can become quite large. In these cases, it can be convenient to eliminate probesets, such as cross-hybridizing and/or prune probesets, by de-selecting them in the annotation panel. The expression profiles can then be updated by selecting "Redraw Chart". The homology graph may also be redrawn; note, however, that sequences which have become completely disconnected will no longer be shown.
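The effect of de-selecting probesets before a redraw can be sketched in Python as follows. This is a simplified illustration rather than the Comparator's code: every probeset is treated as its own vertex, the probeset names and E-values in the example are made up, and the function names are hypothetical. The cross-hybridizing suffixes are the ones listed in the section on sequence quality.

```python
CROSS_HYB_SUFFIXES = ("_s_at", "_x_at", "_a_at")

def is_cross_hybridizing(name):
    """True if the probeset name carries a cross-hybridization suffix."""
    return name.endswith(CROSS_HYB_SUFFIXES)

def redraw_graph(edges, deselected):
    """Drop de-selected probesets and any vertex left without an edge.

    edges: dict mapping (query, target) -> E-value.
    Returns the sorted list of vertices still shown and the kept edges.
    """
    kept = {
        (query, target): evalue
        for (query, target), evalue in edges.items()
        if query not in deselected and target not in deselected
    }
    shown = sorted({name for pair in kept for name in pair})
    return shown, kept

# Made-up probeset names and E-values, for illustration only.
example_edges = {
    ("w1_at", "b1_at"): 1e-30,
    ("w2_x_at", "b1_at"): 1e-20,
    ("w2_x_at", "w3_at"): 1e-15,
}
deselected = {n for pair in example_edges for n in pair if is_cross_hybridizing(n)}
print(redraw_graph(example_edges, deselected))
```

In this example w3_at was connected only through the de-selected w2_x_at, so it disappears from the redrawn graph along with it.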

Finally, the homology graphs and expression profiles can be exported by right-clicking on them and saving them to an appropriate file.

References

  1. A. Druka, G. Muehlbauer, I. Druka, R. Caldo, U. Baumann, N. Rostoks, A. Schreiber, R. Wise, T. Close, A. Kleinhofs, A. Graner, A. Schulman, P. Langridge, K. Sato, P. Hayes, J. McNicol, D. Marshall, R. Waugh (2006) An atlas of gene expression from seed to seed through barley development. Funct. Integr. Genomics, 6: 202-211

  2. A. W. Schreiber, T. Sutton, R. A. Caldo, E. Kalashyan, B. Lovell, G. Mayo, G. Muehlbauer, A. Druka, R. Waugh, R. Wise, P. Langridge, U. Baumann (2009) Comparative transcriptomics in the Triticeae. BMC Genomics, 10: 285

Acknowledgements

For the visualization of the homology graphs, WebComparator uses the graph data structures and algorithms of QuickGraph (http://quickgraph.codeplex.com) and the automatic graph drawing and layout capabilities of Graphviz (http://www.graphviz.org).

 

Comments or suggestions on the “WebComparator” are always welcome and can be directed either to me at andreas.schreiber (at) adelaide.edu.au or to Elena Kalashyan at elena.kalashyan (at) acpfg.com.au.