Friday, October 31, 2008

Imperfect Phenotypes: Transcripts and Peptides

Source: Ghaemmaghami, S., Huh, W., Bower, K., Howson, R.W., Belle, A., Dephoure, N., O’Shea, E.K., and Weissman, J.S. Global analysis of protein expression in yeast. 2003. Nature, 425:737-41

The authors created a library of strains, each carrying a TAP tag fused to a single ORF. They could then use a single antibody on each strain in mid-log phase to quantify protein abundance. Comparing these values to the corresponding transcript abundances (via microarrays) and to codon bias scores, they found a significant correlation between the measurements. The average ratio is fairly consistent at roughly 4,000 proteins per transcript across the range of transcript abundances, but protein levels varied widely among genes with the same transcript abundance. Codon bias showed a similarly good overall relationship, again with high variability among genes with the same codon score. This variation was determined not to be measurement error, as a set of 206 essential proteins was retested on the TAP-tagged library in triplicate. The TAP library was able to detect a greater number of proteins than typical LC/MS methods, because LC/MS is biased toward high-abundance proteins.

So once again, we find that transcript differences aren't the best proxy for understanding cellular changes in response to perturbations. They may point in the right direction, but a 10-fold change in one transcript and a 5-fold change in another may result in the same difference in peptide number. But are proteins any better, what with post-translational modifications and activations? While it is not yet commonplace to measure proteome changes, I'm already thinking we may need to start looking into phosphorylome (?) changes to tie it all together and get a genuinely comprehensive view.

Other notes:
-80% of proteome is expressed during normal growth conditions
-tandem affinity purification (TAP) tag is calmodulin binding peptide, TEV cleavage site, two IgG binding domains
-successful integrants for 98% of all ORFs in S. cerevisiae
-tagging does not hinder, TAP can be degraded
-detected 79% of essential, 83% of products corresponding to assigned gene names
=73% of all annotated ORFs
-very abundant mRNAs generally encode for abundant proteins
=some variation due to TAP tag, as subset that was retested had higher correlation than initially found
-similar, but lower, correlation between protein abundance and codon usage as measured by codon adaptation index (CAI)
=just noise at CAI < 0.2
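
The point about a consistent average ratio coexisting with high gene-to-gene variability can be illustrated with a small simulation. This is only a sketch on synthetic data, not the paper's measurements: the 4,000 proteins-per-transcript figure comes from the summary above, while the scatter (NOISE_SD_LOG10), the transcript range, and the number of genes are assumptions chosen for illustration.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)            # fixed seed so the sketch is repeatable
PROTEINS_PER_TRANSCRIPT = 4000    # average ratio quoted in the summary
NOISE_SD_LOG10 = 0.5              # ASSUMED gene-to-gene scatter (log10 units)
N_GENES = 1000                    # ASSUMED number of synthetic genes

# Synthetic log10 transcript abundances (assumed range, not real data).
log_mrna = [rng.uniform(-1.0, 2.0) for _ in range(N_GENES)]
# Protein abundance = constant ratio times transcript, plus log-normal scatter.
log_protein = [
    m + math.log10(PROTEINS_PER_TRANSCRIPT) + rng.gauss(0.0, NOISE_SD_LOG10)
    for m in log_mrna
]

r = pearson(log_mrna, log_protein)
ratios = [10 ** (p - m) for m, p in zip(log_mrna, log_protein)]
median_ratio = statistics.median(ratios)

# Fold difference between the 95th and 5th percentile of the per-gene ratio.
cuts = statistics.quantiles(ratios, n=20)
fold_spread = cuts[-1] / cuts[0]

print(f"correlation (log-log): {r:.2f}")
print(f"median proteins per transcript: {median_ratio:.0f}")
print(f"5th-95th percentile fold spread in ratio: {fold_spread:.0f}x")
```

Even with a clear overall correlation and a median ratio near the assumed 4,000, individual genes in this toy model differ in their protein-per-transcript ratio by well over an order of magnitude, which is the qualitative pattern described above.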

Tuesday, October 28, 2008

Transcripts and Peptides: On-again, Off-again

Source: Foss, E.J., Radulovic, D., Shaffer, S.A., Ruderfer, D.M., Bedalov, A., Goodlett, D.R., and Kruglyak, L. Genetic basis of proteome variation in yeast. 2007. Nature Genetics, 39(11):1369-75

The authors used a mass spectrometry approach coupled with retention time shift software to measure peptide abundances in the BY and RM strains and their segregants. Replicating a sample via quantitative western blot, they found that they could reliably measure peptides as a quantitative trait, and that the levels showed inheritance patterns similar to transcript data from a previous study. A number of peptide levels differ significantly between the two parents, and a linkage analysis found four major hotspots responsible for many of these linkages. Two of these hotspots were enriched for peptide synthesis, one of them located at LEU2; the other two were not enriched for any particular term. Almost all the peptides showed trans linkage. Only three of the four hotspots overlap with hotspots generated from transcript data for the 278 measurable peptides, and only rarely does a hotspot link to both a peptide and its corresponding transcript. Interestingly, the 278 transcripts find the same hotspots as the total transcript database, suggesting that a limited number of polymorphisms control the whole proteome, so even the small sample of proteins may still find the proteome hotspots.

Other notes:
-weak correlation between transcript and peptide levels, suggesting post-transcriptional regulation acts as a buffer
-BY and RM differ at ~0.6% of genome
-if a representative sample this means 1/3 of all proteins differ between BY and RM
-278 peptides for 221 proteins
-heritability average was 62%
-expanded to 0.17 to get 85 proteins, 109 linkages
=only 7% of peptides map to a marker within 20kb of their ORF
-average correlation between protein and transcript abundance is 0.186
-3 of 4 hotspots overlap
=shared hotspots do not always overlap the same gene’s transcript and protein
=same polymorphism causes changes at different stages for different genes
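
The cis/trans distinction in the notes above (a linked marker within 20 kb of the peptide's own ORF versus anywhere else) can be sketched as a small classifier. This is a minimal illustration, not the authors' pipeline: the Locus type, the classify_linkage function, and the coordinates in the usage lines are all hypothetical, and only the 20 kb window comes from the notes.

```python
from dataclasses import dataclass

CIS_WINDOW_BP = 20_000  # the 20 kb window mentioned in the notes


@dataclass(frozen=True)
class Locus:
    """Hypothetical ORF location: chromosome name plus start/end in bp."""
    chrom: str
    start: int
    end: int


def classify_linkage(marker_chrom, marker_pos, orf, window=CIS_WINDOW_BP):
    """Return 'cis' if the marker lies within `window` bp of the ORF, else 'trans'."""
    if marker_chrom != orf.chrom:
        return "trans"
    if marker_pos < orf.start:
        distance = orf.start - marker_pos   # marker upstream of the ORF
    elif marker_pos > orf.end:
        distance = marker_pos - orf.end     # marker downstream of the ORF
    else:
        distance = 0                        # marker inside the ORF
    return "cis" if distance <= window else "trans"


# Made-up coordinates, for illustration only.
orf = Locus("chrIII", 100_000, 101_500)
print(classify_linkage("chrIII", 95_000, orf))   # 5 kb away on the same chromosome
print(classify_linkage("chrXIV", 95_000, orf))   # different chromosome
```

Under this definition, a hotspot that affects many peptides at once is almost entirely a trans effect, since at most a handful of the linked ORFs can sit within the window around it.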

Monday, October 20, 2008

The Age of Sequencing I: The Sanger Alliance

Source: Shendure and Ji (2008). Next-generation DNA sequencing. Nature Biotech 26:1135-1145.

There is no doubt that Sanger biochemistry (i.e. 'cycle sequencing') has set the fundamentals of sequencing, both in high-throughput clone-based methods (e.g. shotgun) and in targeted sequencing of single PCR products. In each cycle, labeled ddNTP molecules mark the terminal nucleotide of each fragment, and the final sequence is read through high-resolution electrophoretic separation. Each reaction can read about 1000 bp, with a maximum accuracy of 99.999% and a cost of $0.50 per kb for shotgun sequencing.

This platform laid the groundwork for the second-generation sequencing pipelines: 454, Solexa, SOLiD, Polonator and HeliScope. These methods, despite many differences in methodology, follow the same 'cycling' logic, with each cycle followed by an optical signal of some sort. Library preparation, followed by adapter ligation and amplification, is the first step in all of these methods. Amplification is done through in situ polonies, emulsion PCR or bridge PCR. Whichever is used, the amplification step must ensure the spatial clustering of each clone.

The bottom line is that these methods bypass steps required in the classic Sanger method and use array-based approaches to enforce parallelism.

Saturday, October 11, 2008

Discovering Species-level Functions in Complex Microbial Communities

Source: Kalyuzhnaya et al. (2008). High-resolution metagenomics targets specific functional types in complex microbial communities. Nature Biotech 26 (9): 1029-1034.

Environmental genomics (metagenomics) has become a hot topic in molecular biology; however, it is costly and highly dependent on the number of reference genomes available. Thus, studying highly complex communities like those of soil or lake sediments is not currently feasible. In this study, the authors use a labeling method to target the species responsible for a given function. Here they studied methylotrophy: using labeled methane, methanol, methylamine, formaldehyde and formate, they focused on the species directly assimilating these substrates by isolating the labeled fraction of the community's genomic DNA (using isopycnic centrifugation).

The 16S rRNA analysis shows that the samples enriched for methylotrophy are far less complex than the initial sample and largely comprise bona fide methylotrophs: Methylobacter tundripaludum, Methylomonas sp., Methylotenera mobilis, Methyloversatilis universalis, Ralstonia eutropha. It should be noted that some of the enriched species may not be methylotrophs but rather secondary links in the food chain (e.g. using 13C-CO2 produced by methylotrophs).

The authors demonstrate the utility of their approach by identifying a novel methylotroph and reconstructing its genome and metabolic network.

Tuesday, October 7, 2008

Replication Fork vs Choice of Origin

Source: Courbet et al. 2008. Replication fork movement sets chromatin loop size and origin choice in mammalian cells. 

Genome stability relies on a single, faithful round of replication. The ORC has been extensively studied, but the question that still needs addressing is the spatial distribution of origins. The authors here show that slowing down the replication fork results in the recruitment of latent origins that are not normally used. The "slow" phenotype was the result of depleted nucleotide pools, and adding A+T to the cell culture restores the "fast" phenotype. Using aphidicolin, a drug that targets DNA polymerase, the authors established that it is replication fork movement, not nucleotide pool size, that drives the observed phenomenon.

Now, in a "slow" background, restoring "fastness" by addition of A+T results in two steps: First, 30% of the origins would not fire; second, in the next S phase the pattern mimics that of "fast" cells.

The Metabolome as Phenotype

Source: Fiehn, O., Kopka, J., Dormann, P., Altmann, T., Trethewey, R.N., and Willmitzer, L. Metabolite profiling for plant functional genomics. 2000. Nature Biotechnology, 18: 1157 – 61

The authors grew three strains of Arabidopsis: one wild-type, one with a DGD mutation, and one with a stomatal density mutation. They used GC/MS to profile 300+ metabolites in each strain; because the approach was unbiased, it also picked up a number of unknown molecules. Nearly 60 compounds were found to be significantly different between wt and the DGD mutant, and a number of the unidentified compounds were also noticeably different. Each strain has a unique metabolic profile, as determined through principal component analysis.


Figure: Significant (p < 0.01) metabolite differences between dgd and wt, and between sdd1-1 and wt. Only known or high-fold-change metabolites are shown.

Other notes:
-used internal standards for correction and normalization
-also normalized to 1 mg plant leaf fresh weight
-testing reproducibility in samples that were measured multiple times found deviations for same day analysis at 8 +/- 6% over 149 polar compounds
-tested biological variability and found it in clear excess of preparation variability, averaging 40%
-a single gene change produced significant differences in 153 of 326 metabolites
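
The normalization and reproducibility notes above can be sketched in a few lines. This is a rough illustration under assumptions, not the paper's actual procedure: normalize_peak and relative_deviation_percent are hypothetical helper names, dividing by the internal standard and then by fresh weight is one plausible reading of the notes, and the replicate values are invented.

```python
import statistics


def normalize_peak(peak_area, internal_standard_area, fresh_weight_mg):
    """Scale a raw peak area by the internal standard, then per mg fresh weight."""
    return peak_area / internal_standard_area / fresh_weight_mg


def relative_deviation_percent(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)


# Invented replicate measurements of one metabolite:
# (raw peak area, internal standard peak area, leaf fresh weight in mg)
replicates = [
    (5200.0, 1000.0, 50.0),
    (5900.0, 1100.0, 52.0),
    (4700.0, 950.0, 49.0),
]

normalized = [normalize_peak(*rep) for rep in replicates]
print([round(v, 4) for v in normalized])
print(f"relative deviation: {relative_deviation_percent(normalized):.1f}%")
```

Computing the same percentage over technical replicates and over separate plants is what lets the two sources of variability (roughly 8% versus 40% in the notes) be compared on one scale.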

“This analysis demonstrates the power of the metabolite profiling method to identify and quantify previously overlooked alterations, following a more comprehensive interpretation of the consequences of genetic modifications.”