Monday, September 29, 2008

Profiling the Killer: Targeting Pancreatic Carcinomas

Source: Jones et al. (2008). Core Signaling Pathways in Human Pancreatic Cancers Revealed by Global Genomic Analyses. Science 321:1801-06.

An estimated ~200,000 patients are afflicted by pancreatic cancer each year, and a mortality rate of ~100% makes this type of cancer quite a challenge. The authors use almost all of the modern techniques available to them to detail the cellular state of this cancer type. They sequenced the coding genome of 24 patients and identified 1562 somatic mutations (25.5% synonymous, 62.4% missense, 3.8% nonsense, 5.0% small indels, and 3.3% in splice sites or within UTRs). Of the 20,661 genes analyzed by sequencing, 1327 had at least one mutation, and 148 had two or more. The authors then structurally modelled 404 of the missense mutations; 55 of them lay close to an important interface and were likely to affect protein function. In general, the average number of mutations in these tumors (77) is considerably lower than in, say, breast cancer (101), possibly denoting fewer generations after tumorigenesis.

Next, using SNP arrays, the authors mapped genetic deletions and amplifications. They then combined these data with the mutations and with gene-expression profiles to find key proteins in the emergence of this tumor. These analyses identified 69 gene sets that were genetically altered in the majority of the 24 cancers examined. Thirty-one of these sets could be further grouped into 12 core signaling pathways, including KRAS and TGF-β.
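One simple way to read "genetically altered in the majority of cancers" is as a per-set tally: a gene set counts as altered in a tumor if any of its member genes carries a mutation, deletion, or amplification. The sketch below illustrates only that tally. The gene sets, tumor labels, and most gene names are made up, and the paper's actual statistical scoring is not reproduced here.

```python
def altered_fraction(gene_set, tumor_alterations):
    """Fraction of tumors with at least one altered gene from gene_set."""
    hits = sum(1 for altered in tumor_alterations.values() if gene_set & altered)
    return hits / len(tumor_alterations)


# Hypothetical gene sets and per-tumor altered genes (illustration only).
gene_sets = {
    "set_A": {"KRAS", "GENE1"},
    "set_B": {"GENE2", "GENE3"},
}
tumors = {
    "T1": {"KRAS"},
    "T2": {"KRAS", "GENE2"},
    "T3": {"GENE1"},
    "T4": {"GENE4"},
}

# Keep the sets that are hit in more than half of the tumors.
majority_sets = sorted(
    name for name, genes in gene_sets.items()
    if altered_fraction(genes, tumors) > 0.5
)
print(majority_sets)
```

Different tumors can hit the same set through different genes (T1 via KRAS, T3 via GENE1), which is the reason for grouping by pathway rather than by gene.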

Tuesday, September 23, 2008

Classifying the Stem Cell Repertoire

Source: Muller et al. (2008). Regulatory networks define phenotypic classes of human stem cell lines. Nature 455:401-405.

Before this study, no systematic approach for classifying stem cells and their pluripotency capacities had been introduced. The authors employ gene expression profiling as a generic method for such classification. They generate a "stem cell matrix" in which many different types of stem cells are profiled alongside differentiated tissues as controls. Subsequently, they use machine learning approaches for an unsupervised clustering of the cell lines based on their expression profiles. 12 distinct classes were identified... While a number of pluripotent stem cells (PSC) grouped into specific clusters, some, like neural stem cells, are distributed across all classes. They then used an additional 66 profiles for a cross-validation phase.

In the end, they use the GSEA and MATISSE algorithms to find the pathways and regulatory networks that are associated with the different phenotypes in their stem cell matrix.

Monday, September 15, 2008

Mapping Diabetes to a Compound, Gene

Source: Dumas, M-E., Wilder, S.P., Bihoreau, M-T., Barton, R.H., Fearnside, J.F., Argoud, K., D’Amato, L., Wallis, R.H., Blancher, C., Keun, H.C., Baunsgaard, D., Scott, J., Sidelmann, U.G., Nicholson, J.K., and Gauguier, D. Direct quantitative trait locus mapping of mammalian metabolic phenotypes in diabetic and normoglycemic rat models. 2007. Nature Genetics, 39 (5): 666-72

The authors took three rat lines, two of them diabetic, and applied an unbiased NMR metabonomics approach. They then used both R/qtl and QTL Reaper to map NMR-detected compounds among offspring of crosses of these strains. After identifying a number of candidate loci on which both mapping methods agreed, the authors used NMR to identify a molecule that had mapped (benzoate). They then used congenic rat lines (essentially allele-replaced animals) to test the phenotypic response for a region that had shown up in both forms of QTL mapping and also had substantial transcript abundance differences between the lines. The authors found that the congenic strains explained a large amount of the difference for benzoate, and also for a number of other metabolites related to the pathway. This was an excellent study in using QTL to identify controlling loci in metabolism, discover the compound, and observe the allelic effect.
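To make the mapping step concrete, here is a minimal sketch of single-marker regression, the basic idea behind a QTL scan: compare a model with one overall mean against a model with one mean per genotype class, and express the improvement as a LOD score. This is not the internals of R/qtl or QTL Reaper, and the genotypes and metabolite levels are invented for illustration.

```python
import math


def lod_score(genotypes, phenotypes):
    """LOD for one marker: (n/2) * log10(RSS_null / RSS_marker).

    Assumes the per-genotype model does not fit perfectly (RSS_marker > 0).
    """
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)

    # Group phenotype values by genotype class at this marker.
    by_genotype = {}
    for g, y in zip(genotypes, phenotypes):
        by_genotype.setdefault(g, []).append(y)

    rss_marker = 0.0
    for values in by_genotype.values():
        group_mean = sum(values) / len(values)
        rss_marker += sum((y - group_mean) ** 2 for y in values)

    return (n / 2) * math.log10(rss_null / rss_marker)


# Hypothetical metabolite levels in six F2 animals.
metabolite = [1.0, 1.2, 2.0, 2.1, 3.0, 3.3]
linked_marker = ["AA", "AA", "AB", "AB", "BB", "BB"]
unlinked_marker = ["AA", "AB", "BB", "AA", "AB", "BB"]

lod_linked = lod_score(linked_marker, metabolite)
lod_unlinked = lod_score(unlinked_marker, metabolite)
print(round(lod_linked, 2), round(lod_unlinked, 2))
```

A marker whose genotype tracks the metabolite level gets a high LOD, while an unrelated marker stays near zero. A real scan repeats this at every marker and every NMR signal.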

Other notes:
-only 110 consistent linkages
-Bonferroni corrected reduced to 22 significant peaks
-mQTLs were less abundant than eQTLs
-Noticed multiple loci linked to same metabolite, suggests polygenic control
=found epigenetic effects
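The drop from 110 linkages to 22 after Bonferroni correction comes from dividing the significance level by the number of tests. A minimal sketch of that rule follows. The p-values are invented, and the paper's actual thresholds and test counts are not reproduced here.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return indices of tests that pass the Bonferroni threshold alpha / n."""
    threshold = alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p < threshold]


# Hypothetical p-values for five linkage tests.
p_values = [0.0001, 0.004, 0.02, 0.3, 0.00001]
significant = bonferroni_significant(p_values)
print(significant)
```

With five tests the threshold is 0.05 / 5 = 0.01, so the test with p = 0.02 is dropped even though it would pass an uncorrected 0.05 cutoff.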

Friday, September 12, 2008

Transcripts are not perfect markers of change

Source: Daran-Lapujade, P., Jansen, M.L.A., Daran, J., van Gulik, W., de Winde, J.H., and Pronk, J.T. Role of transcriptional regulation in controlling fluxes in central carbon metabolism of Saccharomyces cerevisiae. 2004. Journal of Biological Chemistry, 279(10): 9125-38

The authors grew yeast in chemostats under carbon limitation on one of four carbon sources: glucose, maltose, acetate, or ethanol. They used flux balance analysis to estimate metabolic fluxes through key proteins and also measured transcript abundances across the genome for all four conditions. They found little difference in transcript abundance (as compared to Kresnowati et al) between fermentable sugars and C2 carbon sources, except in 117 transcripts, though fluxes differed significantly at many steps in carbon metabolism. The difference between maltose and glucose was limited mainly to maltose transporters, both in flux and transcript abundance. Similarly, there was not a great difference between acetate and ethanol. The 117 transcript profiles grouped into six clusters reflecting differences between glucose/maltose and acetate/ethanol, as well as within each class of carbon source (i.e., between glucose and maltose); the expected genes were in the expected clusters. By MIPS classification, 40% of the genes are still unknown, while 29% relate to carbon metabolism. The authors then looked at the upstream sequences of clustered genes, discovering conserved sequences for transcription factors. A few of these transcription factors were predicted, while unknown factors seem to play a role in more. Noting the discrepancy between changes in flux and changes in transcript abundance, even in the magnitude of the changes, the authors suggest that most of carbon metabolism is altered via post-transcriptional regulation, and that transcriptional regulation is used only for rate-limiting steps in pathways. Finally, the authors hypothesize why their set of changed transcripts is so small compared to those of others who have looked at carbon-source changes, and suggest it is due to the chemostat.
The authors feel strongly that the chemostat keeps a more constant environment, so that changes can be attributed to a single perturbation, as opposed to the stress, growth-rate changes, and overabundance seen in batch culture.

Other notes:
-180 total transcripts change in response to carbon source
=33 between glucose and maltose
=16 between ethanol and acetate
=117 between sugars and C2-compounds
-complete data set found at www.bt.tudelft.nl/carbon-source
-maltose uptake requires energy-dependent proton-symport mechanism as opposed to glucose’s facilitated diffusion
-biomass yields for C2 lower, respiration rates higher, due to lower ATP yield
-higher fluxes in TCA, glyoxylate cycle, gluconeogenesis for C2
-lower fluxes in glycolysis, oxidative-PPP, NADP-dependent acetaldehyde and/or isocitrate dehydrogenases for C2
-79 upregulated, 38 downregulated in cultures limited by C2
=79 : 21 carbon metabolism, 7 for TCA, 5 acetyl-CoA metabolism and trafficking, 3 transcriptional regulation, 8 for transport, 7 for nitrogen metabolism and transport (SAM3), only 1 in respiration
=38 : 20 no clear role, 10 carbon metabolism, 4 PPP, 3 transport, 1 signaling
-previous studies on diauxic shift 400 transcripts shown to change 2-fold, 600 in glucose vs ethanol in batch
=225 genes are transcriptionally regulated by glucose, but not in glucose-limited chemostat with low glucose concentrations
=acetate as a byproduct for glucose batch, alters pH gradient, causes stress response
-in chemostat glucose is too low to encourage ethanol/acetate production
=growth rate decreases in batch, held steady in chemostat
-magnitude of changes does not match up, requires more than transcription regulation
=glycolysis and pyruvate showed no correlation
=“during carbon-limited cultivation, fluxes through these central metabolic pathways in S. cerevisiae are not primarily controlled at the transcriptional level”
=DNA microarrays “have limited value as indicators for in vivo activity for proteins”


Metabolites at right, transcript at left; significant decreases between carbon sources underlined, increases highlighted. Many more metabolites differ significantly than their corresponding enzyme's transcript, and magnitudes rarely match!


This has been a pet peeve of mine for a while: the multitude of studies that do some experiment, slap a microarray around and claim: "Aha! Look how many genes change! THIS is quite important!" Research should mature to look deeper at phenotypes: proteomics and metabolomics come to mind. Plus, this opens up a huge field of importance for genomicists: post-transcriptional regulation.

Wednesday, September 10, 2008

Transcription Initiation: Can It Get Any More Complex?

Source: Revyakin et al. (2006). Abortive Initiation and Productive Initiation by RNA Polymerase Involve DNA Scrunching. Science 314:1139-1143.

Transcription initiation involves a number of steps:
1. Attachment to the promoter (closed complex: RPc).
2. Unwinding the DNA ~1 turn to form the open complex (RPo).
3. Synthesis and release of short RNAs (RPitc).
4. Promoter escape and elongation.

The most complex step is RPitc, in which short RNAs (~8-11 nt) are being transcribed even though, as determined by footprinting assays, RNA polymerase is not moving. Three models have been proposed to explain this behavior:
1. Scrunching: DNA is contracted inward.
2. Inchworming: RNA pol is conformationally expanded.
3. Transient excursions: RNA pol moves back and forth with long intervals.

If the first model is true, the DNA is being unwound at the RPitc stage. The authors built an experimental setup for detecting this unwinding (see below), and they use it to show that the scrunching model is correct.

Red rover, red rover, send your genes on over!

Source: Coop, G., Wen, X., Ober, C., Pritchard, J.K., and Przeworski, M. High-resolution mapping of crossovers reveals extensive variation in fine-scale recombination patterns among humans. 2008. Science, 319: 1395-8

The authors looked at recombination events and found differences between males and females in both the number and, to a lesser extent, the location of crossovers. They studied a small population of humans, using a chip to follow SNP haplotypes in nuclear families with multiple children. Hotspots account for most crossover events in both sexes, though some hotspots seem to be used more by one sex than the other. Heritability of hotspot usage was not high, but it was significantly different from zero.

Other notes:
-mothers with higher recombination rates have slightly more offspring
-viable offspring of older mothers tend to have higher recombination rates
-recombination rates go up with gene density
-recombination rates reduce near genes, highest at a distance away from start of genes


Recombination rate vs Distance from Transcriptional Start Site

Tuesday, September 9, 2008

How come nothing in science is ever "normal"?

Source: Callister, S.J., Barry, R.C., Adkins, J.N., Johnson, E.T., Qian, W., Webb-Robertson, B-J.M., Smith, R.D., and Lipton, M.S. Normalization approaches for removing systematic biases associated with mass spectrometry and label-free proteomics. 2006. Journal of Proteome Research, 5(2):277-86

The authors examine four methods of normalization to remove variability: central tendency (centers around a mean; targets bias independent of magnitude), linear regression (centers around a least-squares line; targets bias linearly dependent on magnitude), local regression (linear regression on a subset), and quantile (sets all samples to the same distribution). Before normalization, the datasets (a standard protein compilation, Deinococcus radiodurans, and brain tissue from methamphetamine-dosed mice) did not overlap, but all forms of normalization removed large amounts of variation and resulted in overlap between samples. No one form of normalization was consistently best at removing variation; all but local regression were found to be best for some sets. Linear regression seems a good method to start with, though quantile resulted in the largest percent reduction in pooled variation across all replicates. In terms of coefficient of variation (also known as RSD), central tendency consistently gave the largest improvement. Quantile is most likely the best, though, since it does not assume that the mean peptide ratio is equal to zero (which would be true if all peptides were measured, but due to the nature of MS is false).

Other notes:
-all data first log transformed to make more symmetric
-all data plotted in an M vs A plot (minus vs average)
=“minus” : ratio, m_i = log2(x_{i,1} / x_{i,2})
=“average” : intensity, a_i = log2(x_{i,1} * x_{i,2}) / 2
=x_{i,j} is abundance of peptide i in sample j
-Central tendency normalization: normalized relative abundance ratio m'_i = m_i - μ, where μ is the arithmetic mean of the population of peptide abundance ratios
-Linear regression normalization: applying least squares regression to the scatter plot: m'_i = m_i - m*_i, where m*_i is the predicted peptide ratio calculated from the regression equation
-Local regression normalization: linear regression on certain areas, mainly exterior where abundance approaches saturation or background
-Quantile normalization: assumes distribution of abundances is expected to be similar
1) assign each sample to a column, each compound to a row
2) index each peptide abundance in column
3) sort column by peptide abundance
4) replace each abundance within a row by that row’s mean
5) restore original order by index from (2)
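The notes above translate fairly directly into code. Below is a minimal pure-Python sketch of the M/A transform and of central tendency, linear regression, and quantile normalization as described in these notes. Function names and the toy abundances are my own; local regression is omitted, and the paper's implementation may differ in detail (handling of ties or missing peptides, for instance).

```python
import math


def ma_values(sample1, sample2):
    """Per-peptide 'minus' (log2 ratio) and 'average' (mean log2 intensity)."""
    m = [math.log2(x1 / x2) for x1, x2 in zip(sample1, sample2)]
    a = [math.log2(x1 * x2) / 2 for x1, x2 in zip(sample1, sample2)]
    return m, a


def central_tendency(m):
    """Subtract the arithmetic mean of the ratios from every ratio."""
    mu = sum(m) / len(m)
    return [mi - mu for mi in m]


def linear_regression(m, a):
    """Subtract the least-squares prediction of m from a (the residuals)."""
    n = len(m)
    a_bar, m_bar = sum(a) / n, sum(m) / n
    sxy = sum((ai - a_bar) * (mi - m_bar) for ai, mi in zip(a, m))
    sxx = sum((ai - a_bar) ** 2 for ai in a)
    slope = sxy / sxx
    intercept = m_bar - slope * a_bar
    return [mi - (intercept + slope * ai) for mi, ai in zip(m, a)]


def quantile_normalize(columns):
    """columns: one list of abundances per sample, same peptide order."""
    n_rows = len(columns[0])
    # Steps 2-3: remember each value's original row, then sort each column.
    orders = [sorted(range(n_rows), key=col.__getitem__) for col in columns]
    sorted_cols = [[col[i] for i in order] for col, order in zip(columns, orders)]
    # Step 4: mean across samples at each rank.
    rank_means = [
        sum(col[r] for col in sorted_cols) / len(columns) for r in range(n_rows)
    ]
    # Step 5: put the rank means back in the original row order.
    result = [[0.0] * n_rows for _ in columns]
    for out, order in zip(result, orders):
        for rank, original_row in enumerate(order):
            out[original_row] = rank_means[rank]
    return result


# Toy abundances for three peptides in two samples (made up).
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
print(normalized)
```

After quantile normalization both samples contain exactly the same set of values (here 1.5, 3.5, and 5.5), only assigned to different peptides, which is what "sets all samples to the same distribution" means.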


Replicate peptide levels for (left) un-normalized, (center) quantile normalized, and (right) central tendency normalized data.

Monday, September 8, 2008

Extrinsic Stochastic Variations in Gene Expression

Source: Volfson et al. (2006). Origins of extrinsic variability in eukaryotic gene expression. Nature 439:861-864.

Stochastic variation in gene expression across a clonal population has been observed. These variations are classified as either intrinsic or extrinsic. Intrinsic phenomena result from the inherent noise in the regulatory elements that control a given gene, whereas extrinsic variations are caused by more global or environmental stochastic processes.

To address extrinsic variation, the authors make yeast strains with 1, 2 ... 5 gal'-GFP copies. Then they use FACS to measure the expression of GFP in each cell. They show that, on average, GFP expression normalized by copy number is similar in all strains, denoting that the system is not saturated. They then hypothesize that intrinsic and extrinsic variation can be distinguished because intrinsic variation affects each copy independently, whereas extrinsic variation affects all the copies simultaneously. In other words, if the variation is completely intrinsic, then the standard deviation divided by the mean is inversely proportional to the square root of the copy number, while if all the variation is caused by extrinsic factors, this value should be independent of copy number.


The Gal system used by the authors largely falls into the second category. The authors then go on to model and validate their observations, which falls outside the scope of this summary; I encourage those who are interested to read the original paper.
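The copy-number argument can be checked with a toy simulation of the two limiting cases. In the "intrinsic" case each gene copy fluctuates independently; in the "extrinsic" case one shared factor scales every copy in a cell. The Gaussian noise model, the noise level of 0.3, and the cell count are arbitrary assumptions for illustration and are not the model used in the paper.

```python
import random
import statistics


def simulate_cv(copies, mode, cells=20000, seed=0):
    """Standard deviation / mean of total expression across simulated cells."""
    rng = random.Random(seed)
    totals = []
    for _ in range(cells):
        if mode == "intrinsic":
            # Every copy draws its own independent expression level.
            total = sum(rng.gauss(1.0, 0.3) for _ in range(copies))
        else:
            # One cell-wide factor multiplies the output of all copies.
            total = copies * rng.gauss(1.0, 0.3)
        totals.append(total)
    return statistics.pstdev(totals) / statistics.mean(totals)


for n in (1, 2, 4):
    print(n, round(simulate_cv(n, "intrinsic"), 3), round(simulate_cv(n, "extrinsic"), 3))
```

In the intrinsic case the ratio falls roughly as one over the square root of the copy number (going from 1 to 4 copies halves it), while in the extrinsic case it stays flat.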

Thursday, September 4, 2008

miRNAs as Global Regulators: How Global Are We Talking Here?

Source: Selbach et al. (2008). Widespread changes in protein synthesis induced by microRNAs. Nature 455:58-63.

miRNAs comprise a major set of post-transcriptional regulators with profound effects on different cellular processes through gene expression regulation. This paper is the first of its kind to monitor the large-scale effects of deregulated miRNA expression. Despite their apparent importance, we know little about the depth of regulation by miRNAs. Are there any miRNA master-regulators? How many genes, on average, are regulated by these small RNAs?

In this paper, the authors use state-of-the-art technologies to detect and compare protein levels in the absence, presence, or over-expression of certain miRNAs. In their setup, they start by transfecting miRNAs into HeLa cells. 8 hr post-transfection, they label the transfected cells with heavy isotopes of amino acids while using medium-heavy isotopes for control samples. They combine the samples and, by comparing the heavy to medium-heavy ratio in the mass spectra, comment on the abundance of the proteins.


They first show that the mRNAs downregulated in the presence of excess miRNA are enriched for sites matching the miRNA seed. They subsequently make predictions about which mRNAs are directly targeted by each miRNA.
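A seed-enrichment check of this kind can be sketched in a few lines: derive the site complementary to miRNA nucleotides 2-8, then compare how often it occurs in the 3' UTRs of downregulated versus unchanged genes. The let-7a sequence below is the commonly listed mature sequence; the UTR strings are made up, and the paper's actual statistics are not reproduced.

```python
# RNA base -> complementary base, written in the DNA alphabet.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def seed_match(mirna):
    """Site (5'->3', DNA alphabet) complementary to miRNA nucleotides 2-8."""
    seed = mirna[1:8]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def fraction_with_site(utrs, site):
    """Fraction of UTR sequences containing the site at least once."""
    return sum(site in utr for utr in utrs) / len(utrs)


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
site = seed_match(let7a)

# Hypothetical 3' UTR fragments (illustration only).
downregulated = ["AAGCTACCTCAGG", "TTCTACCTCTT", "GGGAAACCC"]
unchanged = ["ACGTACGTACGT", "TTTTCTACCTCA", "GGCCGGCC", "ATATATAT"]

print(site)
print(fraction_with_site(downregulated, site), fraction_with_site(unchanged, site))
```

A higher fraction among the downregulated genes than among the unchanged ones is the pattern that suggests direct targeting. A real analysis would also need a significance test and a background model.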

The take-home message from this paper is that miRNAs affect a large spectrum of proteins... far more than we imagined before. For example, the authors show that let-7 regulates the expression of thousands of proteins in the cell.