Tuesday, September 9, 2008

How come nothing in science is ever "normal"?

Source: Callister, S.J., Barry, R.C., Adkins, J.N., Johnson, E.T., Qian, W., Webb-Robertson, B-J.M., Smith, R.D., and Lipton, M.S. Normalization approaches for removing systematic biases associated with mass spectrometry and label-free proteomics. 2006. Journal of Proteome Research, 5(2):277-86

The authors examine four normalization methods for removing systematic variability: central tendency (centers the data around a mean; targets bias that is independent of magnitude), linear regression (centers the data around a least-squares line; targets bias that depends linearly on magnitude), local regression (regression fitted to subsets of the data), and quantile (sets all samples to the same distribution). Before normalization, the samples in the datasets (standard protein compilation, Deinococcus radiodurans, and brain tissue from methamphetamine-dosed mice) did not overlap, but every form of normalization removed a large amount of variation and brought the samples into overlap. No single form of normalization was consistently best at removing variation; every method except local regression was best for at least one dataset. Linear regression seems a good method to start with, though quantile gave the largest percent reduction in pooled variation across all replicates. In terms of coefficient of variation (CV, also known as RSD), central tendency consistently gave the largest improvement. Quantile is most likely the best choice, though, since it does not assume that the mean (log) peptide ratio is zero, an assumption that would hold if every peptide were measured but, given the nature of MS, does not.
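For reference, the CV (RSD) mentioned above is just the standard deviation divided by the mean, usually quoted as a percentage. Here is a minimal sketch of computing it per peptide across replicates. The function name `cv_percent` and the rows-are-peptides layout are my own assumptions for illustration, and I am not claiming this is exactly how the paper computed its numbers.

```python
import numpy as np

def cv_percent(replicates):
    """Per-peptide coefficient of variation (RSD) in percent.

    Assumed layout (hypothetical): rows = peptides, columns = replicate runs.
    Uses the sample standard deviation (ddof=1).
    """
    replicates = np.asarray(replicates, dtype=float)
    return 100 * replicates.std(axis=1, ddof=1) / replicates.mean(axis=1)

# Toy numbers, not data from the paper.
print(cv_percent([[10, 10, 10], [1, 2, 3]]))
```

A lower CV after normalization means the replicates agree more closely for that peptide.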

• Other notes:
-all data first log transformed to make more symmetric
-all data plotted in an M vs A plot (minus vs average)
=“minus”: ratio, m_i = log2(x_{i,1} / x_{i,2})
=“average”: intensity, a_i = log2(x_{i,1} * x_{i,2}) / 2
=x_{i,j} is the abundance of peptide i in sample j
-Central tendency normalization: normalized relative abundance ratio m'_i = m_i - μ, where μ is the arithmetic mean of the population of peptide abundance ratios
-Linear regression normalization: apply least-squares regression to the M vs A scatter plot: m'_i = m_i - m*_i, where m*_i is the predicted peptide ratio calculated from the regression equation
-Local regression normalization: regression fitted to local regions of the plot rather than one global line; matters mainly at the extremes, where abundance approaches saturation or background
-Quantile normalization: assumes the distribution of peptide abundances is similar across samples
1) assign each sample to a column, each peptide to a row
2) index each peptide abundance within its column
3) sort each column by peptide abundance
4) replace each abundance within a row of the sorted table by that row’s mean
5) restore each column's original order using the index from (2)
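
To make the notes above concrete, here is a rough NumPy sketch of the M vs A transform, central tendency, linear regression, and the five quantile steps. The function names are mine, not the paper's. I am also assuming there are no missing abundances and that ties are broken by position, neither of which the notes address. Local regression is left out. Treat it as an illustration of the formulas, not the authors' implementation.

```python
import numpy as np

def ma_transform(x1, x2):
    """M ("minus") and A ("average") values for two samples of peptide abundances."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    m = np.log2(x1 / x2)        # m_i = log2(x_{i,1} / x_{i,2})
    a = np.log2(x1 * x2) / 2    # a_i = log2(x_{i,1} * x_{i,2}) / 2
    return m, a

def central_tendency(m):
    """m'_i = m_i - mu, with mu the arithmetic mean of the ratios."""
    return m - m.mean()

def linear_regression(m, a):
    """m'_i = m_i - m*_i, with m*_i predicted from a least-squares line of m on a."""
    slope, intercept = np.polyfit(a, m, 1)
    return m - (slope * a + intercept)

def quantile_normalize(table):
    """Quantile normalization; rows = peptides, columns = samples (step 1).

    Assumes no missing values; ties are broken by position (stable sort).
    """
    table = np.asarray(table, dtype=float)
    order = np.argsort(table, axis=0, kind="stable")        # steps 2-3: index and sort each column
    sorted_cols = np.take_along_axis(table, order, axis=0)
    row_means = sorted_cols.mean(axis=1)                    # step 4: mean of each sorted row
    out = np.empty_like(table)
    np.put_along_axis(out, order, row_means[:, None], axis=0)  # step 5: restore original order
    return out

# Toy numbers, not data from the paper.
m, a = ma_transform([8, 16, 2, 4], [2, 4, 4, 1])
print(central_tendency(m))
print(linear_regression(m, a))
print(quantile_normalize([[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]]))
```

After quantile normalization every column holds the same set of values, just in a different order, which is what "sets all samples to the same distribution" means in practice.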


Replicate peptide levels for (left) un-normalized, (center) quantile normalized, and (right) central tendency normalized data.
