Jump to content

MA plot

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 37.186.40.99 (talk) at 20:36, 11 December 2020 (Added a line referencing a new interative tool to plot MA and select/filter genes interactively.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

An MA plot is an application of a Bland–Altman plot for visual representation of genomic data. The plot visualizes the differences between measurements taken in two samples, by transforming the data onto M (log ratio) and A (mean average) scales, then plotting these values. Though originally applied in the context of two channel DNA microarray gene expression data, MA plots are also used to visualise high-throughput sequencing analysis.[1][2]

Explanation

Microarray data is often normalized within arrays to control for systematic biases in dye coupling and hybridization efficiencies, as well as other technical biases in the DNA probes and the print tip used to spot the array.[3] By minimizing these systematic variations, true biological differences can be found. To determine whether normalization is needed, one can plot Cy5 (R) intensities against Cy3 (G) intensities and see whether the slope of the line is around 1. An improved method, which is basically a scaled, 45 degree rotation of the R vs. G plot is an MA-plot.[4] The MA-plot is a plot of the distribution of the red/green intensity ratio ('M') plotted by the average intensity ('A'). M and A are defined by the following equations.

M is, therefore, the binary logarithm of the intensity ratio (or difference between log intensities) and A is the average log intensity for a dot in the plot. MA plots are then used to visualize intensity-dependent ratio of raw microarray data (microarrays typically show a bias here, with higher A resulting in higher |M|, i.e. the brighter the spot the more likely an observed difference between sample and control). The MA plot puts the variable M on the y-axis and A on the x-axis and gives a quick overview of the distribution of the data.

In many microarray gene expression experiments, an underlying assumption is that most of the genes would not see any change in their expression; therefore, the majority of the points on the y-axis (M) would be located at 0, since log(1) is 0. If this is not the case, then a normalization method such as LOESS should be applied to the data before statistical analysis. (On the diagram below see the red line running below the zero mark before normalization, it should be straight. Since it is not straight, the data should be normalized. After being normalized, the red line is straight on the zero line and shows as pink/black.)

Packages

Several Bioconductor packages, for the R software, provide the facility for creating MA plots. These include affy (ma.plot, mva.pairs), limma (plotMA), marray (maPlot), and edgeR(maPlot)

Similar "RA" plots can be generated using the raPlot function in the caroline CRAN R package.

An interactive MA plot to filter genes by M, A and p-values, search by names or with a lasso, and save selected genes, is available as an R-Shiny code Enhanced-MA-Plot.

Example in the R programming language

library(affy)

if (require(affydata)) 
{
     data(Dilution)
}

y <- (exprs(Dilution)[, c("20B", "10A")])

x11()

ma.plot( rowMeans(log2(y)), log2(y[, 1])-log2(y[, 2]), cex=1 )

title("Dilutions Dataset (array 20B v 10A)")

library(preprocessCore)

#do a quantile normalization
x <- normalize.quantiles(y)

x11()

ma.plot( rowMeans(log2(x)), log2(x[, 1])-log2(x[, 2]), cex=1 ) 
title("Post Norm: Dilutions Dataset (array 20B v 10A)")
MA Plot for raw data, note of the distortion in the upper ranges.
Pre Normalization
MA Plot for quantile normalized data.
Post Normalization
MA Plots.

See also

References

  1. ^ Robinson, M. D.; McCarthy, D. J.; Smyth, G. K. (11 November 2009). "edgeR: a Bioconductor package for differential expression analysis of digital gene expression data". Bioinformatics. 26 (1): 139–140. doi:10.1093/bioinformatics/btp616. PMC 2796818. PMID 19910308.
  2. ^ Love, Michael I; Huber, Wolfgang; Anders, Simon (5 December 2014). "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2". Genome Biology. 15 (12): 550. doi:10.1186/s13059-014-0550-8. PMC 4302049. PMID 25516281.{{cite journal}}: CS1 maint: unflagged free DOI (link)
  3. ^ YH Yang, S Dudoit, P Luu, DM Lin, V Peng, J Ngai, TP Speed. (2002). Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research vol. 30 (4) pp. e15.
  4. ^ Dudoit, S, Yang, YH, Callow, MJ, Speed, TP. (2002). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Stat. Sin. 12:1 111–139