= Spike-in controls =

Spike-in controls or spike-ins are known quantities of molecules—such as oligonucleotide sequences (RNA, DNA), proteins, or metabolites—added to a biological sample for more accurate quantitative estimation of the molecule of interest across samples and batches. Spike-ins are particularly used in high-throughput sequencing assays, where they act as an internal reference to monitor and normalize technical and biological biases introduced during sample processing such as library preparation, handling, and measurement.

Spike-ins can adjust for specific technical biases and enable accurate estimation of the endogenous molecules of interest, resulting in improved data quality and standardization across different samples or experiments. Spike-ins can be synthetic or exogenous material (not originally part of the sample). In sequencing-based assays, exogenous material is typically derived from the genome of a different species such as Drosophila melanogaster or Arabidopsis thaliana.

== Design ==
Spike-ins are added early in the experimental workflow, often during or immediately after sample lysis or extraction and before sequencing. Once added, they are subjected to the same experimental steps and potential biases as the native molecules in the sample. The choice of spike-in, its design, and the subsequent analysis should therefore account for as many sources of experimental variation as possible. Ideally, spike-ins closely resemble the input material containing the epitopes of interest while remaining clearly distinguishable from the native molecules. Because the initial amount of each spike-in molecule is known, its measured quantity at the end of the experiment reflects the cumulative effects of technical factors such as extraction efficiency, enzymatic reaction efficiencies (e.g., reverse transcription, ligation, amplification), sample loss, and measurement sensitivity.

In sequencing assays, spike-ins can further be combined with unique molecular identifiers to increase sensitivity and specificity.
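
One common use of unique molecular identifiers (UMIs) is to collapse PCR duplicates so that each original molecule is counted once. The following is a minimal sketch of that idea, assuming reads have already been assigned to a spike-in and their UMI extracted; the function name, the spike-in identifiers, and the input layout are hypothetical, and the sketch does not correct for sequencing errors within UMIs.

```python
from collections import defaultdict


def count_unique_molecules(reads):
    """Count distinct UMIs per spike-in.

    `reads` is an iterable of (spikein_id, umi) pairs (hypothetical layout).
    Reads sharing both spike-in and UMI are treated as PCR duplicates.
    """
    umis_by_spikein = defaultdict(set)
    for spikein_id, umi in reads:
        umis_by_spikein[spikein_id].add(umi)
    # The number of distinct UMIs approximates the number of original molecules.
    return {spikein_id: len(umis) for spikein_id, umis in umis_by_spikein.items()}
```

For example, three reads of one spike-in carrying only two distinct UMIs would be counted as two molecules rather than three.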

== Analysis ==
The information obtained from spike-ins is typically used after initial bioinformatics analyses have been carried out. The final output of these analyses is the absolute count of each spike-in control in each library. Spike-in normalization or calibration methods then use these counts as a baseline to adjust the primary signal of interest.

=== Spike-in normalization ===
The choice of normalization method can significantly influence the conclusions drawn from an experiment. The first spike-in normalization method, known as reference-adjusted reads per million (RRPM), used a scaling factor determined from the number of reads aligned to the exogenous genome. Subsequent methods modified RRPM to also consider the counts of spike-in reads derived from the input sample. A common approach is to determine the ratio between the observed and expected spike-in read counts, or simply to calculate the total spike-in reads per sample. These values are then used to derive sample-specific scaling factors. For instance, if a sample yields fewer spike-in reads than expected, or fewer than another sample normalized to the same input, its endogenous gene counts are scaled upwards, under the assumption that the lower spike-in recovery reflects a global technical loss for that sample.
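
The scaling-factor idea can be illustrated with a short sketch. It assumes an RRPM-style factor that is inversely proportional to the number of spike-in reads in each sample (here, one million divided by the spike-in read count); the function names and the dictionary layout are hypothetical, and published methods differ in their exact definitions.

```python
def spikein_scaling_factors(spikein_reads, per=1_000_000):
    """Return one scaling factor per sample, inversely proportional to its
    spike-in read count (assumed RRPM-style definition)."""
    factors = {}
    for sample, n_reads in spikein_reads.items():
        if n_reads <= 0:
            raise ValueError(f"sample {sample!r} has no spike-in reads")
        factors[sample] = per / n_reads
    return factors


def apply_scaling(counts, factors):
    """Multiply every endogenous count of a sample by that sample's factor."""
    return {
        sample: {feature: value * factors[sample] for feature, value in features.items()}
        for sample, features in counts.items()
    }
```

Under this definition, a sample with half as many spike-in reads as another receives a factor twice as large, so its endogenous counts are scaled upwards relative to the other sample.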

More sophisticated methods may use regression analysis or factor analysis across multiple spike-ins added at various concentrations to model the relationship between input amount and sequencing output, aiming for a more robust estimate of technical bias.
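
As an illustration of the regression approach, the sketch below fits a straight line to log2-transformed spike-in data by ordinary least squares and inverts it to estimate the input amount of an endogenous molecule. The log-log linear model, the function names, and the use of plain least squares are assumptions made for illustration; real methods may use other models or factor analysis.

```python
import math


def fit_log_linear(expected, observed):
    """Fit log2(observed) = slope * log2(expected) + intercept by least squares.

    `expected` holds the known input amounts of the spike-ins and `observed`
    their read counts; pairs with non-positive values are skipped.
    """
    pairs = [
        (math.log2(e), math.log2(o))
        for e, o in zip(expected, observed)
        if e > 0 and o > 0
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two spike-ins with positive values")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("spike-ins must span more than one input amount")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_input(observed_count, slope, intercept):
    """Invert the fitted line to estimate input amount from an observed count."""
    if slope == 0:
        raise ValueError("a zero slope cannot be inverted")
    return 2 ** ((math.log2(observed_count) - intercept) / slope)
```

A slope close to 1 indicates that read counts scale proportionally with input amount across the concentration range covered by the spike-ins, while a marked deviation suggests a concentration-dependent bias.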

== Applications ==
Several types of spike-in controls are used depending on the application:

- RNA spike-ins: Commonly used in gene expression studies such as RNA-Seq and microarray analysis. Synthetic RNA molecules of defined sequence and length are added, often in predefined mixtures covering a wide concentration range. A well-known example is the set developed by the External RNA Controls Consortium (ERCC).
- DNA spike-ins: Used in genomics applications such as ChIP-Seq (Chromatin Immunoprecipitation Sequencing), DNA methylation analysis (e.g., bisulfite sequencing), or other genomic assays. These can be synthetic DNA fragments or genomic DNA from an unrelated species (e.g., adding fly DNA to human samples for ChIP-Seq).
- Peptide and metabolite spike-ins: Less commonly used. In proteomics and metabolomics, known amounts of stable isotope-labeled synthetic peptides (e.g., AQUA peptides) or metabolites, purified proteins, endogenous metabolites, or non-endogenous small molecules are added for quantification and normalization.

== See also ==
- DNA sequencing
