Cellular deconvolution: Difference between revisions

'''Cellular deconvolution''' (also referred to as [[cell type]] composition or cell proportion estimation) refers to computational techniques aiming at estimating the proportions of different cell types in samples collected from a [[Tissue (biology)|tissue]].<ref name="Cobos_2020">{{cite journal | vauthors = Cobos FA, Alquicira-Hernandez J, Powell JE, Mestdagh P, De Preter K | title = Benchmarking of cell type deconvolution pipelines for transcriptomics data | journal = Nature Communications | volume = 11 | issue = 1 | pages = 5650 | date = November 2020 | pmid = 33159064 | doi = 10.1038/s41467-020-19015-1 | pmc = 7648640 | bibcode = 2020NatCo..11.5650A }}</ref> For example, samples collected from the human brain are a mixture of various [[Neuron|neuronal]] and [[Glia|glial]] cell types (e.g. [[microglia]] and [[Astrocyte|astrocytes]]) in different proportions, where each cell type has a diverse [[Gene expression profiling|gene expression profile]].<ref name="Patrick_2020">{{cite journal | vauthors = Patrick E, Taga M, Ergun A, Ng B, Casazza W, Cimpean M, Yung C, Schneider JA, Bennett DA, Gaiteri C, De Jager PL, Bradshaw EM, Mostafavi S | display-authors = 6 | title = Deconvolving the contributions of cell-type heterogeneity on cortical gene expression | journal = PLOS Computational Biology | volume = 16 | issue = 8 | pages = e1008120 | date = August 2020 | pmid = 32804935 | pmc = 7451979 | doi = 10.1371/journal.pcbi.1008120 | bibcode = 2020PLSCB..16E8120P }}</ref> Since most [[High throughput biology|high-throughput technologies]] use bulk samples and measure the aggregated levels of molecular information (e.g. 
[[Gene expression|expression levels of genes]]) for all cells in a sample, the measured values would be an aggregate of the values pertaining to the expression landscape of different cell types.<ref>{{cite journal | vauthors = Kuhn A, Kumar A, Beilina A, Dillman A, Cookson MR, Singleton AB | title = Cell population-specific expression analysis of human cerebellum | journal = BMC Genomics | volume = 13 | issue = 1 | pages = 610 | date = November 2012 | pmid = 23145530 | doi = 10.1186/1471-2164-13-610 | pmc = 3561119 }}</ref> Therefore, many downstream analyses such as [[Differential gene expression analysis|differential gene expression]] might be confounded by the variations in cell type proportions when using the output of high-throughput technologies applied to bulk samples.<ref name="Cobos_2020" /> The development of statistical methods to identify cell type proportions in large-scale bulk samples is an important step for better understanding of the relationship between cell type composition and diseases.<ref>{{cite journal | vauthors = Avila Cobos F, Vandesompele J, Mestdagh P, De Preter K | title = Computational deconvolution of transcriptomics data from mixed cell populations | journal = Bioinformatics | volume = 34 | issue = 11 | pages = 1969–1979 | date = June 2018 | pmid = 29351586 | doi = 10.1093/bioinformatics/bty019 }}</ref>


Cellular deconvolution algorithms have been applied to a variety of samples collected from [[saliva]],<ref name="Zheng_2018">{{cite journal | vauthors = Zheng SC, Webster AP, Dong D, Feber A, Graham DG, Sullivan R, Jevons S, Lovat LB, Beck S, Widschwendter M, Teschendorff AE | display-authors = 6 | title = A novel cell-type deconvolution algorithm reveals substantial contamination by immune cells in saliva, buccal and cervix | journal = Epigenomics | volume = 10 | issue = 7 | pages = 925–940 | date = July 2018 | pmid = 29693419 | doi = 10.2217/epi-2018-0037 }}</ref> [[Buccal exostosis|buccal]],<ref name="Zheng_2018" /> [[Cervix|cervical]],<ref name="Zheng_2018" /> [[PBMC]],<ref>{{cite journal | vauthors = Chiu YJ, Hsieh YH, Huang YH | title = Improved cell composition deconvolution method of bulk gene expression profiles to quantify subsets of immune cells | journal = BMC Medical Genomics | volume = 12 | issue = Suppl 8 | pages = 169 | date = December 2019 | pmid = 31856824 | pmc = 6923925 | doi = 10.1186/s12920-019-0613-5 }}</ref> [[brain]],<ref name="Patrick_2020" /> [[kidney]],<ref name="Cobos_2020" /> and [[Pancreas|pancreatic]] cells.<ref name="Cobos_2020" /> Many studies have shown that estimating the proportions of cell types and incorporating them into downstream analyses improves the interpretability of high-throughput omics data and reduces the [[Confounding|confounding effects]] of cellular [[Cellular heterogeneity|heterogeneity]] in functional analysis of [[omics]] data.<ref name = "Donovan_2020">{{cite journal | vauthors = Donovan MK, D'Antonio-Chronowska A, D'Antonio M, Frazer KA | title = Cellular deconvolution of GTEx tissues powers discovery of disease and cell-type associated regulatory variants | journal = Nature Communications | volume = 11 | issue = 1 | pages = 955 | date = February 2020 | pmid = 32075962 | doi = 10.1038/s41467-020-14561-0 | pmc = 7031340 | bibcode = 2020NatCo..11..955D }}</ref><ref>{{cite journal | vauthors = Teschendorff AE, Zhu 
T, Breeze CE, Beck S | title = EPISCORE: cell type deconvolution of bulk tissue DNA methylomes from single-cell RNA-Seq data | journal = Genome Biology | volume = 21 | issue = 1 | pages = 221 | date = September 2020 | pmid = 32883324 | pmc = 7650528 | doi = 10.1186/s13059-020-02126-9 }}</ref>[[File:Cellular deconvolution workflow.jpg|thumb|450x450px|Cellular Deconvolution Pipeline. All methods require the gene expression or DNA methylation profiles of each subject in the study using high-throughput technologies and bulk samples.]]
== Mathematical Formulation ==
Most cellular deconvolution algorithms take input data in the form of a matrix <math>X_{m\times n}</math>, which represents some molecular information (e.g. gene expression data or [[DNA methylation]] data) measured over a group of <math>n</math> samples and <math>m</math> marks (e.g. genes or [[CpG site|CpG sites]]). The goal of the algorithm is to use these data and return an output matrix <math>W_{k\times n}</math>, representing the proportions of <math>k</math> distinct cell types in each of the <math>n</math> samples. Some methods constrain the sum of each column of <math>W</math> to be less than or equal to one, so that the estimated proportions account for at most all of the cells in the sample (the sum is less than one when the sample contains unknown cell types).<ref name="Houseman_2016">{{cite journal | vauthors = Houseman EA, Kile ML, Christiani DC, Ince TA, Kelsey KT, Marsit CJ | title = Reference-free deconvolution of DNA methylation data and mediation by cell composition effects | journal = BMC Bioinformatics | volume = 17 | issue = 1 | pages = 259 | date = June 2016 | pmid = 27358049 | pmc = 4928286 | doi = 10.1186/s12859-016-1140-4 }}</ref> Moreover, the values of <math>W</math> are assumed to be non-negative, as they pertain to proportions of cell types.<ref name="Houseman_2016" />
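
The dimensions and constraints above can be illustrated with a small sketch. It assumes a linear mixing model in which the bulk matrix is the product of a hypothetical <math>m\times k</math> matrix of cell-type profiles (called <code>H</code> here) and the proportion matrix <math>W</math>; all numbers and function names are invented for illustration.

```python
# Toy illustration of the matrix shapes and the constraints on W.
# H, W and every number below are made up; they do not come from real data.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def is_valid_proportion_matrix(w, tol=1e-9):
    """Check that all entries are >= 0 and every column sums to at most one."""
    non_negative = all(v >= 0 for row in w for v in row)
    column_sums = [sum(row[j] for row in w) for j in range(len(w[0]))]
    return non_negative and all(s <= 1 + tol for s in column_sums)

# H: m = 3 marks x k = 2 cell types (hypothetical cell-type profiles).
H = [[10.0, 1.0],
     [2.0, 8.0],
     [5.0, 5.0]]

# W: k = 2 cell types x n = 2 samples. The second column sums to 0.9,
# which leaves 0.1 for cell types that are not modelled.
W = [[0.75, 0.25],
     [0.25, 0.65]]

X = matmul(H, W)  # m x n bulk matrix under the assumed linear mixing model
print(X)
print(is_valid_proportion_matrix(W))
```

In practice only <code>X</code> is observed, and the algorithm has to recover <code>W</code> subject to the same two checks.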
Reference-based methods require an ''a priori'' defined reference matrix consisting of the expected value (also called profile or signature) of gene expression (or DNA methylation) for a group of genes (or [[CpG site|CpG sites]]) known to be differentially expressed (or methylated)
[[File:ReffreevsBased.png|thumb|450x450px|Reference-based methods and reference-free methods for cellular deconvolution. Reference-based approaches aim at estimating the contribution of each signature profile to the overall level of signal while reference-free methods need to estimate both latent cell type signatures and the contribution of each signature.<ref>{{cite journal | vauthors = Li Z, Wu H | title = TOAST: improving reference-free cell composition estimation by cross-cell type differential analysis | journal = Genome Biology | volume = 20 | issue = 1 | pages = 190 | date = September 2019 | pmid = 31484546 | pmc = 6727351 | doi = 10.1186/s13059-019-1778-0 }}</ref>]]
across the cell types.<ref name=":3" /> A reference matrix can be represented by a matrix <math>H_{m\times k}</math>, representing the expected value for <math>m</math> markers (genes or [[CpG site|CpG sites]]) for each of <math>k</math> cell types known to be present in the samples. These references can be derived by exploring external single-cell [[Single cell epigenomics|epigenomics]] or [[Single-cell transcriptomics|transcriptomics]] datasets generated for a group of samples similar (e.g. in terms of biological condition, sex and age) to the samples to which the deconvolution method will be applied. These methods use statistical approaches such as non-negative or [[Constrained optimization|constrained]] linear regression to dissect the contribution of each cell type to the aggregated bulk signals of genes or CpG sites.<ref name=":5">{{cite journal | vauthors = Titus AJ, Gallimore RM, Salas LA, Christensen BC | title = Cell-type deconvolution from DNA methylation: a review of recent applications | journal = Human Molecular Genetics | volume = 26 | issue = R2 | pages = R216–R224 | date = October 2017 | pmid = 28977446 | pmc = 5886462 | doi = 10.1093/hmg/ddx275 }}</ref> Constrained regression is the basis for many of the reference-based cellular deconvolution methods in the literature, which aim to estimate the cell proportion values (<math>W_{k\times n}</math>) that maximize the similarity between <math>HW</math> and <math>X</math>.<ref name=":5" />
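
As a rough sketch of the constrained regression idea, the following solves a non-negative least-squares problem for one sample by projected gradient descent. This is a generic textbook solver chosen for brevity, not the procedure of any specific published tool; the function name, the reference matrix and the bulk vectors are all hypothetical.

```python
# Non-negative least squares for one bulk sample:
# minimise 0.5 * ||H w - x||^2 subject to w >= 0.

def nnls_projected_gradient(H, x, n_iter=500):
    """Estimate non-negative cell-type weights w such that H w is close to x."""
    m, k = len(H), len(H[0])
    # Step size 1 / trace(H^T H); the trace is an upper bound on the largest
    # eigenvalue of H^T H, so this step size keeps the iteration stable.
    step = 1.0 / sum(H[i][j] ** 2 for i in range(m) for j in range(k))
    w = [1.0 / k] * k  # start from equal proportions
    for _ in range(n_iter):
        residual = [sum(H[i][j] * w[j] for j in range(k)) - x[i]
                    for i in range(m)]
        grad = [sum(H[i][j] * residual[i] for i in range(m))
                for j in range(k)]
        # Gradient step, then projection onto the non-negative orthant.
        w = [max(0.0, w[j] - step * grad[j]) for j in range(k)]
    return w

# Hypothetical reference: m = 4 markers x k = 2 cell types.
H_ref = [[10.0, 1.0],
         [8.0, 2.0],
         [1.0, 9.0],
         [2.0, 7.0]]

# Noise-free bulk profile generated with proportions 0.7 and 0.3.
x_bulk = [7.3, 6.2, 3.4, 3.5]

w_hat = nnls_projected_gradient(H_ref, x_bulk)
print([round(v, 4) for v in w_hat])
```

Applying the function to each column of <math>X</math> in turn gives one column of <math>W</math> per sample. A sum constraint, if required, would need an additional projection or a rescaling step that this sketch omits.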


==== Construction of reference profiles ====
There are a variety of approaches for isolating different cell types to measure their gene expression or DNA methylation levels to be used as references in the deconvolution algorithms. Earlier methods used cell sorting methods such as FACS (fluorescence-activated cell sorting) based on the flow cytometry technique, which separates the populations of cells belonging to different cell types based on their cell sizes, morphologies (shape), and surface protein expressions.<ref>{{cite journal | vauthors = Rosental B, Kozhekbaeva Z, Fernhoff N, Tsai JM, Traylor-Knowles N | title = Coral cell separation and isolation by fluorescence-activated cell sorting (FACS) | journal = BMC Cell Biology | volume = 18 | issue = 1 | pages = 30 | date = August 2017 | pmid = 28851289 | pmc = 5575905 | doi = 10.1186/s12860-017-0146-8 }}</ref><ref>{{cite journal | vauthors = Reinius LE, Acevedo N, Joerink M, Pershagen G, Dahlén SE, Greco D, Söderhäll C, Scheynius A, Kere J | display-authors = 6 | title = Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility | journal = PLOS ONE | volume = 7 | issue = 7 | pages = e41361 | date = 2012-07-25 | pmid = 22848472 | pmc = 3405143 | doi = 10.1371/journal.pone.0041361 | bibcode = 2012PLoSO...741361R }}</ref><ref>{{cite journal | vauthors = Koestler DC, Jones MJ, Usset J, Christensen BC, Butler RA, Kobor MS, Wiencke JK, Kelsey KT | display-authors = 6 | title = Improving cell mixture deconvolution by identifying optimal DNA methylation libraries (IDOL) | journal = BMC Bioinformatics | volume = 17 | issue = 1 | pages = 120 | date = March 2016 | pmid = 26956433 | pmc = 4782368 | doi = 10.1186/s12859-016-0943-7 }}</ref> With the advance in single-cell technologies, newer approaches started to incorporate references for cell-types measured on a single-cell resolution obtained for a subset of subjects in the study or external subjects from a similar biological condition.<ref name = 
"Wang_2019">{{cite journal | vauthors = Wang X, Park J, Susztak K, Zhang NR, Li M | title = Bulk tissue cell type deconvolution with multi-subject single-cell expression reference | journal = Nature Communications | volume = 10 | issue = 1 | pages = 380 | date = January 2019 | pmid = 30670690 | doi = 10.1038/s41467-018-08023-x | pmc = 6342984 | bibcode = 2019NatCo..10..380W }}</ref><ref name="Cobos_2020" /><ref name = "Jew_2020">{{cite journal | vauthors = Jew B, Alvarez M, Rahmani E, Miao Z, Ko A, Garske KM, Sul JH, Pietiläinen KH, Pajukanta P, Halperin E | display-authors = 6 | title = Accurate estimation of cell composition in bulk expression through robust integration of single-cell information | journal = Nature Communications | volume = 11 | issue = 1 | pages = 1971 | date = April 2020 | pmid = 32332754 | doi = 10.1038/s41467-020-15816-6 | pmc = 7181686 | bibcode = 2020NatCo..11.1971J }}</ref>

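One simple way to turn labelled single-cell measurements into a reference matrix is to average the profiles of all cells that share a cell-type label. The sketch below assumes exactly that; published methods may weight cells or subjects differently, and the function name, labels and values are hypothetical.

```python
from collections import defaultdict

def build_reference(cells, labels):
    """Average per-gene values over cells that share a cell-type label.

    cells  : list of per-cell expression profiles (one list of genes per cell)
    labels : cell-type label for each cell
    Returns (cell_types, H), where H[g][c] is the mean of gene g in type c.
    """
    sums = {}
    counts = defaultdict(int)
    for profile, label in zip(cells, labels):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
        for g, value in enumerate(profile):
            sums[label][g] += value
        counts[label] += 1
    cell_types = sorted(sums)
    n_genes = len(cells[0])
    H = [[sums[c][g] / counts[c] for c in cell_types] for g in range(n_genes)]
    return cell_types, H

# Four made-up cells measured on three genes.
toy_cells = [[4.0, 0.0, 2.0],
             [6.0, 2.0, 2.0],
             [0.0, 8.0, 1.0],
             [2.0, 10.0, 3.0]]
toy_labels = ["astrocyte", "astrocyte", "neuron", "neuron"]

cell_types, H_toy = build_reference(toy_cells, toy_labels)
print(cell_types)
print(H_toy)
```

The result has one row per gene and one column per cell type, which matches the shape of a reference matrix with <math>m</math> markers and <math>k</math> cell types.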

=== Reference-free methods ===
Reference-free methods do not need the reference profiles of cell-type-specific genes (or CpGs), although they might still require the identity (name) of cell-type-specific genes (or CpGs).<ref>{{cite journal | vauthors = Tang D, Park S, Zhao H | title = NITUMID: Nonnegative matrix factorization-based Immune-TUmor MIcroenvironment Deconvolution | journal = Bioinformatics | volume = 36 | issue = 5 | pages = 1344–1350 | date = March 2020 | pmid = 31593244 | doi = 10.1093/bioinformatics/btz748 }}</ref> These methods might be considered as a modification of reference-based methods where '''both''' <math>H</math> and <math>W</math> are unknown, and the goal is to jointly estimate both matrices so that the similarity between <math>HW</math> and <math>X</math> is maximized. Many of the reference-free methods are based on the mathematical framework of [[non-negative matrix factorization]],<ref>{{cite journal | vauthors = Repsilber D, Kern S, Telaar A, Walzl G, Black GF, Selbig J, Parida SK, Kaufmann SH, Jacobsen M | display-authors = 6 | title = Biomarker discovery in heterogeneous tissue samples -taking the in-silico deconfounding approach | journal = BMC Bioinformatics | volume = 11 | issue = 1 | pages = 27 | date = January 2010 | pmid = 20070912 | doi = 10.1186/1471-2105-11-27 | pmc = 3098067 }}</ref><ref>{{cite journal | vauthors = Moffitt RA, Marayati R, Flate EL, Volmar KE, Loeza SG, Hoadley KA, Rashid NU, Williams LA, Eaton SC, Chung AH, Smyla JK, Anderson JM, Kim HJ, Bentrem DJ, Talamonti MS, Iacobuzio-Donahue CA, Hollingsworth MA, Yeh JJ | display-authors = 6 | title = Virtual microdissection identifies distinct tumor- and stroma-specific subtypes of pancreatic ductal adenocarcinoma | journal = Nature Genetics | volume = 47 | issue = 10 | pages = 1168–78 | date = October 2015 | pmid = 26343385 | doi = 10.1038/ng.3398 | pmc = 4912058 }}</ref><ref name=":3" /> which imposes a non-negativity constraint on the elements of <math>H</math> and <math>W</math>. 
Additional constraints such as the assumption of [[orthogonality]] between the columns of <math>H</math> might be incorporated to improve the interpretability of results and prevent [[overfitting]].
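
A minimal sketch of non-negative matrix factorization is given below, using the classic multiplicative update rules for the squared-error objective with <math>H</math> of size <math>m\times k</math> and <math>W</math> of size <math>k\times n</math>. It is a generic illustration rather than the algorithm of any tool cited here. The orthogonality constraint is not implemented, the final column normalization is a post-hoc assumption, and the function names and toy data are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(X, H, W):
    """Squared Frobenius distance between X and the product H W."""
    P = matmul(H, W)
    return sum((X[i][j] - P[i][j]) ** 2
               for i in range(len(X)) for j in range(len(X[0])))

def nmf(X, k, n_iter=300, seed=0, eps=1e-12):
    """Jointly estimate non-negative H (m x k) and W (k x n) with X close to H W."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    H = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(m)]
    W = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(k)]
    for _ in range(n_iter):
        # W <- W * (H^T X) / (H^T H W), element-wise
        Ht = transpose(H)
        num, den = matmul(Ht, X), matmul(matmul(Ht, H), W)
        W = [[W[c][j] * num[c][j] / (den[c][j] + eps) for j in range(n)]
             for c in range(k)]
        # H <- H * (X W^T) / (H W W^T), element-wise
        Wt = transpose(W)
        num, den = matmul(X, Wt), matmul(H, matmul(W, Wt))
        H = [[H[i][c] * num[i][c] / (den[i][c] + eps) for c in range(k)]
             for i in range(m)]
    return H, W

def normalize_columns(W):
    """Rescale each column to sum to one so it reads as proportions."""
    sums = [sum(row[j] for row in W) or 1.0 for j in range(len(W[0]))]
    return [[row[j] / sums[j] for j in range(len(row))] for row in W]

# Made-up bulk data: 4 marks x 3 samples, mixed from 2 hidden cell types.
H_true = [[10.0, 1.0], [8.0, 2.0], [1.0, 9.0], [2.0, 7.0]]
W_true = [[0.7, 0.5, 0.2], [0.3, 0.5, 0.8]]
X_toy = matmul(H_true, W_true)

H_est, W_est = nmf(X_toy, k=2)
proportions = normalize_columns(W_est)
print(frobenius_error(X_toy, H_est, W_est))
```

Because the factorization is only identifiable up to scaling and permutation of the cell types, the estimated rows of <code>W_est</code> need not appear in the same order or scale as <code>W_true</code>. This is one reason additional constraints or marker-gene information are often used.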


{| class="wikitable sortable mw-collapsible"
{| class="wikitable sortable mw-collapsible"
|Year
|-
|[https://cibersort.stanford.edu/ CIBERSORT]<ref>{{cite journal | vauthors = Newman AM, Steen CB, Liu CL, Gentles AJ, Chaudhuri AA, Scherer F, Khodadoust MS, Esfahani MS, Luca BA, Steiner D, Diehn M, Alizadeh AA | display-authors = 6 | title = Determining cell type abundance and expression from bulk tissues with digital cytometry | journal = Nature Biotechnology | volume = 37 | issue = 7 | pages = 773–782 | date = July 2019 | pmid = 31061481 | doi = 10.1038/s41587-019-0114-2 | pmc = 6610714 }}</ref>
|Robust enumeration of cell subsets from tissue expression profiles
|Reference based
|2018
|-
|[https://github.com/kkang7/CDSeq CDSeq]<ref>{{cite journal | vauthors = Kang K, Meng Q, Shats I, Umbach DM, Li M, Li Y, Li X, Li L | display-authors = 6 | title = CDSeq: A novel complete deconvolution method for dissecting heterogeneous samples using gene expression data | journal = PLOS Computational Biology | volume = 15 | issue = 12 | pages = e1007510 | date = December 2019 | pmid = 31790389 | pmc = 6907860 | doi = 10.1371/journal.pcbi.1007510 | bibcode = 2019PLSCB..15E7510K }}</ref>
|A complete deconvolution method for dissecting tissue heterogeneity
|Reference free
|2019
|-
|[https://github.com/YuningHao/FARDEEP.git FARDEEP]<ref name = "Has_2019">{{cite journal | vauthors = Hao Y, Yan M, Heath BR, Lei YL, Xie Y | title = Fast and robust deconvolution of tumor infiltrating lymphocyte from expression profiles using least trimmed squares | journal = PLOS Computational Biology | volume = 15 | issue = 5 | pages = e1006976 | date = May 2019 | pmid = 31059559 | pmc = 6522071 | doi = 10.1371/journal.pcbi.1006976 | bibcode = 2019PLSCB..15E6976H }}</ref>
|Fast and robust deconvolution of expression profiles
|Reference based
Line 58: Line 58:
|2019
|2019
|-
|-
|[http://epic.gfellerlab.org/ EPIC]<ref>{{cite journal | vauthors = Racle J, de Jonge K, Baumgaertner P, Speiser DE, Gfeller D | title = Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data | journal = eLife | volume = 6 | pages = e26476 | date = November 2017 | pmid = 29130882 | doi = 10.7554/eLife.26476 | veditors = Valencia A }}</ref>
|[http://epic.gfellerlab.org/ EPIC]<ref>{{cite journal | vauthors = Racle J, de Jonge K, Baumgaertner P, Speiser DE, Gfeller D | title = Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data | journal = eLife | volume = 6 | pages = e26476 | date = November 2017 | pmid = 29130882 | doi = 10.7554/eLife.26476 | pmc = 5718706 | veditors = Valencia A }}</ref>
|Estimating the proportions of different cell types from bulk gene expression data
|Estimating the proportions of different cell types from bulk gene expression data
|Reference based
|Reference based
Line 76: Line 76:
|2019
|2019
|-
|-
|[https://meichendong.github.io/SCDC/articles/SCDC.html SCDC]<ref>{{cite journal | vauthors = Dong M, Thennavan A, Urrutia E, Li Y, Perou CM, Zou F, Jiang Y | title = SCDC: bulk gene expression deconvolution by multiple single-cell RNA sequencing references | journal = Briefings in Bioinformatics | volume = 22 | issue = 1 | pages = 416–427 | date = January 2021 | pmid = 31925417 | doi = 10.1093/bib/bbz166 }}</ref>
|[https://meichendong.github.io/SCDC/articles/SCDC.html SCDC]<ref>{{cite journal | vauthors = Dong M, Thennavan A, Urrutia E, Li Y, Perou CM, Zou F, Jiang Y | title = SCDC: bulk gene expression deconvolution by multiple single-cell RNA sequencing references | journal = Briefings in Bioinformatics | volume = 22 | issue = 1 | pages = 416–427 | date = January 2021 | pmid = 31925417 | doi = 10.1093/bib/bbz166 | pmc = 7820884 }}</ref>
|Bulk gene expression deconvolution by multiple single-Cell RNA sequencing references
|Bulk gene expression deconvolution by multiple single-Cell RNA sequencing references
|Reference based
|Reference based
Line 82: Line 82:
|2020
|2020
|-
|-
|[https://github.com/dtsoucas/DWLS DWLS]<ref>{{cite journal | vauthors = Tsoucas D, Dong R, Chen H, Zhu Q, Guo G, Yuan GC | title = Accurate estimation of cell-type composition from gene expression data | journal = Nature Communications | volume = 10 | issue = 1 | pages = 2975 | date = July 2019 | pmid = 31278265 | doi = 10.1038/s41467-019-10802-z }}</ref>
|[https://github.com/dtsoucas/DWLS DWLS]<ref>{{cite journal | vauthors = Tsoucas D, Dong R, Chen H, Zhu Q, Guo G, Yuan GC | title = Accurate estimation of cell-type composition from gene expression data | journal = Nature Communications | volume = 10 | issue = 1 | pages = 2975 | date = July 2019 | pmid = 31278265 | doi = 10.1038/s41467-019-10802-z | pmc = 6611906 | bibcode = 2019NatCo..10.2975T }}</ref>
|Gene expression deconvolution using dampened weighted least squares
|Gene expression deconvolution using dampened weighted least squares
|Reference based
|Reference based
Line 100: Line 100:
|2020
|2020
|-
|-
|[https://bioconductor.org/packages/TOAST TOAST]<ref>{{cite journal | vauthors = Li Z, Wu Z, Jin P, Wu H | title = Dissecting differential signals in high-throughput data from complex tissues | journal = Bioinformatics | volume = 35 | issue = 20 | pages = 3898–3905 | date = October 2019 | pmid = 30903684 | doi = 10.1093/bioinformatics/btz196 }}</ref>
|[https://bioconductor.org/packages/TOAST TOAST]<ref>{{cite journal | vauthors = Li Z, Wu Z, Jin P, Wu H | title = Dissecting differential signals in high-throughput data from complex tissues | journal = Bioinformatics | volume = 35 | issue = 20 | pages = 3898–3905 | date = October 2019 | pmid = 30903684 | doi = 10.1093/bioinformatics/btz196 | pmc = 6931351 }}</ref>
|Tools for the analysis of heterogeneous tissues
|Tools for the analysis of heterogeneous tissues
|Reference free
|Reference free
Line 130: Line 130:


==== In silico cell-type level resolution ====
==== In silico cell-type level resolution ====
The advance of single-cell technologies enables the profiling of each individual cell in a sample, which help elucidate the issue of cellular heterogeneity by measuring the proportions of different cells in samples. Even though the quality of single cell profiling technologies has been on the rise in recent years, these technologies are still costly, limiting their applications in large populations of samples.<ref name="pmid30670690">{{cite journal | vauthors = Wang X, Park J, Susztak K, Zhang NR, Li M | title = Bulk tissue cell type deconvolution with multi-subject single-cell expression reference | journal = Nature Communications | volume = 10 | issue = 1 | pages = 380 | date = January 2019 | pmid = 30670690 | pmc = 6342984 | doi = 10.1038/s41467-018-08023-x }}</ref> Single cell technologies such as single cell transcriptomic methods also tend to have higher error rates due to factors such as high dropout events.<ref>{{cite journal | vauthors = Ran D, Zhang S, Lytal N, An L | title = scDoc: correcting drop-out events in single-cell RNA-seq data | journal = Bioinformatics | volume = 36 | issue = 15 | pages = 4233–4239 | date = August 2020 | pmid = 32365169 | doi = 10.1093/bioinformatics/btaa283 }}</ref><ref>{{cite journal | vauthors = Yamawaki TM, Lu DR, Ellwanger DC, Bhatt D, Manzanillo P, Arias V, Zhou H, Yoon OK, Homann O, Wang S, Li CM | display-authors = 6 | title = Systematic comparison of high-throughput single-cell RNA-seq methods for immune cell profiling | journal = BMC Genomics | volume = 22 | issue = 1 | pages = 66 | date = January 2021 | pmid = 33472597 | pmc = 7818754 | doi = 10.1186/s12864-020-07358-4 }}</ref> Cellular deconvolution methods provide a robust and cost-effective [[In silico|''in silico'']] alternatives for understanding the samples on a cell-type level resolution, by relying on single cell information of only a small subset of cells in the sample, the reference profiles generated by external sources, or even no reference profile at 
all.<ref name = "Wang_2020b">{{cite journal | vauthors = Wang J, Roeder K, Devlin B | title = Bayesian estimation of cell-type-specific gene expression per bulk sample with prior derived from single-cell data. | journal = BioRxiv | date = January 2020 | doi = 10.1101/2020.08.05.238949 }}</ref>
The advance of single-cell technologies enables the profiling of each individual cell in a sample, which help elucidate the issue of cellular heterogeneity by measuring the proportions of different cells in samples. Even though the quality of single cell profiling technologies has been on the rise in recent years, these technologies are still costly, limiting their applications in large populations of samples.<ref name="pmid30670690">{{cite journal | vauthors = Wang X, Park J, Susztak K, Zhang NR, Li M | title = Bulk tissue cell type deconvolution with multi-subject single-cell expression reference | journal = Nature Communications | volume = 10 | issue = 1 | pages = 380 | date = January 2019 | pmid = 30670690 | pmc = 6342984 | doi = 10.1038/s41467-018-08023-x | bibcode = 2019NatCo..10..380W }}</ref> Single cell technologies such as single cell transcriptomic methods also tend to have higher error rates due to factors such as high dropout events.<ref>{{cite journal | vauthors = Ran D, Zhang S, Lytal N, An L | title = scDoc: correcting drop-out events in single-cell RNA-seq data | journal = Bioinformatics | volume = 36 | issue = 15 | pages = 4233–4239 | date = August 2020 | pmid = 32365169 | doi = 10.1093/bioinformatics/btaa283 }}</ref><ref>{{cite journal | vauthors = Yamawaki TM, Lu DR, Ellwanger DC, Bhatt D, Manzanillo P, Arias V, Zhou H, Yoon OK, Homann O, Wang S, Li CM | display-authors = 6 | title = Systematic comparison of high-throughput single-cell RNA-seq methods for immune cell profiling | journal = BMC Genomics | volume = 22 | issue = 1 | pages = 66 | date = January 2021 | pmid = 33472597 | pmc = 7818754 | doi = 10.1186/s12864-020-07358-4 }}</ref> Cellular deconvolution methods provide a robust and cost-effective [[In silico|''in silico'']] alternatives for understanding the samples on a cell-type level resolution, by relying on single cell information of only a small subset of cells in the sample, the reference profiles generated by external sources, or 
even no reference profile at all.<ref name = "Wang_2020b">{{cite journal | vauthors = Wang J, Roeder K, Devlin B | title = Bayesian estimation of cell-type-specific gene expression per bulk sample with prior derived from single-cell data. | journal = bioRxiv | date = January 2020 | doi = 10.1101/2020.08.05.238949 | s2cid = 221096767 }}</ref>


==== (Re)analysis of old data ====
==== (Re)analysis of old data ====
Line 144: Line 144:
==== Lack of reference for rare, unknown, or uncharacterized cell types ====
==== Lack of reference for rare, unknown, or uncharacterized cell types ====


Reference-based approaches assume the existence of prior knowledge on the types of cells existing in a sample. Therefore, these methods may fail to perform accurately when the data includes rare or otherwise unknown cell types with no references incorporated in the algorithm.<ref name = "Donovan_2020" /> For example, cancer tumors consist of heterogeneous mixtures of various healthy cells of different types such as immune cells and cells related to affected tissues in addition to tumor cells.<ref name = "Has_2019" /> Although it might be possible to provide references for the immune cells, we do not usually have access to references or signatures for cancer cells due to the unique patterns of mutations and distributions of molecular information in each individual.<ref name=":3" /> These situations have been addressed in some studies under the label of deconvolution methods with partial reference availability.<ref>{{cite journal | vauthors = Qin Y, Zhang W, Sun X, Nan S, Wei N, Wu HJ, Zheng X | title = Deconvolution of heterogeneous tumor samples using partial reference signals | journal = PLoS Computational Biology | volume = 16 | issue = 11 | pages = e1008452 | date = November 2020 | pmid = 33253170 | pmc = 7728196 | doi = 10.1371/journal.pcbi.1008452 }}</ref>
Reference-based approaches assume the existence of prior knowledge on the types of cells existing in a sample. Therefore, these methods may fail to perform accurately when the data includes rare or otherwise unknown cell types with no references incorporated in the algorithm.<ref name = "Donovan_2020" /> For example, cancer tumors consist of heterogeneous mixtures of various healthy cells of different types such as immune cells and cells related to affected tissues in addition to tumor cells.<ref name = "Has_2019" /> Although it might be possible to provide references for the immune cells, we do not usually have access to references or signatures for cancer cells due to the unique patterns of mutations and distributions of molecular information in each individual.<ref name=":3" /> These situations have been addressed in some studies under the label of deconvolution methods with partial reference availability.<ref>{{cite journal | vauthors = Qin Y, Zhang W, Sun X, Nan S, Wei N, Wu HJ, Zheng X | title = Deconvolution of heterogeneous tumor samples using partial reference signals | journal = PLOS Computational Biology | volume = 16 | issue = 11 | pages = e1008452 | date = November 2020 | pmid = 33253170 | pmc = 7728196 | doi = 10.1371/journal.pcbi.1008452 | bibcode = 2020PLSCB..16E8452Q }}</ref>


== Applications ==
== Applications ==
Line 150: Line 150:
=== Relationship between cell proportions and phenotypes ===
=== Relationship between cell proportions and phenotypes ===
[[File:ConfoundingAlzS.png|thumb|440x440px|Confounding effect of cell proportions can leads to false associations between cortical gene expression and Alzheimer's disease clinical pathology.<ref name="Patrick_2020" />]]
[[File:ConfoundingAlzS.png|thumb|440x440px|Confounding effect of cell proportions can leads to false associations between cortical gene expression and Alzheimer's disease clinical pathology.<ref name="Patrick_2020" />]]
Studies have shown that the proportions of different cell types might show correlations with various phenotypes such as different diseases. For example, the proportions of Parathyroid oxyphil cells in the samples collected from the [[parathyroid gland]] for groups of patients show a significant correlation with the presence of clinical characteristics of [[chronic kidney disease]] (CKD).<ref>{{cite journal | vauthors = Ding Y, Zou Q, Jin Y, Zhou J, Wang H | title = Relationship between parathyroid oxyphil cell proportion and clinical characteristics of patients with chronic kidney disease | journal = International Urology and Nephrology | volume = 52 | issue = 1 | pages = 155–159 | date = January 2020 | pmid = 31686279 | doi = 10.1007/s11255-019-02330-y }}</ref> Another study applying the cellular deconvolution algorithms to gene expression data of Alzheimer’s patients find that patients with lower proportions of neuronal cells in the samples collected from their [[cerebral cortex]] are more likely to show the clinical characteristics of [[dementia]].<ref>{{cite journal | vauthors = Andrade-Moraes CH, Oliveira-Pinto AV, Castro-Fonseca E, da Silva CG, Guimarães DM, Szczupak D, Parente-Bruno DR, Carvalho LR, Polichiso L, Gomes BV, Oliveira LM, Rodriguez RD, Leite RE, Ferretti-Rebustini RE, Jacob-Filho W, Pasqualucci CA, Grinberg LT, Lent R | display-authors = 6 | title = Cell number changes in Alzheimer's disease relate to dementia, not to plaques and tangles | journal = Brain | volume = 136 | issue = Pt 12 | pages = 3738–52 | date = December 2013 | pmid = 24136825 | doi = 10.1093/brain/awt273 }}</ref> Cellular deconvolution algorithms could enable researchers to investigate the interactions between cell proportions and various diseases or biological phenotypes.
Studies have shown that the proportions of different cell types might show correlations with various phenotypes such as different diseases. For example, the proportions of Parathyroid oxyphil cells in the samples collected from the [[parathyroid gland]] for groups of patients show a significant correlation with the presence of clinical characteristics of [[chronic kidney disease]] (CKD).<ref>{{cite journal | vauthors = Ding Y, Zou Q, Jin Y, Zhou J, Wang H | title = Relationship between parathyroid oxyphil cell proportion and clinical characteristics of patients with chronic kidney disease | journal = International Urology and Nephrology | volume = 52 | issue = 1 | pages = 155–159 | date = January 2020 | pmid = 31686279 | doi = 10.1007/s11255-019-02330-y | s2cid = 207895174 }}</ref> Another study applying the cellular deconvolution algorithms to gene expression data of Alzheimer’s patients find that patients with lower proportions of neuronal cells in the samples collected from their [[cerebral cortex]] are more likely to show the clinical characteristics of [[dementia]].<ref>{{cite journal | vauthors = Andrade-Moraes CH, Oliveira-Pinto AV, Castro-Fonseca E, da Silva CG, Guimarães DM, Szczupak D, Parente-Bruno DR, Carvalho LR, Polichiso L, Gomes BV, Oliveira LM, Rodriguez RD, Leite RE, Ferretti-Rebustini RE, Jacob-Filho W, Pasqualucci CA, Grinberg LT, Lent R | display-authors = 6 | title = Cell number changes in Alzheimer's disease relate to dementia, not to plaques and tangles | journal = Brain | volume = 136 | issue = Pt 12 | pages = 3738–52 | date = December 2013 | pmid = 24136825 | doi = 10.1093/brain/awt273 | pmc = 3859218 }}</ref> Cellular deconvolution algorithms could enable researchers to investigate the interactions between cell proportions and various diseases or biological phenotypes.


=== Dissecting the confounding effects of cell proportions in EWAS and TWAS studies ===
=== Dissecting the confounding effects of cell proportions in EWAS and TWAS studies ===
[[Epigenome-wide association study (EWAS)]] and [[transcriptome-wide association studies (TWAS)]] aim at finding the molecular markers such as genes or methylation CpG sites that show significant correlations between their expression or methylation levels and a biological phenotype of interest such as a disease. Since the proportions of cell types in samples vary and might show significant correlations with the disease or phenotype of interest, these correlations may [[Confounding|confound]] the functional relationships between genes or CpG sites and the disease or phenotypes under study.<ref>{{cite journal | vauthors = Glastonbury CA, Couto Alves A, El-Sayed Moustafa JS, Small KS | title = Cell-Type Heterogeneity in Adipose Tissue Is Associated with Complex Traits and Reveals Disease-Relevant Cell-Specific eQTLs | journal = American Journal of Human Genetics | volume = 104 | issue = 6 | pages = 1013–1024 | date = June 2019 | pmid = 31130283 | doi = 10.1016/j.ajhg.2019.03.025 }}</ref> For example, studies aimed at finding genes involved in Alzheimer's disease may end up selecting genes that are exclusively expressed in neurons and therefore have lower expression levels in Alzheimer's patients due to compositional changes of cell types during neurodegeneration.<ref>{{cite journal | vauthors = Mathys H, Davila-Velderrain J, Peng Z, Gao F, Mohammadi S, Young JZ, Menon M, He L, Abdurrob F, Jiang X, Martorell AJ, Ransohoff RM, Hafler BP, Bennett DA, Kellis M, Tsai LH | display-authors = 6 | title = Single-cell transcriptomic analysis of Alzheimer's disease | journal = Nature | volume = 570 | issue = 7761 | pages = 332–337 | date = June 2019 | pmid = 31042697 | pmc = 6865822 | doi = 10.1038/s41586-019-1195-2 }}</ref> Such genes are not actionable targets for the treatment of Alzheimer's since they are not causally involved in the biological mechanism underlying Alzheimer's disease, but are only brought up by the confounding effects of cell types.
[[Epigenome-wide association study (EWAS)]] and [[transcriptome-wide association studies (TWAS)]] aim at finding the molecular markers such as genes or methylation CpG sites that show significant correlations between their expression or methylation levels and a biological phenotype of interest such as a disease. Since the proportions of cell types in samples vary and might show significant correlations with the disease or phenotype of interest, these correlations may [[Confounding|confound]] the functional relationships between genes or CpG sites and the disease or phenotypes under study.<ref>{{cite journal | vauthors = Glastonbury CA, Couto Alves A, El-Sayed Moustafa JS, Small KS | title = Cell-Type Heterogeneity in Adipose Tissue Is Associated with Complex Traits and Reveals Disease-Relevant Cell-Specific eQTLs | journal = American Journal of Human Genetics | volume = 104 | issue = 6 | pages = 1013–1024 | date = June 2019 | pmid = 31130283 | doi = 10.1016/j.ajhg.2019.03.025 | pmc = 6556877 }}</ref> For example, studies aimed at finding genes involved in Alzheimer's disease may end up selecting genes that are exclusively expressed in neurons and therefore have lower expression levels in Alzheimer's patients due to compositional changes of cell types during neurodegeneration.<ref>{{cite journal | vauthors = Mathys H, Davila-Velderrain J, Peng Z, Gao F, Mohammadi S, Young JZ, Menon M, He L, Abdurrob F, Jiang X, Martorell AJ, Ransohoff RM, Hafler BP, Bennett DA, Kellis M, Tsai LH | display-authors = 6 | title = Single-cell transcriptomic analysis of Alzheimer's disease | journal = Nature | volume = 570 | issue = 7761 | pages = 332–337 | date = June 2019 | pmid = 31042697 | pmc = 6865822 | doi = 10.1038/s41586-019-1195-2 | bibcode = 2019Natur.570..332M }}</ref> Such genes are not actionable targets for the treatment of Alzheimer's since they are not causally involved in the biological mechanism underlying Alzheimer's disease, but are only brought up by the confounding 
effects of cell types.


== References ==
== References ==

Revision as of 13:38, 13 March 2021

Cellular deconvolution (also referred to as cell type composition or cell proportion estimation) refers to computational techniques aiming at estimating the proportions of different cell types in samples collected from a tissue.[1] For example, samples collected from the human brain are a mixture of various neuronal and glial cell types (e.g. microglia and astrocytes) in different proportions, where each cell type has a diverse gene expression profile.[2] Since most high-throughput technologies use bulk samples and measure the aggregated levels of molecular information (e.g. expression levels of genes) for all cells in a sample, the measured values would be an aggregate of the values pertaining to the expression landscape of different cell types.[3] Therefore, many downstream analyses such as differential gene expression might be confounded by the variations in cell type proportions when using the output of high-throughput technologies applied to bulk samples.[1] The development of statistical methods to identify cell type proportions in large-scale bulk samples is an important step for better understanding of the relationship between cell type composition and diseases.[4]

Cellular deconvolution algorithms have been applied to a variety of samples collected from saliva,[5] buccal,[5] cervical,[5] PBMC,[6] brain,[2] kidney,[1] and pancreatic cells.[1] Many studies have shown that estimating the proportions of cell types and incorporating them into downstream analyses improves the interpretability of high-throughput omics data and reduces the confounding effects of cellular heterogeneity in functional analysis of omics data.[7][8]

Cellular Deconvolution Pipeline. All methods require the gene expression or DNA methylation profiles of each subject in the study using high-throughput technologies and bulk samples.

Mathematical Formulation

Most cellular deconvolution algorithms take as input a matrix $X \in \mathbb{R}^{m \times n}$, which represents some molecular information (e.g. gene expression data or DNA methylation data) measured over a group of $n$ samples and $m$ marks (e.g. genes or CpG sites). The goal of the algorithm is to use these data and return an output matrix $W \in \mathbb{R}^{k \times n}$, representing the proportions of $k$ distinct cell types in each of the $n$ samples. Some methods constrain the sum of each column of $W$ to be less than or equal to one, so that the proportions of cell types sum up to the total cell content of the sample (the sum is less than one when there are some unknown cell types in the samples).[9] Moreover, the values of $W$ are assumed to be non-negative, as they represent proportions of cell types.[9]
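
The shapes and constraints described above can be illustrated with a minimal Python sketch. It assumes a simple linear mixing model (bulk signal equals cell-type profiles multiplied by proportions), which the paragraph does not state explicitly; the names `signatures`, `proportions`, `bulk`, and `is_valid_proportion_matrix`, and all numbers, are hypothetical toy values chosen for illustration.

```python
import numpy as np

# Hypothetical toy sizes: m marks, n samples, k cell types.
m, n, k = 5, 3, 2
rng = np.random.default_rng(0)

# Assumed cell-type profiles (m x k); real profiles would come from data.
signatures = rng.uniform(1.0, 10.0, size=(m, k))

# Proportion matrix (k x n): one column per sample.
# The second and third columns sum to less than one, leaving room for
# cell types that are not represented among the k known types.
proportions = np.array([[0.7, 0.2, 0.5],
                        [0.3, 0.6, 0.4]])

# Assumed linear mixing: each bulk sample is a weighted sum of profiles.
bulk = signatures @ proportions  # m x n input matrix


def is_valid_proportion_matrix(w, tol=1e-9):
    """Check non-negativity and that every column sums to at most one."""
    non_negative = np.all(w >= -tol)
    bounded = np.all(w.sum(axis=0) <= 1.0 + tol)
    return bool(non_negative and bounded)


print(bulk.shape, is_valid_proportion_matrix(proportions))
```

A deconvolution method receives only `bulk` (and, for reference-based methods, the profiles) and must recover a matrix that passes the same validity check.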

Current strategies

There are two broad categories of methods for estimating the proportions of cell types in samples from omics data (bulk gene expression or DNA methylation data): reference-based (also called supervised) and reference-free (also called unsupervised) methods.[10][11]

Reference-based methods

Reference-based methods and reference-free methods for cellular deconvolution. Reference-based approaches aim at estimating the contribution of each signature profile to the overall level of signal while reference-free methods need to estimate both latent cell type signatures and the contribution of each signature.[12]

Reference-based methods require an a priori defined reference matrix consisting of the expected value (also called profile or signature) of gene expression (or DNA methylation) for a group of genes (or CpG sites) known to have differential expression (or methylation) across the cell types.[10] A reference matrix can be represented by a matrix $R \in \mathbb{R}^{m \times k}$, containing the expected values of $m$ markers (genes or CpG sites) for each of the $k$ cell types known to be present in the samples. These references can be derived by exploring external single-cell epigenomics or transcriptomics datasets generated for a group of samples similar (e.g. in terms of biological condition, sex and age) to the samples to which the deconvolution method will be applied. These methods use statistical approaches such as non-negative or constrained linear regression to dissect the contribution of each cell type to the aggregated bulk signals of genes or CpG sites.[13] Constrained regression is the basis for many of the reference-based cellular deconvolution methods in the literature, which aim to estimate the cell proportion values ($W$) that maximize the similarity between the bulk data matrix $X$ and the product $RW$.[13]
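
As a rough illustration of the constrained-regression idea, the following Python sketch estimates proportions by non-negative least squares, solved here with projected gradient descent. This is a generic textbook solver under stated assumptions, not the procedure of any specific published method; the function `estimate_proportions`, the toy `reference` matrix, and the noise-free `bulk` data are all hypothetical.

```python
import numpy as np


def estimate_proportions(bulk, reference, n_iter=500):
    """Non-negative least squares via projected gradient descent.

    bulk:      (markers x samples) bulk measurements
    reference: (markers x cell types) signature profiles
    returns:   (cell types x samples) non-negative weights
    """
    gram = reference.T @ reference
    target = reference.T @ bulk
    # Step size 1/L, where L is the largest eigenvalue of the Gram matrix.
    step = 1.0 / np.linalg.norm(gram, 2)
    n_types = reference.shape[1]
    weights = np.full((n_types, bulk.shape[1]), 1.0 / n_types)
    for _ in range(n_iter):
        gradient = gram @ weights - target
        # Gradient step followed by projection onto the non-negative orthant.
        weights = np.maximum(weights - step * gradient, 0.0)
    return weights


# Hypothetical reference: 4 markers, 2 cell types.
reference = np.array([[10.0, 1.0],
                      [8.0, 2.0],
                      [1.0, 9.0],
                      [2.0, 7.0]])
true_proportions = np.array([[0.7, 0.2, 0.5],
                             [0.3, 0.8, 0.5]])
bulk = reference @ true_proportions  # noise-free toy mixtures

estimated = estimate_proportions(bulk, reference)
print(np.round(estimated, 3))
```

With noise-free data and a well-conditioned reference, the estimate matches the simulated proportions; real data would add noise, marker selection, and often a sum-to-one constraint or normalization step.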

Construction of reference profiles

There are a variety of approaches for isolating different cell types to measure their gene expression or DNA methylation levels for use as references in deconvolution algorithms. Earlier methods used cell sorting techniques such as FACS (fluorescence-activated cell sorting), based on flow cytometry, which separate the populations of cells belonging to different cell types according to cell size, morphology (shape), and surface protein expression.[14][15][16] With the advance of single-cell technologies, newer approaches started to incorporate references for cell types measured at single-cell resolution, obtained from a subset of subjects in the study or from external subjects with a similar biological condition.[17][1][18]

Reference-free methods

Reference-free methods do not need the reference profiles of cell-type-specific genes (or CpGs), although they might still require the identity (name) of cell-type-specific genes (or CpGs).[19] These methods might be considered a modification of reference-based methods in which both the reference matrix $R$ and the proportion matrix $W$ are unknown, and the goal is to jointly estimate both matrices so that the similarity between the bulk data matrix $X$ and the product $RW$ is maximized. Many of the reference-free methods are based on the mathematical framework of non-negative matrix factorization,[20][21][10] which imposes a non-negativity constraint on the elements of $R$ and $W$. Additional constraints, such as the assumption of orthogonality between the columns of $R$, might be incorporated to improve the interpretability of results and prevent overfitting.
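
The joint estimation described above can be sketched with plain non-negative matrix factorization using the classical multiplicative update rules for the Frobenius objective. This is a generic illustration under stated assumptions, not the algorithm of any method listed in this article; the function `factorize`, the simulated data, and the final column normalization are hypothetical simplifications. In particular, the factorization is only identified up to scaling and permutation of cell types, so the normalized weights are relative quantities.

```python
import numpy as np


def factorize(bulk, n_types, n_iter=300, seed=0, eps=1e-12):
    """Plain NMF with multiplicative updates (Frobenius objective).

    Returns signatures (markers x types), weights (types x samples),
    and the reconstruction error before and after the updates.
    """
    rng = np.random.default_rng(seed)
    n_markers, n_samples = bulk.shape
    signatures = rng.uniform(0.1, 1.0, size=(n_markers, n_types))
    weights = rng.uniform(0.1, 1.0, size=(n_types, n_samples))
    initial_error = np.linalg.norm(bulk - signatures @ weights)
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so both
        # factors stay non-negative throughout.
        weights *= (signatures.T @ bulk) / (signatures.T @ signatures @ weights + eps)
        signatures *= (bulk @ weights.T) / (signatures @ weights @ weights.T + eps)
    final_error = np.linalg.norm(bulk - signatures @ weights)
    return signatures, weights, initial_error, final_error


# Simulated data: 20 markers, 8 samples, 3 latent cell types.
rng = np.random.default_rng(1)
true_signatures = rng.uniform(1.0, 10.0, size=(20, 3))
true_weights = rng.dirichlet(np.ones(3), size=8).T  # columns sum to one
bulk = true_signatures @ true_weights

signatures, weights, initial_error, final_error = factorize(bulk, n_types=3)

# Rescale each sample's weights to sum to one (a simplification).
proportions = weights / weights.sum(axis=0, keepdims=True)
print(initial_error > final_error, proportions.shape)
```

Unlike the reference-based case, the recovered components carry no cell-type labels; assigning them to known cell types requires additional information such as marker gene identities.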

{| class="wikitable"
|+ Some recent cellular deconvolution methods selected based on citations and publishing year.
! Method !! Title !! Category !! Input Data Type !! Year
|-
| CIBERSORT[22] || Robust enumeration of cell subsets from tissue expression profiles || Reference based || Gene expression || 2018
|-
| CDSeq[23] || A complete deconvolution method for dissecting tissue heterogeneity || Reference free || Gene expression || 2019
|-
| FARDEEP[24] || Fast and robust deconvolution of expression profiles || Reference based || Gene expression || 2019
|-
| UNDO[25] || Unsupervised deconvolution of tumor-stromal mixed expressions || Reference free || Gene expression || 2015
|-
| dtangle[26] || Accurate and robust cell type deconvolution || Reference based || Gene expression || 2019
|-
| EPIC[27] || Estimating the proportions of different cell types from bulk gene expression data || Reference based || Gene expression || 2017
|-
| BSEQ-sc[28] || Deconvolution of bulk sequencing experiments using single cell data || Reference based || Gene expression || 2016
|-
| MuSiC[17] || Cell-type Identification by estimating relative subsets of RNA transcripts || Reference based || Gene expression || 2019
|-
| SCDC[29] || Bulk gene expression deconvolution by multiple single-Cell RNA sequencing references || Reference based || Gene expression || 2020
|-
| DWLS[30] || Gene expression deconvolution using dampened weighted least squares || Reference based || Gene expression || 2019
|-
| deconvSeq[31] || Deconvolution of cell mixture distribution in sequencing data || Reference based || Gene expression || 2019
|-
| Bisque[18] || Decomposition of bulk expression with single-cell sequencing || Reference based || Gene expression || 2020
|-
| TOAST[32] || Tools for the analysis of heterogeneous tissues || Reference free || DNA methylation || 2019
|-
| Houseman[9] || Reference-free deconvolution of DNA methylation data and mediation by cell composition effects || Reference based || DNA methylation || 2016
|-
| methylCC[33] || Technology-independent estimation of cell type composition using differentially methylated regions || Reference based || DNA methylation || 2019
|-
| BayesCCE[34] || Bayesian framework for estimating cell-type composition from DNA methylation || Reference free || DNA methylation || 2018
|}

Advantages and limitations

Advantages

In silico cell-type level resolution

The advance of single-cell technologies enables the profiling of each individual cell in a sample, which helps address the issue of cellular heterogeneity by measuring the proportions of different cells in samples. Even though the quality of single-cell profiling technologies has been rising in recent years, these technologies are still costly, limiting their application to large populations of samples.[35] Single-cell technologies such as single-cell transcriptomic methods also tend to have higher error rates due to factors such as frequent dropout events.[36][37] Cellular deconvolution methods provide a robust and cost-effective in silico alternative for understanding samples at cell-type-level resolution, by relying on single-cell information from only a small subset of cells in the sample, on reference profiles generated by external sources, or even on no reference profile at all.[38]

(Re)analysis of old data

There are large amounts of old bulk data from studies concerning various diseases and biological conditions. These datasets are important resources for the study of rare diseases, for long follow-up studies, and for samples and tissues that are difficult to extract. Since the biological samples for many of these studies are no longer available or accessible, reprofiling them with single-cell technologies may not be possible. More advanced cellular deconvolution methods give researchers the opportunity to return to old omics studies, reanalyze their datasets, and scrutinize their findings.[38]

Limitations

Reliability of reference

Reference-based approaches rely on the availability of accurate references to estimate cell proportions. A discrepancy between the biology of the samples underlying the references and that of the samples for which the cell proportions are being estimated could introduce bias in the estimated cell proportions.[39] Studies have shown that using references obtained from samples whose phenotypes (such as age, gender, and disease status) differ from those of the population of interest reduces the performance of reference-based methods to levels lower than their reference-free counterparts.[10][39]

Lack of reference for rare, unknown, or uncharacterized cell types

Reference-based approaches assume the existence of prior knowledge on the types of cells existing in a sample. Therefore, these methods may fail to perform accurately when the data include rare or otherwise unknown cell types with no references incorporated in the algorithm.[7] For example, tumors consist of heterogeneous mixtures of tumor cells and various healthy cells of different types, such as immune cells and cells of the affected tissue.[24] Although it might be possible to provide references for the immune cells, references or signatures for cancer cells are usually unavailable because of the unique patterns of mutations and distributions of molecular information in each individual.[10] These situations have been addressed in some studies under the label of deconvolution methods with partial reference availability.[40]

Applications

Relationship between cell proportions and phenotypes

The confounding effect of cell proportions can lead to false associations between cortical gene expression and Alzheimer's disease clinical pathology.[2]

Studies have shown that the proportions of different cell types can correlate with various phenotypes, including diseases. For example, the proportion of parathyroid oxyphil cells in samples collected from the parathyroid gland of groups of patients shows a significant correlation with the clinical characteristics of chronic kidney disease (CKD).[41] Another study, which applied cellular deconvolution algorithms to gene expression data from Alzheimer's patients, found that patients with lower proportions of neuronal cells in samples collected from their cerebral cortex were more likely to show the clinical characteristics of dementia.[42] Cellular deconvolution algorithms could therefore enable researchers to investigate the interactions between cell proportions and various diseases or biological phenotypes.

Dissecting the confounding effects of cell proportions in EWAS and TWAS studies

Epigenome-wide association studies (EWAS) and transcriptome-wide association studies (TWAS) aim to find molecular markers, such as genes or CpG methylation sites, whose expression or methylation levels correlate significantly with a biological phenotype of interest, such as a disease. Because the proportions of cell types vary between samples and may themselves correlate with the disease or phenotype of interest, these correlations can confound the functional relationships between genes or CpG sites and the phenotype under study.[43] For example, studies that aim to find genes involved in Alzheimer's disease may end up selecting genes that are exclusively expressed in neurons and therefore have lower expression levels in Alzheimer's patients because of compositional changes in cell types during neurodegeneration.[44] Such genes are not actionable targets for the treatment of Alzheimer's disease, since they are not causally involved in its underlying biological mechanism and are highlighted only through the confounding effect of cell type proportions.
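
The confounding described above can be sketched with a small simulation. It assumes a single hypothetical neuron-specific gene whose bulk expression depends only on the neuronal proportion, with cases having fewer neurons than controls; all effect sizes are invented for illustration. Adjustment is done by correlating the residuals that remain after regressing both variables on the cell proportion (a partial correlation), which is one simple way to account for composition and not the procedure of any particular EWAS or TWAS tool.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, x):
    """Residuals of the simple linear regression y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
disease, neuron_prop, expression = [], [], []
for i in range(2000):
    d = i % 2  # alternate controls (0) and cases (1)
    # Cases have fewer neurons on average (hypothetical effect size).
    p = min(max(rng.gauss(0.50 - 0.15 * d, 0.05), 0.0), 1.0)
    # Neuron-specific gene: expression tracks the neuronal proportion only;
    # disease status has no direct effect on it.
    e = 10.0 * p + rng.gauss(0.0, 0.3)
    disease.append(float(d))
    neuron_prop.append(p)
    expression.append(e)

naive = pearson(expression, disease)
adjusted = pearson(residuals(expression, neuron_prop),
                   residuals(disease, neuron_prop))

print(f"correlation ignoring cell proportions:      {naive:.2f}")
print(f"correlation adjusted for cell proportions:  {adjusted:.2f}")
```

Under these assumptions the unadjusted analysis reports a strong negative association between the gene and the disease, while the adjusted correlation is close to zero, reflecting that the gene is associated with the disease only through the change in cell composition.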

References

  1. ^ a b c d e Cobos FA, Alquicira-Hernandez J, Powell JE, Mestdagh P, De Preter K (November 2020). "Benchmarking of cell type deconvolution pipelines for transcriptomics data". Nature Communications. 11 (1): 5650. Bibcode:2020NatCo..11.5650A. doi:10.1038/s41467-020-19015-1. PMC 7648640. PMID 33159064.
  2. ^ a b c Patrick E, Taga M, Ergun A, Ng B, Casazza W, Cimpean M, et al. (August 2020). "Deconvolving the contributions of cell-type heterogeneity on cortical gene expression". PLOS Computational Biology. 16 (8): e1008120. Bibcode:2020PLSCB..16E8120P. doi:10.1371/journal.pcbi.1008120. PMC 7451979. PMID 32804935.
  3. ^ Kuhn A, Kumar A, Beilina A, Dillman A, Cookson MR, Singleton AB (November 2012). "Cell population-specific expression analysis of human cerebellum". BMC Genomics. 13 (1): 610. doi:10.1186/1471-2164-13-610. PMC 3561119. PMID 23145530.
  4. ^ Avila Cobos F, Vandesompele J, Mestdagh P, De Preter K (June 2018). "Computational deconvolution of transcriptomics data from mixed cell populations". Bioinformatics. 34 (11): 1969–1979. doi:10.1093/bioinformatics/bty019. PMID 29351586.
  5. ^ a b c Zheng SC, Webster AP, Dong D, Feber A, Graham DG, Sullivan R, et al. (July 2018). "A novel cell-type deconvolution algorithm reveals substantial contamination by immune cells in saliva, buccal and cervix". Epigenomics. 10 (7): 925–940. doi:10.2217/epi-2018-0037. PMID 29693419.
  6. ^ Chiu YJ, Hsieh YH, Huang YH (December 2019). "Improved cell composition deconvolution method of bulk gene expression profiles to quantify subsets of immune cells". BMC Medical Genomics. 12 (Suppl 8): 169. doi:10.1186/s12920-019-0613-5. PMC 6923925. PMID 31856824.
  7. ^ a b Donovan MK, D'Antonio-Chronowska A, D'Antonio M, Frazer KA (February 2020). "Cellular deconvolution of GTEx tissues powers discovery of disease and cell-type associated regulatory variants". Nature Communications. 11 (1): 955. Bibcode:2020NatCo..11..955D. doi:10.1038/s41467-020-14561-0. PMC 7031340. PMID 32075962.
  8. ^ Teschendorff AE, Zhu T, Breeze CE, Beck S (September 2020). "EPISCORE: cell type deconvolution of bulk tissue DNA methylomes from single-cell RNA-Seq data". Genome Biology. 21 (1): 221. doi:10.1186/s13059-020-02126-9. PMC 7650528. PMID 32883324.
  9. ^ a b c Houseman EA, Kile ML, Christiani DC, Ince TA, Kelsey KT, Marsit CJ (June 2016). "Reference-free deconvolution of DNA methylation data and mediation by cell composition effects". BMC Bioinformatics. 17 (1): 259. doi:10.1186/s12859-016-1140-4. PMC 4928286. PMID 27358049.
  10. ^ a b c d e Teschendorff AE, Zheng SC (May 2017). "Cell-type deconvolution in epigenome-wide association studies: a review and recommendations". Epigenomics. 9 (5): 757–768. doi:10.2217/epi-2016-0153. PMID 28517979.
  11. ^ Sun X, Sun S, Yang S (September 2019). "An Efficient and Flexible Method for Deconvoluting Bulk RNA-Seq Data with Single-Cell RNA-Seq Data". Cells. 8 (10): 1161. doi:10.3390/cells8101161. PMC 6830085. PMID 31569701.
  12. ^ Li Z, Wu H (September 2019). "TOAST: improving reference-free cell composition estimation by cross-cell type differential analysis". Genome Biology. 20 (1): 190. doi:10.1186/s13059-019-1778-0. PMC 6727351. PMID 31484546.
  13. ^ a b Titus AJ, Gallimore RM, Salas LA, Christensen BC (October 2017). "Cell-type deconvolution from DNA methylation: a review of recent applications". Human Molecular Genetics. 26 (R2): R216–R224. doi:10.1093/hmg/ddx275. PMC 5886462. PMID 28977446.
  14. ^ Rosental B, Kozhekbaeva Z, Fernhoff N, Tsai JM, Traylor-Knowles N (August 2017). "Coral cell separation and isolation by fluorescence-activated cell sorting (FACS)". BMC Cell Biology. 18 (1): 30. doi:10.1186/s12860-017-0146-8. PMC 5575905. PMID 28851289.
  15. ^ Reinius LE, Acevedo N, Joerink M, Pershagen G, Dahlén SE, Greco D, et al. (2012-07-25). "Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility". PLOS ONE. 7 (7): e41361. Bibcode:2012PLoSO...741361R. doi:10.1371/journal.pone.0041361. PMC 3405143. PMID 22848472.
  16. ^ Koestler DC, Jones MJ, Usset J, Christensen BC, Butler RA, Kobor MS, et al. (March 2016). "Improving cell mixture deconvolution by identifying optimal DNA methylation libraries (IDOL)". BMC Bioinformatics. 17 (1): 120. doi:10.1186/s12859-016-0943-7. PMC 4782368. PMID 26956433.
  17. ^ a b Wang X, Park J, Susztak K, Zhang NR, Li M (January 2019). "Bulk tissue cell type deconvolution with multi-subject single-cell expression reference". Nature Communications. 10 (1): 380. Bibcode:2019NatCo..10..380W. doi:10.1038/s41467-018-08023-x. PMC 6342984. PMID 30670690.
  18. ^ a b Jew B, Alvarez M, Rahmani E, Miao Z, Ko A, Garske KM, et al. (April 2020). "Accurate estimation of cell composition in bulk expression through robust integration of single-cell information". Nature Communications. 11 (1): 1971. Bibcode:2020NatCo..11.1971J. doi:10.1038/s41467-020-15816-6. PMC 7181686. PMID 32332754.
  19. ^ Tang D, Park S, Zhao H (March 2020). "NITUMID: Nonnegative matrix factorization-based Immune-TUmor MIcroenvironment Deconvolution". Bioinformatics. 36 (5): 1344–1350. doi:10.1093/bioinformatics/btz748. PMID 31593244.
  20. ^ Repsilber D, Kern S, Telaar A, Walzl G, Black GF, Selbig J, et al. (January 2010). "Biomarker discovery in heterogeneous tissue samples -taking the in-silico deconfounding approach". BMC Bioinformatics. 11 (1): 27. doi:10.1186/1471-2105-11-27. PMC 3098067. PMID 20070912.
  21. ^ Moffitt RA, Marayati R, Flate EL, Volmar KE, Loeza SG, Hoadley KA, et al. (October 2015). "Virtual microdissection identifies distinct tumor- and stroma-specific subtypes of pancreatic ductal adenocarcinoma". Nature Genetics. 47 (10): 1168–78. doi:10.1038/ng.3398. PMC 4912058. PMID 26343385.
  22. ^ Newman AM, Steen CB, Liu CL, Gentles AJ, Chaudhuri AA, Scherer F, et al. (July 2019). "Determining cell type abundance and expression from bulk tissues with digital cytometry". Nature Biotechnology. 37 (7): 773–782. doi:10.1038/s41587-019-0114-2. PMC 6610714. PMID 31061481.
  23. ^ Kang K, Meng Q, Shats I, Umbach DM, Li M, Li Y, et al. (December 2019). "CDSeq: A novel complete deconvolution method for dissecting heterogeneous samples using gene expression data". PLOS Computational Biology. 15 (12): e1007510. Bibcode:2019PLSCB..15E7510K. doi:10.1371/journal.pcbi.1007510. PMC 6907860. PMID 31790389.
  24. ^ a b Hao Y, Yan M, Heath BR, Lei YL, Xie Y (May 2019). "Fast and robust deconvolution of tumor infiltrating lymphocyte from expression profiles using least trimmed squares". PLOS Computational Biology. 15 (5): e1006976. Bibcode:2019PLSCB..15E6976H. doi:10.1371/journal.pcbi.1006976. PMC 6522071. PMID 31059559.
  25. ^ Niya Wang <Wangny@Vt.Edu> (2017), UNDO, Bioconductor, doi:10.18129/b9.bioc.undo, retrieved 2021-02-23
  26. ^ Hunt GJ, Freytag S, Bahlo M, Gagnon-Bartsch JA (June 2019). "dtangle: accurate and robust cell type deconvolution". Bioinformatics. 35 (12): 2093–2099. doi:10.1093/bioinformatics/bty926. PMID 30407492.
  27. ^ Racle J, de Jonge K, Baumgaertner P, Speiser DE, Gfeller D (November 2017). Valencia A (ed.). "Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data". eLife. 6: e26476. doi:10.7554/eLife.26476. PMC 5718706. PMID 29130882.
  28. ^ Baron M, Veres A, Wolock SL, Faust AL, Gaujoux R, Vetere A, et al. (October 2016). "A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter- and Intra-cell Population Structure". Cell Systems. 3 (4): 346–360.e4. doi:10.1016/j.cels.2016.08.011. PMC 5228327. PMID 27667365.
  29. ^ Dong M, Thennavan A, Urrutia E, Li Y, Perou CM, Zou F, Jiang Y (January 2021). "SCDC: bulk gene expression deconvolution by multiple single-cell RNA sequencing references". Briefings in Bioinformatics. 22 (1): 416–427. doi:10.1093/bib/bbz166. PMC 7820884. PMID 31925417.
  30. ^ Tsoucas D, Dong R, Chen H, Zhu Q, Guo G, Yuan GC (July 2019). "Accurate estimation of cell-type composition from gene expression data". Nature Communications. 10 (1): 2975. Bibcode:2019NatCo..10.2975T. doi:10.1038/s41467-019-10802-z. PMC 6611906. PMID 31278265.
  31. ^ Du R, Carey V, Weiss ST (December 2019). "deconvSeq: deconvolution of cell mixture distribution in sequencing data". Bioinformatics. 35 (24): 5095–5102. doi:10.1093/bioinformatics/btz444. PMID 31147676.
  32. ^ Li Z, Wu Z, Jin P, Wu H (October 2019). "Dissecting differential signals in high-throughput data from complex tissues". Bioinformatics. 35 (20): 3898–3905. doi:10.1093/bioinformatics/btz196. PMC 6931351. PMID 30903684.
  33. ^ Hicks SC, Irizarry RA (November 2019). "methylCC: technology-independent estimation of cell type composition using differentially methylated regions". Genome Biology. 20 (1): 261. doi:10.1186/s13059-019-1827-8. PMC 6883691. PMID 31783894.
  34. ^ Rahmani E, Schweiger R, Shenhav L, Wingert T, Hofer I, Gabel E, et al. (September 2018). "BayesCCE: a Bayesian framework for estimating cell-type composition from DNA methylation without the need for methylation reference". Genome Biology. 19 (1): 141. doi:10.1186/s13059-018-1513-2. PMC 6151042. PMID 30241486.
  35. ^ Wang X, Park J, Susztak K, Zhang NR, Li M (January 2019). "Bulk tissue cell type deconvolution with multi-subject single-cell expression reference". Nature Communications. 10 (1): 380. Bibcode:2019NatCo..10..380W. doi:10.1038/s41467-018-08023-x. PMC 6342984. PMID 30670690.
  36. ^ Ran D, Zhang S, Lytal N, An L (August 2020). "scDoc: correcting drop-out events in single-cell RNA-seq data". Bioinformatics. 36 (15): 4233–4239. doi:10.1093/bioinformatics/btaa283. PMID 32365169.
  37. ^ Yamawaki TM, Lu DR, Ellwanger DC, Bhatt D, Manzanillo P, Arias V, et al. (January 2021). "Systematic comparison of high-throughput single-cell RNA-seq methods for immune cell profiling". BMC Genomics. 22 (1): 66. doi:10.1186/s12864-020-07358-4. PMC 7818754. PMID 33472597.
  38. ^ a b Wang J, Roeder K, Devlin B (January 2020). "Bayesian estimation of cell-type-specific gene expression per bulk sample with prior derived from single-cell data". bioRxiv. doi:10.1101/2020.08.05.238949. S2CID 221096767.
  39. ^ a b Gervin K, Salas LA, Bakulski KM, van Zelm MC, Koestler DC, Wiencke JK, et al. (August 2019). "Systematic evaluation and validation of reference and library selection methods for deconvolution of cord blood DNA methylation data". Clinical Epigenetics. 11 (1): 125. doi:10.1186/s13148-019-0717-y. PMC 6712867. PMID 31455416.
  40. ^ Qin Y, Zhang W, Sun X, Nan S, Wei N, Wu HJ, Zheng X (November 2020). "Deconvolution of heterogeneous tumor samples using partial reference signals". PLOS Computational Biology. 16 (11): e1008452. Bibcode:2020PLSCB..16E8452Q. doi:10.1371/journal.pcbi.1008452. PMC 7728196. PMID 33253170.
  41. ^ Ding Y, Zou Q, Jin Y, Zhou J, Wang H (January 2020). "Relationship between parathyroid oxyphil cell proportion and clinical characteristics of patients with chronic kidney disease". International Urology and Nephrology. 52 (1): 155–159. doi:10.1007/s11255-019-02330-y. PMID 31686279. S2CID 207895174.
  42. ^ Andrade-Moraes CH, Oliveira-Pinto AV, Castro-Fonseca E, da Silva CG, Guimarães DM, Szczupak D, et al. (December 2013). "Cell number changes in Alzheimer's disease relate to dementia, not to plaques and tangles". Brain. 136 (Pt 12): 3738–52. doi:10.1093/brain/awt273. PMC 3859218. PMID 24136825.
  43. ^ Glastonbury CA, Couto Alves A, El-Sayed Moustafa JS, Small KS (June 2019). "Cell-Type Heterogeneity in Adipose Tissue Is Associated with Complex Traits and Reveals Disease-Relevant Cell-Specific eQTLs". American Journal of Human Genetics. 104 (6): 1013–1024. doi:10.1016/j.ajhg.2019.03.025. PMC 6556877. PMID 31130283.
  44. ^ Mathys H, Davila-Velderrain J, Peng Z, Gao F, Mohammadi S, Young JZ, et al. (June 2019). "Single-cell transcriptomic analysis of Alzheimer's disease". Nature. 570 (7761): 332–337. Bibcode:2019Natur.570..332M. doi:10.1038/s41586-019-1195-2. PMC 6865822. PMID 31042697.