# Polygenic score

An illustration of the distribution and stratification ability of a polygenic risk score. The left panel shows how, in predictions of disease risk, the PRS on the x-axis can separate cases (i.e. people with the disease) from controls (people without the disease). The y-axis shows how many individuals in each group are assigned a given PRS. In the right panel, the same population is divided into three groups according to predicted risk, i.e. their assigned PRS. The observed risk is shown on the y-axis, and the separation of the groups corresponds to the predicted risks.

In genetics, a polygenic score, also called a polygenic risk score (PRS), genetic risk score, or genome-wide score, is a number that summarises the estimated effect of many genetic variants on an individual's phenotype, typically calculated as a weighted sum of trait-associated alleles.[1][2][3] It reflects an individual's estimated genetic predisposition for a given trait and can be used as a predictor for that trait.[4][5][6][7][8] In other words, it gives an estimate of how likely an individual is to have a given trait only based on genetics, without taking environmental factors into account. Polygenic scores are widely used in animal breeding and plant breeding (usually termed genomic prediction or genomic selection) due to their efficacy in improving livestock breeding and crops.[9]

Recent progress in machine learning (ML) analysis of large genomic datasets has enabled the creation of polygenic predictors of complex human traits, including risk for many important complex diseases,[10][11] which are typically affected by many genetic variants that each confer a small effect on overall risk.[12][13] In a polygenic risk predictor the lifetime (or age-range) risk for the disease is a numerical function (Polygenic Risk Score or PRS) which depends on the states of thousands of individual genetic variants (i.e., Single Nucleotide Polymorphisms, or SNPs).

Polygenic risk scores are an area of intense scientific investigation: hundreds of papers are written each year on topics such as learning algorithms for genomic prediction, new predictor training, validation testing of predictors, and clinical application of PRS.[14][15][16][6][11] In 2018 the American Heart Association named polygenic risk scores as one of the major breakthroughs in research in heart disease and stroke.[17]

## History

An early (2006) example of a genetic risk score applied to Type 2 Diabetes in humans. Individuals with Type 2 diabetes (white bars) have a higher score than controls (black bars).[18]

One of the first precursors to the modern polygenic score was proposed under the term marker-assisted selection (MAS) in 1990.[19] According to MAS, breeders are able to increase the efficiency of artificial selection by estimating the regression coefficients of genetic markers that are correlated with differences in the trait of interest and assigning individual animals a "score" from this information. A major development of these fundamentals was proposed in 2001 by researchers who discovered that the use of a Bayesian prior could help to mitigate the problem of the number of markers being greater than the sample of animals.[20]

These methods were first applied to humans in the late 2000s, starting with a proposal in 2007 that these scores could be used in human genetics to identify individuals at high risk for disease.[21] This was successfully applied in empirical research for the first time in 2009 by researchers who organized a genome-wide association study (GWAS) of schizophrenia to construct scores of risk propensity. This study was also the first to use the term polygenic score for a prediction drawn from a linear combination of single-nucleotide polymorphism (SNP) genotypes, which was able to explain 3% of the variance in schizophrenia.[22]

## Methods of construction

A polygenic score (PGS) is constructed from the "weights" derived from a genome-wide association study (GWAS), or from some form of machine learning algorithm. In a GWAS, a set of genetic markers (usually SNPs) is genotyped on a training sample, and effect sizes are estimated for each marker's association with the trait of interest. These weights are then used to assign individualized polygenic scores in an independent replication sample.[1] The estimated score, ${\displaystyle {\hat {S}}}$, generally follows the form

${\displaystyle {\hat {S}}=\sum _{j=1}^{m}X_{j}{\hat {\beta }}_{j}}$,

where the ${\displaystyle {\hat {S}}}$ of an individual is equal to the weighted sum of the individual's marker genotypes, ${\displaystyle X_{j}}$, at ${\displaystyle {m}}$ SNPs.[1] Weights are estimated using some form of regression analysis. Because the number of genomic variants is usually larger than the sample size, one cannot use OLS multiple regression (p > n problem[23][24]). Researchers have proposed various methodologies that deal with this problem as well as how to generate the weights of the SNPs, ${\displaystyle {\hat {\beta }}_{j}}$, and how to determine which ${\displaystyle {m}}$ SNPs should be included.
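As a minimal sketch of the weighted sum above, the following computes a score for one individual. The genotype coding (effect-allele counts of 0, 1 or 2) and all numeric values are illustrative assumptions, not data from any study.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Return S = sum_j X_j * beta_j over the m SNPs supplied."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same SNPs")
    return sum(x * b for x, b in zip(dosages, weights))


# Hypothetical individual: effect-allele counts (0, 1 or 2) at m = 4 SNPs.
genotype = [0, 1, 2, 1]
# Hypothetical per-allele effect estimates (beta-hat) from a training sample.
beta_hat = [0.10, -0.05, 0.20, 0.00]

score = polygenic_score(genotype, beta_hat)
print(round(score, 3))
```

In practice the same weights are applied to every individual in the replication sample, so the score is only meaningful relative to the distribution of scores in that sample.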

### Pruning and thresholding

The simplest so-called "pruning and thresholding" method of construction sets weights equal to the coefficient estimates from a regression of the trait on each genetic variant. The included SNPs may be selected using an algorithm that attempts to ensure that each marker is approximately independent. Failing to account for non-random association of genetic variants will typically reduce the score's predictive accuracy. This is important because genetic variants are often correlated with other nearby variants, such that the weight of a causal variant will be attenuated if it is more strongly correlated with its neighbors than a null variant. This is called linkage disequilibrium, a common phenomenon that arises from the shared evolutionary history of neighboring genetic variants. Further restriction can be achieved by testing multiple sets of SNPs selected at various thresholds, such as all SNPs which are genome-wide statistically significant hits (often taken to be p < 5 × 10⁻⁸), all SNPs with p < 0.05, or all SNPs with p < 0.50; the set with the best performance is then used for further analysis. Especially for highly polygenic traits, the best polygenic score will tend to use most or all SNPs.[25]
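The selection step described above can be sketched as a greedy procedure: keep SNPs that pass a p-value threshold, most significant first, and skip any SNP that is in linkage disequilibrium with one already kept. This is a simplified illustration. The SNP identifiers, the pairwise r² table and the r² cut-off of 0.1 are hypothetical, and real tools also restrict comparisons to a physical window along the chromosome.

```python
from typing import Dict, List, Tuple


def prune_and_threshold(
    p_values: Dict[str, float],
    r2: Dict[Tuple[str, str], float],
    p_threshold: float,
    r2_threshold: float = 0.1,
) -> List[str]:
    """Greedily select approximately independent SNPs below a p-value threshold."""
    # Thresholding: keep only SNPs below the cut-off, most significant first.
    candidates = sorted(
        (snp for snp, p in p_values.items() if p < p_threshold),
        key=lambda snp: p_values[snp],
    )
    kept: List[str] = []
    for snp in candidates:
        # Pruning: drop a SNP correlated with any SNP already selected.
        # Pairs missing from the table are treated as uncorrelated.
        in_ld = any(
            r2.get((snp, k), r2.get((k, snp), 0.0)) > r2_threshold for k in kept
        )
        if not in_ld:
            kept.append(snp)
    return kept


# Hypothetical association p-values and one pair of SNPs in strong LD.
p_values = {"rs1": 1e-9, "rs2": 3e-8, "rs3": 0.01, "rs4": 0.30}
r2 = {("rs1", "rs2"): 0.8}

strict = prune_and_threshold(p_values, r2, p_threshold=5e-8)
lenient = prune_and_threshold(p_values, r2, p_threshold=0.05)
print(strict, lenient)
```

Running the selection at several thresholds and comparing the resulting scores in a validation sample corresponds to the threshold search described in the text.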

### Bayesian methods

Bayesian approaches, originally pioneered in concept in 2001,[20] attempt to explicitly model preexisting genetic architecture, thereby accounting for the distribution of effect sizes with a prior that should improve the accuracy of a polygenic score. One of the most popular modern Bayesian methods uses "linkage disequilibrium prediction" (LDpred for short) to set the weight for each SNP equal to the average of its posterior distribution after linkage disequilibrium has been accounted for. LDpred tends to outperform simpler methods of pruning and thresholding, especially at large sample sizes; for example, its estimations have improved the predicted variance of a polygenic score for schizophrenia in a large data set from 20.1% to 25.3%.[8]
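To illustrate how a prior shrinks the weights, the sketch below uses the simplest textbook case: a Gaussian prior on every SNP effect and no linkage disequilibrium between markers. Under those assumptions, with standardized genotypes and phenotype, the posterior mean is the marginal estimate multiplied by a constant factor h² / (h² + M/N), where h² is the heritability captured by the markers, M the number of SNPs and N the training sample size. This is not the LDpred algorithm itself, which additionally models LD and allows a prior in which only a fraction of SNPs are causal; all numbers are hypothetical.

```python
from typing import List, Sequence


def posterior_mean_no_ld(
    beta_marginal: Sequence[float],
    heritability: float,
    n_samples: int,
    n_snps: int,
) -> List[float]:
    """Posterior-mean weights under a Gaussian prior, ignoring LD.

    Assumes standardized genotypes and phenotype and a prior
    beta_j ~ N(0, heritability / n_snps) for every SNP.
    """
    shrink = heritability / (heritability + n_snps / n_samples)
    return [shrink * b for b in beta_marginal]


# Hypothetical marginal GWAS estimates and study parameters.
weights = posterior_mean_no_ld(
    [0.03, -0.012], heritability=0.5, n_samples=100_000, n_snps=100_000
)
print(weights)
```

The shrinkage factor approaches 1 as the sample size grows relative to the number of markers, so the prior matters most when the training sample is small.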

### Penalized regression

Penalized regression methods, such as LASSO and ridge regression, can also be used to improve the accuracy of polygenic scores. Penalized regression can be interpreted as placing informative prior probabilities on how many genetic variants are expected to affect a trait, and the distribution of their effect sizes. In other words, these methods in effect "penalize" the large coefficients in a regression model and shrink them conservatively. Ridge regression accomplishes this by shrinking the prediction with a term that penalizes the sum of the squared coefficients.[4] LASSO accomplishes something similar by penalizing the sum of absolute coefficients.[26] Bayesian counterparts exist for LASSO and ridge regression, and other priors have been suggested and used. They can perform better in some circumstances.[27] A multi-dataset, multi-method study[24] found that of 15 different methods compared across four datasets, minimum redundancy maximum relevance was the best performing method. Furthermore, variable selection methods tended to outperform other methods. Variable selection methods do not use all the available genomic variants present in a dataset, but attempt to select an optimal subset of variants to use. This leads to less overfitting but more bias (see bias-variance tradeoff).
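The difference between the ridge and LASSO penalties is easiest to see in the idealized case of an orthonormal design (uncorrelated, unit-norm predictors), where both estimators have closed forms in terms of the ordinary least-squares coefficient. The sketch below assumes that setting and the objective scalings given in the comments; real genotype data are correlated, so practical solvers are iterative.

```python
def ridge_shrink(beta_ols: float, lam: float) -> float:
    """Ridge solution for an orthonormal design.

    Objective assumed: ||y - X b||^2 + lam * ||b||^2.
    Every coefficient is scaled toward zero but never reaches it.
    """
    return beta_ols / (1.0 + lam)


def lasso_shrink(beta_ols: float, lam: float) -> float:
    """LASSO solution for an orthonormal design (soft-thresholding).

    Objective assumed: 0.5 * ||y - X b||^2 + lam * ||b||_1.
    Coefficients smaller than lam in magnitude are set exactly to zero.
    """
    magnitude = max(abs(beta_ols) - lam, 0.0)
    return magnitude if beta_ols >= 0 else -magnitude


# Hypothetical least-squares coefficients for three SNPs.
betas = [0.5, -0.2, 0.05]
ridge = [ridge_shrink(b, lam=0.1) for b in betas]
lasso = [lasso_shrink(b, lam=0.1) for b in betas]
print(ridge)
print(lasso)
```

Because the LASSO sets small coefficients exactly to zero, it acts as a variable selection method as well as a shrinkage method, whereas ridge regression retains every variant.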

## Validation methods

Broadly speaking there are two methods used for PRS validation.

1. Test prediction quality in a new dataset containing individuals not used in the training of the predictor. This out-of-sample validation is now a standard requirement in peer review of new genomic predictors. Ideally these individuals would have experienced a different environment than the training set (e.g., were born and raised in a different part of the world, or in different decades). Examples of large scale out-of-sample validations include: CAD in French Canadians,[28] breast cancer,[29] blood and urine biomarkers,[30] among many more.
2. Perhaps the most rigorous validation method is to compare siblings who have grown up together. It has been shown that PRS can predict which of two brothers or which of two sisters has a specific condition, such as heart disease or breast cancer. The predictors work almost as well in predicting sibling disease status as when comparing two random individuals from the general population who did not share family environments while growing up. This is strong evidence for causal genetic effects. These results also suggest that embryo selection using PRS can reduce disease risk for children born through IVF.[14][31][32][33]
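One simple way to summarize a sibling comparison of the kind described above is the fraction of sibling pairs, discordant for the condition, in which the affected sibling has the higher score. The sketch below assumes that metric and uses hypothetical scores; the cited studies may report different statistics.

```python
from typing import Sequence, Tuple


def sibling_pair_accuracy(pairs: Sequence[Tuple[float, float]]) -> float:
    """Fraction of discordant pairs where the affected sibling scores higher.

    Each pair is (score_of_affected_sibling, score_of_unaffected_sibling).
    Ties count as half, so a random score gives about 0.5.
    """
    if not pairs:
        raise ValueError("at least one sibling pair is required")
    correct = sum(1.0 if a > u else 0.5 if a == u else 0.0 for a, u in pairs)
    return correct / len(pairs)


# Hypothetical standardized scores for four discordant sibling pairs.
pairs = [(1.2, 0.3), (0.1, 0.4), (0.9, 0.9), (2.0, -0.5)]
accuracy = sibling_pair_accuracy(pairs)
print(accuracy)
```

The same function applied to pairs of unrelated individuals gives the population-level comparison mentioned in the text.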

## Predictive performance

The benefit of polygenic scores is that they can be used to predict the future for crops, animal breeding, and humans alike. Although the same basic concepts underlie these areas of prediction, they face different challenges that require different methodologies. The ability to produce very large family size in nonhuman species, accompanied by deliberate selection, leads to a smaller effective population, higher degrees of linkage disequilibrium among individuals, and a higher average genetic relatedness among individuals within a population. For example, members of plant and animal breeds that humans have effectively created, such as modern maize or domestic cattle, are all technically "related". In human genomic prediction, by contrast, unrelated individuals in large populations are selected to estimate the effects of common SNPs. Because of smaller effective population in livestock, the mean coefficient of relationship between any two individuals is likely high, and common SNPs will tag causal variants at greater physical distance than for humans; this is the major reason for lower SNP-based heritability estimates for humans compared to livestock. In both cases, however, sample size is key for maximizing the accuracy of genomic prediction.[34]

While modern genomic prediction scoring in humans is generally referred to as a "polygenic score" (PGS) or a "polygenic risk score" (PRS), in livestock the more common term is "genomic estimated breeding value", or GEBV (similar to the more familiar "EBV", but with genotypic data). Conceptually, a GEBV is the same as a PGS: a linear function of genetic variants that are each weighted by the apparent effect of the variant. Despite this, polygenic prediction in livestock is useful for a fundamentally different reason than for humans. In humans, a PRS is used for the prediction of individual phenotype, while in livestock a GEBV is typically used to predict the offspring's average value of a phenotype of interest in terms of the genetic material it inherited from a parent. In this way, a GEBV can be understood as the average of the offspring of an individual or pair of individual animals. GEBVs are also typically communicated in the units of the trait of interest. For example, the expected increase in milk production of the offspring of a specific parent compared to the offspring from a reference population might be a typical way of using a GEBV in dairy cow breeding and selection.[34]

Some accuracy values are given in the sections below for comparison purposes. These are given in terms of correlations r and have been converted from explained variance if given in that format in the source. (-1 ≤ r ≤ 1 where a larger number implies better predictions.)
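The conversion mentioned above is a square root: for a single linear predictor, the fraction of variance explained equals r². A minimal sketch, assuming the score is positively oriented so that the non-negative root is the appropriate one:

```python
import math


def variance_explained_to_r(r_squared: float) -> float:
    """Convert a fraction of explained variance to a correlation r >= 0."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("explained variance must lie between 0 and 1")
    return math.sqrt(r_squared)


# Example: a score explaining 3% of the variance in a trait.
r = variance_explained_to_r(0.03)
print(round(r, 2))
```

A score explaining 3% of the variance, the figure quoted in the History section for the 2009 schizophrenia study, therefore corresponds to r ≈ 0.17.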

### In plants

The predictive value of polygenic scoring has large practical benefits for plant and animal breeding because it increases the selection precision and allows for shorter generations, both of which speed up evolution.[35] Genomic prediction with some version of polygenic scoring has been used in experiments on maize, small grains such as barley, wheat, oats and rye, and rice biparental families. In many cases, these predictions have been so successful that researchers have advocated for its use in combating global population growth and climate change.[9]

• In 2015, r ≈ 0.55 for total root length in maize.[36]
• In 2014, r ≈ 0.03 to 0.99 across four traits in barley.[37]

### In humans

For humans, while most polygenic scores are not predictive enough to diagnose disease, they could potentially be used in addition to other covariates (such as age, BMI, smoking status) to improve estimates of disease susceptibility.[41][2][13] However, even if a polygenic score might not make reliable diagnostic predictions across an entire population, it may still make very accurate predictions for outliers at extreme high or low risk. The clinical utility may therefore still be large even if average measures of prediction performance are moderate.[11]
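The point about outliers can be made concrete by stratifying a validation sample into groups by score and comparing the observed disease frequency in each group. The sketch below assumes equal-sized groups and uses a small hypothetical sample; it illustrates the calculation, not any published result.

```python
from typing import List, Sequence


def observed_risk_by_group(
    scores: Sequence[float], is_case: Sequence[int], n_groups: int = 3
) -> List[float]:
    """Observed case fraction in equal-sized groups ordered by score."""
    if len(scores) != len(is_case):
        raise ValueError("scores and is_case must have the same length")
    if len(scores) < n_groups:
        raise ValueError("need at least one individual per group")
    # Indices of individuals sorted from lowest to highest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    risks = []
    for g in range(n_groups):
        start = g * len(order) // n_groups
        stop = (g + 1) * len(order) // n_groups
        members = order[start:stop]
        risks.append(sum(is_case[i] for i in members) / len(members))
    return risks


# Hypothetical validation sample: score and disease status (1 = case).
scores = [-1.5, -1.0, -0.6, -0.2, 0.0, 0.3, 0.8, 1.1, 1.9]
is_case = [0, 0, 0, 0, 1, 0, 1, 0, 1]

risks = observed_risk_by_group(scores, is_case)
print([round(x, 2) for x in risks])
```

Using more groups, for example percentiles, isolates the extreme tails where the difference in observed risk is largest.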

Although issues such as systematically poorer performance in individuals of non-European ancestry limit ethical and practical widespread use,[42] several authors have noted that many causal variants that underlie common genetic variation in Europeans are shared across different continents, e.g. for BMI and type 2 diabetes in African populations[43] as well as schizophrenia in Chinese populations.[44] Other researchers recognize that polygenic underprediction in non-European populations should galvanize new GWAS that prioritize greater genetic diversity in order to maximize the potential health benefits brought about by predictive polygenic scores.[45] Significant scientific efforts are being made to this end.

Embryo genetic screening is common, with millions of embryos biopsied and tested each year worldwide. Extremely accurate genotyping methods have been developed so that the embryo genotype can be determined to high precision.[46][47] Testing for aneuploidy and monogenic diseases has become established practice over several decades, whereas tests for polygenic diseases have begun to be employed more recently. The use of polygenic scores for embryo selection has been criticised due to ethical and safety issues as well as limited practical utility.[48][49][50] However, trait-specific evaluations claiming the contrary have been put forth,[31][51] and ethical arguments for PGS-based embryo selection have also been made.[52][53][54] The topic continues to be an active area of research not only within genomics but also within clinical applications and ethics.

As of 2019, polygenic scores from well over a hundred phenotypes have been developed from genome-wide association statistics.[55] These include scores that can be categorized as anthropometric, behavioural, cardiovascular, non-cancer illness, psychiatric/neurological, and response to treatment/medication.[56]

#### Examples of continuous traits

As above, these examples of performance report the correlation r between predicted score and phenotype:

• In 2016, r ≈ 0.30 for educational attainment variation at age 16.[57] This polygenic score was based on a GWAS using data from 293,000 people.[58]
• In 2016, r ≈ 0.31 for case/control status for first-episode psychosis.[59]
• In 2017, r ≈ 0.29 for case/control status for schizophrenia in combined European and Chinese samples.[44]
• In 2018, r ≈ 0.67 for height variation in adulthood, resulting in prediction within ~3 cm for most individuals in the study.[60]
• In 2018, r ≈ 0.23 for intelligence from samples of 269,867 Europeans.[61]
• In 2018, r ≈ 0.33 to 0.36 for educational attainment and r ≈ 0.26 to 0.32 for intelligence from over 1.1 million Europeans.[62]
• In 2020, r ≈ 0.76 for lipoprotein A levels in blood samples from 300,000 British Europeans.[30]

#### Examples of disease prediction

When predicting disease risk, PRS gives a continuous score that estimates the risk of having or getting the disease, within some pre-defined time span. A common metric for evaluating such continuous estimates of yes/no questions (see Binary classification) is the area under the ROC curve (AUC). Here are some example results of PRS performance, as measured in AUC (0 ≤ AUC ≤ 1 where a larger number implies better prediction):

• In 2018, AUC ≈ 0.64 for coronary disease using ~120,000 British individuals.[63][A]
• In 2019, AUC ≈ 0.63 for breast cancer, developed from ~95,000 case subjects and ~75,000 controls of European ancestry.[29]
• In 2019, AUC ≈ 0.71 for hypothyroidism for ~24,000 case subjects and ~463,000 controls of European ancestry.[11]
• In 2020, AUC ≈ 0.71 for schizophrenia, using 90 cohorts including ~67,000 case subjects and ~94,000 controls with ~80% of European ancestry and ~20% of East Asian ancestry.[64]

Note that these results use purely genetic information as input; including additional information such as age and sex often greatly improves the predictions. The coronary disease predictor and the hypothyroidism predictor above achieve AUCs of ~0.80 and ~0.78, respectively, when also including age and sex.[6][11]
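The AUC used in this section can be interpreted as the probability that a randomly chosen case has a higher score than a randomly chosen control. The sketch below computes it directly from that definition by comparing every case with every control; the scores are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Area under the ROC curve from pairwise case/control comparisons."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0  # case correctly ranked above control
            elif case == control:
                wins += 0.5  # ties count as half
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical scores for three cases and four controls.
cases = [0.9, 0.4, 0.7]
controls = [0.2, 0.5, 0.1, 0.7]

value = auc(cases, controls)
print(round(value, 3))
```

An AUC of 0.5 corresponds to a score with no ability to rank cases above controls, and 1.0 to perfect separation.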

#### Importance of sample size

PGS predictor performance increases with the size of the dataset available for training, shown here for hypertension, hypothyroidism and type 2 diabetes. The x-axis shows the number of cases (i.e. individuals with the disease) present in the training data, on a logarithmic scale ranging from 1,000 cases to over 100,000 cases. The numbers of controls (i.e. individuals without the disease) in the training data were much larger than the numbers of cases. These particular predictors were trained using the LASSO algorithm.[16]

The performance of a polygenic predictor is highly dependent on the size of the dataset that is available for analysis and ML training. Recent scientific progress in prediction power relies heavily on the creation and expansion of large biobanks containing data for both genotypes and phenotypes of very many individuals. As of 2021, there exist several biobanks with hundreds of thousands of samples, i.e., data entries with both genetic and trait information for each individual (see for instance the incomplete list of biobanks). Some of these projects are expected to reach sample sizes of millions in a few years. These very large scientific efforts are a prerequisite for the continued development of accurate polygenic predictors, as well as for a multitude of other types of research in medicine and social sciences. All samples in these (human) biobanks come from consenting adult volunteers who have signed agreements about what information is included, who has access to it and for what purposes it can be used.

Since there is so much information in an individual's DNA, data from many thousands of individuals are required to detect the relevant variants for a specific trait. Exactly how many are required depends very much on the trait in question. Typically, one can loosely speak of three phases in the dependence of predictor performance on the data sample size. First, there is a phase of very rapid increase of predictive power as the sample sizes grow from small (say, a few thousand) to medium sizes of several tens of thousands or perhaps even hundreds of thousands of samples. Then there is a phase of steady but less dramatic growth, until the third phase, where the performance levels off and does not change much when the sample size is increased even further. This is the limit of how accurate a polygenic predictor that only uses genetic information can be, and it is set by the heritability of the specific trait. The sample size required to reach this performance level for a certain trait is determined by the complexity of the underlying genetic architecture and the distribution of genetic variance in the sampled population. This sample size dependence is illustrated in the figure for hypothyroidism, hypertension and type 2 diabetes.

Note again that current methods to construct polygenic predictors are sensitive to the ancestries present in the data. As of 2021, most available data have come primarily from populations with European ancestry, which is why PGS generally perform better within this ancestry. The construction of more diverse biobanks with successful recruitment from all ancestries is required to rectify this skewed access to, and benefit from, PGS-based medicine.[45]

## Research on clinical applications

A New England Journal of Medicine Perspective stated:[65]

"It is likely that tailoring decisions about prescribing preventive medicines or screening practices will be the main future use of genetic risk scores. If a PRS adds to existing clinical predictors of risk such as the Framingham Risk Score or the Q index for heart disease, it could be incorporated into preventive care as readily as any other biomarker."

"There seems little doubt that interpretation of these scores will become an accepted part of clinical practice in the future..."

The UK National Health Service plans to genotype 5 million individuals and study the incorporation of PRS into standard clinical care.[66]

Commercial entities such as Myriad and Ambry provide polygenic breast cancer risk prediction, in addition to tests for monogenic risk alleles such as BRCA1 and BRCA2.[67][68]

## Non-predictive uses

In humans, polygenic scores were originally computed in an effort to predict the prevalence and etiology of complex, heritable diseases, which are typically affected by many genetic variants that individually confer a small effect on overall risk. A genome-wide association study (GWAS) of such a polygenic trait is able to identify these individual genetic loci of small effect in a large enough sample, and various methods of aggregating the results can be used to form a polygenic score. This score will typically explain at least a few percent of a phenotype's variance, and can therefore be assumed to effectively incorporate a significant fraction of the genetic variants affecting that phenotype. A polygenic score can be used in several different ways: as a lower bound to test whether heritability estimates may be biased; as a measure of genetic overlap of traits (genetic correlation), which might indicate e.g. shared genetic bases for groups of mental disorders; as a means to assess group differences in a trait such as height, or to examine changes in a trait over time due to natural selection indicative of a soft selective sweep (as e.g. for intelligence, where the changes in frequency would be too small to detect on each individual hit but not on the overall polygenic score); in Mendelian randomization (assuming no pleiotropy with relevant traits); to detect and control for the presence of genetic confounds in outcomes (e.g. the correlation of schizophrenia with poverty); or to investigate gene–environment interactions and correlations.

## Genetic architecture

It is possible to analyze the specific genetic variants (SNPs) utilized in human complex trait predictors, which can vary from hundreds to as many as thirty thousand. There are now dozens of well-validated PRS, for phenotypes including disease conditions (diabetes, heart disease, cancer) and quantitative traits (height, bone density, biomarkers).[11]

The fraction of SNPs in or near genic regions varies widely by phenotype. For the majority of disease conditions studied, a large amount of the variance is accounted for by SNPs outside of coding regions. The state of these SNPs cannot be determined from exome-sequencing data. This suggests that exome data alone will miss much of the heritability for these traits—i.e., existing PRS cannot be computed from exome data alone.

The fraction of SNPs and of total variance that is in common between pairs of predictors is typically small. This is counter to previous intuitions concerning pleiotropy: it had been assumed that primarily protein-coding genic regions must be responsible for phenotype variation, and since the number of genes is limited (making up at most a few percent of the entire genome) any causal variant would be likely to affect multiple phenotypes or disease risks.[69] Previous reasoning concerning pleiotropy does not take into account the very high dimensionality of genomic information space. Once it is realized that causal variants can be located far from protein-coding genic regions the space of possibilities becomes immensely larger.

Direct analysis of existing PRS shows that the DNA regions used in disease risk predictors seem to be largely disjoint, suggesting that individual genetic disease risks are largely uncorrelated. It seems possible in theory for an individual to be a low-risk outlier in all conditions simultaneously.[11]

Given roughly 10 million common variants between any two individual humans, and typically a few thousand SNPs (in largely disjoint regions) used to capture most of the variance in a phenotype predictor, the dimensionality of the space of individual variation (phenotypes) has been theorized to be on the order of a few thousand.[16]

Another aspect of genetic architecture that was not broadly anticipated is that additive, or linear, models are capable of capturing most of the expected phenotypic variation. Tests for nonlinear effects (e.g., interactions between alleles) have typically found only small effects.[70][16] This approximate linearity also reduces the effects of pleiotropy (interactions between different genetic variants are smaller than expected), and increases confidence that PRS construction is tractable, with considerable improvements in the near term as datasets increase in size.[16]

## Notes

A. ^ Preprint lists AUC for pure PRS while the published version of the paper only lists AUC for PGS combined with age, sex and genotyping array information.

## References

1. ^ a b c Dudbridge F (March 2013). "Power and predictive accuracy of polygenic risk scores". PLOS Genetics. 9 (3): e1003348. doi:10.1371/journal.pgen.1003348. PMC 3605113. PMID 23555274.
2. ^ a b Torkamani A, Wineinger NE, Topol EJ (September 2018). "The personal and clinical utility of polygenic risk scores". Nature Reviews. Genetics. 19 (9): 581–590. doi:10.1038/s41576-018-0018-x. PMID 29789686. S2CID 46893131.
3. ^ Lambert SA, Abraham G, Inouye M (November 2019). "Towards clinical utility of polygenic risk scores". Human Molecular Genetics. 28 (R2): R133–R142. doi:10.1093/hmg/ddz187. PMID 31363735.
4. ^ a b de Vlaming R, Groenen PJ (2015). "The Current and Future Use of Ridge Regression for Prediction in Quantitative Genetics". BioMed Research International. 2015: 143712. doi:10.1155/2015/143712. PMC 4529984. PMID 26273586.
5. ^ Lewis CM, Vassos E (November 2017). "Prospects for using risk scores in polygenic medicine". Genome Medicine. 9 (1): 96. doi:10.1186/s13073-017-0489-y. PMC 5683372. PMID 29132412.
6. ^ a b c Khera AV, Chaffin M, Aragam KG, Haas ME, Roselli C, Choi SH, et al. (September 2018). "Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations". Nature Genetics. 50 (9): 1219–1224. doi:10.1038/s41588-018-0183-z. PMC 6128408. PMID 30104762.
7. ^ Yanes T, Meiser B, Kaur R, Scheepers-Joynt M, McInerny S, Taylor S, et al. (March 2020). "Uptake of polygenic risk information among women at increased risk of breast cancer" (PDF). Clinical Genetics. 97 (3): 492–501. doi:10.1111/cge.13687. PMID 31833054. S2CID 209342044.
8. ^ a b Vilhjálmsson BJ, Yang J, Finucane HK, Gusev A, Lindström S, Ripke S, et al. (October 2015). "Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores". American Journal of Human Genetics. 97 (4): 576–92. doi:10.1016/j.ajhg.2015.09.001. PMC 4596916. PMID 26430803.
9. ^ a b Spindel JE, McCouch SR (December 2016). "When more is better: how data sharing would accelerate genomic selection of crop plants". The New Phytologist. 212 (4): 814–826. doi:10.1111/nph.14174. PMID 27716975.
10. ^ Regalado A (8 March 2019). "23andMe thinks polygenic risk scores are ready for the masses, but experts aren't so sure". MIT Technology Review. Retrieved 2020-08-14.
11. Lello L, Raben TG, Yong SY, Tellier LC, Hsu SD (2019-10-25). "Genomic Prediction of 16 Complex Disease Risks Including Heart Attack, Diabetes, Breast and Prostate Cancer". Scientific Reports. 9 (1): 15286. doi:10.1038/s41598-019-51258-x. PMID 31653892. Retrieved 2021-04-12.
12. ^ Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, Yang J (July 2017). "10 Years of GWAS Discovery: Biology, Function, and Translation". American Journal of Human Genetics. 101 (1): 5–22. doi:10.1016/j.ajhg.2017.06.005. PMC 5501872. PMID 28686856.
13. ^ a b Spiliopoulou A, Nagy R, Bermingham ML, Huffman JE, Hayward C, Vitart V, et al. (July 2015). "Genomic prediction of complex human traits: relatedness, trait architecture and predictive meta-models". Human Molecular Genetics. 24 (14): 4167–82. doi:10.1093/hmg/ddv145. PMC 4476450. PMID 25918167.
14. ^ a b "Modern genetics will improve health and usher in "designer" children". The Economist. 2019-11-09. Retrieved 2021-04-12.
15. ^ "Test could predict risk of future heart disease for just £40". The Guardian. 2018-10-08. Retrieved 2021-04-12.
16. Raben TG, Lello L, Widen E, Hsu SD (2021-01-14). "From Genotype to Phenotype: polygenic prediction of complex human traits". arXiv:2101.05870 [q-bio].
17. ^ "Big picture genetic scoring approach reliably predicts heart disease". Science Daily. 2019-06-11. Retrieved 2021-04-12.
18. ^ Weedon MN, McCarthy MI, Hitman G, Walker M, Groves CJ, Zeggini E, et al. (October 2006). "Combining information from common type 2 diabetes risk polymorphisms improves disease prediction". PLOS Medicine. 3 (10): e374. doi:10.1371/journal.pmed.0030374. PMC 1584415. PMID 17020404.
19. ^ Xie C, Xu S (April 1998). "Efficiency of multistage marker-assisted selection in the improvement of multiple quantitative traits". Heredity. 80 ( Pt 4) (3): 489–98. doi:10.1046/j.1365-2540.1998.00308.x. PMID 9618913.
20. ^ a b Meuwissen TH, Hayes BJ, Goddard ME (April 2001). "Prediction of total genetic value using genome-wide dense marker maps". Genetics. 157 (4): 1819–29. PMC 1461589. PMID 11290733.
21. ^ Wray NR, Goddard ME, Visscher PM (October 2007). "Prediction of individual genetic risk to disease from genome-wide association studies". Genome Research. 17 (10): 1520–8. doi:10.1101/gr.6665407. PMC 1987352. PMID 17785532.
22. ^ Purcell SM, Wray NR, Stone JL, Visscher PM, O'Donovan MC, Sullivan PF, Sklar P (August 2009). "Common polygenic variation contributes to risk of schizophrenia and bipolar disorder". Nature. 460 (7256): 748–52. Bibcode:2009Natur.460..748P. doi:10.1038/nature08185. PMC 3912837. PMID 19571811.
23. ^ James G (2013). An Introduction to Statistical Learning: with Applications in R. Springer. ISBN 978-1461471370.
24. ^ a b Haws DC, Rish I, Teyssedre S, He D, Lozano AC, Kambadur P, et al. (2015-10-06). "Variable-Selection Emerges on Top in Empirical Comparison of Whole-Genome Complex-Trait Prediction Methods". PLOS ONE. 10 (10): e0138903. Bibcode:2015PLoSO..1038903H. doi:10.1371/journal.pone.0138903. PMC 4595020. PMID 26439851.
25. ^ Ware EB, Schmitz LL, Faul J, Gard A, Mitchell C, Smith JA, Zhao W, Weir D, Kardia SL (January 2017). "Heterogeneity in polygenic scores for common human traits". bioRxiv: 106062. doi:10.1101/106062.
26. ^ Vattikuti S, Lee JJ, Chang CC, Hsu SD, Chow CC (2014). "Applying compressed sensing to genome-wide association studies". GigaScience. 3 (1): 10. doi:10.1186/2047-217X-3-10. PMC 4078394. PMID 25002967.
27. ^ Gianola D, Rosa GJ (2015). "One hundred years of statistical developments in animal breeding". Annual Review of Animal Biosciences. 3: 19–56. doi:10.1146/annurev-animal-022114-110733. PMID 25387231.
28. ^ Wünnemann F, Ken Sin L, Langford-Avelar A, Bussell D, Dubé MP, Tardif JC, Lettre G (2019-06-11). "Validation of Genome-Wide Polygenic Risk Scores for Coronary Artery Disease in French Canadians". Circulation: Genomic and Precision Medicine. AHA Journals. Retrieved 2021-04-12.
29. ^ a b Mavaddat N, et al. (2019-01-03). "Polygenic risk scores for prediction of breast cancer and breast cancer subtypes". The American Journal of Human Genetics. 104 (1). doi:10.1016/j.ajhg.2018.11.002. PMID 30554720. Retrieved 2021-04-12.
30. ^ a b Widen E, Raben TG, Lello L, Hsu SD (2021-04-01). "Machine Learning Prediction of Biomarkers from SNPs and of Disease Risk from Biomarkers in the UK Biobank". medRxiv (preprint). doi:10.1101/2021.04.01.21254711. Retrieved 2021-04-12.
31. ^ a b Treff NR, Eccles J, Marin D, Messick E, Lello L, Gerber J, Jia X, Tellier LC (2020-06-12). "Preimplantation Genetic Testing for Polygenic Disease Relative Risk Reduction: Evaluation of Genomic Index Performance in 11,883 Adult Sibling Pairs". Genes. 11 (6): 648. doi:10.3390/genes11060648. PMID 32545548. Retrieved 2021-04-12.
32. ^ Lello L, Raben TG, Hsu SD (2020-08-06). "Sibling validation of polygenic risk scores and complex trait prediction". Scientific Reports. 10 (1). doi:10.1038/s41598-020-69927-7. PMID 32764582. Retrieved 2021-04-12.
33. ^ Reid NJ, Brockman DG, Leonard CE, Pelletier R, Khera AV (2021-04-02). "Concordance of a High Polygenic Score Among Relatives". Circulation: Genomic and Precision Medicine. AHA Journals. Retrieved 2021-04-12.
34. ^ a b Wray NR, Kemper KE, Hayes BJ, Goddard ME, Visscher PM (April 2019). "Complex Trait Prediction from Genome Data: Contrasting EBV in Livestock to PRS in Humans: Genomic Prediction". Genetics. 211 (4): 1131–1141. doi:10.1534/genetics.119.301859. PMC 6456317. PMID 30967442.
35. ^ Heslot N, Jannink JL, Sorrells ME (January 2015). "Perspectives for Genomic Selection Applications and Research in Plants". Crop Science. 55 (1): 1–12. doi:10.2135/cropsci2014.03.0249. ISSN 0011-183X.
36. ^ Pace J, Yu X, Lübberstedt T (September 2015). "Genomic prediction of seedling root length in maize (Zea mays L.)". The Plant Journal. 83 (5): 903–12. doi:10.1111/tpj.12937. PMID 26189993.
37. ^ Sallam AH, Endelman JB, Jannink JL, Smith KP (2015-03-01). "Assessing Genomic Selection Prediction Accuracy in a Dynamic Barley Breeding Population". The Plant Genome. 8 (1): 0. doi:10.3835/plantgenome2014.05.0020. ISSN 1940-3372. PMID 33228279.
38. ^ Hayr MK, Druet T, Garrick DJ (2016-04-01). "027 Performance of genomic prediction using haplotypes in New Zealand dairy cattle". Journal of Animal Science. 94 (Supplement 2): 13. doi:10.2527/msasas2016-027. ISSN 1525-3163.
39. ^ Chen L, Vinsky M, Li C (February 2015). "Accuracy of predicting genomic breeding values for carcass merit traits in Angus and Charolais beef cattle". Animal Genetics. 46 (1): 55–9. doi:10.1111/age.12238. PMID 25393962.
40. ^ Liu T, Qu H, Luo C, Shu D, Wang J, Lund MS, Su G (October 2014). "Accuracy of genomic prediction for growth and carcass traits in Chinese triple-yellow chickens". BMC Genetics. 15: 110. doi:10.1186/s12863-014-0110-y. PMC 4201679. PMID 25316160.
41. ^ Lewis CM, Vassos E (May 2020). "Polygenic risk scores: from research tools to clinical instruments". Genome Medicine. 12 (1): 44. doi:10.1186/s13073-020-00742-5. PMC 7236300. PMID 32423490.
42. ^ Duncan L, Shen H, Gelaye B, Meijsen J, Ressler K, Feldman M, et al. (July 2019). "Analysis of polygenic risk score usage and performance in diverse human populations". Nature Communications. 10 (1): 3328. Bibcode:2019NatCo..10.3328D. doi:10.1038/s41467-019-11112-0. PMC 6658471. PMID 31346163.
43. ^ Wang Y, Guo J, Ni G, Yang J, Visscher PM, Yengo L (July 2020). "Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations". Nature Communications. 11 (1): 3865. Bibcode:2020NatCo..11.3865W. doi:10.1038/s41467-020-17719-y. PMC 7395791. PMID 32737319.
44. ^ a b Li Z, Chen J, Yu H, He L, Xu Y, Zhang D, et al. (November 2017). "Genome-wide association analysis identifies 30 new susceptibility loci for schizophrenia" (PDF). Nature Genetics. 49 (11): 1576–1583. doi:10.1038/ng.3973. PMID 28991256. S2CID 205355668.
45. ^ a b Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ (April 2019). "Clinical use of current polygenic risk scores may exacerbate health disparities". Nature Genetics. 51 (4): 584–591. doi:10.1038/s41588-019-0379-x. PMC 6563838. PMID 30926966.
46. ^ Zeevi DA, Backenroth D, Hakam-Spector E, et al. (2021-03-26). "Expanded clinical validation of Haploseek for comprehensive preimplantation genetic testing". Genetics in Medicine. doi:10.1038/s41436-021-01145-6. PMID 33772222. Retrieved 2021-04-14.
47. ^ Treff NR, Zimmerman R, Bechor E, Hsu J, Rana B, Jensen J, Li J, Samoilenko A, Mowrey W, Van Alstine J, Leondires M, Miller K, Paganetti E, Lello L, Avery S, Hsu S, Tellier LC (2019-08-01). "Validation of concurrent preimplantation genetic testing for polygenic and monogenic disorders, structural rearrangements, and whole and segmental chromosome aneuploidy with a single universal platform". European Journal of Medical Genetics. 62 (8). doi:10.1016/j.ejmg.2019.04.004. PMID 31026593. Retrieved 2021-04-14.
48. ^ Birney E. "Why using genetic risk scores on embryos is wrong". ewanbirney.com. Retrieved 2020-12-16.
49. ^ Karavani E, Zuk O, Zeevi D, Barzilai N, Stefanis NC, Hatzimanolis A, et al. (November 2019). "Screening Human Embryos for Polygenic Traits Has Limited Utility". Cell. 179 (6): 1424–1435.e8. doi:10.1016/j.cell.2019.10.033. PMC 6957074. PMID 31761530.
50. ^ Lázaro-Muñoz G, Pereira S, Carmi S, Lencz T (October 2020). "Screening embryos for polygenic conditions and traits: ethical considerations for an emerging technology". Genetics in Medicine: 1–3. doi:10.1038/s41436-020-01019-3. PMID 33106616.
51. ^ Treff NR, Eccles J, Lello L, Bechor E, Hsu J, Plunkett K, Zimmerman R, Rana B, Samoilenko A, Hsu S, Tellier LC (2019-12-04). "Utility and First Clinical Application of Screening Embryos for Polygenic Disease Risk Reduction". Frontiers in Endocrinology. 10: 845. doi:10.3389/fendo.2019.00845. PMID 31920964. Retrieved 2021-04-13.
52. ^ Savulescu J, Munday S (2021-01-18). "Three models for the regulation of polygenic scores in reproduction". Journal of Medical Ethics. doi:10.1136/medethics-2020-106588. PMID 33462079. Retrieved 2021-04-13.
53. ^ Kemper JM, Gyngell C, Savulescu J (2019-08-15). "Subsidizing PGD: The Moral Case for Funding Genetic Selection". Journal of Bioethical Inquiry. 16: 405–414. doi:10.1007/s11673-019-09932-2. PMID 31418161. Retrieved 2021-04-13.
54. ^ Savulescu J, Kahane G (2016-09-01). "Understanding procreative beneficence". The Oxford Handbook of Reproductive Ethics. doi:10.1093/oxfordhb/9780199981878.013.26. PMID 12058767. Retrieved 2021-04-13.
55. ^ "The Polygenic Score (PGS) Catalog". Polygenic Score (PGS) Catalog. Retrieved 2020-04-29. An open database of polygenic scores and the relevant metadata required for accurate application and evaluation.
56. ^ Richardson TG, Harrison S, Hemani G, Davey Smith G (March 2019). "An atlas of polygenic risk score associations to highlight putative causal relationships across the human phenome". eLife. 8: e43657. doi:10.7554/eLife.43657. PMC 6400585. PMID 30835202.
57. ^ Selzam S, Krapohl E, von Stumm S, O'Reilly PF, Rimfeld K, Kovas Y, et al. (February 2017). "Predicting educational achievement from DNA". Molecular Psychiatry. 22 (2): 267–272. doi:10.1038/mp.2016.107. PMC 5285461. PMID 27431296.
58. ^ Okbay A, Beauchamp JP, Fontana MA, Lee JJ, Pers TH, Rietveld CA, et al. (May 2016). "Genome-wide association study identifies 74 loci associated with educational attainment". Nature. 533 (7604): 539–42. Bibcode:2016Natur.533..539O. doi:10.1038/nature17671. PMC 4883595. PMID 27225129.
59. ^ Vassos E, Di Forti M, Coleman J, Iyegbe C, Prata D, Euesden J, et al. (March 2017). "An Examination of Polygenic Score Risk Prediction in Individuals With First-Episode Psychosis". Biological Psychiatry. 81 (6): 470–477. doi:10.1016/j.biopsych.2016.06.028. PMID 27765268.
60. ^ Lello L, Avery SG, Tellier L, Vazquez AI, de Los Campos G, Hsu SD (October 2018). "Accurate Genomic Prediction of Human Height". Genetics. 210 (2): 477–497. doi:10.1534/genetics.118.301267. PMC 6216598. PMID 30150289.
61. ^ Savage JE, Jansen PR, Stringer S, Watanabe K, Bryois J, de Leeuw CA, et al. (2018). "Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence". Nature Genetics. 50 (7): 912–19. doi:10.1038/s41588-018-0152-6. PMC 6411041. PMID 29942086.
62. ^ Lee JJ, Wedow R, Okbay A, Kong E, Maghzian O, Zacher M, et al. (2018). "Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals". Nature Genetics. 50 (8): 1112–1121. doi:10.1038/s41588-018-0147-3. PMC 6393768. PMID 30038396.
63. ^ Khera A, Chaffin M, Aragam KG, Emdin CA, Klarin D, Haas ME, Roselli C, Natarajan P, Kathiresan S (2017-11-15). "Genome-wide polygenic score to identify a monogenic risk-equivalent for coronary disease". bioRxiv. doi:10.1101/218388.
64. ^ The Schizophrenia Working Group of the Psychiatric Genomics Consortium, Ripke S (2020-09-12). "Mapping genomic loci prioritises genes and implicates synaptic biology in schizophrenia". medRxiv (preprint). doi:10.1101/2020.09.12.20192922. Retrieved 2021-04-13.
65. ^ Hunter DJ, Drazen JM (2019-06-20). "Has the Genome Granted Our Wish Yet?". The New England Journal of Medicine. 380 (25): 2391–2393. doi:10.1056/NEJMp1904511. PMID 31091368. Retrieved 2021-04-12.
66. ^ Devlin H (2019-03-23). "Are genetic tests useful to predict cancer?". The Guardian. Retrieved 2021-04-12.