Paul Christiano (researcher)

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Alenoach (talk | contribs) at 09:36, 28 July 2023 (Link). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Paul Christiano
Website: paulfchristiano.com

Paul Christiano is an American researcher in the field of artificial intelligence (AI), with a specific focus on AI alignment, which is the subfield of AI safety research that aims to steer AI systems toward human interests.[1] He formerly led the language model alignment team at OpenAI and is now the head of the non-profit Alignment Research Center, which works on theoretical AI alignment and evaluations of machine learning models.

Education

In 2012, Christiano graduated from MIT with a degree in mathematics.[2] At MIT, he researched data structures, quantum cryptography, and combinatorial optimization.[3]

Career

At OpenAI, Christiano co-authored the paper "Deep Reinforcement Learning from Human Preferences" (2017) and other works developing reinforcement learning from human feedback (RLHF).[4][5] This technique, used for training ChatGPT and similar language models, allows models to learn from subjective human preferences rather than from goal functions that may be poor proxies of human interests.[6][7] Other works, such as "AI safety via debate" (2018), focus on the problem of scalable oversight – supervising AIs in domains where humans would have difficulty judging output quality.[8][9][10]
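The central idea of learning from pairwise human preferences can be sketched with a Bradley–Terry-style objective: a reward model scores two candidate outputs, and training minimizes the negative log-probability that the human-preferred output is ranked higher. The snippet below is an illustrative toy under that assumption, not the implementation from the 2017 paper:

```python
import math

def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the preferred output outranks the
    rejected one, where P(preferred > rejected) = sigmoid(r_a - r_b).
    Minimizing this pushes a reward model to score human-preferred
    outputs higher; the learned reward then substitutes for a
    hand-written goal function."""
    p_preferred = 1.0 / (1.0 + math.exp(-(reward_preferred - reward_rejected)))
    return -math.log(p_preferred)

# A model that already agrees with the human label incurs a small loss;
# a model that ranks the pair the wrong way is penalized heavily.
low = preference_loss(2.0, 0.0)   # agrees with the preference
high = preference_loss(0.0, 2.0)  # contradicts the preference
```

In full RLHF pipelines this loss trains a reward model on human comparison data, and a policy (the language model) is then optimized against that reward with reinforcement learning.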

Christiano left OpenAI in 2021 to work on more conceptual and theoretical issues in AI alignment, and subsequently founded the Alignment Research Center to focus on this area.[1] One subject of study is the problem of eliciting latent knowledge from advanced machine learning models.[11][12]

Christiano is known for his views on the potential risks of advanced AI, stating in a 2023 interview that there is a "10–20% chance of AI takeover, [with] many [or] most humans dead". He also conjectured a "50/50 chance of doom shortly after you have AI systems that are human level".[13][1]

References

  1. ^ a b c "A.I. has a '10 or 20% chance' of conquering humanity, former OpenAI safety researcher warns". Fortune. Retrieved 2023-06-04.
  2. ^ "Paul Christiano".
  3. ^ "About the Authors: Theory of Computing: An Open Access Electronic Journal in Theoretical Computer Science".
  4. ^ Christiano, Paul F.; Leike, Jan; Brown, Tom; Martic, Miljan; Legg, Shane; Amodei, Dario (2017). "Deep Reinforcement Learning from Human Preferences". Advances in Neural Information Processing Systems. 30. Curran Associates, Inc.
  5. ^ Ouyang, Long; Wu, Jeffrey; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, John; Hilton, Jacob; Kelton, Fraser; Miller, Luke; Simens, Maddie (2022-12-06). "Training language models to follow instructions with human feedback". Advances in Neural Information Processing Systems. 35: 27730–27744. arXiv:2203.02155.
  6. ^ "Learning from human preferences". openai.com. Retrieved 2023-06-04.
  7. ^ "How reinforcement learning with human feedback is unlocking the power of generative AI". VentureBeat. 2023-04-23. Retrieved 2023-06-04.
  8. ^ Irving, G.; Christiano, P.; Amodei, Dario (2018-05-02). "AI safety via debate". arXiv:1805.00899 [stat.ML].
  9. ^ Wu, Jeff; Ouyang, Long; Ziegler, Daniel M.; Stiennon, Nissan; Lowe, Ryan; Leike, J.; Christiano, P. (2021-09-22). "Recursively Summarizing Books with Human Feedback". arXiv:2109.10862 [cs.CL].
  10. ^ Christiano, P.; Shlegeris, Buck; Amodei, Dario (2018-10-19). "Supervising strong learners by amplifying weak experts". arXiv:1810.08575 [cs.LG].
  11. ^ Burns, Collin; Ye, Haotian; Klein, Dan; Steinhardt, Jacob (2022). "Discovering Latent Knowledge in Language Models Without Supervision". arXiv:2212.03827 [cs.CL].
  12. ^ Christiano, Paul; Cotra, Ajeya; Xu, Mark (December 2021). "Eliciting Latent Knowledge: How to tell if your eyes deceive you". Google Docs. Alignment Research Center. Retrieved 2023-04-16.
  13. ^ Nolan, Beatrice. "Ex-OpenAI researcher says there's a 50% chance AI development could end in 'doom'". Business Insider. Retrieved 2023-06-04.