Paul Christiano (researcher)

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Alenoach (talk | contribs) at 09:36, 28 July 2023 (Link). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Paul Christiano
Website: paulfchristiano.com

Paul Christiano is an American researcher in the field of artificial intelligence (AI), with a specific focus on AI alignment, which is the subfield of AI safety research that aims to steer AI systems toward human interests.[1] He formerly led the language model alignment team at OpenAI and is now the head of the non-profit Alignment Research Center, which works on theoretical AI alignment and evaluations of machine learning models.

Education

In 2012, Christiano graduated from MIT with a degree in mathematics.[2] At MIT, he researched data structures, quantum cryptography, and combinatorial optimization.[3]

Career

At OpenAI, Christiano co-authored the paper "Deep Reinforcement Learning from Human Preferences" (2017) and other works developing reinforcement learning from human feedback (RLHF).[4][5] This technique, used for training ChatGPT and similar language models, allows models to learn from subjective human preferences rather than from goal functions that may be poor proxies of human interests.[6][7] Other works, such as "AI safety via debate" (2018), focus on the problem of scalable oversight – supervising AIs in domains where humans would have difficulty judging output quality.[8][9][10]
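The central idea of learning from pairwise human preferences can be sketched with a Bradley–Terry-style objective: a reward model scores two candidate outputs, and training minimizes the negative log-probability that the human-preferred output is ranked higher. The snippet below is an illustrative toy under that assumption, not the implementation from the 2017 paper:

```python
import math

def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the preferred output outranks the
    rejected one, where P(preferred > rejected) = sigmoid(r_a - r_b).
    Minimizing this pushes a reward model to score human-preferred
    outputs higher; the learned reward then substitutes for a
    hand-written goal function."""
    p_preferred = 1.0 / (1.0 + math.exp(-(reward_preferred - reward_rejected)))
    return -math.log(p_preferred)

# A model that already agrees with the human label incurs a small loss;
# a model that ranks the pair the wrong way is penalized heavily.
low = preference_loss(2.0, 0.0)   # agrees with the preference
high = preference_loss(0.0, 2.0)  # contradicts the preference
```

In full RLHF pipelines this loss trains a reward model on human comparison data, and a policy (the language model) is then optimized against that reward with reinforcement learning.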

Christiano left OpenAI in 2021 to work on more conceptual and theoretical issues in AI alignment, and subsequently founded the Alignment Research Center to focus on this area.[1] One subject of study is the problem of eliciting latent knowledge from advanced machine learning models.[11][12]

Christiano is known for his views on the potential risks of advanced AI, stating in a 2023 interview that there is a "10–20% chance of AI takeover, [with] many [or] most humans dead". He also conjectured a "50/50 chance of doom shortly after you have AI systems that are human level".[13][1]

References

  1. ^ a b c "A.I. has a '10 or 20% chance' of conquering humanity, former OpenAI safety researcher warns". Fortune. Retrieved 2023-06-04.
  2. ^ "Paul Christiano".
  3. ^ "About the Authors: Theory of Computing: An Open Access Electronic Journal in Theoretical Computer Science".
  4. ^ Christiano, Paul F.; Leike, Jan; Brown, Tom; Martic, Miljan; Legg, Shane; Amodei, Dario (2017). "Deep Reinforcement Learning from Human Preferences". Advances in Neural Information Processing Systems. 30. Curran Associates, Inc.
  5. ^ Ouyang, Long; Wu, Jeffrey; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, John; Hilton, Jacob; Kelton, Fraser; Miller, Luke; Simens, Maddie (2022-12-06). "Training language models to follow instructions with human feedback". Advances in Neural Information Processing Systems. 35: 27730–27744. arXiv:2203.02155.
  6. ^ "Learning from human preferences". openai.com. Retrieved 2023-06-04.
  7. ^ "How reinforcement learning with human feedback is unlocking the power of generative AI". VentureBeat. 2023-04-23. Retrieved 2023-06-04.
  8. ^ Irving, G.; Christiano, P.; Amodei, Dario (2018-05-02). "AI safety via debate". arXiv:1805.00899 [stat.ML].
  9. ^ Wu, Jeff; Ouyang, Long; Ziegler, Daniel M.; Stiennon, Nissan; Lowe, Ryan; Leike, J.; Christiano, P. (2021-09-22). "Recursively Summarizing Books with Human Feedback". arXiv:2109.10862 [cs.CL].
  10. ^ Christiano, P.; Shlegeris, Buck; Amodei, Dario (2018-10-19). "Supervising strong learners by amplifying weak experts". arXiv:1810.08575 [cs.LG].
  11. ^ Burns, Collin; Ye, Haotian; Klein, Dan; Steinhardt, Jacob (2022). "Discovering Latent Knowledge in Language Models Without Supervision". arXiv:2212.03827 [cs.CL].
  12. ^ Christiano, Paul; Cotra, Ajeya; Xu, Mark (December 2021). "Eliciting Latent Knowledge: How to tell if your eyes deceive you". Google Docs. Alignment Research Center. Retrieved 2023-04-16.
  13. ^ Nolan, Beatrice. "Ex-OpenAI researcher says there's a 50% chance AI development could end in 'doom'". Business Insider. Retrieved 2023-06-04.