Paul Christiano (researcher)

Website: paulfchristiano.com

Paul Christiano is an American researcher in the field of artificial intelligence (AI), with a specific focus on AI alignment, which is the subfield of AI safety research that aims to steer AI systems toward human interests.[1] He formerly led the language model alignment team at OpenAI and is now the head of the non-profit Alignment Research Center, which works on theoretical AI alignment and evaluations of machine learning models.[2]

Education

Christiano won a silver medal at the International Mathematical Olympiad in 2008.[3] In 2012, he graduated from the Massachusetts Institute of Technology (MIT) with a degree in mathematics.[4] At MIT, he researched data structures, quantum cryptography, and combinatorial optimization.[5] He then completed a PhD at the University of California, Berkeley.[6]

Career

At OpenAI, Christiano co-authored the paper "Deep Reinforcement Learning from Human Preferences" (2017) and other works developing reinforcement learning from human feedback (RLHF).[7][8] Other works such as "AI safety via debate" (2018) focus on the problem of scalable oversight – supervising AIs in domains where humans would have difficulty judging output quality.[9][10][11]
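
The 2017 method trains a reward model from human comparisons between pairs of trajectory segments: the probability that a human prefers one segment is modelled as a logistic (Bradley–Terry) function of the difference in the segments' summed predicted rewards, and a standard reinforcement learning algorithm then optimizes the learned reward. The following minimal sketch of that preference loss uses Python and PyTorch; the network architecture, tensor shapes, and data handling are illustrative placeholders rather than the paper's implementation.

    # Sketch of the preference-based reward learning step from
    # "Deep Reinforcement Learning from Human Preferences" (2017).
    # Architecture and shapes are illustrative, not the paper's code.
    import torch
    import torch.nn as nn

    class RewardModel(nn.Module):
        """Maps an observation-action pair to a scalar reward estimate."""
        def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, obs, act):
            return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

    def preference_loss(reward_model, seg_a, seg_b, human_prefers_a):
        """Cross-entropy loss on human comparisons of two trajectory segments.

        seg_a, seg_b: (obs, act) tensors of shape (batch, steps, dim).
        human_prefers_a: float tensor of shape (batch,), 1.0 if A was preferred.
        """
        sum_a = reward_model(*seg_a).sum(dim=-1)  # summed predicted reward of segment A
        sum_b = reward_model(*seg_b).sum(dim=-1)  # summed predicted reward of segment B
        logits = sum_a - sum_b                    # P(A preferred) = sigmoid(logits)
        return nn.functional.binary_cross_entropy_with_logits(logits, human_prefers_a)

The learned reward then stands in for the environment reward in an off-the-shelf reinforcement learning algorithm, with new comparisons collected on segments generated by the current policy.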

Christiano left OpenAI in 2021 to work on more conceptual and theoretical issues in AI alignment and subsequently founded the Alignment Research Center to focus on this area.[1] One subject of study is the problem of eliciting latent knowledge from advanced machine learning models.[12][13]
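
In the cited unsupervised-probing work, for example, a linear probe over a model's internal activations is fitted without labels by requiring that a statement and its negation receive consistent probabilities (summing to one) while discouraging the uninformative answer of 0.5 for both. A minimal Python/PyTorch sketch of that Contrast-Consistent Search objective follows; the probe, variable names, and omitted normalization details are illustrative assumptions rather than the authors' implementation.

    # Sketch of the Contrast-Consistent Search (CCS) objective from
    # "Discovering Latent Knowledge in Language Models Without Supervision" (2022).
    # Probe architecture and preprocessing are illustrative, not the paper's code.
    import torch
    import torch.nn as nn

    class LinearProbe(nn.Module):
        """Maps a hidden-state vector to a probability that a statement is true."""
        def __init__(self, hidden_dim: int):
            super().__init__()
            self.linear = nn.Linear(hidden_dim, 1)

        def forward(self, h):
            return torch.sigmoid(self.linear(h)).squeeze(-1)

    def ccs_loss(probe, h_pos, h_neg):
        """h_pos, h_neg: activations for a statement and its negation, shape (batch, hidden_dim)."""
        p_pos, p_neg = probe(h_pos), probe(h_neg)
        consistency = (p_pos - (1.0 - p_neg)) ** 2     # p(x) and p(not x) should sum to 1
        confidence = torch.minimum(p_pos, p_neg) ** 2  # avoid the degenerate p = 0.5 answer
        return (consistency + confidence).mean()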

Christiano is known for his views on the potential risks of advanced AI, stating in a 2023 interview that there is a “10–20% chance of AI takeover, [with] many [or] most humans dead.” He also conjectured a “50/50 chance of doom shortly after you have AI systems that are human level.”[14][1]

References

  1. ^ a b c "A.I. has a '10 or 20% chance' of conquering humanity, former OpenAI safety researcher warns". Fortune. Retrieved 2023-06-04.
  2. ^ Piper, Kelsey (2023-03-29). "How to test what an AI model can — and shouldn't — do". Vox. Retrieved 2023-08-04.
  3. ^ "IMO 2008".
  4. ^ "Paul Christiano".
  5. ^ "About the Authors: Theory of Computing: An Open Access Electronic Journal in Theoretical Computer Science".
  6. ^ "Future of Humanity Institute". Future of Humanity Institute. Retrieved 2023-08-04.
  7. ^ Christiano, Paul F; Leike, Jan; Brown, Tom; Martic, Miljan; Legg, Shane; Amodei, Dario (2017). "Deep Reinforcement Learning from Human Preferences". Advances in Neural Information Processing Systems. 30. Curran Associates, Inc.
  8. ^ Ouyang, Long; Wu, Jeffrey; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, John; Hilton, Jacob; Kelton, Fraser; Miller, Luke; Simens, Maddie (2022-12-06). "Training language models to follow instructions with human feedback". Advances in Neural Information Processing Systems. 35: 27730–27744. arXiv:2203.02155.
  9. ^ Irving, G.; Christiano, P.; Amodei, Dario (2018-05-02). "AI safety via debate". arXiv:1805.00899 [stat.ML].
  10. ^ Wu, Jeff; Ouyang, Long; Ziegler, Daniel M.; Stiennon, Nissan; Lowe, Ryan; Leike, J.; Christiano, P. (2021-09-22). "Recursively Summarizing Books with Human Feedback". arXiv:2109.10862 [cs.CL].
  11. ^ Christiano, P.; Shlegeris, Buck; Amodei, Dario (2018-10-19). "Supervising strong learners by amplifying weak experts". arXiv:1810.08575 [cs.LG].
  12. ^ Burns, Collin; Ye, Haotian; Klein, Dan; Steinhardt, Jacob (2022). "Discovering Latent Knowledge in Language Models Without Supervision". arXiv:2212.03827 [cs.CL].
  13. ^ Christiano, Paul; Cotra, Ajeya; Xu, Mark (December 2021). "Eliciting Latent Knowledge: How to tell if your eyes deceive you". Google Docs. Alignment Research Center. Retrieved 2023-04-16.
  14. ^ Nolan, Beatrice. "Ex-OpenAI researcher says there's a 50% chance AI development could end in 'doom'". Business Insider. Retrieved 2023-06-04.
