AlphaZero: Difference between revisions

From Wikipedia, the free encyclopedia

Revision as of 20:08, 13 April 2019

AlphaZero is a computer program developed by artificial intelligence research company DeepMind to master the games of chess, shogi and go. The algorithm uses an approach similar to AlphaGo Zero. On December 5, 2017, the DeepMind team released a preprint introducing AlphaZero, which within 24 hours achieved a superhuman level of play in these three games by defeating world-champion programs, Stockfish, elmo, and the 3-day version of AlphaGo Zero. In each case it made use of custom tensor processing units (TPUs) that the Google programs were optimized to use.[1] AlphaZero was trained solely via "self-play" using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks, all in parallel, with no access to opening books or endgame tables. After just four hours of training, DeepMind estimated AlphaZero was playing at a higher Elo rating than Stockfish 8; after 9 hours of training, the algorithm decisively defeated Stockfish 8 in a time-controlled 100-game tournament (28 wins, 0 losses, and 72 draws).[1][2][3] The trained algorithm played on a single machine with four TPUs. DeepMind's paper on AlphaZero was published in the journal Science on 7 December 2018.[4]

Relation to AlphaGo Zero

AlphaZero (AZ) is a more generalized variant of the AlphaGo Zero (AGZ) algorithm, and is able to play shogi and chess as well as Go. Differences between AZ and AGZ include:[1]

  • AZ has hard-coded rules for setting search hyperparameters.
  • The neural network is now updated continually.
  • Go (unlike chess) is symmetric under certain reflections and rotations; AlphaGo Zero was programmed to take advantage of these symmetries. AlphaZero is not.
  • Chess (unlike Go) can end in a draw; AlphaZero therefore takes the possibility of a drawn game into account.
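
The board symmetries mentioned in the list above can be made concrete with a short sketch. A square Go board has eight equivalent orientations (four rotations, each with an optional mirror image). The function names and the list-of-lists board layout below are illustrative assumptions, not taken from DeepMind's code.

```python
from typing import List

Board = List[List[int]]  # hypothetical layout: 0 = empty, 1 = black, 2 = white


def rotate90(board: Board) -> Board:
    """Rotate a square board 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]


def reflect(board: Board) -> Board:
    """Mirror a board left to right."""
    return [row[::-1] for row in board]


def symmetries(board: Board) -> List[Board]:
    """Return the eight rotations and reflections of a square board."""
    result = []
    current = [list(row) for row in board]
    for _ in range(4):
        result.append(current)           # one of the four rotations
        result.append(reflect(current))  # and its mirror image
        current = rotate90(current)
    return result
```

A Go position is strategically equivalent in all eight orientations, so one position can stand in for eight. A chess position is not, because pawns move in only one direction and castling distinguishes the two wings, which is why this shortcut does not carry over.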

AlphaZero vs. Stockfish and elmo

AlphaZero, which uses Monte Carlo tree search, searches just 80,000 positions per second in chess and 40,000 in shogi, compared to 70 million for Stockfish and 35 million for elmo. AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations.[1]
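
How a neural network can steer a tree search toward promising moves can be illustrated with a simplified form of the PUCT selection rule used in the AlphaGo Zero family of programs. This is a minimal sketch: the `Edge` class, the move labels and the value of `c_puct` are assumptions made for illustration, not DeepMind's implementation or settings.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    prior: float            # move probability suggested by the network
    visits: int = 0         # how often the search has tried this move
    value_sum: float = 0.0  # sum of evaluations backed up through this move

    def mean_value(self) -> float:
        """Average evaluation so far, or 0.0 for an unvisited move."""
        return self.value_sum / self.visits if self.visits else 0.0


def select_move(edges: Dict[str, Edge], c_puct: float = 1.5) -> str:
    """Pick the move with the highest mean value plus exploration bonus."""
    total_visits = sum(edge.visits for edge in edges.values())

    def score(edge: Edge) -> float:
        # The bonus grows with the prior and shrinks as the move is visited.
        bonus = c_puct * edge.prior * math.sqrt(total_visits) / (1 + edge.visits)
        return edge.mean_value() + bonus

    return max(edges, key=lambda move: score(edges[move]))
```

Because the bonus is scaled by the prior, a move the network considers implausible receives very little search effort unless its results turn out to be good, which is one way a search can remain effective while examining far fewer positions.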

Training

AlphaZero was trained solely via self-play, using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks. In parallel, the in-training AlphaZero was periodically matched against its benchmark (Stockfish, elmo, or AlphaGo Zero) in brief one-second-per-move games to determine how well the training was progressing. DeepMind judged that AlphaZero's performance exceeded the benchmark after around four hours of training for Stockfish, two hours for elmo, and eight hours for AlphaGo Zero.[1]

Preliminary results

Outcome

Chess

In AlphaZero's chess tournament against Stockfish 8 (2016 TCEC world champion), each program was given one minute's worth of thinking time per move. Stockfish was allocated 64 threads and a hash size of 1 GB,[1] a setting that Stockfish's Tord Romstad later criticized as suboptimal.[5][note 1] AlphaZero was trained on chess for a total of nine hours before the tournament. During the tournament, AlphaZero ran on a single machine with four application-specific TPUs. In 100 games from the normal starting position, AlphaZero won 25 games as White, won 3 as Black, and drew the remaining 72.[7] In a series of twelve 100-game matches (of unspecified time or resource constraints) against Stockfish starting from the 12 most popular human openings, AlphaZero won 290, drew 886 and lost 24.[1]

Shogi

AlphaZero was trained on shogi for a total of two hours before the tournament. In 100 shogi games against elmo (World Computer Shogi Championship 27 summer 2017 tournament version with YaneuraOu 4.73 search), AlphaZero won 90 games, lost 8 and drew 2.[7] As in the chess games, each program got one minute per move, and elmo was given 64 threads and a hash size of 1 GB.[1]

Go

After 34 hours of self-play training on Go, AlphaZero played against AlphaGo Zero, winning 60 games and losing 40.[1][7]

Analysis

DeepMind stated in its preprint that "The game of chess represented the pinnacle of AI research over several decades. State-of-the-art programs are based on powerful engines that search many millions of positions, leveraging handcrafted domain expertise and sophisticated domain adaptations. AlphaZero is a generic reinforcement learning algorithm – originally devised for the game of go – that achieved superior results within a few hours, searching a thousand times fewer positions, given no domain knowledge except the rules."[1] DeepMind's Demis Hassabis, a chess player himself, called AlphaZero's play style "alien": It sometimes wins by offering counterintuitive sacrifices, like offering up a queen and bishop to exploit a positional advantage. "It's like chess from another dimension."[8]

Given the difficulty in chess of forcing a win against a strong opponent, the result of 28 wins, 72 draws and no losses is a significant margin of victory. However, some grandmasters, such as Hikaru Nakamura and Komodo developer Larry Kaufman, downplayed AlphaZero's victory, arguing that the match would have been closer if the programs had had access to an opening database (since Stockfish was optimized for that scenario).[9] Romstad additionally pointed out that Stockfish is not optimized for rigidly fixed-time moves and that the version used was a year old.[5][10]
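
To put the 28-win, 72-draw result in rating terms, a match score can be converted to an approximate Elo difference with the standard logistic Elo formula. The sketch below is a back-of-the-envelope illustration; the function names are hypothetical and the conversion is not a figure reported by DeepMind.

```python
import math


def score_fraction(wins: int, draws: int, losses: int) -> float:
    """Fraction of available points scored, counting a draw as half a point."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_difference(score: float) -> float:
    """Rating gap implied by an expected score under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400 * math.log10(score / (1 - score))


score = score_fraction(wins=28, draws=72, losses=0)  # 0.64
gap = elo_difference(score)                          # roughly 100 rating points
```

A 64% score therefore corresponds to a gap of about 100 Elo points under this model, although an estimate from a single 100-game match carries considerable statistical uncertainty.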

Similarly, some shogi observers argued that the elmo hash size was too low, that the resignation settings and the "EnteringKingRule" settings (cf. shogi § Entering King) may have been inappropriate, and that elmo is already obsolete compared with newer programs.[11][12]

Reaction and criticism

Newspaper headlines emphasized that the chess training took only four hours: "It was managed in little more than the time between breakfast and lunch."[2][13] Wired hyped AlphaZero as "the first multi-skilled AI board-game champ".[14] AI expert Joanna Bryson noted that Google's "knack for good publicity" was putting it in a strong position against challengers. "It's not only about hiring the best programmers. It's also very political, as it helps make Google as strong as possible when negotiating with governments and regulators looking at the AI sector."[7]

Human chess grandmasters were very impressed by AlphaZero. Danish grandmaster Peter Heine Nielsen likened AlphaZero's play to that of a superior alien species.[7] Norwegian grandmaster Jon Ludvig Hammer characterized AlphaZero's play as "insane attacking chess" with profound positional understanding.[2] Former champion Garry Kasparov said "It's a remarkable achievement, even if we should have expected it after AlphaGo."[9][15]

Grandmaster Hikaru Nakamura was less impressed, and stated "I don't necessarily put a lot of credibility in the results simply because my understanding is that AlphaZero is basically using the Google supercomputer and Stockfish doesn't run on that hardware; Stockfish was basically running on what would be my laptop. If you wanna have a match that's comparable you have to have Stockfish running on a supercomputer as well."[6]

Top US correspondence chess player Wolff Morrow was also unimpressed, claiming that AlphaZero would probably not make the semifinals of a fair competition such as TCEC where all engines play on equal hardware. Morrow further stated that although he might not be able to beat AlphaZero if AlphaZero played drawish openings such as the Petroff Defence, AlphaZero would not be able to beat him in a correspondence chess game either.[16]

Motohiro Isozaki, the author of YaneuraOu, noted that although AlphaZero did comprehensively beat elmo, AlphaZero's rating in shogi stopped growing at a point at most 100 to 200 points higher than elmo's. In his view this gap is not large, and elmo and other shogi software should be able to catch up in 1–2 years.[17]

Final results

DeepMind addressed many of the criticisms in their final version of the paper, published in December 2018 in Science.[4] They further clarified that AlphaZero was not running on a supercomputer; it was trained using 5,000 tensor processing units (TPUs), but only ran on four TPUs and a 44-core CPU in its matches.[18]

Chess

In the final results, Stockfish ran under the same conditions as in the TCEC superfinal: 44 CPU cores, Syzygy endgame tablebases, and a 32 GB hash size. Instead of a fixed time control of one minute per move, both engines were given 3 hours plus 15 seconds per move to finish the game. The version of Stockfish used was Stockfish 8. AlphaZero won with a score of 155 wins to 6 losses, with the rest drawn. DeepMind also played a series of games using the TCEC opening positions. AlphaZero won 95 out of the 100 mini-matches from these positions.
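
The practical difference between the two time controls can be seen by totalling each engine's thinking time. The sketch below reads "3 hours plus 15 seconds per move" as a base allowance plus a per-move increment, which is an assumption about the format; the 60-move game length is purely hypothetical.

```python
def fixed_per_move_seconds(moves: int, seconds_per_move: int = 60) -> int:
    """Total thinking time when every move gets the same fixed allowance."""
    return moves * seconds_per_move


def base_plus_increment_seconds(
    moves: int, base_seconds: int = 3 * 3600, increment_seconds: int = 15
) -> int:
    """Total thinking time available with a base clock plus a per-move increment."""
    return base_seconds + increment_seconds * moves


moves = 60  # hypothetical game length, chosen only for illustration
preliminary = fixed_per_move_seconds(moves)      # 3,600 s: exactly one hour
final = base_plus_increment_seconds(moves)       # 11,700 s: 3 hours 15 minutes
```

Beyond the larger total, a base-plus-increment clock lets an engine decide how much of its budget to spend on each move, so its time-management heuristics come into play; a fixed allowance per move removes that choice.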

Shogi

As with Stockfish, elmo ran under the same conditions as in the 2017 CSA championship. The version of elmo used was WCSC27 in combination with YaneuraOu 2017 Early KPPT 4.79 64AVX2 TOURNAMENT. Elmo operated on the same hardware as Stockfish: 44 CPU cores and a 32 GB hash size. AlphaZero won 98.2% of games when playing black (which plays first in shogi) and 91.2% overall.

Reactions and criticisms

Human grandmasters were generally impressed with AlphaZero's games against Stockfish.[19] Former world champion Garry Kasparov said it was a pleasure to watch AlphaZero play, especially since it plays the same open and dynamic style as him.[20][21]

Reactions from the computer chess community were more muted.[citation needed] Komodo developer Mark Lefler called it a "pretty amazing achievement", but also pointed out that the data is old, since Stockfish has gained a lot in strength in the months since January 2018 (when Stockfish 8 was released). Fellow developer Larry Kaufman went further and claimed that AlphaZero would probably lose a match against the latest version of Stockfish, Stockfish 10, under TCEC conditions. Kaufman argued that the only advantage of neural network–based engines was that they used a GPU, so if one doesn't care about power consumption (e.g. in an equal-hardware contest where both engines have access to the same CPU and GPU) then anything the GPU achieves is "free". Based on this, he stated that the strongest engine is likely to be a hybrid that utilizes both neural networks and standard alpha–beta search.[22]

Notes

  1. ^ Stockfish developer Tord Romstad responded with

    The match results by themselves are not particularly meaningful because of the rather strange choice of time controls and Stockfish parameter settings: The games were played at a fixed time of 1 minute/move, which means that Stockfish has no use of its time management heuristics (lot of effort has been put into making Stockfish identify critical points in the game and decide when to spend some extra time on a move; at a fixed time per move, the strength will suffer significantly). The version of Stockfish used is one year old, was playing with far more search threads than has ever received any significant amount of testing, and had way too small hash tables for the number of threads. I believe the percentage of draws would have been much higher in a match with more normal conditions.[6]

References

  1. ^ a b c d e f g h i j Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis (5 December 2017). "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm". arXiv:1712.01815 [cs.AI].
  2. ^ a b c Knapton, Sarah; Watson, Leon (6 December 2017). "Entire human chess knowledge learned and surpassed by DeepMind's AlphaZero in four hours". Telegraph.co.uk. Retrieved 6 December 2017.
  3. ^ Vincent, James (6 December 2017). "DeepMind's AI became a superhuman chess player in a few hours, just for fun". The Verge. Retrieved 6 December 2017.
  4. ^ a b Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis (7 December 2018). "A general reinforcement learning algorithm that masters chess, shogi, and go through self-play". Science. 362 (6419): 1140–1144. doi:10.1126/science.aar6404. Retrieved 7 December 2018.
  5. ^ a b "AlphaZero: Reactions From Top GMs, Stockfish Author". chess.com. 8 December 2017. Retrieved 9 December 2017.
  6. ^ a b "AlphaZero: Reactions From Top GMs, Stockfish Author". chess.com. 8 December 2017. Retrieved 13 December 2017.
  7. ^ a b c d e "'Superhuman' Google AI claims chess crown". BBC News. 6 December 2017. Retrieved 7 December 2017.
  8. ^ Knight, Will (8 December 2017). "Alpha Zero's "Alien" Chess Shows the Power, and the Peculiarity, of AI". MIT Technology Review. Retrieved 11 December 2017.
  9. ^ a b "Google's AlphaZero Destroys Stockfish In 100-Game Match". Chess.com. Retrieved 7 December 2017.
  10. ^ Katyanna Quach. "DeepMind's AlphaZero AI clobbered rival chess app on non-level playing...board". The Register (December 14, 2017).
  11. ^ "Some concerns on the matching conditions between AlphaZero and Shogi engine". コンピュータ将棋 レーティング. "uuunuuun" (a blogger who rates free shogi engines). Retrieved 9 December 2017. (via "瀧澤 誠@elmo (@mktakizawa) | Twitter". mktakizawa (elmo developer). 9 December 2017. Retrieved 11 December 2017.)
  12. ^ "DeepMind社がやねうら王に注目し始めたようです". The developer of YaneuraOu, a search component used by elmo. 7 December 2017. Retrieved 9 December 2017.
  13. ^ Badshah, Nadeem (7 December 2017). "Google's DeepMind robot becomes world-beating chess grandmaster in four hours". The Times of London. Retrieved 7 December 2017.
  14. ^ "Alphabet's Latest AI Show Pony Has More Than One Trick". WIRED. 6 December 2017. Retrieved 7 December 2017.
  15. ^ Gibbs, Samuel (7 December 2017). "AlphaZero AI beats champion chess program after teaching itself in four hours". The Guardian. Retrieved 8 December 2017.
  16. ^ "Talking modern correspondence chess". Chessbase. 26 June 2018. Retrieved 11 July 2018.
  17. ^ http://yaneuraou.yaneu.com/2017/12/07/deepmind%E7%A4%BE%E3%81%8C%E3%82%84%E3%81%AD%E3%81%86%E3%82%89%E7%8E%8B%E3%81%AB%E6%B3%A8%E7%9B%AE%E3%81%97%E5%A7%8B%E3%82%81%E3%81%9F%E3%82%88%E3%81%86%E3%81%A7%E3%81%99/
  18. ^ As given in the Science paper, a TPU is "roughly similar in inference speed to a Titan V GPU, although the architectures are not directly comparable" (Ref. 24).
  19. ^ "AlphaZero Crushes Stockfish In New 1,000-Game Match". Chess.com. 6 December 2018.
  20. ^ Sean Ingle (11 December 2018). "'Creative' AlphaZero leads way for chess computers and, maybe, science". The Guardian.
  21. ^ Albert Silver (7 December 2018). "Inside the (deep) mind of AlphaZero". Chessbase.
  22. ^ "Komodo MCTS (Monte Carlo Tree Search) is the new star of TCEC". Chessdom. 18 December 2018.