State–action–reward–state–action: Revision history

m = minor edit, → = section edit, ← = automatic edit summary

14:25, 19 July 2024 SpiralSource m 5,765 bytes +10 linked quintuple -> tuple Tag: 2017 wikitext editor

20:59, 1 July 2024 2003:e6:9f24:c8f9:196c:1841:fe87:bb84 5,755 bytes −8 No edit summary

07:35, 1 June 2024 Magioladitis m 5,763 bytes −1 Moved punctuation mark to correct place + other fixes, References after punctuation per WP:CITEFOOT and WP:PAIC Tag: AWB

06:54, 21 May 2024 LucasBrown 5,764 bytes +49 Adding short description: "Machine learning algorithm" Tag: Shortdesc helper

09:23, 13 December 2023 84.92.101.125 5,715 bytes −3 →Algorithm

13:38, 5 December 2023 C8uyPqgR 5,718 bytes +15 Changed from the notation used in the first edition of Richard Sutton's 'Reinforcement Learning' to that of the second edition

05:08, 27 November 2023 C8uyPqgR 5,703 bytes +2 No edit summary

06:33, 9 August 2023 68.5.88.55 5,701 bytes +1 fix init value section

03:32, 17 May 2023 金色黎明 5,700 bytes +74 →Discount factor (gamma): fix bare link

13:23, 30 December 2022 Thatsme314 m 5,626 bytes 0 →See also: lowercase

17:56, 1 November 2022 Scyllagist m 5,626 bytes 0 to --> on Tag: 2017 wikitext editor

14:35, 31 October 2022 129.26.135.254 5,626 bytes −335 the footnote is in [1], the current ref [2] has nothing on footnotes and such. Tag: references removed

23:18, 3 July 2022 Marcocnl88 m 5,961 bytes −7 Removed the word "simply", words like this add little value and most often the opposite is true (i.e. not simple at all).

15:21, 2 July 2022 SamL 199917 5,968 bytes +14 easier to understand algorithm

16:27, 4 May 2022 Niplav 5,954 bytes +78 No edit summary Tag: Visual edit

16:42, 5 December 2021 Mruanova m 5,876 bytes +451 Analytics India Magazine Tag: Visual edit

15:47, 29 September 2021 Fwagen m 5,425 bytes +23 →Discount factor (gamma): added (Retrieved 2021-09-29) to ref.
15:45, 29 September 2021 Fwagen m 5,402 bytes +132 →Discount factor (gamma): add the term myopic with a ref.

05:14, 12 July 2021 Hooman Mallahzadeh m 5,270 bytes +19 Collapsing sidebar.

05:59, 6 May 2021 OAbot m 5,251 bytes +16 Open access bot: doi added to citation with #oabot.

00:47, 4 March 2021 2001:56a:f99b:7700:d581:6203:3e65:c752 5,235 bytes +22 No edit summary

09:27, 6 January 2021 80.114.172.70 5,213 bytes −2 →Algorithm

21:04, 5 December 2020 Monkbot m 5,215 bytes −5 Task 18 (cosmetic): eval 3 templates: del empty params (1×); Tag: AWB

04:43, 21 September 2020 Citation bot 5,220 bytes +16 Add: s2cid, author pars. 1-1. Removed parameters. Some additions/deletions were actually parameter name changes. | You can use this bot yourself. Report bugs here. | Suggested by Abductive | Category:Machine learning algorithms | via #UCB_Category

04:05, 7 July 2020 Eudamonic 5,204 bytes +30 added Differentiable computing navbox

03:53, 3 July 2020 203.177.172.11 5,174 bytes +2 →Algorithm

04:46, 3 May 2020 Vthierry 5,172 bytes +6 →Algorithm

10:18, 6 February 2020 Pxenviq 5,166 bytes −2 Undid revision 939416119 by Pxenviq (talk) Tag: Undo
10:07, 6 February 2020 Pxenviq 5,168 bytes +2 No edit summary

22:44, 3 December 2019 Citation bot m 5,166 bytes +75 Add: url. | You can use this bot yourself. Report bugs here.| Activated by User:Nemo bis | via #UCB_webform

21:17, 26 November 2019 Diageo11 m 5,091 bytes +1 typo Tag: Visual edit

09:43, 17 October 2019 CjF 5,090 bytes +270 No edit summary

22:41, 10 July 2019 94.212.245.18 4,820 bytes −22 Remove link to Wikiversary (not working, actual article is incomplete)

07:07, 5 July 2019 Citation bot m 4,842 bytes −8 Removed URL that duplicated unique identifier. | You can use this bot yourself. Report bugs here.| Activated by User:Marianne Zimmerman

12:11, 26 February 2019 The Anome 4,850 bytes 0 fmt Tag: Visual edit

13:05, 15 February 2019 Justin Ormont m 4,850 bytes +8 linked learning rate

13:47, 9 November 2018 Headbomb m 4,842 bytes −1 ce

02:08, 10 July 2018 Josvebot m 4,843 bytes +48 Bot: fixing WP:CHECKWIKI error #37 (no DEFAULTSORT for article with special character)

14:46, 11 March 2018 Bderrett m 4,795 bytes −4 Fix typo Tag: Visual edit

21:39, 9 March 2018 Bderrett m 4,799 bytes −74 Clarify that Q-learning attempts to compute the state-action value function of the optimal policy. Tag: Visual edit

22:58, 28 February 2018 Lfstevens 4,873 bytes +25 →top: ce, ref cleanup Tag: Visual edit

19:32, 6 February 2018 Bomberzocker 4,848 bytes +9 cited wrong chapter, fixed formula special characters. Sorry.
19:27, 6 February 2018 Bomberzocker 4,839 bytes −27 old url returned http 404 error, fixed this. Changing formula to t+1 again based on updated source. Norvig & Russel may have an error in their book. Needs further research.
19:09, 6 February 2018 89.3.238.121 4,866 bytes +101 No edit summary

14:39, 3 February 2018 128.119.241.213 4,765 bytes 0 Changed SARSA to Sarsa. See, for example, the current drafts of the 2nd edition of Sutton and Barto's book: Reinforcement Learning an Introduction.

11:58, 1 February 2018 Bomberzocker 4,765 bytes −2 →Algorithm: equation had a mistake. Source: Stuart Russel & Peter Norvig: Artificial Intelligence: A Modern Approach

10:09, 29 October 2017 2.242.24.134 4,767 bytes +4 corrected the formular according to: http://incompleteideas.net/sutton/book/ebook/node64.html . r_t or r_{t+1} depends on whether the environment reacts instantaneaously or one time step later. it appears to make more sense to assume a temporal delay.

02:41, 20 September 2017 Tony1 4,763 bytes +8 No edit summary
02:40, 20 September 2017 Tony1 m 4,755 bytes 0 Tony1 moved page State-Action-Reward-State-Action to State–action–reward–state–action: Not a protocol; refers to a sequence.

14:25, 4 July 2017 Drbepp m 4,755 bytes −4 link to reference 2 (Sutton's book) updated, old link was broken
