Talk:Markov decision process

This article is within the scope of WikiProject Robotics, a collaborative effort to improve the coverage of Robotics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks. This article has been rated as C-class on Wikipedia's content assessment scale and as High-importance on the project's importance scale.

It would also be nice to have a section on Semi-Markov Decision Processes. (This extension to MDP is particularly important for intrinsically motivated RL and temporal abstraction in RL.) Npdoty 01:33, 24 April 2006 (UTC)

It would be nice to hear about partially observable MDPs as well! --Michael Stone 22:59, 23 May 2005 (UTC)

Not to mention a link to Markov chain! I've been meaning to expand this article, but I'm trying to decide how best to do it. Anything that can go in Markov chain should go there, and only stuff specific to decision processes should go here, but there will need to be some frequent cross-referencing. I think eventually POMDPs should get their own article as well, which should similarly avoid duplicating material in Hidden Markov model. --Delirium 06:15, 13 November 2005 (UTC)

What is γ

The constant γ is used but never defined. What is it?

Well, at the first usage it's described as "discounting factor γ (usually just under 1)", which pretty much defines it - do you think it needs to be more prominent than that?
This is now fixed in the article.
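For anyone else wondering, here is a minimal sketch, with made-up numbers of my own (not taken from the article), of what a discount factor "just under 1" buys: it keeps the infinite-horizon sum of rewards bounded, and for a constant reward r the sum converges to r/(1 − γ).

<syntaxhighlight lang="python">
# Illustrative sketch only; gamma and the reward stream are made up.
gamma = 0.95            # discount factor, "usually just under 1"
rewards = [1.0] * 200   # a constant reward stream, truncated for the demo

# Discounted return: sum over t of gamma**t * r_t
discounted_return = sum(gamma**t * r for t, r in enumerate(rewards))

# For a constant reward r, the infinite sum converges to r / (1 - gamma).
print(discounted_return, 1.0 / (1.0 - gamma))  # ~19.999 vs. 20.0
</syntaxhighlight>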

Invented by Howard?

"They were invented by Ronald A. Howard in 1960"

Is that right? Is "invent" the proper term? Also, weren't there works on MDPs (even if with other names) before 1960?

Stochastic games were already introduced in [1]. Since they are more general than MDPs, I would be surprised if MDPs were not used even earlier than that.
  1. Shapley, L. S.: "Stochastic Games". Proceedings of the National Academy of Sciences 39(10), pp. 1095–1100, 1953.

—The preceding unsigned comment was added by Svensandberg (talk · contribs) 13:31, 9 January 2007 (UTC).

"Invent" may not be the right word. However, Howard's book was very important. In E. V. Denardo's book "Dynamic Programming" he does mention Shapley (1953) but adds "a lovely book by Howard (1960) highlighted policy iteration and aroused great interest in this model". So that book set off a lot of subsequent research. And it is still a classic. Feel free to replace the word "invent" with another more appropriate... Encyclops 22:58, 9 January 2007 (UTC)[reply]
I rewrote that a bit and added a reference to Bellman 1957 (which I found in Howard's book). OK? Svensandberg 16:31, 17 January 2007 (UTC)

What is R(s)?

It's probably the immediate reward received after transitioning from state s, but this is not explained in the article currently. There's only R_a(s,s').

To keep the equations simple, you could change the definition to use R(s) and refer to the Minor Extensions section. —Preceding unsigned comment added by 80.221.23.134 (talk) 11:47, 10 November 2008 (UTC)
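As an aside, here is a hedged sketch in toy code (my own two-state example, not the article's full notation) of how the transition-dependent reward R_a(s,s') collapses to R(s,a) by averaging over the transition probabilities, and to R(s) once a policy is fixed:

<syntaxhighlight lang="python">
# Toy two-state MDP, invented purely for illustration.
P = {  # P[a][s][s'] = probability of moving from s to s' under action a
    'stay': {0: {0: 0.9, 1: 0.1}, 1: {0: 0.0, 1: 1.0}},
    'go':   {0: {0: 0.2, 1: 0.8}, 1: {0: 0.5, 1: 0.5}},
}
R = {  # R[a][s][s'] = immediate reward for the transition s -> s' under a
    'stay': {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 0.0}},
    'go':   {0: {0: 0.0, 1: 5.0}, 1: {0: 2.0, 1: 0.0}},
}

def expected_reward(s, a):
    """R(s, a) = sum over s' of P_a(s, s') * R_a(s, s')."""
    return sum(P[a][s][sp] * R[a][s][sp] for sp in P[a][s])

# For a fixed policy pi, R(s) = R(s, pi(s)); e.g. a policy that always plays 'go':
pi = {0: 'go', 1: 'go'}
R_s = {s: expected_reward(s, pi[s]) for s in (0, 1)}
print(R_s)  # {0: 4.0, 1: 1.0}
</syntaxhighlight>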

Finite state spaces?

Since P is not treated as a density, I assume A is a finite set, but this is not mentioned. Also, is S a finite set? Note that in Partially observable Markov decision process, both sets are said to be finite. It would be helpful to clarify this. I'm going to make the change, so please correct me if I'm wrong. --Rinconsoleao (talk) 15:12, 26 May 2009 (UTC)

Infinite state spaces

The technique for the discounted Markov decision process is valid for an infinite (denumerable) state space and other more general spaces. Shuroo (talk) 12:56, 5 June 2009 (UTC)

OK, I have added a clarifying note to that effect, but I have also added a 'citation needed' tag. Can you add a good reference book on this case? --Rinconsoleao (talk) 11:08, 6 June 2009 (UTC)
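In the meantime, a small sketch (my own toy example, not a substitute for a reference) of the property the comment relies on: with γ < 1 the discounted Bellman update is a contraction, which is what keeps the discounted technique well behaved; the code only shows the update on a tiny finite MDP.

<syntaxhighlight lang="python">
# Value iteration on a made-up two-state, two-action MDP (illustration only).
gamma = 0.9
states, actions = (0, 1), ('stay', 'go')
P = {('stay', 0): {0: 1.0}, ('stay', 1): {1: 1.0},   # P[(a, s)][s'] = prob.
     ('go', 0):   {1: 1.0}, ('go', 1):   {0: 1.0}}
R = {('stay', 0): 0.0, ('stay', 1): 1.0,             # R[(a, s)] = reward
     ('go', 0):   0.5, ('go', 1):   0.0}

V = {s: 0.0 for s in states}
for _ in range(200):  # the update V(s) <- max_a [ R(s,a) + gamma * E V(s') ]
    V = {s: max(R[(a, s)] + gamma * sum(p * V[sp] for sp, p in P[(a, s)].items())
                for a in actions)
         for s in states}
print(V)  # roughly {0: 9.5, 1: 10.0}
</syntaxhighlight>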

References

Here is the best book. It won the ORSA Lanchester Prize a few weeks ago: Puterman, M. L., Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York, 1994. Shuroo (talk) 18:24, 6 June 2009 (UTC)

Another attractive and easier-to-read book is: Ross, S. M., Introduction to Stochastic Dynamic Programming. Academic Press, 1983. Shuroo (talk) 07:15, 7 June 2009 (UTC)

Clarifying question (especially for mathematicians)

A lot of economists apply dynamic programming to stochastic models, without using the terminology 'Markov decision process'. Which of the following, if any, is true?

(1) 'Markov decision process' is a synonym for 'stochastic dynamic programming'
(2) 'Markov decision processes' is a subfield of 'stochastic dynamic programming'
(3) 'Stochastic dynamic programming' is a subfield of 'Markov decision processes'
(4) None of the above is precisely true

Answers, comments, debate, references appreciated. --Rinconsoleao (talk) 11:14, 6 June 2009 (UTC)

I guess MDP is used for a discrete (finite or denumerable) state space. The dynamic programming technique can also be used for a continuous state space (e.g. Euclidean space) if the Markov property holds. However, I am not aware of the broadest sufficient condition for its validity. In any case, it seems to me that (2) would be correct if you write 'infinite horizon dynamic programming'. Shuroo (talk) 18:38, 6 June 2009 (UTC)

I vote for (2). MDP is SDP restricted to discrete time and a discrete state space. Encyclops (talk) 21:48, 6 June 2009 (UTC)
In my own reading, MDPs are not limited to discrete time nor to a discrete state space. Indeed, in the Sutton & Barto book (Sutton, R. S. and Barto, A. G. Reinforcement Learning: An Introduction. The MIT Press, Cambridge, MA, 1998), it is clearly said that the discrete description is only chosen for convenience, since it avoids the use of probability densities and integrals. So IMHO (3) is the right choice. G.Dupont (talk) 21:00, 10 August 2010 (UTC)