Proximal policy optimization

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Cool-RR at 17:32, 13 May 2022 (Typo). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Proximal Policy Optimization (PPO) is a family of model-free reinforcement learning algorithms. PPO algorithms are policy gradient methods, which means that they search the space of policies rather than assigning values to state-action pairs.

PPO algorithms retain some of the benefits of trust region policy optimization (TRPO) algorithms, but they are simpler to implement, more general, and empirically have better sample complexity.[1]
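As an illustration of how PPO keeps updates close to the previous policy without TRPO's constrained optimization, the following is a minimal sketch of the clipped surrogate objective described by Schulman et al.[1] The function name `clipped_surrogate`, its parameter names, and the default `epsilon=0.2` are assumptions chosen for this example, and the inputs are plain Python lists rather than the tensors a real implementation would use. It is not a complete PPO algorithm: advantage estimation, the value function, and the optimizer step are all omitted.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantages, epsilon=0.2):
    """Return the mean clipped surrogate objective over a batch of samples.

    new_logp   -- log-probabilities of the taken actions under the current policy
    old_logp   -- log-probabilities of the same actions under the policy that
                  collected the data
    advantages -- advantage estimates for those actions (assumed precomputed)
    epsilon    -- clipping range (hypothetical default for this sketch)
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - epsilon, 1 + epsilon].
        clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With `epsilon=0.2`, a sample whose probability ratio is 2 and whose advantage is 1 contributes 1.2 rather than 2, so the objective gives no further incentive to move the policy once the ratio leaves the clipping range in the favorable direction. A training loop would maximize this quantity with respect to the policy parameters using a gradient-based optimizer.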

References

  1. Schulman, John; Wolski, Filip; Dhariwal, Prafulla; Radford, Alec; Klimov, Oleg (2017). "Proximal Policy Optimization Algorithms". arXiv:1707.06347.