Latest revision as of 00:44, 27 March 2026
Reinforcement Learning
Lecture 4: Model Free Prediction
Lecture 5: Model Free Control
Video URL: https://www.youtube.com/watch?v=0g4j2k_Ggc4&t=2466s
- On-policy vs. off-policy
- Policy Iteration: iterate these two steps
  - Policy evaluation
    - Evaluate the value function for the given policy π
  - Policy improvement
    - Update the policy from the transition: current state s, current action a, reward r, next state s', next action a' (hence the name Sarsa)
- Greedy policy improvement
- ε-Greedy policy improvement
  - With probability 1-ε, take the greedy action
  - With probability ε, take a random action
- GLIE: Greedy in the Limit with Infinite Exploration
  - If ε decays toward zero as 1/k at step k, ε-greedy is GLIE
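- Sarsa
  - One-step TD update: Q(s, a) is updated after every step, and the policy is improved ε-greedily from it
  - On-policy
  - Sarsa converges under the following conditions
    - GLIE sequence of policies
    - Robbins-Monro sequence of step sizes (Σ α_t = ∞ and Σ α_t² < ∞)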
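
The notes above can be tied together in a small sketch: ε-greedy action selection, a GLIE schedule ε = 1/k, and the one-step Sarsa update Q(s, a) ← Q(s, a) + α (r + γ Q(s', a') − Q(s, a)). Everything concrete here is an assumption made for illustration and is not from the lecture: the five-state chain environment, the function names, the −1 reward per move, and the choice α = 1/N(s, a) as one step-size sequence that satisfies the Robbins-Monro conditions.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = left, 1 = right

def step(state, action):
    """Toy environment (assumed for illustration): reward -1 per move."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, -1.0, done

def glie_epsilon(k):
    """Exploration rate for episode k (1-indexed); decays as 1/k."""
    return 1.0 / k

def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon pick a random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_row[a])

def sarsa_update(Q, s, a, r, s2, a2, alpha, gamma, done):
    """One-step Sarsa: move Q[s][a] toward r + gamma * Q[s2][a2]."""
    target = r if done else r + gamma * Q[s2][a2]
    Q[s][a] += alpha * (target - Q[s][a])

def run_sarsa(num_episodes=200, gamma=1.0, seed=0, max_steps=1000):
    rng = random.Random(seed)
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    visits = [[0 for _ in ACTIONS] for _ in range(N_STATES)]
    for k in range(1, num_episodes + 1):
        eps = glie_epsilon(k)
        s = 0
        a = epsilon_greedy(Q[s], eps, rng)
        for _ in range(max_steps):          # cap episode length for safety
            s2, r, done = step(s, a)
            # On-policy: the next action comes from the same epsilon-greedy policy.
            a2 = epsilon_greedy(Q[s2], eps, rng)
            visits[s][a] += 1
            alpha = 1.0 / visits[s][a]      # assumed Robbins-Monro step size
            sarsa_update(Q, s, a, r, s2, a2, alpha, gamma, done)
            if done:
                break
            s, a = s2, a2
    return Q
```

Calling `run_sarsa()` returns the learned table `Q[state][action]`. Because the same ε-greedy policy both selects a' and is the policy being evaluated, this is the on-policy case; an off-policy method would bootstrap from a different action than the one actually taken.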