Revision as of 07:14, 5 August 2017
Reinforcement Learning
Lecture 4: Model Free Prediction
Lecture 5: Model Free Control
Video URL: https://www.youtube.com/watch?v=0g4j2k_Ggc4&t=2466s
- On-policy vs. off-policy learning
- ε-Greedy
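The ε-greedy rule picks a uniformly random action with probability ε and the greedy (highest-Q) action otherwise. A minimal sketch, assuming a Q-table stored as a dict keyed by (state, action) pairs (the helper name is illustrative, not from the lecture):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """With probability epsilon explore (random action),
    otherwise exploit (action with the highest Q-value)."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```

With epsilon=0 this reduces to pure greedy selection; with epsilon=1 it is pure random exploration.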
- Policy Iteration: iterate these two steps
- Policy evaluation
- Evaluate the value function under the given policy π
- Policy Improvement
- Update the policy using the current state s, action a, and reward r, together with the next state s' and next action a' -> Sarsa
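The update above is the standard Sarsa backup, Q(s,a) ← Q(s,a) + α[r + γ·Q(s',a') − Q(s,a)]. A minimal sketch (the function name and default α, γ values are assumptions for illustration):

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """One Sarsa backup: move Q(s, a) toward the TD target
    r + gamma * Q(s', a')."""
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
```

Note the target uses the action a' actually selected by the current policy, which is what makes Sarsa on-policy.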
- Sarsa
- a one-step TD update of the action-value function
- on policy
- Sarsa converges under the following conditions:
- GLIE sequence of policies
- Robbins-Monro sequence of step sizes
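Both conditions can be satisfied with 1/k schedules: ε_k = 1/k makes the policy GLIE (greedy in the limit, yet every state-action pair is visited infinitely often), and α_k = 1/k satisfies the Robbins-Monro conditions Σα_k = ∞, Σα_k² < ∞. A minimal sketch on a toy two-step chain MDP (the environment and all names are illustrative assumptions, not from the lecture):

```python
import random

def step(s, a):
    """Toy deterministic chain 0 -> 1 -> 2; reaching state 2
    (terminal) yields reward +1, every other step yields 0."""
    s_next = s + 1
    reward = 1.0 if s_next == 2 else 0.0
    return s_next, reward, s_next == 2

def epsilon_greedy(Q, s, actions, eps):
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((s, a), 0.0))

def sarsa(episodes=500, gamma=0.9):
    Q = {}
    actions = ["right"]  # single action keeps the sketch tiny
    for k in range(1, episodes + 1):
        eps = 1.0 / k    # GLIE schedule: exploration decays to 0
        alpha = 1.0 / k  # Robbins-Monro step sizes
        s = 0
        a = epsilon_greedy(Q, s, actions, eps)
        done = False
        while not done:
            s_next, r, done = step(s, a)
            a_next = epsilon_greedy(Q, s_next, actions, eps)
            target = r + gamma * (0.0 if done else Q.get((s_next, a_next), 0.0))
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
            s, a = s_next, a_next
    return Q
```

On this chain the action values should approach Q(1) = 1 and Q(0) = γ·1 = 0.9 as the number of episodes grows.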