Revision as of 07:14, 5 August 2017
Reinforcement Learning
Lecture 4: Model Free Prediction
Lecture 5: Model Free Control
Video URL: https://www.youtube.com/watch?v=0g4j2k_Ggc4&t=2466s
- On-policy vs. off-policy learning
- ε-Greedy
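The ε-greedy rule picks a uniformly random action with probability ε and the greedy (highest-Q) action otherwise. A minimal sketch, assuming a Q-table stored as a dict keyed by (state, action) pairs (the helper name is illustrative, not from the lecture):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """With probability epsilon explore (random action),
    otherwise exploit (action with the highest Q-value)."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```

With epsilon=0 this reduces to pure greedy selection; with epsilon=1 it is pure random exploration.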
- Policy Iteration: iterate these two steps
- Policy evaluation
- Evaluate the value function under the given policy π
- Policy Improvement
- Update the policy using the current state s, action a, and reward r, together with the next state s' and next action a' -> Sarsa
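The update above is the standard Sarsa backup, Q(s,a) ← Q(s,a) + α[r + γ·Q(s',a') − Q(s,a)]. A minimal sketch (the function name and default α, γ values are assumptions for illustration):

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """One Sarsa backup: move Q(s, a) toward the TD target
    r + gamma * Q(s', a')."""
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
```

Note the target uses the action a' actually selected by the current policy, which is what makes Sarsa on-policy.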
- Sarsa
- a one-step TD update of the action-value function
- on policy
- Sarsa converges under the following conditions:
- GLIE sequence of policies
- Robbins-Monro sequence of step sizes
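Both conditions can be satisfied with 1/k schedules: ε_k = 1/k makes the policy GLIE (greedy in the limit, yet every state-action pair is visited infinitely often), and α_k = 1/k satisfies the Robbins-Monro conditions Σα_k = ∞, Σα_k² < ∞. A minimal sketch on a toy two-step chain MDP (the environment and all names are illustrative assumptions, not from the lecture):

```python
import random

def step(s, a):
    """Toy deterministic chain 0 -> 1 -> 2; reaching state 2
    (terminal) yields reward +1, every other step yields 0."""
    s_next = s + 1
    reward = 1.0 if s_next == 2 else 0.0
    return s_next, reward, s_next == 2

def epsilon_greedy(Q, s, actions, eps):
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((s, a), 0.0))

def sarsa(episodes=500, gamma=0.9):
    Q = {}
    actions = ["right"]  # single action keeps the sketch tiny
    for k in range(1, episodes + 1):
        eps = 1.0 / k    # GLIE schedule: exploration decays to 0
        alpha = 1.0 / k  # Robbins-Monro step sizes
        s = 0
        a = epsilon_greedy(Q, s, actions, eps)
        done = False
        while not done:
            s_next, r, done = step(s, a)
            a_next = epsilon_greedy(Q, s_next, actions, eps)
            target = r + gamma * (0.0 if done else Q.get((s_next, a_next), 0.0))
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
            s, a = s_next, a_next
    return Q
```

On this chain the action values should approach Q(1) = 1 and Q(0) = γ·1 = 0.9 as the number of episodes grows.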