Reinforcement Learning
Lecture 4: Model-Free Prediction
Lecture 5: Model-Free Control
Video URL: https://www.youtube.com/watch?v=0g4j2k_Ggc4&t=2466s
- On-policy vs. off-policy
- Policy Iteration: iterate the following two steps
  - Policy evaluation
    - Evaluate the value function for the given policy π
  - Policy Improvement
    - Update the policy using the current state s, current action a, reward r, next state s', and next action a' -> Sarsa
- Greedy policy improvement
- ε-Greedy policy improvement (see the sketch below)
- Sarsa
  - One-step TD update of the action-value function
  - On-policy
  - Sarsa converges under the following conditions (see the sketch below):
    - GLIE sequence of policies
    - Robbins-Monro sequence of step sizes
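
A minimal sketch of the one-step Sarsa update under the same tabular-Q assumption as above; the decaying schedules in the comments (ε_k = 1/k, α_k = 1/k) are standard examples of GLIE and Robbins-Monro schedules, not necessarily the lecture's exact choices.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """One-step Sarsa: Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    td_target = r + gamma * Q.get((s_next, a_next), 0.0)
    td_error = td_target - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error

# For convergence, decay exploration and step sizes over episodes k, e.g.:
#   epsilon_k = 1 / k   -> a GLIE schedule (greedy in the limit, with infinite exploration)
#   alpha_k   = 1 / k   -> satisfies the Robbins-Monro conditions
#                          (the sum of alpha_k diverges, the sum of alpha_k^2 is finite)
```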