머신러닝스터디/2017/Reinforcement Learning/: Difference between revisions

Latest revision as of 00:44, 27 March 2026

- Update the policy from current state s, current action a, current reward r to next state s', next action a' ->

sarsa

Greedy policy improvement
ε-Greedy policy improvement
- With probability 1-ε, take the greedy action
- With probability ε, take a random action
GLIE: Greedy in the Limit with Infinite Exploration
- If ε decays as 1/k at step k (fades out), the policy is GLIE
Sarsa
- one-step TD update of the policy?
- on-policy
- Sarsa converges under the following conditions
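
The ε-greedy rule and the 1/k decay noted above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `epsilon_greedy`, `glie_epsilon`, and the sample values in `q` are hypothetical and do not come from the lecture. It also assumes the random action is drawn uniformly over all actions, so the greedy action can still be picked during exploration.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        # Explore: uniform random action (assumption: may also hit the greedy one).
        return rng.randrange(len(q_values))
    # Exploit: index of the largest estimated action value.
    return max(range(len(q_values)), key=lambda a: q_values[a])

def glie_epsilon(k):
    """Epsilon schedule 1/k for step k >= 1; it decays toward zero."""
    return 1.0 / k

q = [0.1, 0.5, 0.2]  # hypothetical action-value estimates for a single state
actions = [epsilon_greedy(q, glie_epsilon(k)) for k in range(1, 6)]
```

With `epsilon` set to 0 the function is purely greedy, and with `epsilon` set to 1 it is purely random, which are the two extremes the schedule moves between.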

@@ Line 1: / Line 1: @@
 == Reinforcement Learning ==
+=== Lecture 4: Model Free Prediction ===
 === Lecture 5: Model Free Control ===
 Video URL: https://www.youtube.com/watch?v=0g4j2k_Ggc4&t=2466s
 * on policy vs off policy
-* ε-Greedy
+* Policy Iteration: iterate these two steps
+## Policy evaluation
+** Evaluate value function with given policy π
+## Policy Improvement
+** Update the policy from current state s, current action a, current reward r to next state s', next action a' ->
+sarsa
+* Greedy policy improvement
+* ε-Greedy policy improvement
+** With probability 1-ε, take the greedy action
+** With probability ε, take a random action
+* GLIE: Greedy in the Limit with Infinite Exploration
+** If ε decays as 1/k at step k (fades out), the policy is GLIE
 * Sarsa
 ** one-step TD update of the policy?
@@ Line 10: / Line 23: @@
 ## GLIE sequence of policies
 ## Robbins-Monro sequence of step sizes
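
To make the Sarsa notes concrete, here is a minimal tabular sketch of the one-step update built from (s, a, r, s', a'). It rests on several assumptions not stated in the notes: `sarsa_update`, `Q`, and `visits` are hypothetical names, the state and action labels are made up, and terminal states are not handled. The step size 1/N(s, a) is one common choice that satisfies the Robbins-Monro conditions (the step sizes sum to infinity while their squares sum to a finite value).

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma=1.0):
    """One Sarsa step: move Q(s, a) toward the target r + gamma * Q(s', a')."""
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)     # tabular action values, defaulting to 0.0
visits = defaultdict(int)  # visit counts N(s, a) used for the step size

# A single hypothetical transition (s, a, r, s', a').
s, a, r, s_next, a_next = "s0", 0, 1.0, "s1", 1
visits[(s, a)] += 1
alpha = 1.0 / visits[(s, a)]  # assumed Robbins-Monro style step size
sarsa_update(Q, s, a, r, s_next, a_next, alpha)
```

Because a' is the action actually chosen by the current policy in s', the update is on-policy. Pairing it with an ε-greedy policy whose ε decays as 1/k would give the GLIE sequence of policies listed above.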