머신러닝스터디/2017/Reinforcement Learning/

From ZeroWiki

Latest revision as of 00:44, 27 March 2026

Reinforcement Learning

Lecture 4: Model-Free Prediction

Lecture 5: Model-Free Control

Lecture video: https://www.youtube.com/watch?v=0g4j2k_Ggc4&t=2466s

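  • On-policy vs. off-policy learning: on-policy methods learn about the policy π they are currently following, while off-policy methods learn about a target policy π from experience generated by a different behavior policy μ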
  • Policy iteration: alternate between these two steps
    1. Policy evaluation
      • Evaluate the value function of the given policy π
    2. Policy improvement
      • Update the policy using the transition from current state s, current action a, and reward r to next state s' and next action a'; the tuple (s, a, r, s', a') is what gives Sarsa its name
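  • Greedy policy improvement: π'(s) = argmax_a Q(s, a). Acting greedily over the action-value function Q needs no model of the MDP, unlike greedy improvement over V(s)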
  • ε-greedy policy improvement
    • With probability 1-ε, choose the greedy action
    • With probability ε, choose an action uniformly at random
  • GLIE: Greedy in the Limit with Infinite Exploration
    • ε-greedy is GLIE if ε gradually decays (fades out) toward zero as ε_k = 1/k at step k
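  • Sarsa
    • One-step TD update of the action-value function: Q(S, A) ← Q(S, A) + α(R + γQ(S', A') - Q(S, A))
    • On-policy: the next action A' is chosen by the same ε-greedy policy that is being evaluated and improved
    • Sarsa converges to the optimal action-value function under the following conditions:
    1. GLIE sequence of policies π_t(a|s)
    2. Robbins-Monro sequence of step sizes α_t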
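The greedy, ε-greedy, and GLIE notes above can be sketched in Python as follows. This is a minimal illustration, not code from the lecture: the function names and the list-of-floats representation of one row Q(s, ·) are assumptions. The random branch may also land on the greedy action, so with m actions the greedy action is taken with total probability 1 - ε + ε/m.

```python
import random
from typing import Sequence


def greedy_action(q_row: Sequence[float]) -> int:
    """Greedy improvement: argmax_a Q(s, a); ties go to the lowest action index."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


def epsilon_greedy_action(q_row: Sequence[float], epsilon: float,
                          rng: random.Random) -> int:
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return greedy_action(q_row)


def glie_epsilon(k: int) -> float:
    """GLIE schedule: epsilon_k = 1/k for k = 1, 2, ... fades out toward 0."""
    if k < 1:
        raise ValueError("k starts at 1")
    return 1.0 / k


if __name__ == "__main__":
    rng = random.Random(0)
    q_row = [0.1, 0.5, 0.2]          # hypothetical Q(s, .) for 3 actions
    print(greedy_action(q_row))      # 1
    for k in (1, 10, 100):
        eps = glie_epsilon(k)
        print(k, eps, epsilon_greedy_action(q_row, eps, rng))
```

Passing an explicit `random.Random` instance keeps runs reproducible without touching the global random state.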
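To make the Sarsa notes above concrete, here is a small tabular Sarsa sketch. The environment is a hypothetical 5-state corridor invented for illustration (start at state 0, state 4 is terminal, reward -1 per move). The sketch uses ε_k = 1/k per episode as the GLIE schedule and α = 1/N(s, a) as step sizes, which satisfy the Robbins-Monro conditions for each state-action pair. The `max_steps` cap is an added safety assumption, not part of the algorithm.

```python
import random
from typing import List, Tuple

# Hypothetical toy environment (not from the lecture): a corridor of states
# 0..4, start at 0, state 4 is terminal, reward -1 per move.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Return (next_state, reward, done); moving left at state 0 stays put."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    return nxt, -1.0, nxt == N_STATES - 1


def eps_greedy(q_row: List[float], epsilon: float, rng: random.Random) -> int:
    """epsilon-greedy over Q(s, .); ties go to the lower action index."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_row[a])


def sarsa(episodes: int = 500, gamma: float = 1.0, seed: int = 0,
          max_steps: int = 1000) -> List[List[float]]:
    """Tabular Sarsa with epsilon_k = 1/k (GLIE) and alpha = 1/N(s, a)."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    visits = [[0 for _ in ACTIONS] for _ in range(N_STATES)]
    for k in range(1, episodes + 1):
        epsilon = 1.0 / k                      # GLIE: exploration fades out
        s = 0
        a = eps_greedy(q[s], epsilon, rng)
        for _ in range(max_steps):             # safety cap on episode length
            s2, r, done = step(s, a)
            # On-policy: A' is drawn from the same epsilon-greedy policy.
            a2 = eps_greedy(q[s2], epsilon, rng)
            # The terminal state has value 0, so the target there is just R.
            target = r if done else r + gamma * q[s2][a2]
            visits[s][a] += 1
            alpha = 1.0 / visits[s][a]         # Robbins-Monro step sizes
            q[s][a] += alpha * (target - q[s][a])
            if done:
                break
            s, a = s2, a2
    return q


if __name__ == "__main__":
    q = sarsa()
    greedy = [max(ACTIONS, key=lambda a: row[a]) for row in q[:-1]]
    print("greedy action per non-terminal state:", greedy)
```

With enough episodes the greedy actions should settle on 1 (move right) in every non-terminal state, though the exact Q values depend on the seed and episode count.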
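For reference, the two Sarsa convergence conditions listed above can be written out as follows, following the standard statement of the theorem. Here N_k(s, a) is the number of times the pair (s, a) has been visited after k episodes, and α_t is the step size at update t.

```latex
\text{GLIE:}\quad \lim_{k \to \infty} N_k(s, a) = \infty, \qquad
\lim_{k \to \infty} \pi_k(a \mid s) = \mathbf{1}\!\left( a = \arg\max_{a'} Q_k(s, a') \right)

\text{Robbins-Monro:}\quad \sum_{t=1}^{\infty} \alpha_t = \infty, \qquad
\sum_{t=1}^{\infty} \alpha_t^2 < \infty
```

For example, α_t = 1/t satisfies both Robbins-Monro sums (the harmonic series diverges, while the sum of 1/t² converges), whereas a constant step size violates the second one.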