• A Hybrid Learning Strategy for Discovery of Policies of Action



Pontifical Catholic University of Paraná – PUCPR
Curitiba – PR, Brazil
Graduate Program in Computer Science - PPGIA
A Hybrid Learning Strategy for
Discovery of Policies of Action
R. Ribeiro, A. L. Koerich and F. Enembreck
XVIII Brazilian Artificial Intelligence Symposium (SBIA 2006),
Ribeirão Preto, SP, Brazil, October 2006
Presentation Outline
Motivation & Challenge;
Goal;
Background:
– Adaptive Autonomous Agents;
– Reinforcement Learning;
– Q-Learning Algorithm;
– Policy Estimation Techniques based on Instance-Based Learning;
Contributions:
– Evaluation Methodology;
– Simulator;
– Hybrid Learning Method;
Experimental Results;
Conclusion & Future Work.
Motivation & Challenge
Learning Agents;
Discovery and Evaluation of Policies of Action;
Generic Evaluation Methodology;
Hybrid Learning Method.
Background
ADAPTIVE AUTONOMOUS AGENTS:
– Finding action policies autonomously;
– Incremental learning based on rewards/punishments;
REINFORCEMENT LEARNING:
– Learning through trial-and-error interactions with an
environment;
Q-LEARNING ALGORITHM:
– Convergence to an optimal policy when all states of the
environment are visited.
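A minimal sketch of tabular Q-learning in Python, to make the update rule concrete. The corridor environment, the reward values, and the names `q_learning`, `greedy_policy`, `alpha`, `gamma` and `epsilon` are illustrative assumptions, not taken from the presentation.

```python
import random

def q_learning(n_states=6, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor (goal = last state)."""
    rng = random.Random(seed)
    moves = (-1, +1)                      # action 0 = left, action 1 = right
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(goal)           # random non-goal start state
        while s != goal:
            # epsilon-greedy selection: explore sometimes, otherwise exploit
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = min(max(s + moves[a], 0), goal)
            r = 1.0 if s_next == goal else 0.0   # assumed reward scheme
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q

def greedy_policy(q):
    """Pick the highest-valued action in every state (ties go to action 1)."""
    return [0 if row[0] > row[1] else 1 for row in q]

q_table = q_learning()
policy = greedy_policy(q_table)
```

Random start states combined with epsilon-greedy exploration are one simple way to have every state visited, which is the condition the slide names for convergence.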
Reinforcement Learning
Foundations of Reinforcement Learning:
– Environment, action policies and reward.
[Figure: the Agent senses the state (s) of the Environment and performs an action (a); the Environment returns a reward or punishment (+/-), R(s,a).]
Example of learning
EXAMPLE (proposed problem):
(a) Setup of states (b) Without learning (c) Intermediate policies
(d) 1000 steps (e) 1500 steps (f) Optimal policy
Evaluation Methodology
EVALUATION METHODOLOGY:
– Different domains;
– Quality measures are often specific (kilometers,
money, force, energy, etc);
– Different ways of evaluating the same problem (n. of
steps, n. of changes of actions, processing time).
Contributions
Generic Evaluation Methodology of Policies of Action;
Simulator;
Hybrid Learning Method;
Experimental Results.
Evaluation Methodology
Algorithm EvaluatePolicy(P);
1 Initialize Correct=0, Wrong=0, CostP=0, CostA*=0;
2 For each s ∈ S:
    CostP = cost(s, s_goal, P);
    CostA* = cost(s, s_goal, PA*);
If CostP
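A minimal Python sketch of how such a comparison against an optimal reference could look. Several parts are assumptions rather than the authors' method: a state is counted as Correct when the policy's cost equals the optimal cost (the listing above does not show its comparison rule), the environment is an obstacle-free grid with unit step costs, and breadth-first search stands in for A* because it yields the same optimal cost in that setting. The names `optimal_costs`, `policy_cost`, `evaluate_policy` and `example_policy` are hypothetical.

```python
from collections import deque

def optimal_costs(n_rows, n_cols, goal):
    """Fewest steps from every cell to the goal (BFS stand-in for A*)."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            inside = 0 <= nxt[0] < n_rows and 0 <= nxt[1] < n_cols
            if inside and nxt not in dist:
                dist[nxt] = dist[(r, c)] + 1
                queue.append(nxt)
    return dist

def policy_cost(policy, start, goal, max_steps):
    """Steps taken when following the policy; None if the goal is not reached."""
    state, steps = start, 0
    while state != goal:
        if steps >= max_steps or state not in policy:
            return None
        dr, dc = policy[state]
        state = (state[0] + dr, state[1] + dc)
        steps += 1
    return steps

def evaluate_policy(policy, n_rows, n_cols, goal):
    """Count states where the policy matches the optimal cost (assumed rule)."""
    best = optimal_costs(n_rows, n_cols, goal)
    correct = wrong = 0
    for state, cost_opt in best.items():
        if state == goal:
            continue
        cost_p = policy_cost(policy, state, goal, max_steps=n_rows * n_cols)
        if cost_p == cost_opt:   # assumption: equality with the optimum is Correct
            correct += 1
        else:
            wrong += 1
    return correct, wrong

# Hypothetical policy on a 2x3 grid, goal in the top-right corner.
example_policy = {
    (0, 0): (0, 1), (0, 1): (0, 1),                    # top row: move right
    (1, 0): (-1, 0), (1, 1): (0, -1), (1, 2): (-1, 0)  # (1, 1) takes a detour
}
result = evaluate_policy(example_policy, 2, 3, goal=(0, 2))
```

In this example only cell (1, 1) follows a longer path than necessary, so it is the single state counted as Wrong.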

