A Hybrid Learning Strategy for Discovery of Policies of Action
Pontifical Catholic University of Paraná – PUCPR
Curitiba – PR, Brazil
Graduate Program in Computer Science - PPGIA
R. Ribeiro, A. L. Koerich and F. Enembreck
XVIII Brazilian Artificial Intelligence Symposium (SBIA 2006),
Ribeirão Preto, SP, Brazil, October 2006
Motivation & Challenge;
– Adaptive Autonomous Agents;
– Reinforcement Learning;
– Q-Learning Algorithm;
– Policy Estimation Techniques based on Instance-Based Learning;
– Evaluation Methodology;
– Hybrid Learning Method;
Conclusion & Future Work.
Motivation & Challenge
Discovery and Evaluation of Policies of Action;
Generic Evaluation Methodology;
Hybrid Learning Method.
ADAPTIVE AUTONOMOUS AGENTS:
– Finding action policies autonomously;
– Incremental learning based on rewards/punishments;
– Learning through trial-and-error interactions with an environment;
– Convergence to an optimal policy by visiting all states of the environment.
Foundations of Reinforcement Learning:
– Environment, action policies and reward.
[Figure: reinforcement learning diagram. Surviving labels: Sensing (s), Rewards/]
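The interplay of environment, action policy and reward can be illustrated with a minimal sketch of tabular Q-learning, the algorithm named in the outline. Everything concrete here is an assumption made for illustration and does not come from the slides: the 1-D corridor environment, the reward values (+1 at the goal, -0.1 per step), the parameters `alpha`, `gamma` and `epsilon`, and the function names `q_learning` and `greedy_policy`.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1 and the goal is the last state.
    Action 0 moves left, action 1 moves right (assumed toy setup).
    """
    rng = random.Random(seed)
    moves = (-1, +1)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action]

    for _ in range(episodes):
        s = rng.randrange(goal)  # start anywhere except the goal
        while s != goal:
            # Epsilon-greedy: explore occasionally, otherwise exploit.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[s][i])
            # Environment: walls clamp the move to the corridor.
            s_next = min(max(s + moves[a], 0), goal)
            reward = 1.0 if s_next == goal else -0.1
            # Q-learning update toward reward plus discounted best next value.
            q[s][a] += alpha * (reward + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    return q

def greedy_policy(q):
    """Action policy that picks the highest-valued action in each state."""
    return [max(range(len(row)), key=lambda i: row[i]) for row in q]

policy = greedy_policy(q_learning())
print(policy[:-1])  # learned actions for the non-goal states
```

In this toy corridor the learned policy for every non-goal state is "move right", which is the shortest route to the goal; the entry for the goal state itself is never updated and carries no meaning.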
Example of learning
EXAMPLE (proposed problem):
[Figure panels: (a) Setup of states; (b) Without learning; (c) Intermediate policies; (d) 1000 steps; (e) 1500 steps; (f) Optimal policy]
– Different domains;
– Quality measures are often domain-specific (kilometers, money, force, energy, etc.);
– Different ways of evaluating the same problem (number of steps, number of action changes, processing time).
Generic Evaluation Methodology of Policies of Action;
Hybrid Learning Method;
1 Initialize Correct = 0, Wrong = 0, CostP = 0, CostA* = 0;
2 For each s ∈ S:
    CostP = cost(s, s_goal, P);
    CostA* = cost(s, s_goal, PA*);
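The steps above compare, for each state, the cost of following the learned policy P with the cost of the A* policy PA*. The remaining steps of the procedure are not part of this excerpt, so the sketch below fills them with an assumption: a state counts as Correct when CostP equals CostA*, and as Wrong otherwise. The grid world, the unit step cost, and the names `step`, `policy_cost`, `astar_cost` and `evaluate` are also hypothetical.

```python
import heapq

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action, rows, cols):
    """Apply an action on the grid; moving into a wall leaves the state unchanged."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    return (nr, nc) if 0 <= nr < rows and 0 <= nc < cols else state

def policy_cost(start, goal, policy, rows, cols):
    """CostP: steps needed when following `policy` (inf if it never arrives)."""
    state, steps = start, 0
    while state != goal:
        if steps >= rows * cols:  # longer than any loop-free path: cycling
            return float("inf")
        state = step(state, policy[state], rows, cols)
        steps += 1
    return steps

def astar_cost(start, goal, rows, cols):
    """CostA*: shortest path length found by A* with a Manhattan heuristic."""
    def h(s):
        return abs(s[0] - goal[0]) + abs(s[1] - goal[1])
    frontier = [(h(start), 0, start)]
    best = {start: 0}
    while frontier:
        _, g, state = heapq.heappop(frontier)
        if state == goal:
            return g
        if g > best[state]:
            continue  # stale queue entry
        for action in MOVES:
            nxt = step(state, action, rows, cols)
            if g + 1 < best.get(nxt, float("inf")):
                best[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
    return float("inf")

def evaluate(policy, goal, rows, cols):
    correct = wrong = 0
    for r in range(rows):
        for c in range(cols):  # for each s in S
            s = (r, c)
            cost_p = policy_cost(s, goal, policy, rows, cols)
            cost_a = astar_cost(s, goal, rows, cols)
            # ASSUMPTION: this counting rule is not in the source excerpt.
            if cost_p == cost_a:
                correct += 1
            else:
                wrong += 1
    return correct, wrong

# Hypothetical 2x3 grid with the goal in the top-right corner.
policy = {(0, 0): "right", (0, 1): "right", (0, 2): "right",
          (1, 0): "left", (1, 1): "up", (1, 2): "up"}
print(evaluate(policy, (0, 2), 2, 3))  # -> (5, 1)
```

In this example the policy is optimal everywhere except state (1, 0), where "left" runs into a wall forever; that state is counted as Wrong. Because the measure only compares costs against a reference policy, it does not depend on domain-specific units.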