Arxiv on Feb. 14th


Title: Efficient Exploration through Bayesian Deep Q-Networks
Authors: Kamyar Azizzadenesheli and Emma Brunskill and Animashree Anandkumar
Categories: cs.AI cs.LG stat.ML

We propose Bayesian Deep Q-Network (BDQN), a practical Thompson-sampling-based
reinforcement learning (RL) algorithm. Thompson sampling allows for
targeted exploration in high dimensions through posterior sampling but is
usually computationally expensive. We address this limitation by introducing
uncertainty only at the output layer of the network through a Bayesian Linear
Regression (BLR) model. This layer can be trained with fast closed-form updates
and its samples can be drawn efficiently through the Gaussian distribution. We
apply our method to a wide range of Atari games in the Arcade Learning
Environment. Since BDQN carries out more efficient exploration, it is able to
reach higher rewards substantially faster than a key baseline, the double deep
Q network (DDQN).
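
A minimal sketch of the idea in the abstract above, not the authors' implementation: one Bayesian linear regression per action over fixed last-layer features, with Thompson sampling done by drawing one weight vector per action. The function names, the zero-mean isotropic prior, the noise variance, and the random stand-in features are all assumptions made for illustration.

```python
import numpy as np

def blr_posterior(features, targets, noise_var=1.0, prior_var=1.0):
    """Closed-form Gaussian posterior over weights w, assuming
    targets ~ N(features @ w, noise_var) and a prior w ~ N(0, prior_var * I)."""
    dim = features.shape[1]
    precision = features.T @ features / noise_var + np.eye(dim) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ features.T @ targets / noise_var
    return mean, cov

def thompson_action(state_features, posteriors, rng):
    """Sample one weight vector per action, then act greedily on the sampled Q-values."""
    sampled_q = [rng.multivariate_normal(mean, cov) @ state_features
                 for mean, cov in posteriors]
    return int(np.argmax(sampled_q))

rng = np.random.default_rng(0)
# Stand-in for the network's penultimate-layer outputs (hypothetical data).
features = rng.normal(size=(200, 4))
# Hypothetical "true" output weights for two actions.
true_weights = [np.array([1.0, 0.0, 0.0, 0.0]), np.zeros(4)]
posteriors = [
    blr_posterior(features, features @ w + 0.1 * rng.normal(size=200))
    for w in true_weights
]
action = thompson_action(np.array([1.0, 0.0, 0.0, 0.0]), posteriors, rng)
```

Because only the output layer is treated as uncertain, each posterior update is a small matrix inversion and each exploration step is a single Gaussian draw per action.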

Title: Diversity-Driven Exploration Strategy for Deep Reinforcement Learning
Authors: Zhang-Wei Hong, Tzu-Yun Shann, Shih-Yang Su, Yi-Hsiang Chang, Chun-Yi
Categories: cs.AI stat.ML

Efficient exploration remains a challenging research problem in reinforcement
learning, especially when an environment contains large state spaces, deceptive
local optima, or sparse rewards. To tackle this problem, we present a
diversity-driven approach for exploration, which can be easily combined with
both off- and on-policy reinforcement learning algorithms. We show that by
simply adding a distance measure to the loss function, the proposed methodology
significantly enhances an agent’s exploratory behaviors, thus preventing
the policy from being trapped in local optima. We further propose an adaptive
scaling method for stabilizing the learning process. Our experimental results
in Atari 2600 show that our method outperforms baseline approaches in several
tasks in terms of mean scores and exploration efficiency.
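
A rough sketch of "adding a distance measure to the loss function", under stated assumptions: the distance is taken to be the KL divergence between the current policy's action distribution and those of earlier policies, and it is subtracted from the base loss with a scaling factor `alpha`. The abstract does not specify the distance or the sign convention, and the function names here are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete action distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def diversity_loss(base_loss, current_probs, prior_probs_list, alpha=0.1):
    """Base loss minus alpha times the mean distance to prior policies,
    so minimizing it pushes the current policy away from earlier ones."""
    if not prior_probs_list:
        return base_loss
    mean_distance = sum(
        kl_divergence(current_probs, prior) for prior in prior_probs_list
    ) / len(prior_probs_list)
    return base_loss - alpha * mean_distance

same = diversity_loss(1.0, [0.5, 0.5], [[0.5, 0.5]])
different = diversity_loss(1.0, [0.9, 0.1], [[0.5, 0.5]])
```

A policy identical to its predecessors gets no bonus, while one that behaves differently sees a lower loss. The adaptive scaling mentioned in the abstract would correspond to adjusting `alpha` during training, which this sketch leaves fixed.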

Title: Progressive Reinforcement Learning with Distillation for Multi-Skilled
  Motion Control
Authors: Glen Berseth, Cheng Xie, Paul Cernek, Michiel Van de Panne
Categories: cs.AI cs.LG stat.ML
Comments: 15 pages, Conference paper

Deep reinforcement learning has demonstrated increasing capabilities for
continuous control problems, including agents that can move with skill and
agility through their environment. An open problem in this setting is that of
developing good strategies for integrating or merging policies for multiple
skills, where each individual policy is a specialist in a specific skill and its
associated state distribution. We extend policy distillation methods to the
continuous action setting and leverage this technique to combine expert
policies, as evaluated in the domain of simulated bipedal locomotion across
different classes of terrain. We also introduce an input injection method for
augmenting an existing policy network to exploit new input features. Lastly,
our method uses transfer learning to assist in the efficient acquisition of new
skills. The combination of these methods allows a policy to be incrementally
augmented with new skills. We compare our progressive learning and integration
via distillation (PLAID) method against three alternative baselines.
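
One way to picture distillation in the continuous-action setting, as a sketch only: each expert labels states drawn from its own terrain, and the student is scored by the mean squared error between its actions and the experts' actions. The squared-error objective, the function names, and the toy one-dimensional "experts" are assumptions, not details taken from the paper.

```python
def build_distillation_set(experts_with_states):
    """Label each state with the action of the expert responsible for it."""
    return [(state, expert(state))
            for expert, states in experts_with_states
            for state in states]

def distillation_loss(student, dataset):
    """Mean squared error between student and expert actions over the dataset."""
    total, count = 0.0, 0
    for state, expert_action in dataset:
        for s_i, e_i in zip(student(state), expert_action):
            total += (s_i - e_i) ** 2
            count += 1
    return total / count

# Hypothetical experts for two terrain classes, each mapping a state to an action.
flat_expert = lambda s: [0.5 * s[0]]
stairs_expert = lambda s: [s[0] + 1.0]

dataset = build_distillation_set([
    (flat_expert, [[0.0], [1.0]]),
    (stairs_expert, [[2.0]]),
])
loss = distillation_loss(flat_expert, dataset)
```

A student that simply copies the flat-terrain expert matches the first two samples but misses the stairs sample, so its loss stays above zero until it absorbs both skills.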

Title: Efficient Model-Based Deep Reinforcement Learning with Variational State
Authors: Dane Corneil, Wulfram Gerstner and Johanni Brea
Categories: cs.LG cs.AI stat.ML

Modern reinforcement learning algorithms reach super-human performance in
many board and video games, but they are sample inefficient, i.e. they
typically require significantly more playing experience than humans to reach an
equal performance level. To improve sample efficiency, an agent may build a
model of the environment and use planning methods to update its policy. In this
article we introduce VaST (Variational State Tabulation), which maps an
environment with a high-dimensional state space (e.g. the space of visual
inputs) to an abstract tabular environment. Prioritized sweeping with small
backups, a highly efficient planning method, can then be used to update
state-action values. We show how VaST can rapidly learn to maximize reward in
tasks like 3D navigation and efficiently adapt to sudden changes in rewards or
transition probabilities.
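
To illustrate the planning step, here is a sketch of classic prioritized sweeping on a small deterministic tabular model. It uses full Bellman backups rather than the "small backups" variant named in the abstract, and it does not show how VaST maps observations to discrete states. The model format and function name are assumptions.

```python
import heapq

def prioritized_sweeping(model, q, gamma=0.9, theta=1e-6, max_backups=1000):
    """model: dict (state, action) -> (reward, next_state), deterministic.
    q: dict (state, action) -> value, updated in place. Returns backup count."""
    actions = {}       # state -> actions available in that state
    predecessors = {}  # state -> (state, action) pairs that lead to it
    for (s, a), (_, s_next) in model.items():
        actions.setdefault(s, []).append(a)
        predecessors.setdefault(s_next, []).append((s, a))

    def bellman_error(s, a):
        reward, s_next = model[(s, a)]
        # States with no outgoing actions are treated as terminal (value 0).
        next_value = max((q[(s_next, b)] for b in actions.get(s_next, [])),
                         default=0.0)
        return reward + gamma * next_value - q[(s, a)]

    queue = []  # max-priority queue via negated priorities
    for pair in model:
        priority = abs(bellman_error(*pair))
        if priority > theta:
            heapq.heappush(queue, (-priority, pair))

    backups = 0
    while queue and backups < max_backups:
        _, (s, a) = heapq.heappop(queue)
        error = bellman_error(s, a)
        if abs(error) <= theta:
            continue  # stale queue entry; the value is already up to date
        q[(s, a)] += error
        backups += 1
        # A changed value at s may change the targets of its predecessors.
        for pred in predecessors.get(s, []):
            priority = abs(bellman_error(*pred))
            if priority > theta:
                heapq.heappush(queue, (-priority, pred))
    return backups

# Three-state chain ending in a terminal state 3 with reward 1 on the last step.
model = {(0, "r"): (0.0, 1), (1, "r"): (0.0, 2), (2, "r"): (1.0, 3)}
q = {pair: 0.0 for pair in model}
backups = prioritized_sweeping(model, q)
```

On this chain the reward propagates backward from the goal in three backups, one per state-action pair, instead of sweeping over every pair repeatedly.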

