Arxiv on Mar. 2nd


Title: Towards Cooperation in Sequential Prisoner’s Dilemmas: a Deep Multiagent
  Reinforcement Learning Approach
Authors: Weixun Wang, Jianye Hao, Yixi Wang, Matthew Taylor
Categories: cs.AI cs.GT cs.LG cs.MA
Comments: 13 pages, 21 figures

The Iterated Prisoner’s Dilemma has guided research on social dilemmas for
decades. However, it distinguishes between only two atomic actions: cooperate
and defect. In real-world prisoner’s dilemmas, these choices are temporally
extended and different strategies may correspond to sequences of actions,
reflecting grades of cooperation. We introduce a Sequential Prisoner’s Dilemma
(SPD) game to better capture the aforementioned characteristics. In this work,
we propose a deep multiagent reinforcement learning approach that investigates
the evolution of mutual cooperation in SPD games. Our approach consists of two
phases. The first phase is offline: it synthesizes policies with different
cooperation degrees and then trains a cooperation degree detection network. The
second phase is online: an agent adaptively selects its policy based on the
detected degree of opponent cooperation. The effectiveness of our approach is
demonstrated in two representative SPD 2D games: the Apple-Pear game and the
Fruit Gathering game. Experimental results show that our strategy can avoid
being exploited by exploitative opponents and achieve cooperation with
cooperative opponents.
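
The online phase described above can be pictured with a minimal sketch. It assumes the detection network outputs a cooperation degree in [0, 1] and that the agent simply picks the pre-trained policy whose degree is closest to it. That matching rule and the names `select_policy` and `policy_degrees` are hypothetical illustrations, not the authors' method.

```python
def select_policy(detected_degree, policy_degrees):
    """Return the index of the policy whose cooperation degree is nearest.

    detected_degree: estimated opponent cooperation degree (assumed in [0, 1]).
    policy_degrees: cooperation degrees of the policies synthesized offline.
    Both the matching rule and these names are assumptions for illustration.
    """
    # Clip noisy detector outputs into the valid range.
    clipped = min(1.0, max(0.0, detected_degree))
    # Pick the policy with the closest cooperation degree.
    return min(
        range(len(policy_degrees)),
        key=lambda i: abs(policy_degrees[i] - clipped),
    )


if __name__ == "__main__":
    degrees = [0.0, 0.5, 1.0]  # hypothetical policy library
    print(select_policy(0.8, degrees))
```

Matching the opponent's degree behaves like a graded tit-for-tat: a fully defecting opponent is met with the least cooperative policy, a cooperative one with the most cooperative policy.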

Title: Deep Reinforcement Learning for Sponsored Search Real-time Bidding
Authors: Jun Zhao, Guang Qiu, Ziyu Guan, Wei Zhao, Xiaofei He
Categories: cs.AI

Bidding optimization is one of the most critical problems in online
advertising. Sponsored search (SS) auction, due to the randomness of user query
behavior and platform nature, usually adopts keyword-level bidding strategies.
In contrast, display advertising (DA), a relatively simpler auction scenario,
has taken advantage of real-time bidding (RTB) to boost performance for
advertisers. In this paper, we consider the RTB problem in
sponsored search auction, named SS-RTB. SS-RTB has a much more complex dynamic
environment, due to stochastic user query behavior and more complex bidding
policies based on multiple keywords of an ad. Most previous methods for DA
cannot be applied. We propose a reinforcement learning (RL) solution for
handling the complex dynamic environment. Although some RL methods have been
proposed for online advertising, they all fail to address the “environment
changing” problem: the state transition probabilities vary between two days.
Motivated by the observation that auction sequences of two days share similar
transition patterns at a proper aggregation level, we formulate a robust MDP
model at hour-aggregation level of the auction data and propose a
control-by-model framework for SS-RTB. Rather than generating bid prices
directly, we decide a bidding model for impressions of each hour and perform
real-time bidding accordingly. We also extend the method to handle the
multi-agent problem. We deployed the SS-RTB system in the e-commerce search
auction platform of Alibaba. Empirical experiments of offline evaluation and
online A/B test demonstrate the effectiveness of our method.
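
To make the hour-aggregation idea concrete, here is a small sketch of rolling individual auction records up to hourly statistics, the level at which the abstract says the MDP is formulated. The record layout `(hour, cost, clicks)` and the function name `aggregate_by_hour` are assumptions for illustration; the abstract does not specify which statistics the actual system aggregates.

```python
from collections import defaultdict


def aggregate_by_hour(records):
    """Sum impressions, cost, and clicks per hour.

    records: iterable of (hour, cost, clicks) tuples, one per impression.
    This record layout is hypothetical, chosen only for the example.
    """
    totals = defaultdict(lambda: {"impressions": 0, "cost": 0.0, "clicks": 0})
    for hour, cost, clicks in records:
        bucket = totals[hour]
        bucket["impressions"] += 1
        bucket["cost"] += cost
        bucket["clicks"] += clicks
    # Return a plain dict so callers do not create buckets by accident.
    return dict(totals)


if __name__ == "__main__":
    log = [(9, 1.5, 1), (9, 0.5, 0), (10, 2.0, 1)]
    for hour, stats in sorted(aggregate_by_hour(log).items()):
        print(hour, stats)
```

An hourly state built from such totals changes far less from day to day than individual auctions do, which is the observation the abstract uses to motivate the aggregated model.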

Title: Model-Based Value Estimation for Efficient Model-Free Reinforcement
Authors: Vladimir Feinberg, Alvin Wan, Ion Stoica, Michael I. Jordan, Joseph E.
Gonzalez, Sergey Levine
Categories: cs.LG cs.AI stat.ML

Recent model-free reinforcement learning algorithms have proposed
incorporating learned dynamics models as a source of additional data with the
intention of reducing sample complexity. Such methods hold the promise of
incorporating imagined data coupled with a notion of model uncertainty to
accelerate the learning of continuous control tasks. Unfortunately, they rely
on heuristics that limit usage of the dynamics model. We present model-based
value expansion, which controls for uncertainty in the model by only allowing
imagination to fixed depth. By enabling wider use of learned dynamics models
within a model-free reinforcement learning algorithm, we improve value
estimation, which, in turn, reduces the sample complexity of learning.
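
The fixed-depth imagination described above can be sketched as an H-step value target: roll the learned model forward for `horizon` steps, sum the discounted imagined rewards, and bootstrap with the Q-function at the final imagined state. This is a simplified illustration assuming a deterministic model and policy; the callables `policy`, `model`, `reward_fn`, and `q_fn` are hypothetical placeholders, and the sketch is not the paper's full training procedure.

```python
def value_expansion_target(state, policy, model, reward_fn, q_fn,
                           horizon, gamma=0.99):
    """Compute sum_{t<H} gamma^t * r_t + gamma^H * Q(s_H, pi(s_H)).

    All callables are assumed placeholders:
      policy(state) -> action
      model(state, action) -> next state (learned dynamics, deterministic here)
      reward_fn(state, action) -> float
      q_fn(state, action) -> float
    """
    target = 0.0
    discount = 1.0
    for _ in range(horizon):
        action = policy(state)
        target += discount * reward_fn(state, action)
        state = model(state, action)  # imagined transition
        discount *= gamma
    # Bootstrap from the learned Q-function after the fixed rollout depth.
    return target + discount * q_fn(state, policy(state))


if __name__ == "__main__":
    # Toy one-dimensional problem with made-up dynamics and rewards.
    print(value_expansion_target(
        0.0,
        policy=lambda s: 1.0,
        model=lambda s, a: s + a,
        reward_fn=lambda s, a: s,
        q_fn=lambda s, a: 10.0,
        horizon=2,
        gamma=0.5,
    ))
```

With `horizon=0` the target reduces to the ordinary bootstrapped Q-value, so the rollout depth directly controls how much the estimate relies on the learned model.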

Title: Inverse Reinforcement Learning via Nonparametric Spatio-Temporal Subgoal
Authors: Adrian Šošić, Elmar Rueckert, Jan Peters, Abdelhak M.
Zoubir, Heinz Koeppl
Categories: stat.ML cs.AI cs.LG cs.RO cs.SY
Comments: 40 pages, 13 figures

Recent advances in the field of inverse reinforcement learning (IRL) have
yielded sophisticated frameworks which relax the original modeling assumption
that the behavior of an observed agent reflects only a single intention.
Instead, the demonstration data is typically divided into parts, to account for
the fact that different trajectories may correspond to different intentions,
e.g., because they were generated by different domain experts. In this work, we
go one step further: using the intuitive concept of subgoals, we build upon the
premise that even a single trajectory can be explained more efficiently locally
within a certain context than globally, enabling a more compact representation
of the observed behavior. Based on this assumption, we build an implicit
intentional model of the agent’s goals to forecast its behavior in unobserved
situations. The result is an integrated Bayesian prediction framework which
provides smooth policy estimates that are consistent with the expert’s plan and
significantly outperform existing IRL solutions. Most notably, our framework
naturally handles situations where the intentions of the agent change with time
and classical IRL algorithms fail. In addition, due to its probabilistic
nature, the model can be straightforwardly applied in an active learning
setting to guide the demonstration process of the expert.

