Arxiv on Feb. 20th


Title: Reactive Reinforcement Learning in Asynchronous Environments
Authors: Jaden B. Travnik, Kory W. Mathewson, Richard S. Sutton, Patrick M.
Categories: cs.AI cs.LG
Comments: 11 pages, 7 figures, currently under journal peer review

The relationship between a reinforcement learning (RL) agent and an
asynchronous environment is often ignored. Frequently used models of the
interaction between an agent and its environment, such as Markov Decision
Processes (MDP) or Semi-Markov Decision Processes (SMDP), do not capture the
fact that, in an asynchronous environment, the state of the environment may
change during computation performed by the agent. In an asynchronous
environment, minimizing reaction time—the time it takes for an agent to react
to an observation—also minimizes the time in which the state of the
environment may change following observation. In many environments, the
reaction time of an agent directly impacts task performance by permitting the
environment to transition into either an undesirable terminal state or a state
where performing the chosen action is inappropriate. We propose a class of
reactive reinforcement learning algorithms that address this problem of
asynchronous environments by immediately acting after observing new state
information. We compare a reactive SARSA learning algorithm with the
conventional SARSA learning algorithm on two asynchronous robotic tasks
(emergency stopping and impact prevention), and show that the reactive RL
algorithm reduces the reaction time of the agent by approximately the duration
of the algorithm’s learning update. This new class of reactive algorithms may
facilitate safer control and faster decision making without any change to
standard learning guarantees.
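The abstract's core idea is an ordering change: act on the new observation before, rather than after, the learning update. The sketch below is a minimal tabular illustration of that reading, not the authors' implementation. The toy chain environment, the epsilon-greedy policy, and the per-step compute costs (`SELECT_COST_MS`, `UPDATE_COST_MS`) are hypothetical. Time is simulated, so the sketch only accounts for reaction latency and does not model the environment changing during computation.

```python
import random

# Hypothetical per-step compute costs in simulated milliseconds (not from the paper).
SELECT_COST_MS = 1.0
UPDATE_COST_MS = 5.0

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
N_STATES = 5
ACTIONS = (0, 1)  # toy chain: action 1 moves right, action 0 moves left


def env_step(state, action):
    """Hypothetical chain environment: reaching the right end gives reward 1 and terminates."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def select(q, state, rng):
    """Epsilon-greedy selection; ties are broken toward the higher action index."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: (q[state][a], a))


def sarsa_update(q, s, a, target):
    q[s][a] += ALPHA * (target - q[s][a])


def run(reactive, episodes=50, seed=0):
    """Tabular SARSA; returns the Q-table and the mean simulated reaction time."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    reaction_times = []
    for _ in range(episodes):
        s = 0
        a = select(q, s, rng)
        while True:
            s2, r, done = env_step(s, a)  # new observation arrives
            if done:
                sarsa_update(q, s, a, r)  # terminal update, no further action
                break
            a2 = select(q, s2, rng)
            elapsed = SELECT_COST_MS  # time since the observation arrived
            if reactive:
                reaction_times.append(elapsed)  # send a2 now, learn afterwards
                sarsa_update(q, s, a, r + GAMMA * q[s2][a2])
            else:
                sarsa_update(q, s, a, r + GAMMA * q[s2][a2])
                elapsed += UPDATE_COST_MS
                reaction_times.append(elapsed)  # a2 is sent only after learning
            s, a = s2, a2
    return q, sum(reaction_times) / len(reaction_times)


q_reactive, rt_reactive = run(reactive=True)
q_conventional, rt_conventional = run(reactive=False)
print(rt_reactive, rt_conventional)  # 1.0 6.0
```

Because both variants perform the same SARSA updates in the same order, they learn identical Q-tables, while the reactive variant's mean reaction time is lower by exactly the simulated update cost. This mirrors the abstract's claim that reaction time drops by roughly the duration of the learning update without changing what is learned.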

Title: Sim-To-Real Optimization Of Complex Real World Mobile Network with
  Imperfect Information via Deep Reinforcement Learning from Self-play
Authors: Yongxi Tan, Jin Yang, Xin Chen, Qitao Song, Yunjun Chen, Zhangxiang
Ye, Zhenqiang Su
Categories: cs.AI cs.LG stat.ML
Comments: 7 figures

The mobile network that millions of people use every day is one of the most
complex systems in the real world. Optimizing a mobile network to meet exploding
customer demand and reduce CAPEX/OPEX poses greater challenges than those
addressed in prior work. Learning to solve complex real-world problems to
benefit everyone and make the world better has long been the ultimate goal of
AI. However, it remains an unsolved problem for deep reinforcement learning
(DRL), given imperfect information in the real world, huge state/action spaces,
the large amount of data needed for training, the associated time and cost,
multi-agent interactions, potential negative impact on the real world, etc. To
bridge this reality gap, we propose a DRL framework that directly transfers the
optimal policy learned from multiple tasks in a source domain to unseen similar
tasks in a target domain, without any further training in either domain. First,
we distill the temporal-spatial relationships between cells and mobile users
into a scalable 3D image-like tensor that best characterizes the partially
observed mobile network. Second, inspired by AlphaGo, we use a novel self-play
mechanism that enables the DRL agent to gradually improve its intelligence by
competing for the best record on multiple tasks. Third, we propose a
decentralized DRL method that coordinates multiple agents to compete and
cooperate as a team to maximize global reward and minimize potential negative
impact. Using 7693 unseen test tasks over 160 unseen simulated mobile networks
and 6 field trials over 4 commercial mobile networks in the real world, we
demonstrate that our approach can directly transfer learning from one simulator
to another and from simulation to the real world. This is the first time that a
DRL agent has successfully transferred its learning directly from simulation to
very complex real-world problems with incomplete and imperfect information, huge
state/action spaces, and multi-agent interactions.

Title: Accelerated Primal-Dual Policy Optimization for Safe Reinforcement
Authors: Qingkai Liang, Fanyu Que, Eytan Modiano
Categories: cs.AI cs.LG stat.ML

Constrained Markov Decision Process (CMDP) is a natural framework for
reinforcement learning tasks with safety constraints, where agents learn a
policy that maximizes the long-term reward while satisfying the constraints on
the long-term cost. A canonical approach for solving CMDPs is the primal-dual
method which updates parameters in primal and dual spaces in turn. Existing
methods for CMDPs only use on-policy data for dual updates, which results in
sample inefficiency and slow convergence. In this paper, we propose a policy
search method for CMDPs called Accelerated Primal-Dual Optimization (APDO),
which incorporates an off-policy trained dual variable in the dual update
procedure while updating the policy in primal space with an on-policy likelihood
ratio gradient. Experimental results on a simulated robot locomotion task show
that APDO achieves better sample efficiency and faster convergence than
state-of-the-art approaches for CMDPs.
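The abstract describes APDO as a modification of the canonical primal-dual method for CMDPs, which alternates updates in primal and dual spaces. The sketch below shows only that canonical baseline, on a hypothetical one-parameter problem with exact gradients instead of sampled trajectories. It does not include APDO's off-policy trained dual variable or the likelihood-ratio policy gradient; `reward_grad`, `cost`, `cost_grad`, and `COST_LIMIT` are illustrative assumptions.

```python
# Hypothetical toy problem with one policy parameter p in [0, 1]:
#   reward R(p) = 1 - (1 - p)**2  (concave, maximized at p = 1)
#   cost   C(p) = p               (constraint: C(p) <= COST_LIMIT)
COST_LIMIT = 0.5


def reward_grad(p):
    return 2.0 * (1.0 - p)


def cost(p):
    return p


def cost_grad(p):
    return 1.0


def primal_dual(p=0.9, lam=0.0, alpha=0.05, beta=0.05, iters=2000):
    """Alternate primal and dual steps on L(p, lam) = R(p) - lam * (C(p) - COST_LIMIT)."""
    for _ in range(iters):
        # Primal step: projected gradient ascent on L with respect to the policy parameter.
        p = min(1.0, max(0.0, p + alpha * (reward_grad(p) - lam * cost_grad(p))))
        # Dual step: raise lam while the constraint is violated, lower it otherwise (lam >= 0).
        lam = max(0.0, lam + beta * (cost(p) - COST_LIMIT))
    return p, lam


p_star, lam_star = primal_dual()
print(round(p_star, 4), round(lam_star, 4))  # 0.5 1.0
```

At the fixed point the constraint is active (p = 0.5) and the multiplier equals the marginal reward R'(0.5) = 1, which is the KKT condition for this toy problem. In the setting the abstract describes, both gradients would be estimated from trajectories, and it is the dual step's reliance on on-policy data alone that the authors identify as the source of sample inefficiency and slow convergence.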

Title: Recommendations with Negative Feedback via Pairwise Deep Reinforcement
Authors: Xiangyu Zhao and Liang Zhang and Zhuoye Ding and Long Xia and Jiliang
Tang and Dawei Yin
Categories: cs.IR cs.LG stat.ML

Recommender systems play a crucial role in mitigating the problem of
information overload by suggesting personalized items or services to users. The
vast majority of traditional recommender systems treat the recommendation
procedure as a static process and make recommendations following a fixed
strategy. In this paper, we propose a novel recommender system with the
capability of continuously improving its strategies during the interactions
with users. We model the sequential interactions between users and a
recommender system as a Markov Decision Process (MDP) and leverage
Reinforcement Learning (RL) to automatically learn the optimal strategies by
recommending items in a trial-and-error manner and receiving reinforcement for
these items from users’ feedback. Users’ feedback can be positive or negative,
and both types of feedback have great potential to boost recommendations.
However, negative feedback is far more abundant than positive feedback, so
incorporating both simultaneously is challenging: the positive feedback could be
buried by the negative feedback. In this paper, we develop a novel approach to
incorporate both types into the proposed deep recommender system (DEERS)
framework. The experimental results based on real-world e-commerce data
demonstrate the effectiveness of the proposed framework. Further experiments
have been conducted to understand the importance of both positive and negative
feedback in recommendations.

Title: Modeling the Formation of Social Conventions in Multi-Agent Populations
Authors: Ismael T. Freire, Clement Moulin-Frier, Marti Sanchez-Fibla, Xerxes D.
Arsiwalla, Paul Verschure
Categories: cs.MA cs.AI cs.GT q-bio.NC stat.ML
Comments: 30 pages, 12 figures

In order to understand the formation of social conventions we need to know
the specific role of control and learning in multi-agent systems. To advance in
this direction, we propose, within the framework of the Distributed Adaptive
Control (DAC) theory, a novel Control-based Reinforcement Learning architecture
(CRL) that can account for the acquisition of social conventions in multi-agent
populations that are solving a benchmark social decision-making problem. Our
new CRL architecture, as a concrete realization of DAC multi-agent theory,
implements a low-level sensorimotor control loop handling the agent’s reactive
behaviors (pre-wired reflexes), along with a layer based on model-free
reinforcement learning that maximizes long-term reward. We apply CRL in a
multi-agent game-theoretic task in which coordination must be achieved in order
to find an optimal solution. We show that our CRL architecture is able to both
find optimal solutions in discrete and continuous time and reproduce human
experimental data on standard game-theoretic metrics such as efficiency in
acquiring rewards, fairness in reward distribution and stability of convention
formation.

Title: Efficient Large-Scale Fleet Management via Multi-Agent Deep
  Reinforcement Learning
Authors: Kaixiang Lin, Renyu Zhao, Zhe Xu and Jiayu Zhou
Categories: cs.MA cs.AI

Large-scale online ride-sharing platforms have substantially transformed our
lives by reallocating transportation resources to alleviate traffic congestion
and promote transportation efficiency. An efficient fleet management strategy
can not only significantly improve the utilization of transportation resources
but also increase revenue and customer satisfaction. It is a challenging
task to design an effective fleet management strategy that can adapt to an
environment involving complex dynamics between demand and supply. Existing
studies usually work on a simplified problem setting that can hardly capture
the complicated stochastic demand-supply variations in high-dimensional space.
In this paper, we tackle the large-scale fleet management problem using
reinforcement learning and propose a contextual multi-agent
reinforcement learning framework including two concrete algorithms, namely
contextual deep Q-learning and contextual multi-agent actor-critic, to achieve
explicit coordination among a large number of agents adaptive to different
contexts. We show significant improvements of the proposed framework over
state-of-the-art approaches through extensive empirical studies.

Title: A Deep Q-Learning Agent for the L-Game with Variable Batch Training
Authors: Petros Giannakopoulos, Yannis Cotronis
Categories: cs.LG cs.AI

We employ the Deep Q-Learning (DQL) algorithm with Experience Replay to train an
agent capable of achieving a high level of play in the L-Game while
self-learning from low-dimensional states. We also employ variable batch size
for training in order to mitigate the loss of the rare reward signal and
significantly accelerate training. Despite the large action space due to the
number of possible moves, the low-dimensional state space and the rarity of
rewards, which only come at the end of a game, DQL is successful in training an
agent capable of strong play without the use of any search methods or domain
knowledge.
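The abstract does not say how the batch size varies. The sketch below is one hypothetical way to connect a variable batch size to the rare-reward problem it mentions: a uniform experience-replay buffer plus a rule that enlarges the batch until it is expected to contain a target number of rewarded transitions. `variable_batch_size`, its parameters, and the reward layout in the usage lines are illustrative assumptions, not the authors' scheme.

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal uniform experience-replay buffer of (state, action, reward, next_state, done)."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def rewarded_count(self):
        """Number of stored transitions carrying a non-zero reward."""
        return sum(1 for t in self.data if t[2] != 0)

    def sample(self, batch_size, rng):
        """Uniformly sample up to batch_size transitions without replacement."""
        return rng.sample(list(self.data), min(batch_size, len(self.data)))


def variable_batch_size(buffer, target_rewarded=2, min_batch=32, max_batch=512):
    """Hypothetical rule: pick the smallest batch whose expected number of rewarded
    transitions is at least target_rewarded, clamped to [min_batch, max_batch]."""
    n, k = len(buffer.data), buffer.rewarded_count()
    if k == 0:
        return max_batch  # no reward seen yet: use the largest batch
    needed = -(-target_rewarded * n // k)  # ceil(target_rewarded * n / k) with integers
    return max(min_batch, min(max_batch, needed))


# Usage: 1000 stored transitions, of which only 10 (1%) end a game with a reward.
rng = random.Random(0)
buf = ReplayBuffer(capacity=10_000)
for i in range(1000):
    reward = 1.0 if i % 100 == 0 else 0.0
    buf.add((i, 0, reward, i + 1, reward != 0))
batch = buf.sample(variable_batch_size(buf), rng)
print(len(batch))  # 200
```

With only 1% of transitions rewarded, a fixed batch of 32 would contain about 0.32 rewarded transitions on average, whereas this rule picks 200 so that about 2 are expected per batch.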

Title: Improving Mild Cognitive Impairment Prediction via Reinforcement
  Learning and Dialogue Simulation
Authors: Fengyi Tang, Kaixiang Lin, Ikechukwu Uchendu, Hiroko H. Dodge, Jiayu
Categories: cs.LG cs.CL stat.ML
Comments: 9 pages, 4 figures, 4 tables

Mild cognitive impairment (MCI) is a prodromal phase in the progression from
normal aging to dementia, especially Alzheimer’s disease. Although MCI patients
show mild cognitive decline, their overall cognition is normal, which makes MCI
challenging to distinguish from normal aging. Using transcribed data
obtained from recorded conversational interactions between participants and
trained interviewers, and applying supervised learning models to these data, a
recent clinical trial has shown a promising result in differentiating MCI from
normal aging. However, the substantial amount of interaction with medical
staff can still incur significant medical care expenses in practice. In this
paper, we propose a novel reinforcement learning (RL) framework to train an
efficient dialogue agent on existing transcripts from clinical trials.
Specifically, the agent is trained to sketch a disease-specific lexical
probability distribution, and thus to converse in a way that maximizes
diagnosis accuracy and minimizes the number of conversation turns. We evaluate
the performance of the proposed reinforcement learning framework on the MCI
diagnosis from a real clinical trial. The results show that while using only a
few turns of conversation, our framework can significantly outperform
state-of-the-art supervised learning approaches.

