Arxiv on Feb. 22nd


Title: Clipped Action Policy Gradient
Authors: Yasuhiro Fujita and Shin-ichi Maeda
Categories: cs.LG cs.AI stat.ML

Many continuous control tasks have bounded action spaces and clip
out-of-bound actions before execution. Policy gradient methods often optimize
policies as if actions were not clipped. We propose clipped action policy
gradient (CAPG) as an alternative policy gradient estimator that exploits the
knowledge of actions being clipped to reduce the variance in estimation. We
prove that CAPG is unbiased and achieves lower variance than the original
estimator that ignores action bounds. Experimental results demonstrate that
CAPG generally outperforms the original estimator, indicating its promise as a
better policy gradient estimator for continuous control tasks.
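
The abstract does not spell out the estimator, so the following is only a minimal sketch of one way to read "exploiting the knowledge of actions being clipped". It assumes a one-dimensional Gaussian policy and hypothetical bounds `low` and `high`. For an action sampled outside the bounds, the log-density is replaced by the log of the probability mass that clipping maps onto that bound. The function names are illustrative and are not taken from the paper or any library.

```python
import math

SQRT2 = math.sqrt(2.0)

def normal_log_pdf(u, mean, std):
    """Log-density of a 1-D Gaussian at u."""
    z = (u - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)

def normal_cdf(u, mean, std):
    """P(U <= u), written with erfc for better accuracy in the lower tail."""
    return 0.5 * math.erfc(-(u - mean) / (std * SQRT2))

def normal_sf(u, mean, std):
    """P(U >= u), written with erfc for better accuracy in the upper tail."""
    return 0.5 * math.erfc((u - mean) / (std * SQRT2))

def clipped_action_log_prob(u, mean, std, low, high):
    """Log-probability term for a sampled action u that is clipped to [low, high].

    Assumption (illustrative): every sample at or below `low` executes the same
    action, so it is scored by the log of the total mass at or below `low`;
    samples at or above `high` are handled symmetrically. Samples strictly
    inside the bounds keep the ordinary Gaussian log-density.
    """
    if u <= low:
        return math.log(normal_cdf(low, mean, std))
    if u >= high:
        return math.log(normal_sf(high, mean, std))
    return normal_log_pdf(u, mean, std)
```

In this sketch, a sample of -5 and a sample of -2 with bounds [-1, 1] yield the same term, because both are executed as -1. Differentiating this quantity with respect to the policy parameters instead of the plain log-density is the intended use, and the removal of the dependence on how far outside the bounds the sample fell is presumably where a variance reduction would come from. For extreme parameter values the tail mass can underflow to zero, which a practical version would need to guard against.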

