We consider the problem of designing sample efficient learning algorithms for infinite horizon discounted reward Markov Decision Processes. Specifically, we propose the Accelerated Natural Policy Gradient (ANPG) algorithm that utilizes an accelerated stochastic gradient descent process to obtain the natural policy gradient. ANPG achieves $\mathcal{O}(\epsilon^{-2})$ sample complexity and $\mathcal{O}(\epsilon^{-1})$ iteration complexity with general parameterization, where $\epsilon$ defines the optimality error. This improves the state-of-the-art sample complexity by a $\log(\frac{1}{\epsilon})$ factor. ANPG is a first-order algorithm and, unlike some existing literature, does not require the unverifiable assumption that the variance of importance sampling (IS) weights is upper bounded. In the class of Hessian-free and IS-free algorithms, ANPG beats the best-known sample complexity by a factor of $\mathcal{O}(\epsilon^{-\frac{1}{2}})$ and simultaneously matches their state-of-the-art iteration complexity.

Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes
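
The abstract's core idea, obtaining the natural policy gradient through an accelerated gradient descent process, can be illustrated with a small sketch. This is not the paper's ANPG algorithm: it assumes the standard characterisation of the natural policy gradient direction as the minimiser of the quadratic $\frac{1}{2} w^\top F w - g^\top w$ (with $F$ a Fisher-type matrix and $g$ the policy gradient), uses exact rather than sampled gradients, and applies a textbook Nesterov momentum schedule. The function name `nesterov_solve` and the toy values of `F` and `g` are hypothetical.

```python
def nesterov_solve(F, g, steps=500):
    """Approximate w* = F^{-1} g with Nesterov-accelerated gradient descent
    on L(w) = 0.5 * w^T F w - g^T w, assuming F is symmetric positive definite.

    Illustrative only: exact gradients are used here, whereas the abstract
    describes an accelerated *stochastic* gradient descent process.
    """
    n = len(g)
    # The maximum absolute row sum upper-bounds the largest eigenvalue of F,
    # so 1 / smoothness is a safe step size.
    smoothness = max(sum(abs(x) for x in row) for row in F)
    eta = 1.0 / smoothness

    w = [0.0] * n       # current iterate
    w_prev = [0.0] * n  # previous iterate, needed for the momentum term
    for k in range(1, steps + 1):
        beta = (k - 1) / (k + 2)  # standard Nesterov momentum schedule
        # Look-ahead point: extrapolate along the last displacement.
        y = [w[i] + beta * (w[i] - w_prev[i]) for i in range(n)]
        # Gradient of the quadratic at the look-ahead point: F y - g.
        grad = [sum(F[i][j] * y[j] for j in range(n)) - g[i] for i in range(n)]
        w_prev = w
        w = [y[i] - eta * grad[i] for i in range(n)]
    return w


# Toy usage with hypothetical values; the exact solution is (2/7, 6/7).
F = [[2.0, 0.5], [0.5, 1.0]]
g = [1.0, 1.0]
direction = nesterov_solve(F, g, steps=2000)
print(direction)
```

In a policy gradient setting, the returned direction would be used in an update of the form $\theta \leftarrow \theta + \eta\, w$; in practice both `F` and `g` would be replaced by estimates built from sampled trajectories.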