We study reinforcement learning in non-episodic factored Markov decision processes (FMDPs). We propose two near-optimal and oracle-efficient algorithms for FMDPs. Assuming oracle access to an FMDP planner, they enjoy a Bayesian and a frequentist regret bound respectively, both of which reduce to the near-optimal bound $\widetilde{O}(DS\sqrt{AT})$ for standard non-factored MDPs. We propose a tighter connectivity measure, factored span, for FMDPs and prove a lower bound that depends on the factored span rather than the diameter $D$. In order to decrease the gap between lower and upper bounds, we propose an adaptation of the REGAL.C algorithm whose regret bound depends on the factored span. Our oracle-efficient algorithms outperform previously proposed near-optimal algorithms on computer network administration simulations.

Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting