English (United Kingdom)

https://curated-unify.zendy.io/wp-json/zendy-region/v1/featured_content/oa?rat=en

https://curated-unify.zendy.io/wp-json/zendy-region/v1/highlighted_journal/

Zendy Plus

Presents the access of premium content as premium feature

Premium Content

Presents the keyphrase highlighting as premium feature

Keyphrase Highlighting

Presents the summarisation as premium feature

Summarisation

Insights

Presents the pdf analysis as premium feature

PDF Analysis

Presents the zaia usage as premium feature

ZAIA

Zendy Tools

Zendy Open

Strong worst-case performance bounds for episodic reinforcement learningexist but fortunately in practice RL algorithms perform much better than suchbounds would predict. Algorithms and theory that provide strongproblem-dependent bounds could help illuminate the key features of what makes aRL problem hard and reduce the barrier to using RL algorithms in practice. As astep towards this we derive an algorithm for finite horizon discrete MDPs andassociated analysis that both yields state-of-the art worst-case regret boundsin the dominant terms and yields substantially tighter bounds if the RLenvironment has small environmental norm, which is a function of the varianceof the next-state value functions. An important benefit of our algorithmic isthat it does not require apriori knowledge of a bound on the environmentalnorm. As a result of our analysis, we also help address an open learning theoryquestion~\cite{jiang2018open} about episodic MDPs with a constant upper-boundon the sum of rewards, providing a regret bound with no $H$-dependence in theleading term that scales a polynomial function of the number of episodes.

Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds