Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples

Gradient-based temporal difference (GTD) algorithms are widely used in off-policy learning scenarios. Among them, the two time-scale TD with gradient correction (TDC) algorithm has been shown to have superior performance. In contrast to previous studies that characterized the non-asymptotic convergence rate of TDC only under independent and identically distributed (i.i.d.) data samples, we provide the first non-asymptotic convergence analysis for two time-scale TDC under a non-i.i.d. Markovian sample path and linear function approximation. We show that two time-scale TDC can converge as fast as O(log t / t^(2/3)) under diminishing stepsize, and can converge exponentially fast under constant stepsize, but at the cost of a non-vanishing error. We further propose a TDC algorithm with blockwisely diminishing stepsize, and show that it asymptotically converges with an arbitrarily small error at a blockwisely linear convergence rate. Our experiments demonstrate that such an algorithm converges as fast as TDC under constant stepsize, and still enjoys accuracy comparable to that of TDC under diminishing stepsize.
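The abstract describes TDC and its three stepsize regimes only in words. Below is a minimal sketch of the standard off-policy TDC update with linear features and importance-sampling ratios, run along a single Markovian trajectory so that consecutive samples are correlated rather than i.i.d. Everything concrete in it is an assumption for illustration and is not taken from the paper: the helper names `tdc`, `tdc_fixed_point`, `diminishing`, `constant` and `blockwise`, the toy three-state MDP, the features, and all stepsize constants and exponents. The blockwise schedule (stepsize halved at each new block, each block twice as long as the previous one) is one plausible instance of a blockwise diminishing stepsize, not necessarily the authors' exact scheme. `tdc_fixed_point` solves A theta + b = 0 with A = E[rho phi (gamma phi' - phi)^T] and b = E[rho r phi] under the behavior chain's stationary distribution, which is the point TDC is meant to converge to.

```python
import numpy as np


def diminishing(c, power):
    """Stepsize c / (1 + t)**power; a larger power decays faster."""
    return lambda t: c / (1.0 + t) ** power


def constant(c):
    """Constant stepsize (fast, but leaves a non-vanishing error)."""
    return lambda t: c


def blockwise(c, first_block=100):
    """Hypothetical blockwise schedule: constant inside a block, halved at
    each new block, and each block twice as long as the previous one."""
    def step(t):
        block, end = 0, first_block
        while t >= end:
            block += 1
            end += first_block * 2 ** block
        return c / 2 ** block
    return step


def tdc(P, R, pi, mu, Phi, gamma, alpha, beta, steps, seed=0):
    """Off-policy TDC with linear features along ONE Markovian trajectory.

    P[s, a, s'] transition probabilities, R[s, a] rewards, pi / mu target and
    behavior policies, Phi[s] feature vectors. alpha(t) drives the slow
    iterate theta, beta(t) the fast auxiliary iterate w (two time-scales).
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = P.shape
    theta = np.zeros(Phi.shape[1])
    w = np.zeros(Phi.shape[1])
    s = rng.integers(n_states)
    for t in range(steps):
        a = rng.choice(n_actions, p=mu[s])        # act with the behavior policy
        s_next = rng.choice(n_states, p=P[s, a])  # depends on s: samples are not i.i.d.
        rho = pi[s, a] / mu[s, a]                 # importance-sampling ratio
        phi, phi_next = Phi[s], Phi[s_next]
        delta = R[s, a] + gamma * phi_next @ theta - phi @ theta  # TD error
        # Both right-hand sides use the old (theta, w), so the updates are simultaneous.
        theta, w = (
            theta + alpha(t) * rho * (delta * phi - gamma * phi_next * (phi @ w)),
            w + beta(t) * (rho * delta - phi @ w) * phi,
        )
        s = s_next
    return theta, w


def tdc_fixed_point(P, R, pi, mu, Phi, gamma):
    """Solve A theta + b = 0, with expectations under mu's stationary distribution."""
    P_mu = np.einsum("sa,sat->st", mu, P)           # state chain under the behavior policy
    evals, evecs = np.linalg.eig(P_mu.T)
    d = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    D = np.diag(d / d.sum())                        # stationary distribution on the diagonal
    P_pi = np.einsum("sa,sat->st", pi, P)           # mu * rho = pi inside the expectation
    r_pi = (pi * R).sum(axis=1)
    A = Phi.T @ D @ (gamma * P_pi @ Phi - Phi)
    b = Phi.T @ D @ r_pi
    return np.linalg.solve(A, -b)


if __name__ == "__main__":
    # Hypothetical 3-state, 2-action MDP with 2 linear features per state.
    P = np.array([[[0.1, 0.6, 0.3], [0.5, 0.25, 0.25]],
                  [[0.3, 0.3, 0.4], [0.2, 0.2, 0.6]],
                  [[0.6, 0.2, 0.2], [0.1, 0.45, 0.45]]])
    R = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]])
    mu = np.full((3, 2), 0.5)
    pi = np.array([[0.7, 0.3], [0.4, 0.6], [0.5, 0.5]])
    Phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    target = tdc_fixed_point(P, R, pi, mu, Phi, 0.8)
    schedules = {  # illustrative constants only
        "diminishing": (diminishing(0.5, 1.0), diminishing(0.5, 2 / 3)),
        "constant": (constant(0.02), constant(0.1)),
        "blockwise": (blockwise(0.02, 1000), blockwise(0.1, 1000)),
    }
    for name, (alpha, beta) in schedules.items():
        theta, _ = tdc(P, R, pi, mu, Phi, 0.8, alpha, beta, steps=10000)
        print(f"{name:12s} |theta - theta*| = {np.linalg.norm(theta - target):.3f}")
```

Running the file prints the distance to the fixed point reached under each schedule; these numbers depend on the toy MDP, the seed and the chosen constants, and are not the paper's experimental results. In every schedule `beta` decays no faster than `alpha`, so `w` is the fast iterate and `theta` the slow one, which is what "two time-scale" refers to.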