An Acquisition of Evaluation Function for Shogi by Learning Self‐Play
Author(s) - Yamamoto Masahito, Suzuki Keiji, Ohuchi Azuma
Publication year - 2001
Publication title - International Transactions in Operational Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.032
H-Index - 52
eISSN - 1475-3995
pISSN - 0969-6016
DOI - 10.1111/1475-3995.00267
Subject(s) - computer science , reinforcement learning , artificial intelligence , artificial neural network , machine learning
Since Deep Blue, a chess program, defeated the human world chess champion, recent interest in computer game playing has turned to shogi. However, the search space of shogi is larger than that of chess, and a captured piece can be brought back into play. To overcome these difficulties, we propose a reinforcement learning method based on self‐play for acquiring a static evaluation function, that is, a mapping from shogi positions to real values. The proposed method is based on temporal difference learning, developed by R. Sutton and applied to backgammon by G. Tesauro. In our method, a neural network that takes a board description of a shogi position as input and outputs the estimated winning percentage from that position is trained by self‐play alone, without any expert knowledge of shogi. Computational experiments are presented to demonstrate the effectiveness of the obtained evaluation function.
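The abstract describes the TD-Gammon-style recipe: play games against oneself and, after each move sequence, nudge a neural-network value function toward the values observed later in the game, ending with the game's outcome. The paper does not spell out network sizes or hyperparameters, so the sketch below is a minimal illustration in Python/NumPy under assumptions of its own: a hypothetical 128-dimensional board-description vector, a small one-hidden-layer network with a sigmoid output read as a winning percentage, a TD(lambda) update with eligibility traces, and random feature vectors standing in for an actual shogi self-play loop, which is out of scope here.

```python
import numpy as np

# Sketch of TD(lambda) training of a value network on self-play position
# sequences, in the style of Sutton's TD learning and Tesauro's TD-Gammon.
# Feature size, network width, and hyperparameters are illustrative
# assumptions, not the paper's actual settings.

N_FEATURES = 128   # assumed size of the board-description vector
N_HIDDEN = 40      # assumed hidden-layer width
ALPHA = 0.01       # learning rate
LAMBDA = 0.7       # eligibility-trace decay

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.1, (N_HIDDEN, N_FEATURES))
b1 = np.zeros(N_HIDDEN)
w2 = rng.normal(0, 0.1, N_HIDDEN)
b2 = 0.0

def value(x):
    """Estimated winning percentage for a position's feature vector x."""
    h = np.tanh(W1 @ x + b1)
    v = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))  # sigmoid -> (0, 1)
    return v, h

def gradients(x, v, h):
    """Gradient of V(x) with respect to each parameter."""
    dv = v * (1.0 - v)                # derivative of the sigmoid output
    g_w2 = dv * h
    g_b2 = dv
    g_pre = dv * w2 * (1.0 - h ** 2)  # backpropagate through tanh
    return np.outer(g_pre, x), g_pre, g_w2, g_b2

def td_lambda_update(positions, outcome):
    """One TD(lambda) pass over the positions of a finished self-play game.

    positions: list of feature vectors from the game, in order.
    outcome:   1.0 if the evaluated side won, else 0.0.
    """
    global b2
    traces = [np.zeros_like(W1), np.zeros_like(b1), np.zeros_like(w2), 0.0]
    params = [W1, b1, w2]
    x_prev = positions[0]
    v_prev, h_prev = value(x_prev)
    for t in range(1, len(positions) + 1):
        # Target is the next position's value, or the game outcome at the end.
        v_next = value(positions[t])[0] if t < len(positions) else outcome
        delta = v_next - v_prev  # TD error
        grads = gradients(x_prev, v_prev, h_prev)
        for i in range(3):
            traces[i] = LAMBDA * traces[i] + grads[i]
            params[i] += ALPHA * delta * traces[i]  # in-place array update
        traces[3] = LAMBDA * traces[3] + grads[3]
        b2 += ALPHA * delta * traces[3]
        if t < len(positions):
            x_prev = positions[t]
            v_prev, h_prev = value(x_prev)

# Stand-in for the actual shogi self-play loop: random feature vectors and
# a random outcome, just to exercise the update.
for game in range(10):
    positions = [rng.random(N_FEATURES) for _ in range(rng.integers(20, 60))]
    td_lambda_update(positions, outcome=float(rng.integers(0, 2)))
```

The sigmoid output keeps the network's value in (0, 1), matching the abstract's reading of the output as a winning percentage; in a real system the placeholder loop would be replaced by games the network plays against itself, with positions encoded by a fixed board-description function.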