Wednesday, December 6, 2017

AlphaZero

The DeepMind division of Google just released a paper describing how they applied the methods behind their superhuman-strength Go program to develop programs that play chess and shogi.  The best previously existing chess and shogi programs are based on alpha-beta search and have been incrementally improved over many years.  Chess programs have been stronger than the best human players for about 20 years (Deep Blue beat Garry Kasparov in a six-game match in 1997), and shogi programs recently surpassed the best human players as well.  However, the Google programs (named AlphaZero in each case) appear to be stronger still, defeating Stockfish, a strong chess program, and Elmo, a strong shogi program, by wide margins in 100-game matches.  Ten example games (all wins for AlphaZero) from the match against Stockfish were included in an appendix to the paper and can be played over here.  They are pretty convincing (with the caveat that there doesn't seem to be much opening variation).  AlphaZero wins several games with long-term positional sacrifices where it is not initially apparent that it has sufficient compensation for the material loss.

This is kind of a big deal.  People had tried applying Monte Carlo search with neural net evaluation functions to chess before, but were unable to match the performance of highly tuned alpha-beta search programs developed over many years with lots of domain-specific knowledge hardwired in.  Using a general algorithm, with domain-specific knowledge limited to the rules of the game, to quickly develop apparently superior programs is impressive and a bit scary.