Wednesday, December 6, 2017

AlphaZero

The DeepMind division of Google just released a paper describing how they applied the methods they used to develop a Go program of superhuman strength to develop programs to play chess and shogi.  The best previously existing chess and shogi programs are based on alpha/beta search and have been incrementally improved over many years.  They have been stronger than the best human chess players for about 20 years (Deep Blue beat Garry Kasparov in a 6-game match in 1997) and recently surpassed the best human shogi players as well.  However, the Google programs (named AlphaZero in each case) appear to be stronger still, defeating Stockfish, a strong chess program, and Elmo, a strong shogi program, by wide margins in 100-game matches.  10 example games (all wins for AlphaZero) from the match against Stockfish were included in an appendix to the paper and can be played over here.  They are pretty convincing (with the caveat that there doesn't seem to be much opening variation).  AlphaZero wins several games with long-term positional sacrifices where it is not initially apparent that it has sufficient compensation for the material given up.

This is kind of a big deal.  People had tried applying Monte Carlo tree search with neural net evaluation functions to chess before, but were unable to match the performance of highly tuned alpha/beta search programs developed over many years with lots of domain-specific knowledge hardwired in.  Using a general algorithm, with domain-specific knowledge limited to the rules of the game, to quickly develop apparently superior programs is impressive and a bit scary.
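For readers who want a concrete picture of what that kind of search looks like, here is a minimal sketch of one simulation of PUCT-guided Monte Carlo tree search with a value network standing in for random rollouts.  This is not DeepMind's implementation; the Game interface (apply, legal_actions, is_terminal, terminal_value) and the evaluate(state) function returning (move priors, value) are hypothetical stand-ins for the rules engine and the neural network.

import math

class Node:
    def __init__(self, prior):
        self.prior = prior          # P(s,a): policy-head probability for the move
        self.visit_count = 0        # N(s,a)
        self.value_sum = 0.0        # W(s,a)
        self.children = {}          # move -> Node

    def value(self):                # Q(s,a): mean backed-up value
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node, c_puct=1.5):
    # Pick the child maximizing Q + U, balancing backed-up value against
    # the prior and how little the move has been explored so far.
    sqrt_total = math.sqrt(sum(c.visit_count for c in node.children.values()) + 1)
    def puct(item):
        move, child = item
        u = c_puct * child.prior * sqrt_total / (1 + child.visit_count)
        return child.value() + u
    return max(node.children.items(), key=puct)

def run_simulation(root, root_state, game, evaluate):
    # One playout: walk down the tree by PUCT, expand the leaf with the
    # network (no random rollout), then back the value up the path.
    node, state, path = root, root_state, [root]
    while node.children and not game.is_terminal(state):
        move, node = select_child(node)
        state = game.apply(state, move)
        path.append(node)
    if game.is_terminal(state):
        value = game.terminal_value(state)        # +1/0/-1 from the side to move
    else:
        priors, value = evaluate(state)           # value head replaces rollouts
        for move in game.legal_actions(state):
            node.children[move] = Node(priors.get(move, 0.0))
    for n in reversed(path):                      # flip sign each ply going up
        n.visit_count += 1
        n.value_sum += value
        value = -value

The point of the design is that the network's value head replaces the random rollouts of earlier Monte Carlo programs, and the policy priors steer the search toward moves a strong player would consider, which is why the same search code can be reused for chess, shogi, or Go once the rules and a trained network are supplied.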

1 comment:

  1. The most striking thing to me was that it took AlphaZero 4 hours of self-learning to recap, in effect, all of human (and at least most of computer) chess progress since the beginning of the game.

    The qualifier on the computer side is due to my having read the Wikipedia article on AlphaZero, which suggests that Stockfish's configuration (and the fixed time per move) were decidedly non-optimal for Stockfish. Also, I gather that AlphaZero may have made use of more hardware processing power than Stockfish, though I haven't seen a clear description of this.

    BTW there is an enjoyable documentary about AlphaGo on Netflix. It stays more on the human side than the technical side, but the filmmakers had access to Lee Sedol and the development team, and I found it pretty compelling.

    One observer interviewed said that AlphaGo's occasional 'slack' moves (in which an alternative move could have gained more territory) were a result of the fact that AlphaGo was going for more certainty rather than more territory - i.e. Go masters tend to use score as a proxy for winning probability, but AlphaZero is saying 'No, a win by one is still a win'.
