10:18
SAI progress
We already reported that training D (aka the 20x256 networks) failed to produce good networks. One possible reason is that we copied the learning rate schedule of the previous trainings, and maybe this was not right for the larger nets.
In particular, we copied the sudden drop in learning rate that was *necessary* for training A (aka 6x128), because the rate had been wrong before, and that still *worked* for trainings B (9x192) and C (12x256).
Reconsidering things a bit, we thought that one problem may have been that this dropped the learning rate too much while the games were still fairly low level, so that afterwards the networks could find it difficult to learn high-level features.
Hence we started a new experiment: redoing training D with the AlphaZero learning rates.
This required some maths to do the conversions, but basically the suggested initial rate (0.02) sits halfway between the rates before and after our sudden drop during training A.
So we thought of doing the following: retrain D from the beginning with 0.02 up to recent games, then rewind to, say, the last million games and train with 0.002, then rewind to the games above 10100 Elo and train with 0.0002.
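The three-sweep plan above can be sketched as a simple piecewise schedule. This is only an illustration: the three rates (0.02, 0.002, 0.0002) and the rewind structure come from the plan, while the stage names and thresholds are placeholders, not actual pipeline parameters.

```python
# Hypothetical sketch of the staged retraining schedule described above.
# Only the rates and the three-sweep structure are from the plan; the
# stage names and game thresholds are illustrative placeholders.

def learning_rate(stage):
    """Return the learning rate used in each sweep of the retraining plan."""
    rates = {
        "full_sweep": 0.02,    # retrain D from the beginning up to recent games
        "rewind_1m": 0.002,    # rewind to roughly the last million games
        "rewind_elo": 0.0002,  # rewind to the games above 10100 Elo
    }
    return rates[stage]

for stage in ("full_sweep", "rewind_1m", "rewind_elo"):
    print(stage, learning_rate(stage))
```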
Yesterday we reached the end of the first sweep at the high rate. The last three networks, corresponding to generations 507, 508 and 509, were trained with a lower rate, in preparation for the rewind. The very last network was g1fd-792f,
and we decided to test it just to see how bad it was.
We did not expect this net to be any good, but we at least had to check that it was not abysmal before proceeding.
We were really surprised to find that it is instead pretty good and very promising.
This may be an effect of the rate drop in nets 507-509, but it is encouraging either way.