17 October 2020
Channel «SAI progress» created
S
12:41
SAI progress
Hello everybody,
this is a new channel to broadcast frequent updates about SAI progress and training experiments.
SAI progress pinned this message
S
12:42
SAI progress
These four nets were trained with KLE loss instead of MSE loss
12:45
KLE stands for Kullback–Leibler divergence error, that is, cross-entropy. In principle this loss should be the most suitable for a binary variable such as the outcome of a game, but for some reason DeepMind chose MSE over KLE for AlphaGo Zero, so we would like to see whether we can find any difference between the two.
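For reference, here is a minimal sketch (plain NumPy, not the actual SAI training code) of the two losses on a binary game outcome z ∈ {0, 1} and a value-head prediction v ∈ (0, 1):

```python
import numpy as np

def mse_loss(v, z):
    """Mean squared error between predicted win probability v and outcome z."""
    return np.mean((v - z) ** 2)

def kle_loss(v, z, eps=1e-12):
    """Cross-entropy (KLE) loss for a binary outcome."""
    v = np.clip(v, eps, 1 - eps)  # guard against log(0)
    return np.mean(-(z * np.log(v) + (1 - z) * np.log(1 - v)))

# Example: four games, two wins and two losses
z = np.array([1.0, 1.0, 0.0, 0.0])
v = np.array([0.8, 0.6, 0.3, 0.1])
print(mse_loss(v, z))  # 0.075
print(kle_loss(v, z))  # ~0.299; penalizes confident mistakes more strongly
```

Note how the cross-entropy grows without bound as a confident prediction turns out wrong, while MSE saturates at 1: that is the main qualitative difference between the two losses for a binary target.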
Channel photo changed
S
15:44
SAI progress
This one instead is a "null control" where no loss term for the value head was included: neither MSE nor KLE, only the policy loss term. Apparently there is no statistically significant difference in a single training experiment
S
18:19
SAI progress
This is point 6/8 of a factorial experiment in 20x256 training. We are varying learning rate, window size, MSE loss weight and number of steps. To assess network strength we selected a subpanel of 6 very different nets. More info when the experiment is finished.
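Eight runs over four two-level factors is consistent with a 2^(4−1) half-fraction design; a sketch of how such a design is built (the factor ordering is an assumption, not the actual assignment used):

```python
from itertools import product

# Full 2^3 design in three factors; the fourth column is the defining
# relation D = A*B*C, giving a resolution-IV half fraction of a 2^4 design.
runs = []
for a, b, c in product((-1, 1), repeat=3):
    d = a * b * c
    runs.append((a, b, c, d))

for i, run in enumerate(runs, 1):
    print(f"point {i}/8:", run)
```

With this construction every main effect can still be estimated from only 8 runs instead of 16, at the price of confounding each main effect with a three-factor interaction.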
18 October 2020
S
11:15
SAI progress
This is point 7/8. Next and last one in three days 😵
19 October 2020
S
09:19
SAI progress
The KLE experiment goes on, as one generation did not show measurable differences from MSE. These four nets were trained starting from the best KLE network of the previous generation. We will continue this way to see whether after 5 or 10 generations we can spot some differences. The main training will ignore these networks until the end of the experiment.
20 October 2020
S
07:41
SAI progress
Third generation of KLE nets
S
18:31
SAI progress
Fourth generation of KLE nets
18:31
We will go on to ten generations in this experiment and then draw some conclusions
S
21:06
SAI progress
Last point of the 20x256 factorial experiment
21:14
This experiment was intended to study the effect of several training hyperparameters on the training of 20x256 nets.
We started from the last 20x256 network that actually seemed stronger than the corresponding 12x256. That would be g184-6df22881. From there we trained with 4 combinations of parameters:
21:18
generations is the generation window: 14 means from g19a to g1a7 and 42 from g17e to g1a7
mini_rate is the rate step per ram_minibatch (=64 positions for this net)
mse_weight is the coefficient for the value head term in the training loss; policy_weight is always 1
21:20
The training steps went from a minimum of 2k per generation to a maximum of 14k per generation. (Each step uses 128 positions.)
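Since generation names are hexadecimal, the window sizes quoted above can be checked directly (inclusive counts), as can the number of positions seen per generation:

```python
def gens_between(start, end):
    """Inclusive count of generations between two hex-named generations."""
    return int(end.lstrip('g'), 16) - int(start.lstrip('g'), 16) + 1

print(gens_between('g19a', 'g1a7'))  # 14
print(gens_between('g17e', 'g1a7'))  # 42

# Positions seen per generation, at 128 positions per training step:
print(2_000 * 128)   # 256000 (minimum)
print(14_000 * 128)  # 1792000 (maximum)
```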
21:25
This is the full table of runs, with actual parameters and hashes. The first line is the starting network of generation g184
21:26
And these are the results in a single plot
21:27
It is apparent that the lower rate (1e-6) was better, and that the more the steps, the weaker the network.
21:29
This is better shown in the "effects" plot. Coefficients are negative because apparently "less is better" for these four factors, and the most relevant ones are steps and rate
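For illustration, the coefficients in such a plot can be computed from the signed design matrix; a sketch with made-up win rates (the column ordering and the y values are assumptions, not the real experimental data):

```python
import numpy as np

# Signed design matrix of a 2^(4-1) half fraction (fourth column = product
# of the first three). Column order (rate, window, mse_weight, steps) and
# the win rates below are illustrative only.
X = np.array([(a, b, c, a * b * c)
              for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)])
y = np.array([0.55, 0.52, 0.54, 0.50, 0.51, 0.47, 0.49, 0.46])

# With orthogonal columns, each coefficient (half-effect) is simply
# the signed average X^T y / 8:
coefs = X.T @ y / len(y)
print(coefs)
```

A negative coefficient means the "+1" level of that factor lowered the win rate, which is how "less is better" shows up in the plot.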
S
21:58
SAI progress
With 150 games per network, the standard error on each of these points is about 4%, which is quite large
22:00
With 8 runs in an orthogonal design, the standard error on these effects is about 1.4%, which is still large compared with the size of the effects
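The quoted errors are consistent with plain binomial arithmetic (worst case p = 0.5); a sketch of the calculation, assuming each point is a win rate over n games:

```python
import math

def point_se(games, p=0.5):
    """Standard error of a win rate estimated from `games` games."""
    return math.sqrt(p * (1 - p) / games)

# One match point: 150 games -> about 4%
print(round(point_se(150), 4))                  # ~0.0408

# A coefficient in an orthogonal 8-run design is a signed average of the
# 8 points, so its standard error is the per-point error over sqrt(8):
print(round(point_se(150) / math.sqrt(8), 4))   # ~0.0144

# After increasing to 400 games per match:
print(round(point_se(400), 4))                  # 0.025
print(round(point_se(400) / math.sqrt(8), 4))   # ~0.0088
```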
22:02
For all these reasons, we are going to increase the number of games for these matches from 150 to about 400 and see if the results are confirmed.
21 October 2020
S
08:30
SAI progress
The type of these and all other factorial-experiment matches has been renamed to "experiment"
S
21:39
SAI progress
Update on 20x256 factorial experiment after game number increase
21:41
Results are somewhat inverted. The standard error on these points is about 2.5%, so still pretty large, and the visible differences might well be statistical fluctuations
21:42
The effects have also changed completely, and they are still small compared with the standard error (about 0.8% for these figures)
21:44
Basically the conclusion for now is that the effect of these hyperparameters on net strength is small, if present at all
22 October 2020
S
17:02
SAI progress
KLE experiment 5th generation
17:04
The performance of some of these nets appears very erratic. Compare for example c175 and c5b9 against 2e8e and a83d
S
17:45
SAI progress
Purple points (and lines) are KLE nets (and chosen ones). The ratings may well be inflated, in particular the last one.
24 October 2020
S
16:22
SAI progress
These are the average performances of the latest nets (one in four) against the panel nets. (Redrawn with a better scale.)
16:22
Since the panel nets are always the same, this plot is never inflated (as the Elo plot may be)
16:22
It seems that we have been somewhat stuck for the last 40 generations
16:31
These are the performances against single panel nets
16:44
6th KLE generation: 3072 was selected
25 October 2020
S
14:46
SAI progress
7th KLE generation: d16e was selected
14:46
8th KLE generation: 2d51 was selected
26 October 2020
S
09:27
SAI progress
9th KLE generation: dbb1 was selected
S
18:29
SAI progress
10th generation: 8047 was selected
18:43
I think we will do one more generation, just to compare with one panel network. We will put KLE nets #3, #7 and #11 against the subpanel to compare with the corresponding MSE nets
27 October 2020
S
17:48
SAI progress
11th generation: 50d5 was selected. Subpanel matches in progress
S
21:29
SAI progress
Maybe we can push on with KLE networks a little bit more...
30 October 2020
S
10:37
SAI progress
KLE 12th generation. d1c6 selected. We will go light with matches in KLE experiment from now on.
10:37
KLE 13th generation. 8ba7 selected.
S
18:21
SAI progress
KLE 14th generation. c901 selected.
31 October 2020
S
12:03
SAI progress
Again a good performance of KLE nets
12:05
By the way, in KLE 15th generation 03f2 was selected
1 November 2020
S
18:02
SAI progress
KLE 16th generation. 5228 was selected
2 November 2020
S
09:53
SAI progress
KLE 17th generation. 7776 was selected
3 November 2020
S
08:52
SAI progress
KLE 18th generation. 983b was selected
08:53
The next generation will again be evaluated against the subpanel, and then we will choose whether to switch to KLE in the main training
4 November 2020
S
07:46
SAI progress
It seems we had better consider a transition from MSE to KLE training. We will think about how to do this in the best and smoothest way. SAI doesn't like discontinuities.
07:48
BTW in the most recent KLE training (19th) 779e was selected.
6 November 2020
S
10:34
SAI progress
Added one point trained with both KLE and MSE equal to zero (only policy loss), just to check that the value head is actually being used in training currently 😅
13 November 2020
S
19:01
SAI progress
Graph of the policy weight of the 5 best first moves and the alpha of the blank board (aka expected fair komi) [right axis]
16 November 2020
S
16:03
SAI progress
Performance of last 124 generations VS panel (one point every 4 gens)
16:05
We included KLE training from generation 543. The first 8 promotion candidate networks of each generation are KLE-trained and the other 8 are MSE-trained. In the first two attempts, KLE won both times.
17 November 2020
S
09:59
SAI progress
In the last three generations the promoted network was always a KLE network!
10:00
What is this experiment? It's a long story...
S
10:18
SAI progress
We already reported that training D (aka 20x256 networks) failed to produce good networks. One possible reason was that we copied the training rate of the previous trainings, and maybe this was not right for the larger nets.
In particular, we copied the sudden drop in training rate that was *necessary* for training A (aka 6x128), because the rate was wrong before, and that still *worked* for trainings B (9x192) and C (12x256).
Reconsidering things a bit, we thought that maybe one problem was that this dropped the training rate too much while games were still at a pretty low level, so that afterwards the networks could find it difficult to learn high-level features.
Hence we started a new experiment: redo training D copying AlphaZero training rates.
This required some maths to do the conversions, but basically at the beginning the suggested rate (0.02) was half way of our sudden drop during training A.
So we thought of doing the following: retrain D from the beginning with 0.02 until recent games, then rewind to say the last million games and train with 0.002, then rewind to the games above 10100 Elo and train with 0.0002.
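In code form the plan reads roughly like this (a sketch: the rewind boundaries are left as descriptions, since the exact cut points were still to be decided; only the three rates come from the plan above):

```python
# Three-swipe retraining plan for the D (20x256) nets: replay the game
# history at a high rate, then rewind and replay recent games at
# successively lower rates. Window descriptions are placeholders.
swipes = [
    {"rate": 0.02,   "games": "from the beginning up to recent games"},
    {"rate": 0.002,  "games": "rewind to roughly the last million games"},
    {"rate": 0.0002, "games": "rewind to games above 10100 Elo"},
]

for i, s in enumerate(swipes, 1):
    print(f"swipe {i}: rate {s['rate']}, window: {s['games']}")
```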
Yesterday we were at the end of the first swipe with the high rate. The last three networks, corresponding to generations 507, 508 and 509, were trained with a lower rate, to prepare for the rewind. The very last network was g1fd-792f and we decided to test it, just to see how bad it was.
We did not expect this net to be any good, but we had at least to check if it was abysmal before proceeding.
We were really surprised to find that it is instead pretty good and very promising.
This may be an effect of the rate drop in nets 507-509, but it is encouraging either way.
10:20
Because of this little surprise, we will *continue* this first swipe with equivalent rate 0.02 to the most recent games (restarting from 506) and see how strong we get with a real test.
18 November 2020
S
20:43
SAI progress
This match is between the very last D network of the first swipe (rate 0.02, generation 544) and the official C net of generation 543. It is confirmed that these D networks are neither very strong nor very weak. We will retrain the last 3 nets with rates 0.0093, 0.0043 and 0.002, test again, and start the second swipe
1 December 2020
S
09:08
SAI progress
Sorry for the recent lack of updates. We are on the third and last swipe of D (20x256) networks, and maybe we have something
09:10
I must say that this network was trained on games of nets stronger than its current opponent. Not so much stronger, though... We will see.
S
09:50
SAI progress
Just to be sure... we are going to test 6fd3aa against the subpanel now
S
12:11
SAI progress
Subpanel results confirm that this network is strong
S
12:42
SAI progress
ok, still preliminary, but...
S
14:02
SAI progress
😳
S
15:06
SAI progress
against LZ networks: some progress but we are still far
15:07
We will go on with the third swipe and see how things go
2 December 2020
S
15:51
SAI progress
These are three networks of the recent D train. Their strength is confirmed.
15:55
We will keep testing one net every 16 generations until we get to the current games, and then we will have to decide what to do.
1) move the main pipeline to D (20x256) networks (because they are so much stronger, but at the cost of slowing games down a lot), or
2) try the same training procedure that worked for the D networks on C nets. This could potentially yield stronger networks with the same structure and would be a win-win, but at the cost of waiting some weeks to verify that this procedure also works for C networks.
4 December 2020
S
17:53
SAI progress
Hello! Apparently the strength of subsequent nets of the third swipe is slowly decreasing.
17:54
We checked the training, and we discovered some overfitting in the MSE loss term
17:55
Here the x axis is the generation. The three data series are: first swipe (test orange, train blue, rate 0.02, stable, large overfitting); second swipe (test grey, train green, rate 0.002, overfitting starts around generation 480); third swipe (test yellow, train purple, rate 0.0002, same overfitting as second swipe)
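As an aside, a toy sketch of how this kind of onset can be flagged automatically from the two loss series (the data below are made up to mimic the second swipe, not the real curves):

```python
def overfit_onset(gens, train, test, window=3):
    """Return the first generation at which the test loss has been rising,
    while the train loss kept falling, for `window` consecutive points."""
    for i in range(len(gens) - window):
        if all(test[i + k + 1] > test[i + k] and train[i + k + 1] < train[i + k]
               for k in range(window)):
            return gens[i + 1]
    return None

# Illustrative series: the test loss turns upward around generation 480
# while the train loss keeps decreasing (the signature of overfitting).
gens  = [440, 450, 460, 470, 480, 490, 500, 510]
train = [0.30, 0.29, 0.28, 0.27, 0.26, 0.25, 0.24, 0.23]
test  = [0.31, 0.30, 0.30, 0.29, 0.30, 0.31, 0.32, 0.33]
print(overfit_onset(gens, train, test))  # 480
```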
17:56
retrain-D21.pdf (4.5 MB)
17:56
Zoom on third swipe generations
18:01
We are going to test some networks of the second swipe, just to check if those around generation 448 are as strong as those of the third swipe.
5 December 2020
S
09:39
SAI progress
Updated graph. Overfitting in the final part of the third swipe is even worse than in the second swipe. In agreement, the corresponding net of generation 528 had a terrible performance.
09:40
Now we will try to redo the third swipe from generation 472, with the generation window increased from 12 to 16 generations per training, and see if it improves.
09:41
Meanwhile we will be testing other networks of the second swipe to see when and how fast they got their strength.
6 December 2020
S
18:07
SAI progress
The test of D networks (20x256) from the recent three-swipe experiment is here.
In purple, nets from the third swipe, generations 432 to 512. In blue, nets from the second swipe, of earlier generations. In red, pointed to by the arrow, the last net from the first swipe.
18:11
The blue net at 320 is the first network of the second swipe, trained directly from the red one. The huge strength difference is therefore the consequence of just that single training, done on barely 7 generations of games for 35k steps.
9 December 2020
S
08:43
SAI progress
We continued the third swipe into the most recent generations, and after a while the overfitting ends.
08:44
The network obtained after the overfitting ended is strong again
08:44
We are not switching immediately to 20x256 for self-plays though, as this would slow down games a lot. We are going to try the same training procedure with a new 12x256 network and we aim to switch to that, if it proves to be stronger.
24 December 2020
S
15:50
SAI progress
Hello! We completed the first swipe of the above training procedure on 12x256 structure.
15:50
Apparently the new training procedure works also for 12x256 networks. You see here the very first net of the second swipe
15:51
I anticipate that we are going to promote one of these networks very soon.
S
19:01
SAI progress
First network tested, compared and promoted!
19:02
Will continue the second swipe of training and test new nets as they come out (one every 16 generations)
25 December 2020
S
11:14
SAI progress
The usual training over the newly promoted 186f82 network was shocking!
11:16
After promotion, we changed the x value of the above net to 3.062M games, so that the line of blue dots progresses to the right of the plot
11:22
The second and third nets of the second swipe were put up for promotion, but they are not stronger than the new-generation nets and will not be promoted.
11:24
Here are their matches against same generation nets and against current best
1 January 2021
S
09:44
SAI progress
So, happy new year! And now I will explain the recent feat
09:45
09:54
At the beginning of December, after seeing the very promising results of our latest attempt to train a 20x256 network, we started a similar procedure on 12x256 networks.
The general idea was that the rates chosen during the current pipeline were too low, or at least they became too low too soon, and we wanted to correct this.
In fact, in the current pipeline we choose greedily among 16 networks every generation, which are trained mixing two learning rates, two training window sizes, two sets of loss weights and four numbers of steps. Choosing greedily may be good in the short run, but proved to be less good in the long run. In fact it caused the rates to stay too low.
09:59
So we started with a brand new network (random weights) and trained on a moving window of games with a high rate, 0.02, for a first "swipe", which reached generation g22d=557 (an arbitrarily chosen recent point). The networks obtained during this step were tested only at the beginning, when they were strong, because 0.02 was comparable with the training rate of the corresponding old networks.
10:05
Then we took the last of these weights and started training again on a moving window, but from generation g140=320 with rate 0.002. These networks should have been immediately strong, so while they were coming out of the training procedure we started testing them. The very first one was g140-186f, which was promising enough that we decided to put it up for promotion. It won (barely) against the 16 networks of generation g241=577 and was promoted, renamed g241-186f, and its x value on the plot was made equal to that of the g241 networks.
10:06
g241-186f is the last low blue point in the plot, just before the first huge jump in strength
10:11
As the second swipe progressed, we expected subsequent networks to be somewhat stronger than 186f, and planned to put them up for promotion too, with reasonable success. What we did not think of was that the usual training pipeline used much lower training rates, so the networks trained in generation g242 benefited from the sudden drop in rate and brought out the true strength of the training procedure; hence the huge jump from around 10180 Elo to around 10280.
10:17
This sudden improvement prevented the subsequent networks of the second swipe from being competitive against promotion networks. In fact we still tested them, but without much consequence. The second swipe ended with generation g221=545. Then of course we dropped the rate! We trained four networks of generation g230=560 and then four more of generation g240=576, starting from the last second-swipe network and with rate 0.0002.
10:19
All these networks were very strong and were put up for promotion in generation g247=583. One of the g230 networks won and was promoted as g247-a6a6, with another jump of 100 Elo, to around 10380.
10:20
So now we will let the waters stir a bit, let the self-plays of these new generations have their effect on the future trainings, and see what happens.
10:23
After that, it is clear that the usual training pipeline will need to incorporate all this wisdom and become a bit more complicated, starting from suitable high-rate weights rather than from the previous network. There is a little time for that.