SPSA Tuning Workloads
The Create Tune button on the OpenBench sidebar can be used to create SPSA tunes. The options on this page are very similar to the Create Test page. Those options will not be reiterated here; refer to the "Creating Tests" wiki page for that information. The tune page adds a few new options, which are explained below. The SPSA Input is explained last.
SPSA (Simultaneous Perturbation Stochastic Approximation) is an optimization algorithm well-suited to handling a large number of variables, and to objective functions that may or may not be differentiable. An engine's search function is extremely complex, and cannot be tuned with a simple loss function in the same sense that a Neural Network can be trained. As such, we use game-pair results as a proxy for a typical gradient.
Instead, SPSA takes a set of params P = { p0, p1, ... } and generates slightly different, but complementary, versions of it. For example, we might come up with P_white = { p0 + 3, p1 - 1, ... } and P_black = { p0 - 3, p1 + 1, ... }. Then we play the engine against itself, with one copy getting P_white and the other getting P_black. If white wins, we assume that the shift applied to P ({ +3, -1, ... }) is in the correct direction, and we slightly adjust the current value of P. We repeat this process for hundreds of thousands of games, in the hope of creeping towards a more optimal setting of the variables. More information about that is in the SPSA Methodologies section.
Before moving on, we will introduce two values, C_end and R_end. These are constant inputs provided to the SPSA mechanism for each variable we want to tune. C_end acts as a sort of step size. When creating P_white and P_black, we either add or subtract the value of C_end for a given parameter. So in the above, the C_end vector was { 3, 1, ... }. Note that C_end is the final value: the actual vector C starts a bit higher, and the step size decreases as we near the end of the tuning session. R_end is an analog for learning rate, and starts a bit higher in the same way as C_end. When given some gradient (a game-pair result), we adjust P by C x R.
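To make that concrete, here is a minimal Python sketch of a single SPSA iteration as described above. The function play_game_pair is a placeholder stub rather than a real engine match, and the update rule is my reading of the description on this page, not a transcription of OpenBench's code.

```python
import random

def play_game_pair(P_white, P_black):
    """Stub standing in for a real engine-vs-engine game pair. Returns +1
    if the P_white side wins the pair, -1 if it loses, and 0 for a draw."""
    return random.choice((-1, 0, 1))

def spsa_step(P, C, R):
    """One illustrative SPSA iteration: perturb, play a game pair, update.
    P, C, and R are dicts mapping parameter names to values."""
    # Pick a perturbation direction per parameter.
    signs = {name: random.choice((-1, 1)) for name in P}

    # Complementary parameter sets handed to the two sides of the game pair.
    P_white = {name: P[name] + signs[name] * C[name] for name in P}
    P_black = {name: P[name] - signs[name] * C[name] for name in P}

    result = play_game_pair(P_white, P_black)

    # Nudge each parameter in the direction that appeared to win, scaled by C x R.
    for name in P:
        P[name] += result * signs[name] * C[name] * R[name]
    return P
```

In practice the C and R values are not fixed: they decay over the course of the tune, as controlled by the Alpha, Gamma, and A-Ratio options described further down.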
An ideal value for R_end is 0.0020. An ideal value for C_end might be about 1/20th of the reasonable range of a variable. It is very important that the values for C_end are selected well: they should be large enough to conceivably produce a difference in strength in the engine, but not much more.
OpenBench allows you to tune floating point parameters, but when using integers there are some potential pitfalls. Traditionally, you must set C_end to at least 0.50 for integer parameters, otherwise x + C_end and x - C_end may round to the same integer value. To get around this, OpenBench raises C at each step for integer parameters to be at least 0.50. This means you can provide any C_end value you want, but note from above that the adjustment step depends linearly on C: a value of 0.00 for C_end will result in the parameter never moving (see the sketch below).
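A brief sketch of how I read that integer handling; the function names and the exact placement of the rounding are assumptions, not lifted from OpenBench's source:

```python
def play_step_size(param_type, c_k):
    """Step actually used when generating P_white / P_black for a parameter.
    For integer parameters, C is raised to at least 0.50 so that the two
    sides differ by a whole unit once values are rounded for the engine."""
    if param_type == "int":
        return max(c_k, 0.50)
    return c_k

def update_step_size(c_k, r_k):
    """The adjustment applied to P still scales linearly with the original
    c_k (times r_k), so a C_end of 0.00 means the parameter never moves,
    even though games are still played with a +/- 0.50 perturbation."""
    return c_k * r_k
```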
You might encounter this issue when attempting to tune very small, very volatile values, such as the base reduction for Null Move Pruning, or perhaps the depth condition for Singular Extensions. In the case of the Null Move Pruning reduction, a typical value is R=3. You might attempt to tune this between 0 and 6. With a C_end of 0.50, the C values for the first iterations could be multiple times larger, leading to playing R=1 against R=5. This change is too extreme, and will cause the parameter to soar in one direction, well past the optimum. A remedy in this case is to merge the base value with other terms, like the depth and eval conditions, so that you can tune with additional granularity (see the sketch below).
In the case of Singular Extension depth, we would once again get extreme results if our current depth threshold is 8 and we suddenly start testing 5 against 10. In this case, you might consider setting C_end=0.10, and then multiplying R_end by a factor. Since the adjustment scales linearly with C, raising R_end by roughly the same factor you lowered C_end keeps the effective learning rate intact. This helps ensure that all games are played with a minimal (1) difference in the threshold, without lowering the learning rate too greatly.
There is no easy answer to the above issues. I personally refer to these types of parameters as "poisoned parameters", because without very good hyper-parameter selection, they have the ability to poison an entire tuning session. I have taken lengthy notes of my experience here in Torch, and can provide some insight upon request.
Reporting has two options, Bulk and Batched. If Bulk is selected, the worker finishes playing its entire workload before sending back the results. This can be useful, since one motivation for playing more than a single pair of games per SPSA-point is to get a higher quality estimate of the gradient. If Batched is selected, results are returned for each game-pair as it is finished. Note, however, that there is a reporting rate limit for SPSA tests, just like standard tests.
Distribution has two options, Single and Multiple. If Single is selected, the Server generates only a single SPSA-point for the Client. If Multiple is selected, the Server generates one SPSA-point for every thread on the Client. For example, if a 60 thread machine obtained a test with the distribution set to Single, it would, at a minimum, play 1 pair of games with 60 copies of the sole SPSA-point. If you had set it to Multiple, the client would, at a minimum, play 30 pairs of games, with each half of a pair played concurrently, using 30 different SPSA-points; i.e., 30 copies of Cutechess, spawned with a concurrency of 2.
These options were implemented in order to test both approaches, rather than simply picking one method and sticking to it. In practice, the Multiple distribution method results in more core idle time. Refer to this commit to learn more. In general, I've found that Batched reporting, with a Single distribution and 8 pairs-per-point, is sufficiently optimal: it minimizes downtime, and appears to still maintain correctness even when 5,000 concurrent cores are working on the tune.
Alpha, Gamma, and A-Ratio are hyperparameters that dictate an analog for the decay of the "learning rate" and "step size" during the SPSA tune. The default values for Alpha and Gamma are 0.602 and 0.101 respectively. These values are taken directly from some of the original SPSA academic papers. The default value for A-Ratio is 0.1, which seems reasonable. It is extremely unlikely that you will want to modify these values; if you do, you would not need to be reading this page.
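Here is a hedged sketch of how such decaying schedules are commonly constructed in SPSA, scaled so that the step size and learning rate reach C_end and R_end on the final iteration. The exact formulas and the interpretation of A-Ratio are my assumptions based on the standard SPSA literature, not a transcription of OpenBench's implementation.

```python
def spsa_schedules(c_end, r_end, iterations, alpha=0.602, gamma=0.101, a_ratio=0.1):
    """Yield (iteration, c_k, r_k), with both sequences decaying towards
    c_end and r_end, which are reached on the final iteration."""
    A = a_ratio * iterations                   # stability offset for the learning rate
    c_0 = c_end * iterations ** gamma          # chosen so c_k == c_end when k == iterations
    r_0 = r_end * (A + iterations) ** alpha    # chosen so r_k == r_end when k == iterations
    for k in range(1, iterations + 1):
        c_k = c_0 / k ** gamma
        r_k = r_0 / (A + k) ** alpha
        yield k, c_k, r_k

# Example: the first iteration uses a noticeably larger C and R than the last.
first = next(spsa_schedules(c_end=2.25, r_end=0.002, iterations=100_000))
print(first)
```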
Pairs-Per is an analog for workload size. This option determines the number of game pairs played to evaluate each point; a value of 8 seems to work well for LTC tunes. You might consider raising the value when doing very fast tunes, because the fewer games played per workload, the more overhead and downtime is introduced. Iterations is simply the number of SPSA-points to fully evaluate (see the quick calculation below).
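For budgeting purposes, a quick back-of-the-envelope calculation (my own, using hypothetical numbers) of how many games a tune implies:

```python
iterations = 50_000        # SPSA-points to fully evaluate (hypothetical)
pairs_per  = 8             # game pairs played per SPSA-point
games = iterations * pairs_per * 2
print(f"{games:,} games")  # 800,000 games for this hypothetical tune
```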
The SPSA Input is the most important section. It would be a good idea to have your engine automatically produce this text when it is getting ready to tune something. Below is an example, which we will step through. Each line of this input contains 7 values, separated by commas.
phase_vals_0, int, 14.0, -5.0, 40.0, 2.25, 0.002
eval_scalar_0, float, 1024.0, 512.0, 1536.0, 51.2, 0.002
Field | Example | Explanation |
---|---|---|
1 | phase_vals_0 | The name of the parameter to be tuned |
2 | int / float | The type; this can be either int or float |
3 | 14.0 | Current value for the parameter. Also known as the current "optimal" value |
4 | -5.0 | Minimum value for the parameter. We will never request the engine play with a lower value |
5 | 40.0 | Maximum value for the parameter. We will never request the engine play with a higher value |
6 | 2.25 | C_end, also known as the step-size at the end of the tuning session |
7 | 0.002 | R_end, also known as the learning rate at the end of the tuning session |
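As a sanity check on the format, here is a minimal Python sketch that parses one line of SPSA Input into its 7 fields. The class and field names are my own; only the line format itself comes from the table above.

```python
from dataclasses import dataclass

@dataclass
class SPSAParam:
    name: str        # field 1: parameter name
    kind: str        # field 2: "int" or "float"
    value: float     # field 3: current ("optimal") value
    minimum: float   # field 4: lower bound, never exceeded
    maximum: float   # field 5: upper bound, never exceeded
    c_end: float     # field 6: step size at the end of the tune
    r_end: float     # field 7: learning rate at the end of the tune

def parse_spsa_line(line: str) -> SPSAParam:
    name, kind, value, lo, hi, c_end, r_end = [x.strip() for x in line.split(",")]
    return SPSAParam(name, kind, float(value), float(lo), float(hi),
                     float(c_end), float(r_end))

print(parse_spsa_line("phase_vals_0, int, 14.0, -5.0, 40.0, 2.25, 0.002"))
```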
The UCI protocol does not allow SPIN options to contain floats, which would make tuning with floats impossible. A workaround is to have your engine report the option as a STRING instead. You can trust that OpenBench will ensure the values stay within the [min, max] provided through the SPSA Input field.
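A hedged sketch of that workaround, written in Python for consistency with the other examples (a real engine would do the same in its own language). The option name eval_scalar_0 comes from the earlier example input; the handler itself is purely illustrative.

```python
# Hypothetical float parameter, advertised over UCI as a STRING option
# because SPIN options only accept integers.
eval_scalar_0 = 1024.0

def handle_uci(line: str) -> None:
    global eval_scalar_0
    if line == "uci":
        # Advertise the tunable as a string so floats survive the round trip.
        print("option name eval_scalar_0 type string default 1024.0")
        print("uciok")
    elif line.startswith("setoption name eval_scalar_0 value "):
        # OpenBench keeps the value within the [min, max] from the SPSA Input,
        # so parsing it as a float is all the engine needs to do.
        eval_scalar_0 = float(line.rsplit(" ", 1)[-1])

handle_uci("setoption name eval_scalar_0 value 1037.5")
print(eval_scalar_0)  # 1037.5
```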
Some testing in Torch suggests that the Single distribution method is superior to Multiple. Our testing was not able to find a meaningful difference between Bulk and Batched reporting. Finally, we did not see a difference in tune quality when limiting the SPSA test to only a handful of workers versus allowing the entire fleet to work on the test at the same time.