
SPSA Tuning Workloads


The Create Tune button on the OpenBench sidebar can be used to create SPSA tunes. The options on this page are very similar to those on the Create Test page. Those options will not be reiterated here; refer to the "Creating Tests" wiki page for that information. The tune page adds a few new options, which are explained below. The SPSA Input is explained last.

Basic Overview of SPSA

SPSA (Simultaneous Perturbation Stochastic Approximation) is an optimization algorithm well-suited for handling a large number of variables, which may or may not be differentiable. The search function is extremely complex, and cannot be tuned using a simple loss function in the same sense that a Neural Network can be trained. As such, we use game-pair results as a proxy for a typical gradient.

SPSA takes a set of params P = { p0, p1, ... } and generates slightly different, but complementary, versions of it. For example, we might come up with P_white = { p0 + 3, p1 - 1, ... } and P_black = { p0 - 3, p1 + 1, ... }. Then we play an engine against itself, with one copy getting P_white and the other getting P_black. If the copy holding P_white wins, we assume that the shift applied to P ({ +3, -1, ... }) is in the correct direction, and we slightly adjust the current value of P accordingly. We repeat this process for hundreds of thousands of games, in the hopes that we will creep towards more optimal values for the variables. More information about that is in the SPSA Methodologies section.
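
As a rough illustration, here is a minimal Python sketch of how one SPSA-point might be generated from the current values. The parameter names and values are hypothetical, and this is not OpenBench's actual implementation.

```python
import random

# Hypothetical parameters: current "optimal" values P and per-parameter step sizes C.
P = {"p0": 14.0, "p1": 3.0}
C = {"p0": 3.0,  "p1": 1.0}

# Flip a coin for each parameter, then build the two complementary versions.
delta   = {name: random.choice((-1, +1)) for name in P}
P_white = {name: P[name] + delta[name] * C[name] for name in P}
P_black = {name: P[name] - delta[name] * C[name] for name in P}
```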

Before moving on, we will introduce two values, C_end and R_end. These are constant inputs provided to the SPSA mechanism for each variable we want to tune. C_end acts as a sort of step size. When creating P_white and P_black, we will either add or subtract the value of C_end for a given parameter. So in the above, the C_end vector was { 3, 1, ... }. Note that C_end is the final value; the actual vector C starts a bit higher, and the step size decreases as we near the end of the tuning session. R_end is an analog for learning rate, and starts a bit higher in the same way as C_end. When given some gradient (a game-pair result), we will adjust P by C x R.
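
Continuing the hypothetical sketch above, applying one game-pair result might look like the following. Again, this is an illustration of the description in this section, not OpenBench's actual code.

```python
# Per-parameter learning rates (hypothetical values).
R = {"p0": 0.002, "p1": 0.002}

def apply_result(P, C, R, delta, result):
    # result: +1 if the P_white copy won the pair, -1 if it lost, 0 for a drawn pair.
    # Nudge each parameter by C x R in the direction the game pair suggested.
    for name in P:
        P[name] += C[name] * R[name] * result * delta[name]
```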

An ideal value for R_end is 0.0020. An ideal value for C_end might be about 1/20th the reasonable range of a variable. It is very important that the values for C_end are selected well. They should be large enough to conceivably produce a difference in strength in the engine, but not much more.
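
As a quick worked example of the 1/20th heuristic (the helper name here is hypothetical):

```python
# C_end of roughly 1/20th of the variable's reasonable range, per the heuristic above.
def suggest_c_end(minimum, maximum):
    return (maximum - minimum) / 20.0

suggest_c_end(-5.0, 40.0)   # => 2.25, matching the example SPSA Input later on this page
```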

An Important Caveat

OpenBench allows you to tune floating point parameters, but when using integers, there are some potential pitfalls. Traditionally, you must set C_end to at least 0.50 for integer parameters, otherwise it is possible that x + C_end and x - C_end round to the same integer value. To get around this, OpenBench raises C at each step for integer parameters to be at least 0.50. This means you can provide any C_end value you want, but note from above that the adjustment step depends linearly on C. A value of 0.00 for C_end will result in the parameter never moving.
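
A minimal sketch of that safeguard, using a hypothetical helper name:

```python
# For integer parameters, C is raised to at least 0.50 at each step, so that the
# perturbed values x + C and x - C cannot collapse onto the same integer.
def effective_step(c_k, is_integer):
    return max(c_k, 0.50) if is_integer else c_k
```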

You might encounter this issue when attempting to tune very small, very volatile values, such as the base reduction for Null Move Pruning, or perhaps the depth condition for Singular Extensions. In the case of the Null Move Pruning reduction, a typical value is R=3. You might attempt to tune this between 0 and 6. With a C_end of 0.50, the C values for the first iterations could be multiple times larger, leading to playing R=1 against R=5. This change is too extreme, and will cause the parameter to soar off in one direction, well past the optimum. A remedy in this case is to merge the base value with other terms, like the depth and eval conditions, so that you can tune with additional granularity.

In the case of the Singular Extension depth, we would once again get extreme results if our current depth threshold is 8 and we suddenly start testing 5 against 10. In this case, you might consider setting C_end=0.10 and then multiplying R_end by a factor. This helps ensure that all games are played with a minimal (1) difference in the threshold, without lowering the learning rate too greatly.
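
One way to read that advice, assuming the adjustment still scales with C x R as described earlier (a hypothetical back-of-the-envelope calculation, not a prescription):

```python
# Shrinking C_end from 0.50 to 0.10 is a factor of 5; scaling R_end by the same
# factor keeps the product C x R, and thus the size of each adjustment, unchanged.
c_end_old, r_end_old = 0.50, 0.0020
c_end_new = 0.10
r_end_new = r_end_old * (c_end_old / c_end_new)   # 0.0100
```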

There is no easy answer to the above issues. I personally refer to these types of parameters as "poisoned parameters", because without very good hyper-parameter selection, they have the ability to poison an entire tuning session. I have taken lengthy notes of my experience here in Torch, and can provide some insight upon request.

SPSA Methodologies

Reporting has two options, Bulk and Batched. If Bulk is selected, then the worker will finish playing its entire workload before sending back the results. This can be useful, since one motivation for playing more than a single pair of games per SPSA-point is to get a higher quality estimate of the gradient. If Batched is selected, then results are returned for each game-pair as it is finished, although note that there is a reporting rate limit for SPSA tests, just like for standard tests.

Distribution has two options, Single and Multiple. If Single is selected, then the Server will generate only a single SPSA-point for the Client. If Multiple is selected, then the Server will generate one SPSA-point for every thread on the Client. For example, if a 60 thread machine obtained a test with the distribution set to Single, it would, at a minimum, play 1 pair of games with 60 copies of the sole SPSA-point. If you had set it to Multiple, the client would play, at a minimum, 30 pairs of games, with each half of a pair played concurrently, using 30 different SPSA-points. That is, 30 copies of Cutechess, spawned with a concurrency of 2.

These options were implemented so that they could be tested, rather than simply picking one method and sticking to it. In practice, the Multiple distribution method results in more core idle time. Refer to this commit to learn more. In general, I've found that Batched reporting, with a Single distribution and 8 pairs-per-point, is sufficiently optimal, as it minimizes downtime and appears to maintain correctness even when 5,000 concurrent cores are working on the tune.

SPSA Hyperparameters

Alpha, Gamma, and A-Ratio are hyperparameters that dictate an analog for the decay of the "learning rate" and "step size" during the SPSA tune. The default values for Alpha and Gamma are 0.602 and 0.101 respectively. These are values obtained directly from some of the original SPSA academic papers. The default value for A-Ratio is 0.1, which seems reasonable. It is extremely unlikely that you will want to modify these values. If you do, you would not need to be reading this page.
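
To picture what these hyperparameters do, here is a hedged sketch of textbook-style decay schedules consistent with the description above (C and R start higher and decay down to C_end and R_end by the final iteration). The exact formulas used by OpenBench may differ, and the function name is illustrative.

```python
def schedules(iteration, total_iterations, c_end, r_end,
              alpha=0.602, gamma=0.101, a_ratio=0.1):
    # A delays the decay of the learning rate early in the run.
    A = a_ratio * total_iterations
    # Both schedules are chosen so they land exactly on c_end / r_end at the
    # final iteration.
    c_k = c_end * (total_iterations / (iteration + 1)) ** gamma
    r_k = r_end * ((A + total_iterations) / (A + iteration + 1)) ** alpha
    return c_k, r_k
```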

Pairs-Per is an analog for workload size. This option determines the number of pairs you must play to evaluate each point; a value of 8 seems to work well for LTC tunes. You might consider raising the value when doing very fast tunes, because the fewer games played per workload, the more overhead and downtime is introduced. Iterations is simply the number of SPSA-points to fully evaluate.
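
For a rough sense of the total workload (hypothetical numbers, two games per pair):

```python
iterations, pairs_per = 20000, 8
total_games = iterations * pairs_per * 2   # 320,000 games for this example
```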

SPSA Input

This is the most important section. It would be a good idea to have your engine be capable of automatically producing this text when it is getting ready to tune something. Below is an example, which we will step through. Each line of this input contains 7 values, separated by commas.

phase_vals_0, int, 14.0, -5.0, 40.0, 2.25, 0.002
eval_scalar_0, float, 1024.0, 512.0, 1536.0, 51.2, 0.002
Field   Example         Explanation
1       phase_vals_0    The name of the parameter to be tuned
2       int / float     The type of the parameter; either int or float
3       14.0            Current value for the parameter, also known as the current "optimal" value
4       -5.0            Minimum value for the parameter. We will never request the engine play with a lower value
5       40.0            Maximum value for the parameter. We will never request the engine play with a higher value
6       2.25            C_end, also known as the step size at the end of the tuning session
7       0.002           R_end, also known as the learning rate at the end of the tuning session
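
A minimal sketch of how one such line could be parsed, following the seven-field layout above. The function name and dictionary keys are hypothetical, not OpenBench's internal names.

```python
def parse_spsa_line(line):
    name, dtype, value, minimum, maximum, c_end, r_end = (
        field.strip() for field in line.split(","))
    return {
        "name":  name,
        "type":  dtype,            # "int" or "float"
        "value": float(value),     # current ("optimal") value
        "min":   float(minimum),   # never play below this
        "max":   float(maximum),   # never play above this
        "c_end": float(c_end),     # final step size
        "r_end": float(r_end),     # final learning rate
    }

parse_spsa_line("phase_vals_0, int, 14.0, -5.0, 40.0, 2.25, 0.002")
```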

Final Notes

The UCI protocol does not allow SPIN options to contain floats, which would otherwise make tuning float parameters impossible. A workaround is to have your engine report the option as a STRING instead. You can trust that OpenBench will ensure the values are within the [min, max] that was provided through the SPSA Input field.
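
A hedged illustration of that workaround, using the float parameter from the example input above. An engine in any language would do the same thing: advertise the tunable as a string option and parse the number itself (the Python here is purely illustrative).

```python
# Advertise the float tunable as a STRING option rather than a SPIN option.
print("option name eval_scalar_0 type string default 1024.0")

def handle_setoption(line):
    # e.g. "setoption name eval_scalar_0 value 1037.3"
    name  = line.split(" name ")[1].split(" value ")[0]
    value = float(line.split(" value ")[1])
    return name, value
```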

Some testing in Torch suggests that using the Single distribution method is superior to Multiple. Our testing was not able to find a meaningful difference between Bulk and Batched reporting. Finally, we did not see a difference in tune quality when limiting the SPSA test to only a handful of workers, versus allowing the entire fleet to work on the test at the same time.