Previous post

We promised to publish an analysis of the solutions that learn from the inputs given to them. We are terribly sorry for the delay.

This post also announces the winners of the special prizes.

See also the Russian version of this page.

Learning solutions

We found 9 submissions that exhibit self-learning behavior. That is, their correctness rate improves as testing continues.

The idea is quite simple. Words belong to a limited dictionary, while nonwords are generated randomly. For this reason, any string that the program being tested sees repeatedly is more likely to be a word than a nonword. If the testing continues long enough, most words in the dictionary will have been seen repeatedly. Nonwords can repeat, too, but because the space from which they are drawn is much bigger, this happens far more rarely.
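The idea can be sketched in a few lines of JavaScript. This is a hypothetical illustration, not any contestant's actual code; `baseGuess` stands in for whatever static heuristic a solution uses before learning kicks in (the length cutoff here is a placeholder assumption).

```javascript
// Remember every string we are asked about; a string seen before is far more
// likely to be a dictionary word than a randomly generated nonword.
const seen = new Set();

// Placeholder for a solution's static heuristic (e.g. a filter built from the
// dictionary). The length cutoff here is an arbitrary stand-in, not a real rule.
function baseGuess(word) {
  return word.length <= 8;
}

function classify(word) {
  if (seen.has(word)) return true; // repeated string: almost surely a word
  seen.add(word);
  return baseGuess(word);
}
```

On the first sighting of a string the sketch falls back to the static guess; on any repeat it answers "word", which is what makes the correctness rate climb as more blocks are processed.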

The nine learning submissions that we received deserved a separate comparison, which is now presented here.

Baseline correctness

First, we decided to see how the learning solutions would perform if they weren't allowed to learn at all. We ran them in a special mode where the solution is restarted every time a new block contains a word already seen.

In the table below, the baseline correctness and error rates were measured on 10,000 words (the same amount as we used in the main section). The baseline rank is the overall rank a solution would have had if it hadn't been allowed to learn (that is, if we ran all solutions in restart mode).

The table is ordered by the baseline correctness rate. As you can see, some of these solutions show strong results even without the learning effects.

| # | ID | Baseline rank | Baseline correct | Baseline F- | Baseline F+ | Main rank | Main correct | Main F- | Main F+ |
|---|----|---------------|------------------|-------------|-------------|-----------|--------------|---------|---------|
| 1 | Balzac | 14 | 80.72% | 3.50% | 35.11% | 6 | 81.62% | 9.52% | 27.26% |
| 2 | Konstantin Akimov | 20 | 80.13% | 5.50% | 34.28% | 21 | 80.13% | 5.50% | 34.28% |
| 3 | AndSDev | 29 | 79.12% | 1.58% | 40.23% | 27 | 79.50% | 3.05% | 37.99% |
| 4 | Zavtramen | 30 | 79.06% | 13.55% | 28.35% | 15 | 80.71% | 9.92% | 28.67% |
| 5 | Anton Bulychev | 64 | 75.20% | 5.81% | 43.83% | 53 | 76.53% | 11.84% | 35.13% |
| 6 | rd.nvkz | 100 | 71.27% | 9.27% | 48.24% | 92 | 71.51% | 8.31% | 48.72% |
| 7 | Vladimir Prihojenko | 117 | 69.42% | 14.23% | 46.97% | 103 | 70.83% | 10.39% | 48.01% |
| 8 | Ihor Brazhnichenko | 122 | 67.99% | 14.93% | 49.13% | 76 | 73.86% | 24.68% | 27.61% |
| 9 | Aleksandr Plesovskikh | 235 | 49.93% | 100% | 0% | 168 | 63.03% | 68.30% | 5.49% |

Baseline columns were measured on 10,000 blocks with restarts; main columns on 10,000 blocks without restarts.

The raw results of the run on 10,000 blocks with restarts are available as a JSON file.

Large-scale learning effects

It takes way more than 10,000 blocks for most dictionary words to appear at least twice each. To explore the learning potential fully, we used a much larger set: 1,000,000 blocks. (Arguably, we should have used it to test all submissions in the first place, but because many of the submissions weren't optimized for performance, it would have taken us months.) This larger dataset is an extension of the official testcase sequence, so the first 10,000 blocks are the same as in the main set.

When tested on such a large set, 4 of the 9 solutions ran out of memory and crashed. The table below summarizes each solution's final achievement; for those that crashed, the last achievement before the crash is shown. For performance reasons, our test system was configured to dump intermediate results only every 10,000 blocks, so the actual achievements by the crashing solutions are probably slightly better. The table below is ordered by the last achieved correctness rate.

We discovered that Node.js behaves in a nondeterministic manner even on identical programs with identical inputs. This causes the same solution to crash at different times when the entire set of tests is run repeatedly. Below are the results of one such experiment; your results might be slightly different. Note that we had to run the testing harness with the `--unsafe` option; without it, the Node.js VM seemed to cause out-of-memory crashes much earlier.

| # | ID | Blocks | Correct | F- | F+ |
|---|----|--------|---------|-----|-----|
| 1 | rd.nvkz | 1,000,000 | 93.99% | 4.12% | 7.90% |
| 2 | AndSDev | 1,000,000 | 93.65% | 4.52% | 8.19% |
| 3 | Konstantin Akimov | 260,000 | 93.20% | 8.28% | 5.32% |
| 4 | Balzac | 1,000,000 | 88.94% | 1.80% | 20.31% |
| 5 | Zavtramen | 1,000,000 | 86.48% | 1.95% | 25.09% |
| 6 | Aleksandr Plesovskikh | 420,000 | 86.39% | 3.00% | 24.23% |
| 7 | Anton Bulychev | 1,000,000 | 83.84% | 6.03% | 26.28% |
| 8 | Ihor Brazhnichenko | 360,000 | 81.82% | 1.21% | 35.17% |
| 9 | Vladimir Prihojenko | 300,000 | 73.28% | 2.37% | 51.08% |

The following graph shows how the correctness rate of the solutions improves with the number of blocks processed. Note that the horizontal scale is logarithmic.

Cumulative correctness rate

For the following graph, we calculated the correctness rate on the latest 10,000 blocks at each point. This highlights the effects of learning even better, showing how well a solution performs after it has learned, without the early learning period dragging down the average.

Marginal correctness rate

The following two graphs show how the false negative and false positive rates changed over time. Here, the rates are also marginal (that is, calculated for the latest 10,000 blocks at each point).

Marginal false negatives

Marginal false positives

Interestingly, some of the solutions become worse after an initial period of improvement. The graph of false positives provides a likely explanation: because nonwords are generated randomly, they, too, can appear repeatedly (especially if they are short), although this happens much less frequently than with words. This leads to the accumulation of certain nonwords that the solution mistakes for words. Depending on the details of the learning algorithm, some solutions are more prone to this effect than others, and some seem to be completely immune to it.
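One hedged way to dampen this accumulation of repeated nonwords is to require a string to repeat more than once before trusting it: random nonwords occasionally repeat, but rarely repeat many times. The threshold below is an arbitrary assumption for illustration, not a value taken from any submission.

```javascript
// Count sightings of each string; only trust strings seen strictly more than
// REPEAT_THRESHOLD times. A single accidental repeat of a random nonword is
// no longer enough to poison the learned set.
const counts = new Map();
const REPEAT_THRESHOLD = 2; // tunable; an assumption, not from any submission

function recordAndCheck(word) {
  const n = (counts.get(word) || 0) + 1;
  counts.set(word, n);
  return n > REPEAT_THRESHOLD; // trust only strings seen 3 or more times
}
```

The trade-off is slower learning: genuine dictionary words also need extra sightings before they are trusted, which would flatten the early part of the correctness curve.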

The raw results of the run on 1,000,000 blocks are available as a JSON file.

Special prizes

We had declared that we would award two special prizes, and we decided to hand them to the authors of the best learning solutions. There wasn't an official criterion for what counts as “best”, so here is our rationale.

One 400 USD special prize goes to rd.nvkz, whose solution achieved the highest correctness rate we have seen, 93.99%. The learning potential of this solution is impressive: it kept improving long after the marginal correctness rates of most other solutions stabilized or even started to degrade.

Another 400 USD special prize goes to AndSDev for the second-highest correctness rate, 93.65%. At 1,000,000 blocks, these two leading solutions were almost indistinguishable, so we think they both deserve the special prizes. Although this solution doesn't continue improving at 1,000,000 blocks, it does not degrade, either, keeping its high correctness rate stable.

Congratulations to the winners of the special awards!