We promised to publish an analysis of the solutions that learn from the inputs given to them. We are terribly sorry for the delay.
This post also announces the winners of the special prizes.
See also the Russian version of this page.
We found 9 submissions to exhibit self-learning behavior. That is, their correctness rate improves as they are being continuously tested.
The idea is quite simple. Words belong to a limited dictionary, while nonwords are generated randomly. For this reason, any string that the program being tested sees repeatedly is more likely to be a word than a nonword. If the testing continues long enough, most words in the dictionary will have been seen repeatedly. Nonwords can repeat, too, but because the space from which they are drawn is much bigger, this happens far more rarely.
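As a rough illustration (a hypothetical sketch, not code from any actual submission), the core of the trick fits in a few lines:

```javascript
// Hypothetical sketch of the self-learning idea: remember every string
// seen so far; a repeat is strong evidence of a dictionary word, while
// unseen strings fall back to some base heuristic.
const seen = new Map();

function classify(str, baseHeuristic) {
  const count = seen.get(str) || 0;
  seen.set(str, count + 1);
  if (count > 0) return true; // seen before: almost certainly a word
  return baseHeuristic(str);  // first sighting: guess as usual
}

// Example with a trivial base heuristic that guesses "word" for short strings:
const guess = s => s.length <= 8;
console.log(classify('xylophones', guess)); // false: unseen, heuristic rejects
console.log(classify('xylophones', guess)); // true: repeat overrides heuristic
```

The base heuristic here stands in for whatever non-learning classifier a submission would otherwise use.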
The nine learning submissions that we received deserved a separate comparison, which is now presented here.
First, we decided to see how the learning solutions would perform if they weren't allowed to learn at all. We ran them in a special mode where the solution is restarted every time a new block contains a word already seen.
In the table below, the baseline correctness and error rates were measured on 10,000 blocks (the same amount as we used in the main section). The baseline rank is the overall rank a solution would have had if it hadn't been allowed to learn (that is, if we ran all solutions in restart mode).
The table is ordered by the baseline correctness rate. As you can see, some of these solutions show strong results even without the learning effects.
Baseline columns: 10,000 blocks, with restarts. Main columns: 10,000 blocks, without restarts.

# | ID | Baseline rank | Baseline correct | Baseline F- | Baseline F+ | Main rank | Main correct | Main F- | Main F+
---|---|---|---|---|---|---|---|---|---
1 | Balzac | 14 | 80.72% | 3.50% | 35.11% | 6 | 81.62% | 9.52% | 27.26% |
2 | Konstantin Akimov | 20 | 80.13% | 5.50% | 34.28% | 21 | 80.13% | 5.50% | 34.28% |
3 | AndSDev | 29 | 79.12% | 1.58% | 40.23% | 27 | 79.50% | 3.05% | 37.99% |
4 | Zavtramen | 30 | 79.06% | 13.55% | 28.35% | 15 | 80.71% | 9.92% | 28.67% |
5 | Anton Bulychev | 64 | 75.20% | 5.81% | 43.83% | 53 | 76.53% | 11.84% | 35.13% |
6 | rd.nvkz | 100 | 71.27% | 9.27% | 48.24% | 92 | 71.51% | 8.31% | 48.72% |
7 | Vladimir Prihojenko | 117 | 69.42% | 14.23% | 46.97% | 103 | 70.83% | 10.39% | 48.01% |
8 | Ihor Brazhnichenko | 122 | 67.99% | 14.93% | 49.13% | 76 | 73.86% | 24.68% | 27.61% |
9 | Aleksandr Plesovskikh | 235 | 49.93% | 100% | 0% | 168 | 63.03% | 68.30% | 5.49% |
The raw results of the run on 10,000 blocks with restarts are available as a JSON file.
It takes way more than 10,000 blocks for most dictionary words to appear at least twice each. To explore the learning potential fully, we used a much larger set: 1,000,000 blocks. (Arguably, we should have used it to test all submissions in the first place, but because many of the submissions weren't optimized for performance, it would have taken us months.) This larger dataset is an extension of the official testcase sequence, so the first 10,000 blocks are the same as in the main set.
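A back-of-the-envelope calculation supports this. Assuming, purely for illustration, a dictionary of 300,000 words, 100 strings per block with half of them being words, and uniform sampling (none of which is guaranteed by the actual test generator), a Poisson approximation gives:

```javascript
// Illustrative estimate only: D, wordsPerBlock, and uniform sampling
// are assumptions, not properties of the real test generator.
function fractionSeenTwice(blocks, D = 300000, wordsPerBlock = 50) {
  const lambda = (blocks * wordsPerBlock) / D; // expected sightings per word
  // Poisson approximation: P(a given word is sighted at least twice)
  return 1 - Math.exp(-lambda) * (1 + lambda);
}

console.log(fractionSeenTwice(10000));   // ~0.50: only about half the words
console.log(fractionSeenTwice(1000000)); // ~1.00: essentially every word
```

Under these assumptions, 10,000 blocks leave roughly half the dictionary unseen or seen only once, while 1,000,000 blocks cover it almost completely.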
When tested on such a large set, 4 of the 9 solutions ran out of memory and crashed. The table below, ordered by the last achieved correctness rate, summarizes each solution's final achievement; for those that crashed, the last achievement before the crash is shown. For performance reasons, our test system was configured to dump intermediate results only every 10,000 blocks, so the actual achievements of the crashing solutions are probably slightly better.
We discovered that Node.js behaves nondeterministically even on identical programs with identical inputs, which causes the same solution to crash at different points across repeated runs of the full test set. Below are the results of one such experiment; your results might differ slightly. Note that we had to run the testing harness with the --unsafe option; without it, the Node.js VM seemed to cause out-of-memory crashes much earlier.
# | ID | Blocks | Correct | F- | F+ |
---|---|---|---|---|---|
1 | rd.nvkz | 1,000,000 | 93.99% | 4.12% | 7.90% |
2 | AndSDev | 1,000,000 | 93.65% | 4.52% | 8.19% |
3 | Konstantin Akimov | 260,000 | 93.20% | 8.28% | 5.32% |
4 | Balzac | 1,000,000 | 88.94% | 1.80% | 20.31% |
5 | Zavtramen | 1,000,000 | 86.48% | 1.95% | 25.09% |
6 | Aleksandr Plesovskikh | 420,000 | 86.39% | 3.00% | 24.23% |
7 | Anton Bulychev | 1,000,000 | 83.84% | 6.03% | 26.28% |
8 | Ihor Brazhnichenko | 360,000 | 81.82% | 1.21% | 35.17% |
9 | Vladimir Prihojenko | 300,000 | 73.28% | 2.37% | 51.08% |
The following graph shows how the correctness rate of the solutions improves with the number of blocks processed. Note that the horizontal scale is logarithmic.
For the following graph, we calculated the correctness rate on the latest 10,000 blocks at each point. This highlights the effects of learning even better, showing how well the solution performs after it has learned, without the early learning period dragging down the average.
The following two graphs show how the false negative and false positive rates changed over time. Here, the rates are also marginal (that is, calculated for the latest 10,000 blocks at each point).
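The per-window computation behind these marginal graphs can be sketched as follows (a hypothetical sketch; the field names and window handling are our assumptions, not the actual test harness):

```javascript
// Compute marginal rates: for each point, correctness, false-negative and
// false-positive rates over only the most recent window of answers.
// Each result is assumed to look like { guess: bool, isWord: bool }.
function marginalRates(results, windowSize) {
  const points = [];
  for (let end = windowSize; end <= results.length; end += windowSize) {
    const win = results.slice(end - windowSize, end);
    const correct = win.filter(r => r.guess === r.isWord).length;
    const fn = win.filter(r => r.isWord && !r.guess).length;  // missed words
    const fp = win.filter(r => !r.isWord && r.guess).length;  // accepted nonwords
    const words = win.filter(r => r.isWord).length;
    const nonwords = win.length - words;
    points.push({
      correct: correct / win.length,
      falseNegative: words ? fn / words : 0,
      falsePositive: nonwords ? fp / nonwords : 0,
    });
  }
  return points;
}
```

Each point depends only on its own window, so early mistakes never dilute later measurements.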
Interestingly, some of the solutions get worse after an initial period of improvement. The graph of false positives suggests a likely explanation: because nonwords are generated randomly, they, too, can appear repeatedly (especially if they are short), although this happens much less frequently than with words. This leads to the accumulation of certain nonwords that the solution mistakes for words. Depending on the details of the learning algorithm, some solutions are more prone to this effect than others, and some seem to be completely immune to it.
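The birthday approximation makes the role of string length concrete (the uniform-generation assumption and all numbers here are illustrative only):

```javascript
// Expected number of colliding pairs among n uniform draws from a space
// of size S is roughly n*(n-1)/(2*S). For n comparable to S this
// overcounts, but the orders of magnitude still make the point.
function expectedCollisions(n, length, alphabet = 26) {
  const space = Math.pow(alphabet, length);
  return (n * (n - 1)) / (2 * space);
}

// Suppose a million random nonwords are generated over a long run:
console.log(expectedCollisions(1e6, 4));  // enormous: short strings repeat constantly
console.log(expectedCollisions(1e6, 12)); // negligible: long strings almost never do
```

So only short nonwords repeat often enough to poison a repetition-based learner, which matches the shape of the false-positive curves.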
The raw results of the run on 1,000,000 blocks are available as a JSON file.
We had declared that we would award two special prizes, and we decided to give them to the authors of the best learning solutions. There wasn't an official criterion for what counts as “best”, so here is our rationale.
One 400 USD special prize goes to rd.nvkz, whose solution achieved the highest correctness rate we have seen, 93.99%. The learning potential of this solution is impressive: it kept improving long after the marginal correctness rates of most other solutions stabilized or even started to degrade.
Another 400 USD special prize goes to AndSDev for the second-highest correctness rate, 93.65%. At 1,000,000 blocks, these two leading solutions were almost indistinguishable, so we think they both deserve the special prizes. Although this solution doesn't continue improving at 1,000,000 blocks, it does not degrade, either, keeping its high correctness rate stable.
Congratulations to the winners of the special awards!