Benchmarks
This page contains a variety of benchmark results of various features on various domains. To keep the size of this page in check, we remove obsolete measurements (either abandoned
Each dataset has train and test splits. Our primary comparison metrics are Answer Production Recall (APR) and Mean Reciprocal Rank (MRR) on the test split (see below). (When talking to outsiders, accuracy-at-one is the easiest measure to use, but it is much noisier than MRR, so it is brittle for day-to-day evaluation.) Unless otherwise specified, our models are always retrained on the train split before measuring accuracy on the test split.
We are benchmarking on several datasets from two basic families. The TREC-based questions are general factoid questions of wide variety and character, answerable primarily from Wikipedia:
- curated (https://github.com/brmson/dataset-factoid-curated) is a cleaned up version of the TREC dataset with some IRC-based questions from early user testing also added in. This is our primary "general QA" benchmark.
- large2180 (in dev branch of https://github.com/brmson/dataset-factoid-curated) is a larger version of the TREC dataset that mixes some noisy questions into the curated dataset in the interest of having more data to train on. We use this dataset to test how our machine learning scales up.
- trecnew-raw (in dev branch of https://github.com/brmson/dataset-factoid-curated) has just a test split and contains factoid questions like in curated, but not their cleaned up and filtered version, allowing a more realistic comparison with the "old" programs from the TREC challenges. We rarely use this dataset, just for some final benchmarks when writing papers.
Raw measurements for various historical commits are available from http://pasky.or.cz/dev/brmson/yodaqa-eval/ ...
The other family of datasets is originally based on WebQuestions, exhibiting more monotonous questions modelled around the Freebase knowledge base and always asking for entities (not for example for numbers):
- wq (https://github.com/brmson/dataset-factoid-webquestions) is the main WebQuestions dataset, following the original train and test splits, just in a cleaned up format etc. We do not test on this dataset regularly, mainly due to its size.
- movies? (https://github.com/brmson/dataset-factoid-movies), at this point moviesC, is filtered to (mostly) questions from WebQuestions that pertain to movies, but enriched with many more questions asked by our users, which also often ask for numbers or pose other more complex questions, contain typos, etc. We also checked Google's performance on these (https://github.com/brmson/google-qa). This is another of our current primary benchmarks.
Raw measurements for various historical commits are available from http://pasky.or.cz/dev/brmson/yodaqa-movies-eval/ ...
Typically, our tests are done using data/eval/train-and-eval.sh.
When we need to benchmark without retraining, we use something like:
data/eval/_multistage-traineval.sh . trecnew-raw-test 0 0
Each split+commit combination results in three lines of data/eval/tsvout-stats.sh output. The important line for us is the one with the commit prefixed by u, as that's the initial pipeline stage (followup stages create user-friendly output, but typically senselessly overfit).
The format of each line is:
dataset-split commit commitdate Commit message ans/irr/tot ACC1%/ APR% mrr 0.mrr avgtime xyz
ans and ACC1 are the accuracy-at-one: the number (and percentage) of questions where the top answer is correct (since we attempt to answer all questions, this would be precision@100 in DeepQA parlance). APR is Answer Production Recall, i.e. the percentage of questions where the correct answer is generated as a hypothesis (the irr count divided by tot). MRR is the mean of the reciprocal rank over all questions; a question with the top answer correct will have RR=1, a question with the second answer correct will have RR=0.5, etc. Ignore the avgtime; it's currently garbage, unfortunately.
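To make the three measures concrete, here is a minimal Python sketch that computes them from a list of per-question ranks. It is not the code used by data/eval/tsvout-stats.sh; the function name `summarize` and the input representation (the 1-based rank of the first correct answer, or `None` when no correct answer was generated) are assumptions made for illustration. It also assumes that a question without a correct answer contributes a reciprocal rank of 0, which is how "mean over all questions" is read here.

```python
from typing import Optional, Sequence


def summarize(ranks: Sequence[Optional[int]]) -> dict:
    """Compute accuracy-at-one, APR and MRR.

    ranks[i] is the 1-based rank of the first correct answer for
    question i, or None if no correct answer was generated at all
    (hypothetical input format, chosen for this sketch).
    """
    total = len(ranks)
    if total == 0:
        raise ValueError("need at least one question")
    # Accuracy-at-one: the top answer is the correct one.
    correct_at_one = sum(1 for r in ranks if r == 1)
    # APR: the correct answer appears anywhere among the hypotheses.
    produced = sum(1 for r in ranks if r is not None)
    # MRR: questions without a correct answer are assumed to add 0.
    mrr = sum(1.0 / r for r in ranks if r is not None) / total
    return {
        "acc1": correct_at_one / total,
        "apr": produced / total,
        "mrr": mrr,
    }


# Four questions: correct at rank 1, rank 2, never generated, rank 4.
example = summarize([1, 2, None, 4])
print(example)
```

With these four questions, one of four is correct at the top, three of four have the correct answer generated, and the reciprocal ranks 1, 0.5, 0 and 0.25 average to 0.4375.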
We use the master branch to measure TREC-based QA.
v1.4 --- curated APR 77.7%, MRR 0.405; large2180 APR 75.5%, MRR 0.379:
curated-test 2b85c94 2015-11-10 AnswerScoreDecisionF... 131/282/430 30.5%/65.6% mrr 0.401 avgtime 3596.696
curated-test u2b85c94 2015-11-10 AnswerScoreDecisionF... 136/334/430 31.6%/77.7% mrr 0.405 avgtime 3320.982
curated-test v2b85c94 2015-11-10 AnswerScoreDecisionF... 135/282/430 31.4%/65.6% mrr 0.409 avgtime 3524.656
curated-trai 2b85c94 2015-11-10 AnswerScoreDecisionF... 295/308/430 68.6%/71.6% mrr 0.699 avgtime 4083.608
curated-trai u2b85c94 2015-11-10 AnswerScoreDecisionF... 164/340/430 38.1%/79.1% mrr 0.480 avgtime 3639.840
curated-trai v2b85c94 2015-11-10 AnswerScoreDecisionF... 257/308/430 59.8%/71.6% mrr 0.649 avgtime 3968.469
large2180-te 2b85c94 2015-11-10 AnswerScoreDecisionF... 209/432/694 30.1%/62.2% mrr 0.387 avgtime 5137.313
large2180-te u2b85c94 2015-11-10 AnswerScoreDecisionF... 205/524/694 29.5%/75.5% mrr 0.379 avgtime 4828.143
large2180-te v2b85c94 2015-11-10 AnswerScoreDecisionF... 213/432/694 30.7%/62.2% mrr 0.389 avgtime 5076.575
large2180-tr c92760c 2015-11-04 HIGHLEVEL.md constra... 726/926/1479 49.1%/62.6% mrr 0.543 avgtime 11820.064
large2180-tr uc92760c 2015-11-04 HIGHLEVEL.md constra... 453/1075/1479 30.6%/72.7% mrr 0.389 avgtime 10848.609
large2180-tr vc92760c 2015-11-04 HIGHLEVEL.md constra... 602/926/1479 40.7%/62.6% mrr 0.483 avgtime 11638.433
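The headline figures in each version summary correspond to the u-prefixed test rows. As a rough illustration, the following Python sketch parses rows in the format shown on this page and keeps only the u-prefixed ones. The regular expression is inferred from the rows listed here rather than taken from the evaluation scripts, and the names `ROW`, `parse_row` and `SAMPLE` are hypothetical.

```python
import re

# Pattern for one result row, inferred from the rows on this page
# (an assumption, not the scripts' own definition of the format).
ROW = re.compile(
    r"^(?P<split>\S+)\s+(?P<commit>\S+)\s+(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<message>.*?)\s+(?P<ans>\d+)/(?P<irr>\d+)/(?P<tot>\d+)\s+"
    r"(?P<acc1>[\d.]+)%/\s*(?P<apr>[\d.]+)%\s+mrr\s+(?P<mrr>[\d.]+)\s+"
    r"avgtime\s+(?P<avgtime>[\d.]+)$"
)


def parse_row(line):
    """Return a dict for one result row, or None if the line does not match."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    commit = m.group("commit")
    # Hex commit ids never start with "u" or "v", so a leading u/v
    # can be treated as the pipeline stage prefix.
    stage = commit[0] if commit[0] in "uv" else ""
    return {
        "split": m.group("split"),
        "stage": stage,
        "commit": commit[len(stage):],
        "ans": int(m.group("ans")),
        "irr": int(m.group("irr")),
        "tot": int(m.group("tot")),
        "acc1": float(m.group("acc1")),
        "apr": float(m.group("apr")),
        "mrr": float(m.group("mrr")),
    }


SAMPLE = """\
curated-test 2b85c94 2015-11-10 AnswerScoreDecisionF... 131/282/430 30.5%/65.6% mrr 0.401 avgtime 3596.696
curated-test u2b85c94 2015-11-10 AnswerScoreDecisionF... 136/334/430 31.6%/77.7% mrr 0.405 avgtime 3320.982
curated-test v2b85c94 2015-11-10 AnswerScoreDecisionF... 135/282/430 31.4%/65.6% mrr 0.409 avgtime 3524.656
"""

rows = [parse_row(line) for line in SAMPLE.splitlines()]
# Keep only the initial pipeline stage, i.e. the u-prefixed commit.
primary = [r for r in rows if r is not None and r["stage"] == "u"]
for r in primary:
    print(r["split"], r["commit"], f"APR {r['apr']}%", f"MRR {r['mrr']}")
```

For the three curated-test rows above, only the u2b85c94 row is kept, and its APR and MRR match the v1.4 summary for curated. Lines that do not fit the pattern, such as a row reporting that no answers were generated, come back as `None` and are skipped.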
v1.3 --- curated APR 77.9%, MRR 0.413; large2180 APR 75.5%, MRR 0.390:
curated-test 88f39c2 2015-10-19 Mbprop.txt: Retrain ... 138/279/430 32.1%/64.9% mrr 0.407 avgtime 2947.311
curated-test u88f39c2 2015-10-19 Mbprop.txt: Retrain ... 144/335/430 33.5%/77.9% mrr 0.413 avgtime 2681.062
curated-test v88f39c2 2015-10-19 Mbprop.txt: Retrain ... 144/279/430 33.5%/64.9% mrr 0.418 avgtime 2874.982
curated-trai 88f39c2 2015-10-19 Mbprop.txt: Retrain ... 290/306/430 67.4%/71.2% mrr 0.691 avgtime 3725.780
curated-trai u88f39c2 2015-10-19 Mbprop.txt: Retrain ... 169/335/430 39.3%/77.9% mrr 0.479 avgtime 3295.355
curated-trai v88f39c2 2015-10-19 Mbprop.txt: Retrain ... 260/306/430 60.5%/71.2% mrr 0.649 avgtime 3611.334
large2180-te 88f39c2 2015-10-19 Mbprop.txt: Retrain ... 218/435/694 31.4%/62.7% mrr 0.392 avgtime 4509.625
large2180-te u88f39c2 2015-10-19 Mbprop.txt: Retrain ... 217/524/694 31.3%/75.5% mrr 0.390 avgtime 4223.223
large2180-te v88f39c2 2015-10-19 Mbprop.txt: Retrain ... 207/435/694 29.8%/62.7% mrr 0.382 avgtime 4450.021
large2180-tr 88f39c2 2015-10-19 Mbprop.txt: Retrain ... 729/916/1479 49.3%/61.9% mrr 0.544 avgtime 12199.161
large2180-tr u88f39c2 2015-10-19 Mbprop.txt: Retrain ... 468/1063/1479 31.6%/71.9% mrr 0.398 avgtime 11243.693
large2180-tr v88f39c2 2015-10-19 Mbprop.txt: Retrain ... 598/916/1479 40.4%/61.9% mrr 0.480 avgtime 12019.682
v1.2 --- curated APR 77.2%, MRR 0.439; large2180 APR 74.8%, MRR 0.411:
curated-test 0296763 2015-08-30 data/ml/biocrf/model... 146/287/430 34.0%/66.7% mrr 0.431 avgtime 2392.096
curated-test u0296763 2015-08-30 data/ml/biocrf/model... 152/332/430 35.3%/77.2% mrr 0.439 avgtime 2157.916
curated-test v0296763 2015-08-30 data/ml/biocrf/model... 151/287/430 35.1%/66.7% mrr 0.440 avgtime 2343.056
curated-trai 0296763 2015-08-30 data/ml/biocrf/model... 290/303/430 67.4%/70.5% mrr 0.689 avgtime 3887.648
curated-trai u0296763 2015-08-30 data/ml/biocrf/model... 181/332/430 42.1%/77.2% mrr 0.503 avgtime 3595.703
curated-trai v0296763 2015-08-30 data/ml/biocrf/model... 257/303/430 59.8%/70.5% mrr 0.644 avgtime 3816.893
large2180-te 0296763 2015-08-30 data/ml/biocrf/model... 224/439/694 32.3%/63.3% mrr 0.402 avgtime 3326.777
large2180-te u0296763 2015-08-30 data/ml/biocrf/model... 233/519/694 33.6%/74.8% mrr 0.411 avgtime 2994.481
large2180-te v0296763 2015-08-30 data/ml/biocrf/model... 221/439/694 31.8%/63.3% mrr 0.399 avgtime 3260.786
large2180-tr 0296763 2015-08-30 data/ml/biocrf/model... 735/925/1479 49.7%/62.5% mrr 0.551 avgtime 7906.924
large2180-tr u0296763 2015-08-30 data/ml/biocrf/model... 485/1052/1479 32.8%/71.1% mrr 0.406 avgtime 7057.941
large2180-tr v0296763 2015-08-30 data/ml/biocrf/model... 586/925/1479 39.6%/62.5% mrr 0.477 avgtime 7726.841
v1.1 --- curated APR 77.2%, MRR 0.409; large2180 APR 74.8%, MRR 0.398:
curated-test 76cc1af 2015-08-26 Merge branch 'master... 134/284/430 31.2%/66.0% mrr 0.405 avgtime 3460.146
curated-test u76cc1af 2015-08-26 Merge branch 'master... 135/332/430 31.4%/77.2% mrr 0.409 avgtime 3231.877
curated-test v76cc1af 2015-08-26 Merge branch 'master... 127/284/430 29.5%/66.0% mrr 0.397 avgtime 3411.869
curated-trai 76cc1af 2015-08-26 Merge branch 'master... 301/306/430 70.0%/71.2% mrr 0.705 avgtime 5815.394
curated-trai u76cc1af 2015-08-26 Merge branch 'master... 199/333/430 46.3%/77.4% mrr 0.538 avgtime 5533.997
curated-trai v76cc1af 2015-08-26 Merge branch 'master... 281/306/430 65.3%/71.2% mrr 0.677 avgtime 5747.069
large2180-te 76cc1af 2015-08-26 Merge branch 'master... 222/443/694 32.0%/63.8% mrr 0.408 avgtime 3622.175
large2180-te u76cc1af 2015-08-26 Merge branch 'master... 218/519/694 31.4%/74.8% mrr 0.398 avgtime 3285.847
large2180-te v76cc1af 2015-08-26 Merge branch 'master... 235/443/694 33.9%/63.8% mrr 0.416 avgtime 3556.244
large2180-tr 76cc1af 2015-08-26 Merge branch 'master... 752/927/1479 50.8%/62.7% mrr 0.558 avgtime 8455.257
large2180-tr u76cc1af 2015-08-26 Merge branch 'master... 498/1051/1479 33.7%/71.1% mrr 0.412 avgtime 7622.098
large2180-tr v76cc1af 2015-08-26 Merge branch 'master... 616/927/1479 41.6%/62.7% mrr 0.491 avgtime 8287.513
trecnew-raw- ovt 2015-08-29 Merge branch 'master... 121/233/447 27.1%/52.1% mrr 0.346 avgtime 3756.961
trecnew-raw- ovt 2015-08-29 Merge branch 'master... 118/272/447 26.4%/60.9% mrr 0.325 avgtime 3496.736
trecnew-raw- ovt 2015-08-29 Merge branch 'master... 123/233/447 27.5%/52.1% mrr 0.345 avgtime 3681.780
v1.0 (the first YodaQA paper) --- curated APR 79.3%, MRR 0.420:
curated-test 0ae3b79 2015-04-14 Merge branch 'master... 137/292/430 31.9%/67.9% mrr 0.413 avgtime 6767.419
curated-test u0ae3b79 2015-04-14 Merge branch 'master... 139/341/430 32.3%/79.3% mrr 0.420 avgtime 6549.246
curated-test v0ae3b79 2015-04-14 Merge branch 'master... 138/292/430 32.1%/67.9% mrr 0.418 avgtime 6687.020
curated-trai 0ae3b79 2015-04-14 Merge branch 'master... 152/283/430 35.3%/65.8% mrr 0.454 avgtime 6566.500
curated-trai u0ae3b79 2015-04-14 Merge branch 'master... 131/329/430 30.5%/76.5% mrr 0.392 avgtime 6358.768
curated-trai v0ae3b79 2015-04-14 Merge branch 'master... 155/283/430 36.0%/65.8% mrr 0.456 avgtime 6492.669
trecnew-raw- ovt 2015-04-14 Merge branch 'master... 118/237/447 26.4%/53.0% mrr 0.333 avgtime 6213.230
trecnew-raw- ovt 2015-04-14 Merge branch 'master... 112/278/447 25.1%/62.2% mrr 0.323 avgtime 6056.471
trecnew-raw- ovt 2015-04-14 Merge branch 'master... 112/237/447 25.1%/53.0% mrr 0.326 avgtime 6159.455
We don't do day-to-day development on this baseline, but this section records performance evolution on the Bing-enabled version running at http://live.ailao.eu/.
Current version (v1.4):
large2180-te 6a040cb 2015-11-10 Merge remote-trackin... 255/470/694 36.7%/67.7% mrr 0.447 avgtime 8150.822
large2180-te u6a040cb 2015-11-10 Merge remote-trackin... 242/553/694 34.9%/79.7% mrr 0.439 avgtime 7758.923
large2180-te v6a040cb 2015-11-10 Merge remote-trackin... 245/470/694 35.3%/67.7% mrr 0.439 avgtime 8055.959
large2180-tr 6a040cb 2015-11-10 Merge remote-trackin... 766/981/1479 51.8%/66.3% mrr 0.579 avgtime 16448.392
large2180-tr u6a040cb 2015-11-10 Merge remote-trackin... 467/1131/1479 31.6%/76.5% mrr 0.409 avgtime 15269.455
large2180-tr v6a040cb 2015-11-10 Merge remote-trackin... 635/981/1479 42.9%/66.3% mrr 0.514 avgtime 16181.728
A bit later version:
large2180-te 35a4484 2015-10-16 Merge branch 'master... 260/469/694 37.5%/67.6% mrr 0.454 avgtime 11034.758
large2180-te u35a4484 2015-10-16 Merge branch 'master... 227/558/694 32.7%/80.4% mrr 0.422 avgtime 10687.774
large2180-te v35a4484 2015-10-16 Merge branch 'master... 261/469/694 37.6%/67.6% mrr 0.452 avgtime 10955.408
large2180-tr 35a4484 2015-10-16 Merge branch 'master... 759/996/1479 51.3%/67.3% mrr 0.581 avgtime 15775.905
large2180-tr u35a4484 2015-10-16 Merge branch 'master... 483/1131/1479 32.7%/76.5% mrr 0.418 avgtime 14665.273
large2180-tr v35a4484 2015-10-16 Merge branch 'master... 640/996/1479 43.3%/67.3% mrr 0.515 avgtime 15518.062
large2180-te e5ed8a5 2015-09-10 Added one minute tim... 253/492/694 36.5%/70.9% mrr 0.456 avgtime 6951.470
large2180-te ue5ed8a5 2015-09-10 Added one minute tim... 235/557/694 33.9%/80.3% mrr 0.433 avgtime 6608.611
large2180-te ve5ed8a5 2015-09-10 Added one minute tim... 253/492/694 36.5%/70.9% mrr 0.455 avgtime 6857.989
large2180-tr e5ed8a5 2015-09-10 Added one minute tim... 813/1013/1479 55.0%/68.5% mrr 0.605 avgtime 21314.917
large2180-tr ue5ed8a5 2015-09-10 Added one minute tim... 531/1152/1479 35.9%/77.9% mrr 0.445 avgtime 20472.813
large2180-tr ve5ed8a5 2015-09-10 Added one minute tim... 667/1013/1479 45.1%/68.5% mrr 0.535 avgtime 21075.419
Version running up to 2015-09-18:
large2180-te f04cce6 2015-07-21 Merge branch 'master... 264/520/694 38.0%/74.9% mrr 0.477 avgtime 6248.368
large2180-te uf04cce6 2015-07-21 Merge branch 'master... 230/587/694 33.1%/84.6% mrr 0.430 avgtime 5976.657
large2180-te vf04cce6 2015-07-21 Merge branch 'master... 259/520/694 37.3%/74.9% mrr 0.474 avgtime 6166.965
large2180-tr f04cce6 2015-07-21 Merge branch 'master... 599/1052/1479 40.5%/71.1% mrr 0.498 avgtime 12523.736
large2180-tr uf04cce6 2015-07-21 Merge branch 'master... 510/1191/1479 34.5%/80.5% mrr 0.437 avgtime 11852.452
large2180-tr vf04cce6 2015-07-21 Merge branch 'master... 585/1052/1479 39.6%/71.1% mrr 0.490 avgtime 12329.911
v1.2 with Bing search (live since 2015-09-18):
curated-test e5ed8a5 2015-09-10 Added one minute tim... 178/319/430 41.4%/74.2% mrr 0.500 avgtime 5827.692
curated-test ue5ed8a5 2015-09-10 Added one minute tim... 167/360/430 38.8%/83.7% mrr 0.481 avgtime 5635.870
curated-test ve5ed8a5 2015-09-10 Added one minute tim... 177/319/430 41.2%/74.2% mrr 0.502 avgtime 5779.753
curated-trai e5ed8a5 2015-09-10 Added one minute tim... 328/336/430 76.3%/78.1% mrr 0.772 avgtime 7043.856
curated-trai ue5ed8a5 2015-09-10 Added one minute tim... 196/364/430 45.6%/84.7% mrr 0.549 avgtime 6767.992
curated-trai ve5ed8a5 2015-09-10 Added one minute tim... 289/336/430 67.2%/78.1% mrr 0.720 avgtime 6963.149
large2180-te e5ed8a5 2015-09-10 Added one minute tim... 253/492/694 36.5%/70.9% mrr 0.456 avgtime 6951.470
large2180-te ue5ed8a5 2015-09-10 Added one minute tim... 235/557/694 33.9%/80.3% mrr 0.433 avgtime 6608.611
large2180-te ve5ed8a5 2015-09-10 Added one minute tim... 253/492/694 36.5%/70.9% mrr 0.455 avgtime 6857.989
large2180-tr e5ed8a5 2015-09-10 Added one minute tim... 813/1013/1479 55.0%/68.5% mrr 0.605 avgtime 21314.917
large2180-tr ue5ed8a5 2015-09-10 Added one minute tim... 531/1152/1479 35.9%/77.9% mrr 0.445 avgtime 20472.813
large2180-tr ve5ed8a5 2015-09-10 Added one minute tim... 667/1013/1479 45.1%/68.5% mrr 0.535 avgtime 21075.419
We primarily use the d/movies branch for WebQuestions-style questions. This branch has enwiki disabled as a data source, since our primary motivation in the movies-based questions is QA just on structured knowledge bases.
Also note that the pipeline phase1 (v-prefixed commits) actually seems non-overfitted here. We have not factored that into our reports or benchmark instructions yet, for simplicity, to keep a common approach for both TREC and WQ based scenarios. We'll probably drop this soon, though.
Master:
moviesD-test 7bbda27 2015-12-02 FocusGenerator addFo... 141/205/260 54.2%/78.8% mrr 0.614 avgtime 801.693
moviesD-test u7bbda27 2015-12-02 FocusGenerator addFo... 135/215/260 51.9%/82.7% mrr 0.604 avgtime 635.000
moviesD-test v7bbda27 2015-12-02 FocusGenerator addFo... 138/205/260 53.1%/78.8% mrr 0.613 avgtime 742.282
moviesD-trai 7bbda27 2015-12-02 FocusGenerator addFo... 454/513/624 72.8%/82.2% mrr 0.765 avgtime 2061.984
moviesD-trai u7bbda27 2015-12-02 FocusGenerator addFo... 356/527/624 57.1%/84.5% mrr 0.653 avgtime 1612.969
moviesD-trai v7bbda27 2015-12-02 FocusGenerator addFo... 413/513/624 66.2%/82.2% mrr 0.722 avgtime 1898.797
v1.4 --- moviesD APR 81.9%, MRR 0.590:
moviesD-test e10cf37 2015-11-03 Mbprop.txt: Retrain ... 138/206/260 53.1%/79.2% mrr 0.609 avgtime 1571.417
moviesD-test ue10cf37 2015-11-03 Mbprop.txt: Retrain ... 130/213/260 50.0%/81.9% mrr 0.590 avgtime 1419.293
moviesD-test ve10cf37 2015-11-03 Mbprop.txt: Retrain ... 137/206/260 52.7%/79.2% mrr 0.609 avgtime 1512.312
moviesD-trai e10cf37 2015-11-03 Mbprop.txt: Retrain ... 455/512/624 72.9%/82.1% mrr 0.766 avgtime 17632.270
moviesD-trai ue10cf37 2015-11-03 Mbprop.txt: Retrain ... 362/525/624 58.0%/84.1% mrr 0.658 avgtime 17203.644
moviesD-trai ve10cf37 2015-11-03 Mbprop.txt: Retrain ... 406/512/624 65.1%/82.1% mrr 0.715 avgtime 17474.994
v1.3 --- moviesC APR 79.0%, MRR 0.573; moviesD APR 76.5%, MRR 0.531; wq APR 75.7%, MRR 0.476:
moviesC-test 6eadf12 2015-10-18 Mbprop.txt: Retrain ... 118/173/233 50.6%/74.2% mrr 0.577 avgtime 876.665
moviesC-test u6eadf12 2015-10-18 Mbprop.txt: Retrain ... 119/184/233 51.1%/79.0% mrr 0.573 avgtime 739.296
moviesC-test v6eadf12 2015-10-18 Mbprop.txt: Retrain ... 121/173/233 51.9%/74.2% mrr 0.585 avgtime 819.947
moviesC-trai 6eadf12 2015-10-18 Mbprop.txt: Retrain ... 379/438/542 69.9%/80.8% mrr 0.742 avgtime 1829.013
moviesC-trai u6eadf12 2015-10-18 Mbprop.txt: Retrain ... 290/444/542 53.5%/81.9% mrr 0.619 avgtime 1466.149
moviesC-trai v6eadf12 2015-10-18 Mbprop.txt: Retrain ... 347/438/542 64.0%/80.8% mrr 0.700 avgtime 1689.706
moviesD-test 6c13b62 2015-10-19 +moviesD dataset... 127/190/260 48.8%/73.1% mrr 0.551 avgtime 630.581
moviesD-test u6c13b62 2015-10-19 +moviesD dataset... 117/199/260 45.0%/76.5% mrr 0.531 avgtime 482.417
moviesD-test v6c13b62 2015-10-19 +moviesD dataset... 124/190/260 47.7%/73.1% mrr 0.547 avgtime 571.359
moviesD-trai 6c13b62 2015-10-19 +moviesD dataset... 425/485/624 68.1%/77.7% mrr 0.719 avgtime 2140.000
moviesD-trai u6c13b62 2015-10-19 +moviesD dataset... 322/492/624 51.6%/78.8% mrr 0.595 avgtime 1735.939
moviesD-trai v6c13b62 2015-10-19 +moviesD dataset... 364/485/624 58.3%/77.7% mrr 0.658 avgtime 1984.069
wq-test-ovt- 6eadf12 2015-10-18 Mbprop.txt: Retrain ... 863/1393/2032 42.5%/68.6% mrr 0.502 avgtime 5812.585
wq-test-ovt- u6eadf12 2015-10-18 Mbprop.txt: Retrain ... 795/1538/2032 39.1%/75.7% mrr 0.476 avgtime 5122.649
wq-test-ovt- v6eadf12 2015-10-18 Mbprop.txt: Retrain ... 857/1393/2032 42.2%/68.6% mrr 0.499 avgtime 5606.749
wq-train-ovt 6eadf12 2015-10-18 Mbprop.txt: Retrain ... 1906/2773/3778 50.4%/73.4% mrr 0.582 avgtime 17218.725
wq-train-ovt u6eadf12 2015-10-18 Mbprop.txt: Retrain ... 1689/2968/3778 44.7%/78.6% mrr 0.531 avgtime 15051.915
wq-train-ovt v6eadf12 2015-10-18 Mbprop.txt: Retrain ... 1839/2773/3778 48.7%/73.4% mrr 0.566 avgtime 16566.801
v1.2, v1.1 (both same results) --- moviesC APR 75.5%, MRR 0.494; wq APR 67.3%, MRR 0.425:
moviesC-test a770e5f 2015-08-21 Mark: label-lookup 1... 102/168/233 43.8%/72.1% mrr 0.509 avgtime 585.312
moviesC-test ua770e5f 2015-08-21 Mark: label-lookup 1... 95/176/233 40.8%/75.5% mrr 0.494 avgtime 447.181
moviesC-test va770e5f 2015-08-21 Mark: label-lookup 1... 104/168/233 44.6%/72.1% mrr 0.517 avgtime 530.785
moviesC-trai a770e5f 2015-08-21 Mark: label-lookup 1... 313/388/542 57.7%/71.6% mrr 0.629 avgtime 1463.521
moviesC-trai ua770e5f 2015-08-21 Mark: label-lookup 1... 240/399/542 44.3%/73.6% mrr 0.522 avgtime 1176.910
moviesC-trai va770e5f 2015-08-21 Mark: label-lookup 1... 287/388/542 53.0%/71.6% mrr 0.596 avgtime 1351.434
wq-test-ovt- 8795cd0 2015-08-27 Merge remote-trackin... 757/1257/2032 37.3%/61.9% mrr 0.445 avgtime 5117.716
wq-test-ovt- u8795cd0 2015-08-27 Merge remote-trackin... 699/1368/2032 34.4%/67.3% mrr 0.425 avgtime 4516.366
wq-test-ovt- v8795cd0 2015-08-27 Merge remote-trackin... 749/1257/2032 36.9%/61.9% mrr 0.443 avgtime 4922.379
wq-train-ovt 8795cd0 2015-08-27 Merge remote-trackin... 1702/2486/3778 45.1%/65.8% mrr 0.522 avgtime 22590.390
wq-train-ovt u8795cd0 2015-08-27 Merge remote-trackin... 1519/2658/3778 40.2%/70.4% mrr 0.477 avgtime 21017.841
wq-train-ovt v8795cd0 2015-08-27 Merge remote-trackin... 1673/2486/3778 44.3%/65.8% mrr 0.510 avgtime 22058.533
This section will probably be quite fluid.
Baseline:
moviesD-test e10cf37 2015-11-03 Mbprop.txt: Retrain ... 138/206/260 53.1%/79.2% mrr 0.609 avgtime 1571.417
moviesD-test ue10cf37 2015-11-03 Mbprop.txt: Retrain ... 130/213/260 50.0%/81.9% mrr 0.590 avgtime 1419.293
moviesD-test ve10cf37 2015-11-03 Mbprop.txt: Retrain ... 137/206/260 52.7%/79.2% mrr 0.609 avgtime 1512.312
moviesD-trai e10cf37 2015-11-03 Mbprop.txt: Retrain ... 455/512/624 72.9%/82.1% mrr 0.766 avgtime 17632.270
moviesD-trai ue10cf37 2015-11-03 Mbprop.txt: Retrain ... 362/525/624 58.0%/84.1% mrr 0.658 avgtime 17203.644
moviesD-trai ve10cf37 2015-11-03 Mbprop.txt: Retrain ... 406/512/624 65.1%/82.1% mrr 0.715 avgtime 17474.994
large2180-te 2b85c94 2015-11-10 AnswerScoreDecisionF... 209/432/694 30.1%/62.2% mrr 0.387 avgtime 5137.313
large2180-te u2b85c94 2015-11-10 AnswerScoreDecisionF... 205/524/694 29.5%/75.5% mrr 0.379 avgtime 4828.143
large2180-te v2b85c94 2015-11-10 AnswerScoreDecisionF... 213/432/694 30.7%/62.2% mrr 0.389 avgtime 5076.575
large2180-tr c92760c 2015-11-04 HIGHLEVEL.md constra... 726/926/1479 49.1%/62.6% mrr 0.543 avgtime 11820.064
large2180-tr uc92760c 2015-11-04 HIGHLEVEL.md constra... 453/1075/1479 30.6%/72.7% mrr 0.389 avgtime 10848.609
large2180-tr vc92760c 2015-11-04 HIGHLEVEL.md constra... 602/926/1479 40.7%/62.6% mrr 0.483 avgtime 11638.433
LAT by SV nominalization in case of NSUBJ:
moviesD-test 43b438d 2015-11-12 AnswerScoreDecisionF... 127/203/260 48.8%/78.1% mrr 0.580 avgtime 1274.038
moviesD-test u43b438d 2015-11-12 AnswerScoreDecisionF... 130/213/260 50.0%/81.9% mrr 0.592 avgtime 1112.648
moviesD-test v43b438d 2015-11-12 AnswerScoreDecisionF... 131/203/260 50.4%/78.1% mrr 0.590 avgtime 1212.274
moviesD-test ud5233fa 2015-11-12 Merge remote-trackin... no answers generated
moviesD-trai d5233fa 2015-11-12 Merge remote-trackin... 449/514/624 72.0%/82.4% mrr 0.761 avgtime 3496.390
moviesD-trai ud5233fa 2015-11-12 Merge remote-trackin... 355/525/624 56.9%/84.1% mrr 0.653 avgtime 3047.916
moviesD-trai vd5233fa 2015-11-12 Merge remote-trackin... 406/514/624 65.1%/82.4% mrr 0.714 avgtime 3338.435
large2180-te d738df8 2015-11-12 LATBySV: Fix crash o... 206/434/694 29.7%/62.5% mrr 0.380 avgtime 4617.641
large2180-te ud738df8 2015-11-12 LATBySV: Fix crash o... 208/524/694 30.0%/75.5% mrr 0.382 avgtime 4313.823
large2180-te vd738df8 2015-11-12 LATBySV: Fix crash o... 206/434/694 29.7%/62.5% mrr 0.381 avgtime 4558.018
large2180-tr d738df8 2015-11-12 LATBySV: Fix crash o... 718/921/1479 48.5%/62.3% mrr 0.544 avgtime 11262.440
large2180-tr ud738df8 2015-11-12 LATBySV: Fix crash o... 436/1075/1479 29.5%/72.7% mrr 0.383 avgtime 10310.996
large2180-tr vd738df8 2015-11-12 LATBySV: Fix crash o... 591/921/1479 40.0%/62.3% mrr 0.477 avgtime 11082.713
Merged.
Baseline:
moviesD-test 43b438d 2015-11-12 AnswerScoreDecisionF... 127/203/260 48.8%/78.1% mrr 0.580 avgtime 1274.038
moviesD-test u43b438d 2015-11-12 AnswerScoreDecisionF... 130/213/260 50.0%/81.9% mrr 0.592 avgtime 1112.648
moviesD-test v43b438d 2015-11-12 AnswerScoreDecisionF... 131/203/260 50.4%/78.1% mrr 0.590 avgtime 1212.274
moviesD-trai d5233fa 2015-11-12 Merge remote-trackin... 449/514/624 72.0%/82.4% mrr 0.761 avgtime 3496.390
moviesD-trai ud5233fa 2015-11-12 Merge remote-trackin... 355/525/624 56.9%/84.1% mrr 0.653 avgtime 3047.916
moviesD-trai vd5233fa 2015-11-12 Merge remote-trackin... 406/514/624 65.1%/82.4% mrr 0.714 avgtime 3338.435
Including question/description relatedness score in the concept classifier:
moviesD-test e9f8721 2015-11-15 ConceptClassifier: R... 137/206/260 52.7%/79.2% mrr 0.605 avgtime 1251.465
moviesD-test ue9f8721 2015-11-15 ConceptClassifier: R... 132/215/260 50.8%/82.7% mrr 0.594 avgtime 1095.514
moviesD-test ve9f8721 2015-11-15 ConceptClassifier: R... 136/206/260 52.3%/79.2% mrr 0.609 avgtime 1192.673
moviesD-trai e9f8721 2015-11-15 ConceptClassifier: R... 457/514/624 73.2%/82.4% mrr 0.769 avgtime 3616.423
moviesD-trai ue9f8721 2015-11-15 ConceptClassifier: R... 367/527/624 58.8%/84.5% mrr 0.664 avgtime 3182.397
moviesD-trai ve9f8721 2015-11-15 ConceptClassifier: R... 414/514/624 66.3%/82.4% mrr 0.724 avgtime 3459.919
Merged.
Baseline:
moviesD-test ea71748 2015-12-01 Merge branch 'master... 139/207/260 53.5%/79.6% mrr 0.612 avgtime 1240.304
moviesD-test uea71748 2015-12-01 Merge branch 'master... 136/215/260 52.3%/82.7% mrr 0.603 avgtime 1132.611
moviesD-test vea71748 2015-12-01 Merge branch 'master... 134/207/260 51.5%/79.6% mrr 0.602 avgtime 1202.870
moviesD-trai ea71748 2015-12-01 Merge branch 'master... 447/514/624 71.6%/82.4% mrr 0.761 avgtime 3977.617
moviesD-trai uea71748 2015-12-01 Merge branch 'master... 358/527/624 57.4%/84.5% mrr 0.656 avgtime 3621.058
moviesD-trai vea71748 2015-12-01 Merge branch 'master... 401/514/624 64.3%/82.4% mrr 0.713 avgtime 3855.856
Fix witness language multiplication of some branched properties:
moviesD-test 6b117b2 2015-12-01 Merge remote-trackin... 136/206/260 52.3%/79.2% mrr 0.601 avgtime 748.599
moviesD-test u6b117b2 2015-12-01 Merge remote-trackin... 133/215/260 51.2%/82.7% mrr 0.598 avgtime 588.144
moviesD-test v6b117b2 2015-12-01 Merge remote-trackin... 137/206/260 52.7%/79.2% mrr 0.606 avgtime 688.220
moviesD-trai 6b117b2 2015-12-01 Merge remote-trackin... 453/514/624 72.6%/82.4% mrr 0.766 avgtime 2215.680
moviesD-trai u6b117b2 2015-12-01 Merge remote-trackin... 355/527/624 56.9%/84.5% mrr 0.655 avgtime 1769.992
moviesD-trai v6b117b2 2015-12-01 Merge remote-trackin... 407/514/624 65.2%/82.4% mrr 0.720 avgtime 2056.199
Adding features on nature of focus in answers:
moviesD-test 7bbda27 2015-12-02 FocusGenerator addFo... 141/205/260 54.2%/78.8% mrr 0.614 avgtime 801.693
moviesD-test u7bbda27 2015-12-02 FocusGenerator addFo... 135/215/260 51.9%/82.7% mrr 0.604 avgtime 635.000
moviesD-test v7bbda27 2015-12-02 FocusGenerator addFo... 138/205/260 53.1%/78.8% mrr 0.613 avgtime 742.282
moviesD-trai 7bbda27 2015-12-02 FocusGenerator addFo... 454/513/624 72.8%/82.2% mrr 0.765 avgtime 2061.984
moviesD-trai u7bbda27 2015-12-02 FocusGenerator addFo... 356/527/624 57.1%/84.5% mrr 0.653 avgtime 1612.969
moviesD-trai v7bbda27 2015-12-02 FocusGenerator addFo... 413/513/624 66.2%/82.2% mrr 0.722 avgtime 1898.797
Merged.
Baseline:
moviesD-test 7bbda27 2015-12-02 FocusGenerator addFo... 141/205/260 54.2%/78.8% mrr 0.614 avgtime 801.693
moviesD-test u7bbda27 2015-12-02 FocusGenerator addFo... 135/215/260 51.9%/82.7% mrr 0.604 avgtime 635.000
moviesD-test v7bbda27 2015-12-02 FocusGenerator addFo... 138/205/260 53.1%/78.8% mrr 0.613 avgtime 742.282
moviesD-trai 7bbda27 2015-12-02 FocusGenerator addFo... 454/513/624 72.8%/82.2% mrr 0.765 avgtime 2061.984
moviesD-trai u7bbda27 2015-12-02 FocusGenerator addFo... 356/527/624 57.1%/84.5% mrr 0.653 avgtime 1612.969
moviesD-trai v7bbda27 2015-12-02 FocusGenerator addFo... 413/513/624 66.2%/82.2% mrr 0.722 avgtime 1898.797
Explorative instead of a priori (logistic regression labelling):
moviesD-test e462e45 2015-12-04 Merge remote-trackin... 90/160/260 34.6%/61.5% mrr 0.421 avgtime 947.432
moviesD-test ue462e45 2015-12-04 Merge remote-trackin... 91/176/260 35.0%/67.7% mrr 0.425 avgtime 791.527
moviesD-test ve462e45 2015-12-04 Merge remote-trackin... 88/160/260 33.8%/61.5% mrr 0.415 avgtime 887.226
moviesD-trai e462e45 2015-12-04 Merge remote-trackin... 353/418/624 56.6%/67.0% mrr 0.607 avgtime 2504.473
moviesD-trai ue462e45 2015-12-04 Merge remote-trackin... 252/427/624 40.4%/68.4% mrr 0.484 avgtime 2082.617
moviesD-trai ve462e45 2015-12-04 Merge remote-trackin... 299/418/624 47.9%/67.0% mrr 0.552 avgtime 2358.309
Explorative instead of generic (fetch all) (new baseline):
moviesD-test c5805b9 2015-12-04 Merge remote-trackin... 137/200/260 52.7%/76.9% mrr 0.598 avgtime 840.640
moviesD-test uc5805b9 2015-12-04 Merge remote-trackin... 133/210/260 51.2%/80.8% mrr 0.588 avgtime 681.419
moviesD-test vc5805b9 2015-12-04 Merge remote-trackin... 136/200/260 52.3%/76.9% mrr 0.596 avgtime 781.449
moviesD-trai c5805b9 2015-12-04 Merge remote-trackin... 439/512/624 70.4%/82.1% mrr 0.750 avgtime 2343.264
moviesD-trai uc5805b9 2015-12-04 Merge remote-trackin... 344/528/624 55.1%/84.6% mrr 0.635 avgtime 1883.171
moviesD-trai vc5805b9 2015-12-04 Merge remote-trackin... 392/512/624 62.8%/82.1% mrr 0.698 avgtime 2187.892
Fixed score-based ordering, mean score for 2-property paths:
moviesD-test 45af8db 2015-12-05 Merge remote-trackin... 138/204/260 53.1%/78.5% mrr 0.594 avgtime 1214.348
moviesD-test u45af8db 2015-12-05 Merge remote-trackin... 132/220/260 50.8%/84.6% mrr 0.584 avgtime 1043.612
moviesD-test v45af8db 2015-12-05 Merge remote-trackin... 134/204/260 51.5%/78.5% mrr 0.591 avgtime 1155.532
moviesD-trai 45af8db 2015-12-05 Merge remote-trackin... 446/513/624 71.5%/82.2% mrr 0.759 avgtime 3343.911
moviesD-trai u45af8db 2015-12-05 Merge remote-trackin... 332/537/624 53.2%/86.1% mrr 0.627 avgtime 2851.963
moviesD-trai v45af8db 2015-12-05 Merge remote-trackin... 396/513/624 63.5%/82.2% mrr 0.706 avgtime 3184.205
Limit also the number of 2-property paths, not just 1-prop paths (new baseline):
moviesD-test c7418cc 2015-12-05 FBPathGloVeScoring: ... 136/207/260 52.3%/79.6% mrr 0.599 avgtime 1071.796
moviesD-test uc7418cc 2015-12-05 FBPathGloVeScoring: ... 137/220/260 52.7%/84.6% mrr 0.606 avgtime 902.943
moviesD-test vc7418cc 2015-12-05 FBPathGloVeScoring: ... 142/207/260 54.6%/79.6% mrr 0.611 avgtime 1010.326
moviesD-trai c7418cc 2015-12-05 FBPathGloVeScoring: ... 447/509/624 71.6%/81.6% mrr 0.758 avgtime 2786.857
moviesD-trai uc7418cc 2015-12-05 FBPathGloVeScoring: ... 347/530/624 55.6%/84.9% mrr 0.643 avgtime 2326.904
moviesD-trai vc7418cc 2015-12-05 FBPathGloVeScoring: ... 402/509/624 64.4%/81.6% mrr 0.710 avgtime 2633.049
Try changing limit 15 -> 5:
moviesD-test 8c9b29e 2015-12-05 exploringPaths topPa... 128/204/260 49.2%/78.5% mrr 0.574 avgtime 810.969
moviesD-test u8c9b29e 2015-12-05 exploringPaths topPa... 124/217/260 47.7%/83.5% mrr 0.570 avgtime 655.753
moviesD-test v8c9b29e 2015-12-05 exploringPaths topPa... 129/204/260 49.6%/78.5% mrr 0.581 avgtime 751.032
moviesD-trai 8c9b29e 2015-12-05 exploringPaths topPa... 443/507/624 71.0%/81.2% mrr 0.752 avgtime 2121.804
moviesD-trai u8c9b29e 2015-12-05 exploringPaths topPa... 358/520/624 57.4%/83.3% mrr 0.648 avgtime 1696.961
moviesD-trai v8c9b29e 2015-12-05 exploringPaths topPa... 395/507/624 63.3%/81.2% mrr 0.700 avgtime 1966.618
Try disabling a priori fbpath question labelling:
moviesD-test a8f31c3 2015-12-05 Try disabling a prio... 91/187/260 35.0%/71.9% mrr 0.440 avgtime 896.918
moviesD-test ua8f31c3 2015-12-05 Try disabling a prio... 80/206/260 30.8%/79.2% mrr 0.414 avgtime 735.018
moviesD-test va8f31c3 2015-12-05 Try disabling a prio... 84/187/260 32.3%/71.9% mrr 0.427 avgtime 834.787
moviesD-trai a8f31c3 2015-12-05 Try disabling a prio... 358/458/624 57.4%/73.4% mrr 0.637 avgtime 2310.752
moviesD-trai ua8f31c3 2015-12-05 Try disabling a prio... 248/488/624 39.7%/78.2% mrr 0.503 avgtime 1885.626
moviesD-trai va8f31c3 2015-12-05 Try disabling a prio... 296/458/624 47.4%/73.4% mrr 0.568 avgtime 2160.346
Retrain explorative (GloVe) classifier using moviesD, include non-link relations (new baseline):
moviesD-test a24f2f7 2015-12-06 Merge branch 'fbpath... 134/209/260 51.5%/80.4% mrr 0.598 avgtime 1685.146
moviesD-test ua24f2f7 2015-12-06 Merge branch 'fbpath... 132/218/260 50.8%/83.8% mrr 0.592 avgtime 1519.137
moviesD-test va24f2f7 2015-12-06 Merge branch 'fbpath... 135/209/260 51.9%/80.4% mrr 0.602 avgtime 1625.133
moviesD-trai a24f2f7 2015-12-06 Merge branch 'fbpath... 442/511/624 70.8%/81.9% mrr 0.754 avgtime 4611.489
moviesD-trai ua24f2f7 2015-12-06 Merge branch 'fbpath... 356/530/624 57.1%/84.9% mrr 0.649 avgtime 4140.708
moviesD-trai va24f2f7 2015-12-06 Merge branch 'fbpath... 405/511/624 64.9%/81.9% mrr 0.712 avgtime 4455.004
Try disabling a priori fbpath question labelling:
moviesD-test 4d753b0 2015-12-05 Try disabling a prio... 98/184/260 37.7%/70.8% mrr 0.465 avgtime 783.514
moviesD-test u4d753b0 2015-12-05 Try disabling a prio... 95/209/260 36.5%/80.4% mrr 0.456 avgtime 609.502
moviesD-test v4d753b0 2015-12-05 Try disabling a prio... 88/184/260 33.8%/70.8% mrr 0.445 avgtime 712.502
moviesD-trai 4d753b0 2015-12-05 Try disabling a prio... 383/472/624 61.4%/75.6% mrr 0.670 avgtime 2193.474
moviesD-trai u4d753b0 2015-12-05 Try disabling a prio... 266/502/624 42.6%/80.4% mrr 0.519 avgtime 1765.960
moviesD-trai v4d753b0 2015-12-05 Try disabling a prio... 325/472/624 52.1%/75.6% mrr 0.602 avgtime 2041.186
Building witness-based relations:
moviesD-test ee63449 2015-12-07 Merge branch 'fbpath... 122/206/260 46.9%/79.2% mrr 0.558 avgtime 1035.734
moviesD-test uee63449 2015-12-07 Merge branch 'fbpath... 116/219/260 44.6%/84.2% mrr 0.542 avgtime 919.843
moviesD-test vee63449 2015-12-07 Merge branch 'fbpath... 122/206/260 46.9%/79.2% mrr 0.562 avgtime 999.078
moviesD-trai ee63449 2015-12-07 Merge branch 'fbpath... 436/508/624 69.9%/81.4% mrr 0.748 avgtime 3081.712
moviesD-trai uee63449 2015-12-07 Merge branch 'fbpath... 320/530/624 51.3%/84.9% mrr 0.614 avgtime 2694.448
moviesD-trai vee63449 2015-12-07 Merge branch 'fbpath... 390/508/624 62.5%/81.4% mrr 0.698 avgtime 2959.652
[Building witness-based relations] Try disabling a priori fbpath question labelling:
moviesD-test 05176c1 2015-12-05 Try disabling a prio... 93/184/260 35.8%/70.8% mrr 0.455 avgtime 906.356
moviesD-test u05176c1 2015-12-05 Try disabling a prio... 90/208/260 34.6%/80.0% mrr 0.442 avgtime 730.845
moviesD-test v05176c1 2015-12-05 Try disabling a prio... 94/184/260 36.2%/70.8% mrr 0.447 avgtime 845.308
moviesD-trai 05176c1 2015-12-05 Try disabling a prio... 371/457/624 59.5%/73.2% mrr 0.648 avgtime 2295.579
moviesD-trai u05176c1 2015-12-05 Try disabling a prio... 263/487/624 42.1%/78.0% mrr 0.519 avgtime 1871.413
moviesD-trai v05176c1 2015-12-05 Try disabling a prio... 331/457/624 53.0%/73.2% mrr 0.605 avgtime 2143.996
[Building witness-based relations] Improved question focus in "who did play X Y in Z":
moviesD-test 6ed5826 2015-12-07 question FocusGenera... 127/204/260 48.8%/78.5% mrr 0.573 avgtime 1072.878
moviesD-test u6ed5826 2015-12-07 question FocusGenera... 116/219/260 44.6%/84.2% mrr 0.543 avgtime 903.636
moviesD-test v6ed5826 2015-12-07 question FocusGenera... 126/204/260 48.5%/78.5% mrr 0.570 avgtime 1012.997
moviesD-trai 6ed5826 2015-12-07 question FocusGenera... 432/509/624 69.2%/81.6% mrr 0.742 avgtime 2833.613
moviesD-trai u6ed5826 2015-12-07 question FocusGenera... 316/531/624 50.6%/85.1% mrr 0.606 avgtime 2368.257
moviesD-trai v6ed5826 2015-12-07 question FocusGenera... 389/509/624 62.3%/81.6% mrr 0.697 avgtime 2676.699
Baseline:
moviesD-test e10cf37 2015-11-03 Mbprop.txt: Retrain ... 138/206/260 53.1%/79.2% mrr 0.609 avgtime 1571.417
moviesD-test ue10cf37 2015-11-03 Mbprop.txt: Retrain ... 130/213/260 50.0%/81.9% mrr 0.590 avgtime 1419.293
moviesD-test ve10cf37 2015-11-03 Mbprop.txt: Retrain ... 137/206/260 52.7%/79.2% mrr 0.609 avgtime 1512.312
moviesD-trai e10cf37 2015-11-03 Mbprop.txt: Retrain ... 455/512/624 72.9%/82.1% mrr 0.766 avgtime 17632.270
moviesD-trai ue10cf37 2015-11-03 Mbprop.txt: Retrain ... 362/525/624 58.0%/84.1% mrr 0.658 avgtime 17203.644
moviesD-trai ve10cf37 2015-11-03 Mbprop.txt: Retrain ... 406/512/624 65.1%/82.1% mrr 0.715 avgtime 17474.994
Migrated:
moviesD-test ee93719 2015-11-08 Migrate Freebase fro... 138/199/260 53.1%/76.5% mrr 0.604 avgtime 1687.771
moviesD-test uee93719 2015-11-08 Migrate Freebase fro... 135/208/260 51.9%/80.0% mrr 0.589 avgtime 1540.271
moviesD-test vee93719 2015-11-08 Migrate Freebase fro... 134/199/260 51.5%/76.5% mrr 0.594 avgtime 1627.686
moviesD-trai ee93719 2015-11-08 Migrate Freebase fro... 447/511/624 71.6%/81.9% mrr 0.758 avgtime 4067.582
moviesD-trai uee93719 2015-11-08 Migrate Freebase fro... 359/519/624 57.5%/83.2% mrr 0.656 avgtime 3648.616
moviesD-trai vee93719 2015-11-08 Migrate Freebase fro... 398/511/624 63.8%/81.9% mrr 0.707 avgtime 3901.511
This also involves (i) updating to BaseKB Gold (Freebase snapshot from April rather than January) and (ii) reducing topLinkedConcepts from 5 to 4 (as some of our queries were too large for Virtuoso when we had too many parallel concepts).
(Work in progress: this is actually a slowdown, whereas the goal was a speedup.)
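To put a number on the slowdown noted above, the following sketch parses the two unprefixed `moviesD-test` rows (copied verbatim from the Baseline and Migrated listings) and compares their `avgtime` fields. The regular expression, the `parse_row` helper and the field names `count1`/`count2`/`pct1`/`pct2` are illustrative assumptions about the row layout, not part of the YodaQA evaluation scripts. The unit of `avgtime` is not stated in this section, so only the relative change is computed.

```python
import re

# Assumed row layout: dataset, commit, date, truncated commit message,
# three slash-separated counts, two percentages, "mrr <float>", "avgtime <float>".
ROW_RE = re.compile(
    r"^(?P<dataset>\S+)\s+(?P<commit>\S+)\s+(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<message>.*?)\s+"
    r"(?P<count1>\d+)/(?P<count2>\d+)/(?P<total>\d+)\s+"
    r"(?P<pct1>[\d.]+)%/(?P<pct2>[\d.]+)%\s+"
    r"mrr\s+(?P<mrr>[\d.]+)\s+avgtime\s+(?P<avgtime>[\d.]+)$"
)


def parse_row(line):
    """Parse one benchmark row into a dict with numeric fields converted."""
    match = ROW_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized benchmark row: {line!r}")
    row = match.groupdict()
    for key in ("count1", "count2", "total"):
        row[key] = int(row[key])
    for key in ("pct1", "pct2", "mrr", "avgtime"):
        row[key] = float(row[key])
    return row


baseline = parse_row(
    "moviesD-test e10cf37 2015-11-03 Mbprop.txt: Retrain ... "
    "138/206/260 53.1%/79.2% mrr 0.609 avgtime 1571.417"
)
migrated = parse_row(
    "moviesD-test ee93719 2015-11-08 Migrate Freebase fro... "
    "138/199/260 53.1%/76.5% mrr 0.604 avgtime 1687.771"
)

# Relative change of the average time per question, migrated vs. baseline.
change = (migrated["avgtime"] - baseline["avgtime"]) / baseline["avgtime"]
print(f"avgtime change on moviesD-test: {change:+.1%}")
```

For these two rows the relative change comes out at roughly 7%. The same helper can be applied to any other row on this page, as long as it follows the assumed layout.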
Note that the label-lookup and decision forest (dectrees) changes introduced before v1.1 did not improve performance on curated, but did improve movies, webquestions and large2180.
v1.1 with 12 instead of 6 search results per IR query --- curated APR 80.0%, MRR 0.440 (but ~12s -> ~20s per question):
curated-test 5768167 2015-08-29 AnswerScoreDecisionF... 138/290/430 32.1%/67.4% mrr 0.425 avgtime 5754.981
curated-test u5768167 2015-08-29 AnswerScoreDecisionF... 152/344/430 35.3%/80.0% mrr 0.440 avgtime 5465.723
curated-test v5768167 2015-08-29 AnswerScoreDecisionF... 139/290/430 32.3%/67.4% mrr 0.427 avgtime 5705.498
curated-trai 597b437 2015-08-28 SolrFullPrimarySearc... 300/308/430 69.8%/71.6% mrr 0.706 avgtime 4601.686
curated-trai u597b437 2015-08-28 SolrFullPrimarySearc... 194/344/430 45.1%/80.0% mrr 0.532 avgtime 4255.334
curated-trai v597b437 2015-08-28 SolrFullPrimarySearc... 284/308/430 66.0%/71.6% mrr 0.685 avgtime 4530.166
v1.1 without IR from enwiki --- curated APR 42.1%, MRR 0.253 (but ~2.5s per question):
curated-test 8795cd0 2015-08-27 Merge remote-trackin... 91/156/430 21.2%/36.3% mrr 0.254 avgtime 1085.359
curated-test u8795cd0 2015-08-27 Merge remote-trackin... 85/181/430 19.8%/42.1% mrr 0.253 avgtime 939.621
curated-test v8795cd0 2015-08-27 Merge remote-trackin... 88/156/430 20.5%/36.3% mrr 0.253 avgtime 1037.196
curated-trai 8795cd0 2015-08-27 Merge remote-trackin... 165/184/430 38.4%/42.8% mrr 0.398 avgtime 863.772
curated-trai u8795cd0 2015-08-27 Merge remote-trackin... 129/188/430 30.0%/43.7% mrr 0.339 avgtime 671.766
curated-trai v8795cd0 2015-08-27 Merge remote-trackin... 154/184/430 35.8%/42.8% mrr 0.382 avgtime 795.719
v1.1 without IR from structured knowledge bases (DBpedia, Freebase) --- curated APR 70.7%, MRR 0.378:
curated-test a9bf875 2015-08-29 YodaQA: -structured ... 103/273/430 24.0%/63.5% mrr 0.336 avgtime 2271.578
curated-test ua9bf875 2015-08-29 YodaQA: -structured ... 124/304/430 28.8%/70.7% mrr 0.378 avgtime 2073.512
curated-test va9bf875 2015-08-29 YodaQA: -structured ... 100/273/430 23.3%/63.5% mrr 0.337 avgtime 2201.423
curated-trai a9bf875 2015-08-29 YodaQA: -structured ... 291/292/430 67.7%/67.9% mrr 0.678 avgtime 2735.233
curated-trai ua9bf875 2015-08-29 YodaQA: -structured ... 198/314/430 46.0%/73.0% mrr 0.522 avgtime 2475.768
curated-trai va9bf875 2015-08-29 YodaQA: -structured ... 262/292/430 60.9%/67.9% mrr 0.641 avgtime 2638.769
v1.1 without answer typing using external resources (WordNet, DBpedia) --- curated APR 77.2%, MRR 0.394:
curated-test e36e53c 2015-08-29 AnswerAnalysis: Disa... 116/279/430 27.0%/64.9% mrr 0.373 avgtime 1768.499
curated-test ue36e53c 2015-08-29 AnswerAnalysis: Disa... 132/332/430 30.7%/77.2% mrr 0.394 avgtime 1564.041
curated-test ve36e53c 2015-08-29 AnswerAnalysis: Disa... 118/279/430 27.4%/64.9% mrr 0.380 avgtime 1723.894
curated-trai e36e53c 2015-08-29 AnswerAnalysis: Disa... 298/303/430 69.3%/70.5% mrr 0.698 avgtime 2563.871
curated-trai ue36e53c 2015-08-29 AnswerAnalysis: Disa... 196/333/430 45.6%/77.4% mrr 0.530 avgtime 2302.198
curated-trai ve36e53c 2015-08-29 AnswerAnalysis: Disa... 267/303/430 62.1%/70.5% mrr 0.657 avgtime 2496.949
v1.1 without entity linking --- curated APR 68.1%, MRR 0.318:
curated-test ecb30e3 2015-08-29 QuestionAnalysis: -C... 90/261/430 20.9%/60.7% mrr 0.298 avgtime 1624.336
curated-test uecb30e3 2015-08-29 QuestionAnalysis: -C... 96/293/430 22.3%/68.1% mrr 0.318 avgtime 1451.379
curated-test vecb30e3 2015-08-29 QuestionAnalysis: -C... 91/261/430 21.2%/60.7% mrr 0.307 avgtime 1577.783
curated-trai ecb30e3 2015-08-29 QuestionAnalysis: -C... 277/280/430 64.4%/65.1% mrr 0.648 avgtime 2008.781
curated-trai uecb30e3 2015-08-29 QuestionAnalysis: -C... 187/299/430 43.5%/69.5% mrr 0.496 avgtime 1788.493
curated-trai vecb30e3 2015-08-29 QuestionAnalysis: -C... 262/280/430 60.9%/65.1% mrr 0.626 avgtime 1942.117
v1.1 without decision forest and label-lookup --- curated APR 79.3%, MRR 0.436; large2180 APR 76.5%, MRR 0.399:
curated-test 20ab096 2015-07-28 Merge commit '0e52a1... 124/286/430 28.8%/66.5% mrr 0.386 avgtime 5522.054
curated-test u20ab096 2015-07-28 Merge commit '0e52a1... 150/341/430 34.9%/79.3% mrr 0.436 avgtime 5242.850
curated-test v20ab096 2015-07-28 Merge commit '0e52a1... 121/286/430 28.1%/66.5% mrr 0.382 avgtime 5428.800
curated-trai 20ab096 2015-07-28 Merge commit '0e52a1... 198/298/430 46.0%/69.3% mrr 0.546 avgtime 4790.583
curated-trai u20ab096 2015-07-28 Merge commit '0e52a1... 154/332/430 35.8%/77.2% mrr 0.458 avgtime 4522.161
curated-trai v20ab096 2015-07-28 Merge commit '0e52a1... 188/298/430 43.7%/69.3% mrr 0.531 avgtime 4697.530
large2180-te 20ab096 2015-07-28 Merge commit '0e52a1... 187/438/694 26.9%/63.1% mrr 0.357 avgtime 3614.539
large2180-te u20ab096 2015-07-28 Merge commit '0e52a1... 218/531/694 31.4%/76.5% mrr 0.399 avgtime 3338.304
large2180-te v20ab096 2015-07-28 Merge commit '0e52a1... 181/438/694 26.1%/63.1% mrr 0.351 avgtime 3526.321
large2180-tr 20ab096 2015-07-28 Merge commit '0e52a1... 425/905/1479 28.7%/61.2% mrr 0.373 avgtime 12576.337
large2180-tr u20ab096 2015-07-28 Merge commit '0e52a1... 415/1058/1479 28.1%/71.5% mrr 0.366 avgtime 11938.729
large2180-tr v20ab096 2015-07-28 Merge commit '0e52a1... 408/905/1479 27.6%/61.2% mrr 0.367 avgtime 12385.858
v1.1 without decision forest, with label-lookup --- curated APR 77.2%, MRR 0.413; large2180 APR 74.8%, MRR 0.399:
curated-test a6ee873 2015-08-21 Mark: label-lookup 1... 119/281/430 27.7%/65.3% mrr 0.372 avgtime 2388.535
curated-test ua6ee873 2015-08-21 Mark: label-lookup 1... 140/332/430 32.6%/77.2% mrr 0.413 avgtime 2170.687
curated-test va6ee873 2015-08-21 Mark: label-lookup 1... 114/281/430 26.5%/65.3% mrr 0.367 avgtime 2321.839
curated-trai a6ee873 2015-08-21 Mark: label-lookup 1... 183/296/430 42.6%/68.8% mrr 0.521 avgtime 3267.536
curated-trai ua6ee873 2015-08-21 Mark: label-lookup 1... 165/333/430 38.4%/77.4% mrr 0.464 avgtime 2986.020
curated-trai va6ee873 2015-08-21 Mark: label-lookup 1... 184/296/430 42.8%/68.8% mrr 0.520 avgtime 3175.556
large2180-te a6ee873 2015-08-21 Mark: label-lookup 1... 216/430/694 31.1%/62.0% mrr 0.386 avgtime 29212.673
large2180-te ua6ee873 2015-08-21 Mark: label-lookup 1... 221/519/694 31.8%/74.8% mrr 0.399 avgtime 28906.655
large2180-te va6ee873 2015-08-21 Mark: label-lookup 1... 208/430/694 30.0%/62.0% mrr 0.382 avgtime 29153.467
large2180-tr a6ee873 2015-08-21 Mark: label-lookup 1... 465/895/1479 31.4%/60.5% mrr 0.404 avgtime 40675.033
large2180-tr ua6ee873 2015-08-21 Mark: label-lookup 1... 454/1051/1479 30.7%/71.1% mrr 0.381 avgtime 39922.785
large2180-tr va6ee873 2015-08-21 Mark: label-lookup 1... 476/895/1479 32.2%/60.5% mrr 0.407 avgtime 40524.531
v1.1 without a CRF-based passage answer producer --- curated APR 77.2%, MRR 0.433; large2180 APR 74.8%, MRR 0.399:
curated-test 3fd576a 2015-08-29 PassageAnalysis: -BI... 145/286/430 33.7%/66.5% mrr 0.431 avgtime 2982.463
curated-test u3fd576a 2015-08-29 PassageAnalysis: -BI... 150/332/430 34.9%/77.2% mrr 0.433 avgtime 2742.708
curated-test v3fd576a 2015-08-29 PassageAnalysis: -BI... 153/286/430 35.6%/66.5% mrr 0.445 avgtime 2911.970
curated-trai 3fd576a 2015-08-29 PassageAnalysis: -BI... 297/303/430 69.1%/70.5% mrr 0.697 avgtime 2634.163
curated-trai u3fd576a 2015-08-29 PassageAnalysis: -BI... 176/332/430 40.9%/77.2% mrr 0.491 avgtime 2315.214
curated-trai v3fd576a 2015-08-29 PassageAnalysis: -BI... 258/303/430 60.0%/70.5% mrr 0.645 avgtime 2531.022
large2180-te 3fd576a 2015-08-29 PassageAnalysis: -BI... 217/446/694 31.3%/64.3% mrr 0.408 avgtime 3381.604
large2180-te u3fd576a 2015-08-29 PassageAnalysis: -BI... 215/519/694 31.0%/74.8% mrr 0.399 avgtime 3048.320
large2180-te v3fd576a 2015-08-29 PassageAnalysis: -BI... 217/446/694 31.3%/64.3% mrr 0.407 avgtime 3290.635
large2180-tr 3fd576a 2015-08-29 PassageAnalysis: -BI... 723/910/1479 48.9%/61.5% mrr 0.541 avgtime 8509.359
large2180-tr u3fd576a 2015-08-29 PassageAnalysis: -BI... 474/1050/1479 32.0%/71.0% mrr 0.399 avgtime 7668.941
large2180-tr v3fd576a 2015-08-29 PassageAnalysis: -BI... 605/910/1479 40.9%/61.5% mrr 0.478 avgtime 8273.441
Let's explore the impact of the CRF a little further, comparing v1.1 with the NP-based answer hypothesis generator disabled (7d7b24d) against a version that additionally has the CRF disabled (5a7ae5e) --- here, we can finally see a small MRR and APR drop (curated-test MRR 0.375 -> 0.371, APR 64.9% -> 63.5%), showing that the CRF contributes something:
curated-test 7d7b24d 2015-08-30 PassageAnalysis: -Ca... 117/253/430 27.2%/58.8% mrr 0.359 avgtime 1985.050
curated-test u7d7b24d 2015-08-30 PassageAnalysis: -Ca... 125/279/430 29.1%/64.9% mrr 0.375 avgtime 1801.975
curated-test v7d7b24d 2015-08-30 PassageAnalysis: -Ca... 121/253/430 28.1%/58.8% mrr 0.369 avgtime 1919.153
curated-trai 7d7b24d 2015-08-30 PassageAnalysis: -Ca... 305/308/430 70.9%/71.6% mrr 0.712 avgtime 2452.001
curated-trai u7d7b24d 2015-08-30 PassageAnalysis: -Ca... 211/319/430 49.1%/74.2% mrr 0.564 avgtime 2211.858
curated-trai v7d7b24d 2015-08-30 PassageAnalysis: -Ca... 274/308/430 63.7%/71.6% mrr 0.673 avgtime 2360.485
curated-test 5a7ae5e 2015-08-30 PassageAnalysis: als... 132/248/430 30.7%/57.7% mrr 0.377 avgtime 1774.094
curated-test u5a7ae5e 2015-08-30 PassageAnalysis: als... 128/273/430 29.8%/63.5% mrr 0.371 avgtime 1586.492
curated-test v5a7ae5e 2015-08-30 PassageAnalysis: als... 136/248/430 31.6%/57.7% mrr 0.386 avgtime 1705.106
curated-trai 5a7ae5e 2015-08-30 PassageAnalysis: als... 266/276/430 61.9%/64.2% mrr 0.627 avgtime 1903.655
curated-trai u5a7ae5e 2015-08-30 PassageAnalysis: als... 165/288/430 38.4%/67.0% mrr 0.462 avgtime 1667.754
curated-trai v5a7ae5e 2015-08-30 PassageAnalysis: als... 229/276/430 53.3%/64.2% mrr 0.578 avgtime 1813.072
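The size of that drop can be recomputed from the counts in the two `u`-prefixed `curated-test` rows above. This is a minimal sketch: the numbers are copied from those rows, the `recall_pct` helper is hypothetical, and treating the second count as the numerator of APR is an assumption based on the second percentage of the `u` rows matching the APR figures quoted in the summaries on this page.

```python
# (first count, second count, total, mrr), copied from the u-prefixed
# curated-test rows of the two commits being compared.
rows = {
    "7d7b24d": (125, 279, 430, 0.375),  # NP-based generator disabled
    "5a7ae5e": (128, 273, 430, 0.371),  # NP-based generator and CRF disabled
}


def recall_pct(second_count, total):
    """Percentage of questions counted in the second column (assumed APR)."""
    return 100.0 * second_count / total


with_crf = rows["7d7b24d"]
without_crf = rows["5a7ae5e"]

apr_with = recall_pct(with_crf[1], with_crf[2])
apr_without = recall_pct(without_crf[1], without_crf[2])
apr_drop = apr_with - apr_without      # in percentage points
mrr_drop = with_crf[3] - without_crf[3]

print(f"APR {apr_with:.1f}% -> {apr_without:.1f}% (drop {apr_drop:.1f} points)")
print(f"MRR {with_crf[3]:.3f} -> {without_crf[3]:.3f} (drop {mrr_drop:.3f})")
```

The recomputed percentages agree with the ones printed in the rows, which is a cheap sanity check when copying figures into a summary line.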
So, could it be that the CRF is useless with the other generators mixed in? That is curious, so let's try v1.1 with a retrained CRF model --- oh, curated APR 77.2%, MRR 0.439; large2180 APR 74.8%, MRR 0.411; oops:
curated-test 0296763 2015-08-30 data/ml/biocrf/model... 146/287/430 34.0%/66.7% mrr 0.431 avgtime 2392.096
curated-test u0296763 2015-08-30 data/ml/biocrf/model... 152/332/430 35.3%/77.2% mrr 0.439 avgtime 2157.916
curated-test v0296763 2015-08-30 data/ml/biocrf/model... 151/287/430 35.1%/66.7% mrr 0.440 avgtime 2343.056
curated-trai 0296763 2015-08-30 data/ml/biocrf/model... 290/303/430 67.4%/70.5% mrr 0.689 avgtime 3887.648
curated-trai u0296763 2015-08-30 data/ml/biocrf/model... 181/332/430 42.1%/77.2% mrr 0.503 avgtime 3595.703
curated-trai v0296763 2015-08-30 data/ml/biocrf/model... 257/303/430 59.8%/70.5% mrr 0.644 avgtime 3816.893
large2180-te 0296763 2015-08-30 data/ml/biocrf/model... 224/439/694 32.3%/63.3% mrr 0.402 avgtime 3326.777
large2180-te u0296763 2015-08-30 data/ml/biocrf/model... 233/519/694 33.6%/74.8% mrr 0.411 avgtime 2994.481
large2180-te v0296763 2015-08-30 data/ml/biocrf/model... 221/439/694 31.8%/63.3% mrr 0.399 avgtime 3260.786
large2180-tr 0296763 2015-08-30 data/ml/biocrf/model... 735/925/1479 49.7%/62.5% mrr 0.551 avgtime 7906.924
large2180-tr u0296763 2015-08-30 data/ml/biocrf/model... 485/1052/1479 32.8%/71.1% mrr 0.406 avgtime 7057.941
large2180-tr v0296763 2015-08-30 data/ml/biocrf/model... 586/925/1479 39.6%/62.5% mrr 0.477 avgtime 7726.841
So the whole issue is that at some point the CRF model needed retraining and we forgot to do it. It is too late to fix this for v1.1, so we will tag the retrained version as v1.2 right after that.
v1.2 without answer typing using external resources (WordNet, DBpedia) --- wq MRR 0.422 (so, this kind of typing is not very important when we already know the originating property):
wq-test-ovt- 4acbefc 2015-09-07 AnswerAnalysis: Disa... 732/1242/2032 36.0%/61.1% mrr 0.433 avgtime 3195.309
wq-test-ovt- u4acbefc 2015-09-07 AnswerAnalysis: Disa... 705/1368/2032 34.7%/67.3% mrr 0.422 avgtime 2743.912
wq-test-ovt- v4acbefc 2015-09-07 AnswerAnalysis: Disa... 747/1242/2032 36.8%/61.1% mrr 0.438 avgtime 3042.177
wq-train-ovt 4acbefc 2015-09-07 AnswerAnalysis: Disa... 1655/2479/3778 43.8%/65.6% mrr 0.511 avgtime 8228.916
wq-train-ovt u4acbefc 2015-09-07 AnswerAnalysis: Disa... 1501/2658/3778 39.7%/70.4% mrr 0.472 avgtime 6979.765
wq-train-ovt v4acbefc 2015-09-07 AnswerAnalysis: Disa... 1635/2479/3778 43.3%/65.6% mrr 0.502 avgtime 7784.849
v1.1 without decision forest and label-lookup --- moviesC APR 72.1%, MRR 0.449:
moviesC-test fb80dc3 2015-08-20 data/eval/moviesC-*:... 92/157/233 39.5%/67.4% mrr 0.483 avgtime 842.395
moviesC-test ufb80dc3 2015-08-20 data/eval/moviesC-*:... 81/168/233 34.8%/72.1% mrr 0.449 avgtime 710.244
moviesC-test vfb80dc3 2015-08-20 data/eval/moviesC-*:... 93/157/233 39.9%/67.4% mrr 0.483 avgtime 789.272
moviesC-trai fb80dc3 2015-08-20 data/eval/moviesC-*:... 205/350/542 37.8%/64.6% mrr 0.462 avgtime 1686.444
moviesC-trai ufb80dc3 2015-08-20 data/eval/moviesC-*:... 185/379/542 34.1%/69.9% mrr 0.429 avgtime 1432.278
moviesC-trai vfb80dc3 2015-08-20 data/eval/moviesC-*:... 207/350/542 38.2%/64.6% mrr 0.466 avgtime 1588.147
v1.1 without decision forest, with label-lookup --- moviesC APR 75.5%, MRR 0.468; wq APR 67.3%, MRR 0.408:
moviesC-test 0d660b4 2015-08-27 Merge remote-trackin... 94/161/233 40.3%/69.1% mrr 0.490 avgtime 788.321
moviesC-test u0d660b4 2015-08-27 Merge remote-trackin... 86/176/233 36.9%/75.5% mrr 0.468 avgtime 656.824
moviesC-test v0d660b4 2015-08-27 Merge remote-trackin... 94/161/233 40.3%/69.1% mrr 0.497 avgtime 735.070
moviesC-trai 0d660b4 2015-08-27 Merge remote-trackin... 217/365/542 40.0%/67.3% mrr 0.487 avgtime 1417.650
moviesC-trai u0d660b4 2015-08-27 Merge remote-trackin... 185/399/542 34.1%/73.6% mrr 0.438 avgtime 1148.684
moviesC-trai v0d660b4 2015-08-27 Merge remote-trackin... 215/365/542 39.7%/67.3% mrr 0.482 avgtime 1315.276
wq-test-ovt- 0d660b4 2015-08-27 Merge remote-trackin... 730/1232/2032 35.9%/60.6% mrr 0.433 avgtime 3639.533
wq-test-ovt- u0d660b4 2015-08-27 Merge remote-trackin... 665/1368/2032 32.7%/67.3% mrr 0.408 avgtime 3095.558
wq-test-ovt- v0d660b4 2015-08-27 Merge remote-trackin... 728/1232/2032 35.8%/60.6% mrr 0.431 avgtime 3462.939
wq-train-ovt 0d660b4 2015-08-27 Merge remote-trackin... 1525/2441/3778 40.4%/64.6% mrr 0.478 avgtime 11511.939
wq-train-ovt u0d660b4 2015-08-27 Merge remote-trackin... 1416/2658/3778 37.5%/70.4% mrr 0.456 avgtime 10022.556
wq-train-ovt v0d660b4 2015-08-27 Merge remote-trackin... 1498/2441/3778 39.7%/64.6% mrr 0.474 avgtime 11056.607
v1.1+enwiki with decision forest and label-lookup (just as a curious experiment) --- moviesC APR 84.5%, MRR 0.506; wq APR 78.3%, MRR 0.431:
moviesC-test 52cdd6c 2015-08-28 AnswerScoreDecisionF... 112/177/233 48.1%/76.0% mrr 0.565 avgtime 1738.979
moviesC-test u52cdd6c 2015-08-28 AnswerScoreDecisionF... 94/197/233 40.3%/84.5% mrr 0.506 avgtime 1581.404
moviesC-test v52cdd6c 2015-08-28 AnswerScoreDecisionF... 112/177/233 48.1%/76.0% mrr 0.568 avgtime 1703.425
moviesC-trai 52cdd6c 2015-08-28 AnswerScoreDecisionF... 388/431/542 71.6%/79.5% mrr 0.749 avgtime 4379.111
moviesC-trai u52cdd6c 2015-08-28 AnswerScoreDecisionF... 246/470/542 45.4%/86.7% mrr 0.553 avgtime 4003.825
moviesC-trai v52cdd6c 2015-08-28 AnswerScoreDecisionF... 352/431/542 64.9%/79.5% mrr 0.704 avgtime 4288.703
wq-test-ovt- 94ba475 2015-08-26 Merge branch 'f/labe... 792/1339/2032 39.0%/65.9% mrr 0.466 avgtime 10818.454
wq-test-ovt- u94ba475 2015-08-26 Merge branch 'f/labe... 696/1591/2032 34.3%/78.3% mrr 0.431 avgtime 10039.444
wq-test-ovt- v94ba475 2015-08-26 Merge branch 'f/labe... 778/1339/2032 38.3%/65.9% mrr 0.464 avgtime 10634.258
wq-train-ovt 94ba475 2015-08-26 Merge branch 'f/labe... 1622/2664/3778 42.9%/70.5% mrr 0.512 avgtime 54641.405
wq-train-ovt u94ba475 2015-08-26 Merge branch 'f/labe... 1451/3057/3778 38.4%/80.9% mrr 0.473 avgtime 52529.836
wq-train-ovt v94ba475 2015-08-26 Merge branch 'f/labe... 1637/2664/3778 43.3%/70.5% mrr 0.515 avgtime 54082.333