Improve benchmarking and performance measurements #13

creswick · 2014-11-07T23:02:57Z

The benchmark suite and evaluation tools haven't been used in a while. It would be nice to record both run-time timings and classifier performance metrics as part of the chatter release process, so we can look back over time and see whether, and how, the classifiers change as we tweak the implementations and change the training data.

This ticket is to build the infrastructure so that it's easy to add a new classifier for an existing task (e.g. POS tagging, chunking), to add new tasks (e.g. Named Entity Recognition), and to generate clear results that show false positives, false negatives, and true positives in a way that matches the behavior of NLTK. That gives a clear point of comparison: someone should be able to roughly compare chatter's result numbers with those of other toolkits. I feel no particular attachment to NLTK's evaluation details, but I see no reason to invent our own.
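
To make the kind of report concrete, here is a minimal sketch of set-based scoring for a span task such as chunking or NER. It is written in Python only so the numbers are easy to set beside NLTK's. The names `Scores` and `score_spans`, the `(start, end, label)` span shape, and the sample data are all hypothetical. This is not chatter's API or NLTK's code, just an illustration of counting true positives, false positives, and false negatives and deriving precision, recall, and F1 from them.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """Raw counts for one classifier on one task (hypothetical shape)."""
    true_pos: int
    false_pos: int
    false_neg: int

    @property
    def precision(self) -> float:
        denom = self.true_pos + self.false_pos
        return self.true_pos / denom if denom else 0.0

    @property
    def recall(self) -> float:
        denom = self.true_pos + self.false_neg
        return self.true_pos / denom if denom else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def score_spans(gold, predicted) -> Scores:
    """Compare two collections of (start, end, label) spans as sets.

    A predicted span counts as correct only if its boundaries and its
    label both match a gold span exactly (an assumption of this sketch).
    """
    gold_set, pred_set = set(gold), set(predicted)
    return Scores(
        true_pos=len(gold_set & pred_set),
        false_pos=len(pred_set - gold_set),
        false_neg=len(gold_set - pred_set),
    )


# Made-up example data, for illustration only.
gold = [(0, 2, "NP"), (3, 4, "VP"), (5, 7, "NP")]
pred = [(0, 2, "NP"), (3, 4, "NP"), (5, 7, "NP"), (8, 9, "PP")]
result = score_spans(gold, pred)
print(result, result.precision, result.recall, result.f1)
```

Keeping the raw counts next to the derived ratios means results stored at each release can be re-aggregated later, and a new task only has to supply its own notion of a gold and a predicted item.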
