Add test harness for regexp optimization #18213

hvds · 2020-10-06T15:47:25Z

I welcome opinions on whether this is a sane approach to testing optimization flags for regexps before I start adding lots more tests. I can feel a lot of TODOs coming on ...

@iabyn of particular interest is whether this approach is going to be a problem for your regexp substring plans. Should I change the return data for re::optimization to provide substrs as an arrayref right now? Or are those plans far enough out that I shouldn't worry about them?

khwilliamson · 2020-10-08T12:29:09Z

I didn't examine every line in detail, but I think it should be merged after squashing the commits. The outline looks good, and it is a new test that can be tweaked as we go along.

xsawyerx · 2020-11-01T19:41:44Z

@iabyn ping?

iabyn · 2020-11-09T13:47:09Z

On Tue, Oct 06, 2020 at 08:47:42AM -0700, Hugo van der Sanden wrote:

> I welcome opinions on whether this is a sane approach to testing optimization flags for regexps before I start adding lots more tests. I can feel a lot of TODOs coming on ...
>
> @iabyn of particular interest is whether this approach is going to be a problem for your regexp substring plans. Should I change the return data for re::optimization to provide substrs as an arrayref right now? Or are those plans far enough out that I shouldn't worry about them?

Such plans are so far off that you might as well go with anchored/floating etc for now. So this gets a +1 from me.

Just some general comments on data file format:

- we should allow lines containing comments, i.e. 'next if /^\s*#/', although they should still do a pass() so the test number corresponds to the line number within __DATA__.
- possibly extend comments to any line, i.e. s/#.*$// (unless it's likely people will need to write patterns or data containing '#'s)? If we do this, then trailing comments will be delineated by a # rather than a tab.
- should the field separator be whitespace rather than tabs?

…

-- Counsellor Troi states something other than the blindingly obvious. -- Things That Never Happen in "Star Trek" #16

hvds · 2020-11-11T12:56:54Z

> Such plans are so far off that you might as well go with anchored/floating etc for now. So this gets a +1 from me.

Thanks @iabyn.

> Just some general comments on data file format:
>
> - we should allow lines containing comments, i.e. 'next if /^\s*#/', although they should still do a pass() so the test number corresponds to the line number within __DATA__.

Yes, good idea; I'll add this, and extend it to blank lines too.

> - possibly extend comments to any line, i.e. s/#.*$// (unless it's likely people will need to write patterns or data containing '#'s)? If we do this, then trailing comments will be delineated by a # rather than a tab.
>
> - should the field separator be whitespace rather than tabs?

I'm not so sure about these: it is no great hardship to write \t for literal tabs when needed in patterns, nor does it reduce legibility. That's less true when working around specialness of space or hash. Also, the final column is not just a comment but part of the legend for the test, so I feel it would be slightly misleading to give it a leading hash.

I do realise that it can be a little inconvenient to insert tabs if one's editor is set up to avoid them in perl's source (as mine is), but to me the benefits seemed to outweigh that.
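To make the format discussion concrete, here is a minimal sketch of reading tab-separated data lines while still emitting one test per line for comments and blanks. It is not the code from this PR: the `parse_data_line` helper, the three-column layout (pattern, expectation, legend) and the sample lines are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical parser for one line of a tab-separated test table.
# Returns nothing for blank and comment lines, otherwise a hash
# reference. The column names are assumptions for this sketch.
sub parse_data_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*(?:#|$)/;    # comment or blank line

    # Split on tabs only, so spaces and '#' need no escaping in patterns.
    my ($pattern, $expect, $legend) = split /\t+/, $line, 3;
    return {
        pattern => $pattern,
        expect  => $expect,
        legend  => $legend // '',
    };
}

# Sample lines standing in for a __DATA__ section (illustrative only).
my @lines = (
    "# literal patterns\n",
    "\n",
    "/a b#c/\tminlen=5\tspace and hash need no escaping\n",
    "/x\\ty/\tminlen=3\tliteral tab written as \\t\n",
);

for my $line (@lines) {
    my $rec = parse_data_line($line);
    if (!$rec) {
        # Still count a test, so test numbers track line numbers.
        pass('comment or blank line');
        next;
    }
    ok(length($rec->{pattern}), "parsed: $rec->{legend}");
}

done_testing();
```

Because only tabs separate fields, the final column can carry free text as the legend for the test without needing a leading `#`.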

khwilliamson · 2020-12-13T23:32:39Z

Is this ready for merge?

hvds · 2020-12-14T14:12:11Z

> Is this ready for merge?

Probably, yes; sorry, I've had no spare capacity in recent weeks.

I've rebased and pushed the latest state, which improves the handling of skips somewhat and adds more tests: there are now 32 patterns generating 586 tests, of which 14 are TODO. It would also be useful if you (@khwilliamson) could take a look at those TODOs and get a sense of a) whether you agree the expected results are what we'd ideally want, and b) where those 14 cases fall on the spectrum between nice-to-have and bug. I don't think anything need gate on that, though; I think it's probably fine to merge as is.
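For readers unfamiliar with the mechanics, the following is a rough sketch of how skips with a reason, TODO tests and a plan-free run fit together in Test::More. It does not reproduce the harness in this PR: the `%got` and `%want` hashes and their keys are hypothetical stand-ins for whatever the optimizer reports and the data line expects.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical results for one pattern. The key names are illustrative
# and are not claimed to match the real harness.
my %got  = (minlen => 3, anchored => undef);
my %want = (minlen => 3, anchored => 'abc', 'anchored min offset' => 0);

is($got{minlen}, $want{minlen}, 'minlen');

SKIP: {
    # Without the substring itself its offsets cannot be checked,
    # so skip that test and say why.
    skip('no anchored substring found', 1) unless defined $got{anchored};
    is($got{'anchored min offset'}, $want{'anchored min offset'},
        'anchored min offset');
}

TODO: {
    # A known shortfall: reported, but it does not fail the run.
    local $TODO = 'optimizer does not find this substring yet';
    is($got{anchored}, $want{anchored}, 'anchored substring');
}

# No fixed plan: the number of tests varies per data line.
done_testing();
```

Calling `done_testing()` at the end, rather than declaring a count up front, is what lets each data line contribute a different number of tests.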

hvds requested review from demerphq, khwilliamson and iabyn October 6, 2020 15:47

khwilliamson approved these changes Oct 9, 2020

hvds added 5 commits December 14, 2020 13:13

Add test harness for regexp optimization

a9b6621

Allow comments in regexp optimization tests

cb097ac

No plan for regexp optimization tests

bb39471

With a varying number of tests per data line, the plan is too much work to maintain.

Better skipping for regexp optimization tests

be966e2

Say why we're skipping; skip min/max tests for substrings if we didn't get the substring; skip checking test for substrings if we didn't get the substring we expect to be checked.

Test regexp optimizations for substrings

37d0e2a

hvds force-pushed the hv/topt branch from 2ed6414 to 37d0e2a Compare December 14, 2020 14:03

khwilliamson merged commit 36ff5b9 into blead Dec 24, 2020

hvds deleted the hv/topt branch May 4, 2021 16:24
