Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test harness for regexp optimization #18213

Merged
merged 5 commits into from
Dec 24, 2020
Merged

Add test harness for regexp optimization #18213

merged 5 commits into from
Dec 24, 2020

Conversation

hvds
Copy link
Contributor

@hvds hvds commented Oct 6, 2020

I welcome opinions on whether this is a sane approach to testing optimization flags for regexps before I start adding lots more tests. I can feel a lot of TODOs coming on ...

@iabyn of particular interest is whether this approach is going to be a problem for your regexp substring plans. Should I change the return data for re::optimization to provide substrs as an arrayref right now? Or are those plans far enough out that I shouldn't worry about them?

@khwilliamson
Copy link
Contributor

I didn't examine every line in detail, but I think it should be merged after squashing the commits. The outline looks good, and it is a new test that can be tweaked as we go along.

@xsawyerx
Copy link
Member

xsawyerx commented Nov 1, 2020

@iabyn ping?

@iabyn
Copy link
Contributor

iabyn commented Nov 9, 2020 via email

@hvds
Copy link
Contributor Author

hvds commented Nov 11, 2020

Such plans are so far off that you might as well go with anchored/floating etc for now. So this gets a +1 from me.

Thanks @iabyn.

Just some general comments on data file format:

  • we should allow lines containing comments: i.e. 'next if /^\s*#/' although they should still do a pass() so the test number corresponds to the line number within DATA.

Yes, good idea; I'll add this, and extend it to blank lines too.

  • possibly extend comments to any line: i.e. s/#.*$// (unless it's likely people will need to write patterns or data containing '#'s)? If we do this, then trailing comments will be delineated by a # rather than a tab.
  • should the field separator be whitepace rather than tabs?

I'm not so sure about these: it is no great hardship to write \t for literal tabs when needed in patterns, nor does it reduce legibility. That's less true when working around specialness of space or hash. Also, the final column is not just a comment but part of the legend for the test, so I feel it would be slightly misleading to give it a leading hash.

I do realise that it can be a little inconvenient to insert tabs if one's editor is set up to avoid them in perl's source (as mine is), but to me the benefits seemed to outweigh that.

@khwilliamson
Copy link
Contributor

Is this ready for merge?

hvds added 5 commits December 14, 2020 13:13
With a varying number of tests per data line, the plan is too much work
to maintain.
Say why we're skipping; skip min/max tests for substrings if we didn't
get the substring; skip checking test for substrings if we didn't get
the substring we expect to be checked.
@hvds
Copy link
Contributor Author

hvds commented Dec 14, 2020

Is this ready for merge?

Probably, yes; sorry, I've had no spare capacity in recent weeks.

I've rebased and pushed the latest state, which improves the handling of skips somewhat, and adds more tests: there are now 32 patterns generating 586 tests of which 14 are TODO; it would be useful also if you (@khwilliamson) could take a look at those TODOs and get a sense a) whether you agree the expected results are what we'd ideally want, and b) where those 14 cases fall on the spectrum between nice-to-have and bug. I don't think anything need gate on that though, I think it's probably fine to merge as is.

@khwilliamson khwilliamson merged commit 36ff5b9 into blead Dec 24, 2020
@hvds hvds deleted the hv/topt branch May 4, 2021 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants