bugs in integration of oniguruma #2663
Several of the FLAGS (as in `test(_; FLAGS)`) simply do not work as advertised, whereas the functionality they promise is available via "Extended Groups". To make the documentation more useful while allowing for future improvements, it has been revised to focus on what is and ought to be the case. (Specifically, references to "m" and "s" as possible FLAGS have been dropped, and a concession to reality regarding "p" has been made.) See Issues jqlang#2562 and jqlang#2663
As noted in #2562, the "m" flag does have an effect. The tests above seem to use
As I understand it, the reason "s" and "p" do not change the behaviour of |
@annettejanewilson thanks for that.
produces
|
The option So the |
Also, |
Near as I can tell For Oniguruma to understand the
(in |
So I think in fact we're looking at an Oniguruma bug. I've single stepped through it a bit and I can't quite find the bug yet. |
@nicowilliams wrote:
As I mentioned, using e.g. jq 1.6, the "p" FLAG is recognized, but not as advertised: "p" does cause
|
@pkoppstein well, I was trying jq from |
@nicowilliams - Yes, we have to be careful about distinguishing between FLAGS and the options allowed in "extended groups". Regarding the latter, lower-case "p" is not one of the options. The Oniguruma documentation gives these syntax summaries for extended groups:
|
But you can enable it to be one of them by adding |
@nicowilliams wrote:
I would have expected to see something about that on the Oniguruma configuration page: https://github.com/kkos/oniguruma/blob/master/doc/SYNTAX.md Can you see it mentioned there? |
No, I found it by code inspection. Clearly it's not meant to be supported. Or maybe it is but they failed to document it. IMO they should just add it unconditionally, but anyway. |
(1) Re FLAGS (second argument to test/1): It may be helpful to remember that:
(a) The jq documentation is muddled in that the descriptions of the "m" and "s" have been incorrectly swapped (that is, "m" should (in accordance with universal agreement) refer to ^ and $; and "s" to .)
(b) jq's "m" FLAG behaves in accordance with the (correct) description of the "p" FLAG (i.e. "s" + "m")
(c) jq's "p" FLAG behaves in accordance with the description:
Thus the missing functionality is multiline-only behavior.
(2) Re the "p" Extended Group option (as in
Since the Oniguruma documentation is consistent (to the point of being emphatic) that
Could the "p" FLAG be implemented in terms of |
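For readers comparing conventions, here is a minimal sketch of the Perl-style meanings referred to in (1)(a), written with Python's `re` module rather than jq or Oniguruma. The mapping of the letters "m", "s" and "p" to Python flags, and the helper name `perl_style_test`, are assumptions made purely for illustration; the sketch says nothing about what jq actually does.

```python
import re

# Hypothetical mapping from single-letter FLAGS to Python's re flags,
# following the Perl-style convention: "s" affects ".", "m" affects "^"/"$",
# and "p" is taken to mean both together.
PERL_STYLE = {
    "s": re.DOTALL,
    "m": re.MULTILINE,
    "p": re.DOTALL | re.MULTILINE,
}

def perl_style_test(text, pattern, flags=""):
    """Return True if pattern matches anywhere in text under the given letters."""
    combined = 0
    for letter in flags:
        combined |= PERL_STYLE[letter]
    return re.search(pattern, text, combined) is not None

text = "a\nb\nc\n"
print(perl_style_test(text, "a.b"))          # "." does not match the newline
print(perl_style_test(text, "a.b", "s"))     # "." matches the newline
print(perl_style_test(text, "^b", "m"))      # "^" matches just after a newline
print(perl_style_test(text, "^b$.c", "p"))   # needs both behaviours at once
```

Under this convention, multiline-only behavior is what the "m" entry alone provides: `^` and `$` become line-aware while `.` still stops at newlines.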
@nicowilliams @itchyny @annettejanewilson - What are your thoughts about dealing with this for jq 1.7, assuming no fixes will be available for this release? It seems to me that the main options involve some combination of these elements:
|
@pkoppstein and maybe
Does (1) wed us to Oniguruma? Probably not. We could parse those from any regex if we had to. So I'm certainly happy with (1), but I'd like to do the others too. |
@nicowilliams asked:
No, at least in the sense that the following engines (*) all understand
Please note that your point (4) is essentially my point (0). That is, there seems to be a high likelihood that not all the major problems will be resolved in the near term (for whatever reason), so it makes sense to prepare for that eventuality. How would it be if I reworked what I have to make it clear which sentences are purely descriptive of "buggy" behavior that will hopefully change? Another thought is that it should be possible to rejigger the FLAGS and use the Extended Group functionality to (*) Footnote: https://regex101.com/ |
Several of the FLAGS (as in test(_; FLAGS)) simply do not work as advertised, whereas the functionality they promise is available via "Extended Groups". To make the documentation more useful while allowing for future improvements, it has been revised to clarify what is and what ought to be the case. See Issues jqlang#2562 and jqlang#2663
I'm uncertain this is accurate. My current understanding is this:
If my understanding here is correct, I don't think it's accurate to say the descriptions of "m" and "s" have been incorrectly swapped in the jq documentation. The documentation is misleading but it's not incorrect with respect to jq's behaviour. nicowilliams, earlier in the conversation:
This surprises me, and I wonder whether this is born of confusion over the meaning and polarity of Oniguruma's options. When I first looked into this I got confused myself. At present I don't believe I've seen evidence that Oniguruma has a bug, but I am definitely feeling like I need to be more thorough in my testing, because it's clear there are areas I've come to different conclusions, and I want to be more certain I'm not working on faulty assumptions.
I don't have recommendations right now, I'm too uncertain until I've had time to research further. I'm a bit wary of inadvertently making the docs worse by compounding misunderstandings. |
@annettejanewilson - Yes, "universal" was a bit of an overstatement. At about the same time as you were writing your most recent post, I was
(1) retaining the distinction between: a) FLAGS as they appear in the second arg of jq's b) single-letter options as used in "Extended Groups" (e.g.
(2) for the terms "multi-line" and "single line", and the FLAGS "m" and
(3) avoiding any reference to the words used in Oniguruma's defined symbols.
In other words, I would like to focus on what jqMaster DOES and what we anticipate jq should do in the future if there is a discrepancy. Thanks.
p.s. I thought it would be useful if we could agree on a table showing the
|
I just wanted to explicitly add that there is currently no way to change the behavior of ^/$, even if the documentation is also confusing and/or wrong, and that, I would think, is definitely a bug somewhere.
1504:0:karl@Amalthea:~$ ~/bin/jq --version
jq-1.7
1505:0:karl@Amalthea:~$ echo -e "a\nb\nc" | ~/bin/jq -sR 'test("^b"; "")'
false
1506:0:karl@Amalthea:~$ echo -e "a\nb\nc" | ~/bin/jq -sR 'test("^b"; "m")'
false
1507:0:karl@Amalthea:~$ echo -e "a\nb\nc" | ~/bin/jq -sR 'test("^b"; "s")'
false
1508:0:karl@Amalthea:~$ echo -e "a\nb\nc" | ~/bin/jq -sR 'test("^b"; "p")'
false
1509:0:karl@Amalthea:~$ echo -e "a\nb\nc" | ~/bin/jq -sR 'test("^a"; "")'
true
1510:0:karl@Amalthea:~$ echo -e "a\nb\nc" | ~/bin/jq -sR 'test("^a"; "m")'
true
1511:0:karl@Amalthea:~$ echo -e "a\nb\nc" | ~/bin/jq -sR 'test("^a"; "s")'
true
1512:0:karl@Amalthea:~$ echo -e "a\nb\nc" | ~/bin/jq -sR 'test("^a"; "p")'
true |
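As a point of comparison for the session above, the following sketch uses Python's `re` module (not jq, and not Oniguruma) on the same input to show what a Perl-style multi-line `^` would report, together with a pattern that spells out "start of a line" explicitly so that it does not depend on any flag. The helper name `starts_a_line` is hypothetical, and whether an equivalent pattern behaves the same way in jq is an assumption that would need checking.

```python
import re

# Same text that `echo -e "a\nb\nc" | jq -sR ...` receives:
# echo appends a trailing newline.
TEXT = "a\nb\nc\n"

def starts_a_line(text, literal):
    """Flag-free check that `literal` begins some line of `text`."""
    # (?:\A|\n) means "start of text, or just after a newline", so the
    # result does not depend on any multi-line option being honoured.
    return re.search(r"(?:\A|\n)" + re.escape(literal), text) is not None

print(re.search("^b", TEXT) is not None)                # default: "^" only at start of text
print(re.search("^b", TEXT, re.MULTILINE) is not None)  # Perl-style multi-line "^"
print(starts_a_line(TEXT, "b"))                         # explicit alternative
```

The explicit alternation is clumsier than a flag, but it makes the intended anchoring visible in the pattern itself.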
[EDIT: erroneous remarks about the "m" FLAG have been corrected.]
There are several outstanding bug reports regarding jq's support for regular expression functionality (e.g. #2562), but
it's beginning to dawn on me that there might be some significant issues with the integration of the Oniguruma library, so I thought it could be helpful to have a central "hub" for identifying these issues so that hopefully they can be more efficiently resolved.
(1) Both the manual and builtin.c envision FLAGS (as in `test(_; FLAGS)`) as being any of the characters in [gilmnpsx], but in practice, "m", "p", and "s" do not work as advertised in the jq manual:
According to the manual:
But in practice:
Here are some examples:
(2) There seems to be a general problem in the support for "Extended Groups" (EGs) of the form `(?-OFF)` or `(?-OFF:regex)`.
Consider for example:
(I have not uncovered problems with EGs of the form `(?ON)` and `(?ON-OFF)`, where `ON` is understood to be a string of letters chosen from [imsxWDSPy].)
It's hard to imagine that this is an Oniguruma issue (I have checked at https://github.com/kkos/oniguruma/issues?q=is%3Aissue+extended).
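To make the intent of the `(?-OFF:regex)` form concrete, here is a small sketch in Python's `re` module, which supports only the scoped variant and not the bare `(?-OFF)` form. It is an analogy, not a reproduction of jq or Oniguruma behavior; the pattern and the helper name `matches` are chosen for illustration only.

```python
import re

# Case-insensitivity is switched on for the whole pattern, then switched
# off again for the scoped group only, analogous to (?ON) ... (?-OFF:regex).
PATTERN = re.compile(r"(?i)a(?-i:b)")

def matches(text):
    """Return True if the whole of `text` matches PATTERN."""
    return PATTERN.fullmatch(text) is not None

print(matches("Ab"))  # the "a" part is matched case-insensitively
print(matches("AB"))  # the "b" part is case-sensitive again
```

An engine that handles the scoped "off" form correctly should accept the first input and reject the second.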