-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compositional grounding is not composing #1115
Comments
The sequence allows for the following sort to happen, no? |
But the bigger question is in the title... Do we know where compositions are lost? |
The sequence only contains compositional groundings that aren't really composed. I'm thinking of something like eidos/src/main/scala/org/clulab/wm/eidos/groundings/grounders/SRLCompositionalGrounder.scala Lines 343 to 347 in 621ca46
|
Ok... |
In order to collect the inexactPredicateGroundings there are variables like isArg and isPred. If there are predicates, this seems like the only place they would be composed. Pseudo call stack:
|
Ok, thanks! Can you please try to find where the concepts get lost? For example, for the phrase "transportation of oil", where do we lose the grounding for "oil"? |
The program is running, but IIRC oil is grounded to fuel and is not an exact match. Transportation is not an exact match, either. |
Ben's "transportation of water" works, almost. The scores don't sort well. The noisyOr is only applied to filled slots, so filling a second slot with a lower value is not as good as having the first single slot.
|
Sure. I think most matches in the real world will be inexact due to language sparsity. But we should assemble predicates and their arguments regardless of exact/inexact match. |
This is pretty nice! |
They (water and transportation) weren't combined before. They were separate compositional groundings in the Seq that got sorted. |
I see. Awesome! |
When there is a predicate, as is the case with both "(price) of oil" and "water (transportation)", the findExactPredicateGroundingAndRange does not necessarily find an exact match at all or may find an exact match which doesn't include the predicate, I should have called it be called findExactGroundingAndRangeIfThereIsAPredicateSomewhere. If it doesn't find an exact match as is the case with "price of oil", perhaps it should specifically target the predicate, make do with whatever it can find, and then go on to look for possible arguments, etc. In this particular case, it looks like the ontology doesn't help by having a node called price_or_cost. It would be better to pick one or the other and move the extra to an example. For "water transportation", water matches first, even though it isn't the predicate, but it is possible to do them in the other order, apparently. |
I hope that @zupon is getting notifications in case he notices that I'm off track. |
I have been getting these notifications, and it looks like you might be right on track! At one point, we did want to try to match the entire mention span even when there were predicates. That would get us stuff like "climate change", where it's sorta compositional but we also just have a I can't look at this tonight, but I will try to take a look tomorrow. But I think this is looking in the right place. |
Right now it's failing one NOT test by getting extra things, but it is probably at least composing. If I turn the failingTests to passingTests, it gets 747 passing while master gets 771. I had added some things like multiple exact matches. Maybe that hurt it. I don't have a feel for the data. |
Thank you both! I think we should enable this change, which looks like it's improving things a lot. Also, @EgoLaparra: would it be possible for you to look at the 771 - 747 = 24 tests that fail now and try to see if we can fix them? |
I'm probably going to make a few more changes around eidos/src/main/scala/org/clulab/wm/eidos/groundings/grounders/EidosOntologyGrounder.scala Lines 137 to 145 in 472520f
|
@kwalcock let me know when you are done with those changes and I will take a look at the failing tests. |
@EgoLaparra, I made the small change. |
Thank you both!
…On Sat, Feb 26, 2022 at 12:49 PM Keith Alcock ***@***.***> wrote:
@EgoLaparra <https://github.com/EgoLaparra>, I made the small change.
—
Reply to this email directly, view it on GitHub
<#1115 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAI75TQ33OUT4JNEXZ2MZ53U5EVEDANCNFSM5PK663SQ>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you commented.Message ID:
***@***.***>
|
Almost all those tests seem to fail because composed groundings are scored higher than non-composed ones, even when the expected grounding is non-composed. For example, for the cause mention
Another example: the prediction for
Is this behavior (i.e., composed scores > non-composed scores) expected? |
There is in SRLCompositionalGrounder.scala a calculation of score in which the properties are multiplied by 0.5f before they are noisyOr'd. Changing that value might switch how these values sort. |
I think it's worth tuning the score that Keith mentioned. Also, the first compositional example: |
For both the What should happen if there are predicates is (1) the algorithm looks first at the entire span (e.g. What may be happening here is the (fuzzy) exact matching for the span is accidentally allowing the entire @MihaiSurdeanu AFAIK we don't actually keep track of the tokens used for each slot in the predicate grounding in a way that we can refer back to them later, but the algorithm does (or should) remove tokens that are used for grounding, like with the exact match case for predicates/arguments. |
There is this comment in SRLCompositionalGrounder: // TODO (at some point) -- the empty sequence here is a placeholder for increase/decrease triggers
// Currently we don't have "access" to those here, but that could be changed On the other hand, there are in the ontology words like "higher". One is in the ontology node higher_temperatures and there are others in the examples. To keep them in the text to be grounded might help in these cases. |
Here's the node for price_or_cost, which has some hints of increase. It wouldn't have to be that way. - node:
name: price_or_cost
patterns:
- \b([Cc]osts?|[Pp]rices?)\b
examples:
- additional costs
- capital costs
- cost
- costs
- expenses
- hidden costs
- high costs
- higher costs
- indirect costs
- input costs
- operating costs
- opportunity costs
- price
- recurrent costs
- social costs |
I've been playing with the properties score multiplier that @kwalcock mentioned and only very small values seem to have an effect. Setting it to In case you find it useful, I've pushed a branch that only runs the 29 failing tests: composition-failing-tests |
The stopwords are coming from https://github.com/clulab/eidos/blob/master/src/main/resources/org/clulab/wm/eidos/english/filtering/stops.txt and https://github.com/clulab/eidos/blob/master/src/main/resources/org/clulab/wm/eidos/english/filtering/transparent.txt and aren't good words to rule out. I'll try POS:
But also add . and , and probably lots of other things. It's expensive to unnecessarily ground against things. |
Now only 11 instead of 13 originally passing tests fail, which is good. Overall, though, 186 fail instead of the low of 141. 718 pass, down from 774. I will spot check for stupid mistakes, but otherwise it will probably have to be left at close to this. It is implemented at: eidos/src/main/scala/org/clulab/wm/eidos/groundings/grounders/SRLCompositionalGrounder.scala Lines 359 to 432 in 50b6076
|
This seems in retrospect to be ambiguous:
By look to the right I understood to look anywhere to the right all the way until the end of the mention, and to the left to look until the beginning of the mention. I've got a mention of 17 words here and there are some very long distance relationships. I wonder whether it should be one to the right, one to the left, two to the right, two to the left, etc. to find the closest one but with the right still at an advantage. There's probably not time to experiment with that. To the end doesn't really make sense, because that would always include the first one (in the sentence) which would always be found and not need to be a default. On the other hand, a different one in the sentence might be closer so that the first one is wrong. |
Taking the closest one, not going all the way to the end, performs worse, with 191 failed tests instead of 186. If it eventually looks like the properties are being over-generated, the search could be limited to the immediate vicinity and then be abandoned. |
Updated code: eidos/src/main/scala/org/clulab/wm/eidos/groundings/grounders/SRLCompositionalGrounder.scala Lines 412 to 470 in 106b1ce
|
The code looks great @kwalcock !
Thanks! |
I'm working on it. |
water transportation, which has an exact match on water: {
"@type" : "Extraction",
"@id" : "_:Extraction_3",
"type" : "concept",
"subtype" : "entity",
"labels" : [ "Concept", "Entity" ],
"text" : "water transportation",
"rule" : "simple-np++Decrease_ported_syntax_1_verb",
"canonicalName" : "water transportation",
"groundings" : [ {
"@type" : "Groundings",
"name" : "wm_compositional",
"version" : "7df8e4a4237596bb40ceea9e72bb46821fc01827",
"versionDate" : "2021-10-19T19:46:27Z",
"values" : [ {
"@type" : "PredicateGrounding",
"theme" : [ {
"@type" : "Grounding",
"ontologyConcept" : "wm/concept/goods/water",
"value" : 1.0
} ],
"themeProcess" : [ {
"@type" : "Grounding",
"ontologyConcept" : "wm/process/transportation/",
"value" : 1.2696834802627563
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/process/transportation/commercial_transportation",
"value" : 1.1144535541534424
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/process/transportation/personal_transportation",
"value" : 1.1033762693405151
} ],
"value" : 1.0025968551635742,
"display" : "THEME: wm/concept/goods/water (1.0); THEME PROCESS: wm/process/transportation/ (1.2696835)"
}, {
"@type" : "PredicateGrounding",
"theme" : [ {
"@type" : "Grounding",
"ontologyConcept" : "wm/concept/goods/water",
"value" : 1.0
} ],
"value" : 0.9900000095367432,
"display" : "THEME: wm/concept/goods/water (1.0)"
} ]
} |
transportation of oil, only inexact matches: {
"@type" : "Extraction",
"@id" : "_:Extraction_3",
"type" : "concept",
"subtype" : "entity",
"labels" : [ "Concept", "Entity" ],
"text" : "transportation of oil",
"rule" : "simple-np++Decrease_ported_syntax_1_verb",
"canonicalName" : "transportation oil",
"groundings" : [ {
"@type" : "Groundings",
"name" : "wm_compositional",
"version" : "7df8e4a4237596bb40ceea9e72bb46821fc01827",
"versionDate" : "2021-10-19T19:46:27Z",
"values" : [ {
"@type" : "PredicateGrounding",
"theme" : [ {
"@type" : "Grounding",
"ontologyConcept" : "wm/concept/goods/fuel",
"value" : 1.8379604816436768
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/concept/environment/natural_resources/fossil_fuels",
"value" : 1.727161169052124
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/concept/goods/protein_foods",
"value" : 0.9827450513839722
} ],
"themeProcess" : [ {
"@type" : "Grounding",
"ontologyConcept" : "wm/process/transportation/",
"value" : 1.2696834802627563
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/process/transportation/commercial_transportation",
"value" : 1.1144535541534424
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/process/transportation/personal_transportation",
"value" : 1.1033762693405151
} ],
"value" : 0.7849923372268677,
"display" : "THEME: wm/concept/goods/fuel (1.8379605); THEME PROCESS: wm/process/transportation/ (1.2696835)"
} ]
} |
cost of transportation of oil: {
"@type" : "Extraction",
"@id" : "_:Extraction_3",
"type" : "concept",
"subtype" : "entity",
"labels" : [ "Concept", "Entity" ],
"text" : "cost of transportation of oil",
"rule" : "gazetteer++simple-np++Decrease_ported_syntax_1_verb",
"canonicalName" : "transportation oil",
"groundings" : [ {
"@type" : "Groundings",
"name" : "wm_compositional",
"version" : "7df8e4a4237596bb40ceea9e72bb46821fc01827",
"versionDate" : "2021-10-19T19:46:27Z",
"values" : [ {
"@type" : "PredicateGrounding",
"theme" : [ {
"@type" : "Grounding",
"ontologyConcept" : "wm/concept/goods/fuel",
"value" : 1.8379604816436768
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/concept/environment/natural_resources/fossil_fuels",
"value" : 1.727161169052124
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/concept/goods/protein_foods",
"value" : 0.9827450513839722
} ],
"themeProcess" : [ {
"@type" : "Grounding",
"ontologyConcept" : "wm/process/transportation/",
"value" : 1.2696834802627563
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/process/transportation/commercial_transportation",
"value" : 1.1144535541534424
}, {
"@type" : "Grounding",
"ontologyConcept" : "wm/process/transportation/personal_transportation",
"value" : 1.1033762693405151
} ],
"themeProcessProperties" : [ {
"@type" : "Grounding",
"ontologyConcept" : "wm/property/price_or_cost",
"value" : 1.0
} ],
"value" : 0.890346109867096,
"display" : "THEME: wm/concept/goods/fuel (1.8379605); THEME PROCESS: wm/process/transportation/ (1.2696835), Process properties: wm/property/price_or_cost (1.0)"
} ]
} |
If I can read the JSON correctly, it looks to me like all these 3 examples are correct! Nice job @kwalcock ! |
|
Most of the errors are from a slot being filled in that wasn't before, particularly for properties and processes. Perhaps the tests paid less attention to those, or maybe the isPred = !isArg results in too many potential processes and maybe there are too many long-distance properties or the property match threshold is too low. This is already a week overdue. I'm not sure there's time for analysis. Examples:
Fewer are changes:
|
Agreed that we can't spend too much time on this at this point. |
It is grounding to "impact of the drought". "Impact" doesn't get printed because it is special, probably transparent. |
Then I think it is fine! Should we merge this PR? |
I would need to disable failing tests before it can go into master. Right now it is more important to go into kwalcock/wmexchanger which has all the updates for the new DART protocols. That hadn't gone to master yet because it was only just put into use. It is headed there for the code freeze. |
Thanks! |
The documents were regrounded and uploaded. I slacked Ben and Robyn. |
Beautiful!! Thanks! |
Regarding this:
There is code intended to do very much the same at these places which use a set of words which to exclude: eidos/src/main/scala/org/clulab/wm/eidos/groundings/grounders/srl/SRLCompositionalGrounder.scala Line 108 in d9fa808
eidos/src/main/scala/org/clulab/wm/eidos/groundings/grounders/srl/SRLCompositionalGrounder.scala Line 116 in d9fa808
eidos/src/main/scala/org/clulab/wm/eidos/groundings/grounders/srl/SRLCompositionalGrounder.scala Lines 558 to 568 in d9fa808
The conditions are slightly different: the POS is not checked. It is also not used consequently throughout the code. It only rules out predicates right now. I wonder whether I should update the conditions or reuse them. |
While I'm at it, the predicates are calculated in SentenceHelper and that definition might be reused. It is slightly different: eidos/src/main/scala/org/clulab/wm/eidos/groundings/grounders/srl/SentenceHelper.scala Lines 18 to 34 in d9fa808
|
Yep, I think the attachment strings should be excluded from grounding. Do these include increase/decrease words? What type are they, TriggeredAttachment? |
TriggeredAttachment is a superclass of things like Decrease, Increase, Quantification, Property, Pos(itive)Change, NegChange, Heding, Negation. This is code that Becky wrote, so probably well thought-out. It could be narrowed down to some of the subclasses. I'm not liking that the present code rules out words rather than words at positions. If a word is repeated in a sentence, removing based only text will take it out everywhere even if it is used differently. |
I agree with everything you said. But having the same word repeated in the
same phrase is pretty unlikely…
…On Thu, Mar 10, 2022 at 20:21 Keith Alcock ***@***.***> wrote:
TriggeredAttachment is a superclass of things like Decrease, Increase,
Quantification, Property, Pos(itive)Change, NegChange, Heding, Negation.
This is code that Becky wrote, so probably well thought-out. It could be
narrowed down to some of the subclasses.
I'm not liking that the present code rules out words rather than words at
positions. If a word is repeated in a sentence, removing based only text
will take it out everywhere even if it is used differently.
—
Reply to this email directly, view it on GitHub
<#1115 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAI75TQVC3VTJ7X4W4ZINT3U7K32LANCNFSM5PK663SQ>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Exact matches are only run on the ontology node names. The ontology node examples have the approximate matches. We've been recently using lemmas to compare against the unlemmatized (raw) node names. The means that "higher temperatures" would be changed to "higher temperature" before comparison and not match the node wm/concept/environment/higher_temperatures. I don't think we can ask the ontology creators to use lemmas, but it is easy for us to test both words and lemmas and take the best. I have made that change. |
Less easy is dealing with "higher". It is one of those words ruled out because it is an attribute or something and therefore would not be used in an exact match. I think that creators of the ontology might put specific texts in the node names that they want to match no matter what. I'm waiting to invalidate the attributes and parts of speech until after the exact match calculation has completed. |
Another example seems to be armed conflict. |
African governments works better in this version. It is a node. Also "cash crops". There are some times in which the match is a distraction, however. So, I've made it so that at least one exactly matching word needs to be valid. In the sentence "exceptional prolonged and persistent agro-pastoral drought sequence Highlights 1" it changed the match from prolonged to drought. wm/concept/time/prolonged is strange. It seems more like a property. |
Thanks @kwalcock ! |
I wonder whether at
eidos/src/main/scala/org/clulab/wm/eidos/groundings/grounders/SRLCompositionalGrounder.scala
Line 285 in 8758be8
the exact and inexact groundings should not be combined in a sequence but rather into a combined predicate grounding.
The text was updated successfully, but these errors were encountered: