Compositional grounding is not composing #1115

Open · kwalcock opened this issue Feb 25, 2022 · 87 comments

@kwalcock (Member)

I wonder whether at

val predicateGroundings = exactPredicateGrounding +: inexactPredicateGroundings

the exact and inexact groundings should not just be placed together in a sequence but rather combined into a single predicate grounding.

@MihaiSurdeanu (Contributor)

The sequence allows for the following sort to happen, no?

@MihaiSurdeanu (Contributor)

But the bigger question is in the title... Do we know where compositions are lost?

@kwalcock (Member Author)

The sequence only contains compositional groundings that aren't really composed. I'm thinking of something like

val combinedPredicateGroundings = inexactPredicateGroundings.flatMap { inexactPredicateGrounding =>
  combineExactAndInexact(exactPredicateGrounding, inexactPredicateGrounding)
}
// Do all the ones that can be combined, then the exact only, then the individual inexact ones.
val predicateGroundings = (combinedPredicateGroundings :+ exactPredicateGrounding) ++ inexactPredicateGroundings
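For illustration, here is a minimal sketch of what a combineExactAndInexact helper could look like. The types are simplified stand-ins, not Eidos's actual PredicateTuple/PredicateGrounding classes, and the rule of letting the exact grounding's filled slots win is an assumption.

// Simplified stand-in types for illustration only.
case class SlotGrounding(concept: String, score: Float)
case class SimplePredicateTuple(
  theme: Option[SlotGrounding],
  themeProperty: Option[SlotGrounding],
  themeProcess: Option[SlotGrounding],
  themeProcessProperty: Option[SlotGrounding]
)

// Hypothetical combination rule: keep the exact grounding's filled slots and fill the
// remaining empty slots from the inexact grounding.  Return None when the inexact
// grounding contributes nothing new, so the caller's flatMap drops the combination.
def combineExactAndInexact(
  exact: SimplePredicateTuple,
  inexact: SimplePredicateTuple
): Option[SimplePredicateTuple] = {
  val combined = SimplePredicateTuple(
    exact.theme.orElse(inexact.theme),
    exact.themeProperty.orElse(inexact.themeProperty),
    exact.themeProcess.orElse(inexact.themeProcess),
    exact.themeProcessProperty.orElse(inexact.themeProcessProperty)
  )
  if (combined == exact) None else Some(combined)
}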

@MihaiSurdeanu (Contributor)

Ok...
But where are these combined with the concepts that serve as arguments to these predicates?

@kwalcock (Member Author)

In order to collect the inexactPredicateGroundings there are variables like isArg and isPred. If there are predicates, this seems like the only place they would be composed.

Pseudo call stack:

groundEidosMention
  groundSentenceSpan
    groundSentenceSpan
      if (validPredicates.isEmpty) groundWithoutPredicates(sentenceHelper)
      else groundWithPredicates(sentenceHelper)
        groundWithPredicates
          findExactPredicateGroundingAndRange
          findInexactPredicateGroundings

@MihaiSurdeanu (Contributor)

Ok, thanks!

Can you please try to find where the concepts get lost? For example, for the phrase "transportation of oil", where do we lose the grounding for "oil"?

@kwalcock (Member Author)

The program is running, but IIRC oil is grounded to fuel and is not an exact match. Transportation is not an exact match either; I'm not sure why. Inexact matches are not combined.

@kwalcock (Member Author)

Ben's "transportation of water" almost works. The scores don't sort well: the noisyOr is only applied to filled slots, so filling a second slot with a lower value scores worse than having just the first single slot filled.

THEME PROCESS: wm/process/transportation/ (1.2696835) Total: 1.2596835
THEME: wm/concept/goods/water (1.0) THEME PROCESS: wm/process/transportation/ (1.2696835) Total: 1.0025969
THEME: wm/concept/goods/water (1.0) Total: 0.99
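For concreteness, sorting those three candidates by their totals is what produces the surprising order: the process-only grounding outranks the fully composed THEME + THEME PROCESS one. A tiny sketch using the numbers above (the tuple representation is just for illustration):

// The three candidates above, reduced to (display, total) pairs and sorted by total.
val candidates = Seq(
  ("THEME PROCESS: wm/process/transportation/",     1.2596835),
  ("THEME: water + THEME PROCESS: transportation",  1.0025969),
  ("THEME: wm/concept/goods/water",                 0.99)
)
val ranked = candidates.sortBy { case (_, total) => -total }
// The non-composed process-only grounding sorts first, ahead of the composed grounding.
ranked.foreach { case (display, total) => println(f"$total%.7f  $display") }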

@MihaiSurdeanu (Contributor)

Sure. I think most matches in the real world will be inexact due to language sparsity. But we should assemble predicates and their arguments regardless of exact/inexact match.

@MihaiSurdeanu (Contributor)

> Ben's "transportation of water" almost works. The scores don't sort well: the noisyOr is only applied to filled slots, so filling a second slot with a lower value scores worse than having just the first single slot filled.
>
> THEME PROCESS: wm/process/transportation/ (1.2696835) Total: 1.2596835
> THEME: wm/concept/goods/water (1.0) THEME PROCESS: wm/process/transportation/ (1.2696835) Total: 1.0025969
> THEME: wm/concept/goods/water (1.0) Total: 0.99

This is pretty nice!
Then what did Ben see? Is there a bug?

@kwalcock (Member Author)

They (water and transportation) weren't combined before. They were separate compositional groundings in the Seq that got sorted.

@MihaiSurdeanu (Contributor)

I see. Awesome!

@kwalcock (Member Author)

When there is a predicate, as is the case with both "(price) of oil" and "water (transportation)", findExactPredicateGroundingAndRange does not necessarily find an exact match at all, or it may find an exact match that doesn't include the predicate. I should have named it findExactGroundingAndRangeIfThereIsAPredicateSomewhere. If it doesn't find an exact match, as is the case with "price of oil", perhaps it should specifically target the predicate, make do with whatever it can find, and then go on to look for possible arguments, etc. In this particular case, the ontology doesn't help by having a node called price_or_cost; it would be better to pick one or the other and move the extra to an example. For "water transportation", water matches first even though it isn't the predicate, but it is apparently possible to do them in the other order.

@kwalcock (Member Author)

I hope that @zupon is getting notifications in case he notices that I'm off track.

@zupon (Contributor)

zupon commented Feb 26, 2022

I have been getting these notifications, and it looks like you might be right on track! At one point, we did want to try to match the entire mention span even when there were predicates. That would get us stuff like "climate change", which is sorta compositional, but we also just have a climate_change concept. Since we wanted the matching for things like this to be fuzzy, we then did the non-exact matching. This is useful when we have mentions that don't exactly match our ontology, but it sounds like it's now a problem: if we do get some non-exact match this way, we just stop and ignore the rest instead of continuing with the remaining content.

I can't look at this tonight, but I will try to take a look tomorrow. But I think this is looking in the right place.

@kwalcock (Member Author)

Right now it's failing one NOT test by getting extra things, but it is probably at least composing. If I turn the failingTests to passingTests, it gets 747 passing while master gets 771. I had added some things like multiple exact matches. Maybe that hurt it. I don't have a feel for the data.

@MihaiSurdeanu (Contributor)

Thank you both! I think we should enable this change, which looks like it's improving things a lot.

Also, @EgoLaparra: would it be possible for you to look at the 771 - 747 = 24 tests that fail now and try to see if we can fix them?
Thank you all!

@kwalcock (Member Author)

I'm probably going to make a few more changes around

val result = overlapTuples
    .sorted
    .take(3)
    .map { overlapTuple =>
      val singleOntologyNodeGrounding = OntologyNodeGrounding(overlapTuple._4, 1.0f)
      val range = Range(overlapTuple._2, overlapTuple._2 - overlapTuple._1) // - because it is -length
      (singleOntologyNodeGrounding, range)
    }
within a couple of hours. Those lines did not survive the night.

@EgoLaparra (Contributor)

@kwalcock let me know when you are done with those changes and I will take a look at the failing tests.

@kwalcock (Member Author)

@EgoLaparra, I made the small change.

@MihaiSurdeanu (Contributor)

MihaiSurdeanu commented Feb 26, 2022 via email

@EgoLaparra (Contributor)

Almost all those tests seem to fail because composed groundings are scored higher than non-composed ones, even when the expected grounding is non-composed.

For example, for the cause mention high unemployment the expected grounding is Seq("wm/concept/economy/unemployment", "", "", "") but the prediction is Seq(wm/concept/economy/unemployment, wm/property/quality, , ,). The top 3 groundings produced by the algorithm are:

Seq(wm/concept/economy/unemployment, wm/property/quality, , , 0.99481976)
Seq(wm/concept/economy/unemployment, , , , 0.99)
Seq(, wm/property/quality, , , 0.4819764)

Another example: the prediction for Increasing tensions is Seq(wm/concept/crisis_or_disaster/conflict/tension, wm/property/price_or_cost, ,) while the expected grounding is Seq("wm/concept/crisis_or_disaster/conflict/tension", "", "", ""). The top 3:

Seq(wm/concept/crisis_or_disaster/conflict/tension, wm/property/price_or_cost, , , 0.9948348)
Seq(wm/concept/crisis_or_disaster/conflict/tension, , , , 0.99)
Seq(, wm/property/price_or_cost, , , 0.48348033)

Is this behavior (i.e., composed scores > non-composed scores) expected?

@kwalcock (Member Author)

In SRLCompositionalGrounder.scala there is a calculation of the score in which the properties are multiplied by 0.5f before they are noisyOr'd. Changing that value might switch how these values sort.
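For intuition only (this is not the actual Eidos code), a generic noisy-OR with a property multiplier reproduces the ~0.9948 totals in the examples above when the concept slot is 0.99 and the property is roughly 0.96 scaled by 0.5, and it shows why shrinking the multiplier demotes composed groundings:

// Generic sketch, not the project's scoring code: property scores are scaled by a
// multiplier and then combined with the other filled slots via a noisy-OR.
def noisyOr(filledSlotScores: Seq[Float]): Float =
  1.0f - filledSlotScores.map(score => 1.0f - math.min(score, 1.0f)).product

val propertyMultiplier = 0.5f                   // the value being tuned
val concept = 0.99f                             // e.g. wm/concept/economy/unemployment alone
val scaledProperty = 0.96f * propertyMultiplier // e.g. wm/property/quality scaled to ~0.48

noisyOr(Seq(concept, scaledProperty))           // ~0.9948, sorting above the 0.99 concept-only grounding
noisyOr(Seq(concept, 0.96f * 0.01f))            // with a 0.01 multiplier, barely above 0.99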

@MihaiSurdeanu (Contributor)

I think it's worth tuning the score that Keith mentioned. Also, the first compositional example:
Seq(wm/concept/economy/unemployment, wm/property/quality, , ,)
seems correct to me.
However, the second one is clearly wrong...
Any other ideas on how to fix these? @zupon, you did look at situations such as the second one, right? In this case, I think "tensions" is consumed by the concept, and "increasing" is a magnitude adjective and should not be used for grounding. Thus, there should be no tokens left for grounding the phantom property "price_or_cost"... You did keep track of which words were consumed during grounding, right @zupon ?

@zupon (Contributor)

zupon commented Feb 27, 2022

For both the high unemployment and Increasing tensions examples, are there predicates? I can sorta see the link from high --> quality, but I'm not seeing where Increasing --> price would make sense.

What should happen if there are predicates is (1) the algorithm looks first at the entire span (e.g. high unemployment) to see if it matches a node or example exactly (for various meanings of "exactly"); (2) if it doesn't, we then take each predicate or argument individually and then ground it. When we match something with the fuzzy exact match, we do (or are supposed to, anyway) remove those tokens that matched from our subsequent grounding attempts. For example, if we had the span climate change regulations, our exact match might return the climate_change Concept. After the match, we remove those two tokens and are only left with regulations to ground.

What may be happening here is the (fuzzy) exact matching for the span is accidentally allowing the entire Increasing tensions span to check against the Property branch using w2v rather than only using regex like we do elsewhere.

@MihaiSurdeanu AFAIK we don't actually keep track of the tokens used for each slot in the predicate grounding in a way that we can refer back to them later, but the algorithm does (or should) remove tokens that are used for grounding, like with the exact match case for predicates/arguments.
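A sketch of the intended remove-and-continue flow, with hypothetical helper names (exactMatchSpan, groundRemaining) standing in for the real grounding calls:

// Hypothetical sketch of the flow described above, not the actual Eidos implementation.
def groundWithTokenRemoval(
  tokens: Seq[String],
  exactMatchSpan: Seq[String] => Option[(String, Set[Int])], // returns (node, matched token indices)
  groundRemaining: Seq[(String, Int)] => Seq[String]         // grounds leftover (token, index) pairs
): Seq[String] = {
  exactMatchSpan(tokens) match {
    case Some((node, matchedIndices)) =>
      // Keep the exact match, then keep grounding whatever tokens it did not consume,
      // instead of stopping here and ignoring the rest.
      val leftovers = tokens.zipWithIndex.filterNot { case (_, index) => matchedIndices.contains(index) }
      node +: groundRemaining(leftovers)
    case None =>
      groundRemaining(tokens.zipWithIndex)
  }
}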

@kwalcock (Member Author)

There is this comment in SRLCompositionalGrounder:

      // TODO (at some point) -- the empty sequence here is a placeholder for increase/decrease triggers
      //  Currently we don't have "access" to those here, but that could be changed

On the other hand, there are in the ontology words like "higher". One is in the ontology node higher_temperatures and there are others in the examples. To keep them in the text to be grounded might help in these cases.

@kwalcock (Member Author)

Here's the node for price_or_cost, which has some hints of increase. It wouldn't have to be that way.

                - node:
                    name: price_or_cost
                    patterns:
                        - \b([Cc]osts?|[Pp]rices?)\b
                    examples:
                        - additional costs
                        - capital costs
                        - cost
                        - costs
                        - expenses
                        - hidden costs
                        - high costs
                        - higher costs
                        - indirect costs
                        - input costs
                        - operating costs
                        - opportunity costs
                        - price
                        - recurrent costs
                        - social costs

@EgoLaparra (Contributor)

I've been playing with the properties score multiplier that @kwalcock mentioned, and only very small values seem to have an effect. Setting it to 0.1f or 0.05f, only 1 of the 29 tests passes. With 0.01f, 9 tests pass. Setting it to 0.0f, 20 of the tests pass, and also adding a 0.0f multiplier to processes, 27 out of the 29 tests pass; however, this seems like switching off the composition.

In case you find it useful, I've pushed a branch that only runs the 29 failing tests: composition-failing-tests

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

The stopwords are coming from https://github.com/clulab/eidos/blob/master/src/main/resources/org/clulab/wm/eidos/english/filtering/stops.txt and https://github.com/clulab/eidos/blob/master/src/main/resources/org/clulab/wm/eidos/english/filtering/transparent.txt and aren't good words to rule out. I'll try POS:

Tag != IN*
!= DT*
!= PRP*

But I'll also add . and , and probably lots of other things. It's expensive to ground against things unnecessarily.
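As a rough sketch of that filter (the name and exact tag set here are assumptions; the code later in the thread ends up calling it as isSkipword(sentenceHelper, index)):

// Hypothetical standalone version of the POS-based skip check described above.
def isSkipTag(tag: String, word: String): Boolean = {
  val skipTagPrefixes = Seq("IN", "DT", "PRP") // prepositions, determiners, pronouns
  val skipWords = Set(".", ",")                // punctuation, plus probably other additions
  skipTagPrefixes.exists(tag.startsWith) || skipWords.contains(word)
}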

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

Now only 11 of the originally passing tests fail instead of 13, which is good. Overall, though, 186 fail instead of the low of 141, and 718 pass, down from 774. I will spot check for stupid mistakes, but otherwise it will probably have to be left close to this.

It is implemented at:

def findInexactPredicateGroundings(mentionRange: Range, sentenceHelper: SentenceHelper): Seq[PredicateGrounding] = {
  val mentionStart = mentionRange.start
  val isStops: IndexedSeq[Boolean] = mentionRange.map(isStopword(sentenceHelper, _))
  val indices: IndexedSeq[Int] = isStops.indices // indices(0) + mentionStart = mentionRange(0)

  if (indices.forall(isStops)) Seq.empty // There is nothing to ground, so short-circuit it.
  else {
    val propertyOntologyGroundingOpts: Seq[Option[OntologyGrounding]] = indices.map { index =>
      if (isStops(index)) None
      else maybeProperty(Interval(mentionStart + index), sentenceHelper)
    }
    val isProps: IndexedSeq[Boolean] = indices.map { index => propertyOntologyGroundingOpts(index).isDefined }
    val isArgs: IndexedSeq[Boolean] = indices.map { index => !isStops(index) && !isProps(index) && sentenceHelper.isArg(mentionStart + index) }
    val isPreds: IndexedSeq[Boolean] = indices.map { index => !isStops(index) && !isProps(index) && !isArgs(index) }
    // Since these are all mutually exclusive, one could go through just once and trickle down.
    // Since isPreds is the catch-all, there must be something for every index. There is no other.
    val intArgs: IndexedSeq[Int] = boolsToInts(isArgs) // turns (false, true, false, true) to (0, 1, 0, 2)
    val intPreds: IndexedSeq[Int] = boolsToInts(isPreds) // turns (true, false, false, true) to (1, 0, 0, 2)
    // 0, either not property or property that does not belong to arg or pred
    // +n, property that belongs to an arg
    // -n, property that belongs to a pred
    val intProps: IndexedSeq[Int] = indices.map { index =>
      if (!isProps(index)) 0
      else {
        // Prefer looking to the right, after.
        val argWhereAndWhatOpt: Option[(Int, Int)] = Collection.findWhereWhatOptAfter(intArgs, index)(_ != 0)
        val predWhereAndWhatOpt: Option[(Int, Int)] = Collection.findWhereWhatOptAfter(intPreds, index)(_ != 0)

        if (argWhereAndWhatOpt.isDefined || predWhereAndWhatOpt.isDefined)
          indexAndCountToCount(index, argWhereAndWhatOpt, predWhereAndWhatOpt)
        else {
          // Then look left if necessary, before.
          val argWhereAndWhatOpt = Collection.findWhereWhatOptBefore(intArgs, index)(_ != 0)
          val predWhereAndWhatOpt = Collection.findWhereWhatOptBefore(intPreds, index)(_ != 0)

          if (argWhereAndWhatOpt.isDefined || predWhereAndWhatOpt.isDefined) {
            indexAndCountToCount(index, argWhereAndWhatOpt, predWhereAndWhatOpt)
          }
          else
            0
        }
      }
    }
    // There is a property and it maps to no arg or pred.
    val zeroPropertyIndexes = indices.filter { index => isProps(index) && intProps(index) == 0 }
    val predicateTuple =
        if (zeroPropertyIndexes.nonEmpty) {
          // This can only happen if there are no arguments or predicates.
          // This cannot be None because zeroProperties is nonEmpty.
          val propertyOntologyGroundingOpt = getBest(zeroPropertyIndexes, propertyOntologyGroundingOpts)
          // Arbitrarily put the property in the concept property slot.
          PredicateTuple(None, propertyOntologyGroundingOpt, None, None, emptyOntologyGrounding)
        }
        else {
          // We're only grounding against the very first argument or predicate, so +/- 1.
          val argPropertyIndexes = indices.filter { index => isProps(index) && intProps(index) == 1 }
          val bestArgPropertiesOpt = getBest(argPropertyIndexes, propertyOntologyGroundingOpts)
          val predPropertyIndexes = indices.filter { index => isProps(index) && intProps(index) == -1 }
          val bestPredPropertiesOpt = getBest(predPropertyIndexes, propertyOntologyGroundingOpts)
          val argOntologyGroundingOpt = Collection.optIndexOf(isArgs, true) // Since we're using the first one only, or none.
              .map { index =>
                groundToBranches(Seq(SRLCompositionalGrounder.CONCEPT), Interval(mentionStart + index), s, topNOpt, thresholdOpt)
              }
          val predOntologyGroundingOpt = Collection.optIndexOf(isPreds, true) // Since we're using the first one only, or none.
              .map { index =>
                groundToBranches(Seq(SRLCompositionalGrounder.PROCESS), Interval(mentionStart + index), s, topNOpt, thresholdOpt)
              }

          PredicateTuple(argOntologyGroundingOpt, bestArgPropertiesOpt, predOntologyGroundingOpt, bestPredPropertiesOpt, emptyOntologyGrounding)
        }
    val predicateGroundings = Seq(PredicateGrounding(predicateTuple))
    val sortedPredicateGroundings = predicateGroundings.sortBy(-_.score)
    val slicedPredicateGroundings = topNOpt.map(sortedPredicateGroundings.take).getOrElse(sortedPredicateGroundings)

    slicedPredicateGroundings
  }
}
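boolsToInts isn't shown above, but its comments describe a running counter over true values. A minimal sketch consistent with those comments (an assumption about the real helper):

// Sketch matching the comments above: false maps to 0 and each true maps to its
// 1-based running count, e.g. IndexedSeq(false, true, false, true) -> IndexedSeq(0, 1, 0, 2).
def boolsToInts(bools: IndexedSeq[Boolean]): IndexedSeq[Int] = {
  var count = 0
  bools.map { bool =>
    if (bool) { count += 1; count }
    else 0
  }
}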

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

This seems in retrospect to be ambiguous:

	4. Check if properties in 2 attach to them (for each property look first to the right, then to left if none found; pick the first process/concept)

By "look to the right" I understood looking anywhere to the right, all the way to the end of the mention, and "to the left" looking back to the beginning of the mention. I've got a mention of 17 words here, and there are some very long-distance relationships. I wonder whether it should be one to the right, one to the left, two to the right, two to the left, etc., to find the closest one but with the right still at an advantage (sketched below). There's probably not time to experiment with that.

Going all the way to the end doesn't really make sense, because that would always include the first one (in the sentence), which would always be found and would not need to be a default. On the other hand, a different one in the sentence might be closer, so that the first one is wrong.
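A sketch of that alternating closest-first search, which finds the nearest qualifying index while still giving the right side the advantage on ties (hypothetical, not code that exists in Eidos):

// Look one to the right, one to the left, two to the right, two to the left, etc.,
// so the nearest match wins and the right side wins ties.
def findNearestRightBiased(values: IndexedSeq[Int], index: Int)(pred: Int => Boolean): Option[Int] = {
  val maxOffset = math.max(values.length - 1 - index, index)
  val candidates = (1 to maxOffset).flatMap { offset => Seq(index + offset, index - offset) }
  candidates.find(i => values.indices.contains(i) && pred(values(i)))
}

// findNearestRightBiased(IndexedSeq(0, 0, 1, 0, 2), index = 3)(_ != 0) == Some(4):
// the arg/pred at 4 wins over the equally distant one at 2.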

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

Taking the closest one, not going all the way to the end, performs worse, with 191 failed tests instead of 186. If it eventually looks like the properties are being over-generated, the search could be limited to the immediate vicinity and then be abandoned.

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

Updated code:

def findInexactPredicateGroundings(mentionRange: Range, sentenceHelper: SentenceHelper): Seq[PredicateGrounding] = {
  val mentionStart = mentionRange.start
  val isStops: IndexedSeq[Boolean] = mentionRange.map(isSkipword(sentenceHelper, _))
  val indices: IndexedSeq[Int] = isStops.indices // indices(0) + mentionStart = mentionRange(0)

  if (indices.forall(isStops)) Seq.empty // There is nothing to ground, so short-circuit it.
  else {
    val propertyOntologyGroundingOpts: Seq[Option[OntologyGrounding]] = indices.map { index =>
      if (isStops(index)) None
      else maybeProperty(Interval(mentionStart + index), sentenceHelper)
    }
    val isProps: IndexedSeq[Boolean] = indices.map { index => propertyOntologyGroundingOpts(index).isDefined }
    val isArgs: IndexedSeq[Boolean] = indices.map { index => !isStops(index) && !isProps(index) && sentenceHelper.isArg(mentionStart + index) }
    val isPreds: IndexedSeq[Boolean] = indices.map { index => !isStops(index) && !isProps(index) && !isArgs(index) }
    // Since these are all mutually exclusive, one could go through just once and trickle down.
    // Since isPreds is the catch-all, there must be something for every index. There is no other.
    val intArgs: IndexedSeq[Int] = boolsToInts(isArgs) // turns (false, true, false, true) to (0, 1, 0, 2)
    val intPreds: IndexedSeq[Int] = boolsToInts(isPreds).map(_ * -1) // turns (true, false, false, true) to (-1, 0, 0, -2)
    // 0, either not property or property that does not belong to arg or pred
    // +n, property that belongs to an arg
    // -n, property that belongs to a pred
    val intProps: IndexedSeq[Int] = indices.map { index =>
      if (!isProps(index)) 0
      else getRightThenLeftCount(index, intArgs, intPreds)
      // else getAroundCount(index, intArgs, intPreds)
    }
    // There is a property and it maps to no arg or pred.
    val zeroPropertyIndexes = indices.filter { index => isProps(index) && intProps(index) == 0 }
    val predicateTuple =
        if (zeroPropertyIndexes.nonEmpty) {
          // This can only happen if there are no arguments or predicates.
          // This cannot be None because zeroProperties is nonEmpty.
          val propertyOntologyGroundingOpt = getBest(zeroPropertyIndexes, propertyOntologyGroundingOpts)
          // Arbitrarily put the property in the concept property slot.
          PredicateTuple(None, propertyOntologyGroundingOpt, None, None, emptyOntologyGrounding)
        }
        else {
          // We're only grounding against the very first argument or predicate, so +/- 1.
          val argPropertyIndexes = indices.filter { index => isProps(index) && intProps(index) == 1 }
          val bestArgPropertiesOpt = getBest(argPropertyIndexes, propertyOntologyGroundingOpts)
          val predPropertyIndexes = indices.filter { index => isProps(index) && intProps(index) == -1 }
          val bestPredPropertiesOpt = getBest(predPropertyIndexes, propertyOntologyGroundingOpts)
          val argOntologyGroundingOpt = Collection.optIndexOf(isArgs, true) // Since we're using the first one only, or none.
              .map { index =>
                groundToBranches(Seq(SRLCompositionalGrounder.CONCEPT), Interval(mentionStart + index), s, topNOpt, thresholdOpt)
              }
          val predOntologyGroundingOpt = Collection.optIndexOf(isPreds, true) // Since we're using the first one only, or none.
              .map { index =>
                groundToBranches(Seq(SRLCompositionalGrounder.PROCESS), Interval(mentionStart + index), s, topNOpt, thresholdOpt)
              }

          PredicateTuple(argOntologyGroundingOpt, bestArgPropertiesOpt, predOntologyGroundingOpt, bestPredPropertiesOpt, emptyOntologyGrounding)
        }
    val predicateGroundings = Seq(PredicateGrounding(predicateTuple))
    val sortedPredicateGroundings = predicateGroundings.sortBy(-_.score)
    val slicedPredicateGroundings = topNOpt.map(sortedPredicateGroundings.take).getOrElse(sortedPredicateGroundings)

    slicedPredicateGroundings
  }
}

@MihaiSurdeanu (Contributor)

The code looks great @kwalcock !
I think we're very close to being done. Before that:

  • Can you please send the groundings for our usual suspects: "water transportation", "transportation of oil", "cost of transportation of oil"?
  • Send the outputs for some of the newly failing tests?

Thanks!

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

I'm working on it.

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

water transportation, which has an exact match on water:

{
    "@type" : "Extraction",
    "@id" : "_:Extraction_3",
    "type" : "concept",
    "subtype" : "entity",
    "labels" : [ "Concept", "Entity" ],
    "text" : "water transportation",
    "rule" : "simple-np++Decrease_ported_syntax_1_verb",
    "canonicalName" : "water transportation",
    "groundings" : [ {
      "@type" : "Groundings",
      "name" : "wm_compositional",
      "version" : "7df8e4a4237596bb40ceea9e72bb46821fc01827",
      "versionDate" : "2021-10-19T19:46:27Z",
      "values" : [ {
        "@type" : "PredicateGrounding",
        "theme" : [ {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/concept/goods/water",
          "value" : 1.0
        } ],
        "themeProcess" : [ {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/process/transportation/",
          "value" : 1.2696834802627563
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/process/transportation/commercial_transportation",
          "value" : 1.1144535541534424
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/process/transportation/personal_transportation",
          "value" : 1.1033762693405151
        } ],
        "value" : 1.0025968551635742,
        "display" : "THEME: wm/concept/goods/water (1.0); THEME PROCESS: wm/process/transportation/ (1.2696835)"
      }, {
        "@type" : "PredicateGrounding",
        "theme" : [ {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/concept/goods/water",
          "value" : 1.0
        } ],
        "value" : 0.9900000095367432,
        "display" : "THEME: wm/concept/goods/water (1.0)"
      } ]
    }

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

transportation of oil, only inexact matches:

{
    "@type" : "Extraction",
    "@id" : "_:Extraction_3",
    "type" : "concept",
    "subtype" : "entity",
    "labels" : [ "Concept", "Entity" ],
    "text" : "transportation of oil",
    "rule" : "simple-np++Decrease_ported_syntax_1_verb",
    "canonicalName" : "transportation oil",
    "groundings" : [ {
      "@type" : "Groundings",
      "name" : "wm_compositional",
      "version" : "7df8e4a4237596bb40ceea9e72bb46821fc01827",
      "versionDate" : "2021-10-19T19:46:27Z",
      "values" : [ {
        "@type" : "PredicateGrounding",
        "theme" : [ {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/concept/goods/fuel",
          "value" : 1.8379604816436768
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/concept/environment/natural_resources/fossil_fuels",
          "value" : 1.727161169052124
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/concept/goods/protein_foods",
          "value" : 0.9827450513839722
        } ],
        "themeProcess" : [ {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/process/transportation/",
          "value" : 1.2696834802627563
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/process/transportation/commercial_transportation",
          "value" : 1.1144535541534424
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/process/transportation/personal_transportation",
          "value" : 1.1033762693405151
        } ],
        "value" : 0.7849923372268677,
        "display" : "THEME: wm/concept/goods/fuel (1.8379605); THEME PROCESS: wm/process/transportation/ (1.2696835)"
      } ]
    }

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

cost of transportation of oil:

{
    "@type" : "Extraction",
    "@id" : "_:Extraction_3",
    "type" : "concept",
    "subtype" : "entity",
    "labels" : [ "Concept", "Entity" ],
    "text" : "cost of transportation of oil",
    "rule" : "gazetteer++simple-np++Decrease_ported_syntax_1_verb",
    "canonicalName" : "transportation oil",
    "groundings" : [ {
      "@type" : "Groundings",
      "name" : "wm_compositional",
      "version" : "7df8e4a4237596bb40ceea9e72bb46821fc01827",
      "versionDate" : "2021-10-19T19:46:27Z",
      "values" : [ {
        "@type" : "PredicateGrounding",
        "theme" : [ {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/concept/goods/fuel",
          "value" : 1.8379604816436768
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/concept/environment/natural_resources/fossil_fuels",
          "value" : 1.727161169052124
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/concept/goods/protein_foods",
          "value" : 0.9827450513839722
        } ],
        "themeProcess" : [ {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/process/transportation/",
          "value" : 1.2696834802627563
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/process/transportation/commercial_transportation",
          "value" : 1.1144535541534424
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/process/transportation/personal_transportation",
          "value" : 1.1033762693405151
        } ],
        "themeProcessProperties" : [ {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/property/price_or_cost",
          "value" : 1.0
        } ],
        "value" : 0.890346109867096,
        "display" : "THEME: wm/concept/goods/fuel (1.8379605); THEME PROCESS: wm/process/transportation/ (1.2696835), Process properties: wm/property/price_or_cost (1.0)"
      } ]
    }

@MihaiSurdeanu (Contributor)

If I can read the JSON correctly, it looks to me like all these 3 examples are correct! Nice job @kwalcock !

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

canonicalName: drought
words: of the drought
[info] - should process "The impact of the drought has been exacerbated by high local cereal prices , excess livestock mortality , conflict and restricted humanitarian access in some areas ." effect themeProperty correctly *** FAILED ***
[info]   "[wm/property/preparedness]" was not equal to "[]" (TestGrounding.scala:45)
[info]   Analysis:
[info]   "[wm/property/preparedness]" -> "[]"

It wanted
val effectGroundings = Seq("wm/concept/crisis_or_disaster/environmental/drought", "", "", "")
but apparently got
val effectGroundings = Seq("wm/concept/crisis_or_disaster/environmental/drought", "wm/property/preparedness", "", "")

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

Most of the errors are from a slot being filled in that wasn't filled before, particularly for properties and processes. Perhaps the tests paid less attention to those, or maybe isPred = !isArg results in too many potential processes, or there are too many long-distance properties, or the property match threshold is too low.

This is already a week overdue. I'm not sure there's time for analysis.

Examples:

[info]   "[wm/property/stability]" was not equal to "[]" (TestGrounding.scala:47)
[info]   "[wm/property/condition]" was not equal to "[]" (TestGrounding.scala:45)
[info]   "[wm/property/preparedness]" was not equal to "[]" (TestGrounding.scala:45)
[info]   "[wm/process/population/death]" was not equal to "[]" (TestGrounding.scala:46)
[info]   "[wm/process/combining]" was not equal to "[]" (TestGrounding.scala:46)
[info]   "[wm/property/violent]" was not equal to "[]" (TestGrounding.scala:47)
[info]   "[wm/property/quality]" was not equal to "[]" (TestGrounding.scala:45)
[info]   "[wm/property/aftermath]" was not equal to "[]" (TestGrounding.scala:47)
[info]   "[wm/property/stability]" was not equal to "[]" (TestGrounding.scala:45)
[info]   "[wm/property/stability]" was not equal to "[]" (TestGrounding.scala:45)
[info]   "[wm/property/preparedness]" was not equal to "[]" (TestGrounding.scala:45)
[info]   "[wm/property/price_or_cost]" was not equal to "[]" (TestGrounding.scala:45)
[info]   "[wm/property/preparedness]" was not equal to "[]" (TestGrounding.scala:47)
etc.

Fewer are changes:

[info]   "wm/process/[farming/herding]" was not equal to "wm/process/[population/migrate/]" (TestGrounding.scala:46)
[info]   "wm/concept/[entity/muslim_communities]" was not equal to "wm/concept/[agriculture/]" (TestGrounding.scala:44)
[info]   "wm/concept/[entity/hamas]" was not equal to "wm/concept/[plan/]" (TestGrounding.scala:44)
[info]   "wm/concept/[entity/muslim_communities]" was not equal to "wm/concept/[government/]" (TestGrounding.scala:44)
[info]   "...ocess/communication/[campaign]" was not equal to "...ocess/communication/[informing]" (TestGrounding.scala:46)
etc.

@MihaiSurdeanu (Contributor)

Agreed that we can't spend too much time on this at this point.
But the above example for "of the drought" worries me. Which tokens were used for the preparedness grounding? It seems "drought" was used for drought. The other two "of" and "the" are stop words. So, then what was used?

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

It is grounding to "impact of the drought". "Impact" doesn't get printed because it is special, probably transparent.

@MihaiSurdeanu (Contributor)

Then I think it is fine!

Should we merge this PR?

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

I would need to disable the failing tests before this can go into master. Right now it is more important for it to go into kwalcock/wmexchanger, which has all the updates for the new DART protocols. That hadn't gone to master yet because it was only just put into use. It is headed there for the code freeze.

@MihaiSurdeanu (Contributor)

Thanks!
Let me know when it's done so we can email MITRE.

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

The documents were regrounded and uploaded. I slacked Ben and Robyn.

@MihaiSurdeanu (Contributor)

Beautiful!! Thanks!

@kwalcock (Member Author)

@MihaiSurdeanu

Regarding this:

Do NOT use them if they were used as triggers for increase/decrease attributes, AND they are tagged as JJ* or VBN.

There is code intended to do very much the same thing at these places, which use a set of words to exclude:

def groundSentenceSpan(s: Sentence, start: Int, end: Int, exclude: Set[String], topN: Option[Int], threshold: Option[Float]): Seq[OntologyGrounding] = {

// Get the triggers, quantifiers, and context phrases of the attachments
private def getAttachmentStrings(mention: Mention): Set[String] = {
  mention.attachments.flatMap { a =>
    a match {
      case _: Property => Seq.empty
      case t: TriggeredAttachment => Seq(t.trigger) ++ t.quantifiers.getOrElse(Seq())
      case c: ContextAttachment => Seq(c.text)
      case _ => throw new RuntimeException("Unexpected attachment")
    }
  }
}

The conditions are slightly different: the POS is not checked. It is also not used consistently throughout the code; it only rules out predicates right now. I wonder whether I should update the conditions or reuse them.
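A sketch of what the combined condition could look like if the POS check from the quoted rule were added to the existing word-set exclusion (the signature is hypothetical; attachmentStrings would come from getAttachmentStrings above):

// Hypothetical combination of the word-based exclusion with the quoted POS condition:
// skip a token only if its text is an attachment trigger/quantifier AND it is tagged JJ* or VBN.
def isExcludedForGrounding(word: String, tag: String, attachmentStrings: Set[String]): Boolean =
  attachmentStrings.contains(word) && (tag.startsWith("JJ") || tag == "VBN")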

@kwalcock (Member Author)

While I'm at it, the predicates are calculated in SentenceHelper and that definition might be reused. It is slightly different:

val validPredicates: Seq[Int] = {
  // don't use the roots as here we're only interested in outgoing "sources" for predicates,
  // and the roots also contain the unreachable nodes etc.
  val original = srls.edges.map(edge => edge.source)
      // keep only predicates that are within the mention
      .filter(tokenInterval.contains)
      // remove the predicates which correspond to our increase/decrease/quantifiers
      .filterNot(exclude contains words(_))
  // add back in ones that SRL "missed"
  val corrected = for {
    i <- tokenInterval
    if !original.contains(i)
    if outgoingOfType(i, Seq("compound")).nonEmpty
  } yield i
  // start with those closest to the syntactic root of the sentence to begin with "higher level" predicates
  (original ++ corrected).sortBy(minGraphDistanceToSyntacticRoot)
}

@MihaiSurdeanu (Contributor)

Yep, I think the attachment strings should be excluded from grounding. Do these include increase/decrease words? What type are they, TriggeredAttachment?

@kwalcock (Member Author)

TriggeredAttachment is a superclass of things like Decrease, Increase, Quantification, Property, Pos(itive)Change, NegChange, Hedging, Negation. This is code that Becky wrote, so probably well thought out. It could be narrowed down to some of the subclasses.

I'm not liking that the present code rules out words rather than words at positions. If a word is repeated in a sentence, removing based only on the text will take it out everywhere, even if it is used differently.
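A sketch of excluding token positions instead of bare words, so a repeated word is only removed where it actually served as a trigger (the interval-based representation here is an assumption, not an existing Eidos structure):

// Hypothetical position-based exclusion: instead of a Set[String] of words, collect the
// token indices covered by the relevant attachments and exclude only those positions.
def excludedIndices(attachmentIntervals: Seq[Range]): Set[Int] =
  attachmentIntervals.flatten.toSet

def isExcludedAt(index: Int, attachmentIntervals: Seq[Range]): Boolean =
  excludedIndices(attachmentIntervals).contains(index)

// A word like "increase" that appears twice is then skipped only at the position where it
// was an increase trigger, not everywhere its text happens to repeat.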

@MihaiSurdeanu (Contributor)

MihaiSurdeanu commented Mar 11, 2022 via email

@kwalcock (Member Author)

Exact matches are only run on the ontology node names; the ontology node examples have the approximate matches. We've recently been using lemmas to compare against the unlemmatized (raw) node names. That means that "higher temperatures" would be changed to "higher temperature" before comparison and would not match the node wm/concept/environment/higher_temperatures. I don't think we can ask the ontology creators to use lemmas, but it is easy for us to test both words and lemmas and take the best. I have made that change.
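A minimal sketch of testing both the surface words and the lemmas against a node name and accepting either (the normalization here is an assumption about how node names are compared):

// Hypothetical sketch: try both surface forms and lemmas against a node name such as
// "higher_temperatures" and keep whichever alternative matches.
def exactMatchesNode(words: Seq[String], lemmas: Seq[String], nodeName: String): Boolean = {
  val nodeWords = nodeName.split('_').toSeq.map(_.toLowerCase)
  def matches(tokens: Seq[String]): Boolean = tokens.map(_.toLowerCase) == nodeWords

  // "higher temperatures" now matches via the words even though the lemmas
  // "higher temperature" do not.
  matches(words) || matches(lemmas)
}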

@kwalcock (Member Author)

Less easy is dealing with "higher". It is one of those words ruled out because it is an attribute or something and therefore would not be used in an exact match. I think that creators of the ontology might put specific texts in the node names that they want to match no matter what. I'm waiting to invalidate the attributes and parts of speech until after the exact match calculation has completed.

@kwalcock (Member Author)

Another example seems to be armed conflict.

@kwalcock (Member Author)

"African governments" works better in this version; it is a node. So does "cash crops". There are times when the match is a distraction, however, so I've made it so that at least one exactly matching word needs to be valid. In the sentence

"exceptional prolonged and persistent agro-pastoral drought sequence Highlights 1"

it changed the match from prolonged to drought. wm/concept/time/prolonged is strange; it seems more like a property.
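A minimal sketch of the "at least one valid word" filter on exact matches (the names and the validity predicate are hypothetical):

// Hypothetical filter: keep an exact match only if at least one of the words it matched
// is valid, e.g. not an attribute/trigger word that has been ruled out.
def keepExactMatch(matchedWords: Seq[String], isValidWord: String => Boolean): Boolean =
  matchedWords.exists(isValidWord)

// In the sentence above, a match resting only on "prolonged" is dropped when "prolonged"
// is treated as invalid, so the match moves to "drought".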

@MihaiSurdeanu (Contributor)

Thanks @kwalcock !
I agree with the changes you made. I like the idea of having >= 1 valid words in the exact match!
