Compositional grounding is not composing #1115

Open · kwalcock opened this issue Feb 25, 2022 · 87 comments

@kwalcock (Member)

I wonder whether at

val predicateGroundings = exactPredicateGrounding +: inexactPredicateGroundings

the exact and inexact groundings should not just be placed together in a sequence but rather combined into a single predicate grounding.

@MihaiSurdeanu (Contributor)

The sequence allows for the following sort to happen, no?

@MihaiSurdeanu (Contributor)

But the bigger question is in the title... Do we know where compositions are lost?

@kwalcock (Member Author)

The sequence only contains compositional groundings that aren't really composed. I'm thinking of something like

val combinedPredicateGroundings = inexactPredicateGroundings.flatMap { inexactPredicateGrounding =>
  combineExactAndInexact(exactPredicateGrounding, inexactPredicateGrounding)
}
// Do all the ones that can be combined, then the exact only, then the individual inexact ones.
val predicateGroundings = (combinedPredicateGroundings :+ exactPredicateGrounding) ++ inexactPredicateGroundings
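For illustration, here is a minimal sketch of what a combineExactAndInexact helper could look like. The types are simplified stand-ins, not Eidos's actual PredicateTuple/PredicateGrounding classes, and the rule of letting the exact grounding's filled slots win is an assumption.

// Simplified stand-in types for illustration only.
case class SlotGrounding(concept: String, score: Float)
case class SimplePredicateTuple(
  theme: Option[SlotGrounding],
  themeProperty: Option[SlotGrounding],
  themeProcess: Option[SlotGrounding],
  themeProcessProperty: Option[SlotGrounding]
)

// Hypothetical combination rule: keep the exact grounding's filled slots and fill the
// remaining empty slots from the inexact grounding.  Return None when the inexact
// grounding contributes nothing new, so the caller's flatMap drops the combination.
def combineExactAndInexact(
  exact: SimplePredicateTuple,
  inexact: SimplePredicateTuple
): Option[SimplePredicateTuple] = {
  val combined = SimplePredicateTuple(
    exact.theme.orElse(inexact.theme),
    exact.themeProperty.orElse(inexact.themeProperty),
    exact.themeProcess.orElse(inexact.themeProcess),
    exact.themeProcessProperty.orElse(inexact.themeProcessProperty)
  )
  if (combined == exact) None else Some(combined)
}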

@MihaiSurdeanu (Contributor)

Ok...
But where are these combined with the concepts that serve as arguments to these predicates?

@kwalcock (Member Author)

In order to collect the inexactPredicateGroundings there are variables like isArg and isPred. If there are predicates, this seems like the only place they would be composed.

Pseudo call stack:

groundEidosMention
  groundSentenceSpan
    groundSentenceSpan
      if (validPredicates.isEmpty) groundWithoutPredicates(sentenceHelper)
      else groundWithPredicates(sentenceHelper)
        groundWithPredicates
          findExactPredicateGroundingAndRange
          findInexactPredicateGroundings

@MihaiSurdeanu (Contributor)

Ok, thanks!

Can you please try to find where the concepts get lost? For example, for the phrase "transportation of oil", where do we lose the grounding for "oil"?

@kwalcock (Member Author)

The program is running, but IIRC oil is grounded to fuel and is not an exact match. Transportation is not an exact match either; I'm not sure why. Inexact matches are not combined.

@kwalcock (Member Author)

Ben's "transportation of water" almost works. The scores don't sort well: the noisyOr is only applied to filled slots, so filling a second slot with a lower value scores worse than having just the first single slot filled.

THEME PROCESS: wm/process/transportation/ (1.2696835) Total: 1.2596835
THEME: wm/concept/goods/water (1.0) THEME PROCESS: wm/process/transportation/ (1.2696835) Total: 1.0025969
THEME: wm/concept/goods/water (1.0) Total: 0.99
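For concreteness, sorting those three candidates by their totals is what produces the surprising order: the process-only grounding outranks the fully composed THEME + THEME PROCESS one. A tiny sketch using the numbers above (the tuple representation is just for illustration):

// The three candidates above, reduced to (display, total) pairs and sorted by total.
val candidates = Seq(
  ("THEME PROCESS: wm/process/transportation/",     1.2596835),
  ("THEME: water + THEME PROCESS: transportation",  1.0025969),
  ("THEME: wm/concept/goods/water",                 0.99)
)
val ranked = candidates.sortBy { case (_, total) => -total }
// The non-composed process-only grounding sorts first, ahead of the composed grounding.
ranked.foreach { case (display, total) => println(f"$total%.7f  $display") }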

@MihaiSurdeanu (Contributor)

Sure. I think most matches in the real world will be inexact due to language sparsity. But we should assemble predicates and their arguments regardless of exact/inexact match.

@MihaiSurdeanu (Contributor)

> Ben's "transportation of water" almost works. The scores don't sort well: the noisyOr is only applied to filled slots, so filling a second slot with a lower value scores worse than having just the first single slot filled.
>
> THEME PROCESS: wm/process/transportation/ (1.2696835) Total: 1.2596835
> THEME: wm/concept/goods/water (1.0) THEME PROCESS: wm/process/transportation/ (1.2696835) Total: 1.0025969
> THEME: wm/concept/goods/water (1.0) Total: 0.99

This is pretty nice!
Then what did Ben see? Is there a bug?

@kwalcock (Member Author)

They (water and transportation) weren't combined before. They were separate compositional groundings in the Seq that got sorted.

@MihaiSurdeanu (Contributor)

I see. Awesome!

@kwalcock (Member Author)

When there is a predicate, as is the case with both "(price) of oil" and "water (transportation)", findExactPredicateGroundingAndRange does not necessarily find an exact match at all, or it may find an exact match that doesn't include the predicate. I should have named it findExactGroundingAndRangeIfThereIsAPredicateSomewhere. If it doesn't find an exact match, as is the case with "price of oil", perhaps it should specifically target the predicate, make do with whatever it can find, and then go on to look for possible arguments, etc. In this particular case, the ontology doesn't help by having a node called price_or_cost; it would be better to pick one or the other and move the extra to an example. For "water transportation", water matches first even though it isn't the predicate, but it is apparently possible to do them in the other order.

@kwalcock (Member Author)

I hope that @zupon is getting notifications in case he notices that I'm off track.

@zupon (Contributor)

zupon commented Feb 26, 2022

I have been getting these notifications, and it looks like you might be right on track! At one point, we did want to try to match the entire mention span even when there were predicates. That would get us stuff like "climate change", which is sorta compositional, but we also just have a climate_change concept. Since we wanted the matching for things like this to be fuzzy, we then did the non-exact matching. This is useful when we have mentions that don't exactly match our ontology, but it sounds like it's now a problem: if we do get some non-exact match this way, we just stop and ignore the rest instead of continuing with the remaining content.

I can't look at this tonight, but I will try to take a look tomorrow. But I think this is looking in the right place.

@kwalcock (Member Author)

Right now it's failing one NOT test by getting extra things, but it is probably at least composing. If I turn the failingTests to passingTests, it gets 747 passing while master gets 771. I had added some things like multiple exact matches. Maybe that hurt it. I don't have a feel for the data.

@MihaiSurdeanu (Contributor)

Thank you both! I think we should enable this change, which looks like it's improving things a lot.

Also, @EgoLaparra: would it be possible for you to look at the 771 - 747 = 24 tests that fail now and try to see if we can fix them?
Thank you all!

@kwalcock (Member Author)

I'm probably going to make a few more changes around

val result = overlapTuples
    .sorted
    .take(3)
    .map { overlapTuple =>
      val singleOntologyNodeGrounding = OntologyNodeGrounding(overlapTuple._4, 1.0f)
      val range = Range(overlapTuple._2, overlapTuple._2 - overlapTuple._1) // - because it is -length
      (singleOntologyNodeGrounding, range)
    }
within a couple of hours. Those lines did not survive the night.

@EgoLaparra (Contributor)

@kwalcock let me know when you are done with those changes and I will take a look at the failing tests.

@kwalcock (Member Author)

@EgoLaparra, I made the small change.

@MihaiSurdeanu (Contributor)

MihaiSurdeanu commented Feb 26, 2022 via email

@EgoLaparra (Contributor)

Almost all those tests seem to fail because composed groundings are scored higher than non-composed ones, even when the expected grounding is non-composed.

For example, for the cause mention high unemployment the expected grounding is Seq("wm/concept/economy/unemployment", "", "", "") but the prediction is Seq(wm/concept/economy/unemployment, wm/property/quality, , ,). The top 3 groundings produced by the algorithm are:

Seq(wm/concept/economy/unemployment, wm/property/quality, , , 0.99481976)
Seq(wm/concept/economy/unemployment, , , , 0.99)
Seq(, wm/property/quality, , , 0.4819764)

Another example: the prediction for Increasing tensions is Seq(wm/concept/crisis_or_disaster/conflict/tension, wm/property/price_or_cost, ,) while the expected grounding is Seq("wm/concept/crisis_or_disaster/conflict/tension", "", "", ""). The top 3:

Seq(wm/concept/crisis_or_disaster/conflict/tension, wm/property/price_or_cost, , , 0.9948348)
Seq(wm/concept/crisis_or_disaster/conflict/tension, , , , 0.99)
Seq(, wm/property/price_or_cost, , , 0.48348033)

Is this behavior (i.e., composed scores > non-composed scores) expected?

@kwalcock (Member Author)

In SRLCompositionalGrounder.scala there is a calculation of the score in which the properties are multiplied by 0.5f before they are noisyOr'd. Changing that value might switch how these values sort.
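For intuition only (this is not the actual Eidos code), a generic noisy-OR with a property multiplier reproduces the ~0.9948 totals in the examples above when the concept slot is 0.99 and the property is roughly 0.96 scaled by 0.5, and it shows why shrinking the multiplier demotes composed groundings:

// Generic sketch, not the project's scoring code: property scores are scaled by a
// multiplier and then combined with the other filled slots via a noisy-OR.
def noisyOr(filledSlotScores: Seq[Float]): Float =
  1.0f - filledSlotScores.map(score => 1.0f - math.min(score, 1.0f)).product

val propertyMultiplier = 0.5f                   // the value being tuned
val concept = 0.99f                             // e.g. wm/concept/economy/unemployment alone
val scaledProperty = 0.96f * propertyMultiplier // e.g. wm/property/quality scaled to ~0.48

noisyOr(Seq(concept, scaledProperty))           // ~0.9948, sorting above the 0.99 concept-only grounding
noisyOr(Seq(concept, 0.96f * 0.01f))            // with a 0.01 multiplier, barely above 0.99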

@MihaiSurdeanu (Contributor)

I think it's worth tuning the score that Keith mentioned. Also, the first compositional example:
Seq(wm/concept/economy/unemployment, wm/property/quality, , ,)
seems correct to me.
However, the second one is clearly wrong...
Any other ideas on how to fix these? @zupon, you did look at situations such as the second one, right? In this case, I think "tensions" is consumed by the concept, and "increasing" is a magnitude adjective and should not be used for grounding. Thus, there should be no tokens left for grounding the phantom property "price_or_cost"... You did keep track of which words were consumed during grounding, right @zupon ?

@zupon (Contributor)

zupon commented Feb 27, 2022

For both the high unemployment and Increasing tensions examples, are there predicates? I can sorta see the link from high --> quality, but I'm not seeing where Increasing --> price would make sense.

What should happen if there are predicates is (1) the algorithm looks first at the entire span (e.g. high unemployment) to see if it matches a node or example exactly (for various meanings of "exactly"); (2) if it doesn't, we then take each predicate or argument individually and then ground it. When we match something with the fuzzy exact match, we do (or are supposed to, anyway) remove those tokens that matched from our subsequent grounding attempts. For example, if we had the span climate change regulations, our exact match might return the climate_change Concept. After the match, we remove those two tokens and are only left with regulations to ground.

What may be happening here is the (fuzzy) exact matching for the span is accidentally allowing the entire Increasing tensions span to check against the Property branch using w2v rather than only using regex like we do elsewhere.

@MihaiSurdeanu AFAIK we don't actually keep track of the tokens used for each slot in the predicate grounding in a way that we can refer back to them later, but the algorithm does (or should) remove tokens that are used for grounding, like with the exact match case for predicates/arguments.
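A sketch of the intended remove-and-continue flow, with hypothetical helper names (exactMatchSpan, groundRemaining) standing in for the real grounding calls:

// Hypothetical sketch of the flow described above, not the actual Eidos implementation.
def groundWithTokenRemoval(
  tokens: Seq[String],
  exactMatchSpan: Seq[String] => Option[(String, Set[Int])], // returns (node, matched token indices)
  groundRemaining: Seq[(String, Int)] => Seq[String]         // grounds leftover (token, index) pairs
): Seq[String] = {
  exactMatchSpan(tokens) match {
    case Some((node, matchedIndices)) =>
      // Keep the exact match, then keep grounding whatever tokens it did not consume,
      // instead of stopping here and ignoring the rest.
      val leftovers = tokens.zipWithIndex.filterNot { case (_, index) => matchedIndices.contains(index) }
      node +: groundRemaining(leftovers)
    case None =>
      groundRemaining(tokens.zipWithIndex)
  }
}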

@kwalcock (Member Author)

There is this comment in SRLCompositionalGrounder:

      // TODO (at some point) -- the empty sequence here is a placeholder for increase/decrease triggers
      //  Currently we don't have "access" to those here, but that could be changed

On the other hand, there are in the ontology words like "higher". One is in the ontology node higher_temperatures and there are others in the examples. To keep them in the text to be grounded might help in these cases.

@kwalcock (Member Author)

Here's the node for price_or_cost, which has some hints of increase. It wouldn't have to be that way.

                - node:
                    name: price_or_cost
                    patterns:
                        - \b([Cc]osts?|[Pp]rices?)\b
                    examples:
                        - additional costs
                        - capital costs
                        - cost
                        - costs
                        - expenses
                        - hidden costs
                        - high costs
                        - higher costs
                        - indirect costs
                        - input costs
                        - operating costs
                        - opportunity costs
                        - price
                        - recurrent costs
                        - social costs

@EgoLaparra (Contributor)

I've been playing with the properties score multiplier that @kwalcock mentioned, and only very small values seem to have an effect. Setting it to 0.1f or 0.05f, only 1 of the 29 tests passes. With 0.01f, 9 tests pass. Setting it to 0.0f, 20 of the tests pass, and also adding a 0.0f multiplier to processes, 27 out of the 29 tests pass; however, this seems like switching off the composition.

In case you find it useful, I've pushed a branch that only runs the 29 failing tests: composition-failing-tests

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

The stopwords are coming from https://github.com/clulab/eidos/blob/master/src/main/resources/org/clulab/wm/eidos/english/filtering/stops.txt and https://github.com/clulab/eidos/blob/master/src/main/resources/org/clulab/wm/eidos/english/filtering/transparent.txt and aren't good words to rule out. I'll try POS:

Tag != IN*
!= DT*
!= PRP*

But I'll also add . and , and probably lots of other things. It's expensive to ground against things unnecessarily.
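As a rough sketch of that filter (the name and exact tag set here are assumptions; the code later in the thread ends up calling it as isSkipword(sentenceHelper, index)):

// Hypothetical standalone version of the POS-based skip check described above.
def isSkipTag(tag: String, word: String): Boolean = {
  val skipTagPrefixes = Seq("IN", "DT", "PRP") // prepositions, determiners, pronouns
  val skipWords = Set(".", ",")                // punctuation, plus probably other additions
  skipTagPrefixes.exists(tag.startsWith) || skipWords.contains(word)
}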

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

Now only 11 of the originally passing tests fail instead of 13, which is good. Overall, though, 186 fail instead of the low of 141, and 718 pass, down from 774. I will spot check for stupid mistakes, but otherwise it will probably have to be left close to this.

It is implemented at:

def findInexactPredicateGroundings(mentionRange: Range, sentenceHelper: SentenceHelper): Seq[PredicateGrounding] = {
  val mentionStart = mentionRange.start
  val isStops: IndexedSeq[Boolean] = mentionRange.map(isStopword(sentenceHelper, _))
  val indices: IndexedSeq[Int] = isStops.indices // indices(0) + mentionStart = mentionRange(0)

  if (indices.forall(isStops)) Seq.empty // There is nothing to ground, so short-circuit it.
  else {
    val propertyOntologyGroundingOpts: Seq[Option[OntologyGrounding]] = indices.map { index =>
      if (isStops(index)) None
      else maybeProperty(Interval(mentionStart + index), sentenceHelper)
    }
    val isProps: IndexedSeq[Boolean] = indices.map { index => propertyOntologyGroundingOpts(index).isDefined }
    val isArgs: IndexedSeq[Boolean] = indices.map { index => !isStops(index) && !isProps(index) && sentenceHelper.isArg(mentionStart + index) }
    val isPreds: IndexedSeq[Boolean] = indices.map { index => !isStops(index) && !isProps(index) && !isArgs(index) }
    // Since these are all mutually exclusive, one could go through just once and trickle down.
    // Since isPreds is the catch-all, there must be something for every index. There is no other.
    val intArgs: IndexedSeq[Int] = boolsToInts(isArgs) // turns (false, true, false, true) to (0, 1, 0, 2)
    val intPreds: IndexedSeq[Int] = boolsToInts(isPreds) // turns (true, false, false, true) to (1, 0, 0, 2)
    // 0, either not property or property that does not belong to arg or pred
    // +n, property that belongs to an arg
    // -n, property that belongs to a pred
    val intProps: IndexedSeq[Int] = indices.map { index =>
      if (!isProps(index)) 0
      else {
        // Prefer looking to the right, after.
        val argWhereAndWhatOpt: Option[(Int, Int)] = Collection.findWhereWhatOptAfter(intArgs, index)(_ != 0)
        val predWhereAndWhatOpt: Option[(Int, Int)] = Collection.findWhereWhatOptAfter(intPreds, index)(_ != 0)

        if (argWhereAndWhatOpt.isDefined || predWhereAndWhatOpt.isDefined)
          indexAndCountToCount(index, argWhereAndWhatOpt, predWhereAndWhatOpt)
        else {
          // Then look left if necessary, before.
          val argWhereAndWhatOpt = Collection.findWhereWhatOptBefore(intArgs, index)(_ != 0)
          val predWhereAndWhatOpt = Collection.findWhereWhatOptBefore(intPreds, index)(_ != 0)

          if (argWhereAndWhatOpt.isDefined || predWhereAndWhatOpt.isDefined) {
            indexAndCountToCount(index, argWhereAndWhatOpt, predWhereAndWhatOpt)
          }
          else
            0
        }
      }
    }
    // There is a property and it maps to no arg or pred.
    val zeroPropertyIndexes = indices.filter { index => isProps(index) && intProps(index) == 0 }
    val predicateTuple =
        if (zeroPropertyIndexes.nonEmpty) {
          // This can only happen if there are no arguments or predicates.
          // This cannot be None because zeroProperties is nonEmpty.
          val propertyOntologyGroundingOpt = getBest(zeroPropertyIndexes, propertyOntologyGroundingOpts)
          // Arbitrarily put the property in the concept property slot.
          PredicateTuple(None, propertyOntologyGroundingOpt, None, None, emptyOntologyGrounding)
        }
        else {
          // We're only grounding against the very first argument or predicate, so +/- 1.
          val argPropertyIndexes = indices.filter { index => isProps(index) && intProps(index) == 1 }
          val bestArgPropertiesOpt = getBest(argPropertyIndexes, propertyOntologyGroundingOpts)
          val predPropertyIndexes = indices.filter { index => isProps(index) && intProps(index) == -1 }
          val bestPredPropertiesOpt = getBest(predPropertyIndexes, propertyOntologyGroundingOpts)
          val argOntologyGroundingOpt = Collection.optIndexOf(isArgs, true) // Since we're using the first one only, or none.
              .map { index =>
                groundToBranches(Seq(SRLCompositionalGrounder.CONCEPT), Interval(mentionStart + index), s, topNOpt, thresholdOpt)
              }
          val predOntologyGroundingOpt = Collection.optIndexOf(isPreds, true) // Since we're using the first one only, or none.
              .map { index =>
                groundToBranches(Seq(SRLCompositionalGrounder.PROCESS), Interval(mentionStart + index), s, topNOpt, thresholdOpt)
              }

          PredicateTuple(argOntologyGroundingOpt, bestArgPropertiesOpt, predOntologyGroundingOpt, bestPredPropertiesOpt, emptyOntologyGrounding)
        }
    val predicateGroundings = Seq(PredicateGrounding(predicateTuple))
    val sortedPredicateGroundings = predicateGroundings.sortBy(-_.score)
    val slicedPredicateGroundings = topNOpt.map(sortedPredicateGroundings.take).getOrElse(sortedPredicateGroundings)

    slicedPredicateGroundings
  }
}
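boolsToInts isn't shown above, but its comments describe a running counter over true values. A minimal sketch consistent with those comments (an assumption about the real helper):

// Sketch matching the comments above: false maps to 0 and each true maps to its
// 1-based running count, e.g. IndexedSeq(false, true, false, true) -> IndexedSeq(0, 1, 0, 2).
def boolsToInts(bools: IndexedSeq[Boolean]): IndexedSeq[Int] = {
  var count = 0
  bools.map { bool =>
    if (bool) { count += 1; count }
    else 0
  }
}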

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

This seems in retrospect to be ambiguous:

	4. Check if properties in 2 attach to them (for each property look first to the right, then to left if none found; pick the first process/concept)

By "look to the right" I understood looking anywhere to the right, all the way to the end of the mention, and "to the left" looking back to the beginning of the mention. I've got a mention of 17 words here, and there are some very long-distance relationships. I wonder whether it should be one to the right, one to the left, two to the right, two to the left, etc., to find the closest one but with the right still at an advantage (sketched below). There's probably not time to experiment with that.

Going all the way to the end doesn't really make sense, because that would always include the first one (in the sentence), which would always be found and would not need to be a default. On the other hand, a different one in the sentence might be closer, so that the first one is wrong.
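A sketch of that alternating closest-first search, which finds the nearest qualifying index while still giving the right side the advantage on ties (hypothetical, not code that exists in Eidos):

// Look one to the right, one to the left, two to the right, two to the left, etc.,
// so the nearest match wins and the right side wins ties.
def findNearestRightBiased(values: IndexedSeq[Int], index: Int)(pred: Int => Boolean): Option[Int] = {
  val maxOffset = math.max(values.length - 1 - index, index)
  val candidates = (1 to maxOffset).flatMap { offset => Seq(index + offset, index - offset) }
  candidates.find(i => values.indices.contains(i) && pred(values(i)))
}

// findNearestRightBiased(IndexedSeq(0, 0, 1, 0, 2), index = 3)(_ != 0) == Some(4):
// the arg/pred at 4 wins over the equally distant one at 2.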

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

Taking the closest one, not going all the way to the end, performs worse, with 191 failed tests instead of 186. If it eventually looks like the properties are being over-generated, the search could be limited to the immediate vicinity and then be abandoned.

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

Updated code:

def findInexactPredicateGroundings(mentionRange: Range, sentenceHelper: SentenceHelper): Seq[PredicateGrounding] = {
  val mentionStart = mentionRange.start
  val isStops: IndexedSeq[Boolean] = mentionRange.map(isSkipword(sentenceHelper, _))
  val indices: IndexedSeq[Int] = isStops.indices // indices(0) + mentionStart = mentionRange(0)

  if (indices.forall(isStops)) Seq.empty // There is nothing to ground, so short-circuit it.
  else {
    val propertyOntologyGroundingOpts: Seq[Option[OntologyGrounding]] = indices.map { index =>
      if (isStops(index)) None
      else maybeProperty(Interval(mentionStart + index), sentenceHelper)
    }
    val isProps: IndexedSeq[Boolean] = indices.map { index => propertyOntologyGroundingOpts(index).isDefined }
    val isArgs: IndexedSeq[Boolean] = indices.map { index => !isStops(index) && !isProps(index) && sentenceHelper.isArg(mentionStart + index) }
    val isPreds: IndexedSeq[Boolean] = indices.map { index => !isStops(index) && !isProps(index) && !isArgs(index) }
    // Since these are all mutually exclusive, one could go through just once and trickle down.
    // Since isPreds is the catch-all, there must be something for every index. There is no other.
    val intArgs: IndexedSeq[Int] = boolsToInts(isArgs) // turns (false, true, false, true) to (0, 1, 0, 2)
    val intPreds: IndexedSeq[Int] = boolsToInts(isPreds).map(_ * -1) // turns (true, false, false, true) to (-1, 0, 0, -2)
    // 0, either not property or property that does not belong to arg or pred
    // +n, property that belongs to an arg
    // -n, property that belongs to a pred
    val intProps: IndexedSeq[Int] = indices.map { index =>
      if (!isProps(index)) 0
      else getRightThenLeftCount(index, intArgs, intPreds)
      // else getAroundCount(index, intArgs, intPreds)
    }
    // There is a property and it maps to no arg or pred.
    val zeroPropertyIndexes = indices.filter { index => isProps(index) && intProps(index) == 0 }
    val predicateTuple =
        if (zeroPropertyIndexes.nonEmpty) {
          // This can only happen if there are no arguments or predicates.
          // This cannot be None because zeroProperties is nonEmpty.
          val propertyOntologyGroundingOpt = getBest(zeroPropertyIndexes, propertyOntologyGroundingOpts)
          // Arbitrarily put the property in the concept property slot.
          PredicateTuple(None, propertyOntologyGroundingOpt, None, None, emptyOntologyGrounding)
        }
        else {
          // We're only grounding against the very first argument or predicate, so +/- 1.
          val argPropertyIndexes = indices.filter { index => isProps(index) && intProps(index) == 1 }
          val bestArgPropertiesOpt = getBest(argPropertyIndexes, propertyOntologyGroundingOpts)
          val predPropertyIndexes = indices.filter { index => isProps(index) && intProps(index) == -1 }
          val bestPredPropertiesOpt = getBest(predPropertyIndexes, propertyOntologyGroundingOpts)
          val argOntologyGroundingOpt = Collection.optIndexOf(isArgs, true) // Since we're using the first one only, or none.
              .map { index =>
                groundToBranches(Seq(SRLCompositionalGrounder.CONCEPT), Interval(mentionStart + index), s, topNOpt, thresholdOpt)
              }
          val predOntologyGroundingOpt = Collection.optIndexOf(isPreds, true) // Since we're using the first one only, or none.
              .map { index =>
                groundToBranches(Seq(SRLCompositionalGrounder.PROCESS), Interval(mentionStart + index), s, topNOpt, thresholdOpt)
              }

          PredicateTuple(argOntologyGroundingOpt, bestArgPropertiesOpt, predOntologyGroundingOpt, bestPredPropertiesOpt, emptyOntologyGrounding)
        }
    val predicateGroundings = Seq(PredicateGrounding(predicateTuple))
    val sortedPredicateGroundings = predicateGroundings.sortBy(-_.score)
    val slicedPredicateGroundings = topNOpt.map(sortedPredicateGroundings.take).getOrElse(sortedPredicateGroundings)

    slicedPredicateGroundings
  }
}

@MihaiSurdeanu (Contributor)

The code looks great @kwalcock !
I think we're very close to being done. Before that:

  • Can you please send the groundings for our usual suspects: "water transportation", "transportation of oil", "cost of transportation of oil"?
  • Send the outputs for some of the newly failing tests?

Thanks!

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

I'm working on it.

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

water transportation, which has an exact match on water:

{
    "@type" : "Extraction",
    "@id" : "_:Extraction_3",
    "type" : "concept",
    "subtype" : "entity",
    "labels" : [ "Concept", "Entity" ],
    "text" : "water transportation",
    "rule" : "simple-np++Decrease_ported_syntax_1_verb",
    "canonicalName" : "water transportation",
    "groundings" : [ {
      "@type" : "Groundings",
      "name" : "wm_compositional",
      "version" : "7df8e4a4237596bb40ceea9e72bb46821fc01827",
      "versionDate" : "2021-10-19T19:46:27Z",
      "values" : [ {
        "@type" : "PredicateGrounding",
        "theme" : [ {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/concept/goods/water",
          "value" : 1.0
        } ],
        "themeProcess" : [ {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/process/transportation/",
          "value" : 1.2696834802627563
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/process/transportation/commercial_transportation",
          "value" : 1.1144535541534424
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/process/transportation/personal_transportation",
          "value" : 1.1033762693405151
        } ],
        "value" : 1.0025968551635742,
        "display" : "THEME: wm/concept/goods/water (1.0); THEME PROCESS: wm/process/transportation/ (1.2696835)"
      }, {
        "@type" : "PredicateGrounding",
        "theme" : [ {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/concept/goods/water",
          "value" : 1.0
        } ],
        "value" : 0.9900000095367432,
        "display" : "THEME: wm/concept/goods/water (1.0)"
      } ]
    }

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

transportation of oil, only inexact matches:

{
    "@type" : "Extraction",
    "@id" : "_:Extraction_3",
    "type" : "concept",
    "subtype" : "entity",
    "labels" : [ "Concept", "Entity" ],
    "text" : "transportation of oil",
    "rule" : "simple-np++Decrease_ported_syntax_1_verb",
    "canonicalName" : "transportation oil",
    "groundings" : [ {
      "@type" : "Groundings",
      "name" : "wm_compositional",
      "version" : "7df8e4a4237596bb40ceea9e72bb46821fc01827",
      "versionDate" : "2021-10-19T19:46:27Z",
      "values" : [ {
        "@type" : "PredicateGrounding",
        "theme" : [ {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/concept/goods/fuel",
          "value" : 1.8379604816436768
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/concept/environment/natural_resources/fossil_fuels",
          "value" : 1.727161169052124
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/concept/goods/protein_foods",
          "value" : 0.9827450513839722
        } ],
        "themeProcess" : [ {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/process/transportation/",
          "value" : 1.2696834802627563
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/process/transportation/commercial_transportation",
          "value" : 1.1144535541534424
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/process/transportation/personal_transportation",
          "value" : 1.1033762693405151
        } ],
        "value" : 0.7849923372268677,
        "display" : "THEME: wm/concept/goods/fuel (1.8379605); THEME PROCESS: wm/process/transportation/ (1.2696835)"
      } ]
    }

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

cost of transportation of oil:

{
    "@type" : "Extraction",
    "@id" : "_:Extraction_3",
    "type" : "concept",
    "subtype" : "entity",
    "labels" : [ "Concept", "Entity" ],
    "text" : "cost of transportation of oil",
    "rule" : "gazetteer++simple-np++Decrease_ported_syntax_1_verb",
    "canonicalName" : "transportation oil",
    "groundings" : [ {
      "@type" : "Groundings",
      "name" : "wm_compositional",
      "version" : "7df8e4a4237596bb40ceea9e72bb46821fc01827",
      "versionDate" : "2021-10-19T19:46:27Z",
      "values" : [ {
        "@type" : "PredicateGrounding",
        "theme" : [ {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/concept/goods/fuel",
          "value" : 1.8379604816436768
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/concept/environment/natural_resources/fossil_fuels",
          "value" : 1.727161169052124
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/concept/goods/protein_foods",
          "value" : 0.9827450513839722
        } ],
        "themeProcess" : [ {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/process/transportation/",
          "value" : 1.2696834802627563
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/process/transportation/commercial_transportation",
          "value" : 1.1144535541534424
        }, {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/process/transportation/personal_transportation",
          "value" : 1.1033762693405151
        } ],
        "themeProcessProperties" : [ {
          "@type" : "Grounding",
          "ontologyConcept" : "wm/property/price_or_cost",
          "value" : 1.0
        } ],
        "value" : 0.890346109867096,
        "display" : "THEME: wm/concept/goods/fuel (1.8379605); THEME PROCESS: wm/process/transportation/ (1.2696835), Process properties: wm/property/price_or_cost (1.0)"
      } ]
    }

@MihaiSurdeanu (Contributor)

If I can read the JSON correctly, it looks to me like all these 3 examples are correct! Nice job @kwalcock !

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

canonicalName: drought
words: of the drought
[info] - should process "The impact of the drought has been exacerbated by high local cereal prices , excess livestock mortality , conflict and restricted humanitarian access in some areas ." effect themeProperty correctly *** FAILED ***
[info]   "[wm/property/preparedness]" was not equal to "[]" (TestGrounding.scala:45)
[info]   Analysis:
[info]   "[wm/property/preparedness]" -> "[]"

It wanted
val effectGroundings = Seq("wm/concept/crisis_or_disaster/environmental/drought", "", "", "")
but apparently got
val effectGroundings = Seq("wm/concept/crisis_or_disaster/environmental/drought", "wm/property/preparedness", "", "")

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

Most of the errors are from a slot being filled in that wasn't filled before, particularly for properties and processes. Perhaps the tests paid less attention to those, or maybe isPred = !isArg results in too many potential processes, or there are too many long-distance properties, or the property match threshold is too low.

This is already a week overdue. I'm not sure there's time for analysis.

Examples:

[info]   "[wm/property/stability]" was not equal to "[]" (TestGrounding.scala:47)
[info]   "[wm/property/condition]" was not equal to "[]" (TestGrounding.scala:45)
[info]   "[wm/property/preparedness]" was not equal to "[]" (TestGrounding.scala:45)
[info]   "[wm/process/population/death]" was not equal to "[]" (TestGrounding.scala:46)
[info]   "[wm/process/combining]" was not equal to "[]" (TestGrounding.scala:46)
[info]   "[wm/property/violent]" was not equal to "[]" (TestGrounding.scala:47)
[info]   "[wm/property/quality]" was not equal to "[]" (TestGrounding.scala:45)
[info]   "[wm/property/aftermath]" was not equal to "[]" (TestGrounding.scala:47)
[info]   "[wm/property/stability]" was not equal to "[]" (TestGrounding.scala:45)
[info]   "[wm/property/stability]" was not equal to "[]" (TestGrounding.scala:45)
[info]   "[wm/property/preparedness]" was not equal to "[]" (TestGrounding.scala:45)
[info]   "[wm/property/price_or_cost]" was not equal to "[]" (TestGrounding.scala:45)
[info]   "[wm/property/preparedness]" was not equal to "[]" (TestGrounding.scala:47)
etc.

Fewer are changes:

[info]   "wm/process/[farming/herding]" was not equal to "wm/process/[population/migrate/]" (TestGrounding.scala:46)
[info]   "wm/concept/[entity/muslim_communities]" was not equal to "wm/concept/[agriculture/]" (TestGrounding.scala:44)
[info]   "wm/concept/[entity/hamas]" was not equal to "wm/concept/[plan/]" (TestGrounding.scala:44)
[info]   "wm/concept/[entity/muslim_communities]" was not equal to "wm/concept/[government/]" (TestGrounding.scala:44)
[info]   "...ocess/communication/[campaign]" was not equal to "...ocess/communication/[informing]" (TestGrounding.scala:46)
etc.

@MihaiSurdeanu (Contributor)

Agreed that we can't spend too much time on this at this point.
But the above example for "of the drought" worries me. Which tokens were used for the preparedness grounding? It seems "drought" was used for drought. The other two "of" and "the" are stop words. So, then what was used?

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

It is grounding to "impact of the drought". "Impact" doesn't get printed because it is special, probably transparent.

@MihaiSurdeanu (Contributor)

Then I think it is fine!

Should we merge this PR?

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

I would need to disable the failing tests before this can go into master. Right now it is more important for it to go into kwalcock/wmexchanger, which has all the updates for the new DART protocols. That hadn't gone to master yet because it was only just put into use. It is headed there for the code freeze.

@MihaiSurdeanu (Contributor)

Thanks!
Let me know when it's done so we can email MITRE.

@kwalcock (Member Author)

kwalcock commented Mar 2, 2022

The documents were regrounded and uploaded. I slacked Ben and Robyn.

@MihaiSurdeanu (Contributor)

Beautiful!! Thanks!

@kwalcock (Member Author)

@MihaiSurdeanu

Regarding this:

Do NOT use them if they were used as triggers for increase/decrease attributes, AND they are tagged as JJ* or VBN.

There is code intended to do very much the same thing at these places, which use a set of words to exclude:

def groundSentenceSpan(s: Sentence, start: Int, end: Int, exclude: Set[String], topN: Option[Int], threshold: Option[Float]): Seq[OntologyGrounding] = {

// Get the triggers, quantifiers, and context phrases of the attachments
private def getAttachmentStrings(mention: Mention): Set[String] = {
  mention.attachments.flatMap { a =>
    a match {
      case _: Property => Seq.empty
      case t: TriggeredAttachment => Seq(t.trigger) ++ t.quantifiers.getOrElse(Seq())
      case c: ContextAttachment => Seq(c.text)
      case _ => throw new RuntimeException("Unexpected attachment")
    }
  }
}

The conditions are slightly different: the POS is not checked. It is also not used consistently throughout the code; it only rules out predicates right now. I wonder whether I should update the conditions or reuse them.
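A sketch of what the combined condition could look like if the POS check from the quoted rule were added to the existing word-set exclusion (the signature is hypothetical; attachmentStrings would come from getAttachmentStrings above):

// Hypothetical combination of the word-based exclusion with the quoted POS condition:
// skip a token only if its text is an attachment trigger/quantifier AND it is tagged JJ* or VBN.
def isExcludedForGrounding(word: String, tag: String, attachmentStrings: Set[String]): Boolean =
  attachmentStrings.contains(word) && (tag.startsWith("JJ") || tag == "VBN")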

@kwalcock (Member Author)

While I'm at it, the predicates are calculated in SentenceHelper and that definition might be reused. It is slightly different:

val validPredicates: Seq[Int] = {
  // don't use the roots as here we're only interested in outgoing "sources" for predicates,
  // and the roots also contain the unreachable nodes etc.
  val original = srls.edges.map(edge => edge.source)
      // keep only predicates that are within the mention
      .filter(tokenInterval.contains)
      // remove the predicates which correspond to our increase/decrease/quantifiers
      .filterNot(exclude contains words(_))
  // add back in ones that SRL "missed"
  val corrected = for {
    i <- tokenInterval
    if !original.contains(i)
    if outgoingOfType(i, Seq("compound")).nonEmpty
  } yield i
  // start with those closest to the syntactic root of the sentence to begin with "higher level" predicates
  (original ++ corrected).sortBy(minGraphDistanceToSyntacticRoot)
}

@MihaiSurdeanu (Contributor)

Yep, I think the attachment strings should be excluded from grounding. Do these include increase/decrease words? What type are they, TriggeredAttachment?

@kwalcock (Member Author)

TriggeredAttachment is a superclass of things like Decrease, Increase, Quantification, Property, Pos(itive)Change, NegChange, Hedging, Negation. This is code that Becky wrote, so probably well thought out. It could be narrowed down to some of the subclasses.

I'm not liking that the present code rules out words rather than words at positions. If a word is repeated in a sentence, removing based only on the text will take it out everywhere, even if it is used differently.
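A sketch of excluding token positions instead of bare words, so a repeated word is only removed where it actually served as a trigger (the interval-based representation here is an assumption, not an existing Eidos structure):

// Hypothetical position-based exclusion: instead of a Set[String] of words, collect the
// token indices covered by the relevant attachments and exclude only those positions.
def excludedIndices(attachmentIntervals: Seq[Range]): Set[Int] =
  attachmentIntervals.flatten.toSet

def isExcludedAt(index: Int, attachmentIntervals: Seq[Range]): Boolean =
  excludedIndices(attachmentIntervals).contains(index)

// A word like "increase" that appears twice is then skipped only at the position where it
// was an increase trigger, not everywhere its text happens to repeat.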

@MihaiSurdeanu (Contributor)

MihaiSurdeanu commented Mar 11, 2022 via email

@kwalcock (Member Author)

Exact matches are only run on the ontology node names; the ontology node examples have the approximate matches. We've recently been using lemmas to compare against the unlemmatized (raw) node names. That means that "higher temperatures" would be changed to "higher temperature" before comparison and would not match the node wm/concept/environment/higher_temperatures. I don't think we can ask the ontology creators to use lemmas, but it is easy for us to test both words and lemmas and take the best. I have made that change.
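A minimal sketch of testing both the surface words and the lemmas against a node name and accepting either (the normalization here is an assumption about how node names are compared):

// Hypothetical sketch: try both surface forms and lemmas against a node name such as
// "higher_temperatures" and keep whichever alternative matches.
def exactMatchesNode(words: Seq[String], lemmas: Seq[String], nodeName: String): Boolean = {
  val nodeWords = nodeName.split('_').toSeq.map(_.toLowerCase)
  def matches(tokens: Seq[String]): Boolean = tokens.map(_.toLowerCase) == nodeWords

  // "higher temperatures" now matches via the words even though the lemmas
  // "higher temperature" do not.
  matches(words) || matches(lemmas)
}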

@kwalcock (Member Author)

Less easy is dealing with "higher". It is one of those words ruled out because it is an attribute or something and therefore would not be used in an exact match. I think that creators of the ontology might put specific texts in the node names that they want to match no matter what. I'm waiting to invalidate the attributes and parts of speech until after the exact match calculation has completed.

@kwalcock (Member Author)

Another example seems to be armed conflict.

@kwalcock (Member Author)

"African governments" works better in this version; it is a node. So does "cash crops". There are times when the match is a distraction, however, so I've made it so that at least one exactly matching word needs to be valid. In the sentence

"exceptional prolonged and persistent agro-pastoral drought sequence Highlights 1"

it changed the match from prolonged to drought. wm/concept/time/prolonged is strange; it seems more like a property.
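A minimal sketch of the "at least one valid word" filter on exact matches (the names and the validity predicate are hypothetical):

// Hypothetical filter: keep an exact match only if at least one of the words it matched
// is valid, e.g. not an attribute/trigger word that has been ruled out.
def keepExactMatch(matchedWords: Seq[String], isValidWord: String => Boolean): Boolean =
  matchedWords.exists(isValidWord)

// In the sentence above, a match resting only on "prolonged" is dropped when "prolonged"
// is treated as invalid, so the match moves to "drought".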

@MihaiSurdeanu (Contributor)

Thanks @kwalcock !
I agree with the changes you made. I like the idea of having >= 1 valid words in the exact match!
