Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use JTS instead of ESRI for geometries, Parts 1 and 2 #13604

Merged
merged 4 commits into from
Dec 9, 2019

Conversation

jagill
Copy link
Contributor

@jagill jagill commented Oct 25, 2019

I am in the process of refactoring the geospatial functions to use JTS instead of Esri. Why?

  • JTS is a more standard library, that is compatible with PostGIS and GEOS and most geometry libraries.
  • It conforms to the ISO standards better.
  • Serialization is about 80% of the CPU for many geometrical functions, and
  • JTS serializes about 20% faster than ESRI, and has the potential of serializing 10-20x faster.

Since geometries are always converted to/from slices, it's fine for some functions to use Esri and some to use JTS (this is the case already). This PR converts most of the "easy" functions (those that don't involve much in the way of logical or numerical changes). However, JTS uses the standard order for polygon rings (counter-clockwise) instead of Esri's clockwise. This means that many of the tests need to have the order of the vertices reversed. I've isolated all of those changes into the commit that uses JTS for conversion from/to WKT.

== RELEASE NOTES ==

General Changes
* Use JTS instead of Esri for many geometrical operations.
  + Polygon WKTs must have closed loops.  Previously Esri would close the loops for you.
  + Certain other invalid geometries will fail to be created from WKTs, such as
    `LINESTRING(0 0, 0 0, 0 0)`.
  + Returned WKTs may have a different point order.
  + Fixes incorrect calculation of extreme points in certain cases.

@jagill
Copy link
Contributor Author

jagill commented Oct 25, 2019

cc @arhimondr

@@ -121,7 +121,7 @@ public void validateBenchmarkData()
public void setup()
throws IOException
{
simpleGeometry = stGeometryFromText(utf8Slice("POLYGON ((16.5 54, 16.5 54.1, 16.51 54.1, 16.8 54))"));
simpleGeometry = stGeometryFromText(utf8Slice("POLYGON ((16.5 54, 16.5 54.1, 16.51 54.1, 16.8 54, 16.5 54))"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make Polygon WKTs valid in test cases

Although it is by the standard - it is a breaking change. Do we care that if somebody had existing WKT stored in the database that were compatible with ESRI it will stop working?

If we are fine with that, should we have a better release note?

public final class GeometryUtils
{
public static final CoordinateSequenceFactory COORDINATE_SEQUENCE_FACTORY = new PackedCoordinateSequenceFactory();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are this objects (CoordinateSequenceFactory, GeometryFactory) thread safe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upd: those are threadsafe, but mutable. Let's make them private

@@ -158,9 +158,6 @@ public void testArea()
// Empty polygon
assertFunction("ST_Area(to_spherical_geography(ST_GeometryFromText('POLYGON EMPTY')))", DOUBLE, null);

// Invalid polygon (too few vertices)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing this test case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, certain types of invalid geometries could be created, but would have to be checked for explicitly in functions such as these. Now, those geometries fail on construction (I've updated the release notes to mention this).

@@ -172,7 +169,7 @@ public void testArea()

assertArea("POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))", 123.64E8);

assertArea("POLYGON((-122.150124 37.486095, -122.149201 37.486606, -122.145725 37.486580, -122.145923 37.483961 , -122.149324 37.482480 , -122.150837 37.483238, -122.150901 37.485392))", 163290.93943446054);
assertArea("POLYGON((-122.150124 37.486095, -122.149201 37.486606, -122.145725 37.486580, -122.145923 37.483961, -122.149324 37.482480, -122.150837 37.483238, -122.150901 37.485392, -122.150124 37.486095))", 163290.93943446054);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to belong to the previous commit

@@ -210,12 +207,6 @@ public void testLength()
// Empty linestring returns null
assertLength("LINESTRING EMPTY", null);

// Linestring with one point has length 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing these two?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

@@ -68,4 +68,28 @@ public static GeometrySerializationType getForCode(int code)
throw new IllegalArgumentException("Invalid type code: " + code);
}
}

public String toString()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? Why the standard toString from the enum is not good enough? If this is something that we use for actual serialization / deserialization, should we have a separate method and not reuse the toString interface?

Copy link
Contributor Author

@jagill jagill Oct 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't used for serde: it is used for a user-facing error message. It's a bit nicer and conforms to the previous error message. However, I have no strong feelings and I'm happy for users to read MULTI_LINE_STRING.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed it: it's probably better to change the test error message than add a new method that must be maintained.

}

@Description("Returns a Geometry type object from Well-Known Binary representation (WKB)")
@ScalarFunction("ST_GeomFromBinary")
@SqlType(GEOMETRY_TYPE_NAME)
public static Slice stGeomFromBinary(@SqlType(VARBINARY) Slice input)
{
return serialize(geomFromBinary(input));
return GeometrySerde.serialize(geomFromBinary(input));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GeometrySerde.serialize and GeometrySerde.deserialize are confusing. Should we rename GeometrySerde to EsriGeometrySerde?

case ENVELOPE:
return "Envelope";
default:
return this.name();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw IllegalArgument

@jagill jagill force-pushed the use-jts branch 3 times, most recently from 2e9e7d6 to 05bebdf Compare October 30, 2019 18:06
Copy link
Member

@arhimondr arhimondr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good to me.

The only concern i have - is that we are braking backward compatibility by disallowing "invalid" polygons that don't have the last point. But as we discussed - that should be fine. As such polygons are "invalid".

@mbasmanova Would you like to have a look before we merge this?

public final class GeometryUtils
{
public static final CoordinateSequenceFactory COORDINATE_SEQUENCE_FACTORY = new PackedCoordinateSequenceFactory();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are not using them outside anymore, could you please make them private?

@@ -38,8 +40,8 @@

public final class GeometryUtils
{
public static final CoordinateSequenceFactory COORDINATE_SEQUENCE_FACTORY = new PackedCoordinateSequenceFactory();
public static final GeometryFactory GEOMETRY_FACTORY = new GeometryFactory(COORDINATE_SEQUENCE_FACTORY);
private static final CoordinateSequenceFactory COORDINATE_SEQUENCE_FACTORY = new PackedCoordinateSequenceFactory();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, i see, you are making them private in the next commit. Could you please move it to the commit before?

@jagill jagill force-pushed the use-jts branch 2 times, most recently from dc3f559 to f6149ab Compare November 8, 2019 02:06
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Nov 8, 2019

CLA Check
The committers are authorized under a signed CLA.

  • ✅ James Gill (7aa067b9283e4f310426ce00c913c3921b295a3a, 1417e52084d2f414e47f44cd747b86863c9bbef4, 3ddf604ba86da57076a4ad67771b322d7988a641, eb2793f3e1aa871c91d03262bffd69ab34323f92)

@jagill jagill force-pushed the use-jts branch 2 times, most recently from d0b4060 to 820ea91 Compare November 8, 2019 20:59
@jagill
Copy link
Contributor Author

jagill commented Nov 8, 2019

Running the verifier suite, I found only two discrepancies:

  1. The order of polygon WKTs changed in the expected way,
  2. I found some unexpected changes to ST_XMax (and Y and Min) that -- when investigated -- showed the new values matched manual testing. So I'll consider this an unintentional bug fix.

As a first step to converting to JTS, convert the WKT serde functions to
JTS.  While the code change here is small, JTS uses the standard order
for polygon rings (counter-clockwise), while Esri uses the reverse order
(clockwise).  This means many of the test cases need to have their
coordinates reversed.

Also, a couple tests which tested invalid geometries will actually be
caught in WKT parsing; so they were removed from the other function
tests.
Converting most unary properties and operators to JTS.  Binary relations
and operators were not touched, as well as unary operators that would
require significant logic changes.
@arhimondr arhimondr merged commit 46ae108 into prestodb:master Dec 9, 2019
@jagill jagill deleted the use-jts branch December 9, 2019 19:10
@caithagoras caithagoras mentioned this pull request Jan 22, 2020
6 tasks
@bhhari
Copy link
Contributor

bhhari commented Jan 29, 2020

fixes #14031

@yupeng9 yupeng9 mentioned this pull request Jul 9, 2020
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants