Feature request: Optimize raster outputs #1040

Closed
smathermather opened this issue Oct 7, 2019 · 20 comments

Comments

@smathermather
Contributor

Outputs from OpenDroneMap can be very large. For elevation models and orthophotos, we should optionally output much smaller versions as documented here:

https://geoserver.geo-solutions.it/edu/en/raster_data/advanced_gdal/example5.html

For elevation models, output would be with the following GDAL flags -co TILED=YES -co COMPRESS=DEFLATE. These should be set as default.

For orthophotos, the following parameters:
-co TILED=YES -co COMPRESS=JPEG -co PHOTOMETRIC=YCBCR --config GDAL_TIFF_INTERNAL_MASK YES -b 1 -b 2 -b 3 -mask 4

For orthophotos, the -co COMPRESS=JPEG -co PHOTOMETRIC=YCBCR flags should be optional, as users may not want to use lossy compression on their final output product.
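As a rough illustration of the proposal above, the flags could be assembled like this. It is only a sketch: the function names and file paths are invented for the example, and the DEFLATE fallback for lossless orthophotos is an assumption, since the issue only says the JPEG flags should be optional.

```python
from typing import List


def dem_args(src: str, dst: str) -> List[str]:
    """gdal_translate arguments for a tiled, DEFLATE-compressed elevation model."""
    return ["gdal_translate", "-co", "TILED=YES", "-co", "COMPRESS=DEFLATE", src, dst]


def ortho_args(src: str, dst: str, lossy: bool = False) -> List[str]:
    """gdal_translate arguments for an RGB orthophoto with band 4 as internal mask."""
    args = ["gdal_translate", "-co", "TILED=YES"]
    if lossy:
        # Opt-in lossy path, using the flags quoted in the issue.
        args += ["-co", "COMPRESS=JPEG", "-co", "PHOTOMETRIC=YCBCR"]
    else:
        # Assumption (not from the issue): lossless DEFLATE when JPEG is not requested.
        args += ["-co", "COMPRESS=DEFLATE"]
    args += ["--config", "GDAL_TIFF_INTERNAL_MASK", "YES"]
    args += ["-b", "1", "-b", "2", "-b", "3", "-mask", "4"]
    return args + [src, dst]
```

The returned list can be handed to `subprocess.run(args, check=True)` on a machine where `gdal_translate` is installed.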

@Saijin-Naib
Contributor

Saijin-Naib commented Oct 8, 2019

There are a few parameters I've been playing with for compressing GeoTIFFs and I agree with your above assessment. I have a few other things to suggest, if I may:

  1. To maintain compatibility with ESRI products, INTERLEAVE must be set to PIXEL, despite BAND typically yielding more efficient files (GDAL has no problem with BAND)

  2. TFW should be YES. They are a legacy extension to GeoTIFF, but they can be invaluable to have as a backup if for any reason the GeoTIFF metadata get destroyed (opening in a non-GeoTIFF aware program, etc).

  3. DEMs can achieve excellent compression while being compatible with both GDAL/ESRI via LERC_ZSTD with the MAX_Z_ERROR being set to the determined spatial resolution of the generated DEM. It is lossy, but if you set the ERROR properly, you're not losing any real precision of significance, just inflated floating point values that don't represent the real measured phenomena.

  4. PREDICTOR=2 seems to work best

  5. External overviews. Yes, this makes the output file not a true Cloud Optimized GeoTIFF (COG), but it allows for the easy/quick regeneration of the overviews whenever needed (or even dropping of them to save file size/transmission time), which is not possible with internal overviews. Having overviews at all should be strongly considered as they greatly speed the display and visualization of the data in every product I've tried, though QGIS/GDAL tend to render faster with missing overviews than ArcGIS does.

  6. COMPRESS=ZSTD. ZSTD compresses oftentimes better than DEFLATE at similar settings, and tends to be more performant with multithreaded readers/writers.
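To make points 1, 2, 4 and 6 concrete, here is a small sketch of how such a set of GTiff creation options might be put together. The function and parameter names are hypothetical; the option keys (`INTERLEAVE`, `TFW`, `PREDICTOR`, `COMPRESS`) are standard GTiff creation options. The predictor is only added for DEFLATE, LZW and ZSTD, which are the codecs it applies to.

```python
from typing import List


def creation_options(esri_compatible: bool = True,
                     write_tfw: bool = False,
                     compress: str = "ZSTD") -> List[str]:
    """Return GTiff creation options as KEY=VALUE strings."""
    options = [
        "TILED=YES",
        f"COMPRESS={compress}",
        # Point 1: PIXEL for ESRI compatibility, BAND otherwise.
        "INTERLEAVE=" + ("PIXEL" if esri_compatible else "BAND"),
    ]
    if compress in ("DEFLATE", "LZW", "ZSTD"):
        options.append("PREDICTOR=2")  # point 4
    if write_tfw:
        options.append("TFW=YES")  # point 2: write a world-file sidecar
    return options


def as_cli_flags(options: List[str]) -> List[str]:
    """Expand KEY=VALUE strings into repeated -co flags for gdal_translate."""
    flags: List[str] = []
    for option in options:
        flags += ["-co", option]
    return flags
```

Keeping the ESRI and TFW choices as switches leaves the decision to the user rather than baking it into a default.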

@smathermather
Contributor Author

  1. To maintain compatibility with ESRI products, INTERLEAVE must be set to PIXEL, despite BAND typically yielding more efficient files (GDAL has no problem with BAND)

It sounds like we should give users the choice of optimization. If BAND is more efficient, and a user is never using or interfacing with ESRI products, they definitely don't want to use INTERLEAVE as PIXEL, since we are trying to optimize for speed and size.

  2. TFW should be YES. They are a legacy extension to GeoTIFF, but they can be invaluable to have as a backup if for any reason the GeoTIFF metadata get destroyed (opening in a non-GeoTIFF aware program, etc).

I am not sure that we want to add TFW. This is something that can be handled in post-processing. The purpose of any flags we set by default or via easy flags in ODM shouldn't be the full complement of possibilities, but the most likely use cases, in this case for optimizing size and speed.

  3. DEMs can achieve excellent compression while being compatible with both GDAL/ESRI via LERC_ZSTD with the MAX_Z_ERROR being set to the determined spatial resolution of the generated DEM. It is lossy, but if you set the ERROR properly, you're not losing any real precision of significance, just inflated floating point values that don't represent the real measured phenomena.

Very interesting! @pierotofy: any opinions on this? I hate to throw quantitative data away, but if it's not significant, it is tempting to consider.

  4. PREDICTOR=2 seems to work best

This is for the DEMs?

  5. External overviews. Yes, this makes the output file not a true Cloud Optimized GeoTIFF (COG), but it allows for the easy/quick regeneration of the overviews whenever needed (or even dropping of them to save file size/transmission time), which is not possible with internal overviews. Having overviews at all should be strongly considered as they greatly speed the display and visualization of the data in every product I've tried, though QGIS/GDAL tend to render faster with missing overviews than ArcGIS does.

I am not so sure. The reason COGs have them internally isn't just a convenience but a speed boost. I agree that I generally prefer overviews to be external for the reasons listed above, but in this case, we are optimizing for speed at the cost of other things. With this as the objective, I think we would have internal overviews.

  6. COMPRESS=ZSTD. ZSTD compresses oftentimes better than DEFLATE at similar settings, and tends to be more performant with multithreaded readers/writers.

Awesome. I can get behind this after some quick testing.

@Saijin-Naib
Contributor

Saijin-Naib commented Oct 17, 2019

It sounds like we should give users the choice of optimization. If BAND is more efficient, and a user is never using or interfacing with ESRI products, they definitely don't want to use INTERLEAVE as PIXEL, since we are trying to optimize for speed and size.

Sounds perfectly reasonable to me. Just wanted to make the distinction that unfortunately, currently ESRI doesn't handle them properly. It's been a nightmare for me during the course of trying to establish optimized data storage standards & procedures.

I am not sure that we want to add TFW. This is something that can be handled in post-processing. The purpose of any flags we set by default or via easy flags in ODM shouldn't be the full complement of possibilities, but the most likely use cases, in this case for optimizing size and speed.

Indeed, it can be handled in post, provided the source GeoTIFF never gets the metadata trashed by being opened/saved by an incompatible product. Is it likely to be a problem? I doubt it, but since TFWs are a few bytes each and are a mirror of the metadata in the GeoTIFF, they seem like cheap insurance to (paranoid) me. I produce them locally, as part of my data storage practices, for instance.

Very interesting! @pierotofy: any opinions on this? I hate to throw quantitative data away, but if it's not significant, it is tempting to consider.

Depending upon the assessed precision, the savings can be MASSIVE with no real loss in actual measured accuracy. I'm struggling to make this part of our data storage standards here as the general feeling is that more decimals are better than fewer. For instance, what I'm trying to push for is a stored precision of 0.001 ft/px (about 305 microns) [we work in local state plane feet]. This is one significant figure more than anyone can reasonably see field-surveyed data reaching (even with laser/LiDAR) any time soon, so any rounding happens in the noise, and the real measured phenomena should still be at least a significant figure away.
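The "about 305 microns" figure follows directly from the foot-to-metre conversion. A quick check, with illustrative names; both foot definitions are shown because state plane systems may use either, and the difference is far below the precision being discussed:

```python
INTERNATIONAL_FOOT_M = 0.3048        # exact by definition
US_SURVEY_FOOT_M = 1200 / 3937       # exact by definition


def feet_to_microns(feet: float, metres_per_foot: float = INTERNATIONAL_FOOT_M) -> float:
    """Convert a length in feet to microns (1 m = 1,000,000 microns)."""
    return feet * metres_per_foot * 1_000_000


if __name__ == "__main__":
    print(feet_to_microns(0.001))                    # international foot
    print(feet_to_microns(0.001, US_SURVEY_FOOT_M))  # US survey foot
```

Either way, 0.001 ft comes out at roughly 304.8 microns, which rounds to the 305 quoted above.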

This is for the DEMs?

Yeah, mostly. Any time you're using DEFLATE/LZW/ZSTD, really.

I am not so sure. The reason COGs have them internally isn't just a convenience but a speed boost. I agree that I generally prefer overviews to be external for the reasons listed above, but in this case, we are optimizing for speed at the cost of other things. With this as the objective, I think we would have internal overviews.

I thought that the speed increase with internal overviews was most noticeable when the dataset is being served via HTTP, and of no consequence when the data is on a local filesystem or network filesystem. Perhaps that bears some testing, or possibly a toggle?
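A toggle like the one suggested could be as small as the following sketch. It builds a `gdaladdo` command; the `-ro` flag opens the file read-only, which makes `gdaladdo` write the overviews to a sidecar `.ovr` file instead of into the GeoTIFF. The function name, the overview levels and the resampling method are assumptions chosen for illustration.

```python
from typing import List, Sequence


def overview_args(path: str,
                  external: bool = False,
                  levels: Sequence[int] = (2, 4, 8, 16),
                  resampling: str = "average") -> List[str]:
    """Build a gdaladdo command for internal (default) or external overviews."""
    args = ["gdaladdo", "-r", resampling]
    if external:
        # Read-only open: overviews go to <path>.ovr rather than inside the file.
        args.append("-ro")
    return args + [path] + [str(level) for level in levels]
```

With `external=True` the overviews can be deleted or rebuilt without touching the main file; with the default they stay in a single file, which is what the COG layout expects.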

Awesome. I can get behind this after some quick testing.
Sounds great!

Thanks for taking a look into this!

@smathermather
Contributor Author

DEMs can achieve excellent compression while being compatible with both GDAL/ESRI via LERC_ZSTD with the MAX_Z_ERROR being set to the determined spatial resolution of the generated DEM. It is lossy, but if you set the ERROR properly, you're not losing any real precision of significance, just inflated floating point values that don't represent the real measured phenomena.

Can you give an example of this command in GDAL?

@Saijin-Naib
Contributor

Saijin-Naib commented Oct 17, 2019

Sure! I'll give you what I use internally at the moment. Bear in mind, my settings are for data in State Plane (feet) CRSs, set to yield a precision of about 305 microns.

gdal_translate -of GTiff -co COMPRESS=LERC_ZSTD -co PREDICTOR=2 -co ZSTD_LEVEL=9 -co MAX_Z_ERROR=0.001

I have other options specified in my profile, but I've only included the ones directly related to LERC_ZSTD compression in the line above.
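The command above leaves out the input and output paths. A sketch of a complete invocation, with the error tolerance exposed as a parameter, might look like this. The function name and paths are placeholders; the creation options are the ones quoted in the command, and `max_z_error` is in the vertical units of the DEM (feet in this case).

```python
from typing import List


def lerc_zstd_args(src: str, dst: str,
                   max_z_error: float = 0.001,
                   zstd_level: int = 9) -> List[str]:
    """Build a gdal_translate command that writes a LERC_ZSTD-compressed GeoTIFF."""
    if max_z_error < 0:
        raise ValueError("max_z_error must be zero (lossless) or positive")
    return [
        "gdal_translate", "-of", "GTiff",
        "-co", "COMPRESS=LERC_ZSTD",
        "-co", "PREDICTOR=2",
        "-co", f"ZSTD_LEVEL={zstd_level}",
        "-co", f"MAX_Z_ERROR={max_z_error}",
        src, dst,
    ]
```

This only works with a GDAL build that includes both LERC and ZSTD support.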

Also, this page/article was a huge influence on me and inspired me to evaluate different settings internally.
https://kokoalberti.com/articles/geotiff-compression-optimization-guide/

@smathermather
Contributor Author

Ahh, that's a great resource. Thanks!

@smathermather
Contributor Author

Related thread from an earlier pull request: #376

@Saijin-Naib
Contributor

Question: What is your default tile size for GeoTIFF/overviews?

From my research, it looks like most online services assume 256x256px tiles, so I have my internal standards set to use that tile size.
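For reference, the tile size is set through the `BLOCKXSIZE` and `BLOCKYSIZE` creation options of the GTiff driver. The sketch below, with made-up function names, builds those options and counts how many tiles a raster of a given size would contain; TIFF requires tile dimensions to be multiples of 16.

```python
from typing import List


def tile_options(tile_size: int = 256) -> List[str]:
    """GTiff creation options for square internal tiles."""
    if tile_size <= 0 or tile_size % 16 != 0:
        raise ValueError("TIFF tile dimensions must be positive multiples of 16")
    return ["TILED=YES", f"BLOCKXSIZE={tile_size}", f"BLOCKYSIZE={tile_size}"]


def tile_count(width: int, height: int, tile_size: int = 256) -> int:
    """Number of tiles needed to cover a width x height raster (edge tiles included)."""
    across = -(-width // tile_size)  # ceiling division
    down = -(-height // tile_size)
    return across * down
```

For example, a 1000x600 raster at 256 px tiles needs 4 tiles across and 3 down, so 12 in total.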

@vincentsarago

👋 Interesting discussion here.
About internal versus external overviews: keeping them external has only one advantage, a smaller GeoTIFF, but then you have to manage multiple files and make sure the environment is set up to read sidecar files.

At that point, if you are really worried about file size, just don't add overviews.

As the creator of https://github.com/cogeotiff/rio-cogeo I strongly encourage using it instead of multiple GDAL commands in bash... but this is totally subjective 😄

The reason COGs have them internally isn't just a convenience but a speed boost.

The main reason is to reduce the number of GET range requests you need to do to fetch the data.

@pierotofy
Member

Hey @vincentsarago ✋ agree on internal overviews, better to keep a single file.

Nice work on rio-cogeo, btw! I'm hoping to start working on a tiler app for Django to include in WebODM based on https://github.com/cogeotiff/rio-tiler (which you also created) so that we can get rid of static tiles.

@smathermather
Contributor Author

Yes, I think this now migrates into full-on COG territory, which is really the domain of OpenDroneMap/WebODM#449. As non-COG web users get benefits from tiling, compression, and overviews too, I'm tempted to leave this issue open though.

@pierotofy -- looking at COGs by default from ODM and then direct use in WebODM?

@pierotofy
Member

@smathermather that's a possibility (COGs by default), or the addition of a --cog option, to be automatically turned on via WebODM. The former seems simpler, but might penalize runtime slightly for those who don't care about overviews or don't use WebODM.
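The second option could be pictured roughly as follows. This is a hypothetical sketch using `argparse`, not ODM's actual argument handling: the program name, help text and the `needs_overviews` helper are all invented for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with a hypothetical opt-in --cog switch (off by default)."""
    parser = argparse.ArgumentParser(prog="odm-sketch")
    parser.add_argument(
        "--cog",
        action="store_true",
        default=False,
        help="Write Cloud Optimized GeoTIFFs (adds tiling and internal overviews).",
    )
    return parser


def needs_overviews(args: argparse.Namespace) -> bool:
    """Only pay the overview-building cost when COG output was requested."""
    return bool(args.cog)
```

A caller such as WebODM would pass `--cog` explicitly, while users who do not need overviews keep the shorter runtime.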

@pierotofy
Member

COGs in WebODM: https://www.opendronemap.org/2019/12/opendronemap-update-cloud-optimized-geotiffs-plant-health-histograms-and-more/ OpenDroneMap/WebODM#746

@smathermather
Contributor Author

Bam!

@pierotofy
Member

I've changed the default compression from LZW to DEFLATE in 7900882

ZSTD is not included in the UbuntuGIS packages (see https://gis.stackexchange.com/questions/357801/zstd-support-for-gdal-geotiffs-with-ubuntugis) so adding it would require some changes to the way we install GDAL.
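One way to cope with builds that lack ZSTD is to check what the installed GTiff driver advertises and fall back to DEFLATE. The sketch below parses the driver's creation-option list, which GDAL exposes as an XML string; the function names and the preference order are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Sequence, Set


def supported_compressions(creation_option_xml: str) -> Set[str]:
    """Extract the COMPRESS values from a GTiff creation-option XML string."""
    root = ET.fromstring(creation_option_xml)
    for option in root.iter("Option"):
        if option.get("name") == "COMPRESS":
            return {value.text for value in option.iter("Value") if value.text}
    return set()


def pick_compression(available: Set[str],
                     preferred: Sequence[str] = ("ZSTD", "DEFLATE", "LZW")) -> str:
    """Return the first preferred codec the build supports, or NONE."""
    for name in preferred:
        if name in available:
            return name
    return "NONE"
```

With the GDAL Python bindings installed, the XML string can be obtained with `gdal.GetDriverByName("GTiff").GetMetadataItem("DMD_CREATIONOPTIONLIST")`.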

There's certainly space to optimize DEM compression further.

@Saijin-Naib
Contributor

Seems salient:
https://twitter.com/EvenRouault/status/1313157425743175680

@smathermather
Contributor Author

Great hash tag too.

@Saijin-Naib
Contributor

Comparison with mf_Obriens processed at Double Quality

GSD: approximately 4.34 cm/px
Ortho resolution: 87243x48708

ODM current output: Float32, DEFLATE. Size: 3.52 GB
One significant figure more precise than GSD: Float32, LERC_ZSTD, ZSTD level 22, LERC_Z 0.001 m. Size: 2.32 GB
Same precision as GSD: Float32, LERC_ZSTD, ZSTD level 22, LERC_Z 0.0434 m. Size: 971 MB
Half precision of GSD: Float32, LERC_ZSTD, ZSTD level 22, LERC_Z 0.0868 m. Size: 868 MB
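To put the sizes above in proportion, each result can be expressed as a percentage of the current DEFLATE output. The sketch assumes 1 GB = 1000 MB, which the comparison does not state, so the percentages are approximate.

```python
from typing import Dict

# Sizes reported in the comparison, in MB (assumption: 1 GB = 1000 MB).
SIZES_MB: Dict[str, float] = {
    "DEFLATE (current ODM output)": 3520,
    "LERC_ZSTD, 0.001 m": 2320,
    "LERC_ZSTD, 0.0434 m": 971,
    "LERC_ZSTD, 0.0868 m": 868,
}


def percent_of_baseline(sizes: Dict[str, float], baseline_key: str) -> Dict[str, float]:
    """Express every size as a percentage of the baseline entry, to one decimal."""
    baseline = sizes[baseline_key]
    return {name: round(100.0 * size / baseline, 1) for name, size in sizes.items()}


if __name__ == "__main__":
    for name, pct in percent_of_baseline(SIZES_MB, "DEFLATE (current ODM output)").items():
        print(f"{name}: {pct}% of baseline")
```

Under that assumption, the three LERC_ZSTD variants come to roughly 66%, 28% and 25% of the DEFLATE file size.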

@pierotofy
Member

Can this be closed? I think compression is in place with the latest version (just not sure if all the suggestions here have been addressed).

@smathermather
Contributor Author

Close it. We're probably doing well enough.
